Optimizing MLB Lineups w/ Plane Points

A data-driven approach to constructing the best possible lineups...

Jan 17, 2025

MLB lineups have evolved significantly over the years. Traditionally, teams prioritized placing speedy players at the top of the order, followed by batters known for their ability to advance runners with precise bat control. This often meant that the team’s best hitters didn’t appear until the 3rd or 4th spot. Today, there is a more deliberate effort to maximize the number of plate appearances for top players, leading to many teams placing their best hitters in the 2nd or even the leadoff spot. Despite this trend, lineup construction still varies widely across managers. Some rely heavily on advanced data to optimize their lineups, while others blend analytics with more traditional approaches when filling out their lineup card.

For this exercise, I utilized the data from my scoring system, Plane Points, to construct optimized lineups for all 30 MLB teams under different scenarios. If you're unfamiliar with Plane Points, click here for a quick refresher. Instead of creating a single lineup for each team, I developed two lineups: one tailored for facing left-handed pitchers (LHPs) and another for right-handed pitchers (RHPs). Additionally, I took it a step further by experimenting with constructing lineups designed to counter individual pitchers.

The lineups assume the 2024 MLB regular season is still ongoing and that the current date is October 1st, 2024. The goal is to construct a lineup for each of the 30 teams as if every team were playing on that date. To keep the lineups accurate, only players who are on their team’s roster as of October 1st, 2024, are included.

How Does it Work?

This lineup optimizer is built around four core components: Recent/Overall Performance, On-Base Ability, Power/Clutch, and Splits. Players on each team are assessed and ranked against one another within these components based on various attributes, using a weighted scoring system. Points are awarded to players according to their relative rankings within their team for each attribute.
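As a rough illustration of rank-based scoring within a single team, here is a minimal sketch. The point scale (the top-ranked player earns as many points as there are players), the weight of 2.0, the column names, and the PP/G values are all assumptions made up for the example; they are not the actual Plane Points weights.

```python
import pandas as pd

def rank_points(values: pd.Series, weight: float = 1.0) -> pd.Series:
    """Award points by within-team rank (hypothetical scale).

    With n players, the best value earns n points and the worst earns 1;
    the result is then multiplied by the attribute's weight.
    """
    n = len(values)
    ranks = values.rank(ascending=False, method="min")  # 1 = best
    return (n - ranks + 1) * weight

# Made-up PP/G values for three hypothetical players on one team
team = pd.DataFrame(
    {"Player": ["A", "B", "C"], "PPG_season": [5.2, 3.1, 4.4]}
).set_index("Player")

team["season_points"] = rank_points(team["PPG_season"], weight=2.0)
print(team)
```

Because the points depend only on rank, a player is always measured against his own teammates rather than against the league.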

Following the ranking process, each player is assigned two composite scores: one for matchups against RHPs and another for matchups against LHPs. Each score is calculated by summing the points earned in the Recent/Overall Performance and Splits components. The other two components, On-Base Ability and Power/Clutch, are used exclusively to optimize player placement within the batting order and do not factor into the composite score.
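The composite calculation can be sketched in a few lines. Only the `R_Composite_Score` column name comes from the code later in this article; `L_Composite_Score`, the other column names, and every number below are hypothetical.

```python
import pandas as pd

# Made-up component totals for two hypothetical players
scores = pd.DataFrame({
    "Player": ["A", "B"],
    "Recent_Overall": [40, 35],
    "Splits_vs_RHP": [12, 20],
    "Splits_vs_LHP": [18, 5],
    "On_Base": [30, 10],       # used for batting-order placement only
    "Power_Clutch": [8, 25],   # used for batting-order placement only
})

# Composite = Recent/Overall Performance + Splits, one per pitcher handedness
scores["R_Composite_Score"] = scores["Recent_Overall"] + scores["Splits_vs_RHP"]
scores["L_Composite_Score"] = scores["Recent_Overall"] + scores["Splits_vs_LHP"]
print(scores[["Player", "R_Composite_Score", "L_Composite_Score"]])
```

In this toy example, player B has the higher composite against RHPs while player A has the higher composite against LHPs, which is why the two lineups for a team can differ.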

Each component and its associated attributes are weighted differently to reflect their importance in constructing the optimal lineup. While most measurements are based on Plane Points per Game (PP/G), plate appearances (PA) are also factored into certain evaluations. Notably, for the On-Base Ability and Power/Clutch components, attributes are evaluated solely based on the PP/G players scored within those specific attributes. The detailed weights assigned to each attribute within each component are outlined below:

Recent/Overall Performance

In this component, I assessed each player’s performance over four distinct time frames: the last 7 days, the last 14 days, the current season, and the past three years. Each time frame was assigned a different weight, reflecting its relative importance. To account for small sample sizes, I incorporated the number of plate appearances (PA) each player had during each time frame. As a result, players earned points not only for their performance in terms of PP/G but also for their plate appearances. For example, here’s how Tigers outfielder Riley Greene performed in this component:

On-Base Ability

In this component, I evaluated each player’s performance during the current season and over the past three years across seven key attributes from my scoring system that reflect a hitter’s ability to get on base while working the count. These attributes include: Productive PA (More Than 4 Pitches), Start Ahead and Finish Ahead in Count (Productive PA), Start Behind and Finish Ahead in Count (Productive PA), Walks, Start Behind and Finish Even in Count (Productive PA), Ahead in Count, and Start Ahead in Count.

After applying the relevant weights to each category, each player receives a total for this component. While this total does not contribute to a player’s composite score, it plays a crucial role in determining the leadoff hitter. Continuing with the example of Riley Greene, here’s how he performed across these attributes in this component:

Power/Clutch

This component focuses on identifying which players should be positioned in the middle of the batting order. Players are evaluated based on their performance during the season and over the past three years across five key attributes that highlight their ability to drive the ball and perform in critical situations. These attributes include: Productive PA with RISP, Runs Generated, Home Runs, Doubles, and Triples.

After applying the appropriate weights to each attribute, each player receives a total score for this component. These scores are then used to determine the placement of players in the 4th and 5th spots of the batting order. Continuing with Riley Greene as an example, here’s how he performed across these attributes in this component:

Splits

This final component plays a critical role in crafting pitcher-specific lineups. It allows for a deeper understanding of player performance in a variety of situations, including against pitchers of different handedness, the specific starting pitcher on a given night, similar pitchers to the one on the mound, and the opposing team as a whole. To assist with this analysis, I used Baseball Savant's Similarity Scores, which consider pitch speed and movement, to identify the 10 starting pitchers most similar to the one starting that night.

As always, a set of weights is applied to reflect the significance of each attribute. Splits against pitchers of each handedness are factored into all lineups, while the more detailed splits against specific teams and pitchers are used only for pitcher-specific lineups. For example, a player’s PP/G against the Cleveland Guardians covers every game he has played against them in his career up to this point.

Continuing with the Riley Greene example, let’s assume the Tigers are facing RHP Gavin Williams of the Cleveland Guardians. According to Baseball Savant, the 10 pitchers most similar to Gavin Williams in 2024 are Yoshinobu Yamamoto, Gerrit Cole, Bobby Miller, Jared Jones, Tylor Megill, Hunter Brown, Shane Baz, Sean Burke, Jon Gray, and Taj Bradley. Below is a summary of the points Riley Greene earns in this component using these splits against a specific RHP.

For more pitcher-specific splits, I considered a player’s career-long performance, while for more general splits, I focused on his performance during the current season and the past three seasons.
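To make the similar-pitcher split concrete, here is a minimal sketch of computing a hitter's PP/G against the ten similar starters. The list of pitchers comes from the example above; the game log, its column names (`Opp_Starter`, `Plane_Points`), and all of the point values are made up for a hypothetical hitter and do not describe any real player.

```python
import pandas as pd

# The 10 pitchers listed above as most similar to Gavin Williams in 2024
similar_pitchers = [
    "Yoshinobu Yamamoto", "Gerrit Cole", "Bobby Miller", "Jared Jones",
    "Tylor Megill", "Hunter Brown", "Shane Baz", "Sean Burke",
    "Jon Gray", "Taj Bradley",
]

# Made-up career game log for a hypothetical hitter
game_log = pd.DataFrame({
    "Opp_Starter": ["Gerrit Cole", "Jon Gray", "Some Other Starter", "Gerrit Cole"],
    "Plane_Points": [6.0, 2.0, 9.0, 4.0],
})

# Keep only games started by one of the similar pitchers, then average
vs_similar = game_log[game_log["Opp_Starter"].isin(similar_pitchers)]
ppg_vs_similar = vs_similar["Plane_Points"].mean()
print(f"Games: {len(vs_similar)}, PP/G vs similar pitchers: {ppg_vs_similar}")
```

The resulting PP/G would then be ranked against teammates and weighted like any other attribute.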

Building the Lineups

Now that we’ve discussed the components that shape the lineup, it’s time to apply the numbers and construct a lineup. We’ll continue using the Detroit Tigers as an example. For this scenario, we’ll assume the Tigers are facing Gavin Williams and the Cleveland Guardians. I will provide two lineups: one for general matchups against RHPs and one specifically designed for facing Gavin Williams.

The first step in creating the lineup is identifying the players with the two highest composite scores against RHPs. These players will be placed in the 2nd and 3rd spots in the lineup, respectively. To demonstrate how this process works, I will outline the composite and component scores for each Tigers player to construct the general lineup against RHPs.

As shown in the table above, Riley Greene and Kerry Carpenter will be placed in the 2nd and 3rd spots in the lineup, respectively. Greene and Carpenter will also occupy the left field (LF) and right field (RF) positions, as these are their primary positions and remain unfilled.

The next position to be filled is the leadoff spot. To determine this, we select the player with the highest point total in the “On-Base Ability” component, provided they haven’t already been assigned a spot in the lineup. Below is a list of each Tigers player and the number of points they earned in this component.

As seen in the table above, Riley Greene leads all Tigers players in the “On-Base Ability” component. However, since Greene is already filling a spot in the lineup, we move down to the next player, Spencer Torkelson. He will take over first base and the leadoff spot in the lineup.

Next, we need to fill the 4th and 5th spots in the lineup. To do this, we refer to the “Power/Clutch” component. Below is a table showing how each Tigers player ranks in this component.

As shown in the table above, Riley Greene and Kerry Carpenter lead all Tigers players in the “Power/Clutch” component. However, since they already occupy spots in the lineup, we move to the next highest-ranked players, Spencer Torkelson, Trey Sweeney, and Matt Vierling. Torkelson is already slotted into the leadoff spot, so Sweeney and Vierling will take the 4th and 5th spots in the lineup, respectively. Sweeney will play shortstop (SS) and Vierling will play center field (CF).

To complete the rest of the lineup (6th-9th), we’ll select the remaining players with the highest composite scores from those who haven't been placed in the lineup yet. Below is a table showing the Tigers players ranked by their composite scores.

As shown in the table above, Wenceel Pérez (DH), Colt Keith (2B), Jace Jung (3B), and Dillon Dingler (C) occupy the remaining spots in the batting order.
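The ordering rules walked through above can be condensed into one small function. This is a simplified sketch that ignores fielding positions entirely; the column names `R_Composite_Score`, `Sum_OB`, `Sum_MO`, and `PP/GS` match the code later in this article, while the players and all numbers are made up.

```python
import pandas as pd

def build_order(df: pd.DataFrame) -> list:
    """Return nine player names in batting order (positions ignored)."""
    taken = []

    def best(score_col: str) -> str:
        # Highest score among unplaced players; PP/GS breaks ties
        pool = df[~df["Player"].isin(taken)]
        top = pool.sort_values([score_col, "PP/GS"], ascending=False).iloc[0]
        taken.append(top["Player"])
        return top["Player"]

    second = best("R_Composite_Score")   # two best composites bat 2nd and 3rd
    third = best("R_Composite_Score")
    first = best("Sum_OB")               # best remaining on-base total leads off
    fourth = best("Sum_MO")              # best remaining power/clutch totals bat 4th, 5th
    fifth = best("Sum_MO")
    rest = [best("R_Composite_Score") for _ in range(4)]  # spots 6-9
    return [first, second, third, fourth, fifth] + rest

# Made-up scores for nine hypothetical players
team = pd.DataFrame({
    "Player": [f"P{i}" for i in range(1, 10)],
    "R_Composite_Score": [90, 85, 70, 65, 60, 55, 50, 45, 40],
    "Sum_OB": [50, 20, 45, 10, 15, 12, 30, 5, 8],
    "Sum_MO": [40, 38, 35, 20, 33, 10, 30, 12, 6],
    "PP/GS": [5.0, 4.8, 4.1, 3.9, 3.7, 3.5, 3.3, 3.0, 2.8],
})

order = build_order(team)
print(order)
```

As in the Tigers example, the player with the best on-base total (P1) is already batting 2nd, so the leadoff spot goes to the next-best on-base player who is still available (P3).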

Below, you'll find the lineup we just constructed, followed by a more specific lineup tailored for facing Gavin Williams.

As you can see, the lineup tailored specifically for facing Gavin Williams has a few slight adjustments. Colt Keith now moves ahead of Wenceel Pérez because he has had greater success not only against the Cleveland Guardians but also against Gavin Williams specifically. Dillon Dingler moves ahead of Jung primarily because Dingler has faced the Guardians before, while Jung has yet to do so.

Before I present the lineups my system generated for each of the 30 MLB teams against both RHPs and LHPs, I would like to briefly walk you through how I utilized Python to efficiently produce these lineups. Below is the code I used to generate the lineup for the Detroit Tigers when facing RHPs.

# Tigers vs RHPs
import pandas as pd

# Load your data (replace 'Lineup Optimizer - Plane Points.xlsx' with actual path)
data = pd.read_excel('/content/Lineup Optimizer - Plane Points.xlsx', sheet_name="Tigers vs Guardians")

# Step 1: Filter and sort for the highest composite scores for batting order spots 2 and 3
max_composite_score = data['R_Composite_Score'].max()
composite_candidates = data[data['R_Composite_Score'] == max_composite_score]

# Tie-breaking for second batter
second_batter = composite_candidates.loc[composite_candidates['PP/GS'].idxmax()]

# Remove selected player
data = data.drop(second_batter.name).reset_index(drop=True)

# Recompute for third batter
max_composite_score = data['R_Composite_Score'].max()
composite_candidates = data[data['R_Composite_Score'] == max_composite_score]

# Tie-breaking for third batter
third_batter = composite_candidates.loc[composite_candidates['PP/GS'].idxmax()]

# Step 2: Remove selected players and find the player with the highest On-Base Ability total (sum of the columns listed below) for leadoff
remaining_data = data.drop(third_batter.name).reset_index(drop=True)
remaining_data['Sum_OB'] = remaining_data[['PPM4_P', 'SAFACP_P', 'SBFACP_P', 'Walk_P',
                                           'SBFECP_P', 'AIC_P','SAIC_P','PPM4_P_3', 'SAFACP_P_3', 'SBFACP_P_3', 'Walk_P_3',
                                           'SBFECP_P_3', 'AIC_P_3','SAIC_P_3', 'PA_14W', 'PA_7W', 'PA_SW', 'PA_3W'
                                            ]].sum(axis=1)

max_sum_ob = remaining_data['Sum_OB'].max()
ob_candidates = remaining_data[remaining_data['Sum_OB'] == max_sum_ob]

# Tie-breaking for first batter
first_batter = ob_candidates.loc[ob_candidates['PP/GS'].idxmax()]

# Step 3: Remove the selected player and find the players with the highest and second-highest Power/Clutch totals (sum of the columns listed below) for spots 4 and 5
remaining_data = remaining_data.drop(first_batter.name).reset_index(drop=True)
remaining_data['Sum_MO'] = remaining_data[['PPAWR_P', 'RGEN_P', '2B_P', '3B_P', 'HR_P','PPAWR_P_3', 'RGEN_P_3', '2B_P_3', '3B_P_3', 'HR_P_3', 'PA_14W', 'PA_7W', 'PA_SW', 'PA_3W'
                                           ]].sum(axis=1)

max_sum_mo = remaining_data['Sum_MO'].max()
mo_candidates = remaining_data[remaining_data['Sum_MO'] == max_sum_mo]

# Tie-breaking for fourth batter
fourth_batter = mo_candidates.loc[mo_candidates['PP/GS'].idxmax()]

# Remove selected player for fifth batter
remaining_data = remaining_data.drop(fourth_batter.name).reset_index(drop=True)
max_sum_mo = remaining_data['Sum_MO'].max()
mo_candidates = remaining_data[remaining_data['Sum_MO'] == max_sum_mo]

# Tie-breaking for fifth batter
fifth_batter = mo_candidates.loc[mo_candidates['PP/GS'].idxmax()]

# Step 4: Remove selected players and sort remaining by composite score (highest first)
remaining_data = remaining_data.drop(fifth_batter.name).reset_index(drop=True)

# Sorting for spots 6-9: composite score first, with PP/GS as the tie-breaker
sorted_candidates = remaining_data.sort_values(by=['R_Composite_Score', 'PP/GS'], ascending=[False, False])

# Step 5: Define the required positions (C, 1B, 2B, 3B, SS, LF, CF, RF, DH)
required_positions = {"C", "1B", "2B", "3B", "SS", "LF", "CF", "RF", "DH"}
used_positions = set()

# Function to assign the first available position for a player (ensuring no duplicates)
def assign_position(player_positions):
    for position in player_positions:
        if position not in used_positions and position in required_positions:
            used_positions.add(position)
            return position
    return None  # No unused required position this player is eligible for

# Step 6: Assign positions for players in spots 1-5 (ensure each gets a unique position)
lineup = []

for player in [first_batter, second_batter, third_batter, fourth_batter, fifth_batter]:
    player_positions = player['Position'].split(", ")  # List of eligible positions
    assigned_position = assign_position(player_positions)
    player['Assigned_Position'] = assigned_position
    lineup.append(player)

# Step 7: Assign positions for players in spots 6-9 (based on sorted composite scores and positional eligibility)
assigned_players = []

for _, player in sorted_candidates.iterrows():
    player_positions = player['Position'].split(", ")
    assigned_position = assign_position(player_positions)

    if assigned_position:
        player['Assigned_Position'] = assigned_position
        assigned_players.append(player)

    if len(assigned_players) == 4:  # Stop once spots 6-9 are filled
        break

# Step 8: Combine players for spots 1-9
lineup += assigned_players

# Step 9: Create final lineup DataFrame
lineup_df = pd.DataFrame(lineup)
lineup_df['Position'] = lineup_df['Assigned_Position']

# Display the final lineup
print("Optimal Lineup:")
print(lineup_df[['Player', 'R_Composite_Score', 'Position']])

# Step 10: Verify all required positions are filled
print("\nUsed positions:")
print(sorted(used_positions))

# Check if any required position is missing
missing_positions = required_positions - used_positions
if missing_positions:
    print(f"\nMissing positions: {missing_positions}")
else:
    print("\nAll required positions are filled!")

This process mirrors what I outlined above, so I won't go into the details again, but I wanted to clarify some additional checks in the code. In case of a tie in composite scores, the player with the higher PP/G during the current season is given priority. Additionally, the code ensures that no player is placed in multiple spots within the lineup, and no two players occupy the same position. Each player has a set of possible positions they can play, and if a player with a higher composite score cannot fill a needed position, the code moves to the next available player who can. This process continues until all nine lineup spots and positions are filled according to these standards.
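One caveat with assigning each player the first open position he is eligible for: an early choice can block a later player even when a valid arrangement exists. The sketch below shows a backtracking search as one possible alternative. It is not the method used in the code above, and the players and eligibility lists are hypothetical.

```python
def assign_positions(eligibility: dict, positions: set):
    """Try to give every player a distinct position via backtracking.

    eligibility: maps player -> list of positions he can play
    positions:   positions that may each be used at most once
    Returns a dict player -> position, or None if no valid assignment exists.
    """
    players = list(eligibility)

    def solve(i, used):
        if i == len(players):
            return {}  # everyone placed
        for pos in eligibility[players[i]]:
            if pos in positions and pos not in used:
                rest = solve(i + 1, used | {pos})
                if rest is not None:
                    return {players[i]: pos, **rest}
        return None  # dead end; caller tries its next option

    return solve(0, frozenset())

# Hypothetical case: taking the first open spot would put A at 1B and strand B
eligibility = {"A": ["1B", "DH"], "B": ["1B"], "C": ["DH", "LF"]}
result = assign_positions(eligibility, {"1B", "DH", "LF"})
print(result)
```

Here the search backs out of placing A at first base, moves him to DH, and all three players end up with a position.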

Now, without further delay, here are the completed lineups for all 30 MLB teams, for both RHPs and LHPs.

VS. RHPS

VS. LHPS

As a reminder, any MLB player who was not on their active roster at the close of the 2024 regular season has been excluded from these lineups, including high-profile players such as Mike Trout, Austin Riley, Ronald Acuña Jr., and others. While this article doesn’t cover specific pitcher matchups (e.g., Chris Sale vs. the Phillies or Justin Verlander against the Rangers), I am happy to provide any lineup (using 2024 season rosters and data) upon request.

Looking ahead to the 2025 MLB season, I plan to release updated lineups for all 30 teams once the final rosters are solidified. In addition, I will provide detailed pitcher-specific matchups throughout the season, focusing on some of the more high-profile games. I welcome any feedback or suggestions, so please feel free to leave comments with ideas for further clarification or ways to improve this process. Thank you for following along, and I look forward to sharing more insights once the 2025 season begins!
