Forecasting Legends: Building Classification Models to Predict MLB Hall of Famers (Part 2)
In Part 2 of this 7-part series, I examine the Hall of Fame prospects of MLB players (hitters only) after their first seven seasons.
Introduction
Welcome back to my series, "Forecasting Legends: Building Classification Models to Predict MLB Hall of Famers." In the first part of this series, I examined a player's first five years in the league and estimated their probability of reaching the Hall of Fame. Now I am extending the timeline by two more seasons, aiming to uncover deeper insights and refine my predictions as I continue my quest to forecast the legends of the game.
In this second part, I build a new set of classification models on a fresh dataset spanning the first seven years of a player's career, with the goal of uncovering new insights into what puts a player on the path to the Hall of Fame.
If you happened to miss any of the articles in this series or would simply like a refresher on how this series works, I invite you to click on any of the links below:
• First 7 Years
Model 2 - First 7 Years
Results
Group 1 included 5,509 records, while Group 2 consisted of 1,231 records. After partitioning Group 1, a new dataset of 2,919 records was created (146 in the training set and 2,773 in the validation set). This data was fed through a logistic regression classification model, which proved to be the most accurate of all the models tested. The training set produced an accuracy of 92.47% (135/146), while the validation set produced an accuracy of 92.03% (2,552/2,773). The area under the ROC curve (AUC) for the validation set is 0.95812.
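The article does not publish its modeling code, so the following is only a minimal, standard-library sketch of the quantities in the paragraph above, not the author's pipeline. It assumes the usual logistic regression setup: a weighted sum of a player's stats (the score `z`, whose fitted coefficients the article does not report) is passed through the logistic function to get a Hall of Fame probability, and a player is labeled a future Hall of Famer when that probability is at least 0.5 (an assumed cutoff). Accuracy is correct predictions over total records, and AUC uses the pairwise (Mann-Whitney) formulation, which is equivalent to the area under the ROC curve. The function names and the four toy scores in the AUC example are hypothetical.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Logistic function: maps a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_hof(z: float, threshold: float = 0.5) -> bool:
    """Label a player a future Hall of Famer if P(HOF) >= threshold (0.5 is an assumed cutoff)."""
    return sigmoid(z) >= threshold


def accuracy_pct(correct: int, total: int) -> float:
    """Percentage of records classified correctly."""
    return 100.0 * correct / total


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pairwise AUC: chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(f"Training accuracy:   {accuracy_pct(135, 146):.2f}%")    # 92.47%
print(f"Validation accuracy: {accuracy_pct(2552, 2773):.2f}%")  # 92.03%
# Hypothetical toy scores: one positive (0.4) is outranked by one negative (0.6).
print(roc_auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))             # 0.75
```

The pairwise AUC loop is quadratic in the number of records, which is fine for illustration; a production pipeline would typically use a library routine instead.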
Group 1
Based on the validation set, this model does a better job of identifying players who did not end up in the Hall of Fame. Of the 2,699 players in Group 1 who are not in the Hall of Fame, the model misclassified 206 (a 7.63% error rate).
An example of one of these misclassifications is Raul Mondesi. Over the first seven years of his career, Mondesi produced a WAR of 21.6 (T-248th among players in Group 1). During this time, he posted a slash line of .288/.334/.504 with 163 HRs and 518 RBIs. He also appeared in one All-Star Game and won Rookie of the Year. At the time, this model gave him a 92.87% chance of getting into the Hall of Fame. Over the last six seasons of his career, he played for six different teams and struggled to recapture his early success. In that stretch, he accumulated 7.9 WAR while posting a slash line of .251/.326/.456 and adding 108 HRs and 342 RBIs. At the conclusion of his 13-year career, Mondesi had a 53.22% chance of getting into the Hall of Fame (a 42.69% relative decrease from the end of his seventh season). When Mondesi first became eligible for the Hall of Fame in 2011, he did not receive a single vote, so he was dropped from the ballot the following year.
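The percentage changes quoted throughout this article are relative changes: the difference between two probabilities divided by the starting probability, not a difference in percentage points. A minimal sketch of that calculation using Mondesi's reported figures follows; the function name is illustrative. Because the published probabilities are rounded to two decimals, recomputing other changes in this article from them can differ slightly from the quoted figures (most noticeably when the starting probability is small), presumably because the originals were computed from unrounded model outputs.

```python
def relative_change_pct(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a percentage of `before`."""
    return 100.0 * (after - before) / before


# Mondesi's Hall of Fame probability (%) after season 7 vs. at the end of his career
after_year_7, end_of_career = 92.87, 53.22

print(f"{relative_change_pct(after_year_7, end_of_career):.2f}%")        # -42.69%
print(f"{after_year_7 - end_of_career:.2f} percentage points lower")     # 39.65 percentage points lower
```

The same function reproduces, for example, Braun's 6.37% decrease (99.76% to 93.41%) and Lind's 87.27% decrease (66.75% to 8.50%).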
On the flip side, of the 74 players in Group 1 who are in the Hall of Fame, 15 were misclassified (a 20.27% error rate). One example is David Ortiz. In his first seven seasons, he produced a WAR of 5.9 (T-1,313th among players in Group 1) while posting a slash line of .271/.353/.491 with 89 HRs and 339 RBIs. At the time, this model gave him a 12.41% chance of getting into the Hall of Fame. However, his career took off once he joined the Boston Red Sox in 2003 (his seventh season). In his 14 seasons with the Red Sox, Ortiz accumulated 52.7 WAR with a slash line of .290/.386/.570. He also added 10 All-Star Game appearances and seven Silver Sluggers during this time. When his career was all said and done after 20 illustrious seasons, he had a 99.50% chance of getting into the Hall of Fame (a 701.64% increase from the end of his seventh season). Ortiz was inducted into the Hall of Fame in 2022, his first year on the ballot, with 77.9% of the vote.
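The Group 1 error rates can be cross-checked against the overall validation accuracy by laying them out as a confusion matrix. The sketch below assumes the validation set contains only these two classes, so the non-Hall of Fame count is 2,773 - 74 = 2,699. With that count, the 206 and 15 misclassifications reproduce both the 7.63% and 20.27% error rates and the 2,552 correct predictions behind the 92.03% validation accuracy. The variable names are illustrative.

```python
# Group 1 validation-set counts as reported in the article
NON_HOF_TOTAL = 2773 - 74   # 2,699 players not in the Hall of Fame
NON_HOF_WRONG = 206         # predicted HOF but not inducted (false positives)
HOF_TOTAL = 74
HOF_WRONG = 15              # inducted but predicted non-HOF (false negatives)

true_negatives = NON_HOF_TOTAL - NON_HOF_WRONG
true_positives = HOF_TOTAL - HOF_WRONG
correct = true_negatives + true_positives
total = NON_HOF_TOTAL + HOF_TOTAL

print("Confusion matrix (rows = actual, columns = predicted [non-HOF, HOF]):")
print(f"  non-HOF: [{true_negatives}, {NON_HOF_WRONG}]")
print(f"  HOF:     [{HOF_WRONG}, {true_positives}]")
print(f"Non-HOF error rate: {100 * NON_HOF_WRONG / NON_HOF_TOTAL:.2f}%")  # 7.63%
print(f"HOF error rate:     {100 * HOF_WRONG / HOF_TOTAL:.2f}%")          # 20.27%
print(f"Correct: {correct}/{total}")                                      # 2552/2773
```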
Group 2
This model predicts that 213 players (17.30%) will make it into the Hall of Fame based on their first seven seasons. One example who is no longer active is Ryan Braun. In his first seven seasons, he accumulated 34.8 WAR (15th among players in Group 2) with a slash line of .312/.374/.564, 211 HRs, and 681 RBIs. He also won an MVP award and Rookie of the Year. At the time, this model gave him a 99.76% chance of making it to the Hall of Fame. In his final seven seasons, he accumulated 12.3 WAR with a slash line of .276/.338/.492. While Braun had a great start to his career, his drop-off in performance and his PED usage will likely hurt his chances of getting in. Currently, he has a 93.41% chance of getting into the Hall of Fame (a 6.37% decrease from the end of his seventh season). Braun will first be eligible for Hall of Fame consideration in 2026.
An example of an active player is Aaron Judge (currently with the New York Yankees), who completed his seventh season in 2022. Over those seven seasons, he accumulated 37.0 WAR (11th among players in Group 2) with a slash line of .284/.394/.583, 220 HRs, and 497 RBIs. He also took home both an MVP award and Rookie of the Year. Based on his first seven seasons, this model gave Judge a 99.30% chance of getting into the Hall of Fame. But if his career were to end today, he would have a 71.77% chance of getting in, although he would not yet be eligible, since a player needs at least ten MLB seasons to appear on the ballot.
This model also predicts that 1,018 players (82.70%) will not make it into the Hall of Fame. One of those players who is no longer active is Brandon Phillips. Through seven seasons, Phillips accumulated 7.0 WAR (397th among players in Group 2) with a slash line of .262/.308/.425, 74 HRs, and 285 RBIs. At that point, this model gave Phillips a 23.36% chance of making it into the Hall of Fame. In his final 10 seasons, he went on to accumulate 24.5 WAR with a slash line of .279/.323/.421, adding three Gold Gloves and three All-Star Game appearances. Currently, Phillips has a 61.85% chance of getting into the Hall of Fame (a 164.79% increase from the end of his seventh season). He will first be eligible in 2024.
What about active players? Teoscar Hernandez (currently with the Seattle Mariners) completed his seventh season in 2022. In that time, he accumulated 10.7 WAR (T-277th among players in Group 2) while posting a .262/.319/.499 slash line with 133 HRs and 380 RBIs. Based on his first seven seasons, this model gave Hernandez a 40.64% chance of getting into the Hall of Fame. But if his career were to end today, he would have a 1.87% chance of getting in (although he would not yet be eligible to appear on the ballot).
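The split of Group 2 into predicted Hall of Famers and predicted non-Hall of Famers comes from thresholding each player's year-seven probability. The article does not state the cutoff, so the minimal sketch below assumes the common default of 50%. Applied to the four players discussed above, it places Braun and Judge on the "will make it" side and Phillips and Hernandez on the other, and the reported counts give the quoted 17.30% and 82.70% shares. The dictionary and threshold constant are illustrative.

```python
THRESHOLD = 50.0  # assumed cutoff (%); not stated in the article

# Year-seven Hall of Fame probabilities (%) quoted above
year_seven_prob = {
    "Ryan Braun": 99.76,
    "Aaron Judge": 99.30,
    "Brandon Phillips": 23.36,
    "Teoscar Hernandez": 40.64,
}

predicted_hof = sorted(name for name, p in year_seven_prob.items() if p >= THRESHOLD)
predicted_not = sorted(name for name, p in year_seven_prob.items() if p < THRESHOLD)
print("Predicted HOF:    ", predicted_hof)
print("Predicted not HOF:", predicted_not)

# Shares of Group 2 from the reported prediction counts
hof_count, not_count = 213, 1018
group_2_size = hof_count + not_count  # 1,231 records
print(f"{100 * hof_count / group_2_size:.2f}% vs {100 * not_count / group_2_size:.2f}%")  # 17.30% vs 82.70%
```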
HOF Probabilities from Year 5 to Year 7
As we witnessed in previous installments of this series, certain players embarked on a swift trajectory toward the Hall of Fame, while others encountered significant challenges in solidifying their presence in the major leagues. To gain a deeper understanding of the players who bolstered their chances of entering the Hall of Fame and those who faced hurdles along the way, I will analyze and compare their probabilities after five seasons to their probabilities after seven seasons.
Risers
Over the course of his first five seasons, Jose Bautista spent time with four different teams and struggled to establish himself as an MLB player. In this time, he accumulated -2.9 WAR while posting a slash line of .239/.325/.398 with 46 HRs and 171 RBIs. His odds of getting into the Hall of Fame at this point were slim (1.51%). But something clicked when he joined the Toronto Blue Jays. In his second full season with the team (his seventh season overall), he had a monster year, posting a slash line of .260/.378/.617 with a league-leading 54 HRs and 124 RBIs. He also earned his first All-Star selection and finished fourth in MVP voting. At the conclusion of his seventh season, Bautista had a 27.92% chance of getting into the Hall of Fame (a 1,750.01% increase from the end of his fifth season). Bautista retired in 2018 and currently has an 81.91% chance of getting into the Hall of Fame. He will first be eligible in 2024.
Fallers
Through his first five seasons, Jose Lind averaged 1.5 WAR per season (slightly below an average player) while posting a slash line of .259/.303/.329 with 8 HRs and 210 RBIs. At this point, Lind had a 66.75% chance of getting into the Hall of Fame. In his next two seasons, he accumulated -2.7 WAR while posting a slash line of .241/.273/.278 with 0 HRs and 76 RBIs, a sharp drop in production over a short window. His odds of making it into the Hall of Fame fell to 8.50% (an 87.27% decrease from the end of his fifth season). When Lind's career ended in 1995, his odds of getting into the Hall of Fame were at 6.62%. He was never eligible for consideration because his career spanned only nine seasons, one short of the ten required.
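Identifying risers and fallers amounts to computing each player's change between the year-five and year-seven probabilities and sorting on it. The minimal sketch below uses only the two players discussed above; a full player table is assumed rather than shown, and the names are illustrative. It ranks by percentage-point change instead of the relative change quoted in the text, because relative change explodes when the starting probability is small (Bautista's 1.51% base), so a percentage-point difference is arguably the steadier sort key.

```python
# (year-5 probability %, year-7 probability %) as quoted above
probs = {
    "Jose Bautista": (1.51, 27.92),
    "Jose Lind": (66.75, 8.50),
}


def point_change(p5: float, p7: float) -> float:
    """Change in percentage points between year five and year seven."""
    return p7 - p5


def relative_change_pct(p5: float, p7: float) -> float:
    """Change relative to the year-five probability, in percent."""
    return 100.0 * (p7 - p5) / p5


# Sort players from biggest riser to biggest faller by percentage-point change
ranked = sorted(probs, key=lambda name: point_change(*probs[name]), reverse=True)
for name in ranked:
    p5, p7 = probs[name]
    print(f"{name:14s} {point_change(p5, p7):+7.2f} pts  {relative_change_pct(p5, p7):+9.2f}%")

biggest_riser, biggest_faller = ranked[0], ranked[-1]
```

Recomputed from the rounded probabilities, Bautista's relative change comes out near 1,749% rather than the quoted 1,750.01%, presumably because the article worked from unrounded model outputs.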