Forecasting Legends: Building Classification Models to Predict MLB Hall of Famers (Part 7)
In Part 7 of this 7-part series, I review the models created in this series and look ahead to the 2024 Hall of Fame Ballot.
Welcome to part seven of “Forecasting Legends: Building Classification Models to Predict MLB Hall of Famers.” Throughout this series, I put several classification models to work to analyze the chances of a player getting into the Hall of Fame at various stopping points in their career. I also used these models to predict the future and assess which players are destined for the Hall of Fame. In the final part of this series, I review all the models I worked with and look ahead to next year’s Hall of Fame candidates.
For those who may have missed some of my previous work in this series, please click on any of the links below. Each link focuses on a different time frame. Join me one final time as I look to unravel the secrets behind the road to baseball immortality!
· Review & 2024 Hall of Fame Ballot
Review of Models (Validation Set)
Of all the models tested, the best turned out to be the one that examined the first 12 years of a player’s career. On the validation set, it produced an accuracy of 95.02% with an area under the curve (AUC) of 0.97577. As I expanded the time period for evaluation, the accuracy rose. This makes sense: a player who produces early in his career will not necessarily keep those numbers up for the rest of it. For example, Darryl Strawberry had a 97.38% chance of making it into the Hall of Fame after five seasons, but his production cratered as his career went along and he ultimately missed out. The more games a player has under his belt, the easier it is to predict whether he will be a Hall of Famer. There is one exception to this rule, however. The “Entire Career” model was only the fifth most accurate model. My assumption is that this happened because the term “Entire Career” is so loosely defined: for some players an entire career is five seasons, while for others it is 15. This had a direct impact on how the model evaluated each record in the dataset and, ultimately, on the predictions it made.
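For readers less familiar with the two metrics quoted above, here is a minimal sketch of how accuracy and AUC can be computed from a validation set. It is not the code used in this series: the labels and probabilities are made up for illustration, the function names are hypothetical, and the 0.5 cutoff is an assumption.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Share of players whose thresholded prediction matches the true label."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == t) for p, t in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true, y_prob):
    """AUC as the chance that a random Hall of Famer gets a higher
    probability than a random non-member (ties count as half)."""
    pos = [p for p, t in zip(y_prob, y_true) if t == 1]
    neg = [p for p, t in zip(y_prob, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up validation data: 1 = Hall of Famer, 0 = not.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.92, 0.10, 0.35, 0.40, 0.05, 0.88, 0.20, 0.60]

print(f"accuracy = {accuracy(y_true, y_prob):.3f}")
print(f"AUC      = {roc_auc(y_true, y_prob):.3f}")
```

Accuracy depends on the chosen cutoff, while AUC only looks at how well the probabilities rank Hall of Famers above everyone else, which is why the two numbers are reported side by side.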
2024 Hall of Fame Ballot
Per Baseball Reference, 22 hitters will be eligible for the Hall of Fame in 2024 (all of them played at least ten seasons and have a score of at least 10 in the HOF Monitor).[1] The list includes players who fell short of the Hall of Fame a year ago but received enough votes to stay on the ballot. One example is Todd Helton, who tallied 72.2% of the vote, just short of the 75% threshold. This will be his sixth year on the ballot. The list also includes players who will be on the ballot for the first time, such as Adrian Beltre, Chase Utley, and Joe Mauer. Overall, there are 12 first-timers and 10 returners.
Among this group of players, Alex Rodriguez has the highest probability of getting into the Hall of Fame. Rodriguez hit the ground running as soon as he got to the big leagues. In his first ten seasons, he accumulated 63.6 WAR while posting a slash line of .308/.382/.581 with 345 HRs and 990 RBIs. At this point, Rodriguez had a 99.99% chance of getting into the Hall of Fame. In his final 12 seasons, he accumulated 54.0 WAR while posting a slash line of .283/.378/.523 with 351 HRs and 1,096 RBIs. At the conclusion of his career, Rodriguez had a 100% chance of getting into the Hall of Fame. Overall, in 22 seasons with three different teams, Rodriguez was honored at 14 All-Star Games and was awarded 3 MVPs, 2 Gold Gloves, and 10 Silver Sluggers. This upcoming year will mark his third year on the ballot. So far, he has peaked at 35.7% of the vote. While his numbers suggest he is a lock for the Hall of Fame, his use of steroids may ultimately hurt his chances of joining the best of the best.
Chase Headley has the lowest probability of any player in this group (10.05%). In his first five seasons, Headley accumulated 7.9 WAR while posting a slash line of .269/.343/.392 with 36 HRs and 204 RBIs. At this point, Headley had a 47.79% chance of getting into the Hall of Fame. His final seven seasons did not help his chances: he accumulated 18.0 WAR while posting a slash line of .259/.342/.403 with 94 HRs and 392 RBIs. He also earned a Gold Glove and a Silver Slugger during his career. From the conclusion of his fifth season to the final day of his career, his chances of getting into the Hall of Fame fell by 78.98% in relative terms (from 47.79% to 10.05%). This year will mark Headley’s first year on the ballot, but it is unlikely he will get enough votes to either stay on the ballot or get elected in his first year.
Gary Sheffield is slated to be on the ballot for the tenth and final time this upcoming year. Some notable players who were selected in their tenth year on the ballot are Larry Walker (2020), Edgar Martinez (2019), and Tim Raines (2017). Last year, his ninth on the ballot, Sheffield received 55.0% of the vote, the most he has received thus far. The start of his career was anything but smooth sailing. In his first five seasons, Sheffield accumulated 7.7 WAR while posting a slash line of .283/.341/.444 with 54 HRs and 233 RBIs. At this point, he had a 34.58% chance of getting into the Hall of Fame. In his final 17 seasons, he accumulated 52.9 WAR while posting a slash line of .294/.404/.529 with 455 HRs and 1,443 RBIs. Throughout his career, he was honored at 9 All-Star Games and was awarded 5 Silver Sluggers. At the conclusion of his career, Sheffield had a 98.93% chance of getting into the Hall of Fame, a 186.07% relative increase from the point his fifth season ended. We will see if he can capture the attention of more voters this year and get over the hump.
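The percentage changes quoted for Headley and Sheffield are relative changes in predicted probability, not percentage-point differences. Here is a minimal sketch of that arithmetic, assuming the usual formula (end - start) / start and using the rounded probabilities quoted above. The function name is hypothetical. Because the inputs are rounded to two decimals, the results can differ from the figures in the text by a few hundredths of a percent.

```python
def relative_change_pct(start, end):
    """Relative change from start to end, as a percentage of start."""
    if start == 0:
        raise ValueError("relative change is undefined for a zero start")
    return (end - start) / start * 100


# Probabilities (in %) after season five and at the end of each career,
# as quoted in the article (rounded to two decimals).
headley = relative_change_pct(47.79, 10.05)
sheffield = relative_change_pct(34.58, 98.93)

print(f"Headley:   {headley:+.2f}%")
print(f"Sheffield: {sheffield:+.2f}%")
```

In percentage points, the same moves are a drop of 37.74 for Headley and a gain of 64.35 for Sheffield, so it is worth being explicit about which measure is being reported.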
While it is far from a guarantee that all 22 of these players will get into the Hall of Fame, the numbers suggest that 19 of these players have a great shot at joining an elite class.
Conclusion
Can any insights be gained from a player’s first several years in the big leagues? Yes and no. While a player may begin his career on a high note, there is never a guarantee that he will sustain that success. In fact, most players who succeed in their first several seasons find it hard to replicate that success as numerous factors come into play. Whether it has to do with pitchers learning their tendencies or an inability to stay healthy, flourishing in the batter’s box is no easy task. There is a reason the average career length of an MLB player is 5.6 years.[2] Sustained success is simply hard. Once a player moves into his tenth season and beyond, there is a bigger sample to draw from. Reaching that point does not by itself make a player a Hall of Famer, but it gives him a leg up on the hundreds of players who failed to lengthen their careers. Again, the models had the most predictive power after a player’s 12th season. For players such as Mike Trout, Paul Goldschmidt, and Anthony Rizzo, who completed their 12th seasons in 2022, a case can be made for or against their Hall of Fame candidacy since they have been in the league and have produced for quite some time. While players such as Ronald Acuña Jr. and Juan Soto have already cemented themselves as faces of baseball and put themselves on the Hall of Fame highway, baseball is a tricky sport, and the future is full of unknowns. At the end of the day, the Hall of Fame status of players everywhere is determined by the BBWAA, not a comprehensive machine learning model.
[1] https://www.baseball-reference.com/about/leader_glossary.shtml#hof_monitor
[2] https://mlbrun.com/average-career-length-of-mlb