What makes a valuable shortstop?
Shortstop is widely regarded as one of the most important defensive positions in MLB. This demanding position has been home to some of the best players in MLB history, such as Derek Jeter, Cal Ripken Jr., and Ozzie Smith.
However, the current era of baseball has given rise to new archetypes of shortstops. To examine which archetype is the best, I used K-means clustering to identify the different archetypes of the shortstop position, then used Wins Above Replacement (WAR) to quantify each archetype’s value. WAR combines a player’s baserunning, defensive, and offensive contributions into a single number. Quantifying the value of each archetype lets us estimate which type of shortstop is the most valuable, a useful insight for front offices assessing players during the draft.
The two charts below provide the context I use to assess the value of the shortstop archetypes.
Context: WAR
Context: wRC+
Methodology
K-means clustering is an unsupervised machine learning technique that partitions n observations (here, shortstops) into k clusters (here, k = 5). Each observation belongs to the cluster with the nearest mean, which serves as the prototype of that cluster. Through clustering, I identified five distinct archetypes for the shortstop position, then used WAR to identify the most valuable archetype. I used data from the 2021 MLB season from Baseball Savant (https://baseballsavant.mlb.com/) and FanGraphs (https://www.fangraphs.com/).
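To make the "nearest mean" idea concrete, here is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. It is not the code used for this analysis, which most likely relied on a library; the function names, the `init` parameter, and the toy points are all assumptions for illustration, and the points are made-up numbers rather than player data.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))

def k_means(points, k, init=None, iterations=100, seed=0):
    """Assign each point to the cluster with the nearest mean.

    init: optional list of k starting centroids (hypothetical parameter);
    if omitted, k points are sampled at random as starting centroids.
    """
    centroids = list(init) if init is not None else random.Random(seed).sample(points, k)
    labels = None
    for _ in range(iterations):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Toy, made-up 2-D points forming two obvious groups (not real player data).
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = k_means(toy_points, k=2, init=[(0, 0), (10, 10)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

With random starting centroids, K-means can settle on different groupings from run to run, which is why library implementations typically restart several times and keep the best result.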
To determine the clusters, I used seven stats: OAA (Outs Above Average), sprint speed, average exit velocity on balls in play, launch angle, batting average, hard-hit percentage, and strikeout percentage. Because these stats are measured on very different scales, I standardized the data so that each stat falls on a common scale of roughly -2 to 2. Clustering uncovered interesting relationships between the various attributes and success metrics such as WAR. For instance, I found that offensive production trumps defense. This makes sense considering that a team full of above-average hitters would likely produce much more than a team of the best defensive players.
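The standardization step can be sketched as follows, assuming it was done with z-scores (subtract the column mean, divide by the column standard deviation); the article does not say which method was used, so treat that as an assumption. Z-scores are not strictly bounded, but most values land within about two standard deviations of the mean. The function names and the sample rows are hypothetical.

```python
from statistics import mean, pstdev

def standardize(column):
    """Convert one stat column to z-scores (mean 0, standard deviation 1)."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:  # a constant column carries no information
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]

def standardize_rows(rows):
    """Standardize every column of a table given as one tuple per player."""
    columns = [standardize(list(col)) for col in zip(*rows)]
    return [tuple(values) for values in zip(*columns)]

# Made-up rows of (sprint speed in ft/s, strikeout %), not real player data.
rows = [(26.0, 30.0), (28.0, 20.0), (30.0, 10.0)]
for row in standardize_rows(rows):
    print(row)
```

After this step, a one-unit difference means "one standard deviation" for every stat, so a large-valued stat such as exit velocity no longer dominates the distance calculation over a small-valued one such as batting average.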
With that said, let’s get into the different clusters.
Cluster 1
Starting off with cluster 1, there are the inconsistent hitters. In this cluster, we have players such as Trevor Story, Gleyber Torres, Dansby Swanson, Willy Adames, Didi Gregorius, Kyle Farmer, Freddy Galvis, Paul DeJong, and Nick Ahmed.
With a low batting average, a below-average exit velocity, and a high strikeout percentage, we can infer that these players are below average when it comes to hitting. With an exceptionally high average launch angle, the players in this cluster rank among the highest in 2021 in fly ball percentage. From this, we can speculate that these players swing for power but are inconsistent at best. This archetype has an average wRC+ of 90.11, which sits somewhat below the league average of 100 on the FanGraphs wRC+ scale. With a 1.71 WAR, the FanGraphs WAR interpretation scale labels this archetype a “role player,” or a player that does not start regularly but comes off the bench. This archetype is the least valuable of the five in terms of WAR.
Cluster 2
In cluster 2, we observe a much higher average OAA, suggesting a more defensive-minded player. While their batting stats are lower on average, their defensive stats and speed are among the highest of any cluster.
Players in this cluster include Isiah Kiner-Falefa, Kevin Newman, Nicky Lopez, J.P. Crawford, Alcides Escobar, Elvis Andrus, Miguel Rojas, and Andrelton Simmons. With a mean batting average of .256 and one of the lowest launch angles of any cluster, this cluster highlights players that are defensive-minded and rely on their speed and contact skills to get hits. The mean WAR among these eight players is 2.13, which the FanGraphs WAR interpretation scale rates as a solid starter.
Cluster 3
Cluster 3 is one of the smallest clusters in this data, but it is not one to be overlooked. Consisting of Carlos Correa, Francisco Lindor, and Brandon Crawford, among others, this cluster is well balanced across all seven stats.
This cluster has the highest average OAA, displaying the best defense of all the archetypes. What these players seemingly lack in speed, they make up for with above-average hitting stats, including exit velocity, launch angle, batting average, and hard-hit percentage. These players possess equally valuable defensive and offensive tools. They also have some of the most power, averaging a higher launch angle and exit velocity than the mean. With a 5.6 WAR, this balanced archetype ranks among the highest performing of the shortstops. On the FanGraphs WAR interpretation scale, this balanced archetype rates as a superstar-caliber player.
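The WAR labels attached to the first three clusters (1.71, 2.13, and 5.6) can be reproduced with a simple lookup. The sketch below encodes the FanGraphs rule-of-thumb tiers as I understand them; the exact tier names and cutoffs, the handling of values that fall exactly on a boundary, and the function name `war_label` are assumptions, so check them against FanGraphs before relying on them.

```python
# Assumed tiers: (minimum WAR, label), highest first.
WAR_TIERS = [
    (6.0, "MVP"),
    (5.0, "superstar"),
    (4.0, "all-star"),
    (3.0, "good player"),
    (2.0, "solid starter"),
    (1.0, "role player"),
    (0.0, "scrub"),
]

def war_label(war):
    """Return the tier label for a single-season WAR value."""
    for floor, label in WAR_TIERS:
        if war >= floor:
            return label
    return "below replacement"  # negative WAR

for cluster, war in [(1, 1.71), (2, 2.13), (3, 5.6)]:
    print(cluster, war, war_label(war))
# 1 1.71 role player
# 2 2.13 solid starter
# 3 5.6 superstar
```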
Cluster 4
Clocking in with the highest WAR, we have cluster 4. This archetype is the fastest of them all, and what these players lack defensively is redeemed by their batting abilities. This flashy archetype includes Fernando Tatis Jr. and Trea Turner, both superstar athletes sporting incredible WARs of 7.3 and 6.8, and each arguably the best at the position.
With the highest WAR of the five archetypes, cluster 4 is clearly the most valuable. This cluster has the highest offensive production, with a wRC+ of 149.5, but lags on defense (which makes sense considering the high number of errors Tatis made in 2021). Yet, with the highest average speed and above-average marks in every stat, the “flashy, high power, high consistency player” is the most valuable, at least until, as with Tatis, a suspension for a banned substance takes the player off the field. Even so, the “flashy superstar” is less an archetype of its own than an outlier: most likely a player from another archetype who is simply better. For example, Tatis Jr. could be placed in the fifth cluster (which we will talk about soon), but he is not, because he sits in his own tier in terms of value relative to the average. It would not be useful for a front office in search of a shortstop to look for a flashy superstar, because that is not an archetype that can be easily associated with specific traits such as defense or a particular type of contact.
Cluster 5
With cluster 5, we see the lowest defensive rating of all five archetypes. And yet, we see some of the highest offensive output of the five clusters.
This cluster includes players such as Corey Seager, Tim Anderson, Amed Rosario, Bo Bichette, Gavin Lux, and Xander Bogaerts. These six players produced an average wRC+ of 117.67 and an average defensive rating of 2.12.
Conclusion
The trends above give us insight into what a General Manager or Manager could look for when evaluating a player to fill the shortstop position. These archetypes help determine the most valuable type of shortstop, which could in turn help coaches identify players for the position. In the Minor Leagues, these archetypes could be useful in draft simulations, draft selections, and trade decisions. Despite the usefulness of these archetypes, this analysis has several limitations. With more time, a larger sample size would have given us a better sense of the different archetypes, and researching historical data would show which archetypes existed in the past.
Regardless, it is interesting to examine what makes a valuable shortstop in MLB.