Using K-Means Clustering to Break Down Premier League (2023-24) Striker Archetypes

The iconic image of the striker as a goal-scoring machine, tasked solely with finding the net, is slowly fading from the Premier League landscape. The modern game has witnessed an evolution in this pivotal position: the one-dimensional forward has given way to a diverse group of strikers, each offering a unique blend of skills that extends far beyond simply scoring goals.

This diversity of attacking styles coincides with a dramatic rise in goal-scoring across the league. Teams are more offensive-minded, defenses are more porous, and strikers are reaping the rewards. But how those goals arrive is as interesting as how many there are. Strikers have become multifaceted attackers: some are pure, clinical finishers, some create chances for their teammates like midfielders, and some are dominant aerial presences in the box who win every header.

To unpack these emerging striker archetypes, I'll identify distinct striker clusters within the Premier League. By analyzing key metrics and playing styles from the 2023-24 Premier League season (data through April 12, 2024), I'll show the different ways modern strikers are redefining the role and contributing to their teams' success.

To capture the full range of what strikers contribute across different playing styles, I compiled a set of metrics that allows for a holistic overview. For passing, I used passes completed, progressive passes, and long passes completed, which show how involved each striker is in ball movement and build-up play and how well they move the ball far upfield with long passes. I also looked at xAG (expected assisted goals), which credits a passer with the expected goals (xG) of the shot their pass sets up. xAG is an extension of xA (expected assists), slightly refined to distinguish passes that set up high-probability scoring chances. In the words of herfootballhub.com, “while xA measures the quality of a pass leading to a scoring chance, xAG takes it a step further. They do that by quantifying the probability of an assist resulting in an expected goal. In other words, xAG focuses specifically on the contribution of an assist towards the likelihood of a goal being scored.”
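To make the xAG idea concrete, here is a minimal Python sketch (an illustration only; the analysis itself was done in R). It assumes a hypothetical shot log in which each shot records the shooter, the teammate whose pass set it up (or None), and the shot's xG, and it credits each passer with the xG of the shots they assisted. The player names and xG values are made up; real data providers assign shot xG from their own models.

```python
from collections import defaultdict

# Hypothetical shot log: (shooter, passer who set up the shot or None, shot xG)
shots = [
    ("Striker A", "Striker B", 0.35),
    ("Striker A", None, 0.76),         # e.g. a penalty or unassisted shot
    ("Winger C", "Striker B", 0.10),
    ("Striker B", "Striker A", 0.22),
]

def xag_by_player(shot_log):
    """Sum, for each passer, the xG of the shots their passes set up (an xAG-style total)."""
    totals = defaultdict(float)
    for _shooter, passer, xg in shot_log:
        if passer is not None:          # unassisted shots credit nobody
            totals[passer] += xg
    return dict(totals)

print(xag_by_player(shots))
```

A pass earns this credit whether or not the shot is scored, which is why an xAG-style total reflects the quality of the chances a player creates rather than their actual assist count.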

Next, to evaluate strikers’ goal-scoring prowess, I looked at total shots, total goals, and npxG (non-penalty expected goals). Together, these metrics show how often players get into position to shoot, how effectively they convert their chances, and the quality of those chances.
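As a rough illustration of how these shooting numbers relate, the sketch below summarizes one striker's shots. It assumes a hypothetical list of shots stored as (xG, is_penalty, scored) tuples plus minutes played; the values are invented, and `finishing_summary` is a hypothetical helper, not part of any library. Penalties are excluded from npxG, and goals per 90 is simply goals divided by minutes, times 90, the same rate used for the cluster averages below.

```python
# Hypothetical shots for one striker: (xG, is_penalty, scored); not real data
shots = [(0.45, False, True), (0.08, False, False), (0.76, True, True), (0.12, False, False)]
minutes_played = 270

def finishing_summary(shots, minutes):
    """Shot volume, goals, npxG, finishing over/under-performance, and goals per 90."""
    total_shots = len(shots)
    goals = sum(1 for _, _, scored in shots if scored)
    npxg = sum(xg for xg, is_pen, _ in shots if not is_pen)      # penalties excluded
    np_goals = sum(1 for _, is_pen, scored in shots if scored and not is_pen)
    return {
        "shots": total_shots,
        "goals": goals,
        "npxG": round(npxg, 2),
        "np_goals_minus_npxG": round(np_goals - npxg, 2),        # > 0 means finishing above expectation
        "goals_per_90": round(goals / minutes * 90, 3),
    }

print(finishing_summary(shots, minutes_played))
```

Comparing non-penalty goals with npxG separates the quality of a striker's chances from how well they finish them.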

The final aspects of play I analyzed were possession and dribbling. I used live-ball touches to gauge how often strikers are involved in play, progressive carries to assess how well they drive the ball forward into attacking possessions, and progressive passes received to determine whether strikers are target men: the focal point of the attack up front and a go-to target for midfielders’ forward passes.

To conduct the analysis, I compiled the data in Excel and then loaded it into RStudio. First, I standardized every metric using z-score normalization, so that metrics measured on different scales contribute equally. I then used the Elbow Method, which compares the within-cluster sum of squares (WCSS) across different numbers of clusters, to choose the number of clusters, and applied the K-means clustering algorithm to the scaled dataset. The method pointed to five clusters, which I examine below.
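The pipeline above (z-scores, then K-means, with the elbow curve used to pick k) can be sketched in a few lines of NumPy. This is a minimal illustration, not the author's R code: the data are random placeholders shaped like the study (25 strikers by 11 metrics), and the restart count, iteration cap, and seed are assumptions. In R, the corresponding building blocks are `scale()` and `kmeans()`.

```python
import numpy as np

def zscore(X):
    """Standardize each column; ddof=1 mirrors R's scale(), which uses the sample SD."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def _assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) for every row."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, k, n_starts=10, max_iter=100, seed=0):
    """Lloyd's k-means with random restarts; returns (labels, centers, total WCSS)."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iter):
            labels = _assign(X, centers)
            # move each center to the mean of its members (keep it if the cluster empties)
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            converged = np.allclose(new_centers, centers)
            centers = new_centers
            if converged:
                break
        labels = _assign(X, centers)
        wcss = float(((X - centers[labels]) ** 2).sum())
        if best is None or wcss < best[2]:
            best = (labels, centers, wcss)
    return best

def elbow_curve(X, k_max=8):
    """Total within-cluster sum of squares for k = 1..k_max."""
    return [kmeans(X, k)[2] for k in range(1, k_max + 1)]

# Placeholder data: 25 strikers x 11 metrics drawn at random, NOT the real stats
raw = np.random.default_rng(42).normal(loc=5.0, scale=2.0, size=(25, 11))
Z = zscore(raw)
for k, w in enumerate(elbow_curve(Z), start=1):
    print(f"k={k}: WCSS={w:.1f}")
labels, centers, _ = kmeans(Z, k=5)
```

WCSS always falls as k grows, so the elbow is the k after which adding another cluster buys little further reduction. Multiple random restarts guard against K-means settling into a poor local optimum.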

Cluster 1: Above Average Scorers, Decent Chance Creation, Diverse Skill

Players: Cody Gakpo, Kai Havertz, Darwin Nunez, Richarlison, Ivan Toney

Averages: 0.478 Goals/90

The players in this cluster exhibit above-average performance across several key metrics. The chart shows that this group has among the highest scaled values for goals (Gls), expected assisted goals (xAG), and non-penalty expected goals (npxG), along with a diverse skill set reflected in strong values for passes, shots, touches, carries, and progressive actions. These players are not only prolific goal scorers but also contribute significantly to their team's chance creation and overall attacking play. Their well-rounded profile suggests they can both finish chances and actively participate in building their team's attacks. Of all the clusters, cluster 1 has the second-highest scaled average for goals per 90, marking it as a group of talented goal scorers. Additional metrics, including passes completed and progressive passes completed, highlight their involvement on the ball. This fits the players in the cluster. Kai Havertz has emerged as a key part of Arsenal’s attack, scoring nine goals and playing an important role in build-up play. Richarlison, after a horrendous scoring season last year, has found his stride this season, contributing his fair share of impressive assists and passes while linking up with Heung-min Son up front.

Cluster 2: Inconsistent Goal Scorers, Below Average Chance Creators

Players: Michail Antonio, Evan Ferguson, Lyle Foster, Rasmus Hojlund, Raul Jimenez

Averages: 0.332 Goals/90

Not to disrespect any of these players, especially Rasmus Hojlund, who has improved massively over the course of this season, but this cluster is essentially a group of poor strikers who offer minimal value in both goal scoring and chance creation. The chart shows that this group has relatively low scaled values for goals (Gls), expected assisted goals (xAG), non-penalty expected goals (npxG), and other chance creation metrics such as progressive passes (PrgP) and progressive passes received (PrgR). They may have the occasional strong goal-scoring performance, but their overall contribution to their team's offensive output is limited: they struggle to score consistently and are less effective at creating scoring opportunities. In many ways, this cluster resembles cluster 5, which we’ll visit later; both consist of target-man strikers, players whose main job is to get into scoring positions and finish rather than to take part in build-up play and move the ball around. Unlike cluster 5, however, this cluster is made up of players who struggle to do their one job: score.

Cluster 3: Julián Álvarez - Creator 

Players: Julián Álvarez

Cluster 3 represents the Julián Álvarez striker archetype. I almost excluded Álvarez because his role differs from that of a traditional striker: he usually plays as an attacking midfielder for Manchester City. However, he is classified positionally as a striker and has had his fair share of play up front, both for City and for Argentina at the World Cup, so I decided to keep him in. The chart shows that Álvarez has a strong chance-creation profile, with high values for progressive passes (PrgP), passes completed (Cmp), and expected assisted goals (xAG), while his goal-scoring numbers (Gls and npxG) are relatively lower. This suggests that his primary role is to orchestrate his team's attacking play, setting up teammates through his vision, passing ability, and ball progression rather than being the primary goal-scoring threat himself. This profile highlights his unique role as a creative force within his team's attacking system.

Cluster 4: Skilled Dribblers and Creators, Inconsistent, Poor Goalscorers

Players: Matheus Cunha, Gabriel Jesus, João Pedro

The players in this cluster exhibit strong technical abilities, particularly in dribbling and chance creation. The chart shows that this group has high scaled values for progressive carries (PrgC), successful take-ons (Succ), and expected assisted goals (xAG), indicating their prowess in ball progression and chance creation. However, their goal-scoring metrics (Gls and npxG) are relatively low, suggesting they struggle to find the back of the net consistently. These players may be effective at advancing the ball, creating opportunities for teammates, and maintaining possession, but their inability to reliably convert chances limits their overall impact on their team's offensive output. The cluster reflects its members' playing styles well. Gabriel Jesus, Arsenal’s striker, is notorious for his refusal to shoot and his preference for showing off his Brazilian flair through impressive skill moves. While that doesn’t translate into consistent goals, it has benefited Arsenal, as his creativity and skill on the ball have set up many scoring chances for the team's other goalscorers. Meanwhile, João Pedro has operated more as a false 9 for Brighton (a player positionally labeled as a striker who drops into midfield as a creator when his team has possession), demonstrating excellent control and movement of the ball. Matheus Cunha has been an in-and-out striker for Wolves, one who consistently tracks back on defense and, like Pedro, carries the ball upfield on his own. However, as the chart shows, all three players rank relatively low in scaled shots, suggesting that despite dribbling their way into dangerous positions, they are unable or unwilling to get shots off and even less likely to convert them.

Cluster 5: Goalscoring Target Men 

Players: Erling Haaland, Taiwo Awoniyi, Dominic Calvert-Lewin, Odsonne Édouard, Alexander Isak, Nicolas Jackson, Eddie Nketiah, Dominic Solanke, Ollie Watkins, Callum Wilson, Chris Wood

Averages:

The players in this cluster are traditional "target men" who excel at goal-scoring. The chart shows that this group has the highest scaled values for goals (Gls) and non-penalty expected goals (npxG), marking them as the most prolific goal scorers. They may not excel in other aspects of the game, such as chance creation or ball progression, but their ability to find the back of the net consistently makes them valuable assets for their teams. These players are likely the primary focal point of their team's attack, using their physical presence and aerial dominance to convert chances created by their teammates. Their profile underlines the enduring importance of goal-scoring ability in the modern game, as teams still rely on a dominant striker to provide a consistent scoring threat.

After identifying these clusters, I created a PCA (principal component analysis) biplot using the fviz_pca_biplot function from R's factoextra package to see which metrics most strongly distinguish the clusters.

The biplot visualizes the relationship between the clusters and the original variables. Each point is an individual player, labeled by the cluster they belong to, and each arrow is a metric; a player whose point sits far out in the direction an arrow points tends to score highly on that metric. Starting from the right side of the plot is cluster 5. Consistent with the cluster chart above and with the label “goalscoring target men,” the two metrics most strongly associated with this cluster are non-penalty xG and goals. Cluster 1, meanwhile, appears more closely aligned with xAG, progressive passes received, and shots taken: these players are heavily involved in the attack, serving as progressive passing options when the ball moves upfield and as options to get shots off. Clusters 3 and 4 are strongly associated with chance creation metrics, including progressive passes completed, long passes completed, live-ball touches, successful take-ons, and progressive carries. Finally, cluster 2 does not align strongly with any metric, reflecting the inconsistent and overall weak profile of the strikers in it.
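Under the hood, a PCA biplot plots each player's principal-component scores as points and each metric's loadings as arrows. The NumPy sketch below computes both via SVD of the z-scored data, which matches what R's `prcomp(x, scale. = TRUE)` returns before factoextra draws it. The data and cluster labels are random placeholders, and `metrics_along_cluster` is a hypothetical helper that uses cosine similarity between a cluster's mean position and each arrow as a rough stand-in for reading the plot by eye.

```python
import numpy as np

def pca_biplot_coords(X, n_components=2):
    """PCA on z-scored data via SVD; returns Z, player scores, metric loadings, variance share."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]    # player points in the biplot
    loadings = Vt[:n_components].T                      # one arrow per metric
    explained = S**2 / np.sum(S**2)                     # share of variance per component
    return Z, scores, loadings, explained[:n_components]

def metrics_along_cluster(scores, loadings, labels, cluster, names, top=2):
    """Metrics whose arrows point most nearly toward a cluster's mean position (cosine similarity)."""
    centroid = scores[labels == cluster].mean(axis=0)
    cos = loadings @ centroid / (np.linalg.norm(loadings, axis=1) * np.linalg.norm(centroid))
    return [names[i] for i in np.argsort(cos)[::-1][:top]]

names = ["xAG", "Succ", "Sh", "PrgR", "PrgP", "PrgC", "npxG", "Long Cmp", "Live", "Gls", "Cmp"]
raw = np.random.default_rng(7).normal(size=(25, len(names)))   # placeholder stats, NOT real data
labels = np.arange(25) % 5                                      # placeholder cluster labels 0-4
Z, scores, loadings, explained = pca_biplot_coords(raw)
print("variance explained by PC1, PC2:", explained.round(3))
print("cluster 0 leans toward:", metrics_along_cluster(scores, loadings, labels, 0, names))
```

Keep in mind that a two-component biplot only shows the share of variance captured by PC1 and PC2, so a cluster that looks unattached to any arrow may still differ on dimensions the plot does not display.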

Next, I created a bar plot comparing the clusters’ scaled values for each metric to see how all five stack up against one another.

The resulting cluster comparison chart visualizes the scaled values of eleven metrics across the five clusters.

  1. xAG (Expected Assisted Goals): This metric appears to be highest in cluster 4, indicating that players in that cluster set up the most expected goals for their teammates.

  2. Succ (Successful Take-Ons): Cluster 5 has the most successful take-ons, suggesting these players are effective at beating defenders with the ball.

  3. Sh (Shots): Cluster 4 stands out with the highest volume of shots taken, implying these players are more attack-minded and willing to shoot.

  4. PrgR (Progressive Passes Received): Cluster 4 again shows the highest values, meaning these players are frequent targets for forward passes into advanced areas.

  5. PrgP (Progressive Passes): Cluster 5 has the advantage in terms of progressive passes, indicating these players can advance the ball effectively through passing.

  6. PrgC (Progressive Carries): Cluster 4 exhibits the highest progressive carries, suggesting these players are comfortable dribbling the ball forward and progressing play.

  7. npxG (Non-Penalty Expected Goals): Cluster 4 has the edge in this metric, meaning these players are getting on the end of more high-quality scoring chances.

  8. Long Cmp (Long Completions): Cluster 5 stands out with the most long pass completions, hinting at their ability to switch play and distribute the ball over longer distances.

  9. Live (Live-Ball Touches): Cluster 4 appears to have the most live-ball touches, indicating heavy involvement in open play.

  10. Gls (Goals): Cluster 4 has the highest goal-scoring prowess, suggesting these players are the most prolific finishers.

  11. Cmp (Completed Passes): Cluster 5 dominates in terms of completed passes, demonstrating their proficiency in retaining possession.

Overall, the chart suggests that cluster 4 players are the most attack-oriented, generating the most shots, progressive passes received, and expected goals, while cluster 5 players are more possession-focused, completing the most passes and long balls.
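Each bar in a chart like this can be read as the average scaled value of one metric within one cluster. The sketch below assumes that aggregation (the article does not state it explicitly) and uses a tiny toy matrix of already-scaled values rather than the real data; `cluster_profiles` is a hypothetical helper that also reports which cluster leads each metric.

```python
import numpy as np

def cluster_profiles(Z, labels, names):
    """Average scaled value of every metric within each cluster, plus the leading cluster per metric."""
    clusters = np.unique(labels)
    means = np.array([Z[labels == c].mean(axis=0) for c in clusters])
    leaders = {name: int(clusters[means[:, j].argmax()]) for j, name in enumerate(names)}
    return clusters, means, leaders

# Toy, already-scaled values for four players and two metrics (not real data)
Z = np.array([[1.0, 0.0], [3.0, 2.0], [-1.0, -2.0], [-3.0, 0.0]])
labels = np.array([1, 1, 2, 2])
clusters, means, leaders = cluster_profiles(Z, labels, ["Gls", "Cmp"])
print(means)     # cluster 1 averages [2, 1]; cluster 2 averages [-2, -1]
print(leaders)   # {'Gls': 1, 'Cmp': 1}
```

Printing the leader for each metric alongside the written cluster descriptions is a quick way to confirm that the chart and the prose tell the same story.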

The modern game has witnessed a remarkable evolution in the striker's role, marked by a surge in attacking styles and a dramatic rise in goal-scoring across the league. Offensive-minded teams embrace a wide range of strategies, each tailored to the strengths of their strikers. No longer confined to a one-dimensional role, strikers may now be expected to score with clinical precision, create opportunities for teammates, or assert themselves as aerial threats in the box.

To unpack this new range of striker archetypes, I analyzed key metrics from the 2023-24 Premier League season to illuminate the distinct playing styles and contributions of modern strikers. Examining passing, goal-scoring, possession, and dribbling metrics revealed five distinct striker clusters, each with its own profile within the Premier League.

Cluster 1 showcases above-average scorers with a diverse skill set, adept at both goal-scoring and chance creation. Cluster 2, in contrast, comprises inconsistent goal scorers with below-average chance creation, which limits their value as a consistent attacking threat. Cluster 3 spotlights the uniqueness of Julián Álvarez, who excels at orchestrating his team's attacking play through vision and passing ability. Cluster 4 comprises skilled dribblers and creators who are inconsistent in front of goal, while Cluster 5 comprises goalscoring target men defined by their ability to find the back of the net.

Further analysis through a PCA biplot and a cluster comparison chart highlighted the metrics that characterize each cluster, providing insight into the diverse playing styles and contributions of Premier League strikers.

In conclusion, the evolution of strikers within the Premier League mirrors the broader evolution of football itself. From the traditional goal-scoring machine to the versatile attacking force, strikers embody the dynamic nature of the game, constantly adapting to new tactics, strategies, and playing styles. As football continues to change, one thing seems certain: the role of the striker will keep evolving and shaping the game for years to come.
