Markov Chain Analysis of State Transitions, Player Impact

This analysis explores soccer match dynamics using StatsBomb's open event data for the 2003/04 and 2015/16 Premier League seasons, implementing a Markov chain model to understand scoring probabilities and player impact. Because soccer has few natural breaks in play, it took considerable thought to decide how to divide a match into discrete stages. Eventually, I settled on the state definition listed below.

State Definition

The game state is defined by three key components:

  1. Team in possession (home/away)

  2. Field position (defensive/middle/attacking third)

  3. Shot opportunity presence (true/false)

This creates a state space of 2 × 3 × 2 = 12 states that captures the essential elements of attacking progression and scoring threat.
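As a minimal sketch of how the three components combine, the snippet below enumerates the states and maps an x-coordinate to a third of the pitch. The label format mirrors the state names used in this post; the helper name `third_of`, the 120-unit pitch length, and the assumption that x is measured from the possessing team's own goal line are illustrative choices, not something the analysis specifies.

```python
from itertools import product

TEAMS = ("home", "away")
THIRDS = ("defensive", "middle", "attacking")
SHOT_FLAGS = (False, True)

# Every combination of possession, field third, and shot-opportunity flag.
STATES = [
    f"{team} {third} {str(shot).upper()}"
    for team, third, shot in product(TEAMS, THIRDS, SHOT_FLAGS)
]


def third_of(x, pitch_length=120.0):
    """Map an x-coordinate to a third of the pitch.

    Assumes x runs from 0 (own goal line) to pitch_length (opponent's goal
    line); both the orientation and the length are assumptions of this sketch.
    """
    if x < pitch_length / 3:
        return "defensive"
    if x < 2 * pitch_length / 3:
        return "middle"
    return "attacking"


print(len(STATES))   # 12
print(STATES[:3])
```

Keeping the state as a single string makes it easy to use as a dictionary key or a row and column label in a transition matrix.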

Analysis

Highest Transition Probability (0.9847): home middle TRUE ~ away defensive FALSE

Highest Scoring Probability Change (0.124): home defensive FALSE ~ home attacking TRUE

Highest Value (1.51): home attacking TRUE, away attacking TRUE

The table above captures the transition and scoring probabilities of each of the states. The highest transition probability is from the home team holding the ball in the middle third with a shot opportunity to the away team holding the ball in its defensive third without one. This probability suggests that when the home team has a shot opportunity in the middle third, play very often ends with the opposition in possession deep in its own defensive third, without an immediate threat of its own. This reflects how possession advantage often translates into territorial advantage in soccer.
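Transition probabilities like these are typically estimated by counting how often one state follows another. The sketch below shows that counting step under the assumption that each possession sequence is already encoded as a list of state labels; the function name and the toy sequences are invented for illustration and are not taken from the StatsBomb data.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate P(next state | current state) from consecutive state pairs."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    # Normalize each row of counts so it sums to 1.
    return {
        current: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for current, row in counts.items()
    }


# Toy sequences, invented for illustration only.
toy_sequences = [
    ["home defensive FALSE", "home middle FALSE", "home attacking TRUE"],
    ["home defensive FALSE", "home middle FALSE", "away defensive FALSE"],
]
probs = transition_probabilities(toy_sequences)
print(probs["home middle FALSE"])
```

In the toy data, "home middle FALSE" is followed once by each of two states, so each gets probability 0.5.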

Meanwhile, the highest scoring probability change is from home defensive FALSE to home attacking TRUE. I read this as counter attacks, where a team transitions from defense to attack in one swift move. An efficient counter attack that produces a shot is highly valuable and is the sequence most likely to produce the reward, which in this case is a goal.

Additionally, I created a value variable, a “state value” associated with each state. The highest value, 1.51, is shared by the "home attacking TRUE" and "away attacking TRUE" states. This is an intuitive result: the reward opportunity is highest when a team is in the attacking third with a shot opportunity.
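The post does not spell out how the state value is computed, so the following is only one common way to do it: iterate V(s) = reward(s) + gamma * sum over s' of P(s, s') * V(s') until the values stop changing. The function name, the discount factor `gamma`, and the toy probabilities and rewards are all assumptions of this sketch.

```python
def state_values(probs, reward, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Iteratively solve V(s) = reward(s) + gamma * sum_s' P(s, s') * V(s')."""
    states = set(probs) | {n for row in probs.values() for n in row} | set(reward)
    values = {s: 0.0 for s in states}
    for _ in range(max_iter):
        new = {
            s: reward.get(s, 0.0)
            + gamma * sum(p * values[n] for n, p in probs.get(s, {}).items())
            for s in states
        }
        # Stop once no state's value moved by more than the tolerance.
        if max(abs(new[s] - values[s]) for s in states) < tol:
            return new
        values = new
    return values


# Toy two-state chain, invented for illustration only.
toy_probs = {
    "home middle FALSE": {"home attacking TRUE": 1.0},
    "home attacking TRUE": {"home middle FALSE": 0.5, "home attacking TRUE": 0.5},
}
toy_reward = {"home attacking TRUE": 1.0}
values = state_values(toy_probs, toy_reward, gamma=0.5)
print(values)
```

With these toy numbers the attacking state converges to about 1.6 and the middle state to about 0.8, which shows how a state can inherit value from the states it tends to lead to.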

In the graph above, we see players such as Cesc Fabregas and Mesut Özil, who were very talented midfielders yet had very low value added numbers. This is likely explained by the sheer volume of their touches and the nature of their involvement in possession. These are deep-lying playmakers who orchestrate play rather than making the final, high-value actions. This also shows a limitation of our model: it does not properly capture the effect of players whose progressive passes don’t directly lead to shots. A future analysis would involve a process similar to xGChain, which credits a possession's xG to every player involved in the sequence of play.

Top 3 - Value Added Per Transition

  1. Jamie Vardy (8610 transitions) - 0.014

  2. Shinji Okazaki (6636 transitions) - 0.013

  3. Riyad Mahrez (11592 transitions) - 0.011

It’s interesting that the top three players in value added per transition were all part of the Leicester City squad that won the 2015/16 Premier League, one of the biggest surprise champions of all time. Vardy’s high value per transition makes sense given his talent as a clinical striker who was likely the target of many of Leicester’s counter attacks, with many touches in dangerous areas. Okazaki, who played just off Vardy as a second striker, and Mahrez, who played on the wing, had pivotal roles in linking up with Vardy and the midfield and in moving the team up the field.
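A per-player figure like the ones above can be sketched as the average change in state value across the transitions a player is involved in. How the original analysis attributes transitions to players is not described, so the event layout `(player, state, next_state)`, the function name, and the toy numbers below are assumptions. The sketch also ignores the question of whose perspective a value is measured from when possession changes hands.

```python
from collections import defaultdict


def value_added_per_transition(events, values):
    """Average V(next_state) - V(state) over each player's transitions."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for player, state, next_state in events:
        totals[player] += values[next_state] - values[state]
        counts[player] += 1
    return {player: totals[player] / counts[player] for player in totals}


# Toy state values and events, invented for illustration only.
toy_values = {
    "home defensive FALSE": 0.25,
    "home middle FALSE": 0.5,
    "home attacking TRUE": 1.5,
}
toy_events = [
    ("Player A", "home defensive FALSE", "home attacking TRUE"),
    ("Player A", "home attacking TRUE", "home middle FALSE"),
    ("Player B", "home middle FALSE", "home attacking TRUE"),
]
added = value_added_per_transition(toy_events, toy_values)
print(added)
```

Dividing by the transition count is what keeps high-volume players from dominating the ranking, and it is also why players with many low-impact touches end up with small averages.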

Top 5 - xG Per Transition

  1. Jamie Vardy (8610) - 0.00518

  2. Jermain Defoe (6362) - 0.00421

  3. Sergio Agüero (8862) - 0.00408

  4. Harry Kane (11470) - 0.00377

  5. Thierry Henry (13056) - 0.00366

Vardy’s leading average again highlights his role as a dangerous threat up top, serving as the main target in counter attack situations and regularly receiving the ball in dangerous positions. Defoe fits the classic poacher profile, with a high xG per transition yet a negative value added (-0.000456 per transition). He’s a player who is not heavily involved in the build up but is extremely efficient as a goal scorer when he has the opportunity. Agüero, Kane, and Henry, on the other hand, played more hybrid roles for their respective teams. Kane was a complete striker, known for his distribution just as much as his scoring, while Henry played as a winger/striker hybrid.

This model has several notable limitations. Our simplified state space, while capturing basic game flow, misses nuanced elements like player positioning, defensive pressure, and possession quality. It fails to recognize the value of defensive efforts and reduces value to goal-leading actions. The model particularly struggles to capture the full value of deep-lying playmakers like Fabregas and Özil, whose contributions often lie in initiating sequences and in playing the “pass before the pass”: actions that don’t directly lead to shot opportunities but are vital in setting up the sequences that do.

The case of Leicester City's 2015/16 season provides fascinating validation of our model's ability to capture effective counter-attacking soccer. The emergence of Vardy, Okazaki, and Mahrez at the top of our value metrics aligns with their historic performance. Vardy's leading position in both value added (0.014) and xG per transition (0.00518) reflects his exceptional ability to convert defensive positions into scoring opportunities. His teammates' high ratings demonstrate how the model captures successful attacking partnerships and efficient possession usage.

However, this isn’t without its own limitations. The model finds tremendous value in counter attacks, which is valid in that there is certainly a lot of value in transitioning from defense to attack while the opponent is not fully set up defensively. However, this may lead to misleading takeaways. In modern soccer, the most dominant teams in the world rely not on counter attacks but on possession and slow build up, taking counter attacking chances only when they arise. On the flip side, lower-tier teams rely primarily on counter attacks, preferring to concede possession and play on the break. It is therefore important to understand these limitations. While counter attacking transitions may carry the most value in the model, relying on them consistently is not necessarily the optimal strategy for winning games.

xT Analysis

Another approach to analyzing player impact is xT, or expected threat. xG, or expected goals, has become the talk of soccer analytics over the past few years, slowly finding its way into soccer media and punditry. While it offers insight into the quality of a team's chances and shots, it doesn’t give us the full scope of a player's or team’s performance in other aspects of the game.

xT offers us a more insightful look at the progression of the ball into various areas of the pitch, using Markov chains to evaluate a string of outcomes rather than just the final pass or final shot. This blog here does an excellent job of explaining xT in detail.
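The core of xT is usually written as a fixed-point update over pitch zones: the threat of a zone equals the chance of shooting and scoring from it, plus the chance of moving the ball times the threat of wherever it ends up. The sketch below illustrates that update on a toy two-zone pitch. The function and parameter names and all of the probabilities are assumptions made for illustration, and a real grid would have many more zones.

```python
def expected_threat(shoot_p, goal_p, move_p, move_T, n_iter=50):
    """Iterate xT[z] = shoot_p[z] * goal_p[z] + move_p[z] * sum_k move_T[z][k] * xT[k].

    shoot_p[z]   : probability a player in zone z shoots
    goal_p[z]    : probability a shot from zone z scores
    move_p[z]    : probability a player in zone z moves the ball (pass or carry)
    move_T[z][k] : probability a move from zone z ends successfully in zone k
    """
    n = len(shoot_p)
    xt = [0.0] * n
    for _ in range(n_iter):
        xt = [
            shoot_p[z] * goal_p[z]
            + move_p[z] * sum(move_T[z][k] * xt[k] for k in range(n))
            for z in range(n)
        ]
    return xt


# Toy pitch with zone 0 (deep) and zone 1 (near goal); numbers are invented.
xt = expected_threat(
    shoot_p=[0.0, 1.0],
    goal_p=[0.0, 0.2],
    move_p=[1.0, 0.0],
    move_T=[[0.0, 0.5], [0.0, 0.0]],
)
print(xt)
```

In this toy setup the zone near goal has a threat of about 0.2, and the deep zone inherits about 0.1 because half of its moves reach the dangerous zone.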

The analysis of Expected Threat (xT) from open play passes in the Premier League seasons 2003/04 and 2015/16 reveals fascinating insights about player creativity and tactical evolution. The metric, which measures the probability of a pass leading to a goal in subsequent actions, shows a diverse range of players excelling in threat creation through passing.

At the top of the chart, we find an interesting mix of players led by Erik Pieters, Xherdan Shaqiri, and Willian. Pieters' high ranking is particularly intriguing as he primarily played as a left-back, suggesting exceptional ability in progressive passing from defensive positions. Shaqiri and Willian's presence is less surprising, as both were renowned for their creative abilities from wide positions, regularly delivering threatening passes in the final third.

The high positioning of Dimitri Payet stands out as it aligns perfectly with his reputation as one of the most creative players during his time at West Ham. His technical ability to deliver precise passes that increased scoring probability is well reflected in these numbers. Similarly, the high rankings of players like Robert Brady and Marc Albrighton demonstrate how effective wide playmakers could be in creating threatening situations through their passing.

This graph shows us ball progression more broadly, also considering carries alongside passes. Most strikingly, Thierry Henry tops this chart with a remarkable xT value of nearly 6 per 90 minutes from passes and carries, significantly higher than the previous graph's maximum of around 3.

The presence of Henry at the top is particularly telling of his legendary status. His exceptional ability to both dribble past defenders and create chances for teammates made him one of the most complete forwards in Premier League history. Following closely is Willian Borges da Silva, whose high ranking across different progression metrics confirms his effectiveness as a creative force.

This version of the data features more recognized attacking players compared to the previous graph. The inclusion of forwards like Harry Kane, Jamie Vardy, and Sergio Agüero shows how modern strikers contribute to ball progression beyond just finishing. Similarly, the presence of Eden Hazard, Alexis Sánchez, and Riyad Mahrez - all known for their dribbling ability - demonstrates the importance of carry-and-pass combinations in creating threatening situations.

The data presents an interesting mix of player profiles. Some, like Dimitri Payet and Xherdan Shaqiri, maintain their high rankings from the previous metrics, confirming their all-round creative abilities. Others, like Anthony Martial and Wilfried Zaha, appear prominently here, reflecting their strength in ball carrying and progressive play. The inclusion of Moussa Sissoko shows how box-to-box midfielders can contribute significantly to ball progression through their dynamic running with the ball.

What's particularly notable is how this metric captures different types of threat creation. Players like Henry and Hazard were known for their ability to progress the ball through both dribbling and passing, while others like Kane and Agüero show how modern forwards need to be involved in build-up play as well as finishing. The higher xT values in this graph (ranging up to 6 compared to the previous 3) suggest this combined measure of passes and carries captures more of the total threat that players create.
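Per-90 figures like the ones discussed above are commonly built by crediting each pass or carry with the difference in xT between where it ends and where it starts, then scaling by minutes played. The sketch below assumes actions are given as (start zone, end zone) pairs and that only successful actions are included; the function name and toy numbers are invented for illustration.

```python
def xt_per_90(actions, xt_grid, minutes_played):
    """Sum xT(end) - xT(start) over a player's actions and scale to 90 minutes."""
    total = sum(xt_grid[end] - xt_grid[start] for start, end in actions)
    return 90.0 * total / minutes_played


# Toy xT values for three zones and two actions; numbers are invented.
toy_grid = [0.0, 0.25, 0.5]
toy_actions = [(0, 1), (1, 2)]  # one pass and one carry, both moving forward
rate = xt_per_90(toy_actions, toy_grid, minutes_played=45)
print(rate)  # 1.0
```

Because carries are simply additional (start, end) pairs, adding them to the action list can only raise a player's total when they move the ball into more threatening zones, which is consistent with the larger values in the combined chart.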

Conclusion 

Our Markov chain analysis provides valuable insights into soccer match dynamics, particularly in identifying effective counter-attacking play and finishing efficiency. While the model successfully captures certain aspects of the game, its limitations in valuing build-up play and possession-based approaches suggest opportunities for refinement. The Leicester City case study validates the model's ability to identify efficient attacking play, while also highlighting areas for improvement in capturing different playing styles. Future iterations could incorporate temporal dependencies, more sophisticated state definitions, and better methods for valuing playmaker contributions. Despite its limitations, the model offers a quantitative framework for understanding the complex dynamics of soccer, providing actionable insights for tactical analysis and player evaluation.
