Paris Saint-Germain 1-0 Real Madrid: A moment of genius from Kylian Mbappé secured a deserved win for PSG
Mbappé's 1.38 OBV from dribbles and carries was the 2nd-most by an individual player in the UCL this season
Let us introduce you to On-Ball Value.
Inherently, football has always been measured. The very nature of the sport is to measure which team scored more goals than the other and assign a points-based reward based on outscoring the opponent. We’ve always measured goals, but then we started measuring shots, and then we started measuring the quality of those shots with Expected Goals. But given just how little of football is shots – less than 1% of all actions on the pitch – there’s a whole bunch of football in the middle going unmeasured.
With the advent of advanced analysis quickly disappearing in the rear-view mirror as more and more teams adopt data into their day-to-day, analytics has finally reached the point of measuring what goes on between both boxes. Possession State Value (PSV) models were born, and I’d like to give a particularly warm welcome to ours and what we believe is a tangible (and importantly, measurable) upgrade on what has come before.
StatsBomb customers already have access to the data and are implementing it into their performance and recruitment analysis. A white paper was sent out alongside the model’s release into SBData to explain the methodology and design decisions of the model, and we’re excited to now release details of On-Ball Value (OBV) into the public domain. There’ll be two parts to this introduction: next week we’ll chop and slice the data to show how to use it in the football world, but there’s an important question to answer in today’s piece - what is OBV?
The premise of PSV models is to objectively and quantitatively measure the value of each event on the pitch. You can do this by assessing the change in the probability of a team scoring and conceding as a direct result of the event. You don’t need me to tell you that passes that move the ball closer to the opponent’s goal do more to increase a team’s probability of scoring than passes that move the ball away from it. Equally, turnovers closer to a team’s own goal have a greater negative impact on said team’s likelihood of conceding compared to turnovers at the attacking end of the pitch.
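To make that premise concrete, here is a minimal sketch in Python. It is not the OBV model itself: the `StateValue` class, the helper functions and every probability below are made up for illustration, standing in for whatever a trained model would output for each possession state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateValue:
    """Hypothetical model output for one possession state."""
    p_score: float    # probability the team in possession goes on to score
    p_concede: float  # probability the team in possession goes on to concede


def value_change(before: StateValue, after: StateValue) -> tuple[float, float]:
    """Return (change in scoring probability, change in conceding probability)."""
    return after.p_score - before.p_score, after.p_concede - before.p_concede


def after_turnover(opponent_state: StateValue) -> StateValue:
    """Re-express the opponent's new possession state from our point of view."""
    return StateValue(p_score=opponent_state.p_concede,
                      p_concede=opponent_state.p_score)


# Illustrative numbers only, not real model outputs.
own_third = StateValue(p_score=0.004, p_concede=0.008)
midfield = StateValue(p_score=0.010, p_concede=0.004)
edge_of_box = StateValue(p_score=0.045, p_concede=0.003)

# A pass towards goal versus a pass away from it.
forward_pass = value_change(midfield, edge_of_box)
backward_pass = value_change(midfield, own_third)

# Losing the ball deep (opponent now on the edge of our box) versus
# losing it high (opponent now deep in their own third).
deep_turnover = value_change(own_third, after_turnover(edge_of_box))
high_turnover = value_change(edge_of_box, after_turnover(own_third))

print("forward pass  (d_score, d_concede):", forward_pass)
print("backward pass (d_score, d_concede):", backward_pass)
print("deep turnover (d_score, d_concede):", deep_turnover)
print("high turnover (d_score, d_concede):", high_turnover)
```

With these invented numbers the forward pass adds far more scoring probability than the backward pass, and the deep turnover adds far more conceding probability than the high one, which is the pattern described above.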
In summary, there are two key benefits to PSV models that other common measures of build-up play - such as assists, xG Assisted, and xGBuildup - are unable to adequately account for:
a) An ability to differentiate between the value of different passes or actions within a possession chain that leads to a goal. That is to say, being able to accurately identify the actions that were more important towards the creation of a chance, and awarding them greater credit than actions identified to be less important.
b) An appropriate consideration of the opportunity cost of attempting high-risk actions and losing the ball. High-risk, high-reward players that are often key attackers on their team will be recognised and credited in this model – so long as the effect of their actions is a net benefit to the team overall.
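Point (b) is essentially an expected-value calculation. As a rough sketch, with success rates and event values that are pure assumptions rather than anything taken from the model, you can weigh what an action gains when it comes off against what it costs when the ball is lost:

```python
def expected_value(p_success: float,
                   value_if_success: float,
                   value_if_failure: float) -> float:
    """Average value of attempting an action, weighting both outcomes."""
    return p_success * value_if_success + (1 - p_success) * value_if_failure


# Illustrative numbers only: (completion rate, value gained, value lost).
safe_pass = expected_value(0.95, 0.002, -0.010)
risky_through_ball = expected_value(0.30, 0.060, -0.010)
hopeful_punt = expected_value(0.10, 0.060, -0.010)

for name, ev in [("safe pass", safe_pass),
                 ("risky through ball", risky_through_ball),
                 ("hopeful punt", hopeful_punt)]:
    print(f"{name}: {ev:+.4f}")
```

In this toy example the through ball fails more often than it succeeds but is still worth more on average than the safe pass, while the punt loses the ball so often that it becomes a net negative. That is the sense in which high-risk players get credit only so long as their actions are a net benefit overall.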
There are several possession state value models out there, with the first known iteration (publicly at least) being Sarah Rudd's back in 2012, though it is worth mentioning that Charles Reep in 1997 developed a seminal model that could be considered a PSV model of sorts. The back catalogue of work is vast and to be applauded for moving football analysis forward. But work is there to be built and improved on, and there are numerous reasons why we believe our methodology is an upgrade on what has come before.
The key merits of OBV’s approach are:
The model is trained on StatsBomb xG. Many other models train directly on goals, but using xG to estimate the goals scored from each possession allows us to train models more accurately with the same amount of data, because it reduces the variance and class imbalance that come with training purely on goals scored or conceded. There are other PSV models out there known to be using xG, but our approach should be an improvement due to the xG model used, considering the shot freeze frames feature of StatsBomb xG and that it’s known as the most performant xG model available.
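A quick simulation shows why an xG target is less noisy than a goals target. Everything here is an assumption made for illustration: the share of possessions that end in a shot, the range of shot qualities and the sample sizes are invented, and no real StatsBomb data or model is involved.

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is repeatable


def simulate_possession() -> tuple[float, int]:
    """Return (xG label, goal label) for one made-up possession."""
    if random.random() < 0.9:           # most possessions end without a shot
        return 0.0, 0
    xg = random.uniform(0.02, 0.40)     # illustrative shot quality
    goal = 1 if random.random() < xg else 0
    return xg, goal


def sample_means(n_possessions: int) -> tuple[float, float]:
    """Average possession value in one sample, using each kind of label."""
    sample = [simulate_possession() for _ in range(n_possessions)]
    xg_mean = statistics.fmean([xg for xg, _ in sample])
    goal_mean = statistics.fmean([goal for _, goal in sample])
    return xg_mean, goal_mean


# Repeat the same small "dataset" many times and see how much the estimate
# of average possession value moves around under each label.
estimates = [sample_means(200) for _ in range(300)]
xg_spread = statistics.stdev([e[0] for e in estimates])
goal_spread = statistics.stdev([e[1] for e in estimates])

print(f"spread of estimates using xG labels:   {xg_spread:.4f}")
print(f"spread of estimates using goal labels: {goal_spread:.4f}")
```

Both labels estimate the same underlying quantity, but the goal label adds the coin flip of whether each shot goes in, so estimates built on it vary more from sample to sample. The xG label is also non-zero for every possession with a shot rather than only those with a goal, which eases the class imbalance.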
We opted to train two separate models for the Goals For and Goals Against components of possession value, an approach unlike most others. This allows us to track each event’s impact on the team’s chances of scoring or conceding separately to resolve between the attacking and defensive contributions of each action, instead of just net Goal Difference (GD).
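As a sketch of what that separation buys you, suppose each event carries two numbers, one from a "for" model and one from an "against" model. The players, the values and the sign convention (net equals the change in scoring probability minus the change in conceding probability) are all assumptions for illustration:

```python
from collections import defaultdict

# (player, change in P(score) from the "for" model,
#          change in P(concede) from the "against" model)
# Invented values, not real model outputs.
events = [
    ("Winger", 0.040, 0.015),
    ("Winger", 0.030, 0.020),
    ("Centre-back", 0.005, -0.020),
    ("Centre-back", 0.005, -0.005),
]


def summarise(events):
    """Total the two components per player and derive the net value."""
    totals = defaultdict(lambda: {"for": 0.0, "against": 0.0})
    for player, d_for, d_against in events:
        totals[player]["for"] += d_for
        totals[player]["against"] += d_against
    return {
        player: {**t, "net": t["for"] - t["against"]}
        for player, t in totals.items()
    }


summary = summarise(events)
for player, t in summary.items():
    print(f"{player}: for={t['for']:+.3f} "
          f"against={t['against']:+.3f} net={t['net']:+.3f}")
```

Both players end up with the same net value here, but the winger gets there by adding scoring threat at some defensive cost, while the centre-back gets there mostly by reducing the chance of conceding. A single net GD model would not be able to tell them apart.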
We have chosen to not credit pass recipients. While there can be some value to receiving the ball and holding up play, most of that benefit comes from the movement of players off the ball. This is very challenging to quantify with event-level data. From the perspective of the ball location and event data, there is no intrinsic value to receiving a ball. If players go on to lose the ball every time they receive it, the outcome is largely indistinguishable from the players not receiving it in the first place, which indicates that the ball receipt itself does not add value. However, players that are able to put themselves in good positions to receive the ball give themselves an opportunity to follow that up with an action. Players that are good at receiving the ball are therefore rewarded (or penalised) indirectly based on the outcome of the subsequent event, which would not have occurred if the player had not successfully received the ball.
Possession state features. We have chosen to include features describing the pitch location (x and y coordinates, distance to goal, angle to goal), action context (set play, open play, etc.), whether the event was carried out under pressure from an opposition player (as can only be done with StatsBomb Data), and body part used for the event (Head, Foot, etc.). We actively decided not to include "possession history" features, such as details of the previous events in the possession, which some similar models use as a proxy for the likely opposition defensive structure. It would be desirable for these variables to act as proxies for the availability and location of teammates as well as the positioning of the opponent but, in practice, many possession history features correlate strongly with other factors such as team play style and, more importantly, team strength. To give one example, other models overvalue passes made in longer possession chains, as stronger teams typically (and demonstrably) have longer chains of possession than weaker teams. Our approach ensures that each event is valued independently of team strength.
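For a feel of what such a feature set might look like, here is a minimal sketch. The `Event` class and the output names are hypothetical, and "angle to goal" is assumed here to mean the angle subtended by the two posts, which is one common definition but may not be the one used in OBV. The coordinates follow the StatsBomb convention of a 120 by 80 pitch with the attacked goal centred at (120, 40).

```python
import math
from dataclasses import dataclass

# StatsBomb pitch coordinates: 120 long, 80 wide, attacking left to right.
GOAL_X = 120.0
GOAL_Y = 40.0
POST_LOW_Y, POST_HIGH_Y = 36.0, 44.0


@dataclass(frozen=True)
class Event:
    """Hypothetical flattened event record, not the StatsBomb schema."""
    x: float
    y: float
    context: str          # e.g. "open_play", "set_play"
    under_pressure: bool
    body_part: str        # e.g. "foot", "head"


def possession_state_features(event: Event) -> dict:
    """Describe the current state only; nothing about earlier events."""
    dx = GOAL_X - event.x
    distance = math.hypot(dx, GOAL_Y - event.y)
    # Angle between the lines from the ball to each post (radians).
    angle = (math.atan2(POST_HIGH_Y - event.y, dx)
             - math.atan2(POST_LOW_Y - event.y, dx))
    return {
        "x": event.x,
        "y": event.y,
        "distance_to_goal": distance,
        "angle_to_goal": angle,
        "context": event.context,
        "under_pressure": event.under_pressure,
        "body_part": event.body_part,
    }


penalty_spot = Event(x=108.0, y=40.0, context="open_play",
                     under_pressure=True, body_part="foot")
features = possession_state_features(penalty_spot)
print(features)
```

Note what is absent: there is no field for the length of the possession chain or for the previous event, so two identical actions from the same spot get the same inputs regardless of which team, or how strong a team, produced them.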