Win probability graphs and stats are now available for over 600 grand slam matches from 2011. Thanks to IBM Pointstream data from this year’s slams, there is a wealth of data available like never before.
Here’s a sample match: The US Open semifinal between Federer and Djokovic.
When I first started publishing tennis research, win probability was one of my focuses. You can find earlier work here, which links to specific tables for games, sets, and tiebreaks. I’ve also published much of the relevant code, which is written in Python.
Win probability represents the odds of each player winning the match after every point, based on the score up to that point and which player is serving. It makes no assumptions about the specific skill level of each player, but it does assume that the server has an advantage, which varies by surface and gender. With every point, each player’s win probability goes up or down, and how far it rises or falls depends on the importance of the point. At 4-1, 40-0, winning the point is nice, but losing it just delays the inevitable; at 5-6 in a tiebreak, the potential change in win probability is huge.
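To make the point-by-point idea concrete at the smallest scale, here is a minimal Python sketch of the game level only: the server’s chance of holding from any point score, assuming the server wins each point independently with a fixed probability p. The function name and the score encoding are hypothetical illustrations, not the published code; a full model would nest this inside set and match calculations.

```python
from functools import lru_cache


def game_win_prob(p: float, srv: int = 0, ret: int = 0) -> float:
    """Chance that the server wins the current game from a point score of
    srv-ret (points won: 0, 1, 2, 3 = love, 15, 30, 40), assuming the server
    wins every point independently with the same probability p."""
    q = 1 - p
    # From deuce, the server must win two straight points before losing two straight.
    deuce = p * p / (p * p + q * q)

    @lru_cache(maxsize=None)
    def prob(a: int, b: int) -> float:
        if a >= 4 and a - b >= 2:
            return 1.0  # server has held
        if b >= 4 and b - a >= 2:
            return 0.0  # server has been broken
        if a >= 3 and b >= 3:
            if a == b:
                return deuce
            # Advantage server, or advantage returner.
            return p + q * deuce if a > b else p * deuce
        return p * prob(a + 1, b) + q * prob(a, b + 1)

    return prob(srv, ret)
```

At 40-0 with p = 0.64, `game_win_prob(0.64, 3, 0)` is just under 0.99; winning the next point takes it to 1.0, and losing it only drops it to about 0.97, which is why that point matters so little.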
To quantify that in the graphs, I show another metric: Volatility, which measures the importance of each point. It is equal to the difference in win probabilities between the server winning and losing the following point. 10 percent is exciting, 20 percent is crucial, and 30 percent is edge-of-your-seat stuff.
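Volatility can be sketched the same way. To stay short, the example below measures it within a tiebreak (tiebreak win probability) rather than for the whole match. It assumes a standard first-to-seven, win-by-two tiebreak in which A serves first and serve then alternates every two points, with each point independent; the function names are hypothetical.

```python
from functools import lru_cache


def tiebreak_win_prob(a: int, b: int, p_a: float, p_b: float) -> float:
    """Chance that player A wins a first-to-7, win-by-2 tiebreak from a-b.

    A serves the first point, then serve alternates every two points.
    p_a and p_b (strictly between 0 and 1) are each player's chance of
    winning a point on his own serve.
    """
    # From 6-6 on, each pair of points has one serve apiece, so only
    # "A wins both" and "B wins both" decide anything.
    a_both = p_a * (1 - p_b)
    b_both = (1 - p_a) * p_b

    @lru_cache(maxsize=None)
    def prob(x: int, y: int) -> float:
        if x >= 7 and x - y >= 2:
            return 1.0
        if y >= 7 and y - x >= 2:
            return 0.0
        if x == y and x >= 6:
            return a_both / (a_both + b_both)
        n = x + y  # points already played
        a_serving = n == 0 or ((n - 1) // 2) % 2 == 1
        p_point = p_a if a_serving else 1 - p_b  # A's chance of winning the next point
        return p_point * prob(x + 1, y) + (1 - p_point) * prob(x, y + 1)

    return prob(a, b)


def tiebreak_volatility(a: int, b: int, p_a: float, p_b: float) -> float:
    """Swing in A's tiebreak win probability riding on the next point.

    This is the gap between the server winning and losing that point;
    its size is the same whichever player is serving.
    """
    return tiebreak_win_prob(a + 1, b, p_a, p_b) - tiebreak_win_prob(a, b + 1, p_a, p_b)


print(f"0-0: {tiebreak_volatility(0, 0, 0.64, 0.64):.3f}")
print(f"5-6: {tiebreak_volatility(5, 6, 0.64, 0.64):.3f}")
```

With evenly matched players, the point at 5-6 carries a volatility of 0.5 (50 percent): winning it means 6-6, a coin flip, while losing it ends the tiebreak.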
To produce these numbers, I needed to make several simplifying assumptions. Some are more important than others; here are the big two:
- The players are equal.
- Each player’s ability does not vary from point to point.
The first of these is almost always false, and the second is probably false as well. The first, however, makes things more interesting. In most matches Novak Djokovic plays these days, he goes in with an 80-percent-or-better chance of winning. If we graphed one of his matches starting at 85 percent, we’d usually get a very slowly ascending line. Instead, by starting at 50 percent, we can see where he and his opponent had their biggest openings, and who took advantage.
(In this long-ago post, I showed a sample graph with an assumption similar to the 85 percent for Djokovic, and you can see some of what I mean.)
Assuming that the players are equal also sidesteps the messy question of how to quantify each player’s skill level on that day, on that surface, against that opponent.
The second big assumption ignores possible real-world attributes like clutch performance and streakiness, along with more pedestrian considerations like some players’ stronger serving in the deuce or ad court.
Another long-ago article of mine suggests that servers are not perfectly consistent, possibly because of natural rises and falls in performance, or possibly because of risk-taking (or lack of concentration) in low-pressure situations. One of the most interesting directions for research with these stats is this inconsistency: we need to figure out whether some players are more consistent than others, whether “clutch” exists in tennis, and much more.
One more set of assumptions concerns the server’s advantage. Since these graphs only cover the four grand slams, I set a separate server’s win percentage (the share of service points won by the server) for each tournament. The numbers I used for men are 63% in Australia, 61% at the French, 66% at Wimbledon, and 64% at the U.S. Open. For women, I used percentages two percentage points lower at each event.
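The per-tournament settings amount to a small lookup table. The sketch below records them and converts each into a chance of holding serve from 0-0 with the standard closed-form game formula (win the game 4-0, 4-1, or 4-2, or reach deuce and win from there), assuming independent points. The dictionary names, the tournament labels, and the `hold_prob` helper are hypothetical illustrations.

```python
# Server's point-win percentages for men, as listed above; women's values
# are two percentage points lower at each event.
SERVE_PCT_MEN = {
    "Australian Open": 0.63,
    "French Open": 0.61,
    "Wimbledon": 0.66,
    "US Open": 0.64,
}
SERVE_PCT_WOMEN = {event: round(p - 0.02, 2) for event, p in SERVE_PCT_MEN.items()}


def hold_prob(p: float) -> float:
    """Chance the server holds from 0-0, winning each point with probability p."""
    q = 1 - p
    deuce = p * p / (p * p + q * q)  # chance of winning the game from deuce
    # Win to love (1 way), to 15 (4 ways), to 30 (10 ways),
    # or reach 40-40 (20 ways) and win from deuce.
    return p**4 * (1 + 4 * q + 10 * q * q) + 20 * p**3 * q**3 * deuce


for event in SERVE_PCT_MEN:
    print(f"{event}: men {hold_prob(SERVE_PCT_MEN[event]):.3f}, "
          f"women {hold_prob(SERVE_PCT_WOMEN[event]):.3f}")
```

A small edge on each point compounds into a much larger edge per game: at a 60% point-win rate, `hold_prob` gives roughly a 74% chance of holding.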
More on Win Probability
There’s very little out there on win probability and volatility in tennis. I wasn’t the first person to work out the probability of winning a game, a set, or a match from a given score, but as far as I know, I’m the only person publishing graphs like this. Much of the problem is the limited availability of play-by-play descriptions for professional tennis.
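The game, set, and match calculations nest: game probabilities feed set probabilities, which feed the match. The top layer is the simplest, so here is a sketch of it alone: player A’s chance of winning the match from a given set score, assuming each remaining set is won independently with probability s. Under the equal-players assumption, s = 0.5 for any set not yet started; a mid-set score would also need the set-level calculation, which this sketch leaves out. The function name is hypothetical; `best_of` is 5 for men’s grand slam matches and 3 for women’s.

```python
from functools import lru_cache


def match_win_prob(sets_a: int, sets_b: int, s: float, best_of: int = 5) -> float:
    """Chance that player A wins the match from a set score of sets_a-sets_b,
    assuming A wins each remaining set independently with probability s."""
    need = best_of // 2 + 1  # sets required to win: 3 of 5, or 2 of 3

    @lru_cache(maxsize=None)
    def prob(x: int, y: int) -> float:
        if x == need:
            return 1.0
        if y == need:
            return 0.0
        return s * prob(x + 1, y) + (1 - s) * prob(x, y + 1)

    return prob(sets_a, sets_b)
```

For instance, `match_win_prob(2, 0, 0.5)` is 0.875: the trailing player has to win three straight sets, a 1-in-8 chance.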
That problem doesn’t apply to baseball, where win probability has thrived for years. Here’s a good intro to win probability stats in baseball, and fangraphs.com is known for its single-game graphs; for instance, here’s tonight’s Brewers game. In many ways, win probability is more interesting in baseball than in tennis: in tennis, there are only two possible outcomes of each point, while in baseball, there are several possible outcomes of each at-bat.
Enjoy the graphs and stats!