New “Event Records” View at TennisAbstract.com

TennisAbstract.com now offers another way to look at stats for every player on the ATP tour.

The new “Event Records” view shows–you guessed it–records by event, summarizing a player’s performance at a given tournament, including his career record, career tiebreak record, years played, best result, and the usual complement of aggregate statistics such as return points won and break points saved.

To access a player’s event records, click here, in the upper left corner, right next to the link to the head-to-head view I introduced recently:

[Screenshot: the “Event Records” link in the upper left corner of a player page]

Then you’ll see something like this:

[Screenshot: a player’s Event Records table]


The event names are links, so you can click on any of them to see the full list of matches the player contested at that tournament.

Three columns in the middle of the table are loaded with additional information: “First” (the player’s first year at the event), “Last,” and “Best” (his best result at the tournament). Mouse over the data in those columns to see a description of the player’s final match at the event in that year (for “First” and “Last”) and the years in which he achieved his best result (for “Best”):

[Screenshot: mouseover detail in the Event Records table]


[Screenshot: a second mouseover example]


If you’re interested in particular subsets of matches, most of the filters in the left-hand column function as they normally do. For instance, let’s say you’re interested in Stan Wawrinka’s performance at various events as a top-ten player:

[Screenshot: Stan Wawrinka’s event records as a top-ten player]


You can also use the filters to reduce the number of tournaments on view. Use the “Level” filter to show only Grand Slams or Masters. Use the “Surface” filter to show only events on a particular surface. I also added a “Minimum Years” filter so you can limit the list to tournaments the player has entered at least a certain number of times.

In the context of event records, some of the filters are more useful than others (would anyone ever have a use for tournament-by-tournament records in matches with bagel sets?), but at the very least, there are a ton of tools here to play around with.

Enjoy!


Filed under Tennis Abstract

Do Players Get Broken More Often After Failing to Convert Break Point?

The headline is a bit unwieldy, but it refers to one of the most common nuggets of conventional wisdom in tennis. According to this viewpoint, when a player has the opportunity to break and doesn’t do so, he is more likely to get broken in his following service game.

Like so much conventional wisdom, this assumes that momentum plays a role. Break points are crucial moments, and if a player doesn’t capitalize, the momentum will turn against him. That momentum then carries into the following game, and the player who failed to convert gets broken himself.

Or so the story goes.

However, data from almost 3,000 tour-level and qualifying-round matches from 2013 suggests otherwise. The likelihood that a player holds serve has almost nothing to do with what happened in the previous game.

Let’s start with some general numbers. To make sure we’re comparing apples to apples, I’ve ignored the first game of every set. This way, we compare “games after missed break point chances” to “games after breaks” to “games after holds.” In other words, we’re only concerned with “games after something.” I’ve also limited our view to sequences of games within the same set, since the long break between sets (not to mention other psychological factors) seems to put those multi-set sequences of games in a different category altogether.

Once those exclusions are made, this set of nearly 3,000 matches showed that players got broken in 21.7% of their service games. Compare that to break rates after various events:

  • after a hold of serve: 22.6%
  • after a break of serve: 19.3%
  • after a hold including at least one missed break point chance: 21.2%
  • after a hold including three or more missed bp chances: 20.9%
  • after a hold including four or more missed bp chances: 19.4%
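For readers who want to try this kind of tally themselves, here is a minimal Python sketch. It is not the code behind these numbers: it assumes a hypothetical input format in which each set is an ordered list of games, each recorded as whether the server held and how many break points he saved. Pairing each game with the one before it skips the first game of every set and never chains games across sets, matching the exclusions described above. For simplicity it uses three non-overlapping buckets (hold with no break points, hold with at least one missed break point, break) rather than the cumulative thresholds in the list.

```python
from collections import defaultdict
from typing import NamedTuple


class Game(NamedTuple):
    """One service game (hypothetical format)."""
    held: bool     # did the server hold?
    bp_saved: int  # break points the returner failed to convert


def label(prev: Game) -> str:
    """Describe the previous game from the next server's point of view."""
    if not prev.held:
        return "after break"
    if prev.bp_saved >= 1:
        return "after hold with missed bp"
    return "after hold, no bp"


def break_rates(sets: list[list[Game]]) -> dict[str, float]:
    """Break rate in each category of 'game after something'."""
    broken = defaultdict(int)
    total = defaultdict(int)
    for games in sets:
        # zip pairs each game with its predecessor in the same set, so the
        # first game of a set is never scored and sets are never chained.
        for prev, cur in zip(games, games[1:]):
            key = label(prev)
            total[key] += 1
            broken[key] += not cur.held
    return {key: broken[key] / total[key] for key in total}


# A made-up six-game stretch of one set.
example_set = [
    Game(True, 0), Game(True, 2), Game(False, 1),
    Game(True, 0), Game(True, 1), Game(False, 0),
]
print(break_rates([example_set]))
```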

These are aggregate numbers, not adjusted for specific players, so they don’t tell the whole story. But they already suggest that the conventional wisdom is overstating its case. After failing to convert a break point, players hold serve almost exactly as often as they do in general. In fact, they get broken a bit less frequently in those situations (21.2%) than they do following a more conventional hold without any break points (22.6%).

Let’s see what happens when we adjust these numbers on a match-by-match basis. For example, if Tomas Berdych gets broken by Novak Djokovic 6 times in 15 tries, we can use that 40% break rate as a benchmark by which to measure more specific scenarios. If Berdych fails to convert break point twice, we would “expect” him to get broken in 40% of his following service games, or 0.8 times in those two games. Of course, no one can get broken a fractional amount of a game, but by summing those “expected” breaks, we can see what the aggregate numbers look like with much less risk that particular players or matchups are biasing the results.
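The adjustment boils down to summing expected breaks, match by match. Here is a rough Python sketch, assuming a hypothetical per-match record of one player’s service games (the field names are mine, not from any official data feed, and the single actual break in the example is made up):

```python
from typing import NamedTuple


class MatchServe(NamedTuple):
    """One player's service record in one match (hypothetical format)."""
    times_broken: int       # breaks suffered over the whole match
    service_games: int      # service games played in the match
    games_after_miss: int   # service games right after his own missed break points
    broken_after_miss: int  # how many of those games he was broken in


def expected_vs_actual(matches: list[MatchServe]) -> tuple[float, int]:
    """Sum each match's expected breaks (match break rate x games) and actual breaks."""
    expected = sum(
        m.times_broken / m.service_games * m.games_after_miss for m in matches
    )
    actual = sum(m.broken_after_miss for m in matches)
    return expected, actual


# The Berdych example: broken 6 times in 15 tries, two games after missed
# break points. The 1 actual break is a made-up value for illustration.
expected, actual = expected_vs_actual([MatchServe(6, 15, 2, 1)])
print(round(expected, 2), actual)
```

Summed over thousands of matches, the two totals are what get compared in the paragraphs below.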

Once that cumbersome step is out of the way, we discover that–again, but more confidently–there is virtually no difference between average service games and service games that follow unconverted break points.

In my sample of 2013 ATP matches, there were 5,701 service games that followed missed break point opportunities. Players held 4,493 of those games (78.8%). That’s almost precisely the rate at which they held in other games. Had those specific players performed at their usual level within those matches, they would’ve held 4,488 times (78.7%).

We see the same findings when we focus on the most high-pressure games, ones with three or more break points. This sample contained 722 games in which the server held despite facing three or more break points. In the following game, the players who had failed to convert those chances held serve 571 times. Had they performed at their usual rate, they would’ve held 570 times. After holds with four or more break points (206 in all), the players who missed those chances held 166 times instead of an “expected” 162.

There’s no evidence here that these particular service games have different results than other service games do.

Envoi

Momentum, the basis for so many of the beliefs that make up tennis’s conventional wisdom, is surely a factor in the game, but my research has shown, over and over again, that it isn’t nearly as influential as fans and pundits tend to think.

Once we hear a claim like this one, we tend to notice when events confirm it, reinforcing our mostly baseless belief. When we see something that doesn’t match the belief, we’re surprised, often leading to a discussion that takes for granted the truth of the original claim. Our brains are wired to understand and tell stories, not to recognize the difference between something that happens 77% of the time and 79% of the time.

It may turn out that some players are unusually likely or unlikely to get broken after failing to convert a break point. Or perhaps this particular sequence of events is more common at certain junctures in a match. But barring research that establishes that sort of thing, there is simply no evidence that momentum plays any role in the service game following unconverted break points.


Filed under Hot Hand, Research

There Is No Analytics Revolution In Tennis

I’m sure you’ve heard about the trend. First, statistics overhauled baseball, and teams in every major sport now employ quants to search out that extra edge. Tennis has lagged behind the others, but with the help of big data, we’re on the cusp of a whole new era.

That’s the story, anyway. Yesterday brought us another example.

What happened in baseball is, quite simply, never going to happen in tennis.

To oversimplify a bit, the “Moneyball revolution” refers to front offices using analytics to identify underrated and underpriced players. To a lesser extent, it refers to deploying those players in a smarter way–say, rearranging the batting order or attempting fewer stolen bases.

In tennis, there are no front offices. Players aren’t paid salaries by teams. And there are no managers to decide how best to use their players.

In short: There are no organizations with both the incentives and the resources to analyze data.

Of course, when people get breathless about all the raw data floating around in tennis, that isn’t what they’re talking about. (No one really thinks Hawkeye data is going to revolutionize, say, the World Team Tennis draft.) Instead, they are implying that the data can be analyzed in such a way as to be actionable for players.

That’s an admirable objective. In theory, Kevin Anderson’s coach could look at all the data from all the matches between Anderson and Tomas Berdych and identify which tactics worked, which didn’t, and make recommendations accordingly. Of course, Kevin’s coach is already watching all those matches, taking notes, reviewing video, and presumably making recommendations, so if big data is going to change the game, it needs to somehow offer coaches demonstrably better insights.

With all the cameras pointed at tennis’s show courts, that’s certainly possible. The closest analogue in baseball is the pitch f/x system, which tracks the speed, location, and movement of every pitch. Some pitchers have been able to use pitch f/x data to analyze and improve upon their own performance. The same could eventually happen in tennis. But there are systemic reasons why it hasn’t yet, and those root causes are unlikely to disappear anytime soon.

What needs to change

Hawkeye cameras are aimed at a lot of courts and have the capability of collecting an enormous amount of data. That’s how broadcasts are able to bring you stats like average net clearance and meters run. Those cameras also help generate graphics like those showing where all of a player’s serves landed.

After a match is over, with no calls left to be overturned and no broadcast needs likely to arise, what happens to the data? For all practical purposes, it gets stashed in the attic and forgotten. (Here’s a more thorough explanation.) Contrast that to Major League Baseball, which makes all pitch f/x data available immediately–to the public, for free–and has archived it indefinitely.

If tennis is to see any meaningful analytical breakthroughs, Hawkeye data needs to be aggregated in a single database. Results from one match are sometimes interesting (hey look, Andy’s net clearance is 15% greater than Roger’s!), but if we’re always looking at one match, or one tournament, at a time, we’ll never learn which of these Hawkeye-derived statistics matter, or how much.

IBM, the collector of much of this information, may already maintain some version of that database. But the results are jaw-droppingly uninspiring. On broadcasts, we get the same old stats and graphics. When IBM has ventured into predicting match outcomes, their “millions of data points” are outperformed by my much simpler model.

IBM is the one organization in the sport with the resources to do the kind of analysis that will transform tennis. But they have no incentive to do so. To IBM (and now SAP, in the women’s game), tennis is a public relations opportunity, one that allows them to brand tournament websites and on-screen graphics with their logo. (Not to mention those suspiciously pro-IBM trend pieces linked to above.)

Players might eventually benefit from data-based insights, but only a tiny fraction of them could afford to hire even a single analyst. (Hi Simona! Text me anytime!)

Once again, we have to turn to baseball for a precedent. Even in that immense sport, with its billion-dollar franchises, it was amateurs–outsiders–who did the work that brought about the analytics revolution. Even now, with teams aggressively hiring promising talent from outside the game, many of the most profitable insights still come from independent researchers. If MLB made its data as inaccessible as tennis does, that trend would’ve ground to a halt long ago.

Nice as it is to dream about a better world of tennis data, we’re unlikely to see it anytime soon. Tennis doesn’t have a commissioner, so there’s no one to appoint a data czar, let alone anyone who could convince the alphabet soup of the ATP, WTA, ITF, IBM, SAP, and Hawkeye to aggregate their data in any meaningful way.

Until that happens, and until the data is publicly available, there will be no analytics revolution in tennis. We’ll continue to get what we have now: the occasional Hawkeye stat, free of context, illustrating the same sort of analysis we’ve been hearing for decades.


Filed under Hawkeye, Rants

New “Head-to-Head View” at TennisAbstract.com

I’m really excited to announce some new features on Tennis Abstract — I hope you like them as much as I do.

Let’s start with the Head-to-Head view, which you can access by clicking near the upper left corner of any ATP player’s page. Marin Cilic, for example:

[Screenshot: the “Head-to-Head beta” link on Marin Cilic’s player page]

Click on the “Head-to-Head beta” link, and you get this:

[Screenshot: Cilic’s Head-to-Head view]


As you can tell, there is a huge amount of data available here. What you’re looking at is a statistical summary of every single one of this player’s H2H records at the professional level. (As you’ll see on the page itself, the screenshot doesn’t show it all–there are ten more statistical categories for each H2H, including things like service points won and break point conversion rate.)

By default, the H2H table is sorted by number of matches. But as with the standard “Match Results” table on Tennis Abstract, you can sort by most other columns simply by clicking on the column header, like TB (“tiebreaks”) here:

[Screenshot: the H2H table sorted by tiebreaks]


Thanks to the power of Tennis Abstract’s filters, there’s a lot more you can do with this view. As you’ve seen, the H2H view defaults to a player’s career results. Let’s say, though, that you want to see Cilic’s H2H records only on clay. Use the filters in the left-hand column as you normally would, and select clay courts:

[Screenshot: Cilic’s H2H records on clay]


As usual, you can apply as many filters as you want, so you could look at a player’s head-to-heads in a single season, at the Challenger level, in deciding sets, or even show a summary of a player’s head-to-heads against all opponents from a single country.

Specifically for head-to-head purposes, I added a new filter: “Minimum matches.” This way, if you’re comparing a player’s H2H stats against several opponents, you can filter out matchups that haven’t occurred very much. Here’s an example, which shows Cilic’s highest H2H winning percentages, minimum five matches:

[Screenshot: Cilic’s highest H2H winning percentages, minimum five matches]
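The “Minimum matches” filter is, in effect, a threshold applied before sorting. A minimal Python sketch of that logic, using made-up opponents and a hypothetical `H2H` record type (an illustration of the idea, not the site’s actual code):

```python
from typing import NamedTuple


class H2H(NamedTuple):
    opponent: str
    wins: int
    losses: int


def top_h2h(records: list[H2H], min_matches: int = 5) -> list[H2H]:
    """Drop matchups below the threshold, then sort by winning percentage."""
    floor = max(min_matches, 1)  # never keep 0-0 records (avoids dividing by zero)
    eligible = [r for r in records if r.wins + r.losses >= floor]
    return sorted(eligible, key=lambda r: r.wins / (r.wins + r.losses), reverse=True)


# Made-up records for illustration only.
records = [
    H2H("Opponent A", 4, 1),
    H2H("Opponent B", 2, 0),
    H2H("Opponent C", 3, 3),
    H2H("Opponent D", 6, 1),
]
for r in top_h2h(records):
    print(r.opponent, f"{r.wins}-{r.losses}")
```

Without the threshold, a 2-0 record would top the list; with a five-match minimum it drops out entirely.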


I also added another new filter that will come in handy on the standard results tab as well: “Vs Current Rank.” (The separate “Vs Rank” filter, which has always been on the page, filters by opponent rank at the time of the match; the new filter uses the most current rankings.) For instance, here are Cilic’s H2Hs against the current top 10:

[Screenshot: Cilic’s H2Hs against the current top 10]


Another neat aspect of the “Vs Curr Rank” filter is the ability to select “Active” or “Inactive” players. (These are determined solely by whether a player is in this week’s ATP rankings.) You could display all H2Hs against active players, or in the traditional Match Results view, quickly identify matches against retired/inactive players.

All of this is available for every ATP player, past and present.

In the process of working on the new features, I made a few other improvements that I hope power users will recognize and enjoy. For many statistical columns in both the match results and head-to-head views, I customized the sorting behavior so that matches without stats automatically go to the bottom. I also made a bit of progress toward making the browser back button work as expected. There’s still some work to do there, but it’s much better than it was a few days ago.
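Keeping stat-less matches at the bottom means handling missing values separately from the sort direction; a single stand-in value (say, treating missing stats as zero) would send those rows to the top in one direction or the other. A small Python sketch of the idea, with hypothetical row dictionaries (the site’s own implementation isn’t shown here):

```python
def sort_missing_last(rows: list[dict], key: str, descending: bool = False) -> list[dict]:
    """Sort rows by a stat column, always placing rows without that stat at the end."""
    present = [r for r in rows if r.get(key) is not None]
    missing = [r for r in rows if r.get(key) is None]
    present.sort(key=lambda r: r[key], reverse=descending)
    return present + missing  # missing rows keep their original order


rows = [
    {"match": 1, "aces": 5},
    {"match": 2, "aces": None},  # stats not recorded
    {"match": 3, "aces": 12},
    {"match": 4},                # no stats column at all
]
print([r["match"] for r in sort_missing_last(rows, "aces")])
print([r["match"] for r in sort_missing_last(rows, "aces", descending=True)])
```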

Enjoy!


Filed under Head-to-Heads, Tennis Abstract

Tommy Robredo and the Men Who Beat Number One

Today in Cincinnati, Tommy Robredo took out the top-ranked player in the world, Novak Djokovic, in straight sets. Robredo has had a fine career, peaking in the top five and beating many of the world’s best, but it was only the second time in eight tries that he managed to defeat a reigning world number one.

The first time Robredo accomplished the feat was more than eleven years ago, at the 2003 French Open, where he upset Lleyton Hewitt in five sets. Since then, his only chances to beat number ones have come against Roger Federer, and he lost in all five tries. When the Spaniard finally scored a win over Fed in New York last year, Roger had long since fallen out of the top spot.

With today’s win, Robredo becomes the 66th man since the advent of the ATP ranking system who has beaten at least two different number ones. Only 13 active players have managed the feat.

Twenty-three players in ATP history have beaten at least three players who were ranked number one at the time. Coincidentally, the man who defeated the most number ones was present at today’s match. Boris Becker upset six different players in the top spot, compiling a very impressive 19-16 career record against players ranked number one.

Next on the list is Michael Chang, who beat five different number ones (though he only won 7 of 27 matches against them), while Federer, Andre Agassi, Greg Rusedski, and Dominik Hrbaty beat four. Four more active players have defeated three number ones: Andy Murray, David Ferrer, Juan Martin del Potro, and Jo-Wilfried Tsonga. Each of those four recorded their upsets against Rafael Nadal, Djokovic, and Federer, except for Ferrer, who has never beaten Fed but did defeat Agassi when the American held the top spot.

Becker’s 19 wins against top-ranked players is also a record, though he has to share this one with Nadal, who is 19-10 against number ones. Boris and Rafa tower far above the next players on the list, Djokovic and Bjorn Borg, who each have 11 career wins against number ones. Next on the list among active players are Murray (9), del Potro (6), Ferrer (5), and Federer (5).

Robredo doesn’t quite rank among this elite company, but his second top-ranked scalp adds a little more luster to an already lengthy list of career highlights.


Filed under Rankings, Tommy Robredo, Trivia

A Quick Look at the Odds of Three-Setters

In the comments to my match-fixing post earlier this week, Elihu Feustel wrote:

There are almost no situations where a best of 3 match is a favorite to go to three sets. If the market priced a player as greater than 50% to win in exactly 3 sets, that alone is compelling evidence of match fixing.

In Monday’s questionable Challenger match, not only did the betting markets believe that the match was likely to go three sets, they picked a specific winner in three sets.

It takes only a bit of arithmetic to see why Elihu’s point is correct. Let’s say two players, A and B, are exactly evenly matched. Each one has a 50% chance of winning the match and a 50% chance of winning each set. Thus, the odds that A wins the match in straight sets are 25% (50% for the first set multiplied by 50% for the second). The odds that B wins in straights are the same. The probability that the match finishes in straight sets, then, is 50% (25% for an A win + 25% for B), meaning that the odds of a three-set match are also 50%.

As soon as one player has an edge, the probability of a three-set match goes down. Consider the scenario in which A has a 70% chance of winning each set. The odds that player A wins in straight sets are 49% (70% times 70%) and the odds that B wins in straight sets are 9% (30% times 30%). Thus, there’s a 58% chance of a straight-set victory, leaving a 42% chance of a three-setter.
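The same arithmetic in a few lines of Python, assuming (as above) that each player’s chance of winning a set stays fixed for the whole match:

```python
def p_three_sets(p: float) -> float:
    """Chance a best-of-three goes the distance if player A wins each set
    with probability p, independently of earlier sets."""
    a_straight = p * p              # A wins sets one and two
    b_straight = (1 - p) * (1 - p)  # B wins sets one and two
    return 1 - a_straight - b_straight  # algebraically equal to 2 * p * (1 - p)


for p in (0.5, 0.6, 0.7, 0.8):
    print(f"p = {p:.1f}: {p_three_sets(p):.0%} chance of a third set")
```

The result simplifies to 2p(1 - p), which peaks at 50% when p = 0.5 and falls as the gap between the players widens.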

This simple approach makes one major assumption: each player’s chances don’t change from one set to another. That probably isn’t true. It seems most likely to me that the player who wins the first set gets stronger relative to his opponent, perhaps because he gains confidence, or because his opponent loses confidence, or because the opponent figures he doesn’t have much chance of winning. (I’m sure this isn’t true in all matches, but I suspect it applies often enough.)

If it’s true that the probability of the second set is dependent–even slightly–on the outcome of the first set, the likelihood of a three-setter decreases even further.
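One simple way to see this is to give the first-set winner a fixed bump in his second-set chances. The `boost` parameter below is a hypothetical, symmetric adjustment chosen for illustration, not an estimate from match data:

```python
def p_three_sets_dependent(p: float, boost: float) -> float:
    """Chance of a third set when the first-set winner's second-set probability
    rises by `boost` (a hypothetical, symmetric adjustment)."""
    if not 0 <= boost <= min(p, 1 - p):
        raise ValueError("boost must keep every set probability between 0 and 1")
    # A takes set one (prob p), then loses set two despite the boost.
    a_then_b = p * (1 - (p + boost))
    # B takes set one (prob 1 - p), then loses set two despite the boost.
    b_then_a = (1 - p) * (1 - ((1 - p) + boost))
    return a_then_b + b_then_a


for boost in (0.0, 0.05, 0.10):
    print(f"p = 0.7, boost = {boost:.2f}: {p_three_sets_dependent(0.7, boost):.0%}")
```

Working through the algebra, the three-set probability becomes 2p(1 - p) - boost, so in this toy model any positive dependence lowers it one-for-one.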

Probability in practice

As expected, far fewer than half of tour-level matches go three sets. (I’m considering only best-of-three matches.) So far this year, 36% of ATP best-of-threes have gone the distance, while only 32% of Challenger-level matches have done so.

In fact, men’s tennis has even fewer three-set matches than expected. For every match, I used a simple rankings-based model to estimate each player’s chances of winning a set and, as shown above, the odds that the match would go three sets. For 2014 tour-level matches, the model, which assumes that set probabilities are independent, predicts that 44% of matches would go three sets. That’s over 20% more third sets than we see in practice.
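The comparison is a simple average: for each match, the independent-sets model gives 2p(1 - p) as the chance of a third set, and the predicted share is the mean across matches. A Python sketch, taking each match’s set-win probability as a given input (the rankings-based model that produces those probabilities isn’t specified here, so the example values are made up):

```python
def predicted_three_set_share(set_probs: list[float]) -> float:
    """Mean chance of a third set across matches, treating sets as independent.
    Each entry is one player's probability of winning a set in that match."""
    return sum(2 * p * (1 - p) for p in set_probs) / len(set_probs)


def relative_excess(predicted: float, observed: float) -> float:
    """How much larger the predicted share is than the observed one, as a fraction."""
    return predicted / observed - 1


# Made-up set probabilities for three matches.
print(f"{predicted_three_set_share([0.5, 0.7, 0.8]):.1%}")
# The 2014 tour-level shares quoted above: 44% predicted vs. 36% observed.
print(f"{relative_excess(0.44, 0.36):.0%} more three-setters predicted than played")
```

With the figures above, the excess works out to about 22%, which is where “over 20% more” comes from.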

There are two factors that could account for the difference between theory and practice. I think both play a part:

  1. Sets aren’t independent. If winning the first set makes a player more likely to win the second, there would be fewer three-setters than predicted.
  2. There’s usually a bigger gap between players than aggregate numbers suggest. On paper, one player might have a 60% chance of winning the match, but on the day, one player might be tired, under the weather, unhappy with his racquets, uncomfortable with the court … or playing his best tennis, in a honeymoon period with a new coach, enjoying friendly calls from home line judges. The list of possible factors is endless. The point is that for any matchup, there are plenty of effectively random, impossible-to-predict variables that affect each player’s performance. I suspect that those variables are more likely to expand the gap between players (and thus lower the likelihood of a three-setter) than shrink it.

A note on outliers

Despite the odds against three-setters, some players are more likely to go three than others. Among the 227 players who have contested 100 or more ATP best-of-threes since 1998, 20 have gone the distance in 40% or more of their matches. John Isner, tennis’s most reliable outlier, tops the list at 47.4%.
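Building a list like this is just a filter and a sort. A minimal Python sketch, with hypothetical players and counts standing in for real match results:

```python
def three_set_outliers(
    records: dict[str, tuple[int, int]],
    min_matches: int = 100,
    threshold: float = 0.40,
) -> list[tuple[str, float]]:
    """records maps player -> (three-set matches, best-of-three matches played).
    Returns players meeting both cutoffs, highest three-set rate first."""
    rates = [
        (name, three / played)
        for name, (three, played) in records.items()
        if played >= min_matches
    ]
    return sorted(
        [(name, rate) for name, rate in rates if rate >= threshold],
        key=lambda item: item[1],
        reverse=True,
    )


# Hypothetical counts, not real player data.
records = {
    "Player W": (50, 150),
    "Player X": (95, 200),
    "Player Y": (30, 90),   # too few matches to qualify
    "Player Z": (41, 100),
}
for name, rate in three_set_outliers(records):
    print(f"{name}: {rate:.1%}")
```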

Big servers don’t dominate the list, but Isner’s presence at the top isn’t entirely by chance. After John, Richard Fromberg is a close second at 46.7%, while Goran Ivanisevic is not far behind at 43.0%. Mark Philippoussis and Sam Querrey also show up in the top ten.

It’s no surprise to see these names come up. One-dimensional servers are more likely to play tiebreaks, and tiebreaks are as close to random as a set of tennis can get. Someone who plays tiebreaks as often as Isner does will find himself losing first sets to inferior opponents and winning first sets against players who should beat him.

That randomness not only makes it more likely the match will go three sets, it’s also something the players are aware of. If Isner drops a first-set tiebreak, he realizes that he still has a solid chance to win the match–losing the breaker doesn’t mean he’s getting outplayed. If there is a mental component that partially explains the likelihood of the first-set winner taking the second set, it doesn’t apply to players like him.

Still, even Big John finishes his matches in straight sets more than half the time. Every other tour regular does so as well, so it would take a very unusual set of circumstances for a betting market (or common sense) to favor a three-set outcome.


Filed under Research

The Untapped Potential of Umpire Scorecard Data

There’s a lot more that can be done with tennis data. Everyone knows this. Even the ATP and WTA tours–along with their rather prominent partners–know this.

Both tours are sitting on a mountain of information that they’ve barely exploited: umpire scorecard data. It’s not cutting edge–there are no cameras, no courtside loggers counting unforced errors and winners. It’s just a log of every point, along with first or second serve, aces, and double faults. Despite those limits, there are many untapped advantages.

First, there’s an umpire scorecard for every match. Not every match on a TV court, not every match on a Hawkeye court, not every main draw match. Every match. If an ATP, WTA, or ITF umpire is officiating the match, to the best of my knowledge, there is a scorecard; when you see a chair umpire tap on a screen, this is what they’re recording. That means data on thousands of matches and players every year, from Novak Djokovic to Djordje Djokovic.

It’s tough to overstate how valuable that is. The main drawback of most tennis stats is a lack of context. For instance, when Hawkeye puts a graphic on your TV screen, it’s often based on data from a single match or the present tournament. IBM’s much-publicized analytics are based on Grand Slam matches only. Umpire scorecards have no such problem.

Second, there’s a ton of information lurking in this low-tech tracking system. The basics of first and second serves, aces, and double faults may not sound like much, but as we’ll see below, they open the door to a huge array of stats. ATP and WTA “Match Stats” are compiled from these scorecards, but they only scratch the surface.

How to do more with scorecards

In a minute, I’ll make specific suggestions for additional totals and rates that the tours could compile from the data they already have. Before that, let me explain why simply expanding the contents of “Match Stats” should be Plan B.

More and more journalism is data-based, and more and more avid fans are, to some extent or other, analyzing tennis for publication. In other words, there is a rapidly growing base of analysts who don’t need data pre-packaged for them. Every match is different, and the numbers needed to illustrate any match report are different as well. For broader analysis, like comparing players over the course of a season, customized data is more important still.

So: Release the point-by-point data from the scorecards.

Another benefit of the simplicity of umpire scorecard data is that more analysts can easily manage it. No organization could foresee everything that might be interesting about a match, so why even try? Not every journalist will want to dig into a point-by-point spreadsheet to see how often Julien Benneteau missed his first serve of a game, or how Rafael Nadal responded every time he fell behind 0-30. But some will do just that. When they do, their work benefits, their readers have more ways to engage with the next match they watch, and the sport ultimately wins.
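Both of those questions are short computations once a point-by-point log exists. A Python sketch, assuming a hypothetical record per point (game number, server, the game score before the point, whether the first serve went in, and who won the point); it handles standard games only, not tiebreaks, and expects points in match order:

```python
from itertools import groupby
from typing import NamedTuple, Optional


class Point(NamedTuple):
    """One point from a scorecard log (hypothetical format, standard games only)."""
    game: int           # game number within the match
    server: str
    server_pts: int     # points the server had won in this game before the point
    returner_pts: int   # points the returner had won in this game before the point
    first_in: bool      # did the first serve land in?
    server_won: bool


def first_point_miss_rate(points: list[Point], player: str) -> Optional[float]:
    """Share of the player's service games that began with a missed first serve."""
    openers = [p for p in points
               if p.server == player and p.server_pts == 0 and p.returner_pts == 0]
    if not openers:
        return None
    return sum(not p.first_in for p in openers) / len(openers)


def holds_after_love_30(points: list[Point], player: str) -> tuple[int, int]:
    """(games held, service games in which the player trailed 0-30)."""
    held = trailed = 0
    # groupby needs points in match order so each game's points are contiguous.
    for _, game_iter in groupby(points, key=lambda p: p.game):
        game = list(game_iter)
        if game[0].server != player:
            continue
        if any(p.server_pts == 0 and p.returner_pts == 2 for p in game):
            trailed += 1
            held += game[-1].server_won  # whoever wins a game's last point wins the game
    return held, trailed


P = Point
points = [
    P(1, "A", 0, 0, False, True), P(1, "A", 1, 0, True, True),
    P(1, "A", 2, 0, True, True), P(1, "A", 3, 0, True, True),    # A holds to love
    P(2, "B", 0, 0, True, False), P(2, "B", 0, 1, True, False),
    P(2, "B", 0, 2, True, False), P(2, "B", 0, 3, True, False),  # B broken
    P(3, "A", 0, 0, True, False), P(3, "A", 0, 1, False, False),
    P(3, "A", 0, 2, True, True), P(3, "A", 1, 2, True, True),
    P(3, "A", 2, 2, True, True), P(3, "A", 3, 2, True, True),    # A holds from 0-30
    P(4, "B", 0, 0, True, True), P(4, "B", 1, 0, True, True),
    P(4, "B", 2, 0, True, True), P(4, "B", 3, 0, True, True),    # B holds
    P(5, "A", 0, 0, False, False), P(5, "A", 0, 1, True, False),
    P(5, "A", 0, 2, True, False), P(5, "A", 0, 3, True, False),  # A broken from 0-30
]
print(first_point_miss_rate(points, "A"))
print(holds_after_love_30(points, "A"))
```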

A not-so-brief wish list

I have a sneaking feeling that no one’s going to release point-by-point data for every ATP or WTA match. I hope that’s not the case, but if it is, the tours should still consider vastly expanding the stats they compile for each match–including past matches for as far back as their databases go.

  1. Deuce/ad comparisons. Some players serve much more effectively in one court than the other. For all deuce-court service points, I would like: (a) total points, (b) aces, (c) double faults, and (d) first serves in. Same for ad-court service points.
  2. Break point stats. Same as the above: For both servers facing break point: (a) aces, (b) double faults, and (c) first serves in.
  3. Break point games. In how many games did each player earn a break point?
  4. Stats for other important point scores. Break points are key, but other scenarios are important as well. If I have to pick only a few, let’s start with 0-30, 15-30, deuce, and ad-in (including 40-30). For all service points at each of those scores, I’d like (a) total points, (b) aces, (c) double faults, and (d) first serves in.
  5. Set points and match points: Same as above. Fans love match point stats.
  6. The game sequence. At what points did breaks of serve occur? This would allow us to answer many oft-posed questions: Do players hold serve more early in sets? Do breaks of serve more frequently follow breaks than holds? (And if so, how much more often?) Are players more likely to drop serve immediately after winning a tight set?
  7. Set-by-set breakdowns of all stats that are currently kept, plus all of the above. The live scoring app separates stats by set, but there is no official archive with set-by-set breakdowns. This is particularly key for journalists attempting to tell the story of a match, when a small change in approach can turn the tide.
  8. Tiebreak breakdowns. Tiebreaks–especially long ones–have a life of their own, and analysts should be able to see all of the same stats for each tiebreak as for each set as a whole. For example, it would be interesting to see if a player’s ace or double fault rates (or even his or her first-serve percentage) changed between the first twelve games of a set and the breaker.
  9. A list of the score when each double fault occurred. (Aces would be nice, too.) Especially in men’s tennis, DFs are quite rare, and they often loom large in match narratives.
  10. Longest streaks for each player: consecutive aces, consecutive double faults, consecutive points won on serve, consecutive points won overall, and the score at the beginning and end of each of those streaks.
  11. For doubles matches, a separation of all of the above service stats by server. For the Samuel Groth/Leander Paes partnership, aggregate serve stats (as they are presented now) aren’t going to tell you anything useful about either player’s performance at the line.
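Items 1 and 10 show how little processing the list requires. A Python sketch assuming a hypothetical per-point record; it relies on the fact that in a standard game the server starts in the deuce court and switches sides every point, so an even number of points already played means a deuce-court serve:

```python
from collections import Counter
from typing import NamedTuple


class Point(NamedTuple):
    """One point from a scorecard log (hypothetical format, standard games)."""
    server: str
    server_pts: int    # server's points in the current game before this point
    returner_pts: int  # returner's points in the current game before this point
    first_in: bool
    ace: bool
    double_fault: bool
    server_won: bool


def court(p: Point) -> str:
    # Even number of points already played in the game -> deuce court.
    return "deuce" if (p.server_pts + p.returner_pts) % 2 == 0 else "ad"


def court_splits(points: list[Point], player: str) -> dict[str, Counter]:
    """Item 1: points, aces, double faults, and first serves in, by court."""
    stats = {"deuce": Counter(), "ad": Counter()}
    for p in points:
        if p.server != player:
            continue
        s = stats[court(p)]
        s["points"] += 1
        s["aces"] += p.ace
        s["double_faults"] += p.double_fault
        s["first_serves_in"] += p.first_in
    return stats


def longest_serve_streak(points: list[Point], player: str) -> int:
    """Part of item 10: most consecutive service points won."""
    best = run = 0
    for p in points:
        if p.server != player:
            continue  # the opponent's service points don't break the streak
        run = run + 1 if p.server_won else 0
        best = max(best, run)
    return best


points = [
    Point("A", 0, 0, True, True, False, True),    # deuce court, ace
    Point("A", 1, 0, False, False, True, False),  # ad court, double fault
    Point("A", 1, 1, True, False, False, True),   # deuce court
    Point("A", 2, 1, True, True, False, True),    # ad court, ace
    Point("A", 3, 1, True, False, False, True),   # deuce court, game A
    Point("B", 0, 0, True, False, False, True), Point("B", 1, 0, True, False, False, True),
    Point("B", 2, 0, True, False, False, True), Point("B", 3, 0, True, False, False, True),
    Point("A", 0, 0, True, False, False, True),   # A's next service game
]
print(court_splits(points, "A"))
print(longest_serve_streak(points, "A"))
```

Streaks here count only the player’s own service points, skipping the opponent’s service games in between.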

To reiterate, all of this stuff is in the scorecards. Most of the above are no more difficult to compile than the Match Stats that the tours already publish.

If the tours added everything on my list, that would be one big step out of the dark ages for tennis. Certainly, tennis writers would be able to file more intelligent stories and fans would have a much better way to experience the performances of their favorite players.

If the tours published current and archived raw point-by-point data, tennis would go one better: it would become an example for many other individual sports to follow. We would see a boom in fan engagement as every follower of the sport would have the opportunity to learn much more about tennis and relive matches in detail, whether from last week or late last century.

We’re not talking about a multi-million dollar infrastructure investment. To achieve all this, the tours need only do a little bit more with what they already have.


Filed under Harebrained schemes