Toward Atomic Statistics

The other day, Roger Federer mentioned in a press conference that he’s “never been a big stat guy.”  And why would he be?  Television commentators and the reporters asking him post-match questions tend to harp on the same big-picture numbers, like break points converted and 2nd-serve points won.

In other words, statistics that look better when you’re winning points.  How’s that for cutting edge insight: You get better results when you win more points.  If I were in Fed’s position, I wouldn’t be a “big stat guy” either.

To the extent statistics have the potential to tell us about a particular player’s performance, we need to look at numbers that each player can control as much as possible.  Ace counts–though they are affected by returners to a limited extent–are an example of one of the few commonly-tracked stats that directly reflect an aspect of a player’s performance.  You can have a big serving day with not too many aces and a mediocre serving day with more, but for the most part, lots of aces means you’re serving well.  Lots of double faults means you’re not.

By contrast, think about points won on second serve, a favorite among the commentariat.  That statistic may weakly track second serve quality, but it also factors in the returner’s second serve returns, as well as both players’ performance in rallies that begin on close to even terms.  It provides fodder for discussion, but it certainly doesn’t offer anything actionable for a player, or an explanation of exactly what either player did well in the match.

Atomic statistics

Aces and double faults are a decent proxy for performance on serve.  (It would be nice to have unreturnables as well, since they have more in common with aces than they do with serves that are returned, however poorly.)

But what about every other shot?  What about specific strategies?

An obvious example of a base-level stat we should be counting is service return depth.  Yes, it’s affected by how well the opponent serves, but it refers to a single shot type, and one upon which the outcome of a match can hinge.  It can be clearly defined, and it’s actionable.  Fail to get a reasonable percentage of service returns past the service line, and a good player will beat you.  Put a majority of service returns in the backmost quarter of the court, and you’re neutralizing much of the server’s advantage.
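To make the idea concrete, here is a minimal sketch of how return depth could be tallied once someone has charted where each return lands.  The function name, the input format (landing depth in feet from the net), and the sample numbers are all hypothetical.  The 21-foot and 39-foot figures are standard court dimensions, and I am assuming "backmost quarter" means the last quarter of the 39 feet between net and baseline.

```python
# Hypothetical sketch: summarize return depth from charted landing positions.
# Depth is measured in feet from the net, on the server's side of the court.
SERVICE_LINE_FT = 21.0              # net to service line
BASELINE_FT = 39.0                  # net to baseline
DEEP_ZONE_FT = BASELINE_FT * 0.75   # assumed start of the "backmost quarter"


def return_depth_summary(depths_ft):
    """Share of in-play returns that land past the service line,
    and the share that land in the backmost quarter of the court."""
    in_play = [d for d in depths_ft if 0 < d <= BASELINE_FT]
    if not in_play:
        return {"past_service_line": 0.0, "deep": 0.0}
    past = sum(1 for d in in_play if d > SERVICE_LINE_FT)
    deep = sum(1 for d in in_play if d >= DEEP_ZONE_FT)
    return {
        "past_service_line": past / len(in_play),
        "deep": deep / len(in_play),
    }


# Made-up landing depths for eight returns, for illustration only.
sample = [12.0, 18.5, 22.0, 25.0, 30.0, 33.5, 36.0, 38.0]
summary = return_depth_summary(sample)
print(summary)
```

Returns that miss long or fall into the net are left out here; a fuller version would probably count them separately rather than drop them.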

Here are more atomic statistics with the same type of potential:

  • Percentage of service returns chipped or sliced.
  • Percentage of backhands chipped or sliced.
  • Serves (and other errors) into the net, as opposed to other types of errors.
  • Variety of direction on each shot, e.g. backhands down the line compared to backhands crosscourt and down the middle.
  • Net approaches
  • Drop shot success rate (off of each wing).

Two commonly-counted statistics, unforced errors and winners, have many characteristics in common with these atomic stats, but are insufficiently specific.  Sure, knowing a player’s winner/UFE ratio for a match is some indication of how well he or she played, but what’s the takeaway?  Federer needs to be less sloppy?  He needs to hit more winners?  Once again, it’s easy to see why players aren’t clamoring to hear these numbers.  No baseball pitcher benefits from learning he should give up fewer runs, and no hockey goaltender benefits from hearing that he needs to allow fewer goals.

Glimmers of hope

With full access to Hawkeye data, this sort of analysis (and much, much more) is within reach.  Even if Hawkeye material remains mostly impenetrable, the recent announcement from SAP and the WTA holds out hope for more granular tennis data.

In the meantime, we’ll have to count this stuff ourselves.


Why the ATP is More Popular Than the WTA

Last night, Fernando Gonzalez played the last match of his career.  Gonzo is a fan favorite, with a historically great forehand that propelled him to finals at the 2007 Australian Open and the 2008 Olympics.  He won tour-level titles over a ten-year span.

Next month, the man in the limelight will be Ivan Ljubicic.  He doesn’t exactly qualify as a “fan favorite,” but tennis aficionados have grown to appreciate his deadly service accuracy, beautiful one-handed backhand, and intelligence on and off the court.

Men’s tennis is in the age of the veteran.  Even though we’re talking about 20-somethings and a few 30-year-olds, virtually every player at the top of the game five years ago is still in the mix today.  With the exception of Andre Agassi, every top-ranked player from the last ten years is still active.

And fans love veterans.  The current state of the ATP is tailor-made for fan interest.

There are two things going on here.  One is simply a matter of familiarity.  If you lost interest in tennis for the last five years, you might be surprised to find Mario Ancic out of the game, Arnaud Clement still in it, and Andy Roddick well out of the top ten, but the cast of characters would be immediately recognizable.  It’s like a television soap opera–you only have to watch an episode or two before you’re back in the swing of things.

The other factor is what we might call the “Agassi effect.”  In the late ’80s and early ’90s, Agassi was the stereotypical brash youngster, offending the effete and challenging Wimbledon’s all-white rule.  A decade and a half later, he was perhaps the most popular player in the game, the very picture of sportsmanship and class.  Few players undergo such a radical transformation in the eyes of the public, but the general direction is very common.

Only a few years ago, Rafael Nadal was a divisive figure, mocked by many for his sleeveless tops and bulging biceps.  More recently, Novak Djokovic was widely disliked.  I’m sure detractors are still out there, but they are much quieter.  Think back to the early days of just about any veteran’s career–Andy Roddick was exciting to American fans, objectionable to most everybody else.  Lleyton Hewitt was another Agassi, and he didn’t grow out of it as quickly.

Yet for all that, can you think of a player who has gotten less popular as he ages?  Perhaps this phenomenon is unique to individual sports.  In team sports, some figures seem to attract fans, but others lose them, as they sign mega-contracts with new teams, becoming viewed as sellouts.  (Or worse, if they take the mega-contract, then never perform as well again.)

The phenomenon of gaining fans with age isn’t limited to men–veteran WTA players experience it, as well.  It seems like Kim Clijsters was better loved upon her return to the game than she was the day she retired.  Even the Williams sisters seem to have fewer detractors these days than they did several years ago.  But while the WTA has its share of vets, it has far fewer players who have persisted at the top of the game.

Only two players from the 2007 year-end top ten (Maria Sharapova and Marion Bartoli) are in the top ten of today’s WTA rankings.  Most of the WTA’s vets have hung around on the fringes of the game’s best for years.  Li Na, Sam Stosur, and Vera Zvonareva have all given us their share of highlights, but to extend my soap opera analogy, they are peripheral characters who star in a few episodes, only to disappear into the background again.  Someone who hasn’t watched women’s tennis for a few years would have a hard time catching up.

Of course, none of this is to say that men’s tennis is inherently better.  At various times in the past, the WTA has had a stronger stable of perennial stars, and when that is the case, it rakes in the ratings.  Victoria Azarenka may not be as obviously bankable as a charmer like Caroline Wozniacki or a cover girl like Maria Sharapova, but by winning consistently, she gives the women’s game a head start toward developing what the ATP possesses right now.  If a few other players rise to the challenge for more than a couple months at a time, we might do more than just talk about Djokovic, Federer, and Nadal all the time.


What Does the “Hot Hand” Mean in Tennis?

In sports analytics, the topic of streakiness–the “hot hand”–is a popular one. Nearly everyone believes it exists, that players (or even teams) can go on a hot or cold streak, during which they temporarily play above or below their true level.

To a certain extent, streakiness is inevitable–if you flip a coin 100 times, you’ll see segments of 5 or 10 flips in which most of the flips are heads. That’s not because the coin suddenly got “better”; it’s a natural occurrence over a long enough time span. So if you watch an entire tennis match, there are bound to be games where one player seems to be performing better than usual, perhaps stringing together several aces or exceptional winners.

The question, then, is whether a player is streakier than chance alone would produce. To take just one example, let’s say a player hits aces on 10% of service points. If he did occasionally serve better than usual, we would observe that after he hits one ace, he is more likely (say, 15% or 20%) to hit another ace. A missed first or second serve might make it more likely that he misses his next try.
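The comparison described above can be sketched in a few lines, assuming we have a point-by-point record of whether each service point was an ace. The function name and the input format are hypothetical, and the simulated server below is a made-up stand-in for real data: he aces 10% of points with no memory at all, which gives a baseline for what "no hot hand" looks like.

```python
import random


def ace_rates(serves):
    """serves: list of booleans, True where the service point was an ace.
    Returns (overall ace rate, ace rate on points played right after an ace).
    The second value is None if no point in the list follows an ace."""
    overall = sum(serves) / len(serves)
    after_ace = [cur for prev, cur in zip(serves, serves[1:]) if prev]
    conditional = sum(after_ace) / len(after_ace) if after_ace else None
    return overall, conditional


# A simulated server with a constant 10% ace rate and no streakiness.
rng = random.Random(0)
independent = [rng.random() < 0.10 for _ in range(100_000)]
overall, conditional = ace_rates(independent)
print(round(overall, 3), round(conditional, 3))
```

For a memoryless server the two rates should come out close to each other; a real hot hand would show up as a post-ace rate clearly above the overall rate. This sketch ignores game boundaries, the deuce/ad court split, and first versus second serves, all of which would need handling before drawing conclusions from real matches.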

My last couple of topics–differences in the deuce/ad court, and the “reverse hot hand” at 30-40–have hinted that tennis may be structured in a way that prevents players from getting hot.

One of the most popular subjects for hot hand research is basketball free throw shooting. Researchers like it because it’s as close as basketball players get to a laboratory: every shot is from the same distance, there’s no defensive quality to consider, and even better, players usually get two tries, one right after the other. There’s nothing like it in tennis.

The one thing that seems a bit akin to free throw shooting is serving, especially for more dominant players. John Isner, Roger Federer, and Milos Raonic seem to go on serving streaks; certainly they can play game after game and control play with unreturnable serving. But when we look closer, their experience is much more nuanced. As we’ve seen, most players are better in one court, deuce or ad, than in the other. It would be as if a basketball player shot one free throw, then took two steps to the left and one step forward before attempting his next shot.

And, of course, there’s another player on the court. If Federer uses a relatively slow serve out wide in the deuce court for a service winner at 15-15, he is much less likely to use the same tactic at 30-30 or 40-15. Even if he were capable of hitting 50 perfect serves of that nature, he would never do so in a match. If it has any relevance for professional tennis, the hot hand must refer to something broader than a single skill.

On a more general level, the rules of tennis involve alternation more than most sports. Sure, most sports give the ball to the other team after a goal, but the length of possession–or in baseball, the length of an inning–can vary widely. In tennis, you can only add one game to your tally before handing the ball to your opponent. And even within that game, you are constantly moving from your stronger court to your weaker court; your opponent might be doing the same.

My question to you is this: If there is a hot hand in tennis, where would you expect to find it? Consecutive aces? Aces specifically in the deuce court? Service winners? Short service points? Points won? Return points won? Games won? First serves in? Point-ending winners? Avoidance of unforced errors? It’s possible that any or all of these things could occur in bunches, but which of them would indicate what we think of as a tennis player on a hot streak?


The Problem With “Unforced Errors”

In any sport, there are a handful of stats that are frequently cited, but are ultimately of limited use.  Often, these statistics tell you something, but are misunderstood to imply something more.  Simple examples are many “counting” stats — points scored in basketball, touchdowns thrown in football, RBI in baseball.  In all of those cases, they indicate something good, but don’t give you context — lots of field goal attempts, a great offensive line, or good hitters on base in front of you, to take those three cases.

The stat in tennis that aggravates me most is the unforced error.  Not only does it ignore some important context (as in the other-sport stats I just mentioned), but it relies on the judgment of a scorer.


The second problem is the more serious one.  How much does a number mean if two people watching the same match wouldn’t come up with the same result?  This was a hot-button issue during Wimbledon, when the scorers were assigning an unusually small number of UEs, especially on serve returns.

If you’re watching the match, you might not notice.  If the end-of-set stats show that Nadal had 8 UEs and Federer had 17, that does tell you something … Federer was making more obvious mistakes.  But if you want to compare that to a Nadal/Federer match three weeks ago, or last year, those numbers are all but useless.

I suspect that, at events like Wimbledon, someone from the ITF, or maybe IBM, is giving standardized instructions to scorers with general rules for categorizing errors.  That would be a good start, especially if it were implemented across all tournaments at all professional levels.

…but it doesn’t matter

I suspect that no matter how consistent scorers are, the distinction between “unforced” and “forced” errors will always be arbitrary.  Consider the case of service returns.  There are occasional points, especially on second serve returns, where the returning player misses an easy shot.  But more frequently, the returning player is immediately on defense.  When is an error “unforced” on the return of a 130-mile-per-hour serve?

Ultimately, we will probably have computerized systems that classify errors for us.  If you have all the necessary data and crunch the numbers, a 125-mph serve down the T in the ad court might be returned 60% of the time, meaning there is a 40% chance of an error or non-return.  With those numbers on every serve (and every other shot, eventually), we could set the line for an “unforced” error on a shot that the average top-100 player would make, say, 75% of the time.  Or we could have different classifications: “unforced errors,” “disastrous errors,” “mildly forced errors,” and so on, indicating different percentage ranges.
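Here is a rough sketch of what that classification might look like once a make-probability exists for every shot.  Only the 75% line for "unforced" comes from the paragraph above; the 90% and 50% cutoffs, the function name, and the assumption that the probabilities are already computed from tracking data are all mine, purely for illustration.

```python
def classify_error(make_probability):
    """Label an error by how often the average top-100 player would be
    expected to make the same shot (a value between 0 and 1)."""
    if not 0.0 <= make_probability <= 1.0:
        raise ValueError("make_probability must be between 0 and 1")
    if make_probability >= 0.90:   # assumed cutoff
        return "disastrous error"
    if make_probability >= 0.75:   # the 75% line suggested in the text
        return "unforced error"
    if make_probability >= 0.50:   # assumed cutoff
        return "mildly forced error"
    return "forced error"


# The 125-mph serve down the T that comes back 60% of the time:
print(classify_error(0.60))
```

Under these cutoffs, missing the return of that 125-mph serve would count as a mildly forced error.  The labels matter less than the fact that the same shot would get the same label at every tournament.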

The problem we have now is that professionals are so good (and their equipment is so advanced) that almost every shot can be offensive, meaning that players are almost always–to some extent–on defense.  If you’re rallying with Nadal, you might hit some winners, but you’re always fighting the spin.  If you’re rallying with Federer, the spin isn’t so bad, but you’re always trying to keep the ball away from his forehand.  (If you’re rallying with Djokovic, you’re wishing you had hit a better serve.)  That perpetual semi-defensive posture means that nearly every error is, to some extent, forced.  And because players are so good, we expect them to return every reachable ball, suggesting that nearly every error is, to some extent, unforced.


The wisdom of baseball analysts

A very similar problem arises in baseball.  If a fielder makes a misplay (according to the official scorer), he is charged with an “error.”  Paradoxically, some of the best fielders end up with the highest error totals.  If, say, a shortstop has great range, he’ll reach a lot of groundballs, and have more chances to make bad throws, thus racking up the errors.

For decades, fans considered errors to be the standard measure of defensive prowess–a stat called “fielding percentage” measures the ratio of plays-successfully-made to chances.   (In other words, 1 minus “error rate.”)  But because of the paradox mentioned above, the highest fielding percentages do not necessarily belong to the best fielders.

The solution: Ignore errors, look only at plays made.  (This is an oversimplification, but not by much.)  If Shortstop A makes more plays than Shortstop B, it doesn’t matter whether A makes more errors.  The guy you want on your team is the one who makes more plays.
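A quick worked example shows how the two measures can disagree.  The two shortstops and their season totals below are invented for illustration, and the formula is the simplified one described above (plays made divided by plays made plus errors).

```python
def fielding_percentage(plays_made, errors):
    """Plays successfully made divided by total chances."""
    chances = plays_made + errors
    return plays_made / chances


# Hypothetical season totals, not real players.
shortstop_a = {"plays_made": 500, "errors": 25}   # great range, more errors
shortstop_b = {"plays_made": 420, "errors": 10}   # limited range, few errors

fpct_a = fielding_percentage(**shortstop_a)
fpct_b = fielding_percentage(**shortstop_b)
extra_plays = shortstop_a["plays_made"] - shortstop_b["plays_made"]

print(round(fpct_a, 3), round(fpct_b, 3), extra_plays)
```

Shortstop B wins on fielding percentage, but Shortstop A turned 80 more balls into outs, which is the number that helps his team.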

Essentially, baseball errors correspond to tennis unforced errors, and baseball plays-not-made (shortstop dives for the ball and can’t reach it) correspond to tennis forced errors.  The stat that ends up mattering to baseball analysts–“plays made”–corresponds to “shots successfully returned.”  The analogy is imperfect, but it illustrates the problem with separating one type of non-play from another.


If we don’t distinguish between different types of errors, we’re left with “shots made” and “shots not made,” or–even less satisfactorily–“points won” and “points lost.”  Not exactly a step in the right direction, since we’re already counting points!

Still, I suspect it’s better to have no stat than to have a misleading stat.  Rally counts are a positive step, since we can look at outcomes for different types of points.  If you win a lot of 10-or-more-stroke rallies, that identifies you as a certain type of player (or playing a certain kind of match).  It doesn’t matter whether you lose that sort of point on an unforced error or your opponent’s winner–both outcomes might stem from the same tactical mistake three or four strokes sooner.
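As a sketch of what rally counts make possible, the snippet below groups one player’s points by rally length and reports the share won in each group.  The 10-or-more bucket comes from the paragraph above; the shorter buckets, the function name, the input format, and the sample points are assumptions made for illustration.

```python
from collections import defaultdict


def win_rate_by_rally_length(points, buckets=((1, 4), (5, 9), (10, None))):
    """points: iterable of (rally_length, won) pairs for one player.
    Returns the share of points won within each rally-length bucket."""
    totals = defaultdict(lambda: [0, 0])  # label -> [points won, points played]
    for length, won in points:
        for low, high in buckets:
            if length >= low and (high is None or length <= high):
                label = f"{low}+" if high is None else f"{low}-{high}"
                totals[label][0] += int(won)
                totals[label][1] += 1
                break
    return {label: w / n for label, (w, n) in totals.items()}


# Made-up points: (number of strokes in the rally, did the player win it?)
sample_points = [(2, True), (3, False), (6, True), (7, True),
                 (12, False), (15, True), (11, True), (1, True)]
rates = win_rate_by_rally_length(sample_points)
print(rates)
```

Nothing here cares how a point ended, only how long it lasted and who won it, which sidesteps the scorer’s judgment entirely.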

Either that, or we can wait until we can calculate real-time win probability and start categorizing errors with extreme precision.  “Unforced errors” aren’t going away any time soon, but as fans, we can be smarter about how much attention we grant to individual numbers.


