The Limited Value of Head-to-Head Records

Yesterday at the Australian Open, Ana Ivanovic defeated Serena Williams, despite having failed to take a set in four previous meetings. Later in the day, Tomas Berdych beat Kevin Anderson for the tenth straight time.

Commentators and bettors love head-to-head records. You’ll often hear people say, “tennis is a game of matchups,” which, I suppose, is hardly disprovable.

But how much do head-to-head records really mean?  If Player A has a better record than Player B but Player B has won the majority of their career meetings, who do you pick? To what extent does head-to-head record trump everything (or anything) else?

It’s important to remember that, most of the time, head-to-head records don’t clash with any other measurement of relative skill. On the ATP tour, head-to-head record agrees with relative ranking 69% of the time; that is, the player who leads the H2H is also the higher-ranked of the two. When a pair of players have faced each other five or more times, H2H agrees with relative ranking 75% of the time.

Usually, then, the head-to-head record is right. It’s less clear whether it adds anything to our understanding. Sure, Rafael Nadal owns Stanislas Wawrinka, but would we expect anything much different from the matchup of a dominant number one and a steady-but-unspectacular number eight?

H2H against the rankings

If head-to-head records have much value, we’d expect them–at least for some subset of matches–to outperform the ATP rankings. That’s a pretty low bar–the official rankings are riddled with limitations that keep them from being very predictive.

To see if H2Hs met that standard, I looked at ATP tour-level matches since 1996. For each match, I recorded whether the winner was ranked higher than his opponent and what his head-to-head record was against that opponent. (I didn’t consider matches outside of the ATP tour in calculating head-to-heads.)

Thus, for each head-to-head record (for instance, five wins in eight career meetings), we can determine how many the H2H-favored player won, how many the higher-ranked player won, and so on.

For instance, I found 1,040 matches in which one of the players had beaten his opponent in exactly four of their previous five meetings.  65.0% of those matches went the way of the player favored by the head-to-head record, while 68.8% went to the higher-ranked player. (54.5% of the matches fell in both categories.)

Things get more interesting in the 258 matches in which the two metrics did not agree.  When the player with the 4-1 record was lower in the rankings, he won only 109 (42.2%) of those matchups. In other words, at least in this group of matches, you’d be better off going with ATP rankings than with head-to-head results.
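The bookkeeping behind these percentages is simple enough to sketch. In this minimal Python version, each match is reduced to two hypothetical flags: did the H2H leader win, and did the higher-ranked player win? Since each match has exactly one winner, the two metrics disagree (the H2H leader is the lower-ranked player) precisely when one flag is true and the other false. The counts below are back-calculated from the rounded 4-1 percentages above (567 matches where both favorites were the same winner, 109 won only by the H2H leader, 149 only by the higher-ranked player, 215 by neither), so they are approximate reconstructions, not the original data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchOutcome:
    h2h_leader_won: bool      # did the player leading the prior head-to-head win?
    higher_ranked_won: bool   # did the higher-ranked player win?


def tally(outcomes):
    """Summarize how often each metric picked the winner, overall and when they disagree."""
    n = len(outcomes)
    h2h_wins = sum(o.h2h_leader_won for o in outcomes)
    rank_wins = sum(o.higher_ranked_won for o in outcomes)
    both = sum(o.h2h_leader_won and o.higher_ranked_won for o in outcomes)
    # With one winner per match, the metrics disagree exactly when the flags differ.
    disagree = [o for o in outcomes if o.h2h_leader_won != o.higher_ranked_won]
    h2h_in_disagree = sum(o.h2h_leader_won for o in disagree)
    return {
        "matches": n,
        "h2h_win_rate": h2h_wins / n,
        "ranking_win_rate": rank_wins / n,
        "both_rate": both / n,
        "disagreements": len(disagree),
        "h2h_share_of_disagreements": (
            h2h_in_disagree / len(disagree) if disagree else float("nan")
        ),
    }


# Approximate counts reconstructed from the rounded 4-1 percentages in the text.
sample = (
    [MatchOutcome(True, True)] * 567
    + [MatchOutcome(True, False)] * 109
    + [MatchOutcome(False, True)] * 149
    + [MatchOutcome(False, False)] * 215
)
result = tally(sample)
for key, value in result.items():
    print(key, round(value, 3))
```

Working from match-level flags rather than pre-computed percentages makes the "disagreement" subset fall out of the same data, which is where the comparison between the two metrics is actually decided.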

Broader view, similar conclusions

For almost every head-to-head record, the findings are the same. There were 26 head-to-head records (everything from 1-0 to 7-3) for which we have at least 100 matches’ worth of results, and in 20 of them, the player with the higher ranking did better than the player with the better head-to-head.  In 19 of the 26 groups, when the ranking disagreed with the head-to-head, ranking was a more accurate predictor of the outcome.

If we tally the results for head-to-heads with at least five meetings, we get an overall picture of how these two approaches perform. 68.5% of the time, the player with the higher ranking wins, while 66.0% of the time, the match goes to the man who leads in the head-to-head. When the head-to-head and the relative ranking don’t match, ranking proves to be the better indicator 56.5% of the time.

The most extreme head-to-heads, that is, undefeated pairings such as 7-0, 8-0, and so on, are the only groups in which H2H consistently tells us more than ATP ranking does.  80% of the time, these matches go to the higher-ranked player, while 81.9% of the time, the undefeated man prevails. In the 78 matches for which H2H and ranking don’t agree, H2H is the better predictor exactly two-thirds of the time.

Explanations against intuition

When you weigh a head-to-head record more heavily than a pair of ATP rankings, you’re relying on a very small sample instead of a very big one. Yes, that small sample may be much better targeted, but it is also very small.

Not only is the sample small, often it is not as applicable as you might think. When Roger Federer defeated Lleyton Hewitt in the fourth round of the 2004 Australian Open, he had beaten the Aussie only twice in nine career meetings. Yet at that point in their careers, the 22-year-old, #2-ranked Fed was clearly in the ascendancy while Hewitt was having difficulty keeping up. Even though most of their prior meetings had been on the same surface and Hewitt had won the three most recent encounters, that small subset of Roger’s performances did not account for his steady improvement.

The most recent Fed-Hewitt meeting is another good illustration. Entering the Brisbane final, Roger had won 15 of their previous 16 matches, but while Hewitt has maintained a middle-of-the-pack level for the last several years, Federer has declined. Although they had played 26 times before the Brisbane final, none of those contests had come in the last two years.

Whether it’s surface, recency, injury, weather conditions, or any one of dozens of other factors, head-to-heads are riddled with external factors. That’s the problem with any small sample size–the noise is much more likely to overwhelm the signal. If noise can win out in the extensive Fed-Hewitt head-to-head, most one-on-one records don’t stand a chance.

Any set of rankings, whether the ATP’s points system or my somewhat more sophisticated (and more predictive) jrank algorithm, takes into account every match both players have been involved in for a fairly long stretch of time. In most cases, having all that perspective on both players’ current levels is much more valuable than a noise-ridden handful of matches. If head-to-heads can’t beat ATP rankings, they would look even worse against a better algorithm.

Some players surely do have an edge on particular opponents or types of opponents, whether it’s Andy Murray with lefties or David Ferrer with Nicolas Almagro. But most of the time, those edges are reflected in the rankings–even if the rankings don’t explicitly set out to incorporate such things.

Next time Kevin Anderson draws Berdych, he should take heart. His odds of beating the Czech next time aren’t that much different from any other man ranked around #20 against someone in the bottom half of the top ten. Even accounting for the slight effect I’ve observed in undefeated head-to-heads, a lopsided one-on-one record isn’t fate.

Filed under Forecasting, Head-to-Heads, Research

Novak Djokovic and a First-Serve Key to the Match

Landing lots of first serves is a good thing, right? Actually, how much it matters–even whether it matters–depends on who you’re talking about.

When I criticized IBM’s Keys to the Match after last year’s US Open, I identified first-serve percentage as one of three “generic keys” (along with first-serve points won and second-serve points won) that, when combined, did a better job of predicting the outcome of matches than IBM’s allegedly more sophisticated markers.  First-serve percentage is the weakest of the three generic keys; after all, the other two count points won, which, short of counting sets, is as relevant as you can get.

First-serve percentage is a particularly appealing key because it is entirely dependent on one player. While a server may change his strategy based on the returning skills of his opponent, the returner has nothing to do with whether or not first serves go in the box.  Unlike the other two generic targets and the vast majority of IBM’s keys, a first-serve percentage goal is truly actionable: it is entirely within one player’s control to achieve.

In general, first-serve percentage correlates very strongly with winning percentage.  On the ATP tour from 2010 to 2013, when a player made exactly half of his first serves, he won 42.8% of the time. At 60% first serves in, he won 47.0% of the time. At 70%, the winning percentage is 57.4%.

This graph shows the rates at which players win matches when their first-serve percentages are between 50% and 72%:

[Graph: ATP match winning percentage by first-serve percentage, 50% to 72%]

As the first-serve percentage increases on the horizontal axis, winning percentage steadily rises as well.  With real-world tennis data, you’ll rarely see a relationship much clearer than this one.

Different players, different keys

When we use the same approach to look at specific players, the message starts to get muddled.  Here’s the same data for Novak Djokovic, 2009-13:

[Graph: Novak Djokovic’s winning percentage by first-serve percentage, 2009-13]

While we shouldn’t read too much into any particular jag in this graph, it’s clear that the overall trend is very different from the first graph. Calculate the correlation coefficient, and we find that Djokovic’s winning percentage has a negative relationship with his first-serve percentage. All else equal, he’s slightly more likely to win matches when he makes fewer first serves.
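For readers who want to run this kind of check themselves, here is a minimal sketch of the Pearson correlation coefficient between match-level first-serve percentage and a 0/1 win indicator. The toy numbers are made up purely for illustration; they are not Djokovic’s results, and the post does not say whether the coefficient was computed per match or per percentage bucket.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy sample: first-serve percentage per match, and 1 = won, 0 = lost.
first_serve_pct = [0.58, 0.61, 0.63, 0.66, 0.69, 0.72]
won = [1, 1, 0, 1, 0, 0]
r = pearson_r(first_serve_pct, won)
print(f"r = {r:.3f}")  # negative for this toy sample: fewer first serves, more wins
```

A negative r, as in this toy sample, is the pattern described for Djokovic; squaring r gives the r-squared figures used elsewhere in these posts.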

Djokovic isn’t alone in displaying this sort of negative relationship, either. The three tour regulars with even more extreme profiles over the last five years are Marin Cilic, Gilles Simon, and the always-unique John Isner.

Isner regularly posts first-serve percentages well above those of other players, including 39 career matches in which he topped 75%. That sort of number would be a near guarantee of victory for most players–for instance, Andy Murray is 32-3 in matches when he hits at least 70% of first serves in–but Isner has only won 62% of his 75%+ performances.  He is nearly as good (57%) when landing 65% or fewer of his first serves.

Djokovic, Isner, and this handful of others reveal a topic on which the tennis conventional wisdom can tie itself in knots. You need to make your first serve, but your first serve also needs to be a weapon, so you can’t take too much off of it.

The specific implied relationship, that every player has a “sweet spot” between giving up too much power and missing too many first serves, doesn’t show up in the numbers. But it does seem that different players face different risks.  The typical pro could stand to make more first serves. But a few guys find that their results improve when they make fewer, presumably because they’re taking more risks in an attempt to hit better ones.

Demonstrating the key

Of the players who made the cut for this study–at least 10 matches each at 10 different first-serve-percentage levels in the last five years–9 of 21 display relationships between first-serve percentage and winning percentage at least as positive as Isner’s is negative.  The most traditional player in that regard is Philipp Kohlschreiber. His graph looks a bit like a horse:

[Graph: Philipp Kohlschreiber’s winning percentage by first-serve percentage]

More than any other player, Kohli’s results have a fairly clear-cut inflection point. While it’s obscured a bit by the noisy dip at 64%, the German wins far more matches when he reaches 65% than when he doesn’t.

Kohlschreiber is joined by a group almost as motley as the one that sits at the other extreme. The other players with the strongest positive relationships between first serve percentage and winning percentage are Richard Gasquet, Murray, Roger Federer, Jeremy Chardy, and Juan Martin del Potro.

These player-specific findings tell us that in some matchups, we’ll have to be a little more subtle in what we look for from each guy. When Murray plays Djokovic, we should keep an eye on the first-serve percentages of both competitors–the one to see that he’s making enough, and the other to check that he isn’t making too many.

Filed under Keys to the match, Novak Djokovic, Research

Analytics That Aren’t: Why I’m Not Excited about SAP in Tennis

It’s not analytics, it’s marketing.

The Grand Slams (with IBM) and now the WTA (with SAP) are claiming to deliver powerful analytics to tennis fans.  And it’s certainly true that IBM and SAP collect way more data than the tours would without them.  But what happens to that data?  What analytics do fans actually get?

Based on our experience after several years of IBM working with the Slams and Hawkeye operating at top tournaments, the answers aren’t very promising.  IBM tracks lots of interesting stats, makes some shiny graphs available during matches, and the end result of all this is … Keys to the Match?

Once matches are over and the performance of the Keys to the Match is (blessedly) forgotten, all that data goes into a black hole.

Here’s the message: IBM collects the data. IBM analyzes the data. IBM owns the data. IBM plasters their logo and their “Big Data” slogans all over anything that contains any part of the data. The tournaments and tours are complicit in this: IBM signs a big contract, makes their analytics part of their marketing, and the tournaments and tours consider it a big step forward for tennis analysis.

Sometimes, marketing-driven analytics can be fun.  They give some fans what they want: counts of forehand winners, or average first-serve speeds. But let’s not fool ourselves. What IBM offers isn’t advancing our knowledge of tennis. In fact, it may be strengthening the same false beliefs that analytical work should be correcting.

SAP: Same Story (So Far)

Early evidence suggests that SAP, in its partnership with the WTA, will follow exactly the same model:

SAP will provide the media with insightful and easily consumable post-match notes which offer point-by-point analysis via a simple point tracker, highlight key events in the match, and compare previous head-to-head and 2013 season performance statistics.

“Easily consumable” is code for “we decide what the narratives are, and we come up with numbers to amplify those narratives.”

Narrative-driven analytics are just as bad as, and perhaps more insidious than, marketing-driven analytics, which are simply useless.  The amount of raw data generated in a tennis match is enormous, which is why TV broadcasts give us the same small tidbits of Hawkeye data: distance run during a point, average rally hit point, and so on.  So, under the weight of all those possibilities, why not just find the numbers that support the prevailing narrative? The media will cite those numbers, the fans will feel edified, and SAP will get its name dropped all over the place.

What we’re missing here is context.  Take this SAP-generated stat from a writeup on the WTA site:

The first promising sign for Sharapova against Kanepi was her rally hit point. Sharapova made contact with the ball 76% of the time behind the baseline compared to 89% for her opponent. It doesn’t matter so much what the percentage is – only that it is better than the person standing on the other side of the net.

Is that actually true? I don’t think anyone has ever published any research on whether rally hit point correlates with winning, though it seems sensible enough. In any case, these numbers are crying out for more context.  Is 76% good for Maria? How about keeping her opponent behind the baseline 89% of the time? Is the gap between 76% and 89% particularly large on the WTA? Does Maria’s rally hit point in one match tell us anything about her likely rally hit point in her next match?  After all, the article purports to offer “keys to match” for Maria against her next opponent, Serena Williams.

Here’s another one:

There is a lot to be said for winning the first point of your own service game and that rung true for Sharapova in her quarterfinal. When she won the opening point in 11 of her service games she went on to win nine of those games.

Is there any evidence that winning your first point is more valuable than, say, winning your second point?  Does Sharapova typically have a tough time winning her opening service point?  Is Kanepi a notably difficult returner on the deuce side, or early in games?  “There is a lot to be said” means, roughly, that “we hear this claim a lot, and SAP generated this stat.”

In any type of analytical work, context is everything.  Narrative-driven analytics strip out all context.

The alternative

IBM, SAP, and Hawkeye are tracking a huge amount of tennis data.  For the most part, the raw data is inaccessible to researchers.  The outsiders who are most likely to provide the context that tennis stats so desperately need just don’t have the tools to evaluate these narrative-driven offerings.

Other sporting organizations–notably Major League Baseball–make huge amounts of raw data available.  All this data makes fans more engaged, not less. It’s simply another way for the tours to get fans excited about the game. Statheads–and the lovely people who read their blogs–buy tickets too.

So, SAP, how about it?  Make your branded graphics for TV broadcasts. Provide your easily consumable stats for the media.  But while you’re at it, make your raw data available for independent researchers. That’s something we should all be able to get excited about.

Filed under Data, Keys to the match, Research

Is There an Advantage To Serving First?

There’s no structural bias toward the player who serves first.  If tennis players were robots, it wouldn’t matter who toed the line before the other.

But the conventional wisdom persists.  Last year, I looked at the first-server advantage in very close matches, and found that depending on the scenario, the player who serves first in the final set may win more than 50% of matches–as high as 55%–but the evidence is cloudy.  And that’s based on serving first at the tail end of the match.  Winning the coin toss doesn’t guarantee you that position for the third or fifth set.

Logically, then, it’s hard to see how serving the first game of the match–and holding that possible slight advantage in the first set–would have much impact on the outcome of the match.  There’s simply too much time, and too many events, between the first game and the pressure-packed crucial moments that decide the match.

Yet, the evidence points to a substantial first-serve advantage.

In ATP main-draw matches this year, the player who served first won 52% of the time.  That edge is confirmed when we adjust for individual players.

39 players tallied at least 10 matches in which they served first and 10 in which they served second.  Of those 39, 21 were more successful when serving first, against 17 who won more often when serving second.  (Marcos Baghdatis didn’t show a preference.)  Weight their results by their number of matches, and the average tour-level regular was 11% more likely to win when serving first than when serving second.  Converted to the same terms as the general finding, that’s 52.6% of matches in favor of the first server.
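The conversion from “X% more likely to win when serving first” to a first-server win share can be reconstructed with a little algebra. Assuming a typical player facing comparable opponents wins a fraction x of matches when serving first, he wins 1 − x when serving second, so x / (1 − x) = 1 + e, which gives x = (1 + e) / (2 + e). This is my reading of “converted to the same terms,” not a confirmed description of the post’s method, and the function name is hypothetical.

```python
def first_server_share(relative_edge: float) -> float:
    """Convert a relative edge (0.11 = '11% more likely') into the first server's win share.

    Assumes win rates serving first (x) and serving second (1 - x) satisfy
    x / (1 - x) = 1 + relative_edge, i.e. x = (1 + e) / (2 + e).
    """
    ratio = 1.0 + relative_edge
    return ratio / (1.0 + ratio)


for label, edge in [("ATP", 0.11), ("Challenger", 0.09)]:
    print(f"{label}: first server wins {first_server_share(edge):.1%}")
```

Under this assumption, the 11% ATP edge and the 9% Challenger edge reproduce the 52.6% and 52.2% figures quoted in the text.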

That’s not an airtight conclusion, but it is a suggestive one.  One possible problem would arise if lesser players–the guys who play some ATP matches against that top 39, but not enough to show up in the 39 themselves–are more likely to choose returning first.  Then, our top 39 would be winning 52.6% of matches against a lesser pool of opponents.

That doesn’t seem to be the case.  I looked at the next 60 or so players, ranked by how many ATP matches they’ve played this year.  That secondary group served first 51% of the time, indicating that the guys on the fringe of the tour don’t have any kind of consistent tendency when winning the coin toss.

For further confirmation, I ran the same algorithm for ATP Challenger matches this year.  That returned another decent-sized set of players with at least 10 matches serving first and 10 matches serving second–38, in this case.  The end result is almost identical.  The Challenger regulars were 9% more likely to win when serving first, which translates to the first server winning 52.2% of the time.

This is a particularly interesting finding, because in the aggregate, these 38 Challenger regulars prefer to serve second.  Of their 1110 matches so far this year, these guys served first only 503 times–about 45%.  Despite such a strong preference, the match results tell the story.  They are more likely to win when serving first.

When we turn our attention to the WTA tour, the results are so strong as to be head-scratching.  Applying the same test to 2013 WTA matches (though lowering the minimum number of matches to eight each, to ensure a similar number of players), the 35 most active players on the WTA tour are 28% more likely to win when serving first than when serving second.  In other words, when a top player is on the court, the first server wins about 56.3% of the time.  24 of the 35 players in this sample have better winning percentages when serving first than when serving second.

Because this effect cannot be attributed to any structural bias and can only be described as mental, I’m reluctant to put too much faith in these WTA results without further research.  However, the simple fact that ATP, Challenger, and WTA results agreed in direction is encouraging.  The first-server advantage may not be overwhelming, but it appears to be real.

Filed under Research

Simpler, Better Keys to the Match

If you watched the US Open or visited its website at any point in the last two weeks, you surely noticed the involvement of IBM.  Logos and banner ads were everywhere, and even usually-reliable news sites made a point of telling us about the company’s cutting-edge analytics.

Particularly difficult to miss were the IBM “Keys to the Match,” three indicators per player per match.  The name and nature of the “keys” strongly imply some kind of predictive power: IBM refers to its tennis offerings as “predictive analytics” and endlessly trumpets its database of 41 million data points.

Yet, as Carl Bialik wrote for the Wall Street Journal, these analytics aren’t so predictive.

It’s common to find that the losing player met more “keys” than the winner did, as was the case in the Djokovic-Wawrinka semifinal.  Even when the winner captured more keys, some of these indicators sound particularly irrelevant, such as “average less than 6.5 points per game serving,” the one key that Rafael Nadal failed to meet in yesterday’s victory.

According to one IBM rep, their team is looking for “unusual” statistics, and in that they succeeded.  But tennis is a simple game, and unless you drill down to components and do insightful work that no one has ever done in tennis analytics, there are only a few stats that matter.  In their quest for the unusual, IBM’s team missed out on the predictive.

IBM vs generic

IBM offered keys for 86 of the 127 men’s matches at the US Open this year.  In 20 of those matches, the loser met as many or more of the keys as the winner did.  On average, the winner of each match met 1.13 more IBM keys than the loser did.

This is IBM’s best performance of the year so far.  At Wimbledon, winners averaged 1.02 more keys than losers, and in 24 matches, the loser met as many or more keys as the winner.  At Roland Garros, the numbers were 0.98 and 21, and at the Australian Open, the numbers were 1.08 and 21.
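Both yardsticks used here (how often the loser met as many or more keys than the winner, and the winner’s average margin in keys met) are easy to compute from per-match key counts. A minimal sketch, with a hypothetical function name and made-up toy results rather than real IBM or Grand Slam data:

```python
from statistics import mean


def evaluate_keys(matches):
    """Summarize how well a set of keys separates winners from losers.

    matches: list of (winner_keys_met, loser_keys_met) pairs, one per match.
    Returns (matches where the loser met as many or more keys, average winner margin).
    """
    misses = sum(1 for winner, loser in matches if loser >= winner)
    margin = mean(winner - loser for winner, loser in matches)
    return misses, margin


# Hypothetical toy results for four matches.
toy = [(3, 1), (2, 2), (1, 2), (3, 0)]
print(evaluate_keys(toy))
```

Fewer misses and a larger average margin both indicate keys that track match outcomes more closely.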

Without some kind of reference point, it’s tough to know how good or bad these numbers are.  As Carl noted: “Maybe tennis is so difficult to analyze that these keys do better than anyone else could without IBM’s reams of data and complex computer models.”

It’s not that difficult.  In fact, IBM’s millions of data points and scores of “unusual” statistics are complicating what could be very simple.

I tested some basic stats to discover whether there were more straightforward indicators that might outperform IBM’s. (Carl calls them “Sackmann Keys;” I’m going to call them “generic keys.”)  It is remarkable just how easy it was to create a set of generic keys that matched, or even slightly outperformed, IBM’s numbers.

Unsurprisingly, two of the most effective stats are winning percentage on first serves, and winning percentage on second serves.  As I’ll discuss in future posts, these stats–and others–show surprising discontinuities.  That is to say, there is a clear level at which another percentage point or two makes a huge difference in a player’s chances of winning a match.  These measurements are tailor-made for keys.

For a third key, I tried first-serve percentage.  It doesn’t have nearly the same predictive power as the other two statistics, but it has the benefit of no clear correlation with them.  You can have a high first-serve percentage but a low rate of first-serve or second-serve points won, and vice versa.  And contrary to some received wisdom, there does not seem to be some high level of first-serve percentage where more first serves is a bad thing.  It’s not linear, but the more first serves you put in the box, the better your odds of winning.

Put it all together, and we have three generic keys:

  • Winning percentage on first-serve points better than 74%
  • Winning percentage on second-serve points better than 52%
  • First-serve percentage better than 62%
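Checking a player against these three generic keys takes only a few lines. This sketch assumes raw serve counts as input, treats “better than” as strictly greater, and counts second-serve points as all service points where the first serve missed (so double faults count as second-serve points lost). The class and function names are hypothetical.

```python
from dataclasses import dataclass

# Thresholds from the list above (non-clay ATP results).
FIRST_WON_MIN = 0.74
SECOND_WON_MIN = 0.52
FIRST_IN_MIN = 0.62


@dataclass(frozen=True)
class ServeStats:
    serve_points: int              # total service points played
    first_serves_in: int
    first_serve_points_won: int
    second_serve_points_won: int


def generic_keys_met(s: ServeStats) -> int:
    """Return how many of the three generic keys (0 to 3) a player met in a match."""
    first_in_pct = s.first_serves_in / s.serve_points
    first_won_pct = s.first_serve_points_won / s.first_serves_in if s.first_serves_in else 0.0
    second_points = s.serve_points - s.first_serves_in
    second_won_pct = s.second_serve_points_won / second_points if second_points else 0.0
    return sum([
        first_won_pct > FIRST_WON_MIN,
        second_won_pct > SECOND_WON_MIN,
        first_in_pct > FIRST_IN_MIN,
    ])


# Hypothetical match: 100 service points, 65 first serves in, 50 of those won,
# 19 of the 35 second-serve points won.
print(generic_keys_met(ServeStats(100, 65, 50, 19)))
```

Because the same three thresholds apply to every player, the only inputs are the player’s own serve counts; that simplicity is the point of the comparison with IBM’s keys.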

These numbers are based on the last few years of ATP results on every surface except for clay.  For simplicity’s sake, I grouped together grass, hard, and indoor hard, even though separating those surfaces might yield slightly more predictive indicators.

For those 86 men’s matches at the Open this year with IBM keys, the generic keys did a little bit better.  Using my indicators–the same three for every player–the loser met as many or more keys 16 times (compared to IBM’s 20) and the winner averaged 1.15 more keys (compared to IBM’s 1.13) than the loser.  Results for other slams (with slightly different thresholds for the different surface at Roland Garros) netted similar numbers.

A smarter planet

It’s no accident that the simplest, most generic possible approach to keys provided better results than IBM’s focus on the complex and unusual.  It also helps that the generic keys are grounded in domain-specific knowledge (however rudimentary), while many of the IBM keys, such as average first serve speeds below a given number of miles per hour, or set lengths measured in minutes, reek of domain ignorance.

Indeed, comments from IBM’s reps suggest that marketing is more important than accuracy.  In Carl’s post, a rep was quoted as saying, “It’s not predictive,” despite the large and brightly-colored announcements to the contrary plastered all over the IBM-powered US Open site.  “Engagement” keeps coming up, even though engaging (and unusual) numbers may have nothing to do with match outcomes, and much of the fan engagement I’ve seen is negative.

Then again, maybe the old saw is correct: It’s all good publicity as long as they spell your name right.  And it’s not hard to spell “IBM.”

Better keys, more insight

Amid such a marketing effort, it’s easy to lose sight of the fact that the idea of match keys is a good one.  Commentators often talk about hitting certain targets, like 70% of first serves in.  Yet to my knowledge, no one had done the research.

With my generic keys as a first step, this path could get a lot more interesting.  While these single numbers are good guides to performance on hard courts, several extensions spring to mind.

Mainly, these numbers could be improved by making player-specific adjustments.  Winning 74% of first-serve points is adequate for an average returner, but what about a poor returner like John Isner?  His average first-serve winning percentage this year is nearly 79%, suggesting that he needs to come closer to that number to beat most players.  For other players, perhaps a higher rate of first serves in is crucial for victory.  Or perhaps their thresholds vary particularly dramatically based on surface.

In future posts, I’ll delve into more detail regarding these generic keys and investigate ways in which they might be improved.  Outperforming IBM is gratifying, but if our goal is really a “smarter planet,” there is a lot more research to pursue.

Filed under Keys to the match, Research, U.S. Open

Avoiding Double Faults When It Matters

The more gut-wrenching the moment, the more likely it is to stick in memory.  We easily recall our favorite player double-faulting away an important game; we quickly forget the double fault at 30-0 in the middle of the previous set.  Which one is more common? The mega-choke or the irrelevancy?

There are three main factors that contribute to double faults:

  1. Aggressiveness on second serve. Go for too much, you’ll hit more double faults.  Go for too little, your opponent will hit better returns.
  2. Weakness under pressure. If you miss this one, you lose the point. The bigger the point, the more pressure to deliver.
  3. Chance. No server is perfect, and every once in a while, a second serve will go wrong for no good reason.  (Also, wind shifts, distractions, broken strings, and so on.)

In this post, I’ll introduce a method to help us measure how much each of those factors influences double faults on the ATP tour. We’ll soon have some answers.

In-game volatility

At 30-40, there’s more at stake than at 0-0 or 30-0.  If you believe double faults are largely a function of server weakness under pressure, you would expect more double faults at 30-40 than at lower-pressure moments.  To properly address the question, we need to attach some numbers to the concepts of “high pressure” and “low pressure.”

That’s where volatility comes in.  It quantifies how much a point matters by considering several win probabilities.  An average server on the ATP tour starts a game with an 81.2% chance of holding serve.  If he wins the first point, his chances of winning the game increase to 89.4%. If he loses, the odds fall to 66.7%.  The volatility of that first point is defined as the difference between those two outcomes: 89.4% – 66.7% = 22.7%.

(Of course, any number of things can tweak the odds. A big server, a fast surface, or a crappy returner will increase the hold percentages. These are all averages.)

The least volatile point is 40-0, when the volatility is 3.1%. If the server wins, he wins the game (after which, his probability of winning the game is, well, 100%). If he loses, he falls to 40-15, where the heavy server bias of the men’s game means he still has a 96.9% chance of holding serve.

The most volatile point is 30-40 (or ad-out, which is logically equivalent), when the volatility is 76.0%.  If the server wins, he gets back to deuce, which is strongly in his favor. If he loses, he’s been broken.
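Volatility can be sketched with a simple game model, assuming (as an illustration, not necessarily the model behind the figures above) that the server wins every point independently with the same probability p. With p = 0.64, this model gives roughly the 81.2% hold rate and reproduces the 3.1% and 76.0% extremes, but it puts the first point’s volatility nearer 18% than 22.7%, so the post’s early-game figures presumably come from a richer or empirical model. Function names and the value of p are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def hold_prob(server: int, returner: int, p: float) -> float:
    """Chance the server holds from a score of (server, returner) points won.

    0, 1, 2, 3 stand for love, 15, 30, 40; larger counts cover deuce and advantage.
    Assumes each point is won independently with probability p.
    """
    q = 1.0 - p
    if server >= 4 and server - returner >= 2:
        return 1.0
    if returner >= 4 and returner - server >= 2:
        return 0.0
    if server >= 3 and server == returner:
        # Deuce: D = p^2 + 2pq * D, which solves to p^2 / (p^2 + q^2).
        return p * p / (p * p + q * q)
    return p * hold_prob(server + 1, returner, p) + q * hold_prob(server, returner + 1, p)


def volatility(server: int, returner: int, p: float) -> float:
    """Hold chance if the server wins this point minus hold chance if he loses it."""
    return hold_prob(server + 1, returner, p) - hold_prob(server, returner + 1, p)


P_AVG = 0.64  # hypothetical serve-point win rate, chosen to give roughly an 81% hold rate
print(f"hold from 0-0:        {hold_prob(0, 0, P_AVG):.3f}")
print(f"volatility at 40-0:   {volatility(3, 0, P_AVG):.3f}")
print(f"volatility at 30-40:  {volatility(2, 3, P_AVG):.3f}")
```

Deuce and advantage are handled through the closed-form deuce probability, so the recursion always terminates within a few steps.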

Mixing in double faults

Using point-by-point data from 2012 Grand Slam tournaments, we can group double faults by game score.  At 40-0, the server double faulted 3.0% of points; at 30-0, 4.2%; at ad-out, 2.8%.

At any of the nine least volatile scores, servers double faulted 3.0% of points. At the nine most volatile scores, the rate was only 2.7%.

(At the end of this post, you can find more complete results.)

To be a little more sophisticated about it, we can measure the correlation between double-fault rate and volatility.  The relationship is clearly negative, with an r-squared of .367.  Given the relative rarity of double faults and the possibility that a player will simply lose concentration for a moment at any time, that’s a reasonably meaningful relationship.

And in fact, we can do better.  Scores like 30-0 and 40-0 are dominated by better servers, while weaker servers are more likely to end up at 30-40. To control for the slightly different populations, we can use “adjusted double faults” by estimating how many DFs we’d expect from these different populations.  For instance, we find that at 30-0, servers double fault 26.7% more than their season average, while at 30-40, they double fault 28.6% less than average.
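One way to compute such an adjusted rate, assuming the expected count at a given score is the sum of each server’s season-long double-fault rate per service point over the points played at that score, is sketched below. The function name and the toy inputs are hypothetical.

```python
def adjusted_df_rate(points):
    """Observed double faults relative to expectation at one game score.

    points: iterable of (server_season_df_rate, double_faulted) pairs, one per point.
    Returns observed / expected - 1, so +0.267 means 26.7% more than expected.
    """
    expected = 0.0
    observed = 0
    for season_rate, double_faulted in points:
        expected += season_rate
        observed += int(double_faulted)
    if expected == 0:
        raise ValueError("expected double faults must be positive")
    return observed / expected - 1


# Hypothetical toy data: 20 points by servers who double fault 5% of the time,
# with 2 double faults observed (1 expected), i.e. 100% above expectation.
toy_points = [(0.05, i < 2) for i in range(20)]
print(f"{adjusted_df_rate(toy_points):+.1%}")
```

Comparing each score against the season averages of the servers who actually reached it removes the bias from better servers being overrepresented at scores like 30-0 and 40-0.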

Running the numbers with adjusted double fault rate instead of actual double faults, we get an r-squared of .444.  To a moderate extent, servers limit their double faults as the pressure builds against them.

More pressure on pressure

At any pivotal moment, one where a single point could decide the game, set, or match, servers double fault less than their seasonal average.  On break point, 19.1% less than average. With set point on their racket, 22.2% less. Facing set point, a whopping 45.2% less.

The numbers are equally dramatic on match point, though the limited sample means we can only read so much into them. On match point, servers double faulted only 4 times in 296 opportunities (1.4%), while facing match point, they double faulted only 4 times in 191 chances (2.1%).

Better concentration or just backing off?

By now, it’s clear that double faults are less frequent on important points.  Idle psychologizing might lead us to conclude that players lose concentration on unimportant points, leading to double faults at 40-0. Or that they buckle down and focus on the big points.

While there is surely some truth in the psychologizing–after all, Ernests Gulbis is in our sample–it is more likely that players manage their double fault rates by changing their second-serve approach.  With a better than 9-in-10 chance of winning a game, why carefully spin it in when you can hit a flashy topspin gem into the corner?  At break point, there’s no thought of gems, just fighting on to play another point.

And here, the numbers back us up, at least a little bit.  If players are avoiding double faults by hitting more conservative second serves on important points, we would expect them to lose a few more second serve points when the serve lands in play.

It’s a weak relationship, but at least the data suggests that it points in the expected direction. The correlation between in-game volatility and percentage of second serve points won is negative (r = -0.282, r-squared = 0.08). The returner’s own conservative approach on such points, when his initial goal is simply to keep the ball in play, may complicate the results as well.

Clearly, chance plays a substantial role in double faults, as we expected from the beginning.  It’s also clear that there’s more to it.  Some players do succumb to the pressure and double fault some of the time, but those moments represent the minority.  Servers demonstrate the ability to limit double faults, and do so as the importance of the point increases.

Filed under Research, Serve statistics

The Unlikeliness of Inducing Double Faults

Some players are much better returners than others.  Many players are such good returners that everyone knows it, agrees upon it, and changes their game accordingly.  This much, I suspect, we can all agree on.

How far does that go? When players are altering their service tactics and changing their risk calculations based on the man on the other side of the net, does the effect show up in the numbers? Do players double fault more or less depending on their opponent?

Put it another way: Do some players consistently induce more double faults than others?

The conventional wisdom, to the extent the issue is raised, is yes. When a server faces a strong returner, like Andy Murray or Gilles Simon, it’s not unusual to hear a commentator explain that the server is under more pressure, and when a second serve misses the box, the returner often gets the credit.

Credit where credit isn’t due

In the last 52 weeks, Jeremy Chardy’s opponents have hit double faults on 4.3% of their service points, the highest rate of anyone in the top 50. At the other extreme, Simon’s opponents doubled only 2.8% of the time, with Novak Djokovic and Rafael Nadal just ahead of him at 2.9% and 3.0%, respectively.

The conventional wisdom isn’t off to a good start.

But the simple numbers are misleading–as the simple numbers so often are.  Djokovic and Nadal, going deep into tournaments almost every week, play tougher opponents.  Djokovic’s median opponent over the last year was ranked 21st, while Chardy’s was outside the top 50.  While it isn’t always true that higher-ranked opponents hit fewer double faults, it’s certainly something worth taking into consideration.  So even though Chardy has certainly benefited from some poorly aimed second serves, it may not be accurate to say he has benefited the most–he might have simply faced a schedule full of would-be Fernando Verdascos.

Looking now at the most recent full season, 2012, it turns out that Djokovic did face the players least likely to double fault. His opponents typically double faulted on 2.9% of their service points, while Filippo Volandri’s opponents typically did so on 3.9%. While these are minor differences when compared to all points played, they are enormous when attempting to measure the returner’s impact on DF rate. Djokovic “induced” double faults on 3.0% of points and Volandri did so on 3.9% of points, so you can see the importance of considering their opponents. Despite the difference in rates, neither player had much effect on his opponents, at least as far as double faulting is concerned.

This approach allows us to express opponents’ DF rate more usefully, relative to an “expected” DF rate. Volandri benefited from 1% more doubles than expected, Chardy enjoyed a whopping 39% more than expected, and–to illustrate the other extreme–Simon received 31% fewer doubles than his opponents would be predicted to suffer through.
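A minimal sketch of the returner-side calculation, assuming we know, for each of a returner’s matches, how many service points his opponent played, how many double faults he hit, and his season-long double-fault rate (all hypothetical field choices, as is the function name). The raw rate is what a simple leaderboard shows; the last value is the “more or fewer than expected” figure.

```python
def opponents_df_vs_expected(matches):
    """Compare a returner's opponents' double faults with their usual rates.

    matches: iterable of (opp_service_points, opp_double_faults, opp_season_df_rate),
    one tuple per match. Returns (raw rate, expected rate, actual/expected - 1).
    """
    points = 0
    actual = 0
    expected = 0.0
    for service_points, double_faults, season_rate in matches:
        points += service_points
        actual += double_faults
        # This opponent would be expected to double fault at his own season rate.
        expected += service_points * season_rate
    return actual / points, expected / points, actual / expected - 1.0
```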

You can’t always get what you want

One thing is clear by now. Regardless of your method and its sophistication, some players got a lot more free return points in 2012 than others.  But is it a skill?

If it is a skill, we would expect the same players to top the leaderboard from one year to the next.  Or, at least, the same players would “induce” more double faults than expected from one year to the next.

They don’t.  I found 1405 consecutive pairs of “player-years” since 1991 with at least 30 matches against tour-level regulars in each season. Then I compared their adjusted opponents’ double fault rate in year one with the rate in year two.  The correlation is positive, but very weak: r = 0.13.
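The year-over-year test can be sketched like this in Python. It assumes each player’s seasons are stored as (matches against tour-level regulars, adjusted opponents’ DF rate) keyed by year; that layout, the function names, and the hand-rolled Pearson correlation are illustrative choices rather than the post’s actual code.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def year_to_year_r(seasons, min_matches=30):
    """seasons: {player: {year: (matches_vs_regulars, adjusted_opp_df_rate)}}.

    Pairs each qualifying season with the same player's next season and
    returns (Pearson r, number of pairs).
    """
    year_one, year_two = [], []
    for by_year in seasons.values():
        for year, (matches, rate) in by_year.items():
            following = by_year.get(year + 1)
            if following is None:
                continue  # no consecutive season to compare against
            if matches >= min_matches and following[0] >= min_matches:
                year_one.append(rate)
                year_two.append(following[1])
    return pearson_r(year_one, year_two), len(year_one)
```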

Nadal, one player who we would expect to have an effect on his opponents, makes for a good illustration.  In the last nine years, he has had six seasons in which he received fewer doubles than expected, three with more.  In 2011, it was 15% fewer than expected; last year, it was 9% more. Murray has fluctuated between -18% and +25%. Lots of noise, very little signal.

There may be a very small number of players who affect the rate of double faults (positively or negatively) consistently over the course of their career, but a much greater amount of the variation between players is attributable to luck.  Let’s hope Chardy hasn’t built a new game plan around his ability to induce double faults.

The value of negative results

Regular readers of the blog shouldn’t be surprised to plow through 600 words just to reach a conclusion of “nothing to see here.”  Sorry about that. Positive findings are always more fun. Plus, they give you more interesting things to talk about at cocktail parties.

Despite the lack of excitement, there are two reasons to persist in publishing (and, on your end, understanding) negative findings.

First, negative results indicate when journalists and commentators are selling us a bill of goods. We all like stories, and commentators make their living “explaining” causal connections.  Sometimes they’re just making things up as they go along. “That’s bad luck” is a common explanation when a would-be winner clips the net cord, but rarely otherwise.  However, there’s a lot more luck in sport than these obvious instances.  We’re smarter, more rational fans when we understand this.

(Though I don’t know if being smarter or more rational helps us enjoy the sport more. Sorry about that, too.)

Second, negative results can have predictive value. If a player has benefited or suffered from an extreme opponents’ double-fault rate (or tiebreak percentage) and we also know that there is little year-to-year correlation, we can expect that the stat will go back to normal next year. In Chardy’s case, we can predict he won’t get as many free return points, so he won’t win quite as many return points, and his overall results might suffer. Admittedly, in the case of this statistic, regression to the mean would have a tiny effect on something like winning percentage or ATP rank.
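To put rough numbers on that, here is a minimal sketch of the simplest regression-to-the-mean forecast: next season keeps only r times this season’s distance from average. It assumes the stat has about the same spread from year to year, and the function name is hypothetical. Plugging in Chardy’s +39% and the r = 0.13 reported above:

```python
def regressed_forecast(this_year: float, r: float, mean: float = 0.0) -> float:
    """Naive regression-to-the-mean forecast for next season.

    Keeps only r times this season's distance from the mean. Assumes the stat
    has about the same spread each year; for an adjusted stat such as
    opponents' DF rate versus expected, the mean is 0 (that is, +0%).
    """
    return mean + r * (this_year - mean)


# Hypothetical use: a +39% season combined with r = 0.13.
print(f"{regressed_forecast(0.39, 0.13):+.1%}")  # prints +5.1%
```

Under these assumptions, a +39% season projects to only about +5% the following year.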

So at Heavy Topspin, negative results are here to stay. More importantly, we can all stop trying to figure out how Jeremy Chardy is inducing all those double faults.

Filed under Research, Serve statistics

The Mirage of Surface Speed Convergence

Rafael Nadal won Indian Wells. Roger Federer won on the blue clay. Even Alessio Di Mauro won a match on a hard court last week.

That’s just a sliver of the anecdotal evidence for one of the most common complaints about contemporary ATP tennis: Surface speeds are converging. Hard courts used to play faster, allowing for more variety in the game and providing more opportunities to different types of players. Or so the story goes.

This debate skipped the stage of determining whether the convergence is actually happening. The media has moved straight to the more controversial subject of whether it should. (Coincidentally, it’s easier to churn out columns about the latter.)

We can test these things, and we’re going to in a minute.  First, it’s important to clarify what exactly we mean by surface speed, and what we can and cannot learn about it from traditional match statistics.

There are many factors that contribute to how fast a tennis ball moves through the air (altitude, humidity, ball type) and many that affect the nature of the bounce (all of the same, plus surface). If you’re actually on court, hitting balls, you’ll notice a lot of details: how high the ball is bouncing, how fast it seems to come off of your opponent’s racket, how the surface and the atmosphere are affecting spin, and more.  Hawkeye allows us to quantify some of those things, but the available data is very limited.

While things like ball bounce and shot speed can be quantified, they haven’t been tracked for long enough to help us here.  We’re stuck with the same old stats — aces, serve percentages, break points, and so on.

Thus, when we talk about “surface speed” or “court speed,” we’re not just talking about the immediate physical characteristics of the concrete, lawn, or dirt.  Instead, we’re referring to how the surface–together with the weather, the altitude, the balls, and a handful of other minor factors–affects play.  I can’t tell you whether balls bounced faster on hard courts in 2012 than in 1992.  But I can tell you that players hit about 25% more aces.

Quantifying the convergence

In what follows, we’ll use two stats: ace rate and break rate.  When courts play faster, there are more aces and fewer breaks of serve.  The slower the court, the more the advantage swings to the returner, limiting free points on serve and increasing the frequency of service breaks.

To compare hard courts to clay courts, I looked for instances where the same pair of players faced off during the same year on both surfaces.  There are plenty–about 100 such pairs for each of the last dozen years, and about 80 per year before that, back to 1991.  Focusing on these head-to-heads prevents us from giving too much weight to players who play almost exclusively on one surface.  Andy Roddick helped increase the ace rate and decrease the break rate on hard courts for years, but he barely influences the clay court numbers, since he skipped so many of those tournaments.

Thus, we’re comparing apples to apples, like the matches this year between David Ferrer and Fabio Fognini.  On clay, Ferrer aced Fognini only once per hundred service points; on hard, he did so six times as often.  Any one matchup could be misleading, but combine 100 of them and you have something worth looking at.  (This methodology, unfortunately, precludes measuring grass-court speed.  There simply aren’t enough matches on grass to give us a reliable sample.)

Aggregate all the clay court matches and all the hard court matches, and you have overall numbers that can be compared.  For instance, in 2012, service breaks accounted for 22.0% of these games on clay, against 20.5% of games on hard.  Divide one by the other, and we can see that the clay-court break rate is 7.4% higher than its hard-court counterpart.
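The pairing and aggregation steps might look something like this in Python. The match-record keys (“year”, “players”, “surface”, “breaks”, “games”) and the function name are hypothetical. The sketch simply keeps pairs who met on both clay and hard in the same year, pools their games by surface, and divides one break rate by the other.

```python
from collections import defaultdict


def clay_vs_hard_break_ratio(matches, year):
    """Return clay break rate / hard break rate - 1 for one season.

    matches: iterable of dicts with hypothetical keys 'year', 'players'
    (a pair of names), 'surface' ('Clay' or 'Hard'), 'breaks' and 'games'.
    Only pairs who met on both surfaces that year are counted.
    """
    by_pair = defaultdict(lambda: defaultdict(list))
    for m in matches:
        if m["year"] == year and m["surface"] in ("Clay", "Hard"):
            pair = frozenset(m["players"])  # order of names doesn't matter
            by_pair[pair][m["surface"]].append(m)

    totals = {"Clay": [0, 0], "Hard": [0, 0]}  # [breaks, games]
    for surfaces in by_pair.values():
        if "Clay" in surfaces and "Hard" in surfaces:
            for surface, surface_matches in surfaces.items():
                for m in surface_matches:
                    totals[surface][0] += m["breaks"]
                    totals[surface][1] += m["games"]

    clay_rate = totals["Clay"][0] / totals["Clay"][1]
    hard_rate = totals["Hard"][0] / totals["Hard"][1]
    return clay_rate / hard_rate - 1.0
```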

That’s one of the smallest differences of the last 20 years, but it’s far from the whole story. Run the same algorithm for every season back to 1991 (the extent of available stats), and you have everything from a 2.8% difference in 2002 to a 32.8% difference in 2003. Smooth the outliers by calculating five-year moving averages, and you finally get something a bit more meaningful:

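[Chart: five-year moving averages of the percentage by which the clay-court break rate exceeds the hard-court break rate]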

The higher the number, the bigger the gap between hard and clay courts. The most extreme five-year period in this span was 2003-07, when there were 25.4% more breaks on clay courts than on hard courts. There has been a steady decline since then (to 16.9% for 2008-12), but not to as low a point as the early 90s (14.0% for 1991-1996), and only a bit lower than the turn of the century (17.8% for 1998-2002). These numbers hardly identify the good old days when men were men and hard courts were hard.
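Smoothing with five-year moving averages can be sketched as follows. It assumes one value per season with no missing years and takes a simple unweighted mean over each window; the post may instead have pooled the raw games across each five-year span, so treat this as an approximation. The function name is hypothetical.

```python
def moving_average(values_by_year, window=5):
    """Simple moving average over consecutive seasons.

    values_by_year: {year: value}, assumed to have no missing years.
    Returns {(first_year, last_year): mean of that window}.
    """
    years = sorted(values_by_year)
    averages = {}
    for start in range(len(years) - window + 1):
        span = years[start:start + window]
        averages[(span[0], span[-1])] = sum(values_by_year[y] for y in span) / window
    return averages
```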

When we turn to ace rate, the trend provides even less support for the surface-convergence theory.  Here are the same 5-year averages, representing the difference between hard-court ace rate and clay-court ace rate:

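[Chart: five-year moving averages of the percentage by which the hard-court ace rate exceeds the clay-court ace rate]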

Here again, the biggest gap occurred during the 5-year span from 2003 to 2007, when hard-court aces were 51.3% more frequent than clay-court aces. Since then, the difference has fallen to 46%, still a relatively large gap, and one that occurred in only two single years before 2003.

If surfaces are converging, why is there a bigger difference in aces now than there was 10, 15, or 20 years ago? Why don’t we see hard-court break rates getting any closer to clay-court break rates?

However fast or high balls are bouncing off of today’s tennis surfaces, courts just aren’t playing any less diversely than they used to.  In the last 20 years, the game has changed in any number of ways, some of which can make hard-court matches look like clay-court contests and vice versa.  But with the profiles of clay and hard courts relatively unchanged over the last 20 years, it’s time for pundits to find something else to complain about.

Filed under Research, Surface speed

Warming Up and Losing Out

This week’s pair of ATP warmups for the Australian Open provides quite the contrast.

In Sydney, only one seeded player (the hardly automatic Andreas Seppi) reached the semifinals, and only one other even made the quarters. Across the ditch in Auckland, three of the final four are among the top four seeds, and the fourth, Gael Monfils, would typically sport a ranking in the same range.

Sydney fits a conventional narrative, while Auckland confounds it. The week before a Grand Slam, many of the top players are out of action, while those who are in action … well, let’s just say warmups don’t always appear to be their top priority.

Winning in 250s

The ATP schedule gives us a convenient natural experiment for determining whether slam warmups really are different.

(For convenience, I’m using the term “warmups.” However, we’re only looking at tournaments the week before a slam starts. Sydney is included, but not Brisbane, even though events two weeks before the Australian Open and Wimbledon are generally called “warmups” as well.)

Since 2009, all events on the lowest rung of the tour-level calendar have been worth 250 points to the winner. Conveniently, all tourneys the week before slams have fallen into this category.

To see if players seem to treat slam warmups differently from other events, we can simply compare results from warmups to those from other 250s. It isn’t perfect, since a few 250s have draws of more than 32 players and the field quality isn’t identical in all tourneys at this level, but by looking at a few different metrics, we can limit the impact of those quibbles.

Who cares?

Let’s start by simply counting wins and losses of seeded players. In slam warmups from 2009 through 2012, seeds won about 61% of matches against unseeded opponents (224 of 365), while in other 250s, seeds won over 70% of those matches (1499 of 2129). That’s a substantial difference.
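One way to put a number on “substantial” (a check the post itself doesn’t run) is a two-proportion z-test on the two win rates. This sketch treats every match as an independent trial with a fixed win probability, which is only an approximation given that fields differ from event to event. The function name is hypothetical.

```python
from math import sqrt


def two_proportion_z(wins_a, n_a, wins_b, n_b):
    """z statistic for the difference between two win rates (pooled standard error)."""
    p_a, p_b = wins_a / n_a, wins_b / n_b
    pooled = (wins_a + wins_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se


z = two_proportion_z(224, 365, 1499, 2129)  # slam warmups vs. other 250s
print(f"z = {z:.2f}")
```

For these counts, z comes out around -3.5, well beyond the conventional ±2 cutoff, so under those assumptions a gap this size is unlikely to be pure chance.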

To eliminate the quirks of the bigger 250 draw at Queen’s Club, and perhaps toss out some first-round retirements as well, let’s consider the records that seeds have posted in specific rounds.

In the round of 16 at slam warmups, seeds have gone 71-50, for a winning percentage of 58.7%. At other 250s, seeds have won 591 against 223 losses, a percentage of 72.6%.

In the quarterfinals of slam warmups, seeds have beaten unseeded players in 33 of 46 matches–71.7% of encounters. In other 250s, similar matchups have gone to the seeded player 200 of 275 times, or 72.7% of the time.

It seems that many top-ranked players show up at slam warmups with the intent of getting one or two matches under their belt. (Or perhaps fulfilling an obligation to a sponsor.) Those players don’t perform up to their usual standard. But as shown by the comparable records in quarterfinals, those who come to compete play at their usual level.

A few other looks

One issue that seems to have a particular impact in slam warmups is last-minute withdrawals, like that of second seed Gilles Simon in Sydney this week. Those don’t show up in the won-loss records.

To consider the overall picture, including withdrawals, we can count the number of seeds who reach the semifinals in our different categories of ATP 250s.

In slam warmups, the semifinal fields in the last four years have consisted of 53 seeds and 43 nonseeds–about 55% top-ranked players. In other 250 semifinals, we’ve seen 365 seeds against 191 nonseeds–66% seeds.

Yet another angle is the performance of the top four seeds. In 250s, the 5 through 8 seeds are often barely distinguishable from the rest of the pack. For example, in Sydney this week, those last four seeds are Florian Mayer, Radek Stepanek, Jeremy Chardy, and Marcel Granollers. Not much difference between those guys and unseeded semifinalists Julien Benneteau, Kevin Anderson, and Bernard Tomic.

There’s no clear line between first-rank guys and the rest of the pack, but taking the top half of the seeds seems as good as any other option.

The results are similar to what we saw with the larger pool of seeds. Overall, when a top-four seed played a non-top-four opponent in a slam warmup, he won 65% of matches (129 of 199). In other 250s, he won 74% (978 of 1321).

In the round of 16, top-fours went 51-24 in slam warmups, a winning percentage of 68%, compared to 76% (366-114) in other 250s.

Where the top four seeds differ from other seeds is in the quarterfinal round. In slam warmup QFs, top-fours went 31-20, winning 61% of matches. In other tourneys, they won 71% (261-105). Perhaps the first-round bye in many slam warmups means that top seeds want two warmup matches, but no more.

As mentioned, these experiments give us imprecise results, as they don’t take into account the exact field quality of the various 250s. While they may not be the final word on this question, these numbers do strongly indicate that higher-ranked players don’t view slam warmups as particularly important. Against a similar pool of opponents, they win far more matches in 250s at other times throughout the year.

Perhaps that’s one reason why winning an Aussie Open warmup doesn’t forecast any particular level of success in Melbourne–these are tournaments where some of your most highly-ranked opponents just aren’t trying as hard as usual.

Filed under Research

Responding to Pressure at 5-5

In a post last week, I presented some data that suggested that servers weaken a bit under the pressure of a tiebreak.  It’s not a strong effect, but it’s a consistent one.  A possible explanation–that all that time between points gives servers a chance to psych themselves out, yet may not affect returners the same way–would apply almost as much to games toward the business end of a set, such as at 5-5 or 5-6.

In other words, if players don’t serve as well (or they return better) when things get tight, we’d expect to see more breaks toward the end of a set–more breaks than expected at 5-5, but perhaps fewer breaks than expected at 2-2.

This also opens up a possible method for evaluating players, as Carl Bialik has suggested. If someone is losing more sets 5-7 than they are winning 7-5, it may be that they are wilting under the pressure of 5-5 more than the average player. It would make sense if the players who consistently exceed tiebreak expectations also regularly outperform 7-5 expectations.

Within the constraints of the ATP’s Matchstats, 7-5 sets are a great way to identify these patterns.  While some 6-4 sets end with a break (or a break followed by a set-sealing hold), a 6-4 set doesn’t necessarily end that way.  But a 7-5 set must have reached 5-5 before one player took control.

If the hypothesis is correct that players get tighter on serve as the end of the set approaches, we would expect more 7-5 sets in the real world than simulations would imply.

To estimate the number of sets that should end 7-5, we need to take each player’s service points won from each match.  With that, we can calculate the probabilities that sets will end at any given score.  Repeat the process for every match over a period of time and we get a general idea of how often we should see 7-5 sets.
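The post doesn’t publish its set-prediction model, so the following is only a minimal sketch under simple assumptions: each player wins his service points independently at the rate he posted in that match, serve alternates every game, and 6-6 is treated simply as “went to a tiebreak.” The function names are hypothetical.

```python
def game_win_prob(p: float) -> float:
    """Chance the server holds, given a constant chance p of winning each service point."""
    q = 1.0 - p
    deuce = p * p / (p * p + q * q)
    # Win 4-0, 4-1 or 4-2 outright, or reach 3-3 (deuce) and win from there.
    return p ** 4 * (1 + 4 * q + 10 * q * q) + 20 * p ** 3 * q ** 3 * deuce


def set_finished(a: int, b: int) -> bool:
    if a == 6 and b == 6:
        return True  # goes to a tiebreak; not modelled further here
    if max(a, b) == 7:
        return True  # 7-5 or 5-7
    return max(a, b) == 6 and abs(a - b) >= 2


def set_score_distribution(p_a: float, p_b: float) -> dict:
    """Probability of each final set score (games_a, games_b).

    p_a, p_b: each player's service-points-won rate in the match.
    Player A serves the first game; serve alternates every game.
    (6, 6) stands for "reached a tiebreak".
    """
    hold_a, hold_b = game_win_prob(p_a), game_win_prob(p_b)
    finished = {}
    frontier = {(0, 0): 1.0}
    while frontier:
        next_frontier = {}
        for (a, b), prob in frontier.items():
            if set_finished(a, b):
                finished[(a, b)] = finished.get((a, b), 0.0) + prob
                continue
            a_serving = (a + b) % 2 == 0
            a_wins_game = hold_a if a_serving else 1.0 - hold_b
            for score, step in (((a + 1, b), a_wins_game),
                                ((a, b + 1), 1.0 - a_wins_game)):
                next_frontier[score] = next_frontier.get(score, 0.0) + prob * step
        frontier = next_frontier
    return finished


if __name__ == "__main__":
    dist = set_score_distribution(0.66, 0.62)  # hypothetical match stats
    share = dist[(7, 5)] + dist[(5, 7)]
    print(f"Chance this set ends 7-5 either way: {share:.1%}")
```

Averaging P(7-5) + P(5-7) over every set in a sample of matches gives the kind of expected share that can be set against the observed rate of 7-5 sets.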

As it turns out, 7-5 sets should make up about 7.8% of all sets. In fact, 8.8% of sets end 7-5. Not a huge difference, but one that is fairly consistent from year to year: in every season since 1991, when this dataset begins, there have been more 7-5s than expected. It certainly adds weight to the claim that the balance of power swings to the returner toward the end of a tight set.

(My set-prediction model doesn’t exactly replicate reality, since players win more games than their service winning percentages predict, in large part because almost all servers are better in either the deuce or ad court, and the variance between them makes it more likely that the player wins a given service game. When I apply a crude adjustment for this, the crumbling-server hypothesis looks even better–the more games servers are predicted to win, the fewer 7-5 sets the model predicts.)

Identifying the unbreakable

This type of discussion must make you wonder: Which players are good at this stuff? If it is true that late-set pressure results in more breaks, it seems obvious that some players are more prone to that pressure, and that other players take advantage of that pressure.

In an ideal world, we’d be able to identify some great 7-5 records, point out some 5-7 records, and have some great new insights into players.

As it is … we might.

As we saw last week with tiebreak analysis, we can’t simply count up a player’s 7-5 sets and compare that total to his 5-7 set losses.  Over the last three years, Andy Roddick won more than 55% of his 7-5 and 5-7 sets, but given the players he faced in those sets and their performances in those matches, he should have won 62%.

There are two ways to quantify player accomplishments in this department.  The first evaluates how well a player avoids losing 5-7 when he reaches 5-5; the other compares his ability to break for 7-5 against his proneness to being broken for 5-7.

Let’s call the first stat Five-Seven AVoidance, or FSAV. For any player, we first add up the sets that reached 5-5, then count how many of those he either won 7-5 or took to a tiebreak. Then we use the general method described above to estimate how many times the player should have reached 5-5, and how many of those times he should have avoided 5-7. Since the beginning of 2010, Kei Nishikori has avoided a 5-7 finish in about 92% of the sets in which he reached 5-5. My model would have expected him to avoid 5-7 only about 84% of the time. (The model expects that most players will avoid 5-7 about 82-90% of the time they reach 5-5.)

From those numbers, we discover that Nishikori lost 5-7 less than half as often as we would have expected him to.  No other player comes close to that mark. In everyday language, FSAV approximates how often a player was able to hold serve at 5-5 or 5-6.  Important skill, that.
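The post doesn’t spell out the exact FSAV formula, so here is one hedged reading in Python: compare the player’s actual 5-7 rate (among sets that reached 5-5) with the model’s expected 5-7 rate. The function name and inputs are hypothetical, and a single expected rate stands in for the post’s set-by-set expectations. With Nishikori’s rounded figures, 8% actual against 16% expected gives a ratio of 0.5.

```python
def fsav(reached_5_5, avoided_5_7, expected_avoid_rate):
    """Hypothetical Five-Seven AVoidance calculation.

    reached_5_5: sets in which the player reached 5-5
    avoided_5_7: of those, sets he won 7-5 or that went to a tiebreak
    expected_avoid_rate: the model's expected share of 5-5 sets avoiding 5-7
    Returns (actual avoid rate, actual 5-7 rate / expected 5-7 rate).
    """
    actual_rate = avoided_5_7 / reached_5_5
    loss_ratio = (1.0 - actual_rate) / (1.0 - expected_avoid_rate)
    return actual_rate, loss_ratio
```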

The second stat is more narrowly focused on 5-5 sets that do not reach a tiebreak.  Let’s call this one the Seven-Five Outperformance Rate, or SFOR, similar to the TBOR (TieBreak Outperformance Rate) I introduced last week.

Here, instead of comparing 5-7s to all 5-5 sets, we compare 5-7s to 7-5s.  In other words: Is the player more likely to break for 7-5 or be broken for 5-7?  As with the previous stat, after calculating the simple rate (that is, number of 7-5 sets divided by total number of 7-5 and 5-7 sets), we compare that to the results that the model would have expected the player to post.
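A similarly hedged sketch of SFOR: the simple rate is 7-5 sets divided by all 7-5 and 5-7 sets, and it is then compared with the model’s expected rate. Whether the post expresses that comparison as a ratio or a difference isn’t stated; this version uses a ratio, and the function name is hypothetical.

```python
def sfor(sets_won_7_5, sets_lost_5_7, expected_rate):
    """Hypothetical Seven-Five Outperformance Rate.

    expected_rate: the model's expected share of 7-5 finishes among the
    player's 7-5 and 5-7 sets. Returns (simple rate, simple / expected).
    """
    simple_rate = sets_won_7_5 / (sets_won_7_5 + sets_lost_5_7)
    return simple_rate, simple_rate / expected_rate
```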

Bizarrely enough, our three-year leader in SFOR is Ernests Gulbis, who has won about 73% of his 7-5 and 5-7 sets, compared to the 50% the model expects of him.  (It’s even more impressive when compared to the 7% that I personally would have expected from him.)

As the highlighting of Gulbis suggests, these stats probably don’t yet belong in our everyday toolbox.  There simply aren’t very many 7-5 sets, even if–as I established above–there are a few more than we would expect.  For reference, there are almost twice as many tiebreaks as 7-5s.

And to keep Gulbis in the spotlight, it may be that winning 7-5 sets is more a function of getting to 5-5 when you shouldn’t.  Perhaps many of those 7-5s racked up by the Latvian came when he should have put the set away 6-2.  Once 5-5 came along, he finally decided to get serious.  As Gulbis himself might tell you, it’s anybody’s guess.

Follow the jump for FSAV and SFOR on about 50 of the most active players, sorted by FSAV, and decide for yourself. The numbers include all tour-level matches since the beginning of 2010, excluding Davis Cup.

Filed under Research, Tiebreaks