Rules for Records

Every tennis record comes with an asterisk.

What we’re mainly interested in are achievements at the top level of the men’s or women’s game.  I often use the term “tour-level” as shorthand for this, kind of like saying “Major Leagues” for baseball to cover the American League, National League, and a handful of long-ago defunct organizations.  It disguises unnecessary complexity and allows us to get to the point.

For men’s tennis, “tour-level” generally means all ATP events, plus Grand Slams, plus Davis Cup.  Excluded are challengers, futures, and qualifying at all levels.  Usually we limit consideration to the professional era, though if we’re talking only about Slams, there’s no artificial start date.

Which Davis Cup matches? Grigor Dimitrov is 16-9 this year, but two of those wins were in Davis Cup Group 2 against Finns outside of the top 900.  Should we exclude Group 2?  (The ATP does.) How about Group 1? (The ATP doesn’t.)  How about any rubber when the opponent is outside the top 200?

And therein lies a major problem, and one that is not limited to records and streaks.  The level of a tennis match is not defined by the ranking points on offer; its difficulty is determined by the quality of the opponent.  In the middle of Robin Haase’s 15-month streak of lost tour-level tiebreaks, the Dutchman did win a breaker, in Rome qualifying against 81st-ranked Sergiy Stakhovsky.  It doesn’t “count.”  But a main-draw match one month later against 640th-ranked Mate Pavic does.

There’s no good solution.  It isn’t a matter of tennis’s “analytics problem,” it is a reflection of the structure of the sport.

Filed under Records

If Surfaces are Converging…

Internet discussion has perked up about a post of mine from last month, The Mirage of Surface Speed Convergence.

Many people don’t like my results, and plenty of people just don’t like having someone challenge their preconceived notions–or those of the players they idolize.

Yet for all the chatter, no one has even attempted to address the question at the end of that post:

If surfaces are converging, why is there a bigger difference in aces now than there was 10, 15, or 20 years ago? Why don’t we see hard-court break rates getting any closer to clay-court break rates?

Unless there is a valid answer to those questions, it really doesn’t matter how you felt after watching the Miami final, or what a top player said in some press conference.

Filed under Surface speed

Barcelona or Bucharest? Scheduling Decisions Under the Microscope

This post has been withdrawn due to a mistake in the calculations that seriously affects its conclusions.  I am leaving this note here to avoid breaking the link.  Look on the bright side–on this site, there’s plenty of tennis analysis in which the mistakes have less serious effects.

Filed under Uncategorized

Avoiding Double Faults When It Matters

The more gut-wrenching the moment, the more likely it is to stick in memory.  We easily recall our favorite player double-faulting away an important game; we quickly forget the double fault at 30-0 in the middle of the previous set.  Which one is more common? The mega-choke or the irrelevancy?

There are three main factors that contribute to double faults:

  1. Aggressiveness on second serve. Go for too much, you’ll hit more double faults.  Go for too little, your opponent will hit better returns.
  2. Weakness under pressure. If you miss this one, you lose the point. The bigger the point, the more pressure to deliver.
  3. Chance. No server is perfect, and every once in a while, a second serve will go wrong for no good reason.  (Also, wind shifts, distractions, broken strings, and so on.)

In this post, I’ll introduce a method to help us measure how much each of those factors influences double faults on the ATP tour. We’ll soon have some answers.

In-game volatility

At 30-40, there’s more at stake than at 0-0 or 30-0.  If you believe double faults are largely a function of server weakness under pressure, you would expect more double faults at 30-40 than at lower-pressure moments.  To properly address the question, we need to attach some numbers to the concepts of “high pressure” and “low pressure.”

That’s where volatility comes in.  It quantifies how much a point matters by considering several win probabilities.  An average server on the ATP tour starts a game with an 81.2% chance of holding serve.  If he wins the first point, his chances of winning the game increase to 89.4%. If he loses, the odds fall to 66.7%.  The volatility of that first point is defined as the difference between those two outcomes: 89.4% – 66.7% = 22.7%.

(Of course, any number of things can tweak the odds. A big server, a fast surface, or a crappy returner will increase the hold percentages. These are all averages.)

The least volatile point is 40-0, when the volatility is 3.1%. If the server wins, he wins the game (after which, his probability of winning the game is, well, 100%). If he loses, he falls to 40-15, where the heavy server bias of the men’s game means he still has a 96.9% chance of holding serve.

The most volatile point is 30-40 (or ad-out, which is logically equivalent), when the volatility is 76.0%.  If the server wins, he gets back to deuce, which is strongly in his favor. If he loses, he’s been broken.
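
The volatility figures can be approximated with a simple model in which the server wins every point independently with the same probability. This is only a sketch: the post does not say how its win probabilities were derived, the value `P_SERVE = 0.64` is an assumption picked because it roughly reproduces the quoted numbers, and `hold_prob` and `volatility` are hypothetical names.

```python
from functools import lru_cache

# Assumption: the server wins each point independently with this probability.
# 0.64 is not taken from the post; it is chosen because it roughly reproduces
# the hold percentages quoted above.
P_SERVE = 0.64


@lru_cache(maxsize=None)
def hold_prob(server_pts, returner_pts, p=P_SERVE):
    """Chance the server holds, given points won so far by each player."""
    if server_pts >= 4 and server_pts - returner_pts >= 2:
        return 1.0  # game already won
    if returner_pts >= 4 and returner_pts - server_pts >= 2:
        return 0.0  # server already broken
    if server_pts >= 3 and returner_pts >= 3:
        # Deuce and advantage scores have a closed form.
        deuce = p * p / (p * p + (1 - p) ** 2)
        if server_pts == returner_pts:
            return deuce
        if server_pts > returner_pts:
            return p + (1 - p) * deuce  # ad-in
        return p * deuce  # ad-out
    return (p * hold_prob(server_pts + 1, returner_pts, p)
            + (1 - p) * hold_prob(server_pts, returner_pts + 1, p))


def volatility(server_pts, returner_pts, p=P_SERVE):
    """Hold probability if the server wins the point, minus if he loses it."""
    return (hold_prob(server_pts + 1, returner_pts, p)
            - hold_prob(server_pts, returner_pts + 1, p))


if __name__ == "__main__":
    for label, score in [("0-0", (0, 0)), ("40-0", (3, 0)), ("30-40", (2, 3))]:
        print(f"{label}: {volatility(*score):.1%}")
```

Scores are passed as points won, so 40-0 is `(3, 0)` and 30-40 is `(2, 3)`. Under the assumed point-win probability, the three volatilities come out close to the 22.7%, 3.1%, and 76.0% given in the text; a different server, surface, or returner would call for a different `p`.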

Mixing in double faults

Using point-by-point data from 2012 Grand Slam tournaments, we can group double faults by game score.  At 40-0, the server double faulted 3.0% of points; at 30-0, 4.2%; at ad-out, 2.8%.

At any of the nine least volatile scores, servers double faulted 3.0% of points. At the nine most volatile scores, the rate was only 2.7%.

(At the end of this post, you can find more complete results.)

To be a little more sophisticated about it, we can measure the correlation between double-fault rate and volatility.  The relationship is obviously negative, with an r-squared of .367.  Given the relative rarity of double faults and the possibility that a player will simply lose concentration for a moment at any time, that’s a reasonably meaningful relationship.

And in fact, we can do better.  Scores like 30-0 and 40-0 are dominated by better servers, while weaker servers are more likely to end up at 30-40. To control for the slightly different populations, we can use “adjusted double faults” by estimating how many DFs we’d expect from these different populations.  For instance, we find that at 30-0, servers double fault 26.7% more than their season average, while at 30-40, they double fault 28.6% less than average.

Running the numbers with adjusted double fault rate instead of actual double faults, we get an r-squared of .444.  To a moderate extent, servers limit their double faults as the pressure builds against them.
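
For readers who want to run this kind of check on their own point-by-point data, here is a minimal sketch of the correlation step. It assumes two equal-length lists, one of volatilities and one of (adjusted) double-fault rates, with one entry per game score; the function names are hypothetical, and the per-score data behind the figures above is not reproduced here.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either sequence is constant.
    return cov / sqrt(var_x * var_y)


def r_squared(xs, ys):
    """Share of variance in ys explained by a straight-line fit on xs."""
    return pearson_r(xs, ys) ** 2
```

The sign of `pearson_r` gives the direction of the relationship, while `r_squared` discards the sign, which is why a negative relationship is reported alongside a positive r-squared.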

More pressure on pressure

At any pivotal moment, one where a single point could decide the game, set, or match, servers double fault less than their seasonal average.  On break point, 19.1% less than average. With set point on their racket, 22.2% less. Facing set point, a whopping 45.2% less.

The numbers are equally dramatic on match point, though the limited sample means we can only read so much into them.  On match point, servers double faulted only 4 times in 296 opportunities (1.4%), while facing match point, they double faulted only 4 times in 191 chances (2.1%).

Better concentration or just backing off?

By now, it’s clear that double faults are less frequent on important points.  Idle psychologizing might lead us to conclude that players lose concentration on unimportant points, leading to double faults at 40-0. Or that they buckle down and focus on the big points.

While there is surely some truth in the psychologizing–after all, Ernests Gulbis is in our sample–it is more likely that players manage their double fault rates by changing their second-serve approach.  With a better than 9-in-10 chance of winning a game, why carefully spin it in when you can hit a flashy topspin gem into the corner?  At break point, there’s no thought of gems, just fighting on to play another point.

And here, the numbers back us up, at least a little bit.  If players are avoiding double faults by hitting more conservative second serves on important points, we would expect them to lose a few more second serve points when the serve lands in play.

It’s a weak relationship, but at least it points in the expected direction.  The correlation between in-game volatility and percentage of second serve points won is negative (r = -0.282, r-squared = 0.08).  The returner’s own conservative approach may complicate the results: on such points, his initial goal, too, is simply to keep the ball in play.

Clearly, chance plays a substantial role in double faults, as we expected from the beginning.  It’s also clear that there’s more to it.  Some players do succumb to the pressure and double fault some of the time, but those moments represent the minority.  Servers demonstrate the ability to limit double faults, and do so as the importance of the point increases.

Filed under Research, Serve statistics

The Unlikeliness of Inducing Double Faults

Some players are much better returners than others.  Many players are such good returners that everyone knows it, agrees upon it, and changes their game accordingly.  This much, I suspect, we can all agree on.

How far does that go? When players are altering their service tactics and changing their risk calculations based on the man on the other side of the net, does the effect show up in the numbers? Do players double fault more or less depending on their opponent?

Put it another way: Do some players consistently induce more double faults than others?

The conventional wisdom, to the extent the issue is raised, is yes.  When a server faces a strong returner, like Andy Murray or Gilles Simon, it’s not unusual to hear a commentator explain that the server is under more pressure, and when a second serve misses the box, the returner often gets the credit.

Credit where credit isn’t due

In the last 52 weeks, Jeremy Chardy’s opponents have hit double faults on 4.3% of their service points, the highest rate of anyone in the top 50.  At the other extreme, Simon’s opponents doubled only 2.8% of the time, with Novak Djokovic and Rafael Nadal just ahead of him at 2.9% and 3.0%, respectively.

The conventional wisdom isn’t off to a good start.

But the simple numbers are misleading–as the simple numbers so often are.  Djokovic and Nadal, going deep into tournaments almost every week, play tougher opponents.  Djokovic’s median opponent over the last year was ranked 21st, while Chardy’s was outside the top 50.  While it isn’t always true that higher-ranked opponents hit fewer double faults, it’s certainly something worth taking into consideration.  So even though Chardy has certainly benefited from some poorly aimed second serves, it may not be accurate to say he has benefited the most–he might have simply faced a schedule full of would-be Fernando Verdascos.

Looking now at the most recent full season, 2012, it turns out that Djokovic did face those players least likely to double fault.  His opponents DF’d on 2.9% of points, while Filippo Volandri’s did so on 3.9% of points.  While these are minor differences when compared to all points played, they are enormous when attempting to measure the returner’s impact on DF rate.  Djokovic “induced” double faults on 3.0% of points and Volandri did so on 3.9% of points, so you can see the importance of considering their opponents.  Despite the difference in rates, neither player had much effect on his opponents, at least as far as double faulting is concerned.

This approach allows us to express opponents’ DF rate in a more efficient way, relative to “expected” DF rate.  Volandri benefited from 1% more doubles than expected, Chardy enjoyed a whopping 39% more than expected, and–to illustrate the other extreme–Simon received 31% fewer doubles than his opponents would be predicted to suffer through.
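
One way to compute a “relative to expected” figure is sketched below. It assumes the expected rate is each opponent’s season-long double-fault rate, weighted by the number of service points he played against the returner; that weighting is my assumption for illustration and may differ from the exact adjustment used for the numbers above. The function names and input layout are hypothetical.

```python
def expected_df_rate(opponents):
    """Expected DF rate against a returner.

    `opponents` is a list of (service_points, season_df_rate) tuples, one per
    opponent faced. Each opponent's season rate is weighted by the number of
    service points he played against this returner.
    """
    total_points = sum(points for points, _ in opponents)
    expected_dfs = sum(points * rate for points, rate in opponents)
    return expected_dfs / total_points


def relative_to_expected(actual_dfs, opponents):
    """Fraction by which actual double faults exceed (or trail) expectation."""
    total_points = sum(points for points, _ in opponents)
    actual_rate = actual_dfs / total_points
    return actual_rate / expected_df_rate(opponents) - 1
```

A return value of `0.39` would correspond to 39% more doubles than expected, and `-0.31` to 31% fewer.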

You can’t always get what you want

One thing is clear by now. Regardless of your method and its sophistication, some players got a lot more free return points in 2012 than others.  But is it a skill?

If it is a skill, we would expect the same players to top the leaderboard from one year to the next.  Or, at least, the same players would “induce” more double faults than expected from one year to the next.

They don’t.  I found 1405 consecutive pairs of “player-years” since 1991 with at least 30 matches against tour-level regulars in each season. Then I compared their adjusted opponents’ double fault rate in year one with the rate in year two.  The correlation is positive, but very weak: r = 0.13.
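
The pairing step can be sketched as follows, assuming the season data lives in a dictionary keyed by `(player, year)` with a `(matches, adjusted_rate)` value. That layout and the function name are hypothetical; only the 30-match threshold comes from the text.

```python
def consecutive_pairs(seasons, min_matches=30):
    """Build year-one / year-two lists for players with back-to-back seasons.

    `seasons` maps (player, year) -> (matches, adjusted_opponent_df_rate).
    A pair is kept only if both seasons meet the match threshold.
    """
    year_one, year_two = [], []
    for (player, year), (matches, rate) in sorted(seasons.items()):
        following = seasons.get((player, year + 1))
        if following is None:
            continue  # no season in the next calendar year
        next_matches, next_rate = following
        if matches >= min_matches and next_matches >= min_matches:
            year_one.append(rate)
            year_two.append(next_rate)
    return year_one, year_two
```

The two returned lists line up index by index, so they can be passed straight to any correlation routine. A player with three qualifying seasons in a row contributes two pairs.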

Nadal, one player who we would expect to have an effect on his opponents, makes for a good illustration.  In the last nine years, he has had six seasons in which he received fewer doubles than expected, three with more.  In 2011, it was 15% fewer than expected; last year, it was 9% more. Murray has fluctuated between -18% and +25%. Lots of noise, very little signal.

There may be a very small number of players who affect the rate of double faults (positively or negatively) consistently over the course of their career, but a much greater amount of the variation between players is attributable to luck.  Let’s hope Chardy hasn’t built a new game plan around his ability to induce double faults.

The value of negative results

Regular readers of the blog shouldn’t be surprised to plow through 600 words just to reach a conclusion of “nothing to see here.”  Sorry about that. Positive findings are always more fun. Plus, they give you more interesting things to talk about at cocktail parties.

Despite the lack of excitement, there are two reasons to persist in publishing (and, on your end, understanding) negative findings.

First, negative results indicate when journalists and commentators are selling us a bill of goods. We all like stories, and commentators make their living “explaining” causal connections.  Sometimes they’re just making things up as they go along. “That’s bad luck” is a common explanation when a would-be winner clips the net cord, but rarely otherwise.  However, there’s a lot more luck in sport than these obvious instances.  We’re smarter, more rational fans when we understand this.

(Though I don’t know if being smarter or more rational helps us enjoy the sport more.  Sorry about that, too.)

Second, negative results can have predictive value. If a player has benefited or suffered from an extreme opponents’ double-fault rate (or tiebreak percentage) and we also know that there is little year-to-year correlation, we can expect that the stat will go back to normal next year. In Chardy’s case, we can predict he won’t get as many free return points, thus he won’t continue to win quite as many return points, thus his overall results might suffer.  Admittedly, in the case of this statistic, regression to the mean would have a tiny effect on something like winning percentage or ATP rank.

So at Heavy Topspin, negative results are here to stay. More importantly, we can all stop trying to figure out how Jeremy Chardy is inducing all those double faults.

Filed under Research, Serve statistics

Robin Haase’s Unlucky 13 Tiebreaks

Yesterday, Robin Haase lost a second-set tiebreak to Kenny De Schepper, a mere blip en route to a three-set victory and a place in the Casablanca quarterfinals.  However, it was yet another set-ending failure for the Dutchman, who has now lost thirteen consecutive tour-level tiebreaks.  And another reason to hate Casablanca.

Yes, thirteen.  No other active player has a streak of more than seven, and no tour-level regular has lost more than his last six.  In fact, Haase is now one lost tiebreak away from tying the all-time ATP record of 14, jointly held by Graham Stilwell and Colin Dibley, two players who accomplished their feats in the 1970s.

As I’ve shown before, tiebreak outcomes are rather random. Aside from a small minority of players with extensive tiebreak experience (such as Roger Federer, John Isner, and Andy Roddick), ATP pros tend to win about as many breakers as “expected.” The good players win more than average, the not-so-good players win fewer than average, but there are few players who seem to have some special tiebreak skill–or a notable lack thereof.

It would be premature, then, to read too much into Haase’s streak.  After all, the last fifteen months haven’t been particularly bad for him in general.  When he last won a tour-level tiebreak, in January of last year, he was ranked 62nd in the world.  Now he is #53, and he will pick up another few spots next Monday.  This despite winning only two of the matches in which he lost one of his consecutive tiebreaks.

If history is any guide, the Dutchman will probably turn things around.  Dibley won six of the 10 breakers that followed his streak, and Stilwell won four. Nikolay Davydenko and Thomas Johansson, two otherwise excellent players who lost 13 tiebreaks in a row, each won 5 of their next 10.  More remarkably, the already-missed Ivan Navarro followed a 10-tiebreak losing streak with an 8-2 record in his next 10.

In the ATP era, 43 players have suffered tiebreak losing streaks of 10 or more (full list after the jump).  32 of those have gone on to play at least 10 more.  Naturally, every tiebreak that follows a losing streak is a win, or else it would be considered part of the streak.  In the nine tiebreaks that follow the streak-breaking win, those 32 players won 134 of 288 tiebreaks, or 46.5%.
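
For anyone who wants to repeat the streak search on their own data, here is a minimal sketch. It assumes each player’s tiebreaks are available as a chronological string of `"W"` and `"L"`; that format and the function names are hypothetical.

```python
def longest_losing_streak(results):
    """Return (length, end) of the longest run of 'L' in `results`.

    `end` is the index of the first tiebreak after the streak. If several
    runs share the maximum length, the earliest one is reported.
    """
    best_len, best_end, run = 0, 0, 0
    for i, result in enumerate(results):
        if result == "L":
            run += 1
            if run > best_len:
                best_len, best_end = run, i + 1
        else:
            run = 0
    return best_len, best_end


def wins_in_next(results, start, n=10):
    """Wins in the `n` tiebreaks from index `start`, or None if too few."""
    window = results[start:start + n]
    if len(window) < n:
        return None
    return window.count("W")
```

Calling `wins_in_next(results, end)` with the `end` returned by `longest_losing_streak` counts the streak-breaking win as the first of the next 10, which is how the “next 10” records above are described. Using `start=end + 1` and `n=9` instead gives the nine tiebreaks after that win.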

While the numbers don’t exactly presage Isnerian greatness for Haase, even a return to his pre-streak tiebreak winning percentage of 41% would be welcome.  Fortunately, that’s much more likely than another 13 losses in a row.
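
To put a rough number on “another 13 losses in a row,” the sketch below treats every tiebreak as an independent event won at the pre-streak rate of 41%. Independence is an assumption made for illustration, and the function name is hypothetical.

```python
def prob_losing_streak(win_rate, length):
    """Chance of losing `length` tiebreaks in a row, assuming independence."""
    return (1 - win_rate) ** length


if __name__ == "__main__":
    print(f"{prob_losing_streak(0.41, 13):.2%}")
```

Under those assumptions, the chance of a fresh 13-tiebreak losing streak is on the order of one in a thousand.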

Update: In the Barcelona first round, Haase tied the record, losing a third-set tiebreak to Pablo Carreno-Busta.  On May 6, he lost a tiebreak in the second set of his Madrid first-round match against Alexander Dolgopolov to set a new all-time record of 15 straight lost tiebreaks.

Update 2: On 8 May, Haase lost to Jo-Wilfried Tsonga, 7-6 7-6. (How else?) That’s 17 straight tour-level tiebreaks lost.  The all-time tiebreak winning streak is 18, held by Andy Roddick.

Filed under Records, Tiebreaks

Team GB and the Rarity of Davis Cup Comebacks

Last weekend, the British Davis Cup squad pulled off a major upset, defeating the Russian team 3-2.  Even more impressively, all three of their wins came while facing elimination.  The Russians won the two singles matches on the first day before Britain claimed the doubles rubber and both of the reverse singles rubbers on the final day.

It was the first time since 1930 that Britain won a Davis Cup tie from a 2-0 deficit.  It’s also one of the very few times in the modern era that any country has won a tie after failing to post a point on the first day.

Since the formation of the current structure in 1981, there have been 1310 completed ties in the World Group and Group 1, including playoffs.  In 802 of those (61.2%), one team has raced out to a 2-0 lead by sweeping the first-day singles matches.

Of those 802 ties, Britain’s comeback was only the 19th in this 33-year span, and the first since Canada surged to victory against Ecuador in 2011.  Playing the tie at home doesn’t seem to help the underdogs: Only eight of those 19 comebacks came at home.

Many Davis Cup ties, especially at the Group 1 level, are lopsided, so clinching the tie with the doubles match is quite common.  In fact, that’s what has happened in nearly half of all ties at the World Group and Group 1 levels since 1981 (577, or 44.0%).  So once a squad is down 2-0, the odds are massively stacked against them.  Here are the historical outcomes for teams that sweep day one:

Clinched in…              
3rd rubber    577  71.9%  
4th rubber    159  19.8%  
5th rubber     47   5.9%  

Won           783  97.6%  
Lost           19   2.4%
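
The percentages in the table follow directly from the counts. A short sketch of the arithmetic, using only the numbers shown above (the variable and function names are mine):

```python
# Counts from the table: ties in which one team swept the first day.
clinched = {"3rd rubber": 577, "4th rubber": 159, "5th rubber": 47}
lost = 19  # the sweeping team went on to lose the tie

won = sum(clinched.values())
total = won + lost


def share(count, total):
    """Percentage of `total`, rounded to one decimal place."""
    return round(100 * count / total, 1)


if __name__ == "__main__":
    for rubber, count in clinched.items():
        print(f"{rubber:<12}{count:>5}{share(count, total):>7}%")
    print(f"{'Won':<12}{won:>5}{share(won, total):>7}%")
    print(f"{'Lost':<12}{lost:>5}{share(lost, total):>7}%")
```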

Here are the 19 odds-busting ties:

Year                     Home  Surface  Winner  
2013  G1 R2: GBR vs RUS  GBR   Hard     GBR     
2011  G1 R2: ECU vs CAN  ECU   Clay     CAN     
2010  G1 PO: KOR vs PHI  KOR   Hard     PHI     
2010  WG PO: IND vs BRA  IND   Hard     IND     
1998  WG R1: SVK vs SWE  SVK   Clay     SWE     
1997  G1 QF: PHI vs INA  PHI   Clay     INA     
1997  WG R1: ROU vs NED  ROU   Hard     NED     
1996  WG SF: FRA vs ITA  FRA   Carpet   FRA     
1996  G1 PO: TPE vs INA  TPE   Hard     INA     
1995  WG SF: RUS vs GER  RUS   Clay     RUS     
1995  G1 PO: PER vs BAH  PER   Clay     BAH     
1995  WG R1: DEN vs SWE  DEN   Carpet   SWE     
1994  WG SF: SWE vs USA  SWE   Carpet   SWE     
1992  WG R1: CAN vs SWE  CAN   Carpet   SWE     
1990  G1 QF: IRL vs ROU  IRL   Carpet   ROU     
1989  G1 SF: PER vs BRA  PER   Clay     PER     
1988  G1 F: INA vs KOR   INA   Clay     INA     
1988  WG PO: SUI vs MEX  SUI   Carpet   MEX     
1988  G1 QF: PHI vs JPN  PHI   Clay     PHI
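
The home-comeback count can be checked against the table by comparing the Home and Winner columns. The list below is transcribed from the rows above, in the same order; the variable names are mine.

```python
# (home team, winner) for each of the 19 comebacks, in table order.
comebacks = [
    ("GBR", "GBR"), ("ECU", "CAN"), ("KOR", "PHI"), ("IND", "IND"),
    ("SVK", "SWE"), ("PHI", "INA"), ("ROU", "NED"), ("FRA", "FRA"),
    ("TPE", "INA"), ("RUS", "RUS"), ("PER", "BAH"), ("DEN", "SWE"),
    ("SWE", "SWE"), ("CAN", "SWE"), ("IRL", "ROU"), ("PER", "PER"),
    ("INA", "INA"), ("SUI", "MEX"), ("PHI", "PHI"),
]

# A comeback "came at home" when the winner is also the host.
home_comebacks = sum(1 for home, winner in comebacks if home == winner)

if __name__ == "__main__":
    print(f"{home_comebacks} of {len(comebacks)} comebacks came at home")
```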

It can be done, even in the late rounds of the World Group.  But generally, it’s a good idea to start off the weekend by winning a singles match or two.

Filed under Davis Cup