Category Archives: U.S. Open

US Open Point-by-Point Stats Recap

As regular readers know, I’m working on a system to track every shot in a tennis match and then generate meaningful data based on the results.  Once I hammer out a few final bugs, I’ll introduce that system publicly.  Then, with my interactive Excel doc–and at least a little bit of practice–you can chart matches as well.

In the meantime, I’ve added another set of tables to each one of the point-by-point recaps.  My system allows (but does not require) the tracking of each shot’s direction, which seems particularly valuable in the case of a tactical baseline matchup like Monday’s final.  Follow the link to the men’s final stats, and then click either of the “shot direction” links.  I’ve classified each player’s shots as crosscourt, down the middle, down the line, inside-out, or inside-in, then tallied the results of each specific shot type (e.g. “forehand inside-out”).
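For anyone who wants to build similar tables from their own charting, here is a minimal Python sketch of the grouping step: count the outcomes for each (stroke, direction) pair. The record layout, the outcome labels, and the sample shots are hypothetical illustrations, not the charting system's actual format.

```python
from collections import Counter, defaultdict

# Direction labels from the post; the outcome labels used below are hypothetical.
DIRECTIONS = {"crosscourt", "down the middle", "down the line", "inside-out", "inside-in"}


def tally_by_shot_type(shots):
    """Count outcomes for each (stroke, direction) pair, e.g. ("forehand", "inside-out")."""
    table = defaultdict(Counter)
    for stroke, direction, outcome in shots:
        if direction not in DIRECTIONS:
            raise ValueError(f"unknown direction: {direction!r}")
        table[(stroke, direction)][outcome] += 1
    return table


# Made-up shots, for illustration only.
sample = [
    ("forehand", "inside-out", "winner"),
    ("forehand", "inside-out", "in play"),
    ("forehand", "inside-out", "unforced error"),
    ("backhand", "crosscourt", "in play"),
]
for (stroke, direction), outcomes in sorted(tally_by_shot_type(sample).items()):
    print(f"{stroke} {direction}: {sum(outcomes.values())} shots, {dict(outcomes)}")
```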

At this point, the numbers are little more than a basis for conversation and speculation.  Except for Serena Williams and Victoria Azarenka, I don’t have stats on more than two matches for any individual player.  In time, however, I expect to amass a fair amount of raw data on the top-ranked men and women, and from there, we might really be able to learn something.

For now, here is a list of all the point-by-point stat summaries available from the US Open.

Men:

Women:

Bonus:


Filed under Match charting, U.S. Open

Simpler, Better Keys to the Match

If you watched the US Open or visited its website at any point in the last two weeks, you surely noticed the involvement of IBM.  Logos and banner ads were everywhere, and even usually reliable news sites made a point of telling us about the company’s cutting-edge analytics.

Particularly difficult to miss were the IBM “Keys to the Match,” three indicators per player per match.  The name and nature of the “keys” strongly imply some kind of predictive power: IBM refers to its tennis offerings as “predictive analytics” and endlessly trumpets its database of 41 million data points.

Yet, as Carl Bialik wrote for the Wall Street Journal, these analytics aren’t so predictive.

It’s common to find that the losing player met more “keys” than the winner did, as was the case in the Djokovic-Wawrinka semifinal.  Even when the winner captured more keys, some of these indicators sound particularly irrelevant, such as “average less than 6.5 points per game serving,” the one key that Rafael Nadal failed to meet in yesterday’s victory.

According to one IBM rep, their team is looking for “unusual” statistics, and in that they succeeded.  But tennis is a simple game, and unless you drill down to components and do insightful work that no one has ever done in tennis analytics, there are only a few stats that matter.  In their quest for the unusual, IBM’s team missed out on the predictive.

IBM vs generic

IBM offered keys for 86 of the 127 men’s matches at the US Open this year.  In 20 of those matches, the loser met as many or more of the keys as the winner did.  On average, the winner of each match met 1.13 more IBM keys than the loser did.

This is IBM’s best performance of the year so far.  At Wimbledon, winners averaged 1.02 more keys than losers, and in 24 matches, the loser met as many or more keys as the winner.  At Roland Garros, the numbers were 0.98 and 21, and at the Australian Open, 1.08 and 21.
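Both yardsticks used here (the number of matches in which the loser met as many or more keys, and the winner's average margin in keys met) are simple to compute. A minimal Python sketch, assuming each match has already been reduced to a pair of counts: keys met by the winner and keys met by the loser.

```python
def keys_summary(matches):
    """Summarize how well a set of keys separates winners from losers.

    matches: iterable of (winner_keys_met, loser_keys_met) pairs.
    Returns (number of matches where the loser met as many or more keys,
             average number of extra keys met by the winner).
    """
    matches = list(matches)
    if not matches:
        raise ValueError("need at least one match")
    loser_at_least_even = sum(1 for w, l in matches if l >= w)
    mean_margin = sum(w - l for w, l in matches) / len(matches)
    return loser_at_least_even, mean_margin


# Made-up example: four matches, keys met by (winner, loser).
print(keys_summary([(3, 1), (2, 2), (1, 2), (3, 0)]))  # → (2, 1.0)
```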

Without some kind of reference point, it’s tough to know how good or bad these numbers are.  As Carl noted: “Maybe tennis is so difficult to analyze that these keys do better than anyone else could without IBM’s reams of data and complex computer models.”

It’s not that difficult.  In fact, IBM’s millions of data points and scores of “unusual” statistics are complicating what could be very simple.

I tested some basic stats to discover whether there were more straightforward indicators that might outperform IBM’s.  (Carl calls them “Sackmann Keys”; I’m going to call them “generic keys.”)  It is remarkable just how easy it was to create a set of generic keys that matched, or even slightly outperformed, IBM’s numbers.

Unsurprisingly, two of the most effective stats are winning percentage on first serves, and winning percentage on second serves.  As I’ll discuss in future posts, these stats–and others–show surprising discontinuities.  That is to say, there is a clear level at which another percentage point or two makes a huge difference in a player’s chances of winning a match.  These measurements are tailor-made for keys.

For a third key, I tried first-serve percentage.  It doesn’t have nearly the same predictive power as the other two statistics, but it has the benefit of no clear correlation with them.  You can have a high first-serve percentage but a low rate of first-serve or second-serve points won, and vice versa.  And contrary to some received wisdom, there does not seem to be a first-serve percentage above which putting more first serves in becomes a bad thing.  The relationship isn’t linear, but the more first serves you put in the box, the better your odds of winning.

Put it all together, and we have three generic keys:

  • Winning percentage on first-serve points better than 74%
  • Winning percentage on second-serve points better than 52%
  • First-serve percentage better than 62%

These numbers are based on the last few years of ATP results on every surface except for clay.  For simplicity’s sake, I grouped together grass, hard, and indoor hard, even though separating those surfaces might yield slightly more predictive indicators.
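The post doesn't spell out how these cutoffs were chosen. One simple way to find cutoffs like these, sketched here as an assumption rather than the method actually used, is a grid search: try candidate thresholds and keep the one where "stat above the threshold" most often agrees with the match result. Each sample is one player's stat from one match plus whether that player won.

```python
def best_threshold(samples, candidates):
    """Return (threshold, accuracy) for the rule "won if value > threshold".

    samples: list of (stat_value, won) pairs, one per player per match.
    candidates: thresholds to try; ties go to the earliest candidate.
    """
    if not samples:
        raise ValueError("need at least one sample")
    best = None
    for t in candidates:
        correct = sum((value > t) == won for value, won in samples)
        accuracy = correct / len(samples)
        if best is None or accuracy > best[1]:
            best = (t, accuracy)
    return best


# Candidate cutoffs from 40% to 90% in one-point steps.
grid = [i / 100 for i in range(40, 91)]
```

A cutoff picked this way should be checked on matches that weren't used to pick it, and the search can be run separately by surface, in the spirit of setting clay apart.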

For those 86 men’s matches at the Open this year with IBM keys, the generic keys did a little bit better.  Using my indicators–the same three for every player–the loser met as many or more keys 16 times (compared to IBM’s 20) and the winner averaged 1.15 more keys (compared to IBM’s 1.13) than the loser.  Results for other slams (with slightly different thresholds for the different surface at Roland Garros) netted similar numbers.
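Checking one player's match against the three generic keys takes only a few lines. This Python sketch assumes the basic serve counts are available; the field names are hypothetical, double faults are counted as lost second-serve points, and "better than" is read as strictly greater.

```python
from dataclasses import dataclass

# Thresholds from the generic keys listed above (non-clay courts).
FIRST_WON_MIN = 0.74   # share of first-serve points won
SECOND_WON_MIN = 0.52  # share of second-serve points won
FIRST_IN_MIN = 0.62    # share of first serves in


@dataclass
class ServeStats:
    service_points: int
    first_serves_in: int
    first_serve_points_won: int
    second_serve_points_won: int  # double faults count as second-serve points lost


def rate(numerator, denominator):
    """Safe ratio: 0.0 when there were no attempts."""
    return numerator / denominator if denominator else 0.0


def generic_keys_met(s: ServeStats) -> int:
    """Return how many of the three generic keys (0 to 3) this player met."""
    second_serve_points = s.service_points - s.first_serves_in
    checks = [
        rate(s.first_serve_points_won, s.first_serves_in) > FIRST_WON_MIN,
        rate(s.second_serve_points_won, second_serve_points) > SECOND_WON_MIN,
        rate(s.first_serves_in, s.service_points) > FIRST_IN_MIN,
    ]
    return sum(checks)


# Made-up line: 52 of 80 first serves in, 40 of 52 first-serve points won, 14 of 28 on second serve.
print(generic_keys_met(ServeStats(80, 52, 40, 14)))  # → 2
```

Running it for both players in a match gives the per-match key counts behind the comparison with IBM's keys.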

A smarter planet

It’s no accident that the simplest, most generic possible approach to keys provided better results than IBM’s focus on the complex and unusual.  It also helps that the generic keys are grounded in domain-specific knowledge (however rudimentary), while many of the IBM keys, such as average first serve speeds below a given number of miles per hour, or set lengths measured in minutes, reek of domain ignorance.

Indeed, comments from IBM’s reps suggest that marketing is more important than accuracy.  In Carl’s post, a rep was quoted as saying, “It’s not predictive,” despite the large and brightly-colored announcements to the contrary plastered all over the IBM-powered US Open site.  “Engagement” keeps coming up, even though engaging (and unusual) numbers may have nothing to do with match outcomes, and much of the fan engagement I’ve seen is negative.

Then again, maybe the old saw is correct: It’s all good publicity as long as they spell your name right.  And it’s not hard to spell “IBM.”

Better keys, more insight

Amid such a marketing effort, it’s easy to lose sight of the fact that the idea of match keys is a good one.  Commentators often talk about hitting certain targets, like 70% of first serves in.  Yet to my knowledge, no one had done the research.

With my generic keys as a first step, this path could get a lot more interesting.  While these single numbers are good guides to performance on hard courts, several extensions spring to mind.

Mainly, these numbers could be improved by making player-specific adjustments.  Winning 74% of first-serve points is adequate for an average returner, but what about a poor returner like John Isner?  His average first-serve winning percentage this year is nearly 79%, suggesting that he needs to come closer to that number to beat most players.  For other players, perhaps a higher rate of first serves in is crucial for victory, or their thresholds vary especially dramatically by surface.
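One possible form for such adjustments, offered purely as a hypothetical sketch, is to pull the generic cutoff part of the way toward a player's own season average. The linear blend and the weight are assumptions, not a tested method; choosing the weight would itself take research.

```python
def adjusted_threshold(generic, player_average, weight):
    """Hypothetical player-specific key: a linear blend of the generic cutoff
    and the player's own season average. weight=0 keeps the generic key;
    weight=1 uses the player's average outright."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return (1 - weight) * generic + weight * player_average


# Isner-like server: roughly 79% of first-serve points won this season (approximate).
print(round(adjusted_threshold(0.74, 0.79, 0.5), 3))  # → 0.765
```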

In future posts, I’ll delve into more detail regarding these generic keys and investigate ways in which they might be improved.  Outperforming IBM is gratifying, but if our goal is really a “smarter planet,” there is a lot more research to pursue.


Filed under Keys to the match, Research, U.S. Open

Rafael Nadal d. Novak Djokovic: Recap and Detailed Stats

There are a lot of words that can be used to describe Novak Djokovic, but “sloppy” usually isn’t one of them.  Despite plenty of brilliance from the Serbian, he made far too many mistakes to win today.  Of course, the man on the other side of the net, Rafael Nadal, may be the best in the game at forcing his opponent to attempt low-percentage shots out of pure desperation.

This morning, I predicted that, in order to win the match, Nadal would need to serve well, piling up more quick service points than usual, as Djokovic is a master of neutralizing the server’s advantage.  Give him a few shots, and it doesn’t matter who delivered the serve or how well they hit it.

That isn’t what happened.  Nadal won fewer than one in five service points on or before his second shot.  (Djokovic did a little better by that metric, but at 21%, not by much.)  Instead, Rafa won the way Novak usually does: by neutralizing his opponent’s serve.
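Counting these quick service points from point-by-point data is straightforward. A Python sketch, assuming each point records the server, the winner, and the rally length, where rally length counts every shot including the serve and the final winning or errant shot. Under that convention, a service point won "on or before his second shot" is one the server won in three shots or fewer. Both the record layout and the counting rule are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Point:
    server: str
    winner: str
    rally_length: int  # shots in the point, counting the serve and the final shot


def quick_service_point_rate(points, player, max_shots=3):
    """Share of `player`'s service points that he won within `max_shots` shots
    (by default: serve, return, and the server's next shot)."""
    served = [p for p in points if p.server == player]
    if not served:
        return 0.0
    quick = sum(1 for p in served if p.winner == player and p.rally_length <= max_shots)
    return quick / len(served)
```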

Rafa won 45% of return points today, a mark he has never before reached against Djokovic on hard courts.  Even more importantly, he won return points at the same rate when Djokovic was serving at 30-30 or later.  Djokovic won what would normally be an impressive number of return points: 38%.  In recent years on hard courts, that was always enough to beat the Spaniard.
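Splitting return points by score is a simple filter. A Python sketch, assuming each point carries the game score before it was played, and treating "30-30 or later" as any point where both players already had two or more points in the game. The data layout is an assumption, and tiebreak points would need separate handling.

```python
from dataclasses import dataclass


@dataclass
class Point:
    server: str
    winner: str
    server_score: int    # points the server had won in this game before this point
    returner_score: int  # points the returner had won in this game before this point


def return_win_rate(points, player, pressure_only=False):
    """Share of return points won by `player` in a single match.

    With pressure_only=True, keep only points played at 30-30 or later,
    i.e. both players already on two or more points in the game
    (30-30, 40-30, 30-40, deuce, advantage). Tiebreak points are not handled.
    """
    returns = [p for p in points if p.server != player]
    if pressure_only:
        returns = [p for p in returns if p.server_score >= 2 and p.returner_score >= 2]
    if not returns:
        return 0.0
    return sum(1 for p in returns if p.winner == player) / len(returns)
```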

It was a different kind of hard-court match today, one that was decided in grueling rallies.  20% of points played today reached at least ten strokes, and Rafa won 59% of them.  Of points that finished more quickly, Djokovic simply gave away too many.  By my unofficial (and rather strict) count, he hit over 60 unforced errors, more than double Nadal’s total.

Too many of those sloppy shots came at crucial moments.  A bad forehand miss on a mid-court sitter gave Nadal set point in the third set, which Rafa converted on the first try.  Serving down a break in the fourth at 1-4, Djokovic quickly went up 30-0, then missed his second shot on three straight points to give Nadal another break point.  At 30-0 in that game, it was possible to imagine Novak clawing his way back.  Once the double break was sealed, the match was over.

Djokovic showed plenty of brilliance, especially in the second and third sets, and contributed to some incredible tennis moments, including ten rallies that exceeded 20 shots.  Indeed, Djokovic converted a break chance by claiming the best of those, a 54-stroke slugfest in the second set (video here).  He didn’t go quietly until that dreadful game at 1-4.
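Rally-length figures like the ones in this recap (the share of points reaching ten strokes, who won them, how many rallies passed 20 shots) can be computed with a simple filter. A Python sketch, assuming each point records its winner and rally length; the layout is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Point:
    winner: str
    rally_length: int  # shots in the point


def long_rally_summary(points, player, min_shots=10):
    """Return (share of all points lasting at least min_shots,
    share of those long points won by `player`)."""
    long_points = [p for p in points if p.rally_length >= min_shots]
    share = len(long_points) / len(points) if points else 0.0
    won = sum(1 for p in long_points if p.winner == player)
    win_rate = won / len(long_points) if long_points else 0.0
    return share, win_rate
```

Counting rallies that exceeded 20 shots is the same call with `min_shots=21`.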

By beating Djokovic at his own game, Nadal solidified his status as the most dominant player on hard courts.  His undefeated record on the surface this year didn’t leave that in much doubt, but it had been three years since he won a hard-court Grand Slam.  Assuming he stays healthy, even Rafa might agree that he heads to Australia as the player to beat.

Here are the complete point-by-point stats from the match.

Here is a complete win-probability graph, as well.


Filed under Match charting, Match reports, Novak Djokovic, Rafael Nadal, U.S. Open

Djokovic-Nadal XXXVII: The (Actual) Keys to the Match

Both Rafael Nadal and Novak Djokovic have had easy routes to the US Open final.  Neither was tested before the semifinals, and neither has yet played a top-eight opponent.  Yet both were pushed further than expected in their last matches.  Djokovic nearly lost in another tough five-setter against Stanislas Wawrinka, and Nadal looked almost human at times, spraying errors in his match with Richard Gasquet.

For all that, the field is down to the final two.  They’ve played 36 times before, with Nadal leading the career head-to-head 21-15.  This will be their 18th meeting on hard courts, where Djokovic leads 11-6, and their 11th at a Grand Slam, where Rafa has won seven of the previous ten.  They’ve split their two previous US Open finals.

Based on the most relevant pieces of this head-to-head–the last seven Djokovic-Nadal matches on hard courts, dating back to the 2010 US Open–we can identify some clear trends that tell us what to watch for, and what each player must do to seal the US Open title.

The key: Rafa’s service games

Of these last seven hard-court matches, Nadal has won three and Djokovic has won four.  If we could find some statistical indicators that each player reached when they won and failed to accomplish when they lost, we might be on to something.  Think of it like IBM’s Keys to the Match, but with actual predictive value.

Sure enough, there are plenty of indicators that fit the bill, and they almost all center on Nadal’s serve:

  • In four of the matches, Nadal has served fewer than 5% aces.  In the other three, at least 7% aces.  He lost all four of the former, and won all three of the latter.
  • In four of the matches, Nadal won fewer than 70% of his first-serve points.  In the other three, he won at least 71%.  He lost all four of the former, and won all three of the latter.
  • In three of the matches, Nadal won fewer than 47% of his second-serve points.  In the other four, he won at least 56%.  He lost all three of the former, and won all but one (the 2011 Indian Wells final) of the latter.
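Each indicator above splits the matches, cleanly or nearly so, into losses below some gap and wins above it. A Python sketch of that check, assuming one (stat value, won) pair per match: it reports the gap when the split is perfect and returns None otherwise.

```python
def separating_gap(matches):
    """matches: list of (stat_value, won) pairs for one player.

    If every win has a higher value than every loss, return
    (highest losing value, lowest winning value): any cutoff between the
    two separates the results perfectly. Otherwise return None.
    """
    wins = [value for value, won in matches if won]
    losses = [value for value, won in matches if not won]
    if not wins or not losses:
        return None
    if max(losses) < min(wins):
        return max(losses), min(wins)
    return None
```

An indicator with an exception, like the second-serve bullet, may fail this strict test even though it still carries information.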

We can sum up the importance of Nadal’s service games from a more Djokovic-centered perspective:

  • In three of the matches, Djokovic won no more than 33% of return points.  In the other four, he won at least 37% of return points.  Care to guess which matches he won?

Djokovic’s service non-indicators

The numbers are not nearly so clear for Djokovic’s service games.  In the two meetings when Novak hit the most aces, Rafa won.  In three of the only four matches when Djokovic made 62% or more of his first serves, Rafa won.  (These are starting to sound like some of the more inane of the IBM keys.)

Generally, winning 65% of first-serve points is good enough for Novak to beat Nadal, except for last month’s match in Canada, when he won 71% of first-serve points and lost in a third-set tiebreak.  In Djokovic’s worst second-serve performance of the seven matches, the 2011 US Open final, he barely won 44% of those points, yet won the match.

Of course, this doesn’t mean that Djokovic’s service stats don’t matter.  It’s no accident that Novak’s first-serve percentages were much higher in the three sets he won against Wawrinka than in the two sets he lost.  It’s just that Djokovic’s serve isn’t as potentially dominant as Nadal’s.

For example, in Saturday’s semifinals, Nadal won 36% of his service points on or before his second shot, while Djokovic won only 24% of his service points that way.  Nadal’s number isn’t staggeringly high (for example, both Kevin Anderson and Marcos Baghdatis topped 40% in that category in their second-round match), but it’s a number he can earn only when serving well.  When he isn’t earning those cheap, quick points against Djokovic, Novak takes away the server’s advantage, threatening to break in almost every service game.

By contrast, Djokovic–like Victoria Azarenka–doesn’t consistently earn that type of advantage on serve.  Sure, he gets some free points that way, but in general, he takes the slight advantage that serving confers and uses that as an edge in a longer rally.  In the semifinal against Wawrinka, his average service point–including aces and unreturnables–lasted more than five shots.

Getting one number for Novak

Individually, Djokovic’s service stats don’t tell us much.  But if we consolidate them into one number, Nadal’s return points won, we get a slightly better clue of what beating Novak requires.  In the three matches where Nadal failed to win 34% of return points, he lost.  In the two matches where he won at least 42% of return points, he won.

But if you’re counting, you’ve surely noted that I left out two matches.  In Montreal last month, Nadal won only 34.7% of return points, and won.  In the 2011 US Open final, he won 41.7% of return points, yet lost.  Djokovic can be so effective in his own return games–or simply unbeatable when given break point opportunities, like he was that day–that even a masterful return performance like Nadal displayed in that final isn’t always good enough.

So Novak’s numbers just aren’t as indicative as his opponent’s.  Instead, keep your eyes on Rafa’s serve statistics.  Despite the many long, gut-busting rallies we can expect this afternoon, Nadal has this match–like his previous hard-court meetings with the world #1–on his own racquet.


Filed under Novak Djokovic, Rafael Nadal, U.S. Open

US Open Final: Serena Williams d. Victoria Azarenka: Recap and Detailed Stats

Today’s final was Serena Williams’s for the taking.  She didn’t seize it as boldly as she might have, but she performed just well enough to overcome both the windy conditions and a reliably dogged opponent in Victoria Azarenka.

When Serena is playing as well as she did during the third set, it’s tough to see how she ever loses. But today we saw an excellent illustration of both her assets and her liabilities.  If her opponent can hang around in rallies, there will be enough errors to swing some matches in the other direction. Most of the WTA rank and file can’t absorb her pace and stick around long enough to reap the benefits of those errors, but Vika can.

And when Azarenka is playing her best, as she did on occasion throughout this match, she can attack one of Serena’s less penetrating shots, creating opportunities for her own winners. A player with a bigger serve would do that with her serve; Vika must try to do so within each rally.

By the numbers, it’s a bit of a miracle that Vika forced a third set.  Twice in the second set, Serena served for the match and was broken.  It was a testament to Azarenka’s stubbornness, always putting one more ball back in play, forcing Serena to overcome both the pressure and the wind.  In that second set, Williams had a hard time doing that.

It was the wind–and Serena’s difficulty dealing with it–that kept this match going as long as it did.  While it made life difficult for both players at times, especially when playing on the right side of the chair, Serena struggled much more.  She never really adjusted to the conditions, setting up early and taking big swings when the wind was likely to move the ball a bit too much for that.  Many of Serena’s errors–especially her 33 unforced errors on the backhand side alone–can be attributed to that sloppiness.

By the third set, the wind had settled down and so had Serena.  Azarenka provided some help with two crucial double faults in the fourth game of the set, including one on break point.  It wasn’t her first poorly-timed double fault of the match–four of her five came at 30-30 or later–but this one was the beginning of the end.  Unlike in the second set, Serena didn’t let up.  She consolidated the break by holding to love, with an unreturnable, two aces, and a running backhand lob winner.

I wrote this morning that Azarenka’s chances hinged on her serve.  She won 54.5% of her service points, a bit less than she did against Serena in Cincinnati, but better than she did in each of her last three matches in New York.  Had she limited her double faults to less important moments, 54.5% might well have been enough.

In the end, Serena was simply too strong.  Vika is the very best on tour at what she does, negating the advantage of huge weapons like Serena’s, but that style leaves her very little margin for error against Serena.  Today, that margin wasn’t quite enough for her to pull off the upset.

Here are the point-by-point-based serve, return, and shot-type stats for the match.


Filed under Serena Williams, U.S. Open, Victoria Azarenka, WTA

Does Azarenka Have a Chance?

The last two times Serena Williams and Victoria Azarenka have met on hard courts, Azarenka has come out on top.  As much confidence as that might give her going into today’s final, it might be the only evidence suggesting she’s likely to win.

Today’s match will come down to Vika’s ability to hold serve, and while she has moved quickly through her last two rounds, she has yet to show that she can serve well enough to hold off the onslaught that is Serena’s return game.

In the semifinal against Flavia Pennetta, she lost more service points than she won, and was broken in five of her nine service games.  Against Daniela Hantuchova, she lost 47% of her service points, suffering three service breaks.  Playing Ana Ivanovic, she lost more than half of her service points, and was broken seven times.

While each of those players had a nice tournament, this is not exactly a Hall of Fame lineup that has reduced Azarenka’s service games to coin flips.  None brings anywhere near the weaponry to the return game that Serena does.  And Serena is considerably more difficult to break back.

These numbers make it all the more surprising that the last meeting between these two players ended in Vika’s favor.  Thanks to the hard work of Amy Fetherolf, who charted the Cincinnati final (and analyzed the results here), we have detailed data from that most recent matchup.  Azarenka managed to win 55% of her service points (the same figure she held Serena to) and landed 11 of 12 serves on game points, winning nine of them.

Another promising data point is last year’s US Open final, in which Serena managed to win only 44% of Azarenka’s service points.  In both of these recent contests, the difference between Vika’s first-serve and second-serve success rates is tiny (in New York last year, it was a mere two percentage points), suggesting that she needs only a slight edge at the beginning of a rally to win the point.

Azarenka has the ability to step up her game for the big matches, so the question she’ll have to answer today is: Can she serve more effectively than she has all tournament?  If she does, even at the modest level she did in Cincinnati, we’re in for a very competitive afternoon of tennis.

Check out this final preview from Tom Perrotta, in which everyone agrees that Vika will raise her level today.

If you missed it yesterday, I wrote recaps of both men’s semifinals.  Djokovic-Wawrinka here, and Nadal-Gasquet here.  In those posts you can find links to my point-by-point based stats for both matches.

Finally, don’t miss this piece from Carl Bialik, in which he looks at IBM’s not-very-predictive “predictive analytics,” otherwise known as their Keys to the Match.  Next week, I’ll offer a closer look at the details of the better-performing “Sackmann Keys,” which, it turns out, have much more value for tennis analysis than merely showing up the folks at IBM.


Filed under Serena Williams, U.S. Open, Victoria Azarenka, WTA

Nadal d. Gasquet: Recap and Detailed Stats

Not often do we come away from a straight-set victory with newfound respect for the loser, but that’s the appropriate reaction today.

As I discussed this morning, Richard Gasquet has never accomplished much of anything against Rafael Nadal. The 10-0 head-to-head, if anything, disguises how lopsided it has been.

Today, for two sets, the Frenchman came as close to going toe-to-toe with Nadal as he probably ever will. From the start, he was playing a much more varied game than we are accustomed to from him, serving aggressively, rushing the net at any provocation, and even standing inside the stadium to return serve.

Despite getting broken three times, Gasquet never really went away. After he lost his first service game, it looked like another Nadal-administered drubbing was in the works, but Richard held serve for the remainder of the set, which ended 6-4.

In the second, he once again lost the first game of the set on serve, but went one better. He broke Nadal back, the first service game Nadal had lost in New York this year. Gasquet took advantage of Rafa’s carelessness to stay on serve until they reached a tiebreak.

Then came the disappointment of the match. Gasquet opened the breaker with a double fault, and serving at 1-6, he doubled once more. That was the only sign of the passive, unthreatening Richard we got all day.

The third set was more lopsided, though Gasquet kept playing aggressive tennis. Nadal was just too good. (Gasquet didn’t help, double faulting twice from 30-30 in the final game, but in the end, it was just the difference between 6-2 and 6-3.)

For Gasquet to beat Rafa, he would have to play the match of a lifetime. He didn’t come close to doing that today, but he did show up with a better set of tactics than he generally brings to bear. While a more varied attack from the Frenchman won’t earn him a spot in the top five, it will ensure he remains in the top ten.

Here are the complete point-by-point stats for the match, and in case you missed it earlier, here’s my recap of the Djokovic-Wawrinka semifinal.


Filed under Match charting, Rafael Nadal, Richard Gasquet, U.S. Open