The Limited Value of Head-to-Head Records

Yesterday at the Australian Open, Ana Ivanovic defeated Serena Williams, despite having failed to take a set in four previous meetings. Later in the day, Tomas Berdych beat Kevin Anderson for the tenth straight time.

Commentators and bettors love head-to-head records. You’ll often hear people say, “tennis is a game of matchups,” which, I suppose, is hardly disprovable.

But how much do head-to-head records really mean? If Player A is ranked higher than Player B but Player B has won the majority of their career meetings, who do you pick? To what extent does head-to-head record trump everything (or anything) else?

It’s important to remember that, most of the time, head-to-head records don’t clash with any other measurement of relative skill. On the ATP tour, head-to-head record agrees with relative ranking 69% of the time–that is, the player who is leading the H2H is also the one with the better ranking. When a pair of players have faced each other five or more times, H2H agrees with relative ranking 75% of the time.

Usually, then, the head-to-head record is right. It’s less clear whether it adds anything to our understanding. Sure, Rafael Nadal owns Stanislas Wawrinka, but would we expect anything much different from the matchup of a dominant number one and a steady-but-unspectacular number eight?

H2H against the rankings

If head-to-head records have much value, we’d expect them–at least for some subset of matches–to outperform the ATP rankings. That’s a pretty low bar–the official rankings are riddled with limitations that keep them from being very predictive.

To see if H2Hs met that standard, I looked at ATP tour-level matches since 1996. For each match, I recorded whether the winner was ranked higher than his opponent and what his head-to-head record was against that opponent. (I didn’t consider matches outside of the ATP tour in calculating head-to-heads.)

Thus, for each head-to-head record (for instance, five wins in eight career meetings), we can determine how many matches the H2H-favored player won, how many the higher-ranked player won, and so on.
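The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the numbers in this post: the input format (a chronological list of `(winner, loser, winner_rank, loser_rank)` tuples) and the function name `tally_by_h2h` are hypothetical, and matches where the prior head-to-head is tied are simply skipped.

```python
from collections import defaultdict

def tally_by_h2h(matches):
    """Tally outcomes by the head-to-head record going into each match.

    `matches` is assumed to be in chronological order, each item a tuple
    (winner, loser, winner_rank, loser_rank); a lower rank number is better.
    Returns {(leader_wins, trailer_wins): counts}.
    """
    wins = defaultdict(int)  # wins[(a, b)] = times a has beaten b so far
    tally = defaultdict(lambda: {"matches": 0, "h2h_right": 0, "rank_right": 0})

    for winner, loser, w_rank, l_rank in matches:
        w_prev = wins[(winner, loser)]
        l_prev = wins[(loser, winner)]
        if w_prev != l_prev:  # only count matches where someone leads the H2H
            row = tally[(max(w_prev, l_prev), min(w_prev, l_prev))]
            row["matches"] += 1
            row["h2h_right"] += w_prev > l_prev   # the H2H leader won
            row["rank_right"] += w_rank < l_rank  # the higher-ranked player won
        wins[(winner, loser)] += 1  # update the record only after tallying

    return dict(tally)

# Made-up example: A beats B twice, then B beats A.
example = [("A", "B", 5, 10), ("A", "B", 5, 10), ("B", "A", 12, 4)]
print(tally_by_h2h(example))
```

The one detail worth getting right is the order of operations: the head-to-head record has to be read before the current match is added to it, otherwise the result being predicted leaks into the predictor.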

For instance, I found 1,040 matches in which one of the players had beaten his opponent in exactly four of their previous five meetings.  65.0% of those matches went the way of the player favored by the head-to-head record, while 68.8% went to the higher-ranked player. (54.5% of the matches fell in both categories.)

Things get more interesting in the 258 matches in which the two metrics did not agree.  When the player with the 4-1 record was lower in the rankings, he won only 109 (42.2%) of those matchups. In other words, at least in this group of matches, you’d be better off going with ATP rankings than with head-to-head results.
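For readers who want to check how the 4-1 figures fit together, here is the arithmetic as a short Python sketch. It uses only the numbers quoted above; the variable names are mine, and the count of matches where both methods were right is derived from the rounded percentages rather than taken from the underlying data.

```python
total = 1040                   # matches with a 4-1 head-to-head going in
disagree = 258                 # H2H leader was the lower-ranked player
h2h_wins_when_disagree = 109   # H2H leader won despite the lower ranking

# When the two methods disagree, exactly one of them is right.
rank_wins_when_disagree = disagree - h2h_wins_when_disagree

# H2H was right in 65.0% of all matches; remove the disagreement wins
# to get the matches where both methods picked the winner.
both_right = round(0.650 * total) - h2h_wins_when_disagree

print(f"both right: {both_right / total:.1%}")
print(f"ranking right overall: {(both_right + rank_wins_when_disagree) / total:.1%}")
print(f"H2H right when they disagree: {h2h_wins_when_disagree / disagree:.1%}")
```

The derived values line up with the 54.5%, 68.8%, and 42.2% quoted above, so the ranking's overall edge comes entirely from the 258 matches where the two methods split.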

Broader view, similar conclusions

For almost every head-to-head record, the findings are the same. There were 26 head-to-head records–everything from 1-0 to 7-3–for which we have at least 100 matches’ worth of results, and in 20 of them, the player with the higher ranking did better than the player with the better head-to-head. In 19 of the 26 groups, when the ranking disagreed with the head-to-head, ranking was a more accurate predictor of the outcome.

If we tally the results for head-to-heads with at least five meetings, we get an overall picture of how these two approaches perform. 68.5% of the time, the player with the higher ranking wins, while 66.0% of the time, the match goes to the man who leads in the head-to-head. When the head-to-head and the relative ranking don’t match, ranking proves to be the better indicator 56.5% of the time.

The most extreme head-to-heads–that is, undefeated pairings such as 7-0, 8-0, and so on–are the only groups in which H2H consistently tells us more than ATP ranking does. 80% of the time, these matches go to the higher-ranked player, while 81.9% of the time, the undefeated man prevails. In the 78 matches for which H2H and ranking don’t agree, H2H is a better predictor exactly two-thirds of the time.

Explanations against intuition

When you weigh a head-to-head record more heavily than a pair of ATP rankings, you’re relying on a very small sample instead of a very big one. Yes, that small sample may be much better targeted, but it is also very small.
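To put a rough number on "very small," here is a sketch using the textbook standard error of a proportion, sqrt(p(1-p)/n). This is a loose analogy rather than anything used in the analysis above: rankings are not literally a win percentage, and the sample sizes (9 head-to-head meetings versus 100 matches feeding a ranking) are made-up figures chosen only for illustration.

```python
import math

def standard_error(p, n):
    """Standard error of an observed win rate, for true rate p over n matches."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical sample sizes: a nine-match head-to-head versus
# a ranking built on something like 100 matches.
for n in (9, 100):
    print(n, round(standard_error(0.5, n), 3))
```

Under these assumptions, the uncertainty around a nine-match record is more than three times that of the larger sample, which is why a 6-3 head-to-head says so little on its own.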

Not only is the sample small, often it is not as applicable as you might think. When Roger Federer defeated Lleyton Hewitt in the fourth round of the 2004 Australian Open, he had beaten the Aussie only twice in nine career meetings. Yet at that point in their careers, the 22-year-old, #2-ranked Fed was clearly in the ascendancy while Hewitt was having difficulty keeping up. Even though most of their prior meetings had been on the same surface and Hewitt had won the three most recent encounters, that small subset of Roger’s performances did not account for his steady improvement.

The most recent Fed-Hewitt meeting is another good illustration. Entering the Brisbane final, Roger had won 15 of their previous 16 matches, but while Hewitt has maintained a middle-of-the-pack level for the last several years, Federer has declined. And although the two had played 26 times before the Brisbane final, none of those contests had come in the previous two years.

Whether it’s surface, recency, injury, weather conditions, or any one of dozens of other factors, head-to-heads are riddled with external factors. That’s the problem with any small sample size–the noise is much more likely to overwhelm the signal. If noise can win out in the extensive Fed-Hewitt head-to-head, most one-on-one records don’t stand a chance.

Any set of rankings, whether the ATP’s points system or my somewhat more sophisticated (and more predictive) jrank algorithm, takes into account every match both players have been involved in for a fairly long stretch of time. In most cases, having all that perspective on both players’ current levels is much more valuable than a noise-ridden handful of matches. If head-to-heads can’t beat ATP rankings, they would look even worse against a better algorithm.

Some players surely do have an edge on particular opponents or types of opponents, whether it’s Andy Murray with lefties or David Ferrer with Nicolas Almagro. But most of the time, those edges are reflected in the rankings–even if the rankings don’t explicitly set out to incorporate such things.

Next time Kevin Anderson draws Berdych, he should take heart. His odds of beating the Czech aren’t much different from those of any other man ranked around #20 facing someone in the bottom half of the top ten. Even accounting for the slight effect I’ve observed in undefeated head-to-heads, a lopsided one-on-one record isn’t fate.

Filed under Forecasting, Head-to-Heads, Research

2 responses to “The Limited Value of Head-to-Head Records”

  1. Amir

    H2H does have a predictive value (even when players played one match before), but it is not as strong as people think.
    If you take 2 players with the same rankings and 1 match between them, when the higher ranked player has won, he will have a higher chance of winning the second, but not by much. It may be interesting to test this to see the exact numbers...

  2. josh

    H2H can be misleading if matches are spread over a long period of time between players whose “prime” periods didn’t overlap or if both players’ rankings are very different each time they play. A low number of H2H matches is also fairly meaningless if the number of wins is about evenly spread between the two. I think it’s hard to put any value in an H2H where the total number of matches played was 5 or fewer unless it was a lopsided 5-0 or 4-1. Does Patrick Rafter’s 3-0 record against Federer mean anything?

    The surface also comes into play as well since some players prefer certain surfaces. I would agree that a lot of H2H are “skewed” or have “noise” and it might be difficult to find “unskewed” H2H that aren’t influenced by these factors.

    I think one way to determine how “skewed” an H2H record is would be to break down the H2H by limiting it to a certain time period, specific surfaces, finals matches only, slam matches only, etc.

    An interesting metric could be the “hypothetical” H2H. Include all the actual H2H matches but also include tournaments which were won by either player but where they did not meet each other. This would be interesting to do with the Nadal – Federer rivalry since many feel that the record would be more “even” if Nadal had actually made it far enough to play Federer in many of the hard court and grass court tournaments that Federer won.

    I don’t think the “greatest” player has to have a winning H2H against everybody but they should have a “competitive” record against top ranked competition. Every player has a weakness of some sort and certain play-styles they have a hard time with but great players improve their weaknesses or compensate for them. I do feel that players who are deemed the “greatest” players should not have overwhelmingly lopsided H2H loss records to their main rivals.
