Category Archives: Harebrained schemes

The Untapped Potential of Umpire Scorecard Data

There’s a lot more that can be done with tennis data. Everyone knows this. Even the ATP and WTA tours–along with their rather prominent partners–know this.

Both tours are sitting on a mountain of information that they’ve barely exploited: umpire scorecard data. It’s not cutting edge–there are no cameras, no courtside loggers counting unforced errors and winners. It’s just a log of every point, along with first or second serve, aces, and double faults. Despite those limits, there are many untapped advantages.

First: There’s an umpire scorecard for every match. Not every match on a TV court, not every match on a Hawkeye court, not every main-draw match. Every match. If an ATP, WTA, or ITF umpire is officiating the match, to the best of my knowledge, there is a scorecard: when you see a chair umpire tap on a screen, this is what they’re recording. That means data on thousands of matches and players every year, from Novak Djokovic to Djordje Djokovic.

It’s tough to overstate how valuable that is. The main drawback of most tennis stats is context. For instance, when Hawkeye puts a graphic on your TV screen, it’s often based on data from a single match or the present tournament. IBM’s much-publicized analytics are based on Grand Slam matches only. Umpire scorecards have no such problem.

Second, there’s a ton of information lurking in this low-tech tracking system. The basics of first and second serves, aces, and double faults may not sound like much, but as we’ll see below, they open the door to a huge array of stats. ATP and WTA “Match Stats” are compiled from these scorecards, but they only scratch the surface.

How to do more with scorecards

In a minute, I’ll make specific suggestions for additional totals and rates that the tours could compile from the data they already have. Before that, let me explain why simply expanding the contents of “Match Stats” should be Plan B.

More and more journalism is data-based, and more and more avid fans are, to some extent or other, analyzing tennis for publication. In other words, there is a rapidly growing base of analysts who don’t need data pre-packaged for them. Every match is different, and the numbers needed to illustrate any match report are different as well. For broader analysis, like comparing players over the course of a season, customized data matters even more.

So: Release the point-by-point data from the scorecards.

Another benefit of the simplicity of umpire scorecard data is that more analysts can easily manage it. No organization could foresee everything that might be interesting about a match, so why even try? Not every journalist will want to dig into a point-by-point spreadsheet to see how often Julien Benneteau missed his first serve of a game, or how Rafael Nadal responded every time he fell behind 0-30. But some will do just that. When they do, their work benefits, their readers have more ways to engage with the next match they watch, and the sport ultimately wins.
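To make the idea concrete, here is a minimal Python sketch of what an analyst might do with such a file. It assumes a hypothetical export in which each row records the server, the game number, the server-first score before the point, whether the first serve went in, and who won the point. The tours publish no such format; the field names, helper functions, and the little match excerpt are all invented for illustration. The sketch answers the two kinds of questions mentioned above: how often a player missed his first serve on the opening point of his service games, and how often he held after falling behind 0-30.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Point:
    """One row of a hypothetical point-by-point scorecard export."""
    server: str
    game: int             # game number within the match
    score: str            # server's score first, before the point is played
    first_serve_in: bool
    server_won: bool


def service_games(points, player):
    """Group the player's service points by game, keeping them in order."""
    games = defaultdict(list)
    for p in points:
        if p.server == player:
            games[p.game].append(p)
    return games


def opening_first_serve_miss_rate(points, player):
    """Share of service games in which the first first serve of the game missed."""
    games = service_games(points, player)
    if not games:
        return None
    misses = sum(not pts[0].first_serve_in for pts in games.values())
    return misses / len(games)


def holds_after_reaching(points, player, score="0-30"):
    """(holds, games) over the player's service games that passed through `score`."""
    games = service_games(points, player)
    reached = [pts for pts in games.values() if any(p.score == score for p in pts)]
    # Whoever wins the last point of a game wins the game.
    holds = sum(pts[-1].server_won for pts in reached)
    return holds, len(reached)


def rows(server, game, data):
    """Build Points from (score, first_serve_in, server_won) tuples."""
    return [Point(server, game, s, f, w) for s, f, w in data]


# Invented excerpt of a match between players "A" and "B" (illustrative only).
match = (
    rows("A", 1, [("0-0", False, False), ("0-15", True, False), ("0-30", True, True),
                  ("15-30", True, True), ("30-30", True, True), ("40-30", True, True)])
    + rows("B", 2, [("0-0", True, True), ("15-0", True, True),
                    ("30-0", True, True), ("40-0", True, True)])
    + rows("A", 3, [("0-0", True, True), ("15-0", False, False), ("15-15", True, False),
                    ("15-30", True, False), ("15-40", True, False)])
    + rows("A", 5, [("0-0", True, False), ("0-15", True, False), ("0-30", True, True),
                    ("15-30", True, False), ("15-40", True, False)])
)

print(opening_first_serve_miss_rate(match, "A"))  # missed on the opening point in 1 of 3 games
print(holds_after_reaching(match, "A"))           # (1, 2): held 1 of 2 games that reached 0-30
```

Grouping by game first keeps each question to a line or two of filtering, which is the point: once the raw points are available, nobody has to wait for an official stat to be added.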

A not-so-brief wish list

I have a sneaking feeling that no one’s going to release point-by-point data for every ATP or WTA match. I hope that’s not the case, but if it is, the tours should still consider vastly expanding the stats they compile for each match–including past matches for as far back as their databases go.

  1. Deuce/ad comparisons. Some players serve much more effectively in one court than in the other. For all deuce-court service points, I would like: (a) total points, (b) aces, (c) double faults, and (d) first serves in. Same for ad-court service points.
  2. Break point stats. Same as the above, for each server when facing break point: (a) aces, (b) double faults, and (c) first serves in.
  3. Break point games. In how many games did each player earn a break point?
  4. Stats for other important point scores. Break points are key, but other scenarios are important as well. If I have to pick only a few, let’s start with 0-30, 15-30, deuce, and ad-in (including 40-30). For all service points at each of those scores, I’d like (a) total points, (b) aces, (c) double faults, and (d) first serves in.
  5. Set points and match points: Same as above. Fans love match point stats.
  6. The game sequence. At what points did breaks of serve occur? This would allow us to answer many oft-posed questions: Do players hold serve more early in sets? Do breaks of serve more frequently follow breaks than holds? (And if so, how much more often?) Are players more likely to drop serve immediately after winning a tight set?
  7. Set-by-set breakdowns of all stats that are currently kept, plus all of the above. The live scoring app separates stats by set, but there is no official archive with set-by-set breakdowns. This is particularly key for journalists attempting to tell the story of a match, when a small change in approach can turn the tide.
  8. Tiebreak breakdowns. Tiebreaks–especially long ones–have a life of their own, and analysts should be able to see all of the same stats for each tiebreak as for each set as a whole. For example, it would be interesting to see if a player’s ace or double fault rates (or even his or her first-serve percentage) changed between the first twelve games of a set and the breaker.
  9. A list of the score when each double fault occurred. (Aces would be nice, too.) Especially in men’s tennis, DFs are quite rare, and they often loom large in match narratives.
  10. Longest streaks for each player: consecutive aces, consecutive double faults, consecutive points won on serve, consecutive points won overall, and the score at the beginning and end of each of those streaks.
  11. For doubles matches, a separation of all of the above service stats by server. For the Samuel Groth/Leander Paes partnership, aggregate serve stats (as they are presented now) aren’t going to tell you anything useful about either player’s performance at the line.
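Most of the items above reduce to filtering service points by score and counting. As a rough illustration, the Python sketch below computes items 1 and 2 and the serve-points part of item 10. It assumes a hypothetical point record that stores the game score as two integers (points won so far by server and by returner) rather than as "30-15"; that representation, the field names, and the sample points are invented. The court is inferred from the number of points already played in the game (even means deuce court), and a break point is any point on which the returner can win the game, which applies to regular games only, not tiebreaks.

```python
from dataclasses import dataclass


@dataclass
class Point:
    """Hypothetical scorecard row; the game score is stored as two integers."""
    server: str
    server_pts: int       # points the server has won in this game before this point
    returner_pts: int     # points the returner has won in this game before this point
    first_in: bool
    server_won: bool
    ace: bool = False
    double_fault: bool = False


def tally(points):
    """The counts requested in items 1 and 2."""
    return {
        "total": len(points),
        "aces": sum(p.ace for p in points),
        "double_faults": sum(p.double_fault for p in points),
        "first_in": sum(p.first_in for p in points),
    }


def is_deuce_court(p):
    # An even number of points already played in the game means the deuce court.
    return (p.server_pts + p.returner_pts) % 2 == 0


def is_break_point(p):
    # Regular games only: the returner takes the game by winning this point.
    return p.returner_pts >= 3 and p.returner_pts > p.server_pts


def deuce_ad_split(points, player):
    mine = [p for p in points if p.server == player]
    return {
        "deuce": tally([p for p in mine if is_deuce_court(p)]),
        "ad": tally([p for p in mine if not is_deuce_court(p)]),
    }


def break_points_faced(points, player):
    return tally([p for p in points if p.server == player and is_break_point(p)])


def longest_serve_streak(points, player):
    """Most consecutive service points won, skipping points on the opponent's serve."""
    best = current = 0
    for p in points:
        if p.server != player:
            continue
        current = current + 1 if p.server_won else 0
        best = max(best, current)
    return best


# Invented excerpt: A holds, two points of a B service game, then A is broken.
sample = [
    Point("A", 0, 0, True, True, ace=True),
    Point("A", 1, 0, False, False, double_fault=True),
    Point("A", 1, 1, True, True),
    Point("A", 2, 1, False, True),
    Point("A", 3, 1, True, True, ace=True),
    Point("B", 0, 0, True, True),
    Point("B", 1, 0, True, True),
    Point("A", 0, 0, True, True),
    Point("A", 1, 0, False, False),
    Point("A", 1, 1, True, False),
    Point("A", 1, 2, False, False),
    Point("A", 1, 3, True, False),
]
print(deuce_ad_split(sample, "A"))
print(break_points_faced(sample, "A"))
print(longest_serve_streak(sample, "A"))  # 4
```

Recording the score as integers rather than strings makes the court and break-point tests simple arithmetic; a real export would need a small conversion step first.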

To reiterate, all of this stuff is in the scorecards. Most of the above are no more difficult to compile than the Match Stats that the tours already publish.

If the tours added everything on my list, that would be one big step out of the dark ages for tennis. Certainly, tennis writers would be able to file more intelligent stories and fans would have a much better way to experience the performances of their favorite players.

If the tours published current and archived raw point-by-point data, tennis would go one better: it would become an example for many other individual sports to follow. We would see a boom in fan engagement, as every follower of the sport would have the opportunity to learn much more about tennis and relive matches, whether from last week or late last century, in detail.

We’re not talking about a multi-million dollar infrastructure investment. To achieve all this, the tours need only do a little bit more with what they already have.

 


Filed under Harebrained schemes

Three Simple Ways to Improve the ATP Ranking System

Rafael Nadal’s two-year ranking system would favor a few veterans at the expense of everyone else. My algorithm is too complex for players and fans to use on a weekly basis. But there is always an undercurrent of dissatisfaction over the current system.

The rankings serve two main purposes, each of which we must keep in mind as we think through a better system:

  • Entertainment. The fans want to know who’s number one.  No system will ever be perfect, but if the ranking system told us that Nadal outranked Djokovic despite losing to him several times in a row, the system would lose credibility.
  • Tournament entry. Rankings determine who gets direct entry into tournaments.  A biased ranking system would keep stronger players out of tournaments while letting in lesser players.

A system that is good for one of these purposes is generally good for the other. In an ideal world, the rankings would show us who is playing the best right now, carefully defining “right now” to avoid an unnecessary focus on current hot streaks. Another way to look at it is that the rankings should be as predictive as possible. If underdogs are constantly winning, that doesn’t mean tennis is a sport full of triumphant underdogs; it means we’re ranking players incorrectly!
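One simple way to put that test into numbers, offered here only as a hypothetical yardstick and not as a metric the ATP publishes, is the share of completed matches won by the better-ranked player. The Python sketch below assumes each match is recorded as a (winner_rank, loser_rank) pair; run two candidate ranking systems over the same matches, and the one whose favorites win more often is the more predictive.

```python
def favorite_win_rate(matches):
    """Share of matches won by the better-ranked (lower-numbered) player.

    `matches` is an iterable of (winner_rank, loser_rank) pairs; pairs with
    equal ranks are ignored. Returns None if no match can be scored.
    """
    scored = [(w, l) for w, l in matches if w != l]
    if not scored:
        return None
    return sum(w < l for w, l in scored) / len(scored)


# Invented results: the #2 player lost to #3 once, every other favorite won.
sample = [(1, 5), (3, 2), (10, 40), (7, 8)]
print(favorite_win_rate(sample))  # 0.75
```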

The current system isn’t that bad.  There are three main problems, however:

  1. Last week is equal to last year.  The winner in Miami this week will gain 1000 points.  Those 1000 points will be counted in his ranking next week, in six months, and in 51 weeks. In 53 weeks, though, he’ll have zero points.  If we’re trying to measure how good he is, a tournament 51 weeks ago isn’t nearly as informative as his tournament last week.  And if we insist on using his result from 51 weeks ago, why not his result from 53 weeks ago?
  2. Surfaces are interchangeable.  Milos Raonic won a slew of matches on indoor courts last spring, which earned him a seed at the French Open.  Now, I love Milos, but did he really deserve a seed at the French, despite virtually no professional experience on clay?  Performance on one surface translates to other surfaces to some extent, but (obviously!) all surfaces are not created equal.
  3. All opponents are equal. In the Miami third round, Andy Roddick beat Roger Federer … then lost. He’ll get 90 points. Kei Nishikori beat Lukas Rosol … then lost. He’ll get the same 90 points. These sorts of differences sometimes even out over time, but must we trust that they will? Roddick’s achievement this week is much more impressive than Nishikori’s, and should be treated as such.

We can fix all of these problems with simple arithmetic, making tweaks to the system that any player or fan can understand.

In these solutions, the exact details don’t matter.  The most important thing is simply to acknowledge that not all matches are equal.

  1. Last week is worth more than last year. In my system, last week is worth a little bit more than the week before, which is worth a little bit more than the week before that, and so on. Here’s a simple way to incorporate that into the ATP system: After four months, tournaments are worth only 80% of their original points. After eight months, tournaments are worth only 60% of their original points. That way, the drop-off is more gradual, and Indian Wells is worth more than, say, the 2011 Rome Masters. If Nadal still wants two years, this can easily be extended to cover two years of results: after a year, 45%; after 16 months, 30%; after 20 months, 15%. Now everybody’s happy!
  2. Separate surfaces, separate rankings.  There will always–and should always–be a single most important ranking list, encompassing all surfaces.  But for tournament entry, why not do better?  For example, create a clay list by doubling the point value of all clay tournaments and leaving the others alone.  David Ferrer and Carlos Berlocq will rise; John Isner and Kevin Anderson will fall.  Any tennis fan knows this happens, so tournaments should determine entry this way, as well.  After all, Wimbledon has long used this sort of approach for seeding, if not for direct entry.
  3. Bonus points for beating top players.  The WTA used to do this, and it’s the least straightforward of my suggestions.  It’s so important, though, that a little complexity is worth a lot.  Let’s say 100 points for a win over anyone in the top 3; 75 points for beating anyone ranked 4, 5, or 6; 50 points for a win over anyone else in the top 10, 30 points for beating anyone ranked 11-15, and 10 points for a win over anyone ranked 16-20.  Mega-upsets like those scored lately by Isner, Roddick, and Grigor Dimitrov tell us something important, and the rankings should listen.
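The arithmetic really is calculator-level. The Python sketch below applies all three tweaks to a player's results. The month thresholds, percentages, and bonus bands are the ones suggested above; everything else is an assumption made for illustration: each result is tagged with its age in months, the win bonus is added to the tournament's points before any surface doubling or aging, and the sample results are invented.

```python
from dataclasses import dataclass, field


def age_percent(months_old, two_year=False):
    """Percent of a result's original points that still counts, per the schedule above."""
    if months_old < 4:
        return 100
    if months_old < 8:
        return 80
    if months_old < 12:
        return 60
    if not two_year:
        return 0  # one-year version: results still expire after 12 months
    if months_old < 16:
        return 45
    if months_old < 20:
        return 30
    if months_old < 24:
        return 15
    return 0


def win_bonus(opponent_rank):
    """Bonus points for beating a top-20 player, using the bands suggested above."""
    for worst_rank, bonus in ((3, 100), (6, 75), (10, 50), (15, 30), (20, 10)):
        if opponent_rank <= worst_rank:
            return bonus
    return 0


@dataclass
class Result:
    points: int                 # points under the current system
    months_old: float
    surface: str                # e.g. "clay", "hard", "grass"
    beaten_ranks: list = field(default_factory=list)  # ranks of opponents beaten


def ranking_total(results, surface=None, two_year=False):
    """Overall total, or a surface list when `surface` is given (that surface's points doubled)."""
    total = 0.0
    for r in results:
        base = r.points + sum(win_bonus(k) for k in r.beaten_ranks)
        if surface is not None and r.surface == surface:
            base *= 2  # assumption: the bonus is doubled along with the base points
        total += base * age_percent(r.months_old, two_year) / 100
    return total


# Invented results for one player.
results = [
    Result(1000, 1, "hard", beaten_ranks=[2]),  # recent title, including a win over #2
    Result(360, 6, "clay", beaten_ranks=[5]),   # six months old, including a win over #5
    Result(500, 13, "hard"),                    # 13 months old
]
print(ranking_total(results))                  # 1448.0
print(ranking_total(results, surface="clay"))  # 1796.0
print(ranking_total(results, two_year=True))   # 1673.0
```

Working in whole percentages keeps every step something a player could check by hand, which is the main constraint any official version would have to meet.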

This is all stuff you can do on a calculator: nothing is more complex than the rules governing protected rankings or zero-pointers. Young players will see their rankings rise more quickly once they begin beating the top guys. All players will get into tournaments (and earn seeds) on surfaces where they have had more success. And the fans will have a more accurate ranking system both to rely upon and to fuel arguments about which players are really better.


Filed under Harebrained schemes, Rankings

The Fatal Flaw of Nadal’s Two-Year Ranking System

Now that Rafael Nadal has resigned from the ATP player council–apparently because no one took his two-year ranking plan seriously–we’re likely to hear a bit more about this alternate approach.

Presumably, Nadal’s method would count the last 104 weeks (two years) of results instead of the last 52, as is currently the case.  As far as I know, he isn’t pushing for any other adjustments.  As long as that is the case, the rest of the council (and the ATP in general) is right to ignore Nadal’s plan: It would do significant damage to the sport, without much in the way of benefits.  It would drastically slow the rise of young players, but change little for guys at the top.

Ultimately, the question is over the purpose of the ranking system.  If it is to reward past performance, a two-year ranking system may be appropriate.  If it is to rank competitors by their current level of play, treating a tournament 22 months ago the same as last week’s tournament is flat-out wrong.

Consider what the present ranking system tells us. By equally weighting tournaments over the last 52 weeks (with more points for more important events, of course), a player’s ranking is the average of how good he has been over the last 52 weeks. If his level has been changing at a roughly steady rate, that average is an approximation of how good he was 26 weeks ago, the midpoint of the window. For most players, this is a decent estimate of how good they are right now. If we go to a two-year system, the rankings would be an estimate of how good players were one full year ago. Yikes.
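The midpoint claim follows from a simple model. Assume, purely for illustration, that a player's level s changes at a steady rate b per week, with t = 0 meaning today and negative t counting weeks into the past, and that every week in a ranking window of T weeks counts equally. The equal-weighted average over the window is then:

```latex
s(t) = s_0 + b\,t, \qquad
\bar{s} = \frac{1}{T}\int_{-T}^{0} \left(s_0 + b\,t\right) dt
        = s_0 - \frac{bT}{2}
        = s\!\left(-\tfrac{T}{2}\right)
```

With T = 52, the ranking reflects the player's level at t = -26 weeks; with Nadal's T = 104, at t = -52 weeks. Real players don't improve or decline in a straight line, and points come from tournaments rather than individual weeks, so this is a rough guide rather than an exact result.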

The most obvious casualties of such a system are young players (or any players, really) on the way up.  Even with the current system, the rankings take some time to catch up with a rising star like Bernard Tomic or Milos Raonic.  When Raonic had his great run in early 2011, the rankings were still counting a bunch of challenger results from one year earlier.  In a two-year system, Raonic’s more recent results would count for even less.  It would take twice as long for such a player to establish himself.

The clear beneficiaries, of course, are the opposite type of competitor: established players who are declining or injured.  If a player is consistently good, it really doesn’t matter how the ranking system is calculated–just about any way you slice it, Djokovic, Nadal, Federer, and Murray would be the top four.  But the players who benefit are the ones who posted good results between 52 and 104 weeks ago, and haven’t done nearly as well since.  Right now, that means injured players like Robin Soderling, and declining players like Andy Roddick and Fernando Verdasco.

Should Roddick and Verdasco continue to be rewarded for their play in 2010?  To me, anyway, the answer is a clear “no.”  Even with Roddick’s sharp decline, he will probably still earn a seed for the French Open.  Does he deserve more than that?

But what about Soderling?  He hasn’t played since June, and his ranking has fallen to #30.  Unless he returns in the next three months, he’ll fall off the list altogether.  If there is a case for Nadal’s system, this is it.  But the ATP already has two methods in place to protect players like Soderling: protected rankings (PR) and wild cards.  Players injured for a certain length of time are able to use a PR (equal to their ranking when they last played) for entry to a set number of tournaments.  Until recently, Tommy Haas was still using a PR of 20.  Soderling would have a PR that would get him into enough tournaments to rebuild his ranking, assuming he comes back with any semblance of his previous form.

Of course, there’s also the wild card.  When Soderling returns, even if he is unranked, every 250- and 500-level tournament would hand him a wild card without a second thought.  This makes PRs even more valuable than the ATP intended them to be: Haas, for example, has been able to use his PR of 20 for so long because many tournaments gave him wild cards.  He could save the PR for when he needed it.

The only disadvantage to PRs and WCs is that these players aren’t seeded. But really, after sitting out for a year, does a player deserve safe passage to the third round? I find it hard to believe that they do. And if this is really such an important issue, perhaps a player such as Soderling could be granted the lowest seed (e.g., the 32nd seed at Indian Wells, Miami, or a Slam) two of the times he uses his protected ranking.

To recap: A simple two-year system would retard the rise of young players, forcing them to prove themselves for twice as long as is currently the case.  It wouldn’t affect consistently good players.  It would help players on the decline who probably don’t deserve help.  And top players returning from injury have little problem entering tournaments; Nadal’s approach would just get them seeds.

But Jeff, doesn’t your ranking system use two years of results?

Yes, I was getting to that.  It’s crucial to distinguish between using two years of results (acceptable) and weighting all results equally (unacceptable).

The biggest problem with the ATP ranking system as it stands (and it would be an even bigger problem with a two-year system like Nadal’s) is that it treats long-ago tournaments as equal to yesterday’s tournaments. The winner of the 2012 Indian Wells event has 1000 points on his ranking. The winner of the 2011 Miami event has 1000 points on his ranking. The winner of the 2011 Indian Wells event has … zero points on his ranking.

How a player performed 18 or 20 months ago has some predictive value, but not nearly as much as his more recent performances. In slight support of Nadal’s case, older results are particularly informative for players returning from injury. My system never removed Juan Martin del Potro from the top 10 or so; using a one-year system, the ATP rankings saw him drop far out of the top 100.

If you are to use two years of results, it is absolutely imperative to differentiate between recent results and older results.  In fact, even a simple approach of this sort would improve the current 52-week system.  My algorithm weights results one year ago about half as heavily as last week’s, and two years ago roughly one-quarter as heavily.  The weighting is not simple, and thus would be inappropriate for the ATP system, which must be easily understood by both players and fans, but it points the way toward simpler solutions that might work.
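To show the difference in shape, here is a hedged Python sketch comparing the current step weighting with a smooth exponential decay. The decay uses a one-year half-life, chosen only because it reproduces the two figures mentioned above (half weight at one year, one-quarter at two years); it is a stand-in for illustration, not a reproduction of my algorithm's weighting, which, as noted, is not this simple. The function names are hypothetical.

```python
def step_weight(weeks_ago, window=52):
    """Current-style weighting: full value inside the window, nothing after it."""
    return 1.0 if weeks_ago < window else 0.0


def decay_weight(weeks_ago, half_life=52):
    """Illustrative exponential weighting: a result loses half its value every `half_life` weeks."""
    return 0.5 ** (weeks_ago / half_life)


def weighted_points(results, weight):
    """Sum of points * weight(age) over (points, weeks_ago) pairs."""
    return sum(points * weight(weeks_ago) for points, weeks_ago in results)


# Three 1000-point titles won 1, 51, and 53 weeks ago (invented).
titles = [(1000, 1), (1000, 51), (1000, 53)]
print(weighted_points(titles, step_weight))   # 2000.0: the 53-week-old title vanishes
print(weighted_points(titles, decay_weight))  # the 51- and 53-week titles count almost equally
print(decay_weight(52), decay_weight(104))    # 0.5 0.25
```

Under the step rule, two weeks of age separate full credit from none; under the decay, they shift a result's weight by only about one percentage point.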

That’s enough for today.  Check back tomorrow, when I’ll go into more depth about how the current ranking system can be improved.


Filed under Harebrained schemes, Rafael Nadal, Rankings

Better Players in Smaller Tournaments

Last week, Jurgen Melzer entered the qualifying draw of the ATP Zagreb Indoor event. Melzer is ranked about #40 in the world; every entrant ranked #116 or better earned direct entry into the main draw. Melzer decided long after the entry deadline that he wanted some matches in advance of this week’s Davis Cup, so he took the only route remaining open: qualies.

This precise scenario is not a common one.  Because tournament entries must be submitted so early, top players err on the side of entering too many.  If they ultimately decide not to play, there’s usually a convenient injury and an apologetic withdrawal.  When top players do make last-minute decisions, like Melzer did, tournament organizers often have a wild card to spare, giving the star direct entry.

It’s tempting to say that there’s a problem with the early deadlines for tournament entries.  Surely, if players didn’t have to decide so early, they might choose to enter more 250s and 500s.  But the early deadlines are there for a reason.  Not only do they allow players and their entourages to make travel arrangements, but they also lock players in so that tournaments can advertise their lineups.

The problem may not be with early deadlines, but we do have a suboptimal arrangement. Players enter tournaments they may not play (and tournaments advertise players who won’t show up), players don’t enter tournaments they may want to play (and events can’t advertise those players), and tournaments have less direct control over their field than they would prefer. 32-draw events only get three wild cards, and they want more.

Here’s my solution: Every withdrawal turns into an additional wild card.

Almost every tournament sees a player or two withdraw after the entry deadline but long before the start of qualifying.  Currently, those openings go to the highest-ranked entrant not yet in the main draw.  It’s not uncommon to see a half-dozen alternates in a main draw, sometimes including guys far down the list, after other alternates have opted for challengers or other ATP events.

Here are some benefits of my proposal:

  • Most obviously, tournaments have more control over their draws.  Rather than admitting a handful of players ranked between #100 and #120, they can add the top-tenner who lost his first-round match last week.   Or a local hero who just won a challenger.
  • More importantly, fans get (probably) better and (definitely) more crowd-pleasing players.  The best players (regardless of box-office value) are still invited to enter, and tournament directors have more leeway to give the fans what they want.
  • Players have less reason to enter events they may not play.  (Of course, this could become something of a vicious cycle–fewer entries lead to fewer withdrawals, which leads to fewer additional wild cards … which could result in more of these entries.)
  • Players can get into events at the last minute. Melzer would get his Davis Cup warmup without having to go through qualifying.

There are a few potential drawbacks:

  • Fewer opportunities for journeyman pros.  Under my plan, Melzer would’ve booted Grega Zemlja, a guy to whom the tennis establishment hasn’t exactly granted many favors.  Then again, Zemlja isn’t likely to do much for the tennis establishment, either.
  • Tournaments could use the extra wild cards to weaken a draw with low-ranked locals. A tournament director who wanted to do some favors could easily turn Delray Beach into a clone of the Dallas Challenger. To avoid that, the rule could be supplemented by stipulating that only one of the additional wild cards could be used on a player outside the top 200. Any number of variations would maintain the quality of the draw.
  • It’s conceivable that tournaments could pressure players to withdraw, making room for a box-office draw.  That’s an ugly situation to imagine, and an appropriately stringent policy would need to be put in place to prevent it.
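For concreteness, here is a minimal Python sketch of the withdrawal-to-wild-card rule, including the top-200 safeguard from the list above. Everything in it is hypothetical: the function name, the idea of the director supplying an ordered wish list, the convention that an unranked player is given a very large rank number, and the invented players. Any opening the director does not fill falls back to the current rule and goes to the highest-ranked alternate.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    rank: int  # assumption: unranked players get a very large number


def fill_openings(openings, director_picks, alternates, max_outside_top200=1):
    """Hypothetical rule: every late withdrawal becomes an extra wild card.

    The tournament director's picks are used in order, but no more than
    `max_outside_top200` of these extra wild cards may go to players ranked
    below #200. Any opening left over goes to the highest-ranked alternate,
    as it does today.
    """
    picks = list(director_picks)
    alts = sorted(alternates, key=lambda p: p.rank)
    filled = []
    low_ranked_used = 0
    for _ in range(openings):
        choice = None
        while picks and choice is None:
            candidate = picks.pop(0)
            if candidate.rank > 200:
                if low_ranked_used >= max_outside_top200:
                    continue  # cap reached, so this pick is skipped
                low_ranked_used += 1
            choice = candidate
        if choice is None and alts:
            choice = alts.pop(0)  # fallback: the current alternate rule
        if choice is not None:
            filled.append(choice)
            alts = [a for a in alts if a != choice]  # never admit the same player twice
    return filled


# Invented example: three withdrawals, two low-ranked locals on the director's wish list.
picks = [Player("Veteran", 40), Player("Local A", 350), Player("Local B", 410)]
alternates = [Player("Alt 2", 120), Player("Alt 1", 116)]
print([p.name for p in fill_openings(3, picks, alternates)])
# ['Veteran', 'Local A', 'Alt 1']
```

The cap is what keeps the rule from turning an ATP 250 into a challenger: the director gets the veteran and one local favor, and the third opening still goes to a journeyman on ranking.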

The only clear losers here are journeyman pros–the guys who hang around on the fringes of the main draws but would not regularly receive wild cards as compensation.  As much as I like those guys, their occasional entry as an alternate into an ATP 250 main draw is a sacrifice I would be willing to make.

The potential benefits are simply too great.  More good players–and by extension, more good matches–in more tournaments? It is almost too easy.


Filed under Harebrained schemes