May 21 2014 10:09AM
I was going to write a post explaining the issues and inconsistencies with that article that everybody is talking about, but Tyler Dellow has already done an excellent job and it might be redundant. I think the biggest point of contention for me is that a particular writer at the Toronto Sun has latched onto the idea that Corsi is a terrible stat because it says Jay McClement is the worst player in the league.
And I'd like to clear up a misconception or two.
There's a difference between a "record" and a "rating". A "record" is an objective agent seeing an event unfold in front of him or her, and noting what happened. The agent records things like goals, shots, hits, faceoff wins, faceoff losses, giveaways, takeaways, or anything else you might see in your Event Summary at NHL.com.
By contrast, a "rating" is a tool used to rank certain players. (A rate, which is more mathematical in nature, differs from a "rating" too, and we'll see that later on in the post.)
Bill James, when working in the early days of baseball analytics, liked to differentiate between the two. A player's OPS isn't a rating of the best hitters in baseball: it is a player's on-base percentage plus his slugging percentage. Both the on-base and slugging numbers are objective records. Nobody is going to FanGraphs, looking up the player with the best OPS the previous season and saying "now that hitter is the best hitter in baseball, without argument". They are saying "that hitter is the hitter with the highest OPS last season".
When discussing Corsi, it's important to stress that Corsi is a record of events, and not a rating of the best players. If you go to ExtraSkater.com and sort players by Corsi For % in ascending order, you'll note that Jay McClement is the third "worst" player in the league in terms of Corsi.
What that means:
Jay McClement was the third most out-shot player in the NHL.
It does NOT mean:
Jay McClement is the third worst player in the NHL.
It really is important to understand what is a record and what is a rating, and the usefulness of each. I really have little use for ratings. You could look at hundreds of different records of information and it's up to the user, really, to determine which is the most important.
I have not seen a single person flip the chart around, note that Patrice Bergeron is the player in the NHL with the highest Corsi For rate (on-ice shot attempts for divided by [on-ice shot attempts for plus on-ice shot attempts against]) and conclusively say "Patrice Bergeron is the best player in the NHL".
The high Corsi rate, however, can give us clues into how good the Bruins are as a team, Bergeron's influence over his line, and, noting that Boston has won a lot of games in the last few seasons, how valuable Bergeron is as a player. I don't think Patrice Bergeron is the best player in hockey. Sidney Crosby is. Second might be Bergeron or Anze Kopitar. All three players show an affinity for offence despite playing very tough shutdown roles and, since they all play for excellent teams (Jonathan Toews is in the conversation as well) it can be argued from the information that a high-quality No. 1 centre is key on an NHL team.
Corsi is a team rate*, as in, when Bergeron is on the ice against McClement, and the Bruins take three shots during the shift, all five Bruins get three plusses and all five Maple Leafs get three minuses. Is that fair? No, but again, we're talking about records here. Once we have enough information at our disposal, we can begin to look and see what combinations are successful and which players lead to more success with which players, all while trying to keep in context a player's defensive role or the number of times he starts a shift at either end of the ice. There are dozens of other factors you could consider.
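The bookkeeping in that Bergeron and McClement shift can be sketched in a few lines of Python. This is only an illustration of the idea, not how the NHL or any stats site stores its data: the `tally_shot_attempts` function, the event format, and the placeholder skater labels like "BOS-2" are all made up for the example.

```python
from collections import defaultdict

def tally_shot_attempts(events):
    """Credit every skater on the ice for each shot attempt.

    `events` is a list of (attacking_skaters, defending_skaters) pairs,
    one pair per shot attempt. This format is an assumption of the sketch.
    """
    totals = defaultdict(lambda: {"for": 0, "against": 0})
    for attacking_skaters, defending_skaters in events:
        for skater in attacking_skaters:
            totals[skater]["for"] += 1       # a "plus" for everyone attacking
        for skater in defending_skaters:
            totals[skater]["against"] += 1   # a "minus" for everyone defending
    return totals

# Placeholder line-ups: only Bergeron and McClement come from the post.
bruins = ["Bergeron", "BOS-2", "BOS-3", "BOS-4", "BOS-5"]
leafs = ["McClement", "TOR-2", "TOR-3", "TOR-4", "TOR-5"]

# Three Bruins shot attempts during one shift.
shift = [(bruins, leafs)] * 3
totals = tally_shot_attempts(shift)

print(totals["Bergeron"])   # {'for': 3, 'against': 0}
print(totals["McClement"])  # {'for': 0, 'against': 3}
```

Nothing in the tally says who took the shot or who was at fault for it. It is a record of what happened while ten skaters were on the ice, which is the point.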
Corsi itself isn't perfect. It will take a while for a player's Goals For % (which is ultimately what we strive for in hockey: finding players who can help a team score more goals than the other team) to catch up with his Corsi For %, because hockey is replete with randomness. Generally, a good possession rate from a player will turn into a good goal rate (this is an excellent post with a couple of graphs you should check out).
As for McClement, our eyes tell us he is better than the third-worst player in the league because he is. When you watch McClement, you see a serviceable defensive specialist who was highly regarded by the advanced stats community a few years ago.
McClement, like many Leafs, has been woefully misused by Randy Carlyle. Last season, McClement was barely even allowed to skate the puck in over the line. He had to start a huge share of his shifts in the defensive zone. He is, quite clearly, a better hockey player than he's been allowed to be under Carlyle, and the result is that McClement is the third most out-shot player in the league. That is a record. Statistical agents recorded 917 shot attempts by the opposition when he was on the ice, and just 578 for the Leafs. They recorded 30 goals against and 17 goals for (his Goals For % was very close to his Corsi For %).
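For anyone who wants to check the arithmetic, here is a quick sketch in Python using only the four counts quoted above. The function name `for_percentage` is my own label, not a term from any stats site.

```python
def for_percentage(events_for, events_against):
    """Share of on-ice events that went the player's way, as a percentage."""
    return 100 * events_for / (events_for + events_against)

# McClement's on-ice totals as quoted in the post.
corsi_for_pct = for_percentage(578, 917)   # shot attempts for, against
goals_for_pct = for_percentage(17, 30)     # goals for, against

print(round(corsi_for_pct, 1))  # 38.7
print(round(goals_for_pct, 1))  # 36.2
```

Roughly 38.7% of the shot attempts and 36.2% of the goals went the Leafs' way with McClement on the ice, which is what "very close" means here.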
The thing to do with the information that McClement gets horrendously out-shot is not to conclude that McClement sucks and should not be re-signed. What I would do is get in a meeting with Carlyle, critically discuss the thought process when McClement is on the ice, and note that the Corsi % shows that whatever it is the Leafs are doing with McClement out there isn't working.
Why Corsi % and not Goals % if we're looking for players who can out-score other teams? Well, because Corsi is a better indicator of ability in a small sample. The trick isn't to look at the record of past goals, but to attempt to determine future goals. It's difficult, it's imprecise, but that's the nature of sports. You don't want the data to be too clean, because that would make it uninteresting. Dave Cameron had a post on FanGraphs the other day suggesting that using statistics means that the right stories will be told, and I couldn't agree more. Everybody who covers the game needs to step back and realize their intuitions can sometimes be incorrect. Humans are imperfect, and yet attempt to be perfect and create clean models in virtually every industry. Nobody is saying Corsi is perfect, but of all the statistics we have for attempting to predict future games, you could do a lot worse. Anybody who says they have a perfect model is eventually going to pay the piper, and anybody who attacks a model for being imperfect, when the model is built on the understanding that it is imperfect, is missing the point entirely.
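One rough way to see the small-sample point is to ask how much each percentage could wobble on luck alone. The sketch below uses McClement's counts from earlier and the textbook standard error of a proportion. That treats every shot attempt or goal as an independent coin flip, which is a simplifying assumption of mine and not a claim about how hockey actually works.

```python
import math

def share_standard_error(events_for, events_against):
    """Standard error of a 'for' share, assuming independent events."""
    n = events_for + events_against
    p = events_for / n
    return math.sqrt(p * (1 - p) / n)

# McClement's on-ice counts as quoted in the post.
se_corsi = share_standard_error(578, 917)   # 1,495 shot attempts
se_goals = share_standard_error(17, 30)     # 47 goals

print(f"Corsi For %: +/- {100 * se_corsi:.1f} points")
print(f"Goals For %: +/- {100 * se_goals:.1f} points")
```

Under that assumption the goal share carries an uncertainty of about seven percentage points against a little over one for the shot-attempt share, simply because there are so many more shot attempts than goals in a season.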
* - I originally typed "rating" there. "whoops" #irony
ASK ME ANYTHING
To clear up some of the misconceptions, I'll have no trouble answering emails and questions about statistics in hockey. Send me an email ( camcharron(at)gmail.com ) and I'll try to answer as many questions as I can for beginners, whether in written form or in video form.
I understand there's been a lot of information about the trendy numbers dropped on us by CBC and TSN, who are quoting Corsi numbers almost daily now, and the term "advanced" analytics can scare people off. There really is nothing to be scared of. The "advanced" analytics are refreshingly simple. I, myself, prefer very simple indicators about performance.
And for those who say "stats don't tell the whole story", I'd urge you to read this post over at Canucks Army. A writer there, Rhys, compared the selections made by the Vancouver Canucks scouting staff over an 11-year period to simply taking the available 17-year-old player with the highest point total in the Canadian major junior system. Stats don't tell the whole story, but the whole story also doesn't tell the whole story. There are still many stories ready to be written.