Ratings and records and misconceptions about analytics

Cam Charron
May 21 2014 10:09AM

I was going to write a post explaining the issues and inconsistencies with that article that everybody is talking about, but Tyler Dellow has already done an excellent job and it might be redundant. I think the biggest point of contention for me is that a particular writer at the Toronto Sun has latched onto the idea that Corsi is a terrible stat because it says Jay McClement is the worst player in the league.

And I'd like to clear up a misconception or two.

There's a difference between a "record" and a "rating". A "record" is what you get when an objective agent sees an event unfold in front of him or her and notes what happened. The agent records things like goals, shots, hits, faceoff wins, faceoff losses, giveaways, takeaways, or anything else you might see in your Event Summary at NHL.com.

By contrast, a "rating" is a tool used to rank certain players. (A rate, which is more mathematical in nature, differs from a "rating" too, and we'll see that later on in the post.)

Bill James, when working in the early days of baseball analytics, liked to differentiate between the two. A player's OPS isn't a rating of the best hitters in baseball: it is a player's on-base percentage plus his slugging percentage. Both the on-base and slugging numbers are objective records. Nobody is going to FanGraphs, looking up the player with the best OPS the previous season and saying "now that hitter is the best hitter in baseball, without argument". They are saying "that hitter is the hitter with the highest OPS last season".

When discussing Corsi, it's important to stress that Corsi is a record of events, and not a rating of the best players. If you go to ExtraSkater.com and sort players by Corsi For % in ascending order, you'll note that Jay McClement is the third "worst" player in the league in terms of Corsi.

What that means:

Jay McClement was the third most out-shot player in the NHL.

It does NOT mean:

Jay McClement is the third worst player in the NHL.

It really is important to understand what is a record and what is a rating, and the usefulness of each. I really have little use for ratings. You could look at hundreds of different records of information and it's up to the user, really, to determine which is the most important.

I have not seen a single person flip the chart around, note that Patrice Bergeron is the player in the NHL with the highest Corsi For rate (on-ice shot attempts for divided by [on-ice shot attempts for plus on-ice shot attempts against]), and conclusively say "Patrice Bergeron is the best player in the NHL".

The high Corsi rate, however, can give us clues into how good the Bruins are as a team, Bergeron's influence over his line, and, noting that Boston has won a lot of games in the last few seasons, how valuable Bergeron is as a player. I don't think Patrice Bergeron is the best player in hockey. Sidney Crosby is. Second might be Bergeron or Anze Kopitar. All three players show an affinity for offence despite playing very tough shutdown roles and, since they all play for excellent teams (Jonathan Toews is in the conversation as well), it can be argued from the information that a high-quality No. 1 centre is key on an NHL team.

Corsi is a team rating, er, rate*, as in, when Bergeron is on the ice against McClement, and the Bruins take three shots during the shift, all five Bruins get three plusses and all five Maple Leafs get three minuses. Is that fair? No, but again, we're talking about records here. Once we have enough information at our disposal, we can begin to look and see what combinations are successful and which players lead to more success with which players, all while keeping in context a player's defensive role or the number of times he starts a shift at either end of the ice. There are dozens of other factors you could consider.
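To make the bookkeeping in that shift concrete, here is a minimal Python sketch. The function name `credit_shift` and the placeholder skater labels are my own assumptions for illustration (only Bergeron and McClement come from the example above), and this is not how any stats site necessarily stores its data.

```python
from collections import Counter

def credit_shift(tally, attacking_skaters, defending_skaters, attempts):
    """Credit every skater on the ice for shot attempts taken in one shift.

    Each attacking skater gets +attempts and each defending skater gets
    -attempts, no matter who actually took or faced the shots.
    """
    for skater in attacking_skaters:
        tally[skater] += attempts
    for skater in defending_skaters:
        tally[skater] -= attempts
    return tally

# Placeholder linemates (hypothetical labels, not real line combinations).
bruins = ["Bergeron", "BOS skater 2", "BOS skater 3", "BOS skater 4", "BOS skater 5"]
leafs = ["McClement", "TOR skater 2", "TOR skater 3", "TOR skater 4", "TOR skater 5"]

# The Bruins take three shot attempts during the shift.
tally = credit_shift(Counter(), bruins, leafs, 3)
print(tally["Bergeron"], tally["McClement"])
```

Every Bruin ends the shift at plus three and every Leaf at minus three, which is exactly why the number is a record of what happened on the ice and not a verdict on any one skater.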

Corsi itself isn't perfect. It will take a while for a player's Goals For % (which is ultimately what we strive for in hockey: finding players who can help a team score more goals than the other team) to settle, because hockey is replete with randomness. Generally, a good possession rate from a player will turn into a good goal rate (this is an excellent post with a couple of graphs you should check out).

As for McClement, the reason our eyes tell us he is better than the third-worst player in the league is that he is. When you watch McClement, you see a serviceable defensive specialist who was highly regarded by the advanced stats community a few years ago.

McClement, like many Leafs, has been woefully misused by Randy Carlyle. Last season, McClement was barely even allowed to skate the puck in over the line. He had to start a huge share of his shifts in the defensive zone. He is, quite clearly, a better hockey player than he's been allowed to be under Carlyle, and the result is that McClement is the third most out-shot player in the league. That is a record. Statistical agents recorded 917 shot attempts by the opposition when he was on the ice, and just 578 for the Leafs. They recorded 30 goals against and 17 goals for (his Goals For % was very close to his Corsi For %).
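As a quick check on the claim that his Goals For % sits close to his Corsi For %, the recorded totals above can be run through the same "for divided by for plus against" formula. This is a rough sketch; the helper name `share_for` is my own, not something from ExtraSkater or the NHL.

```python
def share_for(events_for, events_against):
    """Return the 'for' share of on-ice events as a percentage."""
    total = events_for + events_against
    if total == 0:
        raise ValueError("no events recorded")
    return 100.0 * events_for / total

# Totals quoted in the paragraph above for McClement's on-ice events.
corsi_for_pct = share_for(578, 917)   # shot attempts for vs. against
goals_for_pct = share_for(17, 30)     # goals for vs. against

print(f"Corsi For %: {corsi_for_pct:.1f}")  # prints "Corsi For %: 38.7"
print(f"Goals For %: {goals_for_pct:.1f}")  # prints "Goals For %: 36.2"
```

Both shares land in the high 30s, so the goal record and the shot attempt record are telling the same story here.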

The right response to the information that McClement gets horrendously out-shot is not to conclude that McClement sucks and should not be re-signed. What I would do is get in a meeting with Carlyle, critically discuss the thought process when McClement is on the ice, and note that the Corsi % shows that whatever it is the Leafs are doing with McClement out there isn't working.

Why Corsi % and not Goals % if we're looking for players who can out-score other teams? Well, because Corsi is a better indicator of ability in a small sample. The trick isn't to look at the record of past goals, but to attempt to determine future goals. It's difficult, it's imprecise, but that's the nature of sports. You don't want the data to be too clean, because that would make it uninteresting. Dave Cameron had a post on FanGraphs the other day suggesting that using statistics means that the right stories will be told, and I couldn't agree more. Everybody who covers the game needs to step back and realize their intuitions can sometimes be incorrect. Humans are imperfect, and yet attempt to be perfect and create clean models in virtually every industry. Nobody is saying Corsi is perfect, but of all the statistics we have for attempting to predict future games, you could do a lot worse. Anybody who says they have a perfect model is eventually going to pay the piper, and anybody who attacks a model for being imperfect, when its whole basis is that it works with imperfections, is missing the point entirely.
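One way to see the small-sample point is to compare how much noise surrounds each share in McClement's season. The sketch below leans on a simplifying assumption of mine that is not in the post: it treats each on-ice event as an independent coin flip and uses the textbook binomial standard error, which real hockey data only roughly obeys. The function name `share_standard_error` is likewise just for illustration.

```python
import math

def share_standard_error(events_for, events_against):
    """Binomial standard error of a 'for' share, assuming independent events."""
    n = events_for + events_against
    p = events_for / n
    return math.sqrt(p * (1 - p) / n)

# McClement's on-ice totals quoted earlier in the post.
se_corsi = share_standard_error(578, 917)  # 1,495 shot attempts
se_goals = share_standard_error(17, 30)    # 47 goals

print(f"Corsi share noise: about {100 * se_corsi:.1f} percentage points")
print(f"Goal share noise:  about {100 * se_goals:.1f} percentage points")
```

Under that assumption the goal share is more than five times as noisy as the shot attempt share over the same ice time, simply because there are so many fewer goals than shot attempts to count.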

* - "whoops" #irony

ASK ME ANYTHING

To clear up some of the misconceptions, I'll have no trouble answering emails and questions about statistics in hockey. Send me an email ( camcharron(at)gmail.com ) and I'll try to answer as many questions as I can for beginners, whether in written form or in video form.

I understand there's been a lot of information about the trendy numbers dropped on us by CBC and TSN, who are quoting Corsi numbers almost daily now, and the term "advanced" analytics can scare people off. There really is nothing to be scared of. The "advanced" analytics are refreshingly simple. I, myself, prefer very simple indicators about performance.

And for those who say "stats don't tell the whole story" I'd urge you to read this post over at Canucks Army. A writer there, Rhys, compared the selections made by the Vancouver Canucks scouting staff over an 11-year period to simply taking the available 17-year-old player with the highest point totals in the Canadian major junior system. Stats don't tell the whole story, but the whole story also doesn't tell the whole story. There are still many stories ready to be written.

Cam Charron is a BC hockey fan that writes about hockey on many different websites including this one.
#1 Truth Observer
May 21 2014, 06:00PM

Comment #53 on that Canucks Army article completely dismantles the entire methodology "Rhys" used.

Also, I was reading an old article about Horvat on that site, and in the comments you mentioned that Montreal's 2012 draft would prove that teams should just use Central Scouting's rankings for drafting? Need I remind you that Radek Faksa was ranked #6 for NA skaters that year?

Even if Collberg/Thrower do become established NHLers, it's still way too small a sample size for you to be able to back up such a bold statement.

#3 Dave
May 21 2014, 08:31PM

Hey Cam, I just have one question. What is the purpose of using Fenwick over Corsi or vice versa? I know Corsi tracks blocked shots while Fenwick does not, and have also heard that Fenwick is a better predictor of future events than Corsi. If that is the case, why use Corsi at all over Fenwick, and what can we learn from one stat that we cannot from the other?

Thanks for any help!

#4 Truth Observer
May 21 2014, 09:02PM

*Comment #54

#5 Tim Bayer
May 21 2014, 10:53PM

@Dave

Cam will have a better answer, but the only reason I can think of to prefer Corsi over Fenwick is that Corsi's sample size is larger, because Fenwick excludes blocked shots. Over a large sample, however, Fenwick is the better predictor.

#7 Kad Chilger
May 22 2014, 08:57AM

Hi Cam, so, THoR seems to be pretty useless. But I'm curious, do you see a potential path towards some sort of genuine, predictive, successful "total hockey rating" some day? In other words, are there serious theoretical limitations to the predictive power of statistics? Or is it just engineering problems that will be solved over time?

(Aside from luck. Obviously luck will always exist, but you know, keep creeping that predictive power up and up)

#8 deaner
May 22 2014, 10:28AM
Truth Observer wrote:

*Comment #54

@Truth Observer

I read comment #54, but I think the commenter is wrong for the following reason:

In each round of a draft, once the Canucks are "up to bat", they have a set of pre-ranked prospects to decide between. Rhys's presumption is basically that these pre-ranked prospects would be similar to the unchosen players from the current pick to their next pick as it actually happened in the draft.

Now to help them decide who to pick from this set of 30 players, they can either trust their so-called expert internal scouts, or just say "let's pick the highest scoring CHL 17-year-old". As it turns out, most of the time their scouts were useless.

And this leads us to the Leafs, who I'm sure did even worse than the Canucks did in those same draft years. The smoking gun is in 2003, the alleged "best draft year of all time", when the Leafs got John Mitchell in the 5th round, and basically that's it. SMH.

That said: bring on the Stats.

#9 Benjamin
May 22 2014, 04:57PM

@ Truth Observer, Cam Charron, deaner

While it doesn't completely discredit what Rhys is saying, it does weaken his conclusion that his methods are impossible.

If he'd reached the same conclusions using, say, "highest point getter of the next top 20 players using Central Scouting NA skater rankings", then it's a very strong argument, since he's using only free sources of information weighted heavily on a purely statistical metric. Unfortunately he picked some bizarro scenario where he could pick again after the draft, like that strange Christmas present game, plus all the presents were heavily scouted by other NHL teams and, ultimately, selected.

#10 CAm da Man
May 24 2014, 02:15AM

Basically, Cam is the man. I miss reading his work over at Canucks Army...... where did you go Cam? :(

People who fail to take into account "advanced" statistics are the same bums who think Carlyle is a good coach. Welcome to the 21st century: as time goes on, new methodologies and analytics are created. Time for people to get on the 21st century train.
