# The Theory and Nature of Current Advanced Hockey Analysis

Before we proceed with the rest of the Flames "mediocrity" series, it makes sense to discuss and explicate the basis of the analyses we tend to engage in here: the theory, the application and the practicality of the stats we employ, and the resultant interpretations.

Robert Cleave’s recent article on Rene Bourque is an illustration of one of the basic tenets of analysis that is frequently overlooked: the game of hockey isn’t one of totals but of differentials, of ratios. In 2009-10, Bourque scored 27 goals and 58 points. Last season, he managed a similar stats line: 27 goals and 50 points. Judging by the boxscores, Bourque was marginally worse than his career peak the prior year. In truth, he took a drastic step back. His stark devolution as a player was somewhat obvious to regular observers of the team last season, but is fully revealed when we peel back the thin but opaque layer of counting stats and delve into the numbers underneath: Bourque spent more time in the defensive zone relative to the prior year and more time getting outchanced and outscored. His presence caused a dip in scoring chance ratios with other players across the board. He was, by and large, a detriment. The eight-point dip in his statsline was only the merest hint of his decline.

Hockey is about getting more, not lots. The distinction is an important one, because the latter does not necessarily guarantee the former. The statement about totals and differentials above is an axiom I’ve come to accept since I began writing about the game. Players who drive differentials help teams win, and oftentimes their basic counting numbers have only a passing relationship to their actual value.

### On Books and Covers

"Counting numbers" is the name given to the familiar, conventional stats everyone recognizes and references in their analysis (goals, assists, points, etc.). A better moniker would probably be "surface stats". They are the seemingly calm sea above a roiling soup of antecedents that are largely hidden from view. For some guys, counting stats are but a pale reflection of their abilities. For others, the boxscores act as a facade, worn to mask significant warts. For the former, a statsline can be a thin veil obscuring a player’s value or even a scarlet letter, a stigma unfairly imposed by the results that impugns his true quality. For the latter, the boxcars are a vanity and inexorably fleeting. The quest every summer to dump bad contracts that were previously signed by overeager managers could be dubbed the bonfire of the vanities, I think.

The penchant for even NHL general managers to be dazzled by the superficial illustrates the seductiveness of surface stats to even the highest-level decision-makers in the game. Results, after all, are what everyone is ultimately after, so it remains forever tempting to chase results in the pursuit of success. Doing so means inverting the causal chain, however: true analysis is understanding the variables and agents that give rise to outcomes. The means to the end rather than just the end itself. The coal before the diamond, if you will.

This is perhaps the crossroads at which conventional thought and so-called "advanced stats" most frequently clash. Every hockey fan’s (coach’s, GM’s, etc.) perception of a player is inevitably anchored by heuristics: "rules of thumb" which evolve from information that is perceptually impactful or easily available. Human cognition works in general by conjuring habitual psychological markers that act as lighthouses in the maelstrom of data that is life. The problem is, a heuristic is not a principle: it is a quick-start guide at best, an inherent bias at worst. It isn’t the template. It is the stereotype.

The conflict occurs when analysis of the underlying numbers disagrees with the connotations we attach to the surface results. Think about the common mental signposts that are almost universally employed. A 20-goal scorer is a pretty good player, right? What comes to mind when one thinks of a 50-point player, for instance? How about a 10-goal forward versus a 10-goal defender? The automatic mental ranking that is next to unconscious for the experienced hockey fan is the activity of heuristics (rules of thumb) that are essentially functional assumptions in aggregate, but not necessarily accurate in the specific.

If we were to draw a Venn diagram of the common perceptions of players based on results/surface stats and their true value or skill level, there would likely be some overlap. All things being equal, a 20-goal scorer probably is a pretty good hockey player.

All things are rarely equal, however, which is where the two tracks often part ways. Some 20-goal scorers play against lesser opponents, or spend a lot of time on the powerplay or boast a career high SH%. Others play against superstars and start out more often in the defensive zone. The surface stats say the two guys are roughly equal. The heuristics we’ve developed decide that there probably isn’t much to separate them. But, in truth, one is far more valuable to his team than the other.

As humans, we tend to cling to already held beliefs even in the face of competing evidence. In some ways, confirmation bias ensures our cognitive landscapes aren’t consistently upended by new information. In others, it means we reject what might be accurate or true because it doesn’t accord with what we believe or prefer. In the first, it means we aren’t overly gullible or endlessly indecisive. In the second, it means we’re stubborn or willfully ignorant. That is the inevitable tug-of-war we all wage and the framework through which data and analysis are filtered.

### Possession-Based Analysis

The theoretical value and legitimacy of the corsi statistic and possession-based analysis is a hotly debated topic in some quarters as a result. Not only is it new and therefore relatively unknown and untested in the eyes of most, but it sometimes flies in the face of long-held beliefs or seemingly common-sense conclusions. Here I will present the general theory surrounding the corsi school of thought, the mounting evidence for its efficacy, as well as the practical applications.

As mentioned, surface stats are the effect of an interplay of causes. Goals, assists and goal differential are determined by two primary factors: volume of shots for and against and the frequency of goals scored for and against. We’ll call the former possession and the latter percentages.
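To make that decomposition concrete, here is a minimal Python sketch. The function name and every number in it are assumptions invented for illustration, not figures from any real team; the only point is that goal differential falls out of shot volume (possession) multiplied by the percentages on each side of the puck.

```python
def goal_differential(shots_for, sh_pct, shots_against, opp_sh_pct):
    """Goal differential implied by shot volume and percentages.

    sh_pct is the team's own shooting percentage; opp_sh_pct is the
    opposition's shooting percentage (1 minus the team's save percentage).
    """
    goals_for = shots_for * sh_pct
    goals_against = shots_against * opp_sh_pct
    return goals_for - goals_against


# Hypothetical season totals, invented for illustration only.
# Both teams get identical percentages; only the shot volume differs.
possession_team = goal_differential(2500, 0.08, 2200, 0.08)
outshot_team = goal_differential(2200, 0.08, 2500, 0.08)

print(round(possession_team, 1))  # 24.0
print(round(outshot_team, 1))     # -24.0
```

With the percentages held equal, the whole gap between the two hypothetical teams comes from who takes more shots.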

Possession in this context is short-form for "possession of the puck in the offensive zone". Teams that control possession at even strength tend to have higher shot counts overall as well as better corsi differentials or ratios. The corsi stat is best conceptualized as a proxy for zone time: a high corsi differential or ratio means a player or team spends more time in the offensive zone, and vice versa. The importance of possession has been demonstrated over and over by various investigations: the correlation between corsi ratios and scoring chance ratios is persistently high (ranging from 0.7 to 0.9), for instance, and JLikens of the Objective NHL has shown that the correlation between corsi and outscoring at even strength is on the order of 0.5-0.6 over a sufficiently large sample. Possession stats tend to persist as well (all things being equal), meaning corsi is reliably measuring a skill rather than merely chance or some other variable.
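For readers who have not met the stat before, a short sketch may help. It assumes the commonly used definition of corsi: all shot attempts (shots on goal, missed shots and blocked shots) for and against while a player or team is on the ice at even strength. The class name, function names and sample counts below are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ShotAttempts:
    """Even-strength shot attempts while a given player or team is on the ice."""
    on_goal: int
    missed: int
    blocked: int

    def total(self) -> int:
        return self.on_goal + self.missed + self.blocked


def corsi_differential(attempts_for: ShotAttempts, attempts_against: ShotAttempts) -> int:
    """Shot attempts for minus shot attempts against."""
    return attempts_for.total() - attempts_against.total()


def corsi_ratio(attempts_for: ShotAttempts, attempts_against: ShotAttempts) -> float:
    """Share of all shot attempts that went the right way (0.5 is break-even)."""
    all_attempts = attempts_for.total() + attempts_against.total()
    if all_attempts == 0:
        return 0.5  # no events either way: treat as break-even
    return attempts_for.total() / all_attempts


# Hypothetical on-ice totals for one player over a stretch of games.
sample_for = ShotAttempts(on_goal=30, missed=12, blocked=10)     # 52 attempts
sample_against = ShotAttempts(on_goal=25, missed=10, blocked=8)  # 43 attempts

print(corsi_differential(sample_for, sample_against))     # 9
print(round(corsi_ratio(sample_for, sample_against), 3))  # 0.547
```

Either form tells the same story; the ratio is simply easier to compare across players with different amounts of ice time.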

Percentages, on the other hand, tend to regress to the mean over time. This suggests that natural variance – or "luck" – has a stronger influence on them than ability.

To understand this, most people need to strip "luck" of the connotations it carries regarding issues of fairness and justice. A lot of discussions get sidetracked by people bristling at the suggestion that a given athlete or team on a hot streak is undeserving of their success when variance is mentioned. An analogy that might help:

Consider each shot on net to be a lottery ticket, with the prize being a goal. Some tickets have a higher chance of winning than others: the chances range from about 3% (unscreened point shots, shots from very sharp angles) to about 25% (crease shots, breakaways, etc.). Mid-range scoring chances tend to fall in the middle: a lottery ticket with about a 15% chance of winning. In just about every game, there are a lot more lower-quality chances than higher-quality ones, which is why even mediocre goalies have a SV% at or about .900 in the league. Percentages or chances of scoring in the NHL seem rather low because of the quality of the competition between players, the quality of netminding in the modern league, the size and strength of the guys in general, the advancement of equipment, the proliferation of advanced scouting, etc.
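As a rough check on how a pile of mostly low-value tickets produces a save percentage north of .900, here is a small sketch. The three scoring probabilities are the approximate figures quoted above; the shot counts are an invented, hypothetical mix for one team in one game, so the result is an illustration rather than a measurement.

```python
# Hypothetical shot mix: (number of shots, chance that each one scores).
shot_mix = [
    (20, 0.03),  # unscreened point shots, sharp angles
    (8, 0.15),   # mid-range scoring chances
    (2, 0.25),   # crease shots, breakaways
]


def expected_goals(mix):
    """Average number of goals this bundle of tickets pays out."""
    return sum(count * prob for count, prob in mix)


def blended_shooting_pct(mix):
    """Overall chance of scoring per shot across the whole mix."""
    total_shots = sum(count for count, _ in mix)
    return expected_goals(mix) / total_shots


goals = expected_goals(shot_mix)          # 0.6 + 1.2 + 0.5
sh_pct = blended_shooting_pct(shot_mix)   # goals spread over 30 shots
implied_sv_pct = 1 - sh_pct

print(round(goals, 2), round(sh_pct, 3), round(implied_sv_pct, 3))
# 2.3 0.077 0.923
```

Under this made-up mix the goalie stops a little over 92% of shots without doing anything special, simply because two-thirds of the tickets were long shots to begin with.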

– cartoon via wondermark

To score, teams try to up their number of relatively high percentage shots or lottery tickets every game by driving the puck into the middle of the ice and closer to the net. They also do everything possible to restrict the opposition from doing the same. This is essentially what we’re trying to capture with possession stats: finding players and teams that spend more time in the offensive zone, essentially driving scoring chances for and/or lessening scoring chances against. Remember, the correlation between corsi and scoring chance differentials is persistently high.

Of course, goals are relatively random events and randomness tends to be rather untidy in small samples. Some games, the teams who outchance the bad guys don’t win: they’ll scratch more high-probability (15%-25%) tickets, but won’t end up with as many winners, purely as a matter of variance. After all, 15% is 15 out of 100, or just 1.5 out of 10. If you were to gather 100 lottery tickets, each with a 15% chance of winning, it doesn’t mean the winners would be spread uniformly over each, say, 10-ticket sample. In some, you might get 4 or 5 winners. In others, none. This is why we generally regard the percentages as fickle.
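The 10-ticket example can be worked out exactly rather than left to intuition. The sketch below assumes each ticket wins independently with the same 15% chance (a simplification; real shots are not identical), which makes the number of winners binomially distributed. The helper names are my own.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k winners among n independent tickets,
    each winning with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least(k_min, n, p):
    """Probability of k_min or more winners among n tickets."""
    return sum(binom_pmf(k, n, p) for k in range(k_min, n + 1))


n, p = 10, 0.15
p_none = binom_pmf(0, n, p)
p_four_plus = prob_at_least(4, n, p)

print(f"no winners in 10 tickets: {p_none:.3f}")       # 0.197
print(f"4 or more winners in 10:  {p_four_plus:.3f}")  # 0.050
```

So roughly one 10-ticket batch in five comes up completely empty and about one in twenty pays out four or more times, even though every batch has the same expected 1.5 winners.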

In time, however, the chap who collects the most high probability lottery tickets is probably going to win the most. And the greater degree to which he collects lottery tickets, the greater chance he has of winning. Of course, outside of getting all the tickets (versus none) there’s always the slim chance he’ll lose.
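The same binomial assumption (independent tickets, identical 15% odds, all of it a simplification) lets us put numbers on "in time". The sketch compares a collector who out-tickets his rival 12 to 8 in a single game with the same 3-to-2 edge accumulated over ten games. The ticket counts and function names are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k winners among n independent tickets."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def outcome_probabilities(tickets_a, tickets_b, p):
    """Return (P(A has more winners), P(tie), P(B has more winners))."""
    pmf_a = [binom_pmf(k, tickets_a, p) for k in range(tickets_a + 1)]
    pmf_b = [binom_pmf(k, tickets_b, p) for k in range(tickets_b + 1)]
    a_more = tie = b_more = 0.0
    for a, prob_a in enumerate(pmf_a):
        for b, prob_b in enumerate(pmf_b):
            joint = prob_a * prob_b  # A and B draw independently
            if a > b:
                a_more += joint
            elif a == b:
                tie += joint
            else:
                b_more += joint
    return a_more, tie, b_more


one_game = outcome_probabilities(12, 8, 0.15)      # a single night
ten_games = outcome_probabilities(120, 80, 0.15)   # same edge, ten nights

for label, (a_more, tie, b_more) in (("1 game", one_game), ("10 games", ten_games)):
    print(f"{label}: A ahead {a_more:.2f}, tied {tie:.2f}, B ahead {b_more:.2f}")
```

Running it shows the ticket leader coming out ahead far more often over the longer stretch than on any single night, while the chance of the underdog finishing on top shrinks but never quite reaches zero.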

The possession versus percentages issue is basically one of skill versus variance or luck. A few commenters and analysts hold out a belief that the percentages can be driven in the absence of, or even in contrast to, possession, but most analyses agree that NHL players and teams have far more ability to drive possession, whereas the percentages are more or less at the mercy of the hockey gods.

Most advanced stats analysis is focused around this general theory currently. The bulk of ongoing inquiry is aimed at teasing apart the variables that moderate possession at both the individual and team level. Some of these factors include quality of linemates, quality of opposition and starting position. The overarching goal is to isolate individual contributions to possession, be it from the players themselves, or coaching systems, face-off zones, playing-to-score effects, the nature of different positions, etc. These are the multitude of variables that go into determining a player’s or team’s surface stats: understood and viewed through the prism of possession analysis, we are just now starting to appreciate the influence all of the varying factors have on corsi (and therefore on scoring and goal differential), how variance sullies the waters, and how all of this shapes the resultant perceptions of a player’s abilities. In truth, we have only made a few cautious steps forward in terms of truly extricating a player’s true skill level quantitatively from all the competing noise. But I’d argue we’re on the right track.

### Practical Applications

Obviously, prediction of the future is the ultimate goal of such analysis. The practical application of possession theory has allowed me and others to predict a number of outcomes over the last few years, including:

These are only a few examples. I invite others to share more in the comments.

Of course, there have been errors too. As mentioned, hockey quantitative analysis and modeling is only now getting beyond its infancy. Further complicating matters is the fact that there are always variables and events that can’t be foreseen: injuries, the effect of unknown or unpredicted additions/subtractions to the roster, locker room discord, etc. The future can never be fully known. That’s what makes the whole thing worth watching, after all.

The number of correct "hits" has improved, however, and the body of evidence surrounding this school of thought grows larger all the time. To dismiss it out of hand would be folly, I think. This discussion of the topic wasn’t exhaustive, but I hope it shed some light on the broader issues at hand for those who were confused or unsure.


• loudogYYC

Kent, your articles definitely have some of the best content that I’ve ever read related to Hockey. I’m glad you cover the Flames.

I still don’t understand these stats fluidly, but your writing really helps.

Have you had a chance to share your insight with Chris Snow or the rest of the Flames brass? I look forward to more of this.

Thanks, Kent.

• Kent Wilson

Thanks, I’m glad it helps. We’ll continue to provide this sort of analysis as much as possible. As always, I encourage anyone who has questions about this stuff to email me directly.

I hope to sit down with Snow in the near future and have a talk. We’d publish the results here, of course.

• ChinookArchYYC

Kent,
This is a well-written article and your arguments for advanced stats are very compelling. I couldn’t help but think of the ‘rule of thumb’ most GMs use to draft players. In DSutter’s case, he seemed to be most interested in drafting for size and bloodlines. I can’t imagine he would ever consider a skilled 5’6″, 130 lb. winger, no matter how far down the draft the player appeared. No, I’m not talking about John Gaudreau, just be thankful that DSutter was not drafting for us in 1987.
Has any analysis been done to see if stats like Corsi are useful for Junior and AHL players? It would be interesting to see whether our draft choices actually make their teammates better when they play.

• Kent Wilson

Kent,
You put too much science in the game of hockey. Goals and assists equal points. That’s the difference between a Hall of Famer and NHLer.

• ChinookArchYYC

You’re wrong. Goals and assists are the uncontrolled results of what players do during a game. Aside from dumb luck, good players are successful (and score) when they drive possession of the puck and when they are outchancing (with quality opportunities) opposing players; the same goes for good teams. In other words, players and teams with higher Corsi ratings are probably going to experience more success.

• Kent Wilson

Well, yes. You’ve simultaneously opened and closed the book on hockey analysis. Close up shop everyone. Sean has it figured.

• Derzie

Wow poor Kent,
He’s just like a penis, you rub him the wrong way for a long time and he spits at you.

• PrairieStew

Hey Sean.

Contribute in a meaningful way or disappear.

• Cam Charron

Witchcraft, is what it is…

• PrairieStew

I am somewhere in between. Possession is great, but quality of possession is hard to quantify. You can have a guy who is great at cycling the puck in the offensive zone who has terrible hands, and compare him to a Mogilny-type player who is invisible for a long time but manages to score a couple and make a difference. The former will have a good corsi and the latter a poor one.

• Ryan Popilchak

In the case of Corsi, cycling would need to create shot attempts to be given a +1. Corsi is a proxy for possession, but it does actually measure shots, so in my mind it’s a bit more useful than a “time of possession” metric would be.

That said, cycling in the offensive end is a form of defense. When you control the puck, the other team isn’t creating scoring chances. I would take a 4th line shift in which they cycled the puck for a minute and got one poor-angled shot on net before the whistle any day over getting a quick couple shots on net and then getting trapped in our own zone for a few shots against.

• MC Hockey

Thanks Kent. I hope you can comment on a couple of topics here that others would perhaps be interested in as well. Please see A and B below; again, I would love your replies.

A. I always believed corsi was a very important measuring tool and more important than the luck or randomness of surface statistics, but had two problems with fully comprehending corsi. First, I could not previously fully appreciate that it means “ratio of offensive zone possession to defensive zone possession”, which perhaps I can now due to the explanations above (and having more sleep as my baby boy gets older). Second, I usually had trouble reading the corsi graphs posted by Robert for each Flames game on this website and thus could not really draw much value from them. Hopefully format changes to this blog website will allow you to post clearer corsi and other stats in 2011-12.

B. Do you have a graph and league ratings on corsi for some of the top 20 free agents signed on July 1, 2011 and since that date? Why do I ask? Two reasons:

1) Are the re-signings by the Flames good ones based on corsi? You talked a lot about them, so perhaps you can briefly re-comment on Morrison, Glencross, Tanguay and Babchuk.

2) I want to know if the Flames missed out on anyone who was strong on corsi and reasonably priced in the end. Some background: I was one of the five finalists in the “Flames Plan” contest and thought the Flames should sign younger UFAs Leino, Clitsome, and Eaves, but definitely NOT at the big numbers they received. But ignoring those three guys, should the Flames have considered some other guys who signed recently (but unfortunately are older), like Prospal (just signed), or Langenbrunner or Arnott, or others? What about unsigned guys like Zherdev?

• BobB

I see the numbers that Rene Bourque had an awful year last year.

Ok, agreed. .

2. Then, our logic cycle says “Rene Bourque was one of the main reasons why the Flames were mediocre this past season.”

We have no control group in this analysis, or do we?

Rene Bourque was an excellent player with Daymond Langkow. Was this Bourque or Langkow? Likely. Bourque is BETTER with Langkow. Right?

Advanced stats show that Bourque was terrible, or that Bourque had an off year, or that Bourque wasn’t good with all the other players points to the same thing, doesn’t it? Daymond Langkow.

Isn’t this what line combinations and “chemistry” and team strength about? What if Bourque with Langkow ENABLES those good possession numbers, and without, Bourque doesn’t have them. Isn’t this what happens with Tangs and Iggy?

I guess what I’m saying is that, although I understand the desire to examine what players are “driving the bus” and what players are “riding the bus”, I dunno if we can say “Player X was terrible irrespective of the strength of the team”.

Calgary was and is a mediocre team, and this is going to have an influence on all the players in a significant way that we want to attribute to them as individuals.

How good/bad can a terrible player be on a good team?

How good/bad can a great player be on an awful team?

Somehow I can’t help feeling that we turn our attention to the underlying numbers, ascribe massive significance to them and conclude our judgement based on them.

Why did Bourque not have an alright year considering he was missing Langkow?

I understand he played more in the defensive zone…. but with worse teammates, is that really surprising? We say “Rene Bourque was likely, relative to what was expected and required of him, the worst of all the significant Flames skaters, and second worst wasn’t even within hailing distance.”

It’s like that with Kiprusoff as well. Flames goalies are terrible. All of them. All for the last 5-6 years. Yet Kiprusoff is markedly better, where “second worst isn’t even within hailing distance.” Yet without context we want to ascribe that all to Kipper’s failings. Maybe we’re missing team context? Defensive context?

• Ryan Popilchak

Wow, you guys are really starting to hurt my head. It’s like probing into stool samples to check out textures to see what they ate before each friggin game. Enough. You know what it comes down to me, these guys are paid 3.0+ million a year to play with skill, energy and consistency irregardless of which 3.0Million + player they are playing with. You get paid these kind of \$\$\$\$, then expectations go with the job. Maybe the next CBA should have salaries based partly on pay for performance.

• Reidja

Wow… It’s actually nothing like probing through stool samples… I suppose you just keep an eye on the box scores if you want. Keep in mind that all of these players get paid to compete against each other. Kent is talking about painting a more detailed picture of who comes out on top rather than just looking at the final score.

Kent, I like the fact that you said “Obviously, prediction of the future is the ultimate goal of such analysis.” I think that this is important to remember (obvious though it may be). Advanced stats should be a great scouting tool – teams should start tracking this stuff in all of the junior feeder leagues ASAP.

• Reidja

Great article Kent! While I don’t personally deal with underlying metrics, I would say these measures are useful 90% of the time. The other 10% seems to tell us exactly the opposite of what we want to make assessments on players.

For instance, I don’t think you can ever discount point totals. Even against lesser competition, good players should put up high point totals. It shouldn’t be a “yeah, but…” scenario. On the surface, underlying metrics can illustrate how important players like David Moss, Tim Jackman, Daymond Langkow can be to a team… despite mediocre box score stats. These aren’t top-tier players, but they out-perform their competition (even if it is secondary competition). If you were to put David Moss on the top line, and Tim Jackman on the 2nd line for a season, I believe we’d be saying “what a horrible season these two had, they were underwater in terms of possession, corsi, etc.” As it stands now, they are outplaying the opposition in their current roles, and that’s what NHLers should do. Even if Matt Stajan is on the 4th line (groan), so long as he outplays other 4th lines, he’s doing his job. To this point, we haven’t seen that.

On the other hand, the recent article on Rene Bourque didn’t show me that he had a horrible season and should be trade bait. Rather, it showed me how damn skilled Bourque is to have posted 27 goals, in a 2nd/3rd line role, without breaking a sweat. It shows me that he can score goals when he feels like it – a rare breed on this team. I think skilled guys like Bourque and Hagman that had tough years will bounce back when paired with the lunch-bucket gang of Moss, Langkow, Jackman etc. For one, I believe a line of Staj-Langkow-Bourque would be one heck of a third line. We’ll call it the BounceBack Trio

I’m very interested to read these posts on underlying stats and believe they can be very useful. There are times, however, when we seem to dig too deep and lose sight of the end result. Olli Jokinen once scored 90 points in back-to-back seasons in his mid-20s. We don’t have to like it or agree with it… but even against lesser competition, and in the soft southeast division, he put up point-per-game numbers playing in the best league in the world. So, much as we like to hate on Darryl Sutter, I probably would have traded for a 29-year old Olli Jokinen at the 09 trade deadline too, underlying numbers be damned.

• Derzie

With stats, I think the top goals are to assess contributions as a means to predict future performance or to compare players. If I’m on the hook for this I attempt to model the behaviour of those that do those things, namely scouts, coaches and GMs. There are examples of good and bad for each of these roles. What does a good scout look for? What does a bad one look for? Is there a trend? The goal of the statistician should be to model those decisions. Make it a formula that could be automated. Player Value = some formula that combines these stats into a meaningful representation of what expert assessors think. Where I have a problem is it seems the statisticians are trying to take the place of the experts and assume what’s important. You need experts in the business of hockey scouting/management to define what an assessment looks like. It’s up to the stats guys to find a solution that works. I’ve seen this work in the business world as much as I’ve seen the opposite approach fail. The hockey stats approach needs to change. Everything looks like a nail when you have a hammer. You may need a wrench instead. Ask the experts before deciding what tools you need.

• Derzie

I’m going to suggest that the logical consequence of your argument is not to focus on Corsi (which includes so many 3 per cent chances) but to get right to the heart of the matter, and focus on those real scoring chances, those 15%-plus chances.

Why have so much noise in your analysis with all those 3 per cent chances factored in? I mean, if Player “A” has 40% of his shots at net as a 3 per cent chance, but Player “B” has just 20% of his chances as a 3 per cent chance, that’s an issue for this kind of analysis.

Both Corsi and scoring chance data increase the sample size so we’re not so beholden to percentages, and that’s a good thing. But there’s one more issue. How can you tell who drives possession from Corsi numbers?

The best we can say, for example, is that when Player B was one of five players on the ice, the team did well when it came to shots at net differential. It’s some circumstantial evidence.

But Player B’s performance by this measure is closely aligned to his Quality of Teammates. There are four of them, just one of him. I’m not convinced you can adequately separate out individual performance from a team performance stat like Corsi or Scoring chances.

I’ve tried to use this system, and it’s let me down in the past, so I’m hesitant to trust it.

I’d suggest the best way around this is to use the method of analysis Roger Neilson employed to study scoring chances, but that’s a debate for another day . . ..

• Zed

Did you know?

95% of all statistics are made up on the spot 50% of the time.

• PrairieStew

Start with definitions, be sure to introduce terms before using them conversationally, especially in an article meant as an introduction, then you can proselytize or just attempt conclusions.

Percentages is a mathematical term you have rendered meaningless. Have someone, preferably with a talent for language and little knowledge of the subject matter, proofread.