Staturday Weekly Column #2: Where do we start? Advanced Stats Primer, Part 2
By Ryan Hobart, 2 years ago
Last week, in the inaugural post of my new weekly column, we looked at some stats and resources that cover the basics of what we can call “advanced stats” or “analytics” in hockey. To recap:
- Corsi (CF%) measures how many shot attempts (shots, misses, blocked shots) there are while a player is on the ice
- Production (points) can be measured in a number of different ways
- Expected Goals (xGF%) is similar to Corsi, but it weights each shot attempt by how good it was, usually based just on where the shot was taken
- Shot attempt models like Corsi and Expected Goals have a reasonably good correlation with future wins, so that’s why they’re important
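To make the recap concrete, here is a minimal sketch of how the two percentages could be computed for one player. It assumes both stats are expressed as the "for" share of all attempts while that player is on the ice, and every number in it is made up for illustration; the `share` helper and the per-attempt goal probabilities are assumptions, not values from any real model.

```python
def share(for_total, against_total):
    """Return the 'for' side as a percentage of the combined total."""
    combined = for_total + against_total
    if combined == 0:
        return None  # nothing happened while the player was on the ice
    return 100.0 * for_total / combined


# Hypothetical shot attempts while one player was on the ice.
# Each value is an assumed chance of that attempt becoming a goal.
attempts_for = [0.03, 0.12, 0.05, 0.30, 0.02, 0.08]
attempts_against = [0.04, 0.02, 0.03, 0.06]

# Corsi counts every attempt equally.
cf_pct = share(len(attempts_for), len(attempts_against))

# Expected Goals weights each attempt by its assumed quality.
xgf_pct = share(sum(attempts_for), sum(attempts_against))

print(f"CF%:  {cf_pct:.1f}")
print(f"xGF%: {xgf_pct:.1f}")
```

The same set of attempts gives two different answers here because the "for" attempts are, on average, from better spots than the "against" attempts, which is exactly the difference Expected Goals is meant to capture.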
In this post, I’m going to branch out a little further from these base values and talk about the loads of other stats that are available for the NHL.
The beauty of publicly available data is that it takes a ton of grassroots effort to make it happen. Please consider donating to the creators of these tools so that you can support them, and so that you can access them too. Patreon links for the creators will also be in their specific sections, if you want to decide later, but here they are up front to help you out:
Manual Tracking Data
The majority of the statistics we discussed in Part 1 are pulled from game logs that the NHL puts out. You could develop an advanced stats model without having ever watched a hockey game, if you wanted, using those numbers. This is something different.
Manual tracking involves watching a game and keeping a tally of certain events that happen for each player. This can cover a number of things, but the two major focuses are on carrying the puck into or out of the zone, and passing stats. Ryan Stimson was a major player in the passing stats field, and while his data is no longer publicly available, you can read his book which covers that topic and much, much more.
The other major player is Corey Sznajder, who got his start tracking zone entries and exits. He now covers passing stats as well. Basically, Corey watches every NHL game, tracks everything that happens in it, and then puts that into nice, legible spreadsheets for us to use. We can look at the raw data by itself, which is just a list of all the events that happen, or there are easier ways to interact with it.
For instance, with the games tracked for the 2019-20 season so far, this part of the spreadsheet tells me that #88 Nylander leads the team in shot assists (passes that lead directly to shot attempts).
I can also see that #94 (Tyson Barrie) led the team in possession exits from the Leafs’ own zone:
There is a wealth of (albeit incomplete) data for the 2019-20 season, available through Corey’s Patreon. There are two major caveats with this data to be aware of though:
- There are biases at play here. The trackers are as unbiased as they can be, but it would be physically impossible to be completely without bias.
- We haven’t yet proven the predictive value of stats like this. This means that we can’t tell you that Nylander leading the team in shot assists will definitely mean his actual assist numbers follow in kind. It’s reasonable to think that’s true, as I do, but we can’t prove it yet.
For these reasons, these data should be used only in select circumstances where it makes sense, not as a widely used tool for every player in every circumstance.
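As a rough illustration of how a raw list of tracked events turns into a "who leads the team" number, here is a minimal sketch. The event names, the two-column layout, and the counts are all invented for the example; the real spreadsheets have their own columns and far more detail.

```python
from collections import Counter

# Hypothetical raw tracking rows: (jersey number, event type).
events = [
    (88, "shot_assist"),
    (94, "possession_exit"),
    (88, "shot_assist"),
    (34, "shot_assist"),
    (94, "possession_exit"),
    (88, "zone_entry"),
]


def tally(events, event_type):
    """Count how many times each player recorded the given event type."""
    return Counter(player for player, kind in events if kind == event_type)


shot_assists = tally(events, "shot_assist")
leader, count = shot_assists.most_common(1)[0]
print(f"Shot assist leader: #{leader} with {count}")
```

A tally like this only says who did something most often in the games tracked so far, which is why the caveats above about bias and predictive value still apply.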
“Viz” is a colloquialism for data visualizations, and we find them very often across hockey statistics forums. The major contributor to these Viz is Micah Blake McCurdy, who uses his site hockeyviz.com to host his data. The good stuff is behind a paywall, so check out Micah’s Patreon to contribute and get access.
The viz you can find on Micah’s site include:
- Matchup simulator, a fun tool to pit user-created teams against each other to see who has the best chance to win. Example using last year’s teams for Toronto and Montreal (please don’t let this cast doubt on the value of the tool!):
- A PP shot location tool to show you where players are taking their shot attempts (in this case, not including blocked shots, which is typical for looking at powerplays because blocks can be extremely detrimental to the overall powerplay setup). Examples of Matthews’ and Barrie’s charts:
- An “environmental distiller” where you can isolate the impacts of certain players with or without each other (commonly called With Or Without You, or WOWYs). Example of how the offense looks with Matthews and Marner vs. how it looks with Matthews and Nylander:
- An individual shot map generator, showing you where particular players take their shots. Examples with Marner and Nylander from 2016-20:
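The with-or-without-you idea from the list above can be sketched with plain shot attempt counts. This is only a bare-bones illustration under made-up data, not how hockeyviz.com builds its charts: the shift records, the player labels "A" and "B", and the `cf_pct` helper are all hypothetical.

```python
# Hypothetical shift records: (players on ice, attempts for, attempts against).
shifts = [
    ({"A", "B"}, 5, 2),
    ({"A", "B"}, 4, 3),
    ({"A"}, 3, 3),
    ({"A"}, 2, 4),
    ({"B"}, 1, 3),
]


def cf_pct(shifts, with_players, without_players=()):
    """CF% over shifts containing all of with_players and none of without_players."""
    wanted = set(with_players)
    excluded = set(without_players)
    cf = ca = 0
    for on_ice, attempts_for, attempts_against in shifts:
        if wanted <= on_ice and not (excluded & on_ice):
            cf += attempts_for
            ca += attempts_against
    total = cf + ca
    return 100.0 * cf / total if total else None


together = cf_pct(shifts, ["A", "B"])
a_without_b = cf_pct(shifts, ["A"], ["B"])
b_without_a = cf_pct(shifts, ["B"], ["A"])

print(f"A with B:    {together:.1f}")
print(f"A without B: {a_without_b:.1f}")
print(f"B without A: {b_without_a:.1f}")
```

Comparing the three numbers is the whole trick: if both players look much better together than apart, the pairing itself is doing some of the work.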
There are even more tools than the ones I’ve shown here, but these are the main ones. You can also look at how particular games went, how particular teams are doing, and more.
These Viz are quite valuable because they take data and put it into a medium that is easier for many people to understand. I know that I’m a visual learner and often I learn new advanced stats concepts from Viz like these. Hopefully you can do the same.
GAR/WAR, RAPM Models
Here’s where we get into the math-heavy stuff, so feel free to skip over this part if you don’t want to hear about that. The “executive summary” of this section is as follows: math nerds take all the data we have, put it in a big box of data analysis, and it spits out a number that shows how certain players are doing. The box of data analysis is called “regression” and the whole process is called a “model”.
The stuff we’re going to talk about comes from evolving-hockey.com, a site run by a pair of twin brothers who like the Minnesota Wild (@EvolvingWild on Twitter). These twins are our math nerds, and have put together two models for the NHL.
The first model is Regularized Adjusted Plus-Minus (RAPM). This model is inspired by a similar model developed for the NBA. In the NBA the model attempts to predict points scored per 100 possessions, but for hockey, we’ve learned over time that points are not a reliable predictor of future points (sources on this have dried up; it’s become more community know-how at this point). The RAPM process can be used to predict any variable you want. In this case, the model tries to predict future Corsi, with the knowledge that Corsi predicts future goals, and goals predict future wins. As an example, one target variable the RAPM model can be used to predict is Corsi For Per 60 Minutes.
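For anyone curious what the “regularized” regression looks like in practice, here is a toy sketch using ridge regression, one common form of regularization. It is not the Evolving-Hockey model, which uses far more inputs: the three players, the stints, the Corsi For per 60 values, and the penalty strength are all invented, and `ridge_fit` is a hypothetical helper written for this example.

```python
def ridge_fit(X, y, lam):
    """Solve (X^T X + lam * I) beta = X^T y; lam must be greater than 0."""
    n_rows, n_cols = len(X), len(X[0])
    # Build the augmented matrix [X^T X + lam * I | X^T y].
    A = []
    for i in range(n_cols):
        row = [sum(X[r][i] * X[r][j] for r in range(n_rows)) for j in range(n_cols)]
        row[i] += lam
        row.append(sum(X[r][i] * y[r] for r in range(n_rows)))
        A.append(row)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n_cols):
        pivot = max(range(col, n_cols), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(n_cols):
            if r != col:
                factor = A[r][col]
                A[r] = [v - factor * p for v, p in zip(A[r], A[col])]
    return [A[i][n_cols] for i in range(n_cols)]


# Each row is one hypothetical stint; a 1 means that player was on the ice.
# Columns: P1, P2, P3.
X = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
    [0, 1, 1],
]
# Made-up Corsi For per 60 in each stint, centered on the average stint.
cf60 = [60.0, 55.0, 45.0, 62.0, 47.0]
mean_cf60 = sum(cf60) / len(cf60)
y = [value - mean_cf60 for value in cf60]

beta = ridge_fit(X, y, lam=1.0)
for name, value in zip(["P1", "P2", "P3"], beta):
    print(f"{name}: {value:+.2f} CF/60 relative to average")
```

The point of the penalty `lam` is to pull every player's number toward zero, so a player who only appears in a handful of stints can't end up with a wildly large rating. Raising `lam` shrinks the ratings further.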
In order to predict this variable, we put a number of different things into the “box”:
After this, the twins took it a step further and developed a second model. Since RAPM can predict future Corsi, and we know future Corsi predicts future goals, we can make another model to show which players are likely to produce future goals. This gives us the second model, Goals Above Replacement (GAR). This model aims to give a number for how many goals a player could help create, after accounting for all kinds of factors like what shot attempts they’re taking and who they’re playing with. The number of goals is expressed as a positive or negative, relative to a “replacement level” player. Colloquially, this is your average AHL call-up who can contribute at the NHL level but shouldn’t be relied on. Finally, you can use this GAR model to predict how many Wins Above Replacement (WAR) a player will contribute, similar to WAR models in baseball.
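The “relative to replacement” step and the jump from goals to wins can be sketched with simple arithmetic. This is a heavy simplification and not the actual GAR/WAR calculation: the player's goal impact, the replacement-level rate, the ice time, and especially the goals-per-win conversion are assumed values chosen only to show the shape of the math.

```python
GOALS_PER_WIN = 6.0  # assumed conversion rate, for illustration only


def goals_above_replacement(player_goal_impact, replacement_rate_per_60, minutes_played):
    """Player's goal impact minus what a replacement player would give in the same minutes."""
    replacement_goal_impact = replacement_rate_per_60 * minutes_played / 60.0
    return player_goal_impact - replacement_goal_impact


def wins_above_replacement(gar, goals_per_win=GOALS_PER_WIN):
    """Convert goals above replacement into wins using an assumed goals-per-win rate."""
    return gar / goals_per_win


# Hypothetical player: +9 goals of impact over 1200 minutes, compared with a
# replacement player assumed to cost his team 0.1 goals per 60 minutes.
gar = goals_above_replacement(
    player_goal_impact=9.0,
    replacement_rate_per_60=-0.1,
    minutes_played=1200,
)
war = wins_above_replacement(gar)

print(f"GAR: {gar:+.1f}")
print(f"WAR: {war:+.2f}")
```

Because the replacement player in this example is below average, the hypothetical player's GAR comes out higher than his raw goal impact.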
When it spits out the results, here’s generally how to interpret them:
This post covers a lot of different topics from a few different resources, and I understand that that might be overwhelming. Feel free to dive into one of these three resources, take your time to get familiar with it, and then move on to the next. You can always come back to this post if you need help framing, or just if you want links to the different sites.
These are the main “extras” that I’ll use in future Staturday columns to tell stories about players and teams. I’ll always include links to the two primer posts at the bottom if you ever want to reference back here and remember how to get access to something, or what the definition of something is.
Ultimately, data analytics in hockey is such a dynamic field that at any point, one of these people could get hired by a team and all of their stuff might disappear, so we have to make use of what we have while we have it. New stuff will keep appearing as people try to innovate in this space, and if anything cool happens I’ll be sure to make a Staturday column to show it to you all.
That about does it for this week. Next week’s column is a mystery to me as well, so I’ll just leave it with this: See you next Staturday!