Here we go again: Advanced Stats and Analytics Primer, Part 2
By Ryan Hobart, 2 years ago
Last week, in Part 1 of this primer on advanced stats and analytics, we looked at some stats and resources that cover the foundation of what we can call “advanced stats” or “analytics” in hockey. To recap:
- Corsi (CF%) measures the share of shot attempts (shots on goal, misses, and blocked shots) that go in a team’s favour while a player is on the ice
- Production (points, goals, assists) can be measured in a number of different ways (primary points, 5v5 points, etc.)
- Expected Goals (xGF%) is similar to Corsi, but it weights each shot attempt by how dangerous it was, based on where the shot was taken
- Shot attempt models like Corsi and Expected Goals have a reasonably good correlation with future wins, so that’s why they’re important
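To make the difference between the first and third bullets concrete, here is a minimal sketch in Python. The `ShotAttempt` structure and the per-shot `xg` values are made up for illustration; real expected goals models estimate that number from shot location and other details.

```python
from dataclasses import dataclass


@dataclass
class ShotAttempt:
    for_team: bool  # True if the player's own team took the attempt
    xg: float       # hypothetical chance (0 to 1) that this attempt becomes a goal


def corsi_for_pct(attempts):
    """Share of all shot attempts taken by the player's team, as a percentage."""
    if not attempts:
        return None
    attempts_for = sum(1 for a in attempts if a.for_team)
    return 100.0 * attempts_for / len(attempts)


def xg_for_pct(attempts):
    """Same idea as Corsi, but each attempt counts for its xG instead of 1."""
    xg_for = sum(a.xg for a in attempts if a.for_team)
    xg_against = sum(a.xg for a in attempts if not a.for_team)
    total = xg_for + xg_against
    if total == 0:
        return None
    return 100.0 * xg_for / total


# Invented example: three low-danger attempts for, two dangerous attempts against.
on_ice_attempts = [
    ShotAttempt(True, 0.02),
    ShotAttempt(True, 0.02),
    ShotAttempt(True, 0.02),
    ShotAttempt(False, 0.12),
    ShotAttempt(False, 0.08),
]

cf_pct = corsi_for_pct(on_ice_attempts)
xgf_pct = xg_for_pct(on_ice_attempts)
```

In this invented sample the team wins the shot attempt count but loses badly once shot quality is factored in, which is exactly the kind of gap xGF% is meant to reveal.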
In this post, I’m going to branch out a little further from this foundation and talk about the loads of other stats that are available for the NHL. So, it’s important that you’re comfortable with everything in that summary before we take this next step.
The beauty of publicly available data is that it takes a ton of grassroots effort to make it happen. Please consider donating to the creators of these tools so that you can support them, and so that you can access them too. Patreon links for the creators will also be in their specific sections, if you want to decide later, but here they are up front to help you out:
Manual Tracking Data
The majority of the statistics we discussed in Part 1 are pulled from game logs that the NHL puts out. You could develop an advanced stats model without having ever watched a hockey game, if you wanted, using those numbers. This is something different.
Manual tracking involves watching a game and keeping a tally of certain events that happen for each player. This can cover a number of things, but the two major focuses are carrying the puck into or out of the zone, and passes, specifically those that lead to shot attempts. Ryan Stimson was a major player in the passing stats field, and while his data is no longer publicly available, you can read his book, which covers that topic and much, much more. Or, you can hop in the Wayback Machine and look at some of Ryan’s old blog posts on Hockey Graphs. First is this one, which shows how shot assists (passes that lead to shot attempts) are a good predictive metric. Next, you can look at how you could use this data, with a focus on our Toronto Maple Leafs of yesteryear:
- Toronto Maple Leafs Passing Metrics 101
- Toronto Maple Leafs Opposition Analysis
- Toronto Maple Leafs Passing and Linkup Network
- Toronto Maple Leafs Passing Lane Corsi
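If you are wondering what a shot assist tally looks like in practice, here is a rough sketch in Python. The log format (one passer and shooter pair per shot attempt) and the player labels are hypothetical; real tracking projects record far more detail per event.

```python
from collections import Counter

# Invented hand-tracked log: one (passer, shooter) pair per shot attempt.
# The passer is None when nobody set up the attempt.
tracked_attempts = [
    ("Player A", "Player B"),
    ("Player A", "Player B"),
    (None, "Player C"),
    ("Player C", "Player A"),
    ("Player B", "Player A"),
]


def tally_shot_assists(attempts):
    """Count shot attempts per shooter and shot assists per passer."""
    shots = Counter()
    shot_assists = Counter()
    for passer, shooter in attempts:
        shots[shooter] += 1
        if passer is not None:
            shot_assists[passer] += 1
    return shots, shot_assists


shots, shot_assists = tally_shot_assists(tracked_attempts)
```

The useful part is that `shot_assists` credits the passer even when the shot does not go in, so playmakers show up in the data far more often than they would through regular assists alone.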
The other major player in this field is Corey Sznajder, who got his start tracking zone entries and exits and now covers passing stats as well. Basically, Corey watches every NHL game, tracks everything that happens in it, and then puts that into nice, legible spreadsheets for us to use. We can look at the raw data by itself, which is just a list of all the events that happen, but there are far easier ways to interact with it too. All of this is now accessible from his website, AllThreeZones.com, which launched yesterday.
For instance, with the games tracked for the 2020-21 season, this Tableau visualization shows that Auston Matthews and Mitch Marner are outliers in terms of how many chances they take and create, respectively, while William Nylander is in the top tiers of both. I’ll give you three guesses as to who that Edmonton Oiler is hanging out on his own in elite territory, and the first two guesses don’t count.
Similarly, this Tableau visualization shows me that William Nylander and Alexander Kerfoot were the best on the team at entering the opponent’s zone without creating turnovers.
There is a wealth of data for the 2020-21 season, and historical seasons back to 2016, available on this website. There are two major caveats with this data to be aware of though:
- There are biases at play here. The trackers are as unbiased as they can be, but it would be physically impossible to be completely without bias.
- We haven’t yet proven the predictive value of stats like this. We know that zone entries lead to shot attempt creation, as demonstrated by Charlie O’Connor. We also know that winning the shot attempts battle leads to wins, from last week’s post. But zone entries/exits are only a component of what goes into shot attempt success, primarily on the shots for side (there’s limited impact of zone entries on shot attempt prevention).
For these reasons, these data should be used only in select circumstances where they can add to a nuanced, in-depth analysis.
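For a sense of how raw zone entry tracking turns into a number like "entries without turnovers", here is a small Python sketch. The event format, the entry type labels, and the sample rows are assumptions for illustration only, not the layout of the actual tracking spreadsheets.

```python
from collections import defaultdict

# Invented tracked entries: (player, entry_type, led_to_turnover)
tracked_entries = [
    ("Player A", "carry", False),
    ("Player A", "carry", False),
    ("Player A", "dump", False),
    ("Player A", "carry", True),
    ("Player B", "dump", True),
    ("Player B", "carry", False),
]


def entry_summary(entries):
    """Per player: total entries, share carried in, share ending in a turnover."""
    counts = defaultdict(lambda: {"entries": 0, "carries": 0, "turnovers": 0})
    for player, entry_type, led_to_turnover in entries:
        row = counts[player]
        row["entries"] += 1
        if entry_type == "carry":
            row["carries"] += 1
        if led_to_turnover:
            row["turnovers"] += 1
    return {
        player: {
            "entries": row["entries"],
            "carry_pct": 100.0 * row["carries"] / row["entries"],
            "turnover_pct": 100.0 * row["turnovers"] / row["entries"],
        }
        for player, row in counts.items()
    }


summary = entry_summary(tracked_entries)
```

With only a handful of entries per player, percentages like these swing wildly, which is one more reason to treat small samples of tracked data with the caution described above.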
“Viz” is a colloquialism for data visualizations, and we find them very often across hockey statistics forums, including those I shared above. One of the major contributors to these Viz is Micah Blake McCurdy, who uses his site hockeyviz.com to host his tools. The good stuff is behind a paywall, so check out Micah’s Patreon to contribute and get access.
The viz you can find on Micah’s site include:
- Matchup simulator, a fun tool to pit user-created teams against each other to see who has the best chance to win. Example using 2018-19 teams for Toronto and Montreal:
- A PP shot location tool to show you where players are taking their shot attempts (in this case, not including blocked shots, which is typical for looking at powerplays because blocks can be extremely detrimental to the overall powerplay setup). Examples of Matthews and Barrie’s charts from 2019-20:
- An “environmental distiller” where you can isolate the impacts of certain players with or without each other (commonly called With You Without Yous, or WOWYs). Example of how the offense looks with Matthews and Marner vs. how it looks with Matthews and Nylander:
- An individual shot map generator, showing you where particular players take their shots. Examples with Marner and Nylander from 2016-20:
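The With You Without You idea from the list above is simple enough to sketch in a few lines of Python. This is only an illustration of the concept, assuming a made-up list of stints with who was on the ice and the shot attempts for and against; it is not how Micah’s tool is built.

```python
def wowy_cf_pct(stints, player, teammate):
    """Corsi For % for a pair together, and for each of them without the other."""
    # Each bucket holds [attempts_for, attempts_against].
    buckets = {"together": [0, 0], "player_only": [0, 0], "teammate_only": [0, 0]}
    for on_ice, attempts_for, attempts_against in stints:
        has_player = player in on_ice
        has_teammate = teammate in on_ice
        if has_player and has_teammate:
            key = "together"
        elif has_player:
            key = "player_only"
        elif has_teammate:
            key = "teammate_only"
        else:
            continue  # neither skater on the ice, so the stint is irrelevant here
        buckets[key][0] += attempts_for
        buckets[key][1] += attempts_against
    return {
        key: (100.0 * cf / (cf + ca) if (cf + ca) > 0 else None)
        for key, (cf, ca) in buckets.items()
    }


# Invented stints: (skaters on ice, shot attempts for, shot attempts against)
stints = [
    ({"Player A", "Player B"}, 6, 4),
    ({"Player A", "Player B"}, 6, 4),
    ({"Player A", "Player C"}, 5, 5),
    ({"Player B", "Player C"}, 4, 6),
    ({"Player C", "Player D"}, 3, 3),
]

result = wowy_cf_pct(stints, "Player A", "Player B")
```

In this toy data the pair does better together than either does apart. A raw WOWY like this says nothing about who else was on the ice or who they were facing, which is the gap the regression models later in this post try to close.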
There are even more tools than the ones I’ve shown here, but these are the main ones. You can also look at how particular games went, how particular teams are doing, and more. I wanted to update these with 2020-21 data but my subscription to the site needs to be renewed, so unfortunately you’re stuck with outdated viz.
These visualizations are quite valuable because they take data and put it into a medium that is easier for many people to understand. I’m a visual learner and I often learn new advanced stats concepts from viz like these. Hopefully you can do the same.
GAR/WAR, RAPM Models
Here’s where we get into the math-heavy stuff, so feel free to skip over this part if you don’t want to hear about that. The “executive summary” of this section is as follows: math nerds take all the data we have, put it in a big box of data analysis, and it spits out a number that shows how certain players are doing. The box of data analysis is called “regression” and the whole process is called a “model”.
The stuff we’re going to talk about comes from evolving-hockey.com, a site run by a pair of twin brothers who like the Minnesota Wild (@EvolvingWild on Twitter). These twins are our math nerds, and have put together two models for the NHL.
The first model is Regularized Adjusted Plus-Minus (RAPM). This model is inspired by a similar model developed for the NBA. In the NBA the model attempts to predict points scored per 100 possessions, but for hockey, we’ve learned over time that points are not a reliable predictor of future points (sources on this have dried up; it’s become more community know-how at this point). The RAPM process can be used to predict any variable you want. In this case, the model tries to predict future Corsi, with the knowledge that Corsi predicts future goals, and goals predict future wins. As an example, one target variable the RAPM model can be used to predict is Corsi For Per 60 Minutes.
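For the curious, here is a heavily simplified Python sketch of the “regularized regression” idea behind a model like this. It is not the Evolving-Hockey implementation. It assumes each row is a stint with a 1 for every skater on the ice, uses a made-up Corsi For per 60 rate as the target, ignores stint length, and applies plain ridge regression with a penalty `lam` that I picked arbitrarily.

```python
def ridge_fit(X, y, lam):
    """Solve (X^T X + lam * I) beta = X^T y by Gaussian elimination.

    lam must be positive; the penalty is what pulls ratings toward zero.
    """
    n = len(X[0])
    # Build the penalized normal equations.
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(row[i] * target for row, target in zip(X, y)) for i in range(n)]

    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]

    # Back substitution.
    beta = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (b[i] - tail) / A[i][i]
    return beta


players = ["Player A", "Player B", "Player C"]
# Invented stints: (on-ice indicator per player, Corsi For per 60 in that stint)
stints = [
    ([1, 1, 0], 65.0),
    ([1, 0, 1], 60.0),
    ([0, 1, 1], 50.0),
    ([1, 1, 0], 63.0),
]
X = [row for row, _ in stints]
rates = [rate for _, rate in stints]
baseline = sum(rates) / len(rates)
y = [rate - baseline for rate in rates]  # ratings become "relative to average"

ratings = dict(zip(players, ridge_fit(X, y, lam=1.0)))
heavily_shrunk = dict(zip(players, ridge_fit(X, y, lam=50.0)))
```

The point of the penalty is that a skater with very few minutes cannot earn an extreme rating from a handful of lucky stints. Raising `lam` shrinks every rating toward zero, which you can see by comparing `ratings` with `heavily_shrunk`.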
In order to predict this variable, we put a number of different things into the “box”:
After this, the twins took it a step further and developed a second model. Since RAPM can predict future Corsi, and we know future Corsi predicts future goals, we can make another model to show which players are likely to produce future goals. This gives us the second model, Goals Above Replacement (GAR). This model aims to estimate how many goals a player could help create, after accounting for things like what shot attempts they’re taking, who they’re playing with, and other factors. The number of goals is expressed as a positive or negative value relative to a “replacement level” player. Colloquially, this is your average AHL call-up who can contribute at the NHL level but shouldn’t be relied on. Finally, you can use this GAR model to estimate how many Wins Above Replacement (WAR) a player will contribute, similar to WAR models in baseball.
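The arithmetic behind “above replacement” and the goals-to-wins step can be sketched like this in Python. Every number here is a placeholder I made up, including the replacement level rate and the `goals_per_win` conversion; the real model derives its own values.

```python
def goals_above_replacement(goal_impact, replacement_rate_per_60, minutes):
    """Player's estimated goal impact minus what a replacement skater
    would be expected to provide in the same ice time."""
    replacement_goals = replacement_rate_per_60 * minutes / 60.0
    return goal_impact - replacement_goals


def wins_above_replacement(gar, goals_per_win):
    """Convert goals into wins using an assumed goals-per-win exchange rate."""
    return gar / goals_per_win


# Hypothetical skater: +10 goals of impact over 1200 minutes, compared with a
# replacement level of -0.125 goals per 60 minutes (placeholder value).
gar = goals_above_replacement(goal_impact=10.0,
                              replacement_rate_per_60=-0.125,
                              minutes=1200)

# Placeholder exchange rate of 5 goals per win, chosen only for round numbers.
war = wins_above_replacement(gar, goals_per_win=5.0)
```

Because replacement level sits below average, even a roughly average skater ends up with a positive GAR, and a negative GAR means the team might have been better off with a call-up.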
When it spits out the results, here’s generally how to interpret them for 82 games of data:
This post covers a lot of different topics from a few different resources, and I understand that that might be overwhelming. Feel free to dive into one of these three resources, take your time to get familiar with it, and then move on to the next. You can always come back to this post if you need help framing, or just if you want links to the different sites.
These are the main “extras” that I’ll use in future Staturday columns to tell stories about players and teams. I’ll always include links to the two primer posts at the bottom if you ever want to reference back here and remember how to get access to something, or what the definition of something is.
Ultimately, data analytics in hockey is such a dynamic field that at any point, one of these people could get hired by a team and all of their stuff might disappear, so we have to make use of what we have while we have it. And new stuff will keep appearing as people try to innovate in this space, and if anything cool happens I’ll be sure to make a Staturday column about it to show it to you all.
That about does it for this week. Next week will be the third and final part of this primer series where we’ll break down the stats resources for women’s leagues like the PWHPA teams and the PHF (formerly NWHL).