
Introducing a New WAR model: Part 5 – Data Release & Final Thoughts

Ian Tulloch
6 years ago
This is the final instalment of my 5-part WAR series. Here are the links to the other sections:

We made it! I’m sorry for spending so long on the methodology, but I felt that it was important to provide a detailed explanation of how #MyModel works. As a reminder, here’s a quick breakdown of what my WAR formula takes into account, followed by a rough sketch of how the pieces roll up into one number:
  • 5v5 Shot Impact
    • Blends Shot Quantity (Corsi) & Shot Quality (Expected Goals)
  • Expected Offence
    • Expected Point Production for Forwards
    • Shooting Talent (‘Goals Above Expected’) for Defencemen
  • Penalty Differential
  • Faceoff Differential
  • Power Play Value
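For readers who think in code, here’s a minimal sketch of how components like these could roll up into a single number. To be clear, the values, the straight sum, and the function names below are placeholders of my own, not #MyModel’s actual math; the only number taken from this series is the ~4.5 goals-per-win rule of thumb that comes up in the Data Release section.
```python
# A rough, hypothetical sketch only: the values below are invented, and summing
# the components assumes each one has already been converted into goals above
# replacement. None of this is #MyModel's actual math.

GOALS_PER_WIN = 4.5  # rule of thumb mentioned in the Data Release section

def goals_above_replacement(components: dict) -> float:
    """Sum per-component goal values into one GAR figure."""
    return sum(components.values())

def wins_above_replacement(components: dict) -> float:
    """Convert GAR into WAR using the goals-per-win rule of thumb."""
    return goals_above_replacement(components) / GOALS_PER_WIN

# Hypothetical forward, every value expressed in goals above replacement:
forward = {
    "ev_shot_impact": 6.0,        # 5v5 Corsi / Expected Goals blend
    "expected_offence": 7.5,      # expected point production
    "penalty_differential": 1.0,
    "faceoff_differential": 0.5,
    "power_play_value": 3.0,
}

print(goals_above_replacement(forward))  # 18.0 GAR
print(wins_above_replacement(forward))   # 4.0 WAR
```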

Data Release

So without further ado, here’s a link to my WAR data over the last three seasons. For those who prefer Goals Above Replacement, here’s a link to the GAR data (shoutout to Manny Perry for teaching me that roughly 4.5 Goals are worth one Win).
…and before anyone asks, yes it is repeatable from year to year (especially for forwards).
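For context, “repeatable from year to year” is usually checked by correlating each player’s WAR in one season with his WAR in the next. Here’s a minimal sketch of that check; the file name and column names are assumptions, and it naively pairs consecutive rows per player rather than verifying the seasons are actually back-to-back.
```python
import pandas as pd

# Hypothetical long-format table with one row per player-season and columns
# "player", "season", "war" (the file and column names are assumptions).
df = pd.read_csv("war_by_player_season.csv")

# Pair each player-season with the same player's WAR in his next listed season.
df = df.sort_values(["player", "season"])
df["war_next"] = df.groupby("player")["war"].shift(-1)

paired = df.dropna(subset=["war_next"])
print(paired["war"].corr(paired["war_next"]))  # year-over-year Pearson r
```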
 

So…Why Should We Care About This?

The reason I like single metrics like WAR is that they do what we’re all trying to do when we use statistics – weigh everything against each other in an attempt to reach some kind of conclusion about a player. It’s really hard to do that in your head when you consider all of the information you’d need to take into account: dozens of key stats, multiple years of data, ~400 Forwards and ~200 Defencemen in any given year. That’s a lot to ask of anyone.
That’s not to say it’s impossible; I know a lot of us can do it for players on our favourite teams (since we’re so close to the action and know their numbers inside and out), but measuring performance across the league is a lot trickier. This is where a metric like WAR can help objectively guide us in the right direction. Now, is any WAR model ever going to be perfect? Absolutely not (I’ll get into #MyModel’s many limitations in a minute), but that doesn’t mean we shouldn’t at least try to quantify player value.

A Look At Some Other #Models

I want to make it clear that my WAR metric isn’t the only attempt at this. There are some fantastic models in the analytics community trying to accomplish the same thing that I am: getting player value down to one number. Here’s a list of the most well-known single-number player metrics, which I highly encourage you to check out:
To give you an idea of how effective single metrics can be, I’ve taken an average of these metrics and determined the highest-ranked forwards and defencemen over the past 3 seasons (500 TOI minimum), weighting recent seasons slightly more heavily than previous ones (a sketch of one way to compute a blend like this appears at the end of this section). Here’s the link to this data for those who are interested, but for you lazy few, we’ll take a quick look at the Top 30 Forwards & Defencemen according to the #Models:
That looks, more or less, like the best players in hockey. We can always argue about the order in which players are ranked (“Player X isn’t better than Player Y”, “I think he should be in the Top 10, not 16th”), but at that point we’re missing the forest for the trees. When players are that close together, you need to dive much deeper to determine which player is providing more value, and even then you tend to be splitting hairs. I’ve found that the better way to look at these metrics is in “tiers”: we can be pretty confident that a player ranked in the Top 50 is providing more value to his team than a player ranked in the early 100s.
This isn’t to say that we can simply cite a player’s WAR and end the conversation there (I hate it when people do that; it’s not what we nerds wanted when we made these models). Think of single metrics as a starting point – a conversation starter, not a conversation ender. If a player’s WAR is higher or lower than you think it should be, let’s dive deeper and try to determine why there’s a difference of opinion. Hell, a lot of models disagree on certain players (Patrik Laine or Mitch Marner, anyone?), so even within the analytics community there isn’t a consensus on how to measure performance.
Disagreements don’t have to be a bad thing; a lot of the time they generate good discussion and help us learn more about the game. As long as we respect each other enough not to resort to name-calling, civil debate can be great for introducing some of us to different approaches, which helps move us forward as a community.
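As promised above, here’s a sketch of one way a recency-weighted blend of the public metrics could be computed. The 500-minute cutoff is the one used in the post, but everything else (the file and column names, standardizing each model to z-scores within a season before averaging, and the 1.0/1.25/1.5 season weights) is an assumption of mine, not the actual method behind the table.
```python
import pandas as pd

# Hypothetical input: one row per player-season, one column per public model.
# File name, column names, and season weights are all assumptions.
df = pd.read_csv("single_metrics_by_player_season.csv")
df = df[df["toi"] >= 500]  # 500-minute TOI cutoff, as in the post

model_cols = ["my_war", "dtm_war", "manny_war", "game_score"]

# Put every model on a common scale within each season, then average across models.
z = df.groupby("season")[model_cols].transform(lambda s: (s - s.mean()) / s.std())
df["blended"] = z.mean(axis=1)

# Weight the three seasons 1.0 / 1.25 / 1.5, oldest to most recent (illustrative).
weights = dict(zip(sorted(df["season"].unique()), [1.0, 1.25, 1.5]))
df["w"] = df["season"].map(weights)

# Weighted average per player, then rank.
agg = df.assign(wx=df["blended"] * df["w"]).groupby(["player", "position"])[["wx", "w"]].sum()
ranking = (agg["wx"] / agg["w"]).sort_values(ascending=False)
print(ranking.head(30))  # a rough "Top 30 by the #Models" under these assumptions
```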

Understanding What the #Models Favour

I agree wholeheartedly with this statement. Sometimes a WAR model will spit out a result that makes you scratch your head, but when you understand how the model works and dig deeper into the data, there tends to be a logical explanation. For example: #MyModel hates Patrik Laine (which is funny because I love him), whereas Manny Perry’s WAR metric thinks that he’s elite. How can this be? Well, when you dive deeper into how the models measure performance, it actually starts to make a bit of sense.
Patrik Laine’s value comes mostly from his ridiculous shooting ability – he scored an absurd number of goals from long distance last year.
Manny Perry’s WAR model heavily weights in-season shooting talent (how much a player outperforms their Expected Goals in a given year), so it loved what Laine did last season. My WAR formula, on the other hand, doesn’t weight shooting talent as heavily. On top of that, it didn’t think that Laine’s shooting percentage was sustainable, so it strongly regressed his shooting talent. When you consider that most of Laine’s value comes from this area of his game, it actually makes sense how the two models can disagree so strongly on a player like him.
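The post doesn’t spell out the exact regression #MyModel applies, but a common way to regress shooting talent is to pad a player’s observed shooting percentage with a block of league-average shots, so a hot percentage on a modest shot sample gets pulled most of the way back to the mean. Here’s a sketch of that general idea; the ~9% league-average rate and the 400-shot padding are assumptions of mine, not #MyModel’s actual numbers.
```python
# Illustrative only: the league-average rate and the padding amount are
# placeholder assumptions, not #MyModel's actual regression.
LEAGUE_AVG_SH_PCT = 0.09
PADDING_SHOTS = 400  # how hard to pull an individual's Sh% toward the league mean

def regressed_sh_pct(goals: int, shots: int) -> float:
    """Blend observed shooting % with the league average, weighted by sample size."""
    padded_goals = goals + PADDING_SHOTS * LEAGUE_AVG_SH_PCT
    padded_shots = shots + PADDING_SHOTS
    return padded_goals / padded_shots

# A hypothetical sniper who shot 17% on 200 shots gets regressed to ~11.7%:
print(round(regressed_sh_pct(goals=34, shots=200), 3))  # 0.117
```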
To help you sort out some of the disagreements you’ll see between models, I’m going to briefly explain what each model strongly values and which metrics it tends to disregard.

Ian’s WAR Model (#MyModel)

Strongly Values

  • What players are going to do (Expected point production)

Doesn’t Value

  • What players have done 

    • Players who put up more points than “Expected”

    • Players who drastically outperform their career Sh%

DTMAboutHeart’s WAR Model

Strongly Values

  • Penalty Differential

Doesn’t Value

  • Players who drive play drastically better/worse than in prior seasons

    • Bases play-driving ability on a multi-year sample

Manny’s WAR Model

Strongly Values

  • In-Season Shooting Talent

    • How many more Goals than Expected Goals a player has scored

Doesn’t Value

  • Point Production

    • Especially empty calorie points (players who produce points but don’t drive play)

    • This is because his model literally doesn’t look at Points

Dom Luszczyszyn’s Game Score and EvolvingWild’s Weighted Points Above Replacement

Strongly Value

  • Point Production

Don’t Value

  • Play-drivers with mediocre point production, especially defencemen (e.g. Tanev & Hjalmarsson)

Keep all of this in mind when you see a player whose WAR or Game Score doesn’t pass the smell test. Sometimes it simply has to do with the model’s weights, sometimes that player is genuinely more valuable than public perception suggests (this is often the case), and sometimes…

Limitations

…the models aren’t always perfect! I could make this section longer than the Leafs’ Stanley Cup drought if I wanted to, but to make life easier for you, I’m going to quickly touch on #MyModel’s biggest limitations (many of which also apply to the other models).

Dividing Credit

A lot of models suffer from this, but mine especially. When a line or pairing is very successful (or unsuccessful), #MyModel struggles to determine who deserves more credit for those results. There are some great models that use priors to help determine which players have historically driven results at a higher rate (e.g. DTMAboutHeart’s), but unfortunately my model isn’t quite as sophisticated. Although my formula tries to determine players’ shot metrics relative to their teammates, it runs into problems when players spend most of a season on the same line or pairing. This is known as:

‘The Sedin Problem’

This thread by Matt Cane does a great job explaining the concept, but basically when two players play together for an entire season (or entire career in the Sedins’ case), you have an extremely small sample of those guys playing apart from each other. This can sometimes result in the two players having drastically different relative numbers, since their shot metrics without each other are based on such a small sample and therefore subject to more variance.
This is an unfortunate side effect of using relative numbers, but it’s worth noting that this problem also occurs with regression models, which Matt Cane explains very well in the aforementioned Twitter thread. To put it more simply:
TLDR: If we’re going to use regression to determine player ratings, players who spend most of their ice time together are going to come out wonky.
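To make the sample-size problem concrete, here’s a toy with-or-without-you (WOWY) split with invented numbers: two linemates who spend almost all of their minutes together, leaving only ~80 shot attempts apiece in the “apart” samples.
```python
# Toy WOWY example with made-up shot-attempt counts. The point is not the exact
# numbers, just how little the "apart" samples have to rest on.

def cf_pct(cf: float, ca: float) -> float:
    """Corsi-for percentage: share of 5v5 shot attempts going the right way."""
    return 100 * cf / (cf + ca)

together    = cf_pct(1100, 900)  # 55.0% across ~2000 attempts together
a_without_b = cf_pct(35, 45)     # 43.8% across just 80 attempts
b_without_a = cf_pct(48, 32)     # 60.0% across just 80 attempts

# A naive read of the splits says A drags the line down and B carries it...
print(round(together - b_without_a, 1))  # -5.0: line looks 5 points worse with A on it
print(round(together - a_without_b, 1))  # 11.2: line looks 11 points better with B on it
# ...but both conclusions rest on tiny, noisy "apart" samples.
```
Swap a handful of bounces in those 80-attempt samples and both conclusions flip, which is exactly the wonkiness the TLDR is describing.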

Playing with a Dominant Player

It can’t be stated enough that these models try to adjust for context. When you’re dealing with a freak like Crosby, though, it’s hard to fully adjust for his greatness (especially his ability to elevate his teammates’ shooting percentage, which is almost impossible to account for in a model). Just keep in mind that players who play with elite talents tend to have their numbers inflated. Following this same logic, players who play with a boat anchor are going to have their numbers deflated. Sophisticated models like DTMAboutHeart’s and Manny’s do a much better job of accounting for these factors, but unfortunately #MyModel isn’t as smart as theirs.

Playing without a Dominant Player

If you’re on the Boston Bruins’ 2nd line, it means that Corsi God Patrice Bergeron is going to be on the ice quite a lot when you’re off the ice. Even when you take my Team Strength adjustment into account, playing on a different line than him is still going to hurt your relative numbers. Keep this in mind when someone plays on the same team as a dominant play-driver, but on a different line or pairing.
Guys like Krejci, Marleau, and Zetterberg are classic examples of this (having to play second fiddle to shot-differential beasts like Bergeron, Thornton, and Datsyuk). My favourite example, though, is Yannick Weber on Nashville’s 3rd pairing last season. #MyModel hated him in 2016-2017 because his teammates performed drastically better without him. I can’t tell my model, “no, don’t worry about it, they were icing four #1 defencemen when he was off the ice”, so unfortunately the Yannick Webers of the world are going to be unfairly punished.
Long story short: if there are incredible players on the ice when you’re off of the ice, you’re probably going to be underrated by #MyModel (and a lot of other models if we’re being honest).

Closing Thoughts

As we’ve established, #MyModel isn’t perfect. Like everything else in life, it has some flaws. That doesn’t mean we should disregard its benefits, though. The reason statistics are used in every walk of life is that they let us analyze things objectively. That’s what WAR models like mine are attempting to do by quantifying player value. They try to take all of the important information into account, weight each metric based on its impact on goals, and then spit out a final result. To quote Moneyball, it’s about getting things down to one number.


There’s been some significant backlash against using single metrics like WAR because of the perceived simplicity of evaluating a player with one number. When you look into how the models arrive at that number, though, you begin to realize how detailed the analysis really is. Hell, my model is one of the simpler ones out there, and it still took me over 7,500 words to fully explain it. As simple as single metrics may seem, it’s important to know that any good WAR model involves complex analysis.
Now, these metrics are not the be-all and end-all – realistically, they should never be treated like that. Instead, I like to think of them as an excellent starting point for player evaluation, before we dive deeper into things (breaking down the game tape, looking at a player’s linemates, usage, play style, age, etc.). If we can treat WAR models like this, while understanding their limitations, I personally believe they can have tremendous value.
So go ahead, make love…and WAR.
