How to judge predictions

Here is something we hear from time to time:
“If your model is so good, then predict the next season to prove it!”

In the past, members of the WoW network have tried to do just this. But as I will note in a moment, this exercise is mostly for fun (not part of any kind of rigorous inquiry).

That being said… recently wiLQ over at Weak Side Awareness put up a post looking at how well a series of models and analysts did predicting the 2010-2011 season. I thought this was a very enjoyable post. However, I have two issues with what people may take away from wiLQ’s rankings.

The absolutely most incorrect way to judge a decision is solely by the outcome.

I’m pretty much stealing this straight from Dan Ariely. So if you want a quality author’s take on the subject I suggest you hit up his blog and come back.

Alright, I know this sounds crazy. When we evaluate a decision we should look at a few things beyond the outcome of the decision. For example…

  • What did you know when you made the decision?
  • What did you think would happen?
  • What actually happened?
  • Why did what happened differ from what you thought?

I’ll give a basketball example. The Memphis Grizzlies took on Zach Randolph’s terrible contract in 2009. The end result is Zach Randolph has been one of the premier power forwards in the league the last two seasons, and furthermore, the Grizzlies were a power in the playoffs. However, here’s what we could have said in 2009 with all of the data we had on Randolph:

  • Zach Randolph was 28, meaning he was unlikely to improve past what we’d seen.
  • Zach Randolph’s performance had been marginal at best in the past.
  • Zach Randolph was an expensive player.
  • Zach Randolph had issues on and off the court.

So when we judge the Zach Randolph decision, should we rate it highly because things turned out ok? NO! It was a terrible decision. Rewarding it is rewarding luck. And unlike good management, luck comes with no guarantee that it will continue in an organization.

When we evaluate an analyst’s predictions we have to evaluate why they said what they said. Was their information good? For instance, Arturo thought Portland would be very good. Unlike other analysts, he didn’t know Roy’s knees were so bad they could not be insured. Additionally, Arturo thought Przybilla, Camby and Oden could stay at least healthy enough between the three of them to be a force. Oden and Przybilla were lost for the season and Camby had injuries. Anyway, to sum up:

If we judge based on results — without looking at context — we are not doing good work and we may just be rewarding luck!
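Here is a rough sketch of that idea in Python. The probabilities and payoffs are made up purely for illustration (they are not estimates for Randolph or any real signing). The point is only that a decision can look bad on everything known beforehand and still turn out well some of the time.

```python
# Hypothetical numbers, invented for illustration only.
def expected_value(p_success, payoff_success, payoff_failure):
    """Average payoff of a decision, judged by what was knowable beforehand."""
    return p_success * payoff_success + (1 - p_success) * payoff_failure

# A risky signing: rarely works, and costs a lot when it fails.
risky = expected_value(p_success=0.25, payoff_success=10.0, payoff_failure=-6.0)
# A sound signing: usually works, with a modest downside.
sound = expected_value(p_success=0.75, payoff_success=4.0, payoff_failure=-2.0)

print(f"risky: {risky:+.1f}, sound: {sound:+.1f}")
```

Under these assumptions the risky signing loses on average, yet it still works out one time in four. When it does, judging by the outcome alone would call it a great decision.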

Make sure your test of a model actually tests the model.

Alright, recap time. The Wins Produced model begins with an individual’s box score stats, and uses this information to statistically measure a player’s production of wins. This model is very good at explaining wins (unlike Player Efficiency Rating and NBA Efficiency) and is fairly consistent year to year at a player level (unlike plus-minus and adjusted plus-minus).

Okay, let’s have a pop quiz!

What should we use to evaluate this model? If you said an individual player’s contribution you are correct! I’ll actually show an example of Wins Produced being used for predictions in a way that I think is fair.

Last season our analysts tried to determine which players would win specific awards (using the Wins Produced metric as a barometer) and guess what? The analysts actually did a good job guessing which individual players would be good in specific categories. In fact, as a consensus, our analysts were top three for every major award using Wins Produced.

So using a metric that judges individual performances to predict individual performances seems to work.

The analysts for fun also tried to predict team records (with much less success).

Now pop quiz number 2! Tell me all of the things that are involved with how well a team plays. Here are just a few I could think of.

  • Individual performances
  • Trades
  • Injuries
  • Breakout seasons for young players
  • Breakdown seasons for older players
  • Minute allocation by coaches

Of this list, Wins Produced (as well as metrics like Win Shares, WARP, etc…) attempts to handle only one of these factors.

Pop quiz number 3 (final one, I promise): does Wins Produced, or any other metric designed to measure player performance, address any other factor on the list? If you answered no, you’re right! Estimating an individual player isn’t too difficult. Estimating 400-500 of them, plus a bunch of other things, is. So when you judge a metric designed to measure one thing by how well it does when all of those other factors are in play, you are not really testing the model. For example, Win Shares is listed in the rankings. The “prediction” was made based on a bunch of simulations run on the league as it stood at the end of the 2010 season. Comparing it to a different league with many changes is not a good test of whether Win Shares is a good metric.
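A quick sketch of why a team record is a poor test of a player metric. It assumes, as a simplification, that a player's wins can be written as a per-48-minute rate times minutes played. The players, rates, and minutes below are hypothetical. The player rates are held perfectly correct in both scenarios; only the minutes change (say, because of an injury).

```python
# Hypothetical roster: wins produced per 48 minutes, assumed known exactly.
per48 = {"star": 0.300, "starter": 0.100, "bench": 0.000}

# What we expected before the season vs. what happened after an injury.
projected_minutes = {"star": 2880, "starter": 2400, "bench": 960}
actual_minutes = {"star": 960, "starter": 2400, "bench": 2880}

def team_wins(rates, minutes):
    """Sum each player's wins: per-48-minute rate * minutes / 48."""
    return sum(rates[p] * minutes[p] / 48 for p in rates)

projected = team_wins(per48, projected_minutes)
actual = team_wins(per48, actual_minutes)

print(f"projected wins from these players: {projected:.1f}")
print(f"actual wins from these players:    {actual:.1f}")
print(f"miss caused by minutes alone:      {projected - actual:.1f}")
```

In this toy case the team "prediction" misses by about 12 wins even though every player was rated exactly right. The miss comes entirely from a factor the metric never claimed to forecast.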

I want to make it clear that I greatly enjoyed wiLQ’s post. I think it is a lot of fun (or painful when I’m reminded of my guesses) to look back at predictions analysts make. What I want to avoid is anyone reading it and coming away with the notion that this is somehow a good test of one metric versus another.

Summing up

The Wins Produced metric is very good if you want to:

  • Look at which players to reward for good performance on your team (e.g. signing players to new contracts)
  • Determine which players from last year will likely be good this year (e.g. free agents)
  • Determine which players may or may not be overrated/underrated (e.g. trades)

In short, the Wins Produced metric is an excellent tool if you’re a GM or fan and you want to explain or evaluate parts of basketball. Now, we love this metric around here. It was a life changer for those of us watching Melo and A.I. put up 25+ points per game and still not contend. We’ve also sold it as an easy-to-use single number. That said, I want to make one thing clear in case we haven’t already:

Just because you use the Wins Produced metric does not mean you should ignore other information!

We’re against things like adjusted plus-minus and PER because they’re bad information (i.e. they are inconsistent across time or they do not explain what they purport to explain). That said, we’re fully behind good, useful information. Should you have a crack set of medical trainers that can evaluate players and help them if they get injured? Absolutely! Should you have a psychiatric team (Arturo’s idea, btw) to make sure players on your team are sound and to help them if they have issues (e.g. Beasley)? You betcha! Should you analyze your coaches to see if they play the right players? Yes! All of this information should be considered.

As long as you evaluate information properly and use it correctly it’s useful. When you don’t do this… well, then what you are doing isn’t quite as useful.

