Arturo Galletti has written an excellent column detailing the calculations behind adjusted plus-minus (APM). This column explains how this measure is calculated, and in the process, casts further doubt on the validity of this approach. Below is a comment on Arturo’s post. Before you read this, please read what Arturo said.
Furthermore, you probably should also listen to our weekly podcast (with Andres Alvarez, Mosi Platt, Arturo, and me). This podcast was devoted almost entirely to this subject.
Okay, now that you have read Arturo’s post and listened to our podcast, here are some additional thoughts. These additional thoughts begin with a review of what I think we already knew about APM.
Problems with Box Score Models
The APM method appears to be a response to the methods used to analyze the box score. So our review begins with the models used to evaluate a player’s box score measures.
Perhaps the oldest model created to evaluate NBA players is NBA Efficiency. This simple model, which adds together a player’s positive stats and subtracts the negative ones (without any effort to weight them), has its roots in the TENDEX model created by Dave Heeren. And Heeren said this model goes back about 50 years.
The NBA Efficiency model, as has often been noted, is not highly correlated with team wins. This is because it rewards inefficient shooting. If a player exceeds a minimum threshold for shooting efficiency (33% from two-point range and 25% from three-point range), then the more he shoots, the higher his NBA Efficiency score will be (a similar observation is made about John Hollinger’s Player Efficiency Rating). Since inefficient shooting doesn’t actually win games, models with this problem will have a hard time explaining outcomes.
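The break-even thresholds follow from simple arithmetic. Here is a minimal sketch, assuming the commonly cited unweighted form of NBA Efficiency in which a made shot adds its point value and a missed shot subtracts one. The function names are hypothetical and only for illustration.

```python
from fractions import Fraction


def efficiency_per_attempt(points, make_rate):
    """Expected change in NBA Efficiency from one field-goal attempt.

    Assumption: a make adds `points`, a miss subtracts 1 (no weights).
    """
    return points * make_rate - (1 - make_rate)


def break_even_rate(points):
    """Make rate p that solves points * p - (1 - p) = 0."""
    return Fraction(1, points + 1)


two_point_threshold = break_even_rate(2)    # 1/3, about 33%
three_point_threshold = break_even_rate(3)  # 1/4, exactly 25%
print(float(two_point_threshold), float(three_point_threshold))

# A hypothetical 40% two-point shooter still gains with every extra attempt.
print(efficiency_per_attempt(2, Fraction(2, 5)))  # 1/5
```

Under that assumption, any shooter above one-third from two-point range or one-quarter from three-point range raises his score simply by shooting more.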
The inability of the NBA Efficiency family of measurements to explain wins has not gone unnoticed. People have seen players with high efficiency marks (like Allen Iverson) leave a team without the team actually getting worse, or join a team without making it much better. This has led some to question whether these statistical formulas really capture player performance.
These doubts, though, didn’t just cause people to question the formulas. What people have actually questioned is the box score numbers used to calculate these metrics. Because basketball is a team sport, it is reasonable to think that a player’s numbers depend on his teammates. Furthermore, there are events that happen on the court that the numbers don’t capture (like on-the-ball defense). Consequently, some argued, those box score numbers can’t be relied upon to measure player performance (of course, such an argument, as discussed before in this forum, ignores the consistency we see with respect to NBA box score numbers).
Moving to Plus-Minus
Such a story has led people to look past the box score numbers at a player’s plus-minus. Plus-minus captures how a team does when a player is on and off the court. The problem with plus-minus, though, is that basketball is a game of five-on-five. So a player’s plus-minus is a function of the following factors: the player’s ability, the ability of his teammates, the ability of the teammates who take the floor when the player is on the bench, and the quality of the opponents the player is facing. Of this list, we really just want to know about the player’s ability. So how do we capture this one factor?
The solution people have offered is adjusted plus-minus (APM). This measure, which several NBA teams have now apparently employed for a few years, is supposed to control for a player’s teammates and opponents. And therefore, it is supposed to be the “best” representation of a player’s ability. But upon further review…
Here is what we know about APM.
As detailed in a published journal article, Stumbling on Wins, a soon to be published article in an academic collection, and the FAQ page for this forum…
The APM coefficients are often insignificant
For example, consider Corey Brewer. With the Timberwolves this year, Brewer had an APM of 0.57. So according to this number, Brewer was an above average player with the T-Wolves. When we look at Kobe Bryant with the Lakers this year, his APM is -10.87. And that means that Kobe is a below average player in 2010-11. Actually that is an understatement. A mark of -10.87 means that Kobe is just awful this year.
Or does it? For both players the standard error of the coefficient is so large that the correct interpretation of the result is that neither Brewer nor Bryant had a statistically significant impact on the outcomes observed for their team. In other words, because the standard error is relatively large (a general rule of thumb is that the absolute value of the coefficient should be at least twice the standard error) we cannot differentiate the coefficient from zero. And therefore, we cannot conclude a relationship between the player and outcomes actually exists (i.e. neither Brewer nor Kobe can be shown to matter for their respective teams).
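To make the rule of thumb concrete, here is a small sketch. The coefficients are the two APM values quoted above. The standard errors themselves are not reported in this post, so the code only works out how small each one would have to be for the coefficient to pass. The function names are hypothetical.

```python
def is_significant(coef, std_err, threshold=2.0):
    """Rule of thumb: |coef / std_err| should reach about 2."""
    return abs(coef / std_err) >= threshold


def largest_passing_std_err(coef, threshold=2.0):
    """Largest standard error at which `coef` still passes the rule."""
    return abs(coef) / threshold


# APM values quoted in the text; no standard errors are assumed here.
for name, coef in [("Brewer", 0.57), ("Bryant", -10.87)]:
    print(name, largest_passing_std_err(coef))
```

So Bryant’s -10.87 is indistinguishable from zero whenever its standard error is above roughly 5.4 points, and Brewer’s 0.57 whenever its standard error is above roughly 0.29.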
People have argued that when you add more data the problems of large standard errors will be reduced. This is true, but even when we have more years it is still the case that many of the estimated coefficients appear to be statistically insignificant (Brewer and Bryant both have insignificant coefficients when we look at two years). Furthermore, one reason we see “improved” results with more years is that when you add more data to any model the standard errors will fall (because the number of observations is part of the standard error calculation). So that may not mean the model is any better.
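The mechanical effect of sample size can be seen in the simplest case, the standard error of a sample mean. This is only an analogy for regression coefficients, and the spread and sample sizes below are arbitrary illustrative numbers, not APM data.

```python
import math


def std_err_of_mean(sample_sd, n):
    """Standard error of a sample mean: sd / sqrt(n)."""
    return sample_sd / math.sqrt(n)


# Arbitrary illustrative values: same spread, twice the observations.
one_season = std_err_of_mean(10.0, 1000)
two_seasons = std_err_of_mean(10.0, 2000)

# Doubling the observations shrinks the standard error by a factor of
# 1 / sqrt(2), about 0.71, with no change in what is being estimated.
print(two_seasons / one_season)
```

A coefficient that does not change at all would therefore see its ratio to the standard error rise by roughly 41% just from doubling the data.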
The APM coefficients are inconsistent across time
Beyond insignificance we also have a problem with inconsistent measurements across time. Decisions are made about the future. So we don’t want to know if a measure can just explain the past. We need to know whether future measures are correlated with measures taken in the past. For simple plus-minus, year-to-year correlations are quite low.
One might think this is because plus-minus doesn’t control for teammates and opponents. In other words, APM (which supposedly controls for teammates and opponents) would solve the problem observed with plus-minus. But as reported in various places before, only about 7% of a player’s APM this year is explained by the player’s APM last year. And when a player switches teams, the player’s APM this year is not statistically related to his APM the previous season. And that means APM can’t tell you anything about what a player will do when he changes teams. So if you change teammates (something APM is supposed to be controlling for), you don’t get the same APM.
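For readers who think in terms of correlations rather than explained variance, the 7% figure can be converted. This sketch assumes the 7% is the r-squared from a simple regression of this year’s APM on last year’s, in which case the correlation is its square root. The function name is hypothetical.

```python
import math


def correlation_from_r_squared(r_squared):
    """Magnitude of the correlation implied by a simple-regression r-squared."""
    return math.sqrt(r_squared)


r_squared = 0.07  # share of this year's APM explained by last year's
print(correlation_from_r_squared(r_squared))  # roughly 0.26
print(1 - r_squared)                          # share left unexplained
```

Under that assumption, the year-to-year correlation is only about 0.26, and about 93% of the variation in a player’s APM is not accounted for by his previous mark.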
Arturo Deconstructs the Model
The issues of insignificance and inconsistency suggest the APM model can’t be used by decision-makers. But there is yet another issue. Arturo Galletti has offered an extensive discussion of this model that details how it is calculated. And this discussion reveals a few items of interest.
Quoting from Arturo’s article:
….two things jumped out (when Arturo looked at the APM model). One the correlation to wins was very low (~10% R^2) and the +/- numbers don’t quite add at the team level. Somehow they do add up in the final +/- APM numbers.
Let’s talk about the lack of correlation. Arturo notes he looked at this model from a variety of different angles. And as Arturo notes…
Every single regression gave me less that 5% R-Sq. So I feel confident in the statement that the correlation of the model in step 1 (as described) is <5%.
So the model designed to control for the quality of a player’s teammates and the opposition the player faces explains less than 5% of outcomes. The lack of explanatory power, though, is not something proponents of APM have gone out of their way to highlight.
So how does one take a model that can’t explain outcomes and transform it into something that can? Well, there are two more steps. Again we turn to Arturo’s post:
The model now takes the True +/- values (outcome from step one) for each player from the first equation and regresses those against those player’s stats to determine weights for each stat.
This second step has a reported r-squared of 44%. Again, that isn’t explaining outcomes very well either.
To get to a model that explains outcomes, we have a final step. Again from Arturo…
The final step is to take the Pure regression (step one) and the Stats model (step two) and adds them up by player like so:
APM = x* Pure +/- + (1-x)*Statistical +/-
And proceed to adjust x between 10% and 90% for each player to minimize the error.
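To see why a per-player weight matters, here is a rough sketch of the blending step as Arturo describes it. The post does not spell out exactly which error is minimized, so `target` below is a hypothetical stand-in for whatever value the blend is being fitted to, and the grid search is my own simplification, not the procedure used on BasketballValue.

```python
def blend(pure, stat, x):
    """APM = x * Pure +/- + (1 - x) * Statistical +/-."""
    return x * pure + (1 - x) * stat


def best_weight(pure, stat, target, lo=0.10, hi=0.90, steps=81):
    """Grid-search x in [lo, hi] to minimize |blend - target|.

    `target` is a hypothetical placeholder for the value being matched.
    """
    candidates = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return min(candidates, key=lambda x: abs(blend(pure, stat, x) - target))


# Made-up numbers: the two inputs disagree, yet the blend hits the target.
x = best_weight(pure=4.0, stat=-2.0, target=1.0)
print(x, blend(4.0, -2.0, x))
```

Because x is chosen separately for each player, the blend can land on almost any value between the two inputs. The final fit can therefore look good even when neither input explains much on its own.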
So what does that mean?
Here is how Arturo summarizes the explanatory power of the model:
…the r-squared for the APM model is very much a fabrication. The correlation to point margin & wins of the model shown in BasketballValue is artificially inflated by adding the error back in.
Summarizing the Story
So the APM model has the following three characteristics:
1. The coefficients are often not statistically significant. So for most players, the correct interpretation of the results is that the player in question does not have a statistically significant impact on outcomes.
2. The results are very inconsistent over time. So a decision-maker cannot look at past values and use these for decisions about the future (of course, all decisions are about the future).
3. And the model itself doesn’t really explain outcomes. At least, it doesn’t appear to explain outcomes without that very interesting third step.
As Arturo summarizes…the APM model examined does not hold up under scrutiny. It is built to account for all the variability in the process but hold very little actual correlation to the actual process.
One should remember – as Arturo notes – that there is more than one version of the APM model. So it is possible that other versions address these issues. But at this point, we can’t be sure about these other approaches. Or as Arturo put it in the comments section on his post…
The APM model as currently constructed on BasketballValue is not something I can put any credence in at this point, given what I now know about it’s construction. However, models like Wayne Winston’s are interesting as points of references. I do tend to take closed models with a huge grain of salt now. Call me Doubting Thomas.