Correlation, Causation, and Jamal Crawford

Ed Carson of Investors.com reported the following trend a few weeks ago (HT Freakonomics):

The best kept secret of the past 20 years has been this: When the Los Angeles Lakers won the NBA championship, the market would almost always fall that year. When the Lakers lost, the market would usually rise. The Laker Indicator only steered investors wrong in three years during the entire span, and not once from 1995-2007.

An investor who put down $1,000 into the Nasdaq at the start of 1987 and stayed fully invested through 2007 would have ended up with $7,604. But an investor who bought the Nasdaq in years the Lakers lost and stayed in cash when the Lakers won would have finished with $21,189. This strategy would have kept you in the market during the 1990s bull market, avoided the 2000-2002 bear and then got back in as the market uptrend resumed.

Such numbers speak for themselves. In fact, I can’t imagine anyone looking at Carson’s numbers and coming to another conclusion.

Okay, if you read a little further in the column you see that Carson disagrees with Carson’s numbers.  Here is Carson again: Correlation does not always mean causation. Psychological ‘secondary’ gauges may appear to work for a time, then suddenly stop. And it’s easy to look for excuses, a la the Lakers in 2008, for why your special indicator really still works.
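
To put Carson's two ending balances on a common scale, here is a minimal sketch that converts each one into the annualized return it implies. The dollar figures come from the quote above; the function name and the 21-year holding period (start of 1987 through the end of 2007) are my own assumptions for illustration.

```python
def annualized_return(start_value, end_value, years):
    """Compound annual growth rate implied by a starting and ending balance."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Dollar figures quoted from Carson's column.
START = 1000.0
BUY_AND_HOLD_END = 7604.0
LAKER_RULE_END = 21189.0

# Assumption: start of 1987 through the end of 2007 counts as 21 full years.
YEARS = 21

buy_and_hold = annualized_return(START, BUY_AND_HOLD_END, YEARS)
laker_rule = annualized_return(START, LAKER_RULE_END, YEARS)

print(f"Buy and hold: {buy_and_hold:.1%} per year")
print(f"Laker rule:   {laker_rule:.1%} per year")
```

Under that assumption the gap is roughly 10 percent a year against roughly 15 to 16 percent a year, which is exactly the kind of difference that makes a correlation look like a strategy.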

Focusing on Jamal Crawford

The Carson story about the Lakers came to mind as I watched a bit of basketball on Christmas Day. While watching I heard (and I didn’t write this down so this may not be exact) Hubie Brown briefly discuss the contenders in the East. After noting Boston, Cleveland, and Orlando, Brown noted that people shouldn’t ignore the Atlanta Hawks.

So far I agreed with Brown.  The Hawks currently rank second in the NBA in efficiency differential (offensive efficiency minus defensive efficiency).  So any discussion of NBA contenders has to include Atlanta.

But after noting that Atlanta is a contender, Brown decided to offer an explanation for Atlanta’s surge.  And that explanation focused on Jamal Crawford.  Essentially Brown argued that the off-season addition of Crawford is the reason why Atlanta has improved. 

I should note that I am probably being unfair to Brown. Yes, I did hear him say this. But in the course of a game – where the talking heads have a job to do (i.e. keep talking) – one is bound to find something said that doesn’t make complete sense, or that the person talking would take back (or at least explain better). That being said, though, I want to proceed as if Brown really meant what he said (i.e. Jamal Crawford is the reason why the Hawks have gotten better). And I want to proceed in this fashion because Brown’s focus on Crawford illustrates a larger point about correlation and causation.

Explaining Atlanta’s Surge

Let me start by offering a defense for Brown’s comment. It’s easy to see why someone might focus on Crawford. Of the eight players who have logged at least 400 minutes for the Hawks this season, Crawford is the only one who was not with Atlanta last year. Last season the Hawks won 47 games without Crawford. This year the team’s efficiency differential suggests Atlanta will win more than 60 games. Such evidence seems fairly self-explanatory. Atlanta has only one new player getting significant minutes, so that one new player must be the reason why the team is much better.

Once again, though, correlation doesn’t prove causation.  Just because Atlanta has done better since it acquired Crawford, it doesn’t mean Crawford is the reason why the team has improved. 

Back in November I offered an explanation for why the Hawks have gotten better.  That explanation centered on the play of Josh Smith.  Now that Atlanta has played 29 games it seems like a good idea to re-check the numbers.

We can find Josh Smith’s box score numbers at Basketball-Reference.com.  A check of these numbers reveals – just as we saw in November – that Smith has improved with respect to shooting efficiency, rebounds, assists, steals, and turnovers.  His per game scoring is down, but with respect to the statistics that drive wins (both theoretically and empirically), Smith has improved tremendously.

We can see this improvement when we turn to Wins Produced.

Table One: The Atlanta Hawks after 29 games in 2009-10

Table One reports the Wins Produced for each player Atlanta has employed this year. And it reports what we could have expected had each player maintained his performance from last season. The numbers from last season indicate the Hawks should be on pace to win 46 games in 2009-10. Again, that’s about what this team did in 2008-09. So given the team’s roster moves – and what the players on this roster did last year – Atlanta shouldn’t be any better.

But Atlanta clearly is better. When we look at how performance has changed, we see that the primary reason for this improvement is the play of Josh Smith. Had Smith maintained his 2008-09 productivity level this season, he would be on pace to produce 6.2 wins. Instead, his improved box score numbers translate into more than ten additional wins. In sum, both the empirical evidence found in the box score numbers – and how these numbers are theoretically and empirically linked to wins – indicate that it’s not the addition of Crawford that has transformed the Hawks (Crawford – as I noted last April – has generally been a below average shooting guard in his career and he is once again below par in 2009-10). The key is really the improved play of Josh Smith (Al Horford and Joe Johnson have also helped some).

Correlation Stories

Let’s say, though, you really wanted to stick with the Crawford story.  One could argue that somehow Crawford’s presence has led Smith to hit more shots, grab more rebounds and steals, and commit fewer turnovers. Such a story is tempting, especially if you begin your analysis with a correlation.

And this is a great example of why we tend not to begin our analysis with a data-mining search for correlations. Once you find a correlation it’s too easy to start inventing theories. A better approach – and the approach we teach our students – is that good empirical analysis begins with some sort of theoretical model, and then moves on to the data. It’s only through theory that we can actually argue any causation at all. Or in other words, without a theory all we have is a correlation.
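
To see how easily a data-mining search turns up an impressive-looking indicator, here is a small simulation sketch. Everything in it is synthetic: the "market" and the 1,000 candidate "indicators" are independent coin flips, and the function name, the seed, and the 21-year window are assumptions chosen only to echo the span in Carson's column. By construction, no indicator has any real link to the market.

```python
import random


def best_random_indicator(n_years=21, n_indicators=1000, seed=0):
    """Return the most years any pure-noise indicator 'called' correctly."""
    rng = random.Random(seed)

    # Synthetic market: each year is up or down with equal probability.
    market_up = [rng.random() < 0.5 for _ in range(n_years)]

    best_hits = 0
    for _ in range(n_indicators):
        # Synthetic indicator: an unrelated coin flip for each year.
        signal = [rng.random() < 0.5 for _ in range(n_years)]
        hits = sum(1 for m, s in zip(market_up, signal) if m == s)
        best_hits = max(best_hits, hits)
    return best_hits


best = best_random_indicator()
print(f"Best of 1000 coin-flip indicators matched {best} of 21 years")
```

Any single coin-flip indicator should match about half the years. But if you search through enough of them, the best one will match far more than half, and it is that one (not the hundreds of failures) that ends up in a newspaper column.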

The Lakers-stock market story and the Crawford-Hawks story are good examples of correlations in search of a theory. Other examples in basketball can be seen whenever you see people argue that when player X is added to team Y, team Y does better (or worse). Again, one can show that such correlations exist. But without a theoretical structure, it’s often hard to believe that a causal relationship has actually been uncovered. And without a causal relationship, you really don’t have much of a story.

- DJ

Our research on the NBA was summarized HERE.

The Technical Notes at wagesofwins.com provides substantially more information on the published research behind Wins Produced and Win Score.

Wins Produced, Win Score, and PAWSmin are also discussed in the following posts:

Simple Models of Player Performance

Wins Produced vs. Win Score

What Wins Produced Says and What It Does Not Say

Introducing PAWSmin — and a Defense of Box Score Statistics

Finally, A Guide to Evaluating Models contains useful hints on how to interpret and evaluate statistical models.
