My last post focused on the media’s decision to give the 2009 Rookie of the Year award to Derrick Rose. The post began by noting that Rose was essentially an average point guard in 2008-09. It then proceeded to offer the following argument:
So why was he named Rookie of the Year?
The key issue is points scored per game. Rose had the second highest scoring average among rookies. And since the leading scorer – O.J. Mayo – played on a losing team (and was also drafted after Rose), we should not be surprised that most of the media focused on the point guard from Chicago.
Explaining the Vote
After writing this sentence I thought it might be a good idea to investigate my claim. Essentially I am arguing that the media primarily focuses on scoring in voting for the Rookie of the Year. And beyond scoring, draft position and playing for a winner also matter.
My argument is bolstered by past studies (detailed in The Wages of Wins and elsewhere) of how talent is evaluated in the NBA. However, none of these past studies, at least none that I am aware of, looked specifically at voting for the Rookie of the Year. So I thought I would spend a bit of time seeing if the conjecture I offered was supported by some empirical evidence.
The study begins with the voting process. The Rookie of the Year award is decided as follows: Each member of the sports media asked to vote for this award names three rookies. The top choice from this trio receives five voting points. Second place is worth three points. And the third choice receives a single point. After all the votes are in, the voting points of each rookie placed on a ballot are added together. The rookie with the most voting points gets a trophy. And everyone else gets to wonder why they didn’t get a trophy.
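To make the tally concrete, here is a minimal sketch of the five-three-one scoring just described. The function name, the constant, and the ballots are hypothetical placeholders invented for illustration; they are not real voting data.

```python
from collections import Counter

# Voting points by ballot position, as described above:
# 5 for first place, 3 for second, 1 for third.
POINTS_BY_RANK = (5, 3, 1)


def tally_ballots(ballots):
    """Add up the voting points each rookie earns across all ballots."""
    totals = Counter()
    for ballot in ballots:
        if len(ballot) != len(POINTS_BY_RANK):
            raise ValueError("each ballot must name exactly three rookies")
        for rookie, points in zip(ballot, POINTS_BY_RANK):
            totals[rookie] += points
    return totals


# Hypothetical ballots with placeholder names (not real voting data).
example_ballots = [
    ("Rookie A", "Rookie B", "Rookie C"),
    ("Rookie B", "Rookie A", "Rookie C"),
    ("Rookie A", "Rookie C", "Rookie B"),
]

totals = tally_ballots(example_ballots)
winner, winning_points = totals.most_common(1)[0]
print(winner, winning_points)  # Rookie A 13
```

Note that the award goes to the rookie with the most voting points, not the most first-place votes, although in this toy example the two coincide.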
With voting explained, we now need to look at the data. As is the case with all awards given out by the NBA, voting data for the Rookie of the Year can be found at Patricia’s Various Basketball Stuff, a great NBA website maintained by Patricia Bender. According to Patricia’s website, from 2003 to 2009, 62 rookies received at least some consideration for this award (prior to 2003, Patricia Bender only reports the players who received a first place vote). What we wish to identify are the factors that caused the voting pattern we are observing.
Let’s start with a very simple model. Specifically, let’s regress voting points on the three variables identified earlier: points scored per game, team wins, and draft position. To this list I am going to add two more factors: number of games played and the size of the market where a rookie plays his home games.
As Table One indicates, regressing voting points on this list of factors reveals that the conjectures noted in the Derrick Rose article are supported by the evidence.
Points scored per game, team wins, and draft position are all statistically linked to voting points, with points scored having the largest impact. In addition, the number of games played also matters. Market size, though, was not statistically significant (I am going to explain what “statistically significant” means at the end of the column).
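For readers who want to see the mechanics, here is a rough sketch of regressing voting points on the five factors. Two loud caveats: the data below are synthetic numbers generated by an arbitrary made-up rule (only the sample size of 62 comes from the post), and the sketch uses plain ordinary least squares via NumPy, which is not the estimation method behind Table One. All variable names are my own.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 62  # matches the number of rookies in the sample; all values below are synthetic

# Synthetic stand-ins for the five explanatory variables (NOT real data).
points_per_game = rng.uniform(5, 20, n)
team_wins = rng.uniform(15, 65, n)
draft_position = rng.integers(1, 31, n).astype(float)
games_played = rng.uniform(40, 82, n)
market_size = rng.uniform(1, 20, n)

# Synthetic voting points from an arbitrary made-up rule plus noise.
# Market size is deliberately left out of the rule.
voting_points = (10.0 * points_per_game + 1.0 * team_wins
                 - 2.0 * draft_position + 0.5 * games_played
                 + rng.normal(0, 20, n))

names = ["constant", "points_per_game", "team_wins",
         "draft_position", "games_played", "market_size"]
X = np.column_stack([np.ones(n), points_per_game, team_wins,
                     draft_position, games_played, market_size])

# Ordinary least squares: coefficients, standard errors, and R-squared.
coef, *_ = np.linalg.lstsq(X, voting_points, rcond=None)
residuals = voting_points - X @ coef
dof = n - X.shape[1]
sigma2 = residuals @ residuals / dof
std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
total_ss = np.sum((voting_points - voting_points.mean()) ** 2)
r_squared = 1 - (residuals @ residuals) / total_ss

for name, b, se in zip(names, coef, std_err):
    print(f"{name:16s} coef={b:8.3f}  se={se:7.3f}  ratio={b / se:7.2f}")
print(f"R-squared: {r_squared:.3f}")
```

The printed ratio of each coefficient to its standard error is the quantity used to judge statistical significance, and R-squared is the share of variation explained.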
Our simple model explains 75% of the variation in voting points. What if we also considered the other box score statistics? To answer this question, adjusted field goal percentage, free throw percentage, rebounds, steals, assists, blocked shots, turnovers, and personal fouls were added to the original model (all of the non-shooting data were per-game and measured relative to position played). Including all of these factors boosts our explanatory power to… 75%. Okay, explanatory power looks the same, although if you go out a few decimal places you see a slight increase. In addition, of the new variables added we see a statistical link between voting points and shooting efficiency, steals, personal fouls (a negative link), and maybe assists (significant only at the 10% level). Rebounds, turnovers, and blocked shots were not found to be statistically significant. And of all these factors, points scored per game easily has the biggest impact on voting points.
Just Like Coaches
For those who read The Wages of Wins, this story should sound familiar. In our book we discuss voting for the All-Rookie teams, which is done by the coaches. We report that a simple model, with points scored as the sole measure of performance, explained 76% of the variation in the coaches’ voting. Adding all the other box score statistics to the model only boosted explanatory power to 77%. And just like we saw with the sports media, points scored was the dominant performance factor.
Such a result should probably not surprise. One suspects that how the sports media see the players is heavily influenced by how the coaches see the players. So when coaches tell us that Kevin Durant and Derrick Rose are great players, one should expect the media to adopt this perspective. And when non-scorers do not get as much attention from coaches, we should expect the sports media to offer similar evaluations.
On Statistical Significance
Okay, that ends my story for today. What follows is a brief discussion of statistical significance. If you are not interested in statistics, this final section will not be the best thing you will read today. But since I used the term “statistically significant” in the column, I thought I would offer a brief introduction to the concept.
Let me start with a huge qualification (and something I think I have said in the past). You really can’t teach econometrics in a blog (at least, I don’t think I can). Having taught econometrics in the past I know this is a subject that requires a fair amount of class time. And after putting the time in the classroom (and I mean, taking more than one class), you then have to put in additional time to gain experience. This means doing research that is reviewed by other people who understand statistics and econometrics (in other words, other people who have published research that utilized econometrics). After you put in all this time you will probably reach a point where you realize there is still much for you to learn (yes, this stuff is not real easy).
All that being said, let me try and clarify what is meant by the words “statistical significance”. In running a regression we are estimating the statistical relationship between the dependent variable (in this case, voting points) and each independent variable (for example, points scored). That relationship is captured by a coefficient (which tells us the direction and magnitude of the relationship) and a standard error. Although people tend to focus on the coefficient, the standard error is extremely important. This is because it’s the standard error that tells us whether or not our coefficient is statistically different from zero.
What does that mean? Before we can talk about the direction and magnitude of a relationship, we have to first establish whether or not a relationship even exists. If a coefficient was actually zero, then there would clearly be no relationship between the independent variable and dependent variable. In estimating the model, though, you are not going to see a coefficient that is actually zero. The number we do see, though, has to be something that can be differentiated from zero. And to make that differentiation, we compare the size of the coefficient to the size of the standard error.
To further explain this concept, let me fall back on a standard rule of thumb (keep in mind, an actual review of an article goes beyond this simple rule). Introductory textbooks will note that the general rule of thumb is that a coefficient has to be at least twice the size of the standard error, in absolute terms, for us to conclude the coefficient is statistically different from zero. So looking back at our simple regression, the coefficient for points scored per game is 6.33. The corresponding standard error is 0.74. The ratio of 6.33 to 0.74 is roughly 8.6, which easily exceeds 2, so we can now conclude that points scored per game has a statistically significant impact on voting points (although the strength of our conclusion depends on a host of other econometric issues that a reviewer would consider).
What about market size? The estimated coefficient is -0.19. One might interpret this result as evidence that rookies in larger markets receive fewer voting points from the sports media. But that is not the correct interpretation. The standard error for market size is 0.27. So the coefficient, in absolute terms, is less than twice the value of the corresponding standard error (the ratio is only about 0.7). From this we can conclude that the empirical evidence suggests no relationship between market size and voting points.
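The rule of thumb is simple enough to check in a few lines. Here is a small sketch using the two coefficients and standard errors quoted above; the function name and the default threshold of 2 are my own choices for illustration, and the rule itself is only the textbook shortcut, not a full review of the model.

```python
def differs_from_zero(coefficient, standard_error, threshold=2.0):
    """Apply the rule of thumb: |coefficient / standard error| >= threshold."""
    ratio = abs(coefficient / standard_error)
    return ratio, ratio >= threshold


# Coefficients and standard errors reported in the text.
reported = {
    "points scored per game": (6.33, 0.74),
    "market size": (-0.19, 0.27),
}

results = {}
for name, (coef, se) in reported.items():
    ratio, significant = differs_from_zero(coef, se)
    results[name] = (ratio, significant)
    print(f"{name}: ratio = {ratio:.2f}, passes rule of thumb = {significant}")

# points scored per game: ratio = 8.55, passes rule of thumb = True
# market size: ratio = 0.70, passes rule of thumb = False
```

The absolute value matters here: a negative coefficient can be statistically significant too, so the sign of market size is not what fails the test. Its ratio is simply too small.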
It’s important to highlight my wording. The empirical evidence “suggests” a story. One could come back and say:
- what if we measured market size differently?
- what if we used a different functional form to estimate the model?
- what if we used a different method of estimation?
- what if we had more data?
and on and on….. In other words, even after you run a regression, there are still questions people could ask (and at meetings and in the journal review process these questions tend to get asked).
Even though we have questions, at this point it would be inappropriate to talk about the coefficient we have estimated for market size as being anything other than statistically insignificant. In other words, in interpreting the results we have, our present conclusion is that the link between market size and voting points is statistically insignificant. We do not say (and this point should be emphasized) the “coefficient is insignificant” and then proceed to tell additional stories about the link between these two variables.
One of my co-authors puts it this way to her students.
“When I teach econometrics I tell my students that a sentence that begins by stating a coefficient is statistically insignificant ends with a period.” She tells her students that she never wants to see “The coefficient was insignificant, but…”
Unfortunately I don’t always see people online following this advice. I have seen people report regression results but fail to note standard errors. Or standard errors are reported but the statistical insignificance of the results is ignored. Hopefully this brief discussion will help people understand what they are reporting and, furthermore, what they are reading.
Let me close by noting there are a number of issues to consider in reviewing Table One. For example, the variables were logged so the estimated coefficients are actually elasticities. In addition, a Tobit model was estimated because rookies who did not receive any votes were included in the sample. In an article, each of these issues would be noted and explained. Since this is a blog post, though, I think I will avoid writing a few more paragraphs and just end the post.
Finally, A Guide to Evaluating Models contains useful hints on how to interpret and evaluate statistical models.