Conventional wisdom is a tricky beast: If “everybody knows” something, it is possible that the belief is justified. On the other hand, it’s not that uncommon for many people to all make the same mistake. For instance, consider the recent banking crisis. Everyone knew homes were a safe investment and betting on them gave good returns. Except towards the end, when it all fell apart. The basic point is that ideas have to stand up to scrutiny, and as we can see, reality provides an excellent test.
There are good tests and bad tests. While intuition and estimation can be useful for formulating ideas quickly, these techniques are not rigorous and have no place in arguments. Some particularly egregious examples are the “laugh/smell/eye” tests, which pass simply on an appeal to conventional wisdom. In fact, I very much oppose this test, even when used by heroes of mine. I made the following point on Twitter to basketball analytics icon Dean Oliver:
To which Dean Oliver, who incidentally does not yet follow me on Twitter, responded:
Smell test is when it smells bad to everyone, not when good to one person. Weak test.
And my final response was:
In practicing what I preach, I’ll be re-reading section in BoP then judging. I’ll give write up Facebook :)
Except my write up will be here instead. Dean still isn’t off the hook, as the smell test here is defined as looking correct to many people. (I expect “everyone” is hyperbole, since by definition nothing could ever pass the smell test if even one person disagreed.)
I did dig up Basketball on Paper to grab Dean’s exact words and see exactly how he broached the subject. I got to do a flier on Bill James earlier this year; would this be my chance to add Dean Oliver to the list? The answer is, well, I get to agree and disagree with Dean Oliver several times in the span of several paragraphs.
Other people have thought of this concept [Adjusted Plus-Minus], but Winston and Sagarin got hold of the play-by-play data to do it. Despite the concept making sense, the results – as we like to say in this business – “don’t pass the laugh test.” Winston/Sagarin’s results suggested that in 2002, Shaquille O’Neal, commonly viewed as the best player in the league, was only the 20th best player in the NBA. Their results also suggested that rookie Andrei Kirilenko, not commonly viewed as even being in the league’s top fifty, ranked second among NBA players in overall contribution. See? Doesn’t pass the laugh test! Or the rolling eye test. Winston and Sagarin acknowledged those embarrassments and were reported to say that they “don’t claim their rating is the only tool for evaluating a player’s value.” Oh well.
Dean goes on to discuss Winston and Sagarin’s boldness in asking for payment for such a black box and ends with this:
Understanding Why is as important as understanding what or how much. If you cannot explain why Kirilenko ranked so much higher than O’Neal, there is great reason to doubt the result.
Sadly, while the start of the quote makes it look like my write up to Dean will allow me to be the victor, the context makes Dean’s point more solid. Yes, Dean does bring up the Smell Test (see, I even got the name wrong in remembering; turns out our memories suck!), and the exact point he uses it on is interesting. In 2002, I would in fact give credence to the notion that Andrei Kirilenko was better than Shaq. Observe, via the NBA Geek comparison engine:
2001-2002 Shaquille O’Neal vs. Andrei Kirilenko Stats via the NBA Geek
| Stat | Shaquille O’Neal | Andrei Kirilenko |
|---|---|---|
| Wins Produced per 48 Minutes | 0.242 | 0.259 |
| Points per 48 Minutes | 36.1 | 19.6 |
| Defensive Rebounds per 48 Minutes | 9.5 | 5.6 |
| Offensive Rebounds per 48 Minutes | 4.7 | 3.3 |
| Assists per 48 Minutes | 4.0 | 2.1 |
| Turnovers per 48 Minutes | 3.4 | 2.4 |
| Blocks per 48 Minutes | 2.7 | 3.5 |
| Steals per 48 Minutes | 0.8 | 2.6 |
Accounting for the fact that Kirilenko was a small forward and Shaquille O’Neal was a center, and examining the stats, the notion that Kirilenko was near Shaq’s level isn’t so crazy. Now, I’m perfectly fine with the point that a high-volume, high-efficiency, dominant center is more valuable (due to rarity) than a defensively powerful small forward. The numbers take the question of who is better (Shaq wins on totals, Kirilenko wins on per-minute) from eye-roll to at least debatable.
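For anyone who wants to poke at the table directly, here is a minimal Python sketch that simply counts which player leads each per-48 category. The numbers are copied from the table above; the function names are made up for illustration, and treating turnovers as "lower is better" is my assumption. This is a crude tally, not the Wins Produced formula, and it ignores position and total minutes.

```python
# Per-48-minute figures copied from the 2001-2002 table above (via the NBA Geek).
# Each value is (Shaquille O'Neal, Andrei Kirilenko).
STATS = {
    "Wins Produced": (0.242, 0.259),
    "Points": (36.1, 19.6),
    "Defensive Rebounds": (9.5, 5.6),
    "Offensive Rebounds": (4.7, 3.3),
    "Assists": (4.0, 2.1),
    "Turnovers": (3.4, 2.4),
    "Blocks": (2.7, 3.5),
    "Steals": (0.8, 2.6),
}

# Assumption for this sketch: fewer turnovers is the better outcome.
LOWER_IS_BETTER = {"Turnovers"}


def leader(stat, shaq, kirilenko):
    """Return the name of the player who leads a single category."""
    if shaq == kirilenko:
        return "Tie"
    if stat in LOWER_IS_BETTER:
        shaq_leads = shaq < kirilenko
    else:
        shaq_leads = shaq > kirilenko
    return "O'Neal" if shaq_leads else "Kirilenko"


def tally(stats):
    """Count how many categories each player leads."""
    counts = {"O'Neal": 0, "Kirilenko": 0, "Tie": 0}
    for stat, (shaq, kirilenko) in stats.items():
        counts[leader(stat, shaq, kirilenko)] += 1
    return counts


if __name__ == "__main__":
    for stat, (shaq, kirilenko) in STATS.items():
        print(f"{stat:<20} {shaq:>6} {kirilenko:>6}  -> {leader(stat, shaq, kirilenko)}")
    print(tally(STATS))
```

On these eight rows the tally splits evenly, four categories each, which fits "at least debatable" far better than it fits an eye-roll.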
My victory was almost sealed by the initial comment and the anecdote! When Dean explains his point, though, I basically end up agreeing with him. If you present an idea that flies in the face of conventional wisdom, you HAVE to be able to explain it. To merely present an idea and defend it by saying “We don’t know why it says that, we just know it says it” is unacceptable. So is the smell test an acceptable test? I still argue no. But Dean argues it’s a decent filter, which I can agree with. If you can’t answer the very first “why” in defense of your point, and the point goes against the norm, perhaps it’s time to go back to the formula.
I’d actually like to end this post with a quote from very early in Basketball on Paper, which I also get to disagree with :)
To Paraphrase some famous guy who wrote a bunch of baseball books: reducing quality to one number has a tendency to end a discussion, rather than open up a world of insights
Now, as you may notice, we tend to use “one number” that’s built to evaluate quality around here. And as Dean points out, when such numbers are opaque, there are problems. But when you understand these numbers, I don’t think the discussion ends. In fact, I treat it more like a search tree. A real-world example of this is the game 20 Questions. You ask a question, and that leads to more specific questions. When you start with a metric like Wins Produced, you are in a sense asking “Is the player good?” But this opens up perfectly to the next question: “Why are they good (or bad)?” And because we know how the formula is broken down and have great sites like “The NBA Geek”, the discussion doesn’t end, it just begins!
This is in fact a general point for many stats. Things like Assist Percentage, Usage, and Rebound Percentages should not be the end of the discussion. Any “single number” stat should be just one question on the road to hopefully a bigger and more interesting conversation. At least, that’s how I view them.