“Science is the belief in the ignorance of experts.”
― Richard P. Feynman
Yes, before you ask: as is contractually required of any and all bloggers, I will be talking about the unlikely Jeremy Lin. Now, I know we touched on this yesterday, but our goal today is different. My take will be different. You see, rather than waxing poetic about the unbelievable and unpredictable nature of basketball or focusing on how no one could have seen this coming, I’m going to focus on how we kind of did.
Because when faced with a supposedly unsolvable problem, we brought the science, and science once again beat the experts.
The problem I’m alluding to is evaluating talent in the NBA draft. Anyone who knows me knows I love to write about the draft. For those who don’t: hello, you must be new here. Just in case, let me illustrate that by throwing up some links for your viewing pleasure.
- Part 1: Finding Elite Rookies in the NBA Draft or How the NBA Draft is a Lottery
- Part 1a: The Top 33 Rookies in the Past 33 Years
- The WSJ Piece: Arturo Galletti Evaluates 30 Years of the NBA Draft for the Wall Street Journal
- Part 2: Ranking 30 Years of Draft Picks
- Part 3: A Sunday Kind of Piece: Return of The Draft (Where I ask: How good are GMs at finding talent? The answer: Not Very)
- Part 3a: The Draft, The Draft, The Draft………
- Where I concluded that:
- Talent is always available late in the first round.
- The trick is finding it with some accuracy (which I postulate we can do).
- Given that the identification is risky, a later, cheaper pick is better.
This led to a lengthy draft strategy segment in my guide to running an NBA franchise (Build me a winner rev.2).
The key takeaway was that I needed to build an effective draft model to predict player performance based on publicly available data. I built two (go here for the model build, part 1 & part 2). In very general terms, the models use the available data to predict future performance for each player coming into the draft from college. Based on that prediction, a ranking is done and a draft recommendation is generated.
Now, this model is a work in progress: I build it, publish it, then go back at some future point to review whether it worked. I will make corrections as needed over time.
One of the key ideas is having a public build, to allow for peer review and to answer the skeptics.
For the purposes of this discussion, I will focus on the last 2010 build (see here) because, at the request of some of our loyal readers, I had included the best undrafted rookies. Can you guess who was number one?
Mr. Lin was actually the tenth-ranked prospect overall on our draft board and easily the best undrafted player. The model had him slightly below the draft threshold. Given this and a few other similar data points, I moved the threshold slightly down to .090 WP48 for Model #1 and .060 WP48 for Model #2. You will see the results of that in the numbers that follow.
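To make the ranking-plus-threshold idea concrete, here is a minimal sketch of how a draft board could be generated from predicted WP48. Only the .090 threshold for Model #1 comes from the post; the `Prospect` class, the `build_draft_board` function, the player names and the predicted values are all hypothetical, and treating "at or above the threshold" as draftable is my assumption.

```python
from dataclasses import dataclass

# Threshold quoted in the post for Model #1; everything else is hypothetical.
DRAFT_THRESHOLD_WP48 = 0.090


@dataclass
class Prospect:
    name: str
    predicted_wp48: float


def build_draft_board(prospects, threshold=DRAFT_THRESHOLD_WP48):
    """Rank prospects by predicted WP48 (best first) and flag draftable ones.

    Returns a list of (rank, name, predicted_wp48, draftable) tuples.
    Assumption: a prospect is draftable when the prediction is at or above
    the threshold.
    """
    ranked = sorted(prospects, key=lambda p: p.predicted_wp48, reverse=True)
    return [
        (rank, p.name, p.predicted_wp48, p.predicted_wp48 >= threshold)
        for rank, p in enumerate(ranked, start=1)
    ]


# Made-up prospects and predictions, for illustration only.
sample = [
    Prospect("Player A", 0.140),
    Prospect("Player B", 0.085),
    Prospect("Player C", 0.101),
]
board = build_draft_board(sample)

for rank, name, wp48, draftable in board:
    print(rank, name, wp48, "draft" if draftable else "pass")
```

Lowering `threshold` is all it takes to pull a borderline prospect onto the "draft" side of the board, which is the kind of adjustment described above.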
Why should you care exactly?
It’ll make more sense if I just give you the full story:
That’s every drafted player coming from the NCAA from 1995 through 2010 who has played at least 400 minutes in the NBA (2010 shows additional players who haven’t played those minutes yet). It shows the player’s draft year, where he was picked, the model predictions and the player’s actuals for his first 4 years and his career. For 2010, for example, we can see both of the Knicks’ starting guards in the top 10, but this could simply be coincidence. Did the models actually do anything?
A simple test is to look at the correlation between where the player was picked, where the models suggested picking him, and his actual rank, by draft, in terms of production. Draft order vs. production shows minimal correlation, with an R-squared of about 5%. It jumps to 25% for the predicted production rank.
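For readers who want to try this kind of test themselves, here is a minimal sketch of an R-squared comparison between two rankings and the actual production rank. It uses the squared Pearson correlation, which is my assumption about how the R-squared figures were computed; the `r_squared` helper and all of the ranks below are made up for illustration and are not the data behind the 5% and 25% figures.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return (cov * cov) / (var_x * var_y)


# Made-up ranks for six players in one hypothetical draft class.
actual_rank = [1, 2, 3, 4, 5, 6]  # rank by actual production
draft_rank = [4, 1, 6, 2, 5, 3]   # order in which GMs picked them
model_rank = [2, 1, 3, 5, 4, 6]   # order the model would have picked them

r2_draft = r_squared(draft_rank, actual_rank)
r2_model = r_squared(model_rank, actual_rank)

print(f"draft order vs production: R^2 = {r2_draft:.3f}")
print(f"model order vs production: R^2 = {r2_model:.3f}")
```

In this toy class the model ordering tracks production far more closely than the draft ordering does, which is the same shape of result as the real comparison, just with invented numbers.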
A more complex and interesting test is to look at:
- The probability of landing a better than average player (>.090 WP48)
- The probability of landing a good player (>.150 WP48)
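The two probabilities above boil down to simple hit rates: the share of selected players whose actual WP48 clears a cutoff. Here is a minimal sketch, using the two cutoffs from the list; the `hit_rate` helper and the WP48 values are hypothetical and only there to show the arithmetic.

```python
AVERAGE_WP48 = 0.090  # "better than average" cutoff from the post
GOOD_WP48 = 0.150     # "good player" cutoff from the post


def hit_rate(actual_wp48, cutoff):
    """Share of players whose actual WP48 is strictly above the cutoff."""
    if not actual_wp48:
        return 0.0
    hits = sum(1 for wp in actual_wp48 if wp > cutoff)
    return hits / len(actual_wp48)


# Made-up career WP48 values for players chosen by some hypothetical strategy.
picked = [0.050, 0.120, 0.160, 0.095, 0.200, 0.010, 0.110, 0.030]

p_average = hit_rate(picked, AVERAGE_WP48)
p_good = hit_rate(picked, GOOD_WP48)

print(f"P(better than average) = {p_average:.1%}")
print(f"P(good player)         = {p_good:.1%}")
```

Running the same function over different groups of picks (model picks, all draft picks, picks after the top 5) gives directly comparable rates.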
If I do this for all picks by the models, as well as for all draft picks and model picks taken after the top 5 picks, I get:
So, to review: using publicly available data, we built a model that picks draft winners at a 75% rate, which is better in general than having the #1 pick in the draft, and big winners at a 40% rate, which is better than everything but the #1 pick.
P.S. How about one more bonus table?