The problem with high-volume low-usage analysis

I came across this great blog post at theSpread.us: “Why Sports Analytics Fans Shouldn’t Expect the Big Data Revolution Any Time Soon.” While the popularity of the Sloan Analytics conference has grown by leaps and bounds and excitement over sports analytics is certainly high, a few people remain skeptical. One of my favorite writers on the subject is Kevin Draper over at “the Diss NBA”.

Yay! Data!!

Allen Iverson wants the crowd to be louder

A problem I’ve found with sports analysis is a love of data without the understanding that should go with it. A basketball analogy helps here. In basketball, scoring lots of points is viewed as good, and yes, many players who score lots of points are helping their teams. But there’s a very important fact many people miss: every shot costs your team a valuable possession. If you don’t examine how often a player actually gets the ball through the hoop, it’s easy to be impressed by sheer volume of shots. (Allen Iverson is the pinnacle of this. He won the NBA MVP in 2001 despite making well under half of his shots.)
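To make the shot-volume point concrete, here is a minimal Python sketch comparing two players by points per shot attempt. The player labels and stat lines are hypothetical, invented purely for illustration, and the metric is a deliberate simplification: it ignores free throws, turnovers, and the other ways a possession can end (fuller efficiency measures such as true shooting percentage do account for free throws).

```python
# Minimal sketch: shot volume vs. shot efficiency.
# The players and stat lines below are HYPOTHETICAL, invented for illustration.
# Simplification: every shot attempt is treated as one used possession;
# free throws and turnovers are ignored.

def points_per_shot(points: int, shots: int) -> float:
    """Average points scored per shot attempt."""
    if shots <= 0:
        raise ValueError("shots must be a positive number of attempts")
    return points / shots

# Hypothetical single-game lines: (points, shot attempts)
players = {
    "Volume Scorer": (30, 30),
    "Efficient Scorer": (22, 14),
}

if __name__ == "__main__":
    for name, (points, shots) in players.items():
        pps = points_per_shot(points, shots)
        print(f"{name}: {points} pts on {shots} shots -> {pps:.2f} pts/shot")
```

With these made-up numbers, the volume scorer has the bigger point total, yet he gets less out of each possession he uses. That is exactly the context a raw points total hides.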

Data works the same way. We have tons of it, but not all of it is useful! The problem I see is that there’s too little emphasis on figuring out what value the data has and how to use it, and a huge emphasis on collecting more of it. We’re in love with this! Teams got the box score in the 1970s because they needed better stats (at least Lee Meade in the ABA thought so). Then the 2000s gave us easier access to play-by-play data. And now we’re getting visual tracking data of every movement on the court! Yet through all of these “revolutions”, I’m not seeing teams slow down to ask whether the data is useful or how to use it. Instead, the trend is to grab more data! As soon as we get more data, the argument goes, we’ll finally understand the NBA. But very few people understand the data we have now!

A Simple Game

At theSpread, they point out that while vendors are selling “big data” and huge systems, the research itself is much more focused. And this may not be a bad thing. One of my favorite quotes from Bull Durham sums up baseball as follows:

This… is a simple game. You throw the ball. You hit the ball. You catch the ball.

Basketball is also pretty simple: get the ball, keep the ball, put it through the hoop. Yet we expect that more and more data will unlock huge insights. Much of this expectation started with the “Moneyball Revolution”. Michael Lewis’ book explained how the low-budget Oakland A’s could still compete in baseball, largely thanks to the work of general manager Billy Beane. This clearly showed the value of “big data”, right? Well, not really. The key to Beane’s success was understanding a very simple set of metrics:

  • Getting on base matters! This is true for batters and pitchers!
  • Younger players are hard to predict and overrated.
  • Teams care too much about players “looking like ball players”.

If anything, Beane ignored a lot of data and focused on what mattered! Yet the movement after him has been a scramble to grab as much data as possible in the hope of finding some hidden gem within it. And I’m all for getting more data. I just want to stress that, on its own, data is neither good nor bad. If I tell you that I have thirty gigs of player data on my computer, is that useful? Maybe. If I tell you that player X took 30 shots last night, is that good or bad? We don’t know until we see how many points they scored! Being impressed by big numbers, be it points in basketball or gigabytes of data, is a mistake if we ignore context. That’s why it may be important to focus less on “Big Data” and more on “Useful Data”.

-Dre