# Why It’s Impossible to Solve Basketball

A very popular form of analysis is to look at lineups. Basically, can we determine how well players do together and estimate how well they’ll do against an opponent?

Another popular theme, when it comes to basketball, is that the game is very complex and hard to understand. I disagree with this theme. However, viewed through the lens of lineup data, it turns out to be true. In fact, I’m here to say that if you use lineup data to try to “solve” basketball, you may as well give up.

## Counting That High Is Difficult

Lineups are combinations of players, and it turns out that combinations grow rapidly. The math I’ll use is the binomial coefficient. The Wikipedia article is a fun read, but in case you don’t feel like skimming it, I’ll break it down quickly. If I have a group of players, I can choose a lineup from them. If I don’t care about order (e.g. Kidd, Terry, Marion, Nowitzki, Chandler is the same as Terry, Kidd, Nowitzki, Marion, Chandler), then we actually have the math to count how many options we have (aren’t nerds great?).
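Here’s a minimal Python sketch of that counting idea. The helper name `count_lineups` and the seven-player pool are placeholders I made up for illustration; only `math.comb` and `itertools.combinations` come from the standard library.

```python
import math
from itertools import combinations


def count_lineups(roster_size: int, lineup_size: int = 5) -> int:
    """Binomial coefficient: n! / (k! * (n - k)!) unordered lineups."""
    return math.comb(roster_size, lineup_size)


# Order does not matter: both orderings describe the same lineup.
lineup_a = frozenset(["Kidd", "Terry", "Marion", "Nowitzki", "Chandler"])
lineup_b = frozenset(["Terry", "Kidd", "Nowitzki", "Marion", "Chandler"])
same_lineup = lineup_a == lineup_b
print(same_lineup)  # True

# Hypothetical seven-player pool (placeholder labels, not a real roster).
pool = ["P1", "P2", "P3", "P4", "P5", "P6", "P7"]
enumerated = list(combinations(pool, 5))

# Listing every lineup by hand agrees with the formula.
print(len(enumerated), count_lineups(len(pool)))  # 21 21
```

Enumerating works fine for a toy pool like this, but the formula is what you want once the roster gets bigger.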

Let’s start with the most complicated case. If I had a team of 12 players who could each play every position (all LeBrons, for instance), then I could choose any five players from that roster of 12. This gives us our “worst case” scenario, which is a team with 792 viable lineups.

“That sounds like a lot”, you say. Luckily, players usually only play a position or two, so it gets easier. Let’s use the 2012 Oklahoma City Thunder as an example. I’ll even simplify it further by limiting us to three positions – Point Guard, Guard-Forward and Forward-Center. A team has to put out a lineup of a Point Guard, two Guard-Forwards and two Forward-Centers. Using Yahoo Sports and Popcorn-Machine, here’s how the Oklahoma City Thunder’s finals roster looked last year.

- R. Westbrook (PG)
- D. Fisher (PG)
- R. Ivey (GF)
- L. Hayward (GF)
- D. Cook (GF)
- J. Harden (GF)
- T. Sefolosha (GF)
- K. Durant (GF)
- C. Aldrich (FC)
- N. Collison (FC)
- S. Ibaka (FC)
- K. Perkins (FC)

I have two options at PG, six options at GF and four options at FC. If we do the math on this (for those of you who read the article, that would be (2 c 1) × (6 c 2) × (4 c 2), or 2 × 15 × 6), we find we have 180 possible lineups! That’s a huge number. It grows even further if we start to say things like: Durant can play at power forward, or Westbrook is really a shooting guard.
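The 180 figure can be checked with a short Python sketch. The position groupings come straight from the roster list above; the function names are my own, for illustration only.

```python
import math
from itertools import combinations, product

# Simplified three-position roster from the list above.
ROSTER = {
    "PG": ["Westbrook", "Fisher"],
    "GF": ["Ivey", "Hayward", "Cook", "Harden", "Sefolosha", "Durant"],
    "FC": ["Aldrich", "Collison", "Ibaka", "Perkins"],
}
# A lineup needs one PG, two GF and two FC.
SLOTS = {"PG": 1, "GF": 2, "FC": 2}


def count_positional_lineups(roster, slots):
    """Multiply the binomial coefficients for each position."""
    return math.prod(math.comb(len(roster[pos]), k) for pos, k in slots.items())


def enumerate_positional_lineups(roster, slots):
    """List every lineup by combining the per-position choices."""
    per_position = [combinations(roster[pos], k) for pos, k in slots.items()]
    return [sum(groups, ()) for groups in product(*per_position)]


lineup_count = count_positional_lineups(ROSTER, SLOTS)
lineups = enumerate_positional_lineups(ROSTER, SLOTS)
print(lineup_count, len(lineups))  # 180 180
```

Moving a player into a second position group (Durant at FC, say) would mean adding his name to another list, and the count would only go up from there.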

What’s more, the other team has a huge number of lineups too. In the absolute “simplest case” (I got this with 1 PG, 2 GF and 9 FC, which means the backcourt would play the whole game), we could get a team with “only” 36 lineups. And each of those lineups could in theory match up with each of the opponent’s lineups. This means the number of possible lineup duels between two teams is somewhere between roughly 1,000 and 600,000, with around 30,000 being the most likely!
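To put rough numbers on those duels, here’s a sketch that multiplies one team’s lineup count by the other’s. It assumes, purely for simplicity, that both teams have the same positional makeup in each scenario; the scenario labels and helper name are mine.

```python
import math


def positional_lineups(pg: int, gf: int, fc: int) -> int:
    """Lineups with one PG, two GF and two FC from the given player counts."""
    return math.comb(pg, 1) * math.comb(gf, 2) * math.comb(fc, 2)


scenarios = {
    "simplest (1 PG, 2 GF, 9 FC)": positional_lineups(1, 2, 9),
    "Thunder-like (2 PG, 6 GF, 4 FC)": positional_lineups(2, 6, 4),
    "anyone plays anywhere (any 5 of 12)": math.comb(12, 5),
}

# Every lineup on one team can face every lineup on the other.
matchups = {label: n * n for label, n in scenarios.items()}

for label, total in matchups.items():
    print(f"{label}: {total:,} possible matchups")
# simplest (1 PG, 2 GF, 9 FC): 1,296 possible matchups
# Thunder-like (2 PG, 6 GF, 4 FC): 32,400 possible matchups
# anyone plays anywhere (any 5 of 12): 627,264 possible matchups
```

Real opponents won’t mirror each other like this, so treat these as order-of-magnitude figures and not exact counts.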

## What can lineup data tell us?

The basic problem we are faced with when it comes to lineup data is the sheer number of possibilities. There’s another big question: how much information is actually contained in the data we do have? 82 Games kindly tells us the top 20 lineups for each team. If we use 2011 (I picked 2011 over 2012 to avoid lockout arguments), we find most teams still have a lot of information missing. Here’s a quick rundown of the percentage of team minutes held in the top 20 lineups of each team.

### 2010-2011 Minutes in top 20 lineups by team. Data via 82games.com, Playoff teams in Bold

Team | % Minutes in Top 20 Lineups |
---|---|
**L.A. Lakers** | 80.5% |
**Chicago** | 67.0% |
**OKC** | 65.8% |
**Portland** | 60.6% |
Houston | 57.7% |
**Memphis** | 57.5% |
**Boston** | 56.4% |
**Philadelphia** | 55.9% |
Golden State | 54.7% |
**San Antonio** | 54.5% |
**Dallas** | 54.0% |
Phoenix | 53.5% |
**Indiana** | 52.7% |
**New Orleans** | 52.1% |
**Orlando** | 51.1% |
**Miami Heat** | 50.8% |
Utah | 49.5% |
**New York** | 49.0% |
Charlotte | 47.6% |
L.A. Clippers | 47.1% |
**Atlanta** | 46.2% |
Detroit | 43.2% |
Minnesota | 42.0% |
Cleveland | 41.5% |
**Denver** | 41.4% |
Milwaukee | 39.5% |
Sacramento | 39.2% |
New Jersey | 38.0% |
Toronto | 37.7% |
Washington | 33.0% |

Barring the Los Angeles Lakers, virtually every team has at least 1/3 of its minutes left unexplained by its top rotations. What’s more, most teams are actually closer to 50% when it comes to their minute allocations. Here’s a simpler breakdown:

- Average % of team minutes used by top 20 lineups: 50.7%
- Std. Dev % of team minutes used by top 20 lineups: 9.9%
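The league-wide average and standard deviation above can be reproduced from the table with Python’s `statistics` module. One assumption on my part: I’m using the population standard deviation (`pstdev`), since that is the variant that lands on 9.9% for these 30 values.

```python
from statistics import mean, pstdev

# % of team minutes in the top 20 lineups, 2010-2011, from the table above.
top20_share = {
    "L.A. Lakers": 80.5, "Chicago": 67.0, "OKC": 65.8, "Portland": 60.6,
    "Houston": 57.7, "Memphis": 57.5, "Boston": 56.4, "Philadelphia": 55.9,
    "Golden State": 54.7, "San Antonio": 54.5, "Dallas": 54.0,
    "Phoenix": 53.5, "Indiana": 52.7, "New Orleans": 52.1, "Orlando": 51.1,
    "Miami Heat": 50.8, "Utah": 49.5, "New York": 49.0, "Charlotte": 47.6,
    "L.A. Clippers": 47.1, "Atlanta": 46.2, "Detroit": 43.2,
    "Minnesota": 42.0, "Cleveland": 41.5, "Denver": 41.4, "Milwaukee": 39.5,
    "Sacramento": 39.2, "New Jersey": 38.0, "Toronto": 37.7,
    "Washington": 33.0,
}

values = list(top20_share.values())
league_mean = mean(values)
league_std = pstdev(values)  # population standard deviation (assumed)

print(f"Average: {league_mean:.1f}%")   # Average: 50.7%
print(f"Std. Dev: {league_std:.1f}%")   # Std. Dev: 9.9%
```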

For playoff teams:

- Average % of team minutes used by top 20 lineups: 53.3%
- Std. Dev % of team minutes used by top 20 lineups: 8.8%

It’s also worth noting that lineups drop off quickly. Of every team’s 20th-most-used lineup, Detroit’s had the most minutes, at 50.5 (or around a game). So when looking at team lineups, you only get a small subset of all possible combinations. What’s more, you get very limited data on most of these.

## Summing Up

We have a large set of possibilities and a very small amount of data to try to explain it. If we go this route when explaining basketball, it’s easy to see why it looks impossible. And in fact, if this were the only route worth going, I would agree with you.

The good news is the data supports a different notion. Players, regardless of lineup, tend to be pretty consistent. What’s more, good players have a significant edge in the NBA. It turns out that the NBA’s box score (courtesy of Lee Meade) is a tremendously valuable tool for any team.

Lineup data may seem valuable. As with all data, the real question is: what can it tell you? When it comes to lineups and how much we can learn from them, the answer is that it can’t tell you that much.

-Dre