One of the frustrations readers have with this forum is how the Wins Produced analysis is presented. Across the regular season, one can only see the Wins Produced of a team’s players when I choose to present this information (a choice that depends upon how much time I have). Consequently, fans of most teams only get to see this information once or twice during the season (and for fans of the Pacers and Clippers…well, I hope to say something this summer).
The infrequent offerings are a reflection of how Wins Produced is calculated. To analyze a team I have to download player and team data. Then I have to follow all the steps reported in various articles, The Wages of Wins, and Stumbling on Wins (and at Calculating Wins Produced).
Many of these steps could be automated. But as people who have asked about this have discovered, complete automation runs into two significant roadblocks.
First of all, the measurement of a player’s productivity requires that one consider such issues as team defense, team rebounds, and team turnovers. These calculations – described in the above writings – are somewhat complex (although I don’t think they are that difficult to follow).
Perhaps more important is the issue of position played. The Wins Produced calculation requires that all players be compared to the average at their position. And this requires that all players be allocated to the five positions of basketball (i.e. center, power forward, small forward, shooting guard, and point guard). In the various discussions of how Wins Produced is calculated, it was emphasized that the assignment of positions required some judgement calls. These judgement calls seemed to be a difficult barrier for those interested in completely automating the calculation.
Across the years, various people have tried to tackle this problem. So far, though, none has succeeded in automating the Wins Produced calculations. At least, that was true until Andres Alvarez took on the challenge. As was reported in the comments section a few days ago, Andres – or Dre from the comments section – has managed to create an automatic Wins Produced calculation.
To explain the process, Andres has prepared the following YouTube presentation:
As Andres explains, his process allocates players across positions according to position identification, player height, and assists. To illustrate, imagine a team with an abundance of centers. The method Andres developed is to move the shorter centers into the power forward position (or the taller power forwards into the center position). This process is also utilized when looking at small forwards and power forwards. Turning to guards, Andres takes into account assists. Specifically, guards with more assists are moved into the point guard slot.
This entire process is driven by the measurements Andres is able to download. In other words, no judgement calls are made. And that means – as Andres observes – this process is not perfect. But Andres notes he can evaluate the entire league in less than four minutes. To put that in perspective, I can’t evaluate one team in four minutes. So what Andres has done is extremely important for people who wish to see a Wins Produced calculation for their favorite NBA players as the season unfolds.
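To make the allocation idea concrete, here is a minimal sketch in Python. It is not Andres's actual code. It assumes each player arrives with a listed group (G, F, or C), a height, an assist rate, and minutes per game, and that each of the five positions has 48 minutes per game to fill. Guards are ordered by assists and frontcourt players by height, and the positions are filled in that order, so surplus centers spill into power forward shortest first. The names `Player`, `sort_key`, and `allocate_positions` are hypothetical.

```python
from dataclasses import dataclass

SLOTS = ["PG", "SG", "SF", "PF", "C"]      # positions filled in this order
GROUP_ORDER = {"G": 0, "F": 1, "C": 2}     # listed position groups
SLOT_MINUTES = 48.0                        # assumed minutes available per position per game


@dataclass
class Player:
    name: str
    listed: str        # listed group: "G", "F" or "C"
    height_in: int     # height in inches
    assists: float     # any assist rate, e.g. per 48 minutes
    minutes: float     # minutes per game


def sort_key(p):
    # Guards: more assists first (toward PG). Forwards and centers: shorter first.
    secondary = -p.assists if p.listed == "G" else p.height_in
    return (GROUP_ORDER[p.listed], secondary)


def allocate_positions(players):
    """Return {player name: {position: minutes}} for one team."""
    result = {p.name: {} for p in players}
    slot, room = 0, SLOT_MINUTES
    for p in sorted(players, key=sort_key):
        left = p.minutes
        while left > 1e-9 and slot < len(SLOTS):
            take = min(left, room)
            pos = SLOTS[slot]
            result[p.name][pos] = result[p.name].get(pos, 0.0) + take
            left -= take
            room -= take
            if room <= 1e-9:               # position is full, move to the next one
                slot, room = slot + 1, SLOT_MINUTES
    return result
```

On a made-up roster with three centers and only two forwards, this puts the shortest center at power forward, which is the behavior described above. Because nothing here involves a judgement call, it will sometimes place a player somewhere he never actually plays.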
There may be one downside to all of this. Once Andres gets his work hosted (he is working on this), then it appears I will no longer be needed during the regular season. Instead of waiting for me to analyze each and every team, Andres will be presenting all the teams all the time.
And that means maybe I do a bit less blogging and a bit more research (or take up skiing).
Regardless of what I am doing, I hope everyone appreciates what Andres has done. And it looks like the Wages of Wins coverage of the 2010-11 season is going to be much more timely – and much better – for everyone.
- DJ
Our research on the NBA was summarized HERE.
The Technical Notes at wagesofwins.com provide substantially more information on the published research behind Wins Produced and Win Score.
Wins Produced, Win Score, and PAWSmin are also discussed in the following posts:
Simple Models of Player Performance
What Wins Produced Says and What It Does Not Say
Introducing PAWSmin — and a Defense of Box Score Statistics
Finally, A Guide to Evaluating Models contains useful hints on how to interpret and evaluate statistical models.
Wow! Thanks for taking the time to do this, Andres, this is great to have. Perhaps Dave can focus only on writing things critical of Kobe Bryant now?
Perhaps weight can be taken into account for the PF/SF disambiguation as well. For example, Kevin Durant is tall for a small forward, but because of his small frame, this is the position he usually plays.
I would love to see more articles on Kobe Bryant, Allen Iverson and of course Carmelo Anthony.
I would personally LOVE to see more articles about Ben Gordon, Jonny Flynn, Mo Williams, Jermaine O’Neal, Gilbert Arenas, Thaddeus Young, and Glen Davis.
Well done Andres Alvarez, look forward to the results.
Dave – look forward to the unique insights you will bring once all your data crunching time is freed up.
Also given the availability of the data going forward, will be interesting to see the Wins Produced per dollar of salary expended by team, this might extend the WP debate from just MVP to best GM.
Damn, cool approach on the position breakdown. I had the stats upload down but the other was the problem. 4 minutes sounds like a long time though. I now have something to do at work :-)
This is the best news. I’ve been dying for someone to provide these stats on a daily basis. GIGANTIC thanks to Andres.
My co-workers (who get so frustrated when I dare to claim Kobe and Carmelo shouldn’t be NBA First Team) are about to have to deal with a tidal wave of new data.
Prof. Berri – I’ve got two more chapters to go in Stumbling on Wins, and so far it’s a home run.
Andres,
Thanks so much! I can hardly wait to have all the data more readily available. I’ll second the idea that including weight in the distribution might help clarify the position for some players (you certainly would not have DeJuan Blair end up at SF). Also, I believe there are other sites that provide a breakdown of the number of minutes played by a player at each position (although I can’t think what the name of the site was now). Anyway, thanks a bunch!
youtube video shows that Chuck Hayes is 100% SG :-)
Andres,
Can’t wait to play with it.
I’m thinking for the PG/SG breakdown, a ratio of shots taken to assists might be a good indicator of a guard’s true nature (although dribbles taken would be the best indicator, but I doubt that’s tracked!).
For the PF/SF breakdown, a ratio of shots vs. rebounds might help there as well, although the height thing may be better.
Thanks again!
Great news! Andre – chapeau bas!
I think position identification would need more work though. I would use not only height, but weight as well (Barkley). And use some stats (a lot of rebounds – it’s somebody who plays close to the basket etc).
I’d think about using a neural network.
This is great news!
I look forward to playing around with this tool this weekend!
82games.com has great data on minute allocation that’s updated every game, which can either be pulled straight from the site with a crawler or via raw data if you can get their permission.
Scroll down to the “Production by Position” in a player’s page– it tells you what % a player played of a team’s total minutes at each position.
For example, Manu Ginobili played 3% of the Spurs’ PG minutes, 33% of the SG minutes, and 17% of the SF minutes.
http://www.82games.com/0910/09SAS5.HTM
It also has Chuck Hayes at 43% C and 1% PF.
http://www.82games.com/0910/09HOU19.HTM
Could the application take position as an input? Instead of the position being predetermined, is there a way for the user to change it?
This is awesome news. Look forward to hearing back soon about how we can access this data..
Hey all, at some point in the near future Dr. Berri will provide me with his position assignments for the last 30 years. We’ll do regression on these using height and weight and use the regression values for future assignment of positions within category (G,F,C). TRad, a neural network would likely end up doing regression on a 3 variable problem, so good idea!
@mrparker: The system is such that changing a player’s position changes the average of at least two positions, which changes numbers for the entire league. So hopefully I can just get a system in that people agree with :)
Thanks everyone for the kind words. I’ll let you know the second it is hosted.
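The point about one position change rippling through the league can be shown with a toy example. This is only a sketch, not the Wins Produced formula: it assumes each player has a single per-48-minute production number and that the position average is minutes-weighted. The function names and all figures are hypothetical.

```python
from collections import defaultdict


def position_averages(players):
    """players: list of (name, position, minutes, production_per_48).

    Returns the minutes-weighted average production at each position.
    """
    total = defaultdict(float)
    mins = defaultdict(float)
    for _name, pos, minutes, per48 in players:
        total[pos] += per48 * minutes
        mins[pos] += minutes
    return {pos: total[pos] / mins[pos] for pos in total}


def relative_to_position(players):
    """Each player's production minus the average at his assigned position."""
    avgs = position_averages(players)
    return {name: per48 - avgs[pos] for name, pos, _minutes, per48 in players}


# Hypothetical four-player "league".
before = [("A", "PF", 1000, 10.0), ("B", "PF", 1000, 8.0),
          ("C", "SF", 1000, 6.0), ("D", "SF", 1000, 4.0)]
# Same players, but B is reassigned from PF to SF.
after = [("A", "PF", 1000, 10.0), ("B", "SF", 1000, 8.0),
         ("C", "SF", 1000, 6.0), ("D", "SF", 1000, 4.0)]
```

Moving B alone shifts both the PF and the SF average, so A, C, and D all get a different relative number even though nothing about their own play changed. That is why a user-editable position cannot be applied to one player in isolation.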
Great work Dre. Getting something up and working is the most important step. I think this could have a big impact on the proliferation of Wins Produced. People will be easier to convince when they can play with the numbers for themselves.
Dre,
Will you eventually be going all the way back to ’77-’78? If you are, I may be forced to ask you to marry me. (My wife has made many sacrifices for my NBA zeal, but this might be too much!)
Seriously, I love this. Great work!
jbrett,
When I post the All-NBA teams we will once again post your letter-coded common complaints. Are there any more complaints to add this year?
Dr. Berri,
I am determined to be far less cranky on the subject than I have been in the past, but I might have a couple of thoughts, focused mostly on the favorite strawman arguments. Not sure where we left off, so I’ll just throw them out as quotes.
1. “Marcus Camby?!!!! Sure–go ahead and put 5 Cambys on the floor, and see how many games you win.”
2. “It’s just preposterous to try to define the value of any player’s contributions with a single statistic; therefore, anyone who attempts to do so (or anyone who enables that misguided wretch thru positive feedback or affirmation) is by definition not worthy of a scathing criticism, or even a haughty dismissal. I can’t believe I visit this blog ten times a week, I hold it in such disdain.”
(OK, that one might be a bit long-winded even for me.)
I’ll try one more:
3. “These measurements are largely irrelevant because they measure only what a player has actually DONE, and not what he WOULD be able to do if he had better coaching and were playing a different position on a different team in a universe with different laws of physics where I, ZOD, were the UNDISPUTED MASTER OF ALL!”
Sorry, got a little excited there. I’m hoping someone can translate that one into whatever language is spoken in the Bottle City of Khandor. (That would be some dialect of Cryptonian, right? Would it be Cryptic-onian? These are the worst puns ever; I am a hopeless geek at this point.)
Off the top of my head, those are the ones that have stayed with me this season. Surely many others have their own (least) favorite cliched attacks? I hate to hog the floor. (No, really, I swear!)
jbrett,
I love these. How about this one….
4. I have regressed my model on your model and this test — which I clearly made up as a valid test of an empirical model — proves that your model is wrong. In fact, if you ignore the fact I made up the test… it is a devastating and damning proof. At least as long as you don’t also look at the obvious shortcomings of my model (very inconsistent over time, most player evaluations are statistically insignificant).
Okay, that complaint might be too geeky. But it should be on the list someplace.
Probably far past the ZOD comment. That one is a classic.
Dr. Berri,
That one’s a keeper. And how could we have forgotten this one? “All you’re doing is dividing up the team’s wins amongst the players, then making a team adjustment to hide any discrepancies.” Is that the essence of the complaint, or am I missing something?
Or this one: “A correlation of 0.93? Clearly that isn’t possible; therefore, you must not really be measuring anything new or unique.” OK, I’m putting words in people’s mouths now–but they’re the right words.
One more: “Unless and until we are able to tabulate every single act that can conceivably occur on the court, and calculate their relative value in terms of wins, the metric you present here is either so incomplete as to be completely useless or hopelessly skewed and misleading.”
That can’t be all, can it?
Those are all very good as well. I forgot about that specific team adjustment argument.
Actually, why does a player’s position need to be automated? Why not set it as a constant in the database (similar to a player’s name..) and go from there?
It could even be set to allow a player’s production to be calculated against multiple positions, if he plays a certain number of minutes in each position. Although I understand that this would make things harder because of the need to calculate the whole league’s averages, so a full calculation will require the daily data on all the players to be inserted before the entire league is calculated.
Dre,
You, sir, are a gentleman and a scholar. Many thanks in advance.
There may be one problem with this tool — how is it possible to falsify the data to prove that my favorite player ________ is much better than your favorite player __________?
As to complaints, you’ve forgotten the mother of them all:
5. “What? Your model conflicts with what I think. Obviously therefore it’s wrong.”
Oh, and the sister of them all:
6. “Kobe Bryant — also known as Zod, UNDISPUTED MASTER — is the world’s best basketball player. Let me know when you’ve adjusted your model to reflect that obvious reality.”
Myself and a guy named Dave Meyers (apparently not the ex-Buck who turned to religion — though he will not confirm this definitively) are working on a method to automate “Marginal Win Score” statistics straight from NBA play-by-play data, which will then be presented daily at Courtsideanalyst.wordpress.com.
As you know, MWS is closer to Pete Palmer’s Baseball Linear Weights, in that it considers both “batting runs” (win score) and “fielding runs” (opposition win score) in its calculation, so we have to have a second-to-second automated “matchup judgment” mechanism if the system is to work.
Since the judgment is always made based on the matchups at the “defensive” end of the court, it’s not that difficult.
Offensive designations can be arbitrary (what is Jason Kidd?), but defensive matchups are almost always easy to guess at using a height/weight/normal position balancing test.
Realizing that, we’ve come up with a system where we rank each roster player from 1-to-whatever, based on each player’s height, weight, and natural position. Then we have the automated system match the ten players from lowest to highest ranked.
We’re testing it with real world examples at the moment. It’s pretty accurate. Of course, had LeBron James defended Rajon Rondo last night, as he threatened to do, the computer would have melted down like HAL in 2001: A Space Odyssey.
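For anyone curious what a rank-and-match step like this might look like, here is a rough sketch in Python. It is not the commenters' system. It assumes the "balancing test" is a simple ordering by natural position, then height, then weight, and the names `size_key` and `guess_matchups` are hypothetical.

```python
POSITION_RANK = {"PG": 1, "SG": 2, "SF": 3, "PF": 4, "C": 5}


def size_key(player):
    # Assumed ordering: natural position first, then height, then weight.
    return (POSITION_RANK[player["pos"]], player["height_in"], player["weight_lb"])


def guess_matchups(offense, defense):
    """Pair the five players on each side from lowest to highest rank."""
    if len(offense) != 5 or len(defense) != 5:
        raise ValueError("each side needs exactly five players on the floor")
    ranked_off = sorted(offense, key=size_key)
    ranked_def = sorted(defense, key=size_key)
    return [(o["name"], d["name"]) for o, d in zip(ranked_off, ranked_def)]
```

A lineup with two point guards on the floor still resolves cleanly, because the height and weight tie-breaks decide which of them draws the bigger assignment. What this cannot capture is a deliberate cross-match, such as a forward choosing to guard a point guard.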