Stop stat abuse

Statistical analysis is a very important part of baseball, but ESPN's new push could drive people away from it permanently.

It seemed strange when ESPN announced its acquisition of Nate Silver's site FiveThirtyEight. The Worldwide Leader has worked to make us all dumber by putting Skip Bayless and Stephen A. Smith front and center to have passionate debates over opinions[1] no one could actually defend. FiveThirtyEight's message, on the other hand, was: don't listen to the pundits' narratives; the hard data in the polls are much more indicative of reality. After nailing the results of two presidential elections that politicos insisted were a toss-up, Silver showed how useless talking heads are. Maybe this alliance meant ESPN really did intend to change for the better. Turns out that's the opposite of what happened.

One of the first sports articles posted on 538 was "Wayne Gretzky had it easy", a big shot at the guy who's by all accounts the greatest hockey player of all time. The article points out that save percentages across the NHL have gone up 4.5%[2] over the last 30 years and...that's about it. Never mind that Gretzky has over 50% more points than any other NHL player; he still "had it easy".

Still, FiveThirtyEight needs to make money, and "Wayne Gretzky had it easy" is bound to generate more clicks than "save percentage in the NHL has gone up over the last 30 years; don't bother clicking here, that's literally the whole story", so maybe things would get better.

The next day brought us "What to expect from Baseball America's top 100 prospects". Silver's PECOTA method was the first major projection system, so surely this will be an in-depth look at how a better prospect than Mike Trout[3] will perform. Maybe combine his performance with the glowing scouting reports, talk about how rare his combination of speed and power is in a player with his frame... wait, they're just going to find the average WAR of every previous number one prospect and go with that. Sorry, Byron: because Todd Van Poppel and Delmon Young were huge disappointments, we just can't expect much from you.

Data is the easy part of statistics; the hard part is interpreting that data. If the conclusion of that top 100 prospects article was "non-elite prospects tend to miss", there'd be no problem. When you look at the data and decide "Buxton will be worth 17 wins over the next seven years", you're just making something up, but you decided to make it look better by putting a graph next to it. Ultimately, these kinds of pieces that Neil deGrasse Tyson over here posted are harmless. Ninety-five percent of the people who would see that will reject it for being too dorky or not dorky enough and move on. The next step in the ESPNization of 538 is far more insidious.

As much as people try to tell you otherwise, stats don't lie. "Jack Morris would have one of the highest ERAs in the Hall of Fame" is not up for debate, however much some would like it to be. However, translating those numbers into something that predicts human behavior is damn near impossible. ESPN's 2014 MLB preview brings us their new chemistry score. Through some simple inputs, ESPN's crack team is able to determine how much people like each other[4]. Not only that, they've managed to turn compatibility into wins.

Unfortunately, part of this algorithm actually involves grading a team on racial purity. This leads to someone saying the difference in the AL West could be that "Dominican reliever Fernando Abad is the only nonwhite pitcher on the [Athletics] projected roster", and that the Dodgers will suffer because Hyun-jin Ryu and Kenley Jansen are the only people from their respective countries on the roster. Suggesting that the Dodgers would be better off if they managed to trade Ryu and Jansen for their white guy equivalents would have looked horribly racist 25 years ago, let alone today. However, because "it's based on real math", there will be some people out there who take this seriously and argue the Dodgers should deal Ryu for Mike Minor.

This isn't the last we're going to see of this either. Baseball Prospectus suggests that we have the potential to use the new FIELDf/x data to look into a player's mental state and see if he's depressed. This is incredibly dangerous. Being able to measure every single thing a player does means you can certainly find a smoking gun somewhere to prove whatever it is you want to, and making a leap from "this guy's first step is slower" to "he's suffering from mental illness" is hugely irresponsible. Again, because it's backed up with "evidence", people are going to take opinions like that much more seriously.

Baseball stats are very good at telling you how good someone is at baseball, but as much as stat nerds would like it to be otherwise, they are very bad at modeling human behavior. Hazarding guesses at someone's mental state via statistical analysis is at best going to be ignored, and at worst going to be taken seriously by some and cause others to disavow stats forever. I run a site that tracks minor league numbers, yet I feel more and more inclined to apologize for using anything more advanced than OPS thanks to high-profile people using numbers badly. Information is great, but the only way people are ever going to take stat guys seriously is if they start showing some responsibility.

1. Sure, Mike Trout is great, but let me ask you this: can he be the best without the charisma of Derek Jeter?

2. This article also commits a major statistical sin with really misleading axes.

3. This opinion is insane, but bear with me.

4. Why they didn't use this gift to build the world's most successful dating site, I don't know.