On Thursday, May 26, 2016 at 12:37:08 PM UTC-5, Mordecai Brown wrote:
> On Tuesday, May 24, 2016 at 3:05:37 PM UTC-5, Michael Sacks wrote:
> > RBI tends to be higher for stronger hitters, but also tends
> > to be higher for players who have teammates who get on base.
> > OPS doesn't worry about trying to account for that lurking
> > variable, but RBI totals run into that problem.
>
> You're absolutely right. Hits are an individual metric, runs
> a team metric, which is why Bill James focused all his
> attention on hits. The team is the environment a player must
> suffer or benefit from.
>
> My system assigns this blame using runs, not hits ...

I jumbled my explanation, and it is the crux of the difference between a run-based model and a hit-based model. Michael's post caused me to think about this some more. Here is what I was trying to say.

Sabermetrics uses this chain:

Hits ---estimate---> Runs ---estimate---> Wins

There is error in each estimation, so the estimation from runs to wins is being fed data that already carries error. There is no point estimating runs, because we already know what they are, with no error at all.

Managers of baseball teams need to know some of these hit stats for day-to-day decision making. A manager does not want to put in a high-WHIP guy with the bases loaded unless he has no choice. That is where WHIP is useful. It is not useful as a value indicator, value being a measure of how much a player contributes to the bottom line of a team winning.

A batter hitting .350 could have less value than a batter hitting .250, but when a manager makes his lineup he puts the .350 guy at the top, because that guy makes fewer outs and the top of the order gets more plate appearances. BA can almost be used as a probability right out of the box: a manager can figure that pinch hitting the .350 guy in a RISP situation gives him a 35% chance of success, versus 25% for the guy hitting .250. This is why they have kept track of batting averages since 1871. Slugging percentage cannot be used as a probability at all, because it ranges from 0 to 4, and probabilities must lie between 0 and 1.

James' philosophy of every player being his own island fails the fairness test. When assigning value in any system there are winners and losers; not everyone can be the greatest. This is why I do sorts with rank, because just showing a number is meaningless without context. Guys who make a lot of hits do well in James' model at the expense of guys who actually drive in runs. Most of the time the guys getting a lot of hits are the same guys scoring all the runs, so it doesn't matter. Where it matters is the muddy middle of the league, where the average and slightly above average players lie. From a historical perspective, it is unfair not to recognize players who actually drive in and score real runs.

And thanks, Michael, for making me think about this. This should have been a topic for winter, when we didn't have a championship-caliber team to follow.

As one final note of interest, I ran some numbers trying to prove (you can't disprove unless you can prove) James' runs created theory, which is the basis for all the RC stats. Total bases intrigued me: I wondered how many runs get scored on each of the various hit types. Here's what I have so far:

TYPE     | R/event | std deviation
WALK     |  0.021  |  0.004
SINGLE   |  0.226  |  0.010
DOUBLE   |  0.425  |  0.032
TRIPLE   |  0.642  |  0.041
HOME_RUN |  1.588  |  0.033

The second column is runs scored every time that hit type occurs; a home run drives in 1.59 runs on average. Data is from 1980-2015.
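As a quick check on the pattern in that table, here is a minimal Python sketch (the run values are copied straight from the table above) that expresses each hit type as a multiple of the single:

    # Runs scored per event, 1980-2015 (values from the table above)
    runs_per_event = {
        "WALK": 0.021,
        "SINGLE": 0.226,
        "DOUBLE": 0.425,
        "TRIPLE": 0.642,
        "HOME_RUN": 1.588,
    }

    # Normalize to the single to expose the ratio between hit types
    single = runs_per_event["SINGLE"]
    for hit_type, runs in runs_per_event.items():
        print(f"{hit_type:8s} {runs / single:.2f}")

It prints roughly 0.09 / 1.00 / 1.88 / 2.84 / 7.03 for walk/single/double/triple/home run.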
What I found interesting is the ratio between single, double, triple, and home run: it's almost 1/2/3/8. That is nearly the ratio total bases uses, except that total bases weights home runs only 4x, not 8x. I was looking for a reason total bases is a valid measure, and this could be it, and also why, when it is used to estimate run creation, it comes so close to reality.
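For reference, total bases is exactly where that 1/2/3/4 weighting enters the basic version of James' runs created formula, RC = (H + BB) * TB / (AB + BB). Here is a minimal sketch of it; the season line is invented purely for illustration:

    # Basic runs created: RC = (H + BB) * TB / (AB + BB)
    # The season line below is hypothetical, just for illustration.
    AB, BB = 550, 60
    singles, doubles, triples, homers = 120, 30, 5, 25

    hits = singles + doubles + triples + homers
    # Total bases weights hits 1/2/3/4 -- the ratio discussed above,
    # except the home run gets 4x instead of the empirical ~7x.
    total_bases = singles + 2 * doubles + 3 * triples + 4 * homers

    rc = (hits + BB) * total_bases / (AB + BB)
    print(f"TB = {total_bases}, basic RC = {rc:.1f}")  # TB = 295, RC = 116.1

Later versions of runs created add more terms, but total bases stays at the core of all of them.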