Hit vs. Run Model

On Thursday, May 26, 2016 at 12:37:08 PM UTC-5, Mordecai Brown wrote:
> On Tuesday, May 24, 2016 at 3:05:37 PM UTC-5, Michael Sacks wrote:

> > RBI tends to be higher for stronger hitters, but also tends
> > to be higher for players who have teammates who get on base.  
> > OPS doesn't worry about trying to account for that lurking
> > variable, but RBI totals run into that problem.
>
> You're absolutely right.  Hits are an individual metric and
> runs a team metric, which is why Bill James focused all his
> attention on hits.  The team is the environment a player must
> suffer or benefit from.
>
> My system assigns this blame using runs not hits ...

I jumbled my explanation, and it is the crux of the difference
between a run-based model and a hit-based model.  Michael's
post made me think about this some more.  Here is
what I was trying to say.  Sabermetrics uses this chain:

Hits ---estimate---> Runs ---estimate---> Wins

There is error in each estimation.  The estimation
from runs to wins is fed data that already carries the
error from the first step.  There is no point estimating
runs because we already know what they are with no error
at all.
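
To see how a first-step error carries into the second step,
here is a rough sketch.  It assumes the runs-to-wins step is
James' Pythagorean expectation with exponent 2, which this post
does not specify, and the run totals are made-up numbers used
only for illustration.

```python
# Sketch: how an error in estimated runs carries into estimated wins.
# Assumption: runs -> wins uses the Pythagorean expectation with
# exponent 2. All run totals below are hypothetical.

def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Estimated winning percentage from runs scored and allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

GAMES = 162
actual_runs = 700      # hypothetical: runs the team really scored
estimated_runs = 730   # hypothetical: runs estimated from hit stats
runs_allowed = 700     # hypothetical

wins_from_actual = GAMES * pythagorean_win_pct(actual_runs, runs_allowed)
wins_from_estimate = GAMES * pythagorean_win_pct(estimated_runs, runs_allowed)

print(f"wins from actual runs:    {wins_from_actual:.1f}")
print(f"wins from estimated runs: {wins_from_estimate:.1f}")
print(f"error carried into wins:  {wins_from_estimate - wins_from_actual:.1f}")
```

The gap between the two win totals comes entirely from the
error in the run estimate, before the runs-to-wins step adds
any error of its own.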

Managers of baseball teams need to know some of
these hit stats for day-to-day decision making.
A manager does not want to put
in a high-WHIP guy with the bases loaded unless
he has no choice.  That is where WHIP (walks plus
hits per inning pitched) is useful.
It is not useful as a value indicator, value being
a measure of how much a player contributes
to the bottom line of a team winning.

A batter hitting 0.350 could have less value than
a batter hitting 0.250, but when a manager makes his
lineup he puts the 0.350 guy on top because he
makes fewer outs and the top of the order gets
more plate appearances.  BA can almost be used
as a probability right out of the box.  A
manager can estimate that pinch hitting a 0.350 guy
in a RISP situation gives him a 35% chance of
success versus 25% with the guy hitting 0.250.  This
is why they have kept track of batting averages
since 1871.  Slugging percentage cannot be used
as a probability at all because it has a range
of 0 to 4.  Probabilities must be between 0 and 1.
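
A small sketch of that range difference, using a made-up stat
line (the counts are hypothetical, not any real player's):

```python
# Sketch: batting average stays in [0, 1]; slugging can exceed 1.
# The stat line below is hypothetical.

def batting_average(hits, at_bats):
    """Hits per at-bat: always between 0 and 1."""
    return hits / at_bats

def slugging(singles, doubles, triples, home_runs, at_bats):
    """Total bases per at-bat: between 0 and 4."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats

# Hypothetical hot streak: 10 at-bats, 2 singles, 1 double, 3 home runs.
at_bats = 10
singles, doubles, triples, home_runs = 2, 1, 0, 3
hits = singles + doubles + triples + home_runs

ba = batting_average(hits, at_bats)
slg = slugging(singles, doubles, triples, home_runs, at_bats)

print(f"BA  = {ba:.3f}")   # a valid probability
print(f"SLG = {slg:.3f}")  # above 1, so not a probability
```

A batter who homers in every at-bat would slug 4.000, which is
where the upper end of the range comes from.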

James' philosophy of every player being his
own island fails the fairness test.  When assigning
value in any system there are winners and losers.
Not everyone can be the greatest.  This is why I
sort by rank: a number shown on its own
is meaningless without context.

Guys who get a lot of hits do well
in James' model at the expense of guys who actually
drive in runs.  Most of the time the guys getting
a lot of hits are also the guys scoring all the runs,
so it doesn't matter.  Where it matters is the
muddy middle of the league, where average and
slightly above average players lie.

From a historical perspective, it is unfair not to
recognize players who actually drive in and score
real runs.

And thanks, Michael, for making me think about
this.  This should have been a topic for winter,
when we don't have a championship-caliber team
to follow.

As one final note of interest, I ran some numbers
trying to prove (you can't disprove unless you
can prove) James' run creation theory, which is
the basis for all the RC stats.

Total Bases intrigued me.  I wondered how many
runs get scored on each of the various hit types.

Here's what I have so far:

TYPE      R/event  std deviation
WALK      0.021    0.004
SINGLE    0.226    0.010
DOUBLE    0.425    0.032
TRIPLE    0.642    0.041
HOME_RUN  1.588    0.033

The second column is the average number of runs scored
each time that event occurs.  A home run drives
in 1.59 runs on average.  Data from 1980-2015.
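
For anyone who wants to reproduce this kind of table, here is
a minimal sketch of one way to do it.  The record layout
(year, event type, runs scored on the play) and the choice to
take the standard deviation across yearly averages are
assumptions for illustration, and the records are toy data,
not the 1980-2015 data set.

```python
# Sketch: average runs scored per event, plus spread of yearly averages.
# Assumptions: one record per play as (year, event_type, runs_on_play),
# and the std deviation is taken across seasons. Records are toy data.
from collections import defaultdict
from statistics import mean, pstdev

plays = [
    (2014, "SINGLE", 0), (2014, "SINGLE", 1),
    (2014, "HOME_RUN", 1), (2014, "HOME_RUN", 2),
    (2015, "SINGLE", 0), (2015, "SINGLE", 0),
    (2015, "HOME_RUN", 2), (2015, "HOME_RUN", 3),
]

def runs_per_event(records):
    """Return {event: (overall runs per event, std dev of yearly averages)}."""
    by_event_year = defaultdict(lambda: defaultdict(list))
    for year, event, runs_on_play in records:
        by_event_year[event][year].append(runs_on_play)

    result = {}
    for event, years in by_event_year.items():
        all_runs = [r for season in years.values() for r in season]
        yearly_means = [mean(season) for season in years.values()]
        result[event] = (mean(all_runs), pstdev(yearly_means))
    return result

stats = runs_per_event(plays)
for event, (avg, sd) in stats.items():
    print(f"{event:10s} {avg:.3f} {sd:.3f}")
```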

What I found interesting is the ratio
between single/double/triple/home run.

Dividing each by the single's 0.226, it's
roughly 1/2/3/7.

This ratio is almost what is used by
total bases, except that there home runs are
only 4x, not 7x.

I was looking for a reason total bases
is a valid measure, and this could be it.
It could also be why, when total bases is used
to estimate run creation, it actually comes
close to reality.
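
The comparison can be checked directly from the table by
dividing each R/event value by the value for a single.  A
short sketch, with the R/event numbers copied from the table
above and the total bases weights being the standard 1/2/3/4:

```python
# Sketch: compare measured runs-per-event ratios with total bases weights.
# R/event values are copied from the table in this post.
runs_per_event = {
    "SINGLE": 0.226,
    "DOUBLE": 0.425,
    "TRIPLE": 0.642,
    "HOME_RUN": 1.588,
}
total_bases_weight = {"SINGLE": 1, "DOUBLE": 2, "TRIPLE": 3, "HOME_RUN": 4}

# Normalize every event to the single so the single is exactly 1.
single = runs_per_event["SINGLE"]
ratios = {event: value / single for event, value in runs_per_event.items()}

for event, ratio in ratios.items():
    weight = total_bases_weight[event]
    print(f"{event:10s} measured {ratio:.2f}x   total bases {weight}x")
```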