Historical Baseball Data Part 2

In Part 1 we looked at real team win-loss data.  Today we’ll compute deltaWAAs for starting pitchers and starting lineups.  In Cubs series matchup reports we show the difference between the starting pitchers from the Ouija Board and then list the lineup WAAs.  Since this WAA measure is additive, the sum over all players in a lineup, or any other group, represents the WAA for the entire group.  There is a proof for this.
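As a minimal sketch of what additivity buys us, a group WAA is just a sum, and a deltaWAA is the difference between two such sums.  The player values below are made up for illustration and the function names are hypothetical, not part of the data model.

```python
def group_waa(player_waas):
    """WAA of a group is the sum of its players' WAAs (additivity)."""
    return sum(player_waas)


def delta_waa(group_a, group_b):
    """Absolute difference in WAA between two groups of players."""
    return abs(group_waa(group_a) - group_waa(group_b))


# Made-up WAA values for two nine-man lineups (illustration only).
lineup_a = [2.5, 1.0, 0.5, 3.0, -0.5, 0.0, 1.5, -1.0, 0.5]
lineup_b = [0.5, -1.0, 1.0, 0.0, 2.0, -0.5, 0.5, 0.0, -1.5]

print(group_waa(lineup_a), group_waa(lineup_b), delta_waa(lineup_a, lineup_b))
```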

We have daily snapshots of MLB taken from event data curated by retrosheet.org that can be downloaded here.  The data gets parsed and run through this data model to compute WAAs and rankings for each day.  We know the strength of lineups and starters for each game and we know who actually wins or loses those games.  The data below represents all games from 1970-2016 excluding March and April.

Here is a table for pitchers.

deltaWAA   # Games   WinPct
1            21707    0.511
2            16718    0.529
3-4          18960    0.539
5-7           9027    0.565
>8            2260    0.593

These categories, or sets, are different from the real win-loss categories because the range of deltaWAAs for a single player is far smaller than for an entire team.  Player WAAs also carry error from the estimation process, whereas a real team WAA is 100% accurate, so we need to consolidate into 5 sets.  Each set has roughly the same number of games.  Games with a starting pitching deltaWAA between 0 and 0.5 are thrown out.
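Here is a rough sketch of how games could be sorted into these sets and turned into a WinPct per set.  The exact boundaries are assumptions on my part: deltaWAA is rounded to the nearest whole number, anything under 0.5 is discarded, and a rounded value of 8 lands in the top set.  The function names and the sample games are hypothetical.

```python
from collections import defaultdict


def pitcher_bucket(delta_waa):
    """Map a starter deltaWAA to a table row label, or None if discarded."""
    d = abs(delta_waa)
    if d < 0.5:
        return None  # games this close are thrown out
    n = int(d + 0.5)  # round half up to the nearest whole WAA (assumption)
    if n <= 1:
        return "1"
    if n == 2:
        return "2"
    if n <= 4:
        return "3-4"
    if n <= 7:
        return "5-7"
    return ">8"  # assumption: a rounded 8 belongs to the top set


def win_pct_by_bucket(games):
    """games: iterable of (delta_waa, better_starter_won) pairs.

    Returns {bucket: (win_pct, n_games)} for the team with the better starter.
    """
    tally = defaultdict(lambda: [0, 0])  # bucket -> [wins, games]
    for delta, won in games:
        bucket = pitcher_bucket(delta)
        if bucket is None:
            continue
        tally[bucket][0] += int(won)
        tally[bucket][1] += 1
    return {b: (wins / n, n) for b, (wins, n) in tally.items()}


# Made-up games: (deltaWAA, did the team with the better starter win?)
sample = [(1.0, True), (1.2, False), (0.2, True), (4.26, True)]
print(win_pct_by_bucket(sample))
```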

You can see from the above table that WinPct increases for a team as its deltaWAA advantage in starting pitching increases.  One would expect that.  Let’s look at today’s Cubs game against ARI as an example.  From the Ouija Board, here are the two starters.

DATE 08_03 2:20_PM Aug_3  ARI CHN
LINEAWAY ARI [ 0.465 ] < 0.439 >
STARTAWAY 4.79(0.658) Zack_Greinke_ARI
LINEHOME CHN [ 0.574 ] < 0.580 >
STARTHOME 0.53(0.519) Jose_Quintana_TOT

ARI has the better pitcher with Greinke today so

deltaWAA = | 4.79 - 0.53 | = 4.26 ≈ 4

According to the above table, a deltaWAA of 4 translates into a WinPct of 0.539, or about 0.54, for Arizona.  Right now the market has the Cubs favored at 0.58.
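The same lookup can be sketched in a few lines.  The WinPct values come straight from the pitcher table; the rounding rule and the treatment of a rounded 8 as part of the top row are my assumptions, and `starter_win_pct` is a hypothetical helper name.

```python
# (low, high, WinPct) rows from the pitcher table; high=None means open-ended.
PITCHER_TABLE = [
    (1, 1, 0.511),
    (2, 2, 0.529),
    (3, 4, 0.539),
    (5, 7, 0.565),
    (8, None, 0.593),  # assumption: the ">8" row starts at a rounded 8
]


def starter_win_pct(waa_a, waa_b):
    """Historical WinPct for the team with the better starter, or None."""
    delta = abs(waa_a - waa_b)
    if delta < 0.5:
        return None  # these games were thrown out of the table
    n = int(delta + 0.5)  # round to the nearest whole WAA (assumption)
    for low, high, pct in PITCHER_TABLE:
        if n >= low and (high is None or n <= high):
            return pct
    return None


# Greinke (4.79) against Quintana (0.53) from the Ouija Board above.
print(starter_win_pct(4.79, 0.53))
```

A market price of 0.58 on the Cubs implies about 0.42 for Arizona, so the pitcher table alone sits well above the market on Arizona.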

What about lineups?  The table below shows that.

deltaWAA   # Games   WinPct
1-2          17144    0.525
3-5          21838    0.523
6-9          19848    0.526
10-15        14632    0.529
16-inf        9188    0.525

Lineups are sums of 9 players, so the range of WAAs is much greater than for pitching.  Something very interesting can be seen in this table.  One would expect a higher deltaWAA for a lineup to lead to a higher WinPct, but that’s not the case here.  WinPct stays between 0.523 and 0.529 no matter what the difference is.  Baffling!
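To put a number on how flat the lineup table is, here is a small sketch that compares the spread of WinPct across sets in the two tables and computes a games-weighted WinPct for each.  The rows are copied from the tables in this post; the helper names are hypothetical.

```python
# (label, games, WinPct) rows copied from the two tables in this post.
pitcher_rows = [
    ("1", 21707, 0.511),
    ("2", 16718, 0.529),
    ("3-4", 18960, 0.539),
    ("5-7", 9027, 0.565),
    (">8", 2260, 0.593),
]
lineup_rows = [
    ("1-2", 17144, 0.525),
    ("3-5", 21838, 0.523),
    ("6-9", 19848, 0.526),
    ("10-15", 14632, 0.529),
    ("16-inf", 9188, 0.525),
]


def spread(rows):
    """Difference between the highest and lowest WinPct across sets."""
    pcts = [pct for _, _, pct in rows]
    return max(pcts) - min(pcts)


def weighted_win_pct(rows):
    """Overall WinPct across sets, weighted by number of games."""
    total_games = sum(games for _, games, _ in rows)
    return sum(games * pct for _, games, pct in rows) / total_games


for name, rows in (("pitchers", pitcher_rows), ("lineups", lineup_rows)):
    print(name, round(spread(rows), 3), round(weighted_win_pct(rows), 3))
```

The pitcher table moves about 8 points of WinPct from the lowest set to the highest, while the lineup table moves less than 1 point.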

The only theory that can explain this phenomenon is that a team with a very good lineup will probably be a team with very bad pitching.  Since this is a calculation over an entire dataset of games from 1970-2016, everything will converge to the average.  What matters is the lineup/starting pitcher combo, which we’ll present in a future post.  When a bad lineup goes against a bad pitcher it’s almost the same as a good lineup going against a good pitcher.  That’s what the above table represents, and it’s a terrific clue.

There might be some of that influence on the pitcher table too.  Since a starting pitcher is only a single player, it may not matter as much.

We did lineups at the start of the CHN/ARI series and ARI is ahead with a deltaWAA of around +12.  I’m not sure plugging that into the above table means anything.  We need to see the pitcher/lineup tables to figure this out.

That is all for now.  We’ll pick this up in a couple of days.  There’s another Cubs series coming up and we’re going to revisit the Iowa Cubs and AAA to see what is going on there.  Until then…