Category Archives: Historical

Simulation Reboot Part 1

Work is currently being done on the next iteration of the regular season simulation.  Last year a 5 part series of posts attempted to explain how this simulation worked.  For the past two years the simulation relied on all games from 1970 to the present, around 100K games.  The next iteration will use the daily snapshots back to 1920, or around 200K games.

Due to major differences in the way teams were managed between then and now there are some issues that need to be resolved.  This set of posts will highlight those issues and some of the decisions being made to address them.  This is after all a log book.

Relief was a big vector for error and how this was solved will be covered later.  Today will be a short, perhaps interesting post about starting pitching between then and now that affects the way the simulator treats starters.

As explained in part 5, there are two kinds of combo pairs: lineup -> starter and lineup -> relief.  Each of these pairs is assigned an integer between -6 and +6.  A tier combo of -6 is the worst lineup facing the best pitcher or relief squad; +6 is the opposite.

Each game consists of 4 pairs: an away ls and lr, and a home ls and lr.  For each lineup -> starter pair the simulator looks back into the past and grabs the number of innings pitched and the number of earned runs scored, for both home and away.  It calculates who won that simulated game, counts it, and does it again one million times per simulation.  At the end it tabulates wins and losses, and that’s the estimated probability for the current game.
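The loop described above can be sketched roughly as follows. This is only a toy under my own assumptions, not the actual simulator: the function name, the shape of the historical samples, the way relief covers the innings the starter didn't pitch, and throwing out ties are all made up for illustration.

```python
import random

def simulate_game(away_ls, away_lr, home_ls, home_lr, trials=100_000, seed=0):
    """Estimate the home team's win probability by resampling past results.

    Hypothetical inputs:
      *_ls: list of (innings, runs) outings seen for that lineup -> starter
            tier combo (runs scored by the lineup against the starter).
      *_lr: list of runs-per-inning rates seen for that lineup -> relief
            tier combo.
    """
    rng = random.Random(seed)

    def runs_scored(ls_samples, lr_samples):
        innings, runs = rng.choice(ls_samples)   # starter's share of the game
        relief_rate = rng.choice(lr_samples)     # relief covers what is left
        return runs + relief_rate * max(0.0, 9 - innings)

    home_wins = decided = 0
    for _ in range(trials):
        away = runs_scored(away_ls, away_lr)
        home = runs_scored(home_ls, home_lr)
        if away == home:
            continue                             # assumption: ties are thrown out
        decided += 1
        home_wins += home > away
    return home_wins / decided if decided else 0.5
```

Feeding both sides the same samples should come back close to 50/50, which is a quick sanity check on any version of this loop.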

There is a rather large difference between modern baseball and legacy baseball.  Right now studies are being made to show those differences.  Below is a table showing percentages for starters who pitched complete 9 innings.

Tier Inn 1950 1960 2000 2010
-6 9 0.459 0.413 0.088 0.062
-5 9 0.442 0.392 0.069 0.052
-4 9 0.397 0.350 0.047 0.034
-3 9 0.361 0.315 0.043 0.029
-2 9 0.337 0.288 0.039 0.024
-1 9 0.302 0.270 0.039 0.021
0 9 0.312 0.239 0.030 0.016
1 9 0.280 0.221 0.020 0.018
2 9 0.288 0.208 0.024 0.017
3 9 0.259 0.191 0.027 0.015
4 9 0.261 0.215 0.023 0.023
5 9 0.226 0.167 0.028 0.019
6 9 0.252 0.194 0.027 0.024
TOT 9 0.315 0.257 0.033 0.021

The columns represent a decade of games for the 1950s, 1960s, 2000s, and 2010s.  A -6 Tier combo is the worst lineup vs. best starter; a +6 Tier combo is the best lineup vs. worst starter.  The TOT row is the average across all tier combos for that decade.

In the 1950s a starter pitched 9 innings in 31.5% of all games.  That dropped to a little more than 1/4 in the 1960s.  In modern baseball, which this simulator is supposed to handicap, it’s down to 2.1%.  Even though Tier combo -6 is almost 3x that at 6.2%, it’s still extremely rare.  Managers want to conserve wear and tear on their pitchers’ arms.  Sports medicine wasn’t as sophisticated back in the 1950s.

One would expect more complete games as Tier combos go negative where starters exceed lineups and that’s what we’re seeing.  The 0 Tier combo row represents even steven between lineup and starter.  One would expect that percentage should come close to the overall average and it does.  It’s a little off in modern baseball but that could be due to smaller sample size.

There is a table like this for each inning but the 9th is most interesting.  It is still a work in progress resolving 1950s data with modern baseball.  The more data we have the more accurate the simulation will be.
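A table like the one above could be built along these lines. This is a sketch, assuming each start is stored as a (tier combo, outs recorded by the starter) pair and that "reaching" an inning means recording at least 3 outs per inning; the function name and the pooled TOT calculation are my own choices for illustration.

```python
from collections import defaultdict

def share_reaching_inning(starts, inning=9):
    """Fraction of starts, per tier combo, where the starter got through `inning`.

    starts: iterable of (tier_combo, outs_recorded_by_starter) pairs
            (a hypothetical record layout).
    """
    reached = defaultdict(int)
    total = defaultdict(int)
    for tier, outs in starts:
        total[tier] += 1
        if outs >= 3 * inning:
            reached[tier] += 1
    table = {tier: reached[tier] / total[tier] for tier in sorted(total)}
    # TOT here pools every start together rather than averaging the tier rows.
    table["TOT"] = sum(reached.values()) / sum(total.values())
    return table
```

Calling it with inning=8, inning=7 and so on would give the companion tables for the earlier innings.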

That’s all for this tidbit into the simulator.  This model cannot start simulating until May when there’s at least a month of baseball in the books, and there are still some problems with that. More on that later.

I had planned to cover spring training but decided to wait until opening day when we’ll do playoff horse race based upon 3 year splits of current roster.  Then we’ll cover new guys for both White Sox and Cubs.   Since the Cubs may not be on TV at my local tavern or any tavern around here we may be forced to follow the White Sox this season.  We’ll see.   Until then ….

Opening Day 4/21/1950

I ran across this ticket stub after going through some of my dad’s stuff he threw in the top dresser drawer and forgot about for over half a century.  Apparently dad went to a Cubs home opener at Wrigley on 4/21/1950.  On the back he wrote down the score and the Cubs starting pitcher.

This post will do a data dump look see into this game.

[Image: Cubs ticket stub]

Looks like the Feds put a 21 cent tax on a $1.25 ticket.  No state or city amusement tax yet back then.  This is the second of only 5 games the Cubs played in April 1950.  Aprils can be cold in Chicago and elsewhere.  Since we’re at the beginning of the season, rankings, WAA calculation, and tiering cannot be done, so there’s not much to see in terms of handicapping this game.

GAME 195004210 SLN CHN

TeamID Line Score Runs TB Hits E
AWAY SLN 000000000 0 7 4 0
HOME CHN 00010100 2 7 3 1

Total Bases (TB) provides context to the Hits column.  Although the Cubs had one fewer hit than the Cardinals, both teams had equal total bases, which could have been a factor in the Cubs scoring those two runs.  The clickable web site will allow for navigation into events for this game.
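The Line Score column in these game summaries packs one digit per inning, and the Runs column is just the sum of those digits. A minimal sketch of unpacking it, assuming no team scores 10 or more in a single inning (a single digit could not hold that); the function name is made up for illustration.

```python
def parse_line_score(line):
    """Split a packed line score such as '00010100' into per-inning runs.

    Returns (list of runs by inning, total runs). Assumes one digit per inning.
    """
    innings = [int(ch) for ch in line]
    return innings, sum(innings)

# The home Cubs only batted 8 times in this game, so their string is 8 long.
cubs_innings, cubs_runs = parse_line_score("00010100")
cards_innings, cards_runs = parse_line_score("000000000")
```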

STARTERS 195004210

Rank WAA Name TeamID Tier
XXXXX 0 Harry_Brecheen SLN 0
XXXXX 0 Bob_Rush CHN 0

As my dad wrote on the back of his ticket stub, Bob Rush started this game for the Cubs with the Cardinals starting the veteran Harry Brecheen.  Right now, at the start of the season, there isn’t enough data to evaluate these two pitchers.  Since we’re from the future however let’s look at their careers.

Bob Rush

Year Rank WAA TeamID Pos
1948 XXXXX 0.78 CHN PITCH
1949 XXXXX 0.48 CHN PITCH
1950 +049+ 3.88 CHN PITCH
1951 XXXXX 1.05 CHN PITCH
1952 +015+ 5.9 CHN PITCH
1953 XXXXX -1.32 CHN PITCH
1954 XXXXX 0.84 CHN PITCH
1955 +057+ 2.69 CHN PITCH
1956 +038+ 4.16 CHN PITCH
1957 -025- -2.73 CHN PITCH
1958 +096+ 1.53 ATL PITCH
1959 +055+ 3.42 ATL PITCH
1960 XXXXX -0.76 TOT PITCH
1960 XXXXX -0.13 ATL PITCH
1960 XXXXX -0.61 CHA PITCH
TOTAL X 19.18 PITCH

Pretty decent career with his best year in 1952.  The 1950 season will be his best to date when that season ends.

Harry Brecheen

Year Rank WAA TeamID Pos
1940 XXXXX 0.32 SLN PITCH
1943 +045+ 3.19 SLN PITCH
1944 +064+ 2.6 SLN PITCH
1945 +040+ 3.59 SLN PITCH
1946 +019+ 5.06 SLN PITCH
1947 +060+ 2.73 SLN PITCH
1948 +005+ 10.29 SLN PITCH
1949 +054+ 3.38 SLN PITCH
1950 +076+ 2.33 SLN PITCH
1951 +070+ 2.58 SLN PITCH
1952 XXXXX 1.07 SLN PITCH
1953 +061+ 2.88 BAL PITCH
TOTAL X 40.02 PITCH

Brecheen had a better career with his best year in 1948.

HITTERS SLN 195004210

Rank WAA Name Pos PA R RBI TB H W
XXXXX 0 Solly_Hemus X 4 0 0 0 0 0
XXXXX 0 Red_Schoendienst 2B-SS 4 0 0 2 1 0
XXXXX 0 Stan_Musial OF-1B-LF-CF-RF 4 0 0 1 0 0
XXXXX 0 Enos_Slaughter OF-RF-LF 4 0 0 1 1 1
XXXXX 0 Joe_Garagiola CR 4 0 0 0 0 1
XXXXX 0 Rocky_Nelson 1B 4 0 0 3 1 0
XXXXX 0 Harry_Walker OF-CF 4 0 0 0 0 0
XXXXX 0 Eddie_Miller SS 3 0 0 0 0 1
XXXXX 0 Harry_Brecheen P 3 0 0 0 0 1
XXXXX 0 Bill_Howerton OF-CF-LF-RF 1 0 0 0 0 0
TOTAL X X X 35 0 0 7 3 4

PITCHERS SLN 195004210

Rank WAA Name Outs PA R ER TB H W SO
XXXXX 0 Harry_Brecheen 24 28 2 2 7 3 2 7
TOTAL X X 24 28 2 2 7 3 2 7

HITTERS CHN 195004210

Rank WAA Name Pos PA R RBI TB H W
XXXXX 0 Wayne_Terwilliger 2B 4 1 0 3 1 0
XXXXX 0 Hal_Jeffcoat OF-RF-LF 3 1 0 3 1 0
XXXXX 0 Preston_Ward 1B 3 0 1 1 1 0
XXXXX 0 Hank_Sauer LF-OF-1B 3 0 1 0 0 0
XXXXX 0 Andy_Pafko OF-CF-RF 3 0 0 0 0 1
XXXXX 0 Bill_Serena 3B 3 0 0 0 0 0
XXXXX 0 Roy_Smalley SS 3 0 0 0 0 1
XXXXX 0 Mickey_Owen CR 3 0 0 0 0 0
XXXXX 0 Bob_Rush X 3 0 0 0 0 0
TOTAL X X X 28 2 2 7 3 2

The baseball-handbook.com web site will allow for one click navigation to any of these players.

PITCHERS CHN 195004210

Rank WAA Name Outs PA R ER TB H W SO
XXXXX 0 Bob_Rush 27 35 0 0 7 4 4 5
TOTAL X X 27 35 0 0 7 4 4 5

Update:  There’s a discrepancy in hits given up by Rush (4) and the number of hits SLN made (3).  The game logs show 4 but reading the event data I only see 3.  Box scores are derived from event data.  Stan Musial hit what they call a single bunt after a double where the runner on second got thrown out at third.  I don’t think that’s considered a hit but the scorekeeper might have.    Doesn’t really matter in the grand scheme.

Starters for both teams pitched complete games which was normal back then.  This reduces the importance of relief during this era in baseball.  More on that later.   Here’s how the 1950 season ended for the Cubs and Cardinals.

NL 1950

Tm W L BAT PITCH UR
PHI 91 63 -30.1 119.1 7.8
LAN 89 65 82.9 4.0 22.8
SFN 86 68 -18.1 98.1 9.8
ATL 83 71 27.9 28.2 -13.2
SLN 78 75 -58.1 67.0 13.8
CIN 66 87 -92.1 14.0 2.8
CHN 64 89 -98.6 13.1 -34.2
PIT 57 96 -70.1 -89.1 -17.2

Both teams finished toward the bottom of the pack, which will become a common place for the Cubs to be for almost the next two decades.  We’re still using the three letter franchise codes.  In 1950 LAN was Brooklyn Dodgers, ATL Boston Braves, and SFN was New York Giants.  Franchise codes are used internally in the database but official names could be used for display purposes.

That’s all for now.  The above is a work in progress presentation of a game data dump.  Spring training new guys post coming soon as I hear they started already.  Until then ….

2008 Chicago Cubs

It has been over a month since the last post.  Now that NFL is done pitchers and catchers report to Spring training soon.  When pre season starts we’ll take a look at the new guys on both Cubs and White Sox.   Not a big fan of following pre-season but this model has minor leagues covered back to A+ league.

The last month was spent moving this entire data model into MySQL and SQLite databases for the baseball-handbook.com site, which will allow people to explore any player, any team, any season since the year 1900 like this log book covers the Cubs.  Many scripts had to be rewritten to do SQL lookups.  The prototype web interface coming in April will require those scripts to be rewritten again in PHP and then in Java for the app.

We’re also getting more detailed box scores on a daily basis for current season  from mlb.com which will be used to better estimate rosters and hopefully have a more accurate evaluation of relief.  We know exactly who is starting and the lineup for each game.  Relief however relies on our source for rosters which can lag a day or two.   More on this later.

In December the 1919 World Series was covered  day by day giving me a chance to test and improve scripts that query the post season data set and to test various formats to improve presentation.  The purpose behind baseball handbook is to allow users to easily navigate players, teams, and seasons while not overloading them with lots of unnecessary and sometimes deceptive numbers.

It took 9 days to cover the 1919 World Series.  Today we’ll cover the entire 2008 divisional series between LAN and CHN — which only lasted three games.   What made this series memorable to me was what happened in the 5th inning of game 1 with the Cubs up 2-0.  Here is an event dump of that inning with Ryan Dempster on the mound.

Inning Code Teamid playerid Count Pitches Play String Event
5:0:1:9:1 LAN lowed001 22 BCFBX 63/G OUT
5:0:2:1:1 LAN furcr001 31 BBBCB W WALK
5:0:3:2:2 LAN martr004 31 1BBBCX 9/F OUT
5:0:4:3:2 LAN ramim002 32 FSBFBB>B W.1-2 WALK
5:0:5:4:2 LAN ethia001 31 *BFBBB W.2-3;1-2 WALK
5:0:6:5:2 LAN lonej001 12 SSFBX HR/8/F.3-H;2-H;1-H HOME_RUN

The above presentation is also a work in progress.  Playerid is a retrosheet.org id which serves as a key to a batter.  The player lonej001, who hit this grand slam to put the Dodgers up 4-2 with a single swing of the bat, is one James Loney, the Dodgers’ first baseman.
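The Count column in the event dump can be rebuilt from the Pitches string. Here is a rough sketch that only understands the pitch codes showing up in this inning (B ball, C called strike, S swinging strike, F foul, X ball in play) and skips everything else, such as pickoff throws and the * and > markers. The function name is made up and a real parser would need the full retrosheet code list.

```python
def count_before_last_pitch(pitches):
    """Return the balls-strikes count (e.g. '31') when the final pitch was thrown.

    Only B, C, S, F and X are treated as pitches; any other character in the
    string is ignored by this sketch.
    """
    real = [p for p in pitches if p in "BCSFX"]
    balls = strikes = 0
    for p in real[:-1]:                  # the last pitch ends the plate appearance
        if p == "B":
            balls += 1
        elif p in "CS":
            strikes += 1
        elif p == "F" and strikes < 2:   # a foul can't be strike three
            strikes += 1
    return f"{balls}{strikes}"

# Loney's grand slam came on a 1-2 pitch.
loney_count = count_before_last_pitch("SSFBX")
```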

I listened to Pat Hughes and Ron Santo call this inning.  Santo starts to groan after the third walk like he usually did when the Cubs faltered.  When Loney hit that grand slam my radio went dead silent for a very long time, long enough where I had to check to see if it was still on.  This meant not only Pat and Ron were speechless, the entire crowd at Wrigley was also.

At this moment in the 5th inning of Game 1 only down 2 runs, we knew the Cubs weren’t going to beat the Dodgers even after finishing 2008 with the best record in National League.   Most Cubs fans have been through this before and know the script — which played out exactly as expected.

Let’s drill down into this series because seasonal numbers like wins/losses and run differential can be deceptive.  Although the MLB commish places teams in post season based on team WAA (wins – loss), after that he makes them play each other.

2008 CHN Monthly

Date WAA BAT PITCH UR
20080501 6 42.8 3.8 0.9
20080601 15 63.9 25.3 -0.6
20080701 16 70.3 21.3 4.6
20080801 20 74.2 43.4 5.6
20080901 32 105.2 66.2 11.2
2008 33 97.6 69 12.8

The above shows the Cubs had a tremendous +12 August then flat lined through the month of September, finishing 97-64.  Both BAT and PITCH were near the top of MLB.  These are seasonal numbers however.  Here are the Dodgers.

2008 LAN Monthly

Date WAA BAT PITCH UR
20080501 2 12.8 19.8 -1.1
20080601 -2 -11.1 17.3 -2.6
20080701 -5 -42.7 39.3 1.6
20080801 -1 -52.8 66.4 3.6
20080901 -2 -65.6 70.2 1.2
2008 6 -52.3 102 3.8

Dodgers under water with BAT but extremely good PITCH.  Somehow they win the NL West with only 84 wins by going +8 in September.  It helped they acquired this guy from Boston.

Manny Ramirez 2008

DateID Rank WAA Teamid
20080501 +044+ 1.32 BOS
20080601 +032+ 2.33 BOS
20080701 +029+ 3.13 BOS
20080801 +023+ 4.22 LAN
20080901 +015+ 5.90 LAN
2008 +010+ 7.46 LAN

I don’t remember why Boston sent Ramirez to LA since they were also playoff contenders.  He was a big reason LAN went +8 in September and why the Cubs lost in three games.  I recall watching Ramirez run around the bases in LA with a genuine smile on his face like a kid having fun playing baseball in little league.
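The monthly swings quoted above (+12 for the Cubs in August, +8 for the Dodgers in September) are just differences between consecutive WAA snapshots in the two monthly tables, and since team WAA is wins minus losses, a win total plus WAA gives the losses. A small sketch using the numbers from those tables; the function names are made up for illustration.

```python
def waa_changes(snapshots):
    """snapshots: list of (date_id, waa) in date order.

    Returns (date_id, change since the previous snapshot) pairs, so the entry
    labeled 20080901 covers the games played in August.
    """
    return [(later[0], later[1] - earlier[1])
            for earlier, later in zip(snapshots, snapshots[1:])]

def losses_from(wins, waa):
    """Team WAA is wins minus losses, so losses = wins - WAA."""
    return wins - waa

chn = [("20080501", 6), ("20080601", 15), ("20080701", 16),
       ("20080801", 20), ("20080901", 32), ("2008", 33)]
lan = [("20080501", 2), ("20080601", -2), ("20080701", -5),
       ("20080801", -1), ("20080901", -2), ("2008", 6)]

chn_changes = dict(waa_changes(chn))   # chn_changes["20080901"] is the Cubs' August
lan_changes = dict(waa_changes(lan))   # lan_changes["2008"] is the Dodgers' September
```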

2008 Playoff Horse Race

TeamID W-L Total Hitters Starters Relief UR
BOS 28 46.06 21.31 12.22 12.53 9.5
CHN 33 43.26 21.09 16.06 6.11 11.5
PHI 22 35.55 15.92 6.55 13.08 3.5
ANA 38 32.4 10.2 10.42 11.78 5.5
LAN 6 28.94 8.01 13.31 7.62 1.5
TBA 32 22.78 1.09 8.6 13.09 5.5
CHA 15 21.37 9.66 7.26 4.45 -12.5
MIL 18 14.42 3.83 8.56 2.03 -7.5

Playoff Horse Race tables are sorted by the Total value of a team’s roster based upon this data model.  The W-L column is their real team WAA (wins – losses).  Although Anaheim had the best record in baseball, Boston had the best set of hitters, starters, and relief.  Boston takes care of Anaheim in 4 and then loses to Tampa in 7 games.  After the Dodgers beat Cubs they lose to the Phillies who end up winning the World Series.  Had the Phillies played Boston the outcome of that World Series could have been different.
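In the table above the Total column works out to Hitters + Starters + Relief for every team, and the rows are sorted on that total. A quick sketch of that bookkeeping with the 2008 rows typed in by hand; the tuple layout and function name are just for illustration.

```python
# (team, real W-L i.e. team WAA, hitters, starters, relief) from the table above
rows = [
    ("BOS", 28, 21.31, 12.22, 12.53),
    ("CHN", 33, 21.09, 16.06, 6.11),
    ("PHI", 22, 15.92, 6.55, 13.08),
    ("ANA", 38, 10.20, 10.42, 11.78),
    ("LAN", 6, 8.01, 13.31, 7.62),
    ("TBA", 32, 1.09, 8.60, 13.09),
    ("CHA", 15, 9.66, 7.26, 4.45),
    ("MIL", 18, 3.83, 8.56, 2.03),
]

def roster_total(row):
    """Total roster value = hitters + starters + relief."""
    _, _, hitters, starters, relief = row
    return round(hitters + starters + relief, 2)

by_roster = [r[0] for r in sorted(rows, key=roster_total, reverse=True)]
by_record = [r[0] for r in sorted(rows, key=lambda r: r[1], reverse=True)]
```

Sorting by real record instead puts Anaheim first, which is the gap between the two orderings the paragraph above is pointing at.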

Below are game summaries for the three games Cubs lost to Dodgers in 2008 divisional series (dv).  Presentation of this is still a work in progress but most of the elements for this level of report are present.  L, S, and R columns show tier numbers for Lineup, Starter, and Relief.

Relief is constant throughout a series for each team and lineups are almost constant.  Cubs lineup went from tier 2.41 in the first game to almost tier 3 in the third game.  Lineups change because managers start different players for various reasons.  Dodgers lineup below varied from tier 1.3 to tier 1.4 which isn’t much.  LAN relief was around 1/2 tier better which would be consistent with their excellent PITCH shown in team monthly above.

A tier of 2 is one complete standard deviation above league average.  League averages are based on end of August rosters for that year using end of year data.  Expansion in September causes shifts away from a true league average.

Cubs fielded a better lineup each day and a better starter in games 1 and 3.  With Zambrano on the mound in game 2 the Dodgers might have been favored that game.  Tier numbers are entered into simulation but the simulator for post season is different from regular season and hasn’t been completed yet.

GAME 1 dv 20081001 — LAN CHN

Teamid L S R Line Runs Starter
LAN 1.39 2.79 1.26 000040111 7 Derek_Lowe
CHN 2.41 3.49 0.78 020000000 2 Ryan_Dempster

GAME 2 dv 20081002 — LAN CHN

Teamid L S R Line Runs Starter
LAN 1.3 2.93 1.26 050010121 10 Chad_Billingsley
CHN 2.79 0.74 0.78 000000102 3 Carlos_Zambrano

GAME 3 dv 20081004 — CHN LAN

Teamid L S R Line Runs Starter
CHN 2.95 4.27 0.78 000000010 1 Rich_Harden
LAN 1.33 1.18 1.26 20001000 3 Hiroki_Kuroda

That is all for this chapter in post season history.  More coming soon.  Until then ….

1919 World Series Part 10

Epilogue

The series is over and the favored White Sox lost.  The Mafia who bet the entire series collected their winnings while players in on the fix were hung out to dry.  This post will cover post season totals for White Sox players throughout the 1919 World Series and then some background into these players.

CHA Hitters

Rank WAA Name_TeamID PA RP OBP Pos
+010+ 6.57 Shoeless_Joe_Jackson_CHA 33 11 0.364 OF-LF-RF
+053+ 2.65 Chick_Gandil_CHA 31 6 0.258 1B
XXXXX 0.42 Ray_Schalk_CHA 28 5 0.393 CR
+021+ 4.98 Happy_Felsch_CHA 31 5 0.161 CF-OF
+020+ 5.00 Buck_Weaver_CHA 34 4 0.324 3B-SS
XXXXX -0.08 Swede_Risberg_CHA 30 3 0.233 SS-1B
+017+ 5.17 Eddie_Collins_CHA 35 3 0.257 2B
XXXXX 0.06 Shano_Collins_CHA 16 2 0.250 OF-RF
XXXXX 1.32 Fred_McMullin_CHA 2 0 0.500 3B
XXXXX 0.94 Nemo_Leibold_CHA 20 0 0.150 OF-RF-LF
XXXXX -0.59 Byrd_Lynn_CHA 1 0 0.000 CR
XXXXX 0.53 Eddie_Murphy_CHA 3 0 0.333 BAT

Game box scores typically involve a matrix of numbers which are mostly 0s.  The box score format this model now uses shows Plate Appearance which is a measure of time, Runs and RBIs, a measure of run production, Total Bases, Hits, and Walks, a measure of hitting.  Measures of hitting are game stats, measures of runs are value stats.  Runs win baseball games, hits can score runs with “can” being the key word.

OBP was chosen to display over batting average because it uses plate appearances  which is the measure of time for this data model and it incorporates Walks.  Batting averages are displayed in long form player reports taken from official sources.  Total Bases is used to calculate Slugging Percentage and the dreaded OPS which this model shuns as even a game stat.  Total Bases is useful to display in box scores however and it’s used as a basis for calculating run creation which is fodder for another post this off season.

The above is a consolidated summary table for the entire series sorted by the Run Production (RP) column from highest to lowest.  Run Production is simply Runs + RBIs — not too complicated.  RP is the foundation for calculating WAA.
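Run Production really is that simple. As a small illustration, here it is computed for the White Sox Game 8 lineup, with the R and RBI values taken from the Game 8 box score in Part 9 of this series and the rows sorted the same way as the table above. The variable and function names are made up.

```python
# (name, runs, RBIs) from the CHA lineup in Game 8 of the 1919 World Series
game8 = [
    ("Nemo_Leibold", 0, 0),
    ("Eddie_Collins", 1, 0),
    ("Buck_Weaver", 1, 0),
    ("Shoeless_Joe_Jackson", 2, 3),
    ("Happy_Felsch", 0, 0),
    ("Chick_Gandil", 1, 1),
    ("Swede_Risberg", 0, 0),
    ("Ray_Schalk", 0, 0),
    ("Eddie_Murphy", 0, 0),
    ("Bill_James", 0, 0),
    ("Roy_Wilkinson", 0, 0),
]

def run_production(runs, rbis):
    """RP = Runs + RBIs."""
    return runs + rbis

ranked = sorted(((name, run_production(r, rbi)) for name, r, rbi in game8),
                key=lambda pair: pair[1], reverse=True)
```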

Shoeless Joe had the highest Run Production of any White Sox player during the 1919 World Series with 11.   Five of those runs came in Game 8 which was a lost cause after the first inning.   Chick Gandil, the organizer of the fix for players comes in at #2.  Although Buck Weaver  hit 0.324 (he had no walks so his OBP=BA) as he claimed in the movie, he had very poor run production.

Top of the order guys like Shano and Eddie Collins, both not in on the fix, hit poorly and had very low run production.  Let’s look at errors and the unearned runs given up because of them.

CHA Errors

Name_TeamID Errors URuns
Happy_Felsch_CHA 2 0
Eddie_Cicotte_CHA 2 2
Swede_Risberg_CHA 4 1
Eddie_Collins_CHA 2 1
Ray_Schalk_CHA 1 1
Chick_Gandil_CHA 1 1
TOTAL 12 6

The White Sox gave up 6 unearned runs on 12 errors.  Historically there are around 2 errors per unearned run.  In 1919 teams gave up an average of 113 unearned runs per team over 140 games, or around 0.8 unearned runs per game.  That figure was around 60 per team in 2019, so there were about twice as many errors committed then as now.  Giving up six unearned runs as a team over 8 games is virtually league average and nothing out of the ordinary.  Let’s see what the Reds’ error numbers for this series look like.

CIN Errors

Name_TeamID Errors URuns
Larry_Kopf_CIN 1 1
Morrie_Rath_CIN 2 2
Heinie_Groh_CIN 3 2
Bill_Rariden_CIN 1 0
Edd_Roush_CIN 2 1
Ray_Fisher_CIN 1 1
Jake_Daubert_CIN 2 0
Greasy_Neale_CIN 1 0
TOTAL 13 7

They look about the same even though Reds had a much better Unearned Runs above average than White Sox in regular season.
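The arithmetic behind those unearned run comparisons, worked out as a quick sketch. The 162 game schedule for 2019 is my assumption here (the standard modern schedule); everything else comes from the numbers quoted above and the two error tables.

```python
def per_game(total, games):
    """Average per game."""
    return total / games

league_1919 = per_game(113, 140)    # unearned runs per team game in 1919
league_2019 = per_game(60, 162)     # assumes a 162 game schedule in 2019
cha_series = per_game(6, 8)         # White Sox over the 8 game series
cin_series = per_game(7, 8)         # Reds over the same 8 games

cha_errors_per_ur = 12 / 6          # errors per unearned run, White Sox
cin_errors_per_ur = 13 / 7          # errors per unearned run, Reds

then_vs_now = league_1919 / league_2019
```

That comes out to roughly 0.81 per game for the 1919 league, 0.75 for the White Sox in this series, and a then vs. now ratio a bit over 2.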

CHA Pitchers

Rank WAA Name_TeamID Outs PA ER R SO ERA
+004+ 8.76 Eddie_Cicotte_CHA 65 87 7 9 7 2.91
XXXXX 0.80 Dickey_Kerr_CHA 57 72 3 4 6 1.42
+047+ 2.81 Lefty_Williams_CHA 49 66 12 12 4 6.61
XXXXX 0.50 Roy_Wilkinson_CHA 22 34 3 4 3 3.68
-052- -1.76 Bill_James_CHA 14 23 3 4 2 5.79
XXXXX 1.22 Grover_Lowdermilk_CHA 3 7 1 1 0 9.00
-001- -5.82 Erskine_Mayer_CHA 3 5 0 1 0 0.00

Cicotte only threw one bad game which was Game 1.  He signaled to the mafia guys the fix was on by hitting the first batter which he did according to retrosheet event data.  Dickey Kerr pitched above his weight and Lefty Williams was the guy who blew three games in a row almost on his own.
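The ERA column in the pitchers table above follows from the Outs and ER columns: 27 outs make a 9 inning game, so ERA is 27 times earned runs divided by outs. A minimal sketch checking that against the table; the function name and tuple layout are mine.

```python
def era(earned_runs, outs):
    """Earned runs per 27 outs (9 innings)."""
    return 27 * earned_runs / outs

# (name, outs, earned runs) from the CHA pitchers table
staff = [
    ("Eddie_Cicotte", 65, 7),
    ("Dickey_Kerr", 57, 3),
    ("Lefty_Williams", 49, 12),
    ("Roy_Wilkinson", 22, 3),
    ("Bill_James", 14, 3),
    ("Grover_Lowdermilk", 3, 1),
    ("Erskine_Mayer", 3, 0),
]

eras = {name: f"{era(er, outs):.2f}" for name, outs, er in staff}
```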

Reds still needed 5 to win.  According to the movie the mob guys started stiffing players when they won Game 3  after an awesome pitching performance by Dickey Kerr, the only CHA starter  not in on the fix and the pitcher most likely to lose on his own.

None of the above numbers are unusual.   Nothing here proves or disproves a rigged series or who is in on the fix  just by looking at them.  When you get 8 players in on a conspiracy along with all the handlers funneling money to them it’s a  virtual impossibility this could be kept secret for long — especially after players in on the fix got ripped off by the mob guys just like Charles Comiskey ripped them off over their pennant bonus.

Based upon Eddie Collins’ poor performance one could suspect him of being in on it.  It would seem impossible the other players didn’t know, especially catcher Ray Schalk.  In the end, according to the movie, the new commish Kenesaw Mountain Landis proclaimed that just knowing there was a conspiracy and not reporting it was punishable.  Buck Weaver may not have taken money but he did nothing to stop it.

It is possible that after CHA won Games 6 and 7 they might have wanted to win out to double cross the mob guys screwing them.  The movie kind of suggested that.  Had Williams not totally blown Game 8 in the first inning the White Sox should have won that with 5 runs and Reds’ weak lineup.  Dickey Kerr would have pitched Game 9 with CHA’s poor relief to back him up.  Eyeballing the handicapping, that game could have been around a 50/50, even steven event.  The mob guys got mad after CHA won a few games; imagine how mad they would be if they lost the series after Cicotte signaled the fix was on with the first CIN batter in Game 1.

The conspiracy didn’t really unfold  until almost the end of the 1920 season.  Comiskey suspended Gandil for that season on his own which may have cost White Sox another World Series appearance.  The rest of the 8 played one more year before being banished from baseball for good.  Let’s look at some careers.

Eddie_Cicotte

Year Rank WAA TeamID Pos
1905 XXXXX -0.27 DET PITCH
1908 XXXXX -0.19 BOS PITCH
1909 +047+ 2.44 BOS PITCH
1910 XXXXX -0.15 BOS PITCH
1911 +046+ 3.11 BOS PITCH
1912 XXXXX -2.46 BOS PITCH
1912 XXXXX 1.49 CHA PITCH
1913 +004+ 9.55 CHA PITCH
1914 +023+ 5.42 CHA PITCH
1915 XXXXX -0.10 CHA PITCH
1916 +027+ 4.05 CHA PITCH
1917 +002+ 9.03 CHA PITCH
1918 XXXXX 0.32 CHA PITCH
1919 +004+ 8.76 CHA PITCH
1920 XXXXX 1.45 CHA PITCH
Total 42.45 1247

Cicotte had a mediocre last season in 1920.  The movie mentioned outlawing the spit ball in 1920, which presumably was a pitch Cicotte relied upon.  He had three tremendous seasons in 1913, 1917, and 1919.  White Sox played the Giants in 1917 in the World Series.  15 years is a decent career.  His total WAA probably wouldn’t have gotten him into the HOF even had he not been banished.

Shoeless_Joe_Jackson

Year Rank WAA TeamID Pos
1908 XXXXX -0.08 OAK BAT
1909 XXXXX 0.29 OAK BAT
1910 XXXXX 1.11 CLE OF
1911 +010+ 7.31 CLE OF
1912 +011+ 6.99 CLE OF
1913 +014+ 5.80 CLE OF
1914 +111+ 1.62 CLE OF-RF-CF
1915 +032+ 2.10 CLE OF-RF-1B
1915 +032+ 2.04 CHA OF-CF-LF
1916 +013+ 5.56 CHA OF-LF-RF
1917 +008+ 5.94 CHA OF-LF-RF
1918 +067+ 1.57 CHA OF-LF
1919 +010+ 6.57 CHA OF-LF-RF
1920 +005+ 9.53 CHA OF-LF
Total 56.35 1887

Shoeless Joe had a career year in 1920 and his total WAA puts him borderline into HOF.  Ron Santo has a career WAA of 56.53 and he just barely made it in.  Had Jackson played a bunch more years he probably would be clearly eligible according to this data model but that’s all water under the bridge now.  Even if they remove his banishment MLB can’t put him in for value he would have gained had he played several more years.

Lefty_Williams

Year Rank WAA TeamID Pos
1913 XXXXX -1.26 DET PITCH
1916 XXXXX -0.94 CHA PITCH
1917 -047- -1.74 CHA PITCH
1918 XXXXX 0.25 CHA PITCH
1919 +047+ 2.81 CHA PITCH
1920 -027- -3.09 CHA PITCH
Total -3.91 -173

Lefty Williams had a terrible 1920.  He was probably traumatized over being threatened by mob guys and standing out as the most visible among the eight after single handedly  blowing 3 World Series games.  White Sox fans probably weren’t very kind to him at home games.  1919 was his career year and it wasn’t that great either.

Buck_Weaver

Year Rank WAA TeamID Pos
1912 -032- -2.37 CHA SS
1913 -051- -1.83 CHA SS
1914 -054- -2.18 CHA SS
1915 XXXXX 0.69 CHA SS
1916 XXXXX -0.19 CHA 3B-SS
1917 XXXXX 0.61 CHA 3B-SS
1918 -043- -1.68 CHA SS-3B
1919 +020+ 5.00 CHA 3B-SS
1920 +043+ 3.36 CHA 3B-SS
Total 1.41 -283

Buck Weaver had a pretty average career with 1919 and 1920 being his best.  We saw above how Buck had a Run Production (R+RBI) of 4 during the 1919 World Series, one of the lowest of the CHA lineup.  He also had a low RP of 4 in 1917 when White Sox played the Giants in that  World Series.  By the time he figured out how to play baseball he got banished.  He wouldn’t have been HOF material but he could have helped the White Sox during the 1920s.

Comiskey suspended Chick Gandil for 1920.  Swede Risberg and Fred McMullin had their short careers ended after all 8 got banished.

That’s all for the 1919 World Series.  It was interesting to do a game by game analysis of this series because it is the only major fixing scandal in over a century of US professional sports.  The numbers we examined during this series don’t prove or disprove this series was fixed.  The only reason we know is because the probability of a leak in a conspiracy is proportional to the number of people involved in that conspiracy.  Now that we know, the picture becomes clear.  And they made a movie and wrote books about it.  It’s also prominently featured in Ken Burns’ 9 part documentary on baseball, along with Disco Demolition. :-)

Probably won’t do a game by game playoff series again but one offs of certain high profile  games might be interesting.  Next up minor league compilation and a look see into who the Cubs and White Sox have (or don’t have) waiting in the wings for next Spring.  Until then ….

1919 World Series Part 9

I Netflixed the movie Eight Men Out on 4/17/2009 according to their records and finally watched it again before making this post about the 8th and last game of the 1919 World Series.  The purpose of this exercise was to integrate regular season data with post season and automate generating these reports accurately for every playoff series and  every game played in baseball since 1919.  I wanted to interpret this data  day by day without outside influence from movie script writers who often take artistic liberties with a story.

After watching Eight Men Out again the lens from which I see  data from these games has changed.

My assumption was that White Sox players may have been coy  providing cover for the fix by winning 3 games.  According to the movie they weren’t supposed to win Game 3. The mafia guys didn’t care about the players, they were betting Reds every game.  Losing Game 3 cost them money as well as possibly games 6 and 7.  The series is now 4-3 and the White Sox need to win the next two.  The mafia needs them to lose.   Let’s look at Game 8, the final game of the 1919 World Series.

CIN CHA 191910090

WAA Vegas TC Sim EV L S R
CIN 52 x X X 0.62 1.61 1.93
CHA 36 x X X 4.08 1.10 -2.34

The handicapping report says this game was played in Chicago which meant another 9 hour sleep on the train trip from Cincinnati to Chicago for both teams.

Starters WAA WinPct IP Tier
Hod_Eller_CIN 3.65 0.566 248.3 1.61
Lefty_Williams_CHA 2.81 0.543 297 1.10

Hod Eller must face a Tier 4 CHA lineup giving White Sox an advantage.  Lefty Williams is pretty much even steven with Reds’ relatively weak lineup.  Under normal circumstances White Sox would again be favored, but not as heavily as when Cicotte pitches.

According to the movie the mob guys threatened to kill Lefty’s wife if he didn’t throw this game which is a pretty big incentive to do what they say.

1 2 3 4 5 6 7 8 9 Score
CIN 4 1 0 0 1 3 0 1 0 10
CHA 0 0 1 0 0 0 0 4 0 5

Lefty gives up 4 runs in the first and gets pulled.  In come relievers from the CHA Tier -2.34 Relief staff, not in on the fix, who allow the Reds’ weak lineup to score another 6 runs.  CHA scores 5 and according to the movie Shoeless Joe and Buck Weaver may not have been “in on the fix.”  The movie suggests Weaver never received any money even though all the players in on the fix got ripped off by the mafia guys.  ProTip: Don’t make deals with the mafia.

CHA Pitchers

Rank WAA Name_TeamID Outs PA ER R SO
+047+ 2.81 Lefty_Williams_CHA 1 5 4 4 0 SP
-052- -1.76 Bill_James_CHA 14 23 3 4 2 RP
XXXXX 0.50 Roy_Wilkinson_CHA 12 21 2 2 2 RP
TOTAL X X 27 49 9 10 4

Williams got one out, faced 5 batters, and gave up 4 earned runs, which pretty much meant game over in the first inning.  Then comes the seldom seen CHA relief staff, the dark underbelly of this great White Sox team.  Bill James is ranked #52 in the bottom 100 and gives up 3 earned runs, 1 unearned.  Wilkinson comes in and pitches the final 4 innings giving up 2.  Finally Reds bats faced pitchers they could hit, which won them the World Series.

The unearned run comes from catcher Ray Schalk, not in on the fix.  His was the only error CHA made in this game.

CHA Lineup

Rank WAA Name_TeamID PA R RBI TB H W Pos
XXXXX 0.94 1 Nemo_Leibold_CHA 5 0 0 1 1 0 OF-RF-LF
+017+ 5.17 2 Eddie_Collins_CHA 5 1 0 4 3 0 2B
+020+ 5.00 3 Buck_Weaver_CHA 5 1 0 3 2 0 3B-SS
+010+ 6.57 4 Shoeless_Joe_Jackson_CHA 5 2 3 6 2 0 OF-LF-RF
+021+ 4.98 5 Happy_Felsch_CHA 4 0 0 0 0 0 CF-OF
+053+ 2.65 6 Chick_Gandil_CHA 4 1 1 3 1 0 1B
XXXXX -0.08 7 Swede_Risberg_CHA 4 0 0 0 0 1 SS-1B
XXXXX 0.42 8 Ray_Schalk_CHA 4 0 0 1 1 0 CR
XXXXX 0.53 10 Eddie_Murphy_CHA 1 0 0 0 0 1 BAT
XXXXX -0.27 11 Bill_James_CHA 2 0 0 0 0 0 PITCH
XXXXX 0.15 12 Roy_Wilkinson_CHA 1 0 0 0 0 0 PITCH
TOTAL 25.65 X TIER=4.11 40 5 4 18 10 2 X

Shoeless Joe had a fantastic day driving in 3 and scoring twice with a solo home run in the third.  Buck Weaver went 2/5 scoring once.  The movie suggested both these players did not lay down. Even Chick Gandil, the alleged organizer of the fix, scored and drove in a run but by the 8th inning they were so far behind it didn’t matter.

The White Sox scored enough runs to win this game had an untainted Lefty Williams pitched to match his regular season performance.

We’ll skip Cincinnati’s box score since they racked up their runs on White Sox extremely poor relief.  Relief wasn’t very important in that era of baseball and only became important around 1980 but that’s fodder for another analysis.  In modern baseball teams typically carry 8 relievers and they all work.  In this World Series there are probably guys sitting on the bench who never played.

That is all for the 1919 World Series; a series that launched a lot of changes in baseball.  Tomorrow an epilogue will be posted looking at player totals and more reflections on the movie about this series.  Until then …