2008 Chicago Cubs

It has been over a month since the last post.  Now that the NFL season is done, pitchers and catchers report to spring training soon.  When preseason starts we’ll take a look at the new guys on both the Cubs and White Sox.  I’m not a big fan of following preseason, but this model has the minor leagues covered back to the A+ level.

The last month was spent moving this entire data model into MySQL and SQLite databases for the baseball-handbook.com site, which will allow people to explore any player, any team, any season since 1900 the way this log book covers the Cubs.  Many scripts had to be rewritten to do SQL lookups.  The prototype web interface coming in April will require those scripts to be rewritten again in PHP and then in Java for the app.

We’re also getting more detailed box scores on a daily basis for the current season from mlb.com, which will be used to better estimate rosters and hopefully give a more accurate evaluation of relief.  We know exactly who is starting and the lineup for each game.  Relief, however, relies on our source for rosters, which can lag a day or two.  More on this later.

In December the 1919 World Series was covered day by day, giving me a chance to test and improve scripts that query the post season data set and to test various formats to improve presentation.  The purpose behind baseball-handbook.com is to allow users to easily navigate players, teams, and seasons while not overloading them with lots of unnecessary and sometimes deceptive numbers.

It took 9 days to cover the 1919 World Series.  Today we’ll cover the entire 2008 divisional series between LAN and CHN, which lasted only three games.  What made this series memorable to me was what happened in the 5th inning of Game 1 with the Cubs up 2-0.  Here is an event dump of that inning with Ryan Dempster on the mound.

Inning Code  Teamid  playerid  Count  Pitches   Play String            Event
5:0:1:9:1    LAN     lowed001  22     BCFBX     63/G                   OUT
5:0:2:1:1    LAN     furcr001  31     BBBCB     W                      WALK
5:0:3:2:2    LAN     martr004  31     1BBBCX    9/F                    OUT
5:0:4:3:2    LAN     ramim002  32     FSBFBB>B  W.1-2                  WALK
5:0:5:4:2    LAN     ethia001  31     *BFBBB    W.2-3;1-2              WALK
5:0:6:5:2    LAN     lonej001  12     SSFBX     HR/8/F.3-H;2-H;1-H     HOME_RUN

The above presentation is also a work in progress.  Playerid is a retrosheet.org ID that serves as the key for a batter.  The player lonej001, who hit this grand slam to put the Dodgers up 4-2 with a single swing of the bat, is James Loney, the Dodgers’ first baseman.
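The Count and Pitches columns in the dump can be checked against each other.  Below is a minimal sketch, not the script behind this log book, that rebuilds the count in effect before the final pitch from the pitch string.  It assumes the usual Retrosheet pitch letters (B ball, C called strike, S swinging strike, F foul, X ball in play) and skips markers such as 1, * and > that are not pitches to the batter.  The function name is made up for this example, and only the codes that appear in this inning are handled.

```python
# Pitch letters handled in this sketch (a subset of the Retrosheet codes).
# Anything else in the string ("1", "*", ">") is treated as a non-pitch marker.
PITCH_CODES = "BCSFX"


def count_before_last_pitch(pitches: str) -> str:
    """Return the ball-strike count (e.g. "32") in effect before the final pitch."""
    thrown = [p for p in pitches if p in PITCH_CODES]
    balls = strikes = 0
    for p in thrown[:-1]:          # the final pitch ends the plate appearance
        if p == "B":
            balls += 1
        elif p in "CS":
            strikes += 1
        elif p == "F" and strikes < 2:
            strikes += 1           # a foul cannot produce strike three
    return f"{balls}{strikes}"


# (playerid, Count, Pitches) copied from the event dump above.
INNING = [
    ("lowed001", "22", "BCFBX"),
    ("furcr001", "31", "BBBCB"),
    ("martr004", "31", "1BBBCX"),
    ("ramim002", "32", "FSBFBB>B"),
    ("ethia001", "31", "*BFBBB"),
    ("lonej001", "12", "SSFBX"),
]

for playerid, count, pitches in INNING:
    status = "ok" if count_before_last_pitch(pitches) == count else "MISMATCH"
    print(playerid, count, pitches, status)
```

All six rows in this inning agree with their Count column, including the three walks that loaded the bases ahead of Loney.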

I listened to Pat Hughes and Ron Santo call this inning.  Santo started to groan after the third walk, like he usually did when the Cubs faltered.  When Loney hit that grand slam my radio went dead silent for a very long time, long enough that I had to check to see if it was still on.  This meant not only were Pat and Ron speechless, so was the entire crowd at Wrigley.

At this moment in the 5th inning of Game 1, down only 2 runs, we knew the Cubs weren’t going to beat the Dodgers, even after finishing 2008 with the best record in the National League.  Most Cubs fans have been through this before and know the script, which played out exactly as expected.

Let’s drill down into this series because seasonal numbers like wins/losses and run differential can be deceptive.  Although the MLB commish places teams in the post season based on team WAA (wins – losses), after that he makes them play each other.

2008 CHN Monthly

20080501 6 42.8 3.8 0.9
20080601 15 63.9 25.3 -0.6
20080701 16 70.3 21.3 4.6
20080801 20 74.2 43.4 5.6
20080901 32 105.2 66.2 11.2
2008 33 97.6 69 12.8

The above shows the Cubs had a tremendous +12 August, then flat lined through the month of September, finishing 97-64.  Both BAT and PITCH were near the top of MLB.  These are seasonal numbers however.  Here are the Dodgers.

2008 LAN Monthly

20080501 2 12.8 19.8 -1.1
20080601 -2 -11.1 17.3 -2.6
20080701 -5 -42.7 39.3 1.6
20080801 -1 -52.8 66.4 3.6
20080901 -2 -65.6 70.2 1.2
2008 6 -52.3 102 3.8

The Dodgers were under water with BAT but had extremely good PITCH.  Somehow they won the NL West with only 84 wins by going +8 in September.  It helped that they acquired this guy from Boston.

Manny Ramirez 2008

DateID Rank WAA Teamid
20080501 +044+ 1.32 BOS
20080601 +032+ 2.33 BOS
20080701 +029+ 3.13 BOS
20080801 +023+ 4.22 LAN
20080901 +015+ 5.90 LAN
2008 +010+ 7.46 LAN

I don’t remember why Boston sent Ramirez to LA since they were also playoff contenders.  He was a big reason LAN went +8 in September and why the Cubs lost in three games.  I recall watching Ramirez run around the bases in LA with a genuine smile on his face like a kid having fun playing baseball in little league.
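The month by month swings quoted above (+12 for the Cubs in August, +8 for the Dodgers in September) come from differencing consecutive rows of the team monthly tables.  Here is a small sketch of that arithmetic.  It assumes each dated row is a running W-L total as of that date and that the row labeled 2008 is the season final, which is consistent with the swings quoted in the text.  The function name is only illustrative.

```python
def gains_between_snapshots(snapshots):
    """Given [(label, running W-L), ...], return [(label, change until next row), ...]."""
    return [
        (label, nxt - cur)
        for (label, cur), (_, nxt) in zip(snapshots, snapshots[1:])
    ]


# First two columns of the team monthly tables above.
CHN = [("20080501", 6), ("20080601", 15), ("20080701", 16),
       ("20080801", 20), ("20080901", 32), ("2008", 33)]
LAN = [("20080501", 2), ("20080601", -2), ("20080701", -5),
       ("20080801", -1), ("20080901", -2), ("2008", 6)]

for team, rows in (("CHN", CHN), ("LAN", LAN)):
    for start, gain in gains_between_snapshots(rows):
        print(f"{team} from {start}: {gain:+d}")
```

The Cubs row starting 20080801 comes out at +12 and the one starting 20080901 at +1, while the Dodgers row starting 20080901 comes out at +8.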

2008 Playoff Horse Race

TeamID W-L Total Hitters Starters Relief UR
BOS 28 46.06 21.31 12.22 12.53 9.5
CHN 33 43.26 21.09 16.06 6.11 11.5
PHI 22 35.55 15.92 6.55 13.08 3.5
ANA 38 32.4 10.2 10.42 11.78 5.5
LAN 6 28.94 8.01 13.31 7.62 1.5
TBA 32 22.78 1.09 8.6 13.09 5.5
CHA 15 21.37 9.66 7.26 4.45 -12.5
MIL 18 14.42 3.83 8.56 2.03 -7.5

Playoff Horse Race tables are sorted by the Total value of a team’s roster based upon this data model.  The W-L column is their real team WAA (wins – losses).  Although Anaheim had the best record in baseball, Boston had the highest Total, led by the best set of hitters, even though other teams rated higher in starters or relief alone.  Boston takes care of Anaheim in 4 and then loses to Tampa in 7 games.  After the Dodgers beat the Cubs they lose to the Phillies, who end up winning the World Series.  Had the Phillies played Boston the outcome of that World Series could have been different.
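In the table above, Total works out to Hitters + Starters + Relief for every team, to within rounding.  The sketch below checks that and shows how differently the field lines up when sorted by roster Total versus real W-L.  The tuple layout and variable names are just for this example.

```python
# (TeamID, W-L, Total, Hitters, Starters, Relief) copied from the table above.
HORSE_RACE = [
    ("BOS", 28, 46.06, 21.31, 12.22, 12.53),
    ("CHN", 33, 43.26, 21.09, 16.06, 6.11),
    ("PHI", 22, 35.55, 15.92, 6.55, 13.08),
    ("ANA", 38, 32.40, 10.20, 10.42, 11.78),
    ("LAN", 6, 28.94, 8.01, 13.31, 7.62),
    ("TBA", 32, 22.78, 1.09, 8.60, 13.09),
    ("CHA", 15, 21.37, 9.66, 7.26, 4.45),
    ("MIL", 18, 14.42, 3.83, 8.56, 2.03),
]

# Total should equal the three roster components, allowing for rounding.
for team, _, total, hitters, starters, relief in HORSE_RACE:
    assert abs((hitters + starters + relief) - total) < 0.005, team

by_total = [r[0] for r in sorted(HORSE_RACE, key=lambda r: r[2], reverse=True)]
by_record = [r[0] for r in sorted(HORSE_RACE, key=lambda r: r[1], reverse=True)]

print("by roster Total:", by_total)
print("by real W-L:    ", by_record)
```

Sorted by W-L the Dodgers are last of the eight at +6, but by roster Total they sit fifth, ahead of Tampa.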

Below are game summaries for the three games Cubs lost to Dodgers in 2008 divisional series (dv).  Presentation of this is still a work in progress but most of the elements for this level of report are present.  L, S, and R columns show tier numbers for Lineup, Starter, and Relief.

Relief is constant throughout a series for each team and lineups are almost constant.  Cubs lineup went from tier 2.41 in the first game to almost tier 3 in the third game.  Lineups change because managers start different players for various reasons.  Dodgers lineup below varied from tier 1.3 to tier 1.4 which isn’t much.  LAN relief was around 1/2 tier better which would be consistent with their excellent PITCH shown in team monthly above.

A tier value of 2 is one complete standard deviation above league average.  League averages are based on end of August rosters for that year using end of year data.  Roster expansion in September causes shifts away from a true league average.

The Cubs fielded a better lineup each day and a better starter in games 1 and 3.  With Zambrano on the mound in game 2 the Dodgers might have been favored that game.  Tier numbers are entered into simulation, but the simulator for post season is different from regular season and hasn’t been completed yet.

GAME 1 dv 20081001 — LAN CHN

Teamid L S R Line Runs Starter
LAN 1.39 2.79 1.26 000040111 7 Derek_Lowe
CHN 2.41 3.49 0.78 020000000 2 Ryan_Dempster

GAME 2 dv 20081002 — LAN CHN

Teamid L S R Line Runs Starter
LAN 1.3 2.93 1.26 050010121 10 Chad_Billingsley
CHN 2.79 0.74 0.78 000000102 3 Carlos_Zambrano

GAME 3 dv 20081004 — CHN LAN

Teamid L S R Line Runs Starter
CHN 2.95 4.27 0.78 000000010 1 Rich_Harden
LAN 1.33 1.18 1.26 20001000 3 Hiroki_Kuroda
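The game summaries above can be read mechanically.  This sketch is not the simulator mentioned earlier, just a comparison.  For each game it reports which team had the higher tier for Lineup, Starter, and Relief, treating a higher tier as better, and it confirms that the Runs column equals the sum of the line score digits.  The dictionary layout and function names are assumptions made for the example, and the digit sum only works because no team scored 10 or more in a single inning.

```python
# game -> team -> (L, S, R, line score, runs), copied from the summaries above.
GAMES = {
    1: {"LAN": (1.39, 2.79, 1.26, "000040111", 7),
        "CHN": (2.41, 3.49, 0.78, "020000000", 2)},
    2: {"LAN": (1.30, 2.93, 1.26, "050010121", 10),
        "CHN": (2.79, 0.74, 0.78, "000000102", 3)},
    3: {"CHN": (2.95, 4.27, 0.78, "000000010", 1),
        "LAN": (1.33, 1.18, 1.26, "20001000", 3)},
}


def line_total(line: str) -> int:
    """Sum a line score, assuming every inning is a single digit."""
    return sum(int(ch) for ch in line)


def tier_edges(game: dict) -> dict:
    """Return which team has the higher tier for L, S, and R (higher = better)."""
    edges = {}
    for idx, name in enumerate(("L", "S", "R")):
        edges[name] = max(game, key=lambda team: game[team][idx])
    return edges


for number, game in GAMES.items():
    for team, (_, _, _, line, runs) in game.items():
        assert line_total(line) == runs, (number, team)
    print(f"Game {number}: {tier_edges(game)}")
```

By tier numbers the Cubs had the lineup edge in all three games and the starter edge in games 1 and 3, while the Dodgers had the relief edge throughout.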

That is all for this chapter in post season history.  More coming soon.  Until then ….