2019 Playoff Horse Race Part 1

Normally these playoff horse race posts start around the end of August and run through September using current year data. The race actually starts now, at the beginning of the season, and unlike a real horse race such as the Kentucky Derby, the race for a playoff spot is more like a 26 mile marathon run by humans.

This post could also be Part 2 of the Prediction Racket but we’re not going to make predictions or projections. Since we won’t have enough current year data until the beginning of May, we have to use career data. It has been determined that a 3 year snapshot is a good indicator of talent and levels the playing field between young guys and veterans. A guy like Albert Pujols, who has the highest active career stats in MLB by far, barely ranks in the top 200 in the 2016, 2017, 2018 snapshot.

The playoff horse race table below shows all teams sorted by actual team WAA (W-L, wins minus losses) over the last three years, which is the only stat the Commissioner of MLB cares about. The Total column is the sum of Hitters and Pitchers. Pitchers is the sum of Starters and Relief. Since new guys start out at WAA=0, there is no way to project what impact they may have on each category. This data model does not project. These values represent the past. There are a lot of numbers in these horse race tables, so more explanation below the fold.

Team Ranking

TeamID W-L Total Hitters Pitchers Starters Relief
BOS 102 103.3 65.8 37.5 25.1 12.4
CHN 94 118.9 45.8 73.1 42.7 30.4
HOU 90 106.1 50.5 55.7 29.3 26.3
CLE 89 77.0 15.7 61.3 47.0 14.3
LAN 87 73.2 33.1 40.1 18.4 21.7
NYA 64 98.9 48.3 50.6 23.7 27.0
WAS 62 86.4 45.0 41.4 29.0 12.5
SLN 28 56.9 38.9 18.0 3.8 14.2
MIL 23 61.3 32.3 28.9 12.0 16.9
SEA 20 10.2 5.9 4.3 -4.3 8.7
COL 19 82.9 75.4 7.6 -2.0 9.6
ARI 2 9.8 -0.5 10.3 5.8 4.5
OAK -4 37.8 23.5 14.3 -14.1 28.4
TEX -6 2.1 5.2 -3.1 -6.9 3.8
TOR -10 -17.8 -10.4 -7.4 0.2 -7.6
TBA -10 19.1 1.6 17.5 13.1 4.3
PIT -14 15.1 0.9 14.2 5.0 9.2
NYN -18 58.9 8.4 50.5 31.5 19.0
ANA -18 9.5 11.9 -2.4 -7.9 5.5
ATL -25 30.2 25.2 5.0 4.9 0.1
SFN -38 27.6 -5.0 32.6 13.4 19.2
MIN -42 36.5 39.3 -2.8 -3.5 0.8
MIA -46 -28.8 -16.9 -11.8 -2.6 -9.3
KCA -48 -29.3 -25.4 -3.9 -0.6 -3.3
PHI -52 37.9 14.2 23.6 3.0 20.7
DET -57 -47.2 -9.1 -38.1 -28.9 -9.2
BAL -64 -23.5 -17.6 -5.9 -9.1 3.2
CHA -72 -8.1 -8.3 0.1 -5.0 5.1
SDN -76 21.6 6.5 15.2 3.2 12.0
CIN -80 28.5 8.3 20.2 2.2 18.0
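
The column relationships described above the table (Total = Hitters + Pitchers, Pitchers = Starters + Relief) can be sanity checked with a few lines of Python. This is only a sketch and not the scripts that built the table. The rows are copied from the table, while `check_row` and the tolerance value are made up for illustration, on the assumption that the table shows values rounded to one decimal.

```python
# A few rows copied from the Team Ranking table:
# (TeamID, W-L, Total, Hitters, Pitchers, Starters, Relief)
ROWS = [
    ("BOS", 102, 103.3, 65.8, 37.5, 25.1, 12.4),
    ("CHN", 94, 118.9, 45.8, 73.1, 42.7, 30.4),
    ("HOU", 90, 106.1, 50.5, 55.7, 29.3, 26.3),
    ("CIN", -80, 28.5, 8.3, 20.2, 2.2, 18.0),
]

# Assumption: displayed values are rounded to one decimal, so sums of the
# rounded parts can drift by about 0.1. The 0.15 tolerance is arbitrary.
TOLERANCE = 0.15


def check_row(row):
    """True when Pitchers = Starters + Relief and Total = Hitters + Pitchers."""
    _team, _wl, total, hitters, pitchers, starters, relief = row
    pitchers_ok = abs((starters + relief) - pitchers) <= TOLERANCE
    total_ok = abs((hitters + pitchers) - total) <= TOLERANCE
    return pitchers_ok and total_ok


for row in ROWS:
    print(row[0], "consistent" if check_row(row) else "check roster data")
```

HOU is a row where the rounded parts miss the rounded total by 0.1, which is why the check uses a tolerance instead of exact equality.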

The Cubs (CHN) had three very good consecutive years, despite losing early last September, which puts them near the top of this list. That record, plus trades for high career value guys like Hamels, gives the Cubs the highest career Total of all 30 teams according to this data model. That should be expected, but as we saw in the last three real games against Texas, high career value guys can tank just as easily as anyone.

Note: This table was made automatically based upon incoming roster data, which I did not thoroughly check for accuracy, except for CHN. If a good player is on the DL his numbers won’t be part of that team’s total because he’s not on the roster. Rosters change daily and processing those changes is a big part of the current year dataset.

If we were joining the Prediction Racket, the above table would make for a nice template. Move teams around on a whim and you’ll probably be very close to being right at the end of the season. Teams like ATL, who played well last season but struggled in the two seasons before it, sit in the bottom half, but their Total 3 year career split has risen significantly since last season.

The White Sox will probably do better than their position on this table too, and CIN, the worst team in MLB over the past three years, has a pretty decent WAA value. By getting rid of Homer Bailey, CIN raised their Starters, Pitchers, and Total numbers by over 10. Teams get better by getting rid of bad players.

Now let’s check the above numbers by taking a look at the CHN roster. Our roster source is missing two players and we are not sure who they are. Since no data can be crunched in April, all the scripts that process this data flow from May to November have to be reworked. Below are CHN Hitters, Starters, and Relief, the three categories crucial for daily simulation and estimating winning probabilities.

CHN Hitters

Rank WAA Name_TeamID Pos
+019+ 13.23 Anthony_Rizzo_CHN IF
+029+ 11.42 Javier_Baez_CHN IF
+034+ 11.03 Kris_Bryant_CHN IF
XXXXX 3.23 Ben_Zobrist_CHN IF
XXXXX 3.00 Daniel_Descalso_CHN IF
XXXXX 2.44 Kyle_Schwarber_CHN OF
XXXXX 1.01 David_Bote_CHN IF
XXXXX 0.88 Albert_Almora_CHN OF
XXXXX -0.10 Jason_Heyward_CHN OF
XXXXX -0.38 Mark_Zagunis_CHN OF
Total  45.76

Rank uses the same process as the current year rankings. The top 200 get ranked with a +, the bottom 200 with a -, and everyone else gets XXXXX, meaning unranked. Hitters and Pitchers, AL and NL, are all ranked together, and the Cubs have 3 guys in the above list in the top 50, which is very good.
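
The labeling rule described above could be sketched like this. The function name `rank_label` and its parameters are hypothetical. The `+019+` format comes from the tables in this post, but the bottom 200 format (`-019-`, counted up from the worst player) is an assumption, since no bottom 200 player appears here.

```python
def rank_label(position, total_players, cutoff=200):
    """Label a player by 1-based position in the WAA-sorted list (best first).

    Assumes total_players is larger than 2 * cutoff so the top and bottom
    groups cannot overlap.
    """
    if position <= cutoff:
        return f"+{position:03d}+"  # top 200, e.g. +019+
    from_bottom = total_players - position + 1
    if from_bottom <= cutoff:
        return f"-{from_bottom:03d}-"  # assumed format for the bottom 200
    return "XXXXX"  # unranked


# Example with a made-up pool of 1000 ranked players
print(rank_label(19, 1000))
print(rank_label(500, 1000))
print(rank_label(1000, 1000))
```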

CHN Starters

Rank WAA Name_TeamID Pos
+013+ 16.09 Kyle_Hendricks_CHN SP
+033+ 11.21 Jon_Lester_CHN SP
+108+ 6.30 Cole_Hamels_CHN SP
+120+ 5.86 Jose_Quintana_CHN SP
XXXXX 3.21 Yu_Darvish_CHN SP
Total  42.67

This is a good starting staff on paper but starting pitching can be extremely unreliable. Hendricks and Lester are pretty solid each year but it’s hard to predict this. No team can complain about 2 starters in the top 50 and 4 in the top 200. The PECOTA projection system trolled by giving the Cubs 79 wins this season, banking on these old guys tanking. Hendricks hasn’t reached free agency yet. Whatever.

CHN Relief

Rank WAA Name_TeamID Pos
+076+ 7.54 Steve_Cishek_CHN RP
+092+ 6.93 Mike_Montgomery_CHN RP
+099+ 6.72 Brad_Brach_CHN RP
+117+ 6.03 Pedro_Strop_CHN RP
+183+ 4.18 Carl_Edwards_CHN RP
XXXXX 2.83 Brandon_Kintzler_CHN RP
XXXXX -1.01 Randy_Rosario_CHN RP
XXXXX -2.79 Tyler_Chatwood_CHN RP
Total  30.43

Another 5 guys in the top 200 on the relief staff. If MLB games were played on paper the Cubs would be in extremely good shape. Unfortunately the Commish makes them play the games so anything can happen.
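
The check of the CHN line in the Team Ranking table can be reproduced by rolling up the three roster tables above. A minimal sketch, with the WAA values copied from those tables and a hypothetical helper `team_line` that assumes the team table is just these sums rounded to one decimal:

```python
# WAA values copied from the CHN Hitters, Starters, and Relief tables
HITTERS = [13.23, 11.42, 11.03, 3.23, 3.00, 2.44, 1.01, 0.88, -0.10, -0.38]
STARTERS = [16.09, 11.21, 6.30, 5.86, 3.21]
RELIEF = [7.54, 6.93, 6.72, 6.03, 4.18, 2.83, -1.01, -2.79]


def team_line(hitters, starters, relief):
    """Sum player WAA into the five team columns, rounded to one decimal."""
    h, s, r = sum(hitters), sum(starters), sum(relief)
    pitchers = s + r  # Pitchers is the sum of Starters and Relief
    return {
        "Total": round(h + pitchers, 1),  # Total is Hitters plus Pitchers
        "Hitters": round(h, 1),
        "Pitchers": round(pitchers, 1),
        "Starters": round(s, 1),
        "Relief": round(r, 1),
    }


print(team_line(HITTERS, STARTERS, RELIEF))
# {'Total': 118.9, 'Hitters': 45.8, 'Pitchers': 73.1, 'Starters': 42.7, 'Relief': 30.4}
```

Those five numbers match the CHN row in the Team Ranking table.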

That is all for now. Perhaps Part 2 of this 2019 version of the playoff horse race will get posted at the end of May using current data. We’ll see. The next post will cover a topic brought up at the local pub. Someone suggested that first pitches have a higher probability of being strikes than all the other pitches. This can be easily checked using event data from retrosheet.org.
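
As a rough idea of what that check might look like, here is a sketch that compares the first pitch strike rate with the rate on all later pitches. It works on pitch sequence strings in the style of Retrosheet event files. Everything here is an assumption for illustration: the function names are hypothetical, the grouping of pitch codes into strikes and non-strikes (including counting balls put in play as strikes) is a simplification that should be checked against Retrosheet's own documentation, each string is assumed to be one completed plate appearance, and the sample sequences are made up, not real data.

```python
# Assumed grouping of Retrosheet-style pitch codes (verify before real use).
# Strikes: called, swinging, foul, strike of unknown type, foul bunt,
# missed bunt, foul tip on bunt, foul tip, and ball put in play.
STRIKE_CODES = set("CSFKLMOTX")
# Non-strikes: ball, intentional ball, pitchout, hit batter.
BALL_CODES = set("BIPH")


def real_pitches(sequence):
    """Keep only pitch codes, dropping markers such as pickoff throws."""
    return [c for c in sequence if c in STRIKE_CODES or c in BALL_CODES]


def strike_rates(sequences):
    """Return (first pitch strike rate, strike rate on all later pitches)."""
    first_strikes = first_total = later_strikes = later_total = 0
    for sequence in sequences:
        pitches = real_pitches(sequence)
        if not pitches:
            continue  # no pitch recorded for this plate appearance
        first_total += 1
        first_strikes += pitches[0] in STRIKE_CODES
        later_total += len(pitches) - 1
        later_strikes += sum(c in STRIKE_CODES for c in pitches[1:])
    if first_total == 0 or later_total == 0:
        return None  # not enough pitches to compare
    return first_strikes / first_total, later_strikes / later_total


# Made-up sequences for illustration only ("1" is a pickoff throw marker)
SAMPLE = ["CBFX", "BCSS", "FX", "B1BX"]
first_rate, later_rate = strike_rates(SAMPLE)
print(f"first pitch: {first_rate:.3f}  later pitches: {later_rate:.3f}")
```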

This season all betting opportunities will get posted, which should be around 3 or 4 games per day starting in May, plus the usual bi-weekly CHN team status, series analysis, and more. Presentation of game data will be different and hopefully more intuitive. Until then ….