Category Archives: Team Ranking

2019 Playoff Horse Race Part 2

Below is a prototype of the new horse race table, which ranks all 30 MLB teams by the value of players currently on their roster.  Data for the table below was taken on 5/18.  Horse race data seems to move at a glacial pace throughout the season.

The pitchers column, which added starters and relief, was eliminated for simplicity.  The table below is sorted by the Total column, which is the sum of values for hitters, starters, and relief according to this data model.  Only the top half of MLB is listed.  The off-WordPress site will list all 30 teams every day at some point in the future.

The UR column at the end was added recently to show Unearned Runs above average, which is a fielding indicator for an entire team.  A team like SEA went through some terrible defense this season, and its UR is so far underwater I’m still wondering if it’s a bug in this model’s data flow.  Haven’t found the bug yet.

Still working on a color scheme for this table.

TeamID W-L Total Hitters Starters Relief UR
HOU 15 17.1 9.4 5.2 2.4 8.3
LAN 14 15.8 9.6 4.1 2.1 -5.7
CHN 10 15.1 6.9 6.5 1.6 -3.7
MIN 14 14.9 5.6 4.3 5.0 3.3
ATL 3 10.8 7.3 2.8 0.7 -7.7
ARI 5 9.9 7.1 1.0 1.8 2.3
TBA 10 9.7 -2.5 3.3 8.8 9.3
PHI 6 9.6 5.1 3.0 1.5 -0.7
NYA 11 8.9 1.8 4.7 2.4 1.3
SEA -3 8.2 10.2 -1.8 -0.1 -26.7
BOS 2 8.1 6.2 -0.9 2.9 -1.7
MIL 7 7.7 9.6 2.0 -3.9 1.3
CIN -5 6.8 -1.0 3.9 3.9 9.3
OAK -4 5.6 4.1 -0.3 1.9 4.3
SLN 1 3.9 5.0 -2.3 1.2 0.3

The values are WAA, which is mathematically wins minus losses.  Team wins minus losses are easy to calculate.  The math behind this data model calculates the same quantity for players, with team totals represented above.
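
As a rough check of that arithmetic, here is a minimal Python sketch.  It assumes only what is stated here: WAA is wins minus losses, and Total is the sum of the Hitters, Starters, and Relief columns.  The function names and the 29-15 record are hypothetical illustrations, not the scripts or data behind this model.

```python
# Illustrative check only; names are hypothetical, not the model's real scripts.

def waa(wins, losses):
    """Wins above average as described in the post: wins minus losses."""
    return wins - losses

# (TeamID, W-L, Total, Hitters, Starters, Relief) copied from the table above.
rows = [
    ("HOU", 15, 17.1, 9.4, 5.2, 2.4),
    ("LAN", 14, 15.8, 9.6, 4.1, 2.1),
    ("TBA", 10, 9.7, -2.5, 3.3, 8.8),
    ("SEA", -3, 8.2, 10.2, -1.8, -0.1),
]

def component_gap(row):
    """Difference between the listed Total and Hitters + Starters + Relief."""
    _, _, total, hitters, starters, relief = row
    return total - (hitters + starters + relief)

# Each column is rounded to one decimal, so allow a small rounding gap.
for row in rows:
    assert abs(component_gap(row)) <= 0.15, row

# A hypothetical 29-15 record is +14 in W-L terms.
print(waa(29, 15))
```

The small tolerance is there because each printed column is already rounded, so the components can miss the listed Total by a tenth.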

That’s all for now.  While thinking about doing Cubs status I had to fix the scripts that make this table so it ended up here today.  New Cubs status showing how the above numbers are calculated coming soon.  Until then ….

2019 Playoff Horse Race Part 2

After looking at that 30 team table in Part 1 of this series, it seemed that sorting on the output of this data model would be useful.  Last April when we did this, 9 of the top 12 teams made the playoffs.  Sorting on simple W-L will depress certain teams, like Atlanta, who have improved even though they had a bad run over the last three years.  Here is the same table as in Part 1 but sorted on Total, as measured by value assigned by this data model, instead of simple win/loss records.

Team Ranking by Total value

TeamID W-L Total Hitters Pitchers Starters Relief
CHN 94 118.9 45.8 73.1 42.7 30.4
HOU 90 106.1 50.5 55.7 29.3 26.3
BOS 102 103.3 65.8 37.5 25.1 12.4
NYA 64 98.9 48.3 50.6 23.7 27.0
WAS 62 86.4 45.0 41.4 29.0 12.5
COL 19 82.9 75.4 7.6 -2.0 9.6
CLE 89 77.0 15.7 61.3 47.0 14.3
LAN 87 73.2 33.1 40.1 18.4 21.7
MIL 23 61.3 32.3 28.9 12.0 16.9
NYN -18 58.9 8.4 50.5 31.5 19.0
SLN 28 56.9 38.9 18.0 3.8 14.2
PHI -52 37.9 14.2 23.6 3.0 20.7
OAK -4 37.8 23.5 14.3 -14.1 28.4
MIN -42 36.5 39.3 -2.8 -3.5 0.8
ATL -25 30.2 25.2 5.0 4.9 0.1
CIN -80 28.5 8.3 20.2 2.2 18.0
SFN -38 27.6 -5.0 32.6 13.4 19.2
SDN -76 21.6 6.5 15.2 3.2 12.0
TBA -10 19.1 1.6 17.5 13.1 4.3
PIT -14 15.1 0.9 14.2 5.0 9.2
SEA 20 10.2 5.9 4.3 -4.3 8.7
ARI 2 9.8 -0.5 10.3 5.8 4.5
ANA -18 9.5 11.9 -2.4 -7.9 5.5
TEX -6 2.1 5.2 -3.1 -6.9 3.8
CHA -72 -8.1 -8.3 0.1 -5.0 5.1
TOR -10 -17.8 -10.4 -7.4 0.2 -7.6
BAL -64 -23.5 -17.6 -5.9 -9.1 3.2
MIA -46 -28.8 -16.9 -11.8 -2.6 -9.3
KCA -48 -29.3 -25.4 -3.9 -0.6 -3.3
DET -57 -47.2 -9.1 -38.1 -28.9 -9.2

This provides a different perspective on the situation and most likely 8 teams in the top half of this table will make the playoffs.  This post could be considered an addendum to Part 1.  If we had the web site built you would be able to drill down into each team to see how the sausage is made.
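
To show how much the ordering can shift between the two sort keys, here is a small Python sketch using a handful of rows from the table above.  The helper name `ranked` is made up for illustration, and the positions it reports are within this seven-team subset only, not the full 30-team table.

```python
# Rows copied from the table above: (TeamID, W-L, Total).
rows = [
    ("CHN", 94, 118.9),
    ("HOU", 90, 106.1),
    ("BOS", 102, 103.3),
    ("COL", 19, 82.9),
    ("NYN", -18, 58.9),
    ("SEA", 20, 10.2),
    ("ATL", -25, 30.2),
]

def ranked(rows, column):
    """Return team IDs ordered from highest to lowest on the given column index."""
    return [r[0] for r in sorted(rows, key=lambda r: r[column], reverse=True)]

by_record = ranked(rows, 1)  # sort on W-L
by_total = ranked(rows, 2)   # sort on the model's Total

# Positive numbers mean the team moves up when sorting on Total.
for team in by_total:
    moved = by_record.index(team) - by_total.index(team)
    print(f"{team}: {moved:+d} places when sorting on Total")
```

In this subset SEA drops from fourth on W-L to last on Total, while NYN and ATL both move up.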

The above does not take into account the potential of new guys, or guys waiting in AAA or AA, to come up mid-season and help propel their teams into the playoffs.  That is fodder for a different kind of data model.  That is all for now.  Until then ….

2019 Playoff Horse Race Part 1

Normally these playoff horse race posts start around the end of August through September using current year data.  The race actually starts now at the beginning of the season and unlike a real horse race like the Kentucky Derby, the race for a playoff spot is more like a 26 mile marathon run by humans.

This post could also be Part 2 of the Prediction Racket but we’re not going to make predictions or projections.  Since we won’t have enough current year data until the beginning of May, we have to use career data.  It has been determined that a 3 year snapshot is a good indicator of talent and levels the playing field between young guys and veterans.  A guy like Albert Pujols, who has the highest active career stats in MLB by far, barely ranks in the top 200 in the 2016, 2017, 2018 snapshot.

The below playoff horse race table shows all teams sorted by actual team WAA (W-L) for the last three years which is the only stat the Commissioner of MLB cares about.  The Total column is the sum of Hitters and Pitchers.  Pitchers is the sum of Starters and Relief.   Since new guys start out at WAA=0 there is no way to project what impact they may have on each category.  This data model does not project.  These values represent the past.  There are a lot of numbers in these horse race tables so more explanation below the fold.

Team Ranking

TeamID W-L Total Hitters Pitchers Starters Relief
BOS 102 103.3 65.8 37.5 25.1 12.4
CHN 94 118.9 45.8 73.1 42.7 30.4
HOU 90 106.1 50.5 55.7 29.3 26.3
CLE 89 77.0 15.7 61.3 47.0 14.3
LAN 87 73.2 33.1 40.1 18.4 21.7
NYA 64 98.9 48.3 50.6 23.7 27.0
WAS 62 86.4 45.0 41.4 29.0 12.5
SLN 28 56.9 38.9 18.0 3.8 14.2
MIL 23 61.3 32.3 28.9 12.0 16.9
SEA 20 10.2 5.9 4.3 -4.3 8.7
COL 19 82.9 75.4 7.6 -2.0 9.6
ARI 2 9.8 -0.5 10.3 5.8 4.5
OAK -4 37.8 23.5 14.3 -14.1 28.4
TEX -6 2.1 5.2 -3.1 -6.9 3.8
TOR -10 -17.8 -10.4 -7.4 0.2 -7.6
TBA -10 19.1 1.6 17.5 13.1 4.3
PIT -14 15.1 0.9 14.2 5.0 9.2
NYN -18 58.9 8.4 50.5 31.5 19.0
ANA -18 9.5 11.9 -2.4 -7.9 5.5
ATL -25 30.2 25.2 5.0 4.9 0.1
SFN -38 27.6 -5.0 32.6 13.4 19.2
MIN -42 36.5 39.3 -2.8 -3.5 0.8
MIA -46 -28.8 -16.9 -11.8 -2.6 -9.3
KCA -48 -29.3 -25.4 -3.9 -0.6 -3.3
PHI -52 37.9 14.2 23.6 3.0 20.7
DET -57 -47.2 -9.1 -38.1 -28.9 -9.2
BAL -64 -23.5 -17.6 -5.9 -9.1 3.2
CHA -72 -8.1 -8.3 0.1 -5.0 5.1
SDN -76 21.6 6.5 15.2 3.2 12.0
CIN -80 28.5 8.3 20.2 2.2 18.0

The Cubs (CHN) had three very good consecutive years, despite losing early last September, putting them near the top of this list.  This and trades for high career value guys like Hamels give the Cubs the highest career Total of all 30 teams according to this data model.  This should be expected but, as we saw in the last three real games against Texas, high value career guys can tank just as easily as anyone.

Note: This table was made automatically based upon incoming roster data which I did not thoroughly check for accuracy — except for CHN.   If a good player is on DL his numbers won’t be part of that team’s total because he’s not on the roster.   Rosters change daily and processing this is a big part of the current year dataset.

If we were joining the Prediction Racket, the above table would make for a nice template.  Move teams around on a whim and you’ll probably be very close to being right at the end of the season.  Teams like ATL who played well last season but struggled the two before last are in the bottom half but their Total 3 year career split has risen significantly from last season.

The White Sox will probably do better than their position on this table too, and CIN, the worst team in MLB over the past three years, has pretty decent WAA value.  By getting rid of Homer Bailey, CIN raised their Starters, Pitchers, and Total numbers by over 10.  Teams get better by getting rid of bad players.

Now let’s check the above numbers by taking a look at the CHN roster.  Our roster source is missing two players and I’m not sure who they are.  Since no data can be crunched in April, all the scripts that process this data flow from May to November have to be reworked.  Below are CHN Hitters, Starters, and Relief, the three categories crucial for daily simulation and estimating winning probabilities.

CHN Hitters

Rank WAA Name_TeamID Pos
+019+ 13.23 Anthony_Rizzo_CHN IF
+029+ 11.42 Javier_Baez_CHN IF
+034+ 11.03 Kris_Bryant_CHN IF
XXXXX 3.23 Ben_Zobrist_CHN IF
XXXXX 3.00 Daniel_Descalso_CHN IF
XXXXX 2.44 Kyle_Schwarber_CHN OF
XXXXX 1.01 David_Bote_CHN IF
XXXXX 0.88 Albert_Almora_CHN OF
XXXXX -0.10 Jason_Heyward_CHN OF
XXXXX -0.38 Mark_Zagunis_CHN OF
Total  45.76

The Rank is the same process used for current year.  Top 200 get ranked with a + , bottom 200 with a – , and everyone else gets XXXXX meaning unranked.  Hitters and Pitchers, AL and NL are ranked together and the Cubs have 3 guys in the above list in the top 50 which is very good.
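
The labeling rule can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the function name `rank_labels` is made up, the bottom ranks are assumed to count up from the worst player (so the worst would be -001-), and the tiny six-player pool with a band of 2 is hypothetical data standing in for the real pool with a band of 200.

```python
def rank_labels(waa_by_player, band=200):
    """Label players '+NNN+' (top band), '-NNN-' (bottom band), or 'XXXXX'.

    Assumption: bottom ranks count up from the worst player.
    """
    ordered = sorted(waa_by_player, key=waa_by_player.get, reverse=True)
    n = len(ordered)
    labels = {}
    for i, name in enumerate(ordered):
        from_top = i + 1       # 1 is the best player
        from_bottom = n - i    # 1 is the worst player
        if from_top <= band:
            labels[name] = f"+{from_top:03d}+"
        elif from_bottom <= band:
            labels[name] = f"-{from_bottom:03d}-"
        else:
            labels[name] = "XXXXX"
    return labels

# Hypothetical pool, hitters and pitchers mixed together as in the post.
pool = {"A": 5.0, "B": 3.0, "C": 1.0, "D": 0.0, "E": -2.0, "F": -4.0}
labels = rank_labels(pool, band=2)
for name in pool:
    print(labels[name], pool[name], name)
```

With the real band of 200, everyone between the top 200 and bottom 200 falls through to XXXXX, which is why most of a roster shows up unranked.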

CHN Starters

Rank WAA Name_TeamID Pos
+013+ 16.09 Kyle_Hendricks_CHN SP
+033+ 11.21 Jon_Lester_CHN SP
+108+ 6.30 Cole_Hamels_CHN SP
+120+ 5.86 Jose_Quintana_CHN SP
XXXXX 3.21 Yu_Darvish_CHN SP
Total  42.67

This is a good starting staff on paper but starting pitching can be extremely unreliable.  Hendricks and Lester are pretty solid each year but it’s hard to predict this.  No team can complain about 2 starters in the top 50 and 4 in the top 200.   The PECOTA projection system banked their troll giving the Cubs 79 wins this season by predicting these old guys will tank.   Hendricks hasn’t reached free agency yet.  Whatever.

CHN Relief

Rank WAA Name_TeamID Pos
+076+ 7.54 Steve_Cishek_CHN RP
+092+ 6.93 Mike_Montgomery_CHN RP
+099+ 6.72 Brad_Brach_CHN RP
+117+ 6.03 Pedro_Strop_CHN RP
+183+ 4.18 Carl_Edwards_CHN RP
XXXXX 2.83 Brandon_Kintzler_CHN RP
XXXXX -1.01 Randy_Rosario_CHN RP
XXXXX -2.79 Tyler_Chatwood_CHN RP
Total  30.43

Another 5 guys in the top 200 on relief staff.  If MLB games were played on paper the Cubs would be in extremely good shape.  Unfortunately the Commish makes them play the games so anything can happen.

That is all for now.  Perhaps Part 2 of this 2019 version of the playoff horse race will get posted end of May using current data.  We’ll see.  The next post will cover a topic brought up at the local pub.  Someone suggested that first pitches have a higher probability of being strikes than all the other pitches.   This can be easily proven using event data from retrosheet.org.

This season all betting opportunity will get posted which should be around 3 or 4 games per day starting in May plus the usual bi-weekly CHN team status and series analysis and more.  Presentation of game data will be different and hopefully more intuitive.  Until then ….

2018 World Series Report Part 1

Today will be part one of a World Series Report series.  These will be similar to the preceding playoff report series, which means mostly database dumps with some commentary about the results if they’re interesting.  Today we’ll start with a complete playoff horse race sorted by team total WAA, the value stat generated by this data model.

Playoff Horse Race Part 8

TeamID W-L Total Hitters Pitchers Starters Relief
LAN 21 50.7 28.1 22.6 14.9 7.6
BOS 54 42.2 31.6 10.6 8.9 1.7

… and then there were two.  Totals not much different from Part 7 of the playoff horse race series.   LAN has a better set of starters and better relief.  Boston fields a much better starting lineup as we’ll see below even though both teams have similar hitting.   Dodger hitting is spread out so they’ll have better late game pinch hitting which could be a factor in close games between two high caliber teams.  Let’s hear what the people think about tonight.

Handicapping Report

DATE 10_23_8:05_PM LAN BOS

LINEAWAY LAN [ 0.426 ] < 0.417 > +140 $240
STARTAWAY 5.02(0.640) Clayton_Kershaw_LAN TIER 1
--------------------------------------------
LINEHOME BOS [ 0.592 ] < 0.600 > -150 $166
STARTHOME 7.41(0.711) Chris_Sale_BOS TIER 1
--------------------------------------------

TIER COMBOS
LAN Lineup 1 ==> BOS Starter 1 / Relief 3 == 0.492 LAN 4.38 runs
BOS Lineup 1 ==> LAN Starter 1 / Relief 2 == 0.508 BOS 4.45 runs

EXPECTED VALUE LAN BOS
Tier Combo 118 84
Home Field 110 90

Two top tier MLB starters are pitching tonight and each must face a top tier lineup.  LAN has a slight edge with relief.  Boston fields a much better lineup, even though both are considered Tier 1, which gives them a slight edge in simulation.  Simulations use deltas between tiers, not hard boundaries.  More about that in the off season.

The market, however, favors Boston much more heavily, which pushes the Dodgers’ Expected Value for TC simulation to 118, almost a betting opportunity.  Home field disadvantage for LAN drops this to 110.  Right now claw back into historical lines data has not been done.  The above shows Boston is overvalued based upon current data.  Kershaw can be flaky in the playoffs, and anomalies like that can’t be quantified, so we’ll see.
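
For readers who want to follow the arithmetic, here is a hedged Python sketch of the standard American moneyline conversions.  The mapping to the report is inferred from the numbers, not documented here: the value in angle brackets matches the break-even probability of the line, the dollar figure matches the total returned on a winning $100 bet, and the Tier Combo Expected Value matches the simulated win probability times that return.  The function names are made up for illustration.

```python
def implied_probability(moneyline):
    """Break-even win probability implied by an American moneyline."""
    if moneyline > 0:
        return 100 / (moneyline + 100)
    return -moneyline / (-moneyline + 100)

def payout_on_100(moneyline):
    """Total returned (stake plus winnings) on a winning $100 bet."""
    if moneyline > 0:
        return 100 + moneyline
    return 100 + 100 * 100 / -moneyline

def expected_value(win_probability, moneyline):
    """Expected return on a $100 bet; 100 is break-even."""
    return win_probability * payout_on_100(moneyline)

# Lines and Tier Combo probabilities copied from the report above.
for team, line, sim_prob in [("LAN", 140, 0.492), ("BOS", -150, 0.508)]:
    print(team,
          round(implied_probability(line), 3),
          int(payout_on_100(line)),
          round(expected_value(sim_prob, line), 1))
```

An Expected Value of 100 means a $100 bet breaks even over the long run, so LAN at roughly 118 is the side with positive expectation if the simulation is right.  The report’s $166 and 84 for Boston look like truncated versions of the unrounded figures, which is an observation about the printed numbers rather than a statement about how the scripts round.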

Let’s look at a lineup snapshot for each team taken a couple days ago which shouldn’t be much different than the official ones posted later.  We’ll also show Tier data and relief rosters according to our source.  For some reason (probably because our source is not that reliable) Boston is missing a player but that shouldn’t affect the below summary much.

Update 10/24:  That missing Boston player has been found and it’s starter Nathan Eovaldi who has a WAA=0.86 for the year which is solid Tier 3.  That 0.86 is not included in the playoff horse race table above.

LAN Lineup

WAA Name_TeamID Pos PA 10172018
2.10 Cody_Bellinger_LAN 1B-CF 632
2.06 Justin_Turner_LAN 3B 426
1.78 David_Freese_LAN 3B-1B 312
3.04 Manny_Machado_LAN SS-3B 709
4.98 Max_Muncy_LAN 1B-3B-2B 481
1.49 Chris_Taylor_LAN SS-CF-LF-2B 604
1.74 Enrique_Hernandez_LAN CF-2B-SS-RF-LF 462
-0.71 Austin_Barnes_LAN CR-2B 238
-0.48 Clayton_Kershaw_LAN PR 57
Total WAA=16.00 PA=3921 WinPct=0.578

This is considered Tier 1 but just barely.  The Tier 1/2 boundary is +15.36 which is based upon a distribution of lineups from all 30 teams.

BOS Lineup

WAA Name_TeamID Pos PA 10162018
7.54 Mookie_Betts_BOS RF-CF 614
4.45 Andrew_Benintendi_BOS LF-CF 661
10.08 J.D._Martinez_BOS DH-LF-RF 649
4.77 Xander_Bogaerts_BOS SS 580
2.12 Steve_Pearce_BOS 1B-DH-LF 251
-1.28 Eduardo_Nunez_BOS 2B-3B-DH 502
-0.86 Ian_Kinsler_BOS 2B 534
-2.10 Christian_Vazquez_BOS CR 269
1.62 Jackie_Bradley_BOS CF-RF 535
Total WAA=26.33 PA=4595 WinPct=0.610

This far exceeds the Tier 1/2 boundary, and since simulations use deltas to determine differences, Boston’s lineup against Clayton Kershaw and LAN relief rates higher than the Dodgers’ barely Tier 1 lineup against a higher rated pitcher in Chris Sale.  This is why TC simulation sees the game as more even steven than the blowout predicted by the market.
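
The gap between “barely Tier 1” and “far exceeds” is easy to put a number on.  The Python sketch below sums the two listed lineups and measures how far each sits above the quoted Tier 1/2 boundary of +15.36.  How the simulation actually turns such deltas into win probabilities is not described in this post, so this only shows the distances; the name `lineup_margin` is hypothetical.

```python
TIER_1_2_LINEUP_BOUNDARY = 15.36  # boundary quoted in the post

# Player WAA values copied from the two lineup listings above.
lan = [2.10, 2.06, 1.78, 3.04, 4.98, 1.49, 1.74, -0.71, -0.48]
bos = [7.54, 4.45, 10.08, 4.77, 2.12, -1.28, -0.86, -2.10, 1.62]

def lineup_margin(waa_values, boundary=TIER_1_2_LINEUP_BOUNDARY):
    """Return (lineup total WAA, distance above the Tier 1/2 boundary)."""
    total = sum(waa_values)
    return round(total, 2), round(total - boundary, 2)

for team, lineup in [("LAN", lan), ("BOS", bos)]:
    total, margin = lineup_margin(lineup)
    print(f"{team}: total WAA {total:.2f}, {margin:+.2f} vs Tier 1/2 boundary")
```

Summing the rounded player values gives 26.34 for Boston, a hundredth off the listed 26.33, which is just rounding in the listing.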

LAN Tier Data

Type Tier Name_Teamid WAA
Lineups 1 LAN 16.00
SP 1 Walker_Buehler_LAN 4.64
SP 3 Rich_Hill_LAN 1.26
SP 1 Clayton_Kershaw_LAN 5.02
SP 1 Hyun-jin_Ryu_LAN 4.03
RP 2 LAN 7.64

Dodgers are showing their four starters while Boston below only shows 3.  The missing Boston player might be their fourth starter because it’s hard to make it through a seven game series with only 3 starters.

BOS Tier Data

Type Tier Name_Teamid WAA
Lineups 1 BOS 22.60
SP 3 Rick_Porcello_BOS -0.73
SP 3 David_Price_BOS 2.21
SP 1 Chris_Sale_BOS 7.41
RP 3 BOS 1.68

Lineups in tier data took a different snapshot than the listed lineup shown above.  All lineups vary a little from day to day.  Boston relief also took a big hit making them close to the Tier 3/4 border which is at WAA=0 for all 30 teams.

Note: Relief distributions are taken for the other 28 teams from end of August rosters using end of year data for each player.

LAN Relief

Rank WAA Name_TeamID Pos
+096+ 2.83 Dylan_Floro_TOT PITCH
+192+ 1.76 Kenley_Jansen_LAN PITCH
XXXXX 1.57 Pedro_Baez_LAN PITCH
XXXXX 1.39 Alex_Wood_LAN PITCH
XXXXX 0.78 Kenta_Maeda_LAN PITCH
XXXXX 0.59 Scott_Alexander_LAN PITCH
XXXXX 0.38 Julio_Urias_LAN PITCH
-186- -1.62 Ryan_Madson_TOT PITCH
Total 7.68

The Tier 2/3 border for relief is +6.26, so LAN has a relief staff almost a complete tier above Boston.  This will be true for the entire series.

BOS Relief

Rank WAA Name_TeamID Pos
+163+ 1.99 Craig_Kimbrel_BOS PITCH
+168+ 1.97 Ryan_Brasier_BOS PITCH
XXXXX 0.92 Eduardo_Rodriguez_BOS PITCH
XXXXX 0.67 Matt_Barnes_BOS PITCH
XXXXX -0.10 Heath_Hembree_BOS PITCH
XXXXX -0.40 Joe_Kelly_BOS PITCH
-025- -3.38 Drew_Pomeranz_BOS PITCH
Total 1.67

I double checked this and Drew Pomeranz appears to be on the playoff roster.  Not sure why.  There may be matchup considerations with Dodger hitting that we don’t know about.  There must be a good reason, including the possibility that our roster source made a mistake.  We’ll see.

That is all for showing playoff roster information.  Except for lineups, the above won’t change much.  The rest of the World Series reports will only show the handicapping, with commentary if necessary.

2018 Playoff Horse Race Part 7

Here’s a dump of the playoff horse race featuring the last 4 teams in the race.

TeamID W-L Total Hitters Pitchers Starters Relief
LAN 21 52.0 28.1 23.9 14.9 9.0
HOU 44 49.1 17.3 31.8 19.7 12.2
BOS 54 47.7 31.6 16.0 9.8 6.3
MIL 29 33.5 15.0 18.5 5.4 13.1

The above is sorted by Total WAA as calculated by this data model.  Hitters and Pitchers add to make Total, Starters and Relief add to make Pitchers.  Not much different from Part 6.  Milwaukee had a collapse of their top notch relief staff yesterday and still won because Dodgers had a collapse of their top notch starter.  Let’s look at handicap reports for the two games today.

DATE 10_13_4:05_PM LAN MIL

LINEAWAY LAN [ 0.535 ] < 0.556 > -125 $180
STARTAWAY 4.03(0.720) Hyun-Jin_Ryu_LAN TIER 1
--------------------------------------------
LINEHOME MIL [ 0.512 ] < 0.465 > +115 $215
STARTHOME 2.92(0.663) Wade_Miley_MIL TIER 2
--------------------------------------------

TIER COMBOS
LAN Lineup 1 ==> MIL Starter 2 / Relief 1 == 0.521 LAN 4.46 runs
MIL Lineup 2 ==> LAN Starter 1 / Relief 2 == 0.479 MIL 4.27 runs

EXPECTED VALUE LAN MIL
Tier Combo 94 103
Home Field 83 116

Los Angeles favored slightly by Tier Combo simulations and a little more than slightly by the market.  Ryu is technically Tier 1, Miley Tier 2 and Dodgers have a Tier 1 lineup and Milwaukee brings a Tier 1 relief staff.   Since the market pretty much agrees with simulations there is no point in betting this game.

DATE 10_13_8:05_PM HOU BOS

LINEAWAY HOU [ 0.476 ] < 0.488 > +105 $205
STARTAWAY 8.02(0.669) Justin_Verlander_HOU TIER 1
--------------------------------------------
LINEHOME BOS [ 0.545 ] < 0.535 > -115 $186
STARTHOME 7.41(0.711) Chris_Sale_BOS TIER 1
--------------------------------------------

TIER COMBOS
HOU Lineup 2 ==> BOS Starter 1 / Relief 3 == 0.495 HOU 4.34 runs
BOS Lineup 1 ==> HOU Starter 1 / Relief 1 == 0.505 BOS 4.38 runs

EXPECTED VALUE HOU BOS
Tier Combo 101 94
Home Field 94 100

Two top-of-MLB pitchers are starting tonight.  HOU has better relief, BOS has the better lineup, and TC simulations call this an even steven game.  BOS is favored by almost exactly the historical home field advantage, which makes sense.  Both lines are a discard.

That is all for today.  Handicap dumps will happen throughout the playoffs.  In the off season, when the code gets finished, we can reminisce about handicapping playoffs from past seasons.  Since we’re from the future we know the outcomes.