
About this site

This site is a public logbook on the development of a baseball data model that measures baseball player value and ranks them from best to worst.  This model contains the current 30 MLB franchises, their minor league affiliates, and their historical teams.   It covers all seasons and all players from 1900 – 2017.

Browse the Table of Contents for more information.  We covered the 2017 season extensively.  Not much was published here in 2016, even though the Cubs won, and posting was sporadic in the years before that, going back to the start in September 2013.

The goal of this data model is to become an app that lets a user quickly evaluate a player being talked about without knowing anything about baseball.  They can then become the smartest person in the room about that player.  There will be a handicapping component but that is a work in progress and hasn’t been proven.  We have a solid proof for the WAA measure, something WAR does not have.

Cubs Status 4/20/2019

This will be the first Cubs status of the year and it will be short because there isn’t enough data for displaying player rankings yet.  Cody Bellinger and Christian Yelich are #1 and #2 and only 19 players have qualified to rank in the top 200 so far.  There are about 12 more days before handicap season can begin.  May is still rough waters for handicapping due to not enough data but we’ll talk through that then.

CHN Team Status

BAT PITCH Rs Ra W L UR LR TeamID Year
18.3 7.9 108 86 9 9 -4.7 0.6 CHN 2019

The above is the team status record used here for many years.  The BAT and PITCH columns are derived from Runs Scored (Rs) and Runs Against (Ra) as follows:

BAT = Rs – Ravg – LR/2

PITCH = Ravg – Ra – UR

Not too complicated.  Ravg is the average runs scored per team, which is the same for runs scored and runs against for an obvious reason: for every run scored there must be a run scored against.  UR is unearned runs above average, the only official unbiased fielding metric that has been measured since the beginning of baseball.  LR is lucky runs above average: runs scored where the runner gets an R but no batter gets an RBI.  Those runs count too.
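The two formulas above can be sketched in a few lines of Python.  The function names are made up for illustration, and Ravg is not given in this post: the 89.4 used below is back-solved from the BAT column and is only an assumption, so PITCH lands near the published 7.9 rather than exactly on it.

```python
def bat(rs, ravg, lr):
    """BAT = Rs - Ravg - LR/2"""
    return rs - ravg - lr / 2


def pitch(ravg, ra, ur):
    """PITCH = Ravg - Ra - UR"""
    return ravg - ra - ur


# Cubs record from this post: Rs=108, Ra=86, UR=-4.7, LR=0.6
RAVG_ASSUMED = 89.4  # assumption, not a number published here

cubs_bat = bat(108, RAVG_ASSUMED, 0.6)
cubs_pitch = pitch(RAVG_ASSUMED, 86, -4.7)
print(f"BAT {cubs_bat:.1f} PITCH {cubs_pitch:.1f}")
```

Note the sign on UR: a team that is underwater on unearned runs (negative UR) gets that amount added back to PITCH, so fielding mistakes are not charged to the pitchers.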

The purpose of this model is to simplify things and not be like all the other stat sites with huge tables of columns and rows.  Since BAT and PITCH are derived from the run differential numbers (Rs and Ra), there is no need to include Rs and Ra in team status records because they are redundant.  UR and LR were included for symmetry.  UR indicates poor or good fielding as a team.  That stays.  There is nothing a team can do to improve lucky runs, which are runs scored on balks or wild pitches or other kinds of tomfoolery.  That column will be eliminated.

Below is the new simplified Team Status record

BAT PITCH W L UR TeamID
18.3 7.9 9 9 -4.7 CHN

Cubs are underwater on unearned runs, which cost them a couple of games early in the season.  BAT and PITCH are both above water and this team is exactly even steven for April so far.  Although the scripts have the league compiled with current rosters, that won’t be displayed until May, so there’s not much more to say than to ask what Vegas thinks of today’s game at Wrigley.

DATE 04_20_2:20_PM ARI CHN
LINEAWAY ARI 0.455 0.426 +135 235
STARTAWAY -0.84 0.338 Zack_Greinke_ARI 23.3 -1.72
LINEUPAWAY 3.26 1.47
RELIEFAWAY -0.67 -1.51
LINEHOME CHN 0.565 0.592 -145 168
STARTHOME -0.57 0.356 Yu_Darvish_CHN 17.7 -1.16
LINEUPHOME 3.63 1.70
RELIEFHOME 0.65 0.16

The above is a shortened preview of the new Ouija Board which will be much more intuitive.  Cubs are favored by almost 3-2 with a 0.592 break even probability.  Both pitchers are under water so far, both lineups around equal, and Cubs have slightly better relief.  Both team W-L records almost equal.
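For anyone wondering where 0.426 and 0.592 come from, they are the standard break even probabilities for the +135 and -145 money lines.  Here is a minimal sketch of that conversion; the function name is mine and not part of the scripts used here.

```python
def break_even_probability(moneyline):
    """Convert an American money line to the win probability needed to break even."""
    if moneyline == 0:
        raise ValueError("a money line of 0 is not valid")
    if moneyline > 0:
        # Underdog: risk 100 to win `moneyline`
        return 100 / (moneyline + 100)
    # Favorite: risk `-moneyline` to win 100
    risk = -moneyline
    return risk / (risk + 100)


print(round(break_even_probability(135), 3))   # ARI at +135
print(round(break_even_probability(-145), 3))  # CHN at -145
```

The two probabilities add up to a little more than 1.  That excess is the house margin, which is why a system has to beat the break even probability and not just pick the favorite.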

So.  There’s not enough data yet to make a determination on this game.  The new report will be much more intuitive than the above and all the pertinent numbers used to simulate will be explained.  Until then ….

More Useless Stats

April is a dead zone for this data model as player stats cannot be accurately compiled until May and team stats until around mid April.  This is because there are huge fluctuations throughout the league, rendering deceptive results.  That doesn’t stop certain Cubs announcers from rattling off meaningless team slash lines that make no sense (Hi JD!) but whatever.  He just reads what some stat heads write on a cue card, thinking it adds value to the color commentary.  It doesn’t.  But I digress ….

In the meantime let’s bide some time and waste it on even more useless stats.  The other day at the local pub a person who played at a pretty high level of baseball mentioned an interesting theory; he said pitchers  throw more strikes on the first pitch hoping the batter will be taking.  Is this true?

Since we have event data from retrosheet.org that shows pitch sequences back to 1988, this is something that can be either proven or disproven.  First a pitch-counting script needed to be written.  In order to not get too crazy, only the years 2015 – 2018 were processed, which should be enough.

My first question when writing this script was what the average pitch count per batter is.  This comes to almost exactly 4.  Next I made these calculations:

  • Average # pitches / STRIKEOUT = 4.9
  • Average # pitches / OUT = 3.5
  • Average # pitches / WALK = 5.9
  • Average # pitches / HIT = 3.5

Plate appearances that end in an OUT or a HIT take the same 3.5 pitches / batter.  Pitchers who like to throw strikeouts add another 1.4 pitches / batter to their pitch count, and that goes up another whole pitch if they walk a batter.  Since many Sabermetric stats demand pitchers throw strikeouts, this can spike pitch counts for no reason other than a pitcher needing to game FIP for his next contract or help his Draft Kings teams win.
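Counting pitches from a Retrosheet pitch sequence is mostly a matter of skipping the characters that are not pitches.  The sketch below is not the script used for the numbers above.  The set of non-pitch markers (pickoff throws, runner-going markers, play separators, "no pitch") reflects my reading of the Retrosheet event file codes and should be checked against their documentation.

```python
# Characters in a Retrosheet pitch sequence that are NOT pitches
# (assumed list: catcher pickoff, blocked-pitch marker, play separator,
# pickoff throws to 1st/2nd/3rd, runner going, no pitch).
NON_PITCH = set("+*.123>N")


def count_pitches(sequence):
    """Number of pitches thrown in one plate appearance."""
    return sum(1 for ch in sequence if ch not in NON_PITCH)


def average_pitches(sequences):
    """Average pitches per plate appearance over an iterable of sequences."""
    sequences = list(sequences)
    return sum(count_pitches(s) for s in sequences) / len(sequences)


# Called strike, ball, foul, ball in play = 4 pitches
print(count_pitches("CBFX"))
# The pickoff throw (1) and the runner-going marker (>) are not pitches
print(count_pitches("B1BC>S"))
```

Grouping the same count by how the plate appearance ended (strikeout, out, walk, hit) gives the four averages in the list above.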

More Fun With Numbers

The following table shows 4 event types (strikeout, out, walk, hit, in the same order as the list above) and how often each occurs based upon pitch number and type of pitch.  Row 1 B means the first pitch was a Ball, then what happens.  C means Called strike, F means Foul ball, S means Swinging strike.  The four columns in each row must add to 1.  More explanation below the fold.

Pitch Type Strikeout Out Walk Hit
1 B 0.179 0.447 0.141 0.233
1 C 0.293 0.444 0.049 0.214
1 F 0.295 0.436 0.046 0.222
1 S 0.362 0.389 0.052 0.196
2 B 0.215 0.410 0.164 0.211
2 C 0.339 0.395 0.070 0.195
2 F 0.358 0.392 0.054 0.196
2 S 0.447 0.338 0.053 0.162
3 B 0.266 0.351 0.203 0.179
3 C 0.396 0.335 0.106 0.163
3 F 0.378 0.368 0.069 0.185
3 S 0.624 0.214 0.059 0.104
4 B 0.271 0.299 0.277 0.153
4 C 0.408 0.249 0.215 0.128
4 F 0.378 0.358 0.086 0.177
4 S 0.790 0.107 0.050 0.053

The above shows the first 4 pitches which is almost exactly a per batter league average pitch count.  How it came to exactly 4 is as  fascinating as how Hits/2 almost exactly equals runs scored.

The second pitch above does not care what happened on the first pitch.  Ditto for pitches 3 and 4.  You would need to do some conditional probability to figure out anything in more detail, and that is probably not worthwhile.
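A table like the one above can be built by tallying, for each of the first 4 pitches, how the plate appearance eventually ended, then dividing each row by its total.  This is a minimal sketch under assumptions: the input format (a cleaned pitch string plus an outcome label) and the function name are invented for illustration, and the four sample plate appearances are made up.

```python
from collections import Counter, defaultdict

OUTCOMES = ("STRIKEOUT", "OUT", "WALK", "HIT")


def outcome_table(plate_appearances, max_pitch=4):
    """Map (pitch number, pitch type) -> share of each final outcome.

    plate_appearances: iterable of (pitches, outcome) where pitches is a
    string of pitch codes with non-pitch markers already removed.
    """
    counts = defaultdict(Counter)
    for pitches, outcome in plate_appearances:
        for number, pitch in enumerate(pitches[:max_pitch], start=1):
            if pitch in "BCFS":  # ball, called strike, foul, swinging strike
                counts[(number, pitch)][outcome] += 1
    table = {}
    for key, tally in counts.items():
        total = sum(tally.values())
        table[key] = {o: tally[o] / total for o in OUTCOMES}
    return table


sample = [("BCX", "HIT"), ("BBBB", "WALK"), ("CSS", "STRIKEOUT"), ("BX", "OUT")]
for key, shares in sorted(outcome_table(sample).items()):
    print(key, {o: round(v, 3) for o, v in shares.items()})
```

Each row only looks at one pitch in isolation, which is why the second pitch does not care what happened on the first.  Keying on the whole count or the full sequence so far would be the conditional probability version.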

This post last April showed that MLB average Batting Average was 0.255 for all batters between 2010 and 2017.  The above Hit % is not a batting average as it uses Plate Appearances (PA) instead of At Bats (AB) as a divisor.  For this exercise using PA is a more accurate and less confusing measure.

Scanning this table you’ll see both Hits and Walks are most likely when a Ball is thrown which seems intuitively obvious.  Not sure how useful any of the above data is other than gaining an advantage on a bunch of friends at a game who like to bet on every pitch and batter.

The last table is the crux of this entire study.  Aside from Pitch number there are 5 categories of things that can happen listed:

  • SWING – batter swings and misses
  • CALLED – called strike
  • FOUL – batter makes contact and hits a foul ball
  • BALL – ball
  • CONTACT – batter puts ball in play (out or hit)

Swing, Foul, and Contact are unknowns because we don’t know if the ball was in the strike zone when they happened.  We know a Called strike was in the strike zone and a called Ball was not.  This table shows pitches 1 through 4.  All % columns in each row must add to 1.

Pitch Swing Called Foul Ball Contact
1 0.066 0.321 0.104 0.397 0.112
2 0.107 0.165 0.167 0.386 0.174
3 0.120 0.118 0.192 0.375 0.195
4 0.124 0.111 0.206 0.352 0.207

If a batter lays back and does nothing the above suggests it’s more likely the pitch will be a ball instead of a called strike.  Called strikes on the first pitch are almost double that of subsequent pitches so my friend does have a point.  Pitchers do throw more accurately on the first pitch compared to all the others.

Clarification 4/16/2019:  The above statement is wrong.  Batters may tend to lay back on the first pitch, which is why Called strikes are so high.  A high percentage of Swings and Fouls would have been called strikes had the batter not swung.  What that percentage is we can’t tell from event data.  The radar guys keeping track of every pitch thrown would know.  Pitches in and out of the strike zone could be estimated by estimating this probability.  It appears Called strikes and Balls would be somewhat equal according to the above table.
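To see how much the unknown percentage matters, here is a rough sketch that splits the Swing, Foul, and Contact shares between in-zone and out-of-zone using a guessed probability.  The 0.5 below is purely hypothetical, since event data cannot tell us the real value, and the function name is mine.

```python
def zone_split(swing, called, foul, ball, contact, p_zone_when_swinging):
    """Estimate (in-zone share, out-of-zone share) for one pitch number.

    p_zone_when_swinging is the assumed chance that a pitch the batter
    swung at (miss, foul, or in play) was actually in the strike zone.
    """
    swung = swing + foul + contact
    in_zone = called + p_zone_when_swinging * swung
    out_zone = ball + (1 - p_zone_when_swinging) * swung
    return in_zone, out_zone


# First pitch row from the table above, with a hypothetical 50% guess
in_zone, out_zone = zone_split(0.066, 0.321, 0.104, 0.397, 0.112, 0.5)
print(round(in_zone, 3), round(out_zone, 3))
```

Running the same function for pitches 2 through 4 with different guesses shows how sensitive the first pitch comparison is to that one unknown number.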

That is all for now.  Cubs are having a tumultuous April.  We can do a brief CHN team status with no player rankings in perhaps  7 – 10 days.  Hopefully things settle out for them.  This model had CHN with the highest Total value in MLB based upon 2016, 2017, and 2018 splits.  This should be expected since the Cubs have had an incredible run of winning these last three years and most of the players who racked up those wins are still on the team.

As always, past results do not affect future results and we really witnessed that these last 10 games even though they won their home opener today 10-0.  No one can predict the future and anyone claiming they did with respect to the Cubs is lying.  The only thing this model can do is provide an accurate view of the past.  Most other systems can’t even do that.

Through our view of current year data starting in May, when 1/6 of the season is in the books and many players have more than 100 PAs, we can use current year data to estimate a handicap for single upcoming games.  We can’t estimate what will happen for the next 133 games because that is impossible.  Things change daily and weekly and, as we have shown here over and over, stats like BA, OPS, WAR, etc. do not react to changes as quickly as this model does.  That’s our advantage, which is why we beat ELO by over 10% the last two seasons and Vegas by 10% in 2018 but only 2.5% in 2017 (this is currently being looked into).  More on this later.  Until then ….

The Simulation Part 4

This post will show new output from the simulation that will be employed here this season.  This data model cannot compile current year data until May when there is enough of it, so in the meantime checking for bugs and verifying results of the handicapping system needed to be done, and it’s almost complete.

In past years this log book usually showed the starting game of each Cubs series with a dump of data from the data model showing Vegas and simulation estimated probabilities and then I had to talk through the numbers.   The Vegas probabilities were the gold standard to beat.  This simulation had no proof.  Winning or losing a couple of games proves nothing.

Today we’ll run through the new Game entity which contains all relevant data from each game down to the lineups, starters, relief, who won, lines, etc.  Our simulations are based upon all games from 1970 – 2018 excluding March and April.  This comes to around 100K games, two teams per game.

Although the model has been verified against Vegas automatically, it still needs to be spot checked manually for anomalies and other things that can be exploited to make it better.  I picked one game at random for debugging output, so I decided to just go with it here.

Here’s a game between the Pirates and Dodgers on May 8, 2017.  The format of all of this is still a work in progress.  Commentary on what every section means will be interspersed.

GAME PIT LAN 201705080

201705080 is a game number using retrosheet.org nomenclature.  A 0 attached to the YYYYMMDD date means single game.  A double header will attach a 1 or 2 depending upon which game.  Keeping track of double headers was a big problem.
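Splitting a game number like this into its date and double header flag is straightforward.  A minimal sketch, with a function name of my own choosing:

```python
from datetime import date, datetime


def parse_game_id(game_id):
    """Split a YYYYMMDDN game number into (date, game number).

    The trailing digit is 0 for a single game, 1 or 2 for the first or
    second game of a double header.
    """
    if len(game_id) != 9 or not game_id.isdigit():
        raise ValueError(f"expected 9 digits, got {game_id!r}")
    game_date = datetime.strptime(game_id[:8], "%Y%m%d").date()
    game_number = int(game_id[8])
    return game_date, game_number


print(parse_game_id("201705080"))
```

Keeping the double header digit as part of the key is what lets two games on the same day between the same teams stay separate records.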

VEGAS PIT 0.345 LAN 0.688
SIM PIT 0.421 LAN 0.579
NSIM PIT 0.400 LAN 0.600
ELO PIT 0.417 LAN 0.583
DELTA PIT 0.435 LAN 0.565
ELO PIT 121 LAN 85

In the last few years we showed Vegas lines with LINEHOME and LINEAWAY text records.  The above consolidates those records and shows the other systems we compare to.   Line records now contain

  • Type of system
  • away team – away teamid
  • away probability – break even probability for away team
  • home team – home teamid
  • home probability – break even probability for home team

There are 5 different systems shown under Lines:

  • VEGAS – These are probabilities derived from end of day betting lines
  • SIM – These probabilities derived from old simulation (deprecated)
  • NSIM – Probabilities derived from new simulation
  • ELO – Probabilities derived from Nate Silver’s ELO system
  • DELTA – Probabilities derived from old DeltaWAA (deprecated)

SIM was the original system used in the second half of last season.  Those simulations had too much error and needed to be fixed.  DeltaWAA is the away team WAA – home team WAA where WAA = W-L, the value that is the foundation of this data model.  Last season we had a lookup table giving a probability for each DeltaWAA, derived from historical data.  That now has been integrated into the simulation called NSIM above, which is the system that beat ELO and Vegas in Part 3 of this series.

In May when we start doing this for live games only VEGAS and NSIM will be shown.  If we can easily acquire ELO data through wget that will be included but not counting on that right now.

The EV section above (here the final ELO record, PIT 121 LAN 85) shows Expected Value on a $100 bet, calculated from the differences between the various break even probabilities and VEGAS, the house break even probability.  EV records show both away and home bets.
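The exact EV formula is not spelled out in this post.  One reading that reproduces the 121 and 85 shown for ELO is the expected total return on a $100 bet: a bet priced at break even probability b pays back 100 / b when it wins, so a system that believes the true probability is p expects 100 * p / b.  The sketch below assumes that reading, and the function name is made up.

```python
def expected_value(system_prob, vegas_break_even, stake=100):
    """Expected total return on a bet of `stake`.

    A return equal to the stake means the bet breaks even.  This formula
    is an inferred reading of the EV records, not a confirmed one.
    """
    return stake * system_prob / vegas_break_even


# ELO vs VEGAS for the game above
print(round(expected_value(0.417, 0.345)))  # PIT (away)
print(round(expected_value(0.583, 0.688)))  # LAN (home)
# NSIM vs VEGAS for the away side
print(round(expected_value(0.400, 0.345)))
```

Under this reading an EV of 100 is a coin flip against the house, and anything under 100 is a bet the system would pass on.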

In the above example, the AWAY bet is above our threshold of 115 and it happens that both ELO and NSIM are betting Pittsburgh this game.  And they both lose LOL.  There is a lot of give and take in handicapping and just because a system loses does not mean it wasn’t a good bet.  Let’s take a look at that.

AWAY PIT 000001000 --> 1
HOME LAN 60220020 --> 12

Above is a line score for this game.  We really got demolished in this game.  You can get info about how ELO works from the source.  The last part of this series showed ELO beat Vegas in our first accuracy test, a result whose accuracy I’m still researching.
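The totals after the arrows are just the inning digits added up.  A tiny sketch, assuming one digit per inning (an inning of 10 or more runs would need a different encoding, which this does not handle):

```python
def runs_from_line_score(line):
    """Sum a line score string that has one digit per inning."""
    return sum(int(ch) for ch in line)


print(runs_from_line_score("000001000"))  # PIT
print(runs_from_line_score("60220020"))   # LAN, no bottom of the 9th
```

The home line has only 8 digits because the home team was ahead and did not bat in the 9th.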

Let’s dive into the simulation data for this game.

TIERDATA PIT LAN 201705080 -3 -4 3 -1 -2 LAN
---- AWAY L -> HOME S ----> -3 ---- AWAY L -> HOME R ----> 4
AWAY LINEUP -1.89 PIT --> -1.73
HOME STARTER 0.48 Alex_Wood_LAN 24.7 --> 0.85
HOME RELIEF 2.54 LAN --> 2.11
---- HOME L -> AWAY S ----> 3 ---- HOME L -> AWAY R ----> -2
HOME LINEUP 3.21 LAN --> 1.82
AWAY STARTER -0.36 Trevor_Williams_PIT 11.7 --> -0.74
AWAY RELIEF 2.86 PIT --> 2.52

The first two numbers in the TIERDATA record ( -3 , -4 ) are what we call tier deltas.  They are integer differences between away lineup and home starter, and between away lineup and home relief.  They range from -6 to +6 and are discrete integers.  A +6 means the best lineup (+3) against the worst starter (-3), or ALS = +3 – (-3) = +6.

The second two numbers are the opposite: home lineup against away starter, and home lineup against away relief.  Pluses in all these numbers mean the lineup team is favored; minuses mean the lineup team is not favored in this category.  Clear as mud?

This gets very confusing and I got confused writing this.  These reports were produced to debug the output.  Because it’s so confusing, bugs could be introduced or things could get assigned backwards.

Since WAA has additive properties the value of a lineup and relief staff is merely the sum of player WAAs.    Each day these, along with single starter values, get calculated for each team at the beginning of the day — much like what we’ll see at the beginning of each day starting this May.  Averages and standard deviations are taken among all 30 MLB teams for that day and tiering is assigned to each group or starter.
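Here is a rough sketch of how a day's values could be turned into tiers measured in half standard deviations from the mean.  It is not the production script.  Using the population standard deviation and capping tiers at +/-3 are both assumptions on my part (the cap is suggested by the +3 best and -3 worst mentioned above), and the lineup values in the usage lines are invented.

```python
from statistics import mean, pstdev


def tiers(values, limit=3.0):
    """Tier for each value: distance from the mean in half standard deviations."""
    avg = mean(values)
    half_sd = pstdev(values) / 2
    if half_sd == 0:
        # Every team identical: nobody is above or below average
        return [0.0 for _ in values]
    return [max(-limit, min(limit, (v - avg) / half_sd)) for v in values]


# Hypothetical lineup WAA sums for five teams on one day
lineups = [3.2, -1.9, 0.4, 1.1, -2.8]
print([round(t, 2) for t in tiers(lineups)])
```

Because the mean and standard deviation are recomputed every day across all teams, the same lineup WAA can land in a different tier from one day to the next.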

The numbers to the right of each arrow above are measured in half standard deviations above or below the mean.  An AWAY lineup-starter tier would be calculated like this:

ALS = Lineup – Starter = -1.73 – 0.85 = -2.58 --> -3 ( rounded down for negative )

The -3, the AWAY lineup vs. HOME starter combo, is the first simulation number in the TIERDATA record used for simulation.  The other three combo numbers (ALR, HLS, and HLR) are calculated similarly, with their numbers shown above.  The hard part is taking snapshots of 100K games and curating those numbers, the foundation of which rests upon the WAA player value generated by this data model.
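The four tier deltas for this game can be reproduced from the tier values printed in the debug output.  In this sketch the rounding rule (nearest integer, halves away from zero) and the clamp to the -6 to +6 range are my assumptions.  They agree with the TIERDATA record for this game but are not confirmed as the rule the simulator uses.

```python
import math


def tier_delta(lineup_tier, pitcher_tier, limit=6):
    """Integer tier delta: lineup tier minus opposing starter or relief tier."""
    raw = lineup_tier - pitcher_tier
    # Round to nearest integer, halves away from zero (assumed rule)
    rounded = int(math.copysign(math.floor(abs(raw) + 0.5), raw))
    return max(-limit, min(limit, rounded))


print("ALS", tier_delta(-1.73, 0.85))   # away lineup vs home starter
print("ALR", tier_delta(-1.73, 2.11))   # away lineup vs home relief
print("HLS", tier_delta(1.82, -0.74))   # home lineup vs away starter
print("HLR", tier_delta(1.82, 2.52))    # home lineup vs away relief
```

Keeping all four deltas from the lineup's point of view means a plus always favors the team batting, which helps avoid assigning things backwards.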

The last number is DeltaWAA which we talked about last season.

DeltaWAA = Away WAA – Home WAA

Here is DeltaWAA for this game.

---- DELTAWAA -----------> -2
DELTAWAA PIT 14 17 LAN 17 14 (-6 -2)

A team WAA is simply W-L, which is -3 for PIT and +3 for LAN.  The numbers in parentheses show this DeltaWAA and a calculated tier, which is another discrete integer between -21 and +21.  DeltaWAA for this game is -6, which favors the HOME team, and that gets assigned to a -2 tier in the simulator.
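The DeltaWAA arithmetic for this game, as a short sketch.  The function names are mine, and the mapping from DeltaWAA to its tier is left out because it is not described here.

```python
def team_waa(wins, losses):
    """Team WAA is simply wins minus losses."""
    return wins - losses


def delta_waa(away_wins, away_losses, home_wins, home_losses):
    """DeltaWAA = Away WAA - Home WAA; negative favors the home team."""
    return team_waa(away_wins, away_losses) - team_waa(home_wins, home_losses)


# PIT (away) was 14-17 and LAN (home) was 17-14 going into this game
print(delta_waa(14, 17, 17, 14))
```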

The simulation takes these numbers and runs a Monte Carlo simulation of 1 million games, calculating wins and losses and converting that into an expected win percentage or probability.
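How the simulator turns tiers into game outcomes is not described in this post, so the following is only a generic illustration of the Monte Carlo idea and not the NSIM code.  It assumes each side has a list of historical run totals for its situation (the lists below are invented), draws one value per side per simulated game, and counts wins.

```python
import random


def simulate(away_runs, home_runs, n_games=100_000, seed=0):
    """Return (away win probability, home win probability).

    away_runs / home_runs: lists of run totals to sample from (hypothetical
    stand-ins for whatever historical snapshots the real model uses).
    """
    rng = random.Random(seed)
    home_wins = 0
    for _ in range(n_games):
        away = rng.choice(away_runs)
        home = rng.choice(home_runs)
        if home == away:
            # Tie after sampling: a coin flip stands in for extra innings
            home_wins += rng.random() < 0.5
        elif home > away:
            home_wins += 1
    p_home = home_wins / n_games
    return 1 - p_home, p_home


# Invented run samples for an underdog away team and a favored home team
p_away, p_home = simulate([1, 2, 3, 4, 5], [2, 3, 4, 5, 7])
print(round(p_away, 3), round(p_home, 3))
```

The more games simulated, the less the estimated probability wobbles from run to run, which is the reason for using a count as large as 1 million.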

ELO, NSIM and Vegas all had the Dodgers favored in this game.  The above breakdown shows why.  Even so, NSIM’s 0.400 and ELO’s 0.417 for Pittsburgh were much higher than Vegas’ 0.345, so the underdog became a betting opportunity.  Irrational exuberance for home games in Los Angeles has been observed but is fodder for another time.

The above shows the ingredients that go into the simulation to produce a handicapping probability that can be compared to Vegas lines.  There are many more variables that this model does not take into account which may influence the outcome.  The purpose of posting all the variables used is so people can view the bet / no bet decisions here critically.  It may be possible to provide a way to adjust inputs on the fly if you disagree with the model’s input.  There are no guarantees when it comes to handicapping future events.

That is all for now.  The above will become a standard feature this season.  Might have to break out of this WordPress format though to properly show this in a more intuitive manner.  Until then ….

2019 Playoff Horse Race Part 2

After looking at that 30 team table in Part 1 of this series it seemed that sorting based upon the output of this data model will be useful.  Last April when we did this 9 of the top 12 teams made the playoffs.   Sorting on simple W-L will depress certain teams like Atlanta and others who have improved even though they had a bad run these last three years.  Here is the same table as in Part 1 but sorting on Total, as measured by value assigned by this data model,  instead of simple win/loss records.

Team Ranking by Total value

TeamID W-L Total Hitters Pitchers Starters Relief
CHN 94 118.9 45.8 73.1 42.7 30.4
HOU 90 106.1 50.5 55.7 29.3 26.3
BOS 102 103.3 65.8 37.5 25.1 12.4
NYA 64 98.9 48.3 50.6 23.7 27.0
WAS 62 86.4 45.0 41.4 29.0 12.5
COL 19 82.9 75.4 7.6 -2.0 9.6
CLE 89 77.0 15.7 61.3 47.0 14.3
LAN 87 73.2 33.1 40.1 18.4 21.7
MIL 23 61.3 32.3 28.9 12.0 16.9
NYN -18 58.9 8.4 50.5 31.5 19.0
SLN 28 56.9 38.9 18.0 3.8 14.2
PHI -52 37.9 14.2 23.6 3.0 20.7
OAK -4 37.8 23.5 14.3 -14.1 28.4
MIN -42 36.5 39.3 -2.8 -3.5 0.8
ATL -25 30.2 25.2 5.0 4.9 0.1
CIN -80 28.5 8.3 20.2 2.2 18.0
SFN -38 27.6 -5.0 32.6 13.4 19.2
SDN -76 21.6 6.5 15.2 3.2 12.0
TBA -10 19.1 1.6 17.5 13.1 4.3
PIT -14 15.1 0.9 14.2 5.0 9.2
SEA 20 10.2 5.9 4.3 -4.3 8.7
ARI 2 9.8 -0.5 10.3 5.8 4.5
ANA -18 9.5 11.9 -2.4 -7.9 5.5
TEX -6 2.1 5.2 -3.1 -6.9 3.8
CHA -72 -8.1 -8.3 0.1 -5.0 5.1
TOR -10 -17.8 -10.4 -7.4 0.2 -7.6
BAL -64 -23.5 -17.6 -5.9 -9.1 3.2
MIA -46 -28.8 -16.9 -11.8 -2.6 -9.3
KCA -48 -29.3 -25.4 -3.9 -0.6 -3.3
DET -57 -47.2 -9.1 -38.1 -28.9 -9.2

This provides a different perspective on the situation and most likely 8 teams in the top half of this table will make the playoffs.  This post could be considered an addendum to Part 1.  If we had the web site built you would be able to drill down into each team to see how the sausage is made.
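Re-sorting the table is a one liner once the rows are in a list.  This sketch uses four rows copied from the table above and also checks that Total is the sum of Hitters and Pitchers to within rounding.  The field names are mine.

```python
rows = [
    {"team": "BOS", "wl": 102, "total": 103.3, "hitters": 65.8, "pitchers": 37.5},
    {"team": "CHN", "wl": 94, "total": 118.9, "hitters": 45.8, "pitchers": 73.1},
    {"team": "HOU", "wl": 90, "total": 106.1, "hitters": 50.5, "pitchers": 55.7},
    {"team": "COL", "wl": 19, "total": 82.9, "hitters": 75.4, "pitchers": 7.6},
]


def ranked(rows, column):
    """Team IDs sorted from best to worst on the given column."""
    ordered = sorted(rows, key=lambda r: r[column], reverse=True)
    return [r["team"] for r in ordered]


def totals_consistent(rows, tolerance=0.15):
    """True if Total matches Hitters + Pitchers, allowing for rounding."""
    return all(
        abs(r["total"] - (r["hitters"] + r["pitchers"])) <= tolerance for r in rows
    )


print(ranked(rows, "wl"))
print(ranked(rows, "total"))
print(totals_consistent(rows))
```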

The above does not take into account the potential of new guys, or guys waiting in AAA or AA, to come up mid season and help propel their teams into the playoffs.  That is fodder for a different kind of data model.  That is all for now.  Until then ….

2019 Playoff Horse Race Part 1

Normally these playoff horse race posts start around the end of August through September using current year data.  The race actually starts now at the beginning of the season and unlike a real horse race like the Kentucky Derby, the race for a playoff spot is more like a 26 mile marathon run by humans.

This post could also be Part 2 of the Prediction Racket but we’re not going to make predictions or projections.  Since we won’t have enough current year data until the beginning of May we have to use career data.  It has been determined that a 3 year snapshot is a good indicator of talent and levels the playing field between young guys and veterans.  A guy like Albert Pujols, who has the highest active career stats in MLB by far, barely ranks in the top 200 in the 2016, 2017, 2018 snapshot.

The below playoff horse race table shows all teams sorted by actual team WAA (W-L) for the last three years which is the only stat the Commissioner of MLB cares about.  The Total column is the sum of Hitters and Pitchers.  Pitchers is the sum of Starters and Relief.   Since new guys start out at WAA=0 there is no way to project what impact they may have on each category.  This data model does not project.  These values represent the past.  There are a lot of numbers in these horse race tables so more explanation below the fold.

Team Ranking

TeamID W-L Total Hitters Pitchers Starters Relief
BOS 102 103.3 65.8 37.5 25.1 12.4
CHN 94 118.9 45.8 73.1 42.7 30.4
HOU 90 106.1 50.5 55.7 29.3 26.3
CLE 89 77.0 15.7 61.3 47.0 14.3
LAN 87 73.2 33.1 40.1 18.4 21.7
NYA 64 98.9 48.3 50.6 23.7 27.0
WAS 62 86.4 45.0 41.4 29.0 12.5
SLN 28 56.9 38.9 18.0 3.8 14.2
MIL 23 61.3 32.3 28.9 12.0 16.9
SEA 20 10.2 5.9 4.3 -4.3 8.7
COL 19 82.9 75.4 7.6 -2.0 9.6
ARI 2 9.8 -0.5 10.3 5.8 4.5
OAK -4 37.8 23.5 14.3 -14.1 28.4
TEX -6 2.1 5.2 -3.1 -6.9 3.8
TOR -10 -17.8 -10.4 -7.4 0.2 -7.6
TBA -10 19.1 1.6 17.5 13.1 4.3
PIT -14 15.1 0.9 14.2 5.0 9.2
NYN -18 58.9 8.4 50.5 31.5 19.0
ANA -18 9.5 11.9 -2.4 -7.9 5.5
ATL -25 30.2 25.2 5.0 4.9 0.1
SFN -38 27.6 -5.0 32.6 13.4 19.2
MIN -42 36.5 39.3 -2.8 -3.5 0.8
MIA -46 -28.8 -16.9 -11.8 -2.6 -9.3
KCA -48 -29.3 -25.4 -3.9 -0.6 -3.3
PHI -52 37.9 14.2 23.6 3.0 20.7
DET -57 -47.2 -9.1 -38.1 -28.9 -9.2
BAL -64 -23.5 -17.6 -5.9 -9.1 3.2
CHA -72 -8.1 -8.3 0.1 -5.0 5.1
SDN -76 21.6 6.5 15.2 3.2 12.0
CIN -80 28.5 8.3 20.2 2.2 18.0

The Cubs (CHN) had three very good consecutive years despite losing early last September putting them near the top of this list.  This and trades for high career value guys like Hamels give the Cubs the highest career Total of all 30 teams according to this data model.  This should be expected but as we saw the last three real games against Texas, high value career guys can tank just as easily as anyone.

Note: This table was made automatically based upon incoming roster data which I did not thoroughly check for accuracy — except for CHN.   If a good player is on DL his numbers won’t be part of that team’s total because he’s not on the roster.   Rosters change daily and processing this is a big part of the current year dataset.

If we were joining the Prediction Racket, the above table would make for a nice template.  Move teams around on a whim and you’ll probably be very close to being right at the end of the season.  Teams like ATL who played well last season but struggled the two before last are in the bottom half but their Total 3 year career split has risen significantly from last season.

The White Sox will probably do better than their position on this table too, and CIN, the worst team in MLB over the past three years, has pretty decent WAA value.  By getting rid of Homer Bailey CIN raised their Starters, Pitching, and Total numbers by over 10.  Teams get better by getting rid of bad players.

Now let’s check the above numbers by taking a look at the CHN roster.  Our roster source is missing two players and I’m not sure who they are.  Since no data can be crunched in April, all the scripts that process this data flow from May to November have to be reworked.  Below are CHN Hitters, Starters, and Relief, the three categories crucial for daily simulation and estimating winning probabilities.

CHN Hitters

Rank WAA Name_TeamID Pos
+019+ 13.23 Anthony_Rizzo_CHN IF
+029+ 11.42 Javier_Baez_CHN IF
+034+ 11.03 Kris_Bryant_CHN IF
XXXXX 3.23 Ben_Zobrist_CHN IF
XXXXX 3.00 Daniel_Descalso_CHN IF
XXXXX 2.44 Kyle_Schwarber_CHN OF
XXXXX 1.01 David_Bote_CHN IF
XXXXX 0.88 Albert_Almora_CHN OF
XXXXX -0.10 Jason_Heyward_CHN OF
XXXXX -0.38 Mark_Zagunis_CHN OF
Total  45.76

The Rank is the same process used for current year.  Top 200 get ranked with a + , bottom 200 with a – , and everyone else gets XXXXX meaning unranked.  Hitters and Pitchers, AL and NL are ranked together and the Cubs have 3 guys in the above list in the top 50 which is very good.
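The rank labels follow a simple pattern that can be sketched as below.  How the bottom 200 are numbered is not spelled out here, so counting up from the worst player (the worst being -001-) is an assumption, as is the function name.

```python
def rank_label(rank, total_players, window=200):
    """Format a rank as +NNN+ (top 200), -NNN- (bottom 200), or XXXXX.

    rank: 1 is the best player; total_players is the size of the pool.
    """
    if rank <= window:
        return f"+{rank:03d}+"
    from_bottom = total_players - rank + 1
    if from_bottom <= window:
        # Assumed convention: -001- is the worst player in the pool
        return f"-{from_bottom:03d}-"
    return "XXXXX"


print(rank_label(19, 1000))    # a top 200 player
print(rank_label(500, 1000))   # middle of the pack, unranked
print(rank_label(1000, 1000))  # last in a pool of 1000
```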

CHN Starters

Rank WAA Name_TeamID Pos
+013+ 16.09 Kyle_Hendricks_CHN SP
+033+ 11.21 Jon_Lester_CHN SP
+108+ 6.30 Cole_Hamels_CHN SP
+120+ 5.86 Jose_Quintana_CHN SP
XXXXX 3.21 Yu_Darvish_CHN SP
Total  42.67

This is a good starting staff on paper but starting pitching can be extremely unreliable.  Hendricks and Lester are pretty solid each year but it’s hard to predict this.  No team can complain about 2 starters in the top 50 and 4 in the top 200.   The PECOTA projection system banked their troll giving the Cubs 79 wins this season by predicting these old guys will tank.   Hendricks hasn’t reached free agency yet.  Whatever.

CHN Relief

Rank WAA Name_TeamID Pos
+076+ 7.54 Steve_Cishek_CHN RP
+092+ 6.93 Mike_Montgomery_CHN RP
+099+ 6.72 Brad_Brach_CHN RP
+117+ 6.03 Pedro_Strop_CHN RP
+183+ 4.18 Carl_Edwards_CHN RP
XXXXX 2.83 Brandon_Kintzler_CHN RP
XXXXX -1.01 Randy_Rosario_CHN RP
XXXXX -2.79 Tyler_Chatwood_CHN RP
Total  30.43

Another 5 guys in the top 200 on relief staff.  If MLB games were played on paper the Cubs would be in extremely good shape.  Unfortunately the Commish makes them play the games so anything can happen.

That is all for now.  Perhaps Part 2 of this 2019 version of the playoff horse race will get posted end of May using current data.  We’ll see.  The next post will cover a topic brought up at the local pub.  Someone suggested that first pitches have a higher probability of being strikes than all the other pitches.   This can be easily proven using event data from retrosheet.org.

This season all betting opportunities will get posted, which should be around 3 or 4 games per day starting in May, plus the usual bi-weekly CHN team status and series analysis and more.  Presentation of game data will be different and hopefully more intuitive.  Until then ….