Featured post

About this site

This site is a public logbook on the development of a baseball data model that measures player value and ranks players from best to worst. The model contains the current 30 MLB franchises, their minor league affiliates, and their historical teams. It covers all seasons and all players from 1900 – 2017.

Browse the Table of Contents for more information. We covered the 2017 season extensively. Not much was published here in 2016, even though the Cubs won, and posting was sporadic in the years before that, going back to our start in September 2013.

The goal is for this data model to become an app that lets a user quickly evaluate a player being talked about without knowing anything about baseball. They can then become the smartest person in the room about that player. There will be a handicapping component, but that is a work in progress and hasn’t been proven. We have a solid proof for the WAA measure, something WAR does not have.

Cubs Rockies Matchup

We usually do these on the first game of a series. I was going to skip this series since we don’t have enough data to properly rank players. The lines today and yesterday are kind of interesting, however, which is the reason for this post.

Here’s what the Ouija Board says about today.

DATE 04_21 8:10_PM CHN COL
LINEAWAY CHN [ 0.556 ] < 0.574 >
LINEHOME COL [ 0.488 ] < 0.444 >
CHN 9 8 COL 11 10

Both teams have approximately the same record, so the DeltaWAA is 0, making this an even steven game based on win/loss records. Darvish has a 2015-2017 career split of +4, as shown here earlier. Here is Anderson’s career.

Year WAA Name_TeamID Pos Rank
2016 1.8 Tyler_Anderson_COL PITCH +187+
2017 -0.9 Tyler_Anderson_COL PITCH XXXXX
Total 0.9

OK.  Based on career of starters CHN has an advantage.  Here is a rundown of COL BAT based on that three year career split.

Rank WAA Name_TeamID Pos
+002+ 25.1 Nolan_Arenado_COL IF
+020+ 12.2 Charlie_Blackmon_COL OF
+039+ 10.0 Carlos_Gonzalez_COL OF
+082+ 6.6 Trevor_Story_COL IF
+188+ 3.3 DJ_LeMahieu_COL IF
+XXXXX+ 2.8 Gerardo_Parra_COL OF
+XXXXX+ 2.5 Ian_Desmond_COL IF
+XXXXX+ 2.4 Pat_Valaika_COL IF
+XXXXX+ -0.3 Ryan_McMahon_COL IF
+XXXXX+ -0.4 Mike_Tauchman_COL OF
+XXXXX+ -1.3 Tony_Wolters_COL C
+XXXXX+ -3.4 Chris_Iannetta_COL C
Total 59.5

The above is based on roughly the opening day roster, so it should be close. COL has around +60 and CHN is around +50. Both teams have about the same relief pitching value as well.

Currently the Cubs line requires believing the Cubs have a greater than 57.4% probability of winning; for COL it is 44.4%. Here is the line yesterday on 4/20.
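The two break-even probabilities quoted for a game add up to more than 100%. A minimal sketch of that arithmetic follows. The function name is made up for illustration, and reading the excess as the bookmaker's cut is the usual interpretation of betting lines, not something this data model defines.

```python
def overround(p_away: float, p_home: float) -> float:
    """Amount by which the two break-even probabilities exceed 1.0."""
    return p_away + p_home - 1.0


# Market numbers from the 04_21 line: CHN needs 0.574, COL needs 0.444.
excess = overround(0.574, 0.444)
print(f"excess over 100%: {excess:.3f}")
```

A bettor on either side has to clear that side's break-even number, so the excess is what the two sides together give up to the house.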

DATE 04_20 8:40_PM CHN COL
LINEAWAY CHN [ 0.535 ] < 0.512 >
LINEHOME COL [ 0.512 ] < 0.512 >
CHN 8 8 COL 11 9

This is as even steven as betting lines get. Each team is equal with a probability of 51.2%. We aren’t ready to look at current year data, but with the three year split Hendricks (+13.1) is far superior to Gray (-1.1). Eyeballing this, Hendricks would be considered Tier 1, Gray perhaps Tier 4. With the hitting and relief about the same, CHN should have had a clear advantage yesterday, yet the market is giving them that advantage today instead.

The market on the opening game of this series didn’t know what was going to happen. Hendricks pitching in the thin air of Colorado would have more issues than a normal pitcher because the balls don’t break as much. He did get shelled for 3 runs in the first inning but recovered after that.

The DeltaWAA expected probability made yesterday’s game an even steven match.  Hendricks would have moved the needle in favor of the Cubs as well as the lineup/starter/relief combos.  Would it have moved it to 58.2% in favor of the Cubs?  Can’t tell.  We can only look at career data for April games and don’t have that modeled.   Next month we’ll have better numbers to analyze.

That is all for now. The next part of the career series is coming, along with a multi-part series explaining OPS and why it’s such a deceptive stat. Until then….


Cubs Cardinals Matchup

45F is way too cold to play baseball but apparently they’ll try and get this game over with today after two postponed games.  We’re almost at the point in the season where we can do team status but let’s skip that for now.  We’ll focus on current SLN players the Cubs will face for the rest of the season.

First let’s look at what the Ouija Board says:

DATE 04_19 2:20_PM SLN CHN
LINEAWAY SLN [ 0.505 ] < 0.481 >
LINEHOME CHN [ 0.519 ] < 0.541 >
SLN 10 7 CHN 7 8

The Cards are 10-7, Cubs 7-8, and the DeltaWAA is +4 in favor of the Cardinals.  Per our lookup table this gives SLN a  54.2% advantage today with no other information.  The Cubs have one of their best starters pitching and the Cardinals have a relatively new guy who pitched 60 innings last season.  The above shows the betting market may have soured on the Cubs a bit as Lester usually commands 60% + for home games.  We’ll have better trajectories on this as the season progresses this year.

A DeltaWAA expected probability of 0.542 almost makes SLN a betting opportunity today with the line at 0.481. We need a margin of at least 0.07 and this one is 0.061, so it doesn’t quite cut it. Plus, we don’t have a firm grasp of how these players are performing this season. That information is needed to push the DeltaWAA needle in either direction. Player rankings can start around May 7.
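The check described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, team WAA is taken to be wins minus losses (inferred from the records and DeltaWAA values quoted in these posts), and the 0.542 figure comes from the lookup table, which is not reproduced here.

```python
MIN_MARGIN = 0.07  # minimum edge over the line quoted in the post


def delta_waa(wins_a: int, losses_a: int, wins_b: int, losses_b: int) -> int:
    """Difference in (wins - losses) between team A and team B (assumed definition)."""
    return (wins_a - losses_a) - (wins_b - losses_b)


def is_betting_opportunity(model_prob: float, line_prob: float,
                           min_margin: float = MIN_MARGIN) -> bool:
    """True when the model probability beats the line by at least min_margin."""
    return model_prob - line_prob >= min_margin


# SLN is 10-7 and CHN is 7-8.
print(delta_waa(10, 7, 7, 8))
# 0.542 is the looked-up probability for that DeltaWAA; 0.481 is the SLN line.
print(is_betting_opportunity(0.542, 0.481))
```

With these inputs the edge is 0.061, so the check comes back negative, matching the call made in the post.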

Let’s drill down into the Cardinals’ opening day roster and check out the last 3 year career splits.  According to the table we posted a few days ago, SLN is middle of the pack of 30 teams.  Here are their total numbers.

TeamID Hitters Pitchers Starters Relief Total W-L
SLN 15.2 4.1 8.7 -4.6 19.3 0

The Cardinals are infamous for rotating new guys from their farm system and then dominating the league.  They are perennial contenders because of this.  See this post written here during the 2013 playoffs.

Career numbers don’t tell you the potential of the new guys. New guys start at zero. They add nothing to the Total and subtract nothing from the Total. Everyone starts at zero. These career numbers show the strength teams have in veterans and newly established players.

That said, let’s get to it and look at the Cards’ opening day roster.

April 2018 SLN Starters

Rank WAA Name_TeamID Pos
+021+ 12.2 Carlos_Martinez_SLN SP
+XXXXX+ 0.0 Michael_Wacha_SLN SP
+XXXXX+ -0.7 Luke_Weaver_SLN SP
+XXXXX+ -2.8 Adam_Wainwright_SLN SP
Total 8.7

Again, these are 2015-2017 career splits *not* total career value. What have you done for me lately. I keep panicking thinking there is a bug in the code scanning these results but they are correct. Wainwright has a very high positive total career value. He just had a bad run these last 3 years, as did Wacha. Michael Wacha was MVP of the NLCS in 2013, when we first started doing all of this.

April 2018 SLN Relievers

Rank WAA Name_TeamID Pos
+XXXXX+ 3.0 Tyler_Lyons_SLN RP
+XXXXX+ 1.7 Sam_Tuivailala_SLN RP
+XXXXX+ 1.6 Matthew_Bowman_SLN RP
+XXXXX+ 0.4 Ryan_Sherriff_SLN RP
+XXXXX+ -0.1 Dominic_Leone_SLN RP
+XXXXX+ -3.6 Mike_Mayers_SLN RP
+XXXXX+ -7.5 Bud_Norris_SLN RP
Total -4.5

Not familiar with the above players. Lyons is a home-grown SLN player who started in 2013. The nice thing about having high negative guys like Norris and Mayers is that if they don’t pitch well they get replaced, raising total relief value; if they pitch well, that also raises total relief value. In May we can do these roster tables using current year data.

Edit:  Greg Holland is pitching for SLN and not listed above.  The rosters used were from the beginning of the season.  Holland is +1.1 for 2015-2017 (missed 2016) and came to SLN from Colorado.  Next time we do this matchup we’ll have current rosters.  Probably missing a starter as well.

April 2018 SLN Hitters

Rank WAA Name_TeamID Pos
+064+ 7.6 Matt_Carpenter_SLN IF
+110+ 5.5 Tommy_Pham_SLN OF
+123+ 5.0 Marcell_Ozuna_SLN OF
+XXXXX+ 2.4 Dexter_Fowler_SLN OF
+XXXXX+ 2.2 Jose_Martinez_SLN IF
+XXXXX+ 1.6 Paul_DeJong_SLN IF
+XXXXX+ -0.2 Harrison_Bader_SLN OF
+XXXXX+ -2.4 Kolten_Wong_SLN IF
+XXXXX+ -2.7 Yadier_Molina_SLN C
+XXXXX+ -4.0 Greg_Garcia_SLN IF
Total 15.0

And that about sums up  the team the Cardinals entered this season with.  They are currently playing very well — as they usually do every season.  Hopefully the Cubs can keep up with them — as they have the last three seasons.

That is all for now.  Total MLB career rankings coming soon and first Cubs team status next week — without player rankings.  Hopefully we get some Spring weather here!!!!!  Until then….

Career Rankings Part 4

Today we’ll go back 2 years to the beginning of the 2015 season. At the beginning of the 2016 season, the season in which the Cubs won a World Series, they were ranked middle of the pack of 30 teams.

The Cubs were last of 30 teams based upon career data at the start of the 2015 season. They were probably last, or close to last, in the 5 seasons before that too, but that’s all water under the bridge now. How did the Cubs go from last in valuation at the start of a season to making the playoffs that season? First let’s look at a truncated table showing the top 5 and bottom 5 teams in April of 2015.

April 2015 Team Career Valuation

TeamID Hitters Pitchers Starters Relief Total W-L
DET 51.2 32.2 27.5 4.8 83.5 0
WAS 10.5 67.2 47.1 20.1 77.7 0
ANA 40.7 28.3 12.0 16.4 69.0 0
SEA 22.5 44.7 30.1 14.6 67.2 0
SLN 36.8 26.4 20.2 6.2 63.2 0

TBA -0.4 5.5 -2.3 7.9 5.1 0
PHI -6.4 11.1 2.0 9.1 4.7 0
MIN 16.8 -18.7 -23.4 4.7 -1.9 0
HOU -5.0 -2.2 -13.8 11.6 -7.2 0
CHN -12.6 -9.0 -15.0 6.0 -21.6 0

The above is the sum of career value from 2012-2014 for players on each team’s opening day roster for 2015. Notice how Detroit is #1 in 2015 but this season, right now, they’re at the bottom like the Cubs were in 2015. HOU is also at the bottom in 2015 and now at the top. Both of these bottom two teams win a World Series within the next 3 years! This is quite a switcheroo showing fortunes can change, good and bad, for a team in only a few years.

Let’s see who the Cubs had pitching in April that year.

April 2015 CHN Starters

Rank WAA Name_TeamID Pos
+188+ 3.1 Jon_Lester_CHN SP
XXXXX 2.3 Kyle_Hendricks_CHN SP
XXXXX -1.3 Jason_Hammel_CHN SP
XXXXX -2.3 Travis_Wood_CHN SP
XXXXX -3.4 Jake_Arrieta_CHN SP
XXXXX -13.5 Edwin_Jackson_CHN SP
Total -15.1

Lester was their big off season acquisition and WAA=3.1 was his 2012-2014 split. The Cubs starting rotation was saddled with Edwin Jackson, who was one of Theo’s (we’ll spare Jed on that one :) first acquisitions as a Cub. Just cutting Jackson greatly increases their starter value. Joe Maddon makes Jackson a reliever and then shortly after they cut ties with him.

April 2015 CHN Relievers

Rank WAA Name_TeamID Pos
+170+ 3.5 Pedro_Strop_CHN RP
XXXXX 2.3 Neil_Ramirez_CHN RP
XXXXX 1.5 Jason_Motte_CHN RP
XXXXX 0.8 Hector_Rondon_CHN RP
XXXXX -0.6 Brian_Schlitter_CHN RP
XXXXX -1.5 Phil_Coke_CHN RP
Total 6.0

Pedro Strop had the highest 3 year split of any Cub starting the season in 2015.

EDIT:  Anthony Rizzo (below) has the highest career 3 year split at the start of 2015.

April 2015 CHN Hitters

Rank WAA Name_TeamID Pos
+120+ 4.9 Anthony_Rizzo_CHN 1B
XXXXX 1.1 Jorge_Soler_CHN RF
XXXXX 0.7 Miguel_Montero_CHN CR
XXXXX 0.5 Dexter_Fowler_CHN CF
XXXXX 0.0 Mike_Olt_CHN BAT
XXXXX -0.3 Matt_Szczur_CHN LF
XXXXX -0.3 Arismendy_Alcantara_CHN BAT
XXXXX -1.1 David_Ross_CHN CR
XXXXX -2.4 Tommy_La_Stella_CHN 2B-3B
XXXXX -3.2 Chris_Coghlan_CHN LF-RF-2B
XXXXX -3.3 Jonathan_Herrera_CHN 2B-3B
XXXXX -4.1 Welington_Castillo_CHN BAT
XXXXX -5.1 Starlin_Castro_CHN SS-2B
Total -12.6

Those are all three year splits and may not be reflective of their overall careers. These last three tables verify the sums in the total table and show how it was tabulated. You should not read this as saying Mike Olt is better than David Ross. This model measures offensive production, and catchers are the most important defensive fielding asset on the field. They’re involved in every play. This model is limited to showing value derived from generating or not generating runs. The defensive value of a catcher is outside the scope of this data model. For more thoughts on defensive related positions see our All Star picks article last July.
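For anyone who wants to check the tabulation, here is a small sketch that re-adds the three April 2015 CHN tables. The variable and function names are made up for illustration. The per-player values printed in these tables are rounded, so a sum can land 0.1 away from the team table, presumably because that table was built from unrounded values.

```python
# Three year (2012-2014) WAA values copied from the April 2015 CHN tables.
starters = [3.1, 2.3, -1.3, -2.3, -3.4, -13.5]
relievers = [3.5, 2.3, 1.5, 0.8, -0.6, -1.5]
hitters = [4.9, 1.1, 0.7, 0.5, 0.0, -0.3, -0.3,
           -1.1, -2.4, -3.2, -3.3, -4.1, -5.1]


def total(values):
    """Sum a column and round to one decimal, like the Total rows."""
    return round(sum(values), 1)


pitchers = round(total(starters) + total(relievers), 1)
team = round(pitchers + total(hitters), 1)

print("Starters", total(starters))
print("Relief", total(relievers))
print("Hitters", total(hitters))
print("Pitchers", pitchers)
print("Total", team)
```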

Values that hover around 0 are not that meaningful in the context of evaluating a player. They show he hasn’t done much above or below average. Sometimes an average player is useful for other purposes a manager may need, like pinch running and being a fast guy in the outfield who can run down errant fly balls late in a close game. Mike Olt should be hitting well above average, as 3B is usually a productive position on most playoff contending teams.

Kris Bryant comes up from Iowa later and replaces Mike Olt. Jake Arrieta starts his Cy Young award winning performance mid June, and Maddon gets everyone to click.

This kind of report will be available for any team any year.  You’ll be able to look up to see how the Cubs or Detroit ranked based upon 3 year career splits at the start of 1935 or 1945 or whatever year.  Once we compile all the years I’ll run some numbers to see how well these rankings predict the end of season results.  As always, past results do not affect future results, they only show capability.  It is important however to have an accurate evaluation of past results.  Much of Sabermetrics is far from accurate.

Enough of this table.  In Part 5 we’ll look at top MLB careers from 1900 – present.  We have all 15,000+ players ranked from top to bottom but we only assign rank to the top 1000.   Until then….

Career Rankings Part 3

Since we don’t have current year data to crunch until next month, career numbers are all we have to look at. In Part 2 of this series we ranked teams based upon opening day rosters. Each career only included the last three years, which was the sum of WAA value for seasons 2015-2017. In Part 3 we will look at opening rosters of the 2017 season and use seasons 2014-2016 for each player’s valuation. Players get categorized as relief, starter, and hitter and everything adds to give a team total.

We have to estimate historical rosters using our daily snapshots taken from retrosheet.org event data. The code was already written to estimate the changing team relief squads each day, each year for our lineup/starter/relief simulations. We take a snapshot on April 12 and assume every player has made an appearance. Then we separate them by role and team, add them up, and sort.
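The separate, add up, and sort step can be sketched as below. This is not the site's actual code: the row layout and the `team_totals` name are assumptions, and the snapshot holds only a handful of rows copied from tables in these posts rather than a real April 12 roster.

```python
from collections import defaultdict

# (name, team, role, three year WAA); a partial, illustrative snapshot.
snapshot = [
    ("Jon_Lester", "CHN", "SP", 10.8),
    ("Kyle_Hendricks", "CHN", "SP", 13.0),
    ("Pedro_Strop", "CHN", "RP", 5.1),
    ("Jameson_Taillon", "PIT", "SP", 1.6),
    ("Ivan_Nova", "PIT", "SP", -1.4),
    ("Felipe_Rivero", "PIT", "RP", 6.1),
]


def team_totals(rows):
    """Group rows by team and role, sum WAA, and sort teams by total, best first."""
    by_team = defaultdict(lambda: defaultdict(float))
    for _name, team, role, waa in rows:
        by_team[team][role] += waa
    table = []
    for team, roles in by_team.items():
        rounded = {role: round(value, 1) for role, value in roles.items()}
        table.append((team, rounded, round(sum(roles.values()), 1)))
    return sorted(table, key=lambda row: row[2], reverse=True)


for team, roles, total in team_totals(snapshot):
    print(team, roles, total)
```

Hitters would be a third role handled the same way, and the sorted list is what becomes the team ranking table.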

Since we are from the future relative to when this table could have been made, we can check how its predictions turned out.

Note: Unfortunately there are a lot of numbers in this table and ironically this data model is about consolidating baseball statistics. We’ll walk through it after the fold. There is no other way to present this.

TeamID Hitters Pitchers Starters Relief Total W-L
CHN 27.9 89.8 57.6 32.2 117.7 0
TOR 61.2 35.3 23.9 11.4 96.5 0
LAN 25.9 49.1 39.7 9.4 75.0 0
CLE 31.3 43.2 18.4 24.8 74.5 0
NYN 36.7 34.1 28.8 5.2 70.7 0
SFN 19.1 51.1 26.5 24.5 70.2 0
WAS 35.4 33.9 21.9 12.0 69.3 0
BOS 35.5 30.2 25.3 4.9 65.7 0
HOU 23.0 31.9 11.4 20.5 54.8 0
SLN 9.2 35.5 22.5 13.0 44.7 0
TEX 13.3 30.1 10.9 19.2 43.4 0
BAL 26.3 14.2 -14.6 28.8 40.5 0
NYA 5.7 33.5 7.2 26.3 39.2 0
COL 34.9 -3.3 -0.2 -3.0 31.6 0
DET 23.4 7.9 12.1 -4.2 31.3 0
SEA 14.6 16.5 16.6 -0.2 31.1 0
CHA 11.4 19.0 4.5 14.5 30.4 0
OAK 14.4 15.6 1.7 13.9 30.0 0
TBA 0.6 20.0 11.2 8.9 20.7 0
KCA 2.0 17.7 12.0 5.7 19.7 0
PIT 8.0 7.9 4.3 3.6 15.9 0
MIL 6.8 4.1 -5.0 9.1 10.9 0
ARI 17.2 -6.6 6.4 -13.0 10.6 0
MIA -7.5 6.5 -2.3 8.8 -0.9 0
ANA 9.6 -12.1 -2.1 -9.9 -2.5 0
MIN 6.3 -11.2 -6.5 -4.8 -5.0 0
ATL -4.7 -0.5 12.3 -12.8 -5.2 0
CIN -2.3 -4.0 -0.8 -3.2 -6.3 0
PHI -14.6 5.3 -4.7 10.0 -9.3 0
SDN -5.8 -21.5 -21.2 -0.2 -27.3 0

The colored teamids are teams that will make the playoffs this year.  Since we are from the future we know HOU wins the World Series beating LAN with CHN and NYA as DS winners.  NYA and COL are middle of the pack so the top half of this chart picked 8/10 teams who made the playoffs with MIN and  ARI as outliers.
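The 8/10 count can be reproduced with a short sketch. The `ranked` list is the table order above; the `playoff` set is written out by hand from the 2017 postseason field and is an input to this example, not something the model produced.

```python
# Team order from the 2017 opening day career valuation table, best to worst.
ranked = [
    "CHN", "TOR", "LAN", "CLE", "NYN", "SFN", "WAS", "BOS", "HOU", "SLN",
    "TEX", "BAL", "NYA", "COL", "DET", "SEA", "CHA", "OAK", "TBA", "KCA",
    "PIT", "MIL", "ARI", "MIA", "ANA", "MIN", "ATL", "CIN", "PHI", "SDN",
]

# The ten 2017 playoff teams, entered by hand.
playoff = {"CHN", "LAN", "WAS", "ARI", "COL", "HOU", "CLE", "BOS", "NYA", "MIN"}

top_half = ranked[: len(ranked) // 2]
hits = [team for team in top_half if team in playoff]
outliers = sorted(playoff - set(top_half))

print(f"{len(hits)}/{len(playoff)} playoff teams in the top half")
print("outliers:", outliers)
```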

This table is sorted by Total of all career value between 2014-2016 for each team.  The blue bold highlight numbers are the leader in each category.  The Cubs clearly dominate in all categories except hitting.  Hitting will be a big problem for them all the way up to All Star Break.  We know this because we are from the future and we will write about it every day.

Not going to get into what this chart might say or might not say.  SFN turned out to be one of the worst teams in baseball yet they have high value.   The Cubs had a very good above average run between 2014-2016 and they kept the good guys and acquired even more good guys.  Does that mean they had the best team in April?  Apparently not!

In this part we’ll drill down into the Cubs and check their numbers.  This will be streamlined in subsequent parts as we go farther back in time.  First let’s look at CHN starters and relievers.

2017 CHN Starters

Rank WAA Name_TeamID Pos
+004+ 20.6 Jake_Arrieta_CHN SP
+008+ 17.2 Jon_Lester_CHN SP
+034+ 11.1 Kyle_Hendricks_CHN SP
+055+ 8.8 John_Lackey_CHN SP
XXXXX -0.0 Brett_Anderson_CHN SP
Total 57.6

The above Total number is what you see in the Starter column for CHN in the team ranking table above.  The Rank is based upon 2014-2016 career value.  Jake Arrieta had a good run these last three years and is ranked 4th in MLB of all 30 teams, both pitchers and batters ranked together.

2017 CHN Relievers

Rank WAA Name_TeamID Pos
++028++ 11.7 Wade_Davis_CHN RP
++098++ 6.3 Hector_Rondon_CHN RP
++118++ 5.1 Pedro_Strop_CHN RP
++154++ 4.2 Koji_Uehara_CHN RP
XXXXX 2.3 Mike_Montgomery_CHN RP
XXXXX 2.2 Justin_Grimm_CHN RP
XXXXX 0.3 Carl_Edwards_CHN RP
Total 32.1

Wade Davis was the big acquisition in the off season that year.  He turns out to be very useful this season and this relief squad kept the Cubs in contention at All Star Break.

2017 CHN Hitters

Rank WAA Name_TeamID Pos
++018++ 13.9 Anthony_Rizzo_CHN 1B-2B
++027++ 11.8 Kris_Bryant_CHN 3B
++140++ 4.6 Ben_Zobrist_CHN 2B-LF-RF
++172++ 3.6 Kyle_Schwarber_CHN LF
++192++ 3.0 Addison_Russell_CHN SS
XXXXX 0.5 Willson_Contreras_CHN CR
XXXXX 0.2 Matt_Szczur_CHN LF-CF-RF
XXXXX 0.2 Albert_Almora_CHN CF
XXXXX -0.2 Miguel_Montero_CHN CR
XXXXX -0.4 Javier_Baez_CHN 2B-SS
XXXXX -2.3 Jason_Heyward_CHN RF-CF
XXXXX -3.6 Tommy_La_Stella_CHN 2B-3B
Total 27.9

Hitting looks very good but it becomes a problem in the first half of the season. For those reading from the present, below is a post made during All Star Break from this season. All teams throughout the season move and acquire players. Career value may not make any sense in the context of April baseball games.

No matter how good a player was the last three years, the MLB Baseball Commissioner requires that he play and prove himself again.  Many players do it over and over for a very long time, many don’t.  In the next part we’ll quickly run through a bunch of opening day roster career value years and then we’ll bring guys like Babe Ruth and Cy Young into the mix and see how well they scored here.   Until then….

Note: I had to double check Heyward’s number above. He has a well above average career. His 2016 value dragged him underwater on the three year split (2014-2016).

Cubs Pirates Matchup and Opening Day

Apparently the game has been postponed until tomorrow.  Tomorrow will be around 50F but they should play the double header Wednesday when it will be in the 60s.  Whatever.  We don’t have enough current data to make any analysis.  According to the team career chart the Cubs are near the top and the Pirates are near the bottom.   If the Pirates have a lot of young guys ready to break out career totals won’t tell the story.  Let’s see what the Ouija board has to say.

DATE 04_09 3:20_PM PIT CHN
LINEAWAY PIT [ 0.397 ] < 0.426 >
LINEHOME CHN [ 0.618 ] < 0.600 >
PIT 6 2 CHN 4 4

The code is pulling in current year wins and losses but there has not been enough playing time for that to make a difference. The Cubs are favored around 3/2. You’ll need to think they have a better than 60% chance of winning today to bet the Cubs, 42.6% for the Pirates. Sheer home field advantage is 56%.
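The step from "favored around 3/2" to 60% is just a ratio. Here is a minimal sketch with a hypothetical function name, assuming 3/2 means risking 3 units to win 2 on the favorite.

```python
from fractions import Fraction


def break_even_probability(risk: int, win: int) -> Fraction:
    """Win probability at which risking `risk` to win `win` breaks even."""
    return Fraction(risk, risk + win)


cubs = break_even_probability(3, 2)
print(cubs, float(cubs))
```

At that price a bettor breaks even only if the favorite wins 3 times out of every 5, which is where the 60% threshold comes from.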

Without any current year data let’s look at career data of starters and relievers the Cubs will face these next few days. The career data below is a sum based upon the 2015-2017 seasons.

Rank WAA Name_TeamID Pos
+XXXXX+ 1.60 Jameson_Taillon_PIT SP
+XXXXX+ -0.00 Chad_Kuhl_PIT SP
+XXXXX+ -0.10 Trevor_Williams_PIT SP
+XXXXX+ -0.80 Steven_Brault_PIT SP
+XXXXX+ -1.40 Ivan_Nova_PIT SP
Total -0.70

Nova may be their worst starter these last three years but he is also their most experienced. Pittsburgh will need him to perform if they are to have any chance this season.

Rank WAA Name_TeamID Pos
+090+ 6.20 George_Kontos_PIT RP
+092+ 6.10 Felipe_Rivero_PIT RP
+XXXXX+ 0.40 Edgar_Santana_PIT RP
+XXXXX+ 0.30 Dovydas_Neverauskas_PIT RP
+XXXXX+ -1.20 Josh_Smoker_PIT RP
+XXXXX+ -2.70 Michael_Feliz_PIT RP
+XXXXX+ -4.90 Tyler_Glasnow_PIT RP
Total 4.20

Relief not bad.  If some of their bottom guys get washed out of the league their relief staff will improve.  If they don’t wash out and pitch well this season their RP will also improve.

Rank WAA Name_TeamID Pos
+019+ 13.00 Kyle_Hendricks_CHN SP
+028+ 10.80 Jon_Lester_CHN SP
+050+ 8.50 Jose_Quintana_CHN SP
+153+ 4.00 Yu_Darvish_CHN SP
+XXXXX+ 0.20 Tyler_Chatwood_CHN SP
Total 36.50

Very good starting staff.  Darvish was out of service in 2015.

Rank WAA Name_TeamID Pos
+117+ 5.20 Mike_Montgomery_CHN RP
+121+ 5.10 Pedro_Strop_CHN RP
+129+ 4.90 Steve_Cishek_CHN RP
+147+ 4.20 Brandon_Morrow_CHN RP
+XXXXX+ 2.60 Justin_Wilson_CHN RP
+XXXXX+ 2.40 Carl_Edwards_CHN RP
+XXXXX+ 2.00 Brian_Duensing_CHN RP
+XXXXX+ -7.40 Eddie_Butler_CHN RP
Total 19.00

As always Maddon likes a decent relief staff. Relievers carried the Cubs the first half of last season. Eddie Butler had a rough 2015 and 2016 but pitched a little above average last season and pitched well for the Iowa Cubs. If he has figured out how to pitch in MLB, the Cubs RP is even better.

That is all for now.   In the upcoming days we’re going to crunch more career snapshots from past seasons to hopefully provide perspective as to how predictive they are.  Every year a new crop of superstars emerges from the minors with a career WAA=0.  Everyone starts out at zero.  Until then….