Featured post

About this site

This site is a public logbook on the development of a baseball data model that measures the value of baseball players and ranks them from best to worst. The model contains the current 30 MLB franchises, their minor league affiliates, and their historical teams. It covers all seasons and all players from 1900 – 2017.

Browse the Table of Contents for more information. We covered the 2017 season extensively. Not much was published here in 2016, even though the Cubs won, and posting was sporadic in the years before that, going back to September 2013.

The goal is for this data model to become an app that lets a user quickly evaluate a player being talked about without knowing anything about baseball. They can then become the smartest person in the room about that player. There will be a handicapping component, but that is a work in progress and hasn’t been proven. We have a solid proof for the WAA measure, something WAR does not have.

Baines Morris and Lee

Recently some idiot tweeted this reply:


That idiot turned out to be me. There is apparently a lot of controversy over this Harold Baines pick, as well as over Morris and Lee Smith. Let’s break these three down according to this data model. When I made the above tweet reply I hadn’t looked up any of these players in this model, which was a mistake.

tl;dr Baines should be in, Smith a big maybe with caveats, and Morris should not.

Let’s start this diatribe off with a blurb from the Daily Herald.

Baines never drew more than 6.1 percent in five elections by the Baseball Writers’ Association of America, far from the 75 percent required. In the key WAR stat compiled by baseballreference.com, Baines’ lifetime total was tied for 545th.

Yes, 545th.

The first error in the above is that WAR does not have additive properties, as has been shown here over and over again. You can certainly add yearly WAR numbers together, but the total has little meaning in most cases and most certainly should not be used for ranking purposes. The WAA value according to this data model has proven additive properties. This is how we simulate games, how we rank both players and teams by season, and how we rank careers.

That said, Baines is ranked 545 in the WAR system.  According to the Google there are 222 MLB players in HOF.  WAR rank as described above would put Baines well out of that range.  One would think future HOF players need to break into the 200s to qualify.  Maybe that number becomes relaxed as years go by.  Perhaps top 250 would be fine for today.

As of the end of 2017, this model has the Baines career ranked #249 from the top, the top being Babe Ruth.  According to the compilation made by this model at the beginning of 2014 Baines was ranked 236 so 13 players exceeded his career number in 4 years.

Note added 12/11: Following the 2014 compilation link above leads to an html output of his career made at the time.  Back then this model ranked the top 300 and bottom 300, everyone else unranked.  Since then this model has moved to top 200 and bottom 200.  The ranking score in those tables was used to directly compare WAR results with WAA results.  The rank of a value system means more than the weighting number used to rank. </end of Note>

Let’s look at Harold Baines’ career.

Year WAA Name_TeamID Pos Rank
1980 -1.3 Harold_Baines_CHA RF -184-
1981 2.4 Harold_Baines_CHA RF +061+
1982 5.2 Harold_Baines_CHA RF +027+
1983 3.3 Harold_Baines_CHA RF-CF +067+
1984 3.1 Harold_Baines_CHA RF +076+
1985 4.6 Harold_Baines_CHA RF +036+
1986 1.9 Harold_Baines_CHA RF +130+
1987 2.0 Harold_Baines_CHA DH +139+
1988 -0.9 Harold_Baines_CHA DH-RF XXXXX
1989 2.9 Harold_Baines_CHA DH-RF +114+
1989 -0.6 Harold_Baines_TEX DH +114+
1990 0.5 Harold_Baines_TEX DH +198+
1990 0.7 Harold_Baines_OAK DH +198+
1991 4.5 Harold_Baines_OAK DH-RF +046+
1992 2.3 Harold_Baines_OAK DH-RF +122+
1993 3.4 Harold_Baines_BAL DH +087+
1994 0.9 Harold_Baines_BAL DH XXXXX
1995 1.1 Harold_Baines_BAL DH XXXXX
1996 3.7 Harold_Baines_CHA DH +092+
1997 0.5 Harold_Baines_CHA DH XXXXX
1997 -0.6 Harold_Baines_BAL DH XXXXX
1998 1.9 Harold_Baines_BAL DH +184+
1999 4.2 Harold_Baines_BAL DH +074+
1999 0.4 Harold_Baines_CLE DH +074+
2000 -1.2 Harold_Baines_BAL DH -172-
2000 -0.7 Harold_Baines_CHA DH -172-
2001 -1.5 Harold_Baines_CHA DH XXXXX
Total 42.7

Edit for clarification: Since WAA has additive properties, it can be calculated for each team a player played for. The sum of a team’s player WAA should translate into their actual win/loss record. If a player played for multiple teams in a season, the script that makes the above shows WAA for each team. The per-team values added together for the year are what is used for ranking purposes (last column).
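As a minimal sketch of that roll-up, the per-team rows can be summed by year to get the season value used for ranking, and summed overall to get the career total. The rows are copied from the Baines table above; the name `season_totals` is made up for illustration and this is not the script the model actually uses.

```python
from collections import defaultdict

# (year, WAA, team) rows copied from the Harold Baines table above.
BAINES_ROWS = [
    (1980, -1.3, "CHA"), (1981, 2.4, "CHA"), (1982, 5.2, "CHA"),
    (1983, 3.3, "CHA"), (1984, 3.1, "CHA"), (1985, 4.6, "CHA"),
    (1986, 1.9, "CHA"), (1987, 2.0, "CHA"), (1988, -0.9, "CHA"),
    (1989, 2.9, "CHA"), (1989, -0.6, "TEX"),
    (1990, 0.5, "TEX"), (1990, 0.7, "OAK"),
    (1991, 4.5, "OAK"), (1992, 2.3, "OAK"), (1993, 3.4, "BAL"),
    (1994, 0.9, "BAL"), (1995, 1.1, "BAL"), (1996, 3.7, "CHA"),
    (1997, 0.5, "CHA"), (1997, -0.6, "BAL"),
    (1998, 1.9, "BAL"),
    (1999, 4.2, "BAL"), (1999, 0.4, "CLE"),
    (2000, -1.2, "BAL"), (2000, -0.7, "CHA"),
    (2001, -1.5, "CHA"),
]


def season_totals(rows):
    """Add up per-team WAA so each year has a single season value."""
    totals = defaultdict(float)
    for year, waa, _team in rows:
        totals[year] += waa
    # Round to one decimal to match the precision shown in the table.
    return {year: round(total, 1) for year, total in sorted(totals.items())}


totals = season_totals(BAINES_ROWS)
career = round(sum(waa for _year, waa, _team in BAINES_ROWS), 1)
print(totals[1989], totals[1999], career)
```

The split seasons collapse into one number per year (1989 and 1999, for example), and the career figure matches the Total row in the table.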

That’s 22 seasons in MLB, 1980 through 2001. Highlighted in blue are years he made the top 100. His career high was 1982. I played in Rotisserie baseball leagues in the late 80s. Baines may have been a top tier player in bidding during those years, another mistake in my tweet. He was a pretty consistent player with many positive years, and he didn’t put up many negative years until the 2000s.

IMHO this is a borderline case based upon rank and longevity. Let’s look at Jack Morris.

Year WAA Name_TeamID Pos Rank
1977 0.3 Jack_Morris_DET PITCH XXXXX
1978 -1.7 Jack_Morris_DET PITCH -142-
1979 3.6 Jack_Morris_DET PITCH +066+
1980 -2.1 Jack_Morris_DET PITCH -092-
1981 2.4 Jack_Morris_DET PITCH +057+
1982 -1.1 Jack_Morris_DET PITCH XXXXX
1983 3.7 Jack_Morris_DET PITCH +054+
1984 1.0 Jack_Morris_DET PITCH XXXXX
1985 3.3 Jack_Morris_DET PITCH +070+
1986 4.5 Jack_Morris_DET PITCH +038+
1987 5.5 Jack_Morris_DET PITCH +027+
1988 -1.1 Jack_Morris_DET PITCH XXXXX
1989 -4.3 Jack_Morris_DET PITCH -018-
1990 -3.5 Jack_Morris_DET PITCH -022-
1991 2.8 Jack_Morris_MIN PITCH +098+
1992 -1.4 Jack_Morris_TOR PITCH -176-
1993 -7.1 Jack_Morris_TOR PITCH -002-
1994 -3.6 Jack_Morris_CLE PITCH -026-
Total 1.2

Highlighted in blue are his top 100 appearances, in red his bottom 100 appearances. Morris was an up and down pitcher and ended his career almost completely average. He wouldn’t be ranked by this model and would be completely out of HOF consideration, other than perhaps for his longevity and personality. His worst year was 1993 and his best year was 1987.

Update: Morris had 7 top 100 years.  If you can count all your best holes on the golf course then perhaps he should be in:-)
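The top 100 and bottom 100 counts can be checked straight from the Rank column. This is a rough sketch that assumes `+NNN+` means a top rank, `-NNN-` means a bottom rank, and `XXXXX` means unranked, which is how the tables here read; the function names are made up for illustration.

```python
# Rank codes copied from the Jack Morris table above, 1977 through 1994.
MORRIS_RANKS = [
    "XXXXX", "-142-", "+066+", "-092-", "+057+", "XXXXX",
    "+054+", "XXXXX", "+070+", "+038+", "+027+", "XXXXX",
    "-018-", "-022-", "+098+", "-176-", "-002-", "-026-",
]


def parse_rank(code):
    """Return ("top", n), ("bottom", n), or None for an unranked season."""
    if code == "XXXXX":
        return None
    side = "top" if code.startswith("+") else "bottom"
    return side, int(code.strip("+-"))


def count_within(codes, side, cutoff=100):
    """Count seasons ranked within `cutoff` places of the given end of the list."""
    parsed = (parse_rank(code) for code in codes)
    return sum(1 for p in parsed if p is not None and p[0] == side and p[1] <= cutoff)


top_100 = count_within(MORRIS_RANKS, "top")
bottom_100 = count_within(MORRIS_RANKS, "bottom")
print(top_100, bottom_100)
```

Counting only the best holes gives the 7 top 100 years; the same pass also counts the bottom 100 years that pull the career total back toward average.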

Relief pitching has always been, and still is, underrated by all baseball stats. Fantasy baseball leagues use saves as one of their metrics for winning, so closers tend to get most of what little glory relievers get. Lee Smith was a closer and, IIRC, he was highly valued in Rotisserie leagues because of his role racking up saves.

A player who helps a fantasy baseball team may not help as much on their real team. Modern baseball stats cater to players who help their fantasy teams. Let’s look at Lee Smith’s career according to this data model.

Year WAA Name_TeamID Pos Rank
1980 0.4 Lee_Smith_CHN PITCH XXXXX
1981 0.2 Lee_Smith_CHN PITCH XXXXX
1982 3.3 Lee_Smith_CHN PITCH +079+
1983 5.5 Lee_Smith_CHN PITCH +019+
1984 0.5 Lee_Smith_CHN PITCH XXXXX
1985 2.0 Lee_Smith_CHN PITCH +131+
1986 1.9 Lee_Smith_CHN PITCH +133+
1987 2.4 Lee_Smith_CHN PITCH +111+
1988 2.0 Lee_Smith_BOS PITCH +133+
1989 0.2 Lee_Smith_BOS PITCH XXXXX
1990 0.7 Lee_Smith_BOS PITCH +054+
1990 2.8 Lee_Smith_SLN PITCH +054+
1991 2.8 Lee_Smith_SLN PITCH +100+
1992 1.0 Lee_Smith_SLN PITCH XXXXX
1993 -0.4 Lee_Smith_SLN PITCH XXXXX
1993 0.8 Lee_Smith_NYA PITCH XXXXX
1994 1.2 Lee_Smith_BAL PITCH +198+
1995 1.1 Lee_Smith_ANA PITCH XXXXX
1996 0.6 Lee_Smith_ANA PITCH XXXXX
1996 0.6 Lee_Smith_CIN PITCH XXXXX
1997 -0.7 Lee_Smith_WAS PITCH XXXXX
Total 28.9

According to this data model Lee Smith ranks #488 as of the end of 2017. That isn’t high enough for the HOF, but maybe it’s high enough with respect to relief pitching. He qualifies for longevity at 18 seasons (1980 through 1997) but had many average, unranked years. The HOF is for players who excel.

In the context of relief pitching Smith might be ranked high enough to qualify.  I don’t know.

Of these three picks described above, Harold Baines should be the least controversial.

An Important Moment in Baseball History Captured in a Panoramic Photo

Every U.S. president from William Howard Taft to John F. Kennedy tossed out a ceremonial first pitch from the ballpark’s stands. This was even true on that first Opening Day in 1911. Indeed, somewhere in our mystery photo, Taft is sitting in the stands, enjoying the ball game. Thankfully, his first pitch was captured in a different photo published in the Evening Star the following day:

Source: Baseball Researcher: An Important Moment in Baseball History Captured in a Panoramic Photo

Top MLB Players for 2016, 2017, 2018

Here is a table dump of the top 25 MLB players according to the accumulated WAA from the last three years.

2018_Rank 3_Year_WAA Name_TeamID Pos
+006+ 23.20 Max_Scherzer_WAS PITCH
+013+ 23.12 Nolan_Arenado_COL 3B
+026+ 21.86 Clayton_Kershaw_LAN PITCH
+017+ 21.40 Corey_Kluber_CLE PITCH
+001+ 20.08 Jacob_deGrom_NYN PITCH
+002+ 19.53 J.D._Martinez_BOS DH-LF-RF
+007+ 19.32 Justin_Verlander_HOU PITCH
+010+ 19.00 Mookie_Betts_BOS RF-CF
+011+ 18.52 Chris_Sale_BOS PITCH
+023+ 17.14 Edwin_Encarnacion_CLE DH-1B
+009+ 16.99 Khris_Davis_OAK DH-LF
+037+ 16.51 Giancarlo_Stanton_NYA DH-RF-LF
+092+ 16.09 Kyle_Hendricks_CHN PITCH
+075+ 15.41 Charlie_Blackmon_COL CF
+044+ 15.16 Mike_Trout_ANA CF-DH
+147+ 14.85 Paul_Goldschmidt_ARI 1B
+027+ 14.68 Bryce_Harper_WAS RF-CF
+076+ 14.24 Nelson_Cruz_SEA DH
+091+ 13.23 Anthony_Rizzo_CHN 1B
+005+ 12.77 Christian_Yelich_MIL LF-RF-CF
+111+ 12.52 Madison_Bumgarner_SFN PITCH
+070+ 12.07 Aaron_Judge_NYA RF-DH
+016+ 12.03 Jose_Ramirez_CLE 3B-2B
+003+ 11.89 Blake_Snell_TBA PITCH
+032+ 11.80 Anthony_Rendon_WAS 3B

Highlighted are the only two current free agents in the top 25 (Bryce Harper and Nelson Cruz). Not much more to say except WAR is probably vastly different. :-) The above is only the last three-year split. Total career rankings will be different. The snapshot for up to and including 2017 can be found here.

Free Agent Class of 2019

The official MLB site, mlb.com, released a list of the free agent class of 2019. Let’s run that list through this data model.

There are many ways to sort this list and calculate career numbers. During the season this model ranks players according to total WAA value for that season. The number of years played can differ among free agents, so it was decided to simply use WAA valuations for the last 3 seasons. This is a pretty good indicator of “what have you done for me lately.”

The table below shows the top 25 free agents sorted by total WAA for their last three-year splits. The first column shows rank for the 2018 season, which just concluded.

Top 2019 Free Agents

2018_Rank 3_Year_WAA Name_TeamID Pos
+027+ 14.68 Bryce_Harper_WAS RF-CF
+076+ 14.24 Nelson_Cruz_SEA DH
XXXXX 11.51 Daniel_Murphy_TOT 2B-1B
XXXXX 9.39 Brian_Dozier_TOT 2B
XXXXX 9.37 Josh_Donaldson_TOT 3B-DH
+158+ 9.03 J.A._Happ_TOT PITCH
XXXXX 8.97 Andrew_Miller_CLE PITCH
+083+ 8.27 Manny_Machado_TOT SS-3B
XXXXX 8.02 Zach_Britton_TOT PITCH
XXXXX 7.67 Adrian_Beltre_TEX 3B-DH
+163+ 7.48 Craig_Kimbrel_BOS PITCH
+148+ 6.91 Carlos_Gonzalez_COL RF
XXXXX 6.87 Mark_Reynolds_WAS 1B-3B
XXXXX 6.72 Brad_Brach_TOT PITCH
+135+ 6.72 Matt_Adams_TOT 1B-LF
-093- 6.57 Ervin_Santana_MIN PITCH
XXXXX 6.49 David_Robertson_NYA PITCH
+051+ 6.38 Charlie_Morton_HOU PITCH
+119+ 6.24 Evan_Gattis_HOU DH
+181+ 5.02 Dallas_Keuchel_HOU PITCH
+049+ 4.93 Hyun-Jin_Ryu_LAN PITCH
+199+ 4.60 Jeurys_Familia_TOT PITCH
+154+ 4.47 DJ_LeMahieu_COL 2B
XXXXX 4.45 Gio_Gonzalez_TOT PITCH
+194+ 4.39 Kelvin_Herrera_TOT PITCH

There is a lot more that goes into the calculus of deciding upon a free agent than simple run production. Daniel Murphy, whom the Cubs acquired, is ranked 3rd above but had a mediocre season according to his 2018 rank. Does he have anything left in the tank going forward? Ditto for Josh Donaldson, Brian Dozier, and Andrew Miller. This data model cannot predict the future. The above simply shows a factual representation of how many wins each player contributed to their teams over the last three seasons.

And as always, past results don’t affect future results. They only show capability. That is all for now. A Giancarlo Stanton article is forthcoming. Also working on historical daily simulations to compare them with historical Vegas lines and historical Nate Silver predictions. Until then ….

2018 World Series Report Part 5

With Boston up 3-1 this could be the last World Series report until 2019.

DATE 10_28_8:15_PM BOS LAN

LINEAWAY BOS [ 0.465 ] < 0.435 > +130 $230
STARTAWAY 2.21(0.556) David_Price_BOS TIER 3
LINEHOME LAN [ 0.556 ] < 0.583 > -140 $171
STARTHOME 5.02(0.640) Clayton_Kershaw_LAN TIER 1

BOS Lineup 1 ==> LAN Starter 1 / Relief 2 == 0.481 BOS 4.45 runs
LAN Lineup 1 ==> BOS Starter 3 / Relief 3 == 0.519 LAN 4.65 runs

Tier Combo 111 89
Home Field 106 92

Kershaw last pitched against Chris Sale in Boston for Game 1, so this matchup is different. The lines are about the same as yesterday, but TC Simulation has the Dodgers favored at a 0.519 break-even probability. This lowers the Expected Value for Boston to 111 on a 100 risk, which is close to their basic Home Field disadvantage Expected Value of 106.
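For anyone checking the arithmetic, here is a small sketch of how the numbers in the report appear to fit together. It assumes standard American moneyline math and that Expected Value means win probability times the total returned on a 100 risk. That is my reading of the figures above, not the simulation’s actual code, and the function names are made up.

```python
def payout_on_100(moneyline):
    """Total returned (stake plus winnings) on a 100 risk at an American line."""
    if moneyline > 0:
        # Underdog: a +130 line wins 130 on a 100 risk.
        return 100 + moneyline
    # Favorite: a -140 line needs 140 risked to win 100.
    return 100 + 100 * 100 / abs(moneyline)


def break_even_probability(moneyline):
    """Win probability at which a bet at this line has an Expected Value of 100."""
    return 100 / payout_on_100(moneyline)


def expected_value(win_probability, moneyline):
    """Average amount returned on a 100 risk at the given win probability."""
    return win_probability * payout_on_100(moneyline)


# Lines and Tier Combo probabilities from the report above.
print(round(payout_on_100(130)), round(break_even_probability(130), 3))
print(round(payout_on_100(-140)), round(break_even_probability(-140), 3))
print(round(expected_value(0.481, 130)), round(expected_value(0.519, -140)))
```

Under those assumptions the +130 and -140 lines give the $230 and $171 payouts and the 0.435 and 0.583 break-even probabilities, and the Tier Combo probabilities of 0.481 and 0.519 give the 111 and 89 Expected Values.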

This could be the last game of the season.  If not, part 6 will be forthcoming.

Update:  It looks like Nate Silver’s model agrees.  Here’s a snapshot in case that link gets broken.


Update2: I’m an idiot again! It might be dyslexia, but I read the above table as the opposite of what it says. Nate had BOS at 52%. TC Simulation had the Dodgers at 52%. Nate’s model would have generated an Expected Value for Boston of 120 on a 100 bet, making it a betting opportunity. Since we’re from the future, we know it would have been a successful betting opportunity. Home Field Expected Value would still be 106, however, and how that factors into all of this is still a work in progress.

More historical analysis comparing Vegas lines, TC simulation, and the above is coming soon in the offseason (like now). This will start with 2011 – 2017. Current year event files are needed for this analysis, and retrosheet.org usually releases them sometime in mid-December. </ end of update>