Featured post

About this site

This site is a public logbook on the development of a baseball data model that measures baseball player value and ranks players from best to worst.  The model contains the current 30 MLB franchises, their minor league affiliates, and their historical teams.  It covers all seasons and all players from 1900 to 2017.

Browse the Table of Contents for more information.  We covered the 2017 season extensively.  Not much was published here in 2016, even though the Cubs won, and posting was sporadic in the years before that, going back to September 2013.

The goal of this data model is to become an app that lets a user quickly evaluate a player being talked about, without knowing anything about baseball.  They can then become the smartest person in the room about that player.  There will be a handicapping component, but that is a work in progress and hasn’t been proven.  We have a solid proof for the WAA measure, something WAR does not have.

The Prediction Racket Part 1

The DH series requires historical event data to be compiled for 2018.  Since I have once again been lazy this off-season, I have been reluctant to revisit those scripts because 1) I may have forgotten how they work and 2) scripts always seem to break after sitting around for a year.

The historical scripts need to be rewritten.  The historical dataset is the foundation for simulation and is required to prove or disprove the accuracy of those simulations.  That’s why today I was pleasantly amused to find some distraction in this tweet.


Troll level for @No_Little_Plans :  Expert!

What better way to get back into the baseball season than arguing over valuation systems, and Rob delivers.  The first rule of the Prediction Racket is:

No one can predict the future

Unless you’re a time traveler from the future, no one can possibly know what will happen.  Even time travelers can alter their future by affecting something in the past, which is called the Butterfly Effect.  The second rule of the Prediction Racket:

Past results do not affect future results

If you roll a 6 three times in a row, it doesn’t mean it’s more or less likely that you’ll roll a 6 again.  The probability is exactly the same (1 in 6 on a fair die) no matter what happened in the past.  If a slot machine hasn’t paid out in a long time, that doesn’t make it more likely to pay out, no matter what compulsive gamblers want to believe.
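Rule 2 is easy to check with a quick simulation.  The sketch below uses only Python’s standard `random` module.  The seed and the number of rolls are arbitrary choices made for illustration.  It finds every run of three 6s and counts how often the very next roll is also a 6.

```python
import random


def six_after_streak(n_rolls: int, streak: int = 3, seed: int = 2019) -> float:
    """Estimate P(next roll is a 6 | the previous `streak` rolls were all 6s)."""
    rng = random.Random(seed)  # seeded so the result is repeatable
    rolls = [rng.randint(1, 6) for _ in range(n_rolls)]

    trials = 0  # positions preceded by `streak` sixes in a row
    hits = 0    # ...where the roll at that position is also a 6
    for i in range(streak, n_rolls):
        if all(r == 6 for r in rolls[i - streak:i]):
            trials += 1
            if rolls[i] == 6:
                hits += 1
    return hits / trials


p_six = six_after_streak(600_000)
print(f"P(6 | three 6s in a row) ~ {p_six:.3f}, fair-die odds = {1 / 6:.3f}")
```

The estimate lands close to 1/6.  The die has no memory of the streak.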

Third rule in the Prediction Racket:

Use the past as a template for the future

The third rule is more of a how-to for those interested in being part of the Prediction Racket.  The first step is to take the standings from last season and adjust teams up or down.  If you’re wrong, no one will remember.  If you’re right, you make sure everyone knows.  Win-win.

Let’s look at the standings from last season with a screenshot of baseball-reference.com taken before they update for this season.


Now let’s look at a current screenshot of standings as reported by Baseball Prospectus using PECOTA.  A screenshot is used because this page probably will get updated and changed.


They didn’t put much effort into the NL West, as it is almost exactly the same.  They think ATL will be somewhat worse than last season, NYN somewhat better, and PHI and MIA about the same.  There is no real insight except for the Cubs, which is the clickbait troll of this entire article.  They think the NL West as a division will be about the same but the NL Central will be worse.

This model uses three-year career splits to rank teams by strength.  Below is a truncated table of the top 15 MLB teams, created at the beginning of 2018 when we had complete 25-man rosters.

Top 15 MLB teams April 2018

TeamID Hitters Pitchers Starters Relief Total W-L
HOU 49.53 70.04 35.28 34.76 119.57 0
CHN 49.94 55.63 36.47 19.16 105.57 0
CLE 32.33 59.88 26.13 33.75 92.21 0
WAS 36.34 55.19 40.93 14.26 91.53 0
BOS 55.55 35.24 23.90 11.34 90.79 0
LAN 17.97 65.55 48.05 17.50 83.52 0
TOR 40.74 41.30 26.12 15.18 82.04 0
NYA 30.25 51.65 17.22 34.43 81.90 0
COL 59.60 10.11 -3.68 13.79 69.71 0
NYN 25.85 34.70 25.38 9.32 60.55 0
BAL 35.79 9.31 -15.11 24.42 45.10 0
MIL 15.53 24.69 8.49 16.20 40.22 0
MIN 15.78 19.32 4.18 15.14 35.10 0
ANA 24.16 10.20 5.10 5.10 34.36 0
SFN 13.20 17.68 0.33 17.35 30.88 0

This table takes WAA career value for 2015, 2016, and 2017 and sorts by Total.  Total is the sum of Pitchers and Hitters.  Pitchers is the sum of Relief and Starters as we knew at the beginning of the 2018 season and posted here.   Since we’re from the future we know how this season turned out.  Colored in bold green are the NLDS contenders and bold blue the World Series contenders.
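The arithmetic behind this table is simple enough to sketch.  This sketch assumes the Hitters, Starters, and Relief columns already hold the three-year career WAA sums for each April 2018 roster.  The roster aggregation itself is not shown.  The Python below rebuilds the Pitchers and Total columns and sorts by Total.  The `TeamSplit` name is hypothetical.  The rows are listed alphabetically so that the sort does real work.

```python
from typing import NamedTuple


class TeamSplit(NamedTuple):
    team: str
    hitters: float   # 3-year career WAA of the roster's hitters
    starters: float  # 3-year career WAA of the starting pitchers
    relief: float    # 3-year career WAA of the relievers

    @property
    def pitchers(self) -> float:
        return self.starters + self.relief

    @property
    def total(self) -> float:
        return self.hitters + self.pitchers


# Values copied from the April 2018 table, listed alphabetically by TeamID
TEAMS = [
    TeamSplit("ANA", 24.16, 5.10, 5.10),
    TeamSplit("BAL", 35.79, -15.11, 24.42),
    TeamSplit("BOS", 55.55, 23.90, 11.34),
    TeamSplit("CHN", 49.94, 36.47, 19.16),
    TeamSplit("CLE", 32.33, 26.13, 33.75),
    TeamSplit("COL", 59.60, -3.68, 13.79),
    TeamSplit("HOU", 49.53, 35.28, 34.76),
    TeamSplit("LAN", 17.97, 48.05, 17.50),
    TeamSplit("MIL", 15.53, 8.49, 16.20),
    TeamSplit("MIN", 15.78, 4.18, 15.14),
    TeamSplit("NYA", 30.25, 17.22, 34.43),
    TeamSplit("NYN", 25.85, 25.38, 9.32),
    TeamSplit("SFN", 13.20, 0.33, 17.35),
    TeamSplit("TOR", 40.74, 26.12, 15.18),
    TeamSplit("WAS", 36.34, 40.93, 14.26),
]

# Highest Total first, as in the published table
ranked = sorted(TEAMS, key=lambda t: t.total, reverse=True)
for t in ranked:
    print(f"{t.team} {t.hitters:6.2f} {t.pitchers:6.2f} {t.total:7.2f}")
```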

Eight of the top twelve ranked teams made the playoffs.  Atlanta and Oakland were the only playoff teams missing from the top half of MLB.  Atlanta was ranked 27 out of 30 teams and ended up winning the NL East.  OAK, literally the team Michael Lewis wrote about in Moneyball, was ranked #18.  They are constantly cycling new players through their system.

The above only measures career value, so the potential of new players doesn’t get counted.  In 2015 both the Cubs and Astros were ranked at the bottom of this list.  Both had a lot of new guys.  The Cubs ended up in the NLCS that season, and both teams rose to the top of the league with career players in subsequent years.

The above table will be reproduced for this season in April, when we get a complete dataset of roster data.  Rules 1 and 2 of the Prediction Racket preclude making projections.  The above table demonstrates, however, that a team with top career talent will most likely do well in the regular season.  A team that ends up in the bottom half of this list will probably not make the playoffs, but as Rule 1 states, no one can predict the future.

That is all for now.  Subsequent parts of this series will be written if there are any funny tweets during Spring Training.  They will also come in April, when complete 25-man rosters are known and we can have some fun with PECOTA like we do with WAR.  Part 2 of the DH series is also forthcoming.

Since the minor league database has been updated, we’ll take a look at the new guys in Spring Training on the Cubs as well as the White Sox.  The Vegas book ranks the White Sox 6th in the AL to win the ALCS.  PECOTA isn’t so favorable, according to this:


Refer to Rule #1.  Until then ….

The DH argument Part 1

This will be a multi-part series that explores various aspects of the DH issue.  The first question to answer is which league has the better-hitting pitchers.  Now that the AL and NL play each other, AL pitchers must bat when they play in NL parks.  My initial conjecture was that NL pitchers would be better because they get more practice at the plate.  Let’s examine this.

This data model produces a value metric, so that is what we will use to decide.  The raw WAA value used to rank individual players cannot be used here.  AL pitchers have roughly an order of magnitude fewer plate appearances per year than NL pitchers.  All but a few pitchers are below-average hitters, so fewer plate appearances means less negative WAA.  That would skew the totals in favor of the AL.

This is where the rate, WinPct, is needed.  This model uses WinPct to place minor league player stats into context because those players typically move from league to league.  WinPct provides context for the WAA weighting value.  WinPct is not shown for MLB players because it is deceptive at that level.

Why can WinPct be deceptive?  Take a 26.2-mile marathon.  The best marathon runners finish in a little over 2 hours, an average speed of around 13 mph.  A good runner finishes in 3 hours, about 9 mph.  An average runner takes 4 hours, 6+ mph, and so on.

A top miler can run a mile in 4 minutes, or 15 mph.  If you just look at rates, the miler runs faster than the top marathon runner.  Since 15 mph is higher than 13 mph, does that make the miler a better runner?  Is a golfer who shoots 3 under par for 9 holes (-0.333 shots/hole) better than a golfer who shoots 3 under par for 18 holes (-0.167 shots/hole)?

The answer is no.  They could be better, but you can’t tell by the rate.  MLB ranks players and gives awards based upon batting average because it is/was a sideshow that helped baseball garner interest in the sport.  If your favorite team wasn’t doing well, you could root for your favorite player instead.  Fantasy leagues and actual gambling sites like DraftKings now reward certain stats over others, and that has made this concept even more extreme.

That’s all fine and well, but batting average or WHIP does not represent value any more than average running speed represents a runner’s ability or value as a runner.  A high batting average and a low ERA often do translate into value that can be ranked, but the raw number itself cannot be ranked.
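The running and golf comparison fits in a few lines of Python.  It uses only the figures quoted above: 26.2 miles in about 2 hours, a mile in 4 minutes, and 3 under par over 9 or 18 holes.  Ranked by rate, the miler and the 9-hole golfer come out on top.  Yet the rate hides how much running or golf was actually done.

```python
# (label, amount, time) for each effort described above
RUNNERS = [
    ("marathon winner", 26.2, 2.0),  # miles, hours (a little over 2 hours)
    ("top miler", 1.0, 4 / 60),      # miles, hours (4 minutes)
]
GOLFERS = [
    ("3 under par, 9 holes", -3, 9),    # strokes vs par, holes played
    ("3 under par, 18 holes", -3, 18),
]


def rate(amount: float, time: float) -> float:
    """Amount per unit of time: mph for runners, strokes per hole for golfers."""
    return amount / time


runner_order = sorted(RUNNERS, key=lambda r: rate(r[1], r[2]), reverse=True)
golfer_order = sorted(GOLFERS, key=lambda g: rate(g[1], g[2]))  # lower is better in golf

for label, miles, hours in runner_order:
    print(f"{label}: {rate(miles, hours):.1f} mph over {miles} miles")
for label, strokes, holes in golfer_order:
    print(f"{label}: {rate(strokes, holes):+.3f} strokes/hole over {holes} holes")
```

The sort by rate rewards the shortest efforts.  It is the same trap as ranking hitters by batting average without regard to playing time.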

Unlike most of sabermetrics, this model does not show rates for MLB, nor does it ever rank on rates.  That said, we must use the rate for dissimilar groups of players like AL and NL pitchers.  Rates are also sometimes useful to provide context for lineups, relief squads, and starting pitchers.  Tiering, which has been discussed throughout, uses raw WAA weighting, however.

What does all of this have to do with DH?

Nothing, other than explaining why we will be using rates instead of raw value in these next few exercises.  First, let’s explain again how WinPct is calculated.  By definition:

WAA = wins – losses

Not too complicated.  It’s easy to calculate for teams and this model calculates it for players.  Players with positive WAA provide more wins to their teams than losses, vice versa for negative valued players.   The following must also be true:

Sum Team(WAA) = 0

Add up wins – losses for all teams in any league and it comes to 0.  IOW, for every team that wins, a team must lose.  Not too complicated!  The following is also true according to this data model.

Sum Player(WAA) = 0

If you add WAA of every player who played in a season it adds to exactly 0.

Sum Player_Team(WAA) = Team(WAA)

The above states that the WAA of all players who played for a team, counted only while they played on that team, adds up to the team’s real win/loss record.  The Cubs had a record of 95-68 last season, which is a WAA of +27.  WAA for all players tagged CHN in 2018 will add to that number.
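These two properties make a handy consistency check on any set of player values.  The sketch below assumes player rows shaped as `(player, team, waa)` and team records shaped as `{team: (wins, losses)}`.  The function names, the two-team league, and its players are hypothetical toy data, not output from this model.

```python
from collections import defaultdict


def team_waa(wins: int, losses: int) -> int:
    """Team WAA is simply wins minus losses."""
    return wins - losses


def check_waa(players, records, tol=0.01):
    """Return (league total of player WAA, teams whose player sum misses the record).

    players: iterable of (player_id, team_id, waa), one row per player-team stint
    records: {team_id: (wins, losses)}
    """
    by_team = defaultdict(float)
    for _player, team, waa in players:
        by_team[team] += waa  # a player traded midseason contributes to each team separately

    league_total = sum(by_team.values())  # should be 0 for a complete league
    mismatches = {
        team: (round(by_team[team], 2), team_waa(*record))
        for team, record in records.items()
        if abs(by_team[team] - team_waa(*record)) > tol
    }
    return league_total, mismatches


# Hypothetical two-team league: AAA went 3-1 and BBB went 1-3
records = {"AAA": (3, 1), "BBB": (1, 3)}
players = [
    ("a1", "AAA", 2.5), ("a2", "AAA", -0.5),
    ("b1", "BBB", -3.0), ("b2", "BBB", 1.0),
]
total, bad = check_waa(players, records)
print(f"league total = {total:+.2f}, mismatched teams = {bad}")
```

On real data, `records` would hold every team in the league, for example `team_waa(95, 68)` for the 2018 Cubs gives the +27 above.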

Therefore, Player(WAA) has the same properties as Team(WAA), and a WinPct can be calculated as follows.

Win% =  0.5*WAA/(number of games played) + 0.5

For the Cubs last season that was

Win% = 0.5 * 27 / 163 + 0.5 = 0.583
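The formula is a one-liner.  It also passes a sanity check.  For a team, WAA = W - L and games = W + L, so 0.5 * (W - L) / (W + L) + 0.5 simplifies to W / (W + L), which is ordinary winning percentage.  The Python sketch below confirms this for the 2018 Cubs.  The function name is hypothetical.

```python
def win_pct(waa: float, games: float) -> float:
    """Win% = 0.5 * WAA / games + 0.5"""
    return 0.5 * waa / games + 0.5


wins, losses = 95, 68                      # 2018 Cubs: 95-68 over 163 games
cubs = win_pct(wins - losses, wins + losses)
print(f"{cubs:.3f}")                       # 0.583
print(f"{wins / (wins + losses):.3f}")     # 0.583, plain winning percentage
```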

To calculate a player’s Win%, the number of games played is not the actual number of games they appeared in.  Time in baseball is measured by plate appearances for hitters and innings pitched for pitchers.  Baseball has always used 9 innings to represent a game when calculating ERA.  An average game is not exactly 9 innings.  Still, it’s a close enough approximation, easy to remember, and was easy to calculate before there were calculators.

This model uses the constant 38.4 plate appearances to represent a game for hitters.  Javier Baez had 645 PA last season, which translates into 645/38.4 = 16.8 games.  His WAA for his near-MVP season was 7.29, thus,

Javier Baez Win% = 0.5 * 7.29 / 16.8 games  + 0.5 = 0.717

and for context:

Christian Yelich Win% = 0.5 * 8.44 / ~17 games + 0.5 = 0.749

The above is merely an illustration of how this is calculated.  The WAA value (8.44 for Yelich, 7.29 for Baez) is all that matters for ranking purposes.  This model also gives Yelich the MVP, even though Baez led until the final week of the 2018 season.
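For players, the only extra step is converting playing time into games.  The sketch below uses the model’s 38.4 plate appearances per game for hitters.  Two things in it are assumptions.  The first is converting a pitcher’s innings at 9 innings per game, which follows the ERA convention mentioned above.  The second is the function names.

```python
PA_PER_GAME = 38.4  # this model's plate appearances per "game" for hitters
IP_PER_GAME = 9.0   # assumption: one pitcher "game" = 9 innings, as with ERA


def win_pct(waa: float, games: float) -> float:
    return 0.5 * waa / games + 0.5


def hitter_win_pct(waa: float, pa: int) -> float:
    return win_pct(waa, pa / PA_PER_GAME)


def pitcher_win_pct(waa: float, innings: float) -> float:
    """`innings` must be true innings (254.1 in box-score notation is 254 1/3)."""
    return win_pct(waa, innings / IP_PER_GAME)


baez = hitter_win_pct(7.29, 645)  # 645 PA -> about 16.8 games
print(f"Baez Win% = {baez:.3f}")  # 0.717
```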

Would you get to the point of all this?

OK.  We meandered a bit with some background as to how all this is calculated showing it’s not very complicated.   The next set of tables will walk through the variables used to make Win%.  First let’s look at plate appearance numbers for AL and NL pitchers throughout the years.

AL and NL Pitching Plate Appearances

YEAR AL PA NL PA
2008 637 4998
2009 642 4994
2010 638 5152
2011 621 5023
2012 605 4908
2013 345 4836
2014 332 4893
2015 333 4643
2016 361 4674
2017 329 4648
2018 311 4526

Plate appearances translate into baseball time.  The above table clearly shows what we already know: NL pitchers bat far more often than AL pitchers because the NL does not have the DH.  Plate appearances for AL pitchers peaked in 2009 and for NL pitchers in 2010.  Both have generally declined since then, through last season.  Not sure why, but it is what it is.  Let’s look at total pitcher hitting WAA for each league.

AL and NL Pitching BAT WAA

YEAR AL WAA NL WAA
2008 -10.12 -73.75
2009 -9.03 -72.58
2010 -11.38 -68.54
2011 -8.95 -65.79
2012 -8.95 -66.49
2013 -5.23 -62.45
2014 -5.21 -66.13
2015 -5.08 -66.91
2016 -6.32 -61.76
2017 -4.72 -66.86
2018 -4.54 -67.20

This table does not tell you much other than that pitchers bring losses to their teams through poor hitting.  We saw in the previous table that plate appearances have gone down since 2010, yet NL pitcher WAA has stayed roughly constant.

With 15 teams in the NL, pitchers’ hitting contributes an average of around -4 to -4.5 per team in the win/loss column.  For the AL it’s much less.  At first glance, the above seems to show AL pitchers becoming much better hitters over the years.  We can’t really tell what’s going on without doing the Win% calculation.

AL and NL Pitching BAT Win%

YEAR AL Win% NL Win%
2008 0.195 0.217
2009 0.230 0.221
2010 0.157 0.245
2011 0.223 0.249
2012 0.216 0.240
2013 0.209 0.252
2014 0.199 0.241
2015 0.207 0.223
2016 0.164 0.246
2017 0.224 0.224
2018 0.220 0.215

It must be stressed that these include only hitting stats, which have nothing to do with their pitching.  Over the last couple of years, AL and NL pitchers have been more or less equal in hitting ability, and both are very, very poor.  As shown above, MVP-quality hitting is above 0.700.  A textbook, completely average hitter would have WAA = 0, which translates to a Win% of exactly 0.500.
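The Win% table follows directly from the PA and BAT WAA tables.  This Python sketch copies those two tables.  It then applies Win% = 0.5 * WAA / (PA / 38.4) + 0.5 for each league and year.  The dictionary layout is just one convenient choice.

```python
PA_PER_GAME = 38.4  # plate appearances per "game" for hitters in this model

# year: (AL PA, NL PA, AL BAT WAA, NL BAT WAA), copied from the two tables above
PITCHER_BATTING = {
    2008: (637, 4998, -10.12, -73.75),
    2009: (642, 4994, -9.03, -72.58),
    2010: (638, 5152, -11.38, -68.54),
    2011: (621, 5023, -8.95, -65.79),
    2012: (605, 4908, -8.95, -66.49),
    2013: (345, 4836, -5.23, -62.45),
    2014: (332, 4893, -5.21, -66.13),
    2015: (333, 4643, -5.08, -66.91),
    2016: (361, 4674, -6.32, -61.76),
    2017: (329, 4648, -4.72, -66.86),
    2018: (311, 4526, -4.54, -67.20),
}


def win_pct(waa: float, pa: int) -> float:
    """Win% = 0.5 * WAA / games + 0.5, with games = PA / 38.4."""
    games = pa / PA_PER_GAME
    return 0.5 * waa / games + 0.5


print("YEAR AL Win% NL Win%")
for year, (al_pa, nl_pa, al_waa, nl_waa) in sorted(PITCHER_BATTING.items()):
    print(f"{year} {win_pct(al_waa, al_pa):.3f} {win_pct(nl_waa, nl_pa):.3f}")
```

Run as-is, it matches the published Win% table to within 0.001.  Two AL entries, 2010 and 2017, come out one thousandth higher (0.158 and 0.225).  This is presumably because the WAA inputs are rounded to two decimals.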

The above clearly shows just how bad pitchers in general are at hitting which is one of the reasons for DH.  In order to put the above in context we must compare the above numbers to the worst hitters in each lineup.

Since AL teams have the DH, they normally do not make pitchers hit.  To make that comparison, we’ll look at the 9th hitter in each lineup last year and, if I get motivated, over the last ten years.  The bottom of a lineup is where managers put the hitters they want to have the fewest plate appearances.  What kind of Win% do these players put up?  We’ll see.  Until then ….

Hall of Fame Part 3

In this part we’ll cover the other 3 MLB Hall of Fame inductees from the latest vote, which can be seen here.  Below are career tables showing year-by-year valuations under both the WAR and WAA value systems.  According to this data model, all 3 deserve HOF induction.  The weakest is Edgar Martinez, who squeaked in on his last year of eligibility.

This data model abhors tables of numbers, but there is no other way to present these long careers.  Comments will be interspersed among the tables.  Players appear in the order they are listed on the HOF ballot on this baseball-reference web page.

Edgar Martinez WAA

Edgar Martinez is ranked #214 of all post-1900 MLB players according to this data model, which just barely gets him in.  The threshold should be somewhere between 200 and 250.  His ranking score in this data model is 1120.  WAR has him ranked much higher, with a ranking score of 1974.

Year Rank WAA Name_TeamID Pos
1987 XXXXX -0.02 Edgar_Martinez_SEA 3B
1988 XXXXX -0.34 Edgar_Martinez_SEA 3B
1989 XXXXX -0.15 Edgar_Martinez_SEA 3B
1990 XXXXX -0.32 Edgar_Martinez_SEA 3B
1991 XXXXX 1.13 Edgar_Martinez_SEA 3B
1992 +034+ 5.17 Edgar_Martinez_SEA 3B-DH
1993 XXXXX -0.53 Edgar_Martinez_SEA DH-3B
1994 XXXXX 0.10 Edgar_Martinez_SEA 3B-DH
1995 +005+ 8.53 Edgar_Martinez_SEA DH
1996 +024+ 7.33 Edgar_Martinez_SEA DH
1997 +041+ 5.50 Edgar_Martinez_SEA DH
1998 +117+ 2.88 Edgar_Martinez_SEA DH
1999 +180+ 1.83 Edgar_Martinez_SEA DH
2000 +013+ 8.13 Edgar_Martinez_SEA DH
2001 +030+ 6.36 Edgar_Martinez_SEA DH
2002 XXXXX 0.82 Edgar_Martinez_SEA DH
2003 +129+ 2.77 Edgar_Martinez_SEA DH
2004 -093- -2.44 Edgar_Martinez_SEA DH
Total 46.75  1120

Edgar Martinez WAR

Below is the extended WAR table available for hitters.  WAR has an offensive component, oWAR, and a defensive component, dWAR.  The two cannot be added together to make WAR because WAR does not have additive properties.

Normally this model adheres to a Keep It Simple policy (KISS), meaning the fewer entries in a table, the better.  Total WAR is used to sort and rank pitchers and batters together, just as WAA is used throughout this data model.

Year Rank WAR oWAR dWAR PA Name_Tm Pos
1987 XXXXX 0.2 0.4 -0.2 46 Edgar_Martinez_SEA 3B
1988 XXXXX -0.1 0.1 -0.2 38 Edgar_Martinez_SEA 3B
1989 XXXXX 0.5 0.0 0.6 196 Edgar_Martinez_SEA 3B
1990 +022+ 5.5 4.2 1.5 572 Edgar_Martinez_SEA 3B
1991 +017+ 6.1 5.5 0.8 642 Edgar_Martinez_SEA 3B
1992 +011+ 6.6 7.1 -0.7 592 Edgar_Martinez_SEA 3B-DH
1993 XXXXX 0.2 0.5 -0.4 165 Edgar_Martinez_SEA DH-3B
1994 +079+ 3.0 2.4 0.5 387 Edgar_Martinez_SEA 3B-DH
1995 +006+ 7.0 7.2 -1.4 639 Edgar_Martinez_SEA DH
1996 +021+ 6.4 6.4 -1.1 634 Edgar_Martinez_SEA DH
1997 +020+ 6.2 6.1 -1.3 678 Edgar_Martinez_SEA DH
1998 +044+ 5.6 5.6 -1.4 672 Edgar_Martinez_SEA DH
1999 +046+ 4.9 4.8 -1.1 608 Edgar_Martinez_SEA DH
2000 +029+ 5.6 5.6 -1.2 665 Edgar_Martinez_SEA DH
2001 +051+ 4.8 4.8 -1.1 581 Edgar_Martinez_SEA DH
2002 +163+ 2.6 2.6 -0.8 407 Edgar_Martinez_SEA DH
2003 +117+ 3.3 3.3 -1.2 603 Edgar_Martinez_SEA DH
2004 XXXXX -0.3 -0.4 -1.0 549 Edgar_Martinez_SEA DH
Total 68.1 1974

WAR has a ranking score of 1974, significantly higher than this data model’s 1120.  We know from his poor dWAR numbers that this is entirely due to offense.  That makes him a good direct comparison between the two models.  This model shows why it took him 10 years to get in.  According to WAR, he should have gotten in much sooner.

Based upon anecdotal observation, WAR tends to overvalue hitters.  We know WAR for hitters makes up about 60% of the league total of 1000, year after year.  This might be due to overvaluing oWAR.  Perhaps we’ll explore this further … perhaps not.  It doesn’t really matter.

Roy Halladay WAA

This data model has Roy Halladay ranked #125 out of all post-1900 MLB players, so he clearly qualifies for the HOF.  He gets in on the first ballot with 85% of the vote.  He had very bad years in 2000 and 2013 but made all that negative value back, and more, with many superb top-ten years.

WAR and WAA are in almost complete agreement according to the ranking scores (the last number on each Total line).  Both systems pegged him #1 in the bottom 200 in 2000.

Year Rank WAA Name_TeamID Pos
1998 XXXXX 0.80 Roy_Halladay_TOR PITCH
1999 +129+ 2.77 Roy_Halladay_TOR PITCH
2000 -001- -9.24 Roy_Halladay_TOR PITCH
2001 +111+ 2.94 Roy_Halladay_TOR PITCH
2002 +014+ 7.58 Roy_Halladay_TOR PITCH
2003 +015+ 7.33 Roy_Halladay_TOR PITCH
2004 XXXXX 1.09 Roy_Halladay_TOR PITCH
2005 +024+ 6.15 Roy_Halladay_TOR PITCH
2006 +017+ 7.08 Roy_Halladay_TOR PITCH
2007 +061+ 3.97 Roy_Halladay_TOR PITCH
2008 +006+ 8.84 Roy_Halladay_TOR PITCH
2009 +008+ 8.42 Roy_Halladay_TOR PITCH
2010 +003+ 9.37 Roy_Halladay_PHI PITCH
2011 +006+ 8.32 Roy_Halladay_PHI PITCH
2012 -128- -1.81 Roy_Halladay_PHI PITCH
2013 -027- -4.20 Roy_Halladay_PHI PITCH
Total 59.41  1362

Roy Halladay WAR

Year Rank WAR IP Name_Tm Pos
1998 XXXXX 0.4 14.0 Roy_Halladay_TOR PITCH
1999 +160+ 2.6 149.1 Roy_Halladay_TOR PITCH
2000 -001- -2.8 67.2 Roy_Halladay_TOR PITCH
2001 +125+ 3.0 105.1 Roy_Halladay_TOR PITCH
2002 +005+ 7.4 239.1 Roy_Halladay_TOR PITCH
2003 +004+ 8.1 266.0 Roy_Halladay_TOR PITCH
2004 +179+ 2.4 133.0 Roy_Halladay_TOR PITCH
2005 +025+ 5.5 141.2 Roy_Halladay_TOR PITCH
2006 +029+ 5.2 220.0 Roy_Halladay_TOR PITCH
2007 +098+ 3.5 225.1 Roy_Halladay_TOR PITCH
2008 +019+ 6.2 246.0 Roy_Halladay_TOR PITCH
2009 +011+ 6.9 239.0 Roy_Halladay_TOR PITCH
2010 +002+ 8.3 250.2 Roy_Halladay_PHI PITCH
2011 +001+ 8.9 233.2 Roy_Halladay_PHI PITCH
2012 XXXXX 0.9 156.1 Roy_Halladay_PHI PITCH
2013 -065- -0.9 62.0 Roy_Halladay_PHI PITCH
Total 65.6  1408

Mike Mussina WAA

Mike Mussina gets voted in after 6 years with 76% of the vote.  This data model has him ranked #123, almost exactly tied with Roy Halladay above.  Based upon ranking score, WAR values his career much higher than those of all current HOF inductees.

Even though Mussina and Halladay are virtually tied in career WAA, Mussina has a much higher ranking score.  WAA is the only factor used for ranking purposes, both for careers and for individual seasons.  Ranking scores are computed only to compare how WAA values a player with how WAR does.

Year Rank WAA Name_TeamID Pos
1991 +134+ 2.10 Mike_Mussina_BAL PITCH
1992 +012+ 6.66 Mike_Mussina_BAL PITCH
1993 XXXXX -0.99 Mike_Mussina_BAL PITCH
1994 +008+ 6.38 Mike_Mussina_BAL PITCH
1995 +019+ 6.38 Mike_Mussina_BAL PITCH
1996 XXXXX -1.49 Mike_Mussina_BAL PITCH
1997 +030+ 5.96 Mike_Mussina_BAL PITCH
1998 +065+ 4.79 Mike_Mussina_BAL PITCH
1999 +037+ 5.84 Mike_Mussina_BAL PITCH
2000 +037+ 5.61 Mike_Mussina_BAL PITCH
2001 +026+ 6.64 Mike_Mussina_NYA PITCH
2002 XXXXX 1.03 Mike_Mussina_NYA PITCH
2003 +055+ 4.77 Mike_Mussina_NYA PITCH
2004 XXXXX -0.44 Mike_Mussina_NYA PITCH
2005 XXXXX -0.36 Mike_Mussina_NYA PITCH
2006 +051+ 4.64 Mike_Mussina_NYA PITCH
2007 -099- -2.44 Mike_Mussina_NYA PITCH
2008 +049+ 4.51 Mike_Mussina_NYA PITCH
Total 59.59 1776

Mike Mussina WAR

Year Rank WAR IP Name_Tm Pos
1991 XXXXX 2.2 87.2 Mike_Mussina_BAL PITCH
1992 +004+ 8.2 241.0 Mike_Mussina_BAL PITCH
1993 XXXXX 1.5 167.2 Mike_Mussina_BAL PITCH
1994 +012+ 5.4 176.1 Mike_Mussina_BAL PITCH
1995 +014+ 6.1 221.2 Mike_Mussina_BAL PITCH
1996 +086+ 3.6 243.1 Mike_Mussina_BAL PITCH
1997 +027+ 5.5 224.2 Mike_Mussina_BAL PITCH
1998 +055+ 5.0 206.1 Mike_Mussina_BAL PITCH
1999 +065+ 4.4 203.1 Mike_Mussina_BAL PITCH
2000 +026+ 5.6 237.2 Mike_Mussina_BAL PITCH
2001 +013+ 7.1 228.2 Mike_Mussina_NYA PITCH
2002 +056+ 4.5 215.2 Mike_Mussina_NYA PITCH
2003 +013+ 6.6 214.2 Mike_Mussina_NYA PITCH
2004 +177+ 2.4 164.2 Mike_Mussina_NYA PITCH
2005 +110+ 3.4 179.2 Mike_Mussina_NYA PITCH
2006 +035+ 5.0 197.1 Mike_Mussina_NYA PITCH
2007 XXXXX 1.0 152.0 Mike_Mussina_NYA PITCH
2008 +038+ 5.2 200.1 Mike_Mussina_NYA PITCH
Total 82.7  2269

Next season Clemens and Bonds will probably get in, breaking the no-PEDs seal.  In the next, and possibly final, part of this series, we’ll run through a couple of interesting careers of players who didn’t reach 75%.  Until then ….

Hall of Fame Part 2

It appears the HOF voting table referenced in Part 1 had percentages from last season.  They just held a vote, and Mariano Rivera received a unanimous vote.  According to this data model, the career of Mariano Rivera is slightly below Curt Schilling’s (tables at the end).  He is ranked #95 in MLB history, which should clearly put him in the HOF, but HOF voters are a fickle bunch.  They never agree on anything.  How could Rivera get a unanimous first-ballot vote?

Simple.  Mariano Rivera is by far the most valuable player in post season history according to this data model and anyone who looks at his play off stats.

Top 10 Players in Playoff Season

Rank WAA Name_TeamID Pos
+001+ 9.30 Mariano_Rivera_NYA PITCH
+002+ 6.17 Christy_Mathewson_SFN PITCH
+003+ 5.46 Bernie_Williams_NYA RF-DH-CF
+004+ 4.89 Manny_Ramirez_TOT LF
+005+ 4.62 Albert_Pujols_TOT 1B-DH
+006+ 4.37 John_Smoltz_TOT PITCH
+007+ 4.18 David_Ortiz_TOT DH
+008+ 4.05 Curt_Schilling_TOT PITCH
+009+ 4.01 Carlos_Beltran_TOT DH
+010+ 3.78 Babe_Ruth_TOT OF-RF-LF

All playoff games are treated as a single season.  A typical modern season has around 1500 unique players.  From 1903 to the present, more than 7800 unique players have appeared in the playoffs, around 5x a standard season.

Bernie Williams in the above list had 545 playoff season plate appearances which is almost a full regular season of playing time.   Babe Ruth, on the other hand, had only 167 playoff season plate appearances yet he still ranks #10.  He ranks #67 as a pitcher and if you add the two together, which you can,  Ruth would be ranked #5.  He’s the only player in MLB history with both a stellar PITCH and BAT record.

Because far more playoff games are played now than in the past, modern players will eventually rise to the top of this list.  Mariano Rivera will probably never be topped, however.  Christy Mathewson started his career in 1900 and held the top spot until Rivera came along.

Let’s look at Mariano’s extended record.

Rank WAA IP ERA Gs Gr Name_TeamID Pos
+001+ 9.30 141.0 0.70 0 97 Mariano_Rivera_NYA PITCH

141 innings pitched in 97 relief appearances is around two complete reliever seasons and almost a complete season for a starter.  An ERA under 1 is simply phenomenal and may never happen again.  Curt Schilling is also in the top ten, at #8.
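ERA is earned runs per nine innings, ERA = 9 * ER / IP.  So the line above also implies roughly how many earned runs Rivera allowed in his 141 postseason innings.  Here is a minimal sketch of that arithmetic.  The function names are hypothetical.

```python
def era(earned_runs: float, innings: float) -> float:
    """Earned run average: earned runs per 9 innings."""
    return 9.0 * earned_runs / innings


def implied_earned_runs(era_value: float, innings: float) -> float:
    """Invert the ERA formula to recover earned runs."""
    return era_value * innings / 9.0


er = implied_earned_runs(0.70, 141.0)
print(f"~{er:.1f} earned runs")                    # ~11.0
print(f"ERA check: {era(round(er), 141.0):.2f}")  # 0.70
```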

Below are the details of Mariano Rivera’s career according to both this data model and WAR.

Mariano Rivera Career


Year Rank WAA Name_TeamID Pos
1995 -163- -1.64 Mariano_Rivera_NYA PITCH
1996 +033+ 6.40 Mariano_Rivera_NYA PITCH
1997 +070+ 4.07 Mariano_Rivera_NYA PITCH
1998 +090+ 3.57 Mariano_Rivera_NYA PITCH
1999 +064+ 4.64 Mariano_Rivera_NYA PITCH
2000 +101+ 3.49 Mariano_Rivera_NYA PITCH
2001 +081+ 3.86 Mariano_Rivera_NYA PITCH
2002 XXXXX 1.62 Mariano_Rivera_NYA PITCH
2003 +066+ 4.43 Mariano_Rivera_NYA PITCH
2004 +052+ 4.64 Mariano_Rivera_NYA PITCH
2005 +031+ 5.38 Mariano_Rivera_NYA PITCH
2006 +046+ 4.77 Mariano_Rivera_NYA PITCH
2007 +162+ 2.18 Mariano_Rivera_NYA PITCH
2008 +044+ 4.83 Mariano_Rivera_NYA PITCH
2009 +065+ 3.93 Mariano_Rivera_NYA PITCH
2010 +093+ 3.19 Mariano_Rivera_NYA PITCH
2011 +095+ 2.88 Mariano_Rivera_NYA PITCH
2012 XXXXX 0.36 Mariano_Rivera_NYA PITCH
2013 +110+ 2.65 Mariano_Rivera_NYA PITCH
Total 65.25  1960

His best year was 1996, his second season, and his second-best year was 2005.  Overall he was consistently good every single year.  The career ranking score is the last number on the Total line.  It can be directly compared with the ranking score calculated for the WAR value system below.


Year Rank WAR IP Name_TeamID Pos
1995 XXXXX 0.2 67.0 Mariano_Rivera_NYA PITCH
1996 +048+ 5.0 107.2 Mariano_Rivera_NYA PITCH
1997 +088+ 3.7 71.2 Mariano_Rivera_NYA PITCH
1998 +151+ 2.8 61.1 Mariano_Rivera_NYA PITCH
1999 +100+ 3.5 69.0 Mariano_Rivera_NYA PITCH
2000 +156+ 2.6 75.2 Mariano_Rivera_NYA PITCH
2001 +111+ 3.3 80.2 Mariano_Rivera_NYA PITCH
2002 XXXXX 1.6 46.0 Mariano_Rivera_NYA PITCH
2003 +097+ 3.6 70.2 Mariano_Rivera_NYA PITCH
2004 +068+ 4.2 78.2 Mariano_Rivera_NYA PITCH
2005 +074+ 4.0 78.1 Mariano_Rivera_NYA PITCH
2006 +083+ 3.9 75.0 Mariano_Rivera_NYA PITCH
2007 XXXXX 1.9 71.1 Mariano_Rivera_NYA PITCH
2008 +071+ 4.3 70.2 Mariano_Rivera_NYA PITCH
2009 +099+ 3.5 66.1 Mariano_Rivera_NYA PITCH
2010 +179+ 2.4 60.0 Mariano_Rivera_NYA PITCH
2011 +112+ 3.2 61.1 Mariano_Rivera_NYA PITCH
2012 XXXXX 0.4 8.1 Mariano_Rivera_NYA PITCH
2013 +172+ 2.5 64.0 Mariano_Rivera_NYA PITCH
Total 56.6  1391

With a 1391 ranking score, WAR values him significantly less than this data model does.  I do not have any WAR information for the postseason and doubt it can even be calculated.

Not much more to say about Mariano Rivera, except that he totally deserves a unanimous first-ballot induction.  In subsequent parts we will cover the other inductees and the ones who almost made it.  Until then ….

Hall of Fame Part 1

Curt Schilling was trending on Twitter with people discussing whether or not he should be in the Hall of Fame.  Being winter with no baseball games to follow let’s run this question through this data model.

One thing led to another, which led yet again to baseball-reference.com and a table on the current HOF ballot.  In Part 1 of this series we’ll just look at Curt Schilling.  Subsequent parts will explore the rest of the players on the ballot.  It appears no one was elected in 2019 except for the old timers.  The steroid era has hit retirement.

Edit: Apparently the vote percentages listed in the table were from last year.  Mariano Rivera gets unanimous vote and will be the subject of Part 2.

Should Curt Schilling be in the Hall of Fame?

How does one go about evaluating a career?  Here is how a Philadelphia newspaper summed up his entire career:

Schilling spent 20 years in the league as a pitcher, posting a 216-146 record and a career 3.46 ERA with three teams — the Phillies, Red Sox and Arizona Diamondbacks. 

Source: Trump wants Curt Schilling in the Hall of Fame, but his endorsement is a little too late

Can anyone tell from the above description whether this pitcher is HOF-worthy?  To do that, you would have to see that player in the context of other HOF inductees.  This model ranks all players, hitters and pitchers, together.  All careers of players who played after 1900 are ranked together.

tl;dr Curt Schilling had a career total WAA = 70.21, ranking him #85 out of all MLB players who played post-1900.  That puts him in the HOF.  He got only slightly more than 50% of HOF votes, well short of the required 75%.

Let’s dive into the details…

Curt Schilling

WAA Table

Year Rank WAA Name_TeamID Pos
1988 -093- -2.06 Curt_Schilling_BAL PITCH
1989 XXXXX -0.50 Curt_Schilling_BAL PITCH
1990 XXXXX 1.45 Curt_Schilling_BAL PITCH
1991 XXXXX 0.19 Curt_Schilling_HOU PITCH
1992 +007+ 7.77 Curt_Schilling_PHI PITCH
1993 XXXXX 0.55 Curt_Schilling_PHI PITCH
1994 XXXXX 0.02 Curt_Schilling_PHI PITCH
1995 +129+ 2.35 Curt_Schilling_PHI PITCH
1996 +032+ 6.43 Curt_Schilling_PHI PITCH
1997 +010+ 8.86 Curt_Schilling_PHI PITCH
1998 +023+ 7.12 Curt_Schilling_PHI PITCH
1999 +053+ 4.93 Curt_Schilling_PHI PITCH
2000 +056+ 2.27 Curt_Schilling_PHI PITCH
2000 +056+ 2.46 Curt_Schilling_ARI PITCH
2001 +011+ 8.29 Curt_Schilling_ARI PITCH
2002 +028+ 6.28 Curt_Schilling_ARI PITCH
2003 +037+ 5.56 Curt_Schilling_ARI PITCH
2004 +024+ 6.32 Curt_Schilling_BOS PITCH
2005 -059- -2.96 Curt_Schilling_BOS PITCH
2006 +120+ 2.67 Curt_Schilling_BOS PITCH
2007 +160+ 2.21 Curt_Schilling_BOS PITCH
Total 70.21 1806

1997 was his best year according to this data model, followed by 2001.  The last number on the Total line is his total career ranking score.  A ranking score is used to compare two different value systems.  The WAR table below also shows a ranking score.

A value system must itself be evaluated by how it ranks players with each other.  This data model ranks the top and bottom 200 based upon a sort and reverse sort of the WAA weighting measure.  WAA can go negative as easily as it can go positive.

WAR is different.  Its weighting system rarely dives into negative territory, which makes it more forgiving to players.  We can’t directly compare the two weighting systems.  WAR for pitchers adds to 400 league-wide, and WAR for hitters adds to 600.  WAA adds to 0 for both hitters and pitchers.

The ranking system is computed as follows:

total ranking score += 200 - rank  ( for each season ranked in the top 200 )
total ranking score += rank - 200  ( for each season ranked in the bottom 200 )
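The two rules above can be applied directly to the Rank column as printed in these tables.  In that column, `+NNN+` is a top-200 season, `-NNN-` is a bottom-200 season, and `XXXXX` is a season outside both lists.  The Python sketch below uses hypothetical function names.  It reproduces the 1806 ranking score from Schilling’s WAA table above.  Every row counts, including both 2000 rows.

```python
def season_points(rank_field: str) -> int:
    """Points one season adds to a career ranking score.

    '+NNN+' -> rank NNN in the top 200:    adds 200 - NNN
    '-NNN-' -> rank NNN in the bottom 200: adds NNN - 200 (negative)
    'XXXXX' -> outside both lists:          adds 0
    """
    if len(rank_field) == 5 and rank_field[0] == rank_field[-1] == "+":
        return 200 - int(rank_field[1:-1])
    if len(rank_field) == 5 and rank_field[0] == rank_field[-1] == "-":
        return int(rank_field[1:-1]) - 200
    return 0


def ranking_score(rank_fields) -> int:
    return sum(season_points(r) for r in rank_fields)


# Rank column of Curt Schilling's WAA table, 1988-2007 (2000 has PHI and ARI rows)
SCHILLING_WAA_RANKS = [
    "-093-", "XXXXX", "XXXXX", "XXXXX", "+007+", "XXXXX", "XXXXX",
    "+129+", "+032+", "+010+", "+023+", "+053+", "+056+", "+056+",
    "+011+", "+028+", "+037+", "+024+", "-059-", "+120+", "+160+",
]
print(ranking_score(SCHILLING_WAA_RANKS))  # 1806
```

The same function applied to the WAR tables’ Rank columns gives the WAR ranking scores.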

The higher you rank in the top 200, the more ranking score you add, and vice versa for high bottom-200 ranks.  A rank of #1 in the bottom 200 is the least valuable player in the league.  Let’s look at the WAR table.

WAR Table

Year Rank WAR IP Name_Tm Pos
1988 -064- -0.8 14.2 Curt_Schilling_BAL PITCH
1989 XXXXX -0.1 8.2 Curt_Schilling_BAL PITCH
1990 XXXXX 1.2 46.0 Curt_Schilling_BAL PITCH
1991 XXXXX -0.1 75.2 Curt_Schilling_HOU PITCH
1992 +027+ 5.9 226.1 Curt_Schilling_PHI PITCH
1993 +150+ 2.6 235.1 Curt_Schilling_PHI PITCH
1994 XXXXX 1.2 82.1 Curt_Schilling_PHI PITCH
1995 +164+ 2.2 116.0 Curt_Schilling_PHI PITCH
1996 +049+ 4.9 183.1 Curt_Schilling_PHI PITCH
1997 +019+ 6.3 254.1 Curt_Schilling_PHI PITCH
1998 +031+ 6.2 268.2 Curt_Schilling_PHI PITCH
1999 +047+ 4.8 180.1 Curt_Schilling_PHI PITCH
2000 +036+ 5.1 210.1 Curt_Schilling_TOT PITCH
2001 +006+ 8.8 256.2 Curt_Schilling_ARI PITCH
2002 +004+ 8.7 259.1 Curt_Schilling_ARI PITCH
2003 +021+ 6.0 168.0 Curt_Schilling_ARI PITCH
2004 +010+ 7.9 226.2 Curt_Schilling_BOS PITCH
2005 XXXXX 0.4 93.1 Curt_Schilling_BOS PITCH
2006 +023+ 5.5 204.0 Curt_Schilling_BOS PITCH
2007 +074+ 4.0 151.0 Curt_Schilling_BOS PITCH
Total 80.7 2003

The above table format is a work in progress.  IP is innings pitched, which shows time.  The Total line shows total WAR and the WAR ranking score.
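IP in these tables uses box-score notation, where the digit after the point counts outs rather than tenths.  So 254.1 is 254 1/3 innings and 14.2 is 14 2/3.  To treat IP as time, it first has to be converted to true innings.  One example is dividing by 9 to get games, as with ERA.  Here is a minimal Python sketch with hypothetical function names.

```python
def true_innings(ip: str) -> float:
    """Convert box-score IP ('254.1' = 254 innings + 1 out) to true innings."""
    whole, _, outs = ip.partition(".")
    n_outs = int(outs) if outs else 0
    if n_outs not in (0, 1, 2):
        raise ValueError(f"not valid IP notation: {ip!r}")
    return int(whole) + n_outs / 3


def nine_inning_games(ip: str) -> float:
    """Pitching time expressed in 9-inning games."""
    return true_innings(ip) / 9


# Curt Schilling's 1997 WAR-table line: 254.1 IP
print(f"{true_innings('254.1'):.3f} innings")     # 254.333 innings
print(f"{nine_inning_games('254.1'):.2f} games")  # 28.26 games
```

Taking IP as a string keeps the out digit explicit.  A float like 254.1 invites exactly the tenths misreading this conversion exists to avoid.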

WAR has Curt’s best seasons as 2001 and 2002.  WAR’s ranking score of 2003 means it thinks even more highly of Curt Schilling than this model does, but not by much.  The two systems are very close on this career.  I do not rank career WAR by weighting factor, since WAR does not have additive properties and would return deceptive results.  The ranking score is a more accurate way to evaluate a player with regard to WAR.

That is all for now.  More HOF career analysis coming soon.  Until then ….