Author Archives: mea

Bill James: Judge and Altuve

Update 2/18/2018: I started writing this a couple of months ago and couldn’t finish after reading Bill James quote OPS, a very flawed baseball statistic.  That is a tangent I don’t really care about.  If people want to throw around these kinds of stats, that makes the results from this data model more valuable.

tl;dr This model reflects a team’s Win/Loss record based upon its players.  WAR does not.  This model uses the estimated Win/Loss record based upon Bill James’s own Pythagorean Expectation (PE) formula.  We could, as James suggested for the Yankees, adjust to real wins and losses very easily, but we don’t.  That is all….

——————————cut here——————————

I got directed to this article: Judge and Altuve | Articles | Bill James Online written by Bill James and there are some interesting tidbits that I need to comment on.  It’s difficult thinking about baseball in the winter and I have been putting this off.  This post will be updated throughout the winter as I think of something different to say.

The article is about the value of Judge and Altuve as MVP.  This data model is clear and unambiguous: Aaron Judge is the AL MVP, ranked right behind Giancarlo Stanton, who we also have as the NL MVP.  Here are our top 5 MLB players.

Rank WAA Name_TeamID Pos
+001+ 10.00 Corey_Kluber_CLE PITCH
+002+ 9.66 Giancarlo_Stanton_MIA RF
+003+ 8.92 Aaron_Judge_NYA RF-DH
+004+ 8.55 Max_Scherzer_WAS PITCH
+005+ 8.38 Paul_Goldschmidt_ARI 1B

AL, NL, Pitchers and batters are all ranked together in this data model.   Apparently Bill James agrees with the MVP voters that Altuve is AL MVP.  Whatever.  He has some interesting things to say in the article which is a good read.  Here’s a blurb:

The first indication that there is a problem with applying the normal and general relationship is this.   The Yankees, by the normal and general relationship, should have won 102 games, when in fact they won only 91.   That’s a BIG gap. The Yankees played poorly in one-run games (18-26) and other close games, which is why they fell short of their expected wins.   I am getting ahead of my argument in making this statement now, but it is not right to give the Yankee players credit for winning 102 games when in fact they won only 91 games.   To give the Yankee players credit for winning 102 games when in fact they won only 91 games is what we would call an “error”.   It is not a “choice”; it is not an “option”.   It is an error.

When you express Judge’s RUNS. . .his run contributions. . . when you express his runs as a number of wins, you have to adjust for the fact that there are only 91 wins there, when there should be 102.  (The Astros should have won 101 games and did win 101 games, so that’s not an issue with Altuve.)  But back to the Yankees, one way to do that is to say that the Yankee win contributions, rather than being allowed to add up to 102, must add up to 91.

He makes an assumption which is not true.   WAR does not add up to anything, as we have shown here over and over.  This model has the sum of Yankees players adding up to exactly 102 games according to Bill James’s Pythagorean Expectation formula.  Bill James is talking about this model, not WAR.

There is a simple method to make this adjustment in this model.  We would tax NYA 11 games and ding every player according to playing time.  According to the table above, Aaron Judge has a WAA=8.92.    He would lose 0.6 on an adjustment and drop to 8.32.  Since everyone in the league would be adjusted, the rankings could change, but not by enough for Jose Altuve to move ahead.
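Here is a minimal sketch of that tax, assuming the gap is split in direct proportion to playing time.  The function name, the playing-time units, and the "rest_of_roster" entry are placeholders made up for illustration; only Judge’s 8.92 WAA, the 11 game gap, and the 0.6 ding come from the discussion above.

```python
def tax_by_playing_time(waa, playing_time, gap):
    """Subtract `gap` wins from a team's players in proportion to playing time.

    waa          -- dict of player -> WAA before the adjustment
    playing_time -- dict of player -> playing time (any consistent unit)
    gap          -- estimated wins minus real wins for the team
    """
    total_time = sum(playing_time.values())
    return {
        name: waa[name] - gap * playing_time[name] / total_time
        for name in waa
    }


# Hypothetical inputs: the playing-time units are chosen so that Judge's
# share of the tax reproduces the 0.6 quoted in the text (6 / 110 of 11).
waa = {"Aaron_Judge_NYA": 8.92, "rest_of_roster": 10.0}
playing_time = {"Aaron_Judge_NYA": 6.0, "rest_of_roster": 104.0}

adjusted = tax_by_playing_time(waa, playing_time, gap=11)
print(round(adjusted["Aaron_Judge_NYA"], 2))
```

The same function would be run for every team in the league with that team’s own gap, which is why the rankings could shift a little.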

Right now I don’t want to do this.  Runs are the currency that achieves wins and they are what players accumulate above or below average.  We can assign run production with virtually 100% accuracy.  This gets converted to wins according to Pythagorean Expectation which is the WAA value measure players carry from team to team when they get traded.  This value measure is the same for all leagues from MLB to A+ to JPL to even little league.  This model must work for all leagues the same. The disparity between PE and real wins and losses can be magnified in lower leagues which could obfuscate players who are only there to prove themselves, where wins and losses may not even matter to those teams.
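For reference, here is a small sketch of the runs-to-wins conversion using the classic form of Pythagorean Expectation with an exponent of 2.  The function names are our own, and the run totals in the example are illustrative inputs that are not taken from this post.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Estimated winning percentage from runs scored and runs allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def expected_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Estimated wins over a season of `games` games."""
    return games * pythagorean_win_pct(runs_scored, runs_allowed, exponent)


# Illustrative run totals (assumed for the example).
wins = expected_wins(858, 660)
print(round(wins))
```

A team that scores exactly as many runs as it allows comes out at .500, which is the average that WAA is measured against.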

I’m torn by this.  It can easily be done with this model.  It would create a split in valuations and, as with Sabermetrics, raise the question of which value is correct.  I prefer the value that reflects the estimated wins and losses.  In the end I don’t think it would matter that much anyway.  Perhaps we’ll run some numbers and see.

The logic for applying the normal and usual relationship is that deviations from the normal and usual relationship should be attributed to luck. There is no such thing as an “ability” to hit better when the game is on the line, goes the argument; it is just luck.   It’s not a real ability.

We don’t know what causes a team to exceed or not exceed expectations.   We can’t predict the future.  We can only estimate it.   Reality is the goal post; all estimates are a source of error.  Luck has nothing to do with it.


Update 2/18/2018:  This is where I need to stop commenting.

Lazy days of Winter

I haven’t done much since the end of the 2017 season.   Spring training is starting up but we can’t make any analysis until May, possibly mid May.   Hopefully by that time the simulations will have been completed and run against our dataset that consists of the last 7 years of daily lines for each game.   If this system can show a clear margin we have something.  If not then we need to figure out why.

A prototype web portal to this data model will be developed throughout the season.  This blog concentrates on the Cubs but this analysis can be done for any team, for any season.  The web portal will be turned into a prototype app for anyone to quickly look up anything about baseball through the lens of this data model.

The next post will be a career-based ranking built upon whatever we can discern from various teams’ player rosters.  The Cubs have quite a few high ranking career players now compared to the earlier years after the Ricketts purchased the team.  You can click on and peruse various careers by drilling down.  This data covers everything up to and including 2013, so it’s quite out of date.  That will be made current.

Apparently now publishes detailed box scores and event data for each game in XML.  We used to rely on who publish this yearly in December well after the season is over.  We have certain stats, like RISP, which we introduce here, that can’t be calculated without event data.  We should probably write another RISP article before May as well.

And finally, the Cubs just acquired Yu Darvish for $21M/year for 6 years.  I am not an MBA, and we don’t have a good way to determine whether that’s a good deal.  All we can do is look at his career.

Year WAA Name_TeamID Pos Rank
2012 0.5 Yu_Darvish_TEX PITCH XXXXX
2013 5.1 Yu_Darvish_TEX PITCH +029+
2014 2.5 Yu_Darvish_TEX PITCH +121+
2016 1.8 Yu_Darvish_TEX PITCH +188+
2017 1.2 Yu_Darvish_TEX PITCH +157+
2017 1.1 Yu_Darvish_LAN PITCH +157+
Total 12.2

He has had a solid career with his best season in 2013 which is his upside potential.  If he pitches like that the Cubs will be in good shape.  For 2017 you have to add TEX and LAN together giving him a WAA=2.3.  This ranks him #157 for the season, top 200.

The above numbers don’t mean much unless put into context with the entire MLB player dataset from 1900-2016.   That will be fodder for another article before May.  Until then….

About this site

This site is a public logbook on the development of a baseball data model that measures baseball player value and ranks them from best to worst.  This model contains the current 30 MLB franchises, their minor league affiliates, and their historical teams.   It covers all seasons and all players from 1900 – 2017.

Browse the Table of Contents for more information.  We covered the 2017 season extensively.  Not much was published here in 2016, even though the Cubs won, and posting was sporadic in the years before that, starting in September 2013.

The goal of this data model is to become an app that lets a user quickly evaluate a player being talked about without knowing anything about baseball.   They can then become the smartest person in the room about that player.  There will be a handicapping component, but that is a work in progress and hasn’t been proven.  We have a solid proof for the WAA measure, something WAR does not have.

The Simulation Part 1

It has been difficult to think about baseball these last few months.  We still have a Giancarlo Stanton article and a rebuttal to a Bill James article in the works, and then there are these simulations that need to be done before next season.  We’ll describe these simulations a little at a time.

The basic concept of tiers is not too complicated.  It was first described here and is summarized below.

  1. average + 1 standard deviation
  2. average + 1/2 standard deviation
  3. average
  4. average – 1/2 standard deviation
  5. average – 1 standard deviation.

We compute a moving average WAA and standard deviation and separate all groups into tiers.  A starter is a single player, a lineup is the sum of WAA for those in the starting lineup, and relief is the sum of relievers listed on the roster.  This is estimated for past years using event data from
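As a rough sketch of the tiering step, the snippet below assigns a WAA value to whichever of the five tier levels listed above it sits closest to.  The nearest-level rule, the function name, and the sample numbers are assumptions for illustration; the exact cutoffs used by the model may differ, and `population` stands in for whatever window the moving average covers.

```python
from statistics import mean, pstdev

# Tier 1 .. Tier 5 expressed as offsets from the average, in standard deviations.
TIER_OFFSETS = [1.0, 0.5, 0.0, -0.5, -1.0]


def assign_tier(value, population):
    """Return the tier (1 = best, 5 = worst) whose level is nearest to `value`."""
    avg = mean(population)
    sd = pstdev(population)
    if sd == 0:
        return 3  # no spread at all, so everyone is average
    levels = [avg + k * sd for k in TIER_OFFSETS]
    nearest = min(range(len(levels)), key=lambda i: abs(value - levels[i]))
    return nearest + 1


# Made-up WAA values for a group of starters.
sample = [-2.0, -1.0, 0.0, 1.0, 2.0]
print([assign_tier(v, sample) for v in sample])
```

The same function works for a starter (one player’s WAA), a lineup (a sum of WAA), or a relief squad (a sum of WAA), as long as the population is made up of the same kind of value.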

Throughout the year we described the concept of lineup-starter pairs.  There is also a lineup-relief pair.  These two pairs represent matchups for an entire game.   Since the lineup tier will be the same for both pairs we get the following tuples.

lineup-(starter-relief)  for visitors-(home) 
lineup-(starter-relief)  for home-(visitors)

There are 5 tiers each for lineup, starter, and relief, making 125 different combinations.  There are two tuples per game and we know the number of runs scored for all games since 1970.  Each tuple will have its own run distribution which we will use to draw from in the simulation.

The simulation will run a number of iterations.  Each iteration will generate a random number which will pull a value out of lineup, starter, and relief to determine who wins that game.

Innings pitched by starters will vary by tier level.  Tier 1 starters pitch more innings than Tier 5 for obvious reasons.  The run value pulled from the starter distribution will also have innings pitched.  This will be used to calculate the relief runs.

Each simulated game will have two tuples.  Let’s run through an example.  Suppose we see a home team with a Tier 1 starter, Tier 5 lineup, and a Tier 1 relief.  The visiting team has a Tier 5 starter, Tier 1 lineup, and a Tier 5 relief.  The two game tuples are:

home 5-(1-1)
visitor 1-(5-5)

The simulation runs 10K, 100K, or even 1M iterations pulling numbers from the distribution.  Whichever lineup gets more runs wins that game.  Ties are discarded and we calculate a win percentage which can translate into our estimated line for the game.  We’ll run the above estimate when the simulation scripts are completed and post an update.
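The loop described above can be sketched as follows.  This is a simplified version that assumes each tuple’s run distribution is just a list of observed runs per game; the real scripts will pull starter and relief runs separately.  The function names, the sample run lists, and the moneyline conversion are our own assumptions for illustration.

```python
import random


def simulate_win_pct(home_runs, visitor_runs, iterations=100_000, seed=0):
    """Home win percentage from random draws out of two run distributions.

    home_runs / visitor_runs are lists of observed runs per game for each
    tuple.  Tied draws are discarded, as described in the text.
    """
    rng = random.Random(seed)
    home_wins = 0
    visitor_wins = 0
    for _ in range(iterations):
        h = rng.choice(home_runs)
        v = rng.choice(visitor_runs)
        if h > v:
            home_wins += 1
        elif v > h:
            visitor_wins += 1
    decided = home_wins + visitor_wins
    if decided == 0:
        raise ValueError("every draw was a tie")
    return home_wins / decided


def moneyline(win_pct):
    """Convert a win probability into an American-style line with no vig."""
    if not 0 < win_pct < 1:
        raise ValueError("win_pct must be strictly between 0 and 1")
    if win_pct >= 0.5:
        return round(-100 * win_pct / (1 - win_pct))
    return round(100 * (1 - win_pct) / win_pct)


# Made-up run distributions for the two tuples of one game.
home = [2, 3, 3, 4, 5, 7]
visitor = [1, 2, 3, 3, 4, 6]
pct = simulate_win_pct(home, visitor, iterations=10_000, seed=1)
print(pct, moneyline(pct))
```

Fixing the seed makes a run repeatable, which helps when comparing 10K, 100K, and 1M iterations for the same matchup.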

This line can then be compared with the real lines to determine whether or not this is a betting opportunity based upon a different set of requirements which will be described later.   From this we can determine whether or not this method can beat the lines, and by what percentage, based upon our historical data.

There are 125 different tuples.  Since the order of two tuples in a game does not matter that means the number of possible pairs of tuples is:

n*(n+1)/2 where n=125

125*126/2 = 7875 combinations

If we do 100K iterations per game pair that would be almost a billion games to simulate.
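The counts above are easy to check by enumerating the tuples directly.  This is just a quick sketch using the standard library; the variable names are our own.

```python
from itertools import combinations_with_replacement, product

TIERS = range(1, 6)

# Every (lineup, starter, relief) tier combination.
tuples = list(product(TIERS, repeat=3))

# Unordered pairs of tuples, where a tuple may also be paired with itself.
pairs = list(combinations_with_replacement(tuples, 2))

total_games = len(pairs) * 100_000

print(len(tuples))   # 125
print(len(pairs))    # 7875
print(total_games)   # 787500000
```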

In subsequent parts we will explore different pairs perhaps using real games from last season that I kind of eyeballed and guessed at.  Since the later playoff games had such close matchups perhaps that’s what makes home/away field advantage/disadvantage more important.  We observed this in both ALCS and NLCS.  This will have to be explored when we go further down this rabbit hole.  Until then….

Lineup Relief Table Part 2

We recalculated the lineup-relief tables using innings pitched instead of games.  This is a more accurate measure.  The table below is reformatted to show the lineup-relief combo pair,  innings pitched per game and the average runs per inning scored by  lineup, also given up by relief.

Lineup-Relief IP/Game Avg Runs
1-1 2.77 0.481
1-2 2.79 0.493
1-3 2.84 0.500
1-4 2.73 0.522
1-5 2.91 0.540
2-1 2.83 0.438
2-2 2.75 0.476
2-3 2.81 0.468
2-4 2.90 0.501
2-5 2.88 0.502
3-1 2.72 0.416
3-2 2.75 0.440
3-3 2.73 0.454
3-4 2.76 0.477
3-5 2.83 0.495
4-1 2.77 0.415
4-2 2.72 0.426
4-3 2.78 0.438
4-4 2.72 0.466
4-5 2.78 0.458
5-1 2.77 0.388
5-2 2.65 0.413
5-3 2.63 0.407
5-4 2.61 0.421
5-5 2.69 0.467

Not sure the innings pitched/game column means anything for each lineup-relief pair.  The average runs range from a low of 0.388 for the worst lineup against the best relief to 0.540 for the best lineup against the worst relief.  This should be expected, and the range is rather significant and should provide for interesting results in simulation.

Below is the table above condensed making it easier to see the trend.

Lineup-Relief IP/Game Avg Runs
1-5 2.91 0.540
2-4 2.90 0.501
3-3 2.73 0.454
4-2 2.72 0.426
5-1 2.77 0.388

Average runs scored goes down with worse lineups facing better relief squads as we would expect.  The data looks correct so far.  It’s possible that the best lineup against the worst relief has highest IP/Game because the best lineup will  knock out starters faster than worse lineups making relief pitchers pitch more innings regardless of value.

Since we’re here let’s do this for lineup-starters as well.  Same table format as above.

Lineup-Starter IP/Game Avg Runs
1-1 6.81 0.471
1-2 6.38 0.533
1-3 5.99 0.567
1-4 5.84 0.575
1-5 5.87 0.602
2-1 6.74 0.466
2-2 6.44 0.484
2-3 6.00 0.540
2-4 5.85 0.555
2-5 5.90 0.571
3-1 6.85 0.426
3-2 6.52 0.466
3-3 6.09 0.510
3-4 5.93 0.534
3-5 5.95 0.542
4-1 6.86 0.419
4-2 6.49 0.450
4-3 6.13 0.484
4-4 5.92 0.521
4-5 5.95 0.521
5-1 6.94 0.400
5-2 6.70 0.410
5-3 6.18 0.469
5-4 6.07 0.470
5-5 6.02 0.502

The difference between the two extremes in lineup-starter combos is around 0.2 runs per inning.  For lineup-relief combos that difference is around 0.15 runs per inning.  The innings pitched per game column shows how higher tier pitchers pitch more innings, which should be expected.  The high is 6.94 for the 5-1 lineup-starter pair, and it drops to 5.87 for the 1-5 pair at the other extreme (the lowest value in the table is 5.84 for the 1-4 pair).

Below is a condensed version of the above table.

Lineup-Starter IP/Game Avg Runs
1-5 5.87 0.602
2-4 5.85 0.555
3-3 6.09 0.510
4-2 6.49 0.450
5-1 6.94 0.400

The trend of average runs follows what we expect, with the best lineups facing the worst starters scoring the most runs, which decreases as starter value increases and lineup value decreases.  We will use the inning numbers for simulation.
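To give a feel for how the inning numbers could feed the simulation, here is a rough sketch that turns the condensed table rows into expected runs per game for one lineup.  It assumes a 9 inning game where relief covers whatever the starter does not pitch, which is our simplification and not necessarily how the final scripts will work.  The function and dictionary names are made up for this example.

```python
# (lineup tier, starter tier) -> (starter IP/game, runs per inning)
STARTER = {
    (1, 5): (5.87, 0.602),
    (2, 4): (5.85, 0.555),
    (3, 3): (6.09, 0.510),
    (4, 2): (6.49, 0.450),
    (5, 1): (6.94, 0.400),
}

# (lineup tier, relief tier) -> runs per inning
RELIEF = {
    (1, 5): 0.540,
    (2, 4): 0.501,
    (3, 3): 0.454,
    (4, 2): 0.426,
    (5, 1): 0.388,
}


def expected_runs(lineup, starter, relief, game_innings=9.0):
    """Average runs a lineup scores against a given starter and relief tier."""
    starter_ip, starter_rpi = STARTER[(lineup, starter)]
    relief_rpi = RELIEF[(lineup, relief)]
    # Assumption: relief pitches the innings the starter leaves behind.
    relief_ip = max(game_innings - starter_ip, 0.0)
    return starter_ip * starter_rpi + relief_ip * relief_rpi


best_vs_worst = expected_runs(1, 5, 5)   # Tier 1 lineup vs Tier 5 pitching
worst_vs_best = expected_runs(5, 1, 1)   # Tier 5 lineup vs Tier 1 pitching
print(round(best_vs_worst, 2), round(worst_vs_best, 2))
```

Only the five condensed rows are loaded here; the full 25 row tables would drop into the same two dictionaries.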

That is all for now.  The next step is running simulations.