Tag Archives: waa

Best pitching the game has ever seen

On Tuesday, April 26, 2016 at 10:23:49 AM UTC-5, Michael Sacks wrote:
> > Perhaps the Feldman and Clevenger for Arrieta and Strop is karma paying us
> > back for Brock for Broglio.
> Over the short term (~24 starts between last year and this year),
> there’s a case to be made he’s the best pitcher period baseball has
> ever seen.  His last 20-something starts stacks up with Gibson’s best
> streak during his ’68 season, and other dominant parts of seasons.

Oh boy.  Making this statement is like saying Tiger Woods was the
best golf has ever seen between the last 5 holes of the 2nd round
to the first 13 holes of the 3rd round.

That said, let’s have some fun with this.  I can’t compare 24-start
periods because event data only goes back to the mid-60s, and before
that even daily box scores get sketchy, so let’s just compare full
seasons.  Let’s assume 24 starts is about 2/3 of a season, even
though you shouldn’t extrapolate.

Bob Gibson had a very low ERA of 1.12 in 1968 over 300 innings
pitched, which is phenomenal, but the league average for runs
scored that year was also very low.  ERA does not take league
averages into account.  Here are my model’s top single-year pitching
performances post-1900.

1 20.6 Cy_Young_BOS 1901
2 16.8 Walter_Johnson_MIN 1912
3 16.1 Pete_Alexander_PHI 1915
4 15.5 Carl_Hubbell_SFN 1936
5 15.3 Lefty_Grove_OAK 1930
6 15.2 Walter_Johnson_MIN 1913
7 15.2 Dolf_Luque_CIN 1923
8 15.1 Pedro_Martinez_BOS 2000
9 15.0 Jack_Taylor_CHN 1902
10 14.9 Kevin_Brown_MIA 1996

So it’s Cy Young, namesake of the award Arrieta recently won, that Arrieta must beat.

Looking back at the 2015 dailies, Arrieta’s last 24 starts began on
6/22/2015.  He went into that game with a +1.0 WAA.  He ended the
season at +11.5, so he gained +10.5 between 6/22 and the end
of the 2015 season.  He’s at +2.3 this year, so together he’s at
+12.8 WAA for his last 24 starts.

If we extrapolate those 24 starts, that +12.8 becomes +19.2
for a full season.  That still doesn’t get him over Cy Young
but it’s damned close.  I must repeat: extrapolating
like this is inappropriate since Cy Young actually pitched
a full season, while the extrapolation simply asks “what if”
for the missing 1/3 of Arrieta’s.
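
The arithmetic above can be sketched in a few lines of Python.  This is only an illustration of the numbers quoted in this post; the function names are made up for the example, and the 2/3-of-a-season fraction is the same rough assumption stated earlier.

```python
# WAA figures quoted in the post.
waa_before_streak = 1.0   # WAA going into the 6/22/2015 start
waa_end_2015 = 11.5       # WAA at the end of the 2015 season
waa_2016 = 2.3            # WAA so far in 2016

# Rough assumption from the post: 24 starts is about 2/3 of a season.
SEASON_FRACTION = 2 / 3


def streak_waa(before, end_of_year, this_year):
    """WAA gained from the start of the streak through today."""
    return (end_of_year - before) + this_year


def extrapolate(waa, fraction=SEASON_FRACTION):
    """Scale a partial-season WAA up to a full season."""
    return waa / fraction


last_24 = streak_waa(waa_before_streak, waa_end_2015, waa_2016)
full_season = extrapolate(last_24)
print(round(last_24, 1), round(full_season, 1))
```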

It’s possible Pedro Martinez, Kevin Brown, Greg Maddux, Clemens,
Gooden, Johnson, etc. who played in the modern era might have had
24 game stretches like Arrieta’s as well.

And as always.  Don’t mean to jinx Arrieta.  Knock on Kerry Wood.

2014 Event Files have arrived

From http://retrosheet.org/

What’s New

  • 12/14/2014: Game accounts, boxes, and play-by-play data files for 2014; many other updates and additions. See [Games/Regular Season]

The 2014 event files have arrived at retrosheet.org.  I need to process them to add game state information.  After the files have been processed they go through a second set of scripts to generate run data and error data.  Run data is a set of records, one for every run, that marks the batter who scored, the batter credited with the RBI, the pitcher charged with that run, the type of play that caused that run, etc.  Error records identify the fielder charged with that error and how many runs resulted from that error, along with other info.

Once all this data is generated the third set of scripts can count various things and not have to worry about figuring out game state or who hit in what run.  Calculations like RISP and GWRBI are two stats dependent on traversing game events.  Those stats will be coming soon as well as day by day graphs of player WAA for every player, every year since 1974, the earliest year game event data is considered complete.

Brute Force Proof of Pythagorean Expectation

Except for stats that occur in the future, the set of baseball statistics is finite so we should be able to run the Pythagorean Expectation formula through Proof by Exhaustion or Brute Force Proof.   First let’s run through an example of the original Bill James’  simple PE formula:

We’ll use the 2013 Chicago Cubs as an example.

2013 Chicago Cubs

The Cubs won 66 and lost 96 games in 2013.  This means

W-L = Actual WAA = 66 – 96 = -30

Actual WAA is the WAA not estimated, the WAA that really happened.  We will call the WAA as estimated by Pythagorean Expectation PE WAA.

In 2013 the Cubs scored 602 runs and gave up 689 runs.  Thus:

Rs = 602

Ra = 689

Applying the simple PE formula:

PE Win% = (Rs)**2/(Rs**2 + Ra**2) = (602)**2/(602**2 + 689**2) = 0.433

#Wins = PE Win% * (Number Games) = 0.433 * 162 = 70.15

#Loss = (Number Games) – #Wins = 91.85

PE WAA = #Wins – #Loss = 70.15 – 91.85 = -21.7

There is a difference between estimated WAA (PE WAA) and Actual WAA.   This difference in the estimation happens because other factors also contribute to generating wins and losses.  We can guess at some of those factors like efficient field managers, players that choke under pressure, or simple bad luck but none of those factors are part of the formula we want to prove.

The only thing we know for a fact is the size of the error.

Error = | Actual WAA – PE WAA | = | -30 – (-21.7) | = 8.3
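
The worked example above can be reproduced with a short Python sketch.  The function names are invented for this illustration, and the code carries full precision instead of rounding the winning percentage to 0.433 first, so intermediate values differ slightly from the hand calculation before rounding.

```python
def pe_win_pct(runs_scored, runs_allowed):
    """Simple Pythagorean Expectation with an exponent of 2."""
    rs2 = runs_scored ** 2
    ra2 = runs_allowed ** 2
    return rs2 / (rs2 + ra2)


def pe_waa(runs_scored, runs_allowed, games):
    """Estimated wins minus estimated losses."""
    wins = pe_win_pct(runs_scored, runs_allowed) * games
    losses = games - wins
    return wins - losses


# 2013 Chicago Cubs, using the numbers from the post.
actual_waa = 66 - 96
estimated_waa = pe_waa(602, 689, 162)
error = abs(actual_waa - estimated_waa)

print(round(pe_win_pct(602, 689), 3))  # PE Win%
print(round(estimated_waa, 1))         # PE WAA
print(round(error, 1))                 # Error
```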

The WAA values of all the players who played for the Cubs in 2013 add up to the PE WAA (-21.7) and not the Actual WAA.   There is a proof of the formula used by this data model to compute WAA that shows this to be true.

Now that we know how to calculate error we can run these numbers for each team in 2013, add them together and get a total error for all 30 teams.  In the next post we will show error results for 3 different variations of Pythagorean Expectation including the original, the one we showed in the above example.
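
A minimal sketch of that league-wide total might look like the following.  The `TeamSeason` record and `total_error` function are hypothetical names for this example, and only the Cubs row comes from this post; the other 29 teams would have to be filled in from real 2013 data.

```python
from typing import NamedTuple


class TeamSeason(NamedTuple):
    name: str
    wins: int
    losses: int
    runs_scored: int
    runs_allowed: int


def pe_waa(team):
    """PE WAA from the simple exponent-2 formula."""
    games = team.wins + team.losses
    rs2 = team.runs_scored ** 2
    ra2 = team.runs_allowed ** 2
    pe_wins = rs2 / (rs2 + ra2) * games
    return pe_wins - (games - pe_wins)


def total_error(teams):
    """Sum of |Actual WAA - PE WAA| over every team supplied."""
    return sum(abs((t.wins - t.losses) - pe_waa(t)) for t in teams)


# Only the Cubs are listed here; add the remaining teams to get the
# league-wide total described above.
teams = [TeamSeason("CHN", 66, 96, 602, 689)]
print(round(total_error(teams), 1))
```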

MLB Relief staff ranking

Note:  This site experienced a 72 hour outage recently.  Currently there is no fail over site when the Internet goes out.

Below are the top 5 relief staffs in MLB.  These were computed by taking all relievers listed for each team and adding up their WAAs and innings pitched.  WAAs are additive amongst any set of players.  The sum of all WAAs for an entire team must and always does (because the formulae have proofs) add up to the W-L delta for a team according to the Pythagorean Expectation derived formula that converts runs into winning percentage.

In this case we chose as our set of players all relievers for each team.

Team WAA IP
SDN 8.7 339.6
SEA 7.3 351.4
OAK 7.1 375.9
SFN 6.6 352.3
WAS 6.0 307.9

Although San Diego doesn’t have a very good team this year, their relief squad stands out as best in MLB.  First let’s do a Winning Percentage calculation for SDN as shown in the previous two posts:

Win% =  0.5*WAA/(number of games played) + 0.5

number of games played = 339.6/9 = 37.7

Winning % = 0.5*(8.7)/(37.7) + 0.5 = 0.615
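
The same conversion can be sketched in Python for all five staffs listed above.  The function name is made up for this example; the 9 innings per game divisor is the one used in the calculation above.

```python
INNINGS_PER_GAME = 9


def pitcher_win_pct(waa, innings_pitched):
    """Convert a pitching WAA into a winning percentage."""
    games = innings_pitched / INNINGS_PER_GAME
    return 0.5 * waa / games + 0.5


# (team, WAA, innings pitched) for the top 5 relief staffs above.
staffs = [
    ("SDN", 8.7, 339.6),
    ("SEA", 7.3, 351.4),
    ("OAK", 7.1, 375.9),
    ("SFN", 6.6, 352.3),
    ("WAS", 6.0, 307.9),
]

for team, waa, ip in staffs:
    print(team, round(pitcher_win_pct(waa, ip), 3))
```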

Here are all relievers registered as having played for SDN this season:

Rank WAA IP ERA G W L Name_Tm Pos
1 2.2 33.0 1.09 33 1 0 Huston_Street_SDN PITCH
2 2.0 43.0 1.88 42 4 2 Joaquin_Benoit_SDN PITCH
3 2.0 45.0 2.00 49 3 3 Dale_Thayer_SDN PITCH
4 1.5 40.0 2.25 48 1 0 Alex_Torres_SDN PITCH
5 1.0 32.0 2.53 34 1 2 Kevin_Quackenbush_SDN PITCH
6 0.9 21.0 2.14 17 0 0 Blaine_Boyer_SDN PITCH
7 0.7 10.3 0.87 3 0 1 Jason_Lane_SDN PITCH
8 0.2 7.3 2.45 8 0 0 Troy_Patton_SDN PITCH
9 -0.0 2.0 4.50 1 0 0 Hector_Ambriz_SDN PITCH
10 -0.2 43.0 3.98 28 2 2 Tim_Stauffer_SDN PITCH
11 -0.7 32.7 4.68 36 0 2 Nick_Vincent_SDN PITCH
12 -0.7 30.3 4.75 16 1 0 Donn_Roach_SDN PITCH

Perhaps the biggest advantage the WAA weighting has over any other measure in baseball Sabermetrics is its ability to accurately compare not only individual players but sets of players; in this case each team’s relief staff.

In the last few posts we showed how to calculate winning % for a starting pitcher, an entire relief staff backing him up, and a batting lineup. We can take a harmonic mean of 2/3 starting pitcher to 1/3 relief staff winning percentages to get a pitching component percentage. Then we can take a harmonic mean of pitching and batting (lineup) components to get an overall winning percentage for a particular day. Compare this with the winning percentage derived for an opposing team and it’s possible to estimate a winning probability for each team where:

P(home team) + P(away team) = 1

But this is fodder for a future post.   The WAA  weighting value derived from this data model  makes it possible to make these kinds of calculations.

Update 8/22/2018:  We’re from the future to correct the above.  It was unclear when the above was written how to take relief value, lineup value, and starter value to compute a probability.  The above is not how it’s done.   It was attempted but it could never pass the historical test against a past lines dataset.  WinPct is included to provide context for the WAA value used for ranking.

Converting WAA to Winning Percentage Ctd.

In this post we’ll take a look at how to convert WAAs into a winning percentage for batters.  There might not be a lot of value in doing this since WAA itself is a much easier and more accurate measure for comparing and contrasting different players.  Below are the current top three batters in MLB.

Rank WAA BA OBP PA RBI R Name_Tm Pos
4 5.9 0.302 0.390 454 76 72 Mike_Trout_ANA CF
5 5.6 0.294 0.344 393 79 54 Jose_Abreu_CHA 1B-DH
6 5.6 0.244 0.326 451 73 70 Josh_Donaldson_OAK 3B

The formula is the same as before:

Win% =  0.5*WAA/(number of games played) + 0.5

We know WAA but what is the number of games played for Mike Trout?  Batters use the following formula:

G = PA/38.3

The number 38.3 is considered by this model a baseball constant.  It represents the average number of plate appearances per game per team since 1980.  Just as we use 9 innings per game to estimate the number of games for pitchers, the 38.3 PA/game is good enough to estimate the number of games for batters.  Since Mike Trout usually has 5 plate appearances per game it will take him 7 or 8 actual games to accumulate enough PAs to represent a single game.  A batting squad consists of 9 players and not all those players get an equal number of plate appearances.  Now we can calculate Mike Trout’s winning percentage by the following:

Winning Percentage = 0.5*(5.9)/(454/38.3) + 0.5 = 0.749
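
Here is a small Python sketch of the batter conversion.  The function name is invented for the example; the 38.3 PA per game constant and Trout’s numbers come from this post.

```python
PA_PER_GAME = 38.3  # the model's plate appearances per game per team


def batter_win_pct(waa, plate_appearances):
    """Convert a batting WAA into a winning percentage."""
    games = plate_appearances / PA_PER_GAME
    return 0.5 * waa / games + 0.5


# Mike Trout: WAA 5.9 over 454 plate appearances.
trout_games = 454 / PA_PER_GAME
print(round(trout_games, 1))                # equivalent full games
print(round(batter_win_pct(5.9, 454), 3))   # winning percentage

# Actual games needed to collect one game's worth of PAs at 5 per game.
print(round(PA_PER_GAME / 5, 2))
```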

What does this mean?  Not much for a single player.  If a team had 9 Mike Trouts batting or simply let Mike Trout bat all the time with a pitcher like the three spotlighted in the previous post while playing an average squad, that team should win around 75% of the time.

Let’s take a look at the lineup yesterday for ANA.

WAA Name_Tm PA
2.7 Kole_Calhoun_ANA 279
5.9 Mike_Trout_ANA 446
3.4 Albert_Pujols_ANA 446
0.9 Josh_Hamilton_ANA 235
1.5 Erick_Aybar_ANA 409
0.9 Howie_Kendrick_ANA 436
-0.1 Efren_Navarro_ANA 71
0.4 David_Freese_ANA 309
0.4 Hank_Conger_ANA 187
16.0 TOTAL 2818

Winning Percentage = 0.5*(16.0)/(2818/38.3) + 0.5 = 0.609
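
The lineup calculation can be sketched the same way: sum the WAA and PA columns from the table above, then apply the formula.  The variable and function names here are just for illustration.

```python
PA_PER_GAME = 38.3


def batter_win_pct(waa, plate_appearances):
    """Convert a batting WAA into a winning percentage."""
    games = plate_appearances / PA_PER_GAME
    return 0.5 * waa / games + 0.5


# (WAA, name, PA) rows copied from the ANA lineup table above.
lineup = [
    (2.7, "Kole_Calhoun_ANA", 279),
    (5.9, "Mike_Trout_ANA", 446),
    (3.4, "Albert_Pujols_ANA", 446),
    (0.9, "Josh_Hamilton_ANA", 235),
    (1.5, "Erick_Aybar_ANA", 409),
    (0.9, "Howie_Kendrick_ANA", 436),
    (-0.1, "Efren_Navarro_ANA", 71),
    (0.4, "David_Freese_ANA", 309),
    (0.4, "Hank_Conger_ANA", 187),
]

# WAA is additive, so the lineup's WAA is the sum of its players' WAA.
total_waa = sum(waa for waa, _, _ in lineup)
total_pa = sum(pa for _, _, pa in lineup)
lineup_pct = batter_win_pct(total_waa, total_pa)

print(round(total_waa, 1), total_pa, round(lineup_pct, 3))
```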

At 63-41, Anaheim has a 0.606 winning percentage overall, almost matching the winning percentage of the lineup they put out last night.  This suggests Anaheim’s pitching is around average, which it is according to this.

Converting WAA to winning percentages can be useful for groups like lineups, relief staffs, and starting pitching.   The quality of starting pitching changes daily.  Lineups can also change on a daily basis introducing swings in winning percentages for that particular group that may differ from a team’s total accumulated WAA for batting.  Significant changes can occur when trades get made or good players get injured or return.  Having the ability to compute winning percentages of entire hitting and pitching staffs can be useful when determining probabilities in  head to head matchups.