Tag Archives: baseball constants

Converting WAA to Winning Percentage Ctd.

In this post we’ll take a look at how to convert WAA into a winning percentage for batters. There might not be a lot of value in doing this for individual players, since WAA itself is an easier and more accurate measure for comparing players. Below are the current top three batters in MLB.

Rank WAA BA OBP PA RBI R Name_Tm Pos
4 5.9 0.302 0.390 454 76 72 Mike_Trout_ANA CF
5 5.6 0.294 0.344 393 79 54 Jose_Abreu_CHA 1B-DH
6 5.6 0.244 0.326 451 73 70 Josh_Donaldson_OAK 3B

The formula is the same as before:

Win% =  0.5*WAA/(number of games played) + 0.5

We know WAA, but what is the number of games played for Mike Trout? For batters, use the following formula:

G = PA/38.3

This model treats 38.3 as a baseball constant. It represents the average number of plate appearances per team per game since 1980. Just as we use 9 innings per game to estimate the number of games for pitchers, 38.3 PA/game is good enough to estimate the number of games for batters. Since Mike Trout usually gets about 5 plate appearances per game, it takes him 7 or 8 actual games to accumulate enough PAs to represent a single game (38.3/5 ≈ 7.7). A batting squad consists of 9 players, and those players do not all get an equal number of plate appearances. Now we can calculate Mike Trout’s winning percentage as follows:

Winning Percentage = 0.5*(5.9)/(454/38.3) + 0.5 = 0.749
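The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the two formulas in this post; the function names are made up for the example and are not part of the data model.

```python
PA_PER_GAME = 38.3  # the post's constant: average plate appearances per team per game


def batter_games(pa: float) -> float:
    """Estimate equivalent full games from plate appearances (G = PA/38.3)."""
    return pa / PA_PER_GAME


def win_pct(waa: float, games: float) -> float:
    """Win% = 0.5*WAA/games + 0.5."""
    return 0.5 * waa / games + 0.5


# Mike Trout: WAA 5.9 over 454 plate appearances
trout = win_pct(5.9, batter_games(454))
print(round(trout, 3))  # 0.749
```

A player with a WAA of zero comes out at exactly 0.500, which is the sanity check for the formula.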

What does this mean? Not much for a single player. If a team had 9 Mike Trouts batting, or simply let Mike Trout bat all the time, with a pitcher like the three spotlighted in the previous post while playing an average squad, that team should win around 75% of the time.

Let’s take a look at yesterday’s lineup for ANA.

WAA Name_Tm PA
2.7 Kole_Calhoun_ANA 279
5.9 Mike_Trout_ANA 446
3.4 Albert_Pujols_ANA 446
0.9 Josh_Hamilton_ANA 235
1.5 Erick_Aybar_ANA 409
0.9 Howie_Kendrick_ANA 436
-0.1 Efren_Navarro_ANA 71
0.4 David_Freese_ANA 309
0.4 Hank_Conger_ANA 187
16.0 TOTAL 2818

Winning Percentage = 0.5*(16.0)/(2818/38.3) + 0.5 = 0.609

At 63-41, Anaheim has a 0.606 winning percentage overall, almost matching the winning percentage of the lineup they put out last night. This suggests Anaheim’s pitching is around average, which it is according to this.
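The lineup calculation is the same formula applied to summed WAA and summed PA. Here is a rough sketch using the numbers from the lineup table; the `group_win_pct` helper is a name invented for this example.

```python
PA_PER_GAME = 38.3  # average plate appearances per team per game, per this post

# (name, WAA, PA) copied from the ANA lineup table above
lineup = [
    ("Kole_Calhoun_ANA", 2.7, 279),
    ("Mike_Trout_ANA", 5.9, 446),
    ("Albert_Pujols_ANA", 3.4, 446),
    ("Josh_Hamilton_ANA", 0.9, 235),
    ("Erick_Aybar_ANA", 1.5, 409),
    ("Howie_Kendrick_ANA", 0.9, 436),
    ("Efren_Navarro_ANA", -0.1, 71),
    ("David_Freese_ANA", 0.4, 309),
    ("Hank_Conger_ANA", 0.4, 187),
]


def group_win_pct(players):
    """Apply Win% = 0.5*WAA/(PA/38.3) + 0.5 to a group's summed WAA and PA."""
    total_waa = sum(waa for _, waa, _ in players)
    total_pa = sum(pa for _, _, pa in players)
    return 0.5 * total_waa / (total_pa / PA_PER_GAME) + 0.5


lineup_pct = group_win_pct(lineup)
team_pct = 63 / (63 + 41)  # Anaheim's 63-41 record
print(f"lineup {lineup_pct:.3f}  team {team_pct:.3f}")
```

The same helper would work for a relief staff or a starting rotation if the games estimate were swapped for the innings-based one used for pitchers.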

Converting WAA to winning percentages can be useful for groups like lineups, relief staffs, and starting pitching. The quality of starting pitching changes daily. Lineups can also change daily, which introduces swings in that group’s winning percentage that may differ from what the team’s total accumulated batting WAA would suggest. Significant changes can occur when trades are made or when good players get injured or return. Being able to compute winning percentages for entire hitting and pitching staffs can be useful when determining probabilities in head-to-head matchups.

What is WAR?

Introduction

WAR means Wins Above Replacement and has become a very popular stat. Like this data model’s Wins Above Average, WAR tries to ascribe wins to players, producing a number that can be used to rank those players against each other. There are several variants of WAR, and they can differ greatly from site to site. These next few posts will examine WAR in the context of its results and not dwell on its mathematical basis or lack thereof. There are no mathematical proofs posted for any WAR variants.

In my analysis of WAR I’ll focus on results from baseball-reference.com, which is an excellent site for any kind of historical baseball research. They include WAR valuations in their tables, which I will compare with this data model’s WAA on a player and team basis. In the end it’s results that matter. Which stat can best distill the myriad baseball statistics into a single ranking value?

Here’s a blurb from Baseball-Reference.com’s “WAR Explained”:

There is no one way to determine WAR. There are hundreds of steps to make this calculation, and dozens of places where reasonable people can disagree on the best way to implement a particular part of the framework. We have taken the utmost care and study at each step in the process, and believe all of our choices are well reasoned and defensible. But WAR is necessarily an approximation and will never be as precise or accurate as one would like.

This is not true of this data model’s WAA. There is only one way to calculate WAA, and there are mathematical proofs that describe that calculation. Analyzing the results of WAR will provide a better understanding of the math behind WAA.

Let’s Get Started

My first question, at a macro level, was about the sum of all player WAR values for each season. What does WAR add up to? I chose to limit the study to 1970-2013, which should provide a deep enough data set to figure out what they’re doing. WAR Totals 1970-2013 is a large table I made listing league WAR totals for batters and pitchers. To keep things short, here are a few years pulled from that table:

Year BAT_WAR BAT% PITCH_WAR PITCH% #Teams BAT/Team PITCH/Team
1970 472.7 0.59 328.0 0.41 24 19.7 13.7
1971 471.7 0.59 327.7 0.41 24 19.7 13.7
1987 518.9 0.59 355.3 0.41 26 20.0 13.7
1988 515.7 0.59 353.7 0.41 26 19.8 13.6
2012 598.7 0.59 409.9 0.41 30 20.0 13.7
2013 598.6 0.59 410.3 0.41 30 20.0 13.7

The first thing I noticed is that batters make up about 60% of total league WAR and pitchers about 40% (0.59 and 0.41 in the table). This is consistent every year. WAR combines the fielding class with the batter class. I assume WAR treats pitchers and batters as equal entities, which would lead to the logical conclusion that WAR values fielding at 20%, batting at 40% and pitching at 40%. The team average WAR for batters is almost exactly 20. It is unclear whether someone decided that would make a good average and worked backwards, or whether it just somehow came out that way.
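The shares and per-team averages in the table can be rechecked from the league totals. Below is a small sketch using three rows from the table above; `war_summary` is a helper name made up for this example.

```python
def war_summary(bat_war, pitch_war, teams):
    """Return (batter share of league WAR, batter WAR per team, pitcher WAR per team)."""
    total = bat_war + pitch_war
    return (
        round(bat_war / total, 2),
        round(bat_war / teams, 1),
        round(pitch_war / teams, 1),
    )


# (year, batter WAR, pitcher WAR, number of teams) from the table above
rows = [
    (1970, 472.7, 328.0, 24),
    (1987, 518.9, 355.3, 26),
    (2013, 598.6, 410.3, 30),
]

for year, bat, pitch, teams in rows:
    print(year, war_summary(bat, pitch, teams))
```

The batter share stays at 0.59 across all three league sizes, which is what suggests the split is built into the framework rather than an accident of any one season.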

From the historical table I can tell WAR is conserved. There is only so much WAR value to go around among the players. In WAA the sum over all pitchers and batters is zero. With WAR, in a 30-team league, the batters sum to roughly 600 and the pitchers to roughly 400. If fielding accounts for one third of the batter total, as assumed above, then about 200 of that 600 is reserved for fielding.

At least WAR adds up to something.  In future posts on this topic we will determine its efficacy.

RISP Averages

The table below runs through the last 24 years to calculate 1) what percentage of runs are scored with runners in scoring position and 2) what percentage of plate appearances occur with runners in scoring position. After tabulating all 24 years of data, the two RISP averages are: #1 is almost exactly 3/4 and #2 is almost exactly 1/3.

Year RISP_Runs Total_Runs RISP_Runs% RISP_PAs Total_PAs RISP_PAs%
1990 13859 17919 0.77 52681 160316 0.33
1991 13994 18128 0.77 52186 160746 0.32
1992 13534 17341 0.78 52807 160545 0.33
1993 16149 20862 0.77 58151 174564 0.33
1994 11866 15751 0.75 41791 124483 0.34
1995 14864 19554 0.76 52496 156703 0.34
1996 17165 22832 0.75 59086 177261 0.33
1997 16256 21602 0.75 58218 175541 0.33
1998 17589 23296 0.76 62816 188280 0.33
1999 18498 24690 0.75 63963 189692 0.34
2000 18639 24969 0.75 63801 190261 0.34
2001 17007 23199 0.73 60788 186976 0.33
2002 16884 22408 0.75 61084 186615 0.33
2003 17200 22978 0.75 61237 187449 0.33
2004 17342 23376 0.74 61806 188539 0.33
2005 16711 22325 0.75 60363 186292 0.32
2006 17575 23599 0.74 61943 188071 0.33
2007 17732 23322 0.76 62470 188623 0.33
2008 16944 22585 0.75 61270 187631 0.33
2009 16763 22419 0.75 61202 187079 0.33
2010 16074 21308 0.75 60062 185553 0.32
2011 15564 20808 0.75 58949 185245 0.32
2012 15285 21016 0.73 57609 184179 0.31
2013 14874 20255 0.73 57110 184873 0.31
TOTAL 388368 516542 0.75 1403889 4285517 0.33
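As a quick check, the two averages can be recomputed from the TOTAL row and compared with the fractions 3/4 and 1/3. This is just arithmetic on the numbers in the table; the variable names are chosen for the example.

```python
# Totals for 1990-2013 from the TOTAL row of the table above
risp_runs, total_runs = 388368, 516542
risp_pas, total_pas = 1403889, 4285517

run_share = risp_runs / total_runs  # share of runs scored with RISP
pa_share = risp_pas / total_pas     # share of plate appearances with RISP

print(f"runs with RISP: {run_share:.3f} (3/4 = {3 / 4:.3f})")
print(f"PAs with RISP:  {pa_share:.3f} (1/3 = {1 / 3:.3f})")
```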

Probability Distribution of Batter Counts

This post describes the answer to a question that can only be answered from event data.

What is the Probability Distribution of Batter Counts?

One day I wondered what the most frequent count was for a batter at the point of making an out or getting a hit or walk. One of the fields in the event data shows the final batter count of each plate appearance. There are 12 possible batter counts, from 0-0 to 3-2. The script simply runs through all event records, tabulates each type of count, then divides by the total number of plate appearances. I chose to just do the 2012 season, although I could have included more seasons. I think one season is a deep enough data set to come up with a pretty accurate probability distribution. This distribution can only be derived from event data.
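The tabulation described above might look something like the sketch below. It is not the original script: the input is assumed to be a plain list of final counts written as two-character strings (balls then strikes), and the sample records are made up for illustration.

```python
from collections import Counter


def count_distribution(final_counts):
    """Tally final batter counts and divide each tally by the total plate appearances."""
    tally = Counter(final_counts)
    total = sum(tally.values())
    return {count: n / total for count, n in tally.items()}


# Hypothetical final counts for six plate appearances ("12" means 1 ball, 2 strikes)
sample = ["12", "32", "00", "12", "22", "01"]
dist = count_distribution(sample)

# Print the counts from most to least frequent
for count, p in sorted(dist.items(), key=lambda item: -item[1]):
    print(count, round(p, 2))
```

With real event data, the list would come from whichever field holds the final count of each plate appearance.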

The table below shows the probability distribution of the 12 final counts. The P column sums to 1, apart from rounding. The most common final count is 1-2, followed closely by 2-2 and then 3-2. This makes sense, since a batter must have two strikes in order to strike out. The counts 3-0 and 2-0 are the least likely. In the Ct column, the first digit is balls and the second is strikes (32 means 3-2). The distribution below can be used for betting purposes.

Ct    Total       P
32    23099    0.13
31    8351    0.05
30    3828    0.02
22    25468    0.14
21    9677    0.05
20    4554    0.02
12    26943    0.15
11    16145    0.09
10    12716    0.07
02    16026    0.09
01    17128    0.09
00    20291    0.11

How many hits does it take to score a run?

Short answer: 2

There are many ways of deriving this. You can get total hits and total runs by adding all player stats together or all team stats together. You can add up all game line scores too. Having the event data, I chose to just add up hits and runs from the event entities. Running a longitudinal study like this requires calculating each season separately and then adding up the results. I chose to gather data from 1980 to present. With a data set this deep, the number of hits per run comes out to almost exactly 2 (1,317,126 hits / 671,115 runs ≈ 1.96), and it stays between 1.8 and 2.2 in every season. This number can be considered a constant.

Year Hits Runs Hits/Run
TOTAL 1317126 671115 2.0
1980 38004 18054 2.1
1981 24059 11147 2.2
1982 37437 18111 2.1
1983 37363 18170 2.1
1984 37251 17922 2.1
1985 36389 18216 2.0
1986 36751 18545 2.0
1987 37802 19882 1.9
1988 36208 17378 2.1
1989 36283 17403 2.1
1990 36811 17919 2.1
1991 36089 18128 2.0
1992 36048 17341 2.1
1993 41074 20862 2.0
1994 29731 15751 1.9
1995 36968 19554 1.9
1996 42316 22832 1.9
1997 41465 21602 1.9
1998 44488 23296 1.9
1999 45326 24690 1.8
2000 45244 24969 1.8
2001 43876 23199 1.9
2002 43267 22408 1.9
2003 44039 22978 1.9
2004 44511 23376 1.9
2005 43975 22325 2.0
2006 45062 23599 1.9
2007 44960 23322 1.9
2008 43963 22585 1.9
2009 43511 22419 1.9
2010 42545 21308 2.0
2011 42257 20808 2.0
2012 42053 21016 2.0
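The ratio can be recomputed from the table to one more decimal place than shown. The sketch below uses the TOTAL row and a few seasons copied from the table; the dictionary name is invented for the example.

```python
# (hits, runs) copied from the table above
seasons = {
    1981: (24059, 11147),
    1987: (37802, 19882),
    2000: (45244, 24969),
    2012: (42053, 21016),
}
total_hits, total_runs = 1317126, 671115  # TOTAL row, 1980-2012

overall = total_hits / total_runs
print(f"overall hits per run: {overall:.2f}")  # 1.96

for year, (hits, runs) in seasons.items():
    print(year, round(hits / runs, 2))
```

The shortened 1981 season and the high-scoring 2000 season mark the two ends of the range in the table.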