Properties of a Value stat

The value stat generated by this data model, WAA, has additive and other properties that no other measure of value has.  In this post we’ll see how this all works with current data.  Since it’s May I’ll release the WAAs associated with players.  We’ll do ranking in a couple of days.

WAA has additive properties.  The WAA for a group of players is the sum of their individual WAA.  Let’s see how that works for the Cubs’ lineup today.

WAA Name_TeamID Pos PA
-0.25 Kyle_Schwarber_CHN LF 111
0.69 Kris_Bryant_CHN 3B 115
0.73 Anthony_Rizzo_CHN 1B 114
0.42 Ben_Zobrist_CHN RF 86
0.67 Addison_Russell_CHN SS 103
0.46 Jason_Heyward_CHN CF 94
0.13 Willson_Contreras_CHN C 72
-0.15 Jon_Lester_CHN P 11
-0.27 Javier_Baez_CHN 2B 66
TOTAL WAA=2.44 PA=772 Win%=0.561

The table above shows the CHN lineup for today (5/2/2017) with each player’s 2017 WAA as of yesterday.  Jon Lester’s -0.15 is his batting WAA, which is separate from his pitching WAA.  The total WAA for today’s lineup is 2.44, which is above average.  But how far above average?  WAA can be converted into a win percentage according to this post.  Below is the simple formula:

Win% =  0.5*WAA/(number of games played) + 0.5

Substituting PA/38.4 for the number of games played, the Win% for the table above is:

Win% = 0.5(2.44)*38.4/772 + 0.5 = 0.561
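The arithmetic above can be sketched in a few lines of Python.  This is only an illustration: the function name `win_pct` is hypothetical, and treating the 38.4 constant as plate appearances per team game is an assumption read off the formula, not something the formula states.

```python
# Lineup rows from the table above: (WAA, Name_TeamID, PA).
LINEUP = [
    (-0.25, "Kyle_Schwarber_CHN", 111),
    (0.69, "Kris_Bryant_CHN", 115),
    (0.73, "Anthony_Rizzo_CHN", 114),
    (0.42, "Ben_Zobrist_CHN", 86),
    (0.67, "Addison_Russell_CHN", 103),
    (0.46, "Jason_Heyward_CHN", 94),
    (0.13, "Willson_Contreras_CHN", 72),
    (-0.15, "Jon_Lester_CHN", 11),
    (-0.27, "Javier_Baez_CHN", 66),
]

# Constant taken from the worked formula; reading it as plate
# appearances per team game is an assumption.
PA_PER_GAME = 38.4


def win_pct(waa, pa, pa_per_game=PA_PER_GAME):
    """Win% = 0.5 * WAA / games + 0.5, with games estimated as PA / pa_per_game."""
    games = pa / pa_per_game
    return 0.5 * waa / games + 0.5


total_pa = sum(pa for _, _, pa in LINEUP)
summed_waa = sum(waa for waa, _, _ in LINEUP)

# The displayed WAAs are rounded, so their sum can differ by 0.01 from
# the TOTAL row, which presumably comes from unrounded values.
print(total_pa, round(summed_waa, 2))
print(round(win_pct(2.44, total_pa), 3))
```

Summing the rounded per-player values gives 2.43 rather than the 2.44 in the TOTAL row, so the sketch feeds the table's own total into the formula.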

What does a Win% of 0.561 signify?  It just shows how far above average the lineup is.  If that lineup goes against 0.561 pitching, then what?  That would be a judgment call.  I haven’t figured out a mathematical way to model pitch vs. bat vs. bat vs. pitch, etc., to come up with a Win% that beats The Ouija Board.

The point is that it gives an accurate representation of the strength of a lineup.  I don’t have the PHI lineup yet to compare but it doesn’t matter for this explanation.  There is plenty of season left to do a full analysis.  Here is what The Ouija Board says about today.

DATE 05_02 8:05_PM May_2_15:08:19 PHI CHN
LINEAWAY PHI [ 0.330 ] < 0.333 >
STARTAWAY 1.85 Jeremy_Hellickson_PHI
LINEHOME CHN [ 0.688 ] < 0.688 >
STARTHOME 0.36 Jon_Lester_CHN

The number next to each starting pitcher is his WAA.  Lester is above average so far but Hellickson is doing much better this season.  CHN is around a 2-1 favorite to win today at 0.688.  Standard home field advantage is 0.540, so the Cubs are getting a good premium again.
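For anyone who wants to see where "around 2-1" comes from, here is a small sketch that converts a win probability into odds.  The helper name `odds_in_favor` is made up for this example; the only inputs are the 0.688 line and the 0.540 home field figure quoted above.

```python
def odds_in_favor(p):
    """Convert a win probability into odds in favor, expressed as x to 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    return p / (1.0 - p)


chn = odds_in_favor(0.688)   # today's CHN line
home = odds_in_favor(0.540)  # standard home field advantage
print(f"CHN {chn:.2f} to 1, typical home team {home:.2f} to 1")
```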

Here is Jeremy Hellickson’s career.

Year WAA Name_TeamID Pos Rank
2010 0.5 Jeremy_Hellickson_TBA PITCH XXXXX
2011 4.4 Jeremy_Hellickson_TBA PITCH +046+
2012 3.6 Jeremy_Hellickson_TBA PITCH +061+
2013 -5.3 Jeremy_Hellickson_TBA PITCH -009-
2014 -1.2 Jeremy_Hellickson_TBA PITCH -231-
2015 -2.5 Jeremy_Hellickson_ARI PITCH -078-
2016 2.1 Jeremy_Hellickson_PHI PITCH +158+
2017 1.8 Jeremy_Hellickson_PHI PITCH +015+
Total = 3.4

Since WAA is additive, the yearly values can be added up to get a career total.  Hellickson is another slightly above average career pitcher and he’s off to a good start this year.  It will be interesting to see how this turns out.

What is a Value Stat?

If you look at the back of an old baseball card you’ll sometimes see hundreds of numbers all aligned in rows and columns.  What does that all mean?  Since the movie Moneyball made Sabermetrics popular, numbers have been multiplying like cockroaches.  OPS, BABIP, FIP, WAR, wRC+, and on and on and on and on.  It can make your head spin.

I don’t want to get into a critique of all these now because there are too many.  Here is my critique of FIP.  tl;dr FIP doesn’t predict anything and doesn’t mean anything.

This model’s definition of a value stat is one a General Manager uses.  A game stat is something Joe Maddon uses to manage a game.  Value stats can be tied to compensation.  Player contracts are an extremely complicated math model that I don’t know enough about to cover.  This model only provides a value stat in WAA for ranking purposes.  Sabermetrics has WAR.  Although there are many variations in how WAR is calculated, there is only one calculation for WAA and, like Batting Average and ERA, it will never change.

The Wins Above Replacement (WAR) measure uses the following:

HITS --estimate--> RUNS --estimate--> WINS

The basic HITS to RUNS estimate boils down to this:

Runs Estimated = ( Hits + Walks ) * ( Total Bases) / Plate Appearances

The above is the basic foundation of that estimation.  The Sabermetrics people have made it far more complicated than it needs to be.  If I get bored later this season I’ll do some error measurements on their math using the last 50 or 100 years of baseball data.  Once they have runs they estimate WINS from that.  There are two levels of error.
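As a quick illustration of the basic HITS to RUNS estimate above, here is a direct translation into Python.  The function name `basic_runs_created` and the sample season line are made up for this sketch; the numbers are not any real player's stats.

```python
def basic_runs_created(hits, walks, total_bases, plate_appearances):
    """Runs Estimated = (Hits + Walks) * Total Bases / Plate Appearances."""
    if plate_appearances <= 0:
        raise ValueError("plate_appearances must be positive")
    return (hits + walks) * total_bases / plate_appearances


# Hypothetical season line, for illustration only.
estimate = basic_runs_created(hits=150, walks=50, total_bases=250,
                              plate_appearances=600)
print(round(estimate, 1))
```

The result is an estimate of runs, which is exactly the first of the two estimation steps described above.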

This model uses the following:

RUNS --estimate--> WINS

We know the runs with 100% accuracy and estimate WINS using Pythagorean Expectation as defined by Bill James.  We know what the error is for Pythagorean Expectation.
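Here is a minimal sketch of Pythagorean Expectation in its original Bill James form, with an exponent of 2.  Whether this model uses that exact exponent is an assumption on the part of this sketch, and the function name `pythagorean_win_pct` is made up.  The inputs are the 2013 Cubs run totals quoted elsewhere in these posts (602 scored, 689 allowed).

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected Win% = Rs^x / (Rs^x + Ra^x); x = 2 is the original James form."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# 2013 Cubs: 602 runs scored, 689 runs scored against, 162 games.
pct = pythagorean_win_pct(602, 689)
expected_wins = pct * 162
print(round(pct, 3), round(expected_wins, 1))
```

That works out to roughly 70 expected wins against the 66 the Cubs actually won that year, which gives a feel for the size of the error in the RUNS to WINS step.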

Thus, this model cares not about hits, walks, strikeouts, ground outs, double plays, home runs, stolen bases, etc.  It only cares about runs.  When MLB determines who wins or loses a game they only care about the R column, not the H column or E column or any other column.  Runs are the currency that creates wins.  We can divvy up runs with 100% accuracy even though it can be a harsh mistress to some, as we will see later on.

Game stats like K/9, OBP, WHIP, etc. are very useful for what Joe Maddon has to do to manage his players during a game.  He needs to know a pitcher’s (Walks + Hits) / Innings Pitched (WHIP) if that pitcher is coming into the game with Runners In Scoring Position (RISP).  He wants a pitcher that can throw strikeouts, perhaps throw 110mph, and not give up walks or hits very often because the game is on the line.  A value stat like WAR or WAA cannot tell you that.

If you want to argue who is better, Chris Sale or Adam Wainwright, you need to look at a value stat.  There are some limitations to WAA that we’ll get into later.  Until then.

 

Introduction to Rankings Part 1

The formula that computes WAA took me around 6 months to figure out, partly because I have forgotten and had to relearn a lot of math and mostly because it became a puzzle to get all the books to balance.  WAA is a measure of player value to a team.  It is measured in Wins a player brings to a team much like WAR.  Unlike WAR, the WAA calculation has a mathematical proof.

Right now I’m not going to present the proof because it will complicate things.  There are three years of posts explaining many aspects of this model.  You can also peruse any team up to 2014 here.  I’m two years behind updating that site because I need to do a proper database blah blah blah.  But I digress….  The efficacy of a ranking system is how accurately it ranks players.  We will compare WAA to WAR throughout this season.  WAR has no proof and although its value is conserved, it places 60% of its value onto Batters and 40% on Pitchers.  See this post.

WAA = W – L.  Its unit is wins.  If a team goes 81 – 81 for a season they have a WAA=0.  A zero means completely average.  If a player is at zero he isn’t adding wins above average but, most importantly, he isn’t dragging the team below average.

Let’s say there is a player who ended the season with a WAA=+10.  If the sum of WAA for the rest of his team equals zero, then that team will have ended the season 86-76 on the combined effort of the team, and the exceptional effort of that one player.
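The step from WAA to a record is just W – L = WAA combined with W + L = games.  A small sketch of that arithmetic follows; the function names are made up for this example and a 162 game season is assumed by default.

```python
def record_from_waa(waa, games=162):
    """Solve W - L = waa and W + L = games for (W, L)."""
    wins = (games + waa) / 2
    losses = (games - waa) / 2
    return wins, losses


def waa_from_record(wins, losses):
    """Team WAA is simply wins minus losses."""
    return wins - losses


print(record_from_waa(10))      # one +10 player on an otherwise average team
print(record_from_waa(0))       # completely average team
print(waa_from_record(66, 96))  # a 66-96 season
```

The first call reproduces the 86-76 record from the example above, and the last one shows that a 66-96 season is a team WAA of -30.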

Here’s how the books must balance in the WAA calculation:

Team(WAA) = Sum(Player(WAA)) across all players playing for the team

where Team(WAA) = W – L

Sum(Player(WAA)) across all the league for pitchers = 0
Sum(Player(WAA)) across all the league for batters = 0

The Cubs went 66-96 in 2013.  Their team WAA=-30.  This means the sum of their players must add to -30, which means there weren’t a lot of superstars on that team.  Let’s look at the end of season rank for 2013.  Here is their team status line:

BAT PITCH Rs Ra W L UR LR TeamID
-69.6 -18.2 602 689 66 96 4.1 -3.4 CHN

Yikes.  Both BAT and PITCH under water by a lot.  But we should expect that.  Let’s look at their top ten players.

Rank WAA Name_TeamID Pos
+061+ 3.63 Travis_Wood_CHN PITCH
XXXXX 1.57 Nate_Schierholtz_CHN RF
XXXXX 1.57 Alfonso_Soriano_CHN LF
XXXXX 1.26 Blake_Parker_CHN PITCH
XXXXX 1.18 Matt_Garza_CHN PITCH
XXXXX 1.11 Donnie_Murphy_CHN 3B
XXXXX 0.90 Scott_Feldman_CHN PITCH
XXXXX 0.86 Pedro_Strop_CHN PITCH
XXXXX 0.82 Scott_Hairston_CHN RF
XXXXX 0.78 Dioner_Navarro_CHN CR

Players are ranked according to WAA which is the weighting factor.  To put the above numbers in perspective here were the top three players in MLB in 2013.

Rank WAA Name_TeamID Pos
+001+ 11.28 Clayton_Kershaw_LAN PITCH
+002+ 11.09 Miguel_Cabrera_DET 3B
+003+ 10.16 Chris_Davis_BAL 1B

At Rank +061+ Travis Wood is the only Cub player in the top 200 that season.  That’s pathetic.  An average team will have 6 or 7 players in the top 200.  Rank XXXXX is purgatory, neither in the top nor bottom 200.  Here were the players dragging the Cubs down the most that season.

Rank WAA Name_TeamID Pos
XXXXX -1.26 Carlos_Marmol_CHN PITCH
XXXXX -1.30 Julio_Borbon_CHN CF
-198- -1.36 Junior_Lake_CHN LF-CF
-187- -1.45 Luis_Valbuena_CHN 3B
-160- -1.70 Shawn_Camp_CHN PITCH
-108- -2.08 Welington_Castillo_CHN CR
-087- -2.27 Jeff_Samardzija_CHN PITCH
-045- -3.17 Darwin_Barney_CHN 2B
-019- -4.49 Edwin_Jackson_CHN PITCH
-010- -5.23 Starlin_Castro_CHN SS

Starlin Castro clocked in at being the 10th worst player in MLB that season.  It takes a lot of playing time to dig that deep of a hole and unfortunately the Cubs had a manager who insisted on leading off Castro every game.  He’s doing pretty well this season and he’s one of the players we’ll follow as we go along.  Here’s an article written 9/2014 examining whether or not to trade him.

That’s enough for now.  The Cubs are doing well this season but have had a much different start than last season.  Last season they coasted as an average, 0.500 team from the second half of May until the All-Star Break but had a phenomenal April.  There won’t be any Cubs in the top ten.  This season WAS is tearing up the league and they have the most.  For the next week I’ll slowly explain more as I think of it.  Until then ….

Team status Part 2

The purpose of this post is to explain the BAT and PITCH fields in team status lines that will be used here throughout the year.  This isn’t very complicated.  These variables exist to reduce clutter in stats.  The following table lists the top 9 teams in baseball based upon their combined BAT+PITCH, which is the TOTAL column.

TOTAL BAT PITCH W L TeamID 04-19-2017
26.5 13.2 13.3 10 5 NYA
16.5 14.2 2.3 10 6 ARI
16 8.6 7.4 9 6 CIN
16 3.7 12.3 8 8 LAN
12.9 -3.4 16.3 7 7 MIN
12.4 15.1 -2.7 9 5 WAS
9.1 5.7 3.4 8 7 NYN
8 4.7 3.3 8 7 CHN
7.5 1.2 6.3 10 5 HOU

This is data up to and including 4/19/2017 games.  BAT is calculated  as follows:

BAT(RAA) = Rs(Team) – Rs(Team Average) – LR
PITCH(RAA) = Ra(Team Average) – Ra(Team) – UR

Rs = Runs scored
Ra = Runs scored against
LR = Lucky Runs above average
UR = Unearned runs above average

UR = (Total League Unearned Runs)/(number of teams) – UR(Team)
LR =  LR(Team) – (Total League Lucky Runs)/(number of teams)

The number of teams is 30 and in the first half of the 20th century it was 16.  UR(Team) is the total unearned runs a team has incurred.  Ditto for LR(Team).

UR and LR are necessary to balance the books and they do affect team BAT and PITCH.  Lucky Runs are when a run scores from something like a wild pitch or balk where no one gets an RBI.  There are very few of these but they still count in determining who wins a game and need to be accounted for.  To keep things simple just ignore them for now.  The scripts that make these tables keep track of them and also run integrity checks in case the daily stat dataset is corrupt.

Since every time a run scores it generates a run scored against:

Rs(Team Average) = Ra(Team Average) = R(League)/(number of teams)

A team that scores a lot of runs will have a high BAT(RAA) and a team that allows a lot of runs against them can have a negative PITCH(RAA), and vice versa.  A completely average team will have BAT=0 and PITCH=0, so BAT+PITCH=0.  In the above table:

TOTAL = BAT+PITCH = Rs(Team) – Ra(Team)

The above is commonly referred to as the team run differential.  This model separates batting and pitching because it makes it more clear with a single number where a team’s strength lies.  UR and LR were left out in the above to show the gist of this  calculation.
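To show how UR and LR fit back in, here is a sketch of the full BAT and PITCH formulas applied to the Cubs status line after 4/16/2017 from the Team status post (Rs=47, Ra=43, UR=-1.2, LR=-1.4).  The function names are made up, and the league average of 52.4 runs per team is an assumption back-solved from that line's BAT value rather than a figure published here.

```python
def bat_raa(rs_team, rs_avg, lr):
    """BAT(RAA) = Rs(Team) - Rs(Team Average) - LR."""
    return rs_team - rs_avg - lr


def pitch_raa(ra_team, ra_avg, ur):
    """PITCH(RAA) = Ra(Team Average) - Ra(Team) - UR."""
    return ra_avg - ra_team - ur


# Cubs status line after 4/16/2017.
rs, ra, ur, lr = 47, 43, -1.2, -1.4

# Assumed league average runs per team, back-solved from the BAT column.
league_avg = 52.4

bat = bat_raa(rs, league_avg, lr)
pitch = pitch_raa(ra, league_avg, ur)
print(round(bat, 1), round(pitch, 1), round(bat + pitch, 1))

# With UR and LR included, the league average cancels out of the total:
# BAT + PITCH = Rs - Ra - LR - UR.
print(round(rs - ra - lr - ur, 1))
```

With UR and LR set to zero the total collapses to the plain run differential, Rs(Team) – Ra(Team).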

Run differential is used in calculating Pythagorean Expectation, the long-standing formula invented by Bill James to link runs to wins.  It is used in this data model, which will be explained more when we have enough data to start ranking players.  The only columns in the above table that matter when MLB chooses who goes to the playoffs are W and L.  This early in the season there are some wild swings, which is why ranking players now is pointless.

That is all for now. Until then ….

Team status

Team status lines give the status of a team in a single line.  Here are the Cubs after 4/16/2017:

BAT PITCH Rs Ra W L UR LR TeamID
-4.0 10.6 47 43 6 6 -1.2 -1.4 CHN

The line consists of eight variables and TeamID, which is the franchise ID.  LAN (aka the LA Dodgers) also represents the Brooklyn Dodgers if we go back in time.  The eight variables are:

  • BAT – Runs Above Average (RAA) for hitting
  • PITCH – Runs Above Average for pitching
  • Rs – Runs scored
  • Ra – Runs scored against
  • W, L – Wins and losses
  • UR – Team Unearned Runs above average
  • LR – Team Lucky Runs above average

A lucky run is when a run scores but no batter gets an RBI.  MLB counts these runs in the final game score.  Unearned Runs are those not charged to a pitcher because of an error.  These runs must be counted or the books don’t balance in this data model.

When any of these Above Average figures hits zero that’s completely average.  If a team plays completely average for an entire season they end with a W-L of 81-81.  The Cubs as a team  are slightly below average with UR right now but it’s way too early to determine that.

BAT is a little underwater, PITCH above water at +10.6 and the Cubs are completely average right now at 6-6.

When I compile the 2016 event data I’ll have dailies and we can compare and contrast to last year and any year for that matter.  In two weeks I may release player rankings and introduce the concept of WAA again and compare it to WAR.  Until then ….