Pythagorean Expectation

Pythagorean Expectation bridges the divide between runs and wins and is defined simply by the following formula.

RS = runs scored

RA = runs against

PE(Win%) = RS^2 / ( RS^2 + RA^2 )
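As a minimal sketch, here is the exponent-2 form of the formula in Python. The function name, the argument names, and the 800/700 run totals are made up for illustration and are not part of the model itself.

```python
def pythagorean_win_pct(runs_scored: float, runs_against: float,
                        exponent: float = 2.0) -> float:
    """Estimate winning percentage from runs scored and runs against."""
    if runs_scored < 0 or runs_against < 0:
        raise ValueError("run totals must be non-negative")
    if runs_scored == 0 and runs_against == 0:
        raise ValueError("at least one run total must be positive")
    scored = runs_scored ** exponent
    against = runs_against ** exponent
    return scored / (scored + against)


# Hypothetical team: 800 runs scored, 700 runs against.
print(round(pythagorean_win_pct(800, 700), 3))  # 0.566
```

A team that scores exactly as many runs as it gives up comes out at .500, which is a quick sanity check on any implementation.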

One theorem for this model is as follows:

SUM(Players WAA) = Team WAA

The above states that the WAA accumulated by all the players on a specific team adds up to that team’s W-L delta at any point in the year and, in particular, at the end of the year. This is a consistency check for the model and guarantees that wins and losses get divvied up appropriately.
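The check above could be sketched like this. The player names and WAA values are placeholders, not real data, and the tolerance is an assumption to allow for floating point rounding.

```python
import math
from typing import Mapping


def players_sum_matches_team(player_waa: Mapping[str, float],
                             team_waa: float,
                             tol: float = 1e-6) -> bool:
    """Return True when the players' WAA adds up to the team's WAA."""
    total = math.fsum(player_waa.values())
    return math.isclose(total, team_waa, abs_tol=tol)


# Placeholder roster, not real players or real values.
player_waa = {"Player A": 4.5, "Player B": 2.25, "Player C": -0.75}
team_waa = 6.0

print(players_sum_matches_team(player_waa, team_waa))  # True
```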

Obviously we have no crystal ball that can determine exactly how many wins and losses a team will have. The inputs the math has to work with are a team’s runs scored and runs against totals. From there people have developed formulae to bridge from the mathematical knowns, runs, to the expected mathematical unknowns, wins. We may know how many games a team has won, but in the soup of baseball statistics for individual players we only know runs. The bridge from runs to wins must be estimated.

The Pythagorean Expectation formula stated above returns wins as a winning percentage. To convert that into the W-L delta units used by Team WAA, we use the following formula:

G = number of games

PE(Win%) = Result from Pythagorean Expectation formula

Team WAA = PE(Win%) * G - ( ( 1 - PE(Win%) ) * G )

Team WAA = ( 2 * PE(Win%) - 1 ) * G

Thus, the sum of all player WAA accumulated for each team must equal the Team WAA as estimated by Pythagorean Expectation.
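Here is a small sketch of that conversion, written both ways so the two forms of the formula can be compared. The function names and the .550 winning percentage over 162 games are illustrative assumptions.

```python
def team_waa_long_form(win_pct: float, games: int) -> float:
    """Expected wins minus expected losses."""
    expected_wins = win_pct * games
    expected_losses = (1.0 - win_pct) * games
    return expected_wins - expected_losses


def team_waa(win_pct: float, games: int) -> float:
    """Simplified form: (2 * Win% - 1) * G."""
    return (2.0 * win_pct - 1.0) * games


# Hypothetical team with a .550 expected winning percentage over 162 games.
print(round(team_waa(0.55, 162), 1))            # 16.2
print(round(team_waa_long_form(0.55, 162), 1))  # 16.2
```

A .500 team comes out at a Team WAA of 0, and every point of winning percentage above .500 is worth twice that fraction of G, since each extra win is also one fewer loss.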

This baseball model bases all its mathematics on formulae with proofs. Pythagorean Expectation has a proof according to this paper: A Derivation of the Pythagorean Won-Loss Formula in Baseball. The math is a little complicated and over my head. It derives the formula from the following assumption (from the abstract):

…We provide a theoretical justification for this formula and value of c by modelling the number of runs scored and allowed in baseball games as independent random variables drawn from Weibull distributions with the same b and c but different a…

The author assumes a particular distribution, which is fine. Since he’s not around to walk me through the proof, I chose to prove (or, more accurately, analyze) the PE formula by brute force, running it through all games from 1900 until the present. The goal was to estimate its error. Some sites treat their math with hubris, claiming that their formula is truth and that whatever reality does differently is luck. This model assumes reality is truth and that our mathematics can only estimate it within a certain margin of error.
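A brute force error estimate along these lines might look like the sketch below. It assumes one row per team-season and uses root mean square error in wins as the error measure; both are my assumptions for illustration, and the rows are made-up placeholders rather than real seasons.

```python
import math
from typing import NamedTuple, Sequence


class TeamSeason(NamedTuple):
    runs_scored: int
    runs_against: int
    wins: int
    losses: int


def pythagorean_win_pct(runs_scored: float, runs_against: float,
                        exponent: float = 2.0) -> float:
    scored = runs_scored ** exponent
    against = runs_against ** exponent
    return scored / (scored + against)


def rmse_in_wins(seasons: Sequence[TeamSeason], exponent: float = 2.0) -> float:
    """Root mean square gap between predicted and actual wins."""
    squared_errors = []
    for s in seasons:
        games = s.wins + s.losses
        predicted_wins = pythagorean_win_pct(
            s.runs_scored, s.runs_against, exponent) * games
        squared_errors.append((predicted_wins - s.wins) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Placeholder rows only; a real run would load every team-season since 1900.
sample = [
    TeamSeason(800, 700, 90, 72),
    TeamSeason(650, 720, 75, 87),
    TeamSeason(700, 700, 81, 81),
]
print(rmse_in_wins(sample))
```

Reality is the yardstick here: actual wins are the fixed reference, and the formula is what gets scored against them.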

In the next post I’ll show the results of my analysis comparing Bill James’ version of the formula (shown above), the version that uses 1.83 instead of 2 as the exponent, and the Pythagenpat version, which (spoiler alert) turned out to be the most accurate at estimating winning % from runs scored and runs against.
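For reference, the three versions differ only in the exponent. The sketch below assumes the commonly cited Pythagenpat exponent, ((RS + RA) / G)^0.287; the 0.287 constant varies a little between sources and is not taken from this post. The run totals are hypothetical.

```python
def win_pct(runs_scored: float, runs_against: float, exponent: float) -> float:
    scored = runs_scored ** exponent
    against = runs_against ** exponent
    return scored / (scored + against)


def pythagenpat_exponent(runs_scored: float, runs_against: float,
                         games: int, power: float = 0.287) -> float:
    """Exponent that scales with total runs per game (power is an assumed constant)."""
    return ((runs_scored + runs_against) / games) ** power


# Hypothetical team: 800 runs scored, 700 runs against, 162 games.
rs, ra, g = 800, 700, 162
for label, exponent in [
    ("exponent 2", 2.0),
    ("exponent 1.83", 1.83),
    ("Pythagenpat", pythagenpat_exponent(rs, ra, g)),
]:
    print(label, round(win_pct(rs, ra, exponent), 4))
```

The fixed exponents treat every run environment the same, while the Pythagenpat exponent moves up or down with the number of runs scored per game.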