The Simulation Part 1

It has been difficult to think about baseball these last few months. I still have a Giancarlo Stanton article and a rebuttal to a Bill James article in the works, and then there are these simulations that need to be done before next season. We’ll describe these simulations a little at a time.

The basic concept of tiers is not too complicated. It was first described here and is summarized below.

  1. average + 1 standard deviation
  2. average + 1/2 standard deviation
  3. average
  4. average – 1/2 standard deviation
  5. average – 1 standard deviation

We compute a moving average WAA and standard deviation and separate all groups into tiers.  A starter is a single player, a lineup is the sum of WAA for those in the starting lineup, and relief is the sum of relievers listed on the roster.  This is estimated for past years using event data from retrosheet.org.
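As a rough illustration of the tier split, here is a minimal Python sketch. It assumes the five values listed above are tier centers and that a WAA value goes to whichever center is closest. That boundary rule, and the function name assign_tier, are assumptions made for this example and not necessarily how the real scripts draw the lines.

```python
def assign_tier(waa, mean, sd):
    """Return a tier from 1 (best) to 5 (worst) for a WAA value.

    Assumption: the five listed values are tier centers and a value
    belongs to the tier whose center is closest. Exact ties between
    two centers go to the better (lower-numbered) tier.
    """
    centers = [
        mean + sd,        # Tier 1
        mean + sd / 2,    # Tier 2
        mean,             # Tier 3
        mean - sd / 2,    # Tier 4
        mean - sd,        # Tier 5
    ]
    distances = [abs(waa - center) for center in centers]
    return distances.index(min(distances)) + 1
```

The same function would apply to a starter's WAA, a lineup's summed WAA, or a relief staff's summed WAA, each measured against the moving average and standard deviation of its own group.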

Throughout the year we described the concept of lineup-starter pairs. There is a lineup-relief pair as well. Together these two pairs represent the matchups for an entire game. Since the lineup tier is the same in both pairs, we get the following tuples:

lineup-(starter-relief)  for visitors-(home) 
lineup-(starter-relief)  for home-(visitors)

There are 5 tiers each for lineup, starter, and relief, making 5 x 5 x 5 = 125 different combinations. There are two tuples per game and we know the number of runs scored for all games since 1970. Each tuple will have its own run distribution, which we will draw from in the simulation.
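To make the bookkeeping concrete, here is a small Python sketch that enumerates the 125 tuples and groups observed runs by tuple. The record layout (lineup tier, starter tier, relief tier, runs) and the name build_run_distributions are hypothetical, chosen only for this example. The real data comes from retrosheet.org event files and will need its own parsing.

```python
from collections import defaultdict
from itertools import product

TIERS = range(1, 6)

# Every (lineup, starter, relief) tier combination: 5 * 5 * 5 = 125.
ALL_TUPLES = list(product(TIERS, repeat=3))


def build_run_distributions(games):
    """Group observed runs by (lineup, starter, relief) tier tuple.

    `games` is a hypothetical iterable of records shaped like
    (lineup_tier, starter_tier, relief_tier, runs_scored),
    one record per team per game.
    """
    distributions = defaultdict(list)
    for lineup, starter, relief, runs in games:
        distributions[(lineup, starter, relief)].append(runs)
    return dict(distributions)
```

Keeping the raw list of runs for each tuple, and not just an average, is what lets the simulation draw from the actual empirical distribution.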

The simulation will run a number of iterations.  Each iteration will generate a random number which will pull a value out of lineup, starter, and relief to determine who wins that game.

Innings pitched by starters will vary by tier level. Tier 1 starters pitch more innings than Tier 5 for obvious reasons. Each run value pulled from the starter distribution will also carry the innings pitched, which will be used to calculate the relief runs.

Each simulated game will have two tuples. Let’s run through an example. Suppose we see a home team with a Tier 1 starter, Tier 5 lineup, and a Tier 1 relief. The visiting team has a Tier 5 starter, Tier 1 lineup, and a Tier 5 relief. The two game tuples are:

home 5-(1-1)
visitor 1-(5-5)

The simulation runs 10K, 100K, or even 1M iterations, pulling numbers from the distribution. Whichever lineup gets more runs wins that game. Ties are discarded and we calculate a win percentage, which can translate into our estimated line for the game. We’ll run the above estimate when the simulation scripts are completed and post an update.
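The loop described above might look something like the following Python sketch. It is a simplification under stated assumptions: each side's tuple is represented by a plain list of historical run totals, one value is drawn per side per iteration, and the starter/relief innings split is ignored. The function names are hypothetical, and the conversion to an American-style moneyline is one common way to express a win percentage as a line, assumed here for illustration.

```python
import random


def simulate_matchup(home_runs, visitor_runs, iterations=100_000, seed=None):
    """Estimate the home win percentage from two empirical run lists.

    Each iteration draws one run total per side. Ties are discarded,
    so the result is home wins divided by decided games. Returns None
    if every iteration ended in a tie.
    """
    rng = random.Random(seed)
    home_wins = 0
    visitor_wins = 0
    for _ in range(iterations):
        home = rng.choice(home_runs)
        visitor = rng.choice(visitor_runs)
        if home > visitor:
            home_wins += 1
        elif visitor > home:
            visitor_wins += 1
        # Equal run totals fall through: the tie is discarded.
    decided = home_wins + visitor_wins
    if decided == 0:
        return None
    return home_wins / decided


def win_pct_to_moneyline(p):
    """Convert a win probability to an American moneyline with no vig.

    Assumption: favorites (p >= 0.5) get a negative line, underdogs a
    positive one.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if p >= 0.5:
        return -100.0 * p / (1.0 - p)
    return 100.0 * (1.0 - p) / p
```

With these, a 60 percent home win estimate comes out to a line of roughly -150 for the home team, which is the number that would be compared against the posted line.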

This line can then be compared with the real lines to determine whether or not this is a betting opportunity, based upon a different set of requirements which will be described later. From this we can determine whether or not this method can beat the lines, and by what percentage, based upon our historical data.

There are 125 different tuples.  Since the order of two tuples in a game does not matter that means the number of possible pairs of tuples is:

n*(n+1)/2 where n=125

125*126/2 = 7875 combinations

If we do 100K iterations per game pair that would be almost a billion games to simulate.
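The arithmetic above can be double-checked with a few lines of Python. This sketch counts unordered pairs of tuples, allowing a tuple to be paired with itself, and multiplies by 100K iterations. The variable names are only for illustration.

```python
from itertools import combinations_with_replacement, product

# All 125 (lineup, starter, relief) tier tuples.
tuples = list(product(range(1, 6), repeat=3))
n = len(tuples)

# Unordered pairs where a tuple may face an identical tuple.
pairs = sum(1 for _ in combinations_with_replacement(tuples, 2))

# Brute-force count agrees with the closed form n*(n+1)/2.
assert pairs == n * (n + 1) // 2

iterations_per_pair = 100_000
total_games = pairs * iterations_per_pair
print(n, pairs, total_games)
```

That works out to 787.5 million simulated games, hence "almost a billion".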

In subsequent parts we will explore different pairs, perhaps using real games from last season that I kind of eyeballed and guessed at. Since the later playoff games had such close matchups, perhaps that’s what makes home/away field advantage/disadvantage more important. We observed this in both the ALCS and NLCS. This will have to be explored when we go further down this rabbit hole. Until then….