The Simulation Part 3

It has been 9 months since Part 2 of this series and over a year since Part 1.  Part of the delay was procrastination, but there was also little motivation to find out, and get depressed, if verification showed that results from the simulation can’t beat Vegas lines.  But the 2019 baseball season is upon us and this has to be done, so let’s get to it.  Note: Handicapping doesn’t begin here until May because there isn’t enough current year data in April.

Verification isn’t complete but we have high level results showing error.  This post will show that error for multiple systems, including the FiveThirtyEight ELO system, this data model’s simulation, a coin flip model, and more.

tl;dr Cut to the chase: According to this error calculation, ELO surpasses Vegas lines and our simulation surpasses ELO.  An adjustment had to be made to the simulation that produced the probability results posted here last season and described in Parts 1 and 2.

Although Vegas starts handicapping on day 1 of an MLB season, this model does not have enough data to make proper evaluations until May (around 1/6 of a season).  This means a betting season for each team is around 133 games give or take a few with about 150 days (5 months) of actual betting.

Each system generates a break even probability for each game.  As we saw in the last couple of years of handicapping the beginning of each Cubs series, the Vegas bet for and bet against probabilities add to more than 1.  The extra is the house spread, which guarantees the bookie always gets a cut as long as they take in equal bets on each side.  Casinos in Vegas never gamble on anything.
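To make the house spread concrete, here is a minimal sketch of how a break even probability can be read off a line.  It assumes the lines are quoted as American moneylines, which this post does not state, and the -150/+130 line is hypothetical, not taken from the data set.  The function name is made up for illustration.

```python
def break_even_probability(moneyline: int) -> float:
    """Win probability at which a bet on this American moneyline breaks even."""
    if moneyline < 0:
        # Favorite: risk |moneyline| dollars to win 100.
        return -moneyline / (-moneyline + 100)
    # Underdog: risk 100 dollars to win `moneyline`.
    return 100 / (moneyline + 100)


# Hypothetical line for one game: favorite at -150, underdog at +130.
p_for = break_even_probability(-150)     # bet for the favorite
p_against = break_even_probability(130)  # bet against the favorite
house_spread = p_for + p_against - 1     # the part above 1 is the house's cut
```

With this hypothetical line the two probabilities are 0.600 and about 0.435, so they add to roughly 1.035.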

In this error calculation there are two betting strategies: bet for a team every game, and bet against a team every game.  If the generated probabilities are accurate, each strategy would break even at the end of a season.  Predicted probabilities are too low if betting on them leads to winning, too high if it leads to losing.

Clarification 3/30/2019:  The above wording is kind of confusing.  If you BETFOR all 30 MLB teams you will also BETAGAINST all 30 teams, and mathematically that has to equal 0, except for Vegas where P(win) != 1 - P(lose).  The errors ranked in the tables below come from betting for and against one team over ~133 events.  That error should be as close to 0 as possible.

This model’s handicapping season starts in May and runs around 133 games for each team.  This is nowhere near an infinite set of events, so there will be some variation, even for the coin flip method where we know the actual probability.  The absolute values of the errors from all 30 teams are added together and divided by the total amount bet to calculate the percentage error.
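The error calculation described above can be sketched as follows.  This is an illustration under stated assumptions, not the code behind the tables: it assumes a flat $1 bet on every game, paid at the fair odds implied by the break even probability, and the function names and the sample records are hypothetical.

```python
def season_net(results, probs, bet_for=True):
    """Net dollars from a $1 bet on every game, paid at break even odds.

    results: list of bool, True when the team won the game.
    probs:   the system's win probability for the team in each game.
    """
    net = 0.0
    for won, p in zip(results, probs):
        # Break even probability of the side being backed.
        q = p if bet_for else 1.0 - p
        hit = won if bet_for else not won
        # A winning $1 bet at break even probability q pays (1 - q) / q.
        net += (1.0 - q) / q if hit else -1.0
    return net


def total_error(team_nets, total_bet):
    """Sum of absolute per-team errors divided by the total amount bet."""
    return sum(abs(n) for n in team_nets) / total_bet


# Hypothetical team: 40 wins in 133 games, every game priced at 50%.
results = [True] * 40 + [False] * 93
probs = [0.5] * 133
error_for = season_net(results, probs, bet_for=True) / 133
error_against = season_net(results, probs, bet_for=False) / 133

# Hypothetical well-priced team: wins 60 of 100 games priced at 60%.
fair_net = season_net([True] * 60 + [False] * 40, [0.6] * 100)
```

The first hypothetical team loses about 40 cents per dollar when bet for and wins the same amount when bet against.  The second nets zero, which is what an accurate probability should produce.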

Now let’s traverse the various systems examined, explain them, and look at their error for the 2018 season.  There will be a lot of tables so skip to the end for a summary.

Coin Flip

In the coin flip exercise we flip a coin 133 times, representing the number of events in a baseball season for this handicapping model, to put the following error percentages into perspective.  Even though we know a coin flip is exactly 50% heads, 50% tails, after 133 flips it rarely comes close to 50/50.  After running 10000 simulated 133 event seasons the average error converged to 0.069 with a 0.010 standard deviation.  So even if a system can produce exact probabilities for a game, it cannot expect to do better than an error of about 0.069 over the long term.
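A minimal sketch of the coin flip exercise is below.  It assumes even money payouts and that each simulated season averages the absolute error over 30 independent teams, like the TOTAL calculation.  The 30 team averaging is an assumption, not something stated above, and the function names and seed are arbitrary.

```python
import random
import statistics


def coin_flip_season_error(rng, teams=30, games=133):
    """Season error when every game is a fair coin flip bet at even money."""
    total_abs = 0
    for _ in range(teams):
        # Count heads in `games` fair flips by counting the set random bits.
        wins = bin(rng.getrandbits(games)).count("1")
        losses = games - wins
        # Net dollars from a $1 even money bet on every game.
        total_abs += abs(wins - losses)
    # Absolute errors added together, divided by the total amount bet.
    return total_abs / (teams * games)


def simulate(seasons=10000, seed=2018):
    rng = random.Random(seed)
    errors = [coin_flip_season_error(rng) for _ in range(seasons)]
    return statistics.mean(errors), statistics.stdev(errors)


mean_error, sd_error = simulate()
```

Under these assumptions the mean should land near 0.069 and the standard deviation near 0.010, in line with the figures quoted above.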

Clarification 3/30/2019: Just noticed that 133 events flipping a coin will mean no coin flipping season will end up with a perfect 50/50 split.  Changed that to 134 and the numbers didn’t change.

5050

The 5050 system assigns each team a 50% probability of winning.  If you have no information about either team, with two outcomes the probability is a coin flip.  Below are the ten largest errors for this system.

Year Type TeamID BET TYPE Error
2018 5050 BAL BETAGAINST 0.414
2018 5050 BAL BETFOR -0.413
2018 5050 BOS BETFOR 0.308
2018 5050 BOS BETAGAINST -0.308
2018 5050 HOU BETFOR 0.267
2018 5050 HOU BETAGAINST -0.267
2018 5050 OAK BETFOR 0.248
2018 5050 OAK BETAGAINST -0.248
2018 5050 KCA BETFOR -0.248
2018 5050 KCA BETAGAINST 0.248

Baltimore had a WinPct of 0.290 for 2018, which is pretty horrible and puts it at the top of this and almost every list below.  Betting against Baltimore would net you $41.40 on every $100 bet, and betting for them would lose almost the same amount.  This is because the real probability of BAL winning each game between May and September was way below 50% on average.  Boston was the opposite and won way more than 50% of its games, so betting for them would net you around $30 for every $100 bet, assuming someone would take this kind of bet.  Vegas surely won’t.

Home Field Advantage

The home field advantage exercise assigns a probability of 54% to home teams and 46% to away teams.  This is a historical average.  If you have no information other than where they are playing, then this may be a good probability to use.  Let’s see.

Year Type TeamID BET TYPE Error
2018 homefield BAL BETFOR -0.419
2018 homefield BAL BETAGAINST 0.411
2018 homefield BOS BETAGAINST -0.312
2018 homefield BOS BETFOR 0.306
2018 homefield HOU BETFOR 0.287
2018 homefield KCA BETFOR -0.254
2018 homefield HOU BETAGAINST -0.254
2018 homefield OAK BETFOR 0.253
2018 homefield OAK BETAGAINST -0.244
2018 homefield KCA BETAGAINST 0.244

The above spread looks very similar to that of the 5050 system.  The TOTAL table at the end will show they are virtually tied in error, which could indicate that in general there is no such thing as home field advantage, which is kind of counterintuitive.  Still not sure how to interpret this, but for now the numbers and the code that makes these lists look correct.

Again, the usual suspects are shown in the top ten.

Vegas

Vegas is the gold standard.  If your system can beat Vegas you can literally win money in the long term.  Break even probabilities are derived from the last daily snapshot for each game between May and September.  In Vegas the BETFOR and BETAGAINST probabilities for a game do not add to 1.  This study only cares about the break even probability assigned to each bet, so that doesn’t matter.  If you bet at the BETFOR probability an infinite number of times you will break even, unless there is error.

The following shows the top error betting strategies by team for Vegas lines.

Year Type TeamID BET TYPE Error
2018 vegas BAL BETFOR -0.298
2018 vegas OAK BETAGAINST -0.258
2018 vegas BOS BETAGAINST -0.233
2018 vegas OAK BETFOR 0.213
2018 vegas TBA BETAGAINST -0.172
2018 vegas MIL BETAGAINST -0.169
2018 vegas TBA BETFOR 0.166
2018 vegas COL BETAGAINST -0.159
2018 vegas BAL BETAGAINST 0.156
2018 vegas ARI BETAGAINST 0.145

The error is a signed fraction of the amount bet: + means you win money with that strategy, - means you lose that much.  The top ten errors for Vegas are much smaller than those of the two betting models above, which means those models would most likely lose money against Vegas lines.  We’ll see how much in subsequent parts of this series.

Even Vegas bettors were wrong betting for Baltimore, who lost what could be a record number of games.  Betting against BOS and OAK was a big loser last season too.

ELO

ELO surprised me in that it beat Vegas in this calculation.  I don’t know how Nate Silver calculates this but it’s pretty good, and we’ll see how much money it makes against Vegas lines.

Year Type TeamID BET TYPE Error
2018 elo BAL BETFOR -0.254
2018 elo OAK BETAGAINST -0.211
2018 elo OAK BETFOR 0.180
2018 elo BAL BETAGAINST 0.165
2018 elo ARI BETAGAINST 0.162
2018 elo COL BETFOR 0.147
2018 elo COL BETAGAINST -0.143
2018 elo ARI BETFOR -0.134
2018 elo MIL BETFOR 0.132
2018 elo BOS BETAGAINST -0.130

The totals will come after the next system which tops them all.

The Simulation

The Simulation, the subject of this series, is generated using data and theories from this data model, and it beats ELO and Vegas.  Note: The algorithm behind this simulation changed from last season.  It still uses away l-s, away l-r, home l-s, home l-r tiering but there was something missing that needed to be added.

Year Type TeamID BET TYPE Error
2018 sim ARI BETFOR -0.169
2018 sim TBA BETFOR 0.165
2018 sim OAK BETAGAINST -0.165
2018 sim ARI BETAGAINST 0.156
2018 sim PHI BETAGAINST 0.154
2018 sim BAL BETFOR -0.153
2018 sim LAN BETAGAINST -0.143
2018 sim LAN BETFOR 0.142
2018 sim CIN BETFOR 0.136
2018 sim PHI BETFOR -0.121

The above spread is much different from those of all the models above.

TOTAL

Year Type TeamID BET TYPE Error
2018 sim TOTAL BETAGAINST 0.058
2018 sim TOTAL BETFOR 0.061
2018 elo TOTAL BETAGAINST 0.064
2018 elo TOTAL BETFOR 0.074
2018 vegas TOTAL BETAGAINST 0.074
2018 vegas TOTAL BETFOR 0.081
2018 homefield TOTAL BETAGAINST 0.150
2018 5050 TOTAL BETAGAINST 0.152

The above table is the rundown of all the systems and their error using data from 2018.  Although the errors differ in 2016 and 2017, the ranking is the same.  Absolute values of errors were added together and divided by the number of events (games) to get the total error %.

The next part of this series will simulate betting seasons for 2016, 2017, and 2018.  This data model has a dataset of 6 daily snapshots of all Vegas lines from July 2014 until present, and an end of betting snapshot back to 2011.  For now only the last three years will be examined here.  Eventually all years will be compiled.  Until then ….