# Simulation Reboot Part 3

To test the integrity of the database used in simulation we need to run tests.  With inaccurate data or bugs in scripts, the estimated probability the simulation produces will be inaccurate.  In this part we’ll look at Tier Combo data from three baseball eras, 2000-2019, 1980-1999, and 1950-1969, to test the integrity of this data model.

Real team WAA, which uses real team wins and losses, the only stat in baseball that determines who makes the playoffs, was tiered in Part 2 of this series.  There is no dispute over real team WAA but there may be dispute over how this data model calculates it for players.  This exercise will demonstrate whether the player WAA and theories espoused by this data model have any merit.

A baseball season is much like any long race such as a marathon, the Tour de France, or the Indy 500.  Everyone is equal at the start, and as the race proceeds contestants become more and more separated, so winners, losers, and the mediocre become more and more defined.

Real team WAA is simply wins - losses.  This data model calculates and assigns WAA to players so that the sum of WAA for all players on a team equals that team’s wins minus losses.  In April and much of May teams are bunched together in real team WAA, and so are players, which makes tiering much more error prone.  This model now doesn’t start handicapping until day 60, which is around the third week in May nowadays.  This allows the standard deviations for lineups, starters, and relief squads used to calculate tiers to increase, meaning teams are separated enough to somewhat determine who is truly good this season and who is not.

Much like marathons or Indy 500s, teams and players often crash and burn by the end of season.  This model quickly adjusts to reflect that.  Stats like batting averages do not.

There are two types of tier combos used in simulation: lineup -> starter and lineup -> relief.  Each game contains two pairs: one pair for the away team and one pair for the home team.

Tier combos are calculated by subtracting the pitching component tier number (starter or relief) from the lineup tier number.  Tier numbers are calculated by this simple formula:

Tier Number = 2 * ( WAA - league WAA average ) / league standard deviation

WAA for a lineup is the sum of player WAA for that lineup.  The league WAA average and standard deviation are running values taken over the last 3 lineups of each of the 30 teams (90 lineups).  A snapshot is taken at the beginning of each day, then averages and tier numbers for each team are calculated.
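The running league snapshot described above can be sketched roughly as follows.  This is only an illustration: the class name `LeagueSnapshot`, its methods, and the choice of a population standard deviation (`pstdev`) are assumptions, not the model’s actual code.

```python
from collections import deque
from statistics import mean, pstdev

LINEUPS_PER_TEAM = 3  # the "last 3 lineups" window from the text


class LeagueSnapshot:
    """Hypothetical container for each team's most recent lineup WAA sums."""

    def __init__(self):
        self.recent = {}  # team id -> deque holding at most 3 lineup WAA sums

    def add_lineup(self, team, player_waas):
        # WAA for a lineup is the sum of player WAA for that lineup.
        window = self.recent.setdefault(team, deque(maxlen=LINEUPS_PER_TEAM))
        window.append(sum(player_waas))

    def league_stats(self):
        # With 30 teams and full windows this covers 90 lineups.
        values = [waa for window in self.recent.values() for waa in window]
        # Assumption: population standard deviation; the post does not say.
        return mean(values), pstdev(values)
```

A daily run would call `add_lineup` for every lineup used the previous day and then read `league_stats()` once before tiering.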

WAA for a starter relies on a single player.  WAA for relief is the sum over a relief squad.  Relief squads are estimated from event data and are pretty accurate.

Tier numbers are floating point numbers.  When subtracted to make a tier combo, the result gets rounded up or down to make an integer.  Right now tier numbers have a range of -4 to +4 and tier combos have a range of -6 to +6.  The simulator only cares about tier combos.
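Here is a minimal sketch of the tier number formula and the tier combo subtraction.  The formula comes straight from the text; capping values at the stated ranges and rounding halves upward are assumptions, since the post only says the result is "rounded up or down".

```python
import math


def clamp(value, low, high):
    """Keep value inside [low, high]."""
    return max(low, min(high, value))


def tier_number(waa, league_mean, league_sd):
    """Tier Number = 2 * (WAA - league WAA average) / league standard deviation."""
    raw = 2.0 * (waa - league_mean) / league_sd
    # Assumption: values beyond the stated -4..+4 range are capped.
    return clamp(raw, -4.0, 4.0)


def tier_combo(lineup_tier, pitching_tier):
    """Lineup tier minus pitching tier (starter or relief), as an integer."""
    diff = lineup_tier - pitching_tier
    nearest = math.floor(diff + 0.5)  # assumption: round to nearest, halves up
    # Assumption: results beyond the stated -6..+6 range are capped.
    return clamp(nearest, -6, 6)
```

For example, a lineup at tier 2.0 facing a starter at tier -1.3 gives a difference of 3.3, which lands in tier combo 3.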

The run used to make the tables below looks at all games between 6/1 and 8/31.  Tiers fluctuate too much in April and May, and in September roster expansion can distort roster value.  Although we may handicap games in September and late May, we’re sticking to a much narrower window for the dataset simulation draws from.

The tables below show all the tier combo sets from -6 to +6, with columns for earned runs per inning (R/Inn) and outs recorded per game (Outs) for both lineup -> relief and lineup -> starter.

First let’s look at the modern era from 2000-2019 which encompasses around 25,000 baseball games from 6/1 to 8/31.

### 2000 – 2019 Tier Combos

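| TC | Lineup -> Relief R/Inn | Lineup -> Relief Outs | Lineup -> Starter R/Inn | Lineup -> Starter Outs |
|---|---|---|---|---|
| -6 | 0.359 | 8.76 | 0.353 | 19.86 |
| -5 | 0.391 | 8.93 | 0.372 | 19.40 |
| -4 | 0.390 | 8.86 | 0.409 | 18.93 |
| -3 | 0.415 | 9.07 | 0.432 | 18.47 |
| -2 | 0.424 | 9.14 | 0.449 | 18.19 |
| -1 | 0.429 | 9.09 | 0.473 | 17.88 |
| 0 | 0.442 | 9.21 | 0.488 | 17.71 |
| 1 | 0.462 | 9.38 | 0.512 | 17.42 |
| 2 | 0.470 | 9.30 | 0.514 | 17.47 |
| 3 | 0.488 | 9.47 | 0.534 | 17.26 |
| 4 | 0.490 | 9.37 | 0.546 | 17.05 |
| 5 | 0.526 | 9.62 | 0.585 | 16.90 |
| 6 | 0.561 | 9.60 | 0.600 | 16.87 |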

A Tier Combo of -6 is a terrible lineup facing a very good relief squad or starter.  The opposite is true for a Tier Combo of +6.  The above shows runs per inning against starters going from 0.353 at TC = -6 to 0.600 at TC = +6, the best lineups vs. the worst starters.  Runs per inning increase by almost the same amount for the lineup -> relief combos, from 0.359 to 0.561.

The number of outs for starters goes from 19.86 outs per game with the best starters facing the worst lineups down to 16.87 outs for the worst starters facing the best lineups.  Divide by 3 to get innings.  Outs per game for relief do not vary much between -6 and +6, probably because the number of outs a relief staff must pitch has more to do with the starter than with the value of the relief squad.

Relief gives up fewer runs per inning than starters at every tier combo from -4 up, which should be expected.  Tier Combo 0 is even steven between lineups and relief or starter.  The starter runs per inning at Tier Combo 0 is almost exactly league average for this 20 year span.

All runs counted for pitchers above are earned runs.  When determining who wins a baseball game, the commissioner counts unearned runs equally with earned runs.  This model counts and tiers unearned runs separately for use in simulation because all runs must be accounted for to make the books balance here.  A pitcher should not be blamed for runs that are not his fault, and an official scorekeeper has kept track of that for every play in every game since the beginning of baseball.
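Since the table counts earned runs and outs, a few rows can be restated in more familiar units.  The sketch below converts outs to innings (divide by 3) and earned runs per inning to earned runs per 9 innings; the helper names are made up for illustration and the three rows are copied from the 2000-2019 starter columns.

```python
OUTS_PER_INNING = 3
INNINGS_PER_GAME = 9

# (tier combo, starter earned runs per inning, starter outs per game)
STARTER_ROWS = [(-6, 0.353, 19.86), (0, 0.488, 17.71), (6, 0.600, 16.87)]


def outs_to_innings(outs):
    """Divide by 3 to get innings, as the text says."""
    return outs / OUTS_PER_INNING


def runs_per_nine(runs_per_inning):
    """Earned runs per 9 innings, the same scale as ERA."""
    return runs_per_inning * INNINGS_PER_GAME


for tc, r_inn, outs in STARTER_ROWS:
    print(f"TC {tc:>2}: {outs_to_innings(outs):.2f} innings/start, "
          f"{runs_per_nine(r_inn):.2f} earned runs per 9")
```

On that scale the even steven row (Tier Combo 0) works out to roughly 4.4 earned runs per 9 innings for starters.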

The next table will show the 1980 to 1999 era.

### 1980 – 1999 Tier Combos

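| TC | Lineup -> Relief R/Inn | Lineup -> Relief Outs | Lineup -> Starter R/Inn | Lineup -> Starter Outs |
|---|---|---|---|---|
| -6 | 0.361 | 8.04 | 0.355 | 20.84 |
| -5 | 0.378 | 8.40 | 0.374 | 20.44 |
| -4 | 0.392 | 8.11 | 0.390 | 19.99 |
| -3 | 0.375 | 8.19 | 0.409 | 19.51 |
| -2 | 0.404 | 8.10 | 0.435 | 19.18 |
| -1 | 0.416 | 8.12 | 0.456 | 18.87 |
| 0 | 0.418 | 8.41 | 0.466 | 18.77 |
| 1 | 0.449 | 8.27 | 0.477 | 18.46 |
| 2 | 0.465 | 8.49 | 0.500 | 18.37 |
| 3 | 0.460 | 8.24 | 0.503 | 18.25 |
| 4 | 0.458 | 8.67 | 0.536 | 17.92 |
| 5 | 0.504 | 8.63 | 0.532 | 17.92 |
| 6 | 0.495 | 8.92 | 0.553 | 18.03 |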

The league had 26 teams for most of this era and went to 30 teams in 1998, so there were fewer pitchers than today.  A 30 team league will have around 150 starters, a 26 team league 130.  The above shows narrower differences between -6 and +6 tier combos for both relief and starter: the starter spread is 0.198 runs per inning versus 0.247 in 2000-2019, and the relief spread is 0.134 versus 0.202.  That should be expected because talent is more concentrated.

This is a problem for simulation that is still being worked out.  As we go back to 1950-1969 we get to 16 team leagues with around half the number of players.  It may not be possible without some kind of adjustment to pull values from a tier combo in a 24 or 16 team league when we’re handicapping a 30 team league with much higher disparity of talent.

As we go back in time starters pitch more outs and relief fewer.  This means we can’t simply pull a pitcher’s innings pitched and earned runs from an earlier era and use them directly in simulation either.

Below is a look at the Tier Combo spread from 1950 to 1969.

### 1950 – 1969 Tier Combos

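| TC | Lineup -> Relief R/Inn | Lineup -> Relief Outs | Lineup -> Starter R/Inn | Lineup -> Starter Outs |
|---|---|---|---|---|
| -6 | 0.320 | 7.57 | 0.325 | 22.06 |
| -5 | 0.361 | 6.99 | 0.344 | 21.50 |
| -4 | 0.358 | 7.14 | 0.356 | 21.13 |
| -3 | 0.365 | 6.87 | 0.380 | 20.48 |
| -2 | 0.385 | 7.28 | 0.389 | 20.19 |
| -1 | 0.414 | 7.22 | 0.399 | 19.97 |
| 0 | 0.416 | 7.23 | 0.416 | 19.78 |
| 1 | 0.433 | 7.69 | 0.439 | 19.35 |
| 2 | 0.425 | 7.72 | 0.443 | 19.21 |
| 3 | 0.478 | 7.84 | 0.458 | 19.04 |
| 4 | 0.493 | 7.92 | 0.479 | 19.03 |
| 5 | 0.485 | 8.41 | 0.504 | 18.35 |
| 6 | 0.560 | 8.63 | 0.502 | 18.65 |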

The above are averages.  The percentage of starts lasting a full 9 innings is roughly an order of magnitude (10x) higher in this era than in modern baseball.  Runs/inning against starters are even more constricted in these mostly 16 team leagues, going from 0.325 at -6 to 0.502 at +6.

In past years this data model pulled data from 1970 – present without any alteration.  This probably introduced error even though it beat Vegas albeit not by enough to advertise.

Adjustments will have to be made on an era by era basis.  There is too much variation to come up with factoring coefficients on a yearly basis.  The eras shown above were thrown together arbitrarily to fit with the logistics of rebuilding this database.  Right now I’m thinking 1920-1960, 1961-1976, 1977-1997, 1998-2019.

The biggest factor in narrowing Tier Combo results is the number of players in a league, which is directly related to the number of teams.  The 1961-1976 era went from 20 teams to 24.  The next era went from 26 to 28, and our modern era since 1998 has been at 30 teams.

The number of innings starters pitched has also declined a lot in recent years but that’s fodder for another post.

Looks like baseball season might be cancelled  <insert sad emoji>.  This model was going to get detailed box scores from mlb.com this season, which would have made regular season handicapping much more interesting because roster value, especially relief, would have been far more accurate than in past seasons.  Unfortunately we may have to wait until next year.

Still working on this simulation and the baseball-handbook.com website, which will allow easy click through for any team, any player since 1900, and any game since 1920.  Until then ….

# Simulation Reboot Part 2

In order to properly simulate we need to know what happened in the past by actually counting it.  The difference in innings pitched between old era baseball and modern baseball is a problem.  We can’t simply dip into a game from the 1950s and pull a starter’s earned runs and innings pitched because starters pitched far more innings then than they do now.

DeltaWAA is the difference between the away team WAA and the home team WAA, where WAA is simply wins - losses.  The league average across all deltaWAAs must equal 0 exactly because for every W a team receives, another team receives an L.  A standard deviation can still be calculated, however, which means wins and losses can also be tiered like lineups, starters, and relief squads.

A tier as defined by this data model represents 1/2 standard deviation above or below league average.  A tier combo is the away team tier minus the home team tier.  A negative tier combo means the away team is worse than the home team, and vice versa for a positive value.  In this data model tier combos are integers, each representing a set of values.

Below is a table showing away and home team win percentages for each tier combo set from the years 2000-2019.
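The tiering of wins and losses can be sketched like this.  It assumes each team’s WAA is tiered first and the two tiers are then subtracted, uses a population standard deviation around a league average of 0, and rounds halves upward; the four-team league and all function names are made up for illustration.

```python
import math
from statistics import pstdev


def waa_tiers(records):
    """records: team -> (wins, losses).  Returns team -> tier number."""
    waa = {team: wins - losses for team, (wins, losses) in records.items()}
    # Every W is matched by an L somewhere, so the league WAA average is 0.
    sd = pstdev(list(waa.values()), mu=0)
    # One tier is 1/2 standard deviation, hence the factor of 2.
    return {team: 2.0 * value / sd for team, value in waa.items()}


def delta_tier_combo(tiers, away, home):
    """Away team tier minus home team tier, rounded to an integer."""
    return math.floor(tiers[away] - tiers[home] + 0.5)  # assumption: halves up


# Hypothetical four-team league whose wins and losses balance out.
RECORDS = {"AAA": (40, 20), "BBB": (30, 30), "CCC": (25, 35), "DDD": (25, 35)}
TIERS = waa_tiers(RECORDS)
```

Here `delta_tier_combo(TIERS, "CCC", "AAA")` is negative because the away team is worse than the home team.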

### Tier Combo deltaWAA 2000-2019

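| TC | Away% | Home% | Games | Away R | Home R |
|---|---|---|---|---|---|
| -6 | 0.248 | 0.752 | 929 | 3.608 | 5.473 |
| -5 | 0.281 | 0.719 | 1070 | 3.817 | 5.524 |
| -4 | 0.314 | 0.686 | 1984 | 3.845 | 5.235 |
| -3 | 0.350 | 0.650 | 2871 | 3.960 | 5.130 |
| -2 | 0.373 | 0.627 | 3479 | 4.152 | 5.029 |
| -1 | 0.418 | 0.582 | 4077 | 4.355 | 4.769 |
| 0 | 0.465 | 0.535 | 4041 | 4.503 | 4.625 |
| 1 | 0.491 | 0.509 | 3921 | 4.632 | 4.472 |
| 2 | 0.547 | 0.453 | 3335 | 4.819 | 4.288 |
| 3 | 0.580 | 0.420 | 2872 | 5.048 | 4.155 |
| 4 | 0.607 | 0.393 | 1842 | 5.221 | 4.105 |
| 5 | 0.640 | 0.360 | 1089 | 5.488 | 4.134 |
| 6 | 0.713 | 0.287 | 999 | 5.710 | 3.869 |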

The home team wins 75.2% of games at tier combo -6, which one would expect.  The away team wins 71.3% of the time at tier combo +6, when it has max advantage over the home team.  The Games column shows the number of games in each tier combo set.  The last two columns show the average runs scored per game by the away and home teams, the run differential that led to the win percentages.

Tier combo 0 is even steven between the two teams according to wins and losses.  Win% at TC 0 is almost exactly equal to overall home field advantage win% as one would expect.  The above represents what actually happened the last two decades with values that will be used to test the accuracy of the new simulator.
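The claim that win% at TC 0 sits close to the overall home field advantage can be checked from the table itself by weighting each row’s Home% by its Games count.  The rows below are copied from the table; the variable names are just for this sketch.

```python
# (tier combo, home win share, games) copied from the 2000-2019 table.
ROWS = [
    (-6, 0.752, 929), (-5, 0.719, 1070), (-4, 0.686, 1984),
    (-3, 0.650, 2871), (-2, 0.627, 3479), (-1, 0.582, 4077),
    (0, 0.535, 4041), (1, 0.509, 3921), (2, 0.453, 3335),
    (3, 0.420, 2872), (4, 0.393, 1842), (5, 0.360, 1089),
    (6, 0.287, 999),
]

total_games = sum(games for _, _, games in ROWS)
# Rebuild approximate home wins per row, then divide by all games.
home_wins = sum(home_pct * games for _, home_pct, games in ROWS)
overall_home_pct = home_wins / total_games
tc0_home_pct = next(home_pct for tc, home_pct, _ in ROWS if tc == 0)

print(f"games: {total_games}")
print(f"overall home win%: {overall_home_pct:.3f}, TC 0 home win%: {tc0_home_pct:.3f}")
```

The two figures land within about half a percentage point of each other, which is what the text describes.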

Not sure if or when baseball will resume.  Next part to this series will cover lineup -> starter and lineup -> relief tier combos through the various eras of baseball.  Right now we’re separating data into 1950-1969 , 1970 – 1999, and 2000 – 2019.  Eventually this simulator will look back to 1920 – 1949.  More on this later.  Until then ….

# Fantasy Baseball

Although fantasy baseball has been around for decades, it has exploded with easy access to the Internet and the emergence of sites like DraftKings.  A player who excels on a fantasy baseball team does not necessarily excel on their real team.

It was brought to my attention that perhaps modeling and ranking players in fantasy leagues might be a useful exercise, and it only took around 700 lines of code to make it happen.  This post will show the top players at each position based upon 2019 numbers.  For this exercise we’ll only look at the head to head leagues, where points are tabulated using the scoring table published on the CBS Sports page.

Looks like runs and RBIs aren’t very important for hitters and neither are earned runs for pitchers.  Whatever.  This is supposed to be fun, not an actual valuation of these players.  Let’s look at the top players at each position.

Unlike in this model, where we rank pitchers and hitters together, Fantasy gives pitchers a separate ranking.  Pitchers can be ranked with hitters in this data model because every run a hitter scores is a run a pitcher gives up, making it exactly symmetrical.  We also rank pitchers with hitters in WAR, only for comparison.

In WAR hitters make up 60% of total value, pitchers 40%.  It’s close enough to rank them together.  In Fantasy, hitters’ total points are about 2x pitchers’, meaning hitters make up about 2/3 (67%) of total value.

In 2019 hitters accumulated 182,837 points at a rate of 0.980 points per plate appearance.  Not sure if that was by design to be close to 1/pa  or just how it happened.  In 2018 it was 0.933 and 0.952/pa in 2017.  The ball bounces differently each season.

Pitchers accumulated a total of 89,349 points in 2019 for an average of 0.686/out.  This model uses outs as a measure of time for pitchers, plate appearances for hitters.  A starter who pitches 6 complete innings (18 outs) should accumulate around 12 points.  This is an average across both starters and relievers.  Relievers pitch fewer innings but accumulate points at a faster rate.  In the tables below we lump them all together.

Points above average (PAA) uses a similar methodology to how wins above average was calculated for each player, but it won’t be shown as it might confuse.  Due to the disparity in how points are distributed between players, PAA should not be used for ranking purposes.  KISS requires ranking on total points accumulated per season.

The WAA and WAR columns show ranks for each value system covered here to provide context.  Fantasy league values are points in a game for fun and do not necessarily reflect a player’s contribution to their real team winning.  For example, wins and losses are meaningless for pitchers, a legacy stat carried over from old school baseball when starting pitchers used to pitch an entire game.  Nowadays a reliever can come in, give up a bunch of runs to tie the game, and receive a W for a bad outing.  In modern baseball the W and L stats are often randomly allocated.  Randomness is part of the fun in a game like Fantasy baseball, however.

The tables below cover the top 15 players by points at each position and the top 35 pitchers in 2019.  Since hitter ranks do not include pitchers and vice versa, you kind of have to double the rank number to get an apples to apples comparison with the WAA and WAR columns.
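The arithmetic behind these point totals is easy to check.  The sketch below uses only the 2019 figures quoted in this post; the variable names are illustrative.

```python
HITTER_POINTS = 182_837   # total 2019 hitter points
PITCHER_POINTS = 89_349   # total 2019 pitcher points
POINTS_PER_PA = 0.980     # hitter points per plate appearance
POINTS_PER_OUT = 0.686    # pitcher points per out

# Share of all fantasy points that belongs to hitters.
hitter_share = HITTER_POINTS / (HITTER_POINTS + PITCHER_POINTS)

# Six complete innings is 18 outs at the league-wide rate.
six_inning_start = 6 * 3 * POINTS_PER_OUT

# Plate appearances and outs implied by the totals and rates.
implied_pa = HITTER_POINTS / POINTS_PER_PA
implied_outs = PITCHER_POINTS / POINTS_PER_OUT

print(f"hitter share of points: {hitter_share:.1%}")
print(f"points for a 6 inning start: {six_inning_start:.1f}")
print(f"implied PA: {implied_pa:,.0f}, implied outs: {implied_outs:,.0f}")
```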

### Fantasy Position 1B

Rank Points Name Pos WAA WAR
+002+ 849 Pete_Alonso 1B +031+ +038+
+008+ 791 Freddie_Freeman 1B +013+ +055+
+022+ 732 Rhys_Hoskins 1B XXXXX XXXXX
+023+ 730 Jose_Abreu 1B +086+ +166+
+024+ 729 Paul_Goldschmidt 1B +129+ +134+
+027+ 725 Carlos_Santana 1B +091+ +049+
+031+ 706 Josh_Bell 1B +019+ +124+
+053+ 654 Christian_Walker 1B XXXXX +134+
+056+ 641 Anthony_Rizzo 1B +092+ +073+
+058+ 637 Eric_Hosmer 1B XXXXX XXXXX
+060+ 633 Danny_Santana 1B-CF +069+ +176+
+064+ 628 Matt_Olson 1B +098+ +035+
+069+ 608 Yuli_Gurriel 1B-3B +052+ +108+
+093+ 563 Brandon_Belt 1B-LF -123- XXXXX
+100+ 555 Joey_Votto 1B -071- XXXXX

### Fantasy Position 2B

Rank Points Name Pos WAA WAR
+010+ 787 Jonathan_Villar 2B-SS XXXXX +066+
+032+ 704 Ozzie_Albies 2B +192+ +041+
+033+ 704 Max_Muncy 2B-1B +029+ +023+
+035+ 700 Whit_Merrifield 2B-RF XXXXX +066+
+047+ 669 DJ_LeMahieu 2B-3B +035+ +019+
+054+ 647 Rougned_Odor 2B +119+ XXXXX
+078+ 595 Ryan_McMahon 2B-3B +159+ XXXXX
+084+ 575 Starlin_Castro 2B-3B -160- XXXXX
+089+ 571 Jose_Altuve 2B +089+ +085+
+091+ 565 Cesar_Hernandez 2B -197- +158+
+094+ 561 Freddy_Galvis 2B XXXXX XXXXX
+120+ 528 Asdrubal_Cabrera 2B +078+ XXXXX
+127+ 509 Kolten_Wong 2B -174- +043+
+136+ 490 Cavan_Biggio 2B XXXXX +134+
+137+ 490 Adam_Frazier 2B -128- +141+

### Fantasy Position 3B

Rank Points Name Pos WAA WAR
+007+ 793 Eugenio_Suarez 3B +139+ +049+
+013+ 782 Alex_Bregman 3B-SS +011+ +002+
+014+ 782 Rafael_Devers 3B +006+ +033+
+018+ 756 Anthony_Rendon 3B +005+ +015+
+019+ 745 Josh_Donaldson 3B +102+ +018+
+025+ 729 Eduardo_Escobar 3B-2B +051+ +059+
+026+ 726 Nolan_Arenado 3B +022+ +023+
+028+ 720 Matt_Chapman 3B +106+ +013+
+029+ 710 Kris_Bryant 3B-RF +109+ +092+
+050+ 657 Yoan_Moncada 3B +171+ +047+
+055+ 643 Manny_Machado 3B-SS XXXXX +112+
+057+ 640 Hunter_Dozier 3B-RF XXXXX XXXXX
+072+ 600 Mike_Moustakas 3B-2B +125+ +108+
+080+ 590 Miguel_Sano 3B +038+ +112+
+104+ 554 Jose_Ramirez 3B XXXXX +102+

### Fantasy Position Catcher

Rank Points Name Pos WAA WAR
+049+ 658 Yasmani_Grandal CR-1B XXXXX +158+
+066+ 626 J.T._Realmuto CR +100+ +055+
+123+ 520 Gary_Sanchez CR +104+ +112+
+128+ 508 Christian_Vazquez CR XXXXX +187+
+129+ 504 James_McCann CR XXXXX +077+
+150+ 474 Jorge_Alfaro CR -167- XXXXX
+157+ 466 Robinson_Chirinos CR XXXXX +077+
+158+ 466 Mitch_Garver CR +033+ +066+
+159+ 462 Willson_Contreras CR +177+ +112+
+160+ 461 Roberto_Perez CR XXXXX +073+
+164+ 458 Omar_Narvaez CR XXXXX XXXXX
+171+ 441 Wilson_Ramos CR XXXXX XXXXX
+202+ 391 Travis_d’Arnaud CR-1B +130+ XXXXX
+208+ 372 Carson_Kelly CR XXXXX XXXXX
+213+ 367 Yadier_Molina CR XXXXX XXXXX

### Fantasy Position LF

Rank Points Name Pos WAA WAR
+012+ 783 Juan_Soto LF +020+ +043+
+037+ 687 Kyle_Schwarber LF +146+ +176+
+051+ 655 Tommy_Pham LF XXXXX +085+
+074+ 598 Marcell_Ozuna LF +084+ XXXXX
+075+ 596 Andrew_Benintendi LF-CF XXXXX XXXXX
+076+ 595 Michael_Brantley LF +124+ +047+
+079+ 594 Eddie_Rosario LF-RF +024+ XXXXX
+085+ 575 Bryan_Reynolds LF-RF +200+ +073+
+087+ 573 Joc_Pederson LF-RF +090+ +102+
+092+ 564 Jeff_McNeil LF-RF +175+ +038+
+095+ 560 Domingo_Santana LF-RF XXXXX XXXXX
+098+ 556 Eloy_Jimenez LF +174+ XXXXX
+107+ 550 Alex_Gordon LF XXXXX XXXXX
+113+ 545 Ryan_Braun LF +160+ XXXXX
+117+ 539 Wil_Myers LF-CF XXXXX XXXXX

### Fantasy Position CF

Rank Points Name Pos WAA WAR
+001+ 890 Ronald_Acuna_Jr. CF-LF +030+ +031+
+011+ 783 Mike_Trout CF +012+ +003+
+036+ 687 Ketel_Marte CF-2B +065+ +010+
+043+ 671 George_Springer CF-RF +023+ +017+
+062+ 629 Starling_Marte CF +075+ +124+
+065+ 627 Victor_Robles CF-RF XXXXX +064+
+081+ 589 Brett_Gardner CF-LF +126+ +066+
+086+ 574 Kevin_Pillar CF-RF XXXXX XXXXX
+088+ 572 Jackie_Bradley_Jr. CF XXXXX XXXXX
+096+ 557 Ramon_Laureano CF-RF +108+ +077+
+102+ 555 Mallex_Smith CF-RF -039- XXXXX
+109+ 548 Scott_Kingery CF-3B XXXXX +121+
+112+ 546 Mark_Canha CF-RF XXXXX +049+
+114+ 542 Leury_Garcia CF-RF -089- XXXXX
+119+ 528 Teoscar_Hernandez CF-LF XXXXX XXXXX

### Fantasy Position RF

Rank Points Name Pos WAA WAR
+003+ 818 Cody_Bellinger RF-1B +008+ +001+
+004+ 814 Bryce_Harper RF +045+ +059+
+009+ 789 Christian_Yelich RF +027+ +009+
+017+ 758 Mookie_Betts RF-CF +039+ +012+
+020+ 742 Trey_Mancini RF-1B +074+ +102+
+030+ 708 Michael_Conforto RF-CF +134+ +095+
+038+ 687 Nicholas_Castellanos RF-LF XXXXX +141+
+039+ 684 Charlie_Blackmon RF +049+ +176+
+040+ 677 Austin_Meadows RF +111+ +077+
+046+ 670 Kole_Calhoun RF XXXXX +176+
+061+ 630 Yasiel_Puig RF XXXXX XXXXX
+063+ 629 Randal_Grichuk RF-CF XXXXX XXXXX
+067+ 624 Max_Kepler RF-CF +046+ +066+
+071+ 605 Adam_Eaton RF XXXXX XXXXX
+090+ 570 Dexter_Fowler RF-CF XXXXX XXXXX

### Fantasy Position DH

Rank Points Name Pos WAA WAR
+005+ 813 Jorge_Soler DH-RF +054+ +085+
+021+ 741 J.D._Martinez DH-RF +041+ +102+
+034+ 700 Shin-Soo_Choo DH-RF XXXXX XXXXX
+042+ 672 Nelson_Cruz DH +014+ +058+
+068+ 609 Renato_Nunez DH-1B XXXXX XXXXX
+070+ 606 Franmil_Reyes DH XXXXX XXXXX
+077+ 595 Daniel_Vogelbach DH-1B XXXXX XXXXX
+097+ 556 Edwin_Encarnacion DH-1B +043+ +141+
+125+ 516 Khris_Davis DH XXXXX XXXXX
+138+ 489 Yordan_Alvarez DH-LF +042+ +085+
+151+ 473 Shohei_Ohtani DH XXXXX +158+
+165+ 455 Miguel_Cabrera DH-1B -020- XXXXX
+206+ 378 Hunter_Pence DH-LF +076+ XXXXX
+393+ 145 Nick_Solak DH-3B XXXXX XXXXX
+413+ 127 Kendrys_Morales DH -107- -065-

### Fantasy Pitchers

Rank Points Name Pos WAA WAR
+001+ 771 Justin_Verlander PITCH +001+ +005+
+002+ 750 Gerrit_Cole PITCH +002+ +010+
+003+ 628 Zack_Greinke PITCH +009+ +028+
+004+ 623 Stephen_Strasburg PITCH +028+ +015+
+005+ 603 Shane_Bieber PITCH +018+ +038+
+006+ 592 Jacob_deGrom PITCH +003+ +008+
+007+ 560 Charlie_Morton PITCH +021+ +035+
+008+ 548 Jack_Flaherty PITCH +007+ +021+
+009+ 547 Clayton_Kershaw PITCH +025+ +095+
+010+ 545 Patrick_Corbin PITCH +026+ +023+
+011+ 534 Hyun-Jin_Ryu PITCH +004+ +035+
+012+ 533 Lance_Lynn PITCH +057+ +007+
+013+ 524 Walker_Buehler PITCH +036+ +187+
+014+ 520 Eduardo_Rodriguez PITCH +114+ +019+
+015+ 519 Luis_Castillo PITCH +040+ +043+
+016+ 504 Max_Scherzer PITCH +017+ +022+
+017+ 495 Lucas_Giolito PITCH +044+ +028+
+018+ 490 Jose_Berrios PITCH +077+ +102+
+019+ 488 Mike_Minor PITCH +050+ +005+
+020+ 487 Madison_Bumgarner PITCH +133+ +158+
+021+ 480 Aaron_Nola PITCH +117+ +085+
+022+ 473 Mike_Soroka PITCH +010+ +023+
+023+ 464 Josh_Hader PITCH +096+ +147+
+024+ 464 Sonny_Gray PITCH +015+ +028+
+025+ 456 Zack_Wheeler PITCH +168+ +095+
+026+ 456 Trevor_Bauer PITCH XXXXX XXXXX
+027+ 454 Mike_Fiers PITCH +156+ +124+
+028+ 440 Noah_Syndergaard PITCH XXXXX +176+
+029+ 437 Will_Smith PITCH +135+ +197+
+030+ 433 Roberto_Osuna PITCH +121+ XXXXX
+031+ 428 Marco_Gonzales PITCH +141+ +108+
+032+ 426 Kirby_Yates PITCH +047+ +134+
+033+ 421 Yu_Darvish PITCH +169+ +102+
+034+ 420 German_Marquez PITCH XXXXX +095+
+035+ 418 Domingo_German PITCH XXXXX XXXXX

Head to head fantasy leagues are simplest to model as they merely require adding up points. Rotisserie leagues are far more complicated where player compositions depend upon a seasonal strategy for doing well in certain categories.  Since this baseball season is delayed maybe I’ll give it a try but not sure.

More simulation changes coming soon.  If the season doesn’t start until May then we can’t start ranking players, simulating, and handicapping games until June and possibly beyond.  More on this later.  Until then ….

# Simulation Reboot Part 1

Work is currently being done on the next iteration of the regular season simulation.  Last year a 5 part series of posts attempted to explain how this simulation worked.   Simulation in the past two years relied on all games from 1970 – present; around 100K games.  The next iteration of this simulation will use the daily snapshots back to 1920 or around 200K games.

Due to major differences in the way teams were managed between then and now there are some issues that need to be resolved.  This set of posts will highlight those issues and some of the decisions being made to address them.  This is after all a log book.

Relief was a big vector for error and how this was solved will be covered later.  Today will be a short, perhaps interesting post about starting pitching between then and now that affects the way the simulator treats starters.

As explained in part 5, there are two kinds of combo pairs; lineup -> starter and lineup -> relief. Each of these pairs is assigned an integer between -6 and +6.  A tier combo of -6 is the worst lineup facing the best pitcher or relief squad; +6 the opposite.

Each game consists of 4 pairs: an away ls and lr, and a home ls and lr, where ls is lineup -> starter and lr is lineup -> relief.  The simulator will look back into the past and for lineup -> starter grab the number of innings pitched and number of earned runs scored for both home and away.  It calculates who won that simulated game, counts it, and does it again one million times per simulation.  At the end it tabulates wins and losses and that’s the estimated probability for the current game.

There is a rather large difference between modern baseball and legacy baseball.  Right now studies are being made to show those differences.  Below is a table showing the percentage of starters who pitched a complete 9 innings.
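The loop described above might look roughly like the sketch below.  It is a toy, not the actual simulator: the pools hold a handful of made-up (outs, earned runs) samples where the real database would hold thousands of games per tier combo, each side’s runs are simply one starter draw plus one relief draw, and tied draws are thrown out and redrawn.  All names are hypothetical.

```python
import random

# Hypothetical historical pools: tier combo -> (outs, earned_runs) samples.
STARTER_POOL = {
    -2: [(21, 1), (18, 2), (15, 4), (24, 0)],
    1: [(18, 3), (12, 5), (20, 2), (16, 4)],
}
RELIEF_POOL = {
    -1: [(9, 0), (6, 1), (9, 2), (3, 0)],
    0: [(9, 1), (6, 0), (12, 2), (9, 0)],
}


def runs_scored(rng, ls_combo, lr_combo):
    """Runs for one lineup: one draw from its ls pool plus one from its lr pool."""
    _, starter_runs = rng.choice(STARTER_POOL[ls_combo])
    _, relief_runs = rng.choice(RELIEF_POOL[lr_combo])
    return starter_runs + relief_runs


def simulate(away, home, n_games=1_000_000, seed=1):
    """Estimate the away team's win probability.

    away and home are (ls_combo, lr_combo) pairs for each team's lineup.
    Assumption: tied draws are discarded and redrawn.
    """
    rng = random.Random(seed)
    away_wins = 0
    decided = 0
    while decided < n_games:
        away_runs = runs_scored(rng, *away)
        home_runs = runs_scored(rng, *home)
        if away_runs == home_runs:
            continue
        decided += 1
        if away_runs > home_runs:
            away_wins += 1
    return away_wins / n_games
```

Calling `simulate((1, 0), (-2, -1), n_games=100_000)` returns the share of simulated games the away team won, which is the estimated probability the post refers to.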

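| Tier Combo | Inn | 1950s | 1960s | 2000s | 2010s |
|---|---|---|---|---|---|
| -6 | 9 | 0.459 | 0.413 | 0.088 | 0.062 |
| -5 | 9 | 0.442 | 0.392 | 0.069 | 0.052 |
| -4 | 9 | 0.397 | 0.350 | 0.047 | 0.034 |
| -3 | 9 | 0.361 | 0.315 | 0.043 | 0.029 |
| -2 | 9 | 0.337 | 0.288 | 0.039 | 0.024 |
| -1 | 9 | 0.302 | 0.270 | 0.039 | 0.021 |
| 0 | 9 | 0.312 | 0.239 | 0.030 | 0.016 |
| 1 | 9 | 0.280 | 0.221 | 0.020 | 0.018 |
| 2 | 9 | 0.288 | 0.208 | 0.024 | 0.017 |
| 3 | 9 | 0.259 | 0.191 | 0.027 | 0.015 |
| 4 | 9 | 0.261 | 0.215 | 0.023 | 0.023 |
| 5 | 9 | 0.226 | 0.167 | 0.028 | 0.019 |
| 6 | 9 | 0.252 | 0.194 | 0.027 | 0.024 |
| TOT | 9 | 0.315 | 0.257 | 0.033 | 0.021 |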

The columns represent a decade of games for the 1950s, 1960s, 2000s, and 2010s.  A -6 Tier combo is the worst lineup vs. best starter, +6 tier combo is the best lineup vs. worst starter.  The TOT row is the average across all tier combos for that decade.

In the 1950s a starter pitched 9 innings in 31.5% of all games.  That dropped to a little more than 1/4 in the 1960s.  In modern baseball, which this simulator is supposed to handicap, it’s down to 2.1%.  Even though Tier combo -6 is almost 3x that at 6.2%, it’s still extremely rare.  Managers want to conserve wear and tear on their pitchers’ arms.  Sports medicine wasn’t as sophisticated back in the 1950s.

One would expect more complete games as Tier combos go negative where starters exceed lineups and that’s what we’re seeing.  The 0 Tier combo row represents even steven between lineup and starter.  One would expect that percentage should come close to the overall average and it does.  It’s a little off in modern baseball but that could be due to smaller sample size.
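A quick check of the ratios quoted above, using a few rows copied from the table.  The dictionary layout is only an illustration.

```python
# Share of starts lasting 9 innings, keyed by decade then tier combo.
NINE_INNING_RATE = {
    "1950s": {-6: 0.459, 0: 0.312, 6: 0.252, "TOT": 0.315},
    "1960s": {-6: 0.413, 0: 0.239, 6: 0.194, "TOT": 0.257},
    "2000s": {-6: 0.088, 0: 0.030, 6: 0.027, "TOT": 0.033},
    "2010s": {-6: 0.062, 0: 0.016, 6: 0.024, "TOT": 0.021},
}

# How many times more common a 9 inning start was in the 1950s.
era_ratio = NINE_INNING_RATE["1950s"]["TOT"] / NINE_INNING_RATE["2010s"]["TOT"]

# How much the best-starter matchup lifts the rate in the 2010s.
best_matchup_ratio = NINE_INNING_RATE["2010s"][-6] / NINE_INNING_RATE["2010s"]["TOT"]

# Gap between the even steven row and the decade average.
tc0_gap = {
    decade: rates[0] - rates["TOT"] for decade, rates in NINE_INNING_RATE.items()
}

print(f"1950s vs 2010s: {era_ratio:.1f}x")
print(f"2010s TC -6 vs average: {best_matchup_ratio:.2f}x")
```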

There is a table like this for each inning but the 9th is most interesting.  It is still a work in progress resolving 1950s data with modern baseball.  The more data we have the more accurate the simulation will be.

That’s all for this tidbit into the simulator.  This model cannot start simulating until May when there’s at least a month of baseball in the books, and there are still some problems with that.  More on that later.

I had planned to cover spring training but decided to wait until opening day when we’ll do playoff horse race based upon 3 year splits of current roster.  Then we’ll cover new guys for both White Sox and Cubs.   Since the Cubs may not be on TV at my local tavern or any tavern around here we may be forced to follow the White Sox this season.  We’ll see.   Until then ….

# Opening Day 4/21/1950

I ran across this ticket stub while going through some of my dad’s stuff that he threw in the top dresser drawer and forgot about for over half a century.  Apparently dad went to a Cubs home opener at Wrigley on 4/21/1950.  On the back he wrote down the score and the Cubs starting pitcher.

This post will do a data dump look see into this game.

Looks like the Feds put a 21 cent tax on a \$1.25 ticket.  No state or city amusement tax yet back then.  This is the second of only 5 games Cubs played in April 1950.  Aprils can be cold in Chicago and elsewhere.  Since we’re at the beginning of the season rankings, WAA calculation and tiering cannot be done so there’s not much to see in terms of handicapping this game.

#### GAME 195004210 SLN CHN

| | TeamID | Line Score | Runs | TB | Hits | E |
|---|---|---|---|---|---|---|
| AWAY | SLN | 000000000 | 0 | 7 | 4 | 0 |
| HOME | CHN | 00010100 | 2 | 7 | 3 | 1 |

Total Bases (TB) provides context to the Hits column.  Although Cubs had one less hit than Cardinals, both teams had equal total bases which could have been a factor in Cubs scoring those two runs.  The clickable web site will allow for navigation into events for this game.

#### STARTERS 195004210

Rank WAA Name TeamID Tier
XXXXX 0 Harry_Brecheen SLN 0
XXXXX 0 Bob_Rush CHN 0

As my dad wrote on the back of his ticket stub, Bob Rush started this game for the Cubs with the Cardinals starting the veteran Harry Brecheen.  Right now, at the start of the season, there isn’t enough data to evaluate these two pitchers.  Since we’re from the future however let’s look at their careers.

### Bob Rush

Year Rank WAA TeamID Pos
1948 XXXXX 0.78 CHN PITCH
1949 XXXXX 0.48 CHN PITCH
1950 +049+ 3.88 CHN PITCH
1951 XXXXX 1.05 CHN PITCH
1952 +015+ 5.9 CHN PITCH
1953 XXXXX -1.32 CHN PITCH
1954 XXXXX 0.84 CHN PITCH
1955 +057+ 2.69 CHN PITCH
1956 +038+ 4.16 CHN PITCH
1957 -025- -2.73 CHN PITCH
1958 +096+ 1.53 ATL PITCH
1959 +055+ 3.42 ATL PITCH
1960 XXXXX -0.76 TOT PITCH
1960 XXXXX -0.13 ATL PITCH
1960 XXXXX -0.61 CHA PITCH
TOTAL X 19.18 PITCH

Pretty decent career with his best year in 1952.  The 1950 season will be his best year so far when that season ends.

### Harry Brecheen

Year Rank WAA TeamID Pos
1940 XXXXX 0.32 SLN PITCH
1943 +045+ 3.19 SLN PITCH
1944 +064+ 2.6 SLN PITCH
1945 +040+ 3.59 SLN PITCH
1946 +019+ 5.06 SLN PITCH
1947 +060+ 2.73 SLN PITCH
1948 +005+ 10.29 SLN PITCH
1949 +054+ 3.38 SLN PITCH
1950 +076+ 2.33 SLN PITCH
1951 +070+ 2.58 SLN PITCH
1952 XXXXX 1.07 SLN PITCH
1953 +061+ 2.88 BAL PITCH
TOTAL X 40.02 PITCH

Brecheen had a better career with his best year in 1948.

#### HITTERS SLN 195004210

Rank WAA Name Pos PA R RBI TB H W
XXXXX 0 Solly_Hemus X 4 0 0 0 0 0
XXXXX 0 Red_Schoendienst 2B-SS 4 0 0 2 1 0
XXXXX 0 Stan_Musial OF-1B-LF-CF-RF 4 0 0 1 0 0
XXXXX 0 Enos_Slaughter OF-RF-LF 4 0 0 1 1 1
XXXXX 0 Joe_Garagiola CR 4 0 0 0 0 1
XXXXX 0 Rocky_Nelson 1B 4 0 0 3 1 0
XXXXX 0 Harry_Walker OF-CF 4 0 0 0 0 0
XXXXX 0 Eddie_Miller SS 3 0 0 0 0 1
XXXXX 0 Harry_Brecheen P 3 0 0 0 0 1
XXXXX 0 Bill_Howerton OF-CF-LF-RF 1 0 0 0 0 0
TOTAL X X X 35 0 0 7 3 4

#### PITCHERS SLN 195004210

Rank WAA Name Outs PA R ER TB H W SO
XXXXX 0 Harry_Brecheen 24 28 2 2 7 3 2 7
TOTAL X X 24 28 2 2 7 3 2 7

#### HITTERS CHN 195004210

Rank WAA Name Pos PA R RBI TB H W
XXXXX 0 Wayne_Terwilliger 2B 4 1 0 3 1 0
XXXXX 0 Hal_Jeffcoat OF-RF-LF 3 1 0 3 1 0
XXXXX 0 Preston_Ward 1B 3 0 1 1 1 0
XXXXX 0 Hank_Sauer LF-OF-1B 3 0 1 0 0 0
XXXXX 0 Andy_Pafko OF-CF-RF 3 0 0 0 0 1
XXXXX 0 Bill_Serena 3B 3 0 0 0 0 0
XXXXX 0 Roy_Smalley SS 3 0 0 0 0 1
XXXXX 0 Mickey_Owen CR 3 0 0 0 0 0
XXXXX 0 Bob_Rush X 3 0 0 0 0 0
TOTAL X X X 28 2 2 7 3 2

The baseball-handbook.com web site will allow for one click navigation to any of these players.

#### PITCHERS CHN 195004210

Rank WAA Name Outs PA R ER TB H W SO
XXXXX 0 Bob_Rush 27 35 0 0 7 4 4 5
TOTAL X X 27 35 0 0 7 4 4 5

Update:  There’s a discrepancy in hits given up by Rush (4) and the number of hits SLN made (3).  The game logs show 4 but reading the event data I only see 3.  Box scores are derived from event data.  Stan Musial hit what they call a single bunt after a double where the runner on second got thrown out at third.  I don’t think that’s considered a hit but the scorekeeper might have.    Doesn’t really matter in the grand scheme.
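A discrepancy like this is easy to flag automatically by comparing hits credited to batters with hits charged to the opposing pitchers.  The numbers below are copied from the hitter and pitcher tables for this game; the function name is made up for this sketch.

```python
# Hits by batter, copied from the HITTERS tables for game 195004210.
SLN_BATTER_HITS = {
    "Solly_Hemus": 0, "Red_Schoendienst": 1, "Stan_Musial": 0,
    "Enos_Slaughter": 1, "Joe_Garagiola": 0, "Rocky_Nelson": 1,
    "Harry_Walker": 0, "Eddie_Miller": 0, "Harry_Brecheen": 0,
    "Bill_Howerton": 0,
}
CHN_BATTER_HITS = {
    "Wayne_Terwilliger": 1, "Hal_Jeffcoat": 1, "Preston_Ward": 1,
    "Hank_Sauer": 0, "Andy_Pafko": 0, "Bill_Serena": 0,
    "Roy_Smalley": 0, "Mickey_Owen": 0, "Bob_Rush": 0,
}

# Hits allowed by pitcher, copied from the PITCHERS tables.
SLN_PITCHER_HITS = {"Harry_Brecheen": 3}
CHN_PITCHER_HITS = {"Bob_Rush": 4}


def hit_discrepancy(batter_hits, opposing_pitcher_hits):
    """Hits charged to the pitchers minus hits credited to the batters they faced."""
    return sum(opposing_pitcher_hits.values()) - sum(batter_hits.values())


print("SLN batters vs CHN pitchers:", hit_discrepancy(SLN_BATTER_HITS, CHN_PITCHER_HITS))
print("CHN batters vs SLN pitchers:", hit_discrepancy(CHN_BATTER_HITS, SLN_PITCHER_HITS))
```

A nonzero result on one side points at the one-hit gap between Rush’s line and the Cardinals’ batting lines.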

Starters for both teams pitched complete games which was normal back then.  This reduces the importance of relief during this era in baseball.  More on that later.   Here’s how the 1950 season ended for the Cubs and Cardinals.

### NL 1950

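| Tm | W | L | BAT | PITCH | UR |
|---|---|---|---|---|---|
| PHI | 91 | 63 | -30.1 | 119.1 | 7.8 |
| LAN | 89 | 65 | 82.9 | 4.0 | 22.8 |
| SFN | 86 | 68 | -18.1 | 98.1 | 9.8 |
| ATL | 83 | 71 | 27.9 | 28.2 | -13.2 |
| SLN | 78 | 75 | -58.1 | 67.0 | 13.8 |
| CIN | 66 | 87 | -92.1 | 14.0 | 2.8 |
| CHN | 64 | 89 | -98.6 | 13.1 | -34.2 |
| PIT | 57 | 96 | -70.1 | -89.1 | -17.2 |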

Both teams finished in the bottom half of the pack, which will become a common place for the Cubs to be for almost the next two decades.  We’re still using the three letter franchise codes.  In 1950 LAN was the Brooklyn Dodgers, ATL the Boston Braves, and SFN the New York Giants.  Franchise codes are used internally in the database but official names could be used for display purposes.

That’s all for now.  The above is a work in progress presentation of a game data dump.  Spring training new guys post coming soon as I hear they started already.  Until then ….
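Swapping franchise codes for official names at display time could be as simple as a per-season lookup.  This is a hypothetical sketch: the table holds only the three 1950 names given above, and any code not in the table falls back to the franchise code itself.

```python
# Display names for the 1950 season, limited to the codes named in the post.
DISPLAY_NAMES_1950 = {
    "LAN": "Brooklyn Dodgers",
    "ATL": "Boston Braves",
    "SFN": "New York Giants",
}


def display_name(franchise_code, season_names):
    """Return the official name for a season, or the code if none is known."""
    return season_names.get(franchise_code, franchise_code)


# Standings order copied from the NL 1950 table.
for code in ["PHI", "LAN", "SFN", "ATL", "SLN", "CIN", "CHN", "PIT"]:
    print(f"{code}: {display_name(code, DISPLAY_NAMES_1950)}")
```

Keeping the code as the key means the database never has to change when a franchise moves; only the display table does.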