Normally these playoff horse race posts run from the end of August through September using current year data. The race actually starts now, at the beginning of the season, and unlike a real horse race such as the Kentucky Derby, the race for a playoff spot is more like a 26-mile marathon run by humans.
This post could also be Part 2 of the Prediction Racket, but we’re not going to make predictions or projections. Since we won’t have enough current year data until the beginning of May, we have to use career data. A 3 year snapshot has been determined to be a good indicator of talent, and it levels the playing field between young guys and veterans. Albert Pujols, who has the highest active career stats in MLB by far, barely ranks in the top 200 in the 2016, 2017, 2018 snapshot.
The playoff horse race table below shows all teams sorted by actual team WAA (W-L) for the last three years; wins and losses are the only stat the Commissioner of MLB cares about. The Total column is the sum of Hitters and Pitchers. Pitchers is the sum of Starters and Relief. Since new guys start out at WAA=0, there is no way to project what impact they may have on each category. This data model does not project. These values represent the past. There are a lot of numbers in these horse race tables, so more explanation follows below the fold.
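The column arithmetic described above can be sketched in a few lines of Python. This is only an illustration, not the scripts behind the actual table: the roster layout, the team codes, the player names, and every WAA value below are made up for the example.

```python
from collections import defaultdict

# Hypothetical roster rows: (team, player, role, three-year WAA).
# Teams, players, and values are invented for illustration only.
ROSTER = [
    ("AAA", "Hitter 1", "hitter", 4.5),
    ("AAA", "Starter 1", "starter", 3.0),
    ("AAA", "Reliever 1", "relief", 1.5),
    ("AAA", "Rookie 1", "hitter", 0.0),   # new guys start out at WAA=0
    ("BBB", "Hitter 2", "hitter", -1.0),
    ("BBB", "Starter 2", "starter", 2.5),
    ("BBB", "Reliever 2", "relief", -0.5),
]


def team_totals(roster):
    """Sum player WAA into the Hitters, Starters, Relief, Pitchers, Total columns."""
    sums = defaultdict(lambda: {"hitter": 0.0, "starter": 0.0, "relief": 0.0})
    for team, _player, role, waa in roster:
        sums[team][role] += waa

    table = {}
    for team, s in sums.items():
        pitchers = s["starter"] + s["relief"]      # Pitchers = Starters + Relief
        table[team] = {
            "Hitters": s["hitter"],
            "Starters": s["starter"],
            "Relief": s["relief"],
            "Pitchers": pitchers,
            "Total": s["hitter"] + pitchers,       # Total = Hitters + Pitchers
        }
    return table


if __name__ == "__main__":
    for team, row in team_totals(ROSTER).items():
        print(team, row)
```

A player who is not on the roster list simply contributes nothing to any column, which is why roster accuracy matters so much for these totals.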
The Cubs (CHN) had three very good consecutive years, despite losing early last September, putting them near the top of this list. That record, plus trades for high career value guys like Hamels, gives the Cubs the highest career Total of all 30 teams according to this data model. This should be expected, but as we saw in the last three real games against Texas, high career value guys can tank just as easily as anyone.
Note: This table was made automatically from incoming roster data which I did not thoroughly check for accuracy, except for CHN. If a good player is on the DL, his numbers won’t be part of that team’s total because he’s not on the roster. Rosters change daily, and processing those changes is a big part of the current year dataset.
If we were joining the Prediction Racket, the above table would make for a nice template. Move teams around on a whim and you’ll probably be very close to being right at the end of the season. Teams like ATL, who played well last season but struggled the two seasons before that, are in the bottom half, but their Total 3 year career split has risen significantly from last season.
The White Sox will probably do better than their position on this table too, and CIN, the worst team in MLB over the past three years, has pretty decent WAA value. By getting rid of Homer Bailey, CIN raised their Starters, Pitchers, and Total numbers by over 10. Teams get better by getting rid of bad players.
Now let’s check the above numbers by taking a look at the CHN roster. Our roster source is missing two players, and I’m not sure who they are. Since no data can be crunched in April, all the scripts that process this data flow from May to November have to be reworked. Below are CHN Hitters, Starters, and Relief, the three categories crucial for daily simulation and estimating winning probabilities.
Rank uses the same process as the current year rankings. The top 200 get ranked with a + , the bottom 200 with a – , and everyone else gets XXXXX, meaning unranked. Hitters and Pitchers, AL and NL, are all ranked together, and the Cubs have 3 guys in the above list in the top 50, which is very good.
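As a rough sketch of that ranking rule, assuming every player is sorted by a single value: the function name, the zero-padded label format, and the sample values are all hypothetical, and the real scripts may break ties or format ranks differently. The code uses an ASCII hyphen for the minus sign.

```python
def rank_labels(values, n=200):
    """Label the top n players '+NNN', the bottom n '-NNN', the rest 'XXXXX'.

    values: dict mapping player name -> value (higher is better).
    The label format is an assumption made for this example.
    """
    ordered = sorted(values, key=values.get, reverse=True)  # best first
    labels = {name: "XXXXX" for name in ordered}            # default: unranked

    for i, name in enumerate(ordered[:n], start=1):
        labels[name] = f"+{i:03d}"                          # +001 is the best

    for i, name in enumerate(ordered[::-1][:n], start=1):
        if labels[name] == "XXXXX":                         # top rank wins on overlap
            labels[name] = f"-{i:03d}"                      # -001 is the worst
    return labels


if __name__ == "__main__":
    # Made-up values; n=2 keeps the example small.
    sample = {"a": 5.0, "b": 4.0, "c": 3.0, "d": 2.0, "e": 1.0, "f": 0.0}
    for name, label in rank_labels(sample, n=2).items():
        print(name, label)
```

With only six sample players and n=2, the two best get + ranks, the two worst get - ranks, and the middle two stay unranked, the same shape as the top 200 and bottom 200 split over the full league.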
This is a good starting staff on paper but starting pitching can be extremely unreliable. Hendricks and Lester are pretty solid each year but it’s hard to predict this. No team can complain about 2 starters in the top 50 and 4 in the top 200. The PECOTA projection system banked their troll giving the Cubs 79 wins this season by predicting these old guys will tank. Hendricks hasn’t reached free agency yet. Whatever.
Another 5 guys in the top 200 on relief staff. If MLB games were played on paper the Cubs would be in extremely good shape. Unfortunately the Commish makes them play the games so anything can happen.
That is all for now. Perhaps Part 2 of this 2019 version of the playoff horse race will get posted at the end of May using current data. We’ll see. The next post will cover a topic brought up at the local pub. Someone suggested that first pitches have a higher probability of being strikes than all the other pitches. This can be easily checked using event data from retrosheet.org.
This season all betting opportunities will get posted, which should be around 3 or 4 games per day starting in May, plus the usual bi-weekly CHN team status, series analysis, and more. Presentation of game data will be different and hopefully more intuitive. Until then ….