I haven’t done much since the end of the 2017 season. Spring training is starting up, but we can’t do any analysis until May, possibly mid-May. Hopefully by then the simulations will have been completed and run against our dataset, which consists of the last 7 years of daily lines for each game. If this system can show a clear margin, we have something. If not, we need to figure out why.
A prototype web portal to this data model will be developed throughout the season. This blog concentrates on the Cubs, but the same analysis can be done for any team and any season. The web portal will be turned into a prototype app that lets anyone quickly look up anything about baseball through the lens of this data model.
The next post will be a career-based ranking built from whatever we can discern from various teams’ player rosters. The Cubs have quite a few high-ranking career players now compared to the earlier years after the Ricketts family purchased the team. You can go to baseball.brandylion.com/seasons and peruse various careers by drilling down. This data covers everything up to and including 2013, so it’s quite out of date. That will be made current.
Apparently mlb.com now publishes detailed box scores and event data for each game in XML. We used to rely on retrosheet.org, which publishes this yearly in December, well after the season is over. Certain stats, like RISP (which we introduce here), can’t be calculated without event data. We should probably write another RISP article before May as well.
And finally, the Cubs just acquired Yu Darvish for $21M/year for 6 years. I am not an MBA, so we don’t have a good way to determine whether that’s a good deal. All we can do is look at his career.
He has had a solid career, with his best season coming in 2013, which represents his upside potential. If he pitches like that, the Cubs will be in good shape. For 2017 you have to add his TEX and LAN stints together, giving him a WAA=2.3. This ranks him #157 for the season, inside the top 200.
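For anyone curious how a split season gets combined, here is a minimal Python sketch of adding a player’s per-team stints into one season total and then ranking the season. The row layout, the function names, and the players and numbers below are all made-up placeholders for illustration; they are not taken from this data model or from Darvish’s actual lines.

```python
from collections import defaultdict

# Hypothetical stint rows: (player_id, season, team, waa).
# These values are placeholders, not real player data.
stints = [
    ("player_a", 2017, "TEX", 1.5),
    ("player_a", 2017, "LAN", 0.75),
    ("player_b", 2017, "CHN", 3.0),
]

def season_totals(rows):
    """Sum WAA across every team a player appeared for in a season."""
    totals = defaultdict(float)
    for player, season, _team, waa in rows:
        totals[(player, season)] += waa
    return dict(totals)

def rank_season(totals, season):
    """Rank players within one season by combined WAA, best first (1 = top)."""
    in_season = [(p, w) for (p, s), w in totals.items() if s == season]
    in_season.sort(key=lambda pw: pw[1], reverse=True)
    return {p: i + 1 for i, (p, _w) in enumerate(in_season)}

totals = season_totals(stints)
ranks = rank_season(totals, 2017)
print(totals[("player_a", 2017)], ranks["player_a"])
```

The point is only that a traded player has to be summed across teams before ranking; otherwise each stint would be ranked as if it were a separate, weaker season.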
The above numbers don’t mean much unless put into context with the entire MLB player dataset from 1900-2016. That will be fodder for another article before May. Until then…