Using Pitch Data

The last two exercises were simple.  Let’s go one step further.

Event data contain a pitch data field describing the outcome of each pitch: C for a called strike, S for a swinging strike, B for a ball, F for a foul, and so on.  This NYT article wonders whether batters swing and miss at bad pitches more often as a season progresses.  Event data can’t tell whether a batter swung and missed at a bad pitch or at one right through the strike zone.  It does, however, distinguish a called strike from a swinging strike.  Does it really make a difference whether a batter swung at a bad pitch or missed a pitch right in the zone?  Both seem like mental lapses.

Since we know when a swinging strike occurred, we can run through the event data and count them.  To make sense of this total we need to normalize by the number of pitches, because some months have more games than others and that needs to be factored out of the results.  This study simply extracts the pitch string from each event, categorizes it by month, then counts the swinging strikes (S) and total pitches.  Many years are needed to get a data set deep enough to draw a conclusion.
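The counting step described above can be sketched in a few lines of Python.  This is only an illustration, not the code used for the study, and it rests on several assumptions about the file layout: that each game starts with an `id` record whose value embeds the date as a 3-letter team code followed by YYYYMMDD, that each `play` record carries the pitch string in its sixth comma-separated field, that any letter other than N ("no pitch") is a real pitch, and that pitches before a `.` were already listed on an earlier record for the same batter.  The function name `swinging_strike_rate_by_month` is made up for this example.

```python
from collections import defaultdict


def swinging_strike_rate_by_month(lines):
    """Return {month: (swinging_strikes, total_pitches, ratio)}."""
    counts = defaultdict(lambda: [0, 0])  # month -> [swinging, total]
    month = None
    for line in lines:
        fields = line.strip().split(",")
        if fields[0] == "id" and len(fields) > 1:
            # Assumed game id layout: TTTYYYYMMDDN, so the month is chars 7-8.
            month = fields[1][7:9]
        elif fields[0] == "play" and month is not None and len(fields) > 5:
            # Keep only the pitches after the last '.', assuming the earlier
            # ones were already counted on a previous record for this batter.
            pitches = fields[5].rsplit(".", 1)[-1]
            for ch in pitches:
                # Assumption: letters are pitches, except N ("no pitch").
                # Digits and symbols (pickoff throws, '>', '*', '+') are skipped.
                if ch.isalpha() and ch != "N":
                    counts[month][1] += 1
                    if ch == "S":
                        counts[month][0] += 1
    return {m: (s, t, s / t) for m, (s, t) in sorted(counts.items()) if t}
```

To cover several seasons, feed the function the lines of every event file in turn (for example with `itertools.chain` over the open files) so that all years accumulate into the same month buckets.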

The tables below show two different runs, one for 2010-2012 and another for the decade of 2000-2009.  The latter covers more seasons than the former and should be less affected by noise.

In the ten-year study, the ratio of swinging strikes to total pitches does appear to increase as the season progresses, from 0.319 in April to 0.335 in September.  This could mean either that pitchers get better at fooling batters as the season progresses or that batters get tired.  Rosters expand in September, which could also be a factor.  This is another example of information that can be derived by traversing event data.  This factoid may not be that useful, but someone did write an article in the NYT about it.

2010-2012

04      27344   0.334
05      31857   0.329
06      30501   0.334
07      30556   0.344
08      32768   0.341
09      32747   0.354

2000-2009

04      89829   0.319
05      102199  0.318
06      99274   0.321
07      98207   0.322
08      106854  0.327
09      102864  0.335