When is a ball a ball?

That question may sound like it belongs in a philosophy class, but it is one that baseball fans ask countless times during the season. To investigate it, we downloaded “PITCHf/x” data. This data is freely available and is now often shown during broadcasts as a box in the bottom right of the screen that shows the location of the pitch relative to the strike zone. The data gives information on things such as location (horizontal and vertical), velocity (at the start and finish of the pitch), pitch type (with some level of confidence), batter, and pitcher.

The data we used included all called balls and called strikes from the 2010-2012 seasons. To avoid the problem of setting the vertical components of the strike zone, we looked only at pitches that were off the plate horizontally. We fit a logistic regression model that had effects for distance from the edge of the plate, inning, umpire, pitch type, velocity, batter stance, and whether the pitch was inside or outside (all were statistically significant).
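For readers who want to see the mechanics, here is a minimal sketch of fitting a logistic regression of this general kind. It is not our actual analysis: the data are synthetic, the coefficients in `true_beta` are made up, and only three of the effects (distance, velocity, and a 9th inning indicator) are included. The fitting routine `fit_logistic` is a hypothetical helper that uses plain Newton-Raphson rather than any particular statistics package.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson.

    X must already contain an intercept column; y is a 0/1 array.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # fitted probabilities
        w = p * (1.0 - p)                     # Bernoulli variances
        grad = X.T @ (y - p)                  # gradient of the log-likelihood
        hess = X.T @ (X * w[:, None])         # negative Hessian
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Synthetic stand-in data (assumed values, NOT PITCHf/x measurements).
rng = np.random.default_rng(0)
n = 5000
distance = rng.uniform(0.0, 6.0, n)    # inches off the edge of the plate
velocity = rng.normal(88.0, 5.0, n)    # mph
ninth = rng.integers(0, 2, n)          # 1 if the pitch came in the 9th inning

# Made-up "true" effects: intercept, distance, centered velocity, 9th inning.
true_beta = np.array([1.0, -0.8, -0.06, 0.3])
X = np.column_stack([np.ones(n), distance, velocity - 88.0, ninth])
p_strike = 1.0 / (1.0 + np.exp(-X @ true_beta))
called_strike = rng.binomial(1, p_strike)  # 1 = ball off the plate called a strike

beta_hat = fit_logistic(X, called_strike)
print(dict(zip(["intercept", "distance", "velocity", "ninth"], beta_hat.round(3))))
```

A negative coefficient on `distance` means a pitch is less likely to be called a strike the farther it is off the plate. Categorical effects such as umpire or pitch type would enter the same way, as indicator columns in `X`.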

The first plot shows the proportion of balls called strikes as a function of distance from the plate and pitch type. The apparently wider strike zone for left-handed batters is something that has been observed in the blogosphere, but it is still a bit disturbing. We can see that the umpires do a better job on the slower pitch types. Interestingly, once we account for the type of pitch, the “slope” term for velocity in our model was negative. This indicates that, within a pitch type, the faster the pitch is thrown, the more likely it is to be correctly called a ball.

Umpires are known to have “their strike zones,” so it wasn’t surprising to find a strong umpire effect as well. Our three best umpires, in terms of lowest misclassification of balls off the plate as strikes, were Eric Cooper, Tim McClelland, and Chad Fairchild, and the three worst were Tim Welke, Wally Bell (RIP), and John Hirschbeck.


Lastly, we had hypothesized a 9th inning effect. Our thinking was that in blowout games on hot days in the middle of the summer, the umpires would just want to get out of there. To our delight, we actually saw a 9th inning effect. The left plot gives the empirical estimates based on distance off the plate, and the right plot gives the estimates from our model. The effect is diminished after accounting for the other things in our model (closers throw more fastballs, etc.), but it is still apparent.
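The empirical estimates in a plot like the left one are just proportions of called strikes within distance bins, computed separately for the 9th inning and for all other innings. A rough sketch of that calculation follows. The data are again synthetic, the size of the 9th inning bump is a made-up assumption, and `strike_rate_by_bin` and the one-inch bin width are hypothetical choices for illustration.

```python
import numpy as np

def strike_rate_by_bin(distance, called_strike, edges):
    """Proportion of pitches called strikes within each distance bin."""
    idx = np.digitize(distance, edges) - 1     # bin index for each pitch
    rates = np.full(len(edges) - 1, np.nan)    # NaN marks an empty bin
    for b in range(len(edges) - 1):
        in_bin = idx == b
        if in_bin.any():
            rates[b] = called_strike[in_bin].mean()
    return rates

# Synthetic stand-in data (assumed values, NOT PITCHf/x measurements).
rng = np.random.default_rng(1)
n = 20000
distance = rng.uniform(0.0, 6.0, n)            # inches off the plate
ninth = rng.random(n) < 0.11                   # roughly one pitch in nine
logit = 1.0 - 0.8 * distance + 0.4 * ninth     # made-up 9th inning bump
called_strike = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

edges = np.linspace(0.0, 6.0, 7)               # one-inch bins
rate_ninth = strike_rate_by_bin(distance[ninth], called_strike[ninth], edges)
rate_other = strike_rate_by_bin(distance[~ninth], called_strike[~ninth], edges)

for lo, r9, ro in zip(edges[:-1], rate_ninth, rate_other):
    print(f"{lo:.0f}-{lo + 1:.0f} in: 9th = {r9:.3f}, other innings = {ro:.3f}")
```

These raw proportions do not adjust for anything else, which is why a model-based version of the same comparison can show a smaller gap.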

Authors: Dr. Justin Post and Dr. Jason Osborne