
2021 SABR Analytics Diamond Dollars Recap

By: Sarah Sult, Simon Todreas, and Max Hanley



FIERS

Fielding Independent Effective Runs per Start

Intro

This post summarizes Wash U's entry in the 2021 Society for American Baseball Research (SABR) Diamond Dollars Case Competition. Held virtually this year, the Diamond Dollars Case Competition consisted of 20 teams of high school, undergraduate, or graduate students answering a question relevant to baseball at the time of the conference. This year's team consisted of Max Hanley, Sarah Sult, and Simon Todreas. All teams were given a prompt entitled "Changing the Game: A New Version of Game Score."


This case assignment was to “develop a new stat to improve upon the insights that Game Score and Game Score 2.0 provide about a starting pitcher’s outing. This new metric should serve as both a diagnostic and evaluative tool that would inform front offices, players, and fans about a starting pitcher’s performance.”


Game Score and Game Score 2.0 each start pitchers at a predetermined value and assign positive or negative weights to different events (outs, hits, runs, etc.). The pitcher's final score allows fans and coaches alike to interpret the pitcher's overall performance.
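For reference, the original Game Score can be written out directly from its event weights (a Python sketch using Bill James's published weights, summarized here rather than taken from the case materials):

```python
def game_score_v1(outs, innings_after_4th, strikeouts, hits,
                  earned_runs, unearned_runs, walks):
    """Bill James's original Game Score: start at 50 and add or
    subtract a fixed weight for each event in the start."""
    return (50
            + 1 * outs               # +1 for each out recorded
            + 2 * innings_after_4th  # +2 per completed inning after the 4th
            + 1 * strikeouts         # +1 per strikeout
            - 2 * hits               # -2 per hit allowed
            - 4 * earned_runs        # -4 per earned run
            - 2 * unearned_runs      # -2 per unearned run
            - 1 * walks)             # -1 per walk

# A 9-inning, 14-strikeout, 1-walk no-hitter scores 100:
score = game_score_v1(27, 5, 14, 0, 0, 0, 1)
```

Every weight is attached to an event's result, which is exactly the assumption our metric sets out to relax.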


In creating our new pitching metric, we wanted to calculate values to represent the quality of pitches thrown, not the end result. To do this, we first assigned values to every possible pitch result (ball, strike, single, double, field out, etc.). Then, we wrote an algorithm that calculated the expected value of a pitch based on the average of similar pitches. We assigned this expected value to the pitch to ensure quality pitches were rewarded, even if the batter was able to get a hit, and poor pitches were punished, even if the pitcher got the out.


Here is the link to our code, and here is the link to our slides.


Calculating Values

All of our data was downloaded directly from Baseball Savant. We started with all pitch-by-pitch data from 2019, but eventually reduced it to a random subset of days. This was done to help with run time, because the algorithm could not handle every pitch (over 700,000) in a timely fashion. We randomly selected dates instead of individual pitches so we could analyze entire appearances. When this process was complete, we were left with 108,500 pitches to analyze, a large enough sample to be representative of the entire year.
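The date-based subsetting can be sketched as follows (a minimal illustration with made-up records and a hypothetical `game_date` field; our actual work was done in R on the Baseball Savant export):

```python
import random

def sample_by_date(pitches, n_dates, seed=None):
    """Keep every pitch from a random subset of game dates, so complete
    appearances survive the downsampling."""
    rng = random.Random(seed)
    dates = sorted({p["game_date"] for p in pitches})
    keep = set(rng.sample(dates, n_dates))
    return [p for p in pitches if p["game_date"] in keep]

# Toy example: four pitches across three game dates.
season = [
    {"game_date": "2019-04-01", "pitcher": "A"},
    {"game_date": "2019-04-01", "pitcher": "B"},
    {"game_date": "2019-04-02", "pitcher": "C"},
    {"game_date": "2019-04-03", "pitcher": "A"},
]
subset = sample_by_date(season, n_dates=2, seed=0)
```

Because whole dates are kept or dropped, every appearance in the subset is complete from first pitch to last.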


To ultimately calculate the quality of a start (or relief appearance), we first determined the quality of every pitch. To account for the randomness and variability inherent in the outcome of any given pitch, we assigned every pitch an expected value: the average of the outcome values of all similar pitches. The outcome value of a pitch is its context-neutral run expectancy.


To start this process, we converted FanGraphs wOBA constants to wRAA to determine the value of all plate-appearance outcomes (single, double, triple, home run, out, and hit-by-pitch). Then we determined the values of balls, strikes, and fouls in each count by finding the change in expected runs from one count to the next. For example, after a 2-1 count a plate appearance is worth 0.03 runs above average, but after a 2-2 count it is worth -0.04 runs, so a strike in a 2-1 count is worth -0.07 runs. We calculated these values for every count and then computed a weighted average to determine the overall value of a strike, a ball, and a foul. All the final outcome values are displayed in the chart below.
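The count-transition step for the 2-1 example works out like this (a sketch; only the two run-expectancy entries quoted above are filled in, and a full implementation would cover all twelve counts):

```python
# Runs per plate appearance above average after reaching each count.
# Only the 2-1 and 2-2 entries are the values quoted in the text.
runs_above_avg = {
    (2, 1): 0.03,
    (2, 2): -0.04,
}

def event_value(count_before, count_after):
    """Value of a ball, strike, or foul = change in expected runs from
    the count before the pitch to the count after it."""
    return round(runs_above_avg[count_after] - runs_above_avg[count_before], 4)

strike_in_2_1 = event_value((2, 1), (2, 2))   # -0.04 - 0.03 = -0.07 runs
```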


Algorithm

We worked exclusively in RStudio to create an algorithm that compared every pitch to similar pitches in our sample to predict its outcome and assign it an expected value. We compared pitches for which pitcher handedness, batter handedness, Gameday zone, and pitch type (categorized as fastball, breaking, or offspeed) were the same. Additionally, velocity and spin rate had to fall within a one-standard-deviation window (±0.5 standard deviations of the original pitch's value). We then averaged the values of all pitches that met these criteria, and that average represented the expected runs above average for the original pitch.
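The comparison step looks roughly like this (our actual algorithm was written in R; this Python sketch uses hypothetical field names and made-up outcome values):

```python
def expected_value(pitch, sample, velo_sd, spin_sd):
    """Expected runs above average for a pitch: the mean outcome value
    of all sampled pitches matching on handedness, zone, and pitch
    class, within +/-0.5 standard deviations in velocity and spin."""
    similar = [
        p for p in sample
        if p["p_throws"] == pitch["p_throws"]         # pitcher handedness
        and p["stand"] == pitch["stand"]              # batter handedness
        and p["zone"] == pitch["zone"]                # Gameday zone
        and p["pitch_class"] == pitch["pitch_class"]  # fastball/breaking/offspeed
        and abs(p["velo"] - pitch["velo"]) <= 0.5 * velo_sd
        and abs(p["spin"] - pitch["spin"]) <= 0.5 * spin_sd
    ]
    # The pitch itself always matches, so `similar` is never empty; a
    # pitch with no other matches defaults to its own outcome value.
    return sum(p["value"] for p in similar) / len(similar)

# Toy three-pitch sample (made-up values, not our real data):
sample = [
    {"p_throws": "R", "stand": "R", "zone": 9, "pitch_class": "fastball",
     "velo": 96.9, "spin": 2805, "value": -0.27},
    {"p_throws": "R", "stand": "R", "zone": 9, "pitch_class": "fastball",
     "velo": 96.0, "spin": 2750, "value": 0.10},
    {"p_throws": "L", "stand": "R", "zone": 9, "pitch_class": "fastball",
     "velo": 96.5, "spin": 2780, "value": 0.10},
]
ev = expected_value(sample[0], sample, velo_sd=3.0, spin_sd=194.0)
```

In this toy sample, the first pitch matches itself and the second pitch (the third fails the handedness filter), so its expected value is the mean of -0.27 and 0.10.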


To help clarify, we broke down the last pitch of Justin Verlander's 2019 no-hitter. It was a 4-seam fastball from a right-handed pitcher to a right-handed batter (Bo Bichette) in Gameday zone 9 (a low-and-away strike). It registered 96.9 miles per hour with a spin rate of 2805 revolutions per minute. The pitches our algorithm compared it to were all fastballs thrown from a right-handed pitcher to a right-handed batter in Gameday zone 9 between 95.4 and 98.4 miles per hour and between 2708 and 2902 revolutions per minute. In our sample, this returned the following pitches.


All of the similar pitches happened to be 4-seam fastballs, but this was not a requirement. Of these similar pitches, none besides Verlander's resulted in an out. We averaged the values of all of these pitches to obtain a final value of -0.03067 for this pitch. This expected value is worse (closer to zero) than the outcome value originally assigned to the pitch, because no other similar pitch had as good an outcome.


We included the original pitch in the average because it is a perfect match and there is no reason to exclude it from the process. It also acts as a default for outlier pitches for which no other pitches meet our comparison criteria. While far from ideal, using the outcome value as the expected value is the best possible default, since the outcome value is the baseline we are trying to improve upon.


Our algorithm goes through this process for each pitch of the starter's appearance and sums the values over all pitches to arrive at our raw score for the appearance.


Adjustments

To adjust for error in calculating the value of each pitch, we added 0.0019488 to the value of each pitch. This accounts for imperfections in our data, most likely due to the different methodologies we used to calculate the values of balls/strikes/fouls and of balls in play. This small correction makes the average pitch value exactly 0 instead of -0.002 runs.


We then wanted to adjust for innings pitched, so that starting pitchers who lasted longer into the game would be rewarded. To do this, we compared pitchers to replacement level, so a longer start gives a starting pitcher more time to accrue value as long as he performs above replacement level.


The 2019 Rockies relief pitchers had 0.0 WAR over the entire season, so we used them as a proxy for a replacement-level stat line. On average, they threw 17.2 pitches per inning at 0.0038 runs per pitch above average. Multiplying these together, we find that replacement level is about 0.065 runs per inning worse than average.


To calculate a pitcher's final score, we took the raw score given to us by our algorithm and added 0.002 times the pitcher's total number of pitches, adjusting for our error in calculating the value of each pitch. We then subtracted the expected replacement-level performance: the replacement runs per inning above average (the value we found from the 2019 Rockies) multiplied by the number of innings the starting pitcher appeared in. This calculation gave us our final score (see below).
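Putting the two adjustments together, the final score can be sketched as follows (this is one reading of the calculation, using the constants quoted above):

```python
ERROR_CORRECTION = 0.002       # runs added per pitch (the 0.0019488 correction, rounded)
REPL_RUNS_PER_INNING = 0.065   # 17.2 pitches/inning * 0.0038 runs/pitch above average

def fiers(raw_score, n_pitches, innings_pitched):
    """Final FIERS score: error-corrected raw score minus the expected
    replacement-level performance over the same innings. Negative is
    good -- it represents runs saved versus a replacement pitcher."""
    corrected = raw_score + ERROR_CORRECTION * n_pitches
    replacement = REPL_RUNS_PER_INNING * innings_pitched
    return corrected - replacement

# An exactly average start (raw score 0) over 9 innings on 100 pitches:
score = fiers(0.0, 100, 9)   # 0 + 0.2 - 0.585 = -0.385 runs vs. replacement
```

Note how the replacement baseline rewards length: each extra inning of average-or-better pitching pulls the score further negative.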



Results

In order to interpret how our final score represents a pitcher's performance, we continued with Justin Verlander's 2019 no-hitter. His Game Score was 100, and our method gave him a score of -1.191, meaning we would expect him to save about 1.191 runs that game compared to a replacement-level pitcher, given the quality of the pitches he threw. It is important to note that for our score, negative values are good, since they represent runs saved.


Where our method excels is in comparing two starts that received the same Game Score. Because our method evaluates each pitch based on its expected outcome instead of its result, our final score can vary drastically for appearances with similar Game Scores.


As an example, we compared Ross Detwiler's June 28th start to Adam Wainwright's May 28th start. Both pitchers received a Game Score of 50, but their FIERS scores differed significantly: Detwiler had a FIERS score of 0.887 and Wainwright a score of -0.833. These values help us understand their starts in more depth. To see why this happens, we compared two specific pitches.



The above pitch was a hanging changeup from Detwiler that resulted in a lineout to center field. It was in the middle of the strike zone (Gameday zone 5) at 83.5 MPH with a spin rate of 1472 RPM. Game Score gives Detwiler one point for the out, and Game Score 2.0 gives him two for the result. FIERS recognizes that Detwiler got lucky that the pitch was hit right at a fielder and punishes him for the poor pitch with a score of 0.142 runs on that pitch. Since positive scores are runs allowed, this is worse than both an average and a replacement-level pitch.



As for Wainwright, here he throws a 90.3 MPH, 2136 RPM fastball down and away (Gameday zone 7) to Bryce Harper. Both Game Score and Game Score 2.0 give Wainwright negative two points because the pitch resulted in a double. Our system recognizes that this was actually a good pitch that usually results in favorable outcomes for the pitcher. It gets a FIERS score of -0.069, a solid pitch.


These examples highlight how our method is able to better examine the quality of pitches being thrown and score the pitchers accordingly.


Limitations

Our primary limitation was time. In the competition, each team is given exactly one week from the delivery of the case to submission of the final PowerPoint. We wrote an algorithm that could score one start against our roughly 100,000-pitch sample in about 12 minutes. We were in the process of creating a faster algorithm that presorted the pitches into the categories described previously, so that a given pitch from the starting pitcher would only be compared to pitches that were categorically similar. The presort would shorten the runtime of our algorithm dramatically, allowing us to use a larger sample. Unfortunately, we were still a few hours away from debugging this faster algorithm and had to continue with our slower code. A larger sample would have allowed us to get more accurate raw values for each pitch.
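The presort we were building amounts to a one-time bucketing pass over the sample (a sketch in Python with hypothetical field names; velocity and spin still have to be range-checked inside each bucket):

```python
from collections import defaultdict

def presort(sample):
    """One-time pass: bucket pitches by their exact-match keys so each
    lookup scans only a small bucket instead of the full sample."""
    buckets = defaultdict(list)
    for p in sample:
        key = (p["p_throws"], p["stand"], p["zone"], p["pitch_class"])
        buckets[key].append(p)
    return buckets

def similar_pitches(pitch, buckets, velo_sd, spin_sd):
    """Range-check velocity and spin only within the matching bucket."""
    key = (pitch["p_throws"], pitch["stand"], pitch["zone"], pitch["pitch_class"])
    return [p for p in buckets[key]
            if abs(p["velo"] - pitch["velo"]) <= 0.5 * velo_sd
            and abs(p["spin"] - pitch["spin"]) <= 0.5 * spin_sd]

# Toy sample: two buckets (right- and left-handed pitchers).
sample = [
    {"p_throws": "R", "stand": "R", "zone": 9, "pitch_class": "fastball",
     "velo": 96.9, "spin": 2805},
    {"p_throws": "R", "stand": "R", "zone": 9, "pitch_class": "fastball",
     "velo": 96.0, "spin": 2750},
    {"p_throws": "L", "stand": "R", "zone": 9, "pitch_class": "fastball",
     "velo": 96.5, "spin": 2780},
]
buckets = presort(sample)
matches = similar_pitches(sample[0], buckets, velo_sd=3.0, spin_sd=194.0)
```

The bucketing is built once, after which each pitch only scans its own bucket rather than the entire sample.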


Another limitation is that we did not assign values based on the count at the time the pitch was thrown. Adding this to the value of a pitch could help pinpoint situational strengths and weaknesses of pitchers.


----------------------------------------------------------------------------------------------------------------------------------------


We want to thank SABR for putting on a spectacular conference despite having to go virtual. Thank you to Wash U’s Sports Analytics Club for giving us the opportunity to represent the school in this case competition, and congratulations to all of the teams who participated. We are looking forward to next year!

