2019 SABR Analytics Diamond Dollars Recap

WashU Sports Analytics
Mar 10, 2020

By Aaron Margulis

Intro

This article summarizes WashU’s winning entry in the 2019 Society for American Baseball Research (SABR) Diamond Dollars Case Competition. Diamond Dollars is an annual case competition held at the SABR Analytics Conference in Phoenix, Arizona, during Spring Training. The competition consists of twenty teams, each composed of four to five high-school, undergraduate, or graduate students, attempting to quantitatively answer a major question in baseball at the time of the conference. In 2019, WashU’s Sports Analytics Club sent a team of five – Aaron Margulis, Devlin Sullivan, Johnnie Teng, Noah Kastelman, and Sam Linker – who went on to win the competition for WashU’s second year running. With the 2020 competition coming up on the weekend of March 13th-15th, we wish all the contestants good luck!

On February 28, 2019, Diamond Dollars competitors were sent a prompt entitled “Developing a Strategy for Pitching Usage.” To summarize, in 2018 a new trend emerged in MLB in which a number of teams started games with a relief pitcher on the mound. This “opener” would typically pitch one to two innings before a traditional starter entered and pitched another five or so innings. In 2018, 91 games were started by 28 different pitchers who each averaged 1.6 innings or fewer per start. Though there has been much discussion about the effectiveness of this new strategy, it has been largely subjective. As Vince Gennaro, whose many accolades include authoring Diamond Dollars: The Economics of Winning in Baseball, puts it,

This case aims to introduce objective analysis into the pitching usage question in order to elevate the public dialog to a higher plane. In this case I’m asking you to lay out a strategic framework for an MLB team to make decisions about the optimal way to deploy their pitching assets … You can define broader strategic principles that any teams should follow, or you can define your strategies in the context of a specific 2018 MLB team.

We elected to approach this case using a combination of broad strategy and team-specific results, and accomplished this by constructing a general framework which we then applied to each of the thirty 2018 MLB pitching staffs and evaluated through simulation. The rest of this article will cover how we selected rotations for individual teams using our general approach, how we quantified the impact of our rotational adjustments, and the results of these processes.

This is the video of our winning, encore presentation, and here are the slides to go along with it.

Rotation Determination

The first step of our combined approach was to devise a general, objective structure for constructing pitching rotations for individual teams. Before conducting any analysis, we had to collect data, as SABR did not provide any alongside the 2019 prompt. We collected end-of-season 2018 MLB rosters, and gathered these players’ individual pitching statistics against each spot in the batting order using the baseball-reference.com play index. Since we used end-of-season rosters, our exact quantitative findings rest on the incorrect assumption that no player changed teams during the course of the 2018 season. This is something to keep in mind when digging into the exact numbers or orderings presented in our Results section; however, our general findings regarding the effectiveness of opening pitchers still hold true.

The concept of using an opening pitcher is derived from the idea that a lineup’s top three hitters are its best, and that using a reliever to start the game not only gives the pitching team better initial matchups but also decreases the likelihood of the best three opposing batters seeing the same pitcher three times in one game. Because of this, and in order to, in effect, triple our sample size, we grouped our pitching lineup data into thirds (hitters one through three, four through six, and seven through nine) rather than using the nine individual spots we originally collected.
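The grouping into thirds can be sketched in a few lines of Python. This is only an illustration: the function names and the layout of the input dictionary are assumptions, not the code we ran for the competition.

```python
from collections import defaultdict


def lineup_third(spot):
    """Map a batting-order spot (1-9) to its third of the lineup."""
    if not 1 <= spot <= 9:
        raise ValueError("lineup spot must be between 1 and 9")
    return ("top", "middle", "bottom")[(spot - 1) // 3]


def pool_by_third(stats_by_spot):
    """Sum counting stats (batters faced, strikeouts, ...) within each third.

    stats_by_spot maps a lineup spot to a dict of stat name -> count.
    """
    pooled = defaultdict(lambda: defaultdict(int))
    for spot, stats in stats_by_spot.items():
        for name, value in stats.items():
            pooled[lineup_third(spot)][name] += value
    return {third: dict(stats) for third, stats in pooled.items()}
```

Pooling counts before computing any rate is what gives each third roughly three times the sample of a single lineup spot.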

Strikeout rate (K%) stabilizes at around 70 batters faced, so we only considered pitchers who faced at least 210 batters in the 2018 MLB season (210 resulting from an approximate minimum of 70 batters faced from each third of the lineup). K% is one of the first of a pitcher’s stats to stabilize, though, so we had to make further adjustments to the sample we would consider. Home run rate (HR%) takes significantly longer than K% to stabilize, so we instead used a pseudo-HR% metric, calculated as fly ball rate (FB%) times the league average HR/FB ratio. This is the same substitution that xFIP makes, and xFIP is our first use of pseudo-HR%, as you will soon see. The last of the three event rates considered at this point in our analysis was walk rate (BB%), which stabilizes around 170 batters faced. We made an additional distinction at 510 batters faced (170 times three) between starter-ones (S1s), who faced at least 510 batters in 2018, and starter-twos (S2s), who faced no more than 509. We’ll explain where this distinction becomes relevant shortly.
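As a rough sketch of these sample-size rules, the thresholds and the pseudo-HR% calculation could be written as below. The constant and function names are hypothetical, and the league HR/FB value is passed in as a parameter rather than hard-coded.

```python
MIN_BF = 210      # 70 batters faced per lineup third, times three (K% threshold)
S1_MIN_BF = 510   # 170 batters faced per lineup third, times three (BB% threshold)


def pseudo_hr_rate(fb_rate, league_hr_per_fb):
    """Pseudo-HR%: fly ball rate times the league-average HR/FB ratio."""
    return fb_rate * league_hr_per_fb


def starter_tier(batters_faced):
    """Return 'S1', 'S2', or None when the sample is too small to consider."""
    if batters_faced >= S1_MIN_BF:
        return "S1"
    if batters_faced >= MIN_BF:
        return "S2"
    return None
```

For example, `starter_tier(509)` returns `"S2"` and `starter_tier(510)` returns `"S1"`, matching the cutoff described above.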

To categorize relief pitchers, we used Tom Tango’s leverage index to separate high and middle/low leverage relievers. We categorized high leverage (HL) relievers as those with a leverage index greater than or equal to 1.55, and all other relievers were considered low leverage (LL).
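In code, this split is a single comparison. The sketch below assumes each reliever's leverage index is already available as a number; the names are illustrative.

```python
HIGH_LEVERAGE_CUTOFF = 1.55  # cutoff quoted above


def reliever_tier(leverage_index):
    """Label a reliever 'HL' (high leverage) or 'LL' (everyone else)."""
    return "HL" if leverage_index >= HIGH_LEVERAGE_CUTOFF else "LL"
```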

We then calculated xFIP against the top third of the lineup for all pitchers deemed to have a sufficient 2018 sample size. We found that these xFIPs were approximately normally distributed, and so we computed each pitcher’s top-third-of-the-lineup-xFIP z-score. We ranked each pitcher within every MLB pitching staff by this z-score, and assigned starters, openers, bridges, and relievers accordingly, abiding by the previously described leverage and usage rules. We also eliminated the possibility of opening with a lefty reliever, as, even if they were low leverage, we assumed they might typically be saved for late game situations. We used the distinction between S1s and S2s to prioritize otherwise evenly ranked starters, assuming the S1 is better than his S2 counterpart.
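A minimal sketch of the xFIP and z-score step follows. It uses the commonly published xFIP formula, with the league HR/FB ratio and the FIP constant supplied by the caller. The function names are hypothetical, and the use of the population standard deviation is an assumption, since the choice is not specified above.

```python
from statistics import mean, pstdev


def xfip(fly_balls, walks, hbp, strikeouts, innings,
         league_hr_per_fb, fip_constant):
    """xFIP: FIP with home runs replaced by fly balls times league HR/FB."""
    expected_hr = fly_balls * league_hr_per_fb
    return (13 * expected_hr + 3 * (walks + hbp) - 2 * strikeouts) / innings \
        + fip_constant


def z_scores(xfip_by_pitcher):
    """Standardize xFIP values; a higher z-score means a worse pitcher."""
    mu = mean(xfip_by_pitcher.values())
    sigma = pstdev(xfip_by_pitcher.values())  # assumption: population std dev
    return {name: (value - mu) / sigma
            for name, value in xfip_by_pitcher.items()}


def rank_staff(z_by_pitcher):
    """Order a staff from best (lowest z-score) to worst."""
    return sorted(z_by_pitcher, key=z_by_pitcher.get)
```

Here the inputs to `xfip` would be a pitcher's totals against the top third of the lineup only, and `z_scores` would be computed across every qualifying pitcher before ranking within each staff.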

We assigned low leverage (LL) righty relievers to open for S1s and S2s with higher z-scores (higher z-scores are undesirable because a higher xFIP is worse for pitchers). We allowed each opener to open twice throughout a five-man rotation under an assumption that starters can each start 32 games throughout a healthy season, and relievers can throw 65-70 innings per year. Here is an example of the final result of this process showcasing the 2018 Tampa Bay Rays:

Interestingly, this is quite similar to what we observed from the Rays and their implementation of the opener strategy in real life during the 2018 season.
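The opener rules and the workload assumption behind them can be checked with simple arithmetic. The sketch below is an illustration with hypothetical names. It assumes one inning per opening appearance, in line with the three-outs rule used in the simulation, so opening for two rotation spots at 32 starts each comes to 64 innings, close to the 65-70 inning allowance for relievers.

```python
STARTS_PER_ROTATION_SPOT = 32   # starts per healthy starter, as assumed above
MAX_SPOTS_PER_OPENER = 2        # each opener may open for two rotation spots
INNINGS_PER_OPEN = 1            # assumption: the opener records three outs


def is_opener_candidate(role, throws, leverage_tier):
    """Only right-handed, low-leverage relievers are eligible to open."""
    return role == "RP" and throws == "R" and leverage_tier == "LL"


def opener_innings(spots_opened):
    """Season innings implied by opening for the given number of spots."""
    if spots_opened > MAX_SPOTS_PER_OPENER:
        raise ValueError("an opener may cover at most two rotation spots")
    return spots_opened * STARTS_PER_ROTATION_SPOT * INNINGS_PER_OPEN
```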

Below is a graphic showing the number of spots that we determined should have been opened for in a five-man rotation for each MLB team in 2018:

An interesting observation about this distribution is that there are playoff teams in every bracket, showing that the effectiveness of openers varies from pitching staff to pitching staff, and that there are serious playoff implications of this strategy.

Simulation

With our rotations in place, we needed a way to quantitatively measure their expected performance. Since nearly all of the opener-bridge combinations our classification system generated were never implemented in the 2018 MLB season, we decided the best approach to measure them was via simulation. Before getting into the process of the simulation, we first defined four main assumptions in order to control as many variables as possible. These assumptions were:

(1) When simulating the performance of opener-bridge pitcher sequences, the opener is to always pitch three outs regardless of how many batters he faces, and the bridge will then pitch through the 27th batter regardless of how many batters the opener faced.
(2) When simulating the performance of traditional starter-reliever pitcher sequences, the starter is to always face the first 24 batters, and the reliever will then pitch the following three outs regardless of when he entered the game.
(3) We will only simulate nine innings, as, based on assumptions (1) and (2), it is impossible for either our starter/bridge or opener/reliever to still be on the mound in extra innings. In the likely scenario, however, where one of these two pitchers doesn’t even make it through the ninth, we will replace them with an average MLB bullpen pitcher, as defined by the average performance of non-starting pitchers in the 2018 MLB season, for the remainder of the first nine innings.
(4) Hitters will have league average stats by lineup position, as defined by the average performance of batters in each lineup position during the 2018 MLB season.

With these assumptions in place, we began to write our simulation in Python. We found a 2014 Fangraphs article titled “The Outcome Machine: Predicting At Bats Before They Happen,” which did just that, and was able to predict at-bat outcome likelihoods with astounding accuracy given only basic pitcher and hitter event rates. The regressions found in that article and used in our simulation are summarized in the following table:

Given the pitcher-hitter matchups as defined by our assumptions, the expected probability of each possible outcome in each of those matchups (derived from the above regressions), and a random number between zero and one, we were able to effectively simulate the outcome of each hypothetical plate appearance. We ran 500 simulations for each pitcher sequence (opener-bridge and traditional starter-reliever), therefore simulating each pitcher combination 1,000 times. We chose 500 because it is a round number at which our findings had largely stabilized – we found a standard deviation of only about 0.1 runs created for each pitcher sequence when extrapolated over the course of a season, an essentially negligible error as you’ll see in our later Results section. From these simulations, we generated the expected number of plate appearances for each position in the lineup in each inning. Here is an image we used to illustrate this in our slides; however, keep in mind that this image shows probabilities (capped at 100%) instead of expected plate appearances (what we actually used, with no upper limit) because we felt this simplified the explanation during the presentation. In this table, we also combined the probabilities of all three pitchers (Lou Trivino, Edwin Jackson, Average Reliever), although in practice you can think of each pitcher having their own such table:

With the probabilities of each possible outcome given by the regression, we could also calculate the expected value of each of these matchups instead of only their likelihood as shown in the above table. To do this, we used runs created linear weights for the 2018 season as calculated by Fangraphs. Fangraphs adjusts each value such that the average plate appearance has a runs created value of zero, which is why outs have a negative runs created value in the next graphic. This means that a positive runs created value for a pitcher-hitter matchup suggests an advantage for the hitting team, while a negative value indicates an advantage for the pitching team. Below are the weights, along with an example of how they would be implemented in a specific pitcher combination. Keep in mind we don’t need to use a simulation for this part, as we simply multiply the event weights by the corresponding event probabilities generated using regressions. The pitcher combination we use in this example is the same as in the above probability table; that is, Lou Trivino as an opener (reliever) and Edwin Jackson as a bridge (starter):

If we combine the expected number of each pitcher-hitter matchup with the value of each matchup (the information depicted in the previous two tables), we can generate the following table:

The “Runs Created” number on the right represents the sum of this table, showing that the Lou Trivino opening-Edwin Jackson bridge combination is 0.72 runs per game worse than the average pitching rotation combination. That being said, when conducting the same analysis with Jackson as a starter and Trivino as a reliever, we calculated an expected runs created value of 0.83. This led us to conclude that the Oakland A’s could benefit from opening for Jackson with Trivino by nearly 0.11 runs saved per game.
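The three building blocks described in this section (drawing a plate-appearance outcome from a random number, valuing a matchup with linear weights, and summing over expected matchups) can be sketched as follows. This is a simplified illustration rather than our competition code. All function names are hypothetical, the outcome probabilities are taken as inputs that would come from the regressions, and the run values are supplied by the caller.

```python
import random
from itertools import accumulate


def sample_outcome(probabilities, rng=random):
    """Pick one outcome from a dict of outcome -> probability.

    A single uniform draw is compared against the cumulative probabilities.
    """
    outcomes = list(probabilities)
    cumulative = list(accumulate(probabilities[o] for o in outcomes))
    # Scaling by the total guards against probabilities that sum to ~1.0.
    draw = rng.random() * cumulative[-1]
    for outcome, threshold in zip(outcomes, cumulative):
        if draw < threshold:
            return outcome
    return outcomes[-1]


def matchup_run_value(probabilities, weights):
    """Expected runs created for one plate appearance in a matchup."""
    return sum(p * weights[outcome] for outcome, p in probabilities.items())


def runs_created(expected_pa, run_value):
    """Sum expected plate appearances times run value over all matchups.

    Both dicts are keyed by (pitcher, lineup_spot).
    """
    return sum(pa * run_value[key] for key, pa in expected_pa.items())
```

Repeating `sample_outcome` batter by batter over many simulated games is what yields the expected plate appearances per matchup, while `matchup_run_value` needs no simulation at all.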

Results

We extrapolated these results to represent the runs saved over the course of a season by multiplying the opener-traditional difference by 32 games, as we assumed each combination would start 32 times during a season. Conducting the simulation on all of our pitcher combinations, we derived the following season runs saved results:

Of the 62 pitcher combinations we generated through our Rotation Determination process, we found that implementing the opener-bridge strategy would save runs in 59 instances (the three negative instances were Maton-Richard for the Padres, Edwards-Montgomery for the Cubs, and Stanek-Chrinos for the Rays).

By using the standard ten runs equals one win wOBA translation, we found that over the 2018 season the following MLB teams (all thirty teams minus the five teams from the zero-opener bracket in the earlier Rotation Determination section graphic) could have saved the following number of runs and added the corresponding number of wins:

Among our more general findings, the average 2018 MLB pitching staff was composed of 3.93 S1s, 2.73 S2s, and 4.8 relievers (HL and LL). The remaining one to two spots would be filled by shuttled players and injured players who didn’t register enough batters faced in 2018 to qualify for any of our rotation positions.
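The season extrapolation and the runs-to-wins conversion used in this section amount to two multiplications, sketched below with hypothetical function names. With the rounded per-game figures quoted for Trivino and Jackson (0.83 and 0.72), the sketch gives roughly 3.5 runs saved over 32 starts; the unrounded values behind the original graphics may differ slightly.

```python
STARTS_PER_COMBINATION = 32   # each pitcher combination is assumed to start 32 times
RUNS_PER_WIN = 10             # ten runs equals one win


def season_runs_saved(traditional_runs, opener_runs,
                      starts=STARTS_PER_COMBINATION):
    """Runs saved per season by opening instead of starting traditionally.

    Both inputs are runs created per game relative to average, so a lower
    value is better for the pitching team.
    """
    return (traditional_runs - opener_runs) * starts


def wins_added(runs_saved_by_combination):
    """Convert a team's total runs saved across combinations into wins."""
    return sum(runs_saved_by_combination) / RUNS_PER_WIN
```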

Limitations and Further Uses

We’d be remiss if we didn’t admit the limitations of this research. We were only given a week to complete this project, from devising an approach to concluding our results, so inevitably there are some shortcomings. Perhaps the largest shortcoming is in terms of the data we used – a researcher’s output can only ever be as good as their input. Our sample was limited to 2018 regular season data, and as a result we weren’t able to use extremely granular data such as righty-lefty splits, which would likely provide additional insight. There also might have been noise in some of our pitchers’ event rates even though we only considered those with somewhat substantial sample sizes. If this research were to be continued in the future, we might also want to incorporate Statcast data to improve the accuracy of these expected event rates. Along those lines, we assumed each pitching staff faced a league average lineup every game and throughout the entirety of each game. If we accounted for scheduling and used more granular opponent hitting data, we might find that there are some games in which our openers should wait to come in as relief, and others where they should open.

Lastly, something that cannot yet be quantitatively accounted for, and that is a huge limitation to the accuracy of our rotation propositions, is player psychology and player resistance. This one-size-fits-all approach would fail to fit Madison Bumgarner, for example, who told reporters in 2018, “If you use an opener in my game I’m walking right out of the ballpark.” Other pitchers who aren’t as outspoken about the change would likely still be as reluctant, or at least have a different psyche during their time on the mound, which could impact their performance and tarnish our results. Nonetheless, we believe that our generalized approach is proof that there is serious, statistically-backed reason for many teams to implement the new opener pitching strategy in Major League Baseball, and we can’t wait to see what clubhouses and front offices have in store for 2020.
