By Devlin Sullivan
Which of these home runs is more impressive: Yordan Alvarez's 418-foot upper-deck blast or Yandy Diaz's wall-scraping 423-foot home run to straightaway center?
I developed this metric because Statcast's distance figure fails to capture how aesthetically impressive a home run is. To build it, I modified existing Statcast data to create several inputs, then watched around 70 home runs and scored each on a scale of 1-10 based on how awesome it looked. Those scores trained a random forest regression, which I applied to every dinger hit during the 2019 regular season. At the end of this article, I'll link the 10 highest- and 5 lowest-scoring round-trippers.
Before running the regression, I first had to turn the mostly nonsensical 'hc_x' and 'hc_y' figures into a spray angle. Using this article, I converted each home run's coordinates into a spray angle with arctan. In the final version, the angle is normalized for right- and left-handed batters, with +45 being pulled and -45 being opposite field. The spread of home runs can be seen in the plot; note that the x and y axes are essentially unitless.
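The conversion can be sketched roughly as follows. This is a minimal illustration, not the exact code behind the metric: the home plate offsets (125.42, 198.27) are values commonly cited for Statcast hit coordinates and are an assumption here, as are the function names and the use of the 'stand' column ("L" or "R") to flip the sign for handedness.

```python
import math

# Assumed location of home plate in hc_x / hc_y coordinates (commonly cited
# values, not taken from this article).
HOME_X = 125.42
HOME_Y = 198.27


def spray_angle(hc_x, hc_y):
    """Raw spray angle in degrees: 0 is dead center, negative is toward left field."""
    # hc_y shrinks as the ball travels away from home plate, so the depth
    # term is HOME_Y - hc_y.
    return math.degrees(math.atan2(hc_x - HOME_X, HOME_Y - hc_y))


def adj_pull_angle(hc_x, hc_y, stand):
    """Handedness-adjusted angle: positive is pulled, negative is opposite field."""
    angle = spray_angle(hc_x, hc_y)
    # A right-handed batter pulls toward left field (negative raw angle),
    # so the sign is flipped for righties and left alone for lefties.
    return angle if stand == "L" else -angle


if __name__ == "__main__":
    # A ball down the left-field line, hit by a right-handed batter.
    print(round(adj_pull_angle(75.42, 148.27, "R"), 1))
```

With this convention, a ball down the left-field line is about +45 for a right-handed batter and about -45 for a left-handed one.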
Other inputs include the percent difference between the actual distance of the home run and its expected distance based on launch angle, and the same metric based on spray angle. Additionally, the vertical and horizontal components of exit velocity are isolated to quantify whether a home run skews more toward a pop-up or a line drive.
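As a rough sketch of how these inputs could be derived: the velocity split is plain trigonometry on exit velocity and launch angle, while the expected distance here is simply the mean distance of home runs in the same angle bin. That binning approach, the 2-degree bin width, and all function names are assumptions for illustration; the article does not say how expected distance was actually estimated.

```python
import math
from collections import defaultdict
from statistics import mean


def velocity_components(launch_speed, launch_angle_deg):
    """Split exit velocity into (vertical, horizontal) components."""
    theta = math.radians(launch_angle_deg)
    return launch_speed * math.sin(theta), launch_speed * math.cos(theta)


def pct_diff(actual, expected):
    """Percent difference between actual and expected distance."""
    return 100.0 * (actual - expected) / expected


def expected_distance_by_bin(angles, distances, bin_width=2.0):
    """Hypothetical estimator: mean distance of all home runs in the same angle bin."""
    bins = defaultdict(list)
    for angle, dist in zip(angles, distances):
        bins[math.floor(angle / bin_width)].append(dist)
    bin_means = {b: mean(ds) for b, ds in bins.items()}
    return [bin_means[math.floor(angle / bin_width)] for angle in angles]


if __name__ == "__main__":
    # Made-up values, for illustration only.
    launch_angles = [24.5, 25.0, 30.1]
    distances = [400.0, 420.0, 380.0]
    expected = expected_distance_by_bin(launch_angles, distances)
    for dist, exp in zip(distances, expected):
        print(round(pct_diff(dist, exp), 2))
    print(velocity_components(100.0, 30.0))
```

The same helper works for the spray-angle version of the metric by passing spray angles instead of launch angles.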
The relative importances of the inputs are outlined below.
input              importance
pct_diff_SA        0.584937
hit_distance_sc    0.130540
adjpull_angle      0.091515
pct_diff_LA        0.089740
vertical_velo      0.054763
launch_speed       0.024102
horizontal_velo    0.009165
xd_launch angle    0.008390
launch_angle       0.006849
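For readers who want to see where a table like this comes from, here is a minimal sketch using scikit-learn's RandomForestRegressor. The feature names are copied from the table above, but the data is randomly generated as a stand-in for the roughly 70 hand-scored home runs, and the hyperparameters are assumptions, so the printed importances will not match the real ones.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Input names as they appear in the importance table.
FEATURES = [
    "pct_diff_SA", "hit_distance_sc", "adjpull_angle", "pct_diff_LA",
    "vertical_velo", "launch_speed", "horizontal_velo",
    "xd_launch angle", "launch_angle",
]

# Synthetic stand-in data (NOT the real home runs): 70 rows of random inputs
# and a made-up 1-10 score driven mostly by the first column.
rng = np.random.default_rng(0)
X = rng.normal(size=(70, len(FEATURES)))
y = np.clip(5 + 2 * X[:, 0] + rng.normal(scale=0.5, size=70), 1, 10)

# Hyperparameters are illustrative assumptions.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X, y)

# feature_importances_ sums to 1 across all inputs.
importances = sorted(
    zip(FEATURES, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, value in importances:
    print(f"{name:<18}{value:.6f}")

# Once trained on real scores, model.predict(...) would score every home run.
predicted_scores = model.predict(X)
```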
This project has a number of limitations, the foremost being that it's not particularly useful from a player valuation standpoint. The second is that the decimal places in the score calculation generally outstrip the precision of the data. If this were to become a real Statcast metric, it would probably be best presented in a way similar to the star-based catch system.
Of outfield outs recorded, 1.66% were 5-star catches, 6.41% were 4-star, 20.15% were 3-star, 35.57% were 2-star, and 36.21% were 1-star. Applying the same distribution to the home run regression results, a home run scoring above 8.3/10 is a 5-star home run, above 7.0 is a 4-star, above 5.3 is a 3-star, above 3.7 is a 2-star, and anything lower is a 1-star home run.
All told, I think this metric adds something missing from the currently available data, and sorting through the leaderboard offers more entertainment value than just watching the longest home runs.
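The star mapping can be sketched as below. The fixed cutoffs and catch shares come straight from the paragraph above; the helper that derives cutoffs from a list of scores is a hypothetical illustration of "applying the same distribution", and its rounding rule is an assumption.

```python
# Score cutoffs from the text: a score must be strictly above the cutoff.
STAR_CUTOFFS = [(8.3, 5), (7.0, 4), (5.3, 3), (3.7, 2)]

# Share of outfield outs in each catch tier, from the text.
CATCH_SHARES = [(5, 0.0166), (4, 0.0641), (3, 0.2015), (2, 0.3557)]


def stars(score):
    """Map a 1-10 home run score to a 1-5 star rating."""
    for cutoff, star in STAR_CUTOFFS:
        if score > cutoff:
            return star
    return 1


def cutoffs_from_scores(scores):
    """Hypothetical helper: lowest score in each tier when scores are split by the catch shares."""
    ranked = sorted(scores, reverse=True)
    cutoffs = {}
    cumulative = 0.0
    for star, share in CATCH_SHARES:
        cumulative += share
        # Index of the last home run that still falls inside this tier.
        idx = min(len(ranked) - 1, max(0, round(cumulative * len(ranked)) - 1))
        cutoffs[star] = ranked[idx]
    return cutoffs


if __name__ == "__main__":
    for score in (9.1, 7.5, 6.0, 4.2, 2.8):
        print(score, stars(score))
```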
Here are the Top 10 most aesthetically pleasing home runs of the 2019 season in reverse order:
6. Nomar Mazara on June 21 (longest HR of the season, only one in top 15 in distance on this list)
And the 5 Least: