Overthinking Paul Goldschmidt and Billy Butler: A Wild Goose Chase

Many of you have started reading this week. Thanks!

I like to construct narratives around my fantasy team, which currently operates under the name Goldschmidt’s Gold Shit. This was the dumb logo.

It’s a keeper league and I don’t think I’m gonna change it soon.

Behemoth slugger Paul Goldschmidt didn’t ascend to team leadership until early August, when he kicked Josh Reddick (and his Red Dick [SFW]) out of the clubhouse in a verbal altercation, his booming baritone resonating throughout the bowels of the stadium and every fan in it. You see, Reddick had been spending too much time at nighttime clubs, wielding his favorite toy and namesake, leaving himself depleted come gametime. Things all came to a head when Marco Scutaro screamed at Reddick in frustration, expressing incidentally some long-held and deep-seated ethno-linguistic tensions felt by pretty much everyone on the team, normally neglected in the daily performance of badass manhood. In other words, Scutaro was fucking fed up with the nickname “Scooter.” Long story short, Jose Iglesias was pushed into a table, paralyzed gruesomely, and released within minutes, while Scutaro turned his back on the team to follow the long cold path of revenge. In the end, it was Goldschmidt who stepped up to refocus the team, leading a bold charge up the rotisserie points standings all the way up to…third.

Anyway, I acquired Goldschmidt in late 2012 on a specific hunch that he would break out in 2013. The hunch was that some of his doubles would turn into home runs, since I’d often heard that power develops later than other skills. And Goldschmidt’s 2-to-1 ratio of doubles to home runs (43 to 20, to be exact) seemed abnormally high for a slugger of his caliber. So I figured he’d get two bumps in homers, one from growing into his power and the other from a corresponding decrease in doubles. Even supposing Goldschmidt couldn’t age (as yet unconfirmed), I would still expect the doubles and home runs to average out.

Before 2012 it was reasonable to consider Kansas City DH Billy Butler a robot, designed by some bored and impractical scientist to hit doubles with maximum efficiency within a believably human range. It was, I swear. Then he broke out for 29 dingers with only 32 doubles, evening out a ratio that had previously been 2-to-1. The example of Butler, combined with Goldschmidt fulfilling his promise this year, spurred me to study the league at large for a relationship between doubles, home runs, and how power ages.
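The ratios above are simple arithmetic. As a quick check, here is a small Python sketch that uses only the two season lines quoted: Goldschmidt’s 43 doubles and 20 homers, and Butler’s 32 doubles and 29 homers. It leaves triples out, which is a simplification.

```python
def doubles_to_hr_ratio(doubles: int, home_runs: int) -> float:
    """Doubles per home run; 2.0 would be a 2-to-1 ratio."""
    return doubles / home_runs


def hr_share(doubles: int, home_runs: int) -> float:
    """Home runs as a share of doubles plus home runs (triples left out)."""
    return home_runs / (doubles + home_runs)


# Season lines as quoted in the text: (doubles, home runs)
seasons = {
    "Goldschmidt 2012": (43, 20),
    "Butler 2012": (32, 29),
}

for name, (doubles, homers) in seasons.items():
    print(f"{name}: {doubles_to_hr_ratio(doubles, homers):.2f} doubles per HR, "
          f"HR share {hr_share(doubles, homers):.3f}")
```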

That study ballooned and ballooned as I struggled to find evidence for the hypothesis that home runs become a bigger share of extra-base hits as hitters age. I didn’t always phrase the hypothesis that way, which was part of the problem. The answer, of course, was staring at me all along, but I’m glad I took myself on a wild goose chase, because I think I learned a lot of interesting things. See for yourself.


Wow! They can’t drink AND they get beaned disproportionately! I’m starting to pity all those 20-year-old ballplayers. Serious stuff starts below.

First I thought I should check the most consistent measures of power before getting caught up in the doubles search. So I got caught up in examining the year-to-year correlation of stats: HR, HR/PA, HR/Contact, HR/Air. “Contact” is shorthand for all plate appearances ending in contact. To get it I subtracted strikeouts, walks and HBPs from total plate appearances. “Air” is shorthand for all airborne batted balls. I used Fangraphs’ data to get fly balls and line drives. Might as well say all the data came from Fangraphs.
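Those definitions reduce to simple arithmetic. The minimal Python sketch below shows it. The `Season` record and its field names are hypothetical and do not match Fangraphs’ column names. The season line is made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Season:
    pa: int   # plate appearances
    hr: int   # home runs
    bb: int   # walks
    so: int   # strikeouts
    hbp: int  # hit by pitch
    fb: int   # fly balls
    ld: int   # line drives

    @property
    def contact(self) -> int:
        # Plate appearances ending in contact: PA - (K + BB + HBP)
        return self.pa - (self.so + self.bb + self.hbp)

    @property
    def air(self) -> int:
        # Airborne batted balls: fly balls plus line drives
        return self.fb + self.ld

    def power_rates(self) -> dict:
        # The four home run measures compared year to year
        return {
            "HR": self.hr,
            "HR/PA": self.hr / self.pa,
            "HR/Contact": self.hr / self.contact,
            "HR/Air": self.hr / self.air,
        }


# Made-up season line, for illustration only
s = Season(pa=600, hr=30, bb=60, so=120, hbp=5, fb=180, ld=90)
print(s.contact, s.air, s.power_rates())
```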

The pool of players I used started big and got bigger. I began by collecting all player-seasons from 2009 to 2013 with at least 100 PA, then filtered the pool so that only players with at least 100 PA in consecutive seasons would be counted. Then I made a lot of what are, in retrospect, superfluous graphs that look like this example. I wasn’t satisfied, so I started over, doing the same thing for 2004-2013. I recorded all these R-squared figures to put into a summary bar graph.


It looks nice, I’ll give it that. The dark bars signify the 2004-2013 window, and the half-transparent bars 2009-2013. Over the long run, no combination of home runs plus other extra-base hits offered more predictive power than home runs alone. Isolated Slugging Percentage (ISO) was effectively the next-best thing after the home run stats. It comes in just behind 2B+HR/Contact, a stat with a shamefully long name, but ISO wins because it is so easy to find on the internet.

Within the subset of home run stats, HR/Contact was the most consistent year-to-year. My guess as to why would be that it isolates home runs from outcomes based on other batting skills. Walks and strikeouts are heavily influenced by a batter’s plate discipline, which is a separate skill (some say the sixth baseball tool) and itself an ever-evolving attribute. Thus HR/PA can be influenced by a sharp spike or drop of restraint at the plate in a way that HR/Contact is not. Whatever the underlying reason, HR/Contact ought to be of some use for projections. I for one will be using it for fantasy purposes.
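Here is a hypothetical batter that illustrates the point about plate discipline. The numbers are invented. His contact and home runs stay identical across two seasons while his walks go up. HR/PA falls, but HR/Contact doesn’t budge.

```python
def hr_rates(pa, hr, bb, so, hbp):
    """Return (HR/PA, HR/Contact) for one season line."""
    contact = pa - (so + bb + hbp)  # Contact = PA - (K + BB + HBP)
    return hr / pa, hr / contact


# Hypothetical batter: same contact (435) and same 30 homers in both seasons.
# Season B adds 30 walks, which also adds 30 plate appearances.
a_per_pa, a_per_contact = hr_rates(pa=600, hr=30, bb=40, so=120, hbp=5)
b_per_pa, b_per_contact = hr_rates(pa=630, hr=30, bb=70, so=120, hbp=5)

print(f"HR/PA:      {a_per_pa:.4f} -> {b_per_pa:.4f}")          # falls with the extra walks
print(f"HR/Contact: {a_per_contact:.4f} -> {b_per_contact:.4f}")  # unchanged
```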

I imagine the propensity to hit fly balls or ground balls stems from a batter’s swing mechanics, which are etched into a player’s muscle memory long before the major leagues. Such differences sort batters into types: the slap hitter, the slugger, the veteran bat-control guy, etc. Performance may wax and wane, but those kinds of batting identities are ingrained. Plus, batted balls come in the hundreds while home runs and doubles come in the tens, so there’s simply more data. All of these are reasons why Air/PA and Air/Contact are more consistent than the rest of the stats in that graph.



The bumps at 39 are Barry Bonds’ legacy. Dude broke a lot of calculators.

There are gains in the rates of walks and home runs, only they are tough to discern at this scale. I have other graphs where the differences are obvious, and I’ll make them public with all these graphs, even the ones that make no sense now that I think about it, on my Google Drive later this week.

Anyway, the decline in strikeouts is obvious, and lasts throughout a player’s thirties. Walks rise less dramatically, but never stop rising, at least until 40. Overall, strikeouts fall by more than walks rise, so more plate appearances end in contact as a player ages.
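One simple way to build curves like these is to pool every player-season at a given age. You then divide total strikeouts (or walks, or contact) by total plate appearances. The Python sketch below assumes exactly that method, which is not necessarily how the graphs above were made. The rows are invented.

```python
from collections import defaultdict


def rate_by_age(rows, stat):
    """Pooled rate per age: sum(stat) / sum(PA) across all player-seasons at that age."""
    totals = defaultdict(lambda: [0, 0])  # age -> [stat total, PA total]
    for row in rows:
        t = totals[row["age"]]
        t[0] += row[stat]
        t[1] += row["PA"]
    return {age: num / pa for age, (num, pa) in sorted(totals.items())}


# Invented player-seasons, for illustration only
rows = [
    {"age": 25, "PA": 500, "SO": 110, "BB": 40, "HBP": 5},
    {"age": 25, "PA": 300, "SO": 70, "BB": 20, "HBP": 3},
    {"age": 32, "PA": 600, "SO": 100, "BB": 60, "HBP": 4},
]
# Contact = PA - (K + BB + HBP)
for row in rows:
    row["Contact"] = row["PA"] - (row["SO"] + row["BB"] + row["HBP"])

for stat in ("SO", "BB", "Contact"):
    print(stat, rate_by_age(rows, stat))
```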


Moreover, there are basically no gains in HR/Contact with age. Mere tenths of a percent. Perhaps you wish to identify a plateau from ages 25 to 28 that is distinctly higher than a plateau from 21 to 24. The spike around thirty probably isn’t happenstance, given that my pool was 4394 player-seasons large. The spike at 39 is Barry Bonds again.

So far, I hadn’t found anything that convinced me of my theory behind home run surges. Could it really be the accumulation of all these little marginal differences? Is this the whole story: slightly fewer strikeouts, plus a few more walks, plus more plate appearances on average, plus small fractions of a percentage point gained in home run rate, equals more home runs? It’s possible, but I thought I might have missed something. I couldn’t get the question off my mind, all thanks to Goldschmidt and Billy Butler. Tomorrow I’ll go into some of my later approaches.
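The whole-story question above can be written as one product: HR = PA × (1 - K% - BB% - HBP%) × HR/Contact. This minimal Python sketch uses made-up rates to show how several small changes compound. None of these numbers come from the data above. Note that walks shrink contact, so in this example strikeouts fall by three points while walks rise by one. That leaves a net gain in contact, as in the aging curves.

```python
def expected_hr(pa, k_rate, bb_rate, hbp_rate, hr_per_contact):
    """HR = PA x (1 - K% - BB% - HBP%) x HR/Contact."""
    contact = pa * (1 - (k_rate + bb_rate + hbp_rate))  # plate appearances ending in contact
    return contact * hr_per_contact


# Hypothetical "younger" and "older" versions of one hitter: fewer strikeouts,
# one more point of walks, more playing time, and a slightly higher HR/Contact.
younger = expected_hr(pa=550, k_rate=0.22, bb_rate=0.08, hbp_rate=0.01, hr_per_contact=0.050)
older = expected_hr(pa=600, k_rate=0.19, bb_rate=0.09, hbp_rate=0.01, hr_per_contact=0.052)
print(round(younger, 1), round(older, 1))
```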

The Case for Mark Trumbo

Rumors of the MLB trade variety suggest that Angels third baseman/hitter Mark Trumbo might soon be on the move to Arizona as part of a three-team deal also involving the White Sox. From what I’ve seen, the internet’s opinion (served hot, in take-out form) has been critical of Trumbo and of the Diamondbacks for targeting him. We say Trumbo is being written off unfairly, thanks to sabermetric cynicism.

Time for us to be clear: we came of age in the present era of sabermetric explosion. Advanced stats are second nature to us, and beautiful; OBP is like the Mona Lisa and batting average is this thing (NSFW?). I’ve hated pitcher wins since I was 12 years old. So we know the criticism of Trumbo’s plate discipline is valid, that any OBP below .300 should be considered untenable by a sound-minded front office. But we don’t agree that Trumbo’s OBP will stay that way in the next few years. And we don’t think that the sabermetric community fully appreciates his power.

Recently we’ve been studying the year-to-year correlations of stats that express a hitter’s power. With pretty much the same data, we’ve also been studying aging patterns over the last ten seasons (2004-2013). In this article we’ll use some of our findings to talk about Trumbo. The whole shebang will be presented later this week. If you care about these things, keep in mind that we limited our research to player-seasons with at least 100 plate appearances.

Let’s start with the power stats. Below is a bar graph showing the year-to-year R-squared value for a bunch of different stats. The higher the bar, the more consistent the stat is from one season to the next. For now, focus only on the green bars; they are concerned only with home runs.


(Data taken from Fangraphs.com)
Air = FB + LD
Contact = PA – (BB + K + HBP)

Home Runs per Contact (HR/Contact) has the highest R-squared value of the bunch. In terms of predictive power, it’s better than plain Home Runs, Home Runs per Plate Appearance (HR/PA), and Home Runs per Airborne Ball (HR/Air). It’s better than Isolated Slugging Percentage (ISO), itself way better than regular Slugging Percentage. HR/Contact isn’t difficult to calculate, either. You only need five stats: home runs, plate appearances, walks, strikeouts, and hits by pitch. (Hits by pitches? HBP, you know what I mean.)
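Those five stats are all HR/Contact needs. ISO and SLG, by comparison, come from at-bats and the breakdown of hits. Below is a minimal Python sketch. The function names are our own placeholders and the season line is made up. It also checks that ISO equals SLG minus batting average.

```python
def hr_per_contact(hr, pa, bb, so, hbp):
    # The five stats: Contact = PA - (BB + K + HBP)
    return hr / (pa - (bb + so + hbp))


def iso(ab, doubles, triples, hr):
    # Isolated power: extra bases beyond the first, per at-bat (equals SLG - AVG)
    return (doubles + 2 * triples + 3 * hr) / ab


def slg(ab, singles, doubles, triples, hr):
    # Total bases per at-bat
    return (singles + 2 * doubles + 3 * triples + 4 * hr) / ab


# Made-up season line: 600 PA, 530 AB, 50 BB, 140 K, 5 HBP,
# 85 singles, 30 doubles, 2 triples, 30 homers
avg = (85 + 30 + 2 + 30) / 530
print(hr_per_contact(30, 600, 50, 140, 5))
print(iso(530, 30, 2, 30), slg(530, 85, 30, 2, 30) - avg)  # the two should match
```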

HR/Contact is better because, more than any other stat, it separates a hitter’s power from the rest of his batting skills. It does not care how many times a batter walks or whiffs; all it knows is how often the ball goes out when that batter does make contact. For Trumbo, that means appreciating his raw power free of context. By context we mean his prolific out-making. Bear with us here. We’ll re-contextualize him by the end.

Trumbo has been a regular player for three years, so we compared him to the best power hitters of those years (2011-2013), guys with at least 1000 PA over that span. Of the 30 players with the most home runs, Trumbo ranks 11th in HR/Contact. Of the 30 players with the highest ratio of home runs to fly balls, Trumbo ranks 12th in HR/Contact. Here’s a table of the latter group.


I like this group better because it has Trumbo’s potential future teammate GOLDSCHMIDT in there for a nice comparison.
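The ranking procedure described above works in three steps. Qualify hitters at 1000 PA over 2011-2013. Take the top 30 by home runs or by HR/FB. Then rank that pool by HR/Contact. The Python sketch below follows those steps. The data layout and the players are invented for illustration. In practice the three-year totals would come from Fangraphs.

```python
def hr_per_contact(p):
    # HR / (PA - (K + BB + HBP))
    return p["HR"] / (p["PA"] - (p["SO"] + p["BB"] + p["HBP"]))


def rank_in_pool(players, name, pool_key, pool_size=30, min_pa=1000):
    """Rank `name` by HR/Contact within the top `pool_size` qualified hitters by `pool_key`.

    Returns a 1-based rank, or None if the player is not in the pool.
    """
    qualified = [p for p in players if p["PA"] >= min_pa]
    pool = sorted(qualified, key=pool_key, reverse=True)[:pool_size]
    ordered = sorted(pool, key=hr_per_contact, reverse=True)
    for rank, p in enumerate(ordered, start=1):
        if p["name"] == name:
            return rank
    return None


def most_hr(p):
    return p["HR"]


def hr_per_fb(p):
    return p["HR"] / p["FB"]


# Invented three-year totals, for illustration only
players = [
    {"name": "A", "PA": 1500, "HR": 90, "SO": 400, "BB": 150, "HBP": 10, "FB": 500},
    {"name": "B", "PA": 1800, "HR": 80, "SO": 300, "BB": 200, "HBP": 20, "FB": 600},
    {"name": "C", "PA": 1200, "HR": 70, "SO": 350, "BB": 80, "HBP": 10, "FB": 400},
    {"name": "D", "PA": 900, "HR": 60, "SO": 200, "BB": 50, "HBP": 5, "FB": 300},
    {"name": "E", "PA": 1600, "HR": 40, "SO": 300, "BB": 100, "HBP": 0, "FB": 450},
]
print(rank_in_pool(players, "C", most_hr, pool_size=3))
print(rank_in_pool(players, "C", hr_per_fb, pool_size=3))
```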

Based on this data, it’s easy to come up with crude tiers of raw power. (Crude things usually are easy, and fun.) Chris Davis and Giancarlo Stanton are clearly the elite mofos of the present day. The group from Adam Dunn to Mark Reynolds can claim to be distinct from the two above them and the morass below. Call it Tier 2. Trumbo definitely belongs to Tier 3, however large you want to make that. Let’s say you’re Mother Teresa. You want to be generous with your rankings, so you define Tier 3 as anything above seven percent. Trumbo would sit in the upper half of that tier, above a lot of other people who are more celebrated than he is. All told, only 21 players have a HR/Contact over seven percent, a.k.a. fewer than one player per team. If the Diamondbacks pulled this trade off, they’d have one, plus another guy by the name of Goldschmidt, currently 23rd and a near-certain lock to crack the top 20 by the end of next year. That might just be the most powerful duo in the league.
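The tiering itself is just a set of thresholds on HR/Contact. Only the seven percent line for a generous Tier 3 comes from the text. The Tier 1 and Tier 2 cutoffs in this Python sketch are hypothetical placeholders.

```python
def count_above(values, threshold=0.07):
    """How many HR/Contact values exceed the threshold (7% by default)."""
    return sum(1 for v in values if v > threshold)


def assign_tier(value, cutoffs):
    """Return the 1-based tier for an HR/Contact value given descending lower bounds, or None."""
    for tier, cutoff in enumerate(cutoffs, start=1):
        if value > cutoff:
            return tier
    return None


# Illustrative only: the article gives no numeric Tier 1 or Tier 2 cutoffs
HYPOTHETICAL_CUTOFFS = [0.10, 0.085, 0.07]
print(assign_tier(0.075, HYPOTHETICAL_CUTOFFS))  # Tier 3 under these cutoffs
print(21 / 30)  # 21 hitters above 7%, spread over 30 teams
```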

You already knew about Trumbo’s power, though maybe you didn’t know the extent of it. Still, you’re skeptical of his plate discipline, and of his somewhat related ability to avoid strikeouts. However, the aging data we’ve studied suggests that in the next two or three years Trumbo is likely to draw more walks and strike out less. According to Fangraphs, his walk rate has risen steadily since 2011, from 4.4% to 6.1% to 8.0% last year. Yet his strikeouts have actually increased, bucking the traditional trend illustrated below.


(Data taken from Fangraphs.com)

That steady decline across all ages bodes well for Trumbo, even though he hasn’t yet shown a sustained improvement. Batters find it easier to avoid strikeouts as they age and gain MLB experience. Trumbo is a professional like the rest of them, and in his prime. The smart bet would be that he figures something out and shaves a couple of percentage points off his abnormally high strikeout rate. That holds especially if he bats in front of Goldschmidt and you’re a believer in lineup protection (I think I am). Fewer strikeouts, of course, lead to more plate appearances ending in contact, so Trumbo would get about two dozen more chances to put one in the seats. And if you think, with bias presumably, that his gains in walk rate are bogus, then it stands to reason he’ll have even more chances to make contact. That’s when Trumbo is dangerous.
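The extra-chances arithmetic is PA × (drop in K%), with walks and HBP held fixed. This minimal Python sketch assumes a hypothetical 650-PA season and a hypothetical HR/Contact. At that workload, each point of strikeout rate is worth about 6.5 plate appearances ending in contact. So “about two dozen” corresponds to a drop of roughly four points.

```python
def extra_contact(pa, k_rate_drop):
    """Extra plate appearances ending in contact if K% falls by k_rate_drop (0.02 = 2 points)."""
    return pa * k_rate_drop


def extra_homers(pa, k_rate_drop, hr_per_contact):
    """Expected extra home runs from those chances at a given HR/Contact rate."""
    return extra_contact(pa, k_rate_drop) * hr_per_contact


# Hypothetical 650-PA season and HR/Contact of 7%; walks and HBP held fixed
for drop in (0.02, 0.04):
    print(f"K% down {drop:.0%}: {extra_contact(650, drop):.0f} more contact PAs, "
          f"about {extra_homers(650, drop, 0.07):.1f} more HR")
```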