Part Two of Our Research on Power Development

The previous post introduced some research we undertook at the behest of my fantasy star Paul Goldschmidt. In short, I kept Goldschmidt from 2012 to 2013 on the hunch that his doubles and home runs (43 and 20, respectively, in 2012) would even out some, toward a ratio closer to 1-to-1 than 2-to-1. (Huzzah, he hit 36 doubles and 36 home runs in 2013.) So I decided to follow that up with a 10-year study of whether doubles (rather, doubles plus home runs, or all extra-base hits) offered more predictive power than home runs. It was for my own fantasy purposes that I began this study, but it ballooned into something enormous thanks to my inability to accept that my hypothesis was wrong. The number of avenues I investigated was excessive, but some of them proved interesting, so I’ve started sharing. Thus ends the recap.

My hypothesis wasn’t wrong per se, just poorly worded. I thought at first that doubles and home runs (2B+HR on most of my graphs) would prove more stable over time than plain home runs, feeling that players generally displayed a consistent amount of power and that it was mostly luck that determined whether that long fly ball went over the fence or bounced just short of it. That was proven wrong; home runs are more consistent year-to-year than 2B+HR, no matter how you slice them.

If I had reworded my original hypothesis to account for aging, I would have gotten a satisfactory answer sooner. As a player enters his prime, and his power develops, home runs should eat a bigger share of his total extra-base hit pie. Here’s one visualization of that idea.

xbhtypes

Again, everything weird about age 39 is Barry Bonds in 2004. THE MAN HAD A .609 OBP.

You can see the trends, but on this graph they look minor. Triples decline pretty much from the start, accounting for 10 percent of all XBH at first and declining with age to about five percent. Players get slow; that makes sense. As the triples vanish, home runs seem to replace them. However, the gains there are minimal, except at age 39 (see caption above). Doubles maintain their share of about 60 percent of all XBH.
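For anyone who wants to rebuild that kind of breakdown, here is a minimal sketch in Python of how each hit type's share of extra-base hits could be computed. The function name and the totals are made up for illustration; they are not the numbers behind the graph.

```python
def xbh_shares(doubles, triples, home_runs):
    """Return each extra-base hit type's share of all extra-base hits."""
    total = doubles + triples + home_runs
    if total == 0:
        raise ValueError("no extra-base hits to split")
    return {
        "2B": doubles / total,
        "3B": triples / total,
        "HR": home_runs / total,
    }


# Hypothetical totals for one age group, NOT taken from the study's data.
shares = xbh_shares(doubles=600, triples=50, home_runs=350)
for hit_type, share in shares.items():
    print(f"{hit_type}: {share:.1%}")
```

Run once per age group, this gives the three lines of a chart like the one above.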

But you are tired of hearing about dead ends. Did I find anything useful?

2bperhrbar

OH MY GOD BARRY STOP BREAKING MY GRAPHS.

Doubles per home run. That’s what it came down to at the start. Goldschmidt had more than two doubles for each home run in 2012, far too many for a player of his talent, at his age. This final graph delineates three distinct plateaus throughout a young player’s development. From ages 20 to 24, the ratio holds steady around 1.9, then it dips to around 1.7 for ages 25 to 28, then dips again to 1.6 for ages 29 and 30. So you may expect the ratio of doubles to homers to even out with age, but not as much as I originally hoped; the average ratio never dips below 1.5-to-1, except for the Bonds-skewed 39-year-olds. Of course, these figures could be refined by categorizing batters by type and measuring the ratio for each type. We won’t shut the door on that possibility.
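If you want to run the same check on your own players, the arithmetic is tiny. The sketch below is Python and uses the Goldschmidt totals and the rough plateaus quoted above. The function names and the lookup table are my own illustration, and the plateau values are approximate readings of the graph, not exact figures.

```python
# Approximate plateaus described above: (first age, last age, doubles per HR).
PLATEAUS = [(20, 24, 1.9), (25, 28, 1.7), (29, 30, 1.6)]


def doubles_per_homer(doubles, home_runs):
    """Ratio of doubles to home runs for one season."""
    if home_runs == 0:
        raise ValueError("ratio is undefined with zero home runs")
    return doubles / home_runs


def plateau_for_age(age):
    """Approximate average ratio for an age, or None outside ages 20-30."""
    for first, last, ratio in PLATEAUS:
        if first <= age <= last:
            return ratio
    return None


# Goldschmidt's doubles and home runs, as quoted in the post.
for year, doubles, homers in [(2012, 43, 20), (2013, 36, 36)]:
    print(year, round(doubles_per_homer(doubles, homers), 2))
```

Comparing a player's ratio with `plateau_for_age` for his age gives a quick read on whether he sits above or below the average hitter of that age.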

***

Here’s an appendix of interesting graphs that didn’t fit into the discussion above.

agedistribution

Correction: It should read: “Batter-Seasons.” Also, I included only seasons of 100 plate appearances or more. Still, you get the idea.

The mean age in this sample was 28.3 years old. The graph shows that the mode for ages is 26. The difference is explained above, in the gradual downward slope. Veterans stick around in baseball. Speed isn’t as important as in other sports, home run power lasts through the early and mid 30s, their production is generally more reliable thanks to all the data of their past seasons, and don’t forget that this window includes the PED era. PED use grinds normal aging to a halt, and sometimes can turn the process around.

waravg

 

Baseball is a kind climate for veterans above 30, but younger players are more profitable in almost every sense. The line representing WAR per 600 PA shows how much more valuable youngsters are than veterans, on average. Younger players are not only more productive, but cheaper, under team control, steadily improving, more handsome and likely less prone to injury. If there were enough young talent to stock major league rosters, we would already have seen that happen. Alas, the dozen or so minor and international leagues that feed into the majors are playing so far below big-league level that Juan Uribe just got two years, $15 million from the Dodgers.

averagegames

This graph was from early in our process, which is why it only covers the last five years. Still, it demonstrates that the few players who are talented enough to break into the bigs at 20 and 21 are also talented enough to start the majority of games. The more traditional path to regular playing time starts at age 22 on this graph. Thirty-year-olds have the highest average games played, yet their figure is still less than 75 percent of all games in a season, underscoring the ever-present need for positional depth.

***

Hope you enjoyed this. These kinds of data-driven posts will be occasional features here at Midnight Baseball. We’d love any feedback and quality control you have to offer.

 


Overthinking Paul Goldschmidt and Billy Butler: A Wild Goose Chase

Many of you have started reading this week. Thanks!

I like to construct narratives around my fantasy team, which currently operates under the name Goldschmidt’s Gold Shit. This was the dumb logo.

It’s a keeper league and I don’t think I’m gonna change it soon.

Behemoth slugger Paul Goldschmidt didn’t ascend to team leadership until early August, when he kicked Josh Reddick (and his Red Dick [SFW]) out of the clubhouse in a verbal altercation, his booming baritone resonating throughout the bowels of the stadium and every fan in it. You see, Reddick had been spending too much time at nighttime clubs, wielding his favorite toy and namesake, leaving himself depleted come gametime. Things all came to a head when Marco Scutaro screamed at Reddick in frustration, expressing incidentally some long-held and deep-seated ethno-linguistic tensions felt by pretty much everyone on the team, normally neglected in the daily performance of badass manhood. In other words, Scutaro was fucking fed up with the nickname “Scooter.” Long story short, Jose Iglesias was pushed into a table, paralyzed gruesomely, and released within minutes, while Scutaro turned his back on the team to follow the long cold path of revenge. In the end, it was Goldschmidt who stepped up to refocus the team, leading a bold charge up the rotisserie points standings all the way up to…third.

Anyway, I acquired Goldschmidt in late 2012 with a specific hunch suggesting a breakout for 2013. The hunch was that some of his doubles would turn into home runs, since I’ve heard a lot that power develops later than other skills. And Goldschmidt’s 2-to-1 ratio of doubles to home runs (43 to 20, to be exact) seemed abnormally high for a slugger of his caliber. So I figured he’d get two bumps in homers, one from growing into power and the other from a corresponding decrease in doubles. Even supposing Goldschmidt couldn’t age (as yet unconfirmed), I still would expect the doubles and home runs to average out.

Before 2012 it was reasonable to consider Kansas City DH Billy Butler a robot designed by some bored and impractical scientist to hit doubles with maximum efficiency in a believably human range. It was, I swear. Then he broke out for 29 dingers with only 32 doubles, evening out the ratio when previously it had been 2-to-1. The example of Butler, combined with the fulfilled promise of Goldschmidt this year, spurred me to study the league at large for a trend between doubles, home runs, and how power ages.

That study ballooned and ballooned as I struggled to find evidence for the hypothesis that home runs become a bigger share of extra base hits as hitters age. I didn’t always phrase the hypothesis that way, which was part of the problem. The answer of course was staring at me all along, but I’m glad I took myself on a wild goose chase, because I think I learned a lot of interesting things. See for yourself.

hbpavg

Wow! They can’t drink AND they get beaned disproportionately! I’m starting to pity all those 20-year-old ballplayers. Serious stuff starts below.

First I thought I should check the most consistent measures of power before getting caught up in the doubles search. So I got caught up in examining the year-to-year correlation of stats: HR, HR/PA, HR/Contact, HR/Air. “Contact” is shorthand for all plate appearances ending in contact. To get it I subtracted strikeouts, walks and HBPs from total plate appearances. “Air” is shorthand for all airborne batted balls. I used Fangraphs’ data to get fly balls and line drives. Might as well say all the data came from Fangraphs.
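To make those definitions concrete, here is a short Python sketch of the three rate stats. The `BattingLine` class and the sample stat line are hypothetical; only the two formulas (Contact = PA minus walks, strikeouts and HBP; Air = fly balls plus line drives) come from the description above.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    pa: int   # plate appearances
    hr: int   # home runs
    bb: int   # walks
    so: int   # strikeouts
    hbp: int  # hit by pitch
    fb: int   # fly balls
    ld: int   # line drives

    @property
    def contact(self):
        """Plate appearances ending in contact: PA - (BB + K + HBP)."""
        return self.pa - (self.bb + self.so + self.hbp)

    @property
    def air(self):
        """Airborne batted balls: fly balls plus line drives."""
        return self.fb + self.ld

    def rates(self):
        return {
            "HR/PA": self.hr / self.pa,
            "HR/Contact": self.hr / self.contact,
            "HR/Air": self.hr / self.air,
        }


# Hypothetical season, NOT a real player's line.
line = BattingLine(pa=600, hr=30, bb=50, so=140, hbp=10, fb=150, ld=90)
for name, value in line.rates().items():
    print(f"{name}: {value:.3f}")
```

The same 30 home runs look quite different depending on the denominator, which is why the stats are compared side by side.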

The pool of players I used started big and got bigger. I began by collecting all player-seasons from 2009 to 2013 with at least 100 PA, then filtered the pool so that only players who had 100 PA in consecutive seasons would be counted. Then I made a lot of what are in retrospect superfluous graphs that look like this example. I wasn’t satisfied, so I started over, doing the same thing for 2004-2013. I recorded all these R-squared figures to put into a summary bar graph.
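Roughly, the filtering and pairing step could look like the Python sketch below. This is a reconstruction under assumptions, not the actual spreadsheet work: the row layout, the function names and the sample rows are all invented, and R-squared here is simply the squared Pearson correlation between a stat in one season and the same stat the next.

```python
def pair_consecutive_seasons(rows, min_pa=100):
    """Pair each qualifying season with the same player's next season.

    rows: iterable of (player, year, pa, value) tuples (hypothetical layout).
    Seasons under min_pa are dropped before pairing.
    """
    by_key = {
        (player, year): value
        for player, year, pa, value in rows
        if pa >= min_pa
    }
    pairs = []
    for (player, year), value in by_key.items():
        next_value = by_key.get((player, year + 1))
        if next_value is not None:
            pairs.append((value, next_value))
    return pairs


def r_squared(pairs):
    """Squared Pearson correlation between year N and year N+1 values."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two paired seasons")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant column has no defined correlation")
    return sxy ** 2 / (sxx * syy)


# Hypothetical HR/Contact rows, NOT real players.
rows = [
    ("A", 2012, 500, 0.050), ("A", 2013, 520, 0.055),
    ("B", 2012, 90, 0.020), ("B", 2013, 400, 0.030),  # 2012 is under 100 PA
    ("C", 2012, 450, 0.070), ("C", 2013, 300, 0.060),
    ("D", 2012, 600, 0.030), ("D", 2013, 610, 0.035),
]
pairs = pair_consecutive_seasons(rows)
print(len(pairs), "paired seasons, R-squared =", round(r_squared(pairs), 3))
```

Swapping the `value` column for HR, HR/PA or ISO and rerunning gives one bar per stat for a summary graph.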

summarywithbotheras

It looks nice, I’ll give it that. The dark bars signify the 2004-2013 window, and the half-transparent bars 2009-2013. Over the long run, no combination of home runs plus other extra-base hits offered more predictive power than home runs alone. Isolated Slugging Percentage (ISO) was the next-best thing after home runs; though it comes just behind 2B+HR/Contact, a stat with a shamefully long name, ISO wins given how readily it can be found on the internet.

Within the subset of home run stats, HR/Contact was the most consistent year-to-year. My guess as to why would be that it isolates home runs from outcomes based on other batting skills. Walks and strikeouts are heavily influenced by a batter’s plate discipline, which is a separate skill (some say the sixth baseball tool) and itself an ever-evolving attribute. Thus HR/PA can be influenced by a sharp spike or drop of restraint at the plate in a way that HR/Contact is not. Whatever the underlying reason, HR/Contact ought to be of some use for projections. I for one will be using it for fantasy purposes.

I imagine the propensity to hit fly balls or ground balls stems from a batter’s swing mechanics, which are etched into a player’s muscle memory long before the major leagues. Such differences sort batters into types: the slap hitter, the slugger, the veteran bat-control guy, etc. Performance may wax and wane, but those kinds of batting identities are ingrained. Plus, batted balls come in hundreds while home runs and doubles come in tens; there’s simply more data. All of these are reasons why Air/PA and Air/Contact are more consistent than the rest of the stats in that graph.

***

threeoutcomes

The bumps at 39 are Barry Bonds’ legacy. Dude broke a lot of calculators.

There are gains in the rates of walks and home runs, only they are tough to discern at this scale. I have other graphs where the differences are obvious, and I’ll make them public with all these graphs, even the ones that make no sense now that I think about it, on my Google Drive later this week.

Anyway, the decline in strikeouts is obvious, and lasts throughout a player’s thirties. Walks rise less dramatically, but never stop rising–at least until 40. Overall, strikeouts are reduced more than walks are increased, resulting in more plate appearances that end in contact as a player ages.

hrcontact

Moreover, there are basically no gains in HR/Contact with age. Mere tenths of a percent. Perhaps you wish to identify a plateau from ages 25 to 28 that is distinctly higher than a plateau from 21 to 24. The spike around thirty probably isn’t happenstance, given that my pool was 4394 player-seasons large. The spike at 39 is Barry Bonds again.

So far, I hadn’t found anything that convinced me of my theory behind home run surges. Could it really be the accumulation of all these little marginal differences? Is this the whole story: slightly fewer strikeouts, plus a few more walks, plus more plate appearances on average, plus small percentage points gained in home run rate, equals more home runs? It’s possible but I might have missed something, I thought. I couldn’t get the question off my mind, all thanks to Goldschmidt and Billy Butler. Tomorrow I’ll go into some of my later approaches.

Two All-West Trades, Let’s Savor Them

(Sorry, White Sox.)

Angels receive LHP Tyler Skaggs from Diamondbacks and RHP Hector Santiago from White Sox.
Diamondbacks receive 1B Mark Trumbo and RHP A.J. Schugel from Angels and OF Brandon Jacobs from White Sox.
White Sox receive CF Adam Eaton from Diamondbacks.

We went into Trumbo’s power yesterday (see the sidebar), so we’ll get into the other aspects of his game for Arizona. The behemoth GOLDSchmidt does not retreat from first base for any mortal being; Trumbo will probably play left field. The Dbacks had Fangraphs’ highest team defensive rating last year, and Adam Eaton was considered a rangy defender with a capable arm. But if you check out the data, you’ll see that Eaton actually had a drastically negative defensive score, -11.2, equal to Jason Kubel’s. Yeesh, that’s ugly, and probably a poor measure of Eaton’s talent, given the sample size. All the same, Eaton’s negative figure did not knock Arizona out of first place, nor did Kubel’s (the two combined for about 500 PA). Trumbo could honestly be that bad, but the defense as a whole will remain elite.

The Angels get two young, cost-controlled pitchers who are bound to them for many moons. If you’ve found this blog, I don’t need a long ass paragraph to convince you that’s a good thing.

***

Rockies receive LHP Brett Anderson from A’s
A’s receive LHP Drew Pomeranz and RHP Chris Jensen from Rockies

We’ve long been bullish on Brett Anderson, and we view his injury history as just an unlikely clustering of obstacles to his success. Anderson has an excellent pair of breaking pitches and locates them both well. His groundball rate is a bigger asset at Coors than it is elsewhere. Anderson could be the best pitcher on the Rockies for the next two years, after which he’ll become a free agent. If it goes the other way and he keeps getting hurt, all the Rockies gave up was a middle-of-the-rotation starter and a minor league arm. Not bad for a team currently out of the playoff picture.

Drew Pomeranz might become the next young A’s pitcher to blossom under pitching coach Curt Young. You figure A.J. Griffin, Sonny Gray, Jarrod Parker and Scott Kazmir all come above him in the rotation. That leaves Pomeranz in competition with Tommy Milone (another lefty) and Dan Straily (a righty). If he doesn’t distinguish himself in the spring or early in the season, he’ll come into play later on as a testament to the wonders of Oakland’s depth. He’s much cheaper than Anderson, too, meaning the A’s could continue to stock their bench this winter (with great dividends come summer, the opposite of the ant in the fable).

***

Despite the prevailing criticism of Kevin Towers and the Diamondbacks, we think all four teams made smart decisions yesterday. Arizona isn’t far from the wild card or even the division. Look at this scatterplot of runs scored and allowed per game. The teams on the bottom right almost universally made the postseason. The Diamondbacks are on the fringe of that group. They addressed an offensive need, and the resulting defensive sacrifice doesn’t affect their standing as an elite defensive team. Now all they need is a couple of proven starters and a rebound year for the bullpen. Bullpens rebound all the time.

Last year’s Angels had laughable pitching depth. Bear witness if you have a strong stomach. Santiago and Skaggs should take most of the 254 and two-thirds innings that went to Jerome Williams and Joe Blanton. If they self-improve on top of that, huzzah! They’ll be Mike Trout’s teammates for a while.

The Rockies could be a playoff contendah in the next two years if they hit on the Anderson gamble. If it busts, they’re in the same position they are now.

The A’s, already a very good team, don’t need an expensive boom-or-bust pitcher when they have six others at the same position. Acquiring cheap depth like Pomeranz allows for more cheap depth. That’s how they succeeded last year. And even with better Angels and Rangers playing opposite, I’d count on it happening again.

The Case for Mark Trumbo

Rumors of the MLB trade variety suggest that Angels first baseman/hitter Mark Trumbo might soon be on the move to Arizona as part of a three-team deal also involving the White Sox. From what I’ve seen, the internet’s opinion (served hot, in take-out form) has been critical of Trumbo and the Diamondbacks for targeting him. We say Trumbo is written off unfairly thanks to sabermetric cynicism.

Time for us to be clear: we came of age in the present era of sabermetric explosion. Advanced stats are second nature to us, and beautiful; OBP is like the Mona Lisa and batting average is this thing (NSFW?). I’ve hated pitcher wins since I was 12 years old. So we know the criticism of Trumbo’s plate discipline is valid, that any OBP below .300 should be considered untenable by a sound-minded front office. But we don’t agree that Trumbo’s OBP will stay that way in the next few years. And we don’t think that the sabermetric community fully appreciates his power.

Recently we’ve been studying the year-to-year correlations of stats that express a hitter’s power. With pretty much the same data, we’ve also been studying aging patterns over the last ten seasons (2004-2013). In this article we’ll use some of our findings to talk about Trumbo. The whole shebang will be presented later this week. If you care about these things, keep in mind that we limited our research to player-seasons with at least 100 plate appearances.

Let’s start with the power stats. Below is a bar graph showing the year-to-year R-squared value for a bunch of different stats. The higher the bar, the more consistent the stat is from one season to the next. For now, focus only on the green bars; they are concerned only with home runs.

summaryrsquarednoleged

(Data taken from Fangraphs.com)
Air = FB + LD
Contact = PA – (BB + K + HBP)

Home Runs per Contact (HR/Contact) has the highest R-squared value of the bunch. In terms of predictive power, it’s better than plain Home Runs, Home Runs per Plate Appearance (HR/PA), and Home Runs per Airborne Ball (HR/Air). It’s better than Isolated Slugging Percentage (ISO), itself way better than regular Slugging Percentage. HR/Contact isn’t difficult to calculate, either. You only need five stats: home runs, plate appearances, walks, strikeouts, and hits by pitch. (Hits by pitches? HBP, you know what I mean.)

HR/Contact is better because, more than any other stat, it separates a hitter’s power from the rest of his batting skills. It does not care about how many times a batter walks or whiffs; all it knows is how often the ball goes out when that batter does make contact. For Trumbo, that means appreciating his raw power free of context. By context we mean his prolific out-making. Bear with us here. We’ll re-contextualize him by the end.

Trumbo has been a regular player for three years, so we compared him to the best power hitters of those years (2011-2013), guys with at least 1000 PA over that span. Of the 30 players with the most home runs, Trumbo ranks 11th in HR/Contact. Of the 30 players with the highest ratio of home runs to fly balls, Trumbo ranks 12th in HR/Contact. Here’s a table of the latter group.

trumbohrcontact

I like this group better because it has Trumbo’s potential future-teammate GOLDSCHMIDT in there for a nice comparison. (Click to engorge)

Based on this data, it’s easy to come up with crude tiers of raw power. (Crude things usually are easy, and fun.) Chris Davis and Giancarlo Stanton are clearly the elite mofos of the present day. The group from Adam Dunn to Mark Reynolds can claim to be distinct from those two above them and the morass below–Tier 2. Trumbo definitely belongs to Tier 3, however large you want to make that. Let’s say you’re Mother Teresa. You want to be generous with your rankings, so you define Tier 3 as anything above seven percent. Trumbo would sit in the upper half of that tier, above a lot of other people who are more celebrated than he. All told, only 21 players have a HR/Contact over seven percent, a.k.a. fewer than one player per team. If the Diamondbacks pulled this trade off, they’d have one, and another guy by the name of Goldschmidt, currently 23rd, a near-certain lock to crack the top 20 by the end of next year. That might just be the most powerful duo in the league.

You already knew about Trumbo’s power, though maybe you didn’t know the extent of it. Still, you’re skeptical of his plate discipline, and of his somewhat-related ability to avoid strikeouts. However, the aging data we’ve studied suggests that in the next two or three years Trumbo is likely to draw more walks and strike out less. Since 2011, his walk rate, according to Fangraphs, has risen steadily from 4.4% to 6.1% to 8.0% last year. His strikeouts have actually increased, however, bucking the traditional trend illustrated below.

kpa

(Data taken from Fangraphs.com)

That steady decline across all ages bodes well for Trumbo, even though he hasn’t yet demonstrated a prolonged improvement. Batters find avoiding strikeouts easier as they age and gain MLB experience. Trumbo is a professional like the rest of them, and in his prime: the smart bet would be that he figures something out and shaves a couple of percentage points off his abnormally high strikeout rate. Especially if he bats in front of Goldschmidt, and you’re a believer in lineup protection (I think I am). Fewer strikeouts of course lead to more plate appearances ending in contact, so that Trumbo would get about two dozen more chances to put one in the seats. And if you think–with bias, presumably–that his gains in walk rate are bogus, then it stands he’ll have even more chances to make contact. That’s when Trumbo is dangerous.
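The back-of-the-envelope math behind those extra chances can be sketched in a few lines of Python. Everything here is an assumption for illustration: the function name is made up, the 600 PA season is a round number rather than Trumbo's actual total, and the size of the strikeout drop is a guess. HBP is treated as unchanged.

```python
def extra_contact_pa(plate_appearances, k_drop_points, bb_gain_points=0.0):
    """Extra plate appearances ending in contact.

    Rates are in percentage points of PA. Fewer strikeouts add contact;
    more walks take some of it away. HBP is assumed unchanged.
    """
    net_points = k_drop_points - bb_gain_points
    return plate_appearances * net_points / 100


# Hypothetical season: 600 PA is an assumption, not Trumbo's real total.
print(extra_contact_pa(600, k_drop_points=4))
print(extra_contact_pa(600, k_drop_points=4, bb_gain_points=1))
```

The gain scales with both inputs, so a smaller drop or fewer plate appearances yields fewer extra chances. Multiplying the result by a hitter's HR/Contact turns those chances into an estimate of extra home runs.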