
Posted: April 18 07, 7:54 pm
by Phyrkrakr
Mephistopheles wrote:
Phyrkrakr wrote:So, are the graphs above saying that there aren't any real line drive hitters in the league, or that line drive percentage doesn't really correspond to BABIP?
The first one is suggesting that line drive percentage slightly corresponds to BABIP for a hitter. The second one suggests there aren't any consistent line drive hitters in the league.

Ironically enough I'd expect LD% and BABIP to correlate higher for pitchers, since it factors out speed to a certain extent.
I don't see how the second graph shows that. It certainly shows that some guys hit more line drives in a season than others, but if you plot a 45 deg. line through the graph (slope of 1) any hitters that stay close to that line throughout multiple years would be consistent line drive hitters. For example, AP, over his career, has a low LD% of 17 and a high of 22.5%. His average LD% is 19.88% with a standard deviation of 2.38. That means, if I'm doing this right, that his line drive percentage is pretty constant, with most of the data falling within the standard deviation. The only big outlier is the 17% in 2004.

The other outliers fall within 10% of the standard deviation, which means those aren't really outliers, and, since there's some disagreement/human error in what defines a line drive, that data is within fudging distance, I think.

There is no data for 2001, btw.

Posted: April 19 07, 1:18 am
by Asmodai
So you're calculating standard deviation for Pujols' career LD% and then using that to say he's consistent? By sheer mathematics, i.e. the formula used to find standard deviation, we would expect 2/3 of his seasons to fall within 1 standard deviation on either side. It's that way for a couple of reasons: one, you mathematically made it that way, and two, LD% is clearly normally distributed.
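
To illustrate the point about the standard deviation, here is a minimal Python sketch. It draws hypothetical seasons from a normal distribution (the mean of 20 and the two spreads are made-up values, not anyone's real numbers) and measures how many land within one sample standard deviation. The share should come out near two thirds for both the tight and the wide spread, which is why that share alone says nothing about consistency.

```python
import random
import statistics

def share_within_one_sd(values):
    """Fraction of values within one sample standard deviation of their mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    inside = sum(1 for v in values if abs(v - mean) <= sd)
    return inside / len(values)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical "steady" and "streaky" hitters: same mean, very different spread.
steady = [rng.gauss(20.0, 1.0) for _ in range(10_000)]
streaky = [rng.gauss(20.0, 5.0) for _ in range(10_000)]

print(f"steady:  {share_within_one_sd(steady):.3f}")
print(f"streaky: {share_within_one_sd(streaky):.3f}")
```

What would separate the two hypothetical hitters is the size of the standard deviation itself, not the share of seasons inside it.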

You didn't prove or show anything. And you don't have to spell things out to me, odds are you're not going to mention anything I don't know and can't understand.

Anyways, back to the data. I didn't realize fangraphs had LD% back to 2002. Unfortunately right now, all I have loaded in my spreadsheet is GB/FB, LD%, GB%, FB%, IFFB%, HR/FB, IFH, IFH%, BUH and BUH%. BABIP was on another page. I'll add that and others tomorrow afternoon or in the class I TA.

Adding 2002 and 2003 brings up our total sample size to 791 seasons. Now for just some random easy data:

StDev: 2.78%
Mean: 20.9%
Max: 30.7% (Brian Roberts, 2003)
Min: 13.3% (Rocco Baldelli, 2004)

I should point out that the third and fourth highest belong to Todd Helton. It's my opinion that his (and all Colorado) figures are skewed for the obvious reasons. It would probably be a good idea to exempt all Rockies data, but I'll leave it in for now.

In theory, because it's (well should be) a normal distribution with the standard deviation and mean we would expect 68.3% of the data to fall within one stdev, 95.4% to fall within two standard deviations, and 99.7% to fall within three standard deviations. I'm going to post the exact percentages from the data, though we're just going through motions here:

1σ (18.2%-23.7%) = 539/791 = 68.1%
2σ (15.4%-26.5%) = 760/791 = 96.1%
3σ (12.6%-29.3%) = 789/791 = 99.7%
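
The comparison above can be reproduced with a short Python sketch. It uses only the figures posted (mean, standard deviation, sample size and the three counts) and gets the theoretical shares from the normal distribution via `math.erf`. The band edges are recomputed from the rounded mean and standard deviation, so they may differ by a tenth of a point from the ones quoted.

```python
import math

MEAN = 20.9   # league mean LD%, as posted
SD = 2.78     # league standard deviation, as posted
TOTAL = 791   # player-seasons in the sample

# Observed counts inside 1, 2 and 3 standard deviations, as posted.
observed = {1: 539, 2: 760, 3: 789}

def normal_share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k, count in observed.items():
    low, high = MEAN - k * SD, MEAN + k * SD
    print(f"{k} sd ({low:.1f}%-{high:.1f}%): "
          f"observed {count / TOTAL:.1%}, normal {normal_share_within(k):.1%}")
```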

Magic, they fit almost perfectly, with rounding causing a little bit of error. Gotta love the rigorous perfection of some things in mathematics. Anyways, this was expected. It doesn't show us anything. But to further illustrate it, here's the distribution plotted with groups of .5%. LD% is on the bottom, percentage on the side. It's pretty much a histogram though it's not in bar format. It's not perfect, but close.
Image
Now, to find the same seasons back-to-back again. This time there are 419 such sets of datapoints. It's a pretty large sample.
Image
The results of the data found earlier were pretty much re-affirmed, but that's one poor r-squared. Using the same analogy I said earlier it's like saying that if Yadi hits .220 and Albert Pujols hits .330, Yadi is just as likely as AP to win the batting title the next season.
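
For anyone who wants to repeat the back-to-back comparison, here is a rough Python sketch of the two steps: pairing each hitter's season with his next one, then computing r-squared. The `(player, year) -> LD%` layout, the function names, and the sample numbers are all assumptions for illustration; none of them are the real spreadsheet values.

```python
def consecutive_pairs(seasons):
    """Pair each player's LD% with the same player's LD% the next year.

    `seasons` maps (player, year) -> LD%. This layout is an assumption.
    """
    pairs = []
    for (player, year), ld in seasons.items():
        next_ld = seasons.get((player, year + 1))
        if next_ld is not None:
            pairs.append((ld, next_ld))
    return pairs

def r_squared(pairs):
    """Square of the Pearson correlation between the two columns of `pairs`."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return (sxy * sxy) / (sxx * syy)

# Made-up numbers purely to show the call pattern, not real LD% figures.
example = {
    ("Hitter A", 2002): 22.0, ("Hitter A", 2003): 19.5, ("Hitter A", 2004): 21.0,
    ("Hitter B", 2002): 18.0, ("Hitter B", 2003): 20.5,
    ("Hitter C", 2003): 24.0, ("Hitter C", 2004): 20.0,
}
pairs = consecutive_pairs(example)
print(len(pairs), "pairs, r^2 =", round(r_squared(pairs), 3))
```

An r-squared near 0 means last year's LD% tells you almost nothing about this year's, and a value near 1 means it predicts it almost exactly.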

Next up, you're probably saying there are probably a few outliers that stay near the top each season. This is going to be true, but we can explain some of it. Now, you've seen the binomial distribution at some point, I hope. If you haven't, that's fine, just go with the answer I spit out on faith. I'm not going to put it on steroids. We luckily have 42 players that qualified in all five seasons, a fairly large sample. Based on the properties of a standard deviation, we would expect 15.8% of data points to be over one standard deviation above the mean (23.7%+). Using probabilities, we would expect about a 0.0098% probability (0.158^5) of a player being above 23.7% in all five seasons. That's about 1 in 10,000.

There's none who fit the criteria. Using the binomial distribution we would expect .27% of hitters to be able to go over it at least four of the five seasons, about one in 370. Of those 42, 2 of them did: Michael Young and Bobby Abreu. Young might have an advantage hitting in Texas, but Abreu doesn't seem to have an advantage I can think of, other than being left-handed. Continuing, we would expect 3.1% to have it in at least 3 of the seasons. Four more guys reach this: Jason Kendall, Manny Ramirez, Todd Helton and Mark Kotsay (wtf?). Continuing again, we would expect another 14.9% to have a couple of those seasons. Only one guy hit it here, Derek Jeter.
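
The binomial figures can be checked with a few lines of Python. This is only a sketch under the same assumption the argument rests on: each season is an independent draw with a 15.8% chance of landing more than one standard deviation above the mean. The constant and function names are made up for the example.

```python
from math import comb

P_ABOVE = 0.158   # chance one season is more than 1 sd above the mean, as posted
SEASONS = 5
PLAYERS = 42      # hitters who qualified in all five seasons

def exactly(k, n=SEASONS, p=P_ABOVE):
    """Binomial probability of exactly k such seasons out of n."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def at_least(k, n=SEASONS, p=P_ABOVE):
    """Binomial probability of k or more such seasons out of n."""
    return sum(exactly(j, n, p) for j in range(k, n + 1))

for k in range(SEASONS, 1, -1):
    print(f"{k} of {SEASONS}: exactly {exactly(k):.4%}, "
          f"at least {at_least(k):.4%}, "
          f"expected hitters out of {PLAYERS}: {PLAYERS * at_least(k):.2f}")
```

If seasons really were independent draws, a group of 42 would be expected to contain well under one hitter with four or more such seasons, so finding two is what hints at a real skill for a small group.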

I would assume that there are "linedrive" hitters but they're rarer than most people think. I wouldn't include Derek Jeter, 2 of his seasons were below the average LD% overall. The guys that were able to do it 3 or 4 times in five years are the only guys that I would call line drive hitters, and they probably represent less than five percent of all major league starters. It's a rare "gift" I guess.

Now let's look at the flip side. Are there hitters that are worse than usual at hitting them? Probably. But there's a selection bias (there was in the first part too): only the elite hitters are good enough to play five consecutive seasons. The ones that are poor wash out. So it's a skill, but very very few do it well. Of these hitters we would expect the same probabilities as the others (just the flip side of the distribution). There was no one who did it four or five times. There were three guys who were one standard deviation or more worse three times. They are: Shawn Green, Luis Castillo, and Alex Rodriguez. Shawn Green seems odd, didn't expect to see him there. Luis Castillo is no shock. On the flip side of LD% there is GB%. Luis Castillo is uber elite in GB%. Being fast he just whacks the ball on the ground and runs like the wind. Alex Rodriguez is a real shock to me. My guess is that a lot of his hits are majestic home runs that aren't classified as line drives.

So there you have it:

Linedrive Hitters:
Michael Young
Manny Ramirez
Bobby Abreu
Jason Kendall
Mark Kotsay
Todd Helton*

Anti-linedrive Hitters:
Shawn Green
Alex Rodriguez
Luis Castillo
Jacque Jones

Pretty much every other hitter is non-denominational. Todd Helton and Coors Field have gotten asterisked. Young, Kendall, and Manny shouldn't shock anyone. If I had to predict who would be line drive hitters they probably would have been on my list. Jeter would too. If I had to guess the other side I would have said Castillo, Pierre, Ichiro and other burners.

Pierre was pretty much in the middle each season. Ichiro was much of the same. Looking case by case I am going to add Jacque Jones to the anti-linedrive ones. Despite not reaching any rough criteria he had two seasons where he was two stdevs worse than league average and the other three were over half a standard deviation away.

For the interested, Albert Pujols wasn't even close to being that good. I guess there's other ways to classify who's a LD hitter than this. Maybe summing the Z scores and seeing if they average +1 stdev. Doing that just gives us Abreu, Young and Helton. Maybe that's a little too high. +.75 would just throw back in Mark Kotsay and Jason Kendall. We're not getting anywhere.

I guess a general LD hitter would maybe average .50+. If it was completely normal, just .3% would be expected to average that. Doing that cutoff would throw in Edgar Alfonzo and Edgar Renteria. I'll buy those.

So yes it appears there are a few line drive hitters. It's a very rare trait though. If we're using the .5 average, then here's the final list

Linedrive Hitters
Michael Young
Manny Ramirez
Bobby Abreu
Jason Kendall
Mark Kotsay
Todd Helton*
Edgar Alfonzo
Edgar Renteria

Not So Linedrive Hitters
Shawn Green
Orlando Cabrera
Luis Castillo
Jacque Jones
Carlos Beltran {-0.09, -0.05, -1.92, -0.63, -1.49}
Andruw Jones
Alex Rodriguez
Adrian Beltre
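
The average z-score cutoff behind these lists can be sketched in Python as below. The league mean and standard deviation are the figures posted earlier, and Beltran's five z-scores are the ones listed above. Using the same 0.5 cutoff on the negative side is an assumption, and so are the function names and labels; the original spreadsheet may well have computed z-scores per season rather than against one overall mean.

```python
MEAN = 20.9  # league mean LD%, as posted earlier in the thread
SD = 2.78    # league standard deviation, as posted

def z_score(ld_pct, mean=MEAN, sd=SD):
    """How many standard deviations a season's LD% sits from the league mean."""
    return (ld_pct - mean) / sd

def classify(z_scores, cutoff=0.5):
    """Label a hitter by the average of his seasonal z-scores."""
    avg = sum(z_scores) / len(z_scores)
    if avg >= cutoff:
        return avg, "line drive hitter"
    if avg <= -cutoff:  # symmetric cutoff is an assumption
        return avg, "not so line drive hitter"
    return avg, "neither"

# Carlos Beltran's five z-scores as listed above.
beltran = [-0.09, -0.05, -1.92, -0.63, -1.49]
avg, label = classify(beltran)
print(f"Beltran: average z = {avg:.2f} -> {label}")
```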

Finally, there should obviously be some five-year control over LD%, but year to year the nonexistent correlation shows there's a lot of variability in the stat. I've never looked at BABIP for pitchers this way. I dunno if anyone has. Maybe I should...

Posted: April 19 07, 1:46 am
by thrill
Mephistopheles wrote:Gotta love the rigorous perfection of some things in mathematics.
NO I DON'T. MATH BLOWS.

Seriously though, remember the little people when you are some statistical advisor for a ML team.

Posted: April 19 07, 8:20 am
by Phyrkrakr
OK Meph, you've got me. It should be fairly obvious to the board by now that you're better at stats than most people. I'm not sure what I was looking at incorrectly in the data, but I'm pretty sure that I was in over my head.

So, Line Drive Hitter (TM) is just another commonly held assumption in the baseball community that doesn't hold water when analyzed statistically, like guys with grit being better than the average ball player, or hitting in front of Albert Pujols making you a better hitter.

Thanks for doing all of the leg work on this, it really is eye opening.

Posted: April 19 07, 9:02 am
by Asmodai
Phyrkrakr wrote:So, Line Drive Hitter (TM) is just another commonly held assumption in the baseball community that doesn't hold water when analyzed statistically, like guys with grit being better than the average ball player, or hitting in front of Albert Pujols making you a better hitter.
well it does hold water... michael young, bobby abreu and others are consistent elite line drive hitters

Posted: April 19 07, 9:19 am
by Hungary Jack
Fantastic work, Meph. Can I hire you?

Posted: April 19 07, 9:22 am
by Asmodai
I'll never work for a Cardinals fan. mwahaha.

Posted: April 19 07, 9:29 am
by haltz
I'm pretty curious about those numbers for pitchers. I'm not saying you should do all of that again, I'm just saying if you looked at it at all I'd be interested in the results. Anyway, that's fantastic stuff. I may or may not have a couple of questions, but I'll probably try and digest it a couple more times before asking, when I have more time. I'm sure we're all looking forward to that.

I really don't want to believe that it's that rare a repeatable skill. But I generally lose arguments when I ignore "facts."

Posted: April 19 07, 11:00 am
by Asmodai
I was sitting in the class I TA and started running through it. It's even harder finding pitchers who were healthy for 162 innings five years in a row, so I've cut back to only four years. I'm also only using 1995-2005 data because (1) I didn't have 2006 in my spreadsheet and (2) league average BABIP probably changes over time.

Posted: April 20 07, 11:22 am
by Asmodai
I didn't have a chance to do it last night, but probably later today. There are a couple pitchers who keep their BABIP low, but it was expected that their types would...