Posted: April 18 07, 10:09 am
by Bo Hart
Isn't a "line drive" arbitrary? How do you determine what is a line drive and what isn't? Like how is it technically defined when finding the LD percentage?

Posted: April 18 07, 10:42 am
by skmsw
BABIP = batting average on balls in play
A "ball in play" is a fair ball that when first hit has the opportunity to become either a hit or an out -- that is, no strikeouts, no walks, no home runs, no HBPs. Across the major leagues, the "average" BABIP is close to .300, but interpreting variances from this "average" must be done very cautiously -- different hitters have very different BABIPs that are normal for them.
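To make the definition concrete, here is a minimal Python sketch of the usual BABIP formula, (H - HR) / (AB - K - HR + SF). The sacrifice-fly term is part of the standard formula but is not mentioned above, and the stat line in the example is made up for illustration rather than taken from a real player.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies=0):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF).

    Walks and HBPs need no explicit term because they are not at-bats.
    """
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


# Hypothetical season line (made-up numbers, not a real player).
example = babip(hits=160, home_runs=20, at_bats=550, strikeouts=100, sac_flies=5)
print(f"BABIP: {example:.3f}")
```

With these made-up numbers, 140 non-homer hits on 435 balls in play works out a bit above the league-wide .300 mark.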

LD = percentage of balls put in play that are hit for line drives
There is an obvious subjective element to this, and a slight inaccuracy: "line drives" are determined by spotters who watch games and enter play-by-play data for the different agencies that track them. So what one spotter considers a line drive, another might consider a fly out (subjectivity). And a softly hit liner to the shortstop counts as a line drive but has the properties of a pop-out (inaccuracy). Including vector data (speed, trajectory, distance, direction) decreases the subjectivity quite a bit and allows for more penetrating analysis, but such data is not widely available. In the major leagues, 18-21% of a good hitter's balls in play are typically line drives.

In general -- very general --

10% of fly balls in play are hits
33% of ground balls are hits
60-70% of line drives are hits
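
As a rough illustration of how those three rates combine, the sketch below weights each one by a hitter's batted-ball mix to get an expected BABIP. The line-drive rate of 0.65 is just the midpoint of the 60-70% range, and both batted-ball mixes are hypothetical, chosen only to show the arithmetic.

```python
# Rough hit rates by batted-ball type, taken from the rules of thumb above.
# The line-drive rate uses 0.65, the midpoint of the 60-70% range.
HIT_RATE = {"fly": 0.10, "ground": 0.33, "line": 0.65}


def expected_babip(mix):
    """Weight each batted-ball type's hit rate by its share of balls in play."""
    total = sum(mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("batted-ball shares must sum to 1")
    return sum(share * HIT_RATE[kind] for kind, share in mix.items())


# Hypothetical batted-ball mixes (illustrative only, not real players).
typical = {"line": 0.20, "ground": 0.45, "fly": 0.35}
fewer_liners = {"line": 0.16, "ground": 0.47, "fly": 0.37}

print(round(expected_babip(typical), 3))
print(round(expected_babip(fewer_liners), 3))
```

Under these assumptions, moving four percentage points of line drives into grounders and flies costs a little under 20 points of expected BABIP, which is why LD% gets so much attention.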

Posted: April 18 07, 10:46 am
by greenback44
Kyle wrote:Isn't a "line drive" arbitrary? How do you determine what is a line drive and what isn't? Like how is it technically defined when finding the LD percentage?
STATS and BIS (and others) define line drives on a case-by-case basis. The arbitrariness is enough of a problem that one of em (I forget which) introduced a "fliner" category.

Posted: April 18 07, 10:56 am
by jim
greenback44 wrote:
Kyle wrote:Isn't a "line drive" arbitrary? How do you determine what is a line drive and what isn't? Like how is it technically defined when finding the LD percentage?
STATS and BIS (and others) define line drives on a case-by-case basis. The arbitrariness is enough of a problem that one of em (I forget which) introduced a "fliner" category.
"Fliner" is a John Dewan invention, so that would be STATS.

Posted: April 18 07, 11:45 am
by Hungary Jack
skmsw wrote:LD = percentage of balls put in play that are hit for line drives
There is an obvious subjective element to this, and a slight inaccuracy: "line drives" are determined by spotters who watch games and enter play-by-play data for the different agencies that track them. So what one spotter considers a line drive, another might consider a fly out (subjectivity). And a softly hit liner to the shortstop counts as a line drive but has the properties of a pop-out (inaccuracy). Including vector data (speed, trajectory, distance, direction) decreases the subjectivity quite a bit and allows for more penetrating analysis, but such data is not widely available. In the major leagues, 18-21% of a good hitter's balls in play are typically line drives.
This is the key. I would imagine that there is a set of correlations between distance traveled and duration of flight that distinguishes line drives from fly outs, popups, fliners, etc. quite definitively.
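
One way such a distance-and-duration rule could look, purely as a sketch: ignoring air drag and assuming the ball lands at the height it was hit, distance D and hang time T imply a launch angle through tan(angle) = g*T^2 / (2*D), and the angle can then be bucketed. The cutoffs below (10, 25 and 50 degrees) are assumptions chosen for illustration, not the rules STATS or BIS spotters actually use, and real batted balls are slowed by drag, so the numbers are only indicative.

```python
import math

G_FT_PER_S2 = 32.174  # gravitational acceleration in ft/s^2


def launch_angle_deg(distance_ft, hang_time_s):
    """Estimate launch angle from distance and hang time, ignoring air drag.

    With no drag and equal launch/landing height, vertical speed is g*T/2
    and horizontal speed is D/T, so tan(angle) = g*T^2 / (2*D).
    """
    if distance_ft <= 0 or hang_time_s <= 0:
        raise ValueError("distance and hang time must be positive")
    return math.degrees(math.atan(G_FT_PER_S2 * hang_time_s**2 / (2 * distance_ft)))


def classify(distance_ft, hang_time_s):
    """Bucket a batted ball by estimated launch angle (assumed cutoffs)."""
    angle = launch_angle_deg(distance_ft, hang_time_s)
    if angle < 10:
        return "ground ball"
    if angle < 25:
        return "line drive"
    if angle < 50:
        return "fly ball"
    return "popup"


# Hypothetical batted balls: (distance in feet, hang time in seconds).
for d, t in [(250, 2.0), (300, 4.0), (120, 4.5)]:
    print(d, t, classify(d, t))
```

A "fliner" band could be added by splitting the boundary between the line drive and fly ball buckets, again with whatever cutoffs the data supports.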

Posted: April 18 07, 3:35 pm
by Asmodai
Hungary Jack wrote:It would be cool if some of our stat gurus could plot BABIP vs. LD% and determine correlation and R-squared.
Will do. This first graph has hitters' LD% on the x-axis and hitters' BABIP on the y-axis. Each data point represents one player-season from 2004, 2005 or 2006 in which the hitter qualified for the batting title. This gave us 443 data points.
[Image: scatter plot of hitter LD% (x-axis) vs. BABIP (y-axis)]
There's a general correlation. It's likely that speed is another factor, and park as well as GB rate. The next graph uses the same sample, limited to guys who had consecutive seasons above the cutoff. It has year 1 LD% on the x-axis and year 2 LD% on the y-axis. Obviously there are only three years, so 2005 shows up a lot. There were 178 such pairs of seasons.
[Image: scatter plot of year 1 LD% (x-axis) vs. year 2 LD% (y-axis)]
I cannot stress enough how important this graph is. It's saying that there's no year-to-year correlation in a player's ability to hit line drives. It's similar to saying that if Albert Pujols hits .330 one year and Yadier Molina hits .230, Molina is just as likely as Pujols to win the batting title the next season. While a high LD% usually leads to more hits, a high LD% doesn't appear to be much of a consistent skill for a hitter.

If you do pitchers you're going to get similar results. You'll have a little bit of correlation for BABIP vs. LD% in any given season, but you'll have essentially no correlation in BABIP or LD% from year to year, which is what matters from a projection standpoint. That's why HR power/groundball rates, even transient speed for a hitter, strikeout rates, and walk rates are so vital for projections for hitters and pitchers. They're less likely to change season to season, although generally you lose footspeed as you age, causing a lower expected BABIP as you beat out fewer GBs.
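
For anyone who wants to repeat this kind of check, here is a minimal Python sketch of the two steps involved: pairing each player's LD% with his LD% in the following season, then computing the Pearson correlation r and R-squared (for a straight-line fit with one predictor, R-squared is just r squared). This is not the code or data behind the graphs above; the function names and the player-season values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: raises ZeroDivisionError if either sequence is constant.
    return cov / math.sqrt(var_x * var_y)


def consecutive_pairs(ld_by_player_year):
    """Pair each player's LD% in year N with the same player's LD% in year N+1."""
    pairs = []
    for (player, year), ld in ld_by_player_year.items():
        following = ld_by_player_year.get((player, year + 1))
        if following is not None:
            pairs.append((ld, following))
    return pairs


# Made-up LD% values for hypothetical players A, B, C (not real data).
ld = {
    ("A", 2004): 0.22, ("A", 2005): 0.18, ("A", 2006): 0.21,
    ("B", 2004): 0.17, ("B", 2005): 0.21,
    ("C", 2005): 0.19, ("C", 2006): 0.20,
}
pairs = consecutive_pairs(ld)
r = pearson_r([a for a, _ in pairs], [b for _, b in pairs])
print(f"{len(pairs)} pairs, r = {r:.3f}, R-squared = {r * r:.3f}")
```

As in the graphs, a player with three qualifying seasons contributes two pairs, so his middle season appears once as year 1 and once as year 2.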

Posted: April 18 07, 3:41 pm
by greenback44
I've been wondering about the value of LD%. Oh, well.

FoxSports (one word!) bought out Dewan at STATS. He's at BIS now.

Posted: April 18 07, 4:14 pm
by Hungary Jack
Great stuff. Thank you, thank you, thank you.

I was suspecting/hoping that the R-squared value would be higher, but it makes sense given that graph 2 essentially establishes that there is no such thing as a "line drive hitter", and that BABIP can vary significantly from year to year.

Posted: April 18 07, 5:07 pm
by Phyrkrakr
So, are the graphs above saying that there aren't any real line drive hitters in the league, or that line drive percentage doesn't really correspond to BABIP?

I appreciate the work that went into those graphs. Excellent work.

Posted: April 18 07, 5:12 pm
by Asmodai
Phyrkrakr wrote:So, are the graphs above saying that there aren't any real line drive hitters in the league, or that line drive percentage doesn't really correspond to BABIP?
The first one suggests that line drive percentage corresponds slightly to BABIP for a hitter. The second one suggests there aren't any consistent line drive hitters in the league.

Ironically enough, I'd expect LD% and BABIP to correlate more strongly for pitchers, since that factors out speed to a certain extent.