Posted: April 20 07, 7:19 pm
by Asmodai
Phyrkrakr wrote:OK Meph, you've got me. It should be fairly obvious to the board by now that you're better at stats than most people. I'm not sure what I was looking at incorrectly in the data, but I'm pretty sure that I was in over my head.
Okay, I'm probably not going to help any, but here goes nothing. When you find the standard deviation of any data sample, it is fit to that same data, so roughly 2/3 of the data points land within one standard deviation of the mean and about 95% within two (those figures hold for roughly normal data). So when you do it with Pujols' five seasons of data, of course three of them will be really close with one outlier season. It doesn't mean he's consistent or not; it's just a byproduct of the method you used. Statistics are a very valuable tool, but if you use the method incorrectly it becomes an atom bomb.

Anyways, now for the BABIP stuff for pitchers. Getting some stuff out of the way: I'm only using data from 1995-2005. I don't know how much league environment affects BABIP, so I erred on the safe side and decided to only use seasons from the same era. With a minimum of 162 innings pitched, I found 880 pitcher seasons. I also used estimated BABIP, not exact. I didn't have the AB for hitters, but this is probably better because I think we should include things like sac flies. I used (H-HR)/(BF-BB-HBP-K-HR). Some stuff from the sample:
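
The estimate above can be written as a small function. This is only a sketch of the formula as stated in the post; the function name and the stat line are made up for illustration and are not a real pitcher season.

```python
def estimated_babip(h, hr, bf, bb, hbp, k):
    """Estimated BABIP = (H - HR) / (BF - BB - HBP - K - HR)."""
    # Batters faced minus every outcome that never puts a ball in play.
    balls_in_play = bf - bb - hbp - k - hr
    if balls_in_play <= 0:
        raise ValueError("no balls in play for this stat line")
    return (h - hr) / balls_in_play


# Hypothetical stat line, purely illustrative.
example = estimated_babip(h=200, hr=20, bf=900, bb=60, hbp=5, k=150)
print(round(example, 3))
```

Because the denominator starts from batters faced, sac flies stay in as balls in play, which is the behavior the post says it wants.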

Mean: .290
StDev: .0228
Min: .232 (Damian Moss, 2002)
Max: .353 (Aaron Sele, 1999)

I hope you did not pick Moss in your 2003 fantasy draft. I broke up the players into groups of .005 and plotted a frequency graph. It's pretty normal, with a few differences here and there.
[Image: frequency graph of BABIP in groups of .005]
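
A minimal sketch of the grouping step, assuming BABIP values are rounded to three decimals before being put into .005-wide groups. The helper name is hypothetical, and the sample list is illustrative: only .232 and .353 come from the post (the min and max), the rest are made up.

```python
from collections import Counter


def bin_babip(values, width_points=5):
    """Count values per bin, where bins are `width_points` thousandths wide."""
    counts = Counter()
    for v in values:
        points = round(v * 1000)  # work in whole "points" to avoid float edge cases
        counts[points // width_points * width_points] += 1
    return dict(sorted(counts.items()))


# Illustrative values only, not the 880-season sample.
sample = [0.232, 0.290, 0.291, 0.294, 0.295, 0.353]
histogram = bin_babip(sample)
for low, n in histogram.items():
    print(f".{low:03d}-.{low + 4:03d}: {'#' * n}")
```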
Next I found how many pairs of consecutive seasons from the same player we could find. There was a sample of 475. Everyone who had a BABIP under .240 was between .285 and .295 the next season. I wasn't expecting much when I saw that. Here's the graph:
[Image: graph of BABIP in consecutive seasons]
As you can see, there's even less correlation than there was with LD% for hitters. There's probably more correlation (overall) than expected because in consecutive seasons pitchers usually pitch for the same team in the same park with a similar defense behind them. Obviously there should be some correlation because pitchers certainly have control over GB% and FB% to a large extent. Also, I've never run through this myself, but in the DIPS formula there's basically an if-then statement that helps a pitcher if he's left-handed or a knuckleballer, so some of those pitchers might be able to consistently post a lower BABIP.
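
For anyone who wants to repeat the year-to-year comparison, here is a rough sketch of pairing each qualifying season with the same pitcher's next season and measuring the correlation. The post does not say which statistic was used, so Pearson's r is an assumption here, and the pitcher labels and BABIP values are invented.

```python
from math import sqrt


def consecutive_pairs(seasons):
    """seasons maps (pitcher, year) -> BABIP; return (this_year, next_year) pairs."""
    pairs = []
    for (pitcher, year), babip in seasons.items():
        following = seasons.get((pitcher, year + 1))
        if following is not None:  # skip seasons with no qualifying follow-up
            pairs.append((babip, following))
    return pairs


def pearson_r(pairs):
    """Plain Pearson correlation coefficient over a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)


# Invented data: pitcher "A" misses 2002, so his 2001 and 2003 do not pair up.
toy_seasons = {
    ("A", 2000): 0.280, ("A", 2001): 0.300, ("A", 2003): 0.290,
    ("B", 2000): 0.310, ("B", 2001): 0.285, ("B", 2002): 0.295,
}
toy_pairs = consecutive_pairs(toy_seasons)
toy_r = pearson_r(toy_pairs)
print(len(toy_pairs), round(toy_r, 3))
```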

I did not want to use five consecutive seasons like I did for hitters. There simply aren't many pitchers who stay healthy five years in a row. I decided to use four and found a sample of 152 sets of four seasons, and some players (and seasons) are repeats. As with the LD% sample, there's a selection bias here as well. You already have to be good to survive four seasons. Before getting into the numbers, I decided to graph the progression of the pitchers. Warning: It's a big graph so you can see it, heh.
[Image: graph of BABIP progression across the four-season sets]
I've taken the liberty of making the trend lines for two pitchers bold, so you can see them. Alright, if you remember from the LD% post, 15.8% of the data lies more than one standard deviation below the mean. Using this, the probability of a pitcher having four consecutive seasons bettering a .269 BABIP is about 0.06%. Yeah, that's low. If a pitcher does this, then odds are he's good at preventing a high BABIP (or his defense is). Here's where we reach a bit of an impasse. Jamie Moyer did it (the black bold line), but for the rest of his career he didn't, so who knows. The fact he did it four consecutive seasons means either A) he really did well and could prevent it, or B) his defense was great. He's left-handed, and for that stretch it appears he was in the zone enough or good enough to do it. The probability of doing it three times out of four is 1.13% and our sample size is 152, so we expect 1.71 pitchers to have done this, just because of the probabilities. Ironically enough, just one did it: Barry Zito {-2.16,-2.41,0.09,-2.23}. Given that Zito's three 1+ seasons are actually 2+, where the probability of doing that three times is a whopping 0.004%, he's probably got the skill OR he's an extreme popup pitcher who was aided to a certain extent by his home stadium's mammoth foul territory. It's probably a bit of both. Given that both guys are left-handed finesse guys with standout offspeed stuff, it's likely there's a bit of control for them. However, it does not appear to be something sustainable for a career. Obviously it's also possible that neither can control it and they just got really really really really lucky. Then again I can't tell you, the Pope doesn't know, Jesus Christ doesn't know.

The probability of doing it twice is 10.62%. So we would expect just over 16 guys to do it. We come up with nineteen, but a lot of them are repeats who had two seasons of doing it, and they came up in two of their sets of 4 because they pitched five or six seasons in a row with 162 innings. Given that it's close to the number of guys we expected, it's hard to say any of these guys are special.
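
The repeats mentioned above fall out of how overlapping sets of four get built: a pitcher with five straight qualifying seasons produces two sets, six straight produces three. A small sketch of that windowing, with a hypothetical helper name and made-up years:

```python
def consecutive_windows(years, size=4):
    """Return every run of `size` consecutive years present in `years`."""
    have = set(years)
    return [
        tuple(range(start, start + size))
        for start in sorted(have)
        if all(start + offset in have for offset in range(size))
    ]


# Made-up example: five straight qualifying seasons, a gap, then one more.
qualifying_years = [1996, 1997, 1998, 1999, 2000, 2002]
sets_of_four = consecutive_windows(qualifying_years)
print(sets_of_four)
```

Since the sets share seasons, they are not independent draws, which is one more reason to read the expected counts loosely.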

The last thing I want to look at is seasons over a half standard deviation below the mean. The probability of doing this is 30.85%. The probability of doing it four times in a row is 0.91%, so we expect about one or two guys to do it. Two guys did do it, Jon Garland and Jamie Moyer. Garland's set was 2002-05 {-0.70,-1.08,-0.86,-1.25}. We expected 1.38 guys to do it, and two did. Moyer has been in everything, so we might be able to label him. Garland in 2006 was all the way up to .313, so I don't know. Perhaps he was just the lucky guy for the four years. The other three seasons in his career were nothing special.
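
The probabilities in the last few paragraphs can be reproduced along these lines. This is a sketch under two assumptions that the post implies but does not prove: season BABIP is roughly normal, and a pitcher's seasons are independent of each other. The function names are hypothetical, and results may differ a little from the figures quoted above depending on rounding and on whether "k out of four" is read as exactly k or at least k.

```python
from math import comb, erf, sqrt


def normal_cdf(z):
    """P(Z <= z) for a standard normal variable."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def prob_exactly(k, n, p):
    """Binomial probability of exactly k successes in n independent tries."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_one_sd = normal_cdf(-1.0)   # chance of a season 1+ SD below the mean, about 0.1587
p_half_sd = normal_cdf(-0.5)  # chance of a season 0.5+ SD below the mean, about 0.3085

sets_in_sample = 152  # number of four-season sets quoted in the post

four_of_four_one_sd = prob_exactly(4, 4, p_one_sd)
four_of_four_half_sd = prob_exactly(4, 4, p_half_sd)
expected_half_sd_sets = sets_in_sample * four_of_four_half_sd

print(f"4 of 4 at 1 SD:   {four_of_four_one_sd:.4%}")
print(f"4 of 4 at 0.5 SD: {four_of_four_half_sd:.4%}")
print(f"expected sets at 0.5 SD: {expected_half_sd_sets:.2f}")
for k in (2, 3):
    print(f"exactly {k} of 4 at 1 SD: {prob_exactly(k, 4, p_one_sd):.4%}")
```

The same two functions cover the "bad" side as well, since the normal curve is symmetric and the chance of a season 1+ SD above the mean equals the chance of one 1+ SD below.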

So what about the flip side: are there any guys who are bad at it? Nobody was at +1σ four years in a row. Nobody was at +2σ three years out of four. There were 18 sets that did it 2 out of 4, but the expected was about 16, and some of those were repeats. Nothing special here. We had three sets that were over .5σ four years in a row. The expected was 1.38. One was Aaron Sele in the Kingdome {0.55,2.07,3.06,0.61}. The turf might have something to do with it? The other two were both Andy Pettitte, who did it five consecutive seasons. Pettitte appears like he may be bad on the other side... or he just had Derek Jeter behind him. Every season but two in Pettitte's career has been over .300, and both of those were among his three in Houston, going from Jeter to Adam Everett. That's probably most of the difference, but his BABIP last season in Houston was .333. It'll be interesting this season to see how his BABIP plays out. Other than him there aren't many bad guys, but remember the selection bias.

All pitchers probably do have some control over BABIP, but once they show they can sustain an average BABIP in the majors, there appears to be little consistency in beating league average BABIP. There also appears to be little consistency in being worse than league average BABIP. However, when a player breaks into the majors the results are probably a lot different. That's something I would like to look at later, looking at, say, AA and AAA numbers before the majors. Granted, an issue here is going to be sample size. If a AAA pitcher sucks with a .400 BABIP in the majors, odds are he's not lasting 50 innings.

Posted: April 21 07, 7:21 am
by jim
Great work.

I am always uneasy reading this stuff with pitchers and hitters seemingly unable to control their destiny, but clearly they can. These studies, as absolutely thorough as they are, using actual events that really happened, don't pass the smell test. Is it possible that with the newer methods to track actual trajectory/speed of ball etc... that we will get an insight into what is going on? In other words, almost all players seem to hit LD at about the same rate, but all LD's aren't created equal. A Pujols LD is hit harder than a Molina LD.

Really fascinating, even though it flies in the face of everything I ever thought I knew about the game.

Posted: April 21 07, 9:06 am
by greenback44
jim wrote:Really fascinating, even though it flies in the face of everything I ever thought I knew about the game.
I feel compelled to reiterate something Mephisto said. There is a selection bias here. It could be what this says is that there is some sort of maximum LD% and to be a major league regular you must be pretty close to that maximum.

I believe MGL is actually doing something like you suggest with speed/distance of the paths of struck balls, but unfortunately that data set costs money. The next step after that is quality of pitches; if Brad Lidge hangs a slider, then a major league hitter really should smack it. Then there's the dynamic aspect, which is ultimately hopeless, since you'll have the butterfly effect to contend with. I've convinced myself that the reason Rolen had his big World Series (even if he isn't as good a story as Eckstein, he should've been MVP) was that the Tigers were trying to pitch to him based on a scouting report that was a week out of date.

Posted: April 21 07, 11:28 am
by Asmodai
jim wrote:Great work.

I am always uneasy reading this stuff with pitchers and hitters seemingly unable to control their destiny, but clearly they can. These studies, as absolutely thorough as they are, using actual events that really happened, don't pass the smell test. Is it possible that with the newer methods to track actual trajectory/speed of ball etc... that we will get an insight into what is going on? In other words, almost all players seem to hit LD at about the same rate, but all LD's aren't created equal. A Pujols LD is hit harder than a Molina LD.

Really fascinating, even though it flies in the face of everything I ever thought I knew about the game.
yes, it would certainly be something to do if you had the data, but as the above poster said it costs a boatload. the fielding data mgl uses is like 10 grand a season.