( Singles * 3 +
Doubles * 5 +
Triples * 7 +
Home Runs * 9 +
Walks * 2 +
Stolen Bases * 1 +
Outs * -0.61 ) * .16 = Estimated Runs Produced
My favorite formula for estimating the number of runs produced by a batter is not Bill James' Runs Created, but Paul Johnson's little-known Estimated Runs Produced.
ERP is my favorite because
Paul Johnson's Estimated Runs Produced was published in the 1985 Bill James Baseball Abstract. Bill James himself, in his afterword to Paul Johnson's essay, acknowledged that ERP's accuracy rivaled that of his own Runs Created formula, and that Runs Created had problems with players who have both a high OBP and a high SLG.
To determine if a runs produced formula is accurate, you can run the formula on an actual team's stats, and see if the result is the number of runs that the team scored. Bill James compared Estimated Runs Produced to Runs Created on 100 teams from 1955 to 1975 and found that while Runs Created was more accurate for 56 of the 100 teams, Estimated Runs Produced had a smaller gross error (18.4 runs per team vs 19.3 runs per team). ERP also was slightly better than RC in 1983 and 1984.
To determine if a runs produced formula is accurate for superstar players, you can run the formula on stats added up from high-scoring games. Paul Johnson added up 14 World Series teams' games whose stats added up to stats similar to Babe Ruth's. There were 125 runs scored by those teams in those games -- the ERP formula gives 129 runs (3% error), and RC gives 149 (19% error).
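As a quick check of the percentages quoted above, here is a minimal Python sketch. The function name percent_error is my own label for illustration; the three run totals are the ones given in the paragraph above.

```python
def percent_error(estimated, actual):
    """Absolute error of an estimate, as a percentage of the actual value."""
    return abs(estimated - actual) / actual * 100


actual_runs = 125   # runs actually scored in Johnson's sample of games
erp_estimate = 129  # what the ERP formula gives
rc_estimate = 149   # what Runs Created gives

erp_error = percent_error(erp_estimate, actual_runs)
rc_error = percent_error(rc_estimate, actual_runs)

print(f"ERP error: {erp_error:.1f}%")
print(f"RC error:  {rc_error:.1f}%")
```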
ERP assigns negative values to outs, making it possible for a bad hitter's stats to work out to "negative" run production. That might seem to be a glitch in the formula because an entire lineup of such hitters can't score negative runs. Clay Davenport's Equivalent Runs also can give negative values. Davenport has pointed out that negative values make sense in a team context. Adding outs to the same number of hits spreads out the offense, making it more likely that baserunners will be stranded, i.e. that fewer runs will score. With the ERP formula, a hitter who hits .168 or worse with no walks or extra-base hits would be considered to have produced "negative" runs. And it's probably true that such a hitter is taking more away from his teammates than he is adding himself, assuming his teammates aren't nearly as bad as he is.
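The .168 figure follows from the formula at the top of the page: for a hitter with only singles and outs, ERP is zero when 3*S = 0.61*(AB - S), i.e. when S/AB = 0.61/3.61. A small Python sketch of that arithmetic follows; the function name erp and the 1000 at-bat stat lines are made up for illustration.

```python
def erp(singles=0, doubles=0, triples=0, home_runs=0,
        walks=0, stolen_bases=0, outs=0):
    """ERP as given at the top of this article (out weight 0.61)."""
    weighted = (3 * singles + 5 * doubles + 7 * triples + 9 * home_runs
                + 2 * walks + stolen_bases - 0.61 * outs)
    return weighted * 0.16


# Break-even average for a singles-only hitter with no walks or steals:
# 3*S - 0.61*(AB - S) = 0  ->  S/AB = 0.61 / 3.61
break_even = 0.61 / 3.61
print(f"break-even average: {break_even:.4f}")

# Hypothetical 1000 at-bat lines just below and just above break-even.
print(erp(singles=168, outs=832))  # .168 hitter: negative
print(erp(singles=169, outs=831))  # .169 hitter: barely positive
```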
At one time I thought ERP didn't handle Caught Stealing correctly. It seems like Caught Stealing should count as more negative than other outs, because it negates a lot of the value of the single or walk that got the player on base. But I did some experiments and I found more heavily weighting Caught Stealing outs (while making compensatory changes to the weights of other outs) made the formula less accurate. So perhaps it is not the case that Caught Stealing outs are worse than other outs on average; for example, striking out with the bases loaded to end an inning is more damaging than the typical Caught Stealing. Also, when you put ERP in a per game (i.e. per 27 out) context, Caught Stealing reduces the result because it adds to the denominator (like all outs), so a Caught Stealing has a bigger negative impact on stats such as R27 (ERP per 27 outs) and Equivalent Average than just its negative impact on ERP (the numerator).
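To illustrate the per-27-outs point, here is a rough Python sketch. The season line is invented, the names erp and r27 are my own, and it assumes Caught Stealing is simply added to the out total (as in Johnson's first variation). It compares the relative drop in ERP with the relative drop in R27 when ten Caught Stealings are added.

```python
def erp(singles, doubles, triples, home_runs, walks, stolen_bases, outs):
    """ERP as given at the top of this article (out weight 0.61)."""
    weighted = (3 * singles + 5 * doubles + 7 * triples + 9 * home_runs
                + 2 * walks + stolen_bases - 0.61 * outs)
    return weighted * 0.16


def r27(erp_value, outs):
    """ERP per 27 outs."""
    return erp_value / outs * 27


# Hypothetical season line, not a real player.
line = dict(singles=100, doubles=30, triples=5, home_runs=20,
            walks=60, stolen_bases=20)
batting_outs = 400
caught_stealing = 10  # assumption: each CS counts as one more out

erp_clean = erp(outs=batting_outs, **line)
erp_caught = erp(outs=batting_outs + caught_stealing, **line)
r27_clean = r27(erp_clean, batting_outs)
r27_caught = r27(erp_caught, batting_outs + caught_stealing)

# Relative drops: the R27 drop is larger because CS hits both the
# numerator (ERP) and the denominator (outs).
erp_drop = (erp_clean - erp_caught) / erp_clean
r27_drop = (r27_clean - r27_caught) / r27_clean
print(f"ERP drop: {erp_drop:.1%}   R27 drop: {r27_drop:.1%}")
```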
Above I gave the variation on ERP that I use most often. Here are the actual variations given by Johnson:
ERP = (2*(TB+BB+HB)+H+SB-(.605*(AB+CS+GIDP-H)))*.16
ERP = (2*(TB+BB)   +H+SB-(.615*(AB        -H)))*.16
ERP = (2*(TB+BB)   +H+SB-(.610*(AB+(SB/4) -H)))*.16
Note: TB = Total Bases (H + D + 2*T + 3*HR), BB = Walks, HB = Hit by pitch, H = Hits, SB = Stolen Bases, AB = At Bats, CS = Caught Stealing, GIDP = Grounded Into Double Plays, D = Doubles, T = Triples, HR = Home Runs.
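Johnson's three variations translate directly into code. The Python sketch below uses function names of my own choosing (erp_full, erp_basic, erp_sb_adjusted) and an invented stat line. It also checks that 2*TB + H is the same thing as the 3/5/7/9 weights in the chart at the top, so the second variation differs from my chart only in the out weight (.615 vs .61).

```python
def total_bases(h, d, t, hr):
    """TB = H + D + 2*T + 3*HR."""
    return h + d + 2 * t + 3 * hr


def erp_full(tb, bb, hb, h, sb, ab, cs, gidp):
    """First variation: uses HB, CS and GIDP."""
    return (2 * (tb + bb + hb) + h + sb - 0.605 * (ab + cs + gidp - h)) * 0.16


def erp_basic(tb, bb, h, sb, ab):
    """Second variation: only the basic stats."""
    return (2 * (tb + bb) + h + sb - 0.615 * (ab - h)) * 0.16


def erp_sb_adjusted(tb, bb, h, sb, ab):
    """Third variation: SB/4 added to the out term."""
    return (2 * (tb + bb) + h + sb - 0.610 * (ab + sb / 4 - h)) * 0.16


# Hypothetical stat line, for illustration only.
h, d, t, hr = 150, 30, 5, 20
bb, sb, ab = 60, 10, 500
singles = h - d - t - hr
tb = total_bases(h, d, t, hr)

# Same hitter written with per-event weights, using the .615 out weight.
weighted = (3 * singles + 5 * d + 7 * t + 9 * hr
            + 2 * bb + sb - 0.615 * (ab - h)) * 0.16

print(erp_basic(tb, bb, h, sb, ab), weighted)
```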
ERP is great for working out a player's contribution during a game. You just assign 3 to a single, 5 to a double, etc. (as in the chart above), subtract 0.6 for each out, and divide by 6 (a quick stand-in for multiplying by .16). If you are wondering whether a team is underachieving or getting lucky against a pitcher, you can add up its ERP to see if it's higher or lower than the team's actual runs scored.
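A scorecard-style tally can be sketched in a few lines of Python. The event codes, the WEIGHTS table name, and the sample list of plate appearances are all invented for illustration. The sketch uses the exact -0.61 and .16 from the chart rather than the rounded 0.6 and divide-by-6 shortcut.

```python
# Per-event weights from the chart at the top of the article.
WEIGHTS = {
    "1B": 3, "2B": 5, "3B": 7, "HR": 9,
    "BB": 2, "SB": 1, "OUT": -0.61,
}


def game_erp(events):
    """Sum the weights of a list of event codes and scale by .16."""
    return sum(WEIGHTS[event] for event in events) * 0.16


# Hypothetical game: single, out, home run, walk, out, out.
events = ["1B", "OUT", "HR", "BB", "OUT", "OUT"]
print(f"ERP for the game: {game_erp(events):.2f}")
```

Adding up game_erp for every batter in the lineup gives the team total to compare against the runs actually scored.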
Johnson says in his essay that he developed ERP by charting how many bases batters and runners advanced on various offensive plays. For example, he found that a home run typically moved batters and runners 3 times as many bases as a single, hence the formula gives home runs 3 times the weight of singles. For the two fractional numbers, he says he just determined them by experiment.
I was surprised that despite its accuracy and simplicity, Bill James still used his own Runs Created formula in his future books, and as far as I can tell never mentioned ERP again. ERP is mentioned much less often in newsgroup discussions than Runs Created or Linear Weights.
In a previous version of this article, I wrote that Bill James showed that the Linear Weights formula was inaccurate. I received a message from Jim Furtado pointing out how similar ERP's weights were to those in the Linear Weights formula. This seemingly contradicted the finding that the Linear Weights formula was inaccurate. On reviewing I found out that there are (at least) two different Linear Weights formulas out there.
In his Historical Baseball Abstract (1985), Bill James criticized Pete Palmer's Linear Weights formula (page 446) which he quotes from Chapter 4 of Palmer's 1984 book, "The Hidden Game":
.46S+.80D+1.02T+1.4HR+.33(BB+HB)+.3SB-.6CS-.25(AB-H)-.5(OOB)
(S is Singles, OOB is Outs On Base). James went on to show this formula was not very accurate. Perhaps Palmer took the criticism seriously, because he has a different Linear Weights formula in "Total Baseball, 5th edition" (page 571):
.47S+.78D+1.09T+1.4HR+.33(BB+HB)+.3SB-.6CS-.25(AB-H)-.5(OOB)
Those little changes apparently improve the accuracy. I tried it on only one team (the '83 Padres, whose stats were listed in James' book), and these changes did move the result 9 runs closer to the actual number of runs scored. The actual ERP formula is closer still to the actual number of runs scored.
Furtado went on to fine tune the weights used by ERP and called the result "Extrapolated Runs". In a post to rec.sport.baseball, he gave two variations:
xRun = (1B x .51) + (2B x .8) + (3B x 1.14) + (HR x 1.46) +
((BB-IBB+HBP) x .33) + ((IBB+SB) x .18) +
((SH+SF) x .21) + ((CS+GDP) x -.17) + (Outs x -.10)
Outs = (AB-H+SF+SH+CS+GDP)
xRunK = xRun using different values for
Outs (.097) and Strikeouts (.11)
He finds these weights are more accurate for teams from 1955-1997 than ERP or other runs produced formulas. It will be interesting to see if it remains more accurate in future years, on data which didn't exist when the formula was developed. For the time being, I will continue to use ERP because I often don't have the minor stats such as GDP, and because I haven't yet independently verified the xRun formula's accuracy.
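For anyone who does have the minor stats, the xRun formula as posted can be written out like this in Python. The function and parameter names are my own. I have left out xRunK because the post as quoted here doesn't spell out exactly how the strikeout term enters.

```python
def x_run(singles=0, doubles=0, triples=0, hr=0, bb=0, ibb=0, hbp=0,
          sb=0, sh=0, sf=0, cs=0, gdp=0, ab=0):
    """Extrapolated Runs, using the weights quoted above."""
    hits = singles + doubles + triples + hr
    outs = ab - hits + sf + sh + cs + gdp
    return (0.51 * singles + 0.8 * doubles + 1.14 * triples + 1.46 * hr
            + 0.33 * (bb - ibb + hbp)
            + 0.18 * (ibb + sb)
            + 0.21 * (sh + sf)
            - 0.17 * (cs + gdp)
            - 0.10 * outs)


# A Caught Stealing costs .17 on its own plus .10 as an out.
print(x_run(cs=1))
```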
There is an oft-cited myth on the net that "offense is not linear", hence a linear formula can't be accurate. Bill James stated on page 451 of his Historical Abstract that "Linear weights cannot possibly evaluate offense for the simplest reason: Offense is not linear." Presumably he wrote this before he published the ERP formula, a linear formula whose accuracy he acknowledged as noted above.
For the record, here are Bill James' 3 Runs Created formulas, from least to most accurate:
RC = (H+BB)*(TB)/(AB+BB)
RC = (H+BB-CS)*(TB+.55*SB)/(AB+BB)
RC = (H+BB+HB-CS-GIDP)*(TB+.26*(BB-IBB+HB)+.52*(SB+SF+SH))/(AB+BB+HB+SF+SH)
Note: the abbreviations not already defined in the note below the ERP formulas are IBB (Intentional Walks), SF (Sacrifice Flies), SH (Sacrifice Bunts).
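Here is a sketch of the three Runs Created formulas in Python, for comparison with ERP. The function names are my own labels and the sample numbers are invented. Each formula has the same shape: an on-base term times an advancement term, divided by opportunities.

```python
def rc_basic(h, bb, tb, ab):
    """First (simplest) Runs Created formula."""
    return (h + bb) * tb / (ab + bb)


def rc_stolen_base(h, bb, cs, tb, sb, ab):
    """Second formula: adds stolen bases and caught stealing."""
    return (h + bb - cs) * (tb + 0.55 * sb) / (ab + bb)


def rc_detailed(h, bb, hb, cs, gidp, tb, ibb, sb, sf, sh, ab):
    """Third formula: uses the minor stats as well."""
    on_base = h + bb + hb - cs - gidp
    advancement = tb + 0.26 * (bb - ibb + hb) + 0.52 * (sb + sf + sh)
    opportunities = ab + bb + hb + sf + sh
    return on_base * advancement / opportunities


# Hypothetical stat line, for illustration only.
print(rc_basic(h=150, bb=50, tb=250, ab=500))
```

Unlike ERP, these multiply two totals together, so the value of each extra hit depends on the rest of the stat line.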
You will see the ERP formula applied in many of my Jays' articles.
Jim Furtado has posted Paul Johnson's original ERP essay on his Baseball Think Factory web site.
Last Major Update: 1998 Aug 14
Comments are welcome at comments@stephent.com.