Methods for evaluating historical players
Updated: July 2006
Because we have gone to great lengths to level the playing field, and because
you can use our Batting Register and Pitching Register reports to sort these
players on a wide variety of statistics, it might appear that the statistics on
this disk represent Diamond Mind's rankings of the greatest players in history.
In a way, that's exactly what they are. But we're not claiming that this is the
last word on the subject of ranking historical players, and we're not putting
this disk forth as our attempt to enter the debate about the best way to do
this. That debate has gone on forever and will continue to go on forever, and
we're quite content to let others carry on that debate.
Our goal was simply to provide you with another way to enjoy Diamond Mind
Baseball. We wanted to take advantage of the available statistics and make
reasonable adjustments for era and park effects. We've made every effort to do
that to the best of our ability, and if you feel that we have accomplished that
much, we're happy.
In fact, we do not believe there is a perfect way to compare players from one
era to another, even though a number of people have used different methods to
come up with rankings of the greatest players in baseball history, and some have
written entire books on the subject.
In this section, we'll point out the strengths and weaknesses of some of the
methods we're familiar with, including our own, and see if we can convince you
that there's no one right way to rank historical players.
Plus/minus versus percentages
Even if people agree that it's important to evaluate players relative to league
average, there's room for disagreement about how to do that.
One approach is to use plus/minus differences, as in "he hit five more homers
than the average player would have hit in the same number of plate appearances."
Another is to use percentages, as in "he hit 70% more homers per plate
appearance than the average player."
A third approach is to use standard deviations (more on this in a moment), as
in "he was 2.3 standard deviations above the mean."
We used the plus/minus approach, which has the advantage of avoiding gross
distortions when a player puts up big numbers in a category where the average is low.
In 1919, for example, Babe Ruth hit 29 homers in 532 plate appearances. That's
roughly 55 homers per 1000 PA in a league where the average player hit 6 per
1000 PA. In other words, Ruth's rate was about 900% of the league average.
A strict use of the percentage method would cause one to project Ruth for a
little over 240 homers per 1000 plate appearances if he played today, or over 150
homers in a 162-game season.
The plus/minus approach gives Ruth credit for hitting 49 more homers per 1000
plate appearances than the average hitter. If a season consists of 650 plate
appearances, that's about 32 more homers per season.
Today, the average player hits about 28 homers per 1000 PA, or about 18 in a
season of 650 PA. Using the plus/minus method, Ruth's 1919 season translates
into about 50 homers in today's environment. We're much more comfortable with
results like this than with a strict application of the percentage approach.
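The two translations above can be sketched in a few lines of Python. This is a hedged illustration using the approximate league rates quoted in the text; the function names are ours, not Diamond Mind's.

```python
# Translating a player's rate into another era. Rates are homers per
# 1000 plate appearances; the league figures are the approximate numbers
# quoted above, and the function names are ours, for illustration only.

def translate_plus_minus(player_rate, old_lg, new_lg):
    """Carry over the player's *difference* from league average."""
    return new_lg + (player_rate - old_lg)

def translate_percentage(player_rate, old_lg, new_lg):
    """Carry over the player's *ratio* to league average."""
    return new_lg * player_rate / old_lg

ruth_1919 = 1000 * 29 / 532   # roughly 55 homers per 1000 PA
lg_1919, lg_today = 6.0, 28.0

pm = translate_plus_minus(ruth_1919, lg_1919, lg_today)
pct = translate_percentage(ruth_1919, lg_1919, lg_today)

# Per 650-PA season: plus/minus lands near the 50 homers mentioned
# above, while the percentage method balloons past 150.
print(round(pm * 0.65), round(pct * 0.65))
```

The same two functions applied to the dead-ball pitcher example below show the method's limits: a zero rate stays zero under the percentage method no matter what the new environment looks like.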
But the plus/minus approach is not without limitations. If a pitcher in the
dead-ball era allowed no homers in 1000 batters faced, and the average pitcher
allowed 6, that pitcher gets credit for preventing 6 homers. If you apply that
result to a season when the normal homerun rate was 24 per 1000 BF, you'd rate
that pitcher to allow 18 homers per 1000 BF.
Maybe that's the right answer, maybe it's not. The percentage method would rate
him to allow zero homers today, too, but that doesn't seem reasonable. But it's
not possible to say that 18 is definitely the right number, either.
Furthermore, a current-day pitcher can earn a homerun difference of -12 because
he's pitching in an environment where the norm for homers is in the twenties.
It's impossible for a pitcher from the dead-ball era to earn a homerun difference
that low. It is also much less likely for a dead-ball era pitcher to earn a
high homerun difference, and maybe that evens things out. But there's no way to know for sure.
The bottom line is that we don't believe either approach is perfect. We prefer
the results we get using the plus/minus approach, and that approach is
consistent with how we've done all of our season disks, so we went with it.
Standard deviations and the quality of the player pool
One particularly interesting approach was described in great detail by Michael
Schell in his books "Baseball's All-Time Best Hitters" and "Baseball's All-Time
Best Sluggers". The first book focuses on batting average, while the second
covers all aspects of batting performance.
Schell's method consists of four key elements:
(a) he didn't want to penalize players for late career declines, so he evaluated
all hitters on their first 8000 atbats,
(b) he adjusted for the ups and downs in league batting average by evaluating
all players relative to their leagues,
(c) he used standard deviations to adjust for the overall quality of play, and
(d) he adjusted for park effects.
We didn't want to penalize players for getting called up at a very young age, so
we chose to go with each player's best series of consecutive seasons, rather
than always starting at the beginning of his career.
Schell's goal was to rank the 100 best hitters, so he could afford to limit his
work to players with at least 8000 career atbats. We needed a full set of
statistics, not just batting average, so we used plate appearances instead of
atbats. We needed many more than 100 players, so we set the bar at 4000 plate appearances.
Like Schell, we adjusted for park effects and normalized against the league average.
The use of standard deviations is one of the more interesting topics for
discussion. Standard deviation is a measure of the extent to which a set of
data points is clustered around the average versus being spread out. The
greater the spread, the higher the standard deviation.
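For a concrete sense of the measure, consider two invented sets of batting averages with the same mean but different spreads. This is a sketch with made-up numbers, not real league data:

```python
import statistics

# Two invented leagues, both averaging .260 (illustration only).
tight_league  = [0.250, 0.255, 0.260, 0.265, 0.270]
spread_league = [0.210, 0.235, 0.260, 0.285, 0.310]

# Population standard deviation: typical distance from the mean.
# The spread-out league scores about five times higher.
print(statistics.pstdev(tight_league))   # roughly 0.007
print(statistics.pstdev(spread_league))  # roughly 0.035
```

In Schell's framework, the first league would be read as higher quality: no one separates far from the pack.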
Schell argued that if the overall level of talent in a league is low, the good
players are able to dominate the weaker ones to a greater extent, and the spread
in batting average from top to bottom is greater. As the overall level of
quality improves, it becomes harder for one player to separate himself from the
pack, so the spread decreases.
In other words, if you measure the standard deviation of batting average, you
can use it as a measure of the quality of that league. High values indicate low
quality and vice versa.
Schell's book includes charts showing the changes in standard deviation over
time for both leagues. The standard deviations were much higher in the early
part of the 20th century but have settled down since. The implication is that
the quality of baseball was much lower in the earlier years, so it was easier
for players like Ty Cobb and Rogers Hornsby to dominate their leagues than it
later was for players like Tony Gwynn and Alex Rodriguez.
That makes intuitive sense, and Schell put a lot of weight on it when evaluating
his list of hitters. This is the adjustment that propelled Tony Gwynn to the
top of his rankings and relegated Ty Cobb to number two.
We gave serious thought to using Schell's method for this project, but we were
not convinced it was the right way to go.
Schell demonstrates that the batting average standard deviation has shown no
upward or downward trend since the 1930s. It was higher in the first part of
the 20th century, dropped steadily until the 1930s, and has drifted sideways ever since.
Since the 1930s, though, there have been a lot of year-to-year fluctuations.
And those fluctuations don't seem to fit the theory that standard deviation is a
good measure of the level of talent.
We know the level of talent went down during World War II, but the standard
deviation in the AL dropped in 1943, 1944 and 1945 before rising sharply in 1946
and 1947. That's exactly the opposite of what this theory would predict. In
the NL during those years, the direction of the changes was more consistent
with the theory, but the magnitude of the changes was very small.
The theory predicts that standard deviations should rise in expansion years, but
this has not been the case. In the AL, it was below the long-term average when
the league expanded in 1961 and only slightly above average in the expansions of
1969 and 1977. In the NL, it was below average in 1962 and 1993. The 1969 NL
was well above average, as expected, but overall, these values don't lend a lot
of credibility to the idea.
In another curious shift, from the early 1960s to the early 1970s, the standard
deviations in the NL rose to levels that had not been seen in sixty years. We
can't think of a reason related to player quality that explains this pattern.
While the theory makes sense to us, we just don't see enough consistency in the
data to feel comfortable using standard deviations as the basis for our rating
system. With so many unexplained fluctuations, we could end up over-rating and
under-rating players just because the standard deviations happened to go one
way or the other during their peak years.
To get another angle on the quality of play question, we designed and carried
out our own study. We identified all players who earned a significant amount of
playing time in consecutive seasons, and then we looked at how their stats
changed from the first year to the second year.
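In outline, the study pairs each qualifying player's consecutive seasons and averages the change. The sketch below uses invented data; the playing-time threshold, data layout, and function name are our assumptions, not the actual study code.

```python
# Minimal sketch of a returning-players study (invented data).
# `stats` maps (player, year) -> (plate_appearances, ops); the
# threshold and names are assumptions for illustration only.

def year_over_year_change(stats, year, min_pa=300):
    """Average OPS change for players with min_pa in both year and year+1."""
    changes = []
    for (player, yr), (pa, ops) in stats.items():
        if yr != year or pa < min_pa:
            continue
        nxt = stats.get((player, year + 1))
        if nxt and nxt[0] >= min_pa:
            changes.append(nxt[1] - ops)
    return sum(changes) / len(changes) if changes else None

stats = {
    ("Smith", 1960): (500, 0.780), ("Smith", 1961): (520, 0.790),
    ("Jones", 1960): (450, 0.720), ("Jones", 1961): (430, 0.715),
    ("Brown", 1960): (200, 0.650),  # below threshold, excluded
}
print(year_over_year_change(stats, 1960))  # about +0.0025 on this toy data
```

A negative average across many seasons would read as a rising quality of play (or a selection bias), exactly the two interpretations discussed below.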
The theory behind this study is that an expansion year introduces a significant
number of new players into a league. The returning players are presumably of a
higher quality than the new players, so the returning players should see their
stats improve because they're now getting some of their atbats against weaker competition.
Looking at all years since 1901, not just expansion years, we noticed that
returning players tend to perform at a slightly lower level in the second year,
though the drop from one year to the next was only a few points of OPS.
This tendency for returning players to decline may indicate a general rise in
the quality of play or a selection bias. Our sample included only those
players who met a minimum playing time threshold in both seasons, and it's
possible that players in the decline phase of their career were more likely to
qualify than were the younger, better players who were about to take their jobs.
Schell's standard deviation work suggests that the quality of play was
noticeably lower in those early years, improved rapidly until the 1930s, and
then settled down to a steady state. If that was true, we would have expected
to see a more rapid rate of decline among returning players in those early years.
But the rate of decline of the returning players was no different in the early
years than at other times. That's consistent with the idea that the quality of
the player pool has been improving, slowly but steadily, from day one.
We did see plenty of evidence of a decline in quality in the war years and
expansion years. In every case, the returning players improved. This supports
the idea that they now had the opportunity to beat up on the weaker players who
had come into the league in that second season, enough to overcome the normal
rate of decline.
Those expansion effects were relatively small, however, and didn't last very
long. As a result, they weren't enough to make an impact on player ratings that
were based on seven or eight seasons of playing time, and we chose not to make
any expansion year adjustments.
We did adjust for the World War II years of 1943 to 1945. The change in
performance was more striking for those three seasons, and it immediately
reversed itself when the players came back in 1946. Even this adjustment didn't
make a big difference for the players who were affected, because these players
were also being rated on several other seasons in which they faced the best
players of the day.
And when we chose to include all years back to 1876 for the 2006 update to this
disk, we began discounting the early years of professional baseball to account
for the lower quality of play during that era. Without these adjustments, too
many of the top players from the 1880s ranked at or near the top at their positions.
The standard deviation approach provides some evidence supporting the plausible
notion that it was easier to dominate a 1906 league than a 1996 league, but it
doesn't support the idea that the quality of play declined in WWII and in the expansion years.
The returning players study suggests that the quality of play has been improving
slowly since the beginning and continues to do so. And it does support the idea
that quality is diluted when war or expansion introduces a lot of new players at once.
The bottom line is that we chose not to make a timeline adjustment like the one
Schell made. We reached that conclusion for several reasons.
First, it is difficult to decide how to quantify the rate of improvement and
over what period of time to apply it. The standard deviation work and the
returning players work suggest different patterns of improvement, and neither
provided us with a clear result that we felt comfortable with.
Second, anyone who believes they can compare 1906 and 1996 stats
with mathematical precision is fooling themselves. No matter how much work we
do with differences, percentages, or standard deviations, we're never going to
know what numbers Ty Cobb would have put up in today's game.
Maybe Cobb's career stats would translate in a direct way, or maybe he would
take one look at today's smaller parks and change his approach, trading off
thirty points of batting average for another fifteen homers. We just don't know.
Third, if we assume that the quality of play has indeed been improving steadily
and continues to do so, today's players must be far better than those of a
hundred years ago. That argument makes sense when you consider the substantial
and measurable improvements that have been made in other athletic endeavors like
track and field. Modern athletes are stronger, faster, better conditioned,
better fed, and have access to better medicine than their counterparts from
a century ago.
If we really believe that, and if we follow through on that belief by adding
a timeline adjustment for all players, the heroes from the early 1900s
wouldn't look so good. We could end up turning Ty Cobb into Lenny Dykstra
and Babe Ruth into Bobby Abreu. And we don't think that would make for a
very interesting disk.
Finally, when we looked at the results, we didn't see a compelling need to
discount those early performances. It's not as if those players wound up
dominating our rankings. In fact, we found that all eras were well represented
at the top of our leader boards, suggesting that there are no serious era-based
biases in our method.
It's true that several of the top batting averages on this disk belong to
players from the dead-ball era, but that doesn't mean those players are
overrated. Players from other eras have greater overall value because they
supplement their batting averages with more power, higher walk rates, or both.
Using our approach, Ty Cobb looks like the Ty Cobb we've all read about. He's
got the best projected batting average on the disk, good doubles and triples
power, and he can run. He's not the best player on the disk, but he is in the
top ten. We can live with that. In our view, that's much better than the alternative.
At the risk of repeating ourselves, we're not claiming that this is the one
right answer, and we're not at all sure there is one right way that a lot
of people can agree on. Time and further study may change our thinking about
standard deviations and some of the other decisions we've made.
For now, however, we're very happy with the way things turned out, and we hope
you are, too.