Methods for evaluating historical players
-----------------------------------------

Updated: July 2006


Because we have gone to great lengths to level the playing field, and because 
you can use our Batting Register and Pitching Register reports to sort these 
players on a wide variety of statistics, it might appear that the statistics on 
this disk represent Diamond Mind's rankings of the greatest players in history.

In a way, that's exactly what they are.  But we're not claiming that this is the 
last word on the subject of ranking historical players, and we're not putting 
this disk forth as our attempt to enter the debate about the best way to do 
this.  That debate has gone on forever and will continue to go on forever, and 
we're quite content to let others carry on that debate.

Our goal was simply to provide you with another way to enjoy Diamond Mind 
Baseball.  We wanted to take advantage of the available statistics and make 
reasonable adjustments for era and park effects.  We've made every effort to do 
that to the best of our ability, and if you feel that we have accomplished that 
much, we're happy.

In fact, we do not believe there is a perfect way to compare players from one 
era to another, even though a number of people have used different methods to 
come up with rankings of the greatest players in baseball history, and some have 
written entire books on the subject.

In this section, we'll point out the strengths and weaknesses of some of the 
methods we're familiar with, including our own, and see if we can convince you 
that there's no one right way to rank historical players.  


Plus/minus versus percentages
-----------------------------

Even if people agree that it's important to evaluate players relative to league 
average, there's room for disagreement about how to do that.  

One approach is to use plus/minus differences, as in "he hit five more homers 
than the average player would have hit in the same number of plate appearances."  

Another is to use percentages, as in "he hit 70% more homers per plate 
appearance than the average player."  

A third approach is to use standard deviations (more on this in a moment), as 
in "he was 2.3 standard deviations above the mean."
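The three approaches can be sketched in a few lines of code.  The rates and 
the standard deviation below are illustrative numbers, not figures from the 
disk:

```python
# Three ways to express how far a hitter's home run rate sits from the
# league norm.  All rates are per 1000 plate appearances.

def plus_minus(player_rate, league_rate):
    """Difference from league average: positive means above the league."""
    return player_rate - league_rate

def percentage(player_rate, league_rate):
    """Player's rate as a percentage of the league rate."""
    return 100.0 * player_rate / league_rate

def z_score(player_rate, league_mean, league_sd):
    """Standard deviations above or below the league mean."""
    return (player_rate - league_mean) / league_sd

# A hypothetical hitter at 55 HR per 1000 PA in a league averaging 6,
# with a made-up standard deviation of 5 HR per 1000 PA among regulars.
print(plus_minus(55, 6))         # 49 more homers per 1000 PA
print(round(percentage(55, 6)))  # about 917% of league average
print(z_score(55, 6, 5))         # 9.8 standard deviations above the mean
```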

We used the plus/minus approach, which has the advantage of avoiding gross 
distortions when a player puts up big numbers in a category where the average is 
low.

In 1919, for example, Babe Ruth hit 29 homers in 532 plate appearances.  That's 
roughly 55 homers per 1000 PA in a league where the average player hit 6 per 
1000 PA.  In other words, Ruth's rate was about 900% of the league average.  

A strict use of the percentage method would cause one to project Ruth for a 
little over 240 homers per 1000 plate appearances if he played today, or over 
150 homers in a 162-game season.

The plus/minus approach gives Ruth credit for hitting 49 more homers per 1000 
plate appearances than the average hitter.  If a season consists of 650 plate 
appearances, that's about 32 more homers per season.  

Today, the average player hits about 28 homers per 1000 PA, or about 18 in a 
season of 650 PA.  Using the plus/minus method, Ruth's 1919 season translates 
into about 50 homers in today's environment.  We're much more comfortable with 
results like this than with a strict application of the percentage approach.  
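For those who want to check the arithmetic, here is the Ruth translation 
worked through in code, using the rates quoted above (small differences from 
the figures in the text come from rounding):

```python
# The Ruth translation arithmetic.  The 1919 figures and the modern
# league rate of 28 HR per 1000 PA are the rough numbers quoted above.

ruth_hr, ruth_pa = 29, 532
league_1919 = 6      # HR per 1000 PA, 1919 league average
league_today = 28    # HR per 1000 PA, modern league average
season_pa = 650

ruth_rate = 1000 * ruth_hr / ruth_pa        # about 54.5 HR per 1000 PA

# Percentage method: hold the ratio to league average constant.
pct_projection = (ruth_rate / league_1919) * league_today
# lands well above 240 HR per 1000 PA

# Plus/minus method: hold the difference from league average constant.
pm_projection = (ruth_rate - league_1919) + league_today
# about 76.5 HR per 1000 PA
print(round(pm_projection * season_pa / 1000))   # about 50 HR per season
```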

But the plus/minus approach is not without limitations.  If a pitcher in the 
dead-ball era allowed no homers in 1000 batters faced, and the average pitcher 
allowed 6, that pitcher gets credit for preventing 6 homers.  If you apply that 
result to a season when the normal homerun rate was 24 per 1000 BF, you'd rate 
that pitcher to allow 18 homers per 1000 BF.  

Maybe that's the right answer, maybe it's not.  The percentage method would 
rate him to allow zero homers today, too, but that doesn't seem reasonable.  
Yet it's not possible to say that 18 is definitely the right number, either.  

Furthermore, a current-day pitcher can earn a homerun difference of -12 because 
he's pitching in an environment where the norm for homers is in the twenties.  
It's impossible for a pitcher from the dead-ball era to earn a homerun difference 
that low.  It is also much less likely for a dead-ball era pitcher to earn a 
high homerun difference, and maybe that evens things out.  But there's no way to 
be sure.

The bottom line is that we don't believe either approach is perfect.  We prefer 
the results we get using the plus/minus approach, and that approach is 
consistent with how we've done all of our season disks, so we went with it. 


Standard deviations and the quality of the player pool
------------------------------------------------------

One particularly interesting approach was described in great detail by Michael 
Schell in his books "Baseball's All-time Best Hitters" and "Baseball's All-time
Best Sluggers".  The first book is focused on batting average, while the second 
covers all aspects of batting performance.

Schell's method consists of four key elements:

(a) he didn't want to penalize players for late career declines, so he evaluated 
all hitters on their first 8000 atbats,

(b) he adjusted for the ups and downs in league batting average by evaluating 
all players relative to their leagues,

(c) he used standard deviations to adjust for the overall quality of play, and

(d) he adjusted for park effects.

We didn't want to penalize players for getting called up at a very young age, so 
we chose to go with each player's best series of consecutive seasons, rather 
than always starting at the beginning of his career.  

Schell's goal was to rank the 100 best hitters, so he could afford to limit his 
work to players with at least 8000 career atbats.  We needed a full set of 
statistics, not just batting average, so we used plate appearances instead of 
atbats.  We needed many more than 100 players, so we set the bar at 4000 plate 
appearances.

Like Schell, we adjusted for park effects and normalized against the league 
averages.  

The use of standard deviations is one of the more interesting topics for 
discussion.  Standard deviation is a measure of the extent to which a set of 
data points is clustered around the average versus being spread out.  The 
greater the spread, the higher the standard deviation.
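Standard deviation is easy to compute directly.  The two leagues below are 
made up, but they show how the same batting-average edge over the league 
translates into very different numbers of standard deviations:

```python
# Standard deviation as a measure of spread, using two invented
# five-player leagues with the same idea of "average" but different
# amounts of clustering.
from statistics import pstdev

tight_league = [0.260, 0.270, 0.275, 0.280, 0.290]   # talent bunched together
spread_league = [0.220, 0.250, 0.280, 0.310, 0.340]  # easier to dominate

print(round(pstdev(tight_league), 4))    # small spread: 0.01
print(round(pstdev(spread_league), 4))   # large spread: 0.0424

# The same 40-point edge over the league average is worth far more
# standard deviations in the tighter league:
print(round(0.040 / pstdev(tight_league), 1))
print(round(0.040 / pstdev(spread_league), 1))
```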

Schell argued that if the overall level of talent in a league is low, the good 
players are able to dominate the weaker ones to a greater extent, and the spread 
in batting average from top to bottom is greater.  As the overall level of 
quality improves, it becomes harder for one player to separate himself from the 
pack, so the spread decreases.

In other words, if you measure the standard deviation of batting average, you 
can use it as a measure of the quality of that league.  High values indicate low 
quality and vice versa.  

Schell's book includes charts showing the changes in standard deviation over 
time for both leagues.  The standard deviations were much higher in the early 
part of the 20th century but have settled down since.  The implication is that 
the quality of baseball was much lower in the earlier years, so it was easier 
for players like Ty Cobb and Rogers Hornsby to dominate their leagues than it 
later was for players like Tony Gwynn and Alex Rodriguez.

That makes intuitive sense, and Schell put a lot of weight on it when evaluating 
his list of hitters.  This is the adjustment that propelled Tony Gwynn to the 
top of his rankings and relegated Ty Cobb to number two.

We gave serious thought to using Schell's method for this project, but we were 
not convinced it was the right way to go.

Schell demonstrates that the batting average standard deviation has shown no 
upward or downward trend since the 1930s.  It was higher in the first part of 
the 20th century, dropped steadily until the 1930s, and has drifted sideways 
since then.

Since the 1930s, though, there have been a lot of year-to-year fluctuations.  
And those fluctuations don't seem to fit the theory that standard deviation is a 
good measure of the level of talent.

We know the level of talent went down during World War II, but the standard 
deviation in the AL dropped in 1943, 1944 and 1945 before rising sharply in 1946 
and 1947.  That's exactly the opposite of what this theory would predict.  In 
the NL during those years, the direction of the changes was more consistent 
with the theory, but the magnitude of the changes was very small.

The theory predicts that standard deviations should rise in expansion years, but 
this has not been the case.  In the AL, it was below the long-term average when 
the league expanded in 1961 and only slightly above average in the expansions of 
1969 and 1977.  In the NL, it was below average in 1962 and 1993.  The 1969 NL 
was well above average, as expected, but overall, these values don't lend a lot 
of credibility to the idea.

In another curious shift, from the early 1960s to the early 1970s, the standard 
deviations in the NL rose to levels that had not been seen in sixty years.  We 
can't think of a reason related to player quality that explains this pattern.

While the theory makes sense to us, we just don't see enough consistency in the 
data to feel comfortable using standard deviations as the basis for our rating 
system.  With so many unexplained fluctuations, we could end up over-rating and 
under-rating players just because the standard deviations happened to go one 
way or the other during their peak years.

To get another angle on the quality of play question, we designed and carried 
out our own study.  We identified all players who earned a significant amount of 
playing time in consecutive seasons, and then we looked at how their stats 
changed from the first year to the second year.

The theory behind this study is that an expansion year introduces a significant 
number of new players into a league.  The returning players are presumably of a 
higher quality than the new players, so the returning players should see their 
stats improve because they're now getting some of their atbats against 
expansion-quality players.

Looking at all years since 1901, not just expansion years, we noticed that 
returning players tend to perform at a slightly lower level in the second year, 
though the drop from one year to the next was only a few points of OPS.
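A study like this might be organized as sketched below.  The records, the 
player names, and the playing-time threshold are all hypothetical; the real 
study used full season statistics and its own cutoffs:

```python
# Sketch of the returning-players comparison: pair each qualifying
# player's season with his following season and average the OPS change.

MIN_PA = 300  # hypothetical playing-time threshold

# (player, year) -> (plate appearances, OPS); all rows are invented
stats = {
    ("smith", 1960): (550, 0.780), ("smith", 1961): (540, 0.760),
    ("jones", 1960): (480, 0.710), ("jones", 1961): (500, 0.725),
    ("young", 1961): (350, 0.690),   # no 1960 season, so excluded
}

def returning_player_delta(stats, year):
    """Average OPS change for players qualifying in both year and year+1."""
    deltas = []
    for (player, yr), (pa, ops) in stats.items():
        if yr != year or pa < MIN_PA:
            continue
        nxt = stats.get((player, year + 1))
        if nxt and nxt[0] >= MIN_PA:
            deltas.append(nxt[1] - ops)
    return sum(deltas) / len(deltas)

print(round(returning_player_delta(stats, 1960), 4))  # slight average decline
```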

This tendency for returning players to decline may indicate a general rise in 
the quality of play or a selection bias.  Our sample included only those 
players who met a minimum playing time threshold in both seasons, and it's 
possible that players in the decline phase of their career were more likely to 
qualify than were the younger, better players who were about to take their jobs.

Schell's standard deviation work suggests that the quality of play was 
noticeably lower in those early years, improved rapidly until the 1930s, and 
then settled down to a steady state.  If that was true, we would have expected 
to see a more rapid rate of decline among returning players in those early 
years.  

But the rate of decline of the returning players was no different in the early 
years than at other times.  That's consistent with the idea that the quality of 
the player pool has been improving, slowly but steadily, from day one.  

We did see plenty of evidence of a decline in quality in the war years and 
expansion years.  In every case, the returning players improved.  This supports 
the idea that they now had the opportunity to beat up on the weaker players who 
had come into the league in that second season, enough to overcome the normal 
rate of decline.

Those expansion effects were relatively small, however, and didn't last very 
long.  As a result, they weren't enough to make an impact on player ratings that 
were based on seven or eight seasons of playing time, and we chose not to make 
any expansion year adjustments.

We did adjust for the World War II years of 1943 to 1945.  The change in 
performance was more striking for those three seasons, and it immediately 
reversed itself when the players came back in 1946.  Even this adjustment didn't 
make a big difference for the players who were affected, because these players 
were also being rated on several other seasons in which they faced the best 
players of the day.

And when we chose to include all years back to 1876 for the 2006 update to this
disk, we began discounting the early years of professional baseball to account
for the lower quality of play during that era.  Without these adjustments, too 
many of the top players from the 1880s ranked at or near the top at their 
respective positions.
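The text doesn't say how large that early-era discount was, so the shape 
below is purely hypothetical: a multiplier that fades in linearly across the 
seasons before 1901.  The function name, the 0.80 floor, and the cutoff years 
are all assumptions for illustration, not Diamond Mind's actual factors.

```python
# Hypothetical early-era discount: a multiplier on a season's value
# that ramps from a floor in 1876 up to 1.0 by 1901.

def era_discount(year, full_quality_year=1901, start_year=1876, floor=0.80):
    """Return the multiplier for a season; 1.0 from 1901 onward."""
    if year >= full_quality_year:
        return 1.0
    span = full_quality_year - start_year
    return floor + (1.0 - floor) * (year - start_year) / span

print(era_discount(1876))  # deepest discount
print(era_discount(2006))  # no discount for modern seasons
```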


Summing up
----------

The standard deviation approach provides some evidence supporting the plausible 
notion that it was easier to dominate a 1906 league than a 1996 league, but it 
doesn't support the idea that the quality of play declined in WWII and in the 
expansion years.

The returning players study suggests that the quality of play has been improving 
slowly since the beginning and continues to do so.  And it does support the idea 
that quality is diluted when war or expansion introduces a lot of new players at 
one time.

The bottom line is that we chose not to make a timeline adjustment like the one 
Schell made.  We reached that conclusion for several reasons.

First, it is difficult to decide how to quantify the rate of improvement and 
over what period of time to apply it.  The standard deviation work and the 
returning players work suggest different patterns of improvement, and neither 
provided us with a clear result that we felt comfortable with.

Second, we think that anyone who claims to compare 1906 and 1996 stats 
with mathematical precision is fooling themselves.  No matter how much work we 
do with differences, percentages, or standard deviations, we're never going to 
know what numbers Ty Cobb would have put up in today's game.  

Maybe Cobb's career stats would translate in a direct way, or maybe he would 
take one look at today's smaller parks and change his approach, trading off 
thirty points of batting average for another fifteen homers.  We just don't 
know.

Third, if we assume that the quality of play has indeed been improving steadily 
and continues to do so, today's players must be far better than those of a 
hundred years ago.  That argument makes sense when you consider the substantial 
and measurable improvements that have been made in other athletic endeavors like 
track and field.  Modern athletes are stronger, faster, better conditioned, 
better fed, and have access to better medicine than their counterparts from 
a century ago.

If we really believe that, and if we follow through on that belief by adding 
a timeline adjustment for all players, the heroes from the early 1900s 
wouldn't look so good.  We could end up turning Ty Cobb into Lenny Dykstra 
and Babe Ruth into Bobby Abreu.  And we don't think that would make for a 
very interesting disk.

Finally, when we looked at the results, we didn't see a compelling need to 
discount those early performances.  It's not as if those players wound up 
dominating our rankings.  In fact, we found that all eras were well represented 
at the top of our leader boards, suggesting that there are no serious era-based 
biases in our method.

It's true that several of the top batting averages on this disk belong to 
players from the dead-ball era, but that doesn't mean those players are 
overrated.  Players from other eras have greater overall value because they 
supplement their batting averages with more power, higher walk rates, or both. 

Using our approach, Ty Cobb looks like the Ty Cobb we've all read about.  He's 
got the best projected batting average on the disk, good doubles and triples 
power, and he can run.  He's not the best player on the disk, but he is in the 
top ten.  We can live with that.  In our view, that's much better than the 
alternatives.

At the risk of repeating ourselves, we're not claiming that this is the one 
right answer, and we're not at all sure there is one right way that a lot 
of people can agree on.  Time and further study may change our thinking about 
standard deviations and some of the other decisions we've made.  

For now, however, we're very happy with the way things turned out, and we hope 
you are, too.