
John Thomas Econometrics Project Spring 09

General Managers of professional sports franchises spend countless hours trying to determine a player's added value to their team. In equilibrium in a competitive market, this would lead them to offer a yearly salary equal to that added value. In theory, the team that values a player the most will offer him the most money, so resources across the industry will be allocated efficiently. In professional baseball, general managers look at various statistics, such as batting average, on-base percentage, slugging percentage, home runs, strikeouts, defensive ratings, and many others, to determine the potential impact a player could have on a team's winning percentage. In this paper, I intend to find a correlation between some of these statistical measures and a Major League player's average yearly salary using multiple regression analysis. In the analysis, a player's average yearly salary (measured in millions of dollars) will be the dependent variable, and statistical measures like the ones listed above will be the independent variables. In professional baseball, players are not able to enter the free-agent market until after 5-6 years of major league service. To get a fair estimate of the relationship between a player's skills and his salary, only players who have previously been eligible for free agency will be examined, and the emphasis will be on one position, the outfield. The sample size is 30 outfielders, which is a common rule-of-thumb minimum for relying on a normal approximation, and is also the number of outfielders I could find who meet my eligibility requirements.

In a standard OLS multiple regression analysis, the equation can be written:

$$Y_i = \beta_0 + \beta_1 X_{1,i} + \beta_2 X_{2,i} + \cdots + \beta_k X_{k,i} + \varepsilon_i, \qquad i = 1, \ldots, n$$

In my regression analysis, I use eight variables in different combinations to find the highest adjusted R². This statistic measures how well a regression equation explains the variation in players' average yearly salaries after penalizing for the number of explanatory variables. The variables considered are batting average (BA), on-base percentage (OBP), slugging percentage (SLG), home runs (HR), strikeouts (K), a defensive rating called the Ultimate Zone Rating (UZR), whether or not the player plays centerfield (CF, a binary variable), and major league service in years (SERVICE). UZR combines several defensive measures to compare a player to the league average, so it can be positive or negative. One problem with considering so many variables is that some of them are highly correlated with each other (multicollinearity), such as batting average and on-base percentage, or UZR and the ability to play centerfield. By running regressions with different combinations of these variables, it may be possible to find which ones matter more to General Managers. One curious example is the use of home runs in combination with slugging percentage. Although home runs feed directly into a player's slugging percentage, home run hitters are glamorized and could be overvalued because home runs are exciting and produce runs instantly.

A complete multiple regression with all eight variables gives an adjusted R² of .392 and can be written:

$$\text{Avg. Yrly Salary}_i = \beta_0 + \beta_1 \text{BA}_i + \beta_2 \text{OBP}_i + \beta_3 \text{SLG}_i + \beta_4 \text{HR}_i + \beta_5 \text{K}_i + \beta_6 \text{UZR}_i + \beta_7 \text{CF}_i + \beta_8 \text{SERVICE}_i + \varepsilon_i, \qquad i = 1, \ldots, n$$
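The paper does not say how the estimates were computed. The Python below is a minimal sketch of what an OLS fit with adjusted R² involves, using only the textbook formulas. It solves the normal equations (X'X)β = X'y by Gaussian elimination and then applies adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1). The helper names (`solve`, `ols`, `adjusted_r2`) and the toy data are hypothetical. They are not the author's software or the 30-outfielder data set. Solving the normal equations directly is adequate for a handful of regressors. Statistical packages usually rely on more stable factorizations such as QR, which matters when regressors are highly correlated, as several are here.

```python
"""Minimal OLS sketch: coefficients, R^2 and adjusted R^2 in pure Python."""

from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so row operations update both sides together.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: regressors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for n observations and k regressors (intercept not counted in k)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> Tuple[List[float], float, float]:
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    rows[i] holds the k regressors for observation i; an intercept column is added here.
    Returns ([b0, b1, ..., bk], R^2, adjusted R^2).
    """
    n, k = len(rows), len(rows[0])
    x = [[1.0] + [float(v) for v in r] for r in rows]  # prepend the intercept column
    p = k + 1
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(x[i][u] * x[i][v] for i in range(n)) for v in range(p)] for u in range(p)]
    xty = [sum(x[i][u] * y[i] for i in range(n)) for u in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xij for bj, xij in zip(beta, xi)) for xi in x]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return beta, r2, adjusted_r2(r2, n, k)


if __name__ == "__main__":
    # Hypothetical toy data (NOT the paper's outfielders): y = 2 + 3*x1 - x2 exactly.
    toy_rows = [(1, 0), (0, 1), (1, 1), (2, 1), (3, 5), (4, 2)]
    toy_y = [5, 1, 4, 7, 6, 12]
    coefs, r2, adj = ols(toy_rows, toy_y)
    print([round(c, 6) for c in coefs], round(r2, 6), round(adj, 6))
```

Passing rows of (BA, OBP, SLG, HR, K, UZR, CF, SERVICE) would correspond to the eight-variable specification. With n = 30 and k = 8, `adjusted_r2` multiplies 1 − R² by 29/21. Dropping a weak regressor can only lower R², but it can still raise adjusted R². That is what a search for the highest adjusted R² takes advantage of.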

Using the statistics for the 30 outfielders, the estimated equation is:

$$\widehat{\text{Avg. Yrly Salary}}_i = -18.7126 + 171.7016\,\text{BA}_i + 12.78652\,\text{OBP}_i - 81.88347\,\text{SLG}_i + 0.7636563\,\text{HR}_i - 0.0339474\,\text{K}_i + 0.0921453\,\text{UZR}_i + 4.470975\,\text{CF}_i + 0.0264646\,\text{SERVICE}_i$$

However, not all of these variables appeared statistically significant. After numerous trials to find the highest adjusted R², the best regression equation used only five of the explanatory variables and gave an adjusted R² of .445:

$$\text{Avg. Yrly Salary}_i = \beta_0 + \beta_1 \text{BA}_i + \beta_2 \text{HR}_i + \beta_3 \text{K}_i + \beta_4 \text{UZR}_i + \beta_5 \text{CF}_i + \varepsilon_i$$

It seems odd that on-base percentage and slugging percentage are among the omitted variables, especially since both are known to be highly valued in the baseball world. This may be because batting average and home runs are highly correlated with OBP and SLG, respectively, but it may also reflect the small sample size. With the estimated coefficients entered, the best equation is:

$$\widehat{\text{Avg. Yrly Salary}}_i = -20.94275 + 93.00385\,\text{BA}_i + 0.4208976\,\text{HR}_i - 0.0470357\,\text{K}_i + 0.0920968\,\text{UZR}_i + 4.484997\,\text{CF}_i$$

The coefficients differ greatly in size because the variables are measured on very different scales. Home runs and strikeouts are non-negative integers, batting average is a decimal usually between about .250 and .330, UZR can be positive or negative, and CF is a binary variable. For example, the BA coefficient of 93.00385 means that a 10-point (.010) increase in batting average is associated with about $0.93 million more in average yearly salary. Even though this equation gave the highest adjusted R², it can be argued that UZR is not statistically significant. Its t-statistic of 1.45 falls below the critical value of 1.64 for a one-sided hypothesis test at the 5% level. With 24 residual degrees of freedom the exact critical value is about 1.71, so the conclusion is unchanged. Contracts may also be based on expected development as these players gain experience in the league and continue to improve, not only on previous performance. With a sample of only 30 players, a large part of salary is left unexplained. This can come from a team overvaluing a player or from a player's agent negotiating a better contract, possibly by holding out until players are scarce in the market.

Based on this multiple regression equation, a General Manager would be willing to pay a player with one additional home run about $420,898 more in average yearly salary, holding the other variables constant. UZR and the ability to play CF are probably highly correlated, and UZR may not be statistically significant. Even so, the CF coefficient suggests that a general manager could be willing to pay around $4.48 million more per year for a centerfielder. This seems overstated, however, because only 7 of the 30 players examined were considered centerfielders. These veteran players are often premier talents with greater offensive ability than most centerfielders. Most teams seem to have young, defensive-minded players in centerfield who are under team control and therefore excluded from my analysis.

In other equations with different combinations of variables, slugging percentage became an important variable and was often statistically significant. In some cases, however, variables had the opposite sign from what intuition and common sense would predict. The next best regression equation gave an adjusted R² of .422 and included batting average, slugging percentage, home runs, strikeouts, and centerfield. In this case, slugging percentage had a negative coefficient, although it was not statistically significant (t-statistic of -1.05). This implies that, holding the other variables constant, a higher slugging percentage is associated with a lower salary, which does not make sense. In this model, home runs were statistically significant. Because home runs are part of slugging percentage, including both in the same equation may have pushed the slugging coefficient negative.

This multiple regression analysis did not meet my own expectation of estimating an MLB player's average yearly salary with high certainty. It does suggest that general managers value home run hitters who can limit their strikeouts while still playing adequate defense in the outfield, and that they put a premium on centerfielders. From my observations of Major League Baseball, this research reinforces my existing view of veteran outfielders. Since steroid testing began and the economy slowed, there has been a gradual shift toward younger, more athletic outfielders with a more all-around game. Because my analysis focuses on veterans, it would be interesting to redo this project in a few years.
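The summary above (home runs rewarded, strikeouts penalized, a premium on centerfielders) can be read directly off the five-variable equation. In a linear model, the effect of changing one regressor while holding the others fixed is just its coefficient times the change. The Python sketch below hard-codes the coefficients reported for that equation. The function names `predicted_salary` and `effect` and the example player profile are hypothetical. The outputs inherit the limitations discussed above: 30 observations, correlated regressors, and a UZR coefficient that may not be significant.

```python
"""Read marginal effects off the reported five-variable salary equation."""

from typing import Dict

# Coefficients as reported for the best-fitting equation (adjusted R^2 = .445).
# Units: millions of dollars of average yearly salary per one-unit change.
COEF: Dict[str, float] = {
    "const": -20.94275,
    "BA": 93.00385,
    "HR": 0.4208976,
    "K": -0.0470357,
    "UZR": 0.0920968,
    "CF": 4.484997,
}


def predicted_salary(ba: float, hr: int, k: int, uzr: float, cf: int) -> float:
    """Fitted average yearly salary ($ millions) for one player profile."""
    return (COEF["const"] + COEF["BA"] * ba + COEF["HR"] * hr
            + COEF["K"] * k + COEF["UZR"] * uzr + COEF["CF"] * cf)


def effect(**changes: float) -> float:
    """Change in fitted salary ($ millions) when the named regressors change.

    The model is linear, so the effect is the sum of coefficient * change,
    with every regressor that is not named held fixed.
    """
    return sum(COEF[name] * delta for name, delta in changes.items())


if __name__ == "__main__":
    # Hypothetical player profile, for illustration only.
    print(round(predicted_salary(ba=0.280, hr=25, k=100, uzr=5.0, cf=0), 2))
    print(round(effect(HR=1) * 1_000_000))  # one extra home run, in dollars
    print(round(effect(CF=1), 2))           # centerfielder premium, $ millions
    print(round(effect(HR=10, K=-20), 2))   # 10 more home runs and 20 fewer strikeouts
```

For instance, `effect(HR=1)` returns the HR coefficient, 0.4208976, which is about $420,898 per year. `effect(CF=1)` returns the roughly $4.48 million centerfielder premium that the paper argues is overstated.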
