There are several types of averages. We will consider five: the arithmetic mean, weighted mean, the median, the mode, and the geometric mean.
Measures of Location
The purpose of a measure of location is to pinpoint the center of a set of observations.
Measure of location: A single value that summarizes a set of data. It locates the center of the values.
The arithmetic mean, or simply the mean, is the most widely used measure of location.
Mean: The sum of observations divided by the total number of observations.
The population mean is calculated as follows:
Population mean
\[ \mu = \frac{\sum X}{N} \qquad [3\text{-}1] \]
Where:
\(\mu\) represents the population mean. It is the Greek letter mu.
N is the number of items in the population.
X is any particular value.
\(\Sigma\) indicates the operation of adding all the values. It is the Greek letter sigma.
\(\Sigma X\) is the sum of the X values.
[3-1] indicates the formula number from the text.

Any measurable characteristic of a population is called a parameter.
Parameter: A characteristic of a population.
Sample mean
\[ \bar{X} = \frac{\sum X}{n} \qquad [3\text{-}2] \]
Where:
\(\bar{X}\) is the sample mean; it is read as X bar.
n is the number of values in the sample.
X is a particular value.
\(\Sigma\) indicates the operation of adding all the values.
\(\Sigma X\) is the sum of the X values.
[3-2] is the formula number from the text.

The mean of a sample, or any other measure based on sample data, is called a statistic.
Statistic: A characteristic of a sample.
The statement "The mean weight of a sample of laptop computers is 6.5 pounds" is an example of a statistic. In formulas [3-1] and [3-2] the mean is calculated by summing the observations and dividing by the total number of observations. Suppose the Kellogg Company's quarterly earnings per share for the last five quarters are: $0.89, $0.77, $1.05, $0.79, and $0.95. If the earnings are a population, the mean is found by:
\[ \mu = \frac{\sum X}{N} = \frac{\$0.89 + \$0.77 + \$1.05 + \$0.79 + \$0.95}{5} = \frac{\$4.45}{5} = \$0.89 \]
The mean quarterly earnings per share is $0.89. In some situations the mean may not be representative of the data. As an example, the annual salaries of five vice presidents at AVX, LLC are $115,000, $135,000, $118,000, $126,000, and $350,000. The mean is:
\[ \mu = \frac{\sum X}{N} = \frac{\$115{,}000 + \$135{,}000 + \$118{,}000 + \$126{,}000 + \$350{,}000}{5} = \frac{\$844{,}000}{5} = \$168{,}800 \]
Notice how the one extreme value ($350,000) pulled the mean upward. Four of the five vice presidents earned less than the mean, raising the question whether the arithmetic mean value of $168,800 is typical of the salary of the five vice presidents.
The sum of the deviations of each value from the mean is always zero:
\[ \sum (X - \bar{X}) = 0 \]
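The two mean calculations above, and the fact that deviations from the mean sum to zero, can be checked with a short Python sketch. The function name `mean` and the variable names are illustrative choices, not part of the text.

```python
# Data taken from the Kellogg and AVX examples in the text.
earnings = [0.89, 0.77, 1.05, 0.79, 0.95]
salaries = [115_000, 135_000, 118_000, 126_000, 350_000]


def mean(values):
    """Arithmetic mean: the sum of the observations divided by their count."""
    return sum(values) / len(values)


earnings_mean = mean(earnings)
salary_mean = mean(salaries)

# Deviations from the mean sum to zero (up to floating-point rounding).
deviation_sum = sum(x - salary_mean for x in salaries)

print(round(earnings_mean, 2))
print(salary_mean)
print(deviation_sum)
```

Four of the five salaries fall below `salary_mean`, which is the distortion the text describes.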
Weighted Mean
The weighted mean is a special case of the arithmetic mean. It is often useful when there are several observations of the same value.
Weighted mean: The value of each observation is multiplied by the number of times it occurs. The sum of these products is divided by the total number of observations to determine the weighted mean.
In general, the weighted mean of a set of values, designated \(X_1, X_2, X_3, \ldots, X_n\), with the corresponding weights \(w_1, w_2, w_3, \ldots, w_n\), is computed by:
Weighted Mean
\[ \bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + w_3 X_3 + \cdots + w_n X_n}{w_1 + w_2 + w_3 + \cdots + w_n} \qquad [3\text{-}3] \]
The weighted mean is particularly useful when various classes or groups contribute differently to the total. For example, the coronary care unit of a hospital consists of nurses' aides who are paid $14 per hour, nurses' assistants who earn $18 per hour, and registered nurses who earn $28 per hour. To say the average hourly wage for the coronary unit is $20 per hour, found by ($14 + $18 + $28) / 3, would not be accurate unless there were the same number of people in each group.
Suppose the coronary care unit has ten employees: two aides who earn $14 per hour, three nurses' assistants who earn $18 per hour, and five registered nurses who earn $28 per hour. The weighted mean is:
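\[ \bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + w_3 X_3 + \cdots + w_n X_n}{w_1 + w_2 + w_3 + \cdots + w_n} = \frac{(2 \times \$14) + (3 \times \$18) + (5 \times \$28)}{2 + 3 + 5} = \frac{\$28 + \$54 + \$140}{10} = \frac{\$222}{10} = \$22.20 \]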
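As a minimal sketch, the coronary care unit calculation can be reproduced in Python. The helper `weighted_mean` is a hypothetical name introduced here for illustration; it follows formula [3-3] directly.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights (formula [3-3])."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


hourly_wages = [14, 18, 28]  # aides, assistants, registered nurses
head_counts = [2, 3, 5]      # number of employees in each group

wage_weighted = weighted_mean(hourly_wages, head_counts)
wage_unweighted = sum(hourly_wages) / len(hourly_wages)

print(wage_weighted)    # weighted by head count
print(wage_unweighted)  # treats the three groups as equally sized
```

The unweighted figure matches the weighted one only when every group has the same number of people.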
The Median
It was pointed out that the arithmetic mean is often not representative of data with extreme values. The median is a useful measure when we encounter data with an extreme value.
Median: The midpoint of the values after all observations have been ordered from the smallest to the largest, or from the largest to the smallest.
Fifty percent of the observations are above the median and 50 percent are below it. To determine the median, the values are ordered from low to high, or high to low, and the middle value is selected. For the vice president incomes, the middle value is $126,000, the median.

$115,000
$118,000
$126,000 (median)
$135,000
$350,000

It is a more representative value in this problem than the mean of $168,800.
Note that there was an odd number of vice president incomes (5). For an odd number of ungrouped values we just order them and select the middle value. To determine the median of an even number of ungrouped values, the first step is to arrange them from low to high as usual, and then determine the value halfway between the two middle values. As an example, the number of bronze castings produced in a day at Markey Bronze is 87, 62, 91, 58, 99, and 85. Ordering these from low to high:

58   62   85   87   91   99
The median number produced is halfway between the two middle values of 85 and 87. The median is 86. Thus we note that the median (86) may not be one of the values in a set of data.
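The odd and even cases can be sketched in Python as follows. The function name `median` is an illustrative choice; the standard library's `statistics.median` gives the same results.

```python
def median(values):
    """Middle value of the ordered data; for an even count, the value
    halfway between the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


salaries = [115_000, 135_000, 118_000, 126_000, 350_000]  # odd count
castings = [87, 62, 91, 58, 99, 85]                        # even count

salary_median = median(salaries)
castings_median = median(castings)

print(salary_median)
print(castings_median)
```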
The major properties of the median are:
1. It is not affected by extremely large or small values.
2. It can be computed for ordinal-level data or higher.
3. There is only one median value for each set of data.
The Mode
A third measure of location is the mode.
Mode: The value of the observation that appears most frequently.
The mode is the value that occurs most often in a set of raw data. The dividends per share declared on five stocks were: $3, $2, $4, $5, and $4. Since $4 occurred twice, which was the most frequent, the mode is $4.
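A small Python sketch of the mode, using the dividend data above. The helper `modes` is a hypothetical name; it returns a list because a data set can have more than one most frequent value.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


dividends = [3, 2, 4, 5, 4]
dividend_modes = modes(dividends)

print(dividend_modes)
```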
The chart on the right shows the useful life of a sample of batteries used in a CD player. Note the symmetrical bell shape of the distribution. In a symmetrical distribution the mean, median, and mode are equal.

If the distribution is not symmetrical, it is skewed and the relationship between the mean, median, and mode changes. If the long tail is to the right, the distribution is said to be a positively skewed distribution.
Positively skewed distribution: The long tail is to the right; that is, in the positive direction. The mean is larger than the median or the mode.
The chart on the right shows the years of service for a group of employees at an old manufacturing plant that was revitalized with a new product line and experienced a hiring surge about 13 years ago. It is a positively skewed distribution. The mean is larger than the median, which is larger than the mode.

[Figure: Positive Skewness. Number of employees (vertical axis) by Years of Service (horizontal axis).]

For a negatively skewed distribution the mean is the smallest of the three measures of central tendency (because it is being pulled down by the small observations). The mode is the highest of the three measures.
Negatively skewed distribution: The long tail is to the left or in the negative direction. The mean is smaller than the median or mode.
The chart on the right shows the years of service for a group of teachers in a school system that has an experienced staff and has not hired many staff in recent years. The mean is smaller than the median, which is smaller than the mode. In skewed distributions the mode always appears at the apex or top (highest point) on the curve, and the mean is pulled in the direction of the tail. The median always appears between the mode and the mean, regardless of the direction of the tail.
[Figure: Negative Skewness. Number of teachers (vertical axis) by Years of Service (horizontal axis).]
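The ordering of the three measures under skewness can be illustrated with a short Python sketch. The two data sets below are hypothetical values invented for illustration; they are not the years-of-service data behind the charts.

```python
import statistics

# Hypothetical data: a long tail to the right, and its mirror image.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 8, 20]
left_skewed = [21 - x for x in right_skewed]


def location_summary(values):
    """Return (mean, median, mode) for a data set with a single mode."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.mode(values),
    )


pos_mean, pos_median, pos_mode = location_summary(right_skewed)
neg_mean, neg_median, neg_mode = location_summary(left_skewed)

print(pos_mean, pos_median, pos_mode)  # mean > median > mode
print(neg_mean, neg_median, neg_mode)  # mean < median < mode
```

In both cases the mean is pulled toward the tail and the median sits between the mode and the mean.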
The geometric mean is used to determine the average change of percentages, ratios, indexes, or growth rates.
Geometric mean: The nth root of the product of n values.
\[ GM = \sqrt[n]{(X_1)(X_2)(X_3) \cdots (X_n)} \qquad [3\text{-}4] \]
The geometric mean can be used for averaging percents. Suppose the return on investment for Parnell International for the past 4 years is 0.4%, 2.9%, 2.1%, and 12.3%. The GM increase over the period is 4.3 percent, found by:
\[ GM = \sqrt[n]{(X_1)(X_2)(X_3) \cdots (X_n)} = \sqrt[4]{1.18455} = 1.043 \]
The geometric mean is the fourth root of 1.18455, which is 1.043. The average return on the investment is found by subtracting one from the geometric mean: (1.043 − 1.000) = 0.043 = 4.3%. Another application of the geometric mean is to find the average percent change over a period of time. Text formula [3-5] is used:
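Average Percent Increase Over Time
\[ GM = \sqrt[n]{\frac{\text{Value at end of period}}{\text{Value at start of period}}} - 1 \qquad [3\text{-}5] \]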
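The Parnell International calculation can be sketched in Python. Each percent return is first converted to a growth factor (1 plus the rate), which is how the product 1.18455 arises. The function names are illustrative, and the start value, end value, and period count passed to `average_percent_increase` are hypothetical numbers chosen only to show the compound-growth calculation.

```python
import math


def geometric_mean(factors):
    """nth root of the product of n positive values (formula [3-4])."""
    return math.prod(factors) ** (1 / len(factors))


def average_percent_increase(start_value, end_value, periods):
    """Average rate of change per period, as a fraction (compound growth)."""
    return (end_value / start_value) ** (1 / periods) - 1


returns_pct = [0.4, 2.9, 2.1, 12.3]                  # from the text
growth_factors = [1 + r / 100 for r in returns_pct]  # 1.004, 1.029, ...

gm = geometric_mean(growth_factors)
avg_return_pct = (gm - 1) * 100

# Hypothetical example: a value that doubles from 100 to 200 over 10 periods.
doubling_rate = average_percent_increase(100, 200, 10)

print(round(math.prod(growth_factors), 5))
print(round(gm, 3))
print(round(avg_return_pct, 1))
print(round(doubling_rate, 4))
```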
Measures of Dispersion
We will consider several measures of dispersion: the range, the mean deviation, the variance, and the standard deviation.
Range
The simplest measure of dispersion is the range.
Range: The difference between the largest and smallest values in a data set.
\[ \text{Range} = \text{Largest value} - \text{Smallest value} \qquad [3\text{-}6] \]
The statistics instructor referred to above has two classes with the ages indicated:
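A.M. Class: 18, 20, 21, 21, 23, 23
P.M. Class: 17, 17, 18, 20, 25, 29
The range of ages is 5 years for the A.M. class (23 − 18) and 12 years for the P.M. class (29 − 17).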
Thus we can say that there is more spread in the ages of the students enrolled in the evening (P.M.) class compared with the morning (A.M.) class. The characteristics of the range are:
1. Only two values are used in the calculation.
2. It is influenced by extreme values.
3. It is easy to compute and understand.
The range has two disadvantages. It can be distorted by a single extreme value. Suppose the same statistics instructor has a third class of five students. The ages of these students are given in the table.

Ages of Students
20
20
21
22
60
The range of ages is 40 years, yet four of the five students' ages are within two years of each other. The 60-year-old student has distorted the spread. Another disadvantage is that only two values, the largest and the smallest, are used in its calculation.
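The three ranges discussed above can be computed with a short Python sketch. The name `value_range` is an illustrative choice (it avoids shadowing Python's built-in `range`).

```python
def value_range(values):
    """Difference between the largest and smallest values (formula [3-6])."""
    return max(values) - min(values)


am_class = [18, 20, 21, 21, 23, 23]
pm_class = [17, 17, 18, 20, 25, 29]
third_class = [20, 20, 21, 22, 60]

am_range = value_range(am_class)
pm_range = value_range(pm_class)
third_range = value_range(third_class)

# Dropping the single extreme age shows how much it distorted the spread.
third_range_without_extreme = value_range(third_class[:-1])

print(am_range, pm_range, third_range, third_range_without_extreme)
```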
Mean Deviation
In contrast to the range, the mean deviation considers all the data.
Mean Deviation: The arithmetic mean of the absolute values of the deviations from the arithmetic mean.
Mean Deviation
\[ MD = \frac{\sum |X - \bar{X}|}{n} \]
Where:
X is the value of each observation.
\(\bar{X}\) is the arithmetic mean of the values.
n is the number of observations in the sample.
| | indicates the absolute value.

We take the absolute value of the deviations from the mean because if we didn't, the positive and negative deviations from the mean would exactly offset each other, and the mean deviation would always be zero. Such a measure (zero) would be a useless statistic.
The mean deviation is computed by first determining the difference between each observation and the mean. These differences are then averaged without regard to their signs. For the PM statistics class, whose mean age is 21, the mean deviation is 4.0 years, found from the table:

Age (X)    Deviation           Absolute Deviation
17         17 − 21 = −4        4
17         17 − 21 = −4        4
18         18 − 21 = −3        3
20         20 − 21 = −1        1
25         25 − 21 = 4         4
29         29 − 21 = 8         8
Total                          24

Then
\[ MD = \frac{\sum |X - \bar{X}|}{n} = \frac{24}{6} = 4.0 \]
The parallel lines indicate absolute value. To interpret, 4.0 years is the mean amount by which the ages differ from the arithmetic mean age of 21.0 years for the PM students.
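A minimal Python sketch of the mean deviation, applied to the class ages given earlier. The function name `mean_deviation` is introduced here for illustration.

```python
def mean_deviation(values):
    """Arithmetic mean of the absolute deviations from the arithmetic mean."""
    x_bar = sum(values) / len(values)
    return sum(abs(x - x_bar) for x in values) / len(values)


am_class = [18, 20, 21, 21, 23, 23]
pm_class = [17, 17, 18, 20, 25, 29]

pm_md = mean_deviation(pm_class)
am_md = mean_deviation(am_class)

# Without absolute values the deviations cancel out.
pm_mean = sum(pm_class) / len(pm_class)
signed_sum = sum(x - pm_mean for x in pm_class)

print(pm_md)
print(round(am_md, 2))
print(signed_sum)
```

Both classes have a mean age of 21, so the larger mean deviation of the PM class reflects greater spread only.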
Variance: The arithmetic mean of the squared deviations from the mean.
The variance is non-negative and is zero only if all observations are the same.
Population Variance
The formulas for the population variance and the sample variance are slightly different. The formula for the population variance is:
Population Variance
\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} \qquad [3\text{-}8] \]
Where:
\(\sigma^2\) is the symbol for the population variance (\(\sigma\) is the Greek letter sigma). It is read as sigma squared.
X is a value of an observation in the population.
\(\mu\) is the arithmetic mean of the population.
N is the total number of observations in the population.
The major characteristics of the variance are: 1. All the observations are used in the calculations. 2. It is not as distorted by extreme observations as the range. 3. The units are somewhat difficult to work with. (They are the original units squared.)
Population Standard Deviation
\[ \sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} \qquad [3\text{-}9] \]
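The population formulas can be sketched in Python. As an assumption made only for this illustration, the six PM class ages are treated as a complete population; the text does not work a variance example with them. The function name `population_variance` is hypothetical.

```python
import math
import statistics


def population_variance(values):
    """Mean of the squared deviations from the population mean (formula [3-8])."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


pm_ages = [17, 17, 18, 20, 25, 29]  # assumed here to be a whole population

pop_var = population_variance(pm_ages)  # units: years squared
pop_std = math.sqrt(pop_var)            # units: years

# Cross-check against the standard library.
assert math.isclose(pop_var, statistics.pvariance(pm_ages))
assert math.isclose(pop_std, statistics.pstdev(pm_ages))

print(round(pop_var, 2))
print(round(pop_std, 2))
```

Taking the square root returns the measure to the original units, which addresses the difficulty noted in characteristic 3.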
Sample Variance
The conversion of the population variance formula to the sample variance formula is not as direct as the change made when we went from the population mean formula to the sample mean formula. Recall that in that instance we replaced \(\mu\) with \(\bar{X}\) and N with n. The conversion from population variance to sample variance requires a change in the denominator. Instead of substituting n, the number in the sample, for N, the number in the population, we replace N with (n − 1). Thus the formula for the sample variance is:
Sample Variance
\[ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \qquad [3\text{-}10] \]
Where:
\(s^2\) is the symbol for the sample variance. It is read as s squared.
X is the value of each observation in the sample.
\(\bar{X}\) is the mean of the sample.
n is the total number of observations in the sample.
Changing the denominator to (n − 1) seems insignificant; however, the use of n tends to underestimate the population variance. The use of (n − 1) in the denominator provides an appropriate correction factor.
Standard Deviation
\[ s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} \qquad [3\text{-}11] \]
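A sketch of the sample formulas in Python, contrasting the (n − 1) denominator with division by n. As an assumption for illustration only, the six PM class ages are now treated as a sample; `sample_variance` is a hypothetical helper name.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


pm_ages = [17, 17, 18, 20, 25, 29]  # assumed here to be a sample

s_squared = sample_variance(pm_ages)
s = math.sqrt(s_squared)

# Dividing by n instead of n - 1 always gives a smaller value.
divided_by_n = s_squared * (len(pm_ages) - 1) / len(pm_ages)

# Cross-check against the standard library.
assert math.isclose(s_squared, statistics.variance(pm_ages))
assert math.isclose(s, statistics.stdev(pm_ages))

print(round(s_squared, 2))
print(round(s, 2))
print(round(divided_by_n, 2))
```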
We can use Chebyshev's theorem to determine the percent of the values that lie within a specified number of standard deviations of the mean.
Chebyshev's theorem: For any set of observations (sample or population), the proportion of the values that lie within k standard deviations of the mean is at least \(1 - 1/k^2\), where k is any constant greater than 1.
The theorem holds for any set of observations regardless of the shape of the distribution.
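The bound can be sketched in Python and compared with an observed proportion. The choice of k = 1.5 and the reuse of the third class ages (treated as a population) are assumptions made for illustration; the function names are hypothetical.

```python
import statistics


def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2


def fraction_within(values, k):
    """Observed proportion of values within k standard deviations of the mean."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return sum(abs(x - mu) <= k * sigma for x in values) / len(values)


third_class = [20, 20, 21, 22, 60]  # strongly skewed by one extreme age

bound = chebyshev_lower_bound(1.5)
observed = fraction_within(third_class, 1.5)

print(round(bound, 4))
print(observed)
print(chebyshev_lower_bound(2), round(chebyshev_lower_bound(3), 4))
```

Even for this far-from-bell-shaped data set, the observed proportion is no smaller than the bound, as the theorem guarantees.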
The Empirical Rule
Chebyshev's theorem can be applied to any set of values; that is, the distribution of values can have any shape. If the distribution is approximately symmetrical and bell shaped, then the Empirical Rule, or Normal Rule as it is often called, is applied.
Empirical Rule: For a symmetrical, bell-shaped frequency distribution, approximately 68 percent of the observations will lie within plus and minus one standard deviation of the mean; about 95 percent of the observations will lie within plus and minus two standard deviations of the mean; and practically all (99.7 percent) will lie within plus and minus three standard deviations of the mean.
The rule states that: The mean, plus and minus one standard deviation, will include about 68% of the observations. The mean, plus and minus two standard deviations, will include about 95% of the observations. The mean, plus and minus three standard deviations, will include about 99.7% of the observations.
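The three percentages can be checked with a short Python sketch, under the assumption that "symmetrical, bell-shaped" is idealized as an exact normal distribution. For a normal distribution the proportion within k standard deviations of the mean equals erf(k / √2); the function name below is illustrative.

```python
import math


def normal_fraction_within(k):
    """Proportion of a normal distribution within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


within_one = normal_fraction_within(1)
within_two = normal_fraction_within(2)
within_three = normal_fraction_within(3)

for k, fraction in ((1, within_one), (2, within_two), (3, within_three)):
    print(f"within {k} standard deviation(s): {fraction:.1%}")
```

The results round to about 68, 95, and 99.7 percent. For k = 2 and k = 3 they are well above the Chebyshev lower bounds of 75 and 88.9 percent, because the Empirical Rule uses the extra information about the distribution's shape.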