
Chapter 1: Descriptive Statistics

Jiheng Zhang

Frequency Tables and Graphs

A data set having a relatively small number of distinct values can be conveniently presented in a frequency table. For instance, Table 2.1 is a frequency table for a data set consisting of the starting yearly salaries (to the nearest thousand dollars) of 42 recently graduated students with B.S. degrees in electrical engineering. Table 2.1 tells us, among other things, that the lowest starting salary of $47,000 was received by four of the graduates, whereas the highest salary of $60,000 was received by a single student. The most common starting salary was $52,000, and was received by 10 of the students.


TABLE 2.1 Starting Yearly Salaries

Starting Salary (thousands of dollars)    Frequency
47                                        4
48                                        1
49                                        3
50                                        5
51                                        8
52                                        10
53                                        0
54                                        5
56                                        2
57                                        3
60                                        1
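The readings taken from Table 2.1 above can be checked with a short sketch. It is only an illustration: the dictionary `frequency` is typed in from the table, and the variable names are chosen here and are not part of the original notes.

```python
from collections import Counter

# Table 2.1: starting salary (in thousands of dollars) -> number of graduates.
frequency = {47: 4, 48: 1, 49: 3, 50: 5, 51: 8, 52: 10,
             53: 0, 54: 5, 56: 2, 57: 3, 60: 1}

# Expand the table into the raw data set, then tabulate it again with Counter
# to show how a frequency table is built from raw observations.
salaries = [salary for salary, count in frequency.items() for _ in range(count)]
table = Counter(salaries)

n = len(salaries)                       # size of the data set
lowest, highest = min(salaries), max(salaries)
most_common_salary, most_common_count = table.most_common(1)[0]

print(n, lowest, highest, most_common_salary, most_common_count)
```

A value that never occurs, such as 53, is simply absent from the `Counter`, which reports a count of 0 for it.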

2.2 Describing Data Sets


Spring, 2010


Describing Data Sets

Summarizing Data Sets

Chebyshev's Inequality

Normal Data Sets

Paired Data Sets

Data from a frequency table can be graphically represented by a line graph that plots the distinct data values on the horizontal axis and indicates their frequencies by the heights of vertical lines. A line graph of the data presented in Table 2.1 is shown in Figure 2.1. When the lines in a line graph are given added thickness, the graph is called a bar graph. Figure 2.2 presents a bar graph. Another type of graph used to represent a frequency table is the frequency polygon, which plots the frequencies of the different data values on the vertical axis, and then connects the plotted points with straight lines. Figure 2.3 presents a frequency polygon for the data of Table 2.1.

FIGURE 2.1 Starting salary data (line graph; horizontal axis: starting salary, vertical axis: frequency).

FIGURE 2.2 Bar graph for starting salary data.

FIGURE 2.3 Frequency polygon for starting salary data.

2.2.2 Relative Frequency Tables and Graphs

Consider a data set consisting of n values. If f is the frequency of a particular value, then the ratio f/n is called its relative frequency. That is, the relative frequency of a data value is the proportion of the data that have that value. The relative frequencies can be represented graphically by a relative frequency line or bar graph or by a relative frequency polygon. Indeed, these relative frequency graphs will look like the corresponding graphs of the absolute frequencies except that the labels on the vertical axis are now the old labels (that gave the frequencies) divided by the total number of data points.

EXAMPLE 2.2a Table 2.2 is a relative frequency table for the data of Table 2.1. The relative frequencies are obtained by dividing the corresponding frequencies of Table 2.1 by 42, the size of the data set.

TABLE 2.2

Starting Salary    Relative Frequency
47                 4/42 = .0952
48                 1/42 = .0238
49                 3/42
50                 5/42
51                 8/42
52                 10/42
53                 0
54                 5/42
56                 2/42
57                 3/42
60                 1/42
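As a rough check of Example 2.2a, the division by n can be carried out in a few lines. This is a sketch under the assumption that the frequencies are those of Table 2.1; the names `frequency` and `relative` are illustrative only.

```python
from fractions import Fraction

# Frequencies of Table 2.1: salary (in thousands of dollars) -> count.
frequency = {47: 4, 48: 1, 49: 3, 50: 5, 51: 8, 52: 10,
             53: 0, 54: 5, 56: 2, 57: 3, 60: 1}

n = sum(frequency.values())  # size of the data set

# Relative frequency f/n, kept exact as a fraction.
relative = {value: Fraction(count, n) for value, count in frequency.items()}

for value, count in frequency.items():
    print(f"{value}: {count}/{n} = {float(relative[value]):.4f}")
```

Using exact fractions makes it easy to confirm that the relative frequencies add up to 1.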


A pie chart is often used to indicate relative frequencies when the data are not numerical in nature. A circle is constructed and then sliced into different sectors, one for each distinct type of data value. The relative frequency of a data value is indicated by the area of its sector, this area being equal to the total area of the circle multiplied by the relative frequency of the data value.

EXAMPLE 2.2b The following data relate to the different types of cancers affecting the 200 most recent patients to enroll at a clinic specializing in cancer. These data are represented in the pie chart presented in Figure 2.4.

Type of Cancer    Number of New Cases    Relative Frequency
Lung              42                     .21
Breast            50                     .25
Colon             32                     .16
Prostate          55                     .275
Melanoma          9                      .045
Bladder           12                     .06

FIGURE 2.4 (pie chart; sectors: Prostate 27.5%, Breast 25%, Lung 21%, Colon 16%, Bladder 6%, Melanoma 4.5%)
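The sector sizes of such a pie chart follow directly from the relative frequencies. The sketch below assumes a circle of radius 1 (an arbitrary choice made here for illustration) and computes, for each cancer type in Example 2.2b, the relative frequency, the central angle, and the sector area.

```python
import math

# Number of new cases by type of cancer (Example 2.2b).
cases = {"Lung": 42, "Breast": 50, "Colon": 32,
         "Prostate": 55, "Melanoma": 9, "Bladder": 12}

total = sum(cases.values())
radius = 1.0                          # assumed radius, for illustration only
circle_area = math.pi * radius ** 2

sectors = {}
for cancer, count in cases.items():
    rel = count / total               # relative frequency
    sectors[cancer] = {
        "relative_frequency": rel,
        "angle_degrees": 360 * rel,   # central angle of the sector
        "area": circle_area * rel,    # sector area = circle area * relative frequency
    }

for cancer, s in sectors.items():
    print(f"{cancer}: {s['relative_frequency']:.3f}, {s['angle_degrees']:.1f} degrees")
```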

2.2.3 Grouped Data, Histograms, Ogives, and Stem and Leaf Plots

As seen in Subsection 2.2.2, using a line or a bar graph to plot the frequencies of data values is often an effective way of portraying a data set. However, for some data sets the number of distinct values is too large to utilize this approach. Instead, in such cases, it is useful to divide the values into groupings, or class intervals, and then plot the number of data values falling in each class interval. The number of class intervals chosen should be a trade-off between (1) choosing too few classes at a cost of losing too much information about the actual data values in a class and (2) choosing too many classes, which will result in the frequencies of each class being too small for a pattern to be discernible. Although 5 to 10 class intervals are typical, the appropriate number is a subjective choice, and of course, you can try different numbers of class intervals to see which of the resulting charts appears to be most revealing about the data. It is common, although not essential, to choose class intervals of equal length.

The endpoints of a class interval are called the class boundaries. We will adopt the left-end inclusion convention, which stipulates that a class interval contains its left-end but not its right-end boundary point. Thus, for instance, the class interval 20-30 contains all values that are both greater than or equal to 20 and less than 30.

Table 2.3 presents the lifetimes of 200 incandescent lamps. A class frequency table for the data of Table 2.3 is presented in Table 2.4. The class intervals are of length 100, with the first one starting at 500.

TABLE 2.3 Life in Hours of 200 Incandescent Lamps

Item Lifetimes
1,067    919  1,196    785    855  1,092  1,162  1,170  1,126    936
1,156  1,035  1,045    950    918    948    905    972    920    929
1,157  1,195  1,195  1,340  1,122    956    938    970  1,009  1,157
1,151  1,009  1,237    958  1,102  1,022    978    832    765    902
  923  1,333    811  1,217  1,085    896    958  1,311  1,037    702
  521    933    928  1,153  1,069  1,062  1,063  1,063  1,002    858
1,071  1,021  1,157    946    909  1,077    830    930    807    954
  999    932  1,035    944    940  1,250    833  1,320  1,049  1,078
1,122  1,115  1,011  1,102    901  1,324    818  1,203    890  1,303
  996    621    780    900  1,106    704    854  1,178  1,138    951
1,187  1,067  1,118  1,037    958    760  1,101    949    992    966
  980    935    878    934    910  1,058    730    980    824    653
1,037  1,026  1,039  1,023  1,134    998    610    844    814  1,151
1,147  1,083    984    932    996    916    990  1,035    863    883
  867    990  1,040    856    938  1,133  1,001  1,289    924  1,078
  765    895    699    801  1,180    775    709  1,103  1,000    788
1,083    880  1,029    658    912  1,122  1,292  1,116    954    880
1,173    824    529  1,106  1,184  1,105  1,081  1,171    705  1,425
  860  1,110  1,149    972  1,002  1,143  1,112  1,258    935    931
1,192  1,069    970    922  1,170    932  1,150  1,067    904  1,091

TABLE 2.4 A Class Frequency Table

Class Interval    Frequency (Number of Data Values in the Interval)
500-600           2
600-700           5
700-800           12
800-900           25
900-1000          58
1000-1100         41
1100-1200         43
1200-1300         7
1300-1400         6
1400-1500         1

FIGURE 2.5 A frequency histogram (horizontal axis: life in units of 100 hours; vertical axis: number of occurrences).
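The left-end inclusion convention can be made concrete with a small sketch. The function name `class_frequencies` and the seven sample lifetimes (all of which appear among the lamp data) are chosen here purely for illustration; this is one possible way to bin values, not the procedure used to produce the original table.

```python
def class_frequencies(values, start, width, num_classes):
    """Count values per class interval [start + i*width, start + (i+1)*width).

    Left-end inclusion: a value equal to a boundary belongs to the class
    that starts at that boundary, not the one that ends there.
    """
    counts = [0] * num_classes
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < num_classes:
            counts[index] += 1
    return counts


# A few of the lamp lifetimes, including the boundary values 900 and 1,000.
sample = [521, 529, 621, 900, 999, 1000, 1425]
counts = class_frequencies(sample, start=500, width=100, num_classes=10)

for i, c in enumerate(counts):
    lo = 500 + 100 * i
    print(f"{lo}-{lo + 100}: {c}")
```

Note that 900 is counted in the class 900-1000 and 1,000 in the class 1000-1100, as the convention requires.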


Histogram: a bar graph of class data, with bars representing the class frequencies
    frequency histogram
    relative frequency histogram
Ogive: a cumulative frequency (or cumulative relative frequency) graph. A point on the horizontal axis of such a graph represents a possible data value; its corresponding vertical plot gives the number (or proportion) of the data whose values are less than or equal to it.

FIGURE 2.6 A cumulative frequency plot (horizontal axis: lifetimes, 500 to 1,500; vertical axis: cumulative relative frequency, 0 to 1.0).
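The heights of a cumulative relative frequency plot at the class boundaries can be obtained from a class frequency table by running sums. The sketch below uses the class counts of Table 2.4; because of left-end inclusion, each cumulative count is the number of values strictly below the right-end boundary of the class. The variable names are illustrative.

```python
from itertools import accumulate

# Table 2.4: classes 500-600, 600-700, ..., 1400-1500.
right_boundaries = [600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500]
counts = [2, 5, 12, 25, 58, 41, 43, 7, 6, 1]

n = sum(counts)
cumulative = list(accumulate(counts))               # running totals
cumulative_relative = [c / n for c in cumulative]   # proportions

for boundary, c, r in zip(right_boundaries, cumulative, cumulative_relative):
    print(f"below {boundary}: {c} values, proportion {r:.3f}")
```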

A bar graph plot of class data, with the bars placed adjacent to each other, is called a histogram. The vertical axis of a histogram can represent either the class frequency or the relative class frequency; in the former case the graph is called a frequency histogram and in the latter a relative frequency histogram. Figure 2.5 presents a frequency histogram of the data in Table 2.4.

We are sometimes interested in plotting a cumulative frequency (or cumulative relative frequency) graph. A point on the horizontal axis of such a graph represents a possible data value; its corresponding vertical plot gives the number (or proportion) of the data whose values are less than or equal to it. A cumulative relative frequency plot of the data of Table 2.3 is given in Figure 2.6. We can conclude from this figure that 100 percent of the data values are less than 1,500, approximately 22 percent are less than or equal to 900, approximately 72 percent are less than or equal to 1,100, and so on. A cumulative frequency plot is called an ogive.

Stem and Leaf Plot

An efficient way of organizing a small- to moderate-sized data set is to utilize a stem and leaf plot. Such a plot is obtained by first dividing each data value into two parts, its stem and its leaf. For instance, if the data are all two-digit numbers, then we could let the stem part of a data value be its tens digit and let the leaf be its ones digit. Thus, for instance, the value 62 is expressed as

Stem    Leaf
6       2

and the two data values 62 and 67 can be represented as

Stem    Leaf
6       2, 7

The following data give noise levels measured at 36 different times directly outside of Grand Central Station in Manhattan.

82, 89, 94, 110, 74, 122, 112, 95, 100, 78, 65, 60,
90, 83, 87, 75, 114, 85, 69, 94, 124, 115, 107, 88,
97, 74, 72, 68, 83, 91, 90, 102, 77, 125, 108, 65

A stem and leaf plot is:

 6    0, 5, 5, 8, 9
 7    2, 4, 4, 5, 7, 8
 8    2, 3, 3, 5, 7, 8, 9
 9    0, 0, 1, 4, 4, 5, 7
10    0, 2, 7, 8
11    0, 2, 4, 5
12    2, 4, 5
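A stem and leaf plot of this kind can be produced mechanically. The following is a minimal sketch, assuming integer data with the last digit as the leaf and the remaining digits as the stem; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group integer values by stem (all digits but the last) with sorted leaves."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # e.g. 62 -> stem 6, leaf 2
        plot[stem].append(leaf)
    return dict(plot)


# Noise levels measured outside Grand Central Station (36 values).
noise = [82, 89, 94, 110, 74, 122, 112, 95, 100, 78, 65, 60,
         90, 83, 87, 75, 114, 85, 69, 94, 124, 115, 107, 88,
         97, 74, 72, 68, 83, 91, 90, 102, 77, 125, 108, 65]

plot = stem_and_leaf(noise)
for stem in sorted(plot):
    print(f"{stem:>2}    {', '.join(str(leaf) for leaf in plot[stem])}")
```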

Summarizing Data Sets

Sample Mean, Sample Median, and Sample Mode

To obtain a feel for a large amount of data, it is useful to be able to summarize it by some suitably chosen measures.

Now, suppose we have a data set consisting of the n numerical values $x_1, x_2, \ldots, x_n$.

DEFINITION (SAMPLE MEAN) The sample mean, designated by $\bar{x}$, is defined by

\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]

If for constants a and b

\[ y_i = a x_i + b, \quad i = 1, \ldots, n \]

then the sample mean of the data set $y_1, \ldots, y_n$ is

\[ \bar{y} = \frac{\sum_{i=1}^{n} (a x_i + b)}{n} = \frac{\sum_{i=1}^{n} a x_i}{n} + \frac{\sum_{i=1}^{n} b}{n} = a\bar{x} + b \]

E.g. The winning scores in the U.S. Masters golf tournament in the years from 1999 to 2008 were as follows:

280, 278, 272, 276, 281, 279, 276, 281, 289, 280

Subtract 280 from each one, $y_i = x_i - 280$:

0, -2, -8, -4, 1, -1, -4, 1, 9, 0

So $\bar{y} = -0.8$, thus $\bar{x} = \bar{y} + 280 = 279.2$.

How to compute the sample mean from the frequency table?

Age          15    16    17    18    19    20
Frequency     2     5    11     9    14    13

\[ \bar{x} = (15 \cdot 2 + 16 \cdot 5 + 17 \cdot 11 + 18 \cdot 9 + 19 \cdot 14 + 20 \cdot 13)/54 \approx 18.24 \]
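Both computations above are easy to verify numerically. The sketch below recomputes the golf-score mean directly and via the shift by 280, and then the mean of the age data from its frequency table; `sample_mean` and the other names are illustrative helpers, not part of the notes.

```python
def sample_mean(values):
    """Sample mean: sum of the values divided by their number."""
    return sum(values) / len(values)


# U.S. Masters winning scores, 1999 to 2008.
scores = [280, 278, 272, 276, 281, 279, 276, 281, 289, 280]

# Shift by 280 (a = 1, b = -280), average the small numbers, shift back.
shifted = [x - 280 for x in scores]
mean_via_shift = sample_mean(shifted) + 280

# Frequency table: age -> frequency.
ages = {15: 2, 16: 5, 17: 11, 18: 9, 19: 14, 20: 13}
n = sum(ages.values())
mean_age = sum(value * freq for value, freq in ages.items()) / n

print(sample_mean(scores), mean_via_shift, round(mean_age, 2))
```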

Suppose we have k distinct values $v_1, \ldots, v_k$ with corresponding frequencies $f_1, \ldots, f_k$. How many observations are in this data set?

\[ n = \sum_{i=1}^{k} f_i \]

According to the definition, the sample mean is

\[ \bar{x} = \frac{\sum_{i=1}^{k} v_i f_i}{n} \]

DEFINITION (SAMPLE MEDIAN) Order the values of a data set of size n from smallest to largest.
    If n is odd, the sample median is the value in position (n + 1)/2.
    If n is even, the sample median is the average of the values in positions n/2 and n/2 + 1.
The number of values that are greater than (>) the sample median is equal to the number of values that are less than (<) the sample median.
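The definition of the sample median translates into a few lines of code. This is a sketch following the definition above; the function name `sample_median` is chosen here, and the golf scores from the earlier example serve as test data.

```python
def sample_median(values):
    """Sample median as defined above (positions are 1-indexed in the text)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n odd: value in position (n + 1)/2, i.e. 0-based index n // 2.
        return ordered[mid]
    # n even: average of the values in positions n/2 and n/2 + 1.
    return (ordered[mid - 1] + ordered[mid]) / 2


scores = [280, 278, 272, 276, 281, 279, 276, 281, 289, 280]   # n = 10 (even)
median_even = sample_median(scores)
median_odd = sample_median(scores[:9])                        # first 9 scores (odd)

print(median_even, median_odd)
```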

DEFINITION (SAMPLE MODE) The sample mode is the value that occurs with the greatest frequency. If no single value occurs most frequently, then all the values that occur at the highest frequency are called modal values.

Q: What is the relationship among sample mean, sample median and sample mode?

Sample Variance and Sample Standard Deviation

Think about two data sets which have the same mean but different spread (variability).

DEFINITION (SAMPLE VARIANCE) The sample variance, denoted by $s^2$, of the data set $x_1, \ldots, x_n$ with mean $\bar{x}$ is defined by

\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]

Note: for technical reasons, the sum of squared distances is divided by n - 1 rather than n.

AN ALGEBRAIC IDENTITY

\[ \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \]

Indeed,

\[ \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} (x_i^2 - 2 x_i \bar{x} + \bar{x}^2) = \sum_{i=1}^{n} x_i^2 - 2\bar{x} \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} \bar{x}^2 = \sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \]

Example: find the sample variance of the data sets A and B given below.

A: 3, 4, 6, 7, 10
B: -20, 5, 15, 24

Both data sets have sample mean 6. The sample variance for A is

\[ s^2 = [(-3)^2 + (-2)^2 + 0^2 + 1^2 + 4^2]/4 = 7.5 \]

The sample variance for B is

\[ s^2 = [(-26)^2 + (-1)^2 + 9^2 + 18^2]/3 \approx 360.67 \]
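The two ways of computing the sample variance, from the definition and from the algebraic identity, can be compared on the data sets A and B. The sketch below uses function names chosen here for illustration and cross-checks against `statistics.variance` from the Python standard library, which also divides by n - 1.

```python
import statistics


def sample_variance(values):
    """Sample variance from the definition: squared deviations over n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_variance_identity(values):
    """Same quantity via the identity sum(x_i^2) - n * mean^2."""
    n = len(values)
    mean = sum(values) / n
    return (sum(x * x for x in values) - n * mean ** 2) / (n - 1)


a = [3, 4, 6, 7, 10]
b = [-20, 5, 15, 24]

for name, data in (("A", a), ("B", b)):
    print(name, sample_variance(data), sample_variance_identity(data),
          statistics.variance(data))
```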

If for constants a and b

\[ y_i = a x_i + b, \quad i = 1, \ldots, n \]

how do we compute the sample variance of the data set $y_1, \ldots, y_n$? As we know, $\bar{y} = a\bar{x} + b$, and so

\[ \sum_{i=1}^{n} (y_i - \bar{y})^2 = a^2 \sum_{i=1}^{n} (x_i - \bar{x})^2 \]

If $s_y^2$ and $s_x^2$ are the respective sample variances, then

\[ s_y^2 = a^2 s_x^2 \]

Example: The following data give the worldwide number of fatal airline accidents of commercially scheduled air transports in the years from 1985 to 1993.

Year         1985  1986  1987  1988  1989  1990  1991  1992  1993
Accidents      22    22    26    28    27    25    30    29    24

Subtracting 22 from each value gives

0, 0, 4, 6, 5, 3, 8, 7, 2

Calling the transformed data $y_1, \ldots, y_9$, we have

\[ \sum_{i=1}^{n} y_i = 35, \qquad \sum_{i=1}^{n} y_i^2 = 16 + 36 + 25 + 9 + 64 + 49 + 4 = 203 \]

Hence the sample variance is

\[ s^2 = \frac{203 - 9(35/9)^2}{8} \approx 8.361 \]
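The airline-accident computation can be reproduced as follows. This is a sketch: the shift by 22 and the algebraic identity are taken from the example, while the final scaling check uses hypothetical constants a = 3 and b = 5 picked only to illustrate that shifting leaves the variance unchanged and scaling multiplies it by the square of the scale factor.

```python
import math
import statistics

# Fatal airline accidents per year, 1985 to 1993.
accidents = {1985: 22, 1986: 22, 1987: 26, 1988: 28, 1989: 27,
             1990: 25, 1991: 30, 1992: 29, 1993: 24}

x = list(accidents.values())
y = [v - 22 for v in x]              # shifted data, same variance as x

n = len(y)
sum_y = sum(y)
sum_y2 = sum(v * v for v in y)

# Algebraic identity: sum((y_i - mean)^2) = sum(y_i^2) - n * mean^2.
s2 = (sum_y2 - n * (sum_y / n) ** 2) / (n - 1)

# Hypothetical linear transformation z_i = a * x_i + b.
a, b = 3, 5
z = [a * v + b for v in x]

print(sum_y, sum_y2, round(s2, 3))
print(math.isclose(statistics.variance(z), a ** 2 * statistics.variance(x)))
```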

DEFINITION (SAMPLE STANDARD DEVIATION) The quantity s, which is the square root of the sample variance, is called the sample standard deviation.

Sample Percentiles and Box Plots

DEFINITION (SAMPLE PERCENTILE) The sample 100p percentile is that data value such that
    at least 100p percent of the data are less than or equal to it
    at least 100(1 - p) percent are greater than or equal to it
If two data values satisfy this condition, then the sample 100p percentile is the arithmetic average of these two values.

TABLE 2.6 Population of 25 Largest U.S. Cities, 1994

Rank    City                  Population
1       New York, NY          7,333,253
2       Los Angeles, CA       3,448,613
3       Chicago, IL           2,731,743
4       Houston, TX           1,702,086
5       Philadelphia, PA      1,524,249
6       San Diego, CA         1,151,977
7       Phoenix, AZ           1,048,949
8       Dallas, TX            1,022,830
9       San Antonio, TX       998,905
10      Detroit, MI           992,038
11      San Jose, CA          816,884
12      Indianapolis, IN      752,279
13      San Francisco, CA     734,676
14      Baltimore, MD         702,979
15      Jacksonville, FL      665,070
16      Columbus, OH          635,913
17      Milwaukee, WI         617,044
18      Memphis, TN           614,289
19      El Paso, TX           579,307
20      Washington, D.C.      567,094
21      Boston, MA            547,725
22      Seattle, WA           520,947
23      Austin, TX            514,013
24      Nashville, TN         504,505
25      Denver, CO            493,559

What is the 10 percentile? Since 25 × 10/100 = 2.5 is not an integer, it is the 3rd smallest value, 514,013.

What is the 80 percentile? Since 25 × 80/100 = 20 is an integer, it is the average of the 20th and 21st smallest values, (1,151,977 + 1,524,249)/2 = 1,338,113.

DEFINITION
    the first quartile: the sample 25 percentile
    the second quartile: the sample 50 percentile
    the third quartile: the sample 75 percentile

The quartiles break up a data set into four parts, with roughly 25 percent of the data being less than the first quartile, 25 percent being between the first and second quartile, 25 percent being between the second and third quartile, and 25 percent being greater than the third quartile.

What is another name for the second quartile?

The noise levels measured at 36 different times directly outside of Grand Central Station in Manhattan have the following stem and leaf plot:

 6    0, 5, 5, 8, 9
 7    2, 4, 4, 5, 7, 8
 8    2, 3, 3, 5, 7, 8, 9
 9    0, 0, 1, 4, 4, 5, 7
10    0, 2, 7, 8
11    0, 2, 4, 5
12    2, 4, 5

For these data,
    the first quartile is 76, the average of the 9th and 10th smallest data values (75 and 77)
    the second quartile is 89.5, the average of the 18th and 19th smallest values (89 and 90)
    the third quartile is 104.5, the average of the 27th and 28th smallest values (102 and 107)
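The percentile definition can be turned into a short routine. The sketch below takes the percentile as a whole number of percent so that the test "is np an integer" is done in exact integer arithmetic; the function name `sample_percentile` and this interface are choices made here, not something prescribed by the notes. It is applied to the noise data and to the city populations of Table 2.6.

```python
def sample_percentile(values, percent):
    """Sample percentile for an integer percent strictly between 0 and 100.

    With n values and p = percent/100: if n*p is an integer, average the
    values in positions n*p and n*p + 1 of the ordered data; otherwise take
    the value in the position given by the smallest integer exceeding n*p.
    """
    if not 0 < percent < 100:
        raise ValueError("percent must be strictly between 0 and 100")
    ordered = sorted(values)
    q, r = divmod(len(ordered) * percent, 100)
    if r == 0:
        return (ordered[q - 1] + ordered[q]) / 2   # positions q and q + 1
    return ordered[q]                              # position q + 1


noise = [82, 89, 94, 110, 74, 122, 112, 95, 100, 78, 65, 60,
         90, 83, 87, 75, 114, 85, 69, 94, 124, 115, 107, 88,
         97, 74, 72, 68, 83, 91, 90, 102, 77, 125, 108, 65]

populations = [7333253, 3448613, 2731743, 1702086, 1524249, 1151977,
               1048949, 1022830, 998905, 992038, 816884, 752279, 734676,
               702979, 665070, 635913, 617044, 614289, 579307, 567094,
               547725, 520947, 514013, 504505, 493559]

quartiles = [sample_percentile(noise, p) for p in (25, 50, 75)]
print(quartiles)
print(sample_percentile(populations, 10), sample_percentile(populations, 80))
```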

FIGURE: A Box Plot (horizontal axis from 60 to 120)

Chebyshev's Inequality
