• Country of birth
– Example values are France, UK, Germany
– This is an unordered category because France is neither
more nor less than the UK
– We may assign numbers to category values for
convenience (e.g. 1 = UK, 2 = France), but you
cannot meaningfully add or subtract the numbers
– This severely restricts the type of statistics we can
use with categorical variables
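The restriction can be seen in a minimal Python sketch. The sample of countries is made up, and the code 3 = Germany is an assumption (the slide only gives 1 = UK and 2 = France):

```python
from collections import Counter

# Hypothetical sample of birth countries (illustrative data, not from the slides).
countries = ["France", "UK", "Germany", "UK", "France", "UK"]

# Counts and the mode are meaningful for an unordered category.
counts = Counter(countries)
mode_country, mode_count = counts.most_common(1)[0]

# Arbitrary numeric codes, assigned for convenience only (3 = Germany is assumed).
codes = {"UK": 1, "France": 2, "Germany": 3}
coded = [codes[c] for c in countries]

# This arithmetic runs, but the result depends entirely on the arbitrary coding
# and cannot be read as a "typical" country.
meaningless_mean = sum(coded) / len(coded)

print(counts)
print(mode_country, mode_count)
print(meaningless_mean)
```

Recoding the countries (say 1 = Germany, 3 = UK) would change the "mean" but not the counts or the mode, which is why only the latter are used with nominal data.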
Common Descriptive Statistics
• Count (frequencies)
• Percentage
• Mean
• Mode
• Median
• Range
• Standard deviation
• Variance
• Ranking
Descriptive Statistics
• Descriptive Statistics are Used by Researchers to Report
on Populations and Samples
• In Sociology:
Summary descriptions of measurements (variables) taken
about a group of people
[Figure: Population and Sample]
Descriptive Statistics
An Illustration:
Which Group is Smarter?
Class A, IQs of 13 students: 102, 128, 131, 98, 140, 93, 110, 115, 109, 89, 106, 119, 97
Class B, IQs of 13 students: 127, 131, 96, 80, 93, 120, 109, 162, 103, 111, 109, 87, 105
Each individual may be different. If you try to understand a group by remembering the
qualities of each member, you become overwhelmed and fail to understand the group.
Descriptive Statistics
Which group is smarter now?
Mean IQ: Class A = 110.54, Class B = 110.23
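The two summary figures can be reproduced with a short sketch. It assumes the two left-hand columns of the IQ table belong to Class A and the two right-hand columns to Class B, a reading that gives back the slide's averages:

```python
from statistics import mean

# IQ scores read from the slide's table (column assignment is an assumption).
class_a = [102, 128, 131, 98, 140, 93, 110, 115, 109, 89, 106, 119, 97]
class_b = [127, 131, 96, 80, 93, 120, 109, 162, 103, 111, 109, 87, 105]

mean_a = mean(class_a)
mean_b = mean(class_b)

print(f"Class A: {mean_a:.2f}")  # Class A: 110.54
print(f"Class B: {mean_b:.2f}")  # Class B: 110.23
```

A single number per class is far easier to compare than 26 individual scores, which is the point of the illustration.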
• Median
– Middle Score
• Mean
– Arithmetic Average, etc.
Indicators of Central Tendency
Mode = 15k per year
Annual Salary: 10k 11k 11k 15k 15k 15k 19k 20k 21k 21k 22k 22k 24k 25k
Advantages
•Quick and easy to compute
•Unaffected by extreme scores
•Can be used at any level of measurement.
Disadvantages
•Terminal statistic: it cannot be used in further calculations
•A given sub-group could make this measure unrepresentative.
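A minimal sketch of finding the mode with Python's standard library, using the salary list above (in thousands per year). The list of country names is made-up data, included only to show that the mode also works at the nominal level:

```python
from statistics import multimode

# Annual salaries from the slide, in thousands per year.
salaries_k = [10, 11, 11, 15, 15, 15, 19, 20, 21, 21, 22, 22, 24, 25]

# multimode returns every value tied for the highest frequency.
salary_modes = multimode(salaries_k)
print(salary_modes)  # [15]

# The mode needs only counting, so it also works for unordered categories.
# (Hypothetical data for illustration.)
country_modes = multimode(["UK", "France", "UK", "Germany"])
print(country_modes)  # ['UK']
```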
Indicators of Central Tendency
Median
Annual Salary: 10k 11k 11k 15k 15k 15k 19k 20k 21k 21k 22k 22k 24k 25k
Position of the 50th percentile = $\frac{n + 1}{2}$
Median = 19.5k per year
With n = 14 salaries the position is (14 + 1) / 2 = 7.5, so the median lies midway between the 7th and 8th salaries (19k and 20k).
Advantages
•Unaffected by extreme scores
•Can be used at all levels above nominal.
Disadvantages
•Only considers the order of the scores; their values are ignored.
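The position rule (n + 1) / 2 can be sketched as follows. The helper name `median_by_position` is hypothetical; the standard-library `statistics.median` is called alongside it as a cross-check:

```python
import math
from statistics import median

def median_by_position(scores):
    """Median via the 1-based position (n + 1) / 2 in the ordered scores."""
    ordered = sorted(scores)
    n = len(ordered)
    position = (n + 1) / 2
    # For even n the position falls between two scores; average them.
    lower = ordered[math.floor(position) - 1]
    upper = ordered[math.ceil(position) - 1]
    return (lower + upper) / 2

# Annual salaries from the slide, in thousands per year.
salaries_k = [10, 11, 11, 15, 15, 15, 19, 20, 21, 21, 22, 22, 24, 25]

print(median_by_position(salaries_k))  # 19.5
print(median(salaries_k))              # 19.5
```

Only the ordering matters here: changing the top salary from 25k to any larger value leaves the result at 19.5.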
Indicators of Central Tendency
-Arithmetic Mean
-Harmonic Mean
-Geometric Mean
also:
-f-mean
-Truncated mean
-Power mean
-Weighted arithmetic mean
-Chisini mean
-Identric mean, etc, etc…
Indicators of Central Tendency
Mean
Annual Salary: 10k 11k 11k 15k 15k 15k 19k 20k 21k 21k 22k 22k 24k 25k
$\bar{X} = \frac{\sum X}{n}$
Mean = 17.9k per year
$\bar{X} = \frac{10+11+11+15+15+15+19+20+21+21+22+22+24+25}{14} = \frac{251}{14} \approx 17.9$
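The same arithmetic as a short Python sketch, using the salary list in thousands per year:

```python
# Annual salaries from the slide, in thousands per year.
salaries_k = [10, 11, 11, 15, 15, 15, 19, 20, 21, 21, 22, 22, 24, 25]

total = sum(salaries_k)      # sum of all scores
n = len(salaries_k)          # number of scores
arithmetic_mean = total / n  # sum divided by count

print(total, n, f"{arithmetic_mean:.1f}")  # 251 14 17.9
```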
Advantages
•Very sensitive measure
•Takes into account all the available information
•Can be combined with means of other groups to give the overall mean.
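One way to read the last advantage: group means can be pooled if each is weighted by its group size. A sketch, with the salary list split into two sub-groups of unequal size (the split and the helper name `combined_mean` are assumptions for illustration):

```python
from statistics import mean

def combined_mean(group_means, group_sizes):
    """Overall mean from per-group means, weighted by group size."""
    weighted_total = sum(m * n for m, n in zip(group_means, group_sizes))
    return weighted_total / sum(group_sizes)

# Hypothetical split of the slide's 14 salaries (thousands per year).
group_1 = [10, 11, 11, 15]
group_2 = [15, 15, 19, 20, 21, 21, 22, 22, 24, 25]

overall = combined_mean(
    [mean(group_1), mean(group_2)],
    [len(group_1), len(group_2)],
)
pooled = mean(group_1 + group_2)

print(f"{overall:.2f}", f"{pooled:.2f}")
```

Simply averaging the two group means without weights would give a different, incorrect figure, because the groups differ in size.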
Disadvantages
•Very sensitive measure: easily distorted by extreme scores
•Can only be used on interval or ratio data
•Only representative when scores are roughly symmetrical above and below the mean ($\bar{X}$).
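The difference in sensitivity between the three indicators can be sketched by replacing the top salary with an extreme value. The figure of 250k is a made-up outlier, used only for illustration:

```python
from statistics import mean, median, multimode

# Annual salaries from the slide, in thousands per year.
salaries_k = [10, 11, 11, 15, 15, 15, 19, 20, 21, 21, 22, 22, 24, 25]

# Hypothetical: the highest earner is paid 250k instead of 25k.
with_outlier = salaries_k[:-1] + [250]

for label, data in (("original", salaries_k), ("with outlier", with_outlier)):
    print(
        f"{label}: mean={mean(data):.1f} "
        f"median={median(data):.1f} mode={multimode(data)}"
    )
# original: mean=17.9 median=19.5 mode=[15]
# with outlier: mean=34.0 median=19.5 mode=[15]
```

One extreme score nearly doubles the mean, while the median and mode do not move.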
Distribution
[Figures: two frequency distributions; vertical axis: Number of People]
[Figure: distribution curve; vertical axis: Number of People]
…but first described mathematically by Abraham De Moivre in 1733… …published 1924!
Carl Friedrich Gauss: applied the normal distribution (ND) in 1809 to establish the diameter of lunar features
•Naturally Occurring
•Symmetrical
Normal Distribution
[Figure: normal curve with the Mode, Median and Mean marked; vertical axis: Number of People]
Point of Inflection: A point of a curve at which a change in the direction of curvature occurs.
[Figure: normal curve with 68.26% of the area marked; vertical axis: Number of People]
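[Figure: normal curve divided into standard-deviation bands; annotations: SD = 1000, Z = +1]
Area under the curve in each band, on either side of the mean:
• Mean to 1 SD: 34.13%
• 1 SD to 2 SD: 13.59%
• 2 SD to 3 SD: 2.15%
Study of SD size = ‘Kurtosis’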
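The percentages attached to the normal curve can be checked numerically. A sketch using Python's `statistics.NormalDist` for a standard normal curve (mean 0, SD 1); the results agree with the slide's figures to within rounding:

```python
from statistics import NormalDist

# Standard normal curve: z-scores are measured in standard deviations.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def area_between(z_low, z_high):
    """Proportion of cases with a z-score between z_low and z_high."""
    return standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

bands = {
    "mean to +1 SD": area_between(0, 1),    # about 34.13%
    "+1 SD to +2 SD": area_between(1, 2),   # about 13.59%
    "+2 SD to +3 SD": area_between(2, 3),   # about 2.14%
    "-1 SD to +1 SD": area_between(-1, 1),  # about 68.27%
}

for label, proportion in bands.items():
    print(f"{label}: {proportion:.2%}")
```

Because the curve is symmetrical, each band has the same area on the negative side of the mean.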
[Figures: normal curves with 68.26% of the area marked; annotation: Z = +0.5]
[Figures: two curves with the Mean marked]
• Histograms
• Skewness
• Kurtosis
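Variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Coefficient of variation: $CV = \frac{s}{\bar{x}} \times 100\%$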
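A sketch of these three dispersion measures applied to the salary list used earlier (thousands per year). The variance is computed both by hand, with the n - 1 denominator, and with the standard library as a cross-check:

```python
from statistics import mean, stdev, variance

# Annual salaries from the earlier slides, in thousands per year.
salaries_k = [10, 11, 11, 15, 15, 15, 19, 20, 21, 21, 22, 22, 24, 25]

n = len(salaries_k)
x_bar = mean(salaries_k)

# Sample variance: squared deviations from the mean, divided by n - 1.
s2_by_hand = sum((x - x_bar) ** 2 for x in salaries_k) / (n - 1)

s2 = variance(salaries_k)  # library version, also uses n - 1
s = stdev(salaries_k)      # square root of the sample variance
cv = s / x_bar * 100       # coefficient of variation, in percent

print(f"variance = {s2:.2f}")
print(f"standard deviation = {s:.2f}")
print(f"CV = {cv:.1f}%")
```

The CV expresses the spread relative to the mean, so it can be compared across variables measured in different units.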
Measures of Skewness and Kurtosis
• A fundamental task in many statistical analyses is to
characterize the location and variability of a data set
(Measures of central tendency vs. measures of
dispersion)
• Neither measure tells us anything about the shape of the
distribution
• A further characterization of the data includes
skewness and kurtosis
• The histogram is an effective graphical technique for
showing both the skewness and kurtosis of a data set
Histograms
[Figure: histogram; vertical axis: Percent of cells in catchment (0 to 48); horizontal axis: values from 4 to 16; annotation: "A proxy for Soil Moisture"]
Source: Earickson, RJ, and Harlin, JM. 1994. Geographic Measurement and Quantitative Analysis. USA:
Macmillan College Publishing Co., p. 91.
Further Moments of the Distribution
$\text{skewness} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n s^3}$
• If skewness equals zero, the histogram is symmetric
about the mean
• Positive skewness vs negative skewness
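A sketch of the skewness formula, assuming s is the sample standard deviation (n - 1 denominator). The function name `skewness` and the two small example lists are made up for illustration; the salary list is the one from the earlier slides:

```python
from statistics import mean, stdev

def skewness(values):
    """Sum of cubed deviations from the mean, divided by n * s**3."""
    n = len(values)
    x_bar = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    return sum((x - x_bar) ** 3 for x in values) / (n * s ** 3)

symmetric = [1, 2, 3, 4, 5]        # hypothetical, symmetric about 3
right_tailed = [1, 2, 2, 3, 10]    # hypothetical, one high value
salaries_k = [10, 11, 11, 15, 15, 15, 19, 20, 21, 21, 22, 22, 24, 25]

print(skewness(symmetric))     # zero for a symmetric data set
print(skewness(right_tailed))  # positive
print(skewness(salaries_k))    # negative
```

The salary data have a median (19.5) above the mean (17.9), and the computed skewness is negative, as expected for that ordering.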
Further Moments – Skewness
Source: http://library.thinkquest.org/10030/3smodsas.htm
Further Moments – Skewness
• Positive skewness
– There are more observations below the mean than
above it
– The mean is greater than the median
• Negative skewness
– There are a small number of low observations and
a large number of high ones
– The median is greater than the mean
Further Moments – Kurtosis
$\text{kurtosis} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n s^4} - 3$
• The kurtosis of a normal distribution is 0
• Kurtosis characterizes the relative peakedness or
flatness of a distribution compared to the normal
distribution
Further Moments – Kurtosis
• Platykurtic: when the kurtosis < 0, the frequencies
throughout the curve are closer to being equal (i.e., the
curve is flatter and wider)
• Thus, negative kurtosis indicates a relatively flat
distribution
• Leptokurtic: when the kurtosis > 0, there are high
frequencies in only a small part of the curve (i.e., the
curve is more peaked)
• Thus, positive kurtosis indicates a relatively peaked
distribution
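A sketch of the kurtosis formula with the "- 3" correction, again assuming s is the sample standard deviation. The function name `kurtosis` and both data sets are made up for illustration: one spread evenly, one concentrated at a single value with two outlying scores:

```python
from statistics import mean, stdev

def kurtosis(values):
    """Sum of fourth-power deviations over n * s**4, minus 3."""
    n = len(values)
    x_bar = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    return sum((x - x_bar) ** 4 for x in values) / (n * s ** 4) - 3

flat = list(range(1, 11))    # hypothetical: every value equally frequent
peaked = [5] * 8 + [0, 10]   # hypothetical: most scores at the centre

print(kurtosis(flat))    # negative: platykurtic
print(kurtosis(peaked))  # positive: leptokurtic
```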
Further Moments – Kurtosis
platykurtic leptokurtic
Source: http://www.riskglossary.com/link/kurtosis.htm
• Histograms
• Box plots
Functions of a Histogram
5, 5, 6, 9, 10, 11, 11, 12, 12, 14, 16, 17, 19, 21, 21, 21, 21, 21, 22,
23, 24, 24, 26, 26, 31, 31, 36, 42, 44, 47
25th percentile: 11.75
75th percentile: 26
Interquartile range: 26 - 11.75 = 14.25
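The quartiles can be reproduced with Python's `statistics.quantiles`. The slide does not say which interpolation rule was used; the "exclusive" method (position p × (n + 1)) is an assumption that happens to give the same figures:

```python
from statistics import quantiles

# The 30 observations listed on the slide.
data = [5, 5, 6, 9, 10, 11, 11, 12, 12, 14, 16, 17, 19, 21, 21, 21, 21, 21,
        22, 23, 24, 24, 26, 26, 31, 31, 36, 42, 44, 47]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

print(q1, q3, iqr)  # 11.75 26.0 14.25
```

Other interpolation rules (for example `method="inclusive"`) can give slightly different quartiles for the same data, so the rule in use is worth stating when reporting an interquartile range.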
Other Descriptive Summary Measures