Quantitative Technique Analysis Notes

Descriptive Statistics
The farthest most people ever get

What is data?
• Data is made up of variables
• A variable is something that can take different values between
individuals or in the same individual at different time points
– Gender can take the value “male” or “female”

– Age can take a minimum numeric value of zero, and a
maximum numeric value of many years
– Time to react to your name being called out is an example of
a variable that would vary if you measured it in the same
individual at several time points
• It is usual in Psychology to measure the value of a variable in
many separate individuals
What does statistics do to data?
– Different types of variables
• categorical, ordinal, continuous (interval and ratio)
– If you have measured the same variable in many
individuals you need a way of summarising the
data
– What’s the “average” value?
– How much variation is there in the data?
• Compare – ask if one group differs from
another on the value of a variable
• Relate – ask how one variable changes as a
function of another one
Variables are classified according to their level
of measurement
• Country of birth
– Example values are France, UK, Germany
– this is an unordered category because France is not
more or less than the UK
– We may assign numbers to category values for
convenience (e.g. 1 = UK, 2 = France), but you
cannot meaningfully add or subtract the numbers
– This severely restricts the type of statistics we can
use with categorical variables
Common Descriptive Statistics
• Count (frequencies)
• Percentage
• Mean
• Mode
• Median
• Range
• Standard deviation
• Variance
• Ranking
• Descriptive Statistics are Used by Researchers to Report
on Populations and Samples
• In Sociology:
Summary descriptions of measurements (variables) taken
about a group of people
• By Summarizing Information, Descriptive Statistics Speed Up
and Simplify Comprehension of a Group’s Characteristics
Sample vs. Population
[Figure: Population and Sample]
An Illustration:
Which Group is Smarter?
Class A (IQs of 13 Students)    Class B (IQs of 13 Students)
102   115                       127   162
128   109                       131   103
131    89                        96   111
 98   106                        80   109
140   119                        93    87
 93    97                       120   105
110                             109
Each individual may be different. If you try to understand a group by remembering the
qualities of each member, you become overwhelmed and fail to understand the group.
Which group is smarter now?
Class A, Average IQ: 110.54     Class B, Average IQ: 110.23
They’re roughly the same!
With a summary descriptive statistic, it is much easier to
answer our question
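The comparison above can be reproduced with a short sketch. It assumes the flattened IQ listing reads as two columns per class (the first two numbers in each row belong to Class A, the last two to Class B); the variable names are illustrative only.

```python
from statistics import mean

# IQ scores as listed for the two classes (13 students each).
# Assumption: in each row, the first two numbers are Class A, the last two Class B.
class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]
class_b = [127, 162, 131, 103, 96, 111, 80, 109, 93, 87, 120, 105, 109]

mean_a = mean(class_a)
mean_b = mean(class_b)

# One number per class replaces 13 individual scores.
print(f"Class A: {mean_a:.2f}   Class B: {mean_b:.2f}")
```

Under that reading, the two averages round to the values quoted on the slide.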
Indicators of Central Tendency
• Mode
– Most Frequently Occurring Score
• Median
– Middle Score
• Mean
– Arithmetic Average, etc.
Annual Salary: 10k 11k 11k 15k 15k 15k 19k 20k 21k 21k 22k 22k 24k 25k
Mode = 15k per year
Advantages
•Quick and easy to compute
•Unaffected by extreme scores
•Can be used at any level of measurement.
Disadvantages
•Terminal Statistic
• A given sub-group could make
this measure unrepresentative.
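As a minimal sketch of how the mode is found, the salary figures from the slide can be counted with a frequency table. The name `salaries_k` is illustrative, and values are in thousands per year.

```python
from collections import Counter

# Annual salaries from the slide, in thousands per year.
salaries_k = [10, 11, 11, 15, 15, 15, 19, 20, 21, 21, 22, 22, 24, 25]

counts = Counter(salaries_k)          # value -> number of occurrences
highest_count = max(counts.values())  # frequency of the most common score

# A data set can have more than one mode, so collect every value that ties.
modes = sorted(value for value, c in counts.items() if c == highest_count)
print(modes, highest_count)
```

Because only counts are used, the same approach works for categorical values such as country of birth.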
Median
Position of the 50th percentile = (n + 1) / 2
Median = 19.5k per year
For an even number of scores take the mean of the middle two.
With n = 14 salaries the position is 7.5, between the 7th score (19)
and the 8th score (20):
(19 + 20) / 2 = 19.5
Advantages
•Unaffected by extreme scores
•Can be used at all levels above nominal.
Disadvantages
•Only considers order; the values themselves are ignored.
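The (n + 1) / 2 position rule can be sketched as follows. The function name `median_by_position` is hypothetical, and the data are the salary figures from the slide.

```python
def median_by_position(scores):
    """Median via the (n + 1) / 2 position rule (positions are 1-based)."""
    ordered = sorted(scores)
    n = len(ordered)
    position = (n + 1) / 2
    lower = ordered[int(position) - 1]  # score at the whole-number position
    if position.is_integer():
        return lower                    # odd n: a single middle score
    upper = ordered[int(position)]      # even n: the next score up
    return (lower + upper) / 2          # mean of the middle two

salaries_k = [10, 11, 11, 15, 15, 15, 19, 20, 21, 21, 22, 22, 24, 25]
median_salary = median_by_position(salaries_k)
print(median_salary)
```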
-Arithmetic Mean
-Harmonic Mean
-Geometric Mean
Also:
-f-mean
-Truncated mean
-Power mean
-Weighted arithmetic mean
-Chisini mean
-Identric mean, etc, etc…
Mean
$\bar{X} = \frac{\sum X}{n}$
$\bar{X} = \frac{10+11+11+15+15+15+19+20+21+21+22+22+24+25}{14} = \frac{251}{14} \approx 17.9$
Mean = 17.9k per year
Advantages
•Very sensitive measure
•Takes into account all the available information
•Can be combined with means of other groups to give the overall mean.
Disadvantages
•Very sensitive measure: a single extreme score can shift it
•Can only be used on interval or ratio data
•Only representative when scores are roughly symmetrical above and below $\bar{X}$.
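A small sketch of the arithmetic mean and of its sensitivity to extreme scores. The extra salary of 250k is a made-up value added purely for illustration; it is not part of the slide's data.

```python
# Annual salaries from the slide, in thousands per year.
salaries_k = [10, 11, 11, 15, 15, 15, 19, 20, 21, 21, 22, 22, 24, 25]

total = sum(salaries_k)                # sum of X
mean_salary = total / len(salaries_k)  # sum of X divided by n

# Hypothetical extreme score, to show why "very sensitive" is also a drawback.
with_outlier = salaries_k + [250]
mean_with_outlier = sum(with_outlier) / len(with_outlier)

print(round(mean_salary, 1), round(mean_with_outlier, 1))
```

One added score moves the mean from about 17.9 to about 33.4, while the median of the same 15 scores would only move from 19.5 to 20.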
Distribution
• Often displayed graphically, where:
– X axis = measured variable
– Y axis = frequency.
[Figure: Normal Distribution. X axis: Energy Intake (calories per day), 1500 to 5500. Y axis: Number of People, 20 to 160.]

Para-Normal Distribution?
[Figure: X axis: 1500 to 5500. Y axis: Number of People, 20 to 160.]
Paranormal events are phenomena described in any non-scientific
bodies of knowledge, whose existence within these contexts is
described to lie beyond normal experience or scientific explanation.
Normal Distribution, AKA:
-Bell Shaped
-Gaussian.
…but first described mathematically by Abraham De Moivre in 1733…
…published 1924!
Carl Friedrich Gauss applied the ND in 1809 to establish the diameter
of lunar features
[Figure: Normal Distribution. X axis: 1500 to 5500. Y axis: Number of People, 20 to 160.]

Normal Distribution
Characteristics of ND Curve:
•Naturally Occurring
•Symmetrical
[Figure: Normal Distribution with the Mode, Median and Mean marked. X axis: 1500 to 5500. Y axis: Number of People, 20 to 160.]

[Figure: Normal Distribution with the points of inflection marked and 68.26% of the area between them. X axis: 1500 to 5500. Y axis: Number of People, 20 to 160.]
Point of Inflection: A point of a curve at which a change in the
direction of curvature occurs.

Normal Distribution
Z = standard score, for comparison: raw score versus group
Therefore: Average = 3500, SD = 1000
[Figure: Normal Distribution divided into SD-wide bands. Moving outward from the mean on each side, the bands hold 34.13%, 13.59% and 2.15% of the area. X axis: 1500 to 5500. Y axis: Number of People, 0 to 160.]

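Normal Distribution
So, if Raw score = 4500, Average = 3500 and SD = 1000:
Z = (4500 - 3500) / 1000 = +1
Study of SD size = ‘Kurtosis’ (loosely; strictly, the SD sets the width of the curve, while kurtosis describes its peakedness and tails)
[Figure: Normal Distribution with bands of 34.13%, 13.59% and 2.15% on each side of the mean. X axis: 1500 to 5500. Y axis: Number of People, 0 to 160.]

Normal Distribution
So, if Raw score = 4500, Average = 3500 and SD = 500:
Z = (4500 - 3500) / 500 = +2
[Figure: narrower Normal Distribution with the central 68.26% marked. X axis: 1500 to 5500. Y axis: Number of People, 20 to 160.]

Normal Distribution
So, if Raw score = 4500, Average = 3500 and SD = 2000:
Z = (4500 - 3500) / 2000 = +0.5
[Figure: wider Normal Distribution with the central 68.26% marked. X axis: 1500 to 5500. Y axis: Number of People, 20 to 160.]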
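The three Z examples share one calculation: the raw score's distance from the group average, measured in standard deviations. A minimal sketch, with `z_score` as an illustrative function name:

```python
def z_score(raw_score, group_mean, sd):
    """Standard score: how many SDs the raw score lies from the group mean."""
    return (raw_score - group_mean) / sd

# Same raw score and average as on the slides, with the three SD values shown.
for sd in (1000, 500, 2000):
    print(f"SD = {sd}: Z = {z_score(4500, 3500, sd):+.1f}")
```

The same raw score of 4500 is one, two, or half a standard deviation above the average depending on how spread out the group is.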

[Figure: Non-Normal Distribution, Negative Skew, with the Mode, Median and Mean marked. X axis: 1500 to 5500. Y axis: Number of People, 20 to 160.]

[Figure: Non-Normal Distribution, Positive Skew, with the Mode, Median and Mean marked. X axis: 1500 to 5500. Y axis: Number of People, 20 to 160.]

Coefficient of Variation (CV)
• Another Measure of Dispersion
• Histograms
• Skewness
• Kurtosis
• Other Descriptive Summary Measures

Measures of Dispersion – Coefficient of
Variation
• Coefficient of variation (CV) measures the spread of a set
of data as a proportion of its mean.
• It is the ratio of the sample standard deviation to the
sample mean
$CV = \frac{s}{\bar{x}} \times 100\%$
• It is sometimes expressed as a percentage
• There is an equivalent definition for the coefficient of
variation of a population
                            Chapel Hill (A)    Bend (B)
Mean                        1198.10            298.07
Standard Deviation          191.80             82.08
Coefficient of Variation    0.16 (16%)         0.28 (28%)
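The CV values in the table follow directly from the listed means and standard deviations. The sketch below recomputes them and adds a small helper for raw data; `coefficient_of_variation` is an illustrative name, and the helper assumes the sample (n - 1) standard deviation.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean (a proportion)."""
    return stdev(values) / mean(values)

# Summary figures from the table: (mean, standard deviation).
chapel_hill = (1198.10, 191.80)
bend = (298.07, 82.08)

cv_chapel_hill = chapel_hill[1] / chapel_hill[0]
cv_bend = bend[1] / bend[0]
print(f"Chapel Hill: {cv_chapel_hill:.2f}   Bend: {cv_bend:.2f}")
```

Although Chapel Hill has the larger standard deviation in absolute terms, Bend varies more relative to its mean.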
Coefficient of Variation (CV)
• It is a dimensionless number that can be used to
compare the amount of variance between populations
with different means
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
$CV = \frac{s}{\bar{x}} \times 100\%$
Measures of Skewness and Kurtosis
• A fundamental task in many statistical analyses is to
characterize the location and variability of a data set
(Measures of central tendency vs. measures of
dispersion)
• Both measures tell us nothing about the shape of the
distribution
• A further characterization of the data includes
skewness and kurtosis
• The histogram is an effective graphical technique for
showing both the skewness and kurtosis of a data set
Histograms
Fig. 3. Histogram of crown width (m) measured for a random
sample (n = 63; mean = 9.3 m; SD = 4.64 m).
Frequency & Distribution
• A histogram is one way to depict a frequency
distribution
• Frequency is the number of times a variable takes on a
particular value
• Note that any variable has a frequency distribution
• e.g. roll a pair of dice several times and record the
resulting values (constrained to being between 2 and
12), counting the number of times any given value
occurs (the frequency of that value occurring), and take
these all together to form a frequency distribution
Frequency & Distribution
• Frequencies can be absolute (when the frequency
provided is the actual count of the occurrences) or
relative (when they are normalized by dividing the
absolute frequency by the total number of observations,
giving values in [0, 1])
• Relative frequencies are particularly useful if you want
to compare distributions drawn from two different
sources (i.e. when the numbers of observations from each
source differ)
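The dice example can be sketched as a simulation that builds both absolute and relative frequencies. The function names, the seed and the 600 rolls are arbitrary choices for illustration.

```python
import random
from collections import Counter

def roll_two_dice(n_rolls, seed=0):
    """Simulate n_rolls throws of a pair of dice and return the totals."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n_rolls)]

def frequency_tables(values):
    """Return (absolute, relative) frequency distributions of the values."""
    absolute = Counter(values)  # value -> count of occurrences
    total = len(values)
    relative = {value: count / total for value, count in absolute.items()}
    return absolute, relative

absolute, relative = frequency_tables(roll_two_dice(600))
for value in sorted(absolute):
    print(f"{value:>2}  {absolute[value]:>4}  {relative[value]:.3f}")
```

The relative column always sums to 1, which is what makes two runs with different numbers of rolls comparable.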
Histograms
• We may summarize our data by constructing
histograms, which are vertical bar graphs
• A histogram is used to graphically summarize the
distribution of a data set
• A histogram divides the range of values in a data set
into intervals
• Over each interval is placed a bar whose height
represents the frequency of data values in the interval.
Building a Histogram
• To construct a histogram, the data are first grouped
into categories
• The histogram contains one vertical bar for each
category
• The height of the bar represents the number of
observations in the category (i.e., frequency)
• It is common to note the midpoint of the category on
the horizontal axis
Building a Histogram – Example
• 1. Develop an ungrouped frequency table
– That is, we build a table that counts the number of
occurrences of each variable value from lowest to highest:
TMI Value Ungrouped Freq.
4.16 2
4.17 4
4.18 0
… …
13.71 1
• We could attempt to construct a bar chart from this table, but it
would have too many bars to really be useful
• 2. Construct a grouped frequency table
– Select an appropriate number of classes
Class Frequency Percentage
4.00 - 4.99 120
5.00 - 5.99 807
6.00 - 6.99 1411
7.00 - 7.99 407
8.00 - 8.99 87
9.00 - 9.99 33
10.00 - 10.99 17
11.00 - 11.99 22
12.00 - 12.99 43
13.00 - 13.99 19
• 3. Plot the frequencies of each class
– All that remains is to create the bar graph
[Figure: Pond Branch TMI Histogram. X axis: Topographic Moisture Index (a proxy for soil moisture), 4 to 16. Y axis: Percent of cells in catchment, 0 to 48.]
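A sketch of steps 2 and 3, starting from the grouped frequencies listed above. It assumes the ten classes shown are the complete table, so percentages are taken against their sum; the text bars are only a stand-in for a real bar graph.

```python
# Grouped frequency table from the slide: (class label, frequency).
grouped = [
    ("4.00 - 4.99", 120), ("5.00 - 5.99", 807), ("6.00 - 6.99", 1411),
    ("7.00 - 7.99", 407), ("8.00 - 8.99", 87), ("9.00 - 9.99", 33),
    ("10.00 - 10.99", 17), ("11.00 - 11.99", 22), ("12.00 - 12.99", 43),
    ("13.00 - 13.99", 19),
]

# Assumption: these classes cover every observation.
total = sum(frequency for _, frequency in grouped)
percentages = [(label, 100 * frequency / total) for label, frequency in grouped]

# Crude text histogram: one '#' per whole percentage point.
for label, percent in percentages:
    print(f"{label:>13}  {percent:5.1f}%  {'#' * int(percent)}")
```

Under that assumption the tallest bar is the 6.00 - 6.99 class at just under 48%, in line with the top of the figure's vertical axis.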

Further Moments of the Distribution
• While measures of dispersion are useful for helping
us describe the width of the distribution, they tell us
nothing about the shape of the distribution
Source: Earickson, RJ, and Harlin, JM. 1994. Geographic Measurement and Quantitative Analysis. USA:
Macmillan College Publishing Co., p. 91.
Further Moments of the Distribution
• There are further statistics that describe the shape of
the distribution, using formulae that are similar to
those of the mean and variance
• 1st moment - Mean (describes central value)
• 2nd moment - Variance (describes dispersion)
• 3rd moment - Skewness (describes asymmetry)
• 4th moment - Kurtosis (describes peakedness)
Further Moments – Skewness
• Skewness measures the degree of asymmetry exhibited
by the data
$\text{skewness} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n s^3}$
• If skewness equals zero, the histogram is symmetric
about the mean
• Positive skewness vs negative skewness
Source: http://library.thinkquest.org/10030/3smodsas.htm
• Positive skewness
– There are more observations below the mean than
above it
– When the mean is greater than the median
• Negative skewness
– There are a small number of low observations and
a large number of high ones
– When the median is greater than the mean
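The skewness formula can be sketched directly. The function below assumes `s` is the sample (n - 1) standard deviation, and the example data sets are illustrative (the first is the salary list from the central tendency slides).

```python
from statistics import mean, stdev

def skewness(values):
    """Sum of cubed deviations from the mean, divided by n * s**3."""
    n = len(values)
    x_bar = mean(values)
    s = stdev(values)  # assumption: sample standard deviation (n - 1)
    return sum((x - x_bar) ** 3 for x in values) / (n * s ** 3)

salaries_k = [10, 11, 11, 15, 15, 15, 19, 20, 21, 21, 22, 22, 24, 25]
print(skewness(salaries_k))        # mean (17.9) below median (19.5)
print(skewness([1, 2, 3, 4, 5]))   # symmetric about the mean
print(skewness([1, 1, 1, 2, 10]))  # one large value pulls the mean up
```

The signs agree with the rules above: the salary data, whose median exceeds its mean, come out negatively skewed, and the last list comes out positively skewed.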
Further Moments – Kurtosis
• Kurtosis measures how peaked the histogram is
$\text{kurtosis} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n s^4} - 3$
• The kurtosis of a normal distribution is 0
• Kurtosis characterizes the relative peakedness or
flatness of a distribution compared to the normal
distribution
• Platykurtic – When the kurtosis < 0, the frequencies
throughout the curve are closer to being equal (i.e., the
curve is flatter and wider)
• Thus, negative kurtosis indicates a relatively flat
distribution
• Leptokurtic – When the kurtosis > 0, there are high
frequencies in only a small part of the curve (i.e., the
curve is more peaked)
• Thus, positive kurtosis indicates a relatively peaked
distribution
[Figure: a platykurtic curve and a leptokurtic curve]
Source: http://www.riskglossary.com/link/kurtosis.htm
• Kurtosis is based on the size of a distribution's
tails.
• Negative kurtosis (platykurtic) – distributions with
short tails
• Positive kurtosis (leptokurtic) – distributions with
relatively long tails
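A sketch of the kurtosis formula with the 3 subtracted, so that a normal distribution scores about 0. As with skewness, `s` is assumed to be the sample (n - 1) standard deviation, and both data sets are made up for illustration.

```python
from statistics import mean, stdev

def excess_kurtosis(values):
    """Sum of fourth-power deviations over n * s**4, minus 3."""
    n = len(values)
    x_bar = mean(values)
    s = stdev(values)  # assumption: sample standard deviation (n - 1)
    return sum((x - x_bar) ** 4 for x in values) / (n * s ** 4) - 3

flat = list(range(1, 101))       # every value equally frequent, short tails
peaked = [0] * 50 + [-10, 10]    # tall central spike plus two far-out values

print(excess_kurtosis(flat))     # negative: platykurtic
print(excess_kurtosis(peaked))   # positive: leptokurtic
```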
Why Do We Need Kurtosis?
• These two distributions have the same variance,
approximately the same skew, but differ markedly in
kurtosis.
Source: http://davidmlane.com/hyperstat/A53638.html
How to Graphically Summarize Data?
• Histograms
• Box plots
Functions of a Histogram
• The function of a histogram is to graphically
summarize the distribution of a data set
• The histogram graphically shows the following:
1. Center (i.e., the location) of the data
2. Spread (i.e., the scale) of the data
3. Skewness of the data
4. Kurtosis of the data
5. Presence of outliers
6. Presence of multiple modes in the data.
Functions of a Histogram
• The histogram can be used to answer the following
questions:
1. What kind of population distribution do the data
come from?
2. Where are the data located?
3. How spread out are the data?
4. Are the data symmetric or skewed?
5. Are there outliers in the data?
Source: http://www.robertluttman.com/vms/Week5/page9.htm (First three)
http://office.geog.uvic.ca/geog226/frLab1.html (Last)
Box Plots
• We can also use a box plot to graphically summarize
a data set
• A box plot represents a graphical summary of what is
sometimes called a “five-number summary” of the
distribution
– Minimum
– Maximum
– 25th percentile
– 75th percentile
– Median
[Figure: box plot labelled min., 25th %-ile, median, 75th %-ile and max.]
• Interquartile Range (IQR)
Rogerson, p. 8.
Box Plots
• Example – Consider the first 9 Commodore prices (in
$,000)
6.0, 6.7, 3.8, 7.0, 5.8, 9.975, 10.5, 5.99, 20.0
• Arrange these in order of magnitude
3.8, 5.8, 5.99, 6.0, 6.7, 7.0, 9.975, 10.5, 20.0
• The median is Q2 = 6.7 (there are 4 values on either
side)
• Q1 = 5.9 (median of the 4 smallest values)
• Q3 = 10.2 (median of the 4 largest values)
• IQR = Q3 – Q1 = 10.2 - 5.9 = 4.3
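The quartile steps in this example can be sketched as a "median of each half" calculation. The function name is hypothetical, and this is only one of several quartile conventions in use.

```python
from statistics import median

def quartiles_by_halves(values):
    """Q1, Q2, Q3 where Q1 and Q3 are the medians of the lower and upper halves.

    For an odd number of values the middle value belongs to neither half.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[-half:]
    return median(lower_half), median(ordered), median(upper_half)

prices = [6.0, 6.7, 3.8, 7.0, 5.8, 9.975, 10.5, 5.99, 20.0]  # in $,000
q1, q2, q3 = quartiles_by_halves(prices)
iqr = q3 - q1
print(q1, q2, q3, iqr)
```

Without rounding, Q1 is 5.895 and Q3 is 10.2375, which round to the 5.9 and 10.2 quoted above.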
Box Plots
Example: Table 1.1 Commuting data (Rogerson, p5)
Ranked commuting times:
5, 5, 6, 9, 10, 11, 11, 12, 12, 14, 16, 17, 19, 21, 21, 21, 21, 21, 22,
23, 24, 24, 26, 26, 31, 31, 36, 42, 44, 47
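25th percentile is represented by observation (30+1)/4 = 7.75
75th percentile is represented by observation 3(30+1)/4 = 23.25
25th percentile: 11.75 (three quarters of the way from the 7th observation, 11, to the 8th, 12)
75th percentile: 26 (the 23rd and 24th observations are both 26)
Interquartile range: 26 – 11.75 = 14.25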
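The observation-number rule used here, position = p(n + 1) with linear interpolation between neighbouring observations, can be sketched as below. The function name is illustrative, and clamping the position to the range 1..n is an added assumption for extreme percentiles.

```python
def percentile_by_rank(ordered, p):
    """Percentile at 1-based position p * (n + 1), interpolating linearly."""
    n = len(ordered)
    rank = min(max(p * (n + 1), 1), n)  # assumption: clamp to valid positions
    whole = int(rank)                   # observation just below the position
    fraction = rank - whole             # how far toward the next observation
    if whole >= n:
        return ordered[-1]
    below, above = ordered[whole - 1], ordered[whole]
    return below + fraction * (above - below)

commute = [5, 5, 6, 9, 10, 11, 11, 12, 12, 14, 16, 17, 19, 21, 21, 21, 21, 21,
           22, 23, 24, 24, 26, 26, 31, 31, 36, 42, 44, 47]  # already ranked

p25 = percentile_by_rank(commute, 0.25)
p75 = percentile_by_rank(commute, 0.75)
print(p25, p75, p75 - p25)
```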
Other Descriptive Summary Measures
• Descriptive statistics provide an organization and
summary of a dataset
• A small number of summary measures replaces the
entirety of a dataset
• We’ll briefly talk about other simple descriptive
summary measures
• You're likely already familiar with some simple
descriptive summary measures
– Ratios
– Proportions
– Percentages
– Rates of Change
– Location Quotients
• Ratios –
$\text{ratio} = \frac{\#\text{ of observations in A}}{\#\text{ of observations in B}}$
e.g., A = 6 overcast days, B = 24 mostly cloudy days, giving a ratio of 6/24 = 0.25
• Proportions – Relates one part or category of data to
the entire set of observations, e.g., a box of marbles
that contains 4 yellow, 6 red, 5 blue, and 2 green gives
a yellow proportion of 4/17, or in general:
colorcount = {yellow, red, blue, green}
acount = {4, 6, 5, 2}
$\text{proportion}_i = \frac{a_i}{\sum_i a_i}$
• Proportions - Sum of all proportions = 1. These are
useful for comparing two sets of data with different sizes
and category counts, e.g., a different box of marbles
gives a yellow proportion of 2/23, and in order for this
to be a reasonable comparison we need to know the
totals for both samples
• Percentages - Calculated as proportion x 100, e.g.,
2/23 x 100% = 8.696%. Use of these should be
restricted to larger sample sizes, perhaps 20+
observations
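Ratios, proportions and percentages for the examples above can be sketched in a few lines. The variable names are illustrative.

```python
# Ratio: observations in A relative to observations in B.
overcast_days, mostly_cloudy_days = 6, 24
ratio = overcast_days / mostly_cloudy_days

# Proportions: each category's count relative to the whole set.
marbles = {"yellow": 4, "red": 6, "blue": 5, "green": 2}
total = sum(marbles.values())
proportions = {color: count / total for color, count in marbles.items()}

# Percentages: proportions multiplied by 100.
percentages = {color: 100 * share for color, share in proportions.items()}

print(ratio)
for color in marbles:
    print(f"{color:>6}  {proportions[color]:.3f}  {percentages[color]:.1f}%")
```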
• Location Quotients - An index of relative concentration
in space, a comparison of a region's share of something
to the total
• Example – Suppose we have a region of 1000 km²
which we subdivide into three smaller areas of 200, 300,
and 500 km² (labeled A, B, & C)
• The region has an influenza outbreak with 150 cases in
A, 100 in B, and 350 in C (a total of 600 flu cases):
     Proportion of Area    Proportion of Cases    Location Quotient
A    200/1000 = 0.2        150/600 = 0.25         0.25/0.2 = 1.25
B    300/1000 = 0.3        100/600 = 0.17         0.17/0.3 = 0.57
C    500/1000 = 0.5        350/600 = 0.58         0.58/0.5 = 1.17
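The location quotient table can be recomputed as a sketch: each area's share of cases divided by its share of land. The function and variable names are illustrative. Note that this version keeps full precision, whereas the table rounds each proportion to two decimals first, so area B comes out at about 0.56 rather than 0.57.

```python
def location_quotients(activity, base):
    """Share of the activity in each area divided by that area's share of the base."""
    total_activity = sum(activity.values())
    total_base = sum(base.values())
    return {
        area: (activity[area] / total_activity) / (base[area] / total_base)
        for area in base
    }

area_km2 = {"A": 200, "B": 300, "C": 500}
flu_cases = {"A": 150, "B": 100, "C": 350}

lq = location_quotients(flu_cases, area_km2)
for area, value in lq.items():
    print(f"{area}: {value:.2f}")
```

A quotient above 1 (areas A and C) means the area holds more than its proportional share of cases; below 1 (area B) means less.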
