Skew

LOCATING VARIABLE VALUES, DESCRIBING
DISTRIBUTIONS, AND MEASURES OF AVERAGES
AND VARIATIONS
PART 1: PERCENTILES
* A percentile is the outcome or score below which a given percentage of
the distribution falls.
$$P_i = L_p + \left[\frac{(p_i)(N) - c_p}{f_p}\right](W_i)$$
Pi = score of the ith percentile
Lp = true lower limit of interval containing the ith percentile

pi = ith percentile written as a proportion
N = total number of observations
cp = cumulative frequency up to but not including the
interval containing Pi
fp = frequency in the interval containing the ith percentile
Wi = width of the interval containing Pi; Up – Lp where
Up and Lp are the upper and lower true limits of
the interval containing Pi
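The percentile formula above can be sketched in Python. This is only an illustration: the function names and the grouped frequency table are hypothetical and not part of the original material, and the intervals are assumed to be listed in ascending order with their true limits.

```python
def grouped_percentile(p, lower_limit, width, freq_in_interval, cum_freq_below, n):
    """Apply P_i = L_p + ((p_i * N - c_p) / f_p) * W_i to one interval."""
    return lower_limit + ((p * n - cum_freq_below) / freq_in_interval) * width


def percentile_from_intervals(p, intervals):
    """Locate the interval containing the percentile, then apply the formula.

    intervals: list of (true lower limit, true upper limit, frequency),
    sorted from lowest to highest.
    """
    n = sum(f for _, _, f in intervals)
    target = p * n  # number of observations that must fall below P_i
    cum = 0         # c_p: cumulative frequency below the current interval
    for lower, upper, f in intervals:
        if f > 0 and cum + f >= target:
            return grouped_percentile(p, lower, upper - lower, f, cum, n)
        cum += f
    raise ValueError("p must be a proportion between 0 and 1")


# Hypothetical data, N = 50
intervals = [(9.5, 19.5, 5), (19.5, 29.5, 10), (29.5, 39.5, 20), (39.5, 49.5, 15)]
p50 = percentile_from_intervals(0.50, intervals)
print(p50)  # 29.5 + ((25 - 15) / 20) * 10 = 34.5
```

Here the 50th percentile falls in the interval 29.5 to 39.5 because the cumulative frequency first reaches 25 (half of 50) inside it.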
PART 2: MEASURES OF CENTRAL
TENDENCY OR AVERAGE
A measure of central tendency, or average, is a value that describes the typical
outcome of a distribution of scores (or the typical value of a variable).
1. MODE
• For both discrete and continuous variables
• Value or category of the variable which has the largest frequency
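A minimal sketch of finding the mode, assuming the raw values are available as a list. The helper name `modes` and the sample data are illustrative only. It returns every value tied for the largest frequency, since a distribution can have more than one mode.

```python
from collections import Counter


def modes(values):
    """Return all values (or categories) that share the largest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


colors = ["red", "blue", "blue", "green", "blue", "red"]  # discrete categories
scores = [1, 2, 2, 3, 3]                                   # two values tie

print(modes(colors))  # ['blue']
print(modes(scores))  # [2, 3]
```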
2. MEDIAN
• For continuous variables
• Ordinarily defined as the value that divides an ordered distribution
of a variable's values into two equal parts: the values above the
median and the values below it
• Textbooks say that how the median is determined depends on
whether the data are ungrouped or grouped (into intervals)
FOR UNGROUPED DATA
• For a distribution with an odd number of values, the median is the
middle value when the distribution is ordered from the lowest to the
highest
• For a distribution with an even number of values, the median is
halfway between the two middle values, that is, the sum of the two
middle values divided by two
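The two rules for ungrouped data can be sketched as follows. The function name and the sample values are assumptions made for illustration.

```python
def median(values):
    """Median of ungrouped data: middle value (odd n) or mean of the two middle values (even n)."""
    ordered = sorted(values)  # order from lowest to highest
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty distribution is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 3, 9]))     # 7
print(median([7, 3, 9, 5]))  # (5 + 7) / 2 = 6.0
```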
FOR GROUPED DATA
• Some statistics textbooks recommend the 50th percentile as the median.
However, Knoke and Bohrnstedt recommend that the median for
grouped data be the value of the category in which the cumulative
percentage equals 50.0%. In contrast, I (AB) assert that, given
today’s computers, the notion of a median for grouped data is not
relevant, because we can easily input the raw data and get the
median from the ungrouped variable values.
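For comparison, the category-based rule attributed above to Knoke and Bohrnstedt might be sketched like this. One assumption is added here that the source does not state: because the cumulative percentage rarely equals exactly 50.0%, the sketch returns the first category whose cumulative percentage reaches or passes 50%. The function name and the frequency table are hypothetical.

```python
def median_category(freq_table):
    """Return the first category whose cumulative percentage reaches 50%.

    freq_table: list of (category, frequency) pairs in ascending order.
    """
    n = sum(f for _, f in freq_table)
    if n == 0:
        raise ValueError("the frequency table has no observations")
    cum = 0
    for category, f in freq_table:
        cum += f
        if 100 * cum / n >= 50.0:  # assumption: "reaches or passes", not "equals"
            return category


table = [("low", 10), ("medium", 25), ("high", 15)]  # cumulative: 20%, 70%, 100%
print(median_category(table))  # medium
```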
3. MEAN
• This is the popular notion of an average, but it is only
appropriate for continuous data. The mean is obtained by adding
all the values of the variable and dividing by the number of cases,
N or n. Here N is for the population and n is for the sample.
$$\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N} \quad \text{or} \quad \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
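As a small illustration of the formula for the mean, assuming the values are held in a list (the function name and data are illustrative):

```python
def mean(values):
    """Add all the values and divide by the number of cases."""
    if len(values) == 0:
        raise ValueError("mean of an empty distribution is undefined")
    return sum(values) / len(values)


print(mean([2, 4, 6, 8]))  # (2 + 4 + 6 + 8) / 4 = 5.0
```

The computation is the same for a population (N cases) and a sample (n cases); only the notation differs.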
PART 3: ADVANTAGES & DISADVANTAGES
OF EACH MEASURE OF AVERAGE
MEAN
ADVANTAGES
• used for interval or ratio data
• affected by every score in a distribution yet it is a stable
measure of central tendency (when we draw several samples from a
population)
• amenable to advanced math or statistical procedures
DISADVANTAGES
• Cannot be used on open-ended intervals or incomplete
enumeration (in this case, mode or median is used)
• Fluctuation in one score can have a big impact if the distribution
is small (affected by extreme scores)
MEDIAN
ADVANTAGES
• can be used for ordinal data
• unaffected by extreme scores (but affected by the size of the
sample or population)
• can be useful for incomplete enumeration
DISADVANTAGES
• amenable to only a few math operations
• Less stable than the mean
MODE
ADVANTAGES
• useful for nominal data
• locates highest concentration of scores
• quickest estimate
• can be useful in an incomplete enumeration
DISADVANTAGES
• can be drastically affected by a single value or is the least
stable measure
• cannot be used in math operations
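The contrast between the mean and the median under an extreme score can be illustrated with a short sketch using Python's standard `statistics` module. The scores are made up for the example.

```python
import statistics

scores = [20, 22, 25, 27, 30]
with_outlier = [20, 22, 25, 27, 300]  # same data, one extreme score

mean_before = statistics.mean(scores)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(scores)
median_after = statistics.median(with_outlier)

print(mean_before, mean_after)      # 24.8 78.8
print(median_before, median_after)  # 25 25
```

In this small distribution one extreme score triples the mean while the median does not move, which is the trade-off listed above.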
PART 4: SKEW OF DISTRIBUTION
1. For continuous variables or ratio data, if the mean, median, and mode
coincide, the distribution is symmetric, as in a “normal” distribution. If
they do not coincide, the distribution is negatively or positively skewed:
on one side of the center there are more values of the variable, each with
few occurrences or observations, and these form a tail (with the variable
values on the x-axis and the frequency of occurrence on the y-axis).
2. When the tail is at the right, we have a positively skewed distribution. A
negatively skewed distribution is one whose tail is at the left. A
positively skewed distribution has more categories above the median than
below, but these have low frequencies, observations, or occurrence.
Meanwhile, a negatively skewed distribution has more categories below the
median than above, but these categories or values have low frequencies,
observations, or occurrence. A skewed distribution is asymmetric.
3. NEGATIVELY SKEWED
[Figure: frequency curve with the tail at the left; vertical axis: frequencies; labeled points: mean, mdn, mode]
4. POSITIVELY SKEWED
[Figure: frequency curve with the tail at the right; vertical axis: frequencies; labeled points: mode, mdn, mean]
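One rough way to check the direction of skew from raw data is to compare the mean with the median, since the tail pulls the mean toward it. The sketch below rests on that rule of thumb only; it is not a formal skewness statistic and can mislead for unusual distributions. The function name and the data are hypothetical.

```python
import statistics


def skew_direction(values):
    """Rule of thumb: the mean is pulled toward the tail, the median much less so."""
    mean = statistics.mean(values)
    mdn = statistics.median(values)
    if mean > mdn:
        return "positive"  # tail at the right
    if mean < mdn:
        return "negative"  # tail at the left
    return "none indicated"


right_tail = [1, 2, 2, 3, 3, 3, 4, 20]        # mean 4.75, median 3
left_tail = [1, 17, 18, 18, 19, 19, 19, 20]   # mean 16.375, median 18.5

print(skew_direction(right_tail))  # positive
print(skew_direction(left_tail))   # negative
```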
PART 5: MEASURES OF VARIATION
A more complete description of a distribution must take account
of variation, that is, how close the values of a variable are
to the central tendency or mean.
1. RANGE
This is the difference between the largest and smallest outcomes
(or values of a variable) in a distribution. However, this material
suggests that simply stating the lowest and highest values may be
more descriptive than their difference.
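A minimal sketch that reports both forms, the lowest and highest values and their difference. The function name and data are illustrative.

```python
def value_range(values):
    """Return (lowest, highest, highest - lowest) for a list of values."""
    lowest, highest = min(values), max(values)
    return lowest, highest, highest - lowest


print(value_range([12, 7, 25, 18]))  # (7, 25, 18)
```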
2. AVERAGE DEVIATION (AD) ABOUT THE MEAN
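$$AD = \frac{\sum_{i=1}^{N} |d_i|}{N}$$
where d_i = X_i − X̄ is the deviation of value i from the mean. Absolute values are needed because the raw deviations always sum to zero.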
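The average deviation can be sketched as below, assuming the usual definition with absolute deviations about the mean (the raw deviations cancel out, as the example shows). Names and data are illustrative.

```python
def average_deviation(values):
    """Mean of the absolute deviations |X_i - mean| about the mean."""
    n = len(values)
    if n == 0:
        raise ValueError("average deviation of an empty distribution is undefined")
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


data = [2, 4, 6, 8]                          # mean = 5
raw_deviations = [x - 5 for x in data]       # -3, -1, 1, 3

print(sum(raw_deviations))      # 0
print(average_deviation(data))  # (3 + 1 + 1 + 3) / 4 = 2.0
```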
3. VARIANCE
SAMPLE VARIANCE: $$s_X^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
POPULATION VARIANCE: $$\sigma_X^2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N}$$
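The two variance formulas differ only in the divisor, which a short sketch makes visible. Function names and data are assumptions for illustration.

```python
def sum_of_squares(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


def sample_variance(values):
    """Divide the sum of squares by n - 1."""
    if len(values) < 2:
        raise ValueError("sample variance needs at least two values")
    return sum_of_squares(values) / (len(values) - 1)


def population_variance(values):
    """Divide the sum of squares by N."""
    if len(values) == 0:
        raise ValueError("population variance needs at least one value")
    return sum_of_squares(values) / len(values)


data = [2, 4, 6, 8]  # sum of squares = 9 + 1 + 1 + 9 = 20
print(sample_variance(data))      # 20 / 3
print(population_variance(data))  # 20 / 4 = 5.0
```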
4. STANDARD DEVIATION
SAMPLE STANDARD DEVIATION: $s_X = \sqrt{s_X^2}$
POPULATION STANDARD DEVIATION: $\sigma_X = \sqrt{\sigma_X^2}$
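Since the standard deviation is the square root of the variance, it can be checked against Python's standard `statistics` module, whose `stdev` and `pstdev` functions use the n − 1 and N divisors respectively. The data are made up.

```python
import math
import statistics

data = [2, 4, 6, 8]

# Square roots of the sample and population variances
sample_sd = math.sqrt(statistics.variance(data))
population_sd = math.sqrt(statistics.pvariance(data))

# The library's direct functions should agree with the square roots
print(math.isclose(sample_sd, statistics.stdev(data)))       # True
print(math.isclose(population_sd, statistics.pstdev(data)))  # True
```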

PART 6: Z-SCORES
Supposedly, Z-scores are used to describe a distribution,
especially when two values from two different populations or
distributions are compared on the same scale. However,
we can also think of the Z-score as a measure of distance
from the mean, expressed in a way that allows for
comparisons across distributions.
To obtain the Z score of a variable i:
$$Z_i = \frac{X_i - \bar{X}}{s_X}$$
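A sketch of the Z-score computation, including the comparison of two values from different distributions. The class means and standard deviations are hypothetical numbers chosen for the example, and the function names are not from the original material.

```python
import statistics


def z_score(x, mean, sd):
    """Distance of x from the mean, in standard deviation units."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


def z_scores(values):
    """Z-score of every value, using the sample standard deviation (n - 1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [z_score(x, mean, sd) for x in values]


# Hypothetical comparison: 85 in a class with mean 75 and s = 10,
# versus 70 in a class with mean 60 and s = 5
z_a = z_score(85, 75, 10)
z_b = z_score(70, 60, 5)
print(z_a, z_b)  # 1.0 2.0
```

Although 85 is the higher raw score, the 70 lies farther above its own mean (2 standard deviations versus 1), which is the kind of cross-distribution comparison described above.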
