
Pie charts show the distribution of a categorical variable

Bar graphs show the distribution of a categorical variable


Histograms: for this graph we group the values into classes of a set size. Take the range of the data set and
pick classes of equal width; the width should be chosen so the graph shows the distribution effectively. When graphing, put
the measured variable on the x-axis and the count (or percent) of observations in each class on the y-axis
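The grouping step can be sketched in Python. This is only an illustration: the function name class_counts, the scores, and the choice of width 10 starting at 50 are all made up, not taken from the notes.

```python
def class_counts(values, width, start):
    """Count the observations in each equal-width class [lo, lo + width).

    Assumes start <= min(values), so every value lands in some class.
    """
    n_classes = int((max(values) - start) // width) + 1
    counts = [0] * n_classes
    for v in values:
        counts[int((v - start) // width)] += 1
    return counts


scores = [52, 55, 61, 64, 67, 68, 70, 71, 73, 75, 78, 82, 85, 91]  # made-up data
width, start = 10, 50
counts = class_counts(scores, width, start)

# Text histogram turned on its side: one row per class, one star per observation.
for i, count in enumerate(counts):
    lo = start + i * width
    print(f"{lo}-{lo + width}: {'*' * count}")
```

Trying a few different widths on the same data is a quick way to see which one shows the shape of the distribution best.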
Stem plots are useful for smaller data sets. Write the stems in a column with the lowest value at the top and work downwards,
then write each leaf to the right of its stem
A stem plot preserves the actual values of the observations; a histogram doesn't
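A rough sketch of building a stem plot in Python, assuming non-negative whole numbers with the last digit used as the leaf. The function name stemplot and the data are made up for illustration.

```python
from collections import defaultdict


def stemplot(values):
    """Return the lines of a stem plot: tens (and above) as stem, last digit as leaf."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lo, hi = min(leaves), max(leaves)
    lines = []
    for stem in range(lo, hi + 1):  # lowest stem first; empty stems are kept
        row = "".join(str(d) for d in leaves[stem])
        lines.append(f"{stem} | {row}".rstrip())
    return lines


for line in stemplot([12, 15, 21, 27, 27, 45]):  # made-up data
    print(line)
```

Every original value can be read back from the plot (stem 2 with leaves 1, 7, 7 is 21, 27, 27), which is the sense in which a stem plot preserves the observations.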
A time plot shows the value of a variable (y-axis) against the time it was recorded (x-axis)
Histograms and time plots display different kinds of data: a time plot displays time series data, which shows
change over time, whereas a histogram displays cross-sectional data, which shows how values vary
across different individuals
Shape, Center, Spread, Outliers
Center and spread can be described in a few ways, 2 of which are
1) center: give the midpoint, the value with roughly half the observations above it and half below.
2) spread: state the lowest and highest values
Measuring Center
Mean (x̄): add all values together, then divide by the number of observations (n): x̄ = (x1 + x2 + ... + xn) / n

Median (M): the formal measure of the midpoint. Order all observations from smallest to largest, then use
(n+1)/2, which gives the location of M in the ordered list. If n is even, M is the average of the two center observations

Mean is not a resistant measure of center: a few extreme observations can pull it strongly

Median is a resistant measure of center
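A small Python sketch of both measures, showing what "resistant" means in practice. The data are made up; the second list is the first with its largest value replaced by an extreme one.

```python
def mean(values):
    """Add all values, then divide by the number of observations n."""
    return sum(values) / len(values)


def median(values):
    """Midpoint of the ordered observations."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # location (n+1)/2 when counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2  # average of the two center values


data = [2, 3, 5, 7, 8]
with_outlier = [2, 3, 5, 7, 80]

print(mean(data), median(data))
print(mean(with_outlier), median(with_outlier))
```

The single extreme value moves the mean from 5 to 19.4 while the median stays at 5.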
The Quartiles Q1 and Q3
Finding the Quartiles:
1) order the observations and find M
2) Q1 is the median of the observations to the left of M in the ordered list
3) Q3 is the median of the observations to the right of M in the ordered list
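The steps above can be sketched in Python. The sketch assumes the convention that, when n is odd, M itself is left out of both halves; statistical software often uses slightly different quartile rules and may give slightly different answers. The function names and data are made up.

```python
def median(ordered):
    """Median of an already ordered list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, M, Q3) using the median-of-each-half method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]  # observations to the left of M
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]  # to the right of M
    return median(lower), median(ordered), median(upper)


print(quartiles([1, 3, 4, 6, 7, 9, 12, 15, 20]))  # made-up data, n = 9
```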
The 5 Number summary
Minimum-----Q1-----M-----Q3 -----Maximum

Inter-quartile Range (IQR) is a resistant measure of spread

IQR = Q3 - Q1
IQR is useful for finding outliers
The 1.5 x IQR rule for outliers
if an observation falls more than 1.5 x IQR above Q3 or below Q1, it is a suspected outlier
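A minimal Python sketch of the rule, taking Q1 and Q3 as already computed. The function name suspected_outliers and the numbers are made up for illustration.

```python
def suspected_outliers(values, q1, q3):
    """Flag observations more than 1.5 x IQR below Q1 or above Q3."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]


data = [1, 3, 4, 6, 7, 9, 12, 15, 40]  # made-up data with Q1 = 3.5 and Q3 = 13.5
print(suspected_outliers(data, q1=3.5, q3=13.5))
```

Here IQR = 10, so the cutoffs are 3.5 - 15 = -11.5 and 13.5 + 15 = 28.5, and only 40 is flagged.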

Measuring Spread
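Standard Deviation s
Variance s² is the average of the squares of the deviations of the observations from their mean (dividing by n - 1 rather than n)
The standard deviation s is the square root of the variance s²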
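A short Python sketch of the calculation, assuming the usual sample convention of dividing the sum of squared deviations by n - 1. The function names and data are made up.

```python
from math import sqrt


def variance(values):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


def std_dev(values):
    """Standard deviation s: the square root of the variance."""
    return sqrt(variance(values))


data = [1, 2, 3, 4, 5]  # made-up data with mean 3
print(variance(data))   # squared deviations 4 + 1 + 0 + 1 + 4 = 10, divided by 4
print(std_dev(data))
```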

Important points about the standard deviation


- s measures spread about the mean and should be used only when the mean is
chosen as the measure of center.
- s is always zero or greater than zero.
- As the observations become more spread out about their mean, s gets larger.
- s is not resistant. A few outliers can make s very large.
The 5 Number summary is better for skewed distributions or distributions with strong outliers,
while the mean and standard deviation (x̄, s) are better for reasonably symmetric distributions without outliers
Density Curve
has area exactly 1 underneath it
is often a good description of the overall pattern of a distribution. Outliers are not described
by the curve, though
Median and Mean of a density Curve

A density curve is an idealization of the actual data, so when giving its mean and standard deviation we
use different symbols to distinguish them from x̄ and s, which are computed from the data
the mean of a density curve is μ
the SD of a density curve is σ
The standard deviation controls the spread: the larger σ is, the larger the spread
Normal distribution N(μ, σ)

The 68-95-99.7 Rule

68% of the observations fall within σ of the mean μ
95% of the observations fall within 2σ of the mean μ
99.7% of the observations fall within 3σ of the mean μ
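The rule can be checked numerically with NormalDist from Python's standard statistics module (Python 3.8 or later). The helper name proportion_within is made up for this sketch.

```python
from statistics import NormalDist


def proportion_within(k, mu=0.0, sigma=1.0):
    """Proportion of a Normal N(mu, sigma) distribution within k standard deviations of mu."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))
```

The exact proportions are close to, but not exactly, 68%, 95% and 99.7%; the rule is a rounded approximation, and it does not depend on the particular μ and σ.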
Standardizing and Z scores
The standardized value of x is

z = (x - μ) / σ

this is known as the z-score


The z-score tells us how many standard deviations from the mean the observation falls, and in which direction
a positive z-score means the observation is larger than the mean μ
a negative z-score means the observation is smaller than the mean μ
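As a quick illustration in Python, with a made-up distribution of mean 64 and standard deviation 3:

```python
def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mu) / sigma


mu, sigma = 64, 3  # made-up mean and standard deviation

print(z_score(70, mu, sigma))  # above the mean, so positive
print(z_score(58, mu, sigma))  # below the mean, so negative
print(z_score(64, mu, sigma))  # exactly at the mean, so zero
```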
The Standard Normal distribution is the Normal distribution N(0,1)
that is, N(μ, σ) with μ = 0 and σ = 1
The cumulative proportion for a value x in a distribution is the proportion of observations in the
distribution that are less than or equal to x

Convert values to z-scores, then use the table to find the proportion to the left of the value
To find normal proportions by table
draw a picture of the curve and shade the proportion needed
standardize
use the table
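The same procedure can be sketched in Python, with the standard Normal cdf from the statistics module standing in for the printed table. The function names and the N(64, 3) example are made up for illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(0, 1)


def proportion_below(x, mu, sigma):
    """Cumulative proportion: standardize x, then look up the area to its left."""
    z = (x - mu) / sigma
    return STANDARD_NORMAL.cdf(z)  # plays the role of the table


def proportion_between(a, b, mu, sigma):
    """Area between a and b: subtract the two left-hand areas."""
    return proportion_below(b, mu, sigma) - proportion_below(a, mu, sigma)


mu, sigma = 64, 3  # made-up distribution
print(round(proportion_below(70, mu, sigma), 4))        # area to the left of 70
print(round(1 - proportion_below(70, mu, sigma), 4))    # area to the right of 70
print(round(proportion_between(61, 67, mu, sigma), 4))  # within one sigma of the mean
```

Because the table (and the cdf) always gives the area to the left, a proportion above a value is 1 minus the table entry, which is why drawing the picture first helps.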
To find a value from a given proportion by table
draw a picture of the curve and shade the proportion given
use the table backwards to find the z-score
unstandardize using x = μ + zσ, where z is the table value
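Going backwards can be sketched in Python as well, with inv_cdf from the statistics module standing in for reading the table in reverse. The function name value_at_proportion and the N(64, 3) example are made up.

```python
from statistics import NormalDist


def value_at_proportion(p, mu, sigma):
    """Find x such that a proportion p of N(mu, sigma) lies at or below x."""
    z = NormalDist(0, 1).inv_cdf(p)  # table used backwards: proportion -> z
    return mu + z * sigma            # unstandardize


mu, sigma = 64, 3  # made-up distribution
x90 = value_at_proportion(0.90, mu, sigma)
print(round(x90, 2))  # value with 90% of the distribution below it
```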
