COMM 301: Empirical Research in Communication

Descriptive Statistics: Describing Data with Numbers

Measures of Variability

Variability refers to how dispersed the data points in a distribution are, and how similar to or different from the other data points each data point is. There are three common measures of variability: range, variance, and standard deviation.

Range

What is it?
The range is the distance, or difference, between the lowest score and the highest score.

How to find the range?
For any given set of scores, subtract the lowest score from the highest score. For example, a data set has 10 data points (on a 9-point scale): 5, 4, 5, 5, 4, 5, 6, 4, 1, 9. The range would be: 9 - 1 = 8.

When is it used?
Type of question answered: What does this set of scores look like? How variable are the scores in the distribution?
Type of data: Variables: one (1) continuous variable. Measurement levels: interval, ratio.

What do you need to know?
All of the above, plus: the range is affected by extreme scores; recognition of SPSS output.

How to report descriptive statistics?
See the following example.

Variance and Standard Deviation

What are they?
The variance is a measure of the amount that a set of data varies about its mean. Variance is a key concept, and it forms the heart of all statistics. The standard deviation undoes the squaring that we did to find the variance. The standard deviation is therefore really a sort of "average distance" of each point from the mean. It is a very important concept in the normal distribution. The standard deviation is the square root of the variance.

How to find the variance and standard deviation?
For a population (universe) of scores, the formula for variance is

$$\text{variance} = \frac{(\text{first score} - \text{mean score})^2 + (\text{second score} - \text{mean score})^2 + \cdots + (\text{last score} - \text{mean score})^2}{\text{number of scores}}$$

If the set of scores is a sample, then the formula for variance is

$$\text{variance} = \frac{(\text{first score} - \text{mean score})^2 + (\text{second score} - \text{mean score})^2 + \cdots + (\text{last score} - \text{mean score})^2}{\text{number of scores} - 1}$$

The standard deviation is the square root of the variance.

When are they used?
Type of question answered: What does this set of scores look like? How variable are the scores in the distribution?
Type of data: Variables: one (1) continuous variable. Measurement levels: interval, ratio.

What do you need to know?
All of the above, plus: variance and standard deviation are meaningful only for continuous variables measured on interval or ratio scales; recognition of SPSS output.

How to report descriptive statistics?
See the following example.

Procedures in SPSS
Analyze > Descriptive Statistics > Frequencies...
Select the variable. Select the appropriate chart.
Click "Statistics". Check "Mean", "Median", "Mode". Check "Minimum", "Maximum", "Std. deviation", "Variance", "Range".
Click "Paste". Go to the syntax file. Highlight the appropriate section, and click Run (the ► button).

Result (example from the lect13_4 data set; also show the bar chart example)

Statistics: EXAMSCOR
N (Valid)                 20
N (Missing)               0
Mean                      79.2000
Median                    80.0000
Mode                      80.00
Std. Deviation            7.34560
Variance                  53.95789
Skewness                  .811
Std. Error of Skewness    .512
Kurtosis                  2.498
Std. Error of Kurtosis    .992
Range                     34.00
Minimum                   66.00
Maximum                   100.00

[Figure: histogram of EXAMSCOR (x-axis: EXAMSCOR, 65.0 to 100.0; y-axis: Frequency, 0 to 8). Std. Dev = 7.35, Mean = 79.2, N = 20.00]
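The range and variance formulas above can be checked by hand on the 10-score example from the Range section. The short Python sketch below does exactly that; the function name `describe_variability` and its `sample` flag are illustrative assumptions, not part of SPSS. With `sample=True` it divides by the number of scores minus 1 (the sample formula); with `sample=False` it divides by the number of scores (the population formula).

```python
import math


def describe_variability(scores, sample=True):
    """Return range, mean, variance, and standard deviation of a list of scores.

    sample=True uses the sample formula (divide by number of scores - 1);
    sample=False uses the population formula (divide by number of scores).
    """
    n = len(scores)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough scores for this formula")
    mean = sum(scores) / n
    # Numerator of both formulas: sum of (score - mean score)^2 over every score
    squared_deviations = sum((x - mean) ** 2 for x in scores)
    denominator = n - 1 if sample else n
    variance = squared_deviations / denominator
    return {
        "range": max(scores) - min(scores),  # highest score minus lowest score
        "mean": mean,
        "variance": variance,
        "sd": math.sqrt(variance),  # square root undoes the squaring
    }


scores = [5, 4, 5, 5, 4, 5, 6, 4, 1, 9]  # the 9-point-scale example
print(describe_variability(scores))                # sample formula
print(describe_variability(scores, sample=False))  # population formula
```

For these ten scores the sketch gives a range of 8 and a mean of 4.8; the sample variance is about 3.96 (SD about 1.99), and the population variance is about 3.56 (SD about 1.89). The two variances differ only in the denominator, which matters most for small data sets like this one.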
Report
In this group of 20 students, the mean score was 79.20 (standard deviation = 7.35). The range was 34, with the highest score = 100 and the lowest score = 66.

Describing Data with Pictures

There are several ways to describe data. One way is to use pictures. These techniques include frequency distributions, histograms, and bar graphs. We will focus on histograms and bar graphs.

Histograms

What are they?
A histogram shows in picture form how many times a given score appears in a data set. There are two main axes on the histogram. The horizontal axis (the X axis) is where the scores are represented. Typically the scores are grouped into score intervals. Each score interval is represented by one rectangular bar, and the midpoint of the score interval is highlighted. The rectangular bars touch each other because the data are continuous. The vertical axis (the Y axis) indicates the frequency with which those scores occur. A tall bar indicates a high frequency of occurrence, meaning that the score occurs many times. A short bar indicates a low frequency of occurrence, meaning that the score occurs few times.

Example
[Figure: histogram of "Raw score in course" (x-axis: score-interval midpoints, 67.5 to 90.0; y-axis: Frequency, 0 to 10). Std. Dev = 5.20, Mean = 84.1, N = 20.00]

Report
The raw scores of the 20 students ranged from 66.89 to 91.01, with a mean raw score of 84.1. A histogram depicting the continuum of scores reveals that the peak score interval is between 83.75 and 88.75. The distribution of the scores is negatively skewed, with most of the scores in the higher ranges.

Bar Charts

What are they?
A bar chart shows in picture form how many times a given score appears in a data set.
Bar charts are almost identical to histograms, except that they deal with categorical data rather than continuous data. The horizontal axis (the X axis) is where the categories are represented. Each category is represented by one bar. The vertical axis (the Y axis) indicates the frequency of membership in a category. A tall bar indicates a high frequency of membership, meaning that the category has many members. A short bar indicates a low frequency of membership, meaning that the category has few members.

Example
[Figure: bar chart of "Raw grade for course" (x-axis: grade categories D, C, C plus, B minus, B, B plus, A minus; y-axis: Frequency, 0 to 12)]

Report
The bar chart above shows the distribution of the raw grades earned by the students. One student earned a grade of A-; four students earned a grade of B+; ten students earned a grade of B; two students earned a grade of B-; one student earned a grade of C+; one student earned a grade of C; one student earned a grade of D.

Normal Distribution and Its Measures

What are they?
When a set of data is collected, the data form a distribution, meaning that each data point has a particular value along some dimension. Together the data points form a visual shape of the distribution, as depicted by the histogram and the bar chart. The shape of the distribution can be normal, looking like a bell. This means that most of the data points are clustered near one set of middle scores, and that the data points gradually and symmetrically decrease in frequency in both directions away from the middle.

Distributions can be non-normal. One way a distribution can be non-normal is by being skewed. Skew refers to the symmetry (or asymmetry) of a distribution's tails. If one of the distribution's tails is stretched out at one end and compressed at the other end, then the distribution is skewed.
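To make the idea of a stretched-out tail concrete, here is a minimal Python sketch of a skewness statistic and its standard error. It assumes the adjusted Fisher-Pearson coefficient (often written G1) and the usual standard-error formula, which we believe are what SPSS reports as "Skewness" and "Std. Error of Skewness"; the function names are hypothetical. As a check, the standard error for n = 20 comes out at about .512, the value in the EXAMSCOR output.

```python
import math


def skewness(scores):
    """Adjusted Fisher-Pearson skewness coefficient (G1).

    Assumption: this is the formula behind the "Skewness" line in SPSS output.
    Positive values mean a long right tail; negative values a long left tail.
    """
    n = len(scores)
    if n < 3:
        raise ValueError("skewness needs at least three scores")
    mean = sum(scores) / n
    # Sample standard deviation (divide by n - 1)
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))
    if sd == 0:
        raise ValueError("all scores are identical; skewness is undefined")
    # Sum of cubed standardized deviations: cubing keeps the sign,
    # so a long tail on one side dominates the total
    cubed = sum(((x - mean) / sd) ** 3 for x in scores)
    return n / ((n - 1) * (n - 2)) * cubed


def se_skewness(n):
    """Standard error of skewness for a sample of n scores."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


print(skewness([1, 2, 2, 3, 10]) > 0)  # True: tail stretched toward high values
print(skewness([1, 8, 9, 9, 10]) < 0)  # True: tail stretched toward low values
print(round(se_skewness(20), 3))       # 0.512
```

Cubing is the design choice that matters here: squared deviations (as in the variance) lose their sign, while cubed deviations keep it, so a symmetric distribution sums to about zero and a one-sided tail pushes the total positive or negative.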
Kurtosis is also used to check the normality of a data set (see the notes on skewness and kurtosis below).

Cf. the standard normal distribution.

Note on skewness
1) Skewness characterizes the degree of asymmetry of a distribution around its mean.
2) Positive skewness indicates a distribution with an asymmetric tail extending towards more positive values (right skewed).
3) Negative skewness indicates a distribution with an asymmetric tail extending towards more negative values (left skewed).
4) Most often, the median is used as the measure of central tendency when data sets are skewed.
5) Normal distributions have a skewness value of approximately zero.
6) Typically, the skewness value will range from -3 to +3.
7) As a rule of thumb, if skewness is more than +/-3 (more accurately, if the absolute value of skewness is more than twice the standard error of skewness [ses]), consider using the median rather than (or along with) the mean. But this is only a rule of thumb; the use of particular statistics is at the judgment of the researcher.
8) Example: sometimes the existence of super-rich people like Bill Gates makes the distribution highly positively skewed, which makes the median a better choice than the mean for describing the data.

Note on kurtosis
1) Kurtosis characterizes the relative peakedness or flatness of a distribution (i.e., how narrow or broad the distribution is) compared to the normal distribution.
2) Positive kurtosis indicates a relatively peaked distribution (or relatively heavier tails than the normal distribution).
3) Negative kurtosis indicates a relatively flat distribution (or relatively lighter tails than the normal distribution).
4) Normal distributions produce a kurtosis statistic of about 0. (In fact, the raw kurtosis of a normal distribution is 3, but kurtosis is conventionally standardized by subtracting 3.
So the standardized value for a normal distribution is 0.)
5) Kurtosis values of 2 standard errors of kurtosis (sek) or more, regardless of sign, probably differ from the normal distribution to a significant degree. More accurately: if the absolute value of kurtosis is more than twice the standard error of kurtosis [sek], the distribution probably differs significantly from normal.
6) Leptokurtic: very highly peaked (highly positive kurtosis score); platykurtic: somewhat flattened (negative kurtosis).

Cf. Show how kurtosis varies according to data manipulation (use sample normal data).

Example of Normal Distribution (lecture 14_2 dataset)

Statistics: VAR00001
N (Valid)                 569
N (Missing)               0
Mean                      4.9982
Std. Deviation            1.72849
Variance                  2.98767
Skewness                  .003
Std. Error of Skewness    .102
Kurtosis                  .001
Std. Error of Kurtosis    .204

[Figure: histogram of VAR00001 (x-axis: VAR00001, 1.0 to 9.0; y-axis: Frequency, 0 to 200). Std. Dev = 1.73, Mean = 5.0, N = 569.00]

Report
Analysis of the distribution shows that the data are almost normally distributed (skewness = .003; kurtosis = .001).

Example of Non-normal Distribution

Statistics: EXAMSCOR
N (Valid)                 142
Mean                      96.9366
Std. Deviation            5.49431
Variance                  30.18744
Skewness                  -3.108
Std. Error of Skewness    .203

[Figure: histogram of EXAMSCOR (x-axis: EXAMSCOR, 70.0 to 100.0; y-axis: Frequency, 0 to 80). Std. Dev = 5.49, Mean = 96.9, N = 142.00]

Report
Analysis of the distribution shows that the data are highly negatively skewed (skewness = -3.11). This suggests that the data distribution is non-normal.
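The "twice the standard error" rule of thumb from the notes on skewness and kurtosis can be sketched in Python as well. This sketch assumes the bias-adjusted excess-kurtosis formula (often written G2, which already subtracts the normal baseline so a normal distribution scores about 0) and the common standard-error formulas; we believe these match what SPSS reports, but the function names, including `departs_from_normal`, are our own. It then applies the rule to two statistics taken from the example outputs.

```python
import math


def kurtosis(scores):
    """Excess kurtosis (G2): about 0 for a normal distribution.

    Assumption: this bias-adjusted formula is what SPSS labels "Kurtosis".
    """
    n = len(scores)
    if n < 4:
        raise ValueError("kurtosis needs at least four scores")
    mean = sum(scores) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))
    if sd == 0:
        raise ValueError("all scores are identical; kurtosis is undefined")
    fourth = sum(((x - mean) / sd) ** 4 for x in scores)
    # First term measures tail weight; second term removes the normal baseline
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def se_skewness(n):
    """Standard error of skewness for a sample of n scores."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Standard error of kurtosis for a sample of n scores."""
    return 2 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


def departs_from_normal(statistic, standard_error):
    """Rule of thumb: |statistic| more than twice its standard error."""
    return abs(statistic) > 2 * standard_error


# Statistics copied from the SPSS example outputs
print(departs_from_normal(-3.108, se_skewness(142)))  # True: non-normal EXAMSCOR
print(departs_from_normal(0.001, se_kurtosis(569)))   # False: near-normal VAR00001
```

With n = 569 the standard error of kurtosis comes out at about .204, and with n = 20 at about .992, in line with the SPSS outputs. As the notes stress, this check is a rule of thumb; which statistics to report remains the researcher's judgment.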