Module B2 Session 13
SADC Course in Statistics
Learning Objectives
Students should be able to:
Construct a dot plot for a numeric variable split by a categorical variable
Apply EDA concepts to a large dataset
Explain the use of Excel's pivot tables and filters in the EDA process
Relate EDA to the principles of official statistics
[Figure: the data (5.3 5.4 6.0 .. 11.1 11.9) shown as a stem and leaf plot and as a stacked dot plot]
Even if automated, that is too many!
The essence of a stem and leaf plot is to look at all the possible values.
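As a rough illustration of what a stem and leaf plot displays, the sketch below builds one as text. It assumes values recorded to one decimal place; the sample values are hypothetical and are not the course data, and the function name `stem_and_leaf` is introduced here only for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return text lines of a stem and leaf plot for one-decimal values."""
    stems = defaultdict(list)
    for v in sorted(values):
        tenths = round(v * 10)               # e.g. 5.3 -> 53
        stems[tenths // 10].append(tenths % 10)
    lines = []
    # Empty stems are kept so that gaps in the data stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return lines

# Hypothetical values, invented for illustration (not the course data)
sample = [5.3, 5.4, 6.0, 6.2, 6.2, 7.1, 9.8, 11.1, 11.9]
for line in stem_and_leaf(sample):
    print(line)
```

Every individual value can be read back from the display, which is why it suits small datasets better than large ones.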
Some results
So what should be done before analysis?
First, look further at the data. Excel can help: it can drill down to examine individual records.
The concept: use the table to look for oddities, then examine them in more detail.
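The course does this with Excel's pivot tables. The same idea of tabulating first and then drilling down can be sketched in Python as follows. The records, column names and the threshold of 100 are all hypothetical, chosen only to show the two steps.

```python
from collections import Counter

# Hypothetical survey records, invented for illustration (not the course data)
records = [
    {"village": "V1", "enumerator": "E1", "pct": 20},
    {"village": "V1", "enumerator": "E1", "pct": 35},
    {"village": "V2", "enumerator": "E2", "pct": 50},
    {"village": "V3", "enumerator": "E3", "pct": 150},
    {"village": "V3", "enumerator": "E3", "pct": 120},
    {"village": "V2", "enumerator": "E2", "pct": 45},
]

def frequency_table(rows, column):
    """Count how often each value of a column occurs (a one-way table)."""
    return Counter(row[column] for row in rows)

def drill_down(rows, predicate):
    """Return the individual records behind an odd-looking cell."""
    return [row for row in rows if predicate(row)]

# Step 1: use the table to look for oddities.
table = frequency_table(records, "pct")
for value in sorted(table):
    print(value, table[value])

# Step 2: examine the odd records in more detail
# (assumption: a percentage above 100 counts as odd).
odd = drill_down(records, lambda row: row["pct"] > 100)
print(frequency_table(odd, "village"))
```

Tabulating the odd records by village or enumerator is what reveals whether they share a common source.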
Four of these values are from the same village, so they come from the same enumerator.
Results
Did enumerators have different interpretations of the precision required in the percentages?
This needs further exploration, and the analysis needs to take account of it.
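One possible way to explore this question is to summarise, for each enumerator, how many recorded percentages are multiples of 10, multiples of 5, or neither. The sketch below assumes whole-number percentages; the data and the function name `rounding_profile` are hypothetical.

```python
from collections import defaultdict

def rounding_profile(rows):
    """For each enumerator, count percentages by apparent rounding level.

    rows is an iterable of (enumerator, percentage) pairs with
    whole-number percentages (an assumption of this sketch).
    """
    counts = defaultdict(lambda: {"tens": 0, "fives": 0, "other": 0})
    for enumerator, pct in rows:
        if pct % 10 == 0:
            key = "tens"
        elif pct % 5 == 0:
            key = "fives"
        else:
            key = "other"
        counts[enumerator][key] += 1
    return {enumerator: dict(c) for enumerator, c in counts.items()}

# Hypothetical data, invented for illustration
rows = [("E1", 20), ("E1", 50), ("E1", 70), ("E2", 23), ("E2", 35), ("E2", 41)]
for enumerator, profile in sorted(rounding_profile(rows).items()):
    print(enumerator, profile)
```

An enumerator whose values are all multiples of 10 has probably recorded to a coarser precision than one whose values are not.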
Swaziland data
Apply the concepts, checking factors as well as numeric columns.
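As one example of checking a factor column, the sketch below looks for levels that differ only by letter case or surrounding spaces, a common sign of inconsistent data entry. This particular check is an assumption about what might be worth looking for and is not taken from the course; the level names are hypothetical.

```python
from collections import defaultdict

def near_duplicate_levels(values):
    """Group factor levels that differ only by case or surrounding spaces."""
    groups = defaultdict(set)
    for v in values:
        groups[v.strip().lower()].add(v)
    # Keep only the groups where more than one spelling was found.
    return {key: sorted(variants)
            for key, variants in groups.items() if len(variants) > 1}

# Hypothetical factor column, invented for illustration
crops = ["Maize", "maize ", "Cotton", "Cotton", "Maize"]
print(near_duplicate_levels(crops))
```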
Exploratory graphics
These need to help both the analyst and the data checker. See the dot plots on the next slide.
Outliers (typing errors) are clear, but only because of the second variable. They are not outliers overall.
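To illustrate why splitting by a second variable matters, the sketch below draws a crude text dot plot with one row per group. Each position shows the number of observations at that integer value, with "." for none. The groups and values are hypothetical, and the function name `dot_plot_by_group` is introduced here only for illustration.

```python
from collections import Counter, defaultdict

def dot_plot_by_group(pairs, lo, hi):
    """Text dot plot: one row per group, one column per integer value lo..hi.

    Each column shows the count at that value (capped at 9), or '.' if empty.
    """
    groups = defaultdict(Counter)
    for group, value in pairs:
        groups[group][value] += 1
    lines = []
    for group in sorted(groups):
        row = "".join(
            str(min(groups[group][v], 9)) if groups[group][v] else "."
            for v in range(lo, hi + 1)
        )
        lines.append(f"{group:<8}|{row}|")
    return lines

# Hypothetical data: the maize value of 9 lies inside the overall range,
# but well outside the range of the other maize values.
pairs = [("maize", 1), ("maize", 2), ("maize", 2), ("maize", 3), ("maize", 9),
         ("cotton", 7), ("cotton", 8), ("cotton", 8), ("cotton", 9), ("cotton", 10)]
for line in dot_plot_by_group(pairs, 1, 10):
    print(line)
```

Plotted as a single row, the value 9 would sit among the cotton values and attract no attention. It only stands out once the rows are separated by group.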
Missing also
Odd crop areas were ALL associated with odd codes for the column PRESENCE.
It was found to be a data transfer problem, with one byte missing in these records.
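A finding like this typically comes from cross-tabulating one oddity against another. The sketch below shows the idea. The set of valid PRESENCE codes, the area threshold and the records are all assumptions made for illustration, since the slides do not give the real code list or values.

```python
from collections import Counter

VALID_PRESENCE = {1, 2}       # assumption: the real code list is not given
MAX_PLAUSIBLE_AREA = 50.0     # assumption: threshold for an "odd" crop area

def cross_check(rows):
    """Two-way count of (odd area?, odd PRESENCE code?) over the records."""
    table = Counter()
    for row in rows:
        odd_area = row["area"] > MAX_PLAUSIBLE_AREA
        odd_code = row["PRESENCE"] not in VALID_PRESENCE
        table[(odd_area, odd_code)] += 1
    return table

# Hypothetical records, invented for illustration
rows = [
    {"area": 1.5, "PRESENCE": 1},
    {"area": 2.0, "PRESENCE": 2},
    {"area": 310.0, "PRESENCE": 7},
    {"area": 0.8, "PRESENCE": 1},
    {"area": 525.0, "PRESENCE": 9},
]
table = cross_check(rows)
for key in sorted(table):
    print(key, table[key])
```

If every odd area falls in the cell with an odd code, as it does in this made-up example, the two problems probably share a single cause.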
Now you can organise the data for analysis, and then do an exploratory analysis.
We show next how the analysis is easy IF your objectives are clear.