Data analysis is the process of inspecting, cleaning, transforming and modeling data with the
goal of discovering useful information, suggesting conclusions and supporting decision
making (Mugenda, 1999). Data can be analyzed using different approaches, such as descriptive and
inferential analysis. The distinction between the two rests mainly on how they treat data from a
population. A descriptive statistic is a function of the sample data that describes some feature
of the data. Classic descriptive statistics include the mean, minimum, maximum, standard
deviation, median, skew, and kurtosis. An inferential statistic is a function of the sample data
that helps you draw an inference regarding a hypothesis about a population parameter. Classic
inferential statistics include z, t, χ², and the F-ratio.
In a nutshell, descriptive statistics describe a large body of data with summary charts and
tables, but do not attempt to draw conclusions about the population from which the sample was
taken. You are simply summarizing the data you have with charts and graphs, much like telling
someone the key points of a book (an executive summary) as opposed to handing them a thick book
(the raw data) (Mugenda, 1999). Conversely, with inferential statistics, you are testing a
hypothesis and drawing conclusions about a population based on your sample. In this case, you
will run into concepts such as ANOVA, the t-test, chi-square, confidence intervals and
regression, but we'll save those for another day.
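The contrast between the two families can be illustrated with a short Python sketch. The scores and the reference value `mu0` below are hypothetical and chosen only for illustration. The first part summarizes the sample (descriptive), and the second computes a one-sample t statistic that could be used to test a hypothesis about the population mean (inferential).

```python
import math
import statistics

# Hypothetical sample of test scores (illustrative values, not from the text).
scores = [34, 41, 44, 47, 47, 52, 55, 58, 63, 69]

# Descriptive: summarize the sample itself.
summary = {
    "n": len(scores),
    "min": min(scores),
    "max": max(scores),
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "stdev": statistics.stdev(scores),  # sample standard deviation (n - 1)
}
for name, value in summary.items():
    print(f"{name}: {value}")

# Inferential: one-sample t statistic against an assumed population mean.
mu0 = 45  # hypothetical value stated by the null hypothesis
standard_error = summary["stdev"] / math.sqrt(summary["n"])
t_statistic = (summary["mean"] - mu0) / standard_error
print(f"t = {t_statistic:.3f} with {summary['n'] - 1} degrees of freedom")
```

The t statistic would then be compared with a critical value from the t distribution to decide whether to reject the null hypothesis.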
Descriptive Statistics
Descriptive data analysis gives information that describes the data obtained from the population
under study. Researchers use descriptive statistics to summarize data in a way that describes it
easily, using only a few indices or statistics. The type of indices used depends on the type of
variables in the study and the scale of measurement used. Descriptive techniques are used in the
analysis of variables such as age, race and social class, and for academic purposes such as
grading students' examinations. Descriptive statistics are divided into three groups: measures of
central tendency, measures of variability and the frequency distribution.
Measures of central tendency
They are used in the social sciences to give summary statistics of the variables being studied
(Bird, 1995). A researcher obtains a single score that represents the general magnitude of scores
in a distribution by providing information about the score at or near the middle of the
distribution (Bird, 1995). Measures of central tendency include the mean, the mode and the median.
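Using measures of central tendency to describe a frequency distribution of literature test scores:

| Score (x) | Real class limits (R.C.L) | Midpoint (Xm) | F | Cfb | XmF |
|---|---|---|---|---|---|
| 60-69 | 59.5-69.5 | 64.5 | 5 | 47 | 322.5 |
| 50-59 | 49.5-59.5 | 54.5 | 15 | 42 | 817.5 |
| 40-49 | 39.5-49.5 | 44.5 | 21 | 27 | 934.5 |
| 30-39 | 29.5-39.5 | 34.5 | 6 | 6 | 207 |
| Total | | | ΣF = 47 | | ΣXmF = 2281.50 |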
The researcher has to compute the measures of central tendency in order to describe the data and
also compare scores.
1. Mode: the most frequent score in a data set.
The highest frequency in the given distribution is 21, in the class 40-49.
$$\text{Midpoint} = \frac{40 + 49}{2} = 44.5$$
Mode = 44.5
The mode is best used by a researcher when analyzing qualitative data because it shows
the score that is most likely to occur.
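As a minimal sketch of the mode calculation for grouped data, the following Python snippet finds the class with the highest frequency and takes its midpoint. The class limits and frequencies come from the frequency distribution given earlier. The variable names are illustrative.

```python
# Each class is (lowest score, highest score, frequency).
classes = [(60, 69, 5), (50, 59, 15), (40, 49, 21), (30, 39, 6)]

# The modal class is the one with the highest frequency.
modal_class = max(classes, key=lambda c: c[2])
lower, upper, frequency = modal_class

# For grouped data, the mode is taken here as the midpoint of the modal class.
mode = (lower + upper) / 2
print(f"Modal class {lower}-{upper} (f = {frequency}), mode = {mode}")
```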
2. Median: the score that divides ranked data into two equal parts.
$$\text{Median} = L + \left(\frac{\frac{N}{2} - cf_b}{f_n}\right) cl$$
Where
N = total number of scores
cf_b = cumulative frequency below the class containing the median score
f_n = frequency within the class containing the median score
cl = class interval
L = lower real limit of the class containing the median score
In the given literature test frequency distribution:
$$
\begin{aligned}
\text{Median} &= 39.5 + \left(\frac{\frac{47}{2} - 6}{21}\right) 10 \\
&= 39.5 + \left(\frac{23.5 - 6}{21}\right) 10 \\
&= 39.5 + \left(\frac{17.5}{21}\right) 10 \\
&= 39.5 + (0.833)\,10 \\
&= 39.5 + 8.33 \\
&= 47.83
\end{aligned}
$$
The median is best used when the data contain outliers (scores that are unusually small or
unusually large). For example, if the data set had included a score of 2, the median would not
have been greatly affected.
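The grouped-median formula can be sketched in Python as follows. The function name `grouped_median` and its argument layout are illustrative choices. The lower real limits and frequencies are those of the literature test distribution.

```python
def grouped_median(classes, width):
    """Median of grouped data.

    classes: list of (lower_real_limit, frequency), sorted from lowest class up.
    width:   class interval (cl).
    """
    n = sum(freq for _, freq in classes)
    cum_below = 0  # cumulative frequency below the current class (cf_b)
    for lower_real_limit, freq in classes:
        # The median class is the first one whose cumulative frequency reaches N/2.
        if cum_below + freq >= n / 2:
            return lower_real_limit + ((n / 2 - cum_below) / freq) * width
        cum_below += freq
    raise ValueError("the distribution contains no scores")


classes = [(29.5, 6), (39.5, 21), (49.5, 15), (59.5, 5)]
median = grouped_median(classes, width=10)
print(f"Median = {median:.2f}")
```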
3. Mean: the arithmetic average of all the scores in a data set.
$$\bar{X} = \frac{\sum X_m F}{\sum F} = \frac{2281.50}{47} = 48.54$$
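The researcher can then describe the shape of the distribution of the scores by comparing the
three measures:
Mean = 48.54
Median = 47.83
Mode = 44.5
Because the mean exceeds the median, which in turn exceeds the mode, the scores appear to be
positively skewed.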
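The grouped mean, and the comparison of the three measures, can be checked with a few lines of Python. The midpoints and frequencies are taken from the frequency distribution, and the median and mode are the values computed in the text. Reading the ordering of the three measures as a hint about skew is a common rule of thumb rather than a formal test.

```python
# (class midpoint Xm, frequency F) for each class in the distribution.
rows = [(64.5, 5), (54.5, 15), (44.5, 21), (34.5, 6)]

total_f = sum(f for _, f in rows)        # sum of F
total_xf = sum(x * f for x, f in rows)   # sum of Xm * F
mean = total_xf / total_f
print(f"Sum F = {total_f}, Sum XmF = {total_xf}, mean = {mean:.2f}")

# Values reported in the text for the same distribution.
median, mode = 47.83, 44.5

if mean > median > mode:
    shape = "positively skewed"
elif mean < median < mode:
    shape = "negatively skewed"
else:
    shape = "roughly symmetric or not clear-cut"
print(f"Shape suggested by the three measures: {shape}")
```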
The mean of the dependent variable differs significantly among the levels of program type.
However, we do not know if the difference is between only two of the levels or all three of the
levels. (The F test for the Model is the same as the F test for prog because prog was the only
variable entered into the model. If other variables had also been entered, the F test for the
Model would have been different from prog.) To see the mean of write for each level of
program type, use the command means tables = write by prog.
From this we can see that the students in the academic program have the highest mean
writing score, while students in the vocational program have the lowest.
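The group means and the F-ratio discussed above can be sketched in plain Python. The scores below are hypothetical, as is the label "general" for the third program type. Only the logic (group means, then between-group variance divided by within-group variance) reflects the technique described.

```python
from statistics import mean

# Hypothetical writing scores by program type (illustrative values only).
groups = {
    "general": [48, 52, 50, 54],
    "academic": [58, 60, 55, 59],
    "vocational": [44, 47, 45, 48],
}


def one_way_anova_f(samples):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for s in samples for v in s]
    grand_mean = mean(all_values)
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    ss_within = sum((v - mean(s)) ** 2 for s in samples for v in s)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


group_means = {name: mean(values) for name, values in groups.items()}
f_ratio, df_between, df_within = one_way_anova_f(list(groups.values()))

for name, value in group_means.items():
    print(f"{name}: mean = {value}")
print(f"F({df_between}, {df_within}) = {f_ratio:.3f}")
```

A significant F only says that at least one group mean differs. Identifying which pairs differ requires follow-up comparisons.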
Analysis of Covariance
Analysis of covariance (ANCOVA) is a statistical technique that blends analysis of variance and
linear regression analysis. It is a more sophisticated method of testing the significance of
differences among group means because it adjusts scores on the dependent variable to remove
the effect of one or more covariates. The inclusion of covariates can also increase statistical
power because it accounts for some of the variability (Rossman, 2014).
Assumptions
The model assumes that the data in the two groups are well described by straight lines that have
the same slope. An example of ANCOVA is a pretest-posttest randomized experimental design,
in which pretest scores are statistically controlled. In this case, the dependent variable is the
posttest scores, the independent variable is the experimental/comparison group status, and the
covariate is the pretest scores (Rossman, 2014).
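The adjustment step in a pretest-posttest design can be sketched as follows. The data are hypothetical, and the sketch covers only the computation of covariate-adjusted group means using the pooled within-group slope (the common slope the model assumes). It does not compute the ANCOVA F-ratio.

```python
from statistics import mean

# Hypothetical (pretest, posttest) scores for each group.
data = {
    "treatment": [(10, 22), (12, 24), (14, 27), (16, 29)],
    "comparison": [(12, 20), (14, 23), (16, 24), (18, 27)],
}

# Pooled within-group regression slope of posttest on pretest.
sum_xy = 0.0
sum_xx = 0.0
for pairs in data.values():
    x_bar = mean(x for x, _ in pairs)
    y_bar = mean(y for _, y in pairs)
    sum_xy += sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    sum_xx += sum((x - x_bar) ** 2 for x, _ in pairs)
slope = sum_xy / sum_xx

# Adjust each group's posttest mean to the grand pretest mean.
grand_x_bar = mean(x for pairs in data.values() for x, _ in pairs)
adjusted = {}
for name, pairs in data.items():
    x_bar = mean(x for x, _ in pairs)
    y_bar = mean(y for _, y in pairs)
    adjusted[name] = y_bar - slope * (x_bar - grand_x_bar)

print(f"Pooled within-group slope: {slope:.2f}")
for name, value in adjusted.items():
    print(f"{name}: adjusted posttest mean = {value:.2f}")
```

In this made-up data the treatment group started with lower pretest scores, so the adjustment widens the gap between the two posttest means.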
With ANCOVA, the F-ratio statistic is used to determine the statistical significance (p < .05) of
differences among group means. When reporting the results of an analysis of covariance (ANCOVA),
a researcher needs to include the following information: verification of parametric assumptions;
verification that the covariate(s) were measured before treatment; verification of the
reliability of the covariate(s); verification that the covariates are not too strongly correlated
with one another; verification of linearity; verification of homogeneity of regression slopes;
the dependent variable scores; the independent variable and its levels; the covariate(s); and the
statistical data: significance, F-ratio scores, probability, means, and effect size (partial eta
squared) (Rossman, 2014).
(also known as a predictor) or both. Upper-level undergraduate courses and graduate courses in
statistics teach multivariate statistical analysis. This type of analysis is desirable because
researchers often hypothesize that a given outcome of interest is affected or influenced by more
than one thing (Huberman, 1994).
Example of analysis using Multivariate Statistics
At the end of the course, students were asked to rate each different assessment strategy in terms
of difficulty, appropriateness to their needs while learning multivariate statistics, and how well
they felt they learned using that kind of assessment. Students were also asked to rank the four
assessment strategies in order of preference (1 = most preferred, 4 = least preferred). The survey
was anonymous and was administered during the last class of the semester by a student volunteer
while the instructor was not present. The purpose was explained as an opportunity for the
instructor to understand how to better structure course assessments in the future. Upon agreeing
to participate, the students were given the survey. A four-point scale was used with response
categories as indicated in the table below.
The results presented are based on 14 students who completed the course assessment (three
students were absent or chose not to participate). Students were also asked for additional
comments regarding their likes and dislikes about each assessment strategy, suggestions for how
to improve the use of that strategy, and any other observations or comments they might have. In
terms of establishing a course grade, the two open ended assignments (midterm and final) were
weighted 25% each; the scores on the other assignments (five altogether) were averaged and
worth 50% of the total. The results for each assessment strategy are provided in the table that
follows. The modal response for each technique is indicated in bold print. Other interesting
results to be discussed below are presented in bold print as well.
Table 1. Student Assessment Ratings

Difficulty ratings: frequency (and percent).

| Response Categories | Structured Computer Assignments | Open-Ended Assignments | Article Analysis | Annotating Output |
|---|---|---|---|---|
| 1. | 1 (7.1%) | 0 (0.0%) | 1 (7.1%) | 0 (0.0%) |
| 2. | 5 (35.7%) | 2 (14.3%) | 4 (28.6%) | 8 (57.1%) |
| 3. Difficult but challenging | 7 (50.0%) | 12 (85.7%) | 7 (50.0%) | 6 (42.9%) |
| 4. Too difficult | 1 (7.1%) | 0 (0.0%) | 2 (14.3%) | 0 (0.0%) |

Appropriateness ratings.

| Response Categories | Structured Computer Assignments | Open-Ended Assignments | Article Analysis | Annotating Output |
|---|---|---|---|---|
| 1. | 0 (0.0%) | 0 (0.0%) | 1 (7.1%) | 0 (0.0%) |
| 2. | 2 (14.3%) | 1 (7.1%) | 5 (35.7%) | 5 (35.7%) |
| 3. | 11 (78.6%) | 7 (50.0%) | 5 (35.7%) | 9 (64.3%) |
| 4. | 1 (7.1%) | 6 (42.9%) | 3 (21.4%) | 0 (0.0%) |

Learning ratings.

| Response Categories | Structured Computer Assignments | Open-Ended Assignments | Article Analysis* | Annotating Output |
|---|---|---|---|---|
| 1. | 0 (0.0%) | 0 (0.0%) | 1 (7.1%) | 0 (0.0%) |
| 2. | 3 (21.4%) | 1 (7.1%) | 6 (42.9%) | 4 (28.6%) |
| 3. Learned enough to be comfortable with the topic | 8 (57.1%) | 7 (50.0%) | 4 (28.6%) | 8 (57.1%) |
| 4. | 3 (21.4%) | 6 (42.9%) | 2 (14.3%) | 2 (14.3%) |

* One student did not answer this question for the article analysis.
Preference Scores
Students were asked to indicate their order of preference among the four assessment strategies,
with 1 = most preferred form of assessment, and 4 = least preferred. Average preference ratings
indicated that the most preferred forms of assessment were the structured computer assignments
(mean = 2.00), followed by annotating the output (mean = 2.31), use of open-ended assignments
(mean = 2.38), and article analysis (mean = 3.15). Yet in terms of difficulty level,
appropriateness, and learning ratings, the open-ended assignments actually received the better
ratings overall. Approximately 86% of the students found the open-ended assignments
challenging (response category 3); 93% found the open-ended assignments appropriate for many
or all of their needs (response categories 3 and 4); and 93% reported that through the open-ended
assignments they learned either enough to be comfortable with the topic or more than they would
have thought (response categories 3 and 4).
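The percentages quoted for the open-ended assignments can be reproduced from the frequencies in Table 1. The sketch below assumes the counts listed for the open-ended assignments in each rating table (response categories 1 to 4, 14 respondents). The dictionary keys are illustrative labels.

```python
# Open-ended assignment counts for response categories 1-4, read from Table 1.
open_ended = {
    "difficulty": [0, 2, 12, 0],
    "appropriateness": [0, 1, 7, 6],
    "learning": [0, 1, 7, 6],
}


def percent(count, total):
    """Percentage of respondents, rounded to the nearest whole number."""
    return round(100 * count / total)


n_students = sum(open_ended["difficulty"])

# Category 3 only for difficulty; categories 3 and 4 combined for the others.
challenging = percent(open_ended["difficulty"][2], n_students)
appropriate = percent(sum(open_ended["appropriateness"][2:]), n_students)
learned = percent(sum(open_ended["learning"][2:]), n_students)

print(f"Challenging: {challenging}%")
print(f"Appropriate for many or all needs: {appropriate}%")
print(f"Learned enough or more: {learned}%")
```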
Chi-Square
The Chi-Square statistic is commonly used for testing relationships between categorical
variables. The null hypothesis is that no relationship exists between these categorical variables
in the population; they are independent. The Chi-Square statistic is most commonly used to
evaluate Tests of
Independence when using a cross tabulation (also known as a bivariate table). Cross tabulation
presents the distributions of two categorical variables simultaneously, with the intersections of
the categories of the variables appearing in the cells of the table (Peshkin, 1992). The Test of
Independence assesses whether an association exists between the two variables by carefully
examining the pattern of responses in the cells; calculating the Chi-Square statistic and
comparing it against a critical value from the Chi-Square distribution allows the researcher to
assess whether the association seen between the variables in a particular sample is likely to
represent an actual relationship between those variables in the population.
The calculation of the Chi-Square statistic is quite straight-forward and intuitive:
the first row labeled Pearson Chi-Square. This statistic can be evaluated by comparing the actual
value against a critical value found in a Chi-Square distribution (where degrees of freedom is
calculated as (# of rows − 1) × (# of columns − 1)), but it is easier to simply examine the p-value
provided by SPSS. To make a conclusion about the hypothesis with 95% confidence, the value
labeled Asymp. Sig. (which is the p-value of the Chi-Square statistic) should be less than .05
(which is the alpha level associated with a 95% confidence level). Is the p-value (labeled Asymp.
Sig.) < .05? If so, conclude that the variables are dependent in the population and that there is a
statistical relationship between the categorical variables.
In this example, there is an association between fundamentalism and views on teaching sex
education in public schools. While 17.2% of fundamentalists oppose teaching sex education,
only 6.5% of liberals are opposed. The p-value indicates that these variables are dependent in the
population and that there is a statistical relationship between the categorical variables.
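A minimal sketch of the Test of Independence is given below. It uses the standard Pearson statistic, which sums (observed − expected)² / expected over all cells, with each expected count equal to row total × column total / N. The cell counts are hypothetical and are not the survey data discussed above. The p-value shortcut shown applies only when there is 1 degree of freedom.

```python
import math


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a cross tabulation."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical 2 x 2 cross tabulation: rows are two groups,
# columns are "oppose" and "favor".
observed = [
    [30, 70],
    [10, 90],
]
chi2, df = chi_square_independence(observed)

# With 1 degree of freedom the p-value can be computed from the
# complementary error function; other df values need a chi-square table.
p_value = math.erfc(math.sqrt(chi2 / 2)) if df == 1 else None

print(f"Chi-square = {chi2:.2f}, df = {df}, p = {p_value}")
if p_value is not None and p_value < 0.05:
    print("Reject the null hypothesis of independence at the .05 level.")
```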
Data Analysis through Computer Software Programmes
A multitude of software programs designed for use with quantitative data are available today.
Quantitative research, predominantly statistical analysis, is still common in the social sciences,
and such software is frequently used by social science researchers (Peshkin, 1992). The main
demands made of such packages in social science research are that they be comprehensive,
flexible, and usable with almost any type of file. A useful statistical software tool can generate
tabulated reports, charts, and plots of distributions and trends, as well as descriptive
statistics and more complex statistical analyses. Lastly, a user interface that is easy and
intuitive for users at all levels is a must.
Conclusion
In conclusion, data analysis is the process of systematically applying statistical and/or logical
techniques to evaluate data. There are many well-developed methods available for conceptually
or statistically analyzing the different kinds of data that can be gathered. Frequency data and
chi-square analysis can supplement the narrative interpretation of such comments. For the
analysis of quantitative data, a variety of statistical tests are available, ranging from the
simple (t-tests) to the more complex. Researchers performing either quantitative or qualitative
analyses should be aware of challenges to reliability and validity.
References
Bird, K. (Eds.). (1995). Computer-aided qualitative data analysis: Theory, methods and practice.
Sage.
Huberman, A. M. (1994). Qualitative data analysis. Thousand Oaks, CA: Sage Publications.
Mugenda, O. M. (1999). Research methods: Quantitative and qualitative approaches. African
Centre for Technology Studies.
Peshkin, A. (1992). Becoming qualitative researchers: An introduction (p. 6). White Plains,
NY: Longman.
Rossman, G. B. (2014). Designing qualitative research. Sage Publications.