
Introduction

Analyzing data is a process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision making (Mugenda, 1999). Data can be analyzed using different approaches, such as descriptive and inferential analysis. The distinction between descriptive and inferential data analysis lies mainly in how each treats data drawn from a population. A descriptive statistic is a function of the sample data that describes some feature of the data; classic descriptive statistics include the mean, minimum, maximum, standard deviation, median, skew, and kurtosis. An inferential statistic is a function of the sample data that assists you in drawing an inference regarding a hypothesis about a population parameter; classic inferential statistics include z, t, chi-square, the F-ratio, and so on. When it comes to statistical analysis, then, there are two classifications: descriptive statistics and inferential statistics. In a nutshell, descriptive statistics intend to describe a large body of data with summary charts and tables, but do not attempt to draw conclusions about the population from which the sample was taken. You are simply summarizing the data you have with charts and graphs, rather like telling someone the key points of a book (an executive summary) as opposed to handing them the thick book itself (the raw data) (Mugenda, 1999). Conversely, with inferential statistics, you are testing a hypothesis and drawing conclusions about a population based on your sample. Here you run into concepts such as ANOVA, the t-test, chi-square, confidence intervals, and regression, several of which are taken up below.
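As a quick illustration of the distinction, consider a small, invented sample of exam scores: summarizing the sample in hand is descriptive, while testing a claim about the population mean is inferential. The following is only a sketch using made-up data, not an analysis from the text.

```python
# An invented sample of eight exam scores, used only to illustrate the
# descriptive/inferential distinction described above.
import numpy as np
from scipy import stats

scores = np.array([62, 55, 48, 71, 66, 58, 49, 53])

# Descriptive: summarize the sample in hand
print(f"mean = {scores.mean():.1f}, sd = {scores.std(ddof=1):.1f}, "
      f"min = {scores.min()}, max = {scores.max()}")

# Inferential: test a hypothesis about the population (is its mean 50?)
t, p = stats.ttest_1samp(scores, popmean=50)
print(f"t = {t:.2f}, p = {p:.3f}")
```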

Descriptive Statistics
Descriptive data analysis provides information that describes the data obtained from the population designated for data collection. Researchers use descriptive statistical techniques to summarize data in a way that describes it easily, using only a few indices or statistics. The type of index used depends on the type of variables in the study and the scale of measurement used. Descriptive techniques are used in the analysis of age, race, and social class, and even for academic purposes such as grading students' examinations. Descriptive statistics divide into three areas: measures of central tendency, measures of variability, and the frequency distribution.
Measures of central tendency
These are used in the social sciences to give expected summary statistics for the variables being studied (Bird, 1995). A researcher obtains a single score that represents the general magnitude of the scores in a distribution by providing information about the score at or near the middle of the distribution (Bird, 1995). Measures of central tendency include the mean, mode, and median. A researcher uses measures of central tendency to:
- describe grouped data;
- compare two or more sets of data.

Example of analysis using Measures of Central Tendency

Suppose a researcher has obtained the following distribution, representing the scores obtained by 47 form three students in a literature test (R.C.L = real class limits; Xm = class midpoint; F = frequency; Cf = cumulative frequency):

Score (x)   R.C.L        Midpoint (Xm)   F    Cf    Xm·F
60-69       59.5-69.5    64.5            5    47    322.5
50-59       49.5-59.5    54.5            15   42    817.5
40-49       39.5-49.5    44.5            21   27    934.5
30-39       29.5-39.5    34.5            6    6     207.0
                         Totals:         47         2281.50

The researcher computes the measures of central tendency in order to describe the data and to compare scores.
1. Mode: the most frequent score in a data set. The highest frequency in the given distribution is 21, which falls in the class 40-49.
Midpoint = (40 + 49) / 2 = 44.5
Mode = 44.5
The mode is best used by a researcher when analyzing qualitative data, because it shows the score that is most likely to occur.
2. Median: the score that divides ranked data into two equal parts.

Median = L + ((N/2 − Cfb) / Fn) × Cl

where:
N = total number of scores
Cfb = cumulative frequency below the class containing the median score
Fn = frequency within the class containing the median score
Cl = class interval
L = lower real limit of the class containing the median score
In the given literature test frequency distribution:

Median = 39.5 + ((47/2 − 6) / 21) × 10
       = 39.5 + ((23.5 − 6) / 21) × 10
       = 39.5 + (17.5 / 21) × 10
       = 39.5 + 0.833 × 10
       = 39.5 + 8.33
       = 47.83

The median is best used when a researcher has outliers (scores that are either unusually small or unusually large in a data set). For example, if the data had included a score as low as 2, the median would not have been greatly affected.
3. Mean: the arithmetic average of all the scores in a data set.

Mean = ΣXmF / ΣF = 2281.50 / 47 = 48.54

The researcher then describes the shape of the distribution of the scores by comparing the three measures:

Mean = 48.54
Median = 47.83
Mode = 44.5

Mean > Median > Mode
48.54 > 47.83 > 44.5

This is a positively skewed distribution: the mean is the largest measure of central tendency while the mode is the least. In this case the majority of the students got low marks while few students got high marks. Since the distribution is skewed, the median is the best measure of central tendency to use; moreover, the data were grouped into class intervals, for which the median is most preferred (Bird, 1995).
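The grouped-data arithmetic above is easy to check in software. Below is a minimal Python sketch with the class limits and frequencies hard-coded from the table; the class width (Cl) is 10 throughout.

```python
# A minimal sketch of the grouped-data mean, median, and (crude) mode
# for the literature test scores, using the table above.

lower = [30, 40, 50, 60]      # lower class limits, low to high
upper = [39, 49, 59, 69]      # upper class limits
freq  = [6, 21, 15, 5]        # frequencies (F)

n = sum(freq)                                          # N = 47
mid = [(lo + hi) / 2 for lo, hi in zip(lower, upper)]  # class midpoints (Xm)

# Mean = sum(Xm * F) / sum(F)
mean = sum(m * f for m, f in zip(mid, freq)) / n

# Crude mode: midpoint of the modal (highest-frequency) class
mode = mid[freq.index(max(freq))]

# Median = L + ((N/2 - Cfb) / Fn) * Cl, with L the lower real limit
cum = 0
for lo, f in zip(lower, freq):
    if cum + f >= n / 2:          # this class contains the N/2-th score
        median = (lo - 0.5) + ((n / 2 - cum) / f) * 10
        break
    cum += f

print(f"mean = {mean:.2f}, median = {median:.2f}, mode = {mode}")
# -> mean = 48.54, median = 47.83, mode = 44.5
```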
Analysis of Variance (ANOVA)
Analysis of variance (ANOVA) is used when you have a categorical independent variable (with two or more categories) and a normally distributed interval dependent variable, and you wish to test for differences in the means of the dependent variable broken down by the levels of the independent variable.

Example of analysis using Analysis of Variance (ANOVA)

For example, using the hsb2 data file, say we wish to test whether the mean of write differs between the three program types (prog). The SPSS command for this test would be: oneway write by prog.

The mean of the dependent variable differs significantly among the levels of program type. However, we do not know if the difference is between only two of the levels or all three of the levels. (The F test for the Model is the same as the F test for prog because prog was the only variable entered into the model. If other variables had also been entered, the F test for the Model would have been different from that for prog.) To see the mean of write for each level of program type: means tables = write by prog.

From this we can see that the students in the academic program have the highest mean writing score, while students in the vocational program have the lowest.
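For readers working outside SPSS, a rough equivalent of this one-way ANOVA can be sketched in Python. This assumes the hsb2 data have been exported to a CSV file with write and prog columns; the file name hsb2.csv is illustrative.

```python
# A sketch of the same one-way ANOVA in Python rather than SPSS.
import pandas as pd
from scipy import stats

hsb2 = pd.read_csv("hsb2.csv")   # illustrative file name

# One array of writing scores per program type
groups = [g["write"].to_numpy() for _, g in hsb2.groupby("prog")]

f_stat, p_value = stats.f_oneway(*groups)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# Group means, analogous to SPSS "means tables = write by prog"
print(hsb2.groupby("prog")["write"].mean())
```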

Analysis of Covariance
Analysis of covariance (ANCOVA) is a statistical technique that blends analysis of variance and linear regression analysis. It is a more sophisticated method of testing the significance of differences among group means because it adjusts scores on the dependent variable to remove the effect of confounding variables. ANCOVA is based on the inclusion of additional variables (known as covariates) in the model that may be influencing scores on the dependent variable. (Covariance simply means the degree to which two variables vary together; the dependent variable covaries with other variables.) This lets the researcher account for inter-group variation associated not with the "treatment" itself but with extraneous factors acting on the dependent variable, the covariate(s) (Bird, 1995).
ANCOVA can control for one or more covariates at the same time. In order to identify possible covariates accurately, one needs sufficient background knowledge of theory and research in the topic area. Ideally, there should be only a small number of covariates. Covariates need to be chosen carefully and should have the following qualities: continuous (at interval or ratio level, such as anxiety scores) or dichotomous (such as male/female); reliable measurement; significant correlation with the dependent variable; a linear relationship with the dependent variable; not highly correlated with one another (they should not overlap in influence); and the same relationship with the dependent variable for each of the groups (homogeneity of regression slopes) (Bird, 1995).
Each covariate should contribute uniquely to the variance. The covariate must be measured before the intervention is performed. Correct analysis requires that the covariate not be influenced by the treatment; it must therefore be measured prior to treatment. The independent variable is a categorical (nominal-level) variable. ANCOVA tests whether certain factors have an effect on the outcome variable after removing the covariate effects.
ANCOVA is capable of removing the obscuring effects of pre-existing individual differences among subjects, and it allows the researcher to compensate for systematic biases among the samples. The inclusion of covariates can also increase statistical power, because they account for some of the variability (Rossman, 2014).
Assumptions
The model assumes that the data in the two groups are well described by straight lines that have the same slope. An example of ANCOVA is a pretest-posttest randomized experimental design, in which pretest scores are statistically controlled. In this case, the dependent variable is the posttest scores, the independent variable is the experimental/comparison group status, and the covariate is the pretest scores (Rossman, 2014).
With ANCOVA, the F-ratio statistic is used to determine the statistical significance (p < .05) of differences among group means. When a researcher reports the results from an analysis of covariance (ANCOVA), he or she needs to include the following information: verification of parametric assumptions; verification that the covariate(s) were measured before treatment; verification of the reliability of the covariate(s); verification that the covariates are not too strongly correlated with one another; verification of linearity; verification of homogeneity of regression slopes; dependent variable scores; the independent variable and its levels; the covariate(s); and the statistical data: significance, F-ratio scores, probability, means, and effect size (partial eta squared) (Rossman, 2014).

Example of analysis using Analysis of Covariance

A 2 by 2 between-groups analysis of covariance was conducted to assess the effectiveness of two programs in reducing fear of statistics for male and female participants. The independent variables were the type of program (math skills, confidence building) and gender. The dependent variable was scores on the Fear of Statistics Test (FOST), administered following completion of the intervention programs (Time 2). Scores on the FOST administered prior to the commencement of the programs (Time 1) were used as a covariate to control for individual differences.
Preliminary checks were conducted to ensure that there was no violation of the assumptions of normality, linearity, homogeneity of variances, homogeneity of regression slopes, and reliable measurement of the covariate. After adjusting for FOST scores at Time 1, there was a significant interaction effect, F(1, 25) = 31.7, p < .0005, with a large effect size (partial eta squared = .56). Neither of the main effects was statistically significant: program, F(1, 25) = 1.43, p = .24; gender, F(1, 25) = 1.27, p = .27.
These results suggest that males and females respond differently to the two types of
interventions. Males showed a more substantial decrease in fear of statistics after participation in
the math skills program. Females, on the other hand, appeared to benefit more from the
confidence building program.
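A sketch of how such a 2 by 2 ANCOVA could be run in Python with statsmodels follows. The original analysis was presumably run in a package such as SPSS; the data file and the column names (fost_t1, fost_t2, program, gender) are illustrative assumptions, not part of the study.

```python
# A sketch of the 2 x 2 ANCOVA described above, using statsmodels.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("fear_of_stats.csv")   # hypothetical data file

# Time 1 FOST score is the covariate; program and gender are categorical
# factors, crossed so the program-by-gender interaction is estimated.
model = smf.ols("fost_t2 ~ fost_t1 + C(program) * C(gender)", data=df).fit()

# Type II ANOVA table: F and p for the covariate, both main effects,
# and the interaction effect reported in the text.
print(sm.stats.anova_lm(model, typ=2))
```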
Multivariate Statistics
Multivariate statistical analysis refers to a family of advanced techniques for examining relationships among multiple variables at the same time. Researchers use multivariate procedures in studies that involve more than one dependent variable (also known as the outcome or phenomenon of interest), more than one independent variable (also known as a predictor), or both. Upper-level undergraduate courses and graduate courses in statistics teach multivariate statistical analysis. This type of analysis is desirable because researchers often hypothesize that a given outcome of interest is affected or influenced by more than one thing (Huberman, 1994).
Example of analysis using Multivariate Statistics
At the end of the course, students were asked to rate each assessment strategy in terms of difficulty, appropriateness to their needs while learning multivariate statistics, and how well they felt they learned using that kind of assessment. Students were also asked to rank the four assessment strategies in order of preference (1 = most preferred, 4 = least preferred). The survey was anonymous and was administered during the last class of the semester by a student volunteer while the instructor was not present. The purpose was explained as an opportunity for the instructor to understand how to better structure course assessments in the future. Upon agreeing to participate, the students were given the survey. A four-point scale was used, with response categories as indicated in the table below.
The results presented are based on 14 students who completed the course assessment (three students were absent or chose not to participate). Students were also asked for additional comments regarding their likes and dislikes about each assessment strategy, suggestions for how to improve the use of that strategy, and any other observations or comments they might have. In terms of establishing a course grade, the two open-ended assignments (midterm and final) were weighted 25% each; the scores on the other assignments (five altogether) were averaged and worth 50% of the total. The results for each assessment strategy are provided in Table 1 below; the modal response for each technique is the largest count in each column, and other interesting results are discussed below the table.
Table 1. Student Assessment Ratings

Difficulty ratings: frequency (and percent).

Response Categories                                Structured Computer  Open-Ended   Article     Annotating
                                                   Assignments          Assignments  Analysis    Output
1. Not at all difficult                            1 (7.1%)             0 (0.0%)     1 (7.1%)    0 (0.0%)
2. Slightly difficult but not too challenging      5 (35.7%)            2 (14.3%)    4 (28.6%)   8 (57.1%)
3. Difficult but challenging                       7 (50.0%)            12 (85.7%)   7 (50.0%)   6 (42.9%)
4. Too difficult                                   1 (7.1%)             0 (0.0%)     2 (14.3%)   0 (0.0%)

Appropriateness ratings: frequency (and percent).

Response Categories                                Structured Computer  Open-Ended   Article     Annotating
                                                   Assignments          Assignments  Analysis    Output
1. Not at all appropriate or useful                0 (0.0%)             0 (0.0%)     1 (7.1%)    0 (0.0%)
2. Slightly appropriate but not for all my needs   2 (14.3%)            1 (7.1%)     5 (35.7%)   5 (35.7%)
3. Appropriate for many of my needs                11 (78.6%)           7 (50.0%)    5 (35.7%)   9 (64.3%)
4. Very appropriate for all my needs               1 (7.1%)             6 (42.9%)    3 (21.4%)   0 (0.0%)

Level of learning ratings: frequency (and percent).

Response Categories                                Structured Computer  Open-Ended   Article     Annotating
                                                   Assignments          Assignments  Analysis*   Output
1. Didn't learn anything                           0 (0.0%)             0 (0.0%)     1 (7.1%)    0 (0.0%)
2. Learned a little bit                            3 (21.4%)            1 (7.1%)     6 (42.9%)   4 (28.6%)
3. Learned enough to be comfortable
   with the topic                                  8 (57.1%)            7 (50.0%)    4 (28.6%)   8 (57.1%)
4. Learned a great deal more than
   I would have thought                            3 (21.4%)            6 (42.9%)    2 (14.3%)   2 (14.3%)

* One student did not answer this question for the article analysis.

Preference Scores
Students were asked to indicate their order of preference among the four assessment strategies, with 1 = most preferred form of assessment and 4 = least preferred. Average preference rankings indicated that the most preferred form of assessment was the structured computer assignments (mean = 2.00), followed by annotating the output (mean = 2.31), the open-ended assignments (mean = 2.38), and the article analysis (mean = 3.15). Yet in terms of difficulty level, appropriateness, and learning ratings, the open-ended assignments actually received the best ratings overall. Approximately 86% of the students found the open-ended assignments challenging (response category 3); 93% found the open-ended assignments appropriate for many or all of their needs (response categories 3 and 4); and 93% reported that through the open-ended assignments they learned either enough to be comfortable with the topic or more than they would have thought (response categories 3 and 4).
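As a rough illustration of how summaries like Table 1 and the preference means could be produced programmatically, here is a Python sketch. The two CSV files and their layout (one row per student, one column per assessment strategy) are hypothetical, since the raw survey data are not reproduced in the text.

```python
# A sketch of the Table 1 summaries in pandas, using hypothetical files.
import pandas as pd

ratings = pd.read_csv("difficulty_ratings.csv")   # values 1-4 per strategy
ranks = pd.read_csv("preference_ranks.csv")       # 1 = most preferred

# Frequency (and percent) of each response category, per strategy
for col in ratings.columns:
    counts = ratings[col].value_counts().sort_index()
    pct = (100 * counts / counts.sum()).round(1)
    print(col, counts.to_string(), pct.to_string(), sep="\n")

# Modal response per strategy, and mean preference rank (lower = preferred)
print(ratings.mode().iloc[0])
print(ranks.mean().sort_values())
```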

Chi-Square
The Chi-Square statistic is commonly used for testing relationships between categorical variables. The null hypothesis is that no relationship exists between these categorical variables in the population; they are independent. The Chi-Square statistic is most commonly used to evaluate tests of independence when using a cross tabulation (also known as a bivariate table). Cross tabulation presents the distributions of two categorical variables simultaneously, with the intersections of the categories of the variables appearing in the cells of the table (Peshkin, 1992). The test of independence assesses whether an association exists between the two variables by carefully examining the pattern of responses in the cells; calculating the Chi-Square statistic and comparing it against a critical value from the Chi-Square distribution allows the researcher to assess whether the association seen between the variables in a particular sample is likely to represent an actual relationship between those variables in the population.
The calculation of the Chi-Square statistic is quite straightforward and intuitive:

χ² = Σ (fo − fe)² / fe

where fo = the observed frequency (the observed counts in the cells) and fe = the expected frequency if NO relationship existed between the variables.
As depicted in the formula, the Chi-Square statistic is based on the difference between what is actually observed in the table and what would be expected if there were truly no relationship between the variables.
Example of analysis using chi-square

The Chi-Square statistic appears as an option when requesting a crosstabulation in SPSS. The output is labeled Chi-Square Tests; the Chi-Square statistic used in the test of independence is in the first row, labeled Pearson Chi-Square. This statistic can be evaluated by comparing the actual value against a critical value found in a Chi-Square distribution (where degrees of freedom = (# of rows − 1) × (# of columns − 1)), but it is easier simply to examine the p-value provided by SPSS. To make a conclusion about the hypothesis with 95% confidence, the value labeled Asymp. Sig. (which is the p-value of the Chi-Square statistic) should be less than .05 (which is the alpha level associated with a 95% confidence level). Is the p-value (labeled Asymp. Sig.) < .05? If so, conclude that the variables are dependent in the population and that there is a statistical relationship between the categorical variables.

In this example, there is an association between fundamentalism and views on teaching sex
education in public schools. While 17.2% of fundamentalists oppose teaching sex education,
only 6.5% of liberals are opposed. The p-value indicates that these variables are dependent in the
population and that there is a statistical relationship between the categorical variables.
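Outside SPSS, the same test of independence can be sketched in Python. The 2 by 2 table below is illustrative: the row totals are invented, and the cell counts are merely chosen to be consistent with the 17.2% and 6.5% opposition figures quoted above.

```python
# A sketch of a chi-square test of independence with invented counts.
import numpy as np
from scipy import stats

# Rows: fundamentalist, liberal; columns: favor, oppose sex education
observed = np.array([[120, 25],     # 25/145 ~ 17.2% opposed
                     [144, 10]])    # 10/154 ~  6.5% opposed

chi2, p, dof, expected = stats.chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.4f}")
# If p < .05, reject the null hypothesis that the variables are independent.
```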
Data Analysis through Computer Software Programmes
A multitude of software programs designed for use with quantitative data is available today. Quantitative research, predominantly statistical analysis, remains common in the social sciences, and such software is frequently used by social science researchers; statistical packages in particular find wide application there (Peshkin, 1992). The main demands made of such packages in social science research are that they be comprehensive and flexible and that they work with almost any type of file. A useful statistical software tool can generate tabulated reports, charts, and plots of distributions and trends, as well as descriptive statistics and more complex statistical analyses. Lastly, a user interface that is easy and intuitive for users at all levels is a must.
Conclusion
In conclusion, data analysis is the process of systematically applying statistical and/or logical techniques to evaluate data. There are many well-developed methods available for conceptually or statistically analyzing the different kinds of data that can be gathered. Frequency data and chi-square analysis can supplement the narrative interpretation of open-ended comments. For the analysis of quantitative data, a variety of statistical tests are available, ranging from the simple (t-tests) to the more complex. Researchers performing either quantitative or qualitative analyses should be aware of challenges to the reliability and validity of their findings.

References
Bird, K. (Ed.). (1995). Computer-aided qualitative data analysis: Theory, methods and practice. Sage.
Huberman, A. M. (1994). Qualitative data analysis. Thousand Oaks, CA: Sage Publications.
Mugenda, O. M. (1999). Research methods: Quantitative and qualitative approaches. African Centre for Technology Studies.
Peshkin, A. (1992). Becoming qualitative researchers: An introduction. White Plains, NY: Longman.
Rossman, G. B. (2014). Designing qualitative research. Sage Publications.
