Saiyidi MAT RONI

2014
Introduction to SPSS

Files used in the exercises are datasets tailored specifically for illustration purposes.

SOAR Centre
Graduate Research School
Edith Cowan University
Joondalup, Australia
Table of Contents

1 Preliminary data analysis: An analysis before the analysis ............................................................. 4


1.1 Cleaning up your data: Monotone .......................................................................................... 5
1.2 Cleaning up your data: Missing values and outliers ............................................................... 6
1.3 Missing values analysis (MVA) .............................................................................................. 10
1.3.1 Multiple imputation (MI) .............................................................................................. 11
1.3.2 Expectation maximisation (EM) ........................................ 15
1.4 Normality check .................................................................................................................... 18
1.5 Assess the quality of your instrument .................................................................................. 25
1.5.1 Reliability and validity tests .......................................................................................... 25
1.5.2 Content validity ............................................................................................................. 28
1.5.3 Construct validity .......................................................................................................... 28
1.6 Addressing biases .................................................................................................................. 31
1.6.1 Non-response bias ........................................................................................................ 31
1.6.2 Common method bias................................................................................................... 32
2 Factor analysis............................................................................................................................... 39
3 Latent variable .............................................................................................................................. 44
4 Test of differences among groups ................................................................................................ 47
4.1 t-test ...................................................................................................................................... 47
4.1.1 Independent samples t-test .......................................................................................... 47
4.1.2 Paired samples t-test .................................................................................................... 49
4.2 ANOVA .................................................................................................................................. 51
4.3 ANCOVA ................................................................................................................................ 55
5 Test of correlations ....................................................................................................................... 59
5.1 Pearson correlation............................................................................................................... 59
5.2 Regression ............................................................................................................................. 61
5.2.1 Multiple regression analysis.......................................................................................... 61
5.2.2 Hierarchical multiple regression ................................................................................... 68
6 Non-parametric test...................................................................................................................... 71
6.1 Mann-Whitney U................................................................................................................... 71
6.2 Kruskal-Wallis ........................................................................................................................ 73
7 Appendices .................................................................................................................................... 76
7.1 Appendix 1: Cleaning your data ............................................................................................ 76
7.2 Appendix 2: Quality of measurement model........................................................................ 77
7.3 Appendix 3: Useful citations & readings ............................................................................... 78
7.4 Appendix 4: Statistical remedies for CMB ............................................................................ 79
8 References .................................................................................................................................... 80



Preliminary data analysis checklist:

1. Screen data: delete monotones.
2. Missing values analysis (MVA): choose MI or EM.
3. Check outliers: trim or winsorise.
4. Check normality: transform if necessary.
5. Run factor analysis. Extraction: PCA, PAF or ML. Rotation: orthogonal or oblique.
6. Reliability: Cronbach's alpha, AVE.
7. Address biases. CMB: Harman's single factor score. NRB: split half, then test means difference.
1 Preliminary data analysis: An analysis before the analysis

In quantitative research, particularly when primary data is collected from surveys, a preliminary analysis of your data is a critical step required before the actual data analyses such as regression, analysis of variance (ANOVA), analysis of covariance (ANCOVA), structural equation modelling (SEM) or other parametric or non-parametric statistics can be performed. The preliminary data analysis is crucial to make sure that the subsequent analyses are all valid.

In the early days of the computer, there was an infamous term used to describe a characteristic of the number-crunching machine: GIGO! It's an acronym for "garbage in, garbage out", which means that no matter how smart the computer is, if the input is wrong, a wrong output will eventually follow. That's a fair description of most computer processes.

SmartPLS, SPSS, AMOS and other statistical software will process your data. But consider this: if you have a kilo of garbage and put it on a bathroom scale, you'll get a kilo of garbage sitting on a really sophisticated measuring instrument.

However, if you pre-sort and pre-sift the garbage into appropriate categories, and then measure it accordingly, you'll get something more valuable, more meaningful and, most importantly, more useful and VALID. For example, if you sort and sift that 1 kilogram of garbage, you may find 600 grams of paper and 400 grams of aluminium cans. This is more meaningful, more valuable data even though it is still 1 kg of garbage in total. Such a preliminary analysis (sort, sift and measure) should prompt you what to do next: recycle, not simply bury it at a landfill.

Figure 1.1: Preliminary data analysis. The PDA considers missing values, outliers, the normality test and the quality of the measurement model.



The preliminary data analysis (PDA) makes sure that, before anything else, the data you are about to analyse to accept or reject a hypothesis is ready for such rigorous statistical bombardment. It is an analysis required before the actual analyses can be performed.

And remember, the statistical software won't tell you which analyses to run for your data, so it is important that you have a good understanding of general statistical principles, including which analyses are appropriate for different types of data.

1.1 Cleaning up your data: Monotone


Simply put, monotone responses are responses that have no variance. This type of response has little or no value for an analysis. You can simply delete such cases and report it in your write-up (see Yeh, 2009).

Consider case 74 in the following example. The respondent answered 5 for all questions on a 5-point Likert scale. This case was removed from the actual analysis.

Figure 1.2: Monotone response.

When to use:

1. Before you run any other analysis.


2. Dataset has been transferred to Excel format.
3. This exercise uses MS Excel program NOT SPSS.

To run monotone check:

Open or copy your dataset to Excel

Cell AJ2 (last column) > type this formula =VAR.S(G2:AI2)

Copy the formula to the rest of the cases > find cases with 0 (zero) variance > delete the
case(s).

Copy the dataset back to SPSS.

Example:

1. Dataset: Missing value and monotone.xlsx


2. This exercise is to identify monotone cases in the dataset.



Last column (cell AJ2) > type the formula: =VAR.S(G2:AI2)

Copy the formula into all cells in the column. Delete case(s) with 0 (zero) variance.

G2:AI2 is the column range of the variables of interest. This range excludes demographic variables.
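The same zero-variance screen can be run outside Excel. A minimal pandas sketch (the item columns q1-q3 and the data values are made up for illustration):

```python
import pandas as pd

# Toy dataset: one row per respondent, one column per Likert item.
df = pd.DataFrame({
    "q1": [4, 5, 3],
    "q2": [2, 5, 4],
    "q3": [5, 5, 2],
})

# Sample variance per respondent; ddof=1 matches Excel's VAR.S.
row_var = df.var(axis=1, ddof=1)

# Monotone responses have zero variance - drop them.
cleaned = df[row_var > 0]
print(cleaned.index.tolist())  # → [0, 2]  (row 1, all 5s, is removed)
```

As in the Excel version, restrict the columns to the variables of interest and exclude demographic variables before computing the variance.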

1.2 Cleaning up your data: Missing values and outliers


The first thing that you need to do before you run any statistical test (and that includes the PDA) is to clean up your data. The two most obvious raw-data issues requiring cleaning and polishing are missing values and outliers. Missing values are responses that have not been completed, while outliers are data points significantly distant from the rest of the data set (i.e. from the mean value).

Both missing values and outliers can happen by chance in any data distribution. While missing values are common in a large data set, outliers may indicate measurement errors or that the sample has a heavy-tailed distribution. In either case, outliers can be removed or retained depending on the context of the study and, of course, with a sound theoretical backing.

In SPSS, removing data is pretty straightforward: press the Delete button, or exclude the case from any subsequent analysis. The former is simple but not recommended because the case/data is completely removed. My preference is the latter option because it allows future analyses or reviewers to work with the incomplete data: SPSS keeps the data but does not account for it in the statistical analysis.

When to use:

1. There are missing values in your dataset.


2. Missing value code has been designated in SPSS (variable view).

To run recode:

Transform > Recode into Same Variables

Select required variables > move to Variables field > click Old and New Values

Select System- or user-missing > New Value > Value > type 99 > Add > Continue
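The same recode can be mirrored in pandas, assuming NaN marks the system-missing cells (the variable names and values below are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"var1": [3, np.nan, 5], "var2": [np.nan, 2, 4]})

# Recode system-missing values (NaN) to the user-missing code 99,
# mirroring SPSS's Recode into Same Variables step.
df_coded = df.fillna(99)
print(df_coded["var1"].tolist())  # → [3.0, 99.0, 5.0]
```

As in SPSS, 99 only works as a missing-value code if it sits outside the actual data range.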



Example:

1. Dataset: Missing value.sav


2. This exercise is to recode missing values as 99.

Variable view.

Make sure that 99 has been designated as the missing value. Click this field to input the missing value code.

Note: You can choose any number to tell SPSS that it's a missing value, but the code has to be outside your actual data range, e.g. if your data range is 1 to 100, choose 999 as the missing value code.

Select Discrete missing values >


type 99 > OK.



Select required variables > move
to Variables field > click Old and
New Values

New Value > Value: > type 99 >


click Add > Continue



1. To completely remove the case/data:
Click the case > Press Delete.

2. To maintain the case but exclude it from subsequent analysis:
Data > Select cases > If condition is satisfied > If
Choose the variable(s) > type var = 99 > Continue > OK.

Figure 1.3: Exclude cases from analysis.

As for the outliers, you can delete the case containing the outliers as in step 1 above, or constrain the outliers to a maximum (or minimum) value. Put simply, you replace the outlier with a value that is more likely and more meaningful based on the logical context of the study and, of course, a sound theoretical backing.

1. Recode into a different variable:


Transform > Recode into different variables > select variable(s) > > Old and New values
Select Range > key in the range (e.g. 6 through 98) > New Values (e.g. 5) > Add > Continue >
OK.

1 var = the variable of your choice that contains missing values.
2 99 = my personal coding structure to instruct SPSS how to mark missing values in the data set. You can specify any number: on the Variable view > Missing value column, type the number of your choice.

Figure 1.4: Recode outliers into different variables.
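The capping rule above can be sketched in pandas; the 1-5 scale and the "6 through 98" range follow the recode example, while the data values are made up:

```python
import pandas as pd

# A 1-5 Likert item with two out-of-range entries (e.g. data-entry errors).
s = pd.Series([3, 4, 7, 5, 12, 2])

# Recode "6 through 98" to the scale maximum of 5, as in the recode rule;
# 99 would stay untouched as the missing-value code.
capped = s.where(~s.between(6, 98), 5)
print(capped.tolist())  # → [3, 4, 5, 5, 5, 2]
```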

1.3 Missing values analysis (MVA)


There are many treatments available for missing values. Frequently used MVA methods are:

1. Pairwise/Listwise deletion: removes cases with missing values.


2. Replace with mean.
3. Multiple imputation (MI).
4. Expectation maximisation (EM).

Although methods 1 and 2 are the simplest, many studies do not recommend such treatments. This leaves us with options 3 (MI) and 4 (EM) as viable alternatives.

Before running MVA, it is advisable to recode your missing values. In this example, the missing values are coded as 99. Run the Recode into Same Variables process.

Transform > Recode into Same Variables

Select all variables > move to Numeric Variables field > Old and New Values



In Recode into Same Variables: Old and New Values dialog > Old Value > Select System- or
user-missing > New Value > Type 99 > Add > Continue > OK.

1.3.1 Multiple imputation (MI)


The multiple imputation (MI) method yields several data sets depending on how many imputation cycles you set in SPSS. From the new dataset, you can choose the imputation cycle that corresponds to the most likely complete data. An issue with the MI method is that the estimated value(s) may exceed your actual scale range. For example, with a 5-point Likert scale, the imputed values can in certain cases exceed the maximum score of 5.

To run MI:

Analyze > Multiple Imputation > Impute Missing Data Values

In Impute Missing Data Values dialog:

Select required variables > move to Variables in Model field > Set imputation values (default is 5) > Location of Imputed Data > Create a new dataset > Dataset name: > Type MI > OK

Figure 1.5: Multiple imputation.
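SPSS's MI procedure is not reproduced here, but scikit-learn's IterativeImputer follows a similar chained-equations idea and can illustrate the imputation cycles; a sketch on a made-up 5-point dataset (the library choice and the data are assumptions, not the document's):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
X = rng.integers(1, 6, size=(20, 4)).astype(float)  # 5-point Likert items
X[3, 1] = np.nan  # two missing responses
X[7, 2] = np.nan

# Five imputation cycles, as in the SPSS default: each run uses its own
# random seed, yielding five completed datasets (Imputation_ = 1..5).
imputations = []
for i in range(5):
    imp = IterativeImputer(random_state=i, sample_posterior=True)
    imputations.append(imp.fit_transform(X))

# The imputed entries are regression estimates, so - as noted above -
# they can fall outside the 1-5 scale range.
print(len(imputations), imputations[0].shape)  # → 5 (20, 4)
```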



Figure 1.6: Setting parameters to multiple imputations.

Once completed, you should have a new dataset with N x i + N cases, where N is the number of cases in your original dataset and i is the number of imputations you specify in the MI procedure. In our example, N = 131 (original dataset), so the new file holds 786 cases (131 x 5 + 131) (see Figure 1.9). By default, SPSS adds a new variable called Imputation_ to identify the datasets in the file as follows:

Imputation_ = 0 (original dataset without imputation)

Imputation_ = 1 (dataset after 1st imputation process)

Imputation_ = 2 (dataset after 2nd imputation process)

...

Imputation_ = 5 (dataset after 5th imputation process)

Using the values after the 5th imputation, you can split this dataset off as a new file.



Data > Select Cases > Select Imputation Number > Output > Copy selected cases to a new
dataset > Dataset name: > type postMI (see Figure 1.7)

Select If condition is satisfied > click If (see Figure 1.7)

In Select cases: If dialog > Click variable Imputation Number on the left panel > click move
icon to move the variable to right panel > click = sign > type 5 > Continue > OK. (see Figure
1.8)

Figure 1.7: Parameters settings to select 5th imputation dataset as a new file.



Figure 1.8: Select cases of the 5th imputation process.

Figure 1.9: Multiple imputations result.



1.3.2 Expectation maximisation (EM)

Another preferred method for dealing with missing data is the expectation maximisation (EM) technique. EM is an iterative procedure producing variances, covariances and means in an initial step, and repeating the whole process until the changes in the parameters are so small that the final solution is said to have converged (Graham, 2012). EM is argued to be more suitable when the pattern of missing values is at least missing at random (MAR). However, EM performs best when the missing data is missing completely at random (MCAR). For discussions on MAR and MCAR, see Graham (2012), Karanja, Zaveri, and Ahmed (2013) and Bennett (2001).

Analyze > Missing Values Analysis

On the Missing Value Analysis dialog (see Figure 1.10) > Select required variables > move to Quantitative Variables field

Estimation > Select EM

Click button EM > Select Save completed data (see Figure 1.11) > Create a new dataset > Dataset name: > Type PostEM > Continue > OK.

Figure 1.10: Expected maximisation procedure.



Figure 1.11: Save EM result as a new dataset.

Look at the Univariate Statistics table in the output. You should check the percentage of missing data for each variable. Some scholars (e.g. Hair et al., 2006; Scheffer, 2002) set the missing-value upper limit at 20%. Anything more than this ceiling can potentially bias the final result.



Univariate Statistics (a,b)

Variable   N   Mean   Std. Deviation   Missing Count   Missing Percent   Extremes Low   Extremes High

PEOU1 130 4.01 .641 1 .8 . .


PEOU2 131 3.95 .689 0 .0 . .
PEOU3 129 3.89 .752 2 1.5 . .
PEOU4 129 3.98 .673 2 1.5 . .
PEOU5 130 4.05 .697 1 .8 . .
PEOU6 129 3.96 .642 2 1.5 . .
PU1 129 4.08 .680 2 1.5 2 0
PU2 129 4.02 .739 2 1.5 1 0
PU3 130 4.14 .679 1 .8 2 0
PU4 129 4.02 .770 2 1.5 2 0
PU5 129 4.05 .732 2 1.5 2 0
PU6 129 4.05 .721 2 1.5 3 0
INT1 129 3.84 .843 2 1.5 1 0
INT2 130 3.84 .815 1 .8 1 0
INT3 131 3.88 .744 0 .0 0 0
INT4 131 3.66 .856 0 .0 1 0
ATT1 131 5.82 1.233 0 .0 3 0
ATT2 131 6.06 1.128 0 .0 13 0
ATT3 131 6.15 1.046 0 .0 12 0
INL1 129 3.84 .744 2 1.5 1 0
INL2 131 3.70 .751 0 .0 0 0
INL3 130 3.74 .721 1 .8 0 0
ID1 131 3.56 .776 0 .0 2 0
ID2 130 3.71 .802 1 .8 2 0
ID3 130 3.86 .745 1 .8 1 0

a. Number of cases outside the range (Q1 - 1.5*IQR, Q3 + 1.5*IQR).


b. A "." indicates that the inter-quartile range (IQR) is zero.

Table 1.1: EM analysis result - univariate statistic.

The following table is the most important: it is the result of Little's MCAR test (Little, 1988). In this case, the p-value is not significant (i.e. p > .05), indicating that the pattern of missing values in the data set is MCAR. Thus, the imputed values derived from the EM procedure can be used for subsequent analyses.

Note: Little's MCAR test checks whether the pattern of missing values is missing completely at random (MCAR). If the p-value is not significant, the pattern of missing values in the dataset is similar to an expected MCAR pattern.
Table 1.2: Little's MCAR test.
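The iterative idea behind EM can be sketched with numpy under a multivariate-normal model; this is a minimal illustration of the E-step/M-step cycle, not SPSS's implementation, and the data are simulated:

```python
import numpy as np

def em_impute(X, n_iter=50, tol=1e-6):
    """Minimal EM imputation under a multivariate-normal model: the E-step
    fills each missing entry with its conditional expectation given the
    observed entries; the M-step re-estimates the mean and covariance;
    the loop stops when the parameters stop changing (convergence)."""
    X = X.astype(float).copy()
    miss = np.isnan(X)
    # Start from column means.
    col_means = np.nanmean(X, axis=0)
    X[miss] = np.take(col_means, np.where(miss)[1])
    for _ in range(n_iter):
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False) + 1e-8 * np.eye(X.shape[1])
        X_new = X.copy()
        for i in np.where(miss.any(axis=1))[0]:
            m = miss[i]
            o = ~m
            # Conditional mean of the missing entries given the observed ones.
            coef = cov[np.ix_(m, o)] @ np.linalg.inv(cov[np.ix_(o, o)])
            X_new[i, m] = mu[m] + coef @ (X[i, o] - mu[o])
        if np.max(np.abs(X_new - X)) < tol:
            X = X_new
            break
        X = X_new
    return X

rng = np.random.default_rng(1)
data = rng.normal(size=(50, 3))
data[2, 0] = np.nan  # one missing entry
completed = em_impute(data)
print(np.isnan(completed).any())  # → False
```

Observed entries are left untouched; only the missing cells are replaced by their conditional expectations.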

1.4 Normality check


For a more powerful statistical explanation, a parametric analysis is preferred. However, these parametric tests are mostly based on the assumption that the data distribution is normal (see footnote 4). To check for normality, use step 1 below. In many cases, it is good to assess the normality visually: the famous bell shape! A simpler way to get to it is step 2. However, many researchers are more comfortable with an objective measure of normality, i.e. the Kolmogorov-Smirnov or Shapiro-Wilk tests. These tests nonetheless are not recommended for sample sizes of less than 30, and they are sensitive to large sample sizes (Hair et al., 2006), e.g. more than 1,000.

1. Explore the normality:


Analyse > Descriptive Statistics > Explore > set appropriate parameters.

2. Bell shape display:


Graphs > Legacy Dialogs > Histogram.

Select the required variable > > OK.

You can also check for normality using the z-scores of the kurtosis and skewness of each variable, calculated as:

z_skewness = skewness / sqrt(6/N)
z_kurtosis = kurtosis / sqrt(24/N)

where N is the sample size. The z_k and z_s values should be within +/-2.58 for p = .01, or +/-1.96 for p = .05 (Hair, Black, Babin, Anderson, & Tatham, 2006), for the data to be considered normally distributed.

The skewness and kurtosis values can be taken from the SPSS output.
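The z-scores, and the objective tests mentioned earlier, can also be computed outside SPSS as a cross-check; a scipy sketch on a simulated variable (the data are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(loc=4.0, scale=0.7, size=131)  # simulated item scores

n = len(x)
# z-scores in the Hair et al. (2006) form: statistic / its standard error.
z_skew = stats.skew(x) / np.sqrt(6 / n)
z_kurt = stats.kurtosis(x) / np.sqrt(24 / n)

# Normal if both fall within +/-1.96 (p = .05) or +/-2.58 (p = .01).
print(abs(z_skew) < 1.96, abs(z_kurt) < 1.96)

# Objective test: Shapiro-Wilk; p > .05 means no detected departure.
w, p = stats.shapiro(x)
print(p > .05)
```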

Analyse > Explore > Statistics >


tick Descriptives > Continue

4 A simple discussion is also available at the following webpages: http://www.stattutorials.com/SPSS/TUTORIAL-SPSS-Assess-Normality.htm and http://www.isixsigma.com/tools-templates/normality/tips-recognizing-and-transforming-non-normal-data/
Select variable comp skill from
left panel > move to Dependent
List



Set the following parameters
for Statistics and Plots



Both the Kolmogorov-Smirnov and Shapiro-Wilk tests indicate that the data for all variables are not normally distributed.

We want values which are NOT significant for both tests (p > .05).

Box plot:
Potential outlier in the dataset



Graphs > Legacy Dialogs >
Histogram.

Select the required variable >

Select Display normal curve > OK.

Figure 1.12: Bell shape histogram to visually analyse normality.



My experience in research projects with relatively large sample sizes is that it's a total relief to see a nice symmetrical bell shape! That bell shape is like caffeine in a cup of strong coffee. But what if your data is non-normal?

There are two ways to deal with non-normal data: consider non-parametric statistics, OR transform the data.

Figure 1.13: Non-parametric tests in SPSS.

If you opt to proceed with statistics that require normality, a data transformation is certainly your next move. There are five methods to transform non-normal data into an approximately normal distribution (Osborne, 2010). These are:

i. Square root transformation.


ii. Log transformation.
iii. Inverse transformation.
iv. Arcsine transformation.
v. Box-Cox transformation5 (Box & Cox, 1964).

5 A practical explanation is available at http://www.isixsigma.com/tools-templates/normality/making-data-normal-using-box-cox-power-transformation/ and the theoretical underpinning of the method is available at http://www.itl.nist.gov/div898/handbook/pmc/section5/pmc52.htm

Should you wish to study the theoretical makeup of the Box-Cox transformation, the original article of Box and Cox (1964) is available from the reference list of this document.
The first four in the list are power transformations, which means that mathematical powers and/or roots are applied. Although the log transformation is common in social science (Osborne, 2010), the Box-Cox transformation is a popular non-normal data transformation in many studies. However, there is no straightforward method for Box-Cox transformation in SPSS.
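Outside SPSS, all five transformations are short one-liners; a scipy sketch on a simulated right-skewed variable (the data are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.lognormal(mean=0.0, sigma=0.6, size=200)  # right-skewed, positive

x_sqrt = np.sqrt(x)   # i.   square root transformation
x_log = np.log(x)     # ii.  log transformation (needs positive values)
x_inv = 1.0 / x       # iii. inverse transformation
# iv. arcsine applies to proportions in [0, 1]: np.arcsin(np.sqrt(p))
# v.  Box-Cox estimates the power (lambda) that best normalises the data;
#     it also requires strictly positive values.
x_bc, lam = stats.boxcox(x)

print(stats.skew(x) > stats.skew(x_bc))  # Box-Cox reduces the skew
```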

1.5 Assess the quality of your instrument


Coming up with a functional and practical survey instrument is hard enough; assessing the quality of the hard-thought items in the questionnaire is another challenge for a researcher. There are two aspects of instrument quality that you need to work on at the preliminary data analysis (PDA) level. These are:

i. Reliability and validity.


ii. Assessment of bias contaminations.

1.5.1 Reliability and validity tests


The test name says it all. This category of tests ensures that the instrument used in the study is reliable, which means that it measures the constructs consistently, and valid, which implies that it measures what it is designed to measure. In simple terms, if you design a bathroom scale and weigh yourself 10 times on it, the scale should give you the same weight in kilograms each time you step on it (unless, of course, you gain some weight during the period of the 10 attempts). That shows your instrument is reliable because it consistently measures your weight. And the bathroom scale is valid because it measures your weight in kilograms or pounds, not in Celsius or Fahrenheit!

Two most common ways to test for reliability are:

i. Cronbach's alpha: minimum value of .70 (Hair et al., 2006).

ii. Composite reliability: minimum value of .70.

Run reliability test:


Analyze > Scale > Reliability Analysis
Select items >
Statistics > select these options: > Continue > OK.
Scale if item deleted
Means
Variances
Correlations
Inter-item: Correlations




Figure 1.14: Cronbach's alpha reliability.
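Cronbach's alpha can also be computed directly from its formula, alpha = k/(k-1) x (1 - sum of item variances / variance of the total score); a sketch with made-up item scores:

```python
import numpy as np

def cronbach_alpha(items):
    """items: (n_respondents, k_items) array of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of the total score
    return k / (k - 1) * (1 - item_vars / total_var)

# Toy 4-item scale: respondents who answer consistently give a high alpha.
scores = np.array([
    [4, 4, 5, 4],
    [3, 3, 3, 4],
    [5, 5, 4, 5],
    [2, 3, 2, 2],
    [4, 5, 4, 4],
])
alpha = cronbach_alpha(scores)
print(round(alpha, 2))  # → 0.93, above the .70 threshold
```

As the text notes, run this separately for each item group; mixing items from different constructs deflates alpha.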



Note that 130 cases were analysed, with 1 excluded because of missing data.

Smile! Your Cronbach's alpha exceeds the minimum threshold of .70.

Note that the main and standardised alphas are equal. This shows that the inter-item correlation is good. If these values differ significantly, you may want to check the inter-item correlation matrix. It may indicate that your instrument is not internally consistent, i.e. not reliable.

Check this video for an explanation:


https://www.youtube.com/watch?v=9rS49o1rdnk

I only load the PEOU items for the reliability test (6 items in total). The actual instrument consists of other components with varied numbers of items. Each item group has to be assessed separately. After all, you will not get a satisfactory Cronbach value if you mix weight values in kilograms from the bathroom scale with Celsius values from a thermometer. They measure different things!

Figure 1.15: Reliability test result (see footnote 6).

6 This reliability test was run on actual research data, collected in 2011, based on the Technology Acceptance Model (TAM). It comes as no surprise that the Cronbach's alpha is high, because the items were adapted from a pool of highly cited studies, not to mention that TAM itself is a heavily researched theory.
1.5.2 Content validity

When you have a pool of items designed to measure a construct, you should be able to demonstrate that great care was taken during the construction of the survey instrument. Content validity is the degree to which the items represent the dimensions of the construct being measured (Hu, Dinev, Hart, & Cooke, 2012). The validity of the items is normally assessed through the literature and is preferably also reviewed by domain experts (Straub, Boudreau, & Gefen, 2004). The instrument has to be carefully constructed and pilot tested.

Now it becomes obvious why the literature review and the proposal stage are pretty much critical prior to the data collection (see footnote 7). It is unconventional to have the items reviewed after the data has been collected. In short, content validity is a subjective evaluation of the items, which can be accomplished through:

i. Literature
ii. Domain expert review

1.5.3 Construct validity


Unlike content validity, which is assessed prior to the data collection, construct validity can only be assessed when you have the data on hand. That's pretty scary! (Trust me, I know.) Construct validity means that the items used to measure a given construct actually measure that construct and nothing else. In simpler terms, you have multiple items (questions) measuring a single construct (factor). Coming back to the bathroom scale analogy, the construct is the weight, and the items are the numeric values of the weight in both pounds and kilograms.

Weight (construct): kilograms (item 1), pounds (item 2).
Temperature (construct): Celsius (item 1), Fahrenheit (item 2).

Figure 1.16: The construct (latent factor or variable) is measured by items (observed variables).

7 Make a note of this statement. We'll visit this later down the road.
The construct validity can be assessed by:

i. Convergent validity: AVE > .5; square root of AVE > correlation between constructs.
ii. Discriminant validity: MSV < AVE and ASV < AVE.
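These checks can be sketched once you have standardised loadings from a factor analysis; the loadings and the inter-construct correlation below are made-up numbers for illustration:

```python
import numpy as np

# Hypothetical standardised loadings for two constructs.
loadings = {
    "PEOU": np.array([0.78, 0.81, 0.74, 0.80]),
    "PU":   np.array([0.72, 0.76, 0.79]),
}

def ave(lam):
    # Average variance extracted: mean of the squared standardised loadings.
    return np.mean(lam ** 2)

# Convergent validity: AVE > .5 for each construct.
for name, lam in loadings.items():
    print(name, round(ave(lam), 2), ave(lam) > 0.5)

# Discriminant validity (Fornell-Larcker style): the square root of each
# AVE should exceed the correlation between the constructs (assumed .55).
corr_peou_pu = 0.55
print(all(np.sqrt(ave(lam)) > corr_peou_pu for lam in loadings.values()))
```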

Convergent validity is the extent to which the items correlate with each other within their parent construct (factor), while discriminant validity is the extent to which the items do not correlate with items of a different construct. In short, convergent validity means that the items theoretically predicted to measure a factor load well only onto that factor, not outside their parent factor. Conversely, items which theory predicts do not measure a construct must have low loadings outside their parent factor.

In the bathroom scale and thermometer case, kilograms and pounds should correlate well with each other, measuring the weight, while Celsius and Fahrenheit correlate with each other, measuring the temperature. If kilograms correlate with Celsius, we have a large cross-loading, indicating that kilograms also measure temperature (which doesn't make sense!). This violates the theory and shows a serious discriminant validity issue. If, on the other hand, Celsius and Fahrenheit do not correlate well, we have a convergent validity problem.



Figure 1.17: Discriminant validity analysis (an excerpt from Hu et al., 2012, p. 635).

Figure 1.18: Discriminant validity analysis (an excerpt from Dinev, Goo, Hu, & Nam, 2009, pp. 403-404).



1.6 Addressing biases
Bias contaminates data, masking the true value of an observed correlation between
variables. Causes of bias vary with the nature of each study, its field and its methodology.
Nonetheless, these causes can be grouped into two types of bias and addressed accordingly.

1.6.1 Non-response bias


Non-response bias is closely related to the sampling technique. For example, if you
send out 1,000 questionnaires and only 25% are returned, the other 75% is a cause for
concern. Those 75% unreturned questionnaires are the source of potential non-response bias. The
questions that you need to answer are:

i. Will the result of the 25% be similar to the other 75%?


ii. Could the result from the analysis on the 25% be generalised over the population?

On many occasions, non-response has nothing to do with a psychological reluctance
to reply; it's often just a bad sample. It's not rocket science to understand that if you mail a
questionnaire to a company which has moved to a different address, that surely contributes to
non-response bias. Therefore, for a mail survey, it is important to use the most current
database of addresses to reduce non-response bias. The same applies to email surveys.

After all these precautions, you must be wondering how many responses are considered
sufficient. The answer is that there's no hard rule. Shannon (1948) put the acceptable response
rate at 70%, while Kerlinger (1968) put it at 80%.

However, a little consolation was offered by Paxson (1995), who posited that a low response
rate is inevitable and tolerable. This is because studies in the social science disciplines mostly centre
on homogeneous populations with a strong commonality in group identity that affects decision making:
people in such a homogeneous group make decisions on the pretext of being members of the group
rather than isolated individuals (Leslie, 1972). Leslie's analysis of 29 studies showed that non-response
bias only affects the influence of the demographic variables, rather than that of the independent
variables, on the dependent variables.

As a precaution, however, it is a good and wise move to run statistical analyses to test for
non-response bias. Scholars have used proxies to test for non-response bias because,
technically, it's very hard, if not impossible, to get a response from those who choose not to respond.
The presence of non-response bias can be tested by:

i. A small-scale post-hoc analysis: effectively, you run a small-scale data collection after
the actual data collection stage ends, and later compare the results of the two to
see if there's any significant difference.
ii. Splitting the data between two response waves (early and late replies)8.

8
Use the t-test procedure (Section 4.1.1) for each variable in the study to find out if there's any difference. If your
dataset is not normally distributed, use the Mann-Whitney U test (Section 6.1) to test for differences between early
and late respondents.
1.6.2 Common method bias
Common method bias (CMB) is a measurement error (Podsakoff, MacKenzie, Lee, &
Podsakoff, 2003; Podsakoff, MacKenzie, & Podsakoff, 2012) that threatens the validity of a
conclusion drawn from statistical results. This bias is observed as the presence of systematic
variance (Bagozzi & Yi, 1990) that can inflate or deflate a given relationship among variables (Doty &
Glick, 1998), leading to unsound conclusions.

Common method bias can be attributed to rater effects (e.g. consistency motif and social
desirability), item characteristics (e.g. complex and ambiguous items), item context (e.g. context-
induced mood) and measurement context (e.g. time and location of measurement, or a common
medium used to obtain the measurements) (Podsakoff et al., 2003)9. Identifying the sources of
common method bias allows us to better control for their influence on the data. Both procedural and
statistical measures are normally used to control for the biasing effect. The procedural measures
concern the way the data are collected and the instrument design. These measures include:

i. Obtain measures of the predictor and criterion variables from different sources.
ii. Use temporal, proximal, psychological or methodological separation of
measurement.
iii. Protect respondent anonymity and reduce evaluation apprehension.
iv. Counterbalance question order.
v. Improve scale items.

As much as the procedural methods are important, the statistical approaches play a
complementary, objective role in controlling for the influence of common method bias. Statistical
controls against common method bias are:

i. Harmans single factor test.


ii. Partial correlation procedures.
iii. Control the effects of a directly measured latent method factor.
iv. Control the effects of an unmeasured latent method factor.
v. Multiple method factors (multitrait-multimethod).
a. Confirmatory factor analysis model.
b. Correlated uniqueness model.
c. Direct product model.

Among the statistical methods above, Harman's single factor test is the simplest measure and
the most widely used in the literature (Podsakoff et al., 2003). What is required is a little
tweak to the factor analysis. Simply load all variables into the factor analysis, but constrain the
number of factors to 1. From the result, search for the Total Variance Explained table and look at the
first component. If the first component accounts for less than 50% of the total variance of all variables
in the model, take a deep breath and smile! Your instrument is free from significant common method
bias effects.

The justification for this conclusion is that, if CMB is present, the unrotated factor
analysis will show that one factor accounts for the majority of the variance in the model. Podsakoff
et al. (2003) disputed this approach, claiming that the technique does nothing to control for the CMB
9
See Podsakoff et al. (2003) and Bagozzi (1998) for a good discussion of the common method bias.
effect. Of course, it doesn't! It's a test to see whether there's any substantial CMB present in the data.
The only challenge you face using this method is scratching your head looking for justifications when
the result shows that more than 50% of the total variance is explained by a single factor (see Figure
1.19).

To run the Harmans single factor score:

1. Analyze > Dimension Reduction > Factor.
2. Select all independent variables.
3. Extraction > Extract > Fixed number of factors > Factors to extract > set to 1 (see Figure
1.22) > Continue.
4. Rotation > Method > None > Continue > OK.
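The idea behind the test can also be sketched outside SPSS. A rough pure-Python illustration, with made-up item scores chosen so that a single factor dominates (SPSS's Total Variance Explained table performs the equivalent computation for you): build the items' correlation matrix, take its largest eigenvalue, and check what share of the total variance (the number of items) it accounts for.

```python
import math

def correlation_matrix(cols):
    """Pearson correlations between every pair of item columns."""
    def corr(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = math.sqrt(sum((a - mx) ** 2 for a in x))
        sy = math.sqrt(sum((b - my) ** 2 for b in y))
        return sxy / (sx * sy)
    return [[corr(a, b) for b in cols] for a in cols]

def largest_eigenvalue(m, iters=200):
    """Power iteration: dominant eigenvalue of a symmetric matrix."""
    v = [1.0] * len(m)
    for _ in range(iters):
        w = [sum(row[j] * v[j] for j in range(len(v))) for row in m]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the converged vector.
    return sum(v[i] * sum(m[i][j] * v[j] for j in range(len(v)))
               for i in range(len(m)))

# Four hypothetical items answered by five respondents (deliberately
# highly correlated, so one component dominates).
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [5, 4, 3, 2, 1],   # a reverse-scored item
    [1, 3, 3, 4, 4],
]
r = correlation_matrix(items)
share = largest_eigenvalue(r) / len(items)  # first component's share of variance
print(share > 0.5)  # True: here one factor explains most of the variance
```

Under Harman's logic a share above .5, as in this deliberately extreme example, would flag a potential CMB problem; below .5, the instrument passes the test.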

The 43% variance explained by a single factor shows that common method bias is not a major
concern in this study (below the 50% cut-off point). The result is obtained by running an unrotated,
single-factor-constrained factor analysis in SPSS Statistics.



Figure 1.19: Total variance explained for an unrotated factor analysis.

Figure 1.20: Factor analysis in SPSS Statistics version 21.

Figure 1.21: Select all independent variables to test for the common method bias.



Figure 1.22: Extraction dialog box.

Extraction > Extract > Fixed number of factors > Factors to extract > set to 1 > Continue.



Figure 1.23: Rotation dialog box.

Rotation > Method > None > Continue > OK



Controlling for CMB requires both procedural and statistical remedies. One of the most
effective ways to control for CMB is to obtain the measures of the independent variables
(predictors) from a different source than that of the dependent variables (criteria)10. For
example, if you are studying the relationship between employee satisfaction (predictor) and job
performance (criterion), you may want to source the satisfaction data from the employees and the
job performance data from their superiors' evaluations. This separation segregates the two
measures into two entirely different sources and methods.

However, not all studies can be designed in such a manner. Thus, concern over CMB
persists. Podsakoff et al. (2003) recommended a careful assessment as to which method is suitable
for which study context. Figure 1.24 summarises 7 study contexts in which different controls can be
applied against CMB. The figure should be read together with Appendix 4.

10
Do you still remember footnote 7? Again, this is another good reason why the research proposal stage is
critical. Not only does it define your study context, it also sets the very mechanisms you are going to use for
data collection, which will substantially impact the downstream stages: data analysis, discussion and conclusion.
Figure 1.24: Controlling for common method bias (Podsakoff et al., 2003, p. 898).



2 Factor analysis

Factor analysis allows researchers to substantiate a given construct based on multiple
indicators (manifest variables11). The social science disciplines regularly use this technique to
measure latent (unobserved) variables, which are reflected by their corresponding observed variables.

These points are to be considered when performing a factor analysis:

1. Extraction methods:
a. Principal component analysis (PCA)
b. Principal axis factoring (PAF)
c. Maximum likelihood (ML)

2. In any of the extraction method, consider the rotation method:


a. Orthogonal (items are considered to be uncorrelated, i.e. factors are rotated at a
90° angle), e.g. Varimax rotation.
b. Oblique (items are allowed to be correlated), e.g. Direct oblimin.

3. KMO and Bartlett's test of sphericity


4. Item loading and cross-loading

When to use:

1. There are multiple indicators for a given latent (unobserved) variable.


2. Before computation of summated/composite scale.

To run factor analysis:

Analyze > Dimension Reduction > Factor

Descriptives > Select Initial solution > Select KMO and Bartlett's test of sphericity.

Example:

1. File: FactorAnalysis.sav
2. Find which items cluster into which factor or component.

11
Manifest variables, observed variables and indicators refer to the same type of variables, i.e. variables which
can be directly measured. These terminologies are used interchangeably throughout this text.
Figure 2.1: Factor analysis > Descriptives settings.

Extraction > Method > Principal components.

Figure 2.2: Factor analysis > Extraction method.

Rotation > select Varimax > Display > Select Rotated solution.



Figure 2.3: Factor analysis > rotation method.

Options > Exclude cases listwise > Coefficient Display Format > Select Suppress small
coefficients > Absolute value below > Type .3.

Figure 2.4: Factor analysis > Options.

KMO > .6 is acceptable (Allen & Bennett, 2010). KMO measures the proportion of variance among
the items that may be common variance, i.e. explainable by underlying factors.
KMO and Bartlett's Test

Kaiser-Meyer-Olkin Measure of Sampling Adequacy: .898
Bartlett's Test of Sphericity: Approx. Chi-Square 2642.944; df 406; Sig. .000



Table 2.1: KMO and Bartlett's test results.
Bartlett's test is significant, p < .001 (Allen & Bennett, 2010). This indicates the dataset is suitable
for factor analysis.

Point of inflection. From component 7 onwards the chart starts to scree, indicating that a 6-factor
(component) solution is appropriate.

Figure 2.5: Scree plot.

A scree plot is used to determine the appropriate number of factors (components) to retain.
Looking at the point of inflection, where the line starts to flatten, the number of factors to retain
corresponds to the number of components before the inflection.

Depending on the extraction method, the rotated solution is normally used to identify which
item belongs to which component. Some scholars use a minimum loading of .5 with low cross-loadings
as a basis for evaluating the items to be retained for composite scores. Note that in the following
rotated solution, loadings of less than .3 are not displayed. These were suppressed in the steps above.



Rotated Component Matrix a

Component

1 2 3 4 5 6

ease1 .796
ease2 .336 .704
ease3 .411 .731
ease4 .790
ease5 .740
ease6 .711
usefulness1 .648 .311 .394
usefulness2 .655 .355
usefulness3 .799
usefulness4 .805
usefulness5 .824
usefulness6 .596 .362
Intention1 .779
Intention2 .322 .309 .755
Intention3 .786
Intention4 .726
Attitude1 .780
Attitude2 .837
Attitude3 .319 .806
Internalisation1 .787
Internalisation2 .797
Internalisation3 .766
Identification1 .725
Identification2 .394 .315 .609
Identification3 .403 .697
Compliance1 .708
Compliance2 .796
Compliance3 .849
Compliance4 .855

Note: Internalisation and Identification items load into a single factor. Depending on the questions
and theories, you may want to re-specify these as a single factor rather than 2 distinct factors.

Extraction Method: Principal Component Analysis.


Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 6 iterations.

Table 2.2: PCA results.



3 Latent variable
Once you have analysed the components or factors extracted at the factor analysis stage, you
can create composite scales (latent variables) to be used in subsequent analyses. There are
several ways to create a composite scale; the most commonly used are:

1. Summated scale. Add up the raw scores of all indicators which load on a given factor.
2. Means. Average the scores of all indicators for a given factor.
3. Factor scores. Sum the factor-score-weighted indicators for a given factor.

Of the three methods above, the mean gives researchers more control over the calculations (Hair
et al., 2006) and facilitates interpretation of descriptive analysis results.
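The SUM and MEAN functions used in Compute Variables below boil down to simple row-wise arithmetic. A minimal sketch with hypothetical item scores (not taken from CompositeScore.sav):

```python
# Hypothetical responses: four intention items and three ease-of-use items.
rows = [
    {"Intention1": 4, "Intention2": 5, "Intention3": 4, "Intention4": 5,
     "ease1": 4, "ease2": 3, "ease3": 5},
    {"Intention1": 2, "Intention2": 2, "Intention3": 3, "Intention4": 2,
     "ease1": 2, "ease2": 2, "ease3": 3},
]

for row in rows:
    intent_items = [row[f"Intention{i}"] for i in range(1, 5)]
    ease_items = [row[f"ease{i}"] for i in range(1, 4)]
    row["INTENT"] = sum(intent_items)                # SUM(Intention1 to Intention4)
    row["PEOU"] = sum(ease_items) / len(ease_items)  # MEAN(ease1 to ease3)

print(rows[0]["INTENT"], rows[0]["PEOU"])  # 18 4.0
```

SPSS adds the result as a new column, exactly as the loop above adds a new key per row.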

Example:

1. File: CompositeScore.sav
2. Find which items cluster into which factor or component.



In order to create the composite scale, use:

Transform > Compute Variables


2. Select All > Sum
1. Type variable name INTENT

3. Select Intention1 to Intention4 , separated by comma > OK

4. A new variable called INTENT is added in the last column.



1. Select All > Mean.
2. Type PEOU.
3. Select ease1 to ease6, separated by commas > OK.
4. A new variable called PEOU is added in the last column.



4 Test of differences among groups

4.1 t-test
This type of test provides a basis for analysing whether there is any difference between two sample
groups, e.g. between male and female, before and after a treatment (in an experiment), or
early and late respondents. If the test is significant (i.e. p < .05), we can conclude that the
two sample means are statistically different.

4.1.1 Independent samples t-test


When to use:

1. Dependent variable (DV) data type is scale.


2. Number of independent variable (IV) is 1.
3. Number of sample groups is 2.
4. Sample groups are independent of each other.
5. Number of covariates is 0.

In order to run independent samples t-test, the following 4 assumptions have to be met:

1. Scale of measurement. DV has to be interval or ratio.


2. Independent. Each participant is measured once and has no influence on others in the
survey or experiment.
3. Normality. Each latent variable is normally distributed.
4. Homogeneity of variance. The variance of scores in each group is approximately equal. This
is assessed by Levene's test, which is part of the t-test output.

To run independent samples t-test:

Analyze > Compare Means > Independent-Samples T Test

Select the variable to be tested > move to Test Variable(s) field

Define Groups > Group 1 > Type 1 > Group 2 > Type 2 > OK.
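The two t values SPSS reports (the Equal variances assumed and Equal variances not assumed rows) can be sketched in plain Python; the scores below are hypothetical, not from IndependentTTest.sav:

```python
import math

def t_statistics(x, y):
    """Return (pooled t, Welch t), matching SPSS's two output rows."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((a - mx) ** 2 for a in x) / (nx - 1)   # sample variances
    vy = sum((b - my) ** 2 for b in y) / (ny - 1)
    sp2 = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)  # pooled variance
    pooled = (mx - my) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    welch = (mx - my) / math.sqrt(vx / nx + vy / ny)
    return pooled, welch

male = [3, 4, 4, 5, 3]     # hypothetical scores
female = [4, 5, 5, 5, 4]
pooled, welch = t_statistics(male, female)
print(round(pooled, 3), round(welch, 3))  # -1.789 -1.789
```

With equal group sizes, as here, the two statistics coincide; they diverge when group sizes and variances differ, which is why Levene's test decides which row to read.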

Example:

1. Dataset: IndependentTTest.sav
2. This exercise is to test any significant difference in attitude between male and female
participants toward e-learning system.

47 Test of differences among groups | SOAR Centre


Select INTENT > move to Test Variable(s) field.

Select GENDER > move to


Grouping Variables.

Define Groups > set the


parameters as shown

Continue > OK

Levene's test is significant. Thus, read the result from the Equal variances not assumed row.



Compare the result above with the following.

Group size is not equal.

Levene's test is not significant. Thus, read the result from the Equal variances assumed row.

4.1.2 Paired samples t-test


When to use:

1. Dependent variable (DV) data type is scale.


2. Number of independent variable (IV) is 1.
3. Number of sample groups is 2.
4. Sample groups are related to each other.
5. Number of covariates is 0.

In order to run related (paired)-samples t-test, the following 3 assumptions have to be met:

1. Scale of measurement. DV has to be interval or ratio.


2. Normality. Each latent variable is normally distributed.
3. Normality of difference scores. Each pair of scores is approximately normally distributed.

To run paired-samples t-test:

Analyze > Compare Means > Paired-Samples T Test

Select the variables to be tested > move to Paired Variables field > OK.
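The paired t statistic is a one-sample t-test on the pairwise differences. A sketch with hypothetical before/after attitude scores (not from Paired t Test 1.sav):

```python
import math

def paired_t(before, after):
    """t statistic for the mean of the pairwise differences."""
    d = [b - a for a, b in zip(before, after)]   # after minus before
    n = len(d)
    md = sum(d) / n
    sd = math.sqrt(sum((x - md) ** 2 for x in d) / (n - 1))
    return md / (sd / math.sqrt(n))

before = [3, 3, 4, 2, 3]   # hypothetical attitude before the course
after = [4, 4, 4, 3, 5]    # the same participants afterwards
print(round(paired_t(before, after), 3))  # 3.162
```

Note the pairing: each difference is computed within one participant, which is why the samples must be related.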

Example:

1. Dataset: Paired t Test 1.sav
2. This exercise is to test for any significant difference in attitude toward the e-learning system
among those who participated in an e-learning course.



Variations in the mean score if this experiment is repeated.

Shows whether the correlation between the before-course and after-course scores is significant.

Shows whether the effect of the course is significant.



4.2 ANOVA
The t-test is useful for testing the statistical difference between 2 groups. However, when we have
more than 2 sample groups, another approach is needed; in this case, analysis of variance (ANOVA)
is a good choice to start with.

When to use:

1. Dependent variable (DV) data type is scale.
2. Number of independent variables (IV) is 1.
3. Number of sample groups is more than 2.
4. Sample groups are independent of each other. Use one-way repeated measures ANOVA
instead if the samples are related.
5. Number of covariates is 0.

In order to run independent samples ANOVA, the following 4 assumptions have to be met:

1. Scale of measurement. DV has to be interval or ratio.


2. Normality. Each latent variable is normally distributed.
3. Independent. Each participant is measured once and has no influence on others in the
survey or experiment.
4. Homogeneity of variance. The variance of scores in each group is approximately equal. This
is assessed by Levene's test, which is part of the ANOVA output.

To run independent samples ANOVA:

Analyze > Compare Means > One-way ANOVA

Select DV > move to Dependent List field

Select IV > move to Factor field

Click Post Hoc > Equal Variances Assumed12 > Select Tukey > Continue
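The F statistic in the ANOVA table is the between-groups mean square over the within-groups mean square. A sketch with hypothetical scores for three groups (not from ANOVA 1.sav):

```python
def one_way_f(groups):
    """One-way ANOVA F = between-groups MS / within-groups MS."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical intention scores for three education levels.
groups = [[2, 3, 3], [4, 5, 4], [6, 6, 7]]
print(round(one_way_f(groups), 2))  # 30.33
```

A large F means the group means vary far more than the scores within each group; the post-hoc tests then locate which pairs of groups differ.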

Example:

1. Dataset: ANOVA 1.sav
2. This exercise is to test if education level influences intention to use e-learning.

12
Tukey's test is appropriate for equal sample sizes.
The Gabriel procedure is more appropriate for unequal sample sizes.
Hochberg's GT2 is appropriate when the sample sizes are extremely unequal.
Games-Howell is normally used when equal variances cannot be assumed (i.e. Levene's test is significant).
Extremely unequal sample sizes. Read the result from Hochberg's GT2.

The homogeneity of variance assumption holds.



The main result shows that the difference across academic levels is not significant.

Extremely unequal sample sizes. The Hochberg result should be used.



4.3 ANCOVA
ANOVA and the t-test are straightforward analyses for testing samples. These statistics exclude other
factors outside the interest of a study. However, there are many circumstances where a study has to
take into account other contributing factors that influence the final result. These factors are called
covariates. For example, gender, age and work experience may not be variables of interest in the
main study, but they somehow influence the result and thus should be controlled for. Analysis of
covariance (ANCOVA) is therefore suitable.

When to use:

1. Dependent variable (DV) data type is scale.


2. Number of independent variable (IV) is 1.
3. Number of sample groups is more than 1.
4. Sample groups are independent of each other. Use one-way repeated measure ANOVA
instead if the samples are related.
5. Number of covariates is 1 or more.

In order to run ANCOVA, the following 5 assumptions have to be met:

1. Linearity. The relationship of covariate and DV is linear. Scatterplot can be used to assess
this relationship.
2. Normality. Each latent variable is normally distributed.
3. Independent. Each participant is measured once and has no influence on others in the
survey or experiment.
4. Homogeneity of variance. The variance of scores in each group is approximately equal. This
is assessed by Levene's test, which is part of the ANOVA/ANCOVA output.
5. Homogeneity of regression slope. The regression slope of IV measured on DV has to be the
same for all groups.

To run ANCOVA:

Analyze > General Linear Model > Univariate

In Univariate dialog:
> Move DV to Dependent Variable field
> Move IV to Fixed Factor(s) field
> Move covariate (e.g. gender) to Covariate(s) field

Click Model > Specify Model > Select Full factorial


Click Options > Select DV > move to Display means for > Select Compare main effects >
Display > Descriptive statistics, Estimates of effect size, Observed power, and Homogeneity tests
> Continue.

55 Test of differences among groups | SOAR Centre


Example:

1. Dataset: ANCOVA 1.sav


2. This exercise is to test for a significant difference in intention to use e-learning among
instructors who have different levels of highest academic qualification. Computer skill level is
introduced as a control variable (covariate).



The homogeneity of variance assumption holds.



Main ANCOVA output.

The covariate COMPSKILL is not significantly related to Intention. EDU is not significantly related
to Intention.

If EDU were significantly related to Intention (in the previous table), the Pairwise Comparisons
table would indicate which group(s) differ significantly.



5 Test of correlations

5.1 Pearson correlation


When to use:

1. Data type is scale.
2. Number of independent variables (IV) is 1.
3. Observations are independent of each other.
4. Number of covariates is 0.

In order to run independent samples Pearsons r, the following 4 assumptions have to be met:

1. Linearity. Linear relationship between variables.


2. Normality. Each latent variable is normally distributed.
3. Independent. Each participant is measured once and has no influence on others in the
survey or experiment.
4. Homoscedasticity. The error variance is assumed to be the same across all values of the other
variable.

To run Pearsons r:

Analyze > Correlate > Bivariate

Select variables > move to Variables field

Correlation coefficients > select Pearson

Test of Significance > select two-tailed
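Pearson's r is the covariance of the two variables scaled by their standard deviations. A sketch with hypothetical scores (not from Pearson.sav):

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x)
                    * sum((b - my) ** 2 for b in y))
    return num / den

# Hypothetical attitude and intention scores.
attitude = [1, 2, 3, 4, 5]
intention = [2, 1, 4, 3, 5]
print(pearson_r(attitude, intention))  # 0.8
```

SPSS reports the same coefficient together with its two-tailed significance.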

Example:

1. Dataset: Pearson.sav
2. This exercise is to test if attitude correlates with intention to use e-learning.



Attitude and Intention have a significant positive correlation, r(45) = .284, p = .05.



5.2 Regression
5.2.1 Multiple regression analysis
When to use:

1. Data type is scale.


2. Number of independent variables (IV) is > 1.
3. The cases-to-predictors ratio is reasonable: N ≥ 50 + 8k, where k is the number of predictors,
for a full regression model (see Allen & Bennett, 2010).
4. Number of covariates is 0.

In order to run multiple regression, the following 5 assumptions have to be met:

1. Linearity. Linear relationship between variables.


2. Normality. Each latent variable is normally distributed.
3. Multicollinearity. High predictor-predictor correlations (r > .85) result in an unstable regression
model. Check Tolerance and VIF (variance inflation factor) to see if multicollinearity is an
issue in the model.
4. Homoscedasticity. Error variance is assumed to be the same across all values of other
variable.
5. Outliers. Multiple regression is sensitive to outliers. Treatment of outliers is to be made prior
to this analysis.
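The VIF in the multicollinearity check is 1 / (1 - R²), where R² comes from regressing one predictor on the others. A minimal two-predictor sketch (with one other predictor, R² is just the squared Pearson correlation); the nearly collinear scores below are made up for illustration:

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x)
                    * sum((b - my) ** 2 for b in y))
    return num / den

def vif(predictor, other):
    """Variance inflation factor of one predictor against one other."""
    r2 = pearson_r(predictor, other) ** 2
    return 1.0 / (1.0 - r2)

# Two hypothetical, nearly collinear predictors.
peou = [1, 2, 3, 4, 5]
att = [1, 2, 3, 4, 6]
print(round(vif(peou, att), 1))  # 37.0 -> well above 10: multicollinearity
```

Tolerance is simply the reciprocal, 1 - R², so VIF > 10 and Tolerance < .1 are the same cut-off stated two ways.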

To run multiple regression:

Analyze > Regression > Linear

Select dependent variable > move to Dependent field

Select independent variables > move to Independent(s) field

Statistics > select Estimates, Confidence intervals level (%): 95, Model fit, Part and partial
correlations, Collinearity diagnostics

Plots > *ZPRED > move to X (axis), *ZRESID > move to Y (axis) > select Normal probability
plot

Save > select Distances: Mahalanobis, and Cook's (this is to check for multivariate outliers)

Options > leave the default parameters setting.

Example:

1. Dataset: Multiple regression.sav


2. This exercise is to test the effects of perceived ease of use (PEOU), perceived usefulness
(USEFUL) and attitude (ATT) on intention (INTENT) to use e-learning.



[Path diagram: PEOU, USEFUL and ATT predicting INTENT.]



R² indicates that 41.5% of the variation in INTENT can be explained by the IVs.

The change in F value is significantly different from zero.



Only USEFUL has a significant impact on INTENT.

Tolerance < .1 indicates multicollinearity.

VIF > 10 indicates multicollinearity.

[Path diagram: PEOU, USEFUL and ATT predicting INTENT.]



The normal P-P plot indicates the residuals are normally distributed; non-normal if the points
substantially deviate from the diagonal line.

The residual scatterplot allows assessment of the normality, linearity and homoscedasticity of the
residuals. An absence of a clear pattern in the data spread indicates all three assumptions are met.



Cook's distance > 1 warrants a closer inspection; a multivariate outlier may be present.

The Mahalanobis distance is to be compared with the critical χ² value with df = k. A multivariate
outlier may be present if the Mahalanobis distance exceeds the critical χ².



5.2.2 Hierarchical multiple regression
Refer to Multiple regression analysis section for assumptions and requirements.

To run hierarchical multiple regression:

Analyze > Regression > Linear

Select dependent variable > move to Dependent field

Select independent variables > move to Independent(s) field

Next > select new IV(s) > move to Independent(s) field

Statistics > select Estimates, Confidence intervals level (%): 95, Model fit, Part and partial
correlations, Collinearity diagnostics

Plots > *ZPRED > move to X (axis), *ZRESID > move to Y (axis) > select Normal probability
plot

Save > select Distances: Mahalanobis, and Cooks (this is to check for multivariate outliers)

Options > leave the default parameters setting.

Example:

1. Dataset: Multiple regression.sav


2. This exercise is to test the effect of compliance (COMPLY) on intention (INTENT), above what
has been accounted for by use e-learning perceived ease of use (PEOU), perceived
usefulness (USEFUL) and attitude (ATT).

Parameters for Statistics, Plots, Save and Options remain the same as in the Multiple regression section.



Model 1 accounts for 42% of the variation in INTENT, while Model 2 accounts for 44%.

Both models are significant.



Tolerance < .1 indicates multicollinearity.

VIF > 10 indicates multicollinearity.

See other assessment of assumptions in Multiple regression section.

[Path diagram: PEOU, USEFUL, ATT and COMPLY predicting INTENT.]



6 Non-parametric test

6.1 Mann-Whitney U
When to use:

1. Non-normal data distribution.


2. Independent sample groups.
3. Number of sample groups is 2.

In order to run Mann-Whitney U, the following 2 assumptions have to be met:

1. Scale of measurement. At minimum, the DV has to be ordinal.
2. Independence. Each participant responded only once.

To run Mann-Whitney U:

Analyze > Nonparametric Tests > Legacy Dialogs > 2 Independent Samples

Select the variables to be tested > move to Test Variable List field

Select grouping variable (e.g. gender, early and late reply) > move to Grouping Variable
field

Define Groups > Group 1 > Type 1 (e.g. 1 is male group) > Group 2 > Type 2 (e.g. 2 is
female group) > Continue

Test Type > select Mann-Whitney U > OK.
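The U statistic SPSS computes is built from rank sums over the pooled sample (with midranks for ties). A pure-Python sketch of the statistic only (no p-value), using hypothetical scores rather than MannWhitney.sav:

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U statistic, with midranks for tied values."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1                      # tie block spans ranks i+1 .. j
        midrank = (i + 1 + j) / 2       # average rank within the tie block
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    nx, ny = len(x), len(y)
    u_x = rank_sum_x - nx * (nx + 1) / 2
    return min(u_x, nx * ny - u_x)      # report the smaller U

male = [3, 4, 2]       # hypothetical attitude scores
female = [5, 6, 4]
print(mann_whitney_u(male, female))  # 0.5
```

Because the test works on ranks rather than raw scores, it needs no normality assumption, which is why it substitutes for the independent-samples t-test here.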

Example:

1. Dataset: MannWhitney.sav
2. This exercise is to test any significant difference in attitude toward e-learning system among
male and female participants of an e-learning course.



Mean ranks show that males have a slightly lower attitude towards using the e-learning system.

The Mann-Whitney U test shows that the difference in attitude between males and females is not
significant, p = .65.



6.2 Kruskal-Wallis
When to use:

1. Non-normal data distribution.


2. Dependent variable (DV) data type is scale.
3. Number of sample groups > 2.

In order to run Kruskal-Wallis test, the following 3 assumptions have to be met:

1. Scale of measurement. At minimum, the DV has to be ordinal.
2. Independence. Each participant responded only once.
3. Data distribution shape. Similar shape and spread across groups.

To run Kruskal-Wallis:

Analyze > Nonparametric Tests > Legacy Dialogs > k Independent Samples

Select the variables to be tested > move to Test Variable List field

Select grouping variable (e.g. expertise level, age group) > move to Grouping Variable field

Define Range > Minimum 1 > Type 1 (e.g. 1 is a group with the lowest skill) > Maximum >
Type 4 (e.g. 4 is the maximum group level - professional) > Continue

Test Type > select Kruskal-Wallis H > OK.
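The H statistic extends the rank-sum idea to more than two groups. A pure-Python sketch of the statistic only (no tie correction, no p-value), with hypothetical scores rather than KruskalWallis.sav:

```python
def kruskal_h(groups):
    """Kruskal-Wallis H from rank sums (no tie correction applied)."""
    pooled = sorted((v, gi) for gi, g in enumerate(groups) for v in g)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2       # average rank for a tie block
        for k in range(i, j):
            rank_sums[pooled[k][1]] += midrank
        i = j
    return (12 / (n * (n + 1))
            * sum(rs ** 2 / len(g) for rs, g in zip(rank_sums, groups))
            - 3 * (n + 1))

# Hypothetical intention scores for three computer-skill groups.
groups = [[1, 2], [3, 4], [5, 6]]
print(round(kruskal_h(groups), 3))  # 4.571
```

When the groups' ranks are thoroughly mixed, H falls near zero; well-separated groups, as in this example, push H up towards its maximum.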

Example:

1. Dataset: KruskalWallis.sav
2. This exercise is to test any significant difference in intention to use e-learning system among
instructors with different levels of computer skill.



Mean ranks show that each group has a different level of intention towards using the e-learning system.



The Kruskal-Wallis test shows that the difference in Intention among the 4 groups of instructors is
not significant, p = .17.



7 Appendices
7.1 Appendix 1: Cleaning your data

To clean up your data:

1. Any missing values?
   - Yes, too many missing: discard.
   - Yes, an acceptable amount: transform the data.
   - No: proceed.
2. Any outliers?
   - Yes: remove, or transform the data.
   - No: proceed.
3. Normality?
   - No: transform the data, or consider non-parametric statistics.
   - Yes: proceed to assess the quality of the measurement model.



7.2 Appendix 2: Quality of measurement model

Quality of measurement model:

1. Validity & reliability tests
   - Reliability: Cronbach's alpha; composite reliability
   - Content validity: domain expert; literature
   - Construct validity: convergent; discriminant
2. Address biases
   - Non-response bias: small-scale post-hoc analysis; split data into early and late replies
   - Common method bias: data collection design (procedural); post-hoc analysis (statistical)



7.3 Appendix 3: Useful citations & readings

1. Validity & reliability tests13


1.1. Reliability test (Hair, Black, Babin, & Anderson, 2010)
1.1.1. Cronbach's alpha: > 0.7
1.1.2. Composite reliability (CR): threshold > 0.7

1.2. Content validity


1.2.1. Domain experts
1.2.2. Literature

1.3. Construct validity (Hair et al., 2010)


1.3.1. Convergent validity: CR > AVE, AVE > 0.5
1.3.2. Discriminant validity: MSV < AVE, ASV < AVE

2. Address potential biases


2.1. Non-response bias
2.1.1. Small scale post-hoc analysis
2.1.2. Split data into two categories: early and late replies

2.2. Common (correlated) method bias (see Bagozzi & Yi, 1990; Podsakoff et al., 2003;
Podsakoff et al., 2012)

2.2.1. Procedural remedies


2.2.1.1. Obtain measures of the predictor and criterion variables from different
sources.
2.2.1.2. Use temporal, proximal, psychological or methodological separation of
measurement.
2.2.1.3. Protect respondent anonymity and reduce evaluation apprehension.
2.2.1.4. Counterbalance question order.
2.2.1.5. Improve scale items.

2.2.2. Statistical remedies


2.2.2.1. Harman's single factor test.
2.2.2.2. Partial correlation procedures.
2.2.2.3. Control the effects of a directly measured latent method factor.
2.2.2.4. Control the effects of an unmeasured latent method factor.
2.2.2.5. Multiple methods factors (multi-traits multi-method).
2.2.2.5.1. Confirmatory factor analysis model.
2.2.2.5.2. Correlated uniqueness model.
2.2.2.5.3. Direct product model.

13 CR = Composite Reliability; AVE = Average Variance Extracted; MSV = Maximum Shared Squared Variance; ASV = Average Shared Squared Variance
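The CR and AVE thresholds in 1.1.2 and 1.3.1 can be computed directly from standardized factor loadings. A minimal sketch, with made-up loadings for a four-item construct:

```python
import numpy as np

def cr_ave(loadings):
    """Composite reliability and average variance extracted
    from standardized factor loadings."""
    lam = np.asarray(loadings, dtype=float)
    # CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)
    errors = 1 - lam**2
    cr = lam.sum()**2 / (lam.sum()**2 + errors.sum())
    # AVE = mean of the squared loadings
    ave = (lam**2).mean()
    return cr, ave

cr, ave = cr_ave([0.82, 0.78, 0.75, 0.80])
print(f"CR = {cr:.3f}, AVE = {ave:.3f}")  # passes if CR > 0.7 and AVE > 0.5
```

The loadings themselves come from a factor analysis (e.g. in SPSS or AMOS); this snippet only applies the formulas to them.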
7.4 Appendix 4: Statistical remedies for CMB



8 References

Allen, P., & Bennett, K. (2010). PASW statistics by SPSS: A practical guide, version 18.0 (1 ed.).
Sydney: Cengage Learning Australia Pty Limited.

Bagozzi, R. P., & Yi, Y. (1990). Assessing method variance in multitrait-multimethod matrices: The
case of self-reported affect and perceptions at work. Journal of Applied Psychology, 75(5),
547-560. doi: 10.1037/0021-9010.75.5.547

Bennett, D. A. (2001). How can I deal with missing data in my study? Australian and New Zealand
Journal of Public Health, 25(5), 464-469. doi: 10.1111/j.1467-842X.2001.tb00294.x

Box, G. E. P., & Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical
Society. Series B (Methodological), 26(2), 211-252. doi: 10.2307/2984418

Dinev, T., Goo, J., Hu, Q., & Nam, K. (2009). User behaviour towards protective information
technologies: The role of national cultural differences. Information Systems Journal, 19(4),
391-412. doi: 10.1111/j.1365-2575.2007.00289.x

Doty, D. H., & Glick, W. H. (1998). Common methods bias: Does common methods variance really
bias results? Organizational Research Methods, 1(4), 374-406. doi:
10.1177/109442819814002

Graham, J. W. (2012). Missing data: Analysis and design. Retrieved from
http://ECU.eblib.com.au/patron/FullRecord.aspx?p=1156148

Hair, J. F., Black, W. C., Babin, B. J., & Anderson, R. E. (2010). Multivariate data analysis (7 ed.).
Upper Saddle River, NJ, USA: Prentice-Hall, Inc.

Hair, J. F., Black, W. C., Babin, B. J., Anderson, R. E., & Tatham, R. L. (2006). Multivariate data analysis
(6 ed.). New Jersey: Pearson Prentice Hall.

Hu, Q., Dinev, T., Hart, P., & Cooke, D. (2012). Managing employee compliance with information
security policies: The critical role of top management and organizational culture. Decision
Sciences, 43(4), 615-660. doi: 10.1111/j.1540-5915.2012.00361.x

Karanja, E., Zaveri, J., & Ahmed, A. (2013). How do MIS researchers handle missing data in survey-
based research: A content analysis approach. International Journal of Information
Management, 33(5), 734-751. doi: http://dx.doi.org/10.1016/j.ijinfomgt.2013.05.002

Leslie, L. L. (1972). Are high response rates essential to valid surveys? Social Science Research, 1(3),
323-334. doi: http://dx.doi.org/10.1016/0049-089X(72)90080-4

Little, R. J. A. (1988). A test of missing completely at random for multivariate data with missing
values. Journal of the American Statistical Association, 83(404), 1198-1202. doi:
10.2307/2290157



Osborne, J. W. (2010). Improving your data transformations: Applying the Box-Cox transformation.
Practical Assessment, Research & Evaluation, 15(12), 1-6. Retrieved from
http://pareonline.net/pdf/v15n12.pdf

Paxson, M. C. (1995). Increasing survey response rates: Practical instructions from the total-design
method. Cornell Hotel and Restaurant Administration Quarterly, 36(4), 66-66.

Podsakoff, P. M., MacKenzie, S. B., Lee, J.-Y., & Podsakoff, N. P. (2003). Common method biases in
behavioral research: A critical review of the literature and recommended remedies. Journal
of Applied Psychology, 88(5), 879-903. doi: 10.1037/0021-9010.88.5.879

Podsakoff, P. M., MacKenzie, S. B., & Podsakoff, N. P. (2012). Sources of method bias in social
science research and recommendations on how to control it. Annual Review of Psychology,
63(1), 539-569. doi: 10.1146/annurev-psych-120710-100452

Scheffer, J. (2002). Dealing with missing data. Research Letters in the Information and Mathematical
Sciences, 3(1), 153-160.

Straub, D., Boudreau, M.-C., & Gefen, D. (2004). Validation guidelines for IS positivist research.
Communications of the Association for Information Systems, 13, 380-427.

Yeh, K.-J. (2009). Reconceptualizing technology use and information system success: Developing and
testing a theoretically integrated model. (Doctor of Philosophy Doctorate thesis), University
of Texas, Arlington, Texas.


