

Faculty Development Program 2015

Research Methodology

Session 11-15

t-Test

The t-test assesses whether the means of two groups are
statistically different from each other.


t = (difference between the two means) / (variability or dispersion of the scores)


• A teacher wants to know if his introductory RM class has a
  good grasp of basic concepts. Six participants are chosen at
  random from the class and given a statistics proficiency test.
  The teacher wants the class to be able to score above 70 on the
  test. The six students get scores of 62, 92, 75, 68, 83, and 95.
  Can the professor have 90 percent confidence that the mean
  score for the class on the test would be above 70?

Because the computed t-value of 1.71 is larger than the critical value in
the table, the null hypothesis can be rejected, and the professor has
evidence that the class mean on the statistics test is above 70.
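A minimal sketch of this one-sample test with scipy; 90 percent confidence implies a one-tailed test at α = 0.10 (the alternative= argument assumes scipy ≥ 1.6):

```python
# One-sample, one-tailed t-test for the example above.
from scipy import stats

scores = [62, 92, 75, 68, 83, 95]
t_stat, p_value = stats.ttest_1samp(scores, popmean=70, alternative="greater")

print(f"t = {t_stat:.2f}, p = {p_value:.3f}")  # t = 1.71, p ≈ 0.074
# p < 0.10, so the null hypothesis (mean <= 70) is rejected at this level.
```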

Two Independent Samples

• Users and non-users of a brand differ in terms of their brand
  perceptions
• High-income groups spend more time on entertainment than
  low-income groups


You have two samples, which may come from one distribution or two.
To assess the likelihood, find how many standard deviations apart
the means of the two populations are:

t = (μ1 − μ2) / pooled SD

[Figure: two overlapping distributions with means μ1 and μ2; how many SDs apart?]

• Homogeneity of Variance: the amount of variability in
  each of the two groups is equal

t = (X̄1 − X̄2) / √( [(n1−1)s1² + (n2−1)s2²] / (n1+n2−2) × (n1+n2) / (n1·n2) )


Do visual aids and examples increase the learning of the students?

• Two groups were taught
• Group 1 was taught in a normal classroom
• Group 2 was taught in a classroom with lots of visual
  aids and examples
• Is there any difference in the learning of the students?


      Group 1              Group 2
  7    5    5          5    3    4
  3    4    7          4    2    3
  3    6    1          4    5    2
  2   10    9          5    4    7
  3   10    2          5    4    6
  8    5    5          7    6    2
  8    1    2          8    7    8
  5    1   12          8    7    9
  8    4   15          9    5    7
  5    3    4          8    6    6

s1 = 3.42              s2 = 2.06

• Step 1: State the null hypothesis
• Step 2: Set the level of risk
• Step 3: Select the appropriate test statistic: the t-test for
  independent means
• Step 4: Compute the t-value (see the sketch below)
• Step 5: Determine the critical t-value
• Step 6: Compare the two values
• Step 7: Decide
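A minimal sketch of Steps 4 through 7 with scipy, assuming the first three columns of the table above are Group 1 and the last three are Group 2:

```python
# Pooled-variance (independent-samples) t-test on the slide data.
from scipy import stats

group1 = [7, 5, 5, 3, 4, 7, 3, 6, 1, 2, 10, 9, 3, 10, 2,
          8, 5, 5, 8, 1, 2, 5, 1, 12, 8, 4, 15, 5, 3, 4]
group2 = [5, 3, 4, 4, 2, 3, 4, 5, 2, 5, 4, 7, 5, 4, 6,
          7, 6, 2, 8, 7, 8, 8, 7, 9, 9, 5, 7, 8, 6, 6]

t_stat, p_value = stats.ttest_ind(group1, group2)  # assumes equal variances
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")  # t ≈ -0.14: no difference here
# If p exceeds the chosen level of risk, the null hypothesis stands.
```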


If the obtained value does not exceed the critical value, the
null hypothesis is the most attractive explanation.

• OK, there is a significant difference, but what about the
  magnitude of the difference?

• How different are the two groups from one another?

ES = (X̄1 − X̄2) / √( (σ1² + σ2²) / 2 )

Effect size expresses the relative position of one group to another.
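A sketch of this effect-size formula on the two-group example, using the slide's standard deviations; the group means (≈5.43 and ≈5.53) are computed from the table rather than given on the slide:

```python
# Effect size (Cohen's d style, averaging the two variances).
mean1, mean2 = 5.43, 5.53   # computed from the table above
s1, s2 = 3.42, 2.06         # from the slide

es = (mean1 - mean2) / ((s1**2 + s2**2) / 2) ** 0.5
print(f"ES = {es:.2f}")  # about -0.04: a negligible effect
```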


Two Dependent (Paired) Samples

• Do people differ in terms of their attitudes towards
  corruption and towards ministers?
• Is there any difference between a 15-second and a 30-second
  TV commercial?

t-Test for Related Groups

• The difference between students' scores on the pretest and
  on the posttest
• Participants are being tested more than once
• There are two groups of scores
• The appropriate test statistic is the t-test for dependent means


n = number of pairs of observations

t = ΣD / √( [nΣD² − (ΣD)²] / (n−1) )

Pretest Posttest
3 7
5 8
4 6
6 7
5 8
5 9
4 6
5 6
3 7
6 8
7 8
8 7
7 9
6 10
7 9
8 9


Pretest   Posttest   Difference (D)   D²
  3          7             4          16
  5          8             3           9
  4          6             2           4
  6          7             1           1
  5          8             3           9
  5          9             4          16
  4          6             2           4
  5          6             1           1
  3          7             4          16
  6          8             2           4
  7          8             1           1
  8          7            −1           1
  7          9             2           4
  6         10             4          16
  7          9             2           4
  8          9             1           1
                       ΣD = 35    ΣD² = 107

(ΣD)² = 1225

n = number of pairs of observations

t = ΣD / √( [nΣD² − (ΣD)²] / (n−1) )
  = 35 / √( [16 × 107 − 1225] / 15 )
  = 6.14
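A minimal sketch checking this with scipy's paired test (differences taken as posttest minus pretest):

```python
# Paired (dependent-samples) t-test on the pretest/posttest scores.
from scipy import stats

pretest  = [3, 5, 4, 6, 5, 5, 4, 5, 3, 6, 7, 8, 7, 6, 7, 8]
posttest = [7, 8, 6, 7, 8, 9, 6, 6, 7, 8, 8, 7, 9, 10, 9, 9]

t_stat, p_value = stats.ttest_rel(posttest, pretest)
print(f"t = {t_stat:.2f}, p = {p_value:.5f}")  # t = 6.14
```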


Examining differences between groups on one or more variables:

Are the same participants being tested more than once?
  Yes → How many groups are you dealing with?
          Two groups / More than two groups
  No  → How many groups are you dealing with?
          Two groups / More than two groups

• What to do when there are more than TWO groups?

Probability of at least one Type I error = 1 − (1 − α)^k

α = Type I error rate per comparison
k = number of comparisons
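A quick check of why this matters, assuming α = 0.05 and three pairwise comparisons:

```python
# Familywise Type I error rate for k comparisons at level alpha.
alpha, k = 0.05, 3
print(f"{1 - (1 - alpha) ** k:.3f}")  # 0.143, far above the nominal 0.05
```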


Analysis of Variance

How to decide when means are different enough, relative to the
spread of the observations in each group

[Figure: two pairs of distributions with means Ȳ1 and Ȳ2, one pair
with small spread and one with large spread]


ANOVA looks at the way groups differ internally
versus what the difference is between them.

Determine the existence of a statistically significant
difference among several group means.

The test uses variances to help determine if the means are
equal or not.

Three basic assumptions:
Each population from which a sample is taken is assumed
to be normal.
Each sample is randomly selected and independent.
The populations are assumed to have equal standard
deviations (or variances).


SSB = sum of squares between groups
SSW = sum of squares within groups
TSS = total sum of squares

TSS = SSW + SSB

F ratio = MSB / MSW

                             WITHIN                       BETWEEN
                       (data − group mean)       (group mean − overall mean)
data   group   mean      diff     squared           diff      squared
5.3      1     6.00     -0.70      0.490           -0.44       0.194
6.0      1     6.00      0.00      0.000           -0.44       0.194
6.7      1     6.00      0.70      0.490           -0.44       0.194
5.5      2     5.95     -0.45      0.203           -0.49       0.240
6.2      2     5.95      0.25      0.063           -0.49       0.240
6.4      2     5.95      0.45      0.203           -0.49       0.240
5.7      2     5.95     -0.25      0.063           -0.49       0.240
7.5      3     7.53     -0.03      0.001            1.09       1.195
7.2      3     7.53     -0.33      0.109            1.09       1.195
7.9      3     7.53      0.37      0.137            1.09       1.195
TOTAL                              1.757                       5.127
TOTAL/df                           0.251 (df = 7)              2.564 (df = 2)

overall mean: 6.44         F = 2.564 / 0.251 = 10.22
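A sketch verifying this worked example with scipy's one-way ANOVA:

```python
# One-way ANOVA on the three groups from the table above.
from scipy import stats

g1 = [5.3, 6.0, 6.7]
g2 = [5.5, 6.2, 6.4, 5.7]
g3 = [7.5, 7.2, 7.9]

f_stat, p_value = stats.f_oneway(g1, g2, g3)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")  # F ≈ 10.22
```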


If the Between Group Variation is significantly greater than the
Within Group Variation, then it is likely that there is a
statistically significant difference between the groups.

Analysis of Variance for days


Source DF SS MS F P
treatment 2 34.74 17.37 6.45 0.006
Error 22 59.26 2.69
Total 24 94.00


ANOVA

• The test statistic for ANOVA is an F-ratio, which is a ratio
  of two sample variances. In the context of ANOVA, the
  sample variances are called mean squares, or MS values.

• The top of the F-ratio, MSbetween, measures the size of mean
  differences between samples. The bottom of the ratio,
  MSwithin, measures the magnitude of differences that would
  be expected without any treatment effects.


ANOVA

• The F-ratio has the same basic structure as the independent-
  measures t statistic.

      obtained mean differences (including treatment effects)     MSbetween
F = ─────────────────────────────────────────────────────────── = ─────────
     differences expected by chance (without treatment effects)   MSwithin



ANOVA

The differences (or variance) between means can be caused
by two sources:
1. Treatment Effects: If the treatments have different effects,
   this could cause the mean for one treatment to be higher (or
   lower) than the mean for another treatment.
2. Chance or Sampling Error: If there is no treatment effect at
   all, you would still expect some differences between samples.
   Mean differences from one sample to another are an example
   of random, unsystematic sampling error.



ANOVA

• Within-Treatments Variability: MSwithin measures the size
  of the differences that exist inside each of the samples.

• Because all the individuals in a sample receive exactly the
  same treatment, any differences (or variance) within a
  sample cannot be caused by different treatments.


ANOVA

Thus, these differences are caused by only one source:

1. Chance or Error: The unpredictable differences that exist
   between individual scores are not caused by any
   systematic factors and are simply considered to be
   random chance or error.



ANOVA

• Considering these sources of variability, the structure of
  the F-ratio becomes:

      treatment effect + chance/error
F = ──────────────────────────────────
             chance/error



ANOVA

• To supplement the hypothesis test, it is recommended that
  you calculate a measure of effect size.

• For an analysis of variance, the common technique for
  measuring effect size is to compute the percentage of
  variance that is accounted for by the treatment effects.


ANOVA

      SSbetween treatments
η² = ──────────────────────
           SStotal
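A sketch of η² for the earlier worked example (SSB and SSW taken from that table):

```python
# Eta squared: proportion of total variance explained by the treatment.
ssb, ssw = 5.127, 1.757   # from the worked ANOVA table above
eta_sq = ssb / (ssb + ssw)
print(f"eta^2 = {eta_sq:.2f}")  # ≈ 0.74
```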



Product Moment Correlation

• The product moment correlation, r, summarizes the
  strength of association between two metric (interval or ratio
  scaled) variables, say X and Y.

• It is an index used to determine whether a linear or straight-
  line relationship exists between X and Y.

• Also known as the Pearson correlation coefficient.
  It is also referred to as simple correlation, bivariate
  correlation, or merely the correlation coefficient.
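A minimal sketch of r with scipy, on made-up metric data for X and Y:

```python
# Pearson product moment correlation on hypothetical (X, Y) pairs.
from scipy import stats

x = [2, 4, 5, 7, 8, 10]
y = [1, 4, 4, 6, 8, 9]
r, p_value = stats.pearsonr(x, y)
print(f"r = {r:.3f}, p = {p_value:.4f}")
```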

Scatter Diagram


Simple Linear Regression


Simple Linear Regression

Y = α + βX + e

[Figure: scatter of (X, Y) points with fitted line]

Assumptions
• The mean or expected value of e is zero
• The relationship between X and Y is linear



Least squares criterion: min Σ(Yi − Ŷi)²

β (slope) = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²

α (intercept) = Ȳ − βX̄
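A sketch of these formulas on made-up (x, y) data:

```python
# Least-squares slope and intercept, computed from the formulas above.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

beta = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha = y.mean() - beta * x.mean()
print(f"Y_est = {alpha:.2f} + {beta:.2f} X")
```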

Simple Linear Regression

[Figure: decomposition of a point (Xi, Yi) around the fitted line
Ŷ = 60 + 5X, showing Yi − Ȳ = (Yi − Ŷi) + (Ŷi − Ȳ)]


Total sum of squares: SST = Σ(Yi − Ȳ)²

Sum of squares due to regression
(explained deviation): SSR = Σ(Ŷi − Ȳ)²

Sum of squares due to error
(unexplained deviation): SSE = Σ(Yi − Ŷi)²

SST = SSR + SSE

Coefficient of Determination: how well the estimated
regression equation fits the data

R² = SSR / SST

Correlation coefficient = (sign of β) √(coefficient of determination)
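Continuing the least-squares sketch above (x, y, alpha, beta as defined there):

```python
# Decompose the variation in y and compute R^2.
y_est = alpha + beta * x

sst = np.sum((y - y.mean()) ** 2)
ssr = np.sum((y_est - y.mean()) ** 2)
sse = np.sum((y - y_est) ** 2)

print(f"SST = {sst:.3f}, SSR = {ssr:.3f}, SSE = {sse:.3f}")
print(f"R^2 = {ssr / sst:.4f}")  # close to 1 for this nearly linear data
```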


MSE = SSE / (n − k − 1)

MSR = SSR / k        (k = number of independent variables)

F ratio = MSR / MSE

If F > Fc: reject H0
If F < Fc: do not reject H0


Multiple Regression
How a dependent variable is related to two or more
independent variables

Y = α + β1X1 + β2X2 + β3X3 + β4X4 + … + e

Mean(Y) = α + β1X1 + β2X2 + β3X3 + β4X4 + …

Multiple coefficient of determination: R² = SSR / SST

Adjusted multiple coefficient of determination:
Ra² = R² − k(1 − R²) / (n − k − 1)
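A sketch of the adjusted R² formula using Model 2 of the SPSS output below (n = 468 cases, k = 4 predictors; R² taken as SSR/SST):

```python
# Adjusted R-squared from the slide's formula.
ssr, sst = 29.948, 210.493   # Model 2 in the output below
n, k = 468, 4

r2 = ssr / sst
adj_r2 = r2 - k * (1 - r2) / (n - k - 1)
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")  # 0.142, 0.135
```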

Model           Sum of Squares   df   Mean Square      F       Sig.
1  Regression        4.272         3      1.424       3.204   .023(a)
   Residual        206.221       464       .444
   Total           210.493       467
2  Regression       29.948         4      7.487      19.200   .000(b)
   Residual        180.545       463       .390
   Total           210.493       467

a  Predictors: (Constant), TENURE, GENDER, SALARY
b  Predictors: (Constant), TENURE, GENDER, SALARY, HAPPINESS
c  Dependent Variable: PERFORMANCE


Testing the assumptions for Regression

Normality (interval level variables)
  – Skewness & kurtosis must lie within acceptable limits
    (-1 to +1)
• How to test?
  – You can examine a histogram, but SPSS also provides
    procedures, and these have convenient rules that can be
    applied (see following slides)
• If condition violated?
  – The regression procedure can overestimate significance, so
    you should add a note of caution to the interpretation of
    results (increases Type I error rate)
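A sketch of the skewness/kurtosis screen with scipy, on simulated data (the -1 to +1 rule of thumb is the one quoted above):

```python
# Normality screen: sample skewness and (excess) kurtosis.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=200)   # stand-in for an interval-level variable

print(f"skew = {stats.skew(x):.2f}, kurtosis = {stats.kurtosis(x):.2f}")
# Both within -1 to +1 suggests the normality assumption is tenable.
```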

Testing the assumptions for Regression

• Linearity & homoscedasticity for interval level
  variables
• How to test?
  – Scatterplot
• If condition violated?
  – Can underestimate significance


The scatterplot for evaluating linearity

• Homoscedasticity refers to the assumption that the
  dependent variable exhibits similar amounts of
  variance across the range of values of an
  independent variable.


The scatterplot for evaluating homoscedasticity

Multicollinearity

• Tolerance: the amount of variability in the selected independent
  variable not explained by the other independent variables

Variance Inflation Factor = 1 / Tolerance
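A sketch of tolerance and VIF on simulated predictors: regress one IV on the others, take that R², and tolerance is 1 − R² (numpy's least squares stands in for the SPSS collinearity diagnostics):

```python
# Tolerance and VIF for predictor x1 given x2 and x3.
import numpy as np

rng = np.random.default_rng(1)
x2 = rng.normal(size=100)
x3 = rng.normal(size=100)
x1 = 0.8 * x2 + 0.1 * x3 + rng.normal(scale=0.5, size=100)  # collinear with x2

X = np.column_stack([np.ones(100), x2, x3])
coef, *_ = np.linalg.lstsq(X, x1, rcond=None)
resid = x1 - X @ coef
r2 = 1 - resid.var() / x1.var()

tolerance = 1 - r2
print(f"Tolerance = {tolerance:.2f}, VIF = {1 / tolerance:.2f}")
```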


Transformations

• Three common transformations that we use: the
  logarithmic transformation, the square root
  transformation, and the inverse transformation.
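A one-line sketch of each transformation with numpy:

```python
# Logarithmic, square root, and inverse transformations.
import numpy as np

x = np.array([1.0, 4.0, 9.0, 16.0, 25.0])
print(np.log(x))   # logarithmic
print(np.sqrt(x))  # square root
print(1.0 / x)     # inverse
```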

Scale Development

Item Generation

Scale Purification

Reliability and Validity analysis


Factor Analysis

Condense (summarize) the information contained in
several items into a smaller set of new, composite
dimensions or variates (factors) with a minimum loss of
information
Identifying structure through data summarization

Data Reduction

Structure of relationships may exist among items or
among respondents

Visit to a Bank


Assumptions in Factor Analysis

Ensure that the data matrix has sufficient correlations to justify
the application of factor analysis.

Bartlett test of Sphericity: statistical probability that the
correlation matrix has significant correlations at least among
some of the items/variables.

Measure of Sampling Adequacy (0 to 1): degree of
intercorrelations among the items/variables.

MSA
.80 and above    Meritorious
.70 and above    Middling
.60 and above    Mediocre
below .50        Unacceptable

Visit to a Bank

Respondent   Item 1   Item 2   Item 3   …   …   Item 18
             5        4        3        6   4   3


Correlation Matrix for Item 1 through Item 18

          2     3     …     …     18
Item 1   .87   .12   .11   .92   .08
Item 2         .07   .17   .89   .13
Item 3               .86   .10   .88
…                          .14   .91
Item 17                          .06



         Component
Item     1       2       3
1 -.542 .216 .643
2 -.553 .212 .604
3 -.441 .248 .637
4 -.284 .591 -.213
5 -.317 .622 -.192
6 -.247 .582 -.201
7 -.336 .546 -.223
8 -.388 .547 -.230
9 .600 .093 .034
10 .632 .012 -.046
11 .716 .183 .014
12 .793 .158 .080
13 .760 .277 .044
14 .742 .132 .120
15 .711 .230 .172
16 .694 .216 .130
17 .693 .207 .154
18 .710 .230 .172

Criteria for number of Factors to Extract


r      r²
.20    .04
.30    .09
.70    .49
.75    .5625
.80    .64



Criteria for number of Factors to Extract

The Kaiser criterion: retain only factors with eigenvalues
greater than 1.
The eigenvalue for a given factor measures the variance
in all the variables/items which is accounted for by
that factor.
If a factor has a low eigenvalue, then it is contributing
little to the explanation of variance in the
variables/items and may be ignored as redundant with
more important factors.

Initial Eigenvalues

Component   Total   % of Variance   Cumulative %
1 6.301 35.007 35.007
2 2.181 12.117 47.124
3 1.535 8.529 55.654
4 .856 4.754 60.407
5 .761 4.229 64.636
6 .727 4.041 68.677
7 .705 3.919 72.596
8 .633 3.514 76.110
9 .599 3.327 79.437
10 .544 3.022 82.459
11 .493 2.740 85.199
12 .474 2.631 87.830
13 .443 2.460 90.290
14 .418 2.323 92.613
15 .373 2.070 94.683
16 .335 1.859 96.542
17 .318 1.767 98.309
18 .304 1.691 100.000
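A sketch applying the Kaiser criterion to the eigenvalues above:

```python
# Kaiser criterion: retain components with eigenvalue > 1.
eigenvalues = [6.301, 2.181, 1.535, 0.856, 0.761, 0.727, 0.705, 0.633,
               0.599, 0.544, 0.493, 0.474, 0.443, 0.418, 0.373, 0.335,
               0.318, 0.304]
retained = [ev for ev in eigenvalues if ev > 1]
print(f"Retain {len(retained)} factors: {retained}")  # 3 factors
```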


Visit to a Bank

[Figure: unrotated factor plot of the items, labeled A through H]


Rotation does not change the amount of variance
accounted for but simply redistributes the variance
across the factors to facilitate interpretation.

Rotation methods:
• Orthogonal: Varimax, Quartimax, Equimax
• Oblique

[Figure: rotated factor axes with the items A through H]

THANK YOU

