Research Methods
(Use of Statistical tools in Data Analysis)
Abid Ali Khan PhD (UL, Ireland)
Associate Professor,
Ergonomics Research Division
Department of Mechanical Engineering, Aligarh Muslim University, Aligarh
Email: abidalikhan@zhcet.ac.in
Processing & Analysis of Data
Measurement of central tendency
Mean
Median
Measurement of dispersion
Range
Variance
Standard deviation
Measurement of skewness
Central Tendency
Variance
Need to know the
variability of a data set
How much each number in set
varies from central point
Types of variability
Range
Variance
Standard deviation
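The measures listed above can be computed directly. The following is a minimal sketch using Python's standard `statistics` module; the data values and the helper name `describe` are hypothetical and chosen only for illustration.

```python
import statistics

def describe(values):
    """Return basic measures of central tendency and dispersion."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        # Sample variance and standard deviation (divisor n - 1).
        "variance": statistics.variance(values),
        "std_dev": statistics.stdev(values),
    }

sample = [4, 8, 6, 5, 7]  # hypothetical data set
summary = describe(sample)
print(summary)
```

The sample (n - 1) form of the variance is used here; `statistics.pvariance` would give the population form instead.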
Skewness
Measurement of relationship
Correlation (strength of the relationship between the dependent variable, DV, and the independent variable, IV)
Karl Pearson's coefficient of correlation (r)
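As an illustration, Pearson's r can be computed from the sums of squares and cross-products of the deviations from the means. This is a sketch only; the paired data and the function name `pearson_r` are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson's coefficient of correlation for paired observations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]  # hypothetical IV values
y = [2, 4, 5, 4, 5]  # hypothetical DV values
r = pearson_r(x, y)
print(round(r, 4))
```

Values of r near +1 or -1 indicate a strong linear relationship; values near 0 indicate a weak one.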
Normal Distribution /SND
About 68% of scores fall within
1 st. dev. of the mean
About 95% of scores fall within
2 st. dev. of the mean
Only about 5% of scores fall in
the extreme portions (beyond 2 st. dev.)
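These percentages can be checked from the standard normal distribution, where the probability of falling within k standard deviations of the mean is erf(k/√2). A minimal sketch (the function name `fraction_within` is hypothetical):

```python
import math

def fraction_within(k):
    """P(|Z| < k) for a standard normal variable Z."""
    return math.erf(k / math.sqrt(2))

within_1 = fraction_within(1)  # about 0.68
within_2 = fraction_within(2)  # about 0.95
print(round(within_1, 4), round(within_2, 4))
```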
Simple Regression Analysis
Least square estimates of regression line
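The slide's own formulas did not survive extraction, so the sketch below assumes the standard least-squares estimates for the line y = b0 + b1·x, namely b1 = Sxy/Sxx and b0 = ȳ - b1·x̄. The data and the function name `least_squares_line` are hypothetical.

```python
def least_squares_line(x, y):
    """Return (b0, b1) for the fitted line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept
    return b0, b1

x = [1, 2, 3, 4, 5]  # hypothetical IV values
y = [2, 4, 5, 4, 5]  # hypothetical DV values
b0, b1 = least_squares_line(x, y)
print(b0, b1)
```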
Testing of Hypothesis
(Parametric or Standard Tests of Hypotheses)
What is a Hypothesis?
Null Hypothesis
Alternative Hypothesis
The level of significance
Decision rule
Type I and Type II errors
Two tailed and One tailed tests
Power of Hypothesis test
Important parametric tests
t-test
Z-test
Chi-square test
F-test
ANOVA
ANCOVA
Test concerning the mean of
Normal population
Case for known variance
Case for unknown variance
t-test
Test of equality of means of two
normal populations
Hypothesis Testing
Tests Concerning MEANS
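$\sigma$ known
$H_0: \mu = \mu_0$
$Z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
$H_1$                 Critical Zone
$\mu < \mu_0$         $Z < -Z_{\alpha}$
$\mu > \mu_0$         $Z > Z_{\alpha}$
$\mu \neq \mu_0$      $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$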
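A minimal sketch of the known-variance test on a single mean, using `statistics.NormalDist` from the standard library for the critical values. The function name `z_test_mean` and the numbers in the example call are hypothetical.

```python
import math
from statistics import NormalDist

def z_test_mean(xbar, mu0, sigma, n, alpha=0.05, alternative="two-sided"):
    """Z-test for one mean with known sigma; returns (Z, reject H0?)."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "less":        # H1: mu < mu0
        reject = z < -std_normal.inv_cdf(1 - alpha)
    elif alternative == "greater":   # H1: mu > mu0
        reject = z > std_normal.inv_cdf(1 - alpha)
    else:                            # H1: mu != mu0
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    return z, reject

# Hypothetical sample: n = 25, mean 52, known sigma = 5, H0: mu = 50.
z, reject = z_test_mean(52.0, 50.0, 5.0, 25, alpha=0.05, alternative="greater")
print(z, reject)
```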
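$\sigma$ unknown
$H_0: \mu = \mu_0$
$t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$; $\nu = n - 1$
$H_1$                 Critical Zone
$\mu < \mu_0$         $t < -t_{\alpha}$
$\mu > \mu_0$         $t > t_{\alpha}$
$\mu \neq \mu_0$      $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$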
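$\sigma_1$ and $\sigma_2$ known
$H_0: \mu_1 - \mu_2 = d_0$
$Z = \dfrac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$
$H_1$                        Critical Zone
$\mu_1 - \mu_2 < d_0$        $Z < -Z_{\alpha}$
$\mu_1 - \mu_2 > d_0$        $Z > Z_{\alpha}$
$\mu_1 - \mu_2 \neq d_0$     $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$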
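$\sigma_1 = \sigma_2$ but unknown
$H_0: \mu_1 - \mu_2 = d_0$
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - d_0}{s_p \sqrt{1/n_1 + 1/n_2}}$; $\nu = n_1 + n_2 - 2$
$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
$H_1$                        Critical Zone
$\mu_1 - \mu_2 < d_0$        $t < -t_{\alpha}$
$\mu_1 - \mu_2 > d_0$        $t > t_{\alpha}$
$\mu_1 - \mu_2 \neq d_0$     $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$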
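$\sigma_1 \neq \sigma_2$ and unknown
$H_0: \mu_1 - \mu_2 = d_0$
$t' = \dfrac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$
$\nu = \dfrac{(s_1^2/n_1 + s_2^2/n_2)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$
$H_1$                        Critical Zone
$\mu_1 - \mu_2 < d_0$        $t' < -t_{\alpha}$
$\mu_1 - \mu_2 > d_0$        $t' > t_{\alpha}$
$\mu_1 - \mu_2 \neq d_0$     $t' < -t_{\alpha/2}$ or $t' > t_{\alpha/2}$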
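The statistic and the approximate degrees of freedom for the unequal-variance case can be sketched as below. The function name `welch_t` is hypothetical and the summary statistics in the call are illustrative values only; the critical value would still have to be read from a t-table at the (rounded-down) degrees of freedom.

```python
import math

def welch_t(x1, s1, n1, x2, s2, n2, d0=0.0):
    """Two-sample t' statistic and approximate df for unequal variances."""
    v1 = s1 ** 2 / n1
    v2 = s2 ** 2 / n2
    t = ((x1 - x2) - d0) / math.sqrt(v1 + v2)
    nu = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, nu

# Illustrative summary statistics: means 85 and 81, s = 4 and 5, n = 12 and 10.
t_prime, nu = welch_t(85, 4, 12, 81, 5, 10, d0=2)
print(round(t_prime, 3), round(nu, 2))
```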

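Pairwise t-test (paired observations, $\mu_D = \mu_1 - \mu_2$)
$H_0: \mu_D = d_0$
$t = \dfrac{\bar{d} - d_0}{s_d / \sqrt{n}}$; $\nu = n - 1$
$H_1$                 Critical Zone
$\mu_D < d_0$         $t < -t_{\alpha}$
$\mu_D > d_0$         $t > t_{\alpha}$
$\mu_D \neq d_0$      $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$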
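A sketch of the paired statistic, computed from the differences within each pair. The function name `paired_t` and the before/after readings are hypothetical.

```python
import math
import statistics

def paired_t(first, second, d0=0.0):
    """Paired t statistic and its degrees of freedom."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)  # sample standard deviation of differences
    t = (d_bar - d0) / (s_d / math.sqrt(n))
    return t, n - 1

before = [12, 15, 11, 16, 10]  # hypothetical paired readings
after = [10, 11, 8, 11, 9]
t_paired, df = paired_t(before, after)
print(round(t_paired, 4), df)
```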
Example
An experiment was performed to compare the abrasive wear of two
different laminated materials. Twelve pieces of material 1 were
tested by exposing each piece to a machine measuring wear. Ten
pieces of material 2 were similarly tested. In each case, the depth of
wear was observed. The samples of material 1 gave an average
wear of 85 units with a sample standard deviation of 4, while the
samples of material 2 gave an average of 81 and a sample standard
deviation of 5. Can we conclude at the 0.05 level of significance that the
abrasive wear of material 1 exceeds that of material 2 by more than
2 units?
Assume the populations to be approximately normal with equal
variances.
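Solution
$H_0: \mu_1 - \mu_2 = 2$; $H_1: \mu_1 - \mu_2 > 2$; $\alpha = 0.05$
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - d_0}{s_p \sqrt{1/n_1 + 1/n_2}}$; $\nu = n_1 + n_2 - 2$
$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
$s_p = \sqrt{\dfrac{(12 - 1)4^2 + (10 - 1)5^2}{12 + 10 - 2}} = 4.478$
$t = \dfrac{(85 - 81) - 2}{4.478 \sqrt{1/12 + 1/10}} = 1.04$; $\nu = 12 + 10 - 2 = 20$
Critical region: $t > t_{0.05,\,20} = 1.725$
$t = 1.04 < 1.725$, so $t$ is outside the critical region
Decision: Do not reject $H_0$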
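The arithmetic of this solution can be reproduced with a short sketch. The function name `pooled_t` is hypothetical; the critical value 1.725 is the tabulated value quoted in the solution, not something the code derives.

```python
import math

def pooled_t(x1, s1, n1, x2, s2, n2, d0=0.0):
    """Pooled-variance two-sample t statistic; returns (t, df, s_p)."""
    nu = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / nu)
    t = ((x1 - x2) - d0) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, nu, sp

# Material 1: n = 12, mean 85, s = 4. Material 2: n = 10, mean 81, s = 5.
t_stat, nu, sp = pooled_t(85, 4, 12, 81, 5, 10, d0=2)
T_CRITICAL = 1.725  # t-table value for alpha = 0.05 and 20 df, as quoted above
reject_h0 = t_stat > T_CRITICAL
print(round(sp, 3), round(t_stat, 2), nu, reject_h0)
```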
Hypothesis concerning
variances of Normal Population
Example Chi Square-test
A manufacturer of car batteries claims that the life of his batteries is
approximately normally distributed with a standard deviation equal
to 0.9 year. If a random sample of 10 of these batteries has a
standard deviation of 1.2 years, do you think that $\sigma > 0.9$ year? Use a
0.05 level of significance.
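The test statistic for this example is $\chi^2 = (n - 1)s^2 / \sigma_0^2$ with $n - 1$ degrees of freedom. The sketch below evaluates it; the function name is hypothetical, and the critical value 16.919 is taken from a standard chi-square table (α = 0.05, 9 degrees of freedom), which is an assumption since the slide's own working is not shown. Under these assumptions the statistic falls just short of the tabulated value, so the evidence is borderline at the 0.05 level.

```python
def chi_square_variance_stat(n, s, sigma0):
    """Chi-square statistic for testing a single variance."""
    return (n - 1) * s ** 2 / sigma0 ** 2

chi2 = chi_square_variance_stat(n=10, s=1.2, sigma0=0.9)
CHI2_CRITICAL = 16.919  # table value for alpha = 0.05, 9 df (assumed from tables)
exceeds_critical = chi2 > CHI2_CRITICAL
print(round(chi2, 2), exceeds_critical)
```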
Decision: $\sigma^2 > 0.81$
Hypothesis concerning the equality of
variances of two normal populations
Example F-test
In testing for the difference in the abrasive wear of the two materials (twelve pieces of
material 1, tested by exposing each piece to a machine measuring wear, gave an
average wear of 85 units with s1=4; ten pieces of material 2 were similarly tested and
gave an average wear of 81 units with s2=5), we assumed that the two unknown
population variances are equal. Were we justified in making this assumption? Use a
0.10 level of significance.
$F = s_1^2 / s_2^2 = 16/25 = 0.64$
Decision: Do not reject $H_0$
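A sketch of the variance-ratio calculation for this example. The two-sided critical limits used below (about 0.34 and 3.11 for 11 and 9 degrees of freedom at the 0.10 level) are assumed from standard F tables; they are not given on the slide and should be checked against the table in use.

```python
def variance_ratio(s1, s2):
    """F statistic for comparing two sample variances."""
    return s1 ** 2 / s2 ** 2

f_stat = variance_ratio(4, 5)
df1, df2 = 12 - 1, 10 - 1
F_LOWER, F_UPPER = 0.34, 3.11  # assumed table values, alpha = 0.10 two-sided
reject_equal_variances = f_stat < F_LOWER or f_stat > F_UPPER
print(f_stat, df1, df2, reject_equal_variances)
```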
How to Choose a Statistical Test?
Design of Experiments
Generalized Design of
Experiments
Goals
Isolate effects of each input variable.
Determine effects of interactions.
Determine magnitude of experimental error
Obtain maximum information for given effort
Basic idea
Expand 1-factor ANOVA to m factors
Terminology
Response variable
Measured output value
E.g. total execution time
Factors
Input variables that can be changed
E.g. cache size, clock rate, bytes transmitted
Levels
Specific values of factors (inputs)
Continuous (~bytes) or discrete (type of system)
Terminology
Replication
Completely re-run experiment with same input
levels
Used to determine impact of measurement
error
Interaction
Effect of one input factor depends on level of
another input factor
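One Way ANOVA (Completely Randomised Design)
Assumptions & Hypothesis
$H_0: \mu_1 = \mu_2 = \dots = \mu_k$
$H_1$: at least two of the means are not equal
$Y_{ij} = \mu_i + \varepsilon_{ij} = \mu + \alpha_i + \varepsilon_{ij}$
$H_0: \alpha_1 = \alpha_2 = \dots = \alpha_k = 0$
$H_1$: at least one of the $\alpha_i$'s is not equal to zero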
k Random Samples
MODEL (One Way ANOVA: $y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$)
Treatment:  1              2              ...   i              ...   k
            Y11            Y21            ...   Yi1            ...   Yk1
            Y12            Y22            ...   Yi2            ...   Yk2
            :              :                    :                    :
            Y1n            Y2n            ...   Yin            ...   Ykn
Total:      Y1.            Y2.            ...   Yi.            ...   Yk.            Y..
Mean:       $\bar{y}_{1.}$ $\bar{y}_{2.}$ ...   $\bar{y}_{i.}$ ...   $\bar{y}_{k.}$ $\bar{y}_{..}$
Total variability
Sum of Squares
SST = SSA + SSE
(Total sum of squares) = (Treatment sum of squares) + (Error sum of squares)
Degrees of Freedom
(nk-1) = (k-1) + k(n-1)
ANOVA Table: One Way ANOVA
Source of Variation   Sum of Squares   Degrees of freedom   Mean Squares          F-value   p-value
Treatments            SSA              k-1                  MSA = SSA/(k-1)       MSA/MSE   compared with $\alpha$, the level of significance
Error                 SSE              k(n-1)               MSE = SSE/(k(n-1))
Total                 SST              nk-1                 SST/(nk-1)
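Conclusions: One Way ANOVA
The null hypothesis $H_0$ is rejected at the $\alpha$ level of significance when
$F_{calculated} > F_{\alpha}[\nu_1 = (k-1),\ \nu_2 = k(n-1)]$
Another approach (p-value)
p-value = the $\alpha$ at which $F_{\alpha}[\nu_1 = (k-1),\ \nu_2 = k(n-1)]$ equals $F_{calculated}$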
Example- One Way ANOVA
Suppose in an industrial experiment that an engineer is interested in
how the mean absorption of moisture in concrete varies among 5
different concrete aggregates. The samples are exposed to moisture
for 48 hours. It is decided that 6 samples are to be tested for each
aggregate, requiring a total of 30 samples to be tested.
We are interested in making comparisons among the 5 populations.
The data are recorded as follows:
Aggregate: 1 2 3 4 5
551 595 639 417 563
457 580 615 449 631
450 508 511 517 522
731 583 573 438 613
499 633 648 415 656
632 517 677 555 679
Results
Source of Variation   Sum of Squares   Degrees of freedom   Mean Squares   F-value   p-value
Treatments            85356.47         4                    21339.12       4.30      0.0088 (i.e. < 0.05)
Error                 124020.33        25                   4960.81
Total                 209376.80        29
Decision: Reject $H_0$ and conclude that the aggregates do not have
the same mean absorption
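The sums of squares and the F-value in this table can be reproduced from the recorded data. The sketch below follows the SST = SSA + SSE decomposition; the function name `one_way_anova` is hypothetical, and the p-value is not computed because that would need an F distribution table or an external library.

```python
def one_way_anova(groups):
    """One-way ANOVA sums of squares, mean squares and F-value."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ssa = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    msa = ssa / (k - 1)
    mse = sse / (n_total - k)
    return {"SSA": ssa, "SSE": sse, "SST": ssa + sse,
            "MSA": msa, "MSE": mse, "F": msa / mse}

# Moisture absorption data, one list per aggregate (columns of the table above).
aggregates = [
    [551, 457, 450, 731, 499, 632],
    [595, 580, 508, 583, 633, 517],
    [639, 615, 511, 573, 648, 677],
    [417, 449, 517, 438, 415, 555],
    [563, 631, 522, 613, 656, 679],
]
result = one_way_anova(aggregates)
print({name: round(value, 2) for name, value in result.items()})
```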
Randomised Complete Block Design (RCBD)
Block 1: t2, t1, t3
Block 2: t1, t3, t2
Block 3: t3, t2, t1
Block 4: t2, t1, t3
A typical layout for the randomised complete block design using 3
measurements in 4 blocks
Treatments   Blocks: 1     2     3     4
1                    Y11   Y12   Y13   Y14
2                    Y21   Y22   Y23   Y24
3                    Y31   Y32   Y33   Y34
k x b Array for the RCB Design
Treatments   Blocks: 1     2     ...   j     ...   b     Total   Mean
1                    Y11   Y12   ...   Y1j   ...   Y1b   Y1.     $\bar{y}_{1.}$
2                    Y21   Y22   ...   Y2j   ...   Y2b   Y2.     $\bar{y}_{2.}$
:                    :     :           :           :     :       :
i                    Yi1   Yi2   ...   Yij   ...   Yib   Yi.     $\bar{y}_{i.}$
:                    :     :           :           :     :       :
k                    Yk1   Yk2   ...   Ykj   ...   Ykb   Yk.     $\bar{y}_{k.}$
Total                Y.1   Y.2   ...   Y.j   ...   Y.b   Y..
Mean                 $\bar{y}_{.1}$  $\bar{y}_{.2}$  ...  $\bar{y}_{.j}$  ...  $\bar{y}_{.b}$          $\bar{y}_{..}$
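Model for RCB Design
$y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$
Hypothesis
$H_0: \alpha_1 = \alpha_2 = \dots = \alpha_k = 0$
$H_1$: at least one of the $\alpha_i$'s is not equal to zero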
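Sum of Squares
SST = SSA + SSB + SSE
(Total sum of squares) = (Treatment sum of squares) + (Block sum of squares) + (Error sum of squares)
Degrees of Freedom
(bk-1) = (k-1) + (b-1) + (k-1)(b-1)
$\sum_{i=1}^{k}\sum_{j=1}^{b}(y_{ij} - \bar{y}_{..})^2 = b\sum_{i=1}^{k}(\bar{y}_{i.} - \bar{y}_{..})^2 + k\sum_{j=1}^{b}(\bar{y}_{.j} - \bar{y}_{..})^2 + \sum_{i=1}^{k}\sum_{j=1}^{b}(y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..})^2$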
ANOVA Table RCB Design
Source of Variation   Sum of Squares   Degrees of freedom   Mean Squares               F-value   p-value
Treatments            SSA              k-1                  MSA = SSA/(k-1)            MSA/MSE   compared with $\alpha$, the level of significance
Blocks                SSB              b-1                  MSB = SSB/(b-1)            MSB/MSE
Error                 SSE              (k-1)(b-1)           MSE = SSE/((k-1)(b-1))
Total                 SST              kb-1
Conclusions
The null hypothesis $H_0$ is rejected at the $\alpha$ level of significance when
$F_{calculated} > F_{\alpha}[\nu_1 = (k-1),\ \nu_2 = (k-1)(b-1)]$
Another approach (p-value)
p-value = the $\alpha$ at which $F_{\alpha}[\nu_1 = (k-1),\ \nu_2 = (k-1)(b-1)]$ equals $F_{calculated}$
Example-RCB Design
Four different machines, M1, M2, M3, and M4 are being considered for the
assembling of a particular product. It is decided that 6 different operators
are to be used in a randomised block experiment to compare the machines.
The machines are assigned in a random order to each operator. The
operation of the machines requires physical dexterity, and it is anticipated
that there will be a difference among the operators in the speed with which
they operate the machine. The times (in seconds) required to assemble
the product were recorded:
Test the hypothesis H0, at the 0.05 level of significance, that the machines
perform at the same mean rate of speed.
Machine Operator: 1 2 3 4 5 6
1 42.5 39.3 39.6 39.9 42.9 43.6
2 39.8 40.1 40.5 42.3 42.5 43.1
3 40.2 40.5 41.3 43.4 44.9 45.1
4 41.3 42.2 43.5 44.2 45.9 42.3
Results: RCB Design (Example)
Source of Variation   Sum of Squares   Degrees of freedom   Mean Squares   F-value   p-value
Machines              15.93            3                    5.31           3.34      compared with $\alpha$, the level of significance
Operators             42.09            5                    8.42           5.29
Error                 23.84            15                   1.59
Total                 81.86            23
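The table entries can be recomputed from the recorded times using the SST = SSA + SSB + SSE decomposition. This is a sketch with a hypothetical function name `rcbd_anova`; small differences in the last digit relative to the table (for example 15.92 versus 15.93) come from rounding at intermediate steps.

```python
def rcbd_anova(table):
    """RCBD analysis; table[i][j] is the response for treatment i in block j."""
    k, b = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (k * b)
    treat_means = [sum(row) / b for row in table]
    block_means = [sum(table[i][j] for i in range(k)) / k for j in range(b)]
    sst = sum((y - grand) ** 2 for row in table for y in row)
    ssa = b * sum((m - grand) ** 2 for m in treat_means)
    ssb = k * sum((m - grand) ** 2 for m in block_means)
    sse = sst - ssa - ssb
    mse = sse / ((k - 1) * (b - 1))
    return {"SSA": ssa, "SSB": ssb, "SSE": sse, "SST": sst,
            "F_treatments": (ssa / (k - 1)) / mse,
            "F_blocks": (ssb / (b - 1)) / mse}

# Rows are machines 1-4, columns are operators 1-6 (times in seconds).
times = [
    [42.5, 39.3, 39.6, 39.9, 42.9, 43.6],
    [39.8, 40.1, 40.5, 42.3, 42.5, 43.1],
    [40.2, 40.5, 41.3, 43.4, 44.9, 45.1],
    [41.3, 42.2, 43.5, 44.2, 45.9, 42.3],
]
rcbd = rcbd_anova(times)
print({name: round(value, 2) for name, value in rcbd.items()})
```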
Interaction between Blocks &
Treatments
Latin Square Design (LSD)
The randomised block design is very effective for
reducing experimental error by removing one source of
variation
Another design useful in controlling two sources of
variation, while reducing the required number of
treatment combinations is called the LATIN SQUARE
Row Column: 1 2 3 4
1 A B C D
2 B C D A
3 C D A B
4 D A B C
A, B, C, & D represent Treatments
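Model:
$y_{ijk} = \mu + \alpha_i + \beta_j + \tau_k + \varepsilon_{ijk}$
Hypothesis:
$H_0: \tau_1 = \tau_2 = \dots = \tau_r = 0$
$H_1$: at least one of the $\tau_k$'s is not equal to zero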
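Sum of Squares
SST = SSR + SSC + SSTr + SSE
Degrees of Freedom
$(r^2 - 1) = (r-1) + (r-1) + (r-1) + (r-1)(r-2)$
$\sum_{i}\sum_{j}\sum_{k}(y_{ijk} - \bar{y}_{...})^2 = r\sum_{i}(\bar{y}_{i..} - \bar{y}_{...})^2 + r\sum_{j}(\bar{y}_{.j.} - \bar{y}_{...})^2 + r\sum_{k}(\bar{y}_{..k} - \bar{y}_{...})^2 + \sum_{i}\sum_{j}\sum_{k}(y_{ijk} - \bar{y}_{i..} - \bar{y}_{.j.} - \bar{y}_{..k} + 2\bar{y}_{...})^2$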
ANOVA Table LATIN SQUARE Design
Source of Variation   Sum of Squares   Degrees of freedom   Mean Squares                F-value        p-value
ROW                   SSR              (r-1)                MSR = SSR/(r-1)
COLUMN                SSC              (r-1)                MSC = SSC/(r-1)
TREATMENTS            SSTr             (r-1)                MSTr = SSTr/(r-1)           F = MSTr/MSE   compared with $\alpha$, the level of significance
Error                 SSE              (r-1)(r-2)           MSE = SSE/((r-1)(r-2))
Total                 SST              ($r^2$-1)
Example Latin Square Design
To illustrate the analysis of a Latin Square design let us return to the
experiment where the letters A, B, C, & D represent 4 varieties of
wheat; the rows represent 4 different fertilizers; and the columns
account for 4 different years. The data in the table are the yields for
the 4 varieties of wheat, measured in kg per plot. It is assumed that
the various sources of variation do not interact. Using 0.05 level of
significance, test the hypothesis H0: there is no difference in the
average yields of the 4 varieties of wheat.
Fertilizer Treatment   1981   1982   1983   1984
t1                     A 70   B 75   C 68   D 81
t2                     D 66   A 59   B 55   C 63
t3                     C 59   D 66   A 39   B 42
t4                     B 41   C 57   D 39   A 55
Results: Example Latin Square Design
Source of Variation   Sum of Squares   Degrees of freedom   Mean Squares   F-value   p-value
Fertilizer            1557             3                    519.00
Year                  418              3                    139.33
TREATMENTS            264              3                    88.00          2.02      compared with $\alpha$, the level of significance
Error                 261              6                    43.50
Total                 2500             15
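The Latin square results can be reproduced from the yield table with the SST = SSR + SSC + SSTr + SSE decomposition. The sketch below uses a hypothetical function name `latin_square_anova`; the table rounds the sums of squares to whole numbers, so the computed values differ slightly in the decimals.

```python
def latin_square_anova(layout, yields):
    """Latin square analysis; layout[i][j] is the treatment letter in row i, column j."""
    r = len(yields)
    grand = sum(sum(row) for row in yields) / r ** 2
    row_means = [sum(row) / r for row in yields]
    col_means = [sum(yields[i][j] for i in range(r)) / r for j in range(r)]
    treat_totals = {}
    for i in range(r):
        for j in range(r):
            letter = layout[i][j]
            treat_totals[letter] = treat_totals.get(letter, 0) + yields[i][j]
    sst = sum((y - grand) ** 2 for row in yields for y in row)
    ssr = r * sum((m - grand) ** 2 for m in row_means)
    ssc = r * sum((m - grand) ** 2 for m in col_means)
    sstr = r * sum((t / r - grand) ** 2 for t in treat_totals.values())
    sse = sst - ssr - ssc - sstr
    mse = sse / ((r - 1) * (r - 2))
    return {"SSR": ssr, "SSC": ssc, "SSTr": sstr, "SSE": sse, "SST": sst,
            "F": (sstr / (r - 1)) / mse}

# Rows are fertilizer treatments t1-t4, columns are years 1981-1984.
layout = ["ABCD", "DABC", "CDAB", "BCDA"]
yields = [
    [70, 75, 68, 81],
    [66, 59, 55, 63],
    [59, 66, 39, 42],
    [41, 57, 39, 55],
]
latin = latin_square_anova(layout, yields)
print({name: round(value, 2) for name, value in latin.items()})
```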
Two-factor Experiments
Two factors (inputs)
A, B
Separate total variation in output values
into:
Effect due to A
Effect due to B
Effect due to interaction of A and B (AB)
Experimental error
Example ??????????????
B (???)
A(??) 1 2 3
1
2
3
4
Two-factor ANOVA
Factor A: a input levels
Factor B: b input levels
n measurements for each input combination
abn total measurements
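Two Factors, n Replications
[Layout: levels of Factor A (1, 2, ..., j, ..., a) across the columns and levels of Factor B (1, 2, ..., i, ..., b) down the rows; each cell holds n replications $y_{ijk}$]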
Two-factor ANOVA
Each individual measurement is a composition of
Overall mean
Effects
Interactions
Measurement errors
$y_{ijk} = \bar{y}_{...} + \alpha_i + \beta_j + \gamma_{ij} + e_{ijk}$
$\bar{y}_{...}$ = overall mean
$\alpha_i$ = effect due to A
$\beta_j$ = effect due to B
$\gamma_{ij}$ = effect due to interaction of A and B
$e_{ijk}$ = measurement error
Sum-of-Squares
As before, use sum-of-squares identity
SST = SSA + SSB + SSAB + SSE
Degrees of freedom
df(SSA) = a - 1
df(SSB) = b - 1
df(SSAB) = (a - 1)(b - 1)
df(SSE) = ab(n - 1)
df(SST) = abn - 1
Two-Factor ANOVA
                 A                              B                              AB                                    Error
Sum of squares   SSA                            SSB                            SSAB                                  SSE
Deg freedom      a - 1                          b - 1                          (a - 1)(b - 1)                        ab(n - 1)
Mean square      $s_a^2 = SSA/(a-1)$            $s_b^2 = SSB/(b-1)$            $s_{ab}^2 = SSAB/[(a-1)(b-1)]$        $s_e^2 = SSE/[ab(n-1)]$
Computed F       $F_a = s_a^2/s_e^2$            $F_b = s_b^2/s_e^2$            $F_{ab} = s_{ab}^2/s_e^2$
Tabulated F      $F_{[1-\alpha;\,(a-1),\,ab(n-1)]}$   $F_{[1-\alpha;\,(b-1),\,ab(n-1)]}$   $F_{[1-\alpha;\,(a-1)(b-1),\,ab(n-1)]}$
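The two-factor decomposition SST = SSA + SSB + SSAB + SSE can be sketched as follows. The function name `two_factor_anova` and the small data set are hypothetical; the code assumes a balanced design with at least two replications per cell, since the error mean square divides by ab(n - 1).

```python
def two_factor_anova(data):
    """data[i][j] lists the n replications for level i of A and level j of B."""
    a, b, n = len(data), len(data[0]), len(data[0][0])
    cell = [[sum(data[i][j]) / n for j in range(b)] for i in range(a)]
    mean_a = [sum(cell[i]) / b for i in range(a)]
    mean_b = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]
    grand = sum(mean_a) / a
    ssa = b * n * sum((m - grand) ** 2 for m in mean_a)
    ssb = a * n * sum((m - grand) ** 2 for m in mean_b)
    ssab = n * sum((cell[i][j] - mean_a[i] - mean_b[j] + grand) ** 2
                   for i in range(a) for j in range(b))
    sse = sum((y - cell[i][j]) ** 2
              for i in range(a) for j in range(b) for y in data[i][j])
    s_e2 = sse / (a * b * (n - 1))  # error mean square
    return {"SSA": ssa, "SSB": ssb, "SSAB": ssab, "SSE": sse,
            "F_a": (ssa / (a - 1)) / s_e2,
            "F_b": (ssb / (b - 1)) / s_e2,
            "F_ab": (ssab / ((a - 1) * (b - 1))) / s_e2}

# Hypothetical 2 x 2 experiment with n = 2 replications per cell.
data = [
    [[1, 3], [5, 7]],
    [[4, 6], [12, 14]],
]
two_way = two_factor_anova(data)
print(two_way)
```

Each computed F would then be compared with the tabulated F for its own numerator degrees of freedom and ab(n - 1) denominator degrees of freedom.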
Need for Replications
If n=1
Only one measurement of each configuration
Can then be shown that
SSAB = SST - SSA - SSB
Since
SSE = SST - SSA - SSB - SSAB
We have
SSE = 0
Generalized m-factor Experiments
m factors give:
m main effects
$\binom{m}{2}$ two-factor interactions
$\binom{m}{3}$ three-factor interactions
...
$\binom{m}{m} = 1$ m-factor interactions
$2^m - 1$ total effects
Effects for 3 factors:
A
B
C
AB
AC
BC
ABC
Degrees of Freedom for m-factor Experiments
df(SSA) = (a-1)
df(SSB) = (b-1)
df(SSC) = (c-1)
df(SSAB) = (a-1)(b-1)
df(SSAC) = (a-1)(c-1)
...
df(SSE) = abc(n-1)
df(SST) = abcn-1
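The list of effects and their degrees of freedom can be generated mechanically: every non-empty subset of the factors is one effect, and its degrees of freedom are the product of (levels - 1) over the factors involved. A minimal sketch, with a hypothetical function name and hypothetical level counts:

```python
from itertools import combinations
from math import prod

def effects_with_df(levels):
    """Map each main effect and interaction to its degrees of freedom."""
    names = sorted(levels)
    result = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            result["".join(combo)] = prod(levels[f] - 1 for f in combo)
    return result

levels = {"A": 2, "B": 3, "C": 4}  # hypothetical numbers of levels a, b, c
n = 2                              # hypothetical number of replications
effects = effects_with_df(levels)
df_error = prod(levels.values()) * (n - 1)   # abc(n - 1)
df_total = prod(levels.values()) * n - 1     # abcn - 1
print(effects, df_error, df_total)
```

For m = 3 this yields the 2^3 - 1 = 7 effects A, B, C, AB, AC, BC and ABC, and the effect and error degrees of freedom add up to the total.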
Procedure for Generalized m-factor Experiments
1. Calculate ($2^m - 1$) sum of squares terms (SSx) and SSE
2. Determine degrees of freedom for each SSx
3. Calculate mean squares (variances)
4. Calculate F statistics
5. Find critical F values from table
6. If F(computed) > F(table), (1 - $\alpha$) confidence that the effect is statistically significant
Thank you