
Statistical Treatment of

phenomenal Data

Dr. Manisha Jain
Deptt. of Applied Mathematics
AMITY UNIVERSITY
GWALIOR, M. P.
What is Research?
What is Scientific Research?
A systematic and objective attempt to
provide answers to certain questions
Develop an organized body of knowledge
The search for knowledge
Develop new theories

Scientific Research Process
Observations and Formation of the topic
Hypothesis
Conceptual definitions
Operational definition
Gathering of data
Analysis of data
Testing and revising of the hypothesis
Conclusion, iteration if necessary

Research Method
Exploratory research: which structures and identifies
new problems
Constructive research: which develops solutions to a
problem
Empirical research: which tests the feasibility of a
solution using empirical evidence
Qualitative research: understanding of human behavior
and the reasons that govern such behavior
Quantitative research: systematic empirical investigation
of quantitative properties and phenomena and their
relationships



Introduction
Most research deals with
problems having more than one
variable
Our interest is to find the relationship
between variables
Every research study needs quantitative and
qualitative analysis to draw its
results
Qualitative research

1. Researchers aim to gather an in-depth
understanding of human behavior and
the reasons that govern such behavior.
2. The qualitative method investigates
the why and how of decision making, not
just the what and where





Quantitative research

1. Refers to the systematic empirical investigation of
phenomena via statistical, mathematical or
computational techniques. The objective of
quantitative research is to develop and employ
mathematical models, theories and/or hypotheses
pertaining to phenomena.

2. Measurement provides the fundamental connection between
empirical observation and mathematical expression of
quantitative relationships.



Data Analysis
[Flow diagram. Stages shown: Collection of Data (Surveys, Questionnaires, Experiments); Organize; Analysis of Data (appropriate Statistical Techniques); Model Construction; Reliability of the Model; Sensitivity Analysis; Model]
Is the Problem Significant?
Is the problem a new one?
Can the problem be solved by the process
of research?
Does the problem have theoretical value?
Is the problem workable?
Are pertinent data available?
What is Statistics?
Statistics is the study of the
1. collection,
2. organization,
3. interpretation of data
Types of Statistics
Descriptive statistics: The branch of statistics
which describes or summarizes information about a
population or sample
Examples:

The number of employees with MBA degrees in
an organization
The number of students who failed to qualify
their final examination at CIIT
Inferential statistics: The branch of statistics
which is used to make inferences or judgments
about a population on the basis of a sample

Examples:
The demand for a new product, say a long-lasting
perfume (X), based on a sample survey conducted in
Region Y
The general election result based on a
representative survey of voters in electoral
district Z

Characteristics of Data
1. Measures of Central Tendency (MCT)
(Average)
2. Measures of Variability (Measures of
Deviation)
3. Measures of Skewness (Degree of asymmetry of the
frequency distribution)
4. Measures of Kurtosis (Peakedness or
flatness of the frequency curves)
Statistical Data Analysis
(Frequency Distribution (absolute distribution))
A frequency distribution (or frequency table) is a set of
data which records the number of times a particular
value of a variable, or range of values of a variable,
occurs

Amount Deposited (Rs.)    Frequency
Less than 50,000          6,700
50,000 - 100,000          1,240
Above 100,000             375
Total                     8,315
Statistical Data Analysis
(Frequency Distribution (relative distribution))

Relative frequency (per cent):
Amount Deposited (Rs.)    Frequency
Less than 50,000          80%
50,000 - 100,000          15%
Above 100,000             5%
Total                     100%

Probability:
Amount Deposited (Rs.)    Frequency
Less than 50,000          0.80
50,000 - 100,000          0.15
Above 100,000             0.05
Total                     1
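As a rough illustration of how a relative distribution follows from an absolute one, the sketch below divides each count from the deposit table by the total. The dictionary keys and variable names are chosen here for illustration only. The exact shares come out at roughly 80.6%, 14.9% and 4.5%, so the 80% / 15% / 5% figures on the slide appear to be rounded.

```python
# Absolute frequencies taken from the deposit table in this deck.
counts = {
    "Less than 50,000": 6700,
    "50,000 - 100,000": 1240,
    "Above 100,000": 375,
}

total = sum(counts.values())

# Relative frequency = class frequency / total frequency.
relative = {band: n / total for band, n in counts.items()}

for band, share in relative.items():
    print(f"{band:<20} {share:.3f}  ({share:.1%})")
print(f"{'Total':<20} {sum(relative.values()):.3f}")
```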

Measures of Central Tendency
Purpose: to determine the average value in a set
of values
There are three measures of central
tendency:
(Arithmetic) Mean
Median
Mode
Measures of Central Tendency
(Arithmetic Mean)
The arithmetic mean is the average of all the values
under consideration


Branch    Revenue
1         50,000,000
2         150,000,000
3         40,000,000
4         60,000,000
Total     300,000,000

Arithmetic Mean = 300,000,000 / 4 = 75,000,000
Measures of Central Tendency
(Median)
The Median is the midpoint of the distribution of values
under consideration

Salesperson    Number of Sales Calls
1              4
2              3
3              2
4              5
5              3
6              3
7              1
8              5

Sorted values: 1 2 3 3 3 4 5 5
Median = 3 (with eight values, the average of the two middle values, 3 and 3)
Measures of Central Tendency
(Mode)
The Mode is the value that occurs most frequently in
the distribution of values under consideration

Salesperson    Number of Sales Calls
1              4
2              3
3              2
4              5
5              3
6              3
7              1
8              5

Mode = 3
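The three measures can be checked with Python's standard `statistics` module. This is only a minimal sketch using the branch revenue and sales call figures from the slides; the variable names are illustrative.

```python
from statistics import mean, median, mode

# Figures from the slides on mean, median and mode.
branch_revenue = [50_000_000, 150_000_000, 40_000_000, 60_000_000]
sales_calls = [4, 3, 2, 5, 3, 3, 1, 5]

revenue_mean = mean(branch_revenue)   # sum of values / number of values
calls_median = median(sales_calls)    # midpoint of the sorted values
calls_mode = mode(sales_calls)        # most frequent value

print("Arithmetic mean of revenue:", revenue_mean)
print("Median number of sales calls:", calls_median)
print("Mode of sales calls:", calls_mode)
```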
Measures of Variability
The Mean Deviation
The Variance


The Standard Deviation
$S.D. = \sqrt{\mathrm{Var}}$
Significance of Variance
Suppose a medical supplies company produces disposable
syringes. Each is wrapped in a sterile package and then jumble
packed in a large carton. The company needs an estimate of the
number of syringes per carton for billing purposes, so we
take a sample of 35 cartons at random and record the number of
syringes in each carton

101 110 97 103 102 93 97
105 102 94 100 98 97 110
97 107 103 100 93 99 98
93 106 105 98 110 100 106
114 100 112 97 112 99 112

$S^2 = \mathrm{Var} = \frac{\sum x^2 - n\,\bar{x}^2}{n - 1} = \frac{365{,}368 - 35\,(102)^2}{34} = 36.12$

$S.D. = \sqrt{\mathrm{Var}} = \sqrt{36.12} = 6.01$ syringes
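The figures in the syringe example can be reproduced with a short script. The sketch below applies the shortcut formula (sum of squares minus n times the squared mean, divided by n - 1) to the 35 carton counts and compares it with `statistics.variance`; the variable names are assumptions made for illustration.

```python
from statistics import mean, stdev, variance

# Number of syringes in each of the 35 sampled cartons (from the slide).
syringes = [
    101, 110, 97, 103, 102, 93, 97,
    105, 102, 94, 100, 98, 97, 110,
    97, 107, 103, 100, 93, 99, 98,
    93, 106, 105, 98, 110, 100, 106,
    114, 100, 112, 97, 112, 99, 112,
]

n = len(syringes)
x_bar = mean(syringes)
sum_of_squares = sum(x * x for x in syringes)

# Shortcut form of the sample variance used on the slide.
var_shortcut = (sum_of_squares - n * x_bar ** 2) / (n - 1)

print("n =", n, " mean =", x_bar, " sum of x^2 =", sum_of_squares)
print("Sample variance:", round(var_shortcut, 2))
print("Standard deviation:", round(stdev(syringes), 2), "syringes")
```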

STATISTICAL TESTS

RELATIONSHIP BETWEEN VARIABLES
Correlation
Regression
TESTING HYPOTHESES STATISTICALLY
t-test
F-test
Chi-Square
Z- test
COMPARING MORE THAN TWO GROUPS
One way ANOVA
Two way ANOVA
Factorial ANOVA
Repeated Measures and ANOVA



Relation between variables
Correlation (r): a measure of the strength of the linear relationship between values
Regression Analysis
[Diagram labels: Total & Partial; Simple; Multiple; Linear & Non Linear]
STATISTICAL TESTS

RELATIONSHIP BETWEEN VARIABLES
Regression analysis: widely used for prediction and
forecasting
Correlation: how the variables are related
Pearson Correlation Coefficient

$r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}, \qquad -1 \le r \le 1$

$S_{xx} = \sum (x - \bar{x})^2$
$S_{xy} = \sum (x - \bar{x})(y - \bar{y})$
$S_{yy} = \sum (y - \bar{y})^2$

Correlation
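To make the Pearson formula concrete, here is a minimal sketch that computes S_xx, S_xy and S_yy directly and combines them into r. The function name `pearson_r` is hypothetical, and the five data pairs are the advertising and sales figures from the regression example later in this deck, reused here purely for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / math.sqrt(s_xx * s_yy)

advertising = [1, 2, 3, 4, 5]
sales = [1, 1, 2, 2, 4]

r = pearson_r(advertising, sales)
print(f"r = {r:.3f}")  # r always lies between -1 and 1
```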
Simple Linear Regression Model
The simple linear regression model is:

$y = \beta_0 + \beta_1 x + \varepsilon$

where:
$\beta_0$ and $\beta_1$ are called parameters of the model,
$\varepsilon$ is a random variable called the error term.

$\beta_1 > 0$: Positive Association
$\beta_1 < 0$: Negative Association
$\beta_1 = 0$: No Association
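Simple Linear Regression Equation
[Figure: Positive Linear Relationship. Plot of E(y) against x; the regression line has intercept $\beta_0$ and a positive slope $\beta_1$.]
[Figure: Negative Linear Relationship. Plot of E(y) against x; the regression line has intercept $\beta_0$ and a negative slope $\beta_1$.]
[Figure: No Relationship. Plot of E(y) against x; the regression line has intercept $\beta_0$ and slope $\beta_1 = 0$.]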
Non Linear Regression
Estimated Simple Linear Regression Equation
The estimated simple linear regression equation is

$\hat{y} = b_0 + b_1 x$

$\hat{y}$ is the estimated value of y for a given x value.
$b_1$ is the slope of the line.
$b_0$ is the y intercept of the line.
The graph is called the estimated regression line.
Estimation Process
1. Regression Model: $y = \beta_0 + \beta_1 x + \varepsilon$
   Regression Equation: $E(y) = \beta_0 + \beta_1 x$
   Unknown Parameters: $\beta_0$, $\beta_1$
2. Sample Data: pairs $(x_1, y_1), \ldots, (x_n, y_n)$
3. Estimated Regression Equation: $\hat{y} = b_0 + b_1 x$
   Sample Statistics: $b_0$, $b_1$
4. $b_0$ and $b_1$ provide estimates of $\beta_0$ and $\beta_1$
Least Squares Method
Least Squares Criterion

$\min \sum (y_i - \hat{y}_i)^2$

where:
$y_i$ = observed value of the dependent variable
for the ith observation
$\hat{y}_i$ = estimated value of the dependent variable
for the ith observation
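Least Squares Graphically
[Figure: scatter plot of Y against X showing the fitted line $\hat{Y}_i = b_0 + b_1 X_i$ (regression equation) and the residuals $e_1, e_2, e_3, e_4$; for example $Y_2 = b_0 + b_1 X_2 + e_2$ (regression model).]
LS minimizes $\sum_{i=1}^{n} e_i^2 = e_1^2 + e_2^2 + e_3^2 + e_4^2$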
Least Squares Method
Slope for the Estimated Regression
Equation

$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$

where:
$x_i$ = value of the independent variable for the ith observation
$y_i$ = value of the dependent variable for the ith observation
Regression Analysis : Least Squares Method
y-Intercept for the Estimated Regression Equation

$b_0 = \bar{y} - b_1 \bar{x}$

where:
$\bar{y}$ = mean value for dependent variable
$\bar{x}$ = mean value for independent variable
You're a marketing analyst for Toys. We
gather the following data:

Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4

What is the relationship between sales and
advertising?
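One way to answer the question is to apply the least squares formulas for the slope and intercept directly to the five observations. The sketch below does this in plain Python; the function name `least_squares_line` is an assumption made for illustration. For this data it yields a slope of 0.70 and an intercept of about -0.10, which agrees with the Excel solution quoted later in the deck.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) for the estimated line y_hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx          # slope
    b0 = y_bar - b1 * x_bar   # y-intercept
    return b0, b1

ad_spend = [1, 2, 3, 4, 5]   # Ad $
sales = [1, 1, 2, 2, 4]      # Sales (Units)

b0, b1 = least_squares_line(ad_spend, sales)
print(f"y_hat = {b0:.2f} + {b1:.2f} x")

# Sum of squared residuals, the quantity that least squares minimizes.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(ad_spend, sales))
print(f"Sum of squared residuals = {sse:.2f}")
```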
How to run Regression in Microsoft Excel
1. Right click on the File Button on the top left corner and
choose Customize Quick Access Toolbar
2. Choose Add-Ins from the left menu bar and choose Excel
Add-ins from the right hand side
3. Make sure that you tick Analysis ToolPak; you can also add
other add-ins to your Excel 2007.
4. Once you press OK, you can see Data Analysis under the
Data tab from the Ribbon.
5. Now let's look at an example of how to use Regression.
Consider the following data set.
Run a regression to test whether there is a relationship between
Stock A and Market Index or not. First, select Data Analysis
from the Data tab.
The result will be shown as below:
Using Excel's Regression Tool
First enter the data into an Excel worksheet, and then:
Step 1 Select the Tools menu
Step 2 Choose the Data Analysis option
Step 3 Choose Regression from the list of
Analysis Tools
[Figure: Excel Regression Dialog Box]
Excel Solution
[Figure: Excel output showing the Data, the Regression Statistics Output, the ANOVA Output and the Estimated Regression Equation Output]
$\hat{y} = -0.10 + 0.70 X$
Assume the value of an automobile
decreases by a constant amount each year
after its purchase, and for each mile it is
driven.
value = price + dep*age + depmiles*miles
Form the Equation
The value of a used airplane decreases for
each year of its age
Assuming the value of a plane falls by the
same amount each year
value = p0 + p1*Age
But it is a well-known fact that planes (and
automobiles) lose more value the first
year than the second, and more the
second than the third, etc.
This means that a linear (straight-line)
function cannot accurately model this
situation. A better, nonlinear, function is:
value = p0 + p1*exp(-p2*Age)
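The effect of the exponential term can be seen by evaluating the nonlinear function year by year. In the sketch below the parameter values for p0, p1 and p2 are purely hypothetical, chosen only to show the shape of the curve; they are not estimates from any data.

```python
import math

def plane_value(age, p0, p1, p2):
    """Nonlinear depreciation model: value = p0 + p1 * exp(-p2 * age)."""
    return p0 + p1 * math.exp(-p2 * age)

# Hypothetical parameters, for illustration only.
p0, p1, p2 = 20_000.0, 80_000.0, 0.3

values = [plane_value(age, p0, p1, p2) for age in range(5)]

# Value lost in each successive year: this shrinks as the plane ages,
# which a straight-line model cannot reproduce.
yearly_losses = [values[i] - values[i + 1] for i in range(len(values) - 1)]

for age, value in enumerate(values):
    print(f"Age {age}: value = {value:,.0f}")
print("Loss per year:", [round(loss) for loss in yearly_losses])
```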
TESTING HYPOTHESES STATISTICALLY

t-test

F-test

Chi-Square

Z- test
Vocabulary
used in Hypothesis Analysis
Null hypothesis ($H_0$)
Alternative hypothesis ($H_1$)
Alpha ($\alpha$) - the significance level, i.e. the probability of a type I error
Test statistic
P-value
Type I error - a rejection of a true null hypothesis
Type II error - a retention of an incorrect null hypothesis
Confidence ($1 - \alpha$) - the complement of alpha
Beta ($\beta$) - the probability of a type II error
Power ($1 - \beta$) - the complement of beta
Hypothesis
A hypothesis is a testable statement of the
relationship between the variables. It should be
1. Conceptually clear
2. Testable
3. General in scope
4. In accord with other hypotheses

Types of Hypothesis
The Null Hypothesis ($H_0$):
The "no difference" hypothesis; it is assumed for the
purpose of rejection (no difference between two values)
The Alternate Hypothesis ($H_1$):
The operational or testable statement



Hypothesis Testing Process
[Figure: Population: assume the population mean age is 50 (null hypothesis). Sample: the sample mean is 20. Is $\bar{X} = 20 \approx \mu = 50$? No, not likely! REJECT the null hypothesis.]
Reason for Rejecting $H_0$
Reject $H_0$ when the difference found is statistically
significant
Non-rejection (or acceptance) of $H_0$ implies
that the observed difference can be attributed to chance
Steps Involved in Hypothesis
Testing
Step A: Null and alternative hypotheses
$H_0$ is the claim of no difference; $H_1$ is the claim of a difference in the population.

Step B: Test statistic (Z-test, t-test, Chi-Square test, etc.)

Step C: p-value and conclusion

The test statistic is converted to a conditional probability (the p-value).

The p-value answers the question: "If the null hypothesis were true,
what is the probability of observing the current data or data that is
more extreme?"






Contd.

1. When p value > .10, the observed difference is not significant
2. When p value ≤ .10, the observed difference is marginally
significant
3. When p value ≤ .05, the observed difference is significant
4. When p value ≤ .01, the observed difference is highly
significant
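These conventions amount to a simple lookup, sketched below. The function name `significance_label` is hypothetical; the cut-offs are the ones listed on the slide, checked from the strictest downward so that each p-value receives the strongest label it qualifies for.

```python
def significance_label(p_value):
    """Map a p-value to the verbal label used on the slide."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value <= 0.01:
        return "highly significant"
    if p_value <= 0.05:
        return "significant"
    if p_value <= 0.10:
        return "marginally significant"
    return "not significant"

for p in (0.2286, 0.08, 0.04, 0.005):
    print(f"p = {p}: {significance_label(p)}")
```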






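Level of Significance, $\alpha$, and the
Rejection Region
[Figure: three sampling distributions with the critical value(s), rejection regions and acceptance regions marked.]
$H_0: \mu \ge 3$, $H_1: \mu < 3$: rejection region of area $\alpha$ in the left tail
$H_0: \mu \le 3$, $H_1: \mu > 3$: rejection region of area $\alpha$ in the right tail
$H_0: \mu = 3$, $H_1: \mu \ne 3$: rejection regions of area $\alpha/2$ in each tail

Rejection Region
[Figure: two one-tailed Z distributions centred at 0 with the "Reject $H_0$" region of area $\alpha$ shaded.]
$H_0: \mu \ge 0$, $H_1: \mu < 0$ (L.H.S.): the sample mean must be significantly below $\mu = 0$ to reject $H_0$
$H_0: \mu \le 0$, $H_1: \mu > 0$ (R.H.S.): small values don't contradict $H_0$, so don't reject $H_0$!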
t-Test: $\sigma$ Unknown
t test statistic, with n-1 degrees of freedom

$t = \frac{\bar{X} - \mu}{S / \sqrt{n}}$
Example: One Tail t-Test
Does an average box of cereal
contain more than 368 grams
of cereal? A random sample of
36 boxes showed $\bar{X} = 372.5$
and $S = 15$. Test at the $\alpha = 0.01$
level.
$\sigma$ is not given.
$H_0: \mu \le 368$
$H_1: \mu > 368$
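Example Solution: One Tail

$t = \frac{\bar{X} - \mu}{S / \sqrt{n}} = \frac{372.5 - 368}{15 / \sqrt{36}} = 1.80$

Here t (1.80) < t critical (2.43). The p-value (0.04) is below 0.05 but above $\alpha = 0.01$.

The observed difference is not significant at the 0.01 level,
so DO NOT REJECT THE NULL HYPOTHESIS
$H_0: \mu \le 368$
$H_1: \mu > 368$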
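The arithmetic of the cereal example can be sketched in a few lines. The critical value 2.43 is taken from the slide rather than computed, and the variable names are illustrative assumptions.

```python
import math

# Cereal box example: sample of 36 boxes, sample mean 372.5 g,
# sample standard deviation 15 g, hypothesized mean 368 g.
x_bar = 372.5
mu_0 = 368.0
s = 15.0
n = 36

standard_error = s / math.sqrt(n)
t_stat = (x_bar - mu_0) / standard_error

# One-tailed critical value for alpha = 0.01 with n - 1 = 35 degrees
# of freedom, as quoted on the slide (not computed here).
t_critical = 2.43

reject_h0 = t_stat > t_critical
print(f"t = {t_stat:.2f}, critical t = {t_critical}, reject H0: {reject_h0}")
```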
Example: Z Test for Proportion
Problem: A marketing company claims that it
receives 4% responses from its mailing.
Approach: To test this claim, a random sample
of 500 were surveyed with 25 responses.
Solution: Test at the $\alpha = .05$ significance level.

Z Test for Proportion: Solution
Critical Values: $\pm 1.96$ (rejection regions of area .025 in each tail)
Test statistic: Z = 1.14
Decision:
Here Z (1.14) < Z critical (1.96), i.e. 1.14 lies in (-1.96, 1.96),
so DO NOT REJECT THE NULL HYPOTHESIS
Conclusion:
We do not have sufficient
evidence to reject the company's
claim of 4% response rate.
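The slide reports only the result Z = 1.14. A plausible reconstruction, assuming the usual one-sample Z statistic for a proportion, $Z = (\hat{p} - p_0) / \sqrt{p_0 (1 - p_0) / n}$, reproduces that value. The sketch below uses only the standard library; the variable names are illustrative.

```python
import math
from statistics import NormalDist

p0 = 0.04        # claimed response rate (null hypothesis)
n = 500          # number surveyed
responses = 25   # responses received
alpha = 0.05

p_hat = responses / n
standard_error = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / standard_error

# Two-tailed critical value: area alpha / 2 in each tail.
z_critical = NormalDist().inv_cdf(1 - alpha / 2)

reject_h0 = abs(z) > z_critical
print(f"Z = {z:.2f}, critical values = +/-{z_critical:.2f}, reject H0: {reject_h0}")
```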
Analysis of Variance
The ANalysis Of VAriance (or ANOVA) is a
powerful and common statistical procedure. It can
handle a variety of situations.

ANOVA compares two estimates of variability:

between-groups estimate: treatment effect and error
within-groups estimate: error
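The ratio of these two estimates is the F statistic of a one-way ANOVA. The sketch below computes it by hand for three small groups; the data are entirely hypothetical and the function name `one_way_anova_f` is an assumption made for illustration. Converting F to a p-value would need the F distribution, which is outside the standard library and is not attempted here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-groups sum of squares: treatment effect plus error.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-groups sum of squares: error only.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical readings for three groups, for illustration only.
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat, df_between, df_within = one_way_anova_f(groups)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```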

ONE WAY ANOVA
A researcher would like to find out whether a man's
nickname affects his cholesterol reading. She records the
cholesterol readings of 23 men nicknamed Sam, 24 men
nicknamed Lou and 19 men nicknamed Mac

NULL HYPOTHESIS : average cholesterol readings of all
Sams, all Lous and all Macs are not different

Example
p value (0.2286) > .10, so the observed difference is not
significant
Accept (do not reject) the Null Hypothesis

Two Way ANOVA
Tissue Culture
[Figure: tissue culture results for Shoot Length and Number of Shoots]
Null Hypothesis:
Between the groups: there is no significant change in the readings of different subcultures (s.c.)
Among the groups: gelling agents play no role in Shoot Length
One Way ANOVA
Are subcultures required?
p value > .10, so the observed difference is not significant
Two Way ANOVA
Thank You
