5/13/17
[Diagram: Descriptive Statistics Methods. Single Variable: Qualitative; Quantitative (Center, Spread, Shape). Two Variable Analysis. Estimating Probabilities.]
Two-Variable Analysis
This graph plots Cartesian coordinate pairs.
Data must be paired, that is, the X value
and the Y value must be for the same
observation.
For example, if Ralph is 72 inches (6 feet)
tall and he weighs 180 lbs, his coordinate
pair would be (72, 180).
If George is 67 inches tall and weighs 135
lbs, his coordinate pair is (67, 135).
We would never put Ralph's height with
George's weight.
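The pairing rule can be sketched in Python. The record layout and the names `observations` and `pairs` are illustrative assumptions, not part of the slides; the heights and weights are the ones given above.

```python
# Each observation keeps its own X (height in inches) and Y (weight in lbs).
observations = {
    "Ralph": {"height_in": 72, "weight_lb": 180},
    "George": {"height_in": 67, "weight_lb": 135},
}

# Build one coordinate pair per person, so X and Y always come from
# the same observation and can never be mixed across people.
pairs = {
    name: (record["height_in"], record["weight_lb"])
    for name, record in observations.items()
}

print(pairs["Ralph"])   # (72, 180)
print(pairs["George"])  # (67, 135)
```

Keeping both measurements in one record per person makes it hard to pair Ralph's height with George's weight by accident.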
Scatter Plots
[Figure: scatter plot "Education and Income". Y axis: Education in Years (0 to 25). X axis: Income in $K (0 to 100).]
[Figure: scatter plot "Income and Education". Y axis: Income in $K (0 to 100). X axis: Years of Education (6 to 22).]
[Figure: the two scatter plots above, side by side.]
Switched Axes
They are an important first step in two-variable analysis.
Either variable can be on either axis.
In this class we look for linear associations.
A linear association is indicated by the
visual impression of a line.
The more the association resembles a
distinct line, the stronger is the association.
The more the association resembles a
random scattering of points, the weaker is
the association.
Some objectivity?
Covariance is a measure of the strength of a possible
linear relationship between 2 variables.
It is called co-variance because that is what
it is: variance, but with 2 variables.
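A minimal sketch of the calculation, assuming the usual sample formula (sum of products of deviations, divided by n - 1); the slides do not state the formula, and the data below are hypothetical.

```python
def sample_covariance(xs, ys):
    """Sample covariance of paired data, dividing by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: years of education and income in $K.
education = [10, 12, 14, 16, 18]
income = [25, 32, 41, 55, 62]

cov_edu_income = sample_covariance(education, income)
var_education = sample_covariance(education, education)

print(cov_edu_income)  # 48.5, in units of year-$K
print(var_education)   # 10.0: covariance of a variable with itself is its variance
```

The second call shows why the name fits: with the same variable in both slots, the formula reduces to the sample variance.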
Covariance Characteristics
Its units are worse than variance's.
The covariance between incomes and
heights of male executives is 77.72 dollar-inches.
Covariance Interpreted
Correlation
Covariance Transformed
Correlation ranges from -1 to 1.
When the correlation is 1 or -1, there is perfect
correlation.
What does perfect correlation mean?
o A deterministic relationship is one in
which knowledge of the value of one
variable determines the value of the
other exactly.
o Imperfect correlation suggests that
one MAY have a statistical
relationship. A statistical
relationship is one that is
characterized by natural variability in
both measurements.
Correlation
Correlation enables the interpretation of
covariance to be meaningful, and provides a
measure of how much of a line the variables
make.
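One way to see the transformation, assuming the standard definition (covariance divided by the product of the two standard deviations, which the slides do not spell out). The function name and data are illustrative.

```python
import math

def sample_correlation(xs, ys):
    """Correlation = covariance / (sd_x * sd_y); the n - 1 terms cancel."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]

# Deterministic linear relationships: knowing x fixes y exactly.
y_up = [2 * v + 1 for v in x]     # positively sloped line
y_down = [-3 * v + 5 for v in x]  # negatively sloped line

r_up = sample_correlation(x, y_up)
r_down = sample_correlation(x, y_down)
print(r_up, r_down)  # perfect correlation: 1 and -1 (up to rounding)
```

Because the units of X and Y cancel in the division, the result is a unitless number between -1 and 1.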
Correlation = 0 generally implies no linear relationship; the
scatter plot shows no linear pattern at all.
Correlation > 0 means that something resembling a positively
sloped line can be seen in the scatter plot; the closer
to 1 the more the plot resembles a line.
Correlation < 0 means that something resembling a negatively
sloped line can be seen in the scatter plot; the closer
to -1 the more the plot resembles a line.
Correlation is resistant to changes in units of
measurement.
Correlation is very sensitive to outliers.
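Both characteristics can be checked with a short sketch. The data are hypothetical and the helper names are assumptions; the formulas are the usual sample covariance and correlation.

```python
import math

def covariance(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Units: hypothetical heights in inches, then converted to centimeters.
heights_in = [67, 69, 70, 72, 74]
weights_lb = [135, 150, 160, 180, 190]
heights_cm = [h * 2.54 for h in heights_in]

cov_in = covariance(heights_in, weights_lb)
cov_cm = covariance(heights_cm, weights_lb)   # 2.54 times larger
r_in = correlation(heights_in, weights_lb)
r_cm = correlation(heights_cm, weights_lb)    # unchanged

# Outliers: one extra point is enough to flip the sign.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r_before = correlation(x, y)                  # about 0.8
r_after = correlation(x + [10], y + [0])      # negative

print(cov_in, cov_cm)
print(r_in, r_cm)
print(r_before, r_after)
```

The unit change rescales covariance but leaves correlation alone, while the single added point (10, 0) turns a fairly strong positive correlation into a negative one.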
Correlation Characteristics
Just because you can calculate a
covariance or correlation does not
mean you have a linear
relationship.
They assume you have checked to see
if the two variables have a linear
relationship.
All they measure is the strength of
LINEAR relationships.
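A small sketch of why the linearity check matters: below, Y is completely determined by X, yet the correlation is zero because the relationship is a curve, not a line. The data and function name are illustrative.

```python
import math

def correlation(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]  # deterministic, but a parabola rather than a line

r = correlation(x, y)
print(r)  # zero: no LINEAR association, despite an exact relationship
```

A scatter plot of these points would show the curve immediately, which is why the plot comes before the calculation.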
Just because you have a linear
relationship and have calculated a
covariance or correlation does not
mean you have a causal
relationship.
But . . .
Causation exists when a change in an
explanatory variable is the direct cause
of a change in a response variable.
How do we know if we have causation?
Randomized, controlled experiment
A reasonable explanation
The connection appears under varying
conditions.
Everything else is ruled out.
Variables are "confounded" if their effects on a
third variable cannot be separated from one
another.
Lurking variables are variables which cause an
effect but which were not included in the analysis.
Correlation is NOT causation.
Reasons for relationships between
variables other than causation:
o The explanatory variable is not the
"sole" cause of the response variable.
o Confounding variables exist.
o Both variables have a common cause.
o Both variables are changing over time.
o Coincidence.
Spurious Correlation
o A large correlation suggests an association between
two variables that does not truly exist.
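The "both variables are changing over time" case can be illustrated with made-up numbers. The two series below share nothing except an upward trend; looking at period-to-period changes is one common way to remove a shared trend and is an assumption here, not something the slides prescribe.

```python
import math

def correlation(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def changes(values):
    """Period-to-period differences."""
    return [later - earlier for earlier, later in zip(values, values[1:])]

# Hypothetical series: each is a time trend plus its own unrelated wiggle.
t = list(range(8))
wiggle_a = [1, -1, 1, -1, 1, -1, 1, -1]
wiggle_b = [1, 1, -1, -1, 1, 1, -1, -1]
series_a = [5 * ti + w for ti, w in zip(t, wiggle_a)]
series_b = [2 * ti + w for ti, w in zip(t, wiggle_b)]

r_levels = correlation(series_a, series_b)                     # close to 1
r_changes = correlation(changes(series_a), changes(series_b))  # much weaker

print(r_levels, r_changes)
```

The large correlation in the levels comes entirely from time, the common cause; once the trend is differenced away, little association remains.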
Excel: Data > Data Analysis > Correlation

Covariance (only the last two rows are shown):
            Lottery   Education   Age      Children   Income
Children     -0.114     0.472      1.683    1.732
Income      -34.609    38.212     -7.774    1.642     243.094

Correlation:
            Lottery   Education   Age      Children   Income
Lottery       1.000
Education    -0.620     1.000
Age           0.177    -0.178      1.000
Children     -0.023     0.107      0.107    1.000
Income       -0.589     0.734     -0.042    0.080      1.000
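The two outputs are linked: dividing a covariance by the square root of the product of the two variances gives the correlation. The sketch below checks this for Children and Income, assuming the values in the Children and Income rows of the covariance output are ordered Lottery, Education, Age, Children, Income (so 1.732 and 243.094 are the variances and 1.642 is their covariance).

```python
import math

# Values read from the covariance output (column order is an assumption).
cov_children_income = 1.642
var_children = 1.732
var_income = 243.094

r_children_income = cov_children_income / math.sqrt(var_children * var_income)
print(round(r_children_income, 3))  # 0.08, matching the correlation output
```

The same check works for any pair whose covariance and two variances are all visible.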
Covariance, Correlation and Excel
IFF you have a causal relationship
(how will you know?), a least squares
line can be calculated.
It is a model of a linear relationship
with a specific characteristic that
allows us to say it is the line of best
fit.
Y_i' = b_0 + b_1 X_i
You probably know this as y = mx + b.
b_0 = b
b_1 = m
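A minimal sketch of how the intercept and slope can be computed, assuming the standard least squares formulas (slope = sum of cross-deviations / sum of squared X deviations; intercept = mean of Y minus slope times mean of X). The slides do not show these formulas, and the data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sse(xs, ys, intercept, slope):
    """Sum of squared differences between observed and predicted Y."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data: years of education (X) and income in $K (Y).
education = [10, 12, 14, 16, 18]
income = [25, 32, 41, 55, 62]

b0, b1 = least_squares_line(education, income)
best_sse = sse(education, income, b0, b1)

# Any other line through the data has a larger SSE.
other_sse = sse(education, income, b0, b1 + 0.1)

print(b0, b1)               # about -24.9 and 4.85
print(best_sse, other_sse)  # about 13.1, and something larger
```

The "specific characteristic" that makes this the line of best fit is that no other intercept and slope give a smaller sum of squared errors.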
Our Example
Objective and Subjective
Subjective
at least partly opinion based
Objective Probabilities
Classical
Empirical or *relative frequencies*
Probability Concepts
Gender and Major of 200 Students
                     Major
Gender    Accounting   Econ    Stats   Total
Male        0.22       0.15    0.12    0.49
Female      0.28       0.15    0.08    0.51
Total       0.50       0.30    0.20    1.00
Simple or Marginal Probability
Joint Probability
Conditional Probability
Independence Rule: If P(A & B) = P(A) * P(B), then A and B are independent.
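The three kinds of probability and the independence rule can be worked through with the Gender and Major table. This is a sketch; the function names are illustrative assumptions, and the numbers are the table's joint probabilities.

```python
# Joint probabilities from the Gender and Major table.
joint = {
    ("Male", "Accounting"): 0.22, ("Male", "Econ"): 0.15, ("Male", "Stats"): 0.12,
    ("Female", "Accounting"): 0.28, ("Female", "Econ"): 0.15, ("Female", "Stats"): 0.08,
}

def p_gender(gender):
    """Simple (marginal) probability of a gender: sum across the row."""
    return sum(p for (g, _), p in joint.items() if g == gender)

def p_major(major):
    """Simple (marginal) probability of a major: sum down the column."""
    return sum(p for (_, m), p in joint.items() if m == major)

def p_major_given_gender(major, gender):
    """Conditional probability: joint divided by the marginal of the condition."""
    return joint[(gender, major)] / p_gender(gender)

def is_independent(gender, major, tol=1e-9):
    """Independence rule: P(A & B) equals P(A) * P(B)."""
    return abs(joint[(gender, major)] - p_gender(gender) * p_major(major)) <= tol

p_male = p_gender("Male")                                  # 0.49
p_acct = p_major("Accounting")                             # 0.50
p_acct_given_male = p_major_given_gender("Accounting", "Male")
male_acct_independent = is_independent("Male", "Accounting")

print(p_male, p_acct, p_acct_given_male, male_acct_independent)
```

Here P(Male & Accounting) = 0.22 while P(Male) * P(Accounting) = 0.245, so the rule fails for this pair and the two events are not independent.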
A Contingency Table
                 Economic Class
Survived?    I      II     III    Other   Total
No           122    167    ____   ____    ____
Yes          203    ____   178    212     711
Total        ____   ____   ____   885     2201
A Population at
Risk:
Number of Yes 711
Yes in Class I 203
Total Number of Other 885
Yes in Class III 178
No in Class II 167
No in Class I 122
Population Size 2201
Yes in Other 212
Fill this contingency table.
                 Economic Class
Survived?    I          II         III        Other      Total
No           122        167        528 (5)    673 (6)    1490 (7)
Yes          203        118 (1)    178        212        711
Total        325 (2)    285 (3)    706 (4)    885        2201

(1) 118 = 711-203-178-212
(2) 325 = 122+203
(3) 285 = 167+118
(4) 706 = 2201-885-285-325
(5) 528 = 706-178
(6) 673 = 885-212
(7) 1490 = 122+167+528+673 OR 1490 = 2201-711
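The fill-in steps can be replayed in Python as a check. The variable names are illustrative assumptions; the starting numbers are the given facts, and each line mirrors one of the numbered steps.

```python
# Given facts about the population at risk.
population = 2201
yes_total = 711
other_total = 885
yes = {"I": 203, "III": 178, "Other": 212}
no = {"I": 122, "II": 167}

# (1) Remaining "Yes" cell from the Yes row total.
yes["II"] = yes_total - yes["I"] - yes["III"] - yes["Other"]
# (2), (3) Column totals where both cells are known.
col_total = {"Other": other_total}
col_total["I"] = no["I"] + yes["I"]
col_total["II"] = no["II"] + yes["II"]
# (4) Last column total from the population size.
col_total["III"] = population - col_total["Other"] - col_total["II"] - col_total["I"]
# (5), (6) Remaining "No" cells from their column totals.
no["III"] = col_total["III"] - yes["III"]
no["Other"] = col_total["Other"] - yes["Other"]
# (7) "No" row total, computed both ways.
no_total = sum(no.values())

print(yes["II"], col_total, no, no_total)
print(no_total == population - yes_total)  # True: both routes agree
```

Computing the last total two ways is a useful consistency check on the whole table.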
Independence
Expected: Marginal Probabilities Products
             Economic Status of Population Exposed to Risk
Survived?    I                    II       III      Other               Total
No           0.15*0.68 = 0.102    0.0884   0.2176   0.272               0.68
Yes          0.048                0.0416   0.1024   0.4*0.32 = 0.128    0.32
Total        0.15                 0.13     0.32     0.4                 1
Observed: Joint Probabilities
             Economic Status of Population Exposed to Risk
Survived?    I      II     III    Other   Total
No           0.06   0.08   0.24   0.3     0.68
Yes          0.09   0.05   0.08   0.1     0.32
Total        0.15   0.13   0.32   0.4     1
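A sketch of the comparison between the two tables, using the rounded probabilities exactly as printed on the slides. The names and the tolerance are illustrative assumptions.

```python
classes = ["I", "II", "III", "Other"]

# Observed joint probabilities and marginals, as printed (rounded).
observed = {
    "No": [0.06, 0.08, 0.24, 0.30],
    "Yes": [0.09, 0.05, 0.08, 0.10],
}
row_marginal = {"No": 0.68, "Yes": 0.32}
col_marginal = [0.15, 0.13, 0.32, 0.40]

# Expected joint probabilities if survival and class were independent:
# the product of the two marginals for each cell.
expected = {
    outcome: [round(row_marginal[outcome] * p_class, 4) for p_class in col_marginal]
    for outcome in observed
}

# Gap between observed and expected, cell by cell.
gaps = {
    outcome: [round(o - e, 4) for o, e in zip(observed[outcome], expected[outcome])]
    for outcome in observed
}

# Independence would need every gap to be (close to) zero.
looks_independent = all(abs(g) < 0.005 for row in gaps.values() for g in row)

print(expected)
print(gaps)
print(looks_independent)  # False
```

For example, Class I "No" is observed at 0.06 but would be 0.102 under independence, so survival and economic class do not look independent in this population.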
Independence?
Summarize Two-Variable Relationships

At least one qualitative variable.
Contingency Table
* May be used for 2 quantitative variables.
* Mutually exclusive and collectively exhaustive categories
* This assures that the probabilities all will sum to exactly 1.
* If product of marginal probabilities = joint probabilities: independent

No qualitative variables.
Scatter Plots
* Two quantitative variables
* Either variable may be on X or Y
* Reveals linear associations.
Covariance
* Measures the strength of a linear association.
* May be positive, 0 or negative
* Has very difficult units to interpret
* Similar to the idea of a * Because of units, the
Correlation
SSE = \sum_i (Y_i - Y_i')^2
* Measures the strength of a