5:
The purpose of the problem is to see, first, whether there is a correlation between the pairs (X, Y) and (X, Z), and
second, whether there is a correlation among X, Y, and Z. The sample includes 3 variables, each with 5 observations.
Simple Statistics

Variable        Mean    Std Dev        Sum    Minimum    Maximum
Y            7.80000    4.54973   39.00000    3.00000   13.00000
Z           10.20000    4.32435   51.00000    5.00000   15.00000
X            4.60000    2.88097   23.00000    1.00000    8.00000

Pearson correlation with X

Variable   r with X   Prob > |r|
Y           0.96509       0.0078
Z          -0.97525       0.0047
There is a strong positive correlation between X and Y (r = 0.965, p = 0.0078) and a strong negative correlation
between X and Z (r = -0.975, p = 0.0047).
3 Variables: X Y Z

Simple Statistics

Variable        Mean    Std Dev        Sum    Minimum    Maximum
X            4.60000    2.88097   23.00000    1.00000    8.00000
Y            7.80000    4.54973   39.00000    3.00000   13.00000
Z           10.20000    4.32435   51.00000    5.00000   15.00000

Pearson Correlation Coefficients, N = 5
Prob > |r| under H0: Rho=0

              X          Y          Z
X       1.00000    0.96509   -0.97525
                    0.0078     0.0047
Y       0.96509    1.00000   -0.96317
         0.0078                0.0084
Z      -0.97525   -0.96317    1.00000
         0.0047     0.0084
Number of Observations Read    5
Number of Observations Used    5

Analysis of Variance

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1         77.11928      77.11928     40.73   0.0078
Error               3          5.68072       1.89357
Corrected Total     4         82.80000

Root MSE           1.37607    R-Square   0.9314
Dependent Mean     7.80000    Adj R-Sq   0.9085
Coeff Var         17.64195

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              0.78916          1.25920      0.63     0.5753
X            1              1.52410          0.23882      6.38     0.0078
The equation of the regression line to predict Y from X is Y = 0.79 + 1.52X. This line has a coefficient of
determination of 0.93, meaning that about 93% of the variation in Y can be explained by using X to predict Y, and
a s.e.e. of 1.38, meaning that, as a rough rule, about 95% of the time a predicted Y should be off by no more than
about two s.e.e.
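The coefficient of determination and the s.e.e. quoted above can be rechecked from the Analysis of Variance table. The following is a minimal sketch, not part of the original program; the variable names ss_model, ss_total, and ms_error are made up for illustration, and the three numbers are copied from the regression output for Y on X.

```sas
/* Sketch: recompute R-Square and Root MSE from the ANOVA table values */
data _null_;
   ss_model = 77.11928;   /* Model sum of squares           */
   ss_total = 82.80000;   /* Corrected Total sum of squares */
   ms_error = 1.89357;    /* Error mean square              */

   r_square = ss_model / ss_total;   /* share of variation explained     */
   see      = sqrt(ms_error);        /* standard error of estimate       */
   two_see  = 2 * see;               /* rough 95% band, rule of thumb    */

   put r_square= see= two_see=;      /* values are written to the log    */
run;
```

The values written to the log should agree, up to rounding, with the R-Square and Root MSE lines of the listing. With only 5 observations the two-s.e.e. band is a loose rule of thumb rather than an exact 95% limit.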
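data d;
   input X Y Z;
   DATALINES;
1 3 15
7 13 7
8 12 5
3 4 14
4 7 10
;
RUN;

ODS GRAPHICS ON;
ODS RTF FILE='D.RTF';

/* Correlation of X with Y and with Z */
PROC CORR DATA=d;
   VAR X;
   WITH Y Z;
RUN;

/* Full correlation matrix for X, Y, and Z */
PROC CORR DATA=d;
   VAR X Y Z;
RUN;

/* Simple linear regression of Y on X */
PROC REG DATA=d;
   MODEL Y=X;
RUN;
QUIT;

ODS GRAPHICS OFF;
ODS RTF CLOSE;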
Problem 5.3:
2 Variables: AGE SBP

Simple Statistics

Variable         Mean     Std Dev         Sum     Minimum     Maximum
AGE          30.00000    13.03840   180.00000    15.00000    50.00000
SBP         132.66667    14.00952   796.00000   116.00000   150.00000

Pearson Correlation Coefficients, N = 6
Prob > |r| under H0: Rho=0

            AGE        SBP
AGE     1.00000    0.95258
                    0.0033
SBP     0.95258    1.00000
         0.0033
The purpose of the problem was to compute the correlation between age and SBP. The sample
consisted of 6 people, with the age and SBP recorded for each. There is a strong positive correlation
between a person's age and their SBP (r = 0.95, p = 0.0033).
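With only 6 observations, the estimate r = 0.95 carries considerable uncertainty. As a rough sketch, the FISHER option of PROC CORR adds confidence limits for the correlation based on Fisher's z transformation. The data set name P and the data values are the ones used in this problem; the extra option is an addition for illustration and was not part of the original analysis.

```sas
/* Sketch: confidence limits for the AGE-SBP correlation (Fisher's z) */
data P;
   input AGE SBP;
   datalines;
15 116
20 120
25 130
30 132
40 150
50 148
;
run;

/* FISHER requests confidence limits (95% by default) and a test of rho = 0 */
proc corr data=P fisher;
   var AGE SBP;
run;
```

A wide interval here would be a reminder that a high r from six people is suggestive rather than conclusive.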
DATA P;
   INPUT AGE SBP;
   DATALINES;
15 116
20 120
25 130
30 132
40 150
50 148
;
RUN;

ODS GRAPHICS ON;
ODS RTF FILE='P.RTF';

/* Pearson correlation between AGE and SBP */
PROC CORR DATA=P;
   VAR AGE SBP;
RUN;

ODS GRAPHICS OFF;
ODS RTF CLOSE;
Variable       N       Mean    Std Dev    Minimum    Maximum    Skewness
Attendance    10    7214.90    1679.23    4534.00    9821.00   0.0179365
Sales         10    5183.80    1242.42    3216.00    7001.00   0.0228907

2 Variables: Attendance Sales

Pearson Correlation Coefficients, N = 10
Prob > |r| under H0: Rho=0

              Attendance      Sales
Attendance       1.00000    0.93748
                             <.0001
Sales            0.93748    1.00000
                  <.0001

Number of Observations Read    10
Number of Observations Used    10

Analysis of Variance

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1         12209634      12209634     58.04   <.0001
Error               8          1682814        210352
Corrected Total     9         13892448

Root MSE           458.64114    R-Square   0.8789
Dependent Mean    5183.80000    Adj R-Sq   0.8637
Coeff Var            8.84759

Parameter Estimates

Variable      DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept      1            179.41977        672.68015      0.27     0.7964
Attendance     1              0.69362          0.09104      7.62     <.0001
e) yes.
f) The daily attendance and the number of hot dogs sold at a local ball park were studied over a
period of ten games. The goal of the study was to see whether hot dog sales could be predicted
from the attendance at the game. Attendance for this sample ranged from 4534 to 9821, with a
mean of 7214.9 and s = 1679.2. There was a strong positive correlation between hot dog sales
and attendance (r = 0.94), so it was decided to conduct a pilot study using regression analysis
to predict hot dog sales from attendance. The model was significant (F = 58.04, p < .0001) and
gave the prediction equation for hot dog sales:
Sales = 179.42 + 0.69 × Attendance, valid for attendance between 4534 and 9821. The
coefficient of determination, R-squared, indicates that around 87.9% of the variation in hot dog
sales could be explained by daily attendance. The s.e.e. was around 458.6, which is roughly the
typical amount by which one of our predictions should be off.
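To use the prediction equation for a specific game, one common approach is to append a row with the attendance of interest and a missing value for Sales, then ask PROC REG for predicted values and prediction limits. The sketch below assumes a hypothetical game with 8000 people attending (a value inside the observed range); the data set names new_game and hotdogs_plus are made up for illustration.

```sas
/* Sketch: predict hot dog sales for a hypothetical attendance of 8000 */
data hotdogs;
   input Attendance Sales;
   datalines;
8747 6845
5857 4168
8360 5348
6945 5687
8688 6007
4534 3216
7450 5018
5874 4652
9821 7001
5873 3896
;
run;

/* Hypothetical new game: Sales is left missing so the row is not used in the fit */
data new_game;
   Attendance = 8000;
   Sales = .;
run;

data hotdogs_plus;
   set hotdogs new_game;
run;

/* P prints predicted values; CLI adds 95% limits for an individual prediction */
proc reg data=hotdogs_plus;
   model Sales = Attendance / p cli;
run;
quit;
```

Plugging 8000 into the fitted equation by hand gives roughly 179.42 + 0.69362 × 8000, or about 5728 hot dogs; the last row of the PROC REG output statistics should show a similar predicted value together with its prediction limits.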
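data hotdogs;
   input Attendance Sales;
   datalines;
8747 6845
5857 4168
8360 5348
6945 5687
8688 6007
4534 3216
7450 5018
5874 4652
9821 7001
5873 3896
;
run;

ods graphics on;
ods rtf file = 'hotdogs.rtf';

/* Descriptive statistics for both variables */
proc means data=hotdogs n mean stddev min max skew;
   var Attendance Sales;
run;

/* Pearson correlation, without the simple statistics table */
proc corr data=hotdogs nosimple;
   var Attendance Sales;
run;

/* Simple linear regression of Sales on Attendance */
proc reg data=hotdogs;
   model Sales = Attendance;
run;
quit;

ods graphics off;
ods rtf close;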
Taxes problem:
Variable     N         Mean      Std Dev      Minimum       Maximum    Skewness
income      11   75.2363636   48.1865598    9.1000000   150.7000000   0.2919147
taxes       11   21.5181818    7.2821450   10.2000000    30.4000000   0.3645622

2 Variables: income taxes

Pearson Correlation Coefficients, N = 11
Prob > |r| under H0: Rho=0

            income      taxes
income     1.00000    0.83416
                       0.0014
taxes      0.83416    1.00000
            0.0014

Number of Observations Read    11
Number of Observations Used    11

Analysis of Variance

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1        368.98798     368.98798     20.59   0.0014
Error               9        161.30839      17.92315
Corrected Total    10        530.29636

Root MSE           4.23357    R-Square   0.6958
Dependent Mean    21.51818    Adj R-Sq   0.6620
Coeff Var         19.67441

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1             12.03382          2.44923      4.91     0.0008
Income       1              0.12606          0.02778      4.54     0.0014
The purpose of the study was to see whether taxes could be predicted from an individual's income. The
sample consisted of 11 individual tax returns from 2009. Income for this sample ranged from 9.1 to
150.7 (mean 75.24, s = 48.19), and taxes had a mean of 21.52 and s = 7.28. There was a positive
correlation between income and taxes (r = 0.83, p = 0.0014). The model was significant and gave the
prediction equation: taxes = 12.03 + 0.126 × income, valid for incomes between 9.1 and 150.7. The
coefficient of determination, R-squared, indicates that around 69.6% of the variation in taxes could
be explained by income. The s.e.e. was around 4.23, which is roughly the typical amount by which
one of our predictions should be off.
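In a regression with a single predictor, R-squared equals the square of the Pearson correlation, and the F value equals the square of the slope's t value. The short sketch below checks both relations for the taxes output; the variable names are made up for illustration, and the inputs are the rounded values printed in the listing, so agreement is only expected up to rounding.

```sas
/* Sketch: consistency checks between PROC CORR and PROC REG results */
data _null_;
   r       = 0.83416;   /* Pearson r between income and taxes */
   t_slope = 4.54;      /* t value for the income slope       */

   r_sq_from_r = r**2;         /* compare with R-Square = 0.6958 */
   f_from_t    = t_slope**2;   /* compare with F Value  = 20.59  */

   put r_sq_from_r= f_from_t=; /* values are written to the log  */
run;
```

This also explains why the p-value for the correlation, the slope, and the overall F test are all 0.0014: in simple linear regression they test the same hypothesis.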
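data tax;
   input income taxes;
   datalines;
38.7 16.0
80.5 20.1
14.8 11.1
47.3 24.3
9.1 10.2
150.7 30.4
55.9 27.3
110.2 27.9
73.2 16.2
146.8 29.8
100.4 23.4
;
run;

ods graphics on;
ods rtf file = 'tax.rtf';

/* Descriptive statistics for both variables */
proc means data=tax n mean stddev min max skew;
   var income taxes;
run;

/* Pearson correlation, without the simple statistics table */
proc corr data=tax nosimple;
   var income taxes;
run;

/* Simple linear regression of taxes on income */
proc reg data=tax;
   model taxes = income;
run;
quit;

ods graphics off;
ods rtf close;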