$d = d_0 + vt$, where $d_0$ = displacement of the particle from the origin at time $t = 0$, and $v$ = velocity.
Regression Analysis
The collection of statistical tools that are used to model
and explore relationships between variables that are
related in a nondeterministic manner
Used because there are many situations where the
relationship between variables is not deterministic
Examples:
- The electrical energy consumption of a house (y) is related to the size of the house (x, in ft²).
- The fuel usage of an automobile (y) is related to the
vehicle weight (x).
Simple Linear Regression
Single regressor variable or predictor variable x and a
dependent or response variable Y
The expected value of Y for each value of x is
$$E(Y \mid x) = \beta_0 + \beta_1 x,$$
where the intercept $\beta_0$ and slope $\beta_1$ are unknown regression coefficients.
We assume Y can be described by the model
$$Y = \beta_0 + \beta_1 x + \epsilon \qquad \text{(Equation 11-2)}$$
where $\epsilon$ is a random error with mean zero and (unknown) variance $\sigma^2$.
The random errors corresponding to different observations are also assumed to be uncorrelated random variables.
The regression model may be thought of as an empirical model.
Method of Least Squares
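Suppose that we have n pairs of observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. By Equation 11-2, each observation can be written as
$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad i = 1, 2, \ldots, n \qquad \text{(Equation 11-3)}$$
and the sum of the squares of the deviations of the observations from the true regression line is
$$L = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_i\right)^2 \qquad \text{(Equation 11-4)}$$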
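To make Equation 11-4 concrete, here is a minimal Python sketch that evaluates the criterion $L$ for a candidate pair of coefficients. The function name `sum_of_squares` and the toy data, which lie exactly on $y = 1 + 2x$, are hypothetical illustrations and do not come from the source.

```python
def sum_of_squares(beta0, beta1, xs, ys):
    """Evaluate L = sum of (y_i - beta0 - beta1 * x_i)^2 (Equation 11-4)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    return sum((y - beta0 - beta1 * x) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data lying exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

print(sum_of_squares(1.0, 2.0, xs, ys))  # 0.0: the line passes through every point
print(sum_of_squares(0.0, 2.0, xs, ys))  # 4.0: every deviation equals 1
```

Least squares chooses the pair of coefficients that makes this quantity as small as possible.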
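The least squares estimators of $\beta_0$ and $\beta_1$, say $\hat{\beta}_0$ and $\hat{\beta}_1$, must satisfy
$$\left.\frac{\partial L}{\partial \beta_0}\right|_{\hat{\beta}_0, \hat{\beta}_1} = -2 \sum_{i=1}^{n} \left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right) = 0$$
$$\left.\frac{\partial L}{\partial \beta_1}\right|_{\hat{\beta}_0, \hat{\beta}_1} = -2 \sum_{i=1}^{n} \left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right) x_i = 0$$
(Equations 11-5)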
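Simplifying Equations (11-5) gives
$$n\hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$
$$\hat{\beta}_0 \sum_{i=1}^{n} x_i + \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} y_i x_i$$
(Equations 11-6, the least squares normal equations)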
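Equations 11-6 are two linear equations in the two unknowns $\hat{\beta}_0$ and $\hat{\beta}_1$, so they can be solved directly. The Python sketch below uses Cramer's rule. The function name and the toy data are hypothetical.

```python
def solve_normal_equations(xs, ys):
    """Solve Equations 11-6 for (b0, b1) with Cramer's rule.

    The system is
        n * b0      + b1 * sum(x)   = sum(y)
        b0 * sum(x) + b1 * sum(x^2) = sum(x * y)
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # Determinant of the 2x2 coefficient matrix [[n, sum_x], [sum_x, sum_x2]].
    det = n * sum_x2 - sum_x ** 2
    if det == 0:
        raise ValueError("all x values are equal, so the slope is not identifiable")

    b0 = (sum_y * sum_x2 - sum_x * sum_xy) / det
    b1 = (n * sum_xy - sum_x * sum_y) / det
    return b0, b1


# Hypothetical toy data that do not lie exactly on a line.
b0, b1 = solve_normal_equations([1.0, 2.0, 3.0], [1.0, 2.0, 2.0])
print(b0, b1)  # about 0.667 and 0.5
```

Solving the normal equations this way gives the same estimates as the closed-form expressions in Equations 11-7 and 11-8.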
Least Squares Estimates
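$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \qquad \text{(Equation 11-7)}$$
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} y_i x_i - \dfrac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}} \qquad \text{(Equation 11-8)}$$
where $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ and $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.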
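Equations 11-7 and 11-8 translate directly into code. The Python sketch below assumes the data arrive as two equal-length lists of floats. The function name and the toy data are illustrative and are not part of the source.

```python
def least_squares_estimates(xs, ys):
    """Return (b0_hat, b1_hat) using Equations 11-7 and 11-8."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of matching length")
    sum_x = sum(xs)
    sum_y = sum(ys)
    numerator = sum(y * x for x, y in zip(xs, ys)) - sum_y * sum_x / n
    denominator = sum(x * x for x in xs) - sum_x ** 2 / n
    if denominator == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    b1 = numerator / denominator           # Equation 11-8
    b0 = sum_y / n - b1 * (sum_x / n)      # Equation 11-7: y_bar - b1 * x_bar
    return b0, b1


# Hypothetical toy data.
b0, b1 = least_squares_estimates([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 5.0, 4.0])
print(b0, b1)  # about 1.5 and 0.8
```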
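Notationally, it is occasionally convenient to give special symbols to the numerator and denominator of Equation 11-8. Given data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, let
$$S_{xx} = \sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} \qquad \text{(Equation 11-10, denominator)}$$
and
$$S_{xy} = \sum_{i=1}^{n} y_i \left(x_i - \bar{x}\right) = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} \qquad \text{(Equation 11-11, numerator)}$$
so that $\hat{\beta}_1 = S_{xy} / S_{xx}$.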
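Both forms of $S_{xx}$ and $S_{xy}$ give the same numbers. The deviation form is the easier one to read, and the raw-sum form is convenient for hand calculation. The Python sketch below checks the two forms against each other on hypothetical toy data. The helper names `s_xx` and `s_xy` are illustrative.

```python
import math


def s_xx(xs):
    """S_xx from Equation 11-10, deviation form: sum of (x_i - x_bar)^2."""
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs)


def s_xy(xs, ys):
    """S_xy from Equation 11-11, deviation form: sum of y_i * (x_i - x_bar)."""
    x_bar = sum(xs) / len(xs)
    return sum(y * (x - x_bar) for x, y in zip(xs, ys))


# Hypothetical toy data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 4.0]
n = len(xs)

# Raw-sum forms on the right-hand sides of Equations 11-10 and 11-11.
s_xx_raw = sum(x * x for x in xs) - sum(xs) ** 2 / n
s_xy_raw = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

print(math.isclose(s_xx(xs), s_xx_raw), math.isclose(s_xy(xs, ys), s_xy_raw))  # True True
print(s_xy(xs, ys) / s_xx(xs))  # slope estimate, 0.8
```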
Fitted or Estimated Regression Line
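$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \qquad \text{(Equation 11-9)}$$
Note that each pair of observations satisfies the relationship
$$y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + e_i, \quad i = 1, 2, \ldots, n,$$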
where
= 765.98 and
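Writing $e_i = y_i - \hat{y}_i$ for the $i$th residual (the usual textbook notation), the fitted values and residuals can be computed directly. The Python sketch below uses hypothetical toy data, whose least squares estimates work out to $\hat{\beta}_0 = 1.5$ and $\hat{\beta}_1 = 0.8$. It also checks two consequences of the normal equations (Equations 11-6): when the model includes an intercept, the residuals sum to zero and $\sum x_i e_i = 0$.

```python
def fitted_and_residuals(b0, b1, xs, ys):
    """Return fitted values y_hat_i = b0 + b1 * x_i and residuals e_i = y_i - y_hat_i."""
    fitted = [b0 + b1 * x for x in xs]
    residuals = [y - y_hat for y, y_hat in zip(ys, fitted)]
    return fitted, residuals


# Hypothetical toy data and their least squares estimates (b0 = 1.5, b1 = 0.8).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 4.0]
fitted, residuals = fitted_and_residuals(1.5, 0.8, xs, ys)

# First normal equation: residuals sum to zero (up to floating-point rounding).
print(sum(residuals))
# Second normal equation: residuals are orthogonal to x.
print(sum(x * e for x, e in zip(xs, residuals)))
```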
Sample Correlation Coefficient
Interpretations of r:
±1.00: perfect positive (negative) correlation
±0.91 to ±0.99: very high positive (negative) correlation
±0.71 to ±0.90: high positive (negative) correlation
±0.51 to ±0.70: moderate positive (negative) correlation
±0.31 to ±0.50: low positive (negative) correlation
±0.01 to ±0.30: negligible positive (negative) correlation
0.00: no correlation
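The verbal scale above can be applied mechanically. The Python sketch below does this. One assumption belongs to the sketch and not to the table: $r$ is first rounded to two decimals, so that values falling between the listed bands (for example 0.905) still receive a label. The function name is hypothetical.

```python
def describe_r(r):
    """Map a correlation coefficient to the verbal scale in the table.

    Assumption: |r| is rounded to two decimals before it is matched to a band.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    magnitude = round(abs(r), 2)
    if magnitude == 0.0:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    if magnitude == 1.0:
        strength = "perfect"
    elif magnitude >= 0.91:
        strength = "very high"
    elif magnitude >= 0.71:
        strength = "high"
    elif magnitude >= 0.51:
        strength = "moderate"
    elif magnitude >= 0.31:
        strength = "low"
    else:
        strength = "negligible"
    return f"{strength} {direction} correlation"


print(describe_r(0.85))   # high positive correlation
print(describe_r(-0.40))  # low negative correlation
```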
Coefficient of Determination
Denoted by $r^2$.
A descriptive measure of the strength of the regression relationship: a measure of how well the regression line fits the data.
Ordinarily, we do not use $r^2$ for inference about $\rho^2$.
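For a simple linear regression with an intercept, $r^2$ equals $1 - SS_E / S_{yy}$. Here $SS_E$ is the sum of squared residuals and $S_{yy} = \sum (y_i - \bar{y})^2$. These symbols are standard but do not appear in the slides. The Python sketch below computes $r$ and $1 - SS_E / S_{yy}$ separately on hypothetical toy data, so you can check that the second equals the square of the first.

```python
import math


def r_and_r_squared(xs, ys):
    """Return r and the coefficient of determination 1 - SS_E / S_yy."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    r = s_xy / math.sqrt(s_xx * s_yy)

    # Fit the least squares line and compute the error sum of squares SS_E.
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    ss_e = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))

    # Fraction of the variability in y accounted for by the fitted line.
    r_squared = 1.0 - ss_e / s_yy
    return r, r_squared


# Hypothetical toy data.
r, r2 = r_and_r_squared([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 5.0, 4.0])
print(r, r2)  # r = 0.8, and r2 matches r**2 = 0.64 up to rounding
```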
Exercise 11.13 (p. 400). A study of the amount of rainfall and the quantity of air pollution removed produced the following data:

Daily Rainfall, x (0.01 cm) | Particulate Removed, y (µg/m³)
4.3 | 126
4.5 | 121
5.9 | 116
5.6 | 118
6.1 | 114
5.2 | 118
3.8 | 132
2.1 | 141
7.5 | 108
Exercise 11.13 (p. 400), continued:
(a) Find the equation of the regression line to predict the particulate removed from the amount of daily rainfall.
(b) Estimate the amount of particulate removed when the daily rainfall is $x = 4.8$ units.
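The Python sketch below is a way to check a hand calculation of parts (a) and (b). It applies Equations 11-7, 11-10, and 11-11 to the data table. The function name `fit_line` is illustrative.

```python
# Data from Exercise 11.13: daily rainfall x (0.01 cm) and particulate removed y.
rainfall = [4.3, 4.5, 5.9, 5.6, 6.1, 5.2, 3.8, 2.1, 7.5]
particulate = [126, 121, 116, 118, 114, 118, 132, 141, 108]


def fit_line(xs, ys):
    """Least squares fit via Equations 11-10, 11-11 (slope) and 11-7 (intercept)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum(y * (x - x_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_line(rainfall, particulate)
print(f"(a) y_hat = {b0:.3f} + ({b1:.4f}) x")   # about 153.175 and -6.3240
print(f"(b) y_hat(4.8) = {b0 + b1 * 4.8:.2f}")  # about 122.82
```

By hand, the same route gives $S_{xx} = 19.26$ and $S_{xy} = -121.8$, then $\hat{\beta}_1 = S_{xy}/S_{xx}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$ with $\bar{x} = 5.0$.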
Exercise 11.43 (p. 436). With reference to Exercise 11.13 on page 400, assume a bivariate normal distribution for $x$ and $y$.
(a) Calculate $r$.
(b) Test the null hypothesis that $\rho = -0.5$ against the alternative that $\rho < -0.5$ at the 0.025 level of significance.
(c) Determine the percentage of the variation in the amount of particulate removed that is due to changes in the daily amount of rainfall.
Do not answer questions (b) and (c).
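Part (a) uses the sample correlation coefficient $r = S_{xy}/\sqrt{S_{xx} S_{yy}}$, where $S_{yy} = \sum (y_i - \bar{y})^2$. The Python sketch below checks the hand calculation for (a) only, since (b) and (c) are not to be answered. The function name is illustrative.

```python
import math

# Data from Exercise 11.13: rainfall x (0.01 cm) and particulate removed y.
rainfall = [4.3, 4.5, 5.9, 5.6, 6.1, 5.2, 3.8, 2.1, 7.5]
particulate = [126, 121, 116, 118, 114, 118, 132, 141, 108]


def sample_correlation(xs, ys):
    """Sample correlation r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / math.sqrt(s_xx * s_yy)


r = sample_correlation(rainfall, particulate)
print(f"r = {r:.3f}")  # about -0.979
```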
Summary
A scatter diagram displays observations on two variables, x and y. Each observation is represented by a point showing its x-y coordinates. The scatter diagram can be very effective in revealing the joint variability of x and y or the nature of the relationship between them.
The method of least squares is used to estimate the
parameters of a system by minimizing the sum of the
squares of the differences between the observed
values and the fitted or predicted values from the
system.
Generally, correlation is a measure of the
interdependence among data. The concept may
include more than two variables. The term is most
commonly used in a narrow sense to express the
relationship between quantitative variables or ranks.
The correlation coefficient (r) is a dimensionless measure of the linear association between two variables. It lies in the interval from $-1$ to $+1$, with zero indicating the absence of correlation (but not necessarily the independence of the two variables).
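The point that zero correlation does not imply independence is easy to see numerically. The Python sketch below uses hypothetical data in which $y = x^2$ is completely determined by $x$. Even so, $S_{xy}$, and therefore $r$, is exactly zero, because the relationship is symmetric about $x = 0$ rather than linear.

```python
import math

# Hypothetical data: y is a deterministic (but nonlinear) function of x.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x ** 2 for x in xs]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# S_xy is the numerator of r, so r = 0 even though y depends entirely on x.
r = s_xy / math.sqrt(s_xx * s_yy)
print(s_xy, r)  # 0.0 0.0
```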
The coefficient of determination ($r^2$) is often used to judge the adequacy of a regression model. Its value indicates that the model accounts for $100 \times r^2$ percent of the variability in the data.
References
Aczel and Sounderpandian. Business Statistics, 7th ed., 2008.
Montgomery and Runger. Applied Statistics and Probability for Engineers, 5th ed., 2011.
Walpole et al. Probability and Statistics for Engineers and Scientists, 9th ed., 2012, 2007, 2002.