
MATH30-6

Probability and Statistics


Multivariate Analysis
Objectives
At the end of the lesson, the students are expected to
Construct a scatter diagram;
Use simple linear regression to build empirical
models for engineering and scientific data;
Understand how the method of least squares is used to
estimate the parameters in a linear regression model;
and
Interpret the different values obtained.
Deterministic Relationship
A model that predicts a variable perfectly
Example:
The displacement (dₜ) of a particle at a certain time t is related to its velocity:

dₜ = d₀ + vt

where
d₀ = displacement of the particle from the origin at time t = 0; and
v = velocity.
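For instance, with the (hypothetical) values d₀ = 2 m and v = 3 m/s, the model gives dₜ = 2 + 3t, so at t = 4 s the displacement is exactly 14 m: the relationship involves no random error.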
Regression Analysis
The collection of statistical tools that are used to model
and explore relationships between variables that are
related in a nondeterministic manner
Used because there are many situations where the
relationship between variables is not deterministic
Examples:
- The electrical energy consumption of a house (y) is
related to the size of the house (x, in ft²).
- The fuel usage of an automobile (y) is related to the
vehicle weight (x).
Simple Linear Regression
Single regressor variable or predictor variable x and a
dependent or response variable Y
The expected value of Y for each value of x is

E(Y|x) = β₀ + β₁x,

where the intercept β₀ and slope β₁ are unknown regression coefficients.
We assume Y can be described by the model

Y = β₀ + β₁x + ε     (Equation 11-2),

where ε is a random error with mean zero and (unknown) variance σ².
Simple Linear Regression
The random errors corresponding to different
observations are also assumed to be uncorrelated
random variables.
A regression model may be thought of as an empirical
model.
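As an illustration (not from the textbook), the following minimal Python sketch simulates observations from the model in Equation 11-2; the values β₀ = 3, β₁ = 2, σ = 1 and the x grid are assumptions chosen only for the example.

    # Simulate n observations from Y = b0 + b1*x + eps, eps ~ N(0, sigma^2)
    import random

    b0, b1, sigma, n = 3.0, 2.0, 1.0, 25      # assumed illustrative values
    x = [i / 2 for i in range(1, n + 1)]      # fixed regressor values
    y = [b0 + b1 * xi + random.gauss(0.0, sigma) for xi in x]
    # Each (x, y) pair scatters around the true line E(Y|x) = b0 + b1*x.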
Method of Least Squares
Suppose that we have n pairs of observations
(x₁, y₁), (x₂, y₂), …, (xₙ, yₙ). See Fig. 11-3.


The estimates of β₀ and β₁ should result in a line that is
(in some sense) a best fit to the data.
German scientist Karl Gauss (1777–1855) proposed
estimating the parameters β₀ and β₁ in Equation 11-2
to minimize the sum of squares of the vertical
deviations in Fig. 11-3.
This criterion for estimating the regression coefficients
is called the method of least squares.
Method of Least Squares
Using Equation 11-2 (Y = β₀ + β₁x + ε), we may express
the n observations in the sample as

yᵢ = β₀ + β₁xᵢ + εᵢ,   i = 1, 2, …, n     Equation (11-3)

and the sum of the squares of the deviations of the
observations from the true regression line is

L = Σᵢ₌₁ⁿ εᵢ² = Σᵢ₌₁ⁿ (yᵢ − β₀ − β₁xᵢ)²     Equation (11-4)
Method of Least Squares
The least squares estimators of β₀ and β₁, say β̂₀ and β̂₁,
must satisfy

∂L/∂β₀ |(β̂₀, β̂₁) = −2 Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ) = 0

∂L/∂β₁ |(β̂₀, β̂₁) = −2 Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ)xᵢ = 0

Equations (11-5)
Method of Least Squares
Simplifying Equations (11-5) gives

nβ̂₀ + β̂₁ Σᵢ₌₁ⁿ xᵢ = Σᵢ₌₁ⁿ yᵢ

β̂₀ Σᵢ₌₁ⁿ xᵢ + β̂₁ Σᵢ₌₁ⁿ xᵢ² = Σᵢ₌₁ⁿ yᵢxᵢ

Equations 11-6 (the least squares normal equations)
Least Squares Estimates

β̂₀ = ȳ − β̂₁x̄     Equation 11-7

β̂₁ = [Σᵢ₌₁ⁿ yᵢxᵢ − (Σᵢ₌₁ⁿ yᵢ)(Σᵢ₌₁ⁿ xᵢ)/n] / [Σᵢ₌₁ⁿ xᵢ² − (Σᵢ₌₁ⁿ xᵢ)²/n]     Equation 11-8

where ȳ = (1/n) Σᵢ₌₁ⁿ yᵢ and x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ.
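The sketch below (an illustration, not part of the original slides; the function name least_squares is ours) computes β̂₀ and β̂₁ directly from Equations 11-7 and 11-8.

    # Least squares estimates of the intercept and slope (Equations 11-7 and 11-8)
    def least_squares(x, y):
        n = len(x)
        sum_x, sum_y = sum(x), sum(y)
        sum_xy = sum(xi * yi for xi, yi in zip(x, y))
        sum_x2 = sum(xi * xi for xi in x)
        b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)  # Eq. 11-8
        b0 = sum_y / n - b1 * sum_x / n                                # Eq. 11-7: ybar - b1*xbar
        return b0, b1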
Least Squares Estimates
Notationally, it is occasionally convenient to give special
symbols to the numerator and denominator of Equation
11-8. Given data (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ), let

Sxx = Σᵢ₌₁ⁿ (xᵢ − x̄)² = Σᵢ₌₁ⁿ xᵢ² − (Σᵢ₌₁ⁿ xᵢ)²/n     Equation 11-10 (denominator)

Sxy = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) = Σᵢ₌₁ⁿ xᵢyᵢ − (Σᵢ₌₁ⁿ xᵢ)(Σᵢ₌₁ⁿ yᵢ)/n     Equation 11-11 (numerator)
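With this notation, Equation 11-8 can be written compactly as β̂₁ = Sxy / Sxx.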
Fitted or Estimated Regression Line

ŷ = β̂₀ + β̂₁x     Equation 11-9

Note that each pair of observations satisfies the relationship

yᵢ = β̂₀ + β̂₁xᵢ + eᵢ,   i = 1, 2, …, n

where eᵢ = yᵢ − ŷᵢ is called the residual.
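Continuing the earlier Python sketch (again only an illustration), fitted values and residuals for observed data x, y could be obtained as follows.

    b0, b1 = least_squares(x, y)                  # estimates from the sketch above
    y_hat = [b0 + b1 * xi for xi in x]            # fitted values, Equation 11-9
    e = [yi - yh for yi, yh in zip(y, y_hat)]     # residuals e_i = y_i - yhat_i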


Fitted or Estimated Regression Line
11.2/398 The grades of a class of 9 students on a midterm
report (x) and on the final examination (y) are as
follows:

x: 77 50 71 72 81 94 96 99 67
y: 82 66 78 34 47 85 99 99 68

(a) Estimate the linear regression line.
(b) Estimate the final examination grade of a student who
received a grade of 85 on the midterm report.
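As a check, the least_squares sketch above can be applied to these data; the values in the comments come from that computation and should be verified against the textbook solution.

    x = [77, 50, 71, 72, 81, 94, 96, 99, 67]
    y = [82, 66, 78, 34, 47, 85, 99, 99, 68]
    b0, b1 = least_squares(x, y)   # roughly b0 = 12.06, b1 = 0.777
    print(b0 + b1 * 85)            # predicted final grade for a midterm of 85, about 78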
Fitted or Estimated Regression Line
10-11/424 An article in the Journal of Monetary
Economics assesses the relationship between percentage
growth in wealth over a decade and a half of savings for
baby boomers of age 40 to 55 with these people's income
quartiles. The article presents a table showing five income
quartiles, and for each quartile there is a reported
percentage growth in wealth. The data are as follows.

Income quartile      1     2     3     4     5
Wealth growth (%)    17.3  23.6  40.2  45.8  56.8

Run a simple linear regression of these five pairs of
numbers and estimate a linear relationship between
income and percentage growth in wealth.
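Coding the quartiles as x = 1, …, 5, the same least_squares sketch can be reused; the figures in the comment are from that computation and should be verified.

    x = [1, 2, 3, 4, 5]
    y = [17.3, 23.6, 40.2, 45.8, 56.8]
    b0, b1 = least_squares(x, y)   # roughly b0 = 6.38, b1 = 10.12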
Fitted or Estimated Regression Line
10-12/424 A financial analyst at Goldman Sachs ran a
regression analysis of monthly returns on a certain
investment (y) versus returns for the same month on the
Standard & Poor's index (x). The regression results
included Sxx = 765.98 and Sxy = 934.49. Give the least-
squares estimate of the regression slope parameter.
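Assuming, as listed above, that the first value is Sxx and the second is Sxy, the estimate follows directly from the compact form of Equation 11-8: β̂₁ = Sxy/Sxx = 934.49/765.98 ≈ 1.22.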
Correlation
The degree of linear association between the two
random variables X and Y
Indicated by the correlation coefficient
ρ is the population (true) correlation coefficient,
estimated by r, the sample correlation coefficient or
Pearson product-moment correlation coefficient
ρ can take on any value from −1, through 0, to 1.
Possible Interpretations of ρ
1. When ρ is equal to zero, there is no correlation. That
is, there is no linear relationship between the two
random variables.
2. When ρ = 1, there is a perfect, positive, linear
relationship between the two variables. That is,
whenever one of the variables, X or Y, increases, the
other variable also increases; and whenever one of
the variables decreases, the other one must also
decrease.
3. When ρ = −1, there is a perfect negative linear
relationship between X and Y. When X or Y increases,
the other variable decreases; and when one
decreases, the other one must increase.

Possible Interpretations of ρ
4. When the value of ρ is between 0 and 1 in absolute
value, it reflects the relative strength of the linear
relationship between the two variables. For example,
a correlation of 0.90 implies a relatively strong,
positive relationship between the two variables. A
correlation of −0.70 implies a weaker, negative (as
indicated by the minus sign) linear relationship. A
correlation of ρ = 0.30 implies a relatively weak
(positive) linear relationship between X and Y.

Correlation
Sample Correlation Coefficient
The estimate of ρ
Also referred to as the Pearson product-moment
correlation coefficient

r = Sxy / √(Sxx · Syy), where Syy = Σᵢ₌₁ⁿ (yᵢ − ȳ)².
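A small Python sketch of this computation (illustrative only; the function name pearson_r is ours):

    from math import sqrt

    # Sample correlation coefficient r = Sxy / sqrt(Sxx * Syy)
    def pearson_r(x, y):
        n = len(x)
        xbar, ybar = sum(x) / n, sum(y) / n
        sxx = sum((xi - xbar) ** 2 for xi in x)
        syy = sum((yi - ybar) ** 2 for yi in y)
        sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
        return sxy / sqrt(sxx * syy)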


Sample Correlation Coefficient
Interpretations of r
±1.00          perfect positive (negative) correlation
±0.91 – ±0.99  very high positive (negative) correlation
±0.71 – ±0.90  high positive (negative) correlation
±0.51 – ±0.70  moderate positive (negative) correlation
±0.31 – ±0.50  low positive (negative) correlation
±0.01 – ±0.30  negligible positive (negative) correlation
0.00           no correlation
Coefficient of Determination
Denoted by r²
A descriptive measure of the strength of the regression
relationship, a measure of how well the regression line
fits the data
Ordinarily, we do not use r² for inference about ρ².
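In simple linear regression, r² is the square of the sample correlation coefficient, so in the notation above r² = Sxy²/(Sxx · Syy); multiplied by 100, it gives the percentage of the variability in y that is accounted for by the fitted line.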
Coefficient of Determination
11-13/400 A study of the amount of rainfall and the
quantity of air pollution removed produced the
following data:
Daily Rainfall, x (0.01 cm)    Particulate Removed, y (μg/m³)
4.3 126
4.5 121
5.9 116
5.6 118
6.1 114
5.2 118
3.8 132
2.1 141
7.5 108
Coefficient of Determination
11-13/400
(a) Find the equation of the regression line to predict the
particulate removed from the amount of daily rainfall.
(b) Estimate the amount of particulate removed when the
daily rainfall is x = 4.8 units.
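Using the earlier sketches purely as an illustration (the numbers in the comments come from that computation and should be verified against the textbook solution):

    x = [4.3, 4.5, 5.9, 5.6, 6.1, 5.2, 3.8, 2.1, 7.5]
    y = [126, 121, 116, 118, 114, 118, 132, 141, 108]
    b0, b1 = least_squares(x, y)   # roughly b0 = 153.2, b1 = -6.32
    print(b0 + b1 * 4.8)           # predicted particulate removed at x = 4.8, about 122.8
    print(pearson_r(x, y))         # sample correlation r, also needed for Exercise 11-43(a)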
Coefficient of Determination
11-43/436 With reference to Exercise 11.13 on page 400,
assume a bivariate normal distribution for X and Y.
(a) Calculate r.
(b) Test the null hypothesis that ρ = −0.5 against the
alternative that ρ < −0.5 at the 0.025 level of
significance.
(c) Determine the percentage of the variation in the
amount of particulate removed that is due to changes
in the daily amount of rainfall.

Do not answer questions (b) and (c).
Summary
A scatter diagram displays observations on two
variables, x and y. Each observation is represented by a
point showing its x-y coordinates. The scatter diagram
can be very effective in revealing the joint variability of
x and y or the nature of the relationship between them.
The method of least squares is used to estimate the
parameters of a system by minimizing the sum of the
squares of the differences between the observed
values and the fitted or predicted values from the
system.
Summary
Generally, correlation is a measure of the
interdependence among data. The concept may
include more than two variables. The term is most
commonly used in a narrow sense to express the
relationship between quantitative variables or ranks.
The correlation coefficient (r) is a dimensionless
measure of the linear association between two
variables, usually lying in the interval from −1 to +1,
with zero indicating the absence of correlation (but not
necessarily the independence of the two variables).
Summary
The coefficient of determination (r²) is often used to
judge the adequacy of a regression model. Its value gives
the proportion of the variability in the data that is
accounted for by the model (often quoted as 100r² percent).
References
Aczel-Sounderpandian. Business Statistics, 7th Ed. 2008.
Montgomery and Runger. Applied Statistics and Probability for Engineers, 5th Ed. 2011.
Walpole, et al. Probability and Statistics for Engineers and Scientists, 9th Ed. 2012, 2007, 2002.
