\section{Introduction}
\begin{itemize}
\item The term ``regression'' and the methods for investigating the relationships
between two variables may date back to about 100 years ago.
\item It was first introduced by Francis Galton, the renowned British
biologist, in 1908, when he was engaged in the study of heredity.
\item Def - Simple linear regression models the linear
relationship between two variables.
\item The simple regression model is often written in the following form:
\begin{equation}
y= \beta _{0}+\beta _{1}x+\varepsilon
\end{equation}
where $y$ is the dependent variable,\\
$\beta _{0}$ is the $y$-intercept,\\
$\beta _{1}$ is the gradient or the slope of the regression line,\\
$x$ is the independent variable,\\ and $\varepsilon$ is the random error.
\item A more general presentation of a regression model may be written as
\clearpage
\begin{equation}
y=E(y)+\varepsilon
\end{equation}
where $E(y)$ is the mathematical expectation of the response variable.\\
NOTE - When $E(y)$ is a linear combination of explanatory variables
$x_{1},x_{2},x_{3},\ldots,x_{k}$, the regression is linear regression.\\
If $k=1$, the regression is simple linear regression.
\item The typical experiment for the simple linear regression is that we
observe $n$ pairs of data $(x_{1},y_{1}),(x_{2},y_{2}),\ldots,(x_{n},y_{n})$ from a
scientific experiment, and the model in terms of the $n$ pairs of data can be written
as
\begin{equation}
y_{i}=\beta _{0}+\beta _{1}x_{i}+\varepsilon_{i}
\end{equation}
for $i=1,2,\ldots,n$,\\
with $E(\varepsilon_{i})=0$,\\
a constant variance $Var(\varepsilon _{i})=\sigma ^{2}$, and all
$\varepsilon_{i}$'s independent (see the simulation sketch after this list).\\
\item Note that the actual value of $\sigma^{2}$ is usually unknown.
\end{itemize}
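As a minimal illustrative sketch (not part of the original notes), the following
Python/NumPy code simulates $n$ pairs $(x_{i},y_{i})$ from the model above; the
parameter values and variable names are hypothetical choices made only for
illustration.
\begin{verbatim}
# Sketch: simulate data from y_i = beta0 + beta1 * x_i + eps_i,
# with E(eps_i) = 0, Var(eps_i) = sigma^2, eps_i independent.
import numpy as np

rng = np.random.default_rng(0)
n = 50
beta0, beta1, sigma = 2.0, 0.5, 1.0        # hypothetical true parameters
x = rng.uniform(0.0, 10.0, size=n)         # independent variable
eps = rng.normal(0.0, sigma, size=n)       # random errors
y = beta0 + beta1 * x + eps                # dependent variable
\end{verbatim}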
Mathematically, the least squares estimates of the simple linear regression are
given by solving the following system:
\begin{align}
\frac{\partial }{\partial \beta_{0}}\sum_{i=1}^{n}[y_{i}-
(\beta_{0}+\beta_{1}x_{i})]^2=0\\
\frac{\partial }{\partial \beta_{1}}\sum_{i=1}^{n}[y_{i}-
(\beta_{0}+\beta_{1}x_{i})]^2=0
\end{align}
Suppose that $b_{0}$ and $b_{1}$ are the solutions of the above system. We can
then describe the relationship between $x$ and $y$ by the regression line
$\hat{y}=b_{0}+b_{1}x$, which is called the fitted regression line by convention.\\
It is more convenient to solve for $b_{0}$ and $b_{1}$ using the centralized
linear model:\\
$y_{i}=\beta_{0}^{*}+\beta_{1}(x_{i}-\bar{x})+\varepsilon_{i}$,\\
where $\beta_{0}=\beta_{0}^{*}-\beta_{1}\bar{x}$.\\
We need to solve
\begin{align}
\frac{\partial }{\partial \beta_{0}^{*}}\sum_{i=1}^{n}[y_{i}-
(\beta_{0}^{*}+\beta_{1}(x_{i}-\bar{x}))]^2=0\\
\frac{\partial }{\partial \beta_{1}}\sum_{i=1}^{n}[y_{i}-
(\beta_{0}^{*}+\beta_{1}(x_{i}-\bar{x}))]^2=0
\end{align}
Taking the partial derivatives with respect to $\beta_{0}^{*}$ and
$\beta_{1}$, we have
\begin{align}
\sum_{i=1}^{n}[y_{i}-(\beta_{0}^{*}+\beta_{1}(x_{i}-\bar{x}))]=0\\
\sum_{i=1}^{n}[y_{i}-(\beta_{0}^{*}+\beta_{1}(x_{i}-\bar{x}))]
(x_{i}-\bar{x})=0
\end{align}
Note that
\begin{equation}
\sum_{i=1}^{n}y_{i}=n\beta_{0}^{*}+\beta_{1}\sum_{i=1}^{n}
(x_{i}-\bar{x})=n\beta_{0}^{*},
\end{equation}
since $\sum_{i=1}^{n}(x_{i}-\bar{x})=0$. Therefore we have
\begin{equation}
\beta_{0}^{*}=\frac{1}{n}\sum_{i=1}^{n} y_{i}=\bar{y}
\end{equation}
Now, substituting $\beta_{0}^{*}$ by $\bar{y}$ in the second normal equation above, we get
\begin{equation}
\sum_{i=1}^{n}[y_{i}-(\bar{y}+\beta_{1}(x_{i}-\bar{x}))]
(x_{i}-\bar{x})=0
\end{equation}
Let $b_{0}$ and $b_{1}$ be the solutions of the above system. Now it is easy to
see that
\begin{equation}
b_{1}=\frac{\sum_{i=1}^{n}(y_{i}-\bar{y})(x_{i}-\bar{x})}
{\sum_{i=1}^{n}(x_{i}-\bar{x})^2}=\frac{S_{xy}}{{S_{xx}}}
\end{equation}
and,
\begin{equation}
b_{0}=b_{0}^{*}-b_{1}\bar{x}=\bar{y}-b_{1}\bar{x}
\end{equation}
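The closed-form solutions above can be computed directly. A minimal sketch in
Python/NumPy (reusing the hypothetical simulated data from the earlier sketch) is:
\begin{verbatim}
# Sketch: least squares estimates b1 = Sxy/Sxx and b0 = ybar - b1*xbar.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=50)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=50)

xbar, ybar = x.mean(), y.mean()
Sxy = np.sum((x - xbar) * (y - ybar))
Sxx = np.sum((x - xbar) ** 2)
b1 = Sxy / Sxx                 # slope estimate
b0 = ybar - b1 * xbar          # intercept estimate, b0 = b0* - b1*xbar with b0* = ybar

# cross-check against NumPy's built-in degree-1 least squares fit
slope, intercept = np.polyfit(x, y, deg=1)
assert np.allclose([b1, b0], [slope, intercept])
\end{verbatim}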
\begin{itemize}
\item The fitted value of the simple linear regression is defined as
$\hat{y}_{i}=b_{0}+b_{1}x_{i}$,\\
and
$e_{i}=y_{i}-\hat{y}_{i}$\\
is referred to as the regression residual (see the sketch after this list).
\item Regression error is the amount by which an observation differs from
its expected value.
\end{itemize}
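A short sketch of the fitted values and residuals, continuing the hypothetical
simulated data used above:
\begin{verbatim}
# Sketch: fitted values y_hat_i = b0 + b1*x_i and residuals e_i = y_i - y_hat_i.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=50)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=50)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x       # fitted values
e = y - y_hat             # regression residuals
\end{verbatim}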
\section{Statistical Properties of the Least Squares Estimation}
We first discuss statistical properties without the distributional assumption
on the error term, but we shall assume that\\
$E(\varepsilon_{i})=0$, $Var(\varepsilon_{i})=\sigma^2$, and the $\varepsilon_{i}$'s
for $i=1,2,3,\ldots,n$ are independent.\\
\clearpage
\textbf{Theorem 4:} The least squares estimator $b_{1}$ and $\bar{y}$ are
uncorrelated. Under the normality assumption of $y_{i}$ for $i=1,2,3,\ldots,n$,
$b_{1}$ and $\bar{y}$ are normally
distributed and independent.\\
\textbf{Proof:}\\
\begin{align*}
Cov(b_{1},\bar{y})&=Cov\left(\frac{s_{xy}}{s_{xx}},\bar{y}\right)\\
&=\frac{1}{s_{xx}}Cov(s_{xy},\bar{y})\\
&=\frac{1}{ns_{xx}}Cov\left(\sum_{i=1}^{n}(x_{i}-\bar{x})(y_{i}-\bar{y}),\bar{y}\right)\\
&=\frac{1}{ns_{xx}}Cov\left(\sum_{i=1}^{n}(x_{i}-\bar{x})y_{i},\bar{y}\right)\\
&=\frac{1}{n^2s_{xx}}Cov\left(\sum_{i=1}^{n}(x_{i}-
\bar{x})y_{i},\sum_{j=1}^{n}y_{j}\right)\\
&=\frac{1}{n^2s_{xx}}\sum_{i=1}^{n}\sum_{j=1}^{n}(x_{i}-\bar{x})Cov(y_{i},y_{j})
\end{align*}
Note that $E(\varepsilon_{i})=0$ and the $\varepsilon_{i}$'s are independent, so we
can write
\begin{align*}
Cov(y_{i},y_{j})&=E[(y_{i}-Ey_{i})(y_{j}-Ey_{j})]\\
&=E(\varepsilon_{i}\varepsilon_{j})=\left\{\begin{matrix}
\sigma^2 &\text{if} &i= j \\
0 &\text{if} &i\neq j
\end{matrix}\right.
\end{align*}
Thus, we conclude that
\begin{align*}
Cov(b_{1},\bar{y})= \frac{1}{n^2s_{xx}}\sum_{i=1}^{n}(x_{i}-
\bar{x})\sigma^2=0,
\end{align*}
since $\sum_{i=1}^{n}(x_{i}-\bar{x})=0$.\\
Recall that zero correlation is equivalent to the independence between two
normal variables.\\
Thus, we conclude that $b_{1}$ and $\bar{y}$ are independent.\\
Hence proved.\\
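The statement of Theorem 4 can also be checked numerically. The following sketch
(illustrative only, with hypothetical parameter values) repeatedly simulates data
on a fixed design and estimates the covariance between $b_{1}$ and $\bar{y}$,
which should be close to zero.
\begin{verbatim}
# Sketch: Monte Carlo check that Cov(b1, ybar) is approximately 0.
import numpy as np

rng = np.random.default_rng(1)
n, beta0, beta1, sigma = 30, 2.0, 0.5, 1.0
x = rng.uniform(0.0, 10.0, size=n)     # fixed design reused in every replicate

b1_vals, ybar_vals = [], []
for _ in range(20000):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=n)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b1_vals.append(b1)
    ybar_vals.append(y.mean())

print(np.cov(b1_vals, ybar_vals)[0, 1])   # sample covariance, close to 0
\end{verbatim}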
\begin{itemize}
\item\textbf{Now we discuss how to estimate the variance of the error term in the
simple linear regression model:} \\
$\Rightarrow$ Let $y_{i}$ be the observed response variable, and
$\hat{y}_{i}=b_{0}+b_{1}x_{i}$ the fitted value of the response.\\
We observe $y_{i}$ and can compute $\hat{y}_{i}$, but the true error
$\varepsilon_{i}$ in the model is not observable, so we would like to estimate it.\\
The empirical version of the error $\varepsilon_{i}$ is the residual
$e_{i}=y_{i}-\hat{y}_{i}$.\\
The estimation of the error variance based on the residuals is
\begin{align*}
s^2=\frac{1}{n-2}\sum_{i=1}^{n}(y_{i}-\hat{y}_{i})^2.
\end{align*}
The denominator is $n-2$; this makes $s^2$ an unbiased estimator of
the error variance $\sigma^2$ (a numerical sketch follows this list).
\end{itemize}
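A minimal sketch of computing $s^2$, again on hypothetical simulated data:
\begin{verbatim}
# Sketch: s^2 = SSE / (n - 2), the unbiased estimate of the error variance.
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0.0, 10.0, size=n)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=n)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

s2 = np.sum(resid ** 2) / (n - 2)     # estimate of sigma^2
\end{verbatim}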
\begin{itemize}
\item The unbiasedness of the estimator $s^2$ for the simple linear regression can be
shown by the following derivations.\\
$\Rightarrow$
\begin{align*}
y_{i}-\hat{y}_{i}&=y_{i}-b_{0}-b_{1}x_{i}=y_{i}-(\bar{y}-b_{1}\bar{x})-
b_{1}x_{i}\\
&=(y_{i}-\bar{y})-b_{1}(x_{i}-\bar{x})
\end{align*}
It follows that
\begin{align*}
\sum_{i=1}^{n}(y_{i}-\hat{y}_{i})=\sum_{i=1}^{n}(y_{i}-\bar{y})-
b_{1}\sum_{i=1}^{n}(x_{i}-\bar{x})=0
\end{align*}
and
\begin{align*}
[y_{i}-\hat{y}_{i}]x_{i}=[(y_{i}-\bar{y})-b_{1}(x_{i}-\bar{x})]x_{i}.
\end{align*}
Hence, using the previous identity (which implies
$\sum_{i=1}^{n}[(y_{i}-\bar{y})-b_{1}(x_{i}-\bar{x})]\bar{x}=0$), we have
\begin{align*}
\sum_{i=1}^{n}(y_{i}-\hat{y}_{i})x_{i}&=\sum_{i=1}^{n}[(y_{i}-\bar{y})-
b_{1}(x_{i}-\bar{x})]x_{i}\\
&=\sum_{i=1}^{n}[(y_{i}-\bar{y})-b_{1}(x_{i}-\bar{x})](x_{i}-\bar{x})\\
&=\sum_{i=1}^{n}(y_{i}-\bar{y})(x_{i}-\bar{x})-b_{1}\sum_{i=1}^{n}
(x_{i}-\bar{x})^2\\
&=n(s_{xy}-b_{1}s_{xx})\\
&=n\left(s_{xy}-s_{xx}\frac{s_{xy}}{s_{xx}}\right)=0.
\end{align*}
That is, $\sum_{i=1}^{n}(y_{i}-\hat{y}_{i})x_{i}=0$.
\item Now, to show that $s^2$ is an unbiased estimate of the error variance,
first we note that
\begin{align*}
(y_{i}-\hat{y}_{i})^2=[(y_{i}-\bar{y})-b_{1}(x_{i}-\bar{x})]^2,
\end{align*}
so that
\begin{align*}
\sum_{i=1}^{n}(y_{i}-\hat{y}_{i})^2&=\sum_{i=1}^{n}[(y_{i}-\bar{y})-
b_{1}(x_{i}-\bar{x})]^2\\
&=\sum_{i=1}^{n}(y_{i}-\bar{y})^2-2b_{1}\sum_{i=1}^{n}(x_{i}-\bar{x})
(y_{i}-\bar{y})+b_{1}^2\sum_{i=1}^{n}(x_{i}-\bar{x})^2\\
&=\sum_{i=1}^{n}(y_{i}-\bar{y})^2-2nb_{1}s_{xy}+nb_{1}^2s_{xx}\\
&=\sum_{i=1}^{n}(y_{i}-\bar{y})^2-2n\frac{s_{xy}}{s_{xx}}s_{xy}
+n\frac{s^2_{xy}}{s^2_{xx}}s_{xx}\\
&=\sum_{i=1}^{n}(y_{i}-\bar{y})^2-n\frac{s^2_{xy}}{s_{xx}}.
\end{align*}
Since
\begin{align*}
(y_{i}-\bar{y})^2&=[\beta_{1}(x_{i}-\bar{x})+
(\varepsilon_{i}-\bar{\varepsilon})]^2\\
&=\beta_{1}^2(x_{i}-\bar{x})^2+(\varepsilon_{i}-\bar{\varepsilon})^2+2\beta_{1}
(x_{i}-\bar{x})(\varepsilon_{i}-\bar{\varepsilon}),
\end{align*}
we have
\begin{align*}
E(y_{i}-\bar{y})^2&=\beta_{1}^2(x_{i}-
\bar{x})^2+E(\varepsilon_{i}-\bar{\varepsilon})^2\\
&=\beta_{1}^2(x_{i}-\bar{x})^2+\frac{n-1}
{n}\sigma^2
\end{align*}
and
\begin{align*}
\sum_{i=1}^{n}E(y_{i}-\bar{y})^2&=n\beta^2_{1}s_{xx}
+\sum_{i=1}^{n}\frac{n-1}{n}\sigma^2\\
&=n\beta^2_{1}s_{xx}+(n-1)\sigma^2.
\end{align*}
Furthermore, we have
\begin{align*}
E(s_{xy})&=E\left(\frac{1}{n}\sum_{i=1}^{n}(x_{i}-\bar{x})(y_{i}-\bar{y})\right)\\
&=\frac{1}{n}E\sum_{i=1}^{n}(x_{i}-\bar{x})y_{i}\\
&=\frac{1}{n}\sum_{i=1}^{n}(x_{i}-\bar{x})E(y_{i})\\
&=\frac{1}{n}\sum_{i=1}^{n}(x_{i}-\bar{x})(\beta_{0}+\beta_{1}x_{i})\\
&=\frac{1}{n}\beta_{1}\sum_{i=1}^{n}(x_{i}-\bar{x})x_{i}\\
&=\frac{1}{n}\beta_{1}\sum_{i=1}^{n}(x_{i}-\bar{x})^2=\beta_{1}s_{xx}.
\end{align*}
\end{itemize}
\clearpage
and
\begin{align*}
Var(s_{xy})&=Var\left(\frac{1}{n}\sum_{i=1}^{n}(x_{i}-\bar{x})y_{i}\right)\\
&=\frac{1}{n^2}\sum_{i=1}^{n}(x_{i}-\bar{x})^2Var(y_{i})=\frac{1}
{n}s_{xx}\sigma^2.
\end{align*}
Thus, we can write
\begin{align*}
E(s^2_{xy})=Var(s_{xy})+[E(s_{xy})]^2=\frac{1}
{n}s_{xx}\sigma^2+\beta^2_{1}s^2_{xx}
\end{align*}
and
\begin{align*}
E\left(\frac{ns^2_{xy}}{s_{xx}}\right)=\sigma^2+n\beta^2_{1}s_{xx}.
\end{align*}
Finally, $E\sum_{i=1}^{n}(y_{i}-\hat{y}_{i})^2$ is given by
\begin{align*}
E\sum_{i=1}^{n}(y_{i}-\hat{y}_{i})^2&=E\sum_{i=1}^{n}(y_{i}-\bar{y})^2-
E\left(\frac{ns^2_{xy}}{s_{xx}}\right)\\
&=n\beta^2_{1}s_{xx}+(n-1)\sigma^2-
n\beta^2_{1}s_{xx}-\sigma^2\\
&=(n-2)\sigma^2.
\end{align*}
In other words, we have proved that
\begin{align*}
E(s^2)=E\left(\frac{1}{n-2}\sum_{i=1}^{n}(y_{i}-\hat{y}_{i})^2\right)=\sigma^2.
\end{align*}
Thus, $s^2$, the estimation of the error variance, is an unbiased estimator of the
error variance $\sigma^2$ in the simple linear regression.\\
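The unbiasedness of $s^2$ can also be illustrated numerically. The sketch below
(hypothetical parameter values, for illustration only) averages $s^2$ over many
simulated samples and compares the average with the true $\sigma^2$.
\begin{verbatim}
# Sketch: Monte Carlo check that E(s^2) = sigma^2.
import numpy as np

rng = np.random.default_rng(2)
n, beta0, beta1, sigma = 20, 2.0, 0.5, 1.5
x = rng.uniform(0.0, 10.0, size=n)

s2_vals = []
for _ in range(20000):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=n)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    s2_vals.append(np.sum((y - b0 - b1 * x) ** 2) / (n - 2))

print(np.mean(s2_vals), sigma ** 2)   # the two values should be close
\end{verbatim}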
\begin{itemize}
\item Another view of choosing n-2 is that in the simple linear regression
model there are n observations and two restrictions on these observations:
\end{itemize}
\clearpage
\begin{align*}
1.\quad & \sum_{i=1}^{n}(y_{i}-\hat{y}_{i})=0\\
2.\quad & \sum_{i=1}^{n}(y_{i}-\hat{y}_{i})x_{i}=0
\end{align*}
Hence the error variance estimation has $n-2$ degrees of freedom, which is the
total number of observations minus the total number of parameters in the model.\\
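Both restrictions are easy to verify numerically on any data set; the sketch below
uses the same hypothetical simulated data as the earlier sketches.
\begin{verbatim}
# Sketch: verify the two restrictions sum(e_i) = 0 and sum(e_i * x_i) = 0.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=50)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=50)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
e = y - (b0 + b1 * x)

print(np.sum(e))       # restriction 1: approximately 0 (up to rounding error)
print(np.sum(e * x))   # restriction 2: approximately 0 (up to rounding error)
\end{verbatim}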
Under the normality assumption, the log-likelihood of the simple linear regression
model is
\begin{align*}
\log (L)=-\frac{n}{2}\log(2\pi\sigma^2)-\frac{1}
{2\sigma^2}\sum_{i=1}^{n}(y_{i}-\beta_{0}-\beta_{1}x_{i})^2.
\end{align*}
Now, take the partial derivative with respect to $\sigma^2$ in the $\log$
likelihood function $\log(L)$, evaluate it at the least squares estimates $b_{0}$
and $b_{1}$ (which are also the MLEs of $\beta_{0}$ and $\beta_{1}$), and set it to
zero:
\begin{align*}
\frac{\partial\log (L)}{\partial\sigma^2}=\frac{-n}{2\sigma^2}+\frac{1}
{2\sigma^4}\sum_{i=1}^{n}(y_{i}-\hat{y}_{i})^2=0.
\end{align*}
Solving, the MLE of $\sigma^2$ is $\hat{\sigma}^2=\frac{1}{n}\sum_{i=1}^{n}(y_{i}-
\hat{y}_{i})^2$.\\
Note that it is a biased estimate of $\sigma^2$,\\
since we know that $ s^2=\frac{1}{n-2}\sum_{i=1}^{n}(y_{i}-\hat{y}_{i})^2$ is an
unbiased estimate of the error variance $\sigma^2$.\\
Note that $\hat{\sigma}^2$ is an asymptotically unbiased estimate of
$\sigma^2$, which coincides with the classical theory of the MLE.
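A small numerical sketch (hypothetical values, for illustration only) of the bias
of the MLE $\hat{\sigma}^2$ versus the unbiased $s^2$:
\begin{verbatim}
# Sketch: compare the biased MLE (divide by n) with the unbiased s^2 (divide by n-2).
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 15, 1.0
x = rng.uniform(0.0, 10.0, size=n)

mle_vals, s2_vals = [], []
for _ in range(20000):
    y = 2.0 + 0.5 * x + rng.normal(0.0, sigma, size=n)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    sse = np.sum((y - b0 - b1 * x) ** 2)
    mle_vals.append(sse / n)          # MLE: biased downward for small n
    s2_vals.append(sse / (n - 2))     # s^2: unbiased

print(np.mean(mle_vals), np.mean(s2_vals), sigma ** 2)
\end{verbatim}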