PATTERN RECOGNITION AND MACHINE LEARNING
CHAPTER 3: LINEAR MODELS FOR REGRESSION
Purposes of linear regression
Specifying the relationship between independent variables (input vector x) and dependent variables (target values t)
Assumption: targets are noisy realizations of an underlying functional relationship
Goal: model the predictive distribution p(t|x)
A general linear model (GLM) explains this relationship in terms of a linear combination of the IVs plus error:

$t_n = y(\mathbf{x}_n, \mathbf{w}) + \epsilon_n$
GLM: Illustration of matrix form

$\mathbf{y} = X\boldsymbol{\beta} + \mathbf{e}$

with $\mathbf{y}$: $N \times 1$, $X$: $N \times p$, $\boldsymbol{\beta}$: $p \times 1$, $\mathbf{e}$: $N \times 1$

$\mathbf{e} \sim N(\mathbf{0}, \sigma^2 I)$

N: number of scans
p: number of regressors
Parameter estimation
e X y + = |
= +
e
(
2
1
|
|
Ordinary least
squares estimation
(OLS) (assuming i.i.d.
error):
y X X X
T T 1
) (
= |
Objective:
estimate to
minimize
=
N
t
t
e
1
2
y X
y
e
Design space
defined by X
x
1
x
2
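A minimal sketch of the OLS estimate above in NumPy; the data, the true coefficient vector, and the noise level are illustrative, not from the source:

```python
import numpy as np

# Simulate a small GLM: y = X @ beta + e, with e ~ N(0, sigma^2 I).
# N (rows of X) and p (columns) follow the slide's notation.
rng = np.random.default_rng(0)
N, p = 100, 3
X = rng.normal(size=(N, p))
beta_true = np.array([1.5, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=N)

# OLS estimate beta_hat = (X^T X)^{-1} X^T y;
# np.linalg.lstsq is the numerically stable way to evaluate it.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # close to beta_true
```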
GLM: A geometric perspective

$\hat{\mathbf{y}} = X\hat{\boldsymbol{\beta}}$, with $\hat{\boldsymbol{\beta}} = (X^T X)^{-1} X^T \mathbf{y}$

Projection matrix P: $\hat{\mathbf{y}} = P\mathbf{y}$, where $P = X (X^T X)^{-1} X^T$
Residual-forming matrix R: $\mathbf{e} = R\mathbf{y}$, where $R = I - P$

OLS estimates
The LS estimate of the data $\hat{\mathbf{y}}$ is the projection of the data vector onto the design matrix space.
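The geometric claims above can be checked numerically; this sketch builds P and R for an arbitrary synthetic design matrix (dimensions are illustrative):

```python
import numpy as np

# Build P = X (X^T X)^{-1} X^T and R = I - P, then verify that
# P projects onto the column space of X and R yields residuals
# orthogonal to that space.
rng = np.random.default_rng(1)
N, p = 50, 4
X = rng.normal(size=(N, p))
y = rng.normal(size=N)

P = X @ np.linalg.inv(X.T @ X) @ X.T   # projection matrix
R = np.eye(N) - P                      # residual-forming matrix

y_hat = P @ y   # fitted values: projection of y onto design space
e = R @ y       # residuals

# P is idempotent (P @ P == P), and residuals are orthogonal to X.
print(np.allclose(P @ P, P), np.allclose(X.T @ e, 0))
```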
Linear Basis Function Models: Polynomial Curve Fitting

$\phi_j(x)$: basis functions
Types of nonlinear basis functions: polynomial, Gaussian, sigmoidal

Estimation of w: classical (frequentist) techniques
Using an estimator to determine a specific value for the parameter vector w
e.g. sum-of-squares error function (SSQ): minimizing the function with respect to w yields $\mathbf{w}^*$:
$t = y(x, \mathbf{w}^*)$
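A short sketch of the three basis-function families named above and the design matrix built from them; the centers and width parameter are illustrative choices, not from the source:

```python
import numpy as np

# Three common nonlinear basis functions for linear models.
def polynomial(x, j):
    return x ** j

def gaussian(x, mu, s=0.2):
    return np.exp(-((x - mu) ** 2) / (2 * s ** 2))

def sigmoidal(x, mu, s=0.2):
    return 1.0 / (1.0 + np.exp(-(x - mu) / s))

x = np.linspace(0, 1, 20)
centers = np.linspace(0, 1, 5)

# Design matrix Phi: one column per basis function, one row per input.
Phi_poly = np.column_stack([polynomial(x, j) for j in range(5)])
Phi_gauss = np.column_stack([gaussian(x, mu) for mu in centers])
Phi_sig = np.column_stack([sigmoidal(x, mu) for mu in centers])
print(Phi_poly.shape, Phi_gauss.shape, Phi_sig.shape)  # each (20, 5)
```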
Reducing overfitting: Regularized Least Squares

Control of overfitting: regularization of the error function

$E_D(\mathbf{w}) + \lambda E_W(\mathbf{w})$
(data-dependent error + regularization term; $\lambda$: regularization coefficient)

With the quadratic regularizer $E_W(\mathbf{w}) = \frac{1}{2}\mathbf{w}^T\mathbf{w}$, the total error is minimized by

$\mathbf{w} = (\lambda I + \Phi^T \Phi)^{-1} \Phi^T \mathbf{t}$

i.e. the OLS solution plus the extension $(\lambda/2)\,\mathbf{w}^T\mathbf{w}$ to the error function.
Regularized Least Squares

Applying a more general regularizer:

$\frac{1}{2}\sum_{n=1}^{N}\{t_n - \mathbf{w}^T\boldsymbol{\phi}(x_n)\}^2 + \frac{\lambda}{2}\sum_{j=1}^{M}|w_j|^q$

(q = 2 recovers the quadratic regularizer)

How to choose an appropriate value of $\lambda$?
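A minimal sketch of the q = 2 (ridge) case, using the closed form $\mathbf{w} = (\lambda I + \Phi^T\Phi)^{-1}\Phi^T\mathbf{t}$; the synthetic sinusoidal data and the specific $\lambda$ values are illustrative:

```python
import numpy as np

# Synthetic noisy data from sin(2*pi*x), as in the curve-fitting example.
rng = np.random.default_rng(2)
x = np.linspace(0, 1, 30)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

# 6th-order polynomial basis.
Phi = np.column_stack([x ** j for j in range(7)])

def ridge(Phi, t, lam):
    # w = (lambda*I + Phi^T Phi)^{-1} Phi^T t
    M = Phi.shape[1]
    return np.linalg.solve(lam * np.eye(M) + Phi.T @ Phi, Phi.T @ t)

w_unreg = ridge(Phi, t, 0.0)   # lambda -> 0 recovers OLS
w_reg = ridge(Phi, t, 0.1)     # regularized solution

# Regularization shrinks the weight vector toward zero.
print(np.linalg.norm(w_reg) < np.linalg.norm(w_unreg))  # True
```

The practical question on the slide (choosing $\lambda$) is typically answered by evaluating candidate values on held-out validation data.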
Classical techniques: Maximum Likelihood and Least Squares

Assume observations from a deterministic function with added Gaussian noise:

$t = y(x, \mathbf{w}) + \epsilon$, with $p(\epsilon) = N(\epsilon\,|\,0, \beta^{-1})$

which implies a Gaussian conditional distribution:

$p(t\,|\,x, \mathbf{w}, \beta) = N(t\,|\,y(x, \mathbf{w}), \beta^{-1})$

Given observed inputs $X = \{x_1, \ldots, x_N\}$ (independently drawn) and targets $\mathbf{t} = (t_1, \ldots, t_N)^T$, we obtain the likelihood function

$p(\mathbf{t}\,|\,X, \mathbf{w}, \beta) = \prod_{n=1}^{N} N(t_n\,|\,\mathbf{w}^T\boldsymbol{\phi}(x_n), \beta^{-1})$

where $y(x, \mathbf{w}) = \mathbf{w}^T\boldsymbol{\phi}(x)$.
Classical techniques: Maximum Likelihood and Least Squares

Taking the logarithm, we get

$\ln p(\mathbf{t}\,|\,\mathbf{w}, \beta) = \frac{N}{2}\ln\beta - \frac{N}{2}\ln(2\pi) - \beta E_D(\mathbf{w})$

where $E_D(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\{t_n - \mathbf{w}^T\boldsymbol{\phi}(x_n)\}^2$.

Computing the gradient with respect to $\mathbf{w}$ and setting it to zero yields

$\beta\sum_{n=1}^{N}\{t_n - \mathbf{w}^T\boldsymbol{\phi}(x_n)\}\boldsymbol{\phi}(x_n)^T = 0$

Solving for $\mathbf{w}$, we get the OLS estimate

$\mathbf{w}_{\mathrm{ML}} = (\Phi^T\Phi)^{-1}\Phi^T\mathbf{t}$
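The equivalence can be checked numerically: gradient ascent on the Gaussian log-likelihood should converge to the closed-form OLS estimate. A sketch with synthetic data (the basis, the true line, and the value of $\beta$ are illustrative assumptions):

```python
import numpy as np

# Synthetic targets from a line, with Gaussian noise.
rng = np.random.default_rng(3)
x = np.linspace(0, 1, 40)
t = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=x.size)
Phi = np.column_stack([np.ones_like(x), x])  # basis: 1, x
beta = 100.0                                 # noise precision, treated as known

# Closed form: w_ML = (Phi^T Phi)^{-1} Phi^T t.
w_closed = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)

# Gradient ascent on ln p(t|w, beta); the gradient is
# beta * sum_n (t_n - w^T phi(x_n)) phi(x_n).
w = np.zeros(2)
lr = 3e-4
for _ in range(2000):
    w += lr * beta * Phi.T @ (t - Phi @ w)

print(np.allclose(w, w_closed, atol=1e-6))  # True
```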
Conclusion: Frequentist approach & outlook on Bayesian methods

Frequentist approach:
Seeking a point estimate of the unknown parameter w by maximizing the likelihood
Risk: inappropriately complex models and overfitting
(solution: regularization, but the number/type of basis functions remains important)

Bayesian approach:
Characterizing the uncertainty in w through a probability distribution p(w)
Averaging over multiple posterior parameter distributions [p(w|t)]
Bayesian linear regression: Parameter distribution

$\beta$: assumed known constant
Likelihood function p(t|w) with Gaussian noise: exponential of a quadratic function of w
Gaussian prior: $p(\mathbf{w}) = N(\mathbf{w}\,|\,\mathbf{m}_0, S_0)$
Gaussian posterior: $p(\mathbf{w}\,|\,\mathbf{t}) = [p(\mathbf{t}\,|\,\mathbf{w})\,p(\mathbf{w})]\,/\,p(\mathbf{t})$

$p(\mathbf{w}\,|\,\mathbf{t}) = N(\mathbf{w}\,|\,\mathbf{m}_N, S_N)$
$\mathbf{m}_N = S_N (S_0^{-1}\mathbf{m}_0 + \beta\Phi^T\mathbf{t})$
$S_N^{-1} = S_0^{-1} + \beta\Phi^T\Phi$
Bayesian Linear Regression: Common cases

A common choice for the prior is a zero-mean isotropic Gaussian:

$p(\mathbf{w}\,|\,\alpha) = N(\mathbf{w}\,|\,\mathbf{0}, \alpha^{-1}I)$

for which

$\mathbf{m}_N = \beta S_N \Phi^T\mathbf{t}$
$S_N^{-1} = \alpha I + \beta\Phi^T\Phi$
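A minimal sketch of this posterior update with the zero-mean isotropic prior; the data and the values of $\alpha$ and $\beta$ are illustrative, not from the source:

```python
import numpy as np

# Synthetic targets from a line, with Gaussian noise.
rng = np.random.default_rng(4)
x = np.linspace(0, 1, 25)
t = 0.5 + 1.5 * x + rng.normal(scale=0.2, size=x.size)
Phi = np.column_stack([np.ones_like(x), x])  # basis: 1, x

alpha, beta = 2.0, 25.0   # prior precision, noise precision

# S_N^{-1} = alpha*I + beta * Phi^T Phi ;  m_N = beta * S_N Phi^T t
S_N_inv = alpha * np.eye(2) + beta * Phi.T @ Phi
S_N = np.linalg.inv(S_N_inv)
m_N = beta * S_N @ Phi.T @ t

print(m_N)           # posterior mean (shrunk toward 0 relative to OLS)
print(np.diag(S_N))  # posterior variances, smaller than the prior's 1/alpha
```

Note that observing data can only tighten the posterior: $S_N^{-1} \succeq \alpha I$, so each posterior variance is at most the prior variance $1/\alpha$.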