
Cybernetics and Systems Analysis, Vol. 45, No. 3, 2009

LINEAR REGRESSION WITH NONSTATIONARY VARIABLES
AND CONSTRAINTS ON ITS PARAMETERS

UDC 519.237.5

A. S. Korkhin

The problem of estimating parameters of a linear regression with allowance for inequality constraints
on the parameters is considered in the special case when its variables have a trend. A parameter
estimation algorithm is described. The consistency of parameter estimates is proved and their
asymptotic distribution is found. Consistent estimates are proposed for the mean-square error matrix of
estimates of regression parameters and noise dispersion under rather general assumptions on the law
of noise distribution.
Keywords: linear regression, parameter, inequality constraint, estimation, mean-square error matrix,
noise dispersion.

In the present article, the results obtained in [1, 2] are applied to a regression with regressors having trends. The regression being considered is of the form $y_t = x_t' a^0 + \varepsilon_t$, $t = \overline{1,T}$, where $y_t \in R^1$, $x_t \in R^n$, $a^0 \in R^n$, $\varepsilon_t \in R^1$ is noise, and the symbol $'$ denotes transposition.
We estimate the parameter $a^0$ by solving the problem
$$S_T(a) = \frac{1}{2}\sum_{t=1}^{T}(y_t - x_t' a)^2 \to \min, \qquad g_i(a) \le 0, \ i \in I, \tag{1}$$
where the values of $x_t, y_t$, $t = \overline{1,T}$, and the functions $g_i(a)$, $i \in I$, are known.
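For illustration only (not part of the original paper), here is a minimal numerical sketch of problem (1): the inequality-constrained least-squares problem can be handed to a general convex solver. The sketch uses scipy's SLSQP method with hypothetical data and the hypothetical single constraint $g_1(a) = -a_2 \le 0$.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data: T observations, n = 2 parameters, trending regressor.
rng = np.random.default_rng(0)
T = 200
X = np.column_stack([np.ones(T), np.arange(1, T + 1)])  # x_t' = [1, t]
a_true = np.array([1.0, 0.5])
y = X @ a_true + rng.normal(scale=2.0, size=T)

# Objective S_T(a) = (1/2) sum_t (y_t - x_t' a)^2 of problem (1).
def S(a):
    r = y - X @ a
    return 0.5 * (r @ r)

# Example constraint g_1(a) = -a_2 <= 0; SLSQP expects fun(a) >= 0,
# so we pass -g_1(a) = a_2.
cons = [{"type": "ineq", "fun": lambda a: a[1]}]

a_T = minimize(S, x0=np.zeros(2), method="SLSQP", constraints=cons).x
print(a_T)  # constrained estimate of a^0
```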


In problem (1), constraints are determined from informal considerations and can rather often be formulated in solving practical problems. The true value of the vector of regression parameters $a^0$ can satisfy the following equality and inequality constraints:
$$g_i(a^0) = 0, \ i \in I_1^0, \qquad g_i(a^0) < 0, \ i \in I_2^0, \qquad I_1^0 \cup I_2^0 = I = \{1, \ldots, m\}. \tag{2}$$
We denote $m = |I|$ and $m_i = |I_i^0|$, $i = 1, 2$, where $|M|$ is the number of elements of a set $M$. The assumptions on the regressors, constraints, and noise are as follows.
Assumption 1. The random quantities $\varepsilon_t$ are centered and independent. They do not depend on $x_t$, $t = 1, 2, \ldots$, have identical dispersions $\sigma^2$ and distributions $F_t(u)$, $t = 1, 2, \ldots$, and, at the same time, we have
$$\sup_{t = 1, 2, \ldots} \int_{|u| > c} dF_t(u) \to 0 \quad \text{as} \quad c \to \infty.$$
We denote $P_T = \sum_{t=1}^{T} x_t x_t'$ and $E_T = \mathrm{diag}\,(\sqrt{r_T^{11}}, \sqrt{r_T^{22}}, \ldots, \sqrt{r_T^{nn}})$, where $r_T^{ii}$ is an element on the main diagonal of $P_T$.

Assumption 2. The matrix $P_T$ is nondegenerate for all $T$. The matrix $R_T = E_T^{-1} P_T E_T^{-1} \to R$ as $T \to \infty$, where $R$ is a positive definite matrix.
National Mining University, Dnepropetrovsk, Ukraine, korkhin@mail.ru. Translated from Kibernetika i Sistemnyi Analiz, No. 3, pp. 50–64, May–June 2009. Original article submitted May 20, 2008.
1060-0396/09/4503-0373 © 2009 Springer Science+Business Media, Inc.

Assumption 3. The constraints $g_i(a)$ are twice continuously differentiable convex functions. The rank of the matrix $G$ whose rows are $\nabla g_i'(a^0)$, $i \in I$, is full.
Assumption 4. As $T \to \infty$, $r_T^{ii} \to \infty$ and $x_{T+1,i}^2 / r_T^{ii} \to 0$, $i = \overline{1,n}$.
Assumption 5. A diagonal matrix $\bar E_T$ ($m \times m$) with positive elements $\bar e_{Ti}$, $i = \overline{1,m}$, on the main diagonal can be found for which there exists the following limit of the matrix $\tilde G_T = \bar E_T G E_T^{-1}$: $\lim_{T \to \infty} \tilde G_T = \tilde G$, where $\tilde G$ is a matrix composed of the limit rows $\tilde g_i'$, $i = \overline{1,m}$. At the same time, (1) the matrix $\tilde G_1 = \lim_{T \to \infty} \tilde G_{T1}$ has full rank, where $\tilde G_{T1} = \bar E_{T1} G_1 E_T^{-1}$, $\bar E_{T1} = \mathrm{diag}\,(\bar e_{Ti})$, $i \in I_1^0$, and $G_1$ is a matrix composed of rows $\nabla g_i'(a^0)$, $i \in I_1^0$; (2) there exists a finite limit $\lim_{T \to \infty} \bar e_{Ti}\, e_{Tj}^{-1} < \infty$, $i \in I$, $j = \overline{1,n}$, where $e_{Tj} = \sqrt{r_T^{jj}}$.
Assumption 5 holds true, in particular, when the regressors are bounded. Then we have $\bar E_T = \sqrt{T} J_m$ and $E_T = \sqrt{T} J_n$, where $J_k$ denotes the identity matrix of order $k$.
It is assumed that the independent variables are nonrandom.
COMPUTATION OF AN ESTIMATE OF THE REGRESSION PARAMETER VECTOR
Problem (1) is a convex programming problem that can be solved by a method from [3]. If the constraints in problem (1) are linear, which is an important situation in actual practice, then problem (1) is a quadratic programming problem, and methods for this class of problems are suitable for solving it. In particular, a method from [4, Ch. 23] can be used that is specially adapted to the solution of problems of the form (1) with linear constraints.
Let us consider the case of linear constraints in more detail. In this case, the estimation problem after its transformation assumes the form
$$(1/2)\, a' P_T a - a' X_T' Y_T \to \min, \qquad g_i(a) = g_i' a - b_i \le 0, \ i \in I, \tag{3}$$
where $X_T$ is a $(T \times n)$ matrix whose $t$th row is $x_t'$; $Y_T = [y_1, y_2, \ldots, y_T]'$; $g_i \in R^n$ and $b_i \in R^1$, $i \in I$, are known values.


A regression usually has a free term on which no constraint is imposed. Let us show that, in the case when $x_t' = [1 \ \tilde x_t']$, $\tilde x_t \in R^{n-1}$, the solution of the estimation problem can be simplified by decrementing the number of its variables by one. The result presented below takes place.
THEOREM 1. If Assumption 2 holds true and no constraints are imposed on the free term $a_1^0$, then the solution of problem (3), i.e., an estimate $a_T$ of the regression parameter $a^0$, is of the form $a_T' = [a_{T1} \ \tilde a_T']$, where the estimate of the free term is $a_{T1} = \bar y_T - \tilde a_T' \bar{\tilde x}_T$, and $\tilde a_T \in R^{n-1}$ is the solution of the problem
$$(1/2)\, \tilde a' \tilde P_T \tilde a - \tilde a' \tilde d_T \to \min, \qquad A \tilde a \le b. \tag{4}$$
Here, $\tilde a \in R^{n-1}$, $\tilde P_T = \sum_{t=1}^{T} (\tilde x_t - \bar{\tilde x}_T)(\tilde x_t - \bar{\tilde x}_T)'$, $\tilde d_T = \sum_{t=1}^{T} (\tilde x_t - \bar{\tilde x}_T)(y_t - \bar y_T)$, $\bar{\tilde x}_T = \sum_{t=1}^{T} \tilde x_t / T$, and $\bar y_T = \sum_{t=1}^{T} y_t / T$; $b \in R^m$ is a vector whose components are $b_i$, $i \in I$, and $A$ is an $m \times (n-1)$ matrix composed of the $(n-1)$ last columns of the matrix $G$ whose rows are $\nabla g_i'(a^0) = g_i'$, $i = \overline{1,m}$.
Proof. We write the Lagrange function for minimization problem (3) in the form $L(a, \lambda) = (1/2)\, a' P_T a - a' X_T' Y_T + \lambda'(Ga - b)$, where $\lambda$ is an $m$-dimensional vector of Lagrange multipliers.
According to Assumption 2, the necessary and sufficient conditions of the minimum of problem (3) are of the form
$$\nabla_a L(a, \lambda) = P_T a - X_T' Y_T + G' \lambda = O_n, \tag{5}$$
$$\lambda_i (g_i' a - b_i) = 0, \quad \lambda_i \ge 0, \quad i = \overline{1,m}, \tag{6}$$
where $\nabla_a L(a, \lambda)$ is the gradient of the Lagrange function along the vector $a$, $O_n$ is a zero $n$-dimensional vector, and $\lambda_i$ is the $i$th component of $\lambda$.

Since no constraints are imposed on the free term $a_1^0$, the matrix $G$ and its $i$th row $g_i'$ are of the form
$$G = [O_m \ \vdots \ A], \qquad g_i' = [0 \ A_i], \tag{7}$$
where $A_i$ is the $i$th row of the matrix $A$. Then we have
$$G' \lambda = \begin{bmatrix} 0 \\ A' \lambda \end{bmatrix}. \tag{8}$$

Let us consider condition (5). It is the representation of a system of $n$ equations in vector form. We consider the first of these equations, which, with allowance for expression (8), is of the form
$$\frac{\partial L(a, \lambda)}{\partial a_1} = a_1 T + a_2 \sum_{t=1}^{T} x_{t1} + a_3 \sum_{t=1}^{T} x_{t2} + \ldots + a_n \sum_{t=1}^{T} x_{t, n-1} - \sum_{t=1}^{T} y_t = 0.$$
Dividing both sides of the equation by the number of observations $T$, we obtain
$$a_1 + a_2 \bar x_{T1} + a_3 \bar x_{T2} + \ldots + a_n \bar x_{T, n-1} - \bar y_T = 0, \tag{9}$$
where $\bar x_{Ti} = \sum_{t=1}^{T} x_{ti} / T$, $i = \overline{1, n-1}$ ($\bar x_{Ti}$ is the $i$th component of $\bar{\tilde x}_T$).
Equation (9) must be satisfied by the sought-for estimates of the parameters and, hence, it implies
$$a_{T1} = \bar y_T - a_{T2} \bar x_{T1} - a_{T3} \bar x_{T2} - \ldots - a_{Tn} \bar x_{T, n-1} = \bar y_T - \tilde a_T' \bar{\tilde x}_T.$$
Thus, the formula for the computation of the estimate of the free term is proved.
Let us consider the $i$th equation ($i = \overline{2,n}$) in the system of equations (5),
$$\frac{\partial L(a, \lambda)}{\partial a_i} = a_1 \sum_{t=1}^{T} x_{ti} + a_2 \sum_{t=1}^{T} x_{ti} x_{t1} + a_3 \sum_{t=1}^{T} x_{ti} x_{t2} + \ldots + a_n \sum_{t=1}^{T} x_{ti} x_{t, n-1} - \sum_{t=1}^{T} x_{ti} y_t + \alpha_i' \lambda = 0, \quad i = \overline{2,n},$$
where $\alpha_i$ is the $(i-1)$th column of the matrix $A$, so that $\alpha_i' \lambda$ is the $i$th component of $G' \lambda$.
Substituting $a_1$ from equality (9) in it, we obtain
$$\sum_{j=2}^{n} a_j \left( -\bar x_{T, j-1} \sum_{t=1}^{T} x_{ti} + \sum_{t=1}^{T} x_{ti} x_{t, j-1} \right) - \left( -\bar y_T \sum_{t=1}^{T} x_{ti} + \sum_{t=1}^{T} x_{ti} y_t \right) + \alpha_i' \lambda = 0. \tag{10}$$
After transformations, we have
$$-\bar x_{T, j-1} \sum_{t=1}^{T} x_{ti} + \sum_{t=1}^{T} x_{ti} x_{t, j-1} = \sum_{t=1}^{T} (x_{ti} - \bar x_{Ti})(x_{t, j-1} - \bar x_{T, j-1}),$$
$$-\bar y_T \sum_{t=1}^{T} x_{ti} + \sum_{t=1}^{T} x_{ti} y_t = \sum_{t=1}^{T} (x_{ti} - \bar x_{Ti})(y_t - \bar y_T).$$
Substituting the last two expressions in equality (10), we have
$$\sum_{j=2}^{n} a_j \sum_{t=1}^{T} (x_{ti} - \bar x_{Ti})(x_{t, j-1} - \bar x_{T, j-1}) - \sum_{t=1}^{T} (x_{ti} - \bar x_{Ti})(y_t - \bar y_T) + \alpha_i' \lambda = 0, \quad i = \overline{2,n}.$$
We represent the obtained system of equations in the following vector form with allowance for the denotations of problem (4):
$$\tilde P_T \tilde a - \tilde d_T + A' \lambda = O_{n-1}. \tag{11}$$

Let us consider condition (6). Taking into account the structure of the matrix $G$ (see representation (7)), we obtain
$$\lambda_i (A_i \tilde a - b_i) = 0, \quad \lambda_i \ge 0, \quad i = \overline{1,m}. \tag{12}$$
Since Eqs. (5) and (6) have a unique solution, Eqs. (11) and (12) obtained from them possess the same property. Then these equations are the necessary and sufficient conditions of the minimum of problem (4), and they are satisfied by the vector $\tilde a = \tilde a_T$. Hence, the subvector $\tilde a_T$ of the vector $a_T$ of estimates of the regression parameter $a^0$ is the solution of problem (4).
The theorem is proved.
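A sketch of the reduction stated in Theorem 1 (illustrative; the data, the constraint matrix $A$, and the vector $b$ are hypothetical): problem (4) is solved in the $n-1$ centered variables, and the free term is then recovered as $a_{T1} = \bar y_T - \tilde a_T' \bar{\tilde x}_T$.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data: x_t' = [1, x~_t'] with one trending explanatory variable.
rng = np.random.default_rng(1)
T = 300
x_tilde = np.arange(1, T + 1, dtype=float).reshape(-1, 1)
y = 2.0 + 0.3 * x_tilde[:, 0] + rng.normal(size=T)

x_bar = x_tilde.mean(axis=0)
y_bar = y.mean()
Xc = x_tilde - x_bar                 # centered regressors
P_tilde = Xc.T @ Xc                  # matrix P~_T of problem (4)
d_tilde = Xc.T @ (y - y_bar)         # vector d~_T of problem (4)

# Hypothetical linear constraint A a~ <= b, here a~_1 <= 0.25.
A = np.array([[1.0]])
b = np.array([0.25])

obj = lambda a: 0.5 * (a @ P_tilde @ a) - a @ d_tilde
cons = [{"type": "ineq", "fun": lambda a: b - A @ a}]
a_tilde = minimize(obj, np.zeros(1), method="SLSQP", constraints=cons).x

a_1 = y_bar - a_tilde @ x_bar        # free-term estimate from Theorem 1
print(a_1, a_tilde)
```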
As is well known [5, p. 24], the numerical solution of problem (4) is more exact than that obtained as a result of the numerical solution of problem (3) (if no constraints are imposed in both problems). Under constraints, the accuracy of the numerical solution of problem (4) will also be higher than that of problem (3) since the main error source lies in the inversion of the matrices entering the objective functions of the mentioned problems.
Based on Theorem 1, the estimation of the regression parameter can be reduced to the solution of a quadratic programming problem in which the moduli of the elements of the matrix of its objective function are smaller than unity. To this end, we put $\beta = B_T \tilde a$, where $B_T = s_y^{-1} s_x$, $s_x = \mathrm{diag}\,(s_{xi})$, $i = \overline{1, n-1}$, $s_y^2 = T^{-1} \sum_{t=1}^{T} (y_t - \bar y_T)^2$, and $s_{xi}^2 = T^{-1} \sum_{t=1}^{T} (x_{ti} - \bar x_{Ti})^2$, $i = \overline{1, n-1}$. Denoting $\tilde P_T^\beta = s_x^{-1} \tilde P_T s_x^{-1}$, $\tilde d_T^\beta = (s_y s_x)^{-1} \tilde d_T$, and $A^\beta = A B_T^{-1}$, we obtain from problem (4) that
$$(1/2)\, \beta' \tilde P_T^\beta \beta - \beta' \tilde d_T^\beta \to \min, \qquad A^\beta \beta \le b.$$
An advantage of the solution of such an estimation problem (we denote this solution by $\beta = \beta_T$) is that $\beta_T$ does not depend on the scale of measurement of the variables. The components of $\beta_T$ are standardized estimates of the $(n-1)$ last components of the vector $a^0$. We call them beta weights by analogy with the term used in regression analysis without constraints. Such weights can be conveniently used for estimating and comparing the strength of influence of the independent variables on the dependent variable.
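In code, the rescaling reads as follows (a sketch continuing the previous fragment; the arrays `x_tilde`, `y`, `P_tilde`, `d_tilde`, and `A` are the hypothetical ones defined there):

```python
import numpy as np

# Biased (1/T) standard deviations of y and of the explanatory variables.
s_y = y.std()
s_x = np.diag(x_tilde.std(axis=0))
s_x_inv = np.linalg.inv(s_x)

# Rescaled quantities of the text: P~^beta, d~^beta, A^beta.
P_beta = s_x_inv @ P_tilde @ s_x_inv
d_beta = s_x_inv @ d_tilde / s_y
A_beta = A @ (s_y * s_x_inv)     # A B_T^{-1}, since B_T = s_y^{-1} s_x

# Solving (1/2) beta' P_beta beta - beta' d_beta -> min, A_beta beta <= b
# yields the beta weights; a~_T is recovered as B_T^{-1} beta_T.
```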
PROPERTIES OF AN ESTIMATE OF A REGRESSION PARAMETER
We first prove the consistency of the solution of problem (1).
THEOREM 2. If Assumptions 1–5 are satisfied, then the solution $a_T$ of problem (1) is a consistent estimate of $a^0$.
Proof. After transformation, problem (1) assumes the form
$$(1/2)\, a' P_T a - a' X_T' Y_T \to \min, \qquad g_i(a) \le 0, \ i \in I, \tag{13}$$
where the denotations in the expression for the objective function are the same as in problem (3).
According to Assumption 2, the matrix $P_T$ is positive definite. Therefore, a nondegenerate matrix $H_T$ can always be found such that we have $P_T = H_T' H_T$. We put $b = H_T a$. Taking into account these transformations, quadratic programming problem (13) is transformed into the form
$$(1/2)\, b' b - b' (H_T^{-1})' X_T' Y_T \to \min, \qquad h_i(b) \le 0, \ i \in I, \tag{14}$$
where $h_i(b) = g_i(H_T^{-1} b)$. It is easy to make sure that $h_i(b)$ is a convex function.
We put $\Omega_b = \{b : h_i(b) \le 0, \ i \in I, \ b \in R^n\}$. The set $\Omega_b$ is convex by virtue of the convexity of the functions $h_i(b)$, $i \in I$. We transform problem (14) into the following form:
$$\|b - b_T^*\|^2 \to \min, \qquad b \in \Omega_b, \tag{15}$$
where $b_T^* = (H_T^{-1})' X_T' Y_T$ is the solution of problem (14) without taking into account the constraint $b \in \Omega_b$.
It follows from problem (15) that its solution $b_T$ is the projection of $b_T^*$ onto $\Omega_b$. This projection is unique by virtue of the convexity of $\Omega_b$. As is well known [6, p. 116], the distance from an arbitrary point $A$ that does not belong to some convex set to the projection of $A$ onto this set does not exceed the distance from $A$ to an arbitrary point belonging to the mentioned set. Therefore, we have
$$\|b_T - b^0\|^2 \le \|b_T^* - b^0\|^2, \tag{16}$$
where $b^0 = H_T a^0 \in \Omega_b$ since, with allowance for condition (2), we obtain
$$h_i(b^0) = h_i(H_T a^0) = g_i(a^0) \le 0, \ i \in I.$$
From inequality (16) we have the truth of the inequality $\|H_T(a_T - a^0)\|^2 \le \|H_T(a_T^* - a^0)\|^2$, which implies $(a_T - a^0)' E_T R_T E_T (a_T - a^0) \le (a_T^* - a^0)' E_T R_T E_T (a_T^* - a^0)$, where $a_T^* = H_T^{-1} b_T^*$. Then we obtain
$$\mu_{\min}(R_T)\, \|E_T(a_T - a^0)\|^2 \le \mu_{\max}(R_T)\, \|U_T^*\|^2, \tag{17}$$
where $\mu_{\max}(R_T)$ and $\mu_{\min}(R_T)$ are, respectively, the maximal and minimal eigenvalues of $R_T$, and $U_T^* = E_T(a_T^* - a^0)$.
We denote $e_{T, \min}^2 = \min_{i = \overline{1,n}} e_{Ti}^2$, where $e_{Ti}^2 = r_T^{ii}$. Then the inequality $\|E_T(a_T - a^0)\|^2 \ge e_{T, \min}^2\, \|a_T - a^0\|^2$ holds true. Taking into account this inequality, we obtain the following inequality from inequality (17):
$$\|a_T - a^0\|^2 \le \frac{\mu_{\max}(R_T)}{e_{T, \min}^2\, \mu_{\min}(R_T)}\, \|U_T^*\|^2. \tag{18}$$

According to [7, p. 35], under Assumptions 1, 2, and 4, the distribution of $U_T^*$ converges to the distribution of the quantity $U^* \sim N(O_n, \sigma^2 R^{-1})$. Therefore, based on Theorem XII [8, p. 119], we have $\|U_T^*\|^2 \xrightarrow{d} \|U^*\|^2$, where the symbol $\xrightarrow{d}$ denotes convergence in distribution. According to Assumption 2, the eigenvalues of the matrix $R_T$ converge to values that are not equal to zero as $T \to \infty$, and, moreover, according to Assumption 4, $e_{T, \min}^2 \to \infty$ as $T \to \infty$. Therefore, the first multiplicand in the right side of inequality (18) converges to zero. Then, by Theorem X(a) [8, p. 118], the right side of inequality (18) converges in probability to zero. This implies the truth of the statement of the theorem.
COROLLARY 1. If Assumptions 1–5 hold true, then the solution $a_T$ of problem (3) with linear constraints converges in probability to $a^0$.
COROLLARY 2. If Assumptions 1–5 are satisfied and, at the same time, we have $\bar E_T = \sqrt{T} J_m$ and $E_T = \sqrt{T} J_n$, then the solution $a_T$ of problem (3) with linear constraints converges in quadratic mean to $a^0$.
Proof. Under the conditions of Corollary 2, $a_T^*$ converges in quadratic mean to $a^0$ [9, p. 43], $\mu_{\max}(R_T) \to \mu_{\max}(R)$, and $\mu_{\min}(R_T) \to \mu_{\min}(R) > 0$ as $T \to \infty$. This directly implies the statement formulated.
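A simulation sketch in the spirit of Corollary 2 (the model, the bounded regressor, and the constraint $a_2 \ge 0$ are all hypothetical choices): the error of the constrained estimate should shrink as $T$ grows.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
a0 = np.array([1.0, 0.5])

def estimate(T):
    X = np.column_stack([np.ones(T), np.arange(1, T + 1) / T])  # bounded regressors
    y = X @ a0 + rng.normal(size=T)
    obj = lambda a: 0.5 * np.sum((y - X @ a) ** 2)
    cons = [{"type": "ineq", "fun": lambda a: a[1]}]            # a_2 >= 0
    return minimize(obj, np.zeros(2), method="SLSQP", constraints=cons).x

for T in (50, 500, 5000):
    print(T, np.round(estimate(T) - a0, 4))  # estimation error
```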
Let us determine the limiting distribution law for the estimate of the parameter of the linear regression. The proof scheme will be similar to that described in [1]; however, it differs from the proof presented there in connection with the presence of a trend in the regressors. Let us consider three auxiliary results.
LEMMA 1. Let the conditions of Theorem 2 be satisfied. Then, for a given number $\delta > 0$, numbers $\epsilon > 0$ and $T_0 > 0$ can be found such that we have
$$P\{\|U_T\| \ge \epsilon\} < \delta, \quad T > T_0, \quad U_T = E_T(a_T - a^0). \tag{19}$$

Proof. Transforming inequality (17), we obtain
$$\|U_T\|^2 \le \mu_T\, \|U_T^*\|^2, \tag{20}$$
where $\mu_T = \mu_{\max}(R_T) / \mu_{\min}(R_T)$. As $T \to \infty$, according to Assumption 2, we have $\mu_T \to \mu \neq 0$, and, in proving Theorem 2, it is shown that $\|U_T^*\|^2 \xrightarrow{d} \|U^*\|^2$. Then we have $\mu_T\, \|U_T^*\|^2 \xrightarrow{d} \mu\, \|U^*\|^2$. Therefore, taking into account inequality (20), for a given number $\delta > 0$, some numbers $\epsilon > 0$ and $T_0 > 0$ can be found for which the following chain of inequalities takes place: $\delta > P\{\mu_T \|U_T^*\|^2 \ge \epsilon^2\} \ge P\{\|U_T\|^2 \ge \epsilon^2\}$, $T > T_0$. The lemma is proved.
LEMMA 2. The quadratic programming problem
$$(1/2)\, a' R a - Q' a \to \min, \qquad G a \le b, \tag{21}$$
has a solution $a(G, Q)$ that is continuous with respect to $G$ and $Q$. Here, $a, Q \in R^n$, $R$ is a positive definite matrix of order $n$, $G$ is an $m \times n$ matrix, and $b \in R^m$.
Proof. As is shown in proving Theorem 2, by virtue of the positive definiteness of the matrix $R$, the representation $R = H'H$ is correct, where $H$ is a nondegenerate matrix. We put $\beta = Ha$, $P = (H^{-1})' Q$, and $S = G H^{-1}$. Taking into account these transformations, quadratic programming problem (21) is transformed into the form
$$\min \{(1/2)\, \beta' \beta - P' \beta \mid S \beta \le b\}. \tag{22}$$
We denote its solution by $\beta(S, P)$. For arbitrary $Q_1$ and $Q_2 = Q_1 + \Delta Q$ and $G_1$ and $G_2 = G_1 + \Delta G$, we have $P_2 = P_1 + \Delta P$ and $S_2 = S_1 + \Delta S$, where $P_i = (H^{-1})' Q_i$ and $S_i = G_i H^{-1}$, $i = 1, 2$. From this we obtain
$$\Delta Q = H' \Delta P, \qquad \Delta G = \Delta S\, H. \tag{23}$$
Taking into account that we have $a(G_i, Q_i) = H^{-1} \beta(S_i, P_i)$, $i = 1, 2$, we obtain
$$\Delta a = H^{-1} \Delta \beta, \tag{24}$$
where $\Delta a = a(G_2, Q_2) - a(G_1, Q_1)$ and $\Delta \beta = \beta(S_2, P_2) - \beta(S_1, P_1)$. Let $\Delta Q \to O_n$, and let $\Delta G \to O_{m \times n}$, which, by virtue of the nondegeneracy of $H$ and expressions (23), implies that $\Delta S \to O_{m \times n}$ and $\Delta P \to O_n$. Then, according to a lemma from [10], we have $\Delta \beta \to O_n$. This and expression (24) imply that $\Delta a \to O_n$.
The lemma is proved.
To obtain further results, the left side of the $i$th constraint $g_i(a)$ should be expanded in the vicinity of $a = a^0$ by the Taylor formula as follows:
$$g_i(a) = g_i(a^0) + \nabla g_i'(a^0)\, \Delta a + \frac{1}{2}\, \Delta a'\, \nabla^2 g_i(a^0 + \theta_1 \Delta a)\, \Delta a. \tag{25}$$
Here, $\theta_1 \in [0, 1]$, $\Delta a = a - a^0$, and $\nabla^2 g_i(a)$ is the Hessian matrix of order $n$ whose $(j, k)$th element is $\partial^2 g_i(a) / \partial a_j \partial a_k$, $j, k = \overline{1,n}$.
LEMMA 3. Let Assumptions 2–5 be satisfied. Then the solution of the minimization problem
$$\varphi_T(X) = \frac{1}{2}\, X' R_T X - Q_T' X \to \min,$$
$$\beta_i(X) = (\bar e_{Ti}\, \nabla g_i'(a^0)\, E_T^{-1})\, X + l_i(X) \le 0, \ i \in I_1^0,$$
$$\beta_i(X) = \bar e_{Ti}\, g_i(a^0) + (\bar e_{Ti}\, \nabla g_i'(a^0)\, E_T^{-1})\, X + l_i(X) \le 0, \ i \in I_2^0, \tag{26}$$
with respect to $X = E_T(a - a^0) \in R^n$ is $U_T = E_T(a_T - a^0)$. In problem (26), we have $Q_T = E_T^{-1} \sum_{t=1}^{T} \varepsilon_t x_t$,
$$\beta_i(X) = \bar e_{Ti}\, g_i(E_T^{-1} X + a^0), \ i \in I, \tag{27}$$
$$l_i(X) = \frac{1}{2}\, X' (\bar e_{Ti}\, E_T^{-1}\, \nabla^2 g_i(a^0 + \theta_1 E_T^{-1} X)\, E_T^{-1})\, X, \ i \in I, \tag{28}$$
where the function $\nabla^2 g_i$ and the value of $\theta_1$ are mentioned in expansion (25).
Proof. Problem (26) has a strictly convex objective function (according to Assumption 2) and a convex feasible region since its constraints are convex functions. Hence, the solution of problem (26) (we denote it by $U_T^*$) is unique. It satisfies the following necessary and sufficient minimum conditions:


$$R_T U_T^* - Q_T + \sum_{i=1}^{m} \nabla \beta_i(U_T^*)\, u_{Ti}^* = O_n, \quad u_{Ti}^*\, \beta_i(U_T^*) = 0, \quad u_{Ti}^* \ge 0, \ i \in I, \tag{29}$$
where $u_{Ti}^*$ is the Lagrange multiplier corresponding to the $i$th constraint.
Problem (1) is also a convex programming problem, and the necessary and sufficient conditions of its minimum are satisfied by $a_T$:
$$P_T a_T - X_T' Y_T + \sum_{i=1}^{m} \nabla g_i(a_T)\, \lambda_{Ti} = O_n, \quad \lambda_{Ti}\, g_i(a_T) = 0, \quad \lambda_{Ti} \ge 0, \ i \in I. \tag{30}$$
Here, $\lambda_{Ti}$, $i \in I$, are the Lagrange multipliers for problem (1).
After some transformations of the first equality in system (30), multiplying it by $E_T^{-1}$ from the left, and putting $u_{Ti} = \bar e_{Ti}^{-1} \lambda_{Ti}$, $i \in I$, we obtain from system (30) that
$$R_T U_T - Q_T + \sum_{i=1}^{m} \bar e_{Ti}\, E_T^{-1} \nabla g_i(a_T)\, u_{Ti} = O_n, \quad u_{Ti}\, \bar e_{Ti}\, g_i(a_T) = 0, \quad u_{Ti} \ge 0, \ i \in I. \tag{31}$$
Comparing expressions (29) and (31), we obtain that condition (29) holds true when $U_T^* = U_T$ and $u_{Ti}^* = u_{Ti} \ge 0$, $i \in I$, since, according to expression (27), we have $\beta_i(U_T) = \bar e_{Ti}\, g_i(a_T)$ and $\nabla \beta_i(U_T) = \bar e_{Ti}\, E_T^{-1} \nabla g_i(a_T)$, $i \in I$. Since conditions (29) determine a unique solution of problem (26), the lemma is true.
Let us define the limit for $U_T$.
THEOREM 3. If Assumptions 1–5 hold true, then, as $T \to \infty$, the random quantity $U_T = E_T(a_T - a^0)$ converges in distribution to a random quantity $U$, i.e., to the solution of the problem
$$\varphi(X) = (1/2)\, X' R X - Q' X \to \min, \qquad \tilde G_1 X \le O_{m_1}, \tag{32}$$
where the $(m_1 \times n)$ matrix $\tilde G_1$ consists of the rows of the matrix $\tilde G$ whose indices are $i \in I_1^0$.
Proof. Let us consider the quadratic programming problem
$$\varphi_T^0(X) = (1/2)\, X' R X - Q_T' X \to \min, \qquad \tilde G_{T1} X \le O_{m_1}. \tag{33}$$
We denote its solution by $U_T^0$. By Theorem 2.6.1 [7, p. 35], for the vector $Q_T$, we have
$$Q_T \xrightarrow{d} Q \sim N(O_n, \sigma^2 R), \quad T \to \infty. \tag{34}$$
According to Lemma 2, $U_T^0$ is a continuous function of $Q_T$ and $\tilde G_{T1}$: $U_T^0 = f(Q_T, \tilde G_{T1})$. From this and limit (34) and also taking into account that, according to Assumption 5, we have $\lim_{T \to \infty} \tilde G_{T1} = \tilde G_1$, we obtain $f(Q_T, \tilde G_{T1}) \xrightarrow{d} f(Q, \tilde G_1)$. According to problem (32), we have $U = f(Q, \tilde G_1)$. Thus, as $T \to \infty$, we have
$$U_T^0 \xrightarrow{d} U. \tag{35}$$

We denote the feasible region of problem (26) by $\Omega_T = \{X : \beta_i(X) \le 0, \ i \in I\}$ and that of problem (33) by $\Omega = \{X : (\bar e_{Ti}\, \nabla g_i'(a^0)\, E_T^{-1})\, X \le 0, \ i \in I_1^0\}$. The functions $g_i(a)$, $i \in I$, are convex (Assumption 3). Then, according to expression (27), $\Omega_T$ is a convex set. It also follows from the convexity of the constraints that we have $l_i(X) \ge 0$, $X \in R^n$. Therefore, since we have $\beta_i(U_T) \le 0$, $i \in I_1^0$, it follows from the first constraint in problem (26) that we have $(\bar e_{Ti}\, \nabla g_i'(a^0)\, E_T^{-1})\, U_T \le 0$, $i \in I_1^0$. This implies that $U_T \in \Omega$.

According to Assumption 2, $\varphi_T^0(X)$ is a strongly convex function with respect to $X$. It possesses the property [11, p. 54]
$$\|U_T - U_T^0\|^2 \le \frac{2}{\mu}\, [\varphi_T^0(U_T) - \varphi_T^0(U_T^0)], \quad \mu > 0, \tag{36}$$
since $U_T \in \Omega$, i.e., $U_T$ satisfies the constraints of problem (33) (the feasible region of this problem includes the feasible region of problem (26)). According to property (36), for an arbitrary $\epsilon > 0$, we have
$$P\{\|U_T - U_T^0\|^2 < \epsilon^2\} \ge P\left\{\frac{2}{\mu}\, [\varphi_T^0(U_T) - \varphi_T^0(U_T^0)] < \epsilon^2\right\}$$
$$\ge P\{|\varphi_T^0(U_T) - \varphi_T(U_T)| < \epsilon_1 / 2\} + P\{[\varphi_T(U_T) - \varphi_T^0(U_T^0)] < \epsilon_1 / 2\} - 1, \tag{37}$$
where $\epsilon_1 = \epsilon^2 \mu / 2$.
To further transform the right side of inequality (37), we derive some relationships connecting the objective functions in problems (26) and (33). For an arbitrary $\epsilon_2 > 0$, we have
$$P\{|\varphi_T^0(X) - \varphi_T(X)| < \epsilon_2\} = P\{|(1/2)\, X'(R - R_T)\, X| < \epsilon_2\}. \tag{38}$$
Substituting $X = U_T$ in relationship (38), we obtain
$$P\{|\varphi_T^0(U_T) - \varphi_T(U_T)| < \epsilon_2\} \ge 1 - P\{\|U_T\| \ge a\} - P\{\|R - R_T\| \ge 2\epsilon_2 / a^2\}, \tag{39}$$
where $a$ is some positive number. According to Assumption 2, for a sufficiently large $T$, the third addend in the right side of inequality (39) equals 0. Then, applying Lemma 1, from inequality (39) we obtain
$$P\{|\varphi_T^0(U_T) - \varphi_T(U_T)| < \epsilon_2\} \ge 1 - \delta, \quad T > T_1. \tag{40}$$
In relationship (38), we put $X = U_T^0$. After performing similar calculations and taking into account that $U_T^0$ has a limit distribution (according to convergence (35)), for arbitrary numbers $\epsilon_2 > 0$ and $\delta > 0$, we obtain
$$P\{|\varphi_T^0(U_T^0) - \varphi_T(U_T^0)| < \epsilon_2\} = \Delta_{T2} \ge 1 - \delta, \quad T > T_2. \tag{41}$$

We consider the second addend in the right side of relationship (37). After some simple transformations, we have
$$P\{[\varphi_T(U_T) - \varphi_T^0(U_T^0)] < \epsilon_1 / 2\} \ge P\{\varphi_T(U_T) - \varphi_T(U_T^0) \le 0\} + P\{|\varphi_T(U_T^0) - \varphi_T^0(U_T^0)| < \epsilon_1 / 2\} - 1 = \Delta_{T1} + \Delta_{T2} - 1, \tag{42}$$
where $\Delta_{T2}$ is the probability defined in inequality (41).
To determine the probability $\Delta_{T1}$ in (42), we write the constraints of problem (26) for $X = U_T^0$ as follows:
$$\tilde G_{T1} U_T^0 + L_1(U_T^0) \le O_{m_1}, \qquad \tilde G_{T2} U_T^0 + \bar E_{T2}\, g^{(2)}(a^0) + L_2(U_T^0) \le O_{m_2}. \tag{43}$$
Here, $L_k(X)$ is a vector of dimension $m_k$, $k = 1, 2$, whose $i$th component is the function $l_i(X)$, $i \in I_k^0$, specified by expression (28); $g^{(2)}(a^0)$ is the vector with components $g_i(a^0)$, $i \in I_2^0$; and $\tilde G_{Tk} = \bar E_{Tk}\, G_k\, E_T^{-1}$, $k = 1, 2$. In this case, by virtue of Assumption 5, we have
$$\lim_{T \to \infty} \tilde G_{Tk} = \tilde G_k. \tag{44}$$
Here, we have $\bar E_{Tk} = \mathrm{diag}\,(\bar e_{Ti})$, $i \in I_k^0$.
According to expressions (28) and (35) and Assumption 5, we have
$$p \lim_{T \to \infty} l_i(U_T^0) = p \lim_{T \to \infty} \frac{1}{2}\, (U_T^0)' (\bar e_{Ti}\, E_T^{-1}\, \nabla^2 g_i(a^0 + \theta_1 E_T^{-1} U_T^0)\, E_T^{-1})\, U_T^0 = 0, \ i \in I,$$
whence we obtain $p \lim_{T \to \infty} L_k(U_T^0) = O_{m_k}$, $k = 1, 2$.

According to expression (42), we obtain $\Delta_{T1} \ge P\{U_T^0 \in \Omega_T\}$. Therefore, we have the chain of inequalities
$$\Delta_{T1} = P\{\varphi_T(U_T) - \varphi_T(U_T^0) \le 0\} \ge P\{\tilde G_{T1} U_T^0 + L_1(U_T^0) \le O_{m_1},\ \tilde G_{T2} U_T^0 + L_2(U_T^0) + \bar E_{T2}\, g^{(2)}(a^0) \le O_{m_2}\}$$
$$\ge P\{\tilde G_{T1} U_T^0 + L_1(U_T^0) \le O_{m_1}\} + P\{\tilde G_{T2} U_T^0 + L_2(U_T^0) + \bar E_{T2}\, g^{(2)}(a^0) \le O_{m_2}\} - 1$$
$$\ge \sum_{i \in I_1^0} P\{y_{Ti}^{(1)} + l_i(U_T^0) \le 0\} + \sum_{i \in I_2^0} P\{y_{Ti}^{(2)} + l_i(U_T^0) + \bar e_{Ti}\, g_i(a^0) \le 0\} + 1 - m_1 - m_2,$$
where $y_{Ti}^{(k)}$ is the $i$th component of the vector $\tilde G_{Tk} U_T^0$, $k = 1, 2$.
According to problem (32), we have $\tilde G_1 U \le O_{m_1}$. Then expressions (35) and (44) imply that $\tilde G_{T1} U_T^0 \xrightarrow{d} \tilde G_1 U \le O_{m_1}$. Since $p \lim_{T \to \infty} l_i(U_T^0) = 0$, $i \in I$, it is easy to show that, for an arbitrary $\delta_1 > 0$, there is some $T_3 > 0$ for which we have
$$P\{y_{Ti}^{(1)} + l_i(U_T^0) \le 0\} > 1 - \delta_1, \quad i \in I_1^0, \quad T > T_3.$$
By virtue of the convergence of $l_i(U_T^0)$, $i \in I$, to zero in probability, from expressions (34) and (43) and Assumption 5, for an arbitrary $\delta_2 > 0$, we have
$$P\{\bar e_{Ti}^{-1}\, (|y_{Ti}^{(2)}| + l_i(U_T^0)) \le -g_i(a^0)\} > 1 - \delta_2, \quad i \in I_2^0, \quad T > T_4.$$
Substituting this inequality and the previous one in the lower estimate obtained for the quantity $\Delta_{T1}$, we obtain
$$\Delta_{T1} \ge 1 - m_1 \delta_1 - m_2 \delta_2, \quad T > \max(T_3, T_4). \tag{45}$$

We put $\epsilon_2 = \epsilon_1 / 2$ in expression (41). Then, with allowance for inequality (45), from relationship (42) we obtain
$$P\{[\varphi_T(U_T) - \varphi_T^0(U_T^0)] < \epsilon_1 / 2\} > 1 - \delta - m_1 \delta_1 - m_2 \delta_2, \quad T > \max(T_2, T_3, T_4). \tag{46}$$
Substituting inequality (40) with $\epsilon_2 = \epsilon_1 / 2$ and inequality (46) in relationship (37), we have $P\{\|U_T - U_T^0\|^2 < \epsilon^2\} > 1 - 2\delta - m_1 \delta_1 - m_2 \delta_2$, $T > \max_{1 \le i \le 4} T_i$. Thus, we obtain
$$p \lim_{T \to \infty} \|U_T - U_T^0\|^2 = 0.$$
The statement of the theorem follows from this expression and expression (35).
DETERMINATION OF THE ACCURACY OF ESTIMATION
OF REGRESSION PARAMETERS
In this section, we determine the accuracy of estimation for problem (3). We consider that the noise in the model is normally distributed, i.e., it satisfies the assumption that is given below and is a special case of Assumption 1.
Assumption 6. The random quantities $\varepsilon_t$ in the regression equation are independent and normally distributed with zero expectations and dispersions $\sigma^2$.

Let us consider the main regression accuracy characteristics, such as the estimates of the noise dispersion and of the mean-square error (m.s.e.) matrix of parameter estimates.
Auxiliary results. For understanding further calculations, we consider the concepts of constraints that are active and inactive to within some positive number $\xi$ such that we have $-\xi > g_i(a^0)$, $i \in I_2^0$ [2]. A constraint (let its number be $i$) is active to within $\xi$ if it satisfies the condition $-\xi \le g_i(a_T) \le 0$. Accordingly, the $i$th constraint is inactive to within $\xi$ if it satisfies the inequality $g_i(a_T) < -\xi$. In [2], a random quantity $\gamma_{Ti}$ is introduced such that, for a finite sample with a number of observations $T$, we have
$$\gamma_{Ti} = \begin{cases} 1 & \text{if the } i\text{th constraint is active to within } \xi, \\ 0 & \text{if the } i\text{th constraint is inactive to within } \xi. \end{cases}$$
As is obvious, the quantities $\gamma_{Ti}$, $i \in I$, are functions of $a_T$.
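For linear constraints $g_i(a) = g_i' a - b_i$, the indicator $\gamma_{Ti}$ is straightforward to compute. A sketch (the tolerance $\xi$ is a user-chosen positive number):

```python
import numpy as np

def gamma_indicator(G, b, a_T, xi):
    """gamma_Ti = 1 if -xi <= g_i(a_T) <= 0 (active to within xi), else 0."""
    g = G @ a_T - b                       # g_i(a_T) = g_i' a_T - b_i
    return ((g >= -xi) & (g <= 0.0)).astype(int)
```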
The results presented below will be used in what follows.
LEMMA 4 [2, Theorem 3]. Let $a_T$ be a consistent estimate. In this case, if the $i$th constraint is active when $a = a^0$ ($i \in I_1^0$), then we have $p \lim_{T \to \infty} \gamma_{Ti} = 1$, and if the $i$th constraint is inactive when $a = a^0$ ($i \in I_2^0$), then we have $p \lim_{T \to \infty} \gamma_{Ti} = 0$.

LEMMA 5 [12, Lemma 2]. Let $S(R, B, b, s, N)$ be a vector function of the variables $s \in R^1$, $s > 0$, and $N \in R^n$, where $R$ is an $n \times n$ matrix, $B$ is an $m \times n$ matrix, and $b \in R^m$, that is the solution of the problem $\varphi(R, s, Y) = (1/2)\, Y' R Y - s N' Y \to \min$, $B Y \le b$, where $Y \in R^n$; the matrix $B$ has full rank; $N$ is a normally distributed random quantity with the covariance matrix $R$, $M\{N\} = O_n$, and $R$ is a positive definite matrix. Then, for $i, j = \overline{1,n}$, the function
$$H_{ij}(R, B, b, s) = \int_{R^n} S_i(R, B, b, s, z)\, S_j(R, B, b, s, z)\, f(z, R)\, dz$$
is continuous with respect to $R$, $B$, and $b$. Here, $H_{ij}(R, B, b, s)$, $i, j = \overline{1,n}$, is an element of the matrix $H = M\{S(R, B, b, s, N)\, S'(R, B, b, s, N)\}$ and $f(z, R)$ is the distribution density of $N$.
Main results. In [2], for the case of bounded regressors with a constant mean and equally distributed quantities $\varepsilon_t$, $t = 1, 2, \ldots$, the following consistent estimate of $\sigma^2$ is proposed:
$$s_T^2 = \Big(T - n + \sum_{i \in I} \gamma_{Ti}\Big)^{-1} \sum_{t=1}^{T} (y_t - x_t' a_T)^2. \tag{47}$$
The consistency of estimate (47) for various noise distributions $\varepsilon_t$, $t = 1, 2, \ldots$, and regressors with trends is established by the following theorem.
THEOREM 4. If Assumptions 1–5 are satisfied, then $s_T^2$ is a consistent estimate of $\sigma^2$.
Proof. Transforming the expression for $s_T^2$, we have
$$s_T^2 = \frac{U_T' R_T U_T - 2\, U_T' Q_T}{T - n + \sum_{i \in I} \gamma_{Ti}} + \frac{\sum_{t=1}^{T} (y_t - x_t' a^0)^2}{T - n + \sum_{i \in I} \gamma_{Ti}}, \tag{48}$$
where the random quantity $Q_T$ is defined in problem (26) and $U_T = E_T(a_T - a^0)$. The matrix $E_T$ is determined by Assumption 2.
Under Assumptions 1 and 2, limit (34) takes place. By Theorem 3 and by virtue of Assumptions 1–5, $U_T$ has a limit distribution. Hence, the numerator of the first addend in expression (48) has a limit distribution. Its denominator tends to infinity since, according to Lemma 4, the sum $\sum_{i \in I} \gamma_{Ti}$ converges in probability to the finite number $m_1$. Then, based on Theorem X(a) [8, p. 118], the first addend in expression (48) converges in probability to zero. According to Assumption 1 and Lemma 4, the second addend converges to $\sigma^2$ by the law of large numbers.
The theorem is proved.
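Estimate (47) in code (a sketch for the linear-constraint case; `gamma_indicator` is the hypothetical helper from the previous fragment):

```python
import numpy as np

def s2_T(y, X, a_T, G, b, xi):
    """Noise-dispersion estimate (47): residual sum of squares divided by
    T - n + sum_i gamma_Ti."""
    resid = y - X @ a_T
    T, n = X.shape
    dof = T - n + gamma_indicator(G, b, a_T, xi).sum()
    return (resid @ resid) / dof
```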

With a view to obtaining an expression for the m.s.e. matrix of regression parameter estimates $K_T^0$ for a finite $T$, we write problem (3) after transformation in the form
$$(1/2)\, X' R_T X - X' Q_T \to \min, \qquad \tilde G_T X \le \bar E_T (b - G a^0),$$
where $X = E_T(a - a^0)$ and $b$ is a vector whose components are $b_i$, $i \in I$.
Putting $x = X / \sigma$, we write this problem in the form
$$(1/2)\, x' R_T x - x' q_T \to \min, \qquad \tilde G_{T1} x \le B_T^{(1)}, \quad \tilde G_{T2} x \le B_T^{(2)}. \tag{49}$$
Here, we have $q_T = Q_T / \sigma \sim N(O_n, R_T)$ (according to the expression $Q_T = E_T^{-1} \sum_{t=1}^{T} \varepsilon_t x_t$ and Assumption 6). It follows from limit (34) that
$$q_T \xrightarrow{d} Q / \sigma = q \sim N(O_n, R), \quad T \to \infty. \tag{50}$$
The right sides of the constraints in problem (49) are defined by the expressions
$$B_T^{(i)} = \bar E_{Ti}\, (b^{(i)} - G_i a^0) / \sigma, \quad i = 1, 2, \tag{51}$$
where $b^{(i)} \in R^{m_i}$ has components $b_j$, $j \in I_i^0$ (see problem (3)) and, at the same time, we have $b^{(1)} = G_1 a^0$. Thus, we obtain $B_T^{(1)} = O_{m_1}$, and the components of $B_T^{(2)}$ are nonnegative.
We specify the m.s.e. matrix of regression parameter estimates $K_T^0$ by the expression
$$K_T^0 = M\{U_T U_T'\} = \sigma^2 k_T^0, \qquad k_T^0 = M\{u_T u_T'\}, \tag{52}$$
where $u_T = U_T / \sigma = S(R_T, \tilde G_{T1}, \tilde G_{T2}, B_T^{(1)}, B_T^{(2)}, q_T)$ is the solution of problem (49) and $k_T^0 = M\{S(R_T, \tilde G_{T1}, \tilde G_{T2}, B_T^{(1)}, B_T^{(2)}, q_T)\, S'(R_T, \tilde G_{T1}, \tilde G_{T2}, B_T^{(1)}, B_T^{(2)}, q_T)\}$.
It follows from Theorem 3 that $u_T$ converges in distribution to the random quantity $u = s(R, \tilde G_1, b^{(1)}, q)$ that is the solution of the problem
$$(1/2)\, x' R x - q' x \to \min, \qquad \tilde G_1 x \le b^{(1)} = O_{m_1}, \tag{53}$$
where $q$ is the random quantity specified in (50).
We have $U = \sigma u$, where $U$ is the solution of problem (32), $U = S(R, \tilde G_1, b^{(1)}, q)$. Then we obtain
$$K = M\{U U'\} = \sigma^2 k, \qquad k = M\{u u'\}, \tag{54}$$
where $k = M\{s(R, \tilde G_1, b^{(1)}, q)\, s'(R, \tilde G_1, b^{(1)}, q)\}$.


For a known dispersion $\sigma^2$, the computation of the matrix $K_T^0$ is reduced (according to expression (52)) to the determination of the matrix $k_T^0$, which cannot be computed without knowledge of the quantity $a^0$. In expression (52), in the capacity of an estimate of the matrix $k_T^0$, we take the matrix $k_T$ whose $(i, j)$th element is as follows:
$$K_T^{ij}(R_T, \tilde G_{T1}, \tilde G_{T2}, b_T^{(1)}, b_T^{(2)}) = \int_{R^n} S_i(R_T, \tilde G_{T1}, \tilde G_{T2}, b_T^{(1)}, b_T^{(2)}, z)\, S_j(R_T, \tilde G_{T1}, \tilde G_{T2}, b_T^{(1)}, b_T^{(2)}, z)\, f(z, R_T)\, dz. \tag{55}$$

Here, $S_i(R_T, \tilde G_{T1}, \tilde G_{T2}, b_T^{(1)}, b_T^{(2)}, z)$ is the $i$th component of $S(R_T, \tilde G_{T1}, \tilde G_{T2}, b_T^{(1)}, b_T^{(2)}, z)$, i.e., of the solution of the problem
$$(1/2)\, x' R_T x - x' q_T \to \min, \qquad \tilde G_{T1} x \le b_T^{(1)}, \quad \tilde G_{T2} x \le b_T^{(2)}. \tag{56}$$
The matrices $\tilde G_{Tk}$, $k = 1, 2$, appearing in expression (55) are defined in inequality (43); the vectors $b_T^{(k)} \in R^{m_k}$ have components $b_{Ti}$, and, at the same time, we have $b_T^{(k)} = [b_{Ti}]$, $i \in I_k^0$, $k = 1, 2$,
$$b_{Ti} = \frac{\bar e_{Ti}\, (b_i - g_i' a_T)}{s_T}\, (1 - \gamma_{Ti}), \quad i \in I, \tag{57}$$
where $\bar e_{Ti}$ is an element on the main diagonal of the matrix $\bar E_T$ specified by Assumption 5.
By virtue of the consistency of $a_T$ (Theorem 2) and $s_T^2$ (Theorem 4) and with allowance for Lemma 4, we have
$$p \lim_{T \to \infty} b_T^{(1)} = b^{(1)} = O_{m_1}, \qquad b_T^{(2)} \to \infty \ \text{as} \ T \to \infty. \tag{58}$$

Problem (56) differs from problem (49) in the right sides of its constraints. According to expression (57), they should be modified to obtain a consistent estimate of the m.s.e. matrix of regression parameter estimates. As such a matrix, we consider the matrix
$$K_T = s_T^2\, k_T, \tag{59}$$
where the quantity $s_T^2$ is defined by expression (47), $k_T = [K_T^{ij}(R_T, \tilde G_{T1}, \tilde G_{T2}, b_T^{(1)}, b_T^{(2)})]$, $i, j = \overline{1,n}$, and, at the same time, $K_T^{ij}(R_T, \tilde G_{T1}, \tilde G_{T2}, b_T^{(1)}, b_T^{(2)})$ is specified by expression (55).
Let us show that the matrix $K_T$ specified by expression (59) is a consistent estimate of the m.s.e. matrix of regression parameter estimates.
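Integral (55) generally has no closed form, but, since it is an expectation over $z \sim N(O_n, R_T)$, it can be approximated by Monte Carlo. A sketch (illustrative only; `solve_qp` is a hypothetical helper standing for any routine that returns the solution of problem (56) for a given $z$, here built on scipy's SLSQP):

```python
import numpy as np
from scipy.optimize import minimize

def solve_qp(R, q, G, h):
    """Solve (1/2) x'Rx - q'x -> min subject to Gx <= h (problem (56))."""
    obj = lambda x: 0.5 * (x @ R @ x) - q @ x
    cons = [{"type": "ineq", "fun": lambda x: h - G @ x}]
    return minimize(obj, np.zeros(len(q)), method="SLSQP", constraints=cons).x

def k_T_monte_carlo(R_T, G_all, b_all, n_draws=2000, seed=0):
    """Approximate k_T of (55): the average of S S' over z ~ N(O_n, R_T);
    G_all, b_all stack both constraint groups of problem (56)."""
    rng = np.random.default_rng(seed)
    Z = rng.multivariate_normal(np.zeros(R_T.shape[0]), R_T, size=n_draws)
    S = np.array([solve_qp(R_T, z, G_all, b_all) for z in Z])
    return (S.T @ S) / n_draws  # then K_T = s_T^2 * k_T, expression (59)
```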
THEOREM 5. If Assumptions 2–6 are satisfied, then we have $p \lim_{T \to \infty} K_T = K$, where the matrix $K$ is defined by expression (54), in which $U$ is the solution of problem (32).
Proof. We first show that
$$p \lim_{T \to \infty} k_T = k, \tag{60}$$
where $k = K / \sigma^2 = M\{U U'\} / \sigma^2$.
To this end, we consider the quadratic programming problem
$$(1/2)\, x' R_T x - q_T' x \to \min, \qquad \tilde G_{T1} x \le b_T^{(1)}, \tag{61}$$
where $\tilde G_{T1}$ is a matrix (of full rank) of dimension $m_1 \times n$ (according to Assumption 3).
We denote the solution of problem (61) by $s(R_T, \tilde G_{T1}, b_T^{(1)}, q_T)$. We introduce the matrix (of order $n$) of second moments of this solution by defining its $(i, j)$th element as follows:
$$k_T^{ij}(R_T, \tilde G_{T1}, b_T^{(1)}) = \int_{R^n} s_i(R_T, \tilde G_{T1}, b_T^{(1)}, z)\, s_j(R_T, \tilde G_{T1}, b_T^{(1)}, z)\, f(z, R_T)\, dz. \tag{62}$$

Here, $s_i(R_T, \tilde G_{T1}, b_T^{(1)}, z)$ is the $i$th component of $s(R_T, \tilde G_{T1}, b_T^{(1)}, q_T)$ when $q_T = z$ and $f(z, R_T)$ is the density of distribution of $q_T$,
$$f(z, R_T) = (2\pi)^{-n/2}\, (\det R_T)^{-1/2} \exp\,(-(1/2)\, z' R_T^{-1} z). \tag{63}$$
Since we have $k = M\{s(R, \tilde G_1, b^{(1)}, q)\, s'(R, \tilde G_1, b^{(1)}, q)\}$ and, according to limit (50), $q \sim N(O_n, R)$, an element $(i, j)$ of the matrix $k$ assumes the form
$$k^{ij}(R, \tilde G_1, b^{(1)}) = \int_{R^n} s_i(R, \tilde G_1, b^{(1)}, z)\, s_j(R, \tilde G_1, b^{(1)}, z)\, f(z, R)\, dz, \quad i, j = \overline{1,n}, \tag{64}$$
where $s_i(R, \tilde G_1, b^{(1)}, z)$ is the $i$th component of $u$ when $q = z$ and $f(z, R)$ is the density of distribution of $q$,
$$f(z, R) = (2\pi)^{-n/2}\, (\det R)^{-1/2} \exp\,(-(1/2)\, z' R^{-1} z). \tag{65}$$

According to Assumption 2 and limits (44) and (58), we have $\lim_{T \to \infty} R_T = R$, $\lim_{T \to \infty} \tilde G_{T1} = \tilde G_1$, and $p \lim_{T \to \infty} b_T^{(1)} = b^{(1)} = O_{m_1}$. Then, applying Lemma 5 to expression (62), from expressions (62) and (64) we obtain
$$p \lim_{T \to \infty} k_T^{ij}(R_T, \tilde G_{T1}, b_T^{(1)}) = k^{ij}(R, \tilde G_1, b^{(1)}), \quad i, j = \overline{1,n}. \tag{66}$$

We now show that $k_T^{ij}(R_T, \tilde G_{T1}, b_T^{(1)})$ and $K_T^{ij}(R_T, \tilde G_{T1}, \tilde G_{T2}, b_T^{(1)}, b_T^{(2)})$ (see expression (55)) have a common limit as $T \to \infty$. We introduce the following set:
$$\omega(\tilde G_{T1}, b_T^{(1)}, b_T^{(2)}) = \{z : \tilde G_{T2}\, s(R_T, \tilde G_{T1}, b_T^{(1)}, z) \le b_T^{(2)}, \ z \in R^n\}.$$
It is easy to make sure that we have $\omega(\tilde G_{T1}, b_T^{(1)}, b_T^{(21)}) \subseteq \omega(\tilde G_{T1}, b_T^{(1)}, b_T^{(22)})$ if the inequality $b_T^{(21)} \le b_T^{(22)}$ is true.
According to limit (58), for arbitrary numbers $\epsilon > 0$, $\delta > 0$, $M > 0$, and $\eta > 0$, some $T_0 > 0$ can be found such that we have
$$P\{\|b_T^{(1)}\| \le \epsilon\} > 1 - \delta, \qquad P\{\|b_T^{(2)}\| \ge M\} > 1 - \eta, \quad T > T_0. \tag{67}$$
Putting
$$\varphi_{ij}(R_T, \tilde G_{T1}, b_T^{(1)}, z) = s_i(R_T, \tilde G_{T1}, b_T^{(1)}, z)\, s_j(R_T, \tilde G_{T1}, b_T^{(1)}, z),$$
$$\Phi_{ij}(R_T, \tilde G_T, b_T^{(1)}, b_T^{(2)}, z) = S_i(R_T, \tilde G_T, b_T^{(1)}, b_T^{(2)}, z)\, S_j(R_T, \tilde G_T, b_T^{(1)}, b_T^{(2)}, z),$$
where $S_k(R_T, \tilde G_T, b_T^{(1)}, b_T^{(2)}, z) = S_k(R_T, \tilde G_{T1}, \tilde G_{T2}, b_T^{(1)}, b_T^{(2)}, z)$, $k = i, j$, and denoting $K_T^{ij}(R_T, \tilde G_T, b_T^{(1)}, b_T^{(2)}) = K_T^{ij}(R_T, \tilde G_{T1}, \tilde G_{T2}, b_T^{(1)}, b_T^{(2)})$, we obtain from expressions (55) and (62) that
$$|K_T^{ij}(R_T, \tilde G_T, b_T^{(1)}, b_T^{(2)}) - k_T^{ij}(R_T, \tilde G_{T1}, b_T^{(1)})| = \Bigg| \int_{R^n \setminus \omega(b_T^{(1)}, b_T^{(2)})} [\varphi_{ij}(R_T, \tilde G_{T1}, b_T^{(1)}, z) - \Phi_{ij}(R_T, \tilde G_T, b_T^{(1)}, b_T^{(2)}, z)]\, f(z, R_T)\, dz \Bigg|$$
$$\le \int_{R^n \setminus \omega(b_T^{(1)}, b_T^{(2)})} |\varphi_{ij}(R_T, \tilde G_{T1}, b_T^{(1)}, z) - \Phi_{ij}(R_T, \tilde G_T, b_T^{(1)}, b_T^{(2)}, z)|\, f(z, R_T)\, dz = L_T(R_T, \tilde G_T, b_T^{(1)}, b_T^{(2)}) \tag{68}$$
since the integrals of $\varphi_{ij}$ and $\Phi_{ij}$ over $\omega(\tilde G_{T1}, b_T^{(1)}, b_T^{(2)})$ cancel: we have $s_i(R_T, \tilde G_{T1}, b_T^{(1)}, z) = S_i(R_T, \tilde G_T, b_T^{(1)}, b_T^{(2)}, z)$, $i = \overline{1,n}$, for $z \in \omega(\tilde G_{T1}, b_T^{(1)}, b_T^{(2)})$.
For a fixed $z$, we have $S_i(R_T, \tilde G_T, b_T^{(1)}, b_T^{(2)}, z) \to s_i(R_T, \tilde G_{T1}, b_T^{(1)}, z)$ as $T \to \infty$ since $b_T^{(2)} \to \infty$. Therefore, for an arbitrary $\gamma > 0$, some numbers $M > 0$ and $N > 0$ can be found such that the following inequality is true:
$$\max_{\substack{\|b_T^{(1)}\| \le \epsilon,\ \|b_T^{(2)}\| \ge M, \\ \|\tilde G_T - \tilde G\| \le N}} \int_{R^n \setminus \omega(b_T^{(1)}, b_T^{(2)})} |\varphi_{ij}(R_T, \tilde G_{T1}, b_T^{(1)}, z) - \Phi_{ij}(R_T, \tilde G_T, b_T^{(1)}, b_T^{(2)}, z)|\, f(z, R_T)\, dz \le \gamma,$$
where $\epsilon > 0$ is given. Then, taking into account inequality (67), we obtain
$$P\{L_T(R_T, \tilde G_T, b_T^{(1)}, b_T^{(2)}) \le \gamma\} \ge P\{\|b_T^{(1)}\| \le \epsilon,\ \|b_T^{(2)}\| \ge M,\ \|\tilde G_T - \tilde G\| \le N\}$$
$$= P\{\|b_T^{(1)}\| \le \epsilon,\ \|\tilde G_T - \tilde G\| \le N\} - P\{\|b_T^{(1)}\| \le \epsilon,\ \|\tilde G_T - \tilde G\| \le N,\ \|b_T^{(2)}\| < M\}$$
$$\ge P\{\|b_T^{(1)}\| \le \epsilon,\ \|\tilde G_T - \tilde G\| \le N\} - P\{\|b_T^{(2)}\| < M\} \ge P\{\|b_T^{(1)}\| \le \epsilon\} - P\{\|b_T^{(2)}\| < M\} > 1 - \delta - \eta, \quad T > T_0.$$
From expression (68) and the latter inequality, we have
$$P\{|K_T^{ij}(R_T, \tilde G_T, b_T^{(1)}, b_T^{(2)}) - k_T^{ij}(R_T, \tilde G_{T1}, b_T^{(1)})| \le \gamma\} \ge 1 - \delta - \eta, \quad T > T_0.$$
Thus, we obtain
$$p \lim_{T \to \infty} |K_T^{ij}(R_T, \tilde G_T, b_T^{(1)}, b_T^{(2)}) - k_T^{ij}(R_T, \tilde G_{T1}, b_T^{(1)})| = 0. \tag{69}$$
Expressions (69) and (66) imply that $K_T^{ij}(R_T, \tilde G_T, b_T^{(1)}, b_T^{(2)})$ converges in probability to $k^{ij}(R, \tilde G_1, b^{(1)})$, which proves the validity of relationship (60). Then, with allowance for Theorem 4, we obtain the statement of the theorem.

REFERENCES

1. A. S. Korkhin, "Some properties of regression parameter estimates under a priori inequality constraints," Kibernetika, No. 6, 106–114 (1985).
2. A. S. Korkhin, "Determining sample characteristics and their asymptotic properties for linear regression estimated using inequality constraints," Cybernetics and Systems Analysis, No. 3, 445–456 (2005).
3. A. S. Korkhin, "Solution of problems of the nonlinear least-squares method with nonlinear constraints based on the linearization method," Probl. Upravl. Inform., No. 6, 124–135 (1998).
4. C. Lawson and R. Hanson, Numerical Solution of Problems by the Method of Least Squares [Russian translation], Nauka, Moscow (1986).
5. J. Maindonald, Computational Algorithms in Applied Statistics [Russian translation], Finansy i Statistika, Moscow (1988).
6. B. T. Polyak, Introduction to Optimization [in Russian], Nauka, Moscow (1983).
7. T. W. Anderson, Statistical Analysis of Time Series [Russian translation], Mir, Moscow (1976).
8. C. R. Rao, Linear Statistical Models and Their Applications [Russian translation], Nauka, Moscow (1968).
9. E. Z. Demidenko, Linear and Nonlinear Regressions [in Russian], Finansy i Statistika, Moscow (1981).
10. M. Z. Khenkin and E. I. Volynskii, "A search algorithm for solving the general mathematical programming problem," Zh. Vychisl. Mat. Mat. Fiz., No. 1, 61–71 (1976).
11. V. G. Karmanov, Mathematical Programming [in Russian], Nauka, Moscow (1975).
12. A. S. Korkhin, "Parameter estimation accuracy for nonlinear regression with nonlinear constraints," Cybernetics and Systems Analysis, No. 5, 663–672 (1998).

