CHAPTER 11

Probability Density Functionals and Reproducing Kernel Hilbert Spaces*

Emanuel Parzen, Stanford University

ABSTRACT

The extraction, detection, and prediction of signals in the presence of noise are among the central problems of statistical communication theory. Over the past few years I have sought to develop an approach to these problems that would apply simultaneously to stationary or nonstationary, discrete-parameter or continuous-parameter, and univariate or multivariate time series, and that would distinguish between their statistical and analytical aspects. In particular, it would clarify the role played by various widely employed analytical techniques (such as the Wiener-Hopf equation and eigenfunction expansions). Two basic concepts are used in developing this approach: the probability density functional of a time series and the reproducing kernel Hilbert space. The purpose of this chapter is to sketch the relation between these concepts.

1. THE PROBABILITY DENSITY FUNCTIONAL OF A NORMAL TIME SERIES

Let $[S(t), t \in T]$ and $[N(t), t \in T]$ be time series, called the signal process and the noise process, respectively. Let $\Omega$ be the space of all real-valued functions on $T$. Let $P_N$ and $P_{S+N}$ be probability measures defined on the measurable subsets $B$ of $\Omega$ by

$$P_N[B] = \operatorname{prob}\{[N(t), t \in T] \in B\}, \tag{1}$$

$$P_{S+N}[B] = \operatorname{prob}\{[S(t) + N(t), t \in T] \in B\}. \tag{2}$$

We are trying to determine, if it exists, a function $p$ on $\Omega$ with the property that

$$P_{S+N}[B] = \int_B p \, dP_N. \tag{3}$$

* Prepared under contract Nonr 3440(00) for the Office of Naval Research. Reproduction in whole or in part is permitted for any purpose of the United States Government. Reprinted by permission from Time Series Analysis, Murray Rosenblatt, ed., John Wiley & Sons, Inc., New York, 1963, pp. 155-169.

STRUCTURAL PROBLEMS

The function $p$ may be called the probability density functional of $P_{S+N}$ with respect to $P_N$. The word "functional" emphasizes that its argument is a function $[X(t), t \in T]$.
It is also denoted $p[X(t), t \in T]$ and called the probability density functional of the signal-plus-noise process

$$X(t) = S(t) + N(t), \quad t \in T, \tag{4}$$

with respect to the noise process $[N(t), t \in T]$. The function $p$ is often written symbolically as a derivative,

$$p = \frac{dP_{S+N}}{dP_N}, \tag{5}$$

and called the Radon-Nikodym derivative of $P_{S+N}$ with respect to $P_N$ [see Halmos (1950), p. 329]. The probability density (5) exists if and only if $P_{S+N}$ is absolutely continuous with respect to $P_N$. That is, for every measurable subset $A$ of $\Omega$,

$$P_N[A] = 0 \ \text{implies} \ P_{S+N}[A] = 0. \tag{6}$$

$P_{S+N}$ fails to be absolutely continuous with respect to $P_N$ if and only if there exists a set $A$ such that

$$P_N[A] = 0 \ \text{and} \ P_{S+N}[A] > 0. \tag{7}$$

The probability measures $P_N$ and $P_{S+N}$ are said to be orthogonal if there exists a set $A$ such that

$$P_N[A] = 0 \ \text{and} \ P_{S+N}[A] = 1. \tag{8}$$

We can regard (8) as the extreme case of failure of absolute continuity. The notion of orthogonality derives its importance from detection theory (the theory of testing hypotheses). The simple hypotheses

$$H_0: X(t) = N(t), \qquad H_1: X(t) = S(t) + N(t)$$

are said to be perfectly detectable if there exists a set $A$ such that

$$P_N[A] = \operatorname{prob}\{[X(t), t \in T] \in A \mid H_0\} = 0, \qquad P_{S+N}[A] = \operatorname{prob}\{[X(t), t \in T] \in A \mid H_1\} = 1.$$

Clearly, the hypotheses $H_0$ and $H_1$ are perfectly detectable if and only if $P_N$ and $P_{S+N}$ are orthogonal.

Given the probability measures $P_N$ and $P_{S+N}$, the following questions arise:
1. Determine whether $P_N$ and $P_{S+N}$ are orthogonal.
2. Determine whether $P_{S+N}$ is absolutely continuous with respect to $P_N$.
3. Determine the Radon-Nikodym derivative (5) if it exists. (9)

To answer these questions, the natural way to proceed is to approximate the infinite-dimensional case by finite-dimensional cases. For any finite subset

$$T' = \{t_1, \ldots, t_n\} \subset T, \tag{10}$$

let $P_{N,T'}$ and $P_{S+N,T'}$ denote the probability distributions of $[X(t), t \in T']$ under $P_N$ and $P_{S+N}$, respectively.
Assume that $P_{S+N,T'}$ is absolutely continuous with respect to $P_{N,T'}$, with Radon-Nikodym derivative denoted

$$p_{T'} = \frac{dP_{S+N,T'}}{dP_{N,T'}}. \tag{11}$$

The divergence between $P_{S+N}$ and $P_N$ on the basis of having observed $[X(t), t \in T']$ is defined by

$$J_{T'} = E_{S+N}(\log p_{T'}) - E_N(\log p_{T'}). \tag{12}$$

Using the theory of martingales, it may be shown that

$$0 \le J_{T'} \le J_{T''} \quad \text{if } T' \subset T''. \tag{13}$$

Consequently, the limit

$$J_T = \lim_{T' \to T} J_{T'} \tag{14}$$

exists and is finite or infinite. Further, it may be shown [see Hajek (1958)] that

(a) if $J_T < \infty$, then $P_{S+N}$ is absolutely continuous with respect to $P_N$ and

$$p = \frac{dP_{S+N}}{dP_N} = \lim_{T' \to T} p_{T'}; \tag{15}$$

(b) if $J_T = \infty$, and both the time series $[N(t), t \in T]$ and $[S(t) + N(t), t \in T]$ are normal, then $P_{S+N}$ and $P_N$ are orthogonal.

We next apply these criteria under the following assumptions. The noise process $[N(t), t \in T]$ is a normal process with zero means and covariance kernel

$$K(s,t) = E[N(s)N(t)]. \tag{16}$$

We assume this kernel is positive definite in the following sense. For every finite subset $T' = \{t_1, \ldots, t_n\}$ of $T$, the covariance matrix

$$K_{T'} = \begin{bmatrix} K(t_1,t_1) & \cdots & K(t_1,t_n) \\ \vdots & & \vdots \\ K(t_n,t_1) & \cdots & K(t_n,t_n) \end{bmatrix} \tag{17}$$

is nonsingular, with inverse matrix denoted

$$K_{T'}^{-1} = [K^{st}]_{s,t \in T'}. \tag{18}$$

(It should be noted that the assumption of positive definiteness is made only for mathematical convenience in the present exposition; it can be omitted.)

In regard to the signal process, two cases are of most interest:
1. Sure signal case. $[S(t), t \in T]$ is a nonrandom function.
2. Stochastic signal case. $[S(t), t \in T]$ is a normal time series, independent of the noise process, with zero means and positive definite covariance kernel

$$R(s,t) = E[S(s)S(t)]. \tag{19}$$

To employ the criterion (15), we first need to compute the divergence $J_{T'}$ defined by (12). In this section we consider the sure signal case; the stochastic signal case is considered in Section 3. In the sure signal case

$$\log p_{T'} = (X, S)_{K,T'} - \tfrac{1}{2}(S, S)_{K,T'}, \tag{20}$$

where we define for any functions $f$ and $g$ on $T$

$$(f, g)_{K,T'} = \sum_{s,t \in T'} f(s)\, K^{st}\, g(t). \tag{21}$$

Consequently

$$J_{T'} = E_{S+N}[(X, S)_{K,T'}] - E_N[(X, S)_{K,T'}] = (S, S)_{K,T'}, \tag{22}$$

and $J_T < \infty$ if and only if

$$\lim_{T' \to T} (S, S)_{K,T'} < \infty. \tag{23}$$
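A minimal numerical illustration of (20)-(23) appears below. It is a sketch only. NumPy is assumed. The helper names, the interval $[a, b] = [0.5, 2]$, and both signals are hypothetical choices that do not come from the text.

The noise is the Wiener process with $\sigma^2 = 1$, observed on nested dyadic grids $T'$. For this kernel, the quadratic form (21) reduces to $f(t_1)^2/t_1 + \sum (\Delta f)^2/\Delta t$. Two consequences follow:
- For the smooth signal $S(t) = t$, the divergence (22) equals $b$ on every grid that starts at $a$.
- For a step signal, the divergence grows like the reciprocal of the mesh. This is the behavior (13)-(14) predicts for orthogonal measures.

```python
import numpy as np


def wiener_cov(ts, sigma2=1.0):
    """Covariance matrix (17) for the Wiener kernel K(s, t) = sigma2 * min(s, t)."""
    ts = np.asarray(ts, dtype=float)
    return sigma2 * np.minimum.outer(ts, ts)


def inner_K(f, g, K):
    """(f, g)_{K,T'} of (21): sum over s, t in T' of f(s) K^{st} g(t)."""
    return float(f @ np.linalg.solve(K, g))


def divergence_sure_signal(S, ts, sigma2=1.0):
    """J_{T'} = (S, S)_{K,T'} of (22) for a sure signal S observed on the points ts."""
    s = S(np.asarray(ts, dtype=float))
    return inner_K(s, s, wiener_cov(ts, sigma2))


def log_density_ratio(x, S, ts, sigma2=1.0):
    """log p_{T'} of (20): (X, S)_{K,T'} - (1/2) (S, S)_{K,T'}."""
    K = wiener_cov(ts, sigma2)
    s = S(np.asarray(ts, dtype=float))
    return inner_K(x, s, K) - 0.5 * inner_K(s, s, K)


def smooth(t):
    """S(t) = t: has a square integrable derivative."""
    return t


def step(t):
    """A unit jump at t = 1: no square integrable derivative."""
    return (t >= 1.0).astype(float)


a, b = 0.5, 2.0

# Dyadic grids on [a, b] are nested, so J_{T'} is nondecreasing along them, as in (13).
for k in (2, 4, 8):
    ts = np.linspace(a, b, 2 ** k + 1)
    print(k, divergence_sure_signal(smooth, ts), divergence_sure_signal(step, ts))
```

On these grids no point falls exactly at $t = 1$. The jump therefore contributes $1/h$, where $h = (b - a)/2^k$ is the mesh.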
In words, in the sure signal case, $P_{S+N}$ is absolutely continuous with respect to $P_N$ if and only if $(S, S)_{K,T'}$ approaches a finite limit as $T'$ tends to $T$. Fortunately, the functions $S(\cdot)$ with this property can be characterized. To do so, we introduce the notion of a reproducing kernel Hilbert space.

2. REPRODUCING KERNEL HILBERT SPACES

Let $K(s,t)$ be the covariance kernel of a time series $[X(t), t \in T]$. For each $t$ in $T$, let $K(\cdot, t)$ be the function on $T$ whose value at $s$ in $T$ is equal to $K(s,t)$. It may be shown [see Aronszajn (1950)] that there exists a unique Hilbert space, denoted $H(K; T)$, with the following properties:
1. The members of $H(K; T)$ are real-valued functions on $T$ [if $K(s,t)$ were complex-valued, they would be complex-valued functions].
2. For every $t$ in $T$, $K(\cdot, t) \in H(K; T)$.
3. For every $t$ in $T$ and $f$ in $H(K; T)$, $f(t) = (f, K(\cdot, t))_K$ (the reproducing property),

where the inner product between two functions $f$ and $g$ in $H(K; T)$ is written $(f, g)_K$.

Example 1. Suppose $T = \{1, 2, \ldots, n\}$ for some positive integer $n$, and that the covariance kernel $K$ is given by a symmetric positive definite matrix $[K_{ij}]$ with inverse $[K^{ij}]$. The corresponding reproducing kernel Hilbert space $H(K; T)$ consists of all $n$-dimensional vectors $f = (f_1, \ldots, f_n)$ with inner product

$$(f, g)_K = \sum_{i,j=1}^{n} f_i K^{ij} g_j. \tag{24}$$

To prove (24) we need only verify that the reproducing property holds:

$$(f, K(\cdot, j))_K = \sum_{i,k=1}^{n} f_i K^{ik} K_{kj} = \sum_{i=1}^{n} f_i \delta_{ij} = f_j.$$

The inner product may also be written as a ratio of determinants:

$$(f, g)_K = -\frac{\begin{vmatrix} K_{11} & \cdots & K_{1n} & f_1 \\ \vdots & & \vdots & \vdots \\ K_{n1} & \cdots & K_{nn} & f_n \\ g_1 & \cdots & g_n & 0 \end{vmatrix}}{\begin{vmatrix} K_{11} & \cdots & K_{1n} \\ \vdots & & \vdots \\ K_{n1} & \cdots & K_{nn} \end{vmatrix}}. \tag{25}$$

To prove (25), we again need only verify the reproducing property. When the covariance matrix $K$ is singular, we may define the corresponding reproducing kernel inner product in terms of the pseudo-inverse of the matrix $K$.

Example 2. Let $T = \{t: a \le t \le b\}$ with $a > 0$. Let $[N(t), a \le t \le b]$ be the Wiener process; that is, it has independent increments and covariance function

$$K(s,t) = \sigma^2 \min(s,t) \tag{26}$$

for some parameter $\sigma^2$. Consider the Hilbert space $H(K; T)$ consisting of all functions $f$ on $a \le t \le b$ of the form

$$f(t) = f(a) + \int_a^t f'(u)\, du \tag{27}$$

for some square integrable measurable function $f'$ on $a \le t \le b$ [which can be called the $L_2$-derivative of $f$], with inner product defined by

$$(f, g)_K = \frac{1}{\sigma^2}\left\{ \frac{f(a)\, g(a)}{a} + \int_a^b f'(u)\, g'(u)\, du \right\}. \tag{28}$$

If we define […]

[…] Then $H(K; T)$ consists of all square integrable functions $g(t)$ whose Fourier transforms $G(\omega)$ vanish on $N$ and satisfy

$$\int \frac{|G(\omega)|^2}{f(\omega)}\, d\omega < \infty.$$

The foregoing results are easily extended to multiple time series $[X_j(t), t \in T]$ […] or

$$G(\omega) = \int_{-\infty}^{\infty} e^{-i\omega t} g(t)\, dt, \tag{56}$$

where

$$X_j(t) = \int_{-\infty}^{\infty} e^{i\omega t}\, dZ_j(\omega). \tag{58}$$

Direct Product Hilbert Spaces

The notion of a direct product space plays an important part in our considerations. Let $G_1$ and $G_2$ be two function spaces, consisting of functions defined on $T_1$ and $T_2$, respectively. Their direct product space, denoted $G_1 \otimes G_2$, is the Hilbert space completion of the set of (finite sums of) functions $g$ on $T_1 \otimes T_2$ of the form

$$g(t_1, t_2) = g_1(t_1)\, g_2(t_2), \tag{59}$$

where $g_1 \in G_1$ and $g_2 \in G_2$. The norm of a function in $G_1 \otimes G_2$ of the form (59) is defined by

$$\|g\|_{G_1 \otimes G_2} = \|g_1\|_{G_1} \|g_2\|_{G_2}. \tag{60}$$

The function $g$ defined by (59) is on occasion denoted by $g_1 \otimes g_2$. Suppose $G_1$ and $G_2$ are reproducing kernel Hilbert spaces with respective reproducing kernels $K_1$ and $K_2$, defined on $T_1 \otimes T_1$ and $T_2 \otimes T_2$. Then $G_1 \otimes G_2$ is a reproducing kernel Hilbert space with kernel $K_1 \otimes K_2$. Here $K_1 \otimes K_2$ is the function of four real variables defined by

$$K_1 \otimes K_2(s_1, s_2, t_1, t_2) = K_1(s_1, t_1)\, K_2(s_2, t_2), \tag{61}$$

and

$$\big(g, K_1 \otimes K_2(\cdot, \cdot, t_1, t_2)\big)_{G_1 \otimes G_2} = g(t_1, t_2). \tag{62}$$
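Examples 1 and 2 can be checked numerically. The sketch below assumes NumPy; the helper names and the random test data are hypothetical. It checks three things:
- With the inner product (24), each column $K(\cdot, j)$ reproduces $f_j$.
- The bordered-determinant form (25) agrees with (24).
- For the Wiener kernel (26) sampled on a grid starting at $a$, the finite-dimensional inner product (24) coincides with formula (28) applied to the piecewise-linear interpolant of the samples.

```python
import numpy as np


def rkhs_inner(f, g, K):
    """Inner product (24): sum_{i,j} f_i K^{ij} g_j, where [K^{ij}] is the inverse of K."""
    return float(f @ np.linalg.solve(K, g))


def bordered_det_inner(f, g, K):
    """Inner product (25): minus the bordered determinant divided by |K|."""
    n = K.shape[0]
    B = np.zeros((n + 1, n + 1))
    B[:n, :n] = K
    B[:n, n] = f            # last column holds f
    B[n, :n] = g            # last row holds g
    return float(-np.linalg.det(B) / np.linalg.det(K))


def wiener_norm_28(f_vals, ts, sigma2):
    """(f, f)_K of (28) for f piecewise linear between the grid points ts, with ts[0] = a > 0."""
    df = np.diff(f_vals)
    dt = np.diff(ts)
    # f' is constant on each segment, so the integral of f'^2 is sum(df^2 / dt).
    return (f_vals[0] ** 2 / ts[0] + np.sum(df ** 2 / dt)) / sigma2


rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
K = A @ A.T + n * np.eye(n)         # a symmetric positive definite covariance matrix
f = rng.standard_normal(n)
g = rng.standard_normal(n)

# Reproducing property: (f, K(., j))_K = f_j for every j.
reproduced = np.array([rkhs_inner(f, K[:, j], K) for j in range(n)])
print(np.allclose(reproduced, f))

# (24) and (25) give the same inner product.
print(np.isclose(rkhs_inner(f, g, K), bordered_det_inner(f, g, K)))

# Example 2 on a grid: K(s, t) = sigma2 * min(s, t) on a <= t <= b.
sigma2, a, b = 2.0, 0.5, 3.0
ts = np.linspace(a, b, 40)
fv = np.sin(ts) + ts ** 2           # samples of an arbitrary test function
Kw = sigma2 * np.minimum.outer(ts, ts)
print(np.isclose(rkhs_inner(fv, fv, Kw), wiener_norm_28(fv, ts, sigma2)))
```

The determinant route (25) is shown only for comparison. In practice a linear solve, as in `rkhs_inner`, is the numerically safer way to evaluate (24).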
When $G_1 = G_2 = L_2(T, \mathcal{B}, \mu)$, $G_1 \otimes G_2$ consists of all ($\mathcal{B} \otimes \mathcal{B}$-measurable) functions $g$ on $T \otimes T$ such that

$$\|g\|^2_{G_1 \otimes G_2} = \int_T \int_T g^2(s,t)\, \mu(ds)\, \mu(dt) < \infty. \tag{63}$$

Suppose $G_1$ and $G_2$ both equal the reproducing kernel Hilbert space of all $L_2$-differentiable functions on the interval $\{t: a \le t \le b\}$, with norm squared

$$\|g\|^2 = \frac{1}{a}\, g^2(a) + \int_a^b |g'(t)|^2\, dt. \tag{64}$$

Then $G_1 \otimes G_2$ is a reproducing kernel Hilbert space with norm squared

$$\|g\|^2_{G_1 \otimes G_2} = \frac{1}{a^2}\, g^2(a,a) + \frac{1}{a}\int_a^b \left|\frac{\partial}{\partial s} g(s,a)\right|^2 ds + \frac{1}{a}\int_a^b \left|\frac{\partial}{\partial t} g(a,t)\right|^2 dt + \int_a^b \int_a^b \left|\frac{\partial^2}{\partial s\, \partial t} g(s,t)\right|^2 ds\, dt. \tag{65}$$

3. STOCHASTIC SIGNAL CASE

In this section we determine conditions for the existence of the probability density functional (5) in the stochastic signal case described before (19). We shall prove below that $P_{S+N}$ is absolutely continuous with respect to $P_N$ if and only if

$$\|R\|_{H(K \otimes (K+R);\, T \otimes T)} < \infty. \tag{66}$$

It may be shown that a sufficient condition for (66) to hold is that

$$\|R\|_{H(K \otimes K;\, T \otimes T)} < \infty. \tag{67}$$

In practice, the condition we shall attempt to verify is (67). Later we prove that (66) is necessary and sufficient for $p = dP_{S+N}/dP_N$ to exist. First, let us show directly that (67) is a sufficient condition for $p$ to exist, and let us obtain an explicit formula for $p$.

It may be shown that if (67) holds, then the signal process $[S(t), t \in T]$ may be written

$$S(t) = \sum_n \eta_n \phi_n(t), \tag{68}$$

where (a) $[\eta_n]$ is a sequence of random variables satisfying

$$E[\eta_m \eta_n] = \lambda_n \delta_{mn} \tag{69}$$

for a suitable sequence $[\lambda_n]$, and (b) $[\phi_n]$ is a sequence of functions in $H(K)$ satisfying

$$(\phi_m, \phi_n)_K = \delta_{mn}. \tag{70}$$

In fact, $[\lambda_n]$ are the eigenvalues, and $[\phi_n]$ the corresponding eigenfunctions, of the linear transformation $R$ from $H(K)$ to itself defined by

$$Rf(t) = (f, R(\cdot, t))_K. \tag{71}$$

Further,

$$\sum_n \lambda_n^2 = \|R\|^2_{H(K \otimes K)} < \infty. \tag{72}$$
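On a finite index set $T'$, the transformation (71) becomes a matrix, so (68)-(72) can be checked directly. The sketch below assumes NumPy. The noise kernel $e^{-|s-t|}$ and the signal kernel $0.5\, e^{-(s-t)^2}$ are hypothetical choices, not taken from the text.

Store $f$ as its vector of values on $T'$. Then (71) reads $Rf = R K^{-1} f$. Its eigenpairs, normalized so that $(\phi_m, \phi_n)_K = \delta_{mn}$, come from the symmetric matrix $K^{-1/2} R K^{-1/2}$. The $H(K \otimes K)$ norm is computed from the four-index form $\sum F(s,s') K^{st} K^{s't'} G(t,t')$.

```python
import numpy as np

ts = np.linspace(0.0, 1.0, 30)                          # finite index set T'
diff = np.subtract.outer(ts, ts)
K = np.exp(-np.abs(diff))                               # noise kernel (assumed form)
R = 0.5 * np.exp(-diff ** 2)                            # signal kernel (assumed form)

# Symmetric square root L = K^{1/2} and its inverse, from the eigendecomposition of K.
w, V = np.linalg.eigh(K)
L = V @ np.diag(np.sqrt(w)) @ V.T
L_inv = V @ np.diag(1.0 / np.sqrt(w)) @ V.T

# Eigenproblem of (71), R K^{-1} phi = lam * phi, solved via the symmetric C = L^{-1} R L^{-1}.
lam, U = np.linalg.eigh(L_inv @ R @ L_inv)
Phi = L @ U                                             # column n holds phi_n on T'

K_inv = np.linalg.inv(K)

# (70): the phi_n are orthonormal in H(K).
gram = Phi.T @ K_inv @ Phi
print(np.allclose(gram, np.eye(len(ts))))

# (71): each phi_n is an eigenfunction with eigenvalue lam_n.
print(np.allclose(R @ K_inv @ Phi, Phi * lam))

# (72): sum of lam_n^2 equals ||R||^2 in H(K (x) K), via the four-index sum.
hs = np.einsum("ab,ac,bd,cd->", R, K_inv, K_inv, R)
print(np.isclose(np.sum(lam ** 2), hs))

# (68)-(69): the expansion reproduces the signal covariance R = sum lam_n phi_n phi_n'.
print(np.allclose(Phi @ np.diag(lam) @ Phi.T, R))
```

The four-index `einsum` is the same number as $\operatorname{trace}(K^{-1} R K^{-1} R)$. It is written out here only to mirror the product-space definition.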
For $n = 1, 2, \ldots$, let

$$S_n(t) = \sum_{k=1}^{n} \eta_k \phi_k(t), \qquad Y_n = (X, \phi_n)_K. \tag{73}$$

By the developments of Section 1, it follows that $P_{S_n+N}$ is absolutely continuous with respect to $P_N$, with probability density function

$$p_n = \frac{dP_{S_n+N}}{dP_N} = \prod_{k=1}^{n} (1+\lambda_k)^{-1/2} \exp\left[\frac{1}{2}\, \frac{\lambda_k}{1+\lambda_k}\, Y_k^2\right]. \tag{74}$$

By martingale theory it may be shown that (72) implies that the probability density function exists and is given by the limit

$$\frac{dP_{S+N}}{dP_N} = \lim_{n \to \infty} p_n, \tag{75}$$

so that

$$\log \frac{dP_{S+N}}{dP_N} = \frac{1}{2} \sum_{n=1}^{\infty} \left[\frac{\lambda_n}{1+\lambda_n}\, Y_n^2 - \log(1+\lambda_n)\right]. \tag{76}$$

If in addition to (72)

$$\sum_n \lambda_n < \infty, \tag{77}$$

then

$$\log \frac{dP_{S+N}}{dP_N} = -\frac{1}{2} \sum_{n=1}^{\infty} \log(1+\lambda_n) + \frac{1}{2} \sum_{n=1}^{\infty} \frac{\lambda_n}{1+\lambda_n}\, Y_n^2. \tag{78}$$

Intuitively, (77) means that almost all sample functions of the signal process $[S(t), t \in T]$ belong to $H(K)$. Indeed, from (68),

$$\|S\|_K^2 = \sum_n \eta_n^2, \qquad E\|S\|_K^2 = \sum_n \lambda_n.$$

It appears that, to establish (77), it would suffice to prove that $E\|S\|_K^2 < \infty$.

We now seek necessary and sufficient conditions for $P_{S+N}$ to be absolutely continuous with respect to $P_N$ in the stochastic signal case. Let us begin by rephrasing the problem. Let $K_1$ and $K_2$ be two positive definite covariance kernels. Let $P_j$ be the probability measure induced on $\Omega$ by a normal process $[X(t), t \in T]$ with zero means and covariance kernel $K_j$. The following questions arise:
1. Determine whether $P_1$ and $P_2$ are orthogonal.
2. Determine $dP_2/dP_1$ if it exists.

We use equations (10) to (15). Let

$$p_{T'} = \frac{dP_{2,T'}}{dP_{1,T'}} = \frac{|K_{2,T'}|^{-1/2} \exp\left(-\tfrac{1}{2} X' K_{2,T'}^{-1} X\right)}{|K_{1,T'}|^{-1/2} \exp\left(-\tfrac{1}{2} X' K_{1,T'}^{-1} X\right)}, \tag{79}$$

$$J_{T'} = E_{P_2}(\log p_{T'}) - E_{P_1}(\log p_{T'}) = \tfrac{1}{2} \operatorname{trace}\left(K_{1,T'} K_{2,T'}^{-1} - I - I + K_{2,T'} K_{1,T'}^{-1}\right). \tag{80}$$

Amazingly enough, the right-hand side of (80) can be expressed as a norm in a reproducing kernel Hilbert space. The space is the one with kernel $K_1 \otimes K_2$, a function of four variables $(s, s', t, t')$ defined by

$$K_1 \otimes K_2(s, s', t, t') = K_1(s,t)\, K_2(s',t'). \tag{81}$$

If $K_1$ and $K_2$ are nonsingular covariance matrices, we may verify that

$$\operatorname{trace}(K_1 K_2^{-1}) = (K_1, K_1)_{K_1 \otimes K_2}, \tag{82}$$

since, writing $K_j^{st}$ for the $(s,t)$ entry of $K_j^{-1}$,

$$(K_1, K_1)_{K_1 \otimes K_2} = \sum_{s,s',t,t'} K_1(s,s')\, K_1^{st} K_2^{s't'} K_1(t,t') = \sum_{s',t,t'} \delta(s',t)\, K_2^{s't'} K_1(t,t') = \sum_{t,t'} K_2^{tt'} K_1(t,t') = \operatorname{trace}(K_1 K_2^{-1}). \tag{83}$$
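The eigenfunction formula (73)-(74) can be checked against a direct computation on a finite $T'$. When $n$ equals the number of points in $T'$, $\log p_n$ should agree with the log-ratio of the two normal densities, with covariance $K + R$ under $H_1$ and $K$ under $H_0$.

The sketch below assumes NumPy. The kernels, grid, seed, and helper name `gauss_logpdf` are hypothetical illustration choices. It also checks the trace identity (82)-(83) with $K_1 = K$ and $K_2 = K + R$.

```python
import numpy as np


def gauss_logpdf(x, C):
    """Log density of a zero-mean normal vector with covariance matrix C, evaluated at x."""
    _, logdet = np.linalg.slogdet(C)
    quad = x @ np.linalg.solve(C, x)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + quad)


ts = np.linspace(0.0, 1.0, 25)
diff = np.subtract.outer(ts, ts)
K = np.exp(-np.abs(diff))                  # noise covariance (assumed form)
R = 0.5 * np.exp(-diff ** 2)               # signal covariance (assumed form)

# Eigen-solution of (71) on T' through K^{1/2}.
w, V = np.linalg.eigh(K)
L = V @ np.diag(np.sqrt(w)) @ V.T
L_inv = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
lam, U = np.linalg.eigh(L_inv @ R @ L_inv)
Phi = L @ U                                # columns phi_n, orthonormal in H(K)

rng = np.random.default_rng(1)
x = rng.multivariate_normal(np.zeros(len(ts)), K + R)   # one observed record X on T'

# (73)-(74): Y_k = (X, phi_k)_K, then log p_n using all |T'| eigenfunctions.
Y = Phi.T @ np.linalg.solve(K, x)
log_p_eigen = np.sum(-0.5 * np.log1p(lam) + 0.5 * lam / (1.0 + lam) * Y ** 2)

# Direct computation: log of the ratio of the two normal densities on T'.
log_p_direct = gauss_logpdf(x, K + R) - gauss_logpdf(x, K)
print(np.isclose(log_p_eigen, log_p_direct))

# (82)-(83): trace(K1 K2^{-1}) equals (K1, K1) in H(K1 (x) K2).
K1, K2 = K, K + R
K1_inv, K2_inv = np.linalg.inv(K1), np.linalg.inv(K2)
lhs = np.trace(K1 @ K2_inv)
rhs = np.einsum("ab,ac,bd,cd->", K1, K1_inv, K2_inv, K1)
print(np.isclose(lhs, rhs))
```

The eigenfunction form is useful because each term of the sum involves a single scalar statistic $Y_k$. The direct form needs a full matrix solve.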
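It may also be proved that

$$\operatorname{trace} I = (K_1, K_2)_{K_1 \otimes K_2}. \tag{84}$$

In this manner we may verify that

$$\operatorname{trace}\left(K_1 K_2^{-1} + K_2 K_1^{-1} - 2I\right) = \|K_1 - K_2\|^2_{K_1 \otimes K_2}, \tag{85}$$

$$X' K_1^{-1} X - X' K_2^{-1} X = (K_2 - K_1, X \otimes X)_{K_1 \otimes K_2}, \tag{86}$$

where $X \otimes X$ is the function on $T \otimes T$ defined by

$$X \otimes X(s,t) = X(s)\, X(t). \tag{87}$$

Using (85) and (86), we may rewrite (79) and (80):

$$p_{T'} = |K_{1,T'} K_{2,T'}^{-1}|^{1/2} \exp\left[\tfrac{1}{2}(K_2 - K_1, X \otimes X)_{H(K_1 \otimes K_2;\, T' \otimes T')}\right], \tag{88}$$

$$J_{T'} = \tfrac{1}{2} \|K_2 - K_1\|^2_{H(K_1 \otimes K_2;\, T' \otimes T')}. \tag{89}$$

The following conclusions can be inferred immediately:
1. $P_1$ and $P_2$ are orthogonal if and only if it is not the case that

$$K_2 - K_1 \in H(K_1 \otimes K_2;\, T \otimes T). \tag{90}$$

2. If (90) holds, then the Radon-Nikodym derivative exists and is given by the limit (as $T' \to T$) of (88). Formally, we may write

$$\frac{dP_2}{dP_1} = \delta^{1/2}(K_1 K_2^{-1}) \exp\left[\tfrac{1}{2}(K_2 - K_1, X \otimes X)_{H(K_1 \otimes K_2;\, T \otimes T)}\right], \tag{91}$$

provided that

$$\delta(K_1 K_2^{-1}) = \lim_{T' \to T} |K_{1,T'} K_{2,T'}^{-1}| \tag{92}$$

exists.

By using (91), we can sketch a proof of Woodward's theorem on linear transformations of Wiener integrals [Woodward (1961)].

Example 4. To illustrate the use of (67), we consider stationary time series with spectral density functions, so that

$$R(s,t) = \int_{-\infty}^{\infty} e^{i(s-t)\omega} f_S(\omega)\, d\omega, \qquad K(s,t) = \int_{-\infty}^{\infty} e^{i(s-t)\omega} f_N(\omega)\, d\omega.$$

We now show that, for any finite interval $T = \{t: 0 \le t \le L\}$, a sufficient condition for (67) to hold is that

$$\int_{-\infty}^{\infty} \frac{f_S(\omega)}{f_N(\omega)}\, d\omega < \infty. \tag{93}$$

To prove that (93) suffices, write $e_\omega(t) = e^{it\omega}$. Then

$$\|R\|^2_{H(K \otimes K;\, T \otimes T)} = \left\| \int_{-\infty}^{\infty} e_\omega \otimes \bar{e}_\omega\, f_S(\omega)\, d\omega \right\|^2 = \int d\omega \int d\omega'\, f_S(\omega) f_S(\omega')\, (e_\omega \otimes \bar{e}_\omega,\, e_{\omega'} \otimes \bar{e}_{\omega'})_{K \otimes K}$$

$$= \int d\omega \int d\omega'\, f_S(\omega) f_S(\omega')\, |(e_\omega, e_{\omega'})_K|^2 \le \left[\int d\omega\, f_S(\omega)\, \|e_\omega\|_K^2\right]^2.$$

From (49) we may deduce that […] (94)

As $L$ tends to $\infty$, the right-hand side of (94) tends to

$$[2\pi f_N(\omega)]^{-1} \tag{95}$$

as a limit in mean with respect to the finite measure on $-\infty$ […]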
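The identities (80), (85), (86), (88), and (89) can all be checked for matrices, that is, for a finite $T'$. The sketch below assumes NumPy. The random covariance matrices and observation vector are hypothetical test data, and `prod_inner` is a hypothetical helper.

`prod_inner` implements the $H(K_1 \otimes K_2)$ inner product $\sum F(s,s') K_1^{st} K_2^{s't'} G(t,t')$ used in (83). The divergence is computed three ways:
- from the trace form (80);
- from the norm form (89);
- from the definition (12), using $E[x'Ax] = \operatorname{trace}(AC)$ for $x \sim N(0, C)$.

```python
import numpy as np


def prod_inner(F, G, K1_inv, K2_inv):
    """(F, G) in H(K1 (x) K2) on T' x T': sum of F(s,s') K1^{st} K2^{s't'} G(t,t')."""
    return float(np.einsum("ab,ac,bd,cd->", F, K1_inv, K2_inv, G))


def gauss_logpdf(x, C):
    """Log density of a zero-mean normal vector with covariance matrix C, evaluated at x."""
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + x @ np.linalg.solve(C, x))


rng = np.random.default_rng(2)
n = 6
A1 = rng.standard_normal((n, n))
A2 = rng.standard_normal((n, n))
K1 = A1 @ A1.T + n * np.eye(n)      # two arbitrary positive definite covariance matrices
K2 = A2 @ A2.T + n * np.eye(n)
K1_inv, K2_inv = np.linalg.inv(K1), np.linalg.inv(K2)
I = np.eye(n)
x = rng.standard_normal(n)          # an arbitrary observed vector X on T'
XX = np.outer(x, x)                 # X (x) X as in (87)

# (80) vs (89): trace form and product-space norm of the divergence.
J_trace = 0.5 * np.trace(K1 @ K2_inv - I - I + K2 @ K1_inv)
J_norm = 0.5 * prod_inner(K2 - K1, K2 - K1, K1_inv, K2_inv)

# Definition (12): log p_{T'} = const + x' A x with A = (K1^{-1} - K2^{-1}) / 2.
A = 0.5 * (K1_inv - K2_inv)
J_def = np.trace(A @ K2) - np.trace(A @ K1)
print(np.isclose(J_trace, J_norm), np.isclose(J_trace, J_def))

# (86): difference of quadratic forms as a product-space inner product.
print(np.isclose(x @ K1_inv @ x - x @ K2_inv @ x, prod_inner(K2 - K1, XX, K1_inv, K2_inv)))

# (88) vs (79): log-likelihood ratio of N(0, K2) to N(0, K1) at x.
log_p88 = 0.5 * np.log(np.linalg.det(K1 @ K2_inv)) + 0.5 * prod_inner(K2 - K1, XX, K1_inv, K2_inv)
log_p79 = gauss_logpdf(x, K2) - gauss_logpdf(x, K1)
print(np.isclose(log_p88, log_p79))
```

For large $T'$ one would avoid explicit inverses and the four-index sum. The point here is only to confirm, term by term, that the matrix identities hold.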
