
GEOPHYSICS, VOL. 50, NO. 12 (DECEMBER 1985); P. 2727-2741, 10 FIGS.

Velocity-stack and slant-stack stochastic inversion

Jeffrey R. Thorson* and Jon F. Claerbout‡

ABSTRACT

Normal moveout (NMO) and stacking, an important step in analysis of reflection seismic data, involves summation of seismic data over paths represented by a family of hyperbolic curves. This summation process is a linear transformation and maps the data into what might be called a velocity space: a two-dimensional set of points indexed by time and velocity. Examination of data in velocity space is used for analysis of subsurface velocities and filtering of undesired coherent events (e.g., multiples), but the filtering step is useful only if an approximate inverse to the NMO and stack operation is available. One way to effect velocity filtering is to use the operator L^T (defined as NMO and stacking) and its adjoint L as a transform pair, but this leads to unacceptable filtered output. Designing a better estimated inverse to L than L^T is a generalization of the inversion problem of computerized tomography: deconvolving out the point-spread function after back projection. The inversion process is complicated by missing data, because surface seismic data are recorded only within a finite spatial aperture on the Earth's surface. Our approach to solving the problem of an ill-conditioned or nonunique inverse L^-1, brought on by missing data, is to design a stochastic inverse to L. Starting from a maximum a posteriori (MAP) estimator, a system of equations can be set up in which a priori information is incorporated into a sparseness measure: the output of the stochastic inverse is forced to be locally focused, in order to obtain the best possible resolution in velocity space. The size of the resulting nonlinear system of equations is immense, but using a few iterations with a gradient descent algorithm is adequate to obtain a reasonable solution. This theory may also be applied to other large, sparse linear operators. The stochastic inverse of the slant-stack operator (a particular form of the Radon transform) can be developed in a parallel manner, and will yield an accurate slant-stack inverse pair.

INTRODUCTION: THE PROBLEM OF MISSING DATA

For years wave-equation and Fourier methods have dominated poststack migration, yet they have been virtually ignored for use in stacking. Of the methods for wave-equation moveout and stacking that have been developed, a number of processes have been recently designed which are conceptually superior to the standard processing sequence of NMO, stack, and migration. Most of these algorithms can be categorized as prestack migration processes. Gazdag (1980) developed poststack migration algorithms accurate for steeply dipping events, while Judson et al. (1978), Yilmaz and Claerbout (1980), and Hale (1984) developed more generalized algorithms for prestack migration which are capable (theoretically, at least) of correctly migrating wide-offset, steeply dipping events.

Yet it remains a puzzle why methods which handle wide incident angles properly have not been successful when incorporating the NMO-and-stacking step in practice. The more successful of the prestack methods (e.g., Hale, 1984) are partial prestack methods; in a way they do as much as they can before handing the data over to the stacking algorithm. Today's standard industry processing sequence still pivots about the centrally important step of NMO and stacking.

A possible explanation for the puzzle is sensitivity to velocity: although data viewed along the midpoint axis may contain relatively little velocity information, data viewed along the offset axis, i.e., in a common-midpoint (CMP) gather, contain important velocity information. Regardless of the process, an estimate of velocities is required at some point; velocity estimation at the point of stacking is convenient because each CMP gather at this stage may be independently analyzed. The only alternative to a prestack migration scheme is to require that velocity information be input fully before migration, resulting in a migrated depth section (Schultz and Sherwood, 1980) or possibly a migrated time section. A prestack partial migration scheme that stops prior to stacking still requires knowledge of earth velocities (Yilmaz and Claerbout, 1980), but it has the capability of improving the estimation of velocity at the stacking stage (Hale, 1984).

Manuscript received by the Editor September 28, 1984; revised manuscript received June 1, 1985.
*Sierra Geophysics Inc., 15446 Bel-Red Road, Redmond, WA 98052.
‡Department of Geophysics, Stanford University, Stanford, CA 94305.
© 1985 Society of Exploration Geophysicists. All rights reserved.

The velocity issue, however, does not answer the question of why wave-equation techniques are not used for the NMO-and-stack process itself. An issue that is addressed by few if any of the prestack methods is that of missing data. Jacobs (1982) noted that on many data sets shot spacing is too coarse to support any type of finite-differencing or Fourier-domain migration of the data over the shot axis. This led him to consider another prestack migration scheme, shot-profile migration. We note now that in practice, missing data are a consequence of not only a coarsely sampled shot axis; the coverage by geophones is also limited both in spatial extent and density. This incompleteness in the spatially sampled field data is conveniently classified into three categories: limited aperture (cable length truncation), sparse sampling (aliasing), and irregular gaps in the recording array (dead traces). Any one deficiency in the field data precludes use of finite-differencing or Fourier transforms. Sampling in time, however, is more controllable; compared to the situation in sampling the spatial axes, the problems involved in temporal sampling are minor.

Spatial coverage is likewise a problem with the offset axis, that is, on a CMP gather. This might be the primary reason why the NMO-and-stacking stage is not implemented with a finite-difference or Fourier method. Whether the imaging of a CMP gather is implemented by a finite-difference algorithm or by NMO and stacking, both results suffer from the end-effect and aliasing artifacts brought on by missing data: the condition of the data may not justify use of a more sophisticated process than simple NMO and stacking. Having in mind the problem of missing spatial coverage, we now focus on design of an NMO-and-stack operator which is insensitive to missing data.

Treatment of missing data

How do existing production migration programs treat missing data? In general, zero values are assumed for missing points. NMO and stacking is an example of this: if a trace is left out of the stack, it is treated as if zero-valued data were recorded. In finite-differencing and Fourier methods, the edges of the computation grid are typically expanded to include zero values, which overcomes wraparound artifacts or reflections occurring at the edge of the grid. Any improvement to the zero-value assumption must be based on interpolation and extrapolation of the existing data; we shall return to this point.

One common geophysical problem is estimation of the spectrum of a truncated function. The truncations at the edges of the interval cause blurring of the spectral estimate by convolution with a sinc function. To improve spectral resolution, many people, following Blackman and Tukey (1958), weight the data smoothly to zero at the ends of the interval. Burg (1975) asserted that such weighting is data falsification. He reasoned that it is better to estimate data values off the end of the truncated segment, and employed a maximum entropy criterion to do so. Burg's maximum entropy spectral analysis method can successfully resolve close, narrow peaks in the spectrum (Ulrych and Clayton, 1976); moreover, it estimates the missing data values. Missing data in this case are an essential aspect, but in the formulation of the algorithm, they may or may not explicitly appear.

Least-squares methods fit neatly into Burg's philosophy; they need make no tacit assumption about missing data. In least-squares parameter estimation, only the existing data need be best-fit by the model. The parameters comprising the model (whose variation minimizes the mean-square error) can be placed into what we call the model space. The original data meanwhile reside in data space. We use the concepts of model space and data space extensively here, especially in our development of a robust least-squares NMO-and-stack operator. As in Burg's spectral estimation method, the NMO-and-stack operator seeks to improve the resolution of the parameters (in this case, the stacking velocities), but can also be used to interpolate missing traces on the CMP gather. The desired model space for NMO and stacking is the velocity panel (Figure 1), a space indexed by two parameters: velocity and zero-offset intercept time. The data space itself, i.e., the CMP gather, is also a space indexed by two parameters: offset and time.

Aki and Richards (1980, chap. 12) give examples of least-squares inversion in a model space comprising a few hundred points. The dimensionality of the model spaces we use is equal to the number of sample points in a velocity panel, whose dimensionality is typically two to three orders of magnitude greater. Our model space is vast, and we cannot hope to reach a solution without the help of some a priori criterion to constrain the least-squares solution.

A valuable criterion to employ in development of the least-squares NMO-and-stack operator is sparseness of the solution in model space: the large-amplitude portions of the solution should be compressed into relatively few components of the solution vector. We show that the sparseness criterion turns the least-squares estimation problem into a nonlinear one. The appearance of nonlinearity is not surprising if we consider the problem of interpolating, say, missing traces in a gap on a seismic section. One simple approach is to interpolate horizontally between points on the traces that bound the gap on either side. This will fail to interpolate properly a dipping reflector across the gap, if the gap is wide enough to cause the reflector to become spatially aliased. A more logical approach is to interpolate along one or two preselected dips. This is essentially a nonlinear step, if the directions of interpolation are determined by the data. The issues of nonlinearity and a priori information play an important role in design of a sparse stacking operator.

Finally, the results we develop can be applied in a straightforward manner to slant stacks. A close relation exists between the slant stack (i.e., linear-moveout-and-stack) operator and the NMO-and-stack operator; they are both instances of a generalized Radon transform (Deans, 1983). We develop a slant stacking operator which minimizes aliasing and truncation artifacts on the output, and illustrate its application to a vertical seismic profile.

VELOCITY STACKS

A constant-velocity stack, or more simply a velocity stack, is a collection of traces indexed by velocity which result from normal moveout and stacking the input CMP gather at different constant velocities. The output of NMO and stacking a CMP gather with an arbitrary stacking velocity function is therefore a one-dimensional (1-D) trace embedded in the velocity panel, the two-dimensional (2-D) output of the velocity stack. Figure 1 is an example of the velocity stack of a CMP gather.
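The claim above, that a least-squares fit need make no assumption about missing data because only the recorded samples constrain the model, can be demonstrated on a toy problem. The straight-line model, the gap position, and every name below are our own hypothetical choices, not from the paper:

```python
import numpy as np

# Toy "data space": 20 samples of a line a + b*x, with 8 traces missing.
x = np.linspace(0.0, 1.0, 20)
a_true, b_true = 0.5, 2.0
d_full = a_true + b_true * x
recorded = np.ones(20, dtype=bool)
recorded[5:13] = False                      # a gap of dead traces

G = np.column_stack([np.ones_like(x), x])   # model: d = G m, with m = (a, b)

# Zero-value assumption: treat the gap as if zeros were recorded, fit everything
m_zero, *_ = np.linalg.lstsq(G, np.where(recorded, d_full, 0.0), rcond=None)

# Burg's philosophy: best-fit only the existing data
m_mask, *_ = np.linalg.lstsq(G[recorded], d_full[recorded], rcond=None)

# The masked fit recovers (a, b) and can interpolate the gap from the model
d_interp = G @ m_mask
```

The masked fit recovers the model parameters and interpolates the gap, while the zero-filled fit is biased by the fictitious zero values, which is exactly the failure of the zero-value assumption described above.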

Although the stack was performed by discrete summation, for theoretical purposes the velocity stack is defined as the integral

u_w(p, τ) = ∫ w(h) d(h, t = √(τ² + p²h²)) dh,  (1)

or in operator notation,

u_w = L_w^T d.  (2)

It will be more convenient to index velocity panels with slowness p, the inverse of velocity, rather than with velocity. The function w(h) is a weighting function in offset, d(h, t) denotes the input to the velocity stack, and u_w(p, τ) denotes the output. The offset-time pair (h, t) denotes the coordinate axes of the CMP gather, and the slowness-time pair (p, τ) denotes the coordinate axes of the corresponding velocity panel. Equation (1) defines a linear transformation L_w^T from a function defined over the (h, t) domain into a function defined over the (p, τ) domain. When (h, t) and (p, τ) are discretized, these functions are equivalent to vectors whose components are indexed by h, t or p, τ. We subsequently refer to all functions defined over the (h, t) domain collectively as belonging to "offset space" or "data space," and the functions defined over the (p, τ) domain as residing in "velocity space" or "model space."

Specifying different weighting functions w(h) in equation (1) allows different definitions for the velocity stack. A uniformly weighted stack results when the weights are identical, w(h) = 1. The uniformly weighted stack is denoted in operator notation by L^T. Another possibility is to set the weighting function w(h) proportional to offset h, w(h) = h. In this manner the far-offset traces are allowed to contribute more to the stack than the near-offset traces, on which there is little moveout discrimination among events of differing velocity. To make a distinction between this offset-weighted velocity stack and the uniformly weighted stack, we denote the offset-weighted stack by L_h^T. The limitations in offset may be incorporated into the operators by specifying w(h) to be zero outside the range of recorded offsets, say, h₁ to h₂. The two velocity stacks are thus defined

u = L^T d,
u(p, τ) = ∫_{h₁}^{h₂} d(h, t = √(τ² + p²h²)) dh,  (3)

and

u_h = L_h^T d,
u_h(p, τ) = ∫_{h₁}^{h₂} h d(h, t = √(τ² + p²h²)) dh.  (4)

Figure 2 compares the two velocity stacks L^T and L_h^T applied to the same window of data from Figure 1. The main advantage that L_h^T has over L^T is that it reduces the lateral coherence or "smear" of events in the velocity panel. The horizontal smearing produced by the velocity stack defined by equation (3) results from the strong contribution of the near-offset traces. Yet the remaining coherent streaks on the panel produced by L_h^T are artifacts resulting primarily from the truncation of events at the highest offset trace on the CMP gather. A main goal is to alleviate both the near-offset and far-offset truncation problems on the velocity panel.

More general weighting functions w(h) reduce the truncation effects at far offsets (Larner, 1979). Tinkering with w(h) is similar to trying to find the right taper window (Hanning, etc.) to use in the taper-and-transform method of spectral estimation (Blackman and Tukey, 1958). The effect is to falsify the data within the observation window with arbitrary weighting functions.

Least-squares formulation of the problem

Let us look for an alternate choice for the velocity stacking operator which does not suffer the mentioned shortcomings of L^T or L_h^T. The transformation into velocity space should have the desirable property of focusing hyperbolic events into points; this property would obviously aid in the resolution of two events with different stacking velocities.

We take a basic least-squares approach by asserting that the CMP gather d(h, t) is the result of some transformation on a function u₀(p, τ) in velocity space. Perhaps d(h, t) is also contaminated with additive noise:

d = Lu₀ + n.  (5)

A straightforward definition for the operator L is the adjoint, or transpose, of the operator L^T given in equation (3); why this definition is preferred will be discussed. Spatial discretization may be incorporated into L and L^T by defining the weighting function w to be zero at regular increments. In this way the continuous transform of equation (1) can be turned into a discrete, finite-dimensional linear operator. This discretization step does not change any of our results.

[Figure 1: two panels, (a) CMP gather (offset in km) and (b) velocity stack (velocity in km/s).]

FIG. 1. Velocity stacking. Panel (a) is a portion of a CMP gather from the Offshore Texas Gulf area (courtesy of Western Geophysical Co.). Refractions were removed with a triangular mute. For clarity, only positive polarities were plotted on the CMP gathers. Panel (b) is a velocity stack of panel (a) which was made using equation (3) with the integration over offset replaced by a uniformly weighted summation. Though the stacking velocity curve may be picked off the panel, it lacks the velocity resolution of a semblance velocity analysis.
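The discrete velocity stack just described can be sketched in a few lines of Python. This is our own nearest-sample illustration of equations (3) and (4), not the authors' implementation; `velocity_stack` and its arguments are hypothetical names:

```python
import numpy as np

def velocity_stack(d, h, tau, p, w=None, dt=0.004):
    """Discrete velocity stack: sum d(h, t) along the hyperbolae
    t = sqrt(tau^2 + p^2 h^2), cf. equations (1), (3), and (4).
    d: array (n_offsets, n_times); h: offsets; tau, p: output axes;
    w: offset weights (None -> uniform, eq. 3; w=h -> offset-weighted, eq. 4)."""
    if w is None:
        w = np.ones_like(h)
    nt = d.shape[1]
    u = np.zeros((len(p), len(tau)))
    for ip, pp in enumerate(p):
        # hyperbolic traveltime for every (offset, tau) pair
        t = np.sqrt(tau[None, :]**2 + (pp * h[:, None])**2)
        it = np.rint(t / dt).astype(int)      # nearest-sample lookup
        ok = it < nt                          # ignore times beyond the trace
        for ih in range(len(h)):
            u[ip] += np.where(ok[ih], w[ih] * d[ih, np.minimum(it[ih], nt - 1)], 0.0)
    return u
```

Passing `w=h` gives the offset-weighted stack L_h^T of equation (4); the default gives the uniformly weighted L^T of equation (3). A hyperbolic event with slowness p₀ focuses at p₀ in the output panel.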

By a suitable definition of the inner product (Appendix A), L is simply the operation of reverse NMO and stacking:

d_rec(h, t) = ∫_{p₁}^{p₂} u(p, τ = √(t² − p²h²)) dp,  (6)

or in operator notation,

d_rec = Lu.

The limits p₁ and p₂ bracket the range of slownesses stacked over. The adjoint relationship between the uniformly weighted velocity stacks L^T and L is derived in Appendix A. L is identical in operation to L^T except that NMO "stretch" is replaced by an inverse-moveout compression on each trace.

Equation (6) is chosen as a modeler of hyperbolic events on a CMP gather; using equation (6), an impulse in velocity space transforms into a hyperbolic event having uniform amplitude over all offsets on the gather (the data space). To what extent are the operations L and L^T inverses of each other? Taking a least-squares approach to the solution of equation (5), the estimate u of u₀ that minimizes the energy in the noise term n is

u = (L^T L)^(-1) L^T d.  (7)

The operator L^T L is diagonally dominant. If L^T L is written as a (very) large matrix, the amplitudes of the diagonal elements of L^T L may be orders of magnitude greater than those of the off-diagonal elements. Alternately, if L^T L is considered to be a 2-D filter, the central peak of its impulse response should dominate the outliers or "side lobes" of the filter. Unfortunately the side lobes of the filter L^T L are significant; they are responsible for the wide spread in velocity exhibited by the events in Figure 1b.

A more basic question is whether (L^T L)^(-1) in equation (7) exists, and if so, how easily it may be found. Both the finite sampling interval and finite range of offsets may contribute to the ill-conditioning of L^T L and increase the likelihood of its being singular, but probably the greatest contribution to ill-conditioning is the nonuniqueness of the hyperbolic summation paths at and near zero offset.

Two basic routes can be followed in designing a substitute for the ill-conditioned (or nonexistent) inverse (L^T L)^(-1). The first is to employ a generalized inverse (L^T L)^+ (Lanczos, 1961), which by definition inverts only the nonzero eigenvalues of L^T L. By use of the generalized inverse the null-space components are effectively constrained to be zero in the inversion. The generalized inverse of L itself is consequently (L^T L)^+ L^T. Figure 2 illustrates the steps taken to apply the generalized inverse filter to a window of the data from Figure 1. First, the velocity stack L^T of equation (3) is applied to the data. Next, the filter (L^T L)^+ is applied to give the stack shown in Figure 2c. This filter is relatively localized in time and space (Thorson, 1984) and is also time-variable and space-variable. Our objective is to design an operator able to obtain a sharply focused velocity panel from a CMP gather. The filtered panel of Figure 2c shows that the generalized inverse (L^T L)^+ does this fairly well. False events at high velocities were eliminated; however, there is still some problem with discrimination of real events from artifacts at low velocities.

[Figure 2: four velocity panels (velocity in km/s), (a) velocity stack, (b) offset-weighted velocity stack, (c) generalized inverse, (d) stochastic inverse.]

FIG. 2. Comparison of the generalized inverse and stochastic inverse to velocity stacks. The upper left panel (a) is a portion of the velocity stack of Figure 1 from 2 to 3 s. In text notation, it is L^T d [equation (3)]. The upper right panel (b) is the offset-weighted velocity stack over the same data, L_h^T d, defined by equation (4). Panel (c) is an estimate of the generalized inverse (L^T L)^+ L^T d of the data. Panel (d) is an approximate solution to the stochastic inverse equations (8). The resolution of velocities in velocity space generally improves through the sequence of panels from (a) to (d). Though each data set is scaled differently, to facilitate a comparison of velocity resolutions the panels were plotted with the maximum value on each at full scale, which is 1.5 times the trace spacing.

The stochastic inverse

The second approach to designing an approximate inverse to L is to solve equation (7) directly, after perturbing L^T L in order to guarantee that it has a stable inverse. To do this, we specify a priori knowledge about the variance of the solution u by adding the term u^T Q u to the standard least-squares functional (Aki and Richards, 1980, section 12.3.5), where the term Q represents the ratio of noise variance to variance of the points in model space. Adding the diagonal term Q to the least-squares linear system L^T L converts it into what Aki and Richards call the stochastic inverse:

u = (L^T L + Q)^(-1) L^T d.  (8)

If the variances of the solution u are not precisely known, the Q term may be "bootstrapped" or iteratively refined from updated estimates of u.

A mathematical justification for this bootstrapping procedure is given in the following section on velocity-stack stochastic inversion. There, the variance estimates in Q are related to a nonlinear sparseness measure. One consequence of the bootstrapped estimation of Q is as follows: since Q depends explicitly on u, equations (8) are turned into a nonlinear system of equations. The definition of stochastic inversion used here is a generalization of the definition of Aki and Richards, whose diagonal term is the identity matrix multiplied by a constant scalar.

Comparison of the stochastic inverse to velocity stacks

Panel (d) of Figure 2 shows the results obtained when the stochastic inverse in equation (8) is applied to the CMP gather of Figure 1. Because the system of equations (L^T L + Q) is nonlinear and, moreover, possesses a vast dimensionality, it is impossible to attain exact convergence with an iterative algorithm, much less solve equations (8) directly. Yet in practice, substantial convergence can be made within a few iterations; only five iterations were necessary to obtain the estimated solution u in Figure 2d. The resolution of individual velocity events on u has been significantly enhanced over that of the two alternatives, standard velocity stacking and the generalized inverse (again, see Figure 2). Another comparison, this time of the stochastic inverse u to the weighted velocity stack L_h^T d [equation (4)] over a greater range in time, is shown in Figure 3.

[Figure 3: two velocity panels (velocity in km/s), (a) stochastic inverse, (b) offset-weighted velocity stack.]

FIG. 3. A comparison between panel (a), the stochastic inverse u, and panel (b), the velocity stack L_h^T d, over a larger time window of 1 to 4 s.

The least-squares equations (8) define a transformation from data space to velocity space. Once a sparse solution u has been found, the process of "inversion" from velocity space back into data space consists simply of an application of the operator L to u (Figure 4). Coherent events remaining on the residual noise estimate d − Lu in Figure 4 are, for the most part, nonhyperbolic events. The low-frequency coherent events exhibiting linear moveout on the residual panel of Figure 4 are normal modes confined to the shallow water layer over which this particular marine gather was recorded.

Figure 5 displays a comparison between the contoured envelope of the stochastic inverse u [equations (8)], a standard semblance velocity analysis from the same gather, and the envelope of the weighted velocity stack L_h^T d [equation (4)]. The resolution of velocities on the standard semblance plot and the stochastic inverse are virtually the same, even though the semblance contour plot was created by scanning twice the number of velocities used to create the velocity stack. The envelope of the velocity stack L_h^T d, in Figure 5a, also shows good resolution of the strong events, but has a higher level of background noise due to truncation artifacts. The main objective of the stochastic inversion procedure is to drive down the level of the artifacts on a standard velocity stack so weak events which may have been obscured by the artifacts might be made visible. Notice the high-velocity event at 2.4 s on Figure 5 which is distinct from the general trend of the velocity function. This event is a reflection from a steeply dipping fault (Hale, 1984); its high apparent velocity is due to the steep dip of the fault-plane reflector.

Limitations

We emphasize that L functions as a modeler of CMP gathers, albeit a crude one. The inverse transformation of a CMP gather into velocity space is meaningful only when the implicit assumptions that justify the use of L as a modeler remain true. What are these assumptions? A reflector on a CMP gather must have a uniform amplitude from trace to trace, it must vary smoothly in moveout from trace to trace, and it must exhibit hyperbolic moveout. That is, traces on the gather must be balanced in amplitude and be free of static shifts. Real data sets satisfy none of these assumptions exactly, but the nearer a data set comes to satisfying them, the greater the ability the stochastic inverse will have in obtaining precise stacking velocities.

Other choices for L, L^T

A linear operator with a close relation to velocity stacking is slant stacking. Both are back-projection techniques which differ only in the specification of the paths of summation taken over the data. Slant stacking involves summing over a straight line. The operator can be defined by

u_ss = L_ss^T d,
u_ss(p, τ) = ∫_{h₁}^{h₂} d(h, t = τ + ph) dh,  (9)

while the corresponding adjoint (derived in Appendix A) is

d_ss = L_ss u,
d_ss(h, t) = ∫_{p₁}^{p₂} u(p, τ = t − ph) dp.  (10)

The parameters indexing the data domain, h and t, still represent, respectively, spatial and temporal quantities, while the parameters indexing the new model domain, p and τ, are slope (or ray parameter) and zero-offset intercept time.
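Equations (9) and (10) discretize in the same way as the velocity stack, with the hyperbolic path replaced by a straight line. A sketch, again with nearest-sample lookup and hypothetical names of our own:

```python
import numpy as np

def slant_stack(d, h, p, tau, dt=0.004):
    """Slant stack (equation 9): u(p, tau) = sum over h of d(h, t = tau + p*h)."""
    nh, nt = d.shape
    u = np.zeros((len(p), len(tau)))
    for ip, pp in enumerate(p):
        it = np.rint((tau[None, :] + pp * h[:, None]) / dt).astype(int)
        ok = (it >= 0) & (it < nt)            # keep only samples on the trace
        for ih in range(nh):
            u[ip] += np.where(ok[ih], d[ih, np.clip(it[ih], 0, nt - 1)], 0.0)
    return u

def slant_stack_adjoint(u, h, p, tau, nt, dt=0.004):
    """Adjoint (equation 10): d(h, t) = sum over p of u(p, tau = t - p*h)."""
    nh = len(h)
    d = np.zeros((nh, nt))
    for ip, pp in enumerate(p):
        it = np.rint((tau[None, :] + pp * h[:, None]) / dt).astype(int)
        ok = (it >= 0) & (it < nt)
        for ih in range(nh):
            np.add.at(d[ih], it[ih][ok[ih]], u[ip][ok[ih]])
    return d
```

A linear event with slope p₀ focuses at slowness p₀ in the (p, τ) panel, just as a hyperbola focuses under the velocity stack.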

[Figure 4: three panels (offset in km), (a) CMP gather, (b) stochastic estimate, (c) residual.]

FIG. 4. Reconstructing the original data. Panel (a) is the original data, panel (b) is the estimate Lu of the data, and panel (c) is the difference between the two. Events with steep linear moveout can be seen on the residual gather, while hyperbolic events have been moved to panel (b).

[Figure 5: three velocity panels (velocity in km/s), (a) offset weighted, (b) semblance analysis, (c) stochastic inverse.]

FIG. 5. Resolution comparisons in velocity space. Panel (c) was made by taking a time-averaged, normalized envelope of the stochastic inverse u of Figure 3a. For comparison, panel (b) shows a standard semblance velocity analysis on the CMP gather. Each point of the panel is a measure of the similarity of waveforms over the corresponding moveout path in data space. Perfect semblance equals 1. The left panel (a) is a normalized envelope of the velocity stack L_h^T d of Figure 3b. The fine contour interval on each plot is 0.1. Forty velocities were scanned to make the semblance plot, 20 velocities to make u and L_h^T d. The events in each velocity panel closely track the events on the semblance velocity analysis; for this example, the horizontal sampling used in velocity space (20 traces) is adequate.

In the development of the stochastic inverse, we may choose the form of operator L; we require that L be linear, its adjoint L^T be known, and that it be an acceptable modeler of the data via equation (5). For example, a vertical seismic profile (VSP) is typified by downgoing and upgoing events with linear moveout. Slant stacking models events with linear moveout. If data d are taken to be a VSP, the parameter h refers to depth of the geophone in the well. Examples of slant-stack stochastic inversion on a vertical seismic profile are shown later.

Although the stochastic inverse examples shown above were made with equation (3) as the chosen velocity stack, the alternative velocity stack L_h^T of equation (4) could equally well have been used. In fact, L_h^T is a slant stack in disguise. By the change of variables h² → x, t² → y, equation (4) can be turned into a slant stack. An adjoint for L_h^T (using a different definition for the inner product space) can be put in the following form (Thorson, 1984):

d_h = L_h u,
d_h(h, t) = ∫_{p₁}^{p₂} p u(p, τ = √(t² − p²h²)) dp.  (11)

As a choice for the modeling operator L, equation (11) does much the same thing as equation (6): it maps points in model space into nearly uniform-amplitude hyperbolas in data space. There may be advantages to choosing the adjoint pair (L_h, L_h^T) over the pair (L, L^T). The additional linear weights h and p in the integrands of equations (4) and (11) emphasize the information content of the higher offset traces. Again we stress that the choice reduces to which operator, L or L_h, is more acceptable as a modeler of a CMP gather.

VELOCITY-STACK STOCHASTIC INVERSION

We now show in detail how the velocity-stack stochastic inverse [equations (8)] illustrated by the examples just given was derived. Damped least-squares systems such as equations (8) are easily derived from a maximum a posteriori (MAP) estimator, in which the various probability densities involved are assumed to be Gaussian (Bard, 1974). One advantage to formulating the stochastic inverse as a MAP estimator is that the constant diagonal term in equations (8) can be generalized to an arbitrary diagonal matrix. The diagonal matrix is justified by allowing the model variances to differ from point to point in the model domain. In a similar manner, nonconstant noise variances in the data domain may be incorporated into the normal equations with a second diagonal weighting term. In this way arbitrarily complicated model and noise variances can be incorporated into equations like equations (8), but for the equations to remain linear, the variances must be supplied as a priori information.

The problem of unknown model variances

The requirement that model variances be specified beforehand is an overly restrictive one. This is the chief limitation in the use of the linear stochastic inverse; a priori model variances must be estimated without knowledge of the solution, and yet these variances greatly affect the final solution. In velocity-stack inversion, this requirement is equivalent to predicting which velocities will be present in the solution. We wish to assume nothing a priori about the solution, apart from specifying a realistic range of velocities to be encountered.

Estimating a priori noise variances, on the other hand, is not as difficult because the CMP gather (in the data domain) can be directly examined for noise. For example, independent noise analyses recorded in the field might be available. If there is a problem with a noisy trace, it may be ignored by (conceptually) setting its corresponding noise variance to infinity.

One way to avoid having to make a priori assumptions about model variances is by a bootstrap process, which reestimates model variances while simultaneously solving the MAP estimator. Theoretical arguments supporting this procedure are given in the next section. Making the model variance functionally dependent upon the solution u turns the MAP estimation functional into a sparseness measure. Solutions to the MAP estimator will be driven to sparseness; large components of the solution tend to cluster into a few, large peaks in the model domain, but will tend to be very small elsewhere. Therefore, sparseness is a desirable property for the velocity stack inverse to possess.

A linear system of equations as large as equations (8), and even more so the corresponding nonlinear problem of bootstrapped model variances, cannot be solved directly, considering the dimensionality of the problems. The examples shown were made by employing a gradient descent algorithm for the solution of equations (8).

We now derive the nonlinear system of equations that, when solved, yields the data set's so-called stochastic inverse in velocity space. The term stochastic inverse is generalized to mean the MAP estimator derived here.

Sparse inversions and MAP estimation

MAP estimation is defined as maximization of p(u|d), the conditional probability density of u given d, and is produced by variation over the model parameters u (Musicus, 1982). All probability density functions here are denoted by an unsubscripted p(·); each probability density can be uniquely identified by its corresponding random variable. The functional relation between u and d is given by d = Lu + n. Applying Bayes' rule,

p(u|d) = p(d|u) p(u) / p(d).  (12)

Now p(d|u), the conditional probability of d given u, can be interpreted as the probability density of the noise p(n), since the multidimensional random variable d is the sum of a deterministic Lu and a random noise term n. In other words, p(d = Lu + n₁), for a given u, is equal to p(n = n₁). The probability density p(d) enters only as the normalizing term

p(d) = ∫ p(d|u) p(u) du.  (13)

Thus, maximizing the a posteriori model density p(u|d) is equivalent to maximizing the product of the noise density p(n) with the a priori model density p(u). Now, assuming that the probabilities of equation (12) are in exponential (actually Gaussian) form, it is convenient to define the following functionals:

S_n = C₁ − ln p(d|u),  (14)

and

    S_P = C_P − ln p(u).  (15)

The terms C_N and C_P are constants, independent of u, which may be freely adjusted to simplify the expressions for S_N and S_P. The MAP estimation problem is now expressed as the minimization problem

    min_{u_i} (S_N + S_P),  i = 1, …, M.  (16)

The subscript i indexes the separate elements of vector u = (u₁, u₂, …, u_M)^T. M is the total dimension of model space; for example, if u represents a velocity panel of 24 traces, each consisting of 1 000 samples, M equals 24 000. Later we use two subscripts p (slowness or inverse velocity) and τ (time) to index each element of u in the velocity panel. Let p(n) represent zero-mean, independent Gaussian noise with variance σ_n²:

    p(n) = exp [C_N − ½ n^T diag[σ_n]⁻² n],

so that

    S_N = ½ (Lu − d)^T diag[σ_n]⁻² (Lu − d).  (17)

If σ_n is constant, the factor σ_n⁻² may be taken out of the diagonal term in equation (17); a uniformly weighted least-squares functional remains. The gradient vector g_N of the noise functional S_N is then defined by its elements

    g_{N,i} = ∂S_N/∂u_i = (1/σ_n²) [L^T(Lu − d)]_i,  (18)

where again the subscript i is used to index the components of g. The normal equations L^T L u = L^T d result when the gradient is set to zero.

If the a priori model density p(u) is likewise assumed to be zero-mean, independent, and Gaussian, S_P is seen to equal ½ u^T diag[σ]⁻² u, where σ (a member of model space) is a vector made up of the standard deviations σ_i of u_i. The gradient g_P of functional S_P is consequently diag[σ]⁻² u, and normal equations (18) are modified to

    0 = g_P + g_N = diag[σ]⁻² u + (1/σ_n²) L^T(Lu − d).  (19)

The solution u to equations (19) is known as the stochastic inverse. The diagonal of the normal equations is weighted with the noise-to-signal ratio σ_n²/σ²; given enough noise, the diagonal term guarantees that solution u exists and is unique. However, a large noise variance allows the solution u to relax to zero. This is a consequence of the assumption that the mean of u (as a random variable) is zero; the stochastic inverse estimate in equations (19) is biased toward zero.

The variances σ² in equations (19) are deterministic (specified a priori). An alternative to supplying a deterministic variance is to allow σ to be a random variable defined in model space, which is logical if we wish to avoid providing such precise a priori information about u. What are the consequences of defining σ as a random variable? First, p(u) can be expressed in terms of the new densities p(σ) and p(u|σ):

    p(u) = ∫ p(σ) p(u|σ) dσ.  (20)

The assumptions made on our original model density may instead be applied to the conditional density p(u|σ). Specifically, let u_i (conditional upon σ_i) represent an independent, zero-mean, Gaussian process with variance σ_i²:

    p(u|σ) = ∏_{i=1}^{M} p(u_i|σ_i) = ∏_{i=1}^{M} [1/(√(2π) σ_i)] exp(−½ u_i²/σ_i²).  (21)

If we constrain each σ_i to be a member of the class of values {σ^(k) | k = 1, …, N}, we may think of the definition of equation (21) as partitioning the M points of the velocity panel u_i into N distinct populations. Each population is characterized by a 1-D Gaussian density with variance [σ^(k)]² and zero mean. For the moment, no constraints need be put on the density p(σ). In particular, p(σ) does not have to be separable into a product of 1-D densities p(σ_i).

Let us now calculate the gradient g_P of the sparseness functional S_P in equation (15). The ith term of gradient g_P is

    g_{P,i} = ∂S_P/∂u_i = −(1/p(u)) ∂p(u)/∂u_i = −(1/p(u)) ∫ p(σ) (∂/∂u_i) p(u|σ) dσ.  (22)

Inserting the expression for the partial derivative with respect to u_i of p(u|σ) [equation (21)] yields

    g_{P,i} = −(1/p(u)) ∫ p(σ) (−u_i/σ_i²) p(u|σ) dσ
            = u_i ∫ (1/σ_i²) [p(σ) p(u|σ) / p(u)] dσ
            = u_i ∫ (1/σ_i²) p(σ|u) dσ.  (23)

Bayes' rule was used in equation (23) to define the new conditional density p(σ|u). The last integral of equation (23) defines the mathematical expectation of σ_i⁻² conditional upon u:

    (σ_i^e)⁻² = E[σ_i⁻² | u].  (24)

Thus

    g_{P,i} = u_i / (σ_i^e)².  (25)

The mathematical expectation in equation (24) allows the estimated variance (σ_i^e)² to have an explicit functional dependence on the entire vector u, not only on the corresponding point u_i. The stochastic inverse can now be generalized to be the solution u of the set of equations resulting from setting the total gradient g_P + g_N to zero:

    0 = g_P + g_N = diag[σ_e]⁻² u + (1/σ_n²) L^T(Lu − d).  (26)

When σ_e is functionally independent of u, the equations are linear and become equivalent to the stochastic inverse (in the strict sense) of equation (19).

In summary, the derivation of the system of equations (26) directly from a MAP estimator has an advantage over the standard approach of starting from a least-squares functional, in that prior information about the solution can be incorporated via the specification of a rule for estimating variances as a function of u. The specification of this rule can be thought of

as a generalization of the usual explicit specification of a priori variances.

Clustering criteria

Recall that no explicit correlation is imposed between any two values u_i and u_j if p(u|σ) is defined by equation (21), since the covariance matrix (composed of elements σ_i²) is diagonal. Nevertheless, correlations between u_i and u_j may still arise, indirectly, by allowing correlations between their respective variances σ_i² and σ_j². For example, if a close correlation exists between σ_i and σ_j for points j "close" to i in the velocity panel, u_i and u_j will tend to be assigned to Gaussian populations of nearly the same variance. Elements of these populations (specific points i in the velocity panel) as a consequence will tend to cluster together on the velocity panel. The imposition of a strong correlation between random variables σ_i and σ_j for adjacent points i and j is a natural way to impose the property of clustering on the velocity panel u. This behavior is expected with a standard velocity analysis: high-amplitude events on a velocity panel are located about the root-mean-square (rms) velocity-time curve.

To suppress correlations between u_i and u_j, it is sufficient to make the random variables σ_i and σ_j independent. As a consequence, p(σ) and p(σ|u) are separable into products of 1-D densities, and equation (24) simplifies to

No clustering:

    (σ_i^e)⁻² = E[σ_i⁻² | u_i].  (27)

If the expected value of σ_i⁻² is made dependent only on elements u_j in the neighborhood of point i, equation (24) becomes

Clustering:

    (σ_i^e)⁻² = E[σ_i⁻² | u_j, j near i].  (28)

One implementation of definition (28) appropriate to velocity analysis is to estimate σ_{p,τ}² as the running average of u_{p,τ}² along the time axis: […] The double indices (p, τ) index, respectively, slowness and time on the velocity panel. By making the estimation of variance σ_{p,τ}² independent of the adjacent slowness traces u_{p′,τ}, we retain the independence of the solution u in slowness or velocity, while allowing a smooth variation in σ² along the time axis.

Sparseness criteria

To explain better how the conditional expectation E[σ_i⁻²|u] gives a sparse behavior to u, we assume for the moment that no clustering rules such as equation (28) are in effect and that the expectation is in the form of equation (27). We must define precisely what characteristics a "sparse" solution u exhibits; graphically, the solution to equations (26) has relatively few large elements scattered in a sea of small elements.

Equation (27) states that the expected value of σ_{p,τ}⁻² is dependent only upon the value u_{p,τ}; any formula chosen to estimate this expected value should likewise depend only upon the value of u at point (p, τ). Selecting such a formula fixes the functional form of the sparseness gradient g_P of equation (25); moreover, it establishes the form of the prior density p(u) in the MAP estimator.

As an example, consider the simplest choice for σ_{p,τ}^e corresponding to equation (27):

Rule 1:

    σ^e(u_{p,τ}) = |u_{p,τ}|.  (29)

What does this rule fix the corresponding gradient and prior density p(u_{p,τ}) to be? For the remainder of this section, subscripts (p, τ) are dropped; σ and u always refer to the point (p, τ) in the velocity panel. One element of the sparseness gradient g_P(u) from equation (25) is simply

    g_P(u) = 1/u.  (30)

When a gradient descent method is used to solve the nonlinear equations (26), the gradient must be continuous and must be bounded from below in order to allow a minimum functional value to exist. By itself, equation (30) is inadequate for use as the sparseness component to the full gradient g, because it becomes infinitely discontinuous at the origin u = 0. We impose the following requirement on the sparseness gradient: g_P(u) must go to zero continuously as u goes to zero. The easiest way to do this is to choose a lower limit σ_0² on the estimated variance σ²(u). In other words, a background population of points exists whose Gaussian distribution has zero mean and variance of σ_0². Likewise it is reasonable to assume that very large values of u belong to a zero-mean, Gaussian population with the upper limiting variance σ_∞². With these assumptions, rule 1 [equation (29)] is modified to

Rule 2:

    σ^e(u) = σ_0,    |u| ≤ σ_0,
           = |u|,    σ_0 ≤ |u| ≤ σ_∞,
           = σ_∞,    σ_∞ < |u|.  (31)

The estimated standard deviation σ in equation (31) is plotted as a function of u in Figure 6a. The gradient, shown in Figure 6b, is consequently

    g_P(u) = u/σ_0²,   |u| ≤ σ_0,
           = 1/u,      σ_0 ≤ |u| ≤ σ_∞,
           = u/σ_∞²,   σ_∞ < |u|.  (32)

Recall that the sparseness gradient was defined as the derivative of the logarithm of the a priori density p(u) [equation (15)]:

    p(u) = exp(−S_P),

    S_P(u) = ∫^u g_P(u′) du′ + C_P,  (33)

where S_P here is a single term of the sparseness functional S_P of equation (15) at the point (p, τ) and C_P is an appropriate

constant. Gradient g_P can now be integrated to give

    S_P(u) = C_1 + u²/(2σ_0²),                  |u| ≤ σ_0,
           = C_1 + 1/2 + ln(|u|/σ_0),           σ_0 ≤ |u| ≤ σ_∞,
           = C_1 + ln(σ_∞/σ_0) + u²/(2σ_∞²),    σ_∞ < |u|.  (34)

Consequently,

    p(u) = C_2 exp(−u²/(2σ_0²)),              |u| ≤ σ_0,
         = C_2 e^{−1/2} (σ_0/|u|),            σ_0 ≤ |u| ≤ σ_∞,
         = C_2 (σ_0/σ_∞) exp(−u²/(2σ_∞²)),    σ_∞ < |u|,  (35)

where the constants are chosen to normalize the probability density function p(u) in equation (35). Functions S_P(u) and p(u) are illustrated in Figures 6c and 6d, respectively. The prior density p(u) is easy to describe: it has a Gaussian shape for low and high values of u and, in the middle ranges, is proportional to 1/|u|.

The lower limiting variance σ_0² is needed to enforce the continuity of the derivative g_P at the origin u = 0. Similarly, the upper limiting variance σ_∞² forces the gradient to be linear at large values of u. Otherwise the gradient [in the form of equation (30)] would converge to zero as u → ∞. This tendency toward zero for arbitrarily large values of u imposes poor convergence characteristics upon any gradient descent method attempting to solve the stochastic system of equations (26).

An additional incentive to placing limits on the range of possible variances is to keep the resulting set of equations (26) well-conditioned. When the terms (σ_i^e)⁻² are considered elements of the diagonal matrix σ⁻², limiting the range of the elements to lie between σ_0⁻² and σ_∞⁻² places an upper limit on the condition number of σ⁻², namely, σ_∞²/σ_0².

FIG. 6. The prior density, sparseness functional, and gradient implied by the sparseness criterion of equation (31). (a) Expected value of the standard deviation σ given u, from equation (31). Variances are clipped by the minimum and maximum values σ_0 and σ_∞. Here, σ_0 = 0.5 and σ_∞ = 2.0. (b) 1-D gradient g_P(u) of the sparseness functional, equation (32). The gradient is linear outside the range (σ_0, σ_∞), and decays as u⁻¹ within this range. (c) The sparseness functional S_P(u), the integral of the gradient in (b), from equation (34). The slope of this potential "well" is steepest at u = σ_0. If there were no upper limit σ_∞, the gradient of this surface would vanish as u → ∞, and there would be no incentive for a descent routine to drive large values of u smaller. (d) The resulting 1-D prior density p(u), the exponent of the sparseness functional [equation (35)]. The two dashed curves are the limiting Gaussian envelopes whose standard deviations are σ_0 and σ_∞. The curve between these envelopes has an inverse-u decay.

The multidimensional sparseness functional S_P(u)

The previous section described various choices for a 1-D sparseness functional S_P(u_{p,τ}). When correlations are allowed between points in model space, the total sparseness functional S_P(u) is no longer simply the sum of independent terms S_P(u_{p,τ}). Let us now examine briefly what form S_P(u) takes when the formula for the conditional expectation E[σ_{p,τ}⁻² | u] is given by equation (28). The standard deviation σ_j will be estimated by σ̂_j, a spatial average over the neighbors of u_j:

    σ̂_j² = Σ_k w_{jk} u_k²,  (36)

where the values w_{jk} comprise a local smoothing filter. All coefficients w_{jk} are positive, symmetric, and sum to unity. If S_P(u) is assumed to depend upon u indirectly through the estimated variances (σ̂_1, …, σ̂_M), that is, S_P(u) = S̃_P(σ̂) for some function S̃_P(σ̂), then S_P may be differentiated by the chain rule to give the gradient

    g_{P,i} = Σ_j (∂S̃_P/∂σ̂_j)(∂σ̂_j/∂u_i).  (37)

However, from equation (36),

    ∂σ̂_j/∂u_i = w_{ji} u_i / σ̂_j,  (38)

so that

    g_{P,i} = u_i Σ_j (∂S̃_P/∂σ̂_j)(w_{ji}/σ̂_j).  (39)

The desired form for g_{P,i}(u_i), given by equation (25), is

    g_{P,i} = u_i / (σ_i^e)².  (40)

To make the expression for the gradient in equation (39) consistent with the gradient in equation (40), we define the estimated variance and functional derivative as

    (σ_i^e)⁻² = Σ_j w_{ij} σ̂_j⁻²,  (41a)

and

    ∂S̃_P/∂σ̂_j = 1/σ̂_j.  (41b)

These assumptions yield the following functional dependence of (σ_i^e)² on u:

    (σ_i^e)² = [Σ_j w_{ij} (Σ_k w_{jk} u_k²)⁻¹]⁻¹.  (42)
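On a 1-D grid with a boxcar for the weights w, the doubly smoothed variance estimate of equation (42) reduces to two running averages and two reciprocals. The sketch below is our illustration, not code from the paper; the boxcar filter and the threshold values σ_0 = 0.5 and σ_∞ = 2.0 (borrowed from the Figure 6 example) are assumptions.

```python
import numpy as np

def local_variance(u, half_width=2, sigma0=0.5, sigma_inf=2.0):
    """Estimate (sigma_e)^2 per equation (42): the inverse of a local
    average of the inverse of a local average of u**2."""
    n = 2 * half_width + 1
    w = np.ones(n) / n                               # boxcar smoothing filter w_jk
    inner = np.convolve(u**2, w, mode="same")        # sigma_hat_j^2 = sum_k w_jk u_k^2
    inner = np.clip(inner, sigma0**2, sigma_inf**2)  # threshold variances (text after eq. 43)
    outer = np.convolve(1.0 / inner, w, mode="same") # sum_j w_ij / sigma_hat_j^2
    return 1.0 / outer                               # (sigma_e)^2

u = np.zeros(50)
u[25] = 10.0                                         # one isolated spike
var = local_variance(u)
```

Away from the spike the estimate relaxes to the floor σ_0², while near the spike it saturates at σ_∞²; this is the mechanism that lets the sparseness gradient treat background samples and events differently.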

That is, of is the average of the inverse of the average of ut If of entropy. In fact. expression (43) is precisely Shannon’s en-
the weights wij are smooth enough, I$ might be estimated tropy measure for a statistically independent. Gaussianly dis-
more easily by taking the local average of I$. tributed set of N random variables (Burg, 1975, section III-A).
By integrating equation (41a), we obtain the following ex- Our problem has a direct physical analogy to that of a
pression for S, (II): vibrating lattice of atoms. the so-called Einstein model (Careri,
1984). Each atom occupies a potential well. with the potential
S,(u) = S,(P) = C In fij + C, energy of each atom being a quadratic function of its displace-
ment from the center of the well. At high temperatures, the
= 4 1 In (C wjk u:) + C, (43) motions of all atoms are independent. so no direct correlations
I k
of motion exist between lattice sites. In this case the statistical
The threshold variances oi and cr’, can be easily introduced if state of the lattice is fully described by the mean displacements
fif is constrained to lie between them. Using equation (42) for of all atoms from their rest positions. that is. their variances.
the estimation of of, the gradient becomes Assume that the material is thermally isolated from its sur-
roundings. and that its total thermal energy is conserved. By
examining the expression for entropy given in equation (43),
we see that the material has highest entropy when the distri-
bution of vibrational energy throughout the lattice is most
uniform. or when pi is nearly constant throughout. On the
other hand. the entropy of the crystal can go negative without
Consequently, S, takes the form
bounds as the vibrational energy characterized by fij(uj) be-
comes concentrated at only a few, points in the lattice.
In using stochastic inversion. our goal has been to con-
centrate the solution in model space to as local an area as
possible: stochastic inversion. as we defined it. is a process
that drives the solution to rninir,~umrnrrop~*.
Of course. by specifying a low-end cutoff variance cri, we
arc adding the constraint that entropy cannot go to negative
in which the 0; are given by equation (36). The constants infinity. In terms of our analogy of the crystal lattice, atoms in
C, = In cri and C, = In o’, are required to make S, continu- all parts of the lattice must be allowed to retain a residual
ous at the threshold points o. and 0-r This multidimensional amount of vibrational energy.
version of S,(u) is the simplest generalization to the 1-D spar- Burg (1975) proposed maximizing the entropy measure of
senessfunctional gP(u) given by equation (34). equation (43) to make the variances 0: as uniform as possible
within the constraints imposed by the original time series data
(in the form of expected values). Our only justification for
Scale invariance and entropy mirCmi-_img the entropy functional of equation (43) under the

i-
constraints imposed by the recorded data is that we expect the
If u is uniformly scaled by the factor u, the 1-D sparseness solution to be sparse and clumped. The maximum entropy
functional of equation (34) becomes methods developed by Jaynes (1957). Burg (1975), and Shore
C, u2/2a202 aoo>lul and Johnson (1980) attempt to draw the least amount of infer-
0
ence from the available data. Minimum entropy methods,
S,(u) = C, + l/2 + In (I u I/so,) ao,Ilulsao,, such as the present method, mimic more closely human
C, + In (0, loo) + u2/‘2u205, thought processes in their attempt to draw the greatest
uo, < Iul
amount of inference from the information at hand.
(46)

where u now refers to the new, resealed value. Because of the SLANT-STACK STOCHASTIC INVERSION
presence of the logarithm, the gradient in the middle range of
u remains unchanged (equal to l/u); scaling affects the gradi- We can now apply the theory explained in the last section
ent only in redefining the threshold standard deviations o. directly to the problem of slant-stack inversion. Let & now be
and o, to, respectively, ao, and ao, In this respect the the slant-stack operator L,, defined in equation (10); LI is
sparseness functional is scale-invariant. If the limiting values subsequently given by equation (9):
ao, and ao, still bracket the range of “interesting” ampli-
Slant L,,:
tudes of u, resealing u will not markedly affect the value of the
functional. PZ
The multidimensional functional S,(u) derived in the last
d,,(h, f) = dp U(P, f - PN, (10)
sPI
section reduces to a simple form when o. and or are
ignored-the extensive quantity and

Slant LTs:
C ln Pi
hz
The variable 0, is an estimate of the standard deviation of the %,(P, 7) = dh d(h, T + ph). (9)
shl
local population about the point uj. There is a close relation-
ship between this expression for S,(u) and formal definitions If events in the data domain exhibit linear moveout, the slant-

stack stochastic inverse should be able to cluster sparse events in the slant-stack (model) domain. This justifies use of a sparseness measure, such as S_P(u) of equation (45), whose gradient can be used to force the inverse to be sparse. Although the transformation L_s from model space to data space in this section differs from the L of equation (6), the same sparseness functional and the same algorithm can be used to get an approximate inverse u ≈ L_s⁻¹ d to the data.

Figure 7 illustrates a portion of a VSP, courtesy of ARCO Oil and Gas Co., in a 600 m window in depth from 0.8 to 1.4 km. Four distinct families of events are visible on Figure 7a. The strongest events, traveling downward at speeds of 3.0 to 3.6 km/s, are the direct compressional waves from the source. Barely visible are events with the same velocity as the downgoing waves, but with opposite dip; they are compressional waves reflected from the sedimentary interfaces near the wellbore. The direct downgoing and reflected upcoming waves, the events of interest on a VSP, provide velocity and reflectivity information on the sediments in the vicinity of the wellbore.

The remaining two families of events are upgoing and downgoing waves with a propagation speed of approximately 1.5 km/s. They are tube waves, i.e., interface waves which arise from the presence of the cylindrical interface between the well casing and the borehole fluid. A reflected (upcoming) tube wave can be seen to occur at approximately the 1 350 m level, caused by a change in casing at that depth (DiSiena et al., 1980). Since these events are restricted almost entirely to within the wellbore, they provide no useful information about the surrounding geology, but serve only to mask the direct compressional events.

It would be desirable to filter the tube waves out of the VSP, but this cannot be done directly because of the unfortunate coincidence of aliased tube-wave energy with the direct-wave energy in the Fourier domain. Let us consider the aliasing problem in more detail by slant stacking the VSP (Figure 7b). Energy on the slant stack clusters, as expected, at slownesses of ±0.6 and ±0.3 s/km, corresponding to the respective velocities of the tube waves and the compressional waves. Unfortunately, there is a large amount of overlap between the aliased events of one wave type and the unaliased events of the other wave type.

The aliasing on the slant stack L_s^T d in Figure 7b is quite strong; this can be seen more clearly by considering the slant stack of a simple synthetic model for u (Figure 8). This synthetic model consists of the downgoing direct waveform and downgoing tube waveforms sampled, at their true velocities, from the slant stack of Figure 7b. In Figure 8, the synthetic data set in panel (b) was created by slant stacking (with L_s) the synthetic u shown in panel (a). The forward stack (L_s^T d) of the synthetic data is illustrated in panel (c). Comparing the two stacks L_s^T d in Figures 7b and 8c, we see that the pattern of events on the synthetic stack matches quite closely the pattern on the real data stack. However, it is obvious that all of the energy outside the heavily outlined boxes of the synthetic stack in Figure 8c is aliasing artifacts. They are the result of constructive interference between events on the synthetic data panel at dips other than the event's true dip. It is an interesting coincidence that the peaks in the power of the aliased events seem to lie precisely on top of the unaliased events, at the slowness values of ±0.6 and ±0.3 s/km.

Vertical seismic profiles are naturally more susceptible to aliasing problems than CMP gathers, because each trace is the recording of a single geophone in the well. Each trace on a CMP gather is usually the output of an extended geophone or hydrophone array whose purpose is to filter out precisely those events that might otherwise be aliased on the data.

The imaging power of stochastic inversion is demonstrated by the estimated stochastic inverse u of the synthetic data, shown in Figure 9; u is the result of twenty iterations of the same steepest-descent algorithm used to generate the stochastic inverse of the velocity stack, illustrated in Figures 1 to 5. The estimate of u is shown in Figure 9a, and the envelope of u, i.e., σ(u), is shown in Figure 9b. The stochastic inverse has the ability to reduce the aliased power by relegating it to its proper, unaliased location. There remain some events which we can be certain are artifacts, because of the physical constraints imposed on event velocities in the VSP. Nevertheless, the ability of the stochastic inverse u to model the original data is remarkable. Figure 10 compares the original data d with the estimated data L_s u and with the residual d − L_s u. The fit is impressive, considering that L_s u is the result of 20 iterations of the algorithm on a system of equations of dimension 15 000. For a detailed description of the gradient-descent algorithm used in these examples, see Thorson (1984).

(a) VSP. (b) Slant stacked VSP.

FIG. 7. Slant stack on a depth window of a VSP. The data in panel (a), courtesy of ARCO, consist of 21 traces taken from a VSP between 800 m and 1 370 m with a depth interval of 30 m. Panel (b) is a slant stack over this window; the slowness interval is 0.026 s/km, and the time sampling interval is 4 ms. Aliased energy dominates the slant stack.
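In discrete form the slant-stack pair of equations (10) and (9) is a shift-and-sum over traces. The following is a minimal nearest-sample sketch of our own; the grids, sampling interval, and offsets in the accompanying check are made up, and production code would interpolate rather than round the shifts.

```python
import numpy as np

def slant_stack_forward(u, ps, hs, dt):
    """L_s: d(h, t) = sum_p u(p, t - p*h), the discrete analogue of equation (10)."""
    nt = u.shape[1]
    d = np.zeros((len(hs), nt))
    for ip, p in enumerate(ps):
        for ih, h in enumerate(hs):
            shift = int(round(p * h / dt))       # samples of moveout p*h
            if 0 <= shift < nt:
                d[ih, shift:] += u[ip, :nt - shift]
            elif -nt < shift < 0:
                d[ih, :nt + shift] += u[ip, -shift:]
    return d

def slant_stack_adjoint(d, ps, hs, dt):
    """L_s^T: u(p, tau) = sum_h d(h, tau + p*h), the discrete analogue of equation (9)."""
    nt = d.shape[1]
    u = np.zeros((len(ps), nt))
    for ip, p in enumerate(ps):
        for ih, h in enumerate(hs):
            shift = int(round(p * h / dt))       # same rounded shift as the forward
            if 0 <= shift < nt:
                u[ip, :nt - shift] += d[ih, shift:]
            elif -nt < shift < 0:
                u[ip, -shift:] += d[ih, :nt + shift]
    return u
```

Because both routines use the same rounded shift, they are exact transposes of one another, so the inner-product identity of Appendix A, ⟨L_s u, d⟩ = ⟨u, L_s^T d⟩, holds to machine precision.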

(a) Model. (b) Synthetic VSP. (c) Slant stacked synthetic.

FIG. 8. Aliasing artifacts on a synthetic VSP. Panel (b) is a synthetic model of the VSP of Figure 7a. It was created by slant stacking panel (a); the two nonzero traces in panel (a) were actually pulled from the slant stack of Figure 7b. Panel (c) is the slant stack (L_s^T d) of the synthetic data. The right panel may be compared to the slant stack of Figure 7b. The only "legitimate" events on the slant stack are those enclosed in the boxes with the heavy outline, i.e., those which also occur on the model in panel (a). The aliasing artifacts on the synthetic slant stack match closely the artifacts seen on the real slant stack in Figure 7b.

(a) Stochastic inverse. (b) Variance estimate.

FIG. 9. Stochastic inverse of the slant stacked VSP. The estimate u in panel (a) is the result of applying 20 iterations of the stochastic inverse algorithm to the real data of Figure 7a. Panel (b) is an estimate of σ(u), the envelope of u. Most of the energy has been concentrated in the unaliased regions of the slant stack representing the four dominant dips present on the VSP.
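The iterations referred to above descend on the total gradient of equation (26). The sketch below is a deliberately simplified caricature of ours, not a transcription of Thorson's (1984) algorithm: L is any dense matrix standing in for the stack operator, the step size is fixed, and the variance rule is rule 2 of equation (31).

```python
import numpy as np

def stochastic_inverse(L, d, niter=200, sigma_n=1.0,
                       sigma0=0.5, sigma_inf=5.0, step=0.05):
    """Gradient descent on equation (26):
    g = diag(sigma_e)^-2 u + (1/sigma_n^2) L^T (L u - d),
    with sigma_e(u) = clip(|u|, sigma0, sigma_inf) as in rule 2."""
    u = np.zeros(L.shape[1])
    for _ in range(niter):
        sigma_e = np.clip(np.abs(u), sigma0, sigma_inf)   # equation (31)
        g = u / sigma_e**2 + (L.T @ (L @ u - d)) / sigma_n**2
        u -= step * g
    return u

L = np.eye(3)                      # toy stand-in operator
d = np.array([2.0, 0.0, -1.0])     # toy data
u = stochastic_inverse(L, d)
```

With L = I and σ_n = 1, the fixed point satisfies u[1 + σ_e(u)⁻²] = d, so the recovered u is shrunk toward zero relative to d; this is the zero-mean bias noted after equation (19).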

SUMMARY: CONDITIONS FAVORABLE TO INVERSE VELOCITY STACKING

The stochastic inversion process described is basically a dip-decomposition scheme. It assumes that the given data set is a linear superposition of events with hyperbolic moveout (or linear moveout, as the case may be). The validity of this assumption can be destroyed easily by applying an automatic gain correction to the data, whose function is to make events visible by gaining the traces in a nonuniform manner. The examples considered all avoided this problem; instead of an automatic gain, identical amplitude corrections were applied with a gain proportional to t².

The stochastic inversion of equation (26) is a global process; it does not model changes in amplitude or phase along the reflector from one offset to another very well. Up to a point it can accommodate waveform variations with offset because it reproduces smooth variations fairly well, but this is done at the expense of the image's sharpness in velocity space. A replacement for the velocity stacking operator L which is less global in nature will certainly be more successful when lateral velocity variations create large deviations in moveout from the hyperbolic-moveout ideal.

FIG. 10. Restoration of the VSP. The center panel (b) is the slant stack L_s u of the inverse u already shown in Figure 9a. L_s u models the data d very well; the data set d of Figure 7 is replotted, for comparison, in panel (a). On the right, panel (c) displays the difference, L_s u − d, between the two panels (a) and (b).

ACKNOWLEDGMENTS

This work was supported by the sponsoring members of the Stanford Exploration Project. Among the sponsors we especially thank ARCO and Western Geophysical Co. for their generous contribution of the field data sets used here.

REFERENCES

Aki, K., and Richards, P. G., 1980, Quantitative seismology, theory and methods, II: W. H. Freeman and Co.
Bard, Y., 1974, Nonlinear parameter estimation: Academic Press, Inc.
Blackman, R. B., and Tukey, J. W., 1958, The measurement of power spectra: Dover Publications, Inc.
Burg, J. P., 1975, Maximum entropy spectral analysis: Ph.D. thesis, Stanford Univ.
Careri, G., 1984, Order and disorder in matter: Benjamin/Cummings.
Deans, S. R., 1983, The Radon transform and some of its applications: John Wiley and Sons, Inc.
DiSiena, J. P., Byun, B. S., and Fix, J. E., 1980, Vertical seismic profiling: a processing analysis case history: Presented at the 50th Ann. Internat. Mtg. and Expos., Soc. Explor. Geophys., Houston.
Gazdag, J., 1980, Wave equation migration with the accurate space derivative method: Geophys. Prosp., 28, 60-70.
Hale, D., 1984, Dip-moveout by Fourier transform: Geophysics, 49, 741-758.
Jacobs, A. S., 1982, The pre-stack migration of profiles: Ph.D. thesis, Stanford Univ.
Jaynes, E. T., 1957, Information theory and statistical mechanics II: Phys. Rev., 108, 171-190.
Judson, D. R., Schultz, P. S., and Sherwood, J. W. C., 1978, Equalizing the stacking velocities of dipping events via Devilish: Presented at the 48th Ann. Internat. Mtg. and Expos., Soc. Explor. Geophys., San Francisco.
Lanczos, C., 1961, Linear differential operators: D. Van Nostrand Co.
Larner, K., 1979, Optimum-weight CDP stacking: Western Geophysical Co., unpublished notes.
Musicus, B. R., 1982, Iterative algorithms for optimal signal reconstruction and parameter identification given noisy and incomplete data: Ph.D. thesis, Mass. Inst. of Tech.
Schultz, P. S., and Sherwood, J. W. C., 1980, Depth migration before stack: Geophysics, 45, 376-393.
Shore, J. E., and Johnson, R. W., 1980, Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy: Inst. Elect. and Electron. Eng., Trans. Inf. Theory, IT-26, 26-37.
Thorson, J. R., 1984, Velocity stack and slant stack inversion methods: Ph.D. thesis, Stanford Univ.
Ulrych, T. J., and Clayton, R. W., 1976, Time series modelling and maximum entropy: Phys. of the Earth and Plan. Int., 12, 188-200.
Yilmaz, O., and Claerbout, J. F., 1980, Prestack partial migration: Geophysics, 45, 1753-1779.

APPENDIX A

ADJOINT PAIRS FOR VELOCITY STACKS AND SLANT STACKS

In the operator notation of the text, L^T may refer to either a slant stack or a velocity stack. We now address the question: what are the operators L that are adjoint to L^T for each choice of stacking operator? The definition of an adjoint implies that an inner product exists in each of the input (data) and output (model) spaces of L^T. (·,·)_p denotes the inner product in model space; likewise an h subscript denotes data space. Operators L and L^T are adjoint if they satisfy

    (u, L^T d)_p = (Lu, d)_h,  (A-1)

for all u in the (p, τ) domain and all d in the (h, t) domain that have finite norms. For slant stacks, the inner products are defined with unit weighting:

Slant:

    (u_1, u_2)_p = ∫∫ dp dτ u_1(p, τ) u_2(p, τ),  (A-2)

and

    (d_1, d_2)_h = ∫∫ dh dt d_1(h, t) d_2(h, t).  (A-3)

The adjoint L in the case of slant stacking is found by combining relation (A-1) with the definition for the slant stack, equation (9), and interchanging the order of integration:

    (u, L^T d)_p = ∫∫ dp dτ u(p, τ) ∫ dh d(h, τ + ph)
                 = ∫∫∫ dp dt dh u(p, t − ph) d(h, t)
                 = ∫∫ dh dt d(h, t) ∫ dp u(p, t − ph).  (A-4)

A change of variable t = τ + ph was made, and the order of integration may be interchanged if all of the integrals are assumed to exist; that is, assuming

    ||u||_p² = (u, u)_p and ||d||_h² = (d, d)_h

are both finite. The form for Lu is apparent in the last expression of equation (A-4).

Summarizing, a consistent slant-stack adjoint pair L and L^T can be defined as

Slant L_s:

    d_s(h, t) = ∫_{−∞}^{∞} dp u(p, t − ph),  (A-5)

and

Slant L_s^T:

    ũ_s(p, τ) = ∫_{−∞}^{∞} dh d(h, τ + ph).  (A-6)

The only difference between a slant stack and its adjoint is thus seen to be a change in sign of the dips over which summation takes place. For velocity stacks, the inner products carry linear weights in time:

Velocity:

    (d_1, d_2)_h = ∫∫ t dh dt d_1(h, t) d_2(h, t),  (A-7)

and

    (u_1, u_2)_p = ∫∫ τ dp dτ u_1(p, τ) u_2(p, τ).  (A-8)

The adjoint is then found by combining these inner product definitions with equation (3), the definition for the velocity stack:

    (Lu, d)_h = (u, L^T d)_p
              = ∫_0^∞ ∫ τ dp dτ u(p, τ) ∫ dh d(h, √(τ² + p²h²))
              = ∫∫∫ τ dp dτ dh u(p, τ) d(h, √(τ² + p²h²)).  (A-9)

Now let t = √(τ² + p²h²); then τ = √(t² − p²h²) and dτ = (t/τ) dt, so that

    (Lu, d)_h = ∫∫ dp dh ∫_{ph}^{∞} t dt u(p, √(t² − p²h²)) d(h, t).  (A-10)

If we assume that u(p, √(t² − p²h²)) = 0 for 0 ≤ t ≤ ph [that is, u(p, τ) is zero for an imaginary time argument τ], the lower limit on the time integral may be replaced by zero, giving

    (Lu, d)_h = ∫_0^∞ dh ∫_0^∞ t dt d(h, t) ∫_0^∞ dp u(p, √(t² − p²h²)).  (A-11)

The integral form for Lu can be identified as the innermost integral of equation (A-11), by comparing this equation with the previous definition for (·,·)_h given in equation (A-7). As in the slant-stack case, all integrals are assumed to exist. We summarize the velocity stack case:

Velocity L:

    d_v(h, t) = ∫_0^∞ dp u(p, √(t² − p²h²)),  (A-12)

and

Velocity L^T:

    ũ_v(p, τ) = ∫_0^∞ dh d(h, √(τ² + p²h²)).
mation is effected.
To derive an adjoint L to the velocity stack operator, it is
%s(P, 7) =
s
0
“dh d(h, dm).

The transpose to NM0 and stacking can be recognized as


(A-13)

convenient to define first an inner product weighted by the reverse NM0 and stacking. All points lying on the common
time variable in each space: elliptical path defined by r2 = t2 - p2h2 are summed into one
offset. Programming the transpose operation is virtually iden-
Velocity
m tical to programming the original operation of slant stacking
or velocity stacking; the only difference between the routines
(u*, up = T dp dr u,(p, T)Uz(P, @, (A-7) lies in the type of moveout stretch to apply to each trace
ss
0 before stacking.
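A discrete sketch of the slant-stack pair can make the adjoint relation concrete. The Python fragment below is an illustration only, not the implementation described in the text: the grids, units, and nearest-sample shifting are assumptions. Because the two routines apply the same sample shift in opposite directions, each is the exact transpose of the other, and the dot-product test of equation (A-1) holds to machine precision.

```python
import numpy as np

def slant_stack_forward(u, p_vals, h_vals, dt):
    """L of (A-5): d(h, t) = sum_p u(p, t - p*h), nearest-sample shifts."""
    n_p, nt = u.shape
    d = np.zeros((len(h_vals), nt))
    for ih, h in enumerate(h_vals):
        for ip, p in enumerate(p_vals):
            s = int(round(p * h / dt))          # moveout shift in samples
            if 0 <= s < nt:
                d[ih, s:] += u[ip, :nt - s]
            elif -nt < s < 0:
                d[ih, :nt + s] += u[ip, -s:]
    return d

def slant_stack_adjoint(d, p_vals, h_vals, dt):
    """L^T of (A-6): u(p, tau) = sum_h d(h, tau + p*h) -- same shift, opposite sign."""
    n_h, nt = d.shape
    u = np.zeros((len(p_vals), nt))
    for ih, h in enumerate(h_vals):
        for ip, p in enumerate(p_vals):
            s = int(round(p * h / dt))
            if 0 <= s < nt:
                u[ip, :nt - s] += d[ih, s:]
            elif -nt < s < 0:
                u[ip, -s:] += d[ih, :nt + s]
    return u

# Dot-product test of (A-1): (Lu, d)_h == (u, L^T d)_p on random fields.
rng = np.random.default_rng(1)
p_vals = np.linspace(-0.3, 0.3, 7)    # slopes in s/km (assumed grid)
h_vals = np.linspace(-1.0, 1.0, 9)    # offsets in km (assumed grid)
dt, nt = 0.01, 250
u = rng.standard_normal((len(p_vals), nt))
d = rng.standard_normal((len(h_vals), nt))
lhs = np.sum(slant_stack_forward(u, p_vals, h_vals, dt) * d)
rhs = np.sum(u * slant_stack_adjoint(d, p_vals, h_vals, dt))
assert np.isclose(lhs, rhs)
```

The sign flip of the shift between the two routines is exactly the "change in sign of the dips" noted after equation (A-6).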

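The velocity-stack pair of equations (A-12) and (A-13) can be sketched the same way. In the illustrative fragment below (again with assumed grids, not the text's implementation), one moveout loop serves both directions, echoing the closing remark that programming the transpose is virtually identical to programming the forward operation. Since the adjoint here is the literal transpose of the discrete forward, the unweighted dot-product test passes exactly; the $t$ and $\tau$ weights of equations (A-7) and (A-8) pertain to the continuous operators.

```python
import numpy as np

def velocity_stack(x, p_vals, h_vals, nt, dt, adjoint=False):
    """Forward (adjoint=False): NMO and stack, d(h,t) = sum_p u(p, sqrt(t^2 - p^2 h^2)).
    Adjoint (adjoint=True): reverse NMO and stack, scattering d(h,t) back to u(p,tau)
    along the ellipse tau^2 = t^2 - p^2 h^2. x is u (n_p, nt) or d (n_h, nt)."""
    out = np.zeros((len(p_vals), nt)) if adjoint else np.zeros((len(h_vals), nt))
    t = np.arange(nt) * dt
    for ih, h in enumerate(h_vals):
        for ip, p in enumerate(p_vals):
            tau_sq = t**2 - (p * h)**2
            it = np.nonzero(tau_sq >= 0.0)[0]         # skip imaginary moveout times
            itau = np.rint(np.sqrt(tau_sq[it]) / dt).astype(int)
            if adjoint:
                np.add.at(out[ip], itau, x[ih, it])   # reverse NMO + stack over h
            else:
                out[ih, it] += x[ip, itau]            # NMO + stack over p
    return out

# Dot-product test: the discrete transpose matches the discrete forward exactly.
rng = np.random.default_rng(2)
p_vals = np.linspace(0.0, 0.4, 6)     # slownesses in s/km (assumed grid)
h_vals = np.linspace(0.0, 2.0, 8)     # offsets in km (assumed grid)
dt, nt = 0.01, 200
u = rng.standard_normal((len(p_vals), nt))
d = rng.standard_normal((len(h_vals), nt))
lhs = np.sum(velocity_stack(u, p_vals, h_vals, nt, dt) * d)
rhs = np.sum(u * velocity_stack(d, p_vals, h_vals, nt, dt, adjoint=True))
assert np.isclose(lhs, rhs)
```

As claimed in the text, the only difference between the two branches is the direction in which the moveout mapping between $t$ and $\tau$ is applied to each trace.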