
11-1

Seismic Inversion and AVO


applied to Lithologic Prediction

Part 11 - Using multiattribute
transforms to predict log properties
from seismic data
11-2
Introduction to multiattribute analysis
The objective of multiattribute analysis is to:

find a relationship between well log and seismic data at the well
locations, and
use this relationship to predict or estimate a volume of the log
property at all locations of the seismic volume.

The data that is needed includes:

A seismic volume (usually 3D).
A series of wells which tie the volume.
A target log in each well, such as porosity, which is to be predicted.
Information in each well for converting from depth to time, usually in
the form of a check-shot corrected sonic log.
(Optional) One or more external attributes in the form of seismic
3D volumes.
11-3
Theoretically, any type of log property may be used as a target.

Practically, the following types have been successfully predicted:

P-wave velocity
Porosity
Density
Gamma-ray
Water saturation
Lithology logs

The only requirement is that an example of the target log must exist
within each of the wells.

Since multiattribute analysis assumes that the target log is noise-free, it
is usually important to edit the target logs before applying the process.

Since we will be correlating the target logs with seismic data, the proper
depth-to-time conversion is critical. For this reason, check-shot
corrections and manual correlation are usually necessary.
Introduction to multiattribute analysis
11-4
Multiattribute analysis can be thought of as an extension of
conventional post-stack inversion:

Inversion:
Uses seismic and well log data.
Predicts a volume of acoustic impedance.
Uses the convolutional model to relate logs with seismic.
Requires the extraction of the wavelet.
Operates on post-stack seismic data.
May be used with very few wells (as few as one).
The result is validated by creating a synthetic seismic section which matches the real data.
The effective resolution is limited by the seismic bandwidth.

Multiattribute analysis:
Uses seismic and well log data.
Predicts a volume of any log property.
Does not use any a priori model. Instead, determines an arbitrary relationship statistically.
Does not require wavelet extraction. Effectively, the wavelet is part of the derived relationship.
Operates on attributes of the seismic data, possibly including pre-stack attributes.
Requires sufficient well control (at least six wells).
The result is validated by hiding wells and predicting them from other wells.
The resolution may be enhanced by neural network analysis.
11-5
Seismic Attributes are transforms, generally non-linear, of a seismic
trace.

There are two types of attributes:

Sample-based: calculated from the trace on a sample-by-sample
basis.
Example: amplitude envelope.

Horizon-based: calculated as averages between two horizons.
Example: average porosity between two horizons.

The attributes used in this section are all sample-based.
Seismic attributes
11-6
Examples of seismic attributes:

Seismic attributes
11-7
Attributes derived directly from the seismic traces can be grouped
into the following categories:

Instantaneous attributes
Windowed frequency attributes
Filter slices
Derivative attributes
Integrated attributes
Time (a linear ramp)

We will look at the theory of each group of attributes with seismic
examples.

In our case study we will also use seismic attributes that cannot be
calculated internally because:
They are proprietary to a certain company (e.g., coherency)
They are too complicated (e.g., seismic inversion, AVO attributes, etc.)
Seismic attribute summary
11-8
(Figure: a seismic trace s(t), its Hilbert transform h(t), the amplitude
envelope A(t), and the instantaneous phase φ(t), plotted against time.)
Instantaneous attributes
Instantaneous attributes were first described in the classic paper by Taner et al.
(Geophysics, June, 1979). They are computed from the complex trace, C(t), which
is composed of the seismic trace, s(t), and its Hilbert transform, h(t), which is like
a 90°-phase-shifted trace. Writing the complex trace in polar form, as shown
below, gives us the two basic attributes: the amplitude envelope, A(t), and the
instantaneous phase, φ(t). (Note that the term instantaneous amplitude is used
synonymously with amplitude envelope.)
C(t) = s(t) + i h(t) = A(t) e^(i φ(t)) = A(t) cos φ(t) + i A(t) sin φ(t),   where i = √(-1)

A(t) = √( s(t)² + h(t)² )

φ(t) = tan⁻¹[ h(t) / s(t) ]
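These relationships can be checked numerically. The sketch below is illustrative only (the synthetic trace and all parameter values are invented, not from the course); it uses scipy.signal.hilbert to build the complex trace and recover the envelope, phase, and instantaneous frequency:

```python
import numpy as np
from scipy.signal import hilbert

# Synthetic "trace": a 30 Hz carrier with a Gaussian envelope, 2 ms sampling.
dt = 0.002
t = np.arange(0, 0.5, dt)
s = np.exp(-((t - 0.25) ** 2) / 0.005) * np.sin(2 * np.pi * 30.0 * t)

# Complex trace C(t) = s(t) + i h(t); scipy returns it directly.
C = hilbert(s)
A = np.abs(C)        # amplitude envelope A(t) = sqrt(s^2 + h^2)
phi = np.angle(C)    # instantaneous phase phi(t) = arctan(h/s), radians

# Instantaneous frequency (Hz): time derivative of the unwrapped phase.
freq = np.diff(np.unwrap(phi)) / (2.0 * np.pi * dt)
```

Near the centre of the wavelet the instantaneous frequency recovers the 30 Hz carrier, and the envelope bounds the trace from above everywhere, as the polar-form definitions require.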
11-9
A third basic attribute is the instantaneous frequency, which is the
time derivative of the instantaneous phase. In equation form, we can
write:
Other instantaneous attributes can be made from combinations of
the three basic attributes, as shown below:
Finally, the apparent polarity attribute is the amplitude envelope
multiplied by the sign of the seismic sample at its peak value,
applied in a segment between the troughs on either side of this peak.
ω(t) = dφ(t)/dt = instantaneous frequency

cos φ(t) = cosine of instantaneous phase
A(t) cos φ(t) = amplitude weighted cosine of instantaneous phase
A(t) φ(t) = amplitude weighted phase
A(t) ω(t) = amplitude weighted frequency
Instantaneous attributes
11-10
Now, let's look at examples of each of the instantaneous attributes
applied to inline 95 from the input 3D volume. The line is shown below
in colour amplitude form with wiggle trace overlay. The sonic log from
well 08-08 has also been overlain.
Instantaneous attributes
11-11
Instantaneous phase of inline 95.
Amplitude envelope of inline 95.
Instantaneous attributes
11-12
Amplitude weighted cosine
phase of inline 95.
Cosine of instantaneous
phase of inline 95.
Instantaneous attributes
11-13
Amplitude weighted phase
of inline 95.
Apparent polarity of inline 95.
Instantaneous attributes
11-14
Instantaneous frequency
of inline 95.
Amplitude weighted
frequency of inline 95.
Instantaneous attributes
11-15
Windowed frequency attributes

A second set of attributes is based on a windowed frequency
analysis of the seismic trace. In this process, the Fourier transform
of each seismic trace is taken over an N-sample window.

From this window, either the average frequency or the dominant
frequency is chosen, and this value is placed at the center of the
window. A new window is then chosen M samples later, the new
frequency attribute is calculated, and so on.

In the following examples N = 64 and M = 32.
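The windowed analysis above can be sketched as follows (illustrative only: the function name windowed_freqs, the rfft-based implementation, and the definitions of "dominant" as the spectral peak and "average" as the amplitude-weighted mean are assumptions, not the course's exact algorithm):

```python
import numpy as np

def windowed_freqs(trace, dt, N=64, M=32):
    """Slide an N-sample window along the trace in hops of M samples and
    return, per window, the dominant frequency (spectral peak) and the
    amplitude-weighted average frequency."""
    dom, avg = [], []
    f = np.fft.rfftfreq(N, dt)
    for start in range(0, len(trace) - N + 1, M):
        spec = np.abs(np.fft.rfft(trace[start:start + N]))
        dom.append(f[np.argmax(spec)])
        avg.append(np.sum(f * spec) / np.sum(spec))
    return np.array(dom), np.array(avg)
```

For a pure sinusoid whose frequency sits on a DFT bin, both attributes return that frequency in every window.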
11-16
Dominant frequency of inline 95.
Average frequency of inline 95.
Windowed frequency attributes
11-17
Filter slice attributes
A third set of attributes consists of narrow-band filter slices of the
seismic traces. For example, the following 6 slices could be used:

5/10 - 15/20 Hz
15/20 - 25/30 Hz
25/30 - 35/40 Hz
35/40 - 45/50 Hz
45/50 - 55/60 Hz
55/60 - 65/70 Hz

The figures on the next slide show the lowest and highest frequency
slices.
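One way to realize such a slice is a trapezoidal band-pass applied in the frequency domain (a sketch; the implementation below, and the assumption that the corner pairs define a trapezoid with linear tapers, are illustrative, not the course's exact filter):

```python
import numpy as np

def filter_slice(trace, dt, f1, f2, f3, f4):
    """Trapezoidal band-pass in the frequency domain: zero below f1 and
    above f4, unity between f2 and f3, linear tapers in between
    (e.g. f1/f2 - f3/f4 = 5/10 - 15/20 Hz)."""
    spec = np.fft.rfft(trace)
    f = np.fft.rfftfreq(len(trace), dt)
    gain = np.interp(f, [f1, f2, f3, f4], [0.0, 1.0, 1.0, 0.0])
    return np.fft.irfft(spec * gain, n=len(trace))
```

Applied to a trace containing 10 Hz and 60 Hz components, the 5/10 - 15/20 Hz slice keeps only the 10 Hz part.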

11-18
5/10 - 15/20 Hz filter slice
of inline 95.
55/60 - 65/70 Hz filter slice
of inline 95.
Filter slice attributes
11-19
Derivative attributes

A fourth set of attributes is based on the first or second derivative of
the seismic trace or its amplitude envelope (or instantaneous
amplitude, which you recall is synonymous with amplitude envelope).
The derivatives are calculated in the following way, where s_i is the
i-th seismic or amplitude envelope sample, d1_i is the i-th first
derivative, d2_i is the i-th second derivative, and Δt is the sample rate:

d1_i = ( s_{i+1} - s_i ) / Δt

d2_i = ( d1_{i+1} - d1_i ) / Δt = ( s_{i+2} - 2 s_{i+1} + s_i ) / Δt²

The derivative examples on the next two slides are from inline 95.
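As a quick illustration (not part of the course material; the toy trace is invented), the finite differences above map directly onto numpy.diff:

```python
import numpy as np

dt = 0.002                                  # sample rate (s)
t = np.arange(0, 0.2, dt)
s = np.sin(2 * np.pi * 10.0 * t)            # toy seismic trace

# First derivative attribute: d1_i = (s_{i+1} - s_i) / dt
d1 = np.diff(s) / dt
# Second derivative attribute: d2_i = (d1_{i+1} - d1_i) / dt
#                                   = (s_{i+2} - 2 s_{i+1} + s_i) / dt^2
d2 = np.diff(d1) / dt
```

Each differencing step shortens the attribute by one sample, so in practice the result is padded or aligned back to the trace's time axis.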
11-20
Derivative of inline 95.
Derivative of amplitude
envelope of inline 95.
Derivative attributes
11-21
Second derivative of inline 95.
Second derivative of amplitude
envelope of inline 95.
Derivative attributes
11-22
Integrated attributes

A fifth set of attributes is based on the integrated seismic trace or its
amplitude envelope (or instantaneous amplitude, which you recall is
synonymous with amplitude envelope). The integrated values are
calculated in the following way, where s_i = the i-th seismic or
amplitude envelope sample and I_i = the integrated value. Note that
this is a running sum:

I_i = s_i + I_{i-1}

At the end of the running sum the integrated seismic trace is filtered
by running an N-point smoother along it, and removing the resulting
low-frequency trend. The integrated amplitude envelope is
normalized by dividing by the difference between the minimum and
maximum samples over the total number of samples.

The integrated examples on the next slide are from inline 95.
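A minimal numerical sketch of the running sum and the trend removal (illustrative only; the random trace, N = 64, and the moving-average smoother are assumptions, not the course's exact implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.normal(size=500)          # toy trace (or amplitude envelope)

# Running sum: I_i = s_i + I_{i-1}
I = np.cumsum(s)

# Remove the low-frequency trend left by integration with an
# N-point running-mean smoother (N = 64 here).
N = 64
trend = np.convolve(I, np.ones(N) / N, mode="same")
I_attr = I - trend
```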
11-23
Integrated traces of inline 95.
Integrated amplitude
envelope of inline 95.
Integrated attributes
11-24
Time attribute of inline 95 (Note: it would look the same for any line in
the volume!).
Time attribute
The last attribute is the time attribute. This is simply the time value of
the seismic trace, and thus forms a ramp function that can add a
trend to the computed reservoir parameter.

Here is a plot of the time attribute:
11-25
Linear regression
This display shows
the target log, a
seismic trace, and an
external attribute (the
inverted seismic
trace):
Multiattribute analysis tries to find a relationship between the target
log and a combination of attributes of the seismic trace. One simple
way of doing this is to measure the correlation between the target data
and an attribute by cross plotting the two and computing the linear
regression coefficients.

11-26
This is a cross plot, showing the target, P-wave, on the vertical axis against
the inversion attribute, with the linear regression line shown in red:
Linear regression
11-27

Linear regression

We can analyze the cross plot on the previous page in the
following way. The regression line has the form:

y = a + b x

This line minimizes the total prediction error:

E² = (1/N) Σ_{i=1..N} ( y_i - a - b x_i )²

The covariance is defined as:

σ_xy = (1/N) Σ_{i=1..N} ( x_i - m_x )( y_i - m_y )

where the means are defined as:

m_x = (1/N) Σ_{i=1..N} x_i   and   m_y = (1/N) Σ_{i=1..N} y_i
11-28
Linear regression

Applying the regression line gives a prediction of the target attribute.
The prediction error is the RMS difference between the actual target
log and the predicted target log.

The normalized covariance (the correlation coefficient) is defined as:

ρ = σ_xy / ( σ_x σ_y )
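The regression formulas above can be verified on synthetic data (a sketch; the data, true coefficients, and noise level are invented):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)                         # attribute (e.g. impedance)
y = 2.0 + 0.5 * x + 0.1 * rng.normal(size=200)   # target log

mx, my = x.mean(), y.mean()                      # means m_x, m_y
cov_xy = np.mean((x - mx) * (y - my))            # covariance sigma_xy

b = cov_xy / np.mean((x - mx) ** 2)              # regression slope
a = my - b * mx                                  # regression intercept

rho = cov_xy / (x.std() * y.std())               # normalized covariance
E = np.sqrt(np.mean((y - a - b * x) ** 2))       # RMS prediction error
```

With mostly linear data the recovered slope and intercept match the generating values, and rho approaches 1.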
11-29
The correlation on the linear regression can sometimes be improved by
applying a non-linear transform to either the target variable or the
attribute variable or both:
Linear regression
11-30
Cross plotting against 2
attributes:
Cross plotting against 1 attribute:
Multilinear regression
An extension of the conventional cross plot is to use multiple attributes,
which leads to the concept of multilinear regression.
11-31
The concept of multilinear regression using multiple seismic attributes
is illustrated above. At each time sample, the target log is modeled as a
linear combination of several attributes.
Multilinear regression with seismic attributes
11-32
Predicting porosity using three attributes

Let us illustrate this concept using three attributes to predict porosity.
Notice below that we have added a zero weight, or bias, to include a
possible DC component in the solution:

φ(t) = w_0 + w_1 I(t) + w_2 E(t) + w_3 F(t)

where: φ(t) = porosity
I(t) = acoustic impedance
E(t) = amplitude envelope
F(t) = instantaneous frequency

This can be written as a series of linear equations:

φ_1 = w_0 + w_1 I_1 + w_2 E_1 + w_3 F_1
φ_2 = w_0 + w_1 I_2 + w_2 E_2 + w_3 F_2
...
φ_N = w_0 + w_1 I_N + w_2 E_N + w_3 F_N
11-33
Predicting porosity using three attributes

Or, in matrix form:

[ φ_1 ]   [ 1  I_1  E_1  F_1 ] [ w_0 ]
[ φ_2 ] = [ 1  I_2  E_2  F_2 ] [ w_1 ]
[  :  ]   [ :   :    :    :  ] [ w_2 ]
[ φ_N ]   [ 1  I_N  E_N  F_N ] [ w_3 ]

or φ = A w.

This can be solved by least-squares minimization to give:

w = ( AᵀA )⁻¹ Aᵀ φ

As a detailed computation, note that Aᵀ is the 4 x N matrix

       [ 1    1    ...  1   ]
  Aᵀ = [ I_1  I_2  ...  I_N ]
       [ E_1  E_2  ...  E_N ]
       [ F_1  F_2  ...  F_N ]

so that both AᵀA and Aᵀφ are built from sums of cross-products of the
attribute columns, written out in full on the next slide.
11-34

Predicting porosity using three attributes

These weighting coefficients minimize the total prediction error:

E² = (1/N) Σ_{i=1..N} ( φ_i - w_0 - w_1 I_i - w_2 E_i - w_3 F_i )²

Or, more completely:

[ w_0 ]   [ N     ΣI_i     ΣE_i     ΣF_i    ]⁻¹ [ Σφ_i    ]
[ w_1 ] = [ ΣI_i  ΣI_i²    ΣI_iE_i  ΣI_iF_i ]   [ ΣI_iφ_i ]
[ w_2 ]   [ ΣE_i  ΣI_iE_i  ΣE_i²    ΣE_iF_i ]   [ ΣE_iφ_i ]
[ w_3 ]   [ ΣF_i  ΣI_iF_i  ΣE_iF_i  ΣF_i²   ]   [ ΣF_iφ_i ]

where each sum runs over i = 1, ..., N.
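In practice the normal equations are rarely inverted by hand; a library least-squares solver recovers the same weights. A sketch with invented attribute and "porosity" values:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
I = rng.normal(size=n)    # acoustic impedance attribute
E = rng.normal(size=n)    # amplitude envelope attribute
F = rng.normal(size=n)    # instantaneous frequency attribute
# Synthetic porosity built from known weights plus a little noise.
phi = 0.20 + 0.10 * I - 0.05 * E + 0.02 * F + 0.01 * rng.normal(size=n)

# Design matrix A with the bias column of ones: phi = A w
A = np.column_stack([np.ones(n), I, E, F])
# w = (A^T A)^{-1} A^T phi, computed stably by lstsq
w, *_ = np.linalg.lstsq(A, phi, rcond=None)
```

The recovered weight vector reproduces the generating coefficients (w_0, w_1, w_2, w_3).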
11-35
Obviously, we can extend multilinear regression to any number
of attributes. In fact, it would appear that the more attributes the
better. This is because:

Decreasing Prediction Error

The prediction error for N+1 attributes can never be larger than
the prediction error for N attributes.

How can we be so sure?

If it were not true, we could always make it so by setting the last
coefficient to zero.
Multilinear regression with seismic attributes
11-36
The first problem that we have is as follows:

Given the set of all attributes, how can we find combinations of attributes
which are useful for predicting the target log?

Multiattribute analysis uses a process called step-wise regression:

(1) Step 1: Find the single best attribute by trial and error. For each
attribute in the list (e.g., Amplitude Weighted Phase, Average
Frequency, etc.) calculate the prediction error. The best attribute
is the one with the lowest prediction error. Call this attribute_1.

(2) Step 2: Find the best pair of attributes, assuming that the first
member is attribute_1. For each other attribute in the list, form all
pairs:
(attribute_1, Amplitude Weighted Phase),
(attribute_1, Average Frequency), etc.
The best pair is the one with the lowest prediction error. Call this
second attribute attribute_2.
Choosing combinations of attributes
11-37
(3) Step 3: Find the best triplet of attributes, assuming that the first
two members are attribute_1 and attribute_2. For every other
attribute in the list, form all triplets:
(attribute_1, attribute_2, Amplitude Weighted Phase),
(attribute_1, attribute_2, Average Frequency), etc.
The best triplet is the one with the lowest prediction error. Call
this third attribute attribute_3.

Carry on this process as long as desired.

Decreasing Prediction Error

The prediction error, E_N, for N attributes is always less than or equal
to the prediction error, E_{N-1}, for N-1 attributes, no matter which
attributes are used.
Choosing combinations of attributes
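The step-wise search above can be sketched as a greedy loop (illustrative only; the helper names prediction_error and stepwise_select, and the toy data in the usage, are invented):

```python
import numpy as np

def prediction_error(columns, target):
    """RMS error of a least-squares fit of the target by the given
    attribute columns plus a bias term."""
    X = np.column_stack([np.ones(len(target))] + list(columns))
    w, *_ = np.linalg.lstsq(X, target, rcond=None)
    return np.sqrt(np.mean((target - X @ w) ** 2))

def stepwise_select(attributes, target, n_select):
    """Greedy forward selection: at each step keep the attributes
    already chosen and add the one giving the lowest prediction error."""
    chosen = []
    remaining = list(attributes)
    for _ in range(n_select):
        best = min(
            remaining,
            key=lambda name: prediction_error(
                [attributes[a] for a in chosen + [name]], target),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

Note this is greedy, not exhaustive: it never revisits an earlier choice, which is exactly the trade-off the slides describe.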
11-38
Validation of attributes
How can we know when to stop adding attributes?

Adding attributes is similar to fitting a curve through a set of points, using
a polynomial of increasing order:
Fourth Order
First Order
Third Order
11-39
The problem is that while the higher order polynomial predicts the
training data better, it is worse at interpolating or extrapolating beyond
the limits of the data as shown below. It is said to be over-trained:
For each polynomial, we can calculate
the Prediction Error, which is the RMS
difference between the actual y-value
and the predicted y-value.

As the order of the polynomial is
increased, the prediction error will
always decrease.
Fourth Order
Validation of attributes
11-40
To determine the validity of attributes, we can use the following
Validation procedure:

(1) Divide the entire data set into two groups:
Training data set
Validation data set
(2) When determining coefficients by regression, use the Training
Data Set
(3) When measuring the prediction error, use the Validation Data Set
As the figure to the right
shows, a high order
polynomial which fits the
Training Data well may still fit
the Validation Data poorly.
This indicates that the order
of the polynomial is too high.
Validation of attributes
11-41
Validation of attributes

Multiattribute analysis performs Validation by systematically leaving
out wells.

Assume we have 5 wells: {Well_1, Well_2, Well_3, Well_4, Well_5}

Assume we have 3 attributes: {Impedance, Envelope, Frequency}

Perform the Validation:

(1) Leave out Well_1. Solve for the regression coefficients using only
data from {Well_2, Well_3, Well_4, Well_5}. This means solving this
system of equations, where the rows contain no data from Well_1:

φ_1 = w_0 + w_1 I_1 + w_2 E_1 + w_3 F_1
φ_2 = w_0 + w_1 I_2 + w_2 E_2 + w_3 F_2
...
φ_N = w_0 + w_1 I_N + w_2 E_N + w_3 F_N
11-42

Validation of attributes

(2) With the derived coefficients, calculate the prediction error for
Well_1. This means calculate the following:

E² = (1/N) Σ_{i=1..N} ( φ_i - w_0 - w_1 I_i - w_2 E_i - w_3 F_i )²

where now only data points for Well_1 are used. This gives us the
Validation Error for Well_1, E_1.

(3) Repeat this process for Well_2, Well_3, etc., each time leaving the
selected well out of the calculation of regression coefficients,
but using only that well for the error calculation.

(4) Calculate the Average Validation Error for all wells:

E_A = ( E_1 + E_2 + E_3 + E_4 + E_5 ) / 5
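The leave-one-well-out procedure can be sketched as follows (a sketch, not the commercial implementation; each "well" here is just an invented design matrix and target vector):

```python
import numpy as np

def fit_weights(A_wells, t_wells):
    """Least-squares weights from the pooled training wells."""
    A = np.vstack(A_wells)
    t = np.concatenate(t_wells)
    w, *_ = np.linalg.lstsq(A, t, rcond=None)
    return w

def average_validation_error(A_wells, t_wells):
    """Leave each well out in turn, train on the rest, and average the
    RMS prediction error measured on the hidden well."""
    errors = []
    for k in range(len(A_wells)):
        train_A = [A for j, A in enumerate(A_wells) if j != k]
        train_t = [t for j, t in enumerate(t_wells) if j != k]
        w = fit_weights(train_A, train_t)
        resid = t_wells[k] - A_wells[k] @ w
        errors.append(np.sqrt(np.mean(resid ** 2)))
    return float(np.mean(errors))
```

When every well obeys the same linear relationship, the validation error stays close to the noise level; it grows when a well is anomalous or the model is over-trained.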
11-43
This is a validation plot for
a typical analysis:

The horizontal axis shows
Number of Attributes used
in the prediction. The
vertical axis shows the
Root-Mean-Square
Prediction Error for that
number of attributes.
The lower (black) curve shows the error calculated using the Training
Data. The upper (red) curve shows the error calculated using the
Validation Data. The figure above shows that when 5 or more attributes
are used, the Validation Error increases, meaning that these additional
attributes are over-fitting the data.
Validation of attributes
11-44
Using a convolutional operator

Multiattribute analysis so far
correlates each target sample
with the corresponding samples
on the seismic attributes.

This approach is limited
because it ignores the fact
that there is a big difference in
frequency content between
logs and seismic data, as
shown in this zoomed display:
11-45
Using a convolutional operator

The convolutional operator extends the cross plot regression to
include neighboring samples. Each target sample is predicted using a
weighted average of a group of samples on each attribute. The
weighted average is a convolution.

The previous equation:

P = w_0 + w_1 A_1 + w_2 A_2 + ... + w_N A_N

is now replaced by:

P = w_0 + w_1 * A_1 + w_2 * A_2 + ... + w_N * A_N

where * represents convolution by an operator.
11-46
Using a convolutional operator

Consider the example of predicting porosity from two attributes:

[ φ_1 ]         [ I_1 ]       [ E_1 ]
[ φ_2 ] = w_0 + [ I_2 ] w_1 + [ E_2 ] w_2
[ φ_3 ]         [ I_3 ]       [ E_3 ]
[ φ_4 ]         [ I_4 ]       [ E_4 ]

Now let the weights, w_i, become 3-point convolutional operators:

w_i → [ w_i(-1), w_i(0), w_i(1) ]ᵀ
11-47
Using a convolutional operator

The new matrix equation becomes:

[ φ_1 ]         [ w_1(0)  w_1(-1)  0        0       ] [ I_1 ]
[ φ_2 ] = w_0 + [ w_1(1)  w_1(0)   w_1(-1)  0       ] [ I_2 ]
[ φ_3 ]         [ 0       w_1(1)   w_1(0)   w_1(-1) ] [ I_3 ]
[ φ_4 ]         [ 0       0        w_1(1)   w_1(0)  ] [ I_4 ]

                [ w_2(0)  w_2(-1)  0        0       ] [ E_1 ]
              + [ w_2(1)  w_2(0)   w_2(-1)  0       ] [ E_2 ]
                [ 0       w_2(1)   w_2(0)   w_2(-1) ] [ E_3 ]
                [ 0       0        w_2(1)   w_2(0)  ] [ E_4 ]

The second term can be re-arranged to give:

[ I_2  I_1  0   ] [ w_1(-1) ]
[ I_3  I_2  I_1 ] [ w_1(0)  ]
[ I_4  I_3  I_2 ] [ w_1(1)  ]
[ 0    I_4  I_3 ]

This is a new system of linear equations in which each weight, w_i,
has been replaced by three weights, w_i(-1), w_i(0), w_i(1). This can
be solved by least-squares regression just as before. The only
difference is that for two attributes, we now have 3 + 3 + 1 = 7
parameters.
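Building the shifted-attribute matrix for a convolutional operator can be sketched like this (illustrative; the function name conv_design and the zero-padded boundary handling are assumptions):

```python
import numpy as np

def conv_design(attr, half_len=1):
    """Return a matrix whose columns are zero-padded shifts of the
    attribute, so that (matrix @ operator) convolves the attribute with
    the operator [w(-L), ..., w(0), ..., w(L)]."""
    n = len(attr)
    cols = []
    for lag in range(-half_len, half_len + 1):
        col = np.zeros(n)
        if lag < 0:                  # w(-k) multiplies attr[i + k]
            col[:lag] = attr[-lag:]
        elif lag > 0:                # w(k) multiplies attr[i - k]
            col[lag:] = attr[:-lag]
        else:
            col = np.asarray(attr, dtype=float).copy()
        cols.append(col)
    return np.column_stack(cols)
```

Stacking such a matrix for each attribute, plus a column of ones for the bias, gives the 7-parameter least-squares system described above.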
11-48
Using a convolutional operator

Using the Convolutional
Operator is like adding
more attributes: it will
always improve the
Prediction Error, but the
Validation Error may not
improve, since the danger of
over-training is increased.
11-49
This example shows that
as the operator length is
increased, the Training
Error always decreases.
The Validation Error
decreases to a minimum
and then increases again
for longer operators.
Using a convolutional operator
11-50
We will now illustrate the preceding concepts using a channel
sand case study from Alberta, the Blackfoot 3-D survey, with the
following parameters:

Location: Southern Alberta, Canada

Recorded: October, 1995

Target: the Glauconitic member of the Mannville group.

Reservoir: sand channel at depth of 1550 m.
12 wells tie the 3-D volume.

Our goal will be to estimate P-wave velocity over the complete
survey area using a fit between the 12 wells and the seismic
attributes.
Blackfoot Channel Sand Case Study
11-51
This map shows the location of the Blackfoot survey area, with the portion
used in this study outlined in red. The objective, a Glauconitic channel
within the Lower Cretaceous Mannville formation, is shown running north-
south on the map.
Alberta
Calgary
N
Channel
Blackfoot Channel Sand Case Study
11-52
Blackfoot Channel Sand Case Study
This map shows a blow-up of the Blackfoot survey area, with the wells
annotated. Notice that the survey has been rotated by 90 degrees to fit
the page better. The red line on the right side of the survey is the
location of the seismic line shown in the next figure.
11-53
Blackfoot Channel Sand Case Study
The figure above shows in-line 95 from the Blackfoot 3D survey. The sonic
log from well 08-08 has been inserted at its correct location, and a seismic
event 20 ms above the channel has been picked and annotated.
11-54
Blackfoot Channel Sand Case Study
The figure above shows the tie between the synthetic seismogram from well
08-08 and an averaged trace from its location on the Blackfoot 3D survey.
11-55
Blackfoot Channel Sand Case Study
The figure above shows the impedance inversion of line 95, computed
using a model-based inversion. Notice that the channel is clearly visible in
the centre of the line at a time between 1060 and 1080 ms.
11-56
Blackfoot Channel Sand Case Study
The map above shows the arithmetic average of the impedance within the
channel sand throughout the Blackfoot 3D survey. Note the well defined
channel location, shown as a low impedance anomaly.
11-57
Blackfoot Channel Sand Case Study
The figure above shows the sonic log on the left, the seismic trace in the
middle, and the inverted trace on the right for two of the twelve wells in the
survey area. The sonic log will be estimated from the trace and impedance
attributes.
11-58
Blackfoot Channel Sand Case Study
The figure above shows (a) the crossplot between the seismic trace and the
sonic logs, and (b) the crossplot between the inverted trace and the sonic
logs for all twelve wells. Notice that the impedance fits the logs much better
than the seismic trace.
(a) (b)
11-59
Blackfoot Channel Sand Case Study
The above figure shows the RMS and correlation coefficient between
nonlinear transforms of the P-wave log and the inverted impedance.
Notice that the best correlation is -0.511.
11-60
Blackfoot Channel Sand Case Study
The above figure shows the best combination of attributes, using a 7
point convolutional operator. As we add each successive attribute in the
list, the training error goes down. However, note that the validation error
goes up for the last attribute. This is shown visually in the next slide.
11-61
Blackfoot Channel Sand Case Study
The above figure shows a graphical display of the training and validation
error. The training error goes down continuously. However, note that the
validation error goes up for the last attribute.
11-62
Blackfoot Channel Sand Case Study
The above figure shows the application of multiple attribute regression
over the complete logs. However, the training was done only within the
zone of interest, shown by the blue horizontal lines.
11-63
Blackfoot Channel Sand Case Study
The above figure shows the application of multiple attribute regression
over the zone of interest, using all the wells.
11-64
Blackfoot Channel Sand Case Study
The above figure shows the validation of multiple attribute regression over
the zone of interest, found by leaving out, and predicting, each well in turn.
11-65
Blackfoot Channel Sand Case Study
The above figure shows the error, both validation (red) and total (black)
for each well. Notice that well 13-16 appears anomalous.
11-66
Blackfoot Channel Sand Case Study
The above figure shows the application of multi-attribute regression with
convolutional weights to one of the lines from the Blackfoot dataset.
Note the excellent definition of the channel.
11-67
Blackfoot Channel Sand Case Study
The above figure shows the application of multi-attribute regression with
convolutional weights to the complete Blackfoot dataset, averaged at the
channel itself. Note the lateral extension of the channel.
11-68
Neural Networks
Now that we have looked at the theory of multiple attribute
regression and applied it to a channel sand case study, we will
look at the theory of neural networks and see how they can be
used to improve the prediction of reservoir parameters.

We will consider three different types of neural networks.

After discussing the theory, we will return to our channel sand
case study.

11-69
Log
Non-linear prediction:
Attribute
Linear prediction:
Log
Attribute
Why use a Neural Network?
We want to
account for non-
linear
relationships
between logs and
attributes.
11-70
Neural Networks
We will consider three types of Neural Network:

MLFN (Multi-Layer Feed Forward Network)

- Similar to traditional back-propagation.

PNN (Probabilistic Neural Network)

- Used to classify data, similar to
discriminant analysis or Bayesian
classification.

GRNN (Generalized Regression Neural Network)

- Used to predict data, similar to regression
analysis.
11-71
MLFN Neural Network
In the above diagram, the black circles in the input layer contain the input
attributes, and the black circles in the hidden and output layers represent
neurons, or perceptrons, each of which consists of a summation and a
nonlinear activation function. The connecting lines between the circles are
weights, which are determined by the algorithm. This is done by minimizing
the error between the output and a training sample.
11-72
2 nodes in
hidden layer:
5 nodes in
hidden layer:
Effect of Changing Nodes in Hidden Layer
These displays show the effect of changing the number of hidden layer
nodes for the simple 1-attribute case:
11-73
5 nodes in
hidden layer:
10 nodes in
hidden layer:
Effect of Changing Nodes in Hidden Layer
11-74
MLFN Neural Network Summary
Advantages:

(1) Traditional form is well described in all Neural Network books.

(2) Once trained, the application to large volumes of data is relatively
fast.


Disadvantages:

(1) The network tends to be a black box with no obvious way of
interpreting the weight values.

(2) Because simulated annealing uses a random number generator to
search for the global optimum, training calculations with identical
parameters may produce different results.
11-75
Blackfoot Channel Sand Case Study
The above figure shows the application of MLFN to four of the twelve
wells in the Blackfoot dataset. Note that the correlation coefficient is
slightly above 0.7.
11-76
Blackfoot Channel Sand Case Study
The above figure shows the cross plot of actual P-wave values versus
predicted P-wave values for all of the wells using the MLFN algorithm.
11-77
Blackfoot Channel Sand Case Study
The above figure shows the application of GRNN to one line from the
Blackfoot dataset. Note the improvement in the detail of the channel over
the linear multi-regression technique. The application window starts at
950 ms.
11-78
Blackfoot Channel Sand Case Study
The above figure shows the application of MLFN to the complete
Blackfoot dataset, averaged at the channel itself. Note the improved
lateral extension of the channel over multilinear regression.
11-79
Probabilistic Neural Network (PNN)
The Probabilistic Neural Network, or PNN, is a second
type of neural network used for multi-attribute analysis.
The PNN is used for classification.

In classification, we classify an input seismic sample
into one of N classes (e.g., sand, shale, carbonate, or oil,
gas, water, etc.). This can also be done using Linear
Discriminant Analysis (LDA), which will be discussed in
the next section, so PNN can be thought of as the
non-linear extension of LDA.

To understand PNN, we will first look at the concept of
distance in attribute space, and then start with the
classification problem.

11-80
Probabilistic Neural Network (PNN)

Assume we have three points (p_1, p_2, and p_3) on a map that are
functions of the coordinates X and Y, and we want to relate them to a
fourth point, p_0.

This can be done using the distances from point p_0 to each of the
other points:

d_i² = ( x_i - x_0 )² + ( y_i - y_0 )²,   i = 1, ..., 3

(Figure: the points p_1, p_2, p_3 and p_0 in the X-Y plane, with the
distances d_1, d_2, d_3 from p_0 annotated.)
11-81
Probabilistic Neural Network (PNN)

Now let us revisit the earlier picture showing the log and attributes, where
we have dropped the third attribute. If we label the two attributes X and Y,
and show four points on the attributes, we can now reinterpret the previous
plot as meaning distance in 2D attribute space (which can be generalized
to N-dimensional space). Note that we have not yet considered the log.

(Figure: the log and the two seismic attributes, X and Y, with samples
x_1, x_2, x_3, x_0 and y_1, y_2, y_3, y_0 marked.)
11-82
Probabilistic Neural Network (PNN)

Next, we will use the log only to indicate where there are two different
classes, A and B (maybe sand and shale, or gas sand and wet sand).
The first three points are in Class A. Three more points have been
added, in Class B. Let us see what this would look like on an X-Y plot.

(Figure: the log and attributes X and Y, with samples 1 to 3 labelled
Class A and samples 4 to 6 labelled Class B.)
11-83
Probabilistic Neural Network (PNN)

In the above figure, all six points (p_1, p_2, p_3 in Class A; p_4, p_5, p_6
in Class B) have been plotted in attribute space, and the distances, d_1 to
d_6, between point p_0 and all the other points have been annotated.
Notice that point p_0 is closer to Class A than it is to Class B.
11-84
The PNN Weighting Function

In fact, PNN does not use distance on its own, but applies an exponential
weighting function to the distance (called the Parzen Estimator). For the
two classes, we can write:

g_A(p_0) = e^(-d_1²/σ²) + e^(-d_2²/σ²) + e^(-d_3²/σ²),  and

g_B(p_0) = e^(-d_4²/σ²) + e^(-d_5²/σ²) + e^(-d_6²/σ²)

This leads us to the famous Bayes Theorem, which allows us to assign a
probability to each class, as follows:

P_A = g_A(p_0) / [ g_A(p_0) + g_B(p_0) ],  and

P_B = g_B(p_0) / [ g_A(p_0) + g_B(p_0) ]

The decision is then simple. If P_A > P_B, the point p_0 is in Class A,
and if P_A < P_B, the point p_0 is in Class B.
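A minimal two-class PNN sketch using the Parzen-weighted distances above (the function name and the training points in the usage are invented for illustration):

```python
import numpy as np

def pnn_classify(p0, class_a, class_b, sigma=1.0):
    """Return (P_A, P_B) for point p0 given training points of each class.
    g(p0) sums exp(-d_i^2 / sigma^2) over a class's training points;
    Bayes' theorem then normalizes the two sums into probabilities."""
    d2_a = np.sum((class_a - p0) ** 2, axis=1)   # squared distances to A
    d2_b = np.sum((class_b - p0) ** 2, axis=1)   # squared distances to B
    g_a = np.sum(np.exp(-d2_a / sigma ** 2))
    g_b = np.sum(np.exp(-d2_b / sigma ** 2))
    return g_a / (g_a + g_b), g_b / (g_a + g_b)
```

A point close to the Class A cluster receives P_A near 1, and the two probabilities always sum to 1.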
11-85
To visualize the effect of the weighting functions, here are the functions
for Class A on its own and Class B on its own, and the two classes
together. This is for a 6 point problem similar to the one shown earlier.
Visualizing the weighting functions
11-86
Here are the probability functions for finding Classes A and B. Note that
this is a simple 2 class linear problem with only two attributes, and most
applications are much more complex.

Visualizing the probability functions
11-87
The effect of sigma on PNN
Here is the effect of varying sigma for both classes. In one case, the
value is too low, and the result is too spiky. In the other, the value is
too high, and the result is too smooth.
11-88
Generalized Regression Neural Network (GRNN)
The Generalized Regression Neural Network, or GRNN, is a
third type of neural network used for multi-attribute analysis.

The GRNN is used for mapping, or prediction, of reservoir
values.

In mapping, we map an input seismic sample into a reservoir
parameter such as porosity, using the multiple attributes.

This is the same thing that we did with multi-linear regression
and MLFN, but GRNN uses a different approach, based on the
PNN.


11-89
Generalized Regression Neural Network (GRNN)

Now let us once again look at the earlier picture showing the log and
attributes. This time we will let the p_i values be the log values
themselves: p_1 to p_N are known, and only p_0 is unknown, to be
predicted from x_0 and y_0. Let us look at the formula for predicting the
unknown log value, which is simply an extension of classification.

(Figure: the log values p_1, p_2, p_3, ..., p_N with their attribute
samples (x_i, y_i); all values are known except p_0, which is to be
predicted from x_0 and y_0.)
11-90
Consider the first 3 points, which we will call the training points:
(p_1, x_1, y_1), (p_2, x_2, y_2), (p_3, x_3, y_3)

where p_i is the log value and x_i, y_i are the attributes.

We wish to get a new output point, p_0, where we know the values of x_0 and y_0:

(p_0, x_0, y_0)

As in classification, we solve for p_0 by comparing the attributes associated with p_0, namely (x_0, y_0), with the attributes associated with p_1 to p_3: (x_1, y_1), (x_2, y_2), (x_3, y_3).
Generalized Regression Neural Network (GRNN)
11-91
But now, we multiply the exponential function from each training point by the known log value, and divide by the sum of the exponential functions. Remember that the distances all relate to the point (x_0, y_0).
p_0 = (p_1 e^(-d_1²/σ²) + p_2 e^(-d_2²/σ²) + p_3 e^(-d_3²/σ²)) / (e^(-d_1²/σ²) + e^(-d_2²/σ²) + e^(-d_3²/σ²))
Notice the similarity of the above equation to the multi-linear
regression equations. However, in multi-linear regression, the
covariance matrix contains cross-products of the log values with the
attributes themselves rather than with the weighting functions.
Generalized Regression Neural Network (GRNN)
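A minimal sketch of this weighted-average prediction, with hypothetical training data (the helper name grnn_predict is invented for illustration):

```python
import math

def grnn_predict(attrs0, training, sigma):
    """GRNN estimate: the Parzen-weighted average of the known log values.
    `training` is a list of (log_value, attribute_tuple) pairs."""
    num = den = 0.0
    for p_i, attrs_i in training:
        d2 = sum((a - b) ** 2 for a, b in zip(attrs0, attrs_i))
        w = math.exp(-d2 / sigma ** 2)
        num += p_i * w
        den += w
    return num / den

# Hypothetical training points: (porosity, (x, y)) attribute pairs
training = [(0.10, (1.0, 2.0)), (0.20, (2.0, 1.0)), (0.30, (3.0, 3.0))]
p0 = grnn_predict((2.0, 2.0), training, sigma=1.0)
print(round(p0, 3))
```

The estimate is a weighted average of 0.10, 0.20, and 0.30, pulled toward the training points whose attributes lie closest to (2.0, 2.0).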
11-92
In GRNN, we usually have many more than 2 attributes. For example, with attributes x, y, instantaneous amplitude A, and instantaneous frequency F, the training points become:

P_1 = (x_1, y_1, A_1, F_1),  P_2 = (x_2, y_2, A_2, F_2),  P_3 = (x_3, y_3, A_3, F_3)

But the distances are calculated the same way:

d_1² = (x_1 − x_0)² + (y_1 − y_0)² + (A_1 − A_0)² + (F_1 − F_0)²

And so are the weighting functions used in both classification and mapping.
Generalized Regression Neural Network (GRNN)
11-93
PNN and GRNN Summary
• The PNN is used for classification and the GRNN for mapping.
• In classification we need only the weights that depend on the distance from the desired point to the training points.
• The distance is measured in multi-dimensional attribute space.
• The distance is scaled by smoothers (the sigma values), which are determined automatically by cross-validation.
• In mapping, the weighting functions are multiplied by the known log values to determine the unknown log values.
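The sigma values are determined automatically by cross-validation; a minimal leave-one-out sketch is shown below. Real implementations typically optimize a separate sigma per attribute, while this illustration (with hypothetical data) scans candidate values for a single global sigma:

```python
import math

def grnn_predict(attrs0, training, sigma):
    """Parzen-weighted average of the known log values."""
    num = den = 0.0
    for p_i, attrs_i in training:
        d2 = sum((a - b) ** 2 for a, b in zip(attrs0, attrs_i))
        w = math.exp(-d2 / sigma ** 2)
        num += p_i * w
        den += w
    return num / den

def cross_validation_error(training, sigma):
    """Leave-one-out: predict each training sample from all the others."""
    err = 0.0
    for i, (p_i, attrs_i) in enumerate(training):
        others = training[:i] + training[i + 1:]
        err += (p_i - grnn_predict(attrs_i, others, sigma)) ** 2
    return err

training = [(0.10, (1.0, 2.0)), (0.20, (2.0, 1.0)),
            (0.30, (3.0, 3.0)), (0.25, (2.5, 2.5))]

# Scan candidate sigmas and keep the one with the lowest validation error
candidates = [0.25, 0.5, 1.0, 2.0, 4.0]
best = min(candidates, key=lambda s: cross_validation_error(training, s))
print("optimal sigma:", best)
```

A sigma that is too small overfits (each prediction collapses onto its nearest neighbor), while a sigma that is too large over-smooths toward the global mean; the cross-validation error is lowest in between.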
11-94
Sigma optimized
automatically:
Sigma reduced
to 1/10 the
optimized value:
GRNN - Effect of Changing Sigmas
11-95
Sigma optimized
automatically:
Sigma reduced to
1/2 the optimized
value:
GRNN - Effect of Changing Sigmas
11-96
Sigma optimized
automatically:
Sigma increased
to 2 times the
optimized value:
GRNN - Effect of Changing Sigmas
11-97
(a) Trend predicted from the multi-attribute transform. (b) GRNN prediction of the residual. (c) GRNN prediction without cascading.
Neural networks work best on stationary data, or data without a long-period trend. To facilitate this, we first remove the trend (a), apply the GRNN to the residual, and then either add back the trend, as in (b), or not, as in (c).
Cascading with a trend
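The cascading steps can be sketched as simple arithmetic. The values below are hypothetical, and the GRNN stand-in simply returns the residual it was trained on:

```python
# Cascading sketch with hypothetical values:
# 1. a linear multi-attribute transform predicts the long-period trend,
# 2. the GRNN is trained on (and here simply returns) the residual,
# 3. the trend is added back to give the final prediction.

target = [0.10, 0.15, 0.22, 0.28, 0.35]   # measured log values (hypothetical)
trend  = [0.10, 0.16, 0.21, 0.27, 0.34]   # trend from the linear transform

residual = [t - tr for t, tr in zip(target, trend)]   # stationary part

predicted_residual = residual   # stand-in for the trained GRNN output

final = [tr + r for tr, r in zip(trend, predicted_residual)]
print(final)   # recovers the target (up to rounding) in this idealized case
```

In practice the GRNN prediction of the residual is not exact, but training on the detrended, stationary residual is what lets the network concentrate on the high-frequency detail.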
11-98
PNN/GRNN summary
Advantages:
(1) Because the GRNN/PNN are mathematical interpolation schemes, the derived sigmas may be interpreted as the relative weight given to each attribute.
(2) Unlike the MLFN, the training process is reproducible.
(3) In classification mode, the PNN may produce probability estimates.

Disadvantages:
(1) Because the GRNN keeps a copy of all the training data, the application time for the 3D volume may be very large. This application time is proportional to the number of training samples. The problem may be alleviated by applying the network to a small target window.
11-99
Blackfoot Channel Sand Case Study
The above figure shows the application of the GRNN to four of the twelve
wells in the Blackfoot dataset. Note that the correlation coefficient is 0.84,
much better than regression or the MLFN.
11-100
Blackfoot Channel Sand Case Study
The above figure shows the cross plot of actual P-wave values versus
predicted P-wave values for all of the wells using the GRNN algorithm.
11-101
Blackfoot Channel Sand Case Study
The above figure shows a comparison between (a) the cross plot of
actual P-wave values versus predicted P-wave values for all of the wells
using the MLFN algorithm, and (b) the cross plot of actual P-wave values
versus predicted P-wave values for all of the wells using the GRNN
algorithm. Notice the improvement with GRNN.
(a)
(b)
11-102
Blackfoot Channel Sand Case Study
The above figure shows the validation of the GRNN to four of the twelve
wells in the Blackfoot dataset. Note that the correlation coefficient is 0.63,
which is still very good.
11-103
Blackfoot Channel Sand Case Study
The above figure shows the application of the GRNN to one line from the Blackfoot dataset. Note the improvement in the detail of the channel over the multi-linear regression technique.
11-104
Blackfoot Channel Sand Case Study
The above figure shows the application of the GRNN to the complete
Blackfoot dataset, averaged at the channel itself. Note the improved
lateral extension of the channel.
11-105
Blackfoot Channel Sand Case Study
(a) (b)
(c)
(d)
The above figure shows a comparison between the maps of the channel
from (a) inversion, (b) multi-linear regression, (c) MLFN, and (d) GRNN.
11-106
Summary of multiattribute analysis
• Multiattribute analysis predicts log properties from seismic attributes, using training data at well locations to determine a relationship, which is then applied to a seismic volume.
• Multiattribute analysis does not assume any particular model, but uses statistical analysis to determine the attribute/log relationship.
• Multiattribute analysis uses step-wise regression to determine the optimal ordering of attributes.
• Multiattribute analysis uses validation to determine the optimal number of attributes.
• Multiattribute analysis extends cross plotting to include the convolutional operator, which accounts for frequency differences between the target and the attributes.
• Multiattribute analysis may be used to predict logs from other logs.
• Multiattribute analysis uses neural networks to enhance the high-frequency resolution and to perform classification.