www.elsevier.com/locate/petrol
Abstract
The performance of a naïve Bayes classifier is compared with a well-established statistical classification approach, linear
discriminant analysis, by considering core and log data from marine-eolian sediments. The results indicate that both methods
perform adequately, and the Gaussian naïve Bayes classifier provides estimates as good as those based on the linear discriminant
analysis for the given data set. Quadratic discriminant analysis, a more conventional Bayesian analysis, and kernel-based density
estimation methods perform unexpectedly poorly, probably because of overfitting. We conclude that the normal distribution is
appropriate to fit the distribution of log readings in the present data, and the simplifications of naïve Bayes provide a robust, simple
approach for facies identification.
© 2006 Elsevier B.V. All rights reserved.
Keywords: Facies; Well logs; Discriminant analysis; Naïve Bayes classifier
1. Introduction
Facies identification is important in oil exploration
and development because facies often control the variation of petrophysical properties. Identification of
facies is generally based on core samples and outcrop
characteristics. Because available core and outcrop are
usually limited, establishing relationships between
facies and more readily available data sources, in particular well logs, is highly desirable.
Corresponding author.
E-mail address: liyumei@uwyo.edu (Y. Li).
0920-4105/$ - see front matter © 2006 Elsevier B.V. All rights reserved.
doi:10.1016/j.petrol.2006.06.001
Some efforts have been made to use statistical methods such as discriminant analysis (Sakurai and Melvin,
1988; Avseth et al., 2001; Tang et al., 2004) to identify
facies from well logs. The past decade has also seen
applications of artificial neural networks (ANN) (Derek
et al., 1990; Wong et al., 1995; Siripitayananon et al.,
2001; Bhatt and Helle, 2002) and fuzzy logic (Cuddy,
2000; Saggaf and Nebrija, 2003) in facies classification.
Initial successes of ANN for facies prediction have
inspired enthusiasm, leading to claims that it has the
potential to dominate or displace other analytical tools
used in the exploration and production industry
(Iloghalu, 2003). However, the reliable use of neural
networks requires experience for adjusting parameters
and a large amount of training time, especially for large
data sets (Wong et al., 1995; Avseth et al., 2001).
All methods use a training data set consisting of
observed cases with full information about both predictors (in our application, well-log readings) and groups
(in our case, facies). Based on the training data set, one
[...] heterogeneous formations like fluvial deposits, the prior distribution may change from one well to another (Coudert
et al., 1994). The heterogeneity of deposits makes the
choice of prior a challenge. Second, probabilities required by a fully Bayesian method are hard to obtain for
more than one predictor. The naïve Bayes classifier
assumes independence among predictors, but well logs
are often dependent. It is not clear whether violation of
the independence assumption will affect the facies classification. Third, it is still unknown what distributions are
appropriate to fit different log readings and how different
distributions affect the facies prediction. Kapur et al.
(2000) discretized values of predictor variables and used
a counting rule to calculate probabilities. They emphasized the importance of picking appropriate bin sizes: If
too few bins are selected, the FOP (facies occurrence
probability) lacks the ability to discriminate between
adjacent log readings. If there are too many bins, the FOP
will not be estimated precisely.
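The bin-size trade-off can be illustrated with a small counting sketch. This is only an illustration under stated assumptions, not the procedure of Kapur et al. (2000): equal-width bins are assumed, the function names `bin_index` and `facies_occurrence_probability` are hypothetical, and the gamma-ray readings and labels are invented.

```python
from collections import Counter, defaultdict

def bin_index(value, lo, hi, n_bins):
    """Map a log reading to an equal-width bin index in [0, n_bins - 1]."""
    if hi <= lo:
        raise ValueError("hi must exceed lo")
    frac = (value - lo) / (hi - lo)
    return min(n_bins - 1, max(0, int(frac * n_bins)))

def facies_occurrence_probability(readings, facies, lo, hi, n_bins):
    """Return {bin: {facies: relative frequency}} from counts in each bin."""
    counts = defaultdict(Counter)
    for x, f in zip(readings, facies):
        counts[bin_index(x, lo, hi, n_bins)][f] += 1
    return {
        b: {f: c / sum(cnt.values()) for f, c in cnt.items()}
        for b, cnt in counts.items()
    }

# Toy data, invented for illustration only.
gr = [20.0, 25.0, 30.0, 60.0, 65.0, 90.0]
labels = ["SD", "SD", "ID", "ID", "ID", "SB"]

# Two wide bins: adjacent readings share one probability table.
fop_coarse = facies_occurrence_probability(gr, labels, 0.0, 100.0, 2)
# Fifty narrow bins: each occupied bin holds a single observation.
fop_fine = facies_occurrence_probability(gr, labels, 0.0, 100.0, 50)
```

With two bins, readings of 20 and 30 receive the same probabilities even though they belong to different facies in the toy data. With fifty bins, every occupied bin contains one sample, so each estimated probability rests on a single count.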
This study evaluates the performance of discriminant
analysis and a normal-based naïve Bayes classifier in
facies identification from well logs by applying the log-facies correlation derived from the training set to three
hold-out wells.
2. Methodology
2.1. Naïve Bayes classifier
Bayes' theorem gives the conditional
probability of parameter values given the data by combining expectations based on previous experience (prior
probabilities) with information from available data. In
this study, Bayes' theorem is used to calculate the probability of the occurrence of a certain facies given the
well-log readings and to assign the facies with the highest
posterior probability to that observation depth.
The application of Bayes' theorem in facies classification can be written as follows:
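\[
P(f_j \mid X = x) = \frac{P(f_j)\, P(X = x \mid f_j)}{P(X = x)}
\]
where, under the independence assumption of the naïve Bayes classifier,
\[
P(X = x \mid f_j) = \prod_{i=1}^{n} P(X_i = x_i \mid f_j)
\]
and
\[
P(X = x) = \sum_{j=1}^{m} P(f_j) \prod_{i=1}^{n} P(X_i = x_i \mid f_j)
\]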
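The posterior calculation can be sketched in a few lines. This is a minimal illustration, assuming the per-log conditional probabilities are already available; the function name `naive_bayes_posterior` and all numbers are hypothetical and are not taken from the study.

```python
from math import prod

def naive_bayes_posterior(priors, likelihoods):
    """Posterior P(f | X = x) for each facies under the independence assumption.

    priors:      {facies: P(f)}
    likelihoods: {facies: [P(X_1 = x_1 | f), ..., P(X_n = x_n | f)]}
    """
    # Numerator of Bayes' theorem: prior times the product over predictors.
    joint = {f: priors[f] * prod(likelihoods[f]) for f in priors}
    # Denominator: sum of the numerators over all facies.
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("all facies have zero probability for this reading")
    return {f: value / evidence for f, value in joint.items()}

# Invented numbers: two facies, two well logs.
priors = {"SD": 0.3, "ID": 0.7}
likelihoods = {"SD": [0.5, 0.4], "ID": [0.1, 0.2]}

posterior = naive_bayes_posterior(priors, likelihoods)
# Assign the facies with the highest posterior probability.
best = max(posterior, key=posterior.get)
```

In this toy case the lower prior of SD is outweighed by its larger conditional probabilities, so SD is assigned.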
\[
P(x \mid f) = \frac{1}{\sqrt{2\pi\sigma_f^2}}\, e^{-\frac{(x - \mu_f)^2}{2\sigma_f^2}}
\]
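As a rough sketch of how the normal density might be fitted and evaluated for one log within each facies, the following uses only the Python standard library. The helper names `gaussian_density` and `fit_gaussian` are hypothetical, and the bulk-density readings are invented; the `use_mle` switch chooses between the sample variance and the maximum likelihood estimate of variance.

```python
from math import exp, pi, sqrt
from statistics import mean, pvariance, variance

def gaussian_density(x, mu, var):
    """Normal density with mean mu and variance var, evaluated at x."""
    return exp(-((x - mu) ** 2) / (2.0 * var)) / sqrt(2.0 * pi * var)

def fit_gaussian(readings, use_mle=False):
    """Estimate (mean, variance) of one log within one facies.

    use_mle=False gives the sample variance (divide by n - 1);
    use_mle=True gives the maximum likelihood estimate (divide by n).
    """
    mu = mean(readings)
    var = pvariance(readings, mu) if use_mle else variance(readings, mu)
    return mu, var

# Hypothetical bulk-density readings for two facies, invented for illustration.
training = {
    "SD": [2.20, 2.25, 2.30, 2.28],
    "ID": [2.50, 2.55, 2.60, 2.52],
}
params = {f: fit_gaussian(values) for f, values in training.items()}

x = 2.27  # a new reading to be classified
densities = {f: gaussian_density(x, mu, var) for f, (mu, var) in params.items()}
```

Each density would then serve as one factor P(X_i = x_i | f_j) in the naïve Bayes product.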
Fig. 1. Location of seven wells in Teapot dome, Powder River Basin, Wyoming.
Fig. 2. The matrix plot of GR, NPHI, RHOB and LOGRT shows moderately strong pairwise correlations among NPHI, RHOB and LOGRT.
variance is slightly larger than the maximum likelihood estimator of variance, and in this situation either
the sample variance or the MLE may be used with little
Fig. 3. Boxplots of GR, NPHI, RHOB and LOGRT grouped by facies show that overlap of well-log responses is common among the five facies, and
that the most discriminating individual well logs are RHOB and LOGRT.
Description       Frequency
Sand dune         160
Interdune         200
Sand sheet        110
Shallow marine    38
Sabkha            85
Fig. 4. The naïve Bayes posterior probabilities, LDA-predicted facies, and observed facies columns of well 55 (f1 = SD, f2 = ID, f3 = SM, f4 = SB,
f5 = SS, LDA = linear discriminant analysis). For clarity, probabilities of classification are split into two figures, Fig. 4 for naïve Bayes (BAY) and Fig.
5 for linear discriminant analysis (LDA). Probabilities give more detailed information than class identification (the highest probability class).
Probability curves indicate uncertainty in identification. See also Fig. 5.
Fig. 5. The posterior probability, observed facies and BAY-predicted facies columns of well 55 (f1 = SD, f2 = ID, f3 = SM, f4 = SB, f5 = SS,
BAY = naïve Bayes classifier). See also Fig. 4. The agreement between naïve Bayes and LDA is close. Both methods locate the economically
important stratum f1 but identify a narrower band of f1 than is actually present. f3 is erratically identified by LDA, with similar but slightly superior
performance by naïve Bayes. The dominant facies f2 is identified by both naïve Bayes and LDA, although other facies are sometimes labeled as f2 by
both naïve Bayes and LDA.
2.4. Cross-validation
Cross-validation evaluates classification performance
by using two independent samples of data, one to learn the
rule and another to test it. In this study, seven wells (Fig. 1)
were selected on the basis of stratigraphic and geographic
coverage, availability of appropriate well logs, and
availability of core analysis data. Due to limited data and
limited facies types in some wells, instead of leaving out a
randomly selected well as the test set, the hold-out well was
Table 2
Classification results of linear discriminant analysis in well 55
Observed   Predicted                        Percent correct
           SD    ID    SM    SB    SS
SD          8     9     0     0     0       47.1%
ID          0    66     0     0     2       97.1%
SM          0     4     8     2     2       50.0%
Table 3
Classification results of naïve Bayes classifier in well 55
Observed   Predicted                        Percent correct
           SD    ID    SM    SB    SS
SD          5    12     0     0     0       29.4%
ID          0    60     0     0     8       88.2%
SM          0     4     8     4     0       50.0%
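The percent-correct column can be checked directly from the counts. The sketch below assumes the counts of Tables 2 and 3 are read column by column (predicted SD, ID, SM, SB, SS against observed SD, ID, SM); that reading is an assumption, supported by the fact that it reproduces the reported percentages. The function name `percent_correct` is illustrative.

```python
def percent_correct(matrix, observed, predicted):
    """Per-facies percent correct from a confusion matrix.

    matrix[r][c] is the number of samples observed as observed[r]
    and predicted as predicted[c].
    """
    result = {}
    for row, obs in zip(matrix, observed):
        total = sum(row)
        hits = row[predicted.index(obs)]
        result[obs] = 100.0 * hits / total if total else float("nan")
    return result

predicted = ["SD", "ID", "SM", "SB", "SS"]
observed = ["SD", "ID", "SM"]

# Counts for well 55 under the assumed reading of Tables 2 and 3.
lda = [[8, 9, 0, 0, 0], [0, 66, 0, 0, 2], [0, 4, 8, 2, 2]]
bay = [[5, 12, 0, 0, 0], [0, 60, 0, 0, 8], [0, 4, 8, 4, 0]]

lda_pct = {f: round(p, 1) for f, p in percent_correct(lda, observed, predicted).items()}
bay_pct = {f: round(p, 1) for f, p in percent_correct(bay, observed, predicted).items()}
# lda_pct -> {'SD': 47.1, 'ID': 97.1, 'SM': 50.0}
# bay_pct -> {'SD': 29.4, 'ID': 88.2, 'SM': 50.0}
```

Under the same reading, the row totals are 17, 68 and 16 samples for SD, ID and SM in both tables, which is consistent with both classifiers being evaluated on the same observations.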