A supplement to:
Biography
Jim Murtha, a registered petroleum engineer, presents seminars and training courses and advises clients in building probabilistic models for risk analysis and decision making. He was elected to Distinguished Membership in SPE in 1999, received the 1998 SPE Award in Economics and Evaluation, and was the 1996-97 SPE Distinguished Lecturer in Risk and Decision Analysis. Since 1992, more than 2,500 professionals have taken his classes. He has published Decisions Involving Uncertainty - An @RISK Tutorial for the Petroleum Industry. In 25 years of academic experience, he chaired a math department, taught petroleum engineering, served as academic dean, and co-authored two texts in mathematics and statistics. Jim has a Ph.D. in mathematics from the University of Wisconsin, an MS in petroleum and natural gas engineering from Penn State and a BS in mathematics from Marietta College.
Acknowledgements
When I was a struggling assistant professor of mathematics, I yearned for more ideas, for we were expected to write technical papers and suggest wonderful projects to graduate students. Now I have no students and no one is counting my publications. But the ideas have kept coming. Indeed, I find myself, like anyone who teaches classes to professionals, constantly stumbling on notions worth exploring. The articles herein were generated over a few years and written mostly in about six months. A couple of related papers found their way into SPE meetings this year. I thank the hundreds of people who listened, challenged and suggested during classes.
I owe a lot to Susan Peterson, John Trahan and Red White, friends with whom I argue and bounce ideas around from time to time. Most of all, these articles benefited from the careful reading of one person, Wilton Adams, who has often assisted Susan and me in risk analysis classes. During the past year, he has been especially helpful in reviewing every word of the papers I wrote for SPE and for this publication. Among his talents are a well-tuned ear and high standards for clarity. I wish to thank him for his generosity. He also plays a mean keyboard, sings a good song and is a collaborator in a certain periodic culinary activity. You should be so lucky.
Table of Contents
A Guide To Risk Analysis
Central Limit Theorem, Polls and Holes
Estimating Pay Thickness From Seismic Data
Bayes Theorem Pitfalls
Decision Trees vs. Monte Carlo Simulation
When Does Correlation Matter?
Beware of Risked Reserves
A Guide To Risk Analysis

Risk and decision analysis was born in the middle of the 20th century, about 50 years after some of the necessary statistics became formalized. Pearson defined standard deviation and skewness in the late 1890s, and Galton introduced percentiles in 1885. The term Monte Carlo, as applied to uncertainty analysis, was introduced by Metropolis and Ulam in the Journal of the American Statistical Association in 1949. D.B. Hertz published his classic Harvard Business Review article in 1964. A couple of years later, Paul Newendorp began teaching classes on petroleum exploration economics and risk analysis, out of which evolved the first edition of his text in 1975, the same year as A.W. McCray and two years before R.E. Megill wrote their books on the subject. Ten years later, there was commercial software available to do Monte Carlo simulation. During this 50-year period, decision analysis, featuring decision trees, also came of age. Raiffa's classic book appeared in 1968. By 1985, there were several commercial software applications on the market. These developments, in many ways, paralleled the development of petroleum engineering, with the basics appearing in the 1930s and mature texts following later.

For a long time, however, acceptance was limited to isolated departments within organizations. Managers were notoriously unwilling to embrace results that presented probability distributions for reserves and net present value (NPV). Consultants offering services and software vendors know these levels of resistance all too well. Now, finally, there seems to be broader acceptance of probabilistic methods, although as I write, my SPE Technical Interest Group digest contains strong negativism from traditionalists about probabilistic prices. Nonetheless, consider these items: the three most recent and five of the last seven recipients of the SPE Economics and Evaluation Award have been strong proponents of risk analysis; whereas the index to the last edition of the Petroleum Engineers Handbook had only two references to risk, the forthcoming edition will feature an entire chapter on the topic; my first paper on Monte Carlo simulation was presented at the Eastern Regional Meeting in 1987 and summarily rejected by the editorial committees for not being of adequate general interest (it was a case study dealing with the Clinton formation, but the methods were clearly generic and used popular material balance notions); ten years later, I published Monte Carlo Simulation, Its Status and Future in the Distinguished Author series; the most popular SPE Applied Technology Workshop focuses on probabilistic methods; and SPE, SPEE and WPC are working on definitions that include probabilistic language.
Central Limit Theorem, Polls and Holes

What do exit surveys of voters in presidential elections have in common with porosities calculated from logs of several wells penetrating a geological structure? The answer is that in both cases, the data can be used to estimate an average value for a larger population. At the risk of reviving a bitter debate, suppose that a carefully selected group of 900 voters is surveyed as they leave their polling booths. If the voters surveyed a) are representative of the population as a whole and b) tell the truth, then the ratio

r = (number of voters for Candidate A in survey)/(number of voters surveyed)

should be a good estimate of the ratio

R = (number of voters for Candidate A in population)/(number of voters in population)

Moreover, by doing some algebra, the statistician analyzing the survey data can provide a margin of error for how close r is to R. In other words, you are pretty confident that (r - margin of error) < R < (r + margin of error). The margin of error depends on three things: the level of confidence ("I am 90% or 95% or 99% confident of this"); the number of voters who chose Candidate A; and the number of voters surveyed. To end this little diversion, here is the approximate formula for the margin of error in a close race (where R is roughly 45% to 55%), when we are satisfied with a 95% confidence level (the most common confidence level used by professional pollsters):

margin of error = 1/sqrt(N), approximately, where N = sample size, the number of voters polled

Thus when N = 900, the margin of error would be about 1/30, or roughly 3.3%.
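A minimal sketch of this calculation (with illustrative sample sizes), comparing the normal-approximation formula for a proportion with the 1/sqrt(N) shortcut:

    import math

    def margin_of_error(n, p=0.5, z=1.96):
        # Normal-approximation margin of error for a proportion p estimated from a
        # sample of n, at 95% confidence (z = 1.96). Near p = 0.5 it is close to 1/sqrt(n).
        return z * math.sqrt(p * (1 - p) / n)

    for n in (400, 900, 2500):
        print(n, round(margin_of_error(n), 3), round(1 / math.sqrt(n), 3))
    # n = 900 gives about 0.033, i.e. roughly the 3.3% margin of error quoted above.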
Table 1. Well database

Structure    1S      2S      3S      4S
Area         5010    2600    4300
h_ave        52      42      66
Por_ave      0.12    0.10    0.12
Sw_ave       0.30    0.34    0.27
Nevertheless, in many applications, as we shall see later, X is a continuous variable with the shape of a normal, lognormal or triangular distribution, for example. Thus, our sample mean (the percentage of the people in our sample who voted for A) may not be exactly the mean of the underlying distribution, but we can be 95% confident that it lies within two standard errors of the true mean.
ranging from 5% to 20%. In this case, the two distributions would be closer together. Note, however, that the average porosities will always have a narrower distribution than the complete set of porosities. Perhaps the contrast is even easier to see with net pays. Imagine a play where each drainage area tends to be relatively uniformly thick, which might be the case for a faulted system. Thus, the average h for a structure is essentially the same as any well thickness within the structure. Then the two distributions would be similar. By contrast, imagine a play where each structure has sharp relief, with wells in the interior having several times the net sand as wells near the pinch-outs. Although the various structures could have a fairly wide distribution of average thicknesses, the full distribution of h for all wells could easily be several times as broad. The distribution for A could easily be lognormal if the drainage areas were natural. In a faulted system, however, where the drainage areas were defined by faults, the distribution need not be lognormal.
Recovery factors
Recovery factors, which convert hydrocarbon in place to reserves or recoverable hydrocarbon, are also average values over the hydrocarbon pore volume. The recovery efficiency may vary over the structure, but when we multiply the OOIP by a number to get recoverable oil, the assumption is that this value is an average over the OOIP volume. As such, they too would often be normally distributed. Additional complications arise, however, because of uncertainty about the range of driving mechanisms: will there be a water drive? Will gas injection or water injection be effective? Some people model these aspects of uncertainty with discrete variables.
Summary
The Central Limit Theorem suggests that most of the factors in a volumetric formula for hydrocarbons in place will tend to have symmetric distributions and can be modeled as normal random variables. The main factor (area or volume) can be skewed left or right. Regardless of the shapes of these input distributions, the outputs of volumetric formulas, oil and gas in place and reserves, tend to be skewed right or approximately lognormal. Because the conventional wisdom is to use lognormal distributions for all of the inputs, the above argument may be controversial for the time being. The jury is still out. We could take a poll and see what users believe. Oh yes, then we could use the Central Limit Theorem to analyze the sample and predict the overall opinion. What goes around comes around. Stay tuned for other applications of the Central Limit Theorem.
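A small sketch of that claim, with purely illustrative symmetric inputs (none of these ranges come from the article): multiplying them yields a right-skewed output whose mean exceeds its median.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    # Illustrative, roughly symmetric inputs for a volumetric product.
    area = rng.normal(2000, 300, n)
    pay = rng.normal(40, 8, n)
    porosity = rng.normal(0.15, 0.02, n)
    sw = rng.normal(0.30, 0.04, n)
    recovery = rng.normal(0.25, 0.05, n)

    product = area * pay * porosity * (1 - sw) * recovery

    mean, median = product.mean(), np.median(product)
    skew = np.mean((product - mean) ** 3) / product.std() ** 3
    print(f"mean {mean:.0f}, median {median:.0f}, skewness {skew:.2f}")
    # mean > median and positive skewness: the product is skewed right, roughly lognormal.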
Estimating Pay Thickness From Seismic Data
How do we estimate net pay thickness, and how much error do we introduce in the process? Usually, we specify two depths, to the top and bottom of the target interval, and take their difference. The precision and accuracy of the thickness measurement, therefore, depend on the precision and accuracy of two individual measurements. The fact that we are subtracting two measurements allows us to invoke the Central Limit Theorem to address the questions of error. This theorem was stated in a previous article; a suitable version of it is given in the Appendix. First, let us remind ourselves of some of the issues about measurements in general. We say a measurement is accurate if it is close to the true value, reliable or precise if repeated measurements yield similar results, and unbiased if the estimate is as likely to exceed the true value as it is to fall short. Sometimes we consider a two-dimensional analogy, bullet holes in a target. If the holes are in a tight cluster, then they are reliable and precise. If they are close to the bullseye, then they are accurate. If there are as many to the right of the bullseye as to the left, and as many above the bullseye as below, then they are unbiased. With pay thickness, we are interested in the precision of measurement on a linear scale, which we take to mean the range of error. Our estimate for thickness will be precise if the interval of error is small. Consider the following situation.
Seismic markers bracket the target interval: the distances from the top marker down to the target facies and from the bottom of the facies down to the platform are 100 m and 200 m, respectively, and the facies thickness is 600 m, so the overall distance between the markers is 900 m. In one particular offsetting anomaly, the depth measurement to the lower anhydrite is 5,000 m, plus or minus 54 m. The depth estimate to the platform is 5,900 m, plus or minus 54 m. What is the range of measurement for the thickness of the target facies? First, we should ask what causes the possible error in measurement. In the words of the geoscientists, "If the records are good, the range for picking the reflection peak should not be much more than 30 milliseconds (in two-way time), and at 3,600 m/second that would be about 54 m." This, of course, would be some fairly broad bracketing range, two or three standard deviations, so the likely error at any given point is much less. We would also hold that there is no correlation between an error in picking the lower anhydrite and the error in picking the platform. A further question to the geoscientists revealed that the true depth would be just as likely to be greater than the estimate as less than it. That is, the estimates should be unbiased.
Solution
People who use worst-case-scenario arguments would claim that the top of the reefal facies could be as deep as 5,154 m and the bottom as shallow as 5,646 m, giving a minimum difference of 492 m. Similarly, they say the maximum thickness would be (5,754 m - 5,046 m) = 708 m. In other words, they add and subtract 2*54 = 108 m from the base case of 600 m to get estimates of minimum and maximum thicknesses.
Table 1. Effect of distribution shape on ranges (minimum, maximum) for net pay thickness

Uncorrelated            99% confidence      95% confidence      90% confidence
Normal                  [535 m, 665 m]      [551 m, 649 m]      [558 m, 642 m]
Triangular              [520 m, 680 m]      [539 m, 661 m]      [549 m, 651 m]
Uniform                 [503 m, 697 m]      [515 m, 685 m]      [526 m, 674 m]

Correlated (r = 0.7)    99% confidence      95% confidence      90% confidence
Normal                  [565 m, 635 m]      [573 m, 627 m]      [578 m, 623 m]
Triangular              [555 m, 645 m]      [566 m, 634 m]      [572 m, 629 m]
Uniform                 [533 m, 665 m]      [548 m, 652 m]      [557 m, 643 m]
But is that what we really care about: the theoretical minimum and maximum? We may be more interested in a practical range that will cover a large percentage, say 90%, 95% or 99%, of the cases. A probabilistic approach to the problem says that the two marker depths are distributions, with means of 5,000 m and 5,900 m, respectively. The assertion that there is no bias in the measurement suggests symmetric distributions for both depths. Among the candidates for the shapes of the distributions are uniform, triangular and normal. One way to think about the problem is to ask whether the chance of a particular depth becomes smaller as one moves toward the extreme values in the range. If the answer is yes, then the normal or triangular distribution would be appropriate, since the remaining shape, the uniform distribution, represents a variable that is equally likely to fall into any portion of the full range. Traditionally, errors in measurement have been modeled with normal distributions. In fact, K.F. Gauss, who is often credited with describing the normal distribution but who was preceded by A. de Moivre, touted the normal distribution as the correct way to describe errors in astronomical measurements. If the uncertainty of as much as 54 m is truly a measurement error, then the normal distribution would be a good candidate. Accordingly, we represent the top and bottom depths as:

Bottom = Normal(5,900 m; 18 m), where its mean is 5,900 m and its standard deviation is 18 m
Top = Normal(5,000 m; 18 m)
Thick = Bottom - Top - 300 m (the facies thickness is 300 m less than the difference between the markers)

We use 18 m for the standard deviation because the 54-m bracketing range represents roughly three standard deviations.
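A minimal simulation sketch of the normal case, with and without an assumed 0.7 correlation between the two picks; it reproduces the normal rows of Table 1 approximately.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    sd = 18.0  # m, one-third of the 54-m bracketing range

    def thickness(rho):
        # Sample correlated picking errors for the two depths and form the thickness.
        cov = [[sd**2, rho * sd**2], [rho * sd**2, sd**2]]
        top_err, bottom_err = rng.multivariate_normal([0.0, 0.0], cov, n).T
        top = 5_000.0 + top_err
        bottom = 5_900.0 + bottom_err
        return bottom - top - 300.0  # facies thickness, m

    for rho in (0.0, 0.7):
        t = thickness(rho)
        lo, hi = np.percentile(t, [2.5, 97.5])
        print(f"rho={rho}: 95% interval [{lo:.0f} m, {hi:.0f} m]")

With no correlation, the 95% interval comes out near [551 m, 649 m]; with a 0.7 correlation it narrows to roughly [573 m, 627 m], matching the normal rows of Table 1.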
Conclusions
The Central Limit Theorem (see below) can be applied to measurement errors from seismic interpretations. When specifying a range of error for an estimate, we should be interested in a practical range, one that would guarantee the true value would lie in the given range 90%, 95% or 99% of the time. When we estimate thickness by subtracting one depth from another, the error range of the result is about 30% smaller than the error range of the depths. If the depth measurements are positively correlated, as is sometimes thought to be the case, this range of the thickness decreases by another 50%.
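One way to see the 30% and roughly 50% figures, using the standard deviations assumed in the example above, is the usual rule for the variance of a difference:

Var(Thick) = Var(Bottom) + Var(Top) - 2*r*sd(Bottom)*sd(Top)

With equal standard deviations s and no correlation (r = 0), sd(Thick) = s*sqrt(2), about 1.41*s, so the error range of the thickness is roughly 30% smaller than the worst-case combined range of 2*s. With r = 0.7, sd(Thick) = s*sqrt(2*(1 - 0.7)), about 0.77*s, roughly half the uncorrelated figure.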
Bayes Theorem Pitfalls
Do you know how to revise the probability of success for a follow-up well? Consider two prospects, A and B, each having a chance of success, P(A) and P(B). Sometimes the prospects are independent in the sense that the success of one has no bearing on the success of the other. This would surely be the case if the prospects were in different basins. Other times, however, say when they share a common source rock, the success of A would cause us to revise the chance of success of B. Classic probability theory provides us with the notation for the (conditional) probability of B given A, P(B|A), as well as the (joint) probability of both being successful, P(A&B). Our interest lies in the manner in which we revise our estimates. In particular, we will ask: how much better can we make the chance of B when A is successful? That is, how large can P(B|A) be relative to P(B)? And if we revise the chance of B upward when A is a success, how much can or should we revise the chance of B downward when A is a failure? As we shall see, there are limits to these revisions, stemming from Bayes Theorem. Bayes Theorem regulates the way two or more events depend on one another, using conditional probability, P(A|B), and joint probability, P(A&B). It addresses independence and partial dependence between pairs of events. The formal statement, shown here, has numerous applications in the oil and gas industry.
Bayes Theorem
1. P(B|A) = P(A|B)*P(B)/P(A)

2. P(A) = P(A&B1) + P(A&B2) + ... + P(A&Bn), where B1, B2, ..., Bn are mutually exclusive and exhaustive.

We can rewrite part 2 if we use the fact that P(A&B) = P(A|B)*P(B):

2'. P(A) = P(A|B1)*P(B1) + P(A|B2)*P(B2) + ... + P(A|Bn)*P(Bn)

Part 1 says that we can calculate the conditional probability in one direction, provided we know the conditional probability in the other direction along with the two unconditional probabilities. It can be derived from the definition of joint probability, which can be written backward and forward:

P(B|A)*P(A) = P(A&B) = P(B&A) = P(A|B)*P(B)

Part 2 says that if A can happen in conjunction with one and only one of the Bi, then we can calculate the probability of A by summing the various joint probabilities. There are numerous applications of Bayes Theorem. Aside from the two drilling prospects mentioned above, one well-known situation is the role Bayes Theorem plays in estimating the value of information, usually done with decision trees. In that context, the revised probabilities acknowledge the additional information, which might fall into the mutually exclusive and exhaustive categories of good news and bad news (and sometimes no news). Further, we can define P(~A) to be the probability that prospect A fails (or, in a more general context, that event A does not occur). A rather obvious fact is that P(~A) = 1 - P(A).
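A minimal sketch of both parts, with made-up numbers for a prospect A and two mutually exclusive, exhaustive information states B1 (good news) and B2 (bad news); none of these probabilities come from the article.

    def total_probability(p_b, p_a_given_b):
        # Part 2: P(A) = sum over i of P(A|Bi) * P(Bi), for exclusive, exhaustive Bi.
        return sum(pb * pa for pb, pa in zip(p_b, p_a_given_b))

    def bayes(p_b, p_a_given_b, p_a):
        # Part 1: P(B|A) = P(A|B) * P(B) / P(A).
        return p_a_given_b * p_b / p_a

    # Illustrative numbers: P(B1) = 0.6, P(B2) = 0.4, with P(A|B1) = 0.3 and P(A|B2) = 0.05.
    p_b = [0.6, 0.4]
    p_a_given_b = [0.3, 0.05]

    p_a = total_probability(p_b, p_a_given_b)
    print("P(A) =", p_a)                                     # 0.20
    print("P(B1|A) =", bayes(p_b[0], p_a_given_b[0], p_a))   # 0.90, the revised chance of B1 given A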
(Figure: decision tree for drilling Prospect A, with a 30% chance of success, followed by a decision to drill or divest Prospect B under each outcome.)
Suppose we believe the prospects are highly dependent on each other, because they have a common source and a common potential seal. Suppose P(A) = 0.2, P(B) = 0.1, and P(B|A) = 0.6. This is the type of revised estimate people tend to make when they believe A and B are highly correlated: the success of A proves the common uncertainties and makes B much more likely. But consider the direct application of Bayes Theorem:

P(A|B) = P(B|A)*P(A)/P(B) = (0.6)*(0.2)/(0.1) = 1.2

Since no event, conditional or otherwise, can have a probability exceeding 1.0, we have reached a contradiction, which we can blame on the assumptions.
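The same contradiction can be checked in a couple of lines; this sketch simply applies the limit implied by part 1 of Bayes Theorem, P(B|A) <= P(B)/P(A), which follows because P(A|B) cannot exceed 1.

    def max_revised_chance(p_a, p_b):
        # Largest P(B|A) consistent with Bayes Theorem:
        # P(A|B) = P(B|A) * P(A) / P(B) <= 1  implies  P(B|A) <= P(B) / P(A).
        return min(1.0, p_b / p_a)

    p_a, p_b, p_b_given_a = 0.2, 0.1, 0.6
    limit = max_revised_chance(p_a, p_b)
    print("limit on P(B|A):", limit)                              # 0.5
    print("proposed revision consistent?", p_b_given_a <= limit)  # False: 0.6 exceeds 0.5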
Decision Trees vs. Monte Carlo Simulation

Decision trees and Monte Carlo simulation are the two principal tools of risk analysis. Sometimes users apply one tool when the other would be more helpful. Sometimes it makes sense to invoke both tools. After a brief review of their objectives, methods and outputs, we illustrate both proper and improper applications of these well-tested procedures.
values are varied together). The traditional tornado chart also is used to show how each perturbed variable affects the tree value when all other values are held fixed. This chart takes its name from the shape it assumes when the influences of the perturbed variables are stacked as lines or bars, with the largest on top. Trees, along with their cousins, influence diagrams, are particularly popular for framing problems and reaching consensus. For small to moderate size problems, the picture of the tree is an effective means of communication. One of the most important problem types solvable by trees is assessing the value of information. In this case, one possible choice is to buy additional information (seismic interpretation, well test, logs, pilot floods). Solving the tree with and without this added-information branch and taking the difference between the two expected values yields the value added by the information. If the information can be bought for less than its imputed value, it is a good deal.
(Decision tree figure for the reserves model: three branches each for Area (1,300, 2,000 or 2,700 acres), Pay (20, 40 or 60) and Recovery, weighted 25%, 50% and 25%.)

Table 1. Defining parameters for inputs (Area, Pay, Recovery) to the Monte Carlo model
A typical Monte Carlo model combines several component models: production forecasts, price forecasts, capital forecasts and operating expense forecasts. A Monte Carlo (MC) simulation is the process of creating a few thousand realizations of the model by simultaneously sampling values from the input distributions. The results of such an MC simulation typically include three items: a distribution for each designated output, a sensitivity chart listing the key variables ranked by their correlation with a targeted output, and various graphs and statistical summaries featuring the outputs. Unlike decision trees, MC simulations do not explicitly recommend a course of action or make a decision. Sometimes, however, when there are competing alternatives, an overlay chart is used to display the corresponding cumulative distributions, in order to compare their levels of uncertainty and their various percentiles.
The expected value of the tree is calculated by taking each end value, obtained by following some path, multiplying each of them by the corresponding probability, which is obtained by taking the product of the branch probabilities, and summing these weighted values. In the tree, each parameter is discretized to only three representative values, signifying small, medium and large. Thus, area can take on the values 1,300, 2,000 or 2,700 acres. Of course, in reality, area would be a continuous variable, taking on any value in between these three numbers. In fact, most people would argue 1,300 is not an absolute minimum, and 2,700 is not an absolute maximum. They might be more like P10 and P90 or P5 and P95 estimates. We can think of a small area being in some range, say from 1,000 to 1,500 acres, with 1,300 being a suitable representative of that class. Similarly, 2,700 might represent the class from 2,500 to 3,000 acres. Each of these subranges of the entire range carries its own probability of occurrence. For simplicity, we have made all the triples of values symmetric (for example, 1,300, 2,000 and 2,700 are equally spaced), but they could be anything. For instance, area could have the values 1,300, 2,000 and 3,500 for small, medium and large. Likewise, we have assigned equal weights to the small and large representatives, again for simplicity and ease of comparison. We have made another simplification: we assume all possible combinations are realizable. Sometimes, the large value of area would be paired with three
Converting to an appropriate Monte Carlo model requires finding a suitable distribution for each of the three inputs: area, pay and recovery. In light of the discussion above, we took the small values to be P5 and the big values to be P95. We also selected triangular distributions in each case (which is how we obtained our P0 and P100 values). The resulting distributions are shown in Table 1.

From the tree analysis, we can calculate the expected value as well as the two extreme values (smallest and largest) and their respective probabilities:

Expected value: 12 MMSTB
Minimum value: 2.6 MMSTB, P(min) = 1/64
Maximum value: 32.4 MMSTB, P(max) = 1/64

What we cannot determine from the tree analysis is how likely the reserves would be to exceed 5 MMSTB, how likely they would be to fall between 5 MMSTB and 15 MMSTB, how likely they would be to be less than 12 MMSTB, and so on.

The histogram from the Monte Carlo analysis is shown in Figure 2, and its corresponding percentiles in Table 2. Thus, while the mean value coincides with the mean from the tree analysis (in part because of the symmetry of the inputs and lack of correlation), we learn much more about the range of possibilities: 90% of the values lie between 4.9 and 20.9 MMSTB; there is about a 56% chance of finding less than 12 MMSTB (it is close to the median); and there is only about a 20% chance of exceeding 16 MMSTB.

Figure 2. Histogram of reserves from the simulation (mean 11.96 MMSTB; 5% tails below 4.85 and above 20.88 MMSTB).

Table 2. Percentiles for output (reserves, MMSTB) from simulation

Percentage    MMSTB
5%            4.9
10%           6.0
15%           6.8
20%           7.5
25%           8.2
30%           8.9
35%           9.5
40%           10.1
45%           10.8
50%           11.4
55%           11.9
60%           12.6
65%           13.4
70%           14.2
75%           15.0
80%           16.0
85%           17.3
90%           19.0
95%           21.2
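A minimal sketch of the simulation, with two stated assumptions: the triangular minima, modes and maxima are taken directly as the small, medium and large tree values (the article back-calculates slightly wider P0/P100 endpoints from P5/P95, so the simulated spread here will be a bit tighter than Table 2), and the recovery branch values (100, 150, 200) are inferred from the tree's quoted minimum, expected and maximum reserves.

    import numpy as np

    rng = np.random.default_rng(7)
    n = 100_000

    # Triangular(min, mode, max) inputs; values taken from the tree branches.
    area = rng.triangular(1300, 2000, 2700, n)      # acres
    pay = rng.triangular(20, 40, 60, n)
    recovery = rng.triangular(100, 150, 200, n)

    reserves = area * pay * recovery / 1e6          # MMSTB

    print("mean, MMSTB:", round(reserves.mean(), 1))
    print("P5, P50, P95:", np.round(np.percentile(reserves, [5, 50, 95]), 1))
    print("P(reserves < 12 MMSTB):", round((reserves < 12).mean(), 2))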
Table 3. Monte Carlo model to compare oil-based and water-based mud systems (inputs specified by P5, mode and P95 values)
Both tools lend themselves to sensitivity analysis, and any user would be remiss in avoiding it. But the overall uncertainty is made more explicit in the Monte Carlo simulation, where the very nature of the model begs the user to specify the range of uncertainty. On balance, if I had to pick one model, I would pick the simulation, in part because I have had far better results analyzing uncertainty and presenting the details to management when I use simulation. But whatever your preference in tools, for this problem the combination of both a tree and the simulation seems to be the most useful.

Table 4. Comparison between decision trees and Monte Carlo simulation

              Decision trees        Monte Carlo simulation
Objectives    make decisions        quantify uncertainty
Inputs        discrete scenarios    distributions
Solution      driven by EV          run many cases
Outputs       choice and EV         distributions
Dependence    limited treatment     rank correlation
Figure 4. Tree to compare oil-based vs. water-based mud systems (each mud choice branches on Stuck/Fix vs. No Problem outcomes).
Monte Carlo simulation, by contrast, can model time for projects (up to a point, beyond which project-schedule software is more applicable), production forecasts, and all sorts of cashflows, including those with fiscal terms. Decision trees must come down to something that compares alternative choices under uncertainty.
Figure 5. Overlay of the simulated cost distributions for the two mud systems (means of about 2,938 and 3,149).
Summary
Table 4 summarizes some of the features of the two methods, illustrating their principal differences. Don't be surprised to find someone using a tree solution to a problem you elect to solve with simulation, or vice versa. Do try to use common sense and do as much sensitivity analysis as you can, regardless of your choice.
Figure 6. New cost comparison when oil-based mud results in faster drilling.
When Does Correlation Matter?
What is correlation?

Often, the input variables to our Monte Carlo models are not independent of one another. For example, consider a model that estimates reserves by taking the product of area (A), average net pay (h) and recovery (R). In some environments, the structures with larger area would tend to have thicker pay. This property should be acknowledged when samples are selected from the distributions for A and h. Moreover, a database of analogues would reveal a pattern among the pairs of values A and h. Think of a cross-plot with a general trend of increasing h when A increases, as shown in Figure 1. Such a relationship between two variables can best be described by a correlation coefficient:

r = cov(x,y) / (sd(x) * sd(y))

where:

cov(x,y) = (1/n) * sum over i of (xi - x_bar)*(yi - y_bar)

and

var(x) = (1/n) * sum over i of (xi - x_bar)^2, with sd(x) = sqrt(var(x))

Excel has a function, SUMPRODUCT({x},{y}), that takes the sum of the products of the corresponding terms of two sequences {x} and {y}. Thus, covariance is a sumproduct. The value of r lies between -1 (perfect negative correlation) and +1 (perfect positive correlation). Although there are tests for significance of the correlation coefficient, one of which we mention below, statistical significance is not the point of this discussion. Instead, we focus on the practical side, asking what difference it makes to the bottom line of a Monte Carlo model (e.g., estimating reserves or estimating the cost of drilling a well) whether we include correlation. As we shall see, a correlation coefficient of 0.5 can make enough of a difference in some models to worry about it. Before we illustrate the concept, we need to point out that there is an alternate definition.
When one or both of the variables is highly skewed, a single product term (xi - x_bar)*(yi - y_bar) could be an order of magnitude larger than the other terms. Charles Spearman introduced an alternative formulation, which he labeled distribution-free, and called it the rank-order correlation coefficient, in contrast to the Pearson coefficient defined above.
Table 1. Area, porosity and gas saturation (and their ranks) for 13 reefal structures

Name   Area (km^2)   Porosity   Sg     Rank_Area   Rank_Por   Rank_Sg
S1     10            0.12       0.77   3           1          3
M1     24            0.12       0.85   6           1          10
A1     37            0.13       0.87   9           3          11
P2     6             0.14       0.81   1           4          6
K3     28.8          0.14       0.91   7           5          13
D      6             0.15       0.61   1           6          1
U1     34            0.15       0.82   8           7          8
P1     13            0.16       0.81   5           8          6
K2     60            0.16       0.78   12          9          4
K4     11            0.17       0.83   4           10         9
K1     58            0.18       0.80   11          11         5
U2     48            0.22       0.75   10          12         2
Z1     108           0.22       0.89   13          12         12
Table 2. Correlation and rank correlation coefficients for the reefal structure data

Ordinary (Pearson) r    Area    Porosity    Sg
Area                    1       0.67        0.36
Porosity                        1           -0.02
Sg                                          1

Rank (Spearman) r       Area    Porosity    Sg
Area                    1       0.57        0.28
Porosity                        1           -0.12
Sg                                          1
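A minimal sketch of both coefficients for the Table 1 data; the rank helper below does not average ties, so the rank-order values can differ slightly from Table 2.

    import numpy as np

    area = np.array([10, 24, 37, 6, 28.8, 6, 34, 13, 60, 11, 58, 48, 108])
    porosity = np.array([0.12, 0.12, 0.13, 0.14, 0.14, 0.15, 0.15,
                         0.16, 0.16, 0.17, 0.18, 0.22, 0.22])

    def pearson(x, y):
        # Ordinary correlation coefficient: cov(x, y) / (sd(x) * sd(y)).
        return np.corrcoef(x, y)[0, 1]

    def spearman(x, y):
        # Rank-order correlation: the Pearson coefficient applied to the ranks.
        rank = lambda v: np.argsort(np.argsort(v)) + 1
        return pearson(rank(x), rank(y))

    print("Pearson  area-porosity:", round(pearson(area, porosity), 2))   # about 0.67, as in Table 2
    print("Spearman area-porosity:", round(spearman(area, porosity), 2))  # close to the 0.57 in Table 2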
In a classroom exercise with a simple volumetric product model (reserves = A*h*r), each student is assigned a correlation coefficient, from 0 to 0.7, and uses it to correlate A with h and h with r (the same coefficient for both pairs). The students run the simulations using these correlations and call out their results, which are tallied in the table below:
r        0      0.1    0.2    0.3    0.4    0.5    0.6    0.7
Mean     17.5   17.9   18.1   18.3   18.6   18.8   19.0   19.3
StDev    7.7    8.6    9.1    9.7    10.5   10.9   11.3   11.9
Now, of course, even if both correlations were warranted, it would be unlikely they would be identical. Nevertheless, we must note that the standard deviation can increase by as much as 50% and the mean by about 10% under substantial correlation. The message is clear: pay attention to correlations in volumetric product models if you care about the danger of easily underestimating the mean value by 5% or more, or about understating the inherent uncertainty in the prospect by 30% or 40%. Almost as important: often correlation makes little difference in the results. We just need to check it out. While it may not be obvious, nearly all the correlations in reserves models cause the dispersion of the outputs to increase.
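A sketch of that classroom experiment under stated assumptions: the three inputs are given illustrative lognormal marginals (not the article's distributions), and the correlation is imposed through correlated normal scores, so the exact numbers will differ from the table above even though the trend is the same.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 200_000

    def product_stats(rho):
        # A-h and h-r share the same correlation; A and r are linked only through h.
        cov = np.array([[1.0, rho, 0.0],
                        [rho, 1.0, rho],
                        [0.0, rho, 1.0]])
        z = rng.multivariate_normal(np.zeros(3), cov, n)
        a = np.exp(np.log(2000) + 0.4 * z[:, 0])   # area, illustrative
        h = np.exp(np.log(40) + 0.3 * z[:, 1])     # net pay, illustrative
        r = np.exp(np.log(150) + 0.3 * z[:, 2])    # recovery, illustrative
        reserves = a * h * r / 1e6
        return reserves.mean(), reserves.std()

    for rho in (0.0, 0.3, 0.5, 0.7):
        m, s = product_stats(rho)
        print(f"rho={rho}: mean {m:.1f}, st dev {s:.1f}")
    # Both the mean and the standard deviation of the product grow as rho increases.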
Summary
Things to remember: correlation is easy to calculate in Excel (the function CORREL); there are two types of correlation coefficients, ordinary (Pearson) and rank-order (Spearman), and they tend to differ when one or both of the variables is highly skewed; correlation might matter, depending on the type of model and the strength of the correlation; correlation will always affect the standard deviation, often but not always increasing it; correlation will affect the mean value of a product; and correlation is useful for describing the sensitivity of an output to its inputs.

Figure 5. Negative correlation: investment and NPV.
Beware of Risked Reserves
Risked reserves is a phrase we hear a lot these days. It can have at least three meanings:

1. risked reserves might be the product of the probability of success, P(S), and the mean value of reserves in case of a discovery; in this case, risked reserves is a single value;
2. risked reserves might be the probability distribution obtained by scaling down all the values by a factor of P(S); or
3. risked reserves might be a hybrid distribution with a spike at 0, carrying the probability of failure, plus the success-case distribution scaled down in probability by P(S).

Take as an example Exploration Prospect A. It has a 30% chance of success. If successful, its reserves can be characterized as in Figure 1, a lognormal distribution with a mean of 200,000 STB (stock tank barrels) and a standard deviation of 40,000 STB. Then: definition 1 yields the single number 0.3*200,000 = 60,000 STB; definition 2 yields a lognormal distribution with a mean of 60,000 STB and a standard deviation of 12,000 STB (see Figure 2); and definition 3 is the hybrid distribution shown in Figure 3.

By contrast, suppose another prospect, B, has a 15% chance of success and a reserves distribution with a mean of 400,000 STB and a standard deviation of 200,000 STB. Then, under definition 1, B would yield the same risked reserves as A: 0.15*400,000 = 60,000 STB. However, consider Figure 2, which shows how B would be scaled compared with A, with the same mean but larger standard deviation. And Figure 4 shows how the original distributions compare. Assigning these two prospects the same number for the purpose of any sort of ranking could be misleading. Prospect B is much riskier, both because it has only half the probability of success of A and because, even if it is a success, the range of possible outcomes is much broader. In fact, the P10 (where P = percentile) of Prospect B equals the P50 of Prospect A. Thus, if you drilled several Prospect A types, for fully half of your successes (on average) the reserves would be less than the 10th percentile of one Prospect B. The only thing equal about Prospects A and B is that, in the long run, several prospects similar to Prospect A would yield the same average reserves as several other prospects like B. Even this is deceptive, because the range of possible outcomes for several prospects like A is much different from the range for several prospects like B.
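A minimal sketch comparing the two prospects under these definitions; the lognormal parameters are converted from the stated means and standard deviations, and the success flags follow the 30% and 15% chances of success.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200_000

    def lognormal_from_mean_sd(mean, sd, size):
        # Convert an arithmetic mean/sd to the underlying normal parameters.
        sigma2 = np.log(1.0 + (sd / mean) ** 2)
        mu = np.log(mean) - 0.5 * sigma2
        return rng.lognormal(mu, np.sqrt(sigma2), size)

    a_success = lognormal_from_mean_sd(200_000, 40_000, n)   # Prospect A success case, STB
    b_success = lognormal_from_mean_sd(400_000, 200_000, n)  # Prospect B success case, STB

    # Definition 1: a single number for each prospect (both come out near 60,000 STB).
    print("risked reserves A:", round(0.30 * a_success.mean()))
    print("risked reserves B:", round(0.15 * b_success.mean()))

    # Definition 3: a spike at zero (failure) plus the success-case distribution.
    a_risked = np.where(rng.random(n) < 0.30, a_success, 0.0)
    b_risked = np.where(rng.random(n) < 0.15, b_success, 0.0)

    print("P10 of B success case:", round(np.percentile(b_success, 10)))  # roughly equals...
    print("P50 of A success case:", round(np.percentile(a_success, 50)))  # ...the A median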
Running economics
What kind of economics these two prospects would generate is another story. Prospects like A would provide smaller discoveries, more consistent in size. They would require different development plans and have different economies of scale than would prospects like B. So, does that mean we should run economics? Well, yes, of course, but the question is: with what values of reserves do we run economics? Certainly not with risked reserves according to definition 1, which is not reality at all. We would never have a discovery with 60,000 STB. Our discoveries for A would range from about 120,000 STB to 310,000 STB and for B from about 180,000 STB to 780,000 STB (we are using the P5 and P95 values of the distributions). So, surely, we must run economics for very different cases. We could take a few typical discovery sizes for A (or B), figure a production schedule, assign some capital for wells and facilities, sprinkle in some operating expenses and calculate net present value (NPV) at 10% and IRR (internal rate of return). My preference is not to run a few typical economics cases and then average them. Even if you have the percentiles correct for reserves, why should you think those carry over to the same percentiles for NPV or IRR? Rather, I prefer to run probabilistic economics. That is, build a cashflow model containing the reserves component as well as appropriate development plans. On each iteration, the field size and perhaps the sampled area might determine a suitable development plan, which would generate capital (facilities and drilling schedule), operating expense and production schedule: the ingredients, along with prices, for cashflow. The outputs would include distributions for NPV and IRR. Comparing the outputs for A and B would allow us to answer questions like: what is the chance of making money with A or B? What is the probability that NPV > 0? And what is the chance of exceeding our hurdle rate for IRR? The answers to these questions, together with the comparison of the reserves distributions, would give us much more information for decision-making or
ranking prospects. Moreover, the process would indicate the drivers of NPV and of reserves, leading to questions of management of risks.
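A compressed sketch of that per-iteration workflow; every economic assumption below (flat price, decline shape, capital and operating costs) is invented purely for illustration and stands in for a real development plan.

    import numpy as np

    rng = np.random.default_rng(11)
    n_iter = 5_000
    discount = 0.10
    price = 25.0        # $/STB, assumed flat
    opex = 6.0          # $/STB, assumed
    capex_per_stb = 4.0 # $/STB of reserves, a stand-in for facilities and drilling

    def npv_one_iteration():
        # 30% chance of success; a failure contributes an NPV of 0 (dry-hole cost ignored).
        if rng.random() > 0.30:
            return 0.0
        reserves = rng.lognormal(mean=12.19, sigma=0.20)    # STB, roughly the Prospect A case
        capex = capex_per_stb * reserves
        decline = 0.8 ** np.arange(10)                      # 10-year decline shape
        production = reserves * decline / decline.sum()     # yearly volumes summing to reserves
        cashflow = (price - opex) * production
        years = np.arange(1, 11)
        return -capex + np.sum(cashflow / (1 + discount) ** years)

    npvs = np.array([npv_one_iteration() for _ in range(n_iter)])
    print("P(NPV > 0):", round((npvs > 0).mean(), 2))
    print("mean NPV, $:", round(npvs.mean()))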
Summary
The phrase risked reserves is ambiguous, and clarifying its meaning will help avoid miscommunication. Especially when comparing two prospects, one must recognize the range of possibilities inherent in any multiple-prospect program. Development plans must be designed for real cases, not for field sizes scaled down by chance of success. Full-scale probabilistic economics requires that the various components of the model be connected properly to avoid creating inappropriate realizations. The benefits of probabilistic cashflow models, however, are significant, allowing us to make informed decisions about the likelihood of attaining specific goals.