Supervised Multiattribute Classification


1 Supervised Multiattribute Classification
3D Seismic Attributes for Prospect Identification and Reservoir Characterization
Kurt J. Marfurt (The University of Oklahoma)
Supervised Multiattribute Classification

2 Course Outline
Introduction
Complex Trace, Horizon, and Formation Attributes
Multiattribute Display
Spectral Decomposition
Geometric Attributes
Attribute Expression of Geology:
Tectonic Deformation
Clastic Depositional Environments
Carbonate Depositional Environments
Shallow Stratigraphy and Drilling Hazards
Igneous and Intrusive Reservoirs and Seals
Impact of Acquisition and Processing on Attributes
Attribute Prediction of Fractures and Stress
Data Conditioning
Inversion for Acoustic and Elastic Impedance
Image Enhancement and Object Extraction
Interactive Multiattribute Analysis
Statistical Multiattribute Analysis
Unsupervised Multiattribute Classification
Supervised Multiattribute Classification
Attributes and Hydraulic Fracturing of Shale Reservoirs
Attribute Expression of the Mississippi Lime
The outline is designed for a class that varies between one and five days. For the shorter time frames, I will emphasize ‘Attributes and the seismic interpreter’ if the audience consists primarily of interpreters. Alternatively, I will emphasize ‘Attributes and the seismic processor’ if the audience consists primarily of folks involved in geophysical acquisition, processing, and migration. You will find short (1-day), normal (2-day), and long (5-day) versions of the notes on my ftp site.

3 Multiattribute Analysis Tools
Interpreter-Driven Attribute Analysis
Interactive Analysis: Cross-correlation on Maps, Cross-plotting and Geobodies, Connected Component Labeling, Component Analysis, Image Grand Tour
Statistical Analysis: Analysis of Variance (ANOVA, MANOVA), Multilinear Regression, Kriging with external drift, Collocated co-kriging
Machine Learning Attribute Analysis
Unsupervised Learning: K-means, Mixture Models, Kohonen Self-Organizing Maps, Generative Topographical Maps
Supervised Learning: Statistical Pattern Recognition, Support Vector Machines, Projection Pursuit, Artificial Neural Networks
We have a broad range of tools for integrating the information provided by seismic attributes. These can be subdivided based upon the mechanism for decision making (computer or interpreter). Interactive decision making can be divided into visual and numerical techniques, while machine learning techniques can be divided into supervised and unsupervised techniques.

4 Artificial Neural Nets (ANN)
As the name implies, artificial neural networks draw an analogy with biological neurons.

5 Artificial Neural Nets (ANN)
Objective: from continuous input measurements (e.g. seismic attributes):
Predict a continuous output (e.g. porosity), or
Predict discrete lithologies (e.g. wet sand, gas sand, limestone, shale, …)
The objective of an artificial neural network can be to define discrete or continuous output.

6 Artificial Neural Nets (ANN)
[Figure: three attribute inputs, “Looks like a duck?”, “Quacks like a duck?”, “Walks like a duck?”, plus a +1 bias feed a single neuron whose observation output is “yes” or “no”.]
An example of ‘inductive reasoning’ that was perhaps originally due to a poet of the mid 19th century, but taken up in the 1950s USA by politicians seeking to recognize communists. This quote is still commonly used by American politicians.

7 Linear Neurons used in Predictive Deconvolution
[Figure: a linear neuron: inputs a1, a2, a3, …, aN with weights w1, w2, w3, …, wN plus a bias a0 = 1 with weight w0 form an N-long operator, w, that predicts the trace one prediction distance ahead along the time axis (0 to 3 s).]
An image provided by Tury Taner of Rock Solid Images that shows the similarity between predictive deconvolution and the prediction made by artificial neural networks. We train our prediction operator on the initial part of the data, where multiples may be more easily identified, and then apply it to later parts of the data. (Courtesy Rock Solid Images)

8 The Perceptron
y = w0·a0 + w1·a1 + w2·a2 + … + wn·an, with bias a0 = 1, unknown weights wi, and input attributes ai.
Output, r = { 1 if y > +0.5; 0 if y < −0.5 }
To obtain a discrete result (e.g. do we have a gas sand or do we have a water sand?) we need to convert continuous attribute data into discrete output measurements. This conversion is done using the perceptron, which asymptotically approximates 0 (no) or 1 (yes) at its extremes. Note that the perceptron in the above diagram is differentiable.
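As an aside (my sketch, not from the course notes), the perceptron above takes only a few lines of NumPy; the attribute values and weights below are illustrative stand-ins, and the logistic sigmoid is one common choice for the differentiable squashing function:

```python
import numpy as np

def perceptron(a, w, w0):
    """One perceptron: a weighted sum of attributes plus a bias, squashed by a
    differentiable sigmoid that asymptotes to 0 (no) and 1 (yes)."""
    y = w0 + np.dot(w, a)                # y = w0*1 + w1*a1 + ... + wn*an
    return 1.0 / (1.0 + np.exp(-y))      # logistic sigmoid, differentiable

# Illustrative values only (not from the course notes)
a = np.array([0.7, -0.2])                # two input attributes
w = np.array([1.0, 1.0])                 # guessed weights
print(perceptron(a, w, w0=-0.5))         # 0.5, i.e. sitting on the boundary
```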

9 Inverter
input a1 = 0: y = −1·0 + 0.5·1 = +0.5 → output r = 1 (yes)
input a1 = 1: y = −1·1 + 0.5·1 = −0.5 → output r = 0 (no)
Weights: w1 = −1, w0 = +0.5.
Those of us over 60 had the pleasure in university electronics labs of building computers using vacuum tube technology. There were four components of Boolean algebra: the inverter, the ‘or’, the ‘and’, and the ‘exclusive or’. This first example shows the truth table and weights, w, that perform the ‘inverter’ task.

10 Boolean OR
input a1 = 0, a2 = 0: y = 1·0 + 1·0 − 0.5·1 = −0.5 → output r = 0 (no)
input a1 = 1, a2 = 0: y = 1·1 + 1·0 − 0.5·1 = +0.5 → output r = 1 (yes)
input a1 = 0, a2 = 1: y = 1·0 + 1·1 − 0.5·1 = +0.5 → output r = 1 (yes)
input a1 = 1, a2 = 1: y = 1·1 + 1·1 − 0.5·1 = +1.5 → output r = 1 (yes)
Weights: w1 = 1, w2 = 1, w0 = −0.5.
This second example shows the truth table and weights, w, that perform the ‘or’ task.

11 Boolean AND
input a1 = 0, a2 = 0: y = 1·0 + 1·0 − 1.5·1 = −1.5 → output r = 0 (no)
input a1 = 1, a2 = 0: y = 1·1 + 1·0 − 1.5·1 = −0.5 → output r = 0 (no)
input a1 = 0, a2 = 1: y = 1·0 + 1·1 − 1.5·1 = −0.5 → output r = 0 (no)
input a1 = 1, a2 = 1: y = 1·1 + 1·1 − 1.5·1 = +0.5 → output r = 1 (yes)
Weights: w1 = 1, w2 = 1, w0 = −1.5.
This third example shows the truth table and weights, w, that perform the ‘and’ task.
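A short sketch (mine, not from the notes) verifies that the slides' weights really implement all three gates; a hard threshold at y = 0 stands in for the perceptron's asymptotic yes/no:

```python
import numpy as np

def gate(a, w, w0):
    """Hard-threshold perceptron: fire (1) if the weighted input sum is positive."""
    return int(w0 + np.dot(w, a) > 0)

for a1 in (0, 1):                        # inverter: w1 = -1, w0 = +0.5
    print("NOT", a1, "->", gate([a1], [-1.0], 0.5))
for a1 in (0, 1):                        # OR: w0 = -0.5; AND: w0 = -1.5
    for a2 in (0, 1):
        print(a1, a2,
              "OR:", gate([a1, a2], [1.0, 1.0], -0.5),
              "AND:", gate([a1, a2], [1.0, 1.0], -1.5))
```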

12 Boolean XOR
Doesn’t work! Unfortunately, there are no weights, w, that perform the ‘exclusive or’ task for our single-layer perceptron.

13 Linear Separability
Given the input data, the decision boundary indicated by the yellow line separates the two desired classes for both the ‘or’ (OK!) and the ‘and’ (OK!) operations. However, we need at least two decision boundaries to define the ‘exclusive or’ operation (Can’t separate!).

14 Boolean XOR: the hidden layer!
Hidden neuron h1 (Boolean OR): w1 = 1, w2 = 1, w0 = −0.5
Hidden neuron h2 (Boolean AND): w1 = 1, w2 = 1, w0 = −1.5
Output neuron: y = 1·h1 − 1·h2 − 0.5·1
input a1 = 0, a2 = 0: h1 = 0, h2 = 0 → y = −0.5 → output r = 0 (no)
input a1 = 1, a2 = 0: h1 = 1, h2 = 0 → y = +0.5 → output r = 1 (yes)
input a1 = 0, a2 = 1: h1 = 1, h2 = 0 → y = +0.5 → output r = 1 (yes)
input a1 = 1, a2 = 1: h1 = 1, h2 = 1 → y = −0.5 → output r = 0 (no)
We achieve our result by introducing an intermediate level of perceptrons, which we denote as a hidden layer. Given this extra layer and the weights above, we are able to reproduce our truth table.
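Chaining the slide's OR and AND neurons through the hidden layer reproduces the exclusive-or; this small sketch (mine) checks the truth table:

```python
def step(y):
    return int(y > 0)

def xor(a1, a2):
    h1 = step(1.0 * a1 + 1.0 * a2 - 0.5)     # hidden neuron 1: Boolean OR
    h2 = step(1.0 * a1 + 1.0 * a2 - 1.5)     # hidden neuron 2: Boolean AND
    return step(1.0 * h1 - 1.0 * h2 - 0.5)   # output: OR and not AND

for a1 in (0, 1):
    for a2 in (0, 1):
        print(a1, a2, "->", xor(a1, a2))     # 0 0->0, 1 0->1, 0 1->1, 1 1->0
```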

15 A typical neural network
Given the previous information, we now understand the importance of the hidden layer seen in almost any article on neural networks. [Figure: input layer, hidden layer, and output layer of a typical neural network.] (Ross, 2002)

16 Decision workflow
1. Choose the classes you wish to discriminate
2. Choose attributes that differentiate these classes
3. Train using calibrated or “truth” data
4. Validate with “truth” data not used in the training step
5. Apply to the target data
6. Interpret the results
The decision workflow as described by van der Baan and Jutten (2000).

17 Alternative perceptrons
[Figure: alternative perceptrons for discrete output classes (e.g. lithology), continuous output classes (e.g. porosity), and intermediate results in the hidden layer: the step function r(w) and the differentiable alternatives fs[r(w)], fG[r(w)], and fh[r(w)].]
We have several choices of perceptrons. For attribute analysis we most commonly use the differentiable sigmoid function. (van der Baan and Jutten, 2000)

18 2-attribute example with a single decision boundary
[Figure: attributes a1 and a2, weights w0, w1, w2, the summed response y, and the perceptron output r(w), 0 or 1. (a) Single perceptron layer and (b) associated decision boundary. Adapted from Romeo (1994).] (van der Baan and Jutten, 2000)

19 Example of two attributes with a single decision boundary
Brad says: “We could have more than one decision boundary!”
[Figure: in the (a1, a2) plane, the decision boundary a2 = −(w1/w2)·a1 − w0/w2 separates Class 1 from Class 2. (a) Single perceptron layer and (b) associated decision boundary. Adapted from Romeo (1994).] (van der Baan and Jutten, 2000)

20 Hidden Layer Example of two attributes with three decision boundaries
[Figure: attributes feed a hidden layer of perceptrons through one set of weights, and the hidden layer feeds the 0-or-1 output through a second set; the associated decision boundaries are drawn in explicit representation. (a) Single hidden perceptron layer and (b) associated decision boundaries. Adapted from Romeo (1994).] (van der Baan and Jutten, 2000)

21 Hidden Layer Example of two attributes with three decision boundaries
[Figure: (a) single hidden perceptron layer and (b) associated decision boundaries. (After van der Baan and Jutten, 2000; in turn adapted from Romeo, 1994.)] This is a more compact representation of the previous image.

22 Example of two attributes with three decision boundaries
In this example we have two attributes, a1 and a2. In order to define the two classes as shown above, with Class 1 enclosed by Class 2 boundaries 1, 2, and 3, we need at least three neurons in our hidden layer. (After van der Baan and Jutten, 2000)

23 The danger of too many boundaries (hidden neurons)
Brad says: “You can overfit your data by putting in too many decision boundaries, thereby overdividing your attribute space!”
If we have too many boundaries (defined by the number of hidden neurons), we run the risk of subdividing each measurement into its own domain. (Courtesy Brad Wallet, OU)

24 The danger of too many degrees of freedom (polynomial fitting)
[Figure: 1st-, 2nd-, and 7th-order polynomial fits in the (a1, a2) plane, each with its prediction error.]
The danger of attempting to fit training data (the pink circles) with a polynomial with too many degrees of freedom. To evaluate the validity of our model, we need to use measurements not used in our training, here the data sample represented by the green circle. The 2nd-order polynomial fits the training data better than the 1st-order polynomial; it also reduces the prediction error on the validation data set. In contrast, while the 7th-order polynomial exactly fits the eight training data points, its usefulness in prediction is poor, as indicated by the large prediction error associated with the green validation data point.
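The overfitting argument is easy to reproduce numerically. The sketch below is illustrative only, with synthetic data standing in for the slide's pink and green circles; it fits eight noisy training points with 1st-, 2nd-, and 7th-order polynomials. The 7th-order fit drives the training error to zero, while the held-out point typically exposes its poor predictive value:

```python
import numpy as np

rng = np.random.default_rng(0)
a1 = np.linspace(0.0, 1.0, 8)                       # eight training samples
a2 = 2.0 * a1**2 + 0.1 * rng.standard_normal(8)     # noisy quadratic trend
a1_val = 0.93                                       # held-out validation point
a2_val = 2.0 * a1_val**2

for order in (1, 2, 7):
    c = np.polyfit(a1, a2, order)                   # least-squares polynomial
    train_rms = np.sqrt(np.mean((np.polyval(c, a1) - a2) ** 2))
    val_err = abs(np.polyval(c, a1_val) - a2_val)
    print(f"order {order}: training RMS = {train_rms:.3f}, "
          f"validation error = {val_err:.3f}")
```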

25 The danger of too many attributes
The danger of attempting to fit training data (the pink circles) with too many attributes. In this case we are looking for simple linear relationships, which can be characterized by hyperplanes: a 2D hyperplane (a line), a 3D hyperplane (a plane), and a 4D hyperplane. To evaluate the validity of our model, we need to use measurements not used in our training, here the validation sample represented by the green circle. The 3D hyperplane, based on two attributes, fits the training data better than the 2D hyperplane; it also reduces the prediction error on the validation data set. In contrast, while the 4D hyperplane exactly fits the four training data points, its usefulness in prediction is poor, as indicated by the large prediction error associated with the green validation data point.

26 A feed-forward network
One of several ways of estimating the weights, w (easily understood by geophysicists). Let a0 be the input attributes, z0 the output measurements, and f(a0, w) the equation predicting the output from the input. Starting from an initial guess based on random weights, use a Taylor series expansion about the current weights wj:
f(a0, wj + Δw) ≈ f(a0, wj) + A(wj)·Δw,
where the Jacobian matrix Aik = ∂fi/∂wk is the sensitivity of the output to the weights (note that f must be differentiable!), and Δzj = z0 − f(a0, wj) is the prediction error given the current weights.
Van der Baan and Jutten (2000) present a neural-network solution technique that is easily understood by geophysicists who have worked with common tomography (or gravity and magnetic) inversion workflows. Given an initial guess, the equations are linearized about the solution at the current iteration. The weight perturbations are then estimated using least squares and the normal equations. (van der Baan and Jutten, 2000)

27 Tomography
Known output (measurements), unknown model parameters, known previous model result, differentiable model system.
In reflection or refraction tomography, we know the modeling system (typically ray tracing through a mesh or suite of layers) and the measured traveltimes, y. We do not know the model parameters (e.g. the slowness s = 1/v). We form an error (also called residual) vector, Δyj = y − f(xj), and solve for Δxj.

28 Neural networks
Known input (attributes), unknown weights, known output (“truth” data), differentiable model system.
In artificial neural networks, we know the input attribute training vector, x0, and the output training measurements, y0. We do not know the modeling system, which is typically based on a suite of weights, w. We form an error (also called residual) vector, Δyj = y0 − f(x0, wj), and solve for Δwj.

29 Computing the weights, w
There exists a fundamental difference from the reflection or refraction tomography problem. In a neural network application, both the output y and the input x are known, since y represents the desired output for training vector x. Hence, the problem is not the construction of a model x explaining the observations, but the construction of the approximation function f(x, w), where the weighting vector, w, is unknown. Note that in order to compute the sensitivity matrix, A(w), the perceptron function f[r(w)] must be differentiable! (After van der Baan and Jutten, 2000)

30 Iterative least-squares solution using the normal equations
Levenberg-Marquardt (or Tikhonov) regularization:
Δwj = (AᵀA + βI)⁻¹ Aᵀ Δzj
The most common way of solving the previous equations is through the use of the normal equations. At each iteration, we update the value from the previous jth iteration to obtain wj+1 = wj + Δwj. To avoid inversion of a potentially singular matrix (whereby the results are independent of a postulated weight, wk), we add a small regularization factor, β, to the diagonal.
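To make the update concrete, here is a minimal sketch (my own, not van der Baan and Jutten's code) that trains a single sigmoid neuron on the Boolean AND truth table using the damped normal equations above; for a sigmoid output z, the Jacobian with respect to the weights is z(1−z)·a:

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

# "Truth" data: the Boolean AND table, learned by one sigmoid neuron
X = np.array([[1., 0., 0.], [1., 0., 1.], [1., 1., 0.], [1., 1., 1.]])  # a0 = 1 bias column
z0 = np.array([0., 0., 0., 1.])

w = 0.1 * np.random.default_rng(1).standard_normal(3)  # random initial weights
beta = 1e-2                                            # damping factor

for j in range(200):
    z = sigmoid(X @ w)                       # forward prediction f(a0, w_j)
    dz = z0 - z                              # prediction error Δz_j
    A = X * (z * (1.0 - z))[:, None]         # Jacobian A_ik = z(1-z) * a_ik
    dw = np.linalg.solve(A.T @ A + beta * np.eye(3), A.T @ dz)
    w = w + dw                               # w_{j+1} = w_j + Δw_j

print(np.round(sigmoid(X @ w), 2))           # approaches [0, 0, 0, 1]
```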

31 A typical neural network
We end this short overview with Ross’ (2002) image of a typical neural network workflow. It is our hope that at this point you understand (or at least are comfortable with!) how the method works, and are not upset that someone has hidden a layer! [Figure: input layer, hidden layer, and output layer.] (Ross, 2002)

32 Example 1. Mapping a stratigraphic depositional system
(Ruffo et al., 2009)

33 Seismic line perpendicular to channel system
(Ruffo et al., 2009)

34 Seismic facies classification using a neural network classifier
The workflow comprises the following steps:
a) Seed picking: the interpreter scans the seismic volume looking for examples of each lithofacies, picking locations and labeling them according to a classification scheme. Each saved pick contains the spatial location and the expected lithofacies class.
b) Training: a statistical neural network classifier is trained to predict lithofacies from the seismic data in a way that is consistent with the examples it has been presented with (supervised classification): a seismic-derived set of values is extracted at each seed location and used to predict the seed lithofacies class, in order to derive a relationship between the seismic values and the most likely lithofacies.
c) Classification: the trained classifier is used to compute a lithofacies volume, determining the most likely facies code at every grid node position within the seismic cube.
(Ruffo et al., 2009)
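A hedged sketch of steps (b) and (c) in a modern open toolbox (my illustration; Ruffo et al. used their own classifier, not this one). The random arrays stand in for attribute values extracted at the seed picks and over the full cube:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
seed_attributes = rng.standard_normal((200, 5))   # stand-in: 200 picks x 5 attributes
seed_facies = rng.integers(0, 3, 200)             # stand-in: 3 lithofacies codes

# b) Training: learn the attribute -> lithofacies relationship from the seeds
clf = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                                  random_state=0))
clf.fit(seed_attributes, seed_facies)

# c) Classification: most likely facies code at every grid node of the cube
cube_attributes = rng.standard_normal((100_000, 5))  # flattened seismic cube
facies_volume = clf.predict(cube_attributes)
print(np.bincount(facies_volume))                 # facies counts over the volume
```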

35 Use 4-way averaged vertical 2D GLCM attributes parallel to dip at a suite of azimuths
We improved the GLCM attribute computation to better exploit 3D seismic data features and geological meaning:
a) As seismic data are 3D whereas the GLCM requires a 2D plane to operate on, the analysis has been repeated for different azimuths and averaged (Fig. 5(A)) to get a response that samples a 3D neighborhood of the analysis location.
b) As GLCM attributes depend on the choice of an oriented lag, and a constant orientation makes little geological sense (unless everything is flat), the seismic data have been “virtually” flattened prior to attribute evaluation using a procedure called dip steering. With respect to such a flattened domain, the two available alternative orientations are “parallel to layers” versus “orthogonal to layers” (Fig. 5(B)).
Dip steering is very important in turning an image-processing technique into a sound geophysical method, robust enough to capture facies variability in structurally complex contexts. (Ruffo et al., 2009)
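As a 2D analogue of this workflow (a sketch under stated assumptions, not the authors' code), scikit-image's graycomatrix/graycoprops can compute co-occurrence attributes at several orientations and average them; here `slice2d` is a random stand-in for one vertical slice through a dip-steered, flattened amplitude volume:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

rng = np.random.default_rng(0)
slice2d = rng.standard_normal((128, 128))   # stand-in for a flattened 2D slice

# Quantize amplitudes to a few gray levels for the co-occurrence counts
levels = 16
edges = np.quantile(slice2d, np.linspace(0.0, 1.0, levels + 1)[1:-1])
q = np.digitize(slice2d, edges).astype(np.uint8)

# One-sample lag repeated at several orientations, then averaged -- mirroring
# the paper's averaging over azimuths to sample a neighborhood of the point
angles = [0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]
glcm = graycomatrix(q, distances=[1], angles=angles, levels=levels,
                    symmetric=True, normed=True)
for prop in ("contrast", "homogeneity", "energy"):
    print(prop, graycoprops(glcm, prop).mean())  # mean over the four angles
```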

36 Seeding the facies classification algorithm
(Ruffo et al., 2009)

37 Lithofacies classification
(Ruffo et al., 2009)

38 Lithofacies classification scheme
(Ruffo et al., 2009)

39 Lithofacies classification
(Ruffo et al., 2009)

40 Seismic facies overlain on seismic data
(Ruffo et al., 2009)

41 Horizon slice (Ruffo et al., 2009)

42 Example 2. Clustering of λρ and μρ volumes
A λρ (lambda-rho) section (with polygons selected) and corresponding clusters on 3-D crossplots. (a) Polygons selected on a time slice from the λρ volume. The red-bordered polygon indicates the area being analyzed. (b) Points within the red, yellow, and purple polygons show up as different clusters. The gas anomaly (blue on the time slice and enclosed by the purple polygon) shows up with negative values for the fluid stack. (c) 3-D crossplot seen from the fluid stack side. (d) 3-D crossplot seen from the fluid stack side, including only points from the purple polygon. (After Chopra and Pruden, 2003)

43 Neural network estimation
(a) Neural-network inverted gamma-ray response. Note the distinct separation of sand from silt and shale. (b) Neural-network computed porosity from the inverted density response. The density values have been masked out for gamma-ray values representative of silt or shale (a mask generated from the gamma-ray response), giving a relative porosity indicator for the sands. (After Chopra and Pruden, 2003)

44 San Luis Pass weather prediction exercise
August 24, 2005 – sunny; August 25 – storms; August 26 – sunny; August 27 – sunny; August 28 – sunny; August 29 – storms.
Exercise: flip 6 coins, with heads = sunny and tails = stormy. Read out your correlation rate:
0/6 = −1.00, 1/6 = −0.67, 2/6 = −0.33, 3/6 = 0.00, 4/6 = +0.33, 5/6 = +0.67, 6/6 = +1.00
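The point of the exercise is that with enough students flipping coins, someone's coins will appear to "predict" the weather. A quick simulation (mine, not part of the notes) of a 30-person class:

```python
import numpy as np

rng = np.random.default_rng()
weather = np.array([1, 0, 1, 1, 1, 0])        # Aug 24-29: 1 = sunny, 0 = storms

coins = rng.integers(0, 2, size=(30, 6))      # 30 students, heads = 1 = sunny
hits = (coins == weather).sum(axis=1)         # days each student "predicted"
r = 2.0 * hits / 6.0 - 1.0                    # map 0..6 hits onto -1.00..+1.00
print("best coin 'correlation' in class:", r.max())  # usually +0.67 or +1.00
```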

45 San Luis Pass weather prediction exercise
Which coins best predict the weather in San Luis Pass? Should Marfurt go fishing?

46 Potential risks when using seismic attributes as predictors of reservoir properties
When the sample size is small, the uncertainty about the value of the true correlation can be large:
given 10 wells with a correlation of r = 0.8, the 95% confidence interval is [0.34, 0.95];
given only 5 wells with a correlation of r = 0.8, the 95% confidence interval is [−0.28, 0.99]!
(Kalkomey, 1997)
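The slide's intervals match the standard Fisher z-transform calculation, sketched below (my code, not Kalkomey's):

```python
import numpy as np
from scipy.stats import norm

def corr_ci(r, n, conf=0.95):
    """Approximate confidence interval for a correlation (Fisher z-transform)."""
    z = np.arctanh(r)                         # transform the sample correlation
    half = norm.ppf(0.5 + conf / 2.0) / np.sqrt(n - 3)  # ~1.96 standard errors
    return np.tanh(z - half), np.tanh(z + half)

print(corr_ci(0.8, 10))   # ~ (0.34, 0.95)
print(corr_ci(0.8, 5))    # ~ (-0.28, 0.99)
```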

47 Spurious Correlations
A spurious correlation is a sample correlation that is large in absolute value purely by chance. (Kalkomey, 1997)

48 The more attributes, the more spurious correlations!
(Kalkomey, 1997)
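A small Monte Carlo sketch (mine, illustrating Kalkomey's point rather than reproducing her figure) shows how the chance of finding at least one "convincing" but spurious correlation grows with the number of attributes screened:

```python
import numpy as np

rng = np.random.default_rng(0)
n_wells, n_trials = 10, 2000
for n_attr in (1, 5, 25):
    hits = 0
    for _ in range(n_trials):
        prop = rng.standard_normal(n_wells)              # reservoir property
        attrs = rng.standard_normal((n_attr, n_wells))   # truly uncorrelated attributes
        rmax = max(abs(np.corrcoef(a, prop)[0, 1]) for a in attrs)
        hits += rmax > 0.6                               # looks like a real correlation
    print(f"{n_attr:2d} attributes: P(spurious |r| > 0.6) ~ {hits / n_trials:.2f}")
```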

49 Risk = expected loss due to our uncertainty about the truth × cost of making a bad decision
Cost of a Type I error (using a seismic attribute to predict a reservoir property with which it is actually uncorrelated):
• Inaccurate prediction biased by the attribute.
• Inflated confidence in the inaccurate prediction (apparent prediction errors are small).
Cost of a Type II error (rejecting a seismic attribute for use in predicting a reservoir property with which it is truly correlated):
• Less accurate prediction than if we’d used the seismic attribute.
• Larger prediction errors than if we’d used the attribute.
(Kalkomey, 1997)

50 Validation of Attribute Anomalies
1. Basic QC:
Is the well tie good?
Are the interpreted horizons consistent and accurate?
Are the correlations statistically meaningful?
Is there a physical or well-documented reason for an attribute to correlate with the reservoir property to be predicted?
2. Validation:
Does the prediction correlate to control not used in training?
Does the prediction make geologic sense?
Does the prediction fit production data?
Can you validate the correlation through forward modeling?
(Hart, 2002)
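The first validation question ("does the prediction correlate to control not used in training?") is exactly what leave-one-out cross-validation measures when wells are few; a brief sketch with hypothetical well data (names and values are illustrative, not from Hart):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(1)
X = rng.standard_normal((12, 3))               # hypothetical: 12 wells x 3 attributes
y = X[:, 0] + 0.1 * rng.standard_normal(12)    # porosity tied to attribute 1, plus noise

# Each well is predicted by a model trained on all the *other* wells
y_pred = cross_val_predict(LinearRegression(), X, y, cv=LeaveOneOut())
print("validation correlation:", round(np.corrcoef(y, y_pred)[0, 1], 2))
```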

51 Validation of Attribute Anomalies
(Porosity prediction in the lower Brushy Canyon)
[Figure: two porosity-prediction maps, one from multivariate linear regression and one from a probabilistic neural network. The right-hand map has higher statistical significance and is geologically more realistic.] (Hart, 2002)

52 Validation of Attribute Anomalies
(Through modeling of the Smackover Formation)
[Figure: field-data and model-data panels compared for the seismic response, instantaneous frequency, and envelope.] Seismic attribute correlations: “Trust, but verify!” (Hart, 2002)

53 Validation of Attribute Anomalies
(Through engineering and geology)
[Figure: a neural-network prediction (R = 0.96), a multivariate linear regression prediction (R = 0.89), and a dip map.] Engineering and geologic analyses indicate that fractures, associated with high-dip areas, play an important role in enhancing gas production from these tight carbonates. Stars indicate locations of wells drilled in 1999. (Hart, 2002)

54 Neural Networks In Summary
Neural networks find linear and nonlinear trends in the seismic data that can help correlate well control to maps and formations.
Avoid using cyclical attributes (phase, strike, …) with neural networks.
A good neural network application will mimic the interpreter who trains it. Don’t ask a poor interpreter to train a neural network!
Lack of sufficient control, or the use of too many attributes, can lead to false-positive and false-negative predictions!

55 “Understand your assumptions! Quality control your results! Avoid mindless interpretation!” (Bob Sheriff, 2004)
A scene of seismic interpreters that was cut from the film “Planet of the Apes”. Note that the orangutan is clearly the professor, walking around and encouraging his students. Also note the banana motivation.

56 Normally, this is the last slide of both my 2-day and 5-day short courses. Thanks for bearing with me! -kurt

