Pattern Recognition Techniques in Petroleum Geochemistry

L. Scott Ramos and Brian G. Rohrback, Infometrix, Inc.
Daniel M. Jarvie, Humble Instruments & Services, Inc.
This presentation provides an overview of the application of pattern recognition techniques found in the chemometrics literature to problems in petroleum geochemistry. These techniques apply across the range of the petroleum industry. The examples presented here are in the exploration and exploitation (reservoir modeling) realms, although the technology also has application in refining, gas processing, distribution, plant optimization and environmental monitoring. All of the examples presented today are available on existing analytical instruments or as a service.

Computer-Assisted Geochemistry
The emphasis in production geochemistry is to match oils to source rocks and to correlate one crude oil to others. We do this to trace migration or to assess the degree of communication among reservoirs. Computerized pattern recognition (aka chemometrics) is an efficient way to exploit the information richness of the data without sacrificing speed or accuracy.
To this task we apply a combination of analytical techniques: wet chemistry, GC, MS, microscopy. The resulting flow of data can be overwhelming. Our traditional approach to this barrage has been to extract a much smaller number of measurements to form a basis for interpretation, but this filtering process may also discard information relevant to efficient oil production. Employing a multivariate approach provides a more robust method of monitoring oil fields and optimizing production yield. In this presentation, we describe applications of chemometrics to petroleum production: basin studies using multivariate regression and oil typing via multivariate classification.

An Overlay of Chromatograms
By overlaying chromatograms we can look at both the similarities and the differences in the crude oils. Software can use this underlying pattern to build quantitative and objective models. Patterns are demonstrable in all forms of analytical data.
The first impression on interpreting these two chromatographic traces is that they are quite similar. But what does that word “similar” mean? With the traces overlaid, subtle changes in the relative abundances of the major peaks and a few small peaks at the beginning of one chromatogram turn out to be diagnostic. The trick now is to build objective and quantitative models from data such as these: objective because we want to get the same interpretation regardless of who does the analytical work; quantitative because we need to apply the experience we gain to new samples.

Example: Automation of Geochemical Evaluations
Source rock typing can be done by using GC, GC/MS and stable isotopes on crude oils. We employ a series of chemometric models to first separate the samples based on gross characteristics (i.e., lacustrine versus marine) and then use fine-tuning models to further characterize samples.
Crude oil and source rock evaluations are the primary tasks facing an organic geochemist in the petroleum industry. The common tools of the trade are gross oil composition (using column fractionation, whole oil MS, API gravity, %S), detail of the value fractions (GC of saturates and aromatics), GC/MS to detail the biomarkers, and stable isotopes to attribute source characteristics. All of this work can lead to a logjam at the desk of the scientist tasked with completing it. Chemometric pattern recognition provides a vehicle to accomplish much of the routine geochemical evaluation in an automated, computerized fashion. The advantage is higher productivity and a faster turnaround of sample or basin reports.
In this example, we assemble crude oils from around the world and construct a geochemical library that can be used to classify new oils or oil shows. This lets us rapidly evaluate oils, in this case based on the GC/MS and stable carbon isotopes of the saturate and aromatic fractions. The system gives information on the source rock type (paralic, resinitic, open marine, …) and relates the oils to well-studied geochemical provinces throughout the world. This model is commercially available through GeoMark Research and is called OilMod.

GC/MS Mass Chromatograms Tricyclic Terpanes m/z=191
Both the tricyclic (shown here) and the pentacyclic triterpanes are valuable markers in geochemical assessment. The traditional way of handling this data is to create ratios of the peaks in the mass chromatogram shown and use these as a set of analytical heuristics (rules of thumb) to interpret the data. The multivariate chemometric approach allows us to deal with the terpane pattern as a whole, not as a collection of disparate pieces. In this way we can take advantage of the natural correlations among these variables (peaks) that are found in the data.

GC/MS Mass Chromatograms Steranes m/z=217
Terpane information can be added to the sterane pattern to increase the “information content” of the data set. The sterane patterns supply a separate look that is diagnostic of the source and maturity of the crude oil.

Traditional Geochemistry
[Figure: cross-plot of C29/C30 Hopane (vertical axis, 0.0 to 1.6) against C22/C21 Tricyclic Terpane (horizontal axis, 0.0 to 2.0), with fields labeled Carbonate, Marl, Coal/Resin, Lacustrine, Marine Shale and Paralic/Deltaic]
To demonstrate that there is “information content” in the geochemical data, we can look at the parameters one at a time, and it is clear that we can find patterns of association (groupings and trends) in the data. Evaluating all possible combinations of geochemical markers, while trying to keep one relationship in mind as you examine a plot of different variables, is tedious at best. Pattern recognition technology affords us a means of evaluating all of the data in a data set simultaneously, in an automatable and objective manner.

Construction of a Geochemical Library
[Table: Source Rock Type against # of Oils. Source rock types listed: Marine Shale; Paralic/Deltaic Marine Shale; Marine Carbonate/Marl; Evaporite/Hypersaline Marls; Coal/Resinitic Terrestrial Source; Lacustrine, Fresh; Lacustrine, Saline]
The 424 oils used to construct the library were collected from 57 different basins/regions and have been pre-classified into 7 source rock type categories. The most commonly represented source rock types (as should be expected) are the marine shale and the marine carbonate/marl origin oils. The age distribution is similarly diverse, spanning Precambrian to Neogene, with the dominant representation during and just after the reign of the dinosaurs (Upper Jurassic through Paleogene). The issue here is to assemble data on a sufficient number of oils to make the library valuable.

Assembly of a Library
A data matrix is constructed based on geochemically significant ratios drawn from the GC, GC/MS and stable carbon isotopes (saturate and aromatic).
$$\begin{pmatrix} x_{11} & x_{12} & x_{13} & \cdots & x_{1m} \\ x_{21} & x_{22} & x_{23} & \cdots & x_{2m} \\ \vdots & \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & x_{n3} & \cdots & x_{nm} \end{pmatrix}$$
The construction of the library was carried out using a multivariate analysis system. The procedure was to assemble a data table (or spreadsheet) that organized the library data. Chemometric classification techniques were then applied to build models around the data in this experience set. It doesn’t matter how many geochemical parameters are used to construct the data table. The software automatically filters out variables that have little or random variation. We have found that there is value in combining data from multiple instrument sources so that bulk chemistry is combined with specific marker compounds (and ratios).
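The filtering of uninformative variables can be illustrated with a small sketch. The source does not say how the software decides which variables to drop, so the variance threshold, the function name `drop_flat_columns` and the parameter names and values below are all assumptions made for illustration. The sketch only covers variables with little variation, not those with random variation.

```python
from statistics import pvariance

def drop_flat_columns(rows, names, min_variance=1e-12):
    """Return (filtered_rows, kept_names) with near-constant columns removed."""
    columns = list(zip(*rows))  # transpose: one tuple per variable
    keep = [j for j, col in enumerate(columns) if pvariance(col) > min_variance]
    filtered = [[row[j] for j in keep] for row in rows]
    return filtered, [names[j] for j in keep]

# Hypothetical library: three oils (rows) and three made-up parameters (columns).
names = ["ratio_a", "ratio_b", "isotope_c"]
rows = [
    [0.8, 1.0, -28.1],
    [1.2, 1.0, -30.4],
    [0.5, 1.0, -26.7],
]
filtered, kept = drop_flat_columns(rows, names)
print(kept)  # ['ratio_a', 'isotope_c']
```

Here `ratio_b` is constant across the library, so it carries no information for separating oils and is dropped.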

KNN Method to Classify
[Figure: scores plot showing the Marine and Lacustrine library oils and the Unknown sample]
Although I won’t go into the details, we use a technique called K-Nearest Neighbor (or KNN) to do the classification work. In the figure above, every point on the screen represents a different oil sample. You can look at the plot as a simple x-y plot of two new geochemical parameters, each a “discerning” linear combination of all of the geochemical variables measured. The definition of these axes is objectively set using factor analysis (aka principal component analysis). KNN projects new samples into the library and simply polls their nearest neighbors. In the example above we have set the K value to 5, meaning we look at the five nearest samples from the library and take a vote. In this case, four of the five library entries suggest that this sample belongs to the marine group rather than the lacustrine group, so the sample is assigned to the marine group. KNN is handy because it is simple and it functions well even if some of the categories have very few samples (e.g., there are only 11 oils derived from evaporitic source rocks in the library). The problem with KNN is that it will make an assignment come hell or high water; it cannot assess whether the library assignment makes sense (only that it is the nearest neighbor).
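The vote described above can be sketched in a few lines. This is a minimal illustration, not the library's actual implementation: the coordinates are made-up stand-ins for principal component scores (the projection step is omitted), and the function name `knn_classify` is hypothetical.

```python
from collections import Counter
from math import dist

def knn_classify(library, unknown, k=5):
    """library: list of ((x, y), label). Returns (winning_label, votes)."""
    nearest = sorted(library, key=lambda item: dist(item[0], unknown))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0], dict(votes)

# Made-up scores for a small two-class library.
library = [
    ((1.0, 1.0), "marine"),
    ((1.2, 0.8), "marine"),
    ((0.9, 1.3), "marine"),
    ((1.4, 1.1), "marine"),
    ((2.0, 1.5), "lacustrine"),
    ((4.0, 4.0), "lacustrine"),
    ((4.2, 3.8), "lacustrine"),
    ((3.9, 4.3), "lacustrine"),
]
label, votes = knn_classify(library, (1.5, 1.2), k=5)
print(label, votes)  # marine wins the vote four to one

# A sample far from every library entry still receives a label.
far_label, _ = knn_classify(library, (100.0, -50.0), k=5)
```

The second call shows the weakness noted above: the vote always produces a class, however distant the neighbors are.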

SIMCA Method to Qualify
[Figure: scores plot with 95% confidence ovals around the Marine and Lacustrine oils and the Unknown sample]
To solve the problem posed by unusual samples, we build a probability envelope around both the marine and the lacustrine oils and compare the unknown oil to each model. In the case presented here, the ovals represent the 95% confidence interval surrounding each population. The unknown clearly fits within the expectations for marine and is clearly outside the distribution typical for lacustrine oils. The modeling technique used for this qualification of the KNN classification is called Soft Independent Modeling of Class Analogy (I guess all of the good names were taken).
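The idea of a 95% envelope can be sketched as follows. This is a simplified stand-in, not SIMCA itself: SIMCA builds a separate principal component model for each class and also tests the residuals, whereas this sketch only draws an ellipse in a two-dimensional scores plot using the squared Mahalanobis distance and a chi-square limit. The data points and all function names are assumptions, and with so few samples the chi-square limit is only an approximation.

```python
def fit_envelope(points):
    """Mean and inverse covariance of 2-D score points for one class."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    inv = ((syy / det, -sxy / det), (-sxy / det, sxx / det))
    return (mx, my), inv

def mahalanobis_sq(point, model):
    """Squared Mahalanobis distance from a point to the class mean."""
    (mx, my), inv = model
    dx, dy = point[0] - mx, point[1] - my
    return dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)

CHI2_95_2DOF = 5.991  # 95% quantile of chi-square with 2 degrees of freedom

def fits(point, model, limit=CHI2_95_2DOF):
    """True if the point lies inside the class's 95% ellipse."""
    return mahalanobis_sq(point, model) <= limit

# Made-up scores for two classes.
marine = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.3), (1.4, 1.1), (1.1, 0.9)]
lacustrine = [(4.0, 4.0), (4.2, 3.8), (3.9, 4.3), (4.4, 4.1), (4.1, 3.9)]
marine_model = fit_envelope(marine)
lacustrine_model = fit_envelope(lacustrine)

unknown = (1.2, 1.0)
print(fits(unknown, marine_model), fits(unknown, lacustrine_model))  # True False

outlier = (10.0, -5.0)
print(fits(outlier, marine_model), fits(outlier, lacustrine_model))  # False False
```

Unlike the nearest-neighbor vote, a sample that lies outside every envelope is rejected by all classes and can be flagged as unusual.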

Oil Classification Schematic
Oil Sample
  Terrestrial
    Paralic/Deltaic
    Coal/Resinitic
  Aquatic
    Marine
      Shale
      Marl/Carbonate
      Evaporite
    Lacustrine
      Fresh Water
      Saline Water
The classification of oil samples is conducted in a hierarchical fashion. We use the first pass to distinguish aquatic from terrestrial source oils. Once the oils are separated into these major categories, we use one or more models to categorize them into their final groupings. We do this for two reasons. First, the separation of groups is driven by differences in the geochemical parameters, so small differences that are diagnostic for separating shales from marls are likely to be masked when all of the oil samples are considered together. Second, the variables important for separating, say, marine from lacustrine are different from those used to further classify the lacustrine samples into fresh water and saline regimes.

Automation of a Hierarchical Classification
  • • •
  elseif All == 3
    load knn model from ‘aquatic.mod’
    G3 = predict
    if G3 == 1
      load knn model from ‘marine.mod’
      predict
    elseif G3 == 2
      load knn model from ‘lacustr.mod’
  end
The sequence of models implied in the last slide can be combined to form an expert system that appears to the user as a single prediction step. The example in this slide is a macro outlining the steps taken in separating lacustrine from marine oils. First the “aquatic” model is run to separate the marine from the lacustrine oils. Then the marine and lacustrine models are run to further subdivide the oil sources. The advantage of such a system is that it can be called from an analytical instrument or a LIMS directly and automatically. Although 6 different KNN and 7 different SIMCA models are employed in the assessments, the InStep expert system runs the succession as if it were a single pass.
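The chaining of models into one prediction step can be sketched in Python. This is an illustration of the control flow only, not the InStep macro language: the threshold rules stand in for the saved KNN models, and the parameter names `p1`, `p2`, `p3` and the function `classify_hierarchically` are hypothetical.

```python
def classify_hierarchically(sample, models):
    """Walk down a tree of classifiers until a final category is reached.

    models maps a node name to a function(sample) -> child name.
    A child with no entry in models is treated as a final category.
    """
    node = "oil"
    path = []
    while node in models:
        node = models[node](sample)
        path.append(node)
    return path

# Stand-in rules in place of trained models (thresholds are made up).
models = {
    "oil": lambda s: "aquatic" if s["p1"] > 0.5 else "terrestrial",
    "aquatic": lambda s: "marine" if s["p2"] > 0.5 else "lacustrine",
    "lacustrine": lambda s: "fresh water" if s["p3"] < 0.5 else "saline water",
}

sample = {"p1": 0.9, "p2": 0.2, "p3": 0.8}
path = classify_hierarchically(sample, models)
print(" > ".join(path))  # aquatic > lacustrine > saline water
```

The caller sees one function call, while each level of the tree uses only the model suited to that split.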

Example: Reservoir Oil Fingerprinting
Chromatography allows us to determine if one reservoir is linked to another by looking at marker peaks that show between the normal alkanes. This process can be done either by choosing an appropriate set of marker peaks ahead of time or by evaluating the whole chromatographic pattern. GC is usually the technique of choice due to the lower cost of analysis and faster turnaround time.
In the previous example, we simply automated the traditional approach to geochemical analysis. We achieved some benefits from using pattern recognition technology, primarily in that the classifications are made on an objective basis (i.e., independent of the operator). Other benefits include automatic flagging of oils that are not within the logical reach of the library. But ultimately, we relied on prior knowledge (which helped us choose the appropriate biomarkers, etc.). In this example, we move one step away from the knowledge base of a geochemist. We apply the chemometric approach to interpreting oils from different reservoirs that, in this case, have a very similar source facies. We do this to assess whether the reservoirs in question are connected.

Crude Oils from Two Reservoir Systems
[Figure: overlaid chromatograms with peaks labeled n-C12, n-C15, n-C17, n-C19, Pr and Ph]
This slide displays a portion of the chromatograms from 9 oils. These oils originate from two different reservoirs that are thought to be isolated from one another. The chromatograms are color coded, with oils from one reservoir colored yellow and the other colored red. It is clear that these oils are very similar in their bulk concentrations of alkanes. It is equally clear that a comparison of normal alkanes is not going to give very good separation of the two populations. Let’s focus on a portion of the fine structure of the chromatogram to look for the diagnostic regions for separating the two groups. In this case, we will focus on the peaks between C15 and C16, as shown by the zoom box on the slide.

Marker Compounds Between n-C15 and n-C16
Expanding this portion of the chromatogram starts to bring up detail that is usually overlooked in geochemistry. That said, the patterns displayed are very similar for the two reservoirs. It is important to put the chromatograms on an equal footing and correct for variations in sample loading and baseline offset. To do this we normalize the chromatographic traces.
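One simple way to correct for loading and offset is sketched here. The source does not state which normalization was used, so this choice (subtract the minimum as a crude baseline, then scale to unit area) and the name `normalize_trace` are assumptions. The intensity values are made up.

```python
from math import isclose

def normalize_trace(trace):
    """Subtract the minimum as a crude baseline offset, then scale to unit area."""
    baseline = min(trace)
    shifted = [v - baseline for v in trace]
    total = sum(shifted)
    if total == 0:
        raise ValueError("flat trace cannot be normalized")
    return [v / total for v in shifted]

# Two made-up traces with the same shape but different loading and offset.
trace_a = [2, 4, 10, 4, 2]
trace_b = [5 + 3 * v for v in trace_a]  # three times the loading, offset of 5

norm_a = normalize_trace(trace_a)
norm_b = normalize_trace(trace_b)
same = all(isclose(a, b) for a, b in zip(norm_a, norm_b))
print(same)  # True
```

After normalization the two traces coincide, so any remaining difference between real chromatograms reflects relative peak abundances rather than injection size or detector offset.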

Normalizing the Chromatograms to Accentuate Differences
Once we normalize the chromatograms, we see some subtle changes in the relative abundances of the compounds represented by the peaks. Note that we do not need to identify the components specifically; we simply need to recognize them by retention time to pick out the difference. In this example, the early-eluting peaks are more prevalent in the red reservoir and the later peaks are more abundant in the yellow. This information is sufficient to determine if newly acquired oils are in connection with the red reservoir, the yellow reservoir, or neither.

Example: Monitoring Yield from Multiple Reservoirs in Open Hole Completions
We can use chromatographic patterns to determine the relative yield from more than one reservoir even where there is no casing. In this example, the field is undergoing water flood to drive the oil to producing wells. One of the producing zones is significantly more porous than the other. Because pumping water is the primary cost, knowing the relative yields from each reservoir is important. Pattern recognition can also flag the unusual.
Many of the older fields in the world were never cased. Termed open-hole completions, these wells were less expensive to drill, and the issue of stratigraphic leakage was less important. The difficulty with open-hole completions is that you don’t really know where the oil is coming from. The case presented here is located in West Texas, where a field of 700+ oil wells is well along the tertiary recovery route. Fully half of the wells have been converted from producers to water injectors (in order to drive the oil to the remaining wells). The problem here is that the geology has not cooperated with modern economics. Geologically, the two producing strata (Zones A and B) have different porosity/permeability. Economically, the cost of production is wholly dictated by the price of pumping the recovery water. If production from any given well is found to come largely from the low-porosity stratum, it is a waste of money to inject water at the neighboring wells (water, being lazy, will take the path of least resistance). We can use geochemistry (through the gas chromatograph) and chemometric pattern recognition to determine the relative contribution of each reservoir, with an interesting twist.

Production Well 696 Well Stimulation
[Figure: production in the latest 30 production intervals (bbl/day), vertical axis 5 to 30, with the Well Stimulation point marked]
After closing in Well 696 and pressurizing the reservoir system, an increase in production was noted.
Well 696 was stimulated on February 1. The result of this stimulation was an increase in oil production, which is why we stimulate wells in the first place. Standard procedure in this field is to run gas chromatography on the oils corresponding to any increase in well productivity.

Well 696 - Chromatograms, 1994 Production
[Figure: two chromatogram panels, Pre-Stimulation and Post-Stimulation]
Are the differences in hydrocarbon distribution significant?
With the increase in oil production there was a bit of a change in the oil profile. The relative intensities of the peaks falling between the alkanes have changed. The question is whether this change is significant. Enter chemometric analysis.

Well Oil Profile
[Figure: Principal Component Scores plot of the oil samples, with regions labeled Zone A, Zone B and Zone C]
Production in Well 696 has changed in composition significantly since stimulation work was done. The interpretation is that the well is now producing from a new zone, deeper than the A or B zones already characterized. Some other wells also seem to show Zone C input.
We have an experience set of oil chromatograms that mark whether the oils originated from Zone A or Zone B or represent a mixture of the two sources. This relationship is shown on this slide. This plot is similar to the one shown earlier (it is called a Principal Component Scores plot): each labeled point represents a different oil sample, and the closer two points are, the more similar their chromatograms. Most of the oils fall along a mixing line from relatively pure Zone A contribution to primarily Zone B. All of the points colored in shades other than light blue are within the expectations of mixing. With the discovery of the changed pattern in the new production from Well 696, we have some indication that there is another zone, deeper than A and B, that may be responsible for the increase. In light of this new information, several other wells were re-evaluated and found to possibly have significant levels of “Zone C” production.
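The notion of a mixing line, and of samples that fall off it, can be sketched geometrically. This is a minimal sketch under stated assumptions: the end-member coordinates are made-up scores, the residual limit of 0.5 is arbitrary, and `mixing_position` is a hypothetical helper rather than anything named in the source.

```python
from math import isclose, sqrt

def mixing_position(sample, end_a, end_b):
    """Project a sample onto the A-B mixing line.

    Returns (fraction_b, residual): fraction_b is 0 at end_a and 1 at end_b;
    residual is the distance from the sample to the line.
    """
    ab = [b - a for a, b in zip(end_a, end_b)]
    a_to_sample = [s - a for a, s in zip(end_a, sample)]
    length_sq = sum(c * c for c in ab)
    fraction_b = sum(u * v for u, v in zip(a_to_sample, ab)) / length_sq
    foot = [a + fraction_b * c for a, c in zip(end_a, ab)]
    residual = sqrt(sum((s - f) ** 2 for s, f in zip(sample, foot)))
    return fraction_b, residual

# Made-up scores for the two end members.
zone_a = (0.0, 0.0)
zone_b = (4.0, 0.0)
RESIDUAL_LIMIT = 0.5  # arbitrary threshold, chosen for illustration

on_line = mixing_position((1.0, 0.1), zone_a, zone_b)
off_line = mixing_position((2.0, 3.0), zone_a, zone_b)
print(on_line[1] <= RESIDUAL_LIMIT)   # True: consistent with A-B mixing
print(off_line[1] <= RESIDUAL_LIMIT)  # False: flagged as unusual
```

A sample with a large residual cannot be explained as a blend of the two known zones, which is the kind of signal that pointed to a third source in Well 696.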

Zone Apportionment Well 696
[Figure: yield by zone (Zone A, Zone B, Zone C) in the latest 30 production intervals (bbl/day), with the Stimulation point marked]
We have an implied interpretation based on the geochemical differences in the chromatograms.
We can go back to the historical data on Well 696 and apportion the relative yields from each reservoir. This graph can be updated automatically when the GC analysis is run.
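Apportionment can be sketched as a least-squares fit of a sample profile to end-member profiles. The source does not say which regression method was used, so ordinary least squares stands in here, and the four-peak end-member profiles are invented. The 13/17/70 split and the 23 bbl/day figure are borrowed from the field-map slide purely to build a test case. Function names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def apportion(sample, end_members):
    """Least-squares fractions of each end member via the normal equations."""
    gram = [[dot(u, v) for v in end_members] for u in end_members]
    rhs = [dot(u, sample) for u in end_members]
    return solve(gram, rhs)

# Invented normalized profiles (four marker peaks) for the three zones.
zone_a = [0.5, 0.3, 0.1, 0.1]
zone_b = [0.1, 0.4, 0.4, 0.1]
zone_c = [0.1, 0.1, 0.2, 0.6]

# Synthetic sample blended at 13% A, 17% B, 70% C.
sample = [0.13 * a + 0.17 * b + 0.70 * c for a, b, c in zip(zone_a, zone_b, zone_c)]
fractions = apportion(sample, [zone_a, zone_b, zone_c])
barrels = [23 * f for f in fractions]  # bbl/day by zone, given 23 bbl/day total
print([round(f, 2) for f in fractions])  # approximately [0.13, 0.17, 0.7]
```

This plain fit does not force the fractions to be non-negative or to sum to one; a production system would likely add those constraints.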

Field Production Characteristics
[Figure: field map. Callout: Well 696, Region 4; Production 23 bbls/day; Water 85%; 13% Zone A; 17% Zone B; 70% Zone C. Legend: A Zone Dominates; B Zone Dominates; C Zone Significant; Injection Wells]
Perhaps the best way to display the interpretation is by color-coding a map.
Probably the best way to deliver the results of a geochemical assessment (whether it is being done automatically or manually) is to provide it as a field map. No one really cares to see the data, nor is the chromatogram of any particular use. The field production supervisor is going to respond to the interpretation much as a process control operator would. In the example here, the 700+ well field is shown with the water injection wells colored blue and the producing wells coded by which zone dominates the production. Note that several of the wells have been colored red (in this case indicating that our model is suggesting a significant contribution by the newly discovered Zone C). Our vision for the future is that the geochemistry and the pattern recognition will be performed in the background. Building systems such as the one implied in this slide is an important step in making maximum use of the technology available to us.
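The map legend can be expressed as a small rule that turns zone fractions into a category. The category names come from the legend on this slide, but the cutoff for a "significant" Zone C contribution is not given in the source, so the 0.25 value below is an assumption, as is the function name `map_category`.

```python
def map_category(fractions, is_injector=False, c_significant=0.25):
    """Return the legend category for one well.

    fractions: dict with keys "A", "B", "C" giving each zone's share of production.
    c_significant: assumed cutoff above which Zone C counts as significant.
    """
    if is_injector:
        return "Injection Wells"
    if fractions.get("C", 0.0) >= c_significant:
        return "C Zone Significant"
    if fractions["A"] >= fractions["B"]:
        return "A Zone Dominates"
    return "B Zone Dominates"

# Zone split shown on the slide for Well 696.
well_696 = {"A": 0.13, "B": 0.17, "C": 0.70}
print(map_category(well_696))  # C Zone Significant
```

Keeping the rule in one function means the map can be recolored automatically whenever new GC results update the zone fractions.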

Conclusions
Source of a crude oil: Chemometric pattern matching is effective in routine geochemical evaluations, and a multi-step classification procedure is preferable (it minimizes classification errors). Data sources: GC, GC/MS, GC/MS plus isotopes.
Reservoir fingerprinting: The techniques can determine if a reservoir is connected to its neighbors, evaluate reservoir mixing and flag unusual samples. Data sources: GC on peak tables or whole chromatograms.
There are myriad applications for pattern recognition technology in the petroleum industry. Both the multivariate tools and the versatility of the analytical instruments enable us to do things that were doable before, but required perseverance.
