1 Survival Analysis for Risk-Ranking of ESP System Performance Teddy Petrou, Rice University August 17, 2005

2 Presentation Outline
- ESP Overview
- Survival Analysis Review
- Dataset Explanation
- Problems
- Modeling Process and Improvements
- NPV Calculations for ESPs
- Conclusions

3 ESP Overview
- More than 60 percent (and rising) of producing oil wells require some type of assisted lift to produce the recoverable oil.
- ESPs (electrical submersible pumps) are typically used where there is insufficient pressure to lift the fluids to the surface, most often in older, more watered-out wells.
- They provide cost-effective production by boosting fluid production from these less efficient, older reservoirs.

4 Survival Analysis (SA) Review
- Survival analysis refers to the statistical procedures for modeling the time until an event occurs, here the failure of an ESP.
- Censoring occurs when a pump has not yet failed at the time of data collection, so its run time is known only to exceed the observation period.
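As a concrete illustration (not from the original slides), right-censored run-time data is usually encoded as an observed duration plus an event indicator; the column names below are hypothetical.

```python
import pandas as pd

# Hypothetical encoding of ESP run-time records: the observed run time in days,
# plus a flag that is 1 if the pump actually failed and 0 if the record is
# right-censored (the pump was still running when the data were collected).
esp = pd.DataFrame({
    "pump_id":  [101, 102, 103, 104],
    "run_days": [420, 730, 150, 365],
    "failed":   [1,   0,   1,   0],   # 0 = censored
    "pump_mfg": ["A", "B", "A", "B"],
})
print(esp)
```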

5 Survival Analysis Benefits
- Provides insight into which explanatory variables significantly affect run times.
- Predicts run times of ESPs given various values of the explanatory variables.
- Generates estimated survival curves, which can be used to:
  - Produce a bond-type risk-ranking scheme
  - Provide annuity-type NPV calculations for ESP value
  - Simulate sample reservoir ESP usage
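The slides do not spell out the NPV calculation itself; the sketch below is only one plausible way to turn an estimated survival curve into an annuity-type expected NPV (each period's cash flow weighted by the probability the pump is still running), with all cash-flow and discount figures hypothetical.

```python
def expected_npv(survival_probs, cashflow_per_period, rate):
    """Annuity-type NPV in which each period's cash flow is weighted by the
    survival probability S(t) for that period. Illustrative only; not
    necessarily the exact method used in the talk."""
    return sum(
        s * cashflow_per_period / (1.0 + rate) ** t
        for t, s in enumerate(survival_probs, start=1)
    )

# Hypothetical monthly survival probabilities and economics.
surv = [0.98, 0.95, 0.91, 0.86, 0.80, 0.74]   # S(t) at months 1..6
print(round(expected_npv(surv, cashflow_per_period=50_000, rate=0.01), 2))
```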

6 Survivor and Hazard Functions
- Survivor function S(t): gives the probability that an individual survives longer than time t.
- Hazard function h(t): gives the instantaneous potential per unit time for failure, given that the pump has survived up to time t.
- The models applied are defined in terms of the hazard function.
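For reference, the standard relations linking the two functions (not written out on the slide) are:

```latex
S(t) = \Pr(T > t), \qquad
h(t) = \lim_{\Delta t \to 0}
       \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
     = \frac{f(t)}{S(t)}, \qquad
S(t) = \exp\!\Big(-\int_0^t h(u)\,du\Big)
```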

7 Generating Survival Curves
Three main methods:
- Non-parametric (Kaplan-Meier)
- Parametric (exponential, Weibull, etc.)
- Semi-parametric (Cox proportional hazards)
  - Factors and covariates are compared to a baseline hazard function
  - Allows us to determine which combination of potential explanatory variables is most significant
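A minimal Kaplan-Meier sketch, assuming the lifelines Python package and the hypothetical run_days/failed encoding from the earlier example:

```python
import pandas as pd
from lifelines import KaplanMeierFitter

# Hypothetical run-time data: durations in days and a failure/censoring flag.
esp = pd.DataFrame({
    "run_days": [420, 730, 150, 365, 510, 95],
    "failed":   [1,   0,   1,   0,   1,   1],
})

kmf = KaplanMeierFitter()
kmf.fit(durations=esp["run_days"], event_observed=esp["failed"])

# Non-parametric estimate of the survivor function S(t) at the observed times.
print(kmf.survival_function_)
```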

8 Formulation of the Cox Proportional Hazards Model
- Given two pumps (R and C), made by two different manufacturers, their hazard functions are related by h_R(t) = ψ h_C(t), where ψ is a constant known as the relative risk. If ψ is less than 1, pump R is less likely to fail at any given time.
- Since the relative hazard cannot be negative, we let ψ = exp(β), so that h_R(t) = exp(β) h_C(t).
- The comparative baseline level can be chosen arbitrarily. If a different baseline level is chosen, the parameter estimates change, but all statistical significance tests remain the same.
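A minimal sketch of fitting a Cox proportional hazards model with lifelines; the column names are hypothetical and the 58-variable ESP database from the talk is not reproduced here.

```python
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical slice of the ESP data: run time, failure flag, and one factor.
esp = pd.DataFrame({
    "run_days": [420, 730, 150, 365, 510, 95, 610, 240],
    "failed":   [1,   0,   1,   0,   1,   1,  0,   1],
    "pump_mfg": ["A", "B", "A", "B", "A", "A", "B", "B"],
})

# One-hot encode the factor; the dropped level plays the role of the baseline.
X = pd.get_dummies(esp, columns=["pump_mfg"], drop_first=True)

cph = CoxPHFitter()
cph.fit(X, duration_col="run_days", event_col="failed")
cph.print_summary()   # exp(coef) is the relative risk against the baseline level
```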

9 Step-Wise Modeling Overview
1. Data transformation with expert collaboration
2. Step-wise model selection with factor collapsing
3. Model verification and validation
4. Model implementation
Once all steps are complete, an automated process can then be set up for quick statistical ESP analysis.

10 Data Introduction
The data contains nearly 25,000 records of ESPs from around the world, with 58 explanatory variables consisting of factors and covariates.
Problems with large data in general:
- Difficult to find the correct model
- Very time consuming
- Inconsistencies abound
Problems with this particular data:
- High correlation (multicollinearity)
- Few failure occurrences
- Missing data
Pragmatic approach: different subsets of the data were chosen.

11 Highly Correlated Data
- The best way to alleviate multicollinearity issues is to work with someone who has expert knowledge of the database to remove redundant explanatory variables. In the absence of an expert, sifting through the data by hand is a must.
- Producing a cross-table of the data is one method for finding variables that are highly correlated. A cross-table of SYSMFG against PMPMFG (shown on the slide) reveals a perfect one-to-one correlation, so removing one of the two variables is necessary.
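A minimal sketch of the cross-table check in pandas; the SYSMFG/PMPMFG values shown are hypothetical.

```python
import pandas as pd

# Hypothetical records carrying the two manufacturer variables from the slide.
esp = pd.DataFrame({
    "SYSMFG": ["MfgA", "MfgA", "MfgB", "MfgB", "MfgC"],
    "PMPMFG": ["PumpA", "PumpA", "PumpB", "PumpB", "PumpC"],
})

# Each SYSMFG level maps to exactly one PMPMFG level and vice versa,
# i.e. a perfect one-to-one correlation: only one of the two variables is needed.
print(pd.crosstab(esp["SYSMFG"], esp["PMPMFG"]))
```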

12 Removing Data
- Variables exhibiting the near one-to-one correlation were removed.
- Many other variables were subsets of one another, so some variables could potentially be replaced by the variables that are subsets of them; knowing the level of one variable can give information about 15 others.
- Reducing the data helps with model interpretation as well as computing time.
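One way (not shown on the slide) to flag such subset relationships is to test whether every level of one factor maps to a single level of another; the helper and column names below are hypothetical.

```python
import pandas as pd

def determines(df: pd.DataFrame, a: str, b: str) -> bool:
    """True if every level of factor `a` corresponds to exactly one level of `b`,
    i.e. knowing `a` already gives the information in `b` (a removal candidate)."""
    return bool((df.groupby(a)[b].nunique() == 1).all())

# Hypothetical data: MODEL fully determines SERIES, so SERIES is redundant.
esp = pd.DataFrame({
    "MODEL":  ["M1", "M1", "M2", "M3", "M3"],
    "SERIES": ["S-400", "S-400", "S-500", "S-400", "S-400"],
})
print(determines(esp, "MODEL", "SERIES"))   # True
```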

13 Transforming Low Counts and Missing Data
- Each factor in the data can comprise several levels, and levels with low counts can severely skew the model-building process. To alleviate this, all levels were required to have at least 15 records.
- Missing data was also an issue: several variables had more than half their values recorded as 'NA'. If the NA group contained more than 15 entries, it was changed to a level named 'Unknown'.
- Again, collaboration with an expert is needed to investigate the cause of the missing entries.
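A sketch of the two transformations: the 15-record threshold comes from the slide, while pooling rare levels into an 'Other' bucket is an assumption (the talk may instead have dropped or merged them with expert guidance).

```python
import pandas as pd

MIN_COUNT = 15  # threshold from the slide

def clean_factor(col: pd.Series) -> pd.Series:
    """Apply the slide's rules to one factor column:
    - if the NA group has more than MIN_COUNT entries, make it an explicit
      'Unknown' level;
    - pool levels with fewer than MIN_COUNT records into 'Other' (assumption)."""
    col = col.copy()
    if col.isna().sum() > MIN_COUNT:
        col = col.fillna("Unknown")
    counts = col.value_counts(dropna=False)
    rare = counts[counts < MIN_COUNT].index
    return col.where(~col.isin(rare), other="Other")
```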

14 Data With No Failures
- Right-censored data can make for difficult analysis: a factor level with no failures essentially implies that an ESP will never fail, so no information about the failure rate is provided.
- To alleviate this problem, such levels can be eliminated from the data altogether or combined with another level, with help from an expert.
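A quick check (with hypothetical column names) for factor levels whose records are all censored:

```python
import pandas as pd

def zero_failure_levels(df: pd.DataFrame, factor: str, event_col: str = "failed"):
    """Return the levels of `factor` with no observed failures. Such levels carry
    no failure-rate information and should be dropped or merged with another
    level, ideally with expert input."""
    failures_per_level = df.groupby(factor)[event_col].sum()
    return failures_per_level[failures_per_level == 0].index.tolist()

# Hypothetical example: every record for manufacturer "C" is censored.
esp = pd.DataFrame({"pump_mfg": ["A", "A", "B", "C", "C"],
                    "failed":   [1,   0,   1,   0,   0]})
print(zero_failure_levels(esp, "pump_mfg"))   # ['C']
```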

15 Model Selection
- Once a 'good' set of data is produced, a step-wise procedure adds or removes variables one at a time until a statistically 'best' model is found. Different combinations of explanatory variables will affect the selection procedure.
- The step-wise procedure is conservative and tends to keep variables in the model that might not be necessary, so once this model is found, each variable is examined individually and a decision is made whether or not to drop it.
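A rough sketch of one forward step of such a procedure, scoring candidate variables by a partial-likelihood AIC with lifelines; the criterion, software, and column names are assumptions, not details given in the talk.

```python
import pandas as pd
from lifelines import CoxPHFitter

def cox_aic(df: pd.DataFrame, covariates: list) -> float:
    """Partial-likelihood AIC of a Cox model on the given (numeric) covariates."""
    cph = CoxPHFitter()
    cph.fit(df[["run_days", "failed"] + covariates],
            duration_col="run_days", event_col="failed")
    return -2.0 * cph.log_likelihood_ + 2.0 * len(cph.params_)

def forward_step(df: pd.DataFrame, selected: list, candidates: list):
    """Try adding each remaining candidate and return the one that lowers AIC
    the most, or None if no addition improves on the current model."""
    best_var = None
    best_aic = cox_aic(df, selected) if selected else float("inf")
    for var in candidates:
        aic = cox_aic(df, selected + [var])
        if aic < best_aic:
            best_var, best_aic = var, aic
    return best_var
```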

16 Factor Collapsing
Once a final model is chosen, a procedure is begun to combine factor levels with similar hazards.

17 Model Validation
A valid model is one that is consistent, reliable, and not sensitive to small changes in the data. Methods to check validity:
- Randomly split the data, fit a new model to each half, and compare.
- Randomly split the data, use the model found for the first half to model the second half, and compare coefficients.
- Use a bootstrapping method to obtain many different sets of data and apply the model-building procedure to each.
- Obtain new data, repeat the model-building procedure, and compare; this could be useful for seeing how the model changes over time.
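A minimal bootstrap sketch of the coefficient-stability check: resample the data, refit the Cox model, and inspect the spread of the coefficients (column names hypothetical).

```python
import pandas as pd
from lifelines import CoxPHFitter

def bootstrap_cox_coefs(df: pd.DataFrame, n_boot: int = 200, seed: int = 0) -> pd.DataFrame:
    """Refit the Cox model on bootstrap resamples and collect the coefficients,
    a stand-in for the fuller validation procedures listed on the slide."""
    coefs = []
    for i in range(n_boot):
        sample = df.sample(frac=1.0, replace=True, random_state=seed + i)
        cph = CoxPHFitter()
        cph.fit(sample, duration_col="run_days", event_col="failed")
        coefs.append(cph.params_)
    return pd.DataFrame(coefs)

# boot = bootstrap_cox_coefs(esp_model_data)   # hypothetical modeling dataframe
# print(boot.describe())                       # mean / std of each coefficient
```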

18 (No transcript text for this slide.)

19 Conclusions
- Pragmatic risk-ranking and valuation tools for ESPs have been created.
- Pragmatic tools were also created for dealing with large, sparse, and inconsistent data, as well as for modeling this data in a consistent fashion.

20 (No transcript text for this slide.)

