1 1 A Case Study of Bayesian Modeling on a Real World Problem RAM Energy Energester/Enziro Bob Mattheys, Malcolm Farrow, Giles Oatley, Garen Arevian, Souvik Banerjee

2 2 ISS – Intelligent Systems Solutions. Group of researchers/academics working with CAS (Centre for Adaptive Systems). Remit: provide technology transfer and expertise to industry; assist NE SMEs and stimulate business growth; obtain funding, e.g. SMART Awards, GONE, etc.

3 3 ISS Projects: RAM Energy – Intelligent Data Analysis; Neptune Engineering – Intelligent Diagnostics; HASS – Back-office system/DBase; Hart Biological – Back-office system/DBase, process manufacturing; etc.

4 4 RAM Energy. Founded 2000. Clients in Oil/Gas, Energy, Process, Manufacturing and Haulage industries. Products: Energester + Enziro. Ester-based synthetic lubricants and greases, enzymatic cleaning solutions, absorbents and blasting media. Better lubrication, heat dissipation and vibration reduction than oil or grease in isolation, or with conventional additives.

5 5 RAM Energy Problem: demonstrate effectiveness and cost efficiency. Data collected by RAM Energy is very large, with major differences across the various sectors. Assist RAM Energy in structuring their data collection and storage in general. Focus: heavy haulage industry.

6 6 RAM Energy Trials. RAM Energy carried out selected trials with clients. These included: monitored consumption prior to Energester use; monitored consumption post Energester use; use of control vehicles (no Energester use); temperature data collected.

7 7 RAM Energy Haulage. Data collected via diesel receipts. Information consisted of: card number (allocated to registration number); vehicle registration; date; fuel; mileage.

8 8
Registration Number  Date      Reg Entered  Fuel Added  Mileage
J577PWL              20020901  DX51MYT      276.19      128504
J577PWL              20020902  DX51MTY      296.51      129130
J577PWL              20020904  DX51MYT      288.88      999
J577PWL              20020905  J577PWL      235.95      666
J577PWL              20020907  J577PWL      346         1
J577PWL              20020907  J577PWL      234.86      1
J577PWL              20020908  DX51NYT      211         99999
J577PWL              20020909  DX51MYT      447.73      11
J577PWL              20020910  51           286.24      4717
J577PWL              20020910  DX51MYT      253.07      135300
J577PWL              20020911  DX51MYT      281         1
J577PWL              20020912  51           220.66      1000
J577PWL              20020912  DX51MYT      260         1
J577PWL              20020913  DU02PBY      325         1
J577PWL              20020914  DU02PBY      255.59      109705
J577PWL              20020915  DU02RBY      267.17      110296
J577PWL              20020915  2            267.62      120889
J577PWL              20020916  DU02PBY      182.16      111563
J577PWL              20020916  DU52PBY      260.02      112043
J577PWL              20020917  2            263.91      2646
J577PWL              20020917  DU02PBY      224.81      113223
J577PWL              20020918  2            251.09      3773
J577PWL              20020918  DU02PBY      224.67      114513

9 9 RAM Energy Analysis. Performed using Excel spreadsheets. Discrete mpg (mileage since last fill/diesel input). Some cumulative mpg (total mileage/total diesel input to date). Attempt to normalise using mean temperature records. Some regression analysis.
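The two mpg measures named on this slide can be sketched as follows. This is a minimal illustration, not RAM Energy's spreadsheet logic: it assumes fuel is recorded in litres, that mpg means miles per imperial gallon, and that every fill tops the tank up to the same level. The function name and the sample figures are hypothetical.

```python
LITRES_PER_IMPERIAL_GALLON = 4.54609  # definition of the imperial gallon


def discrete_and_cumulative_mpg(fills):
    """fills: list of (odometer_miles, litres_added) in date order.

    Assumption: each fill restores the same tank level, so the fuel added
    at one fill equals the fuel burned since the previous fill.
    """
    discrete, cumulative = [], []
    total_miles = 0.0
    total_gallons = 0.0
    for (prev_odo, _), (odo, litres) in zip(fills, fills[1:]):
        miles = odo - prev_odo
        gallons = litres / LITRES_PER_IMPERIAL_GALLON
        total_miles += miles
        total_gallons += gallons
        discrete.append(miles / gallons)                 # since last fill
        cumulative.append(total_miles / total_gallons)   # running total
    return discrete, cumulative


# Hypothetical fills: 227.3045 litres is 50 imperial gallons.
sample = [(1000, 200.0), (1500, 227.3045), (2100, 227.3045)]
discrete, cumulative = discrete_and_cumulative_mpg(sample)
```

The discrete figure reacts to every partial fill or mistyped odometer reading, while the cumulative figure smooths them out, which is one reason the two can disagree on noisy receipts.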

11 11 RAM Energy Results
                   No seasonal adjustment  With seasonal adjustment
After Energester   42.94                   43.46
Before Energester  42.66                   42.64
Percentage gain    0.64%                   1.92%

12 12 RAM Energy Problems. Missing data consisted of: driver information (who?); loading information (full/empty); length of journey; type of journey (long haul vs short haul); urban or motorway conditions; etc.

13 13 RAM Energy Conclusion Results very poor and inconclusive

14 14 Database  Excel sheets were converted to an Access database with deletion of unnecessary rows and columns.  The Access database was then imported into SQL Server for data query and subsequent analysis

15 15 Data Cleansing Brief outline of most obvious problems with the data  1. Card Number  2. Registration Number  3. Date  4. Fuel Added  5. Mileage

16 16 Card Number. There were duplicate Card Numbers for (presumably) the same Card, e.g. 85944 and 0085944. In a few cases, for a given Registration Number, additional Card Numbers appear, e.g. for ‘N151EUB’ there are the Card Numbers 38195, 0038195 and 56408.
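One way to reconcile the duplicate Card Numbers is to strip leading zeros before grouping by registration. The sketch below assumes the only difference between duplicates is zero padding; the function names are hypothetical and this is not necessarily how the project cleaned the field.

```python
def normalise_card_number(raw):
    """Strip whitespace and leading zeros so '0085944' and '85944' match."""
    stripped = raw.strip().lstrip("0")
    return stripped or "0"  # keep a single zero if the field was all zeros


def cards_by_registration(records):
    """records: iterable of (registration, card_number) string pairs."""
    cards = {}
    for registration, card in records:
        cards.setdefault(registration, set()).add(normalise_card_number(card))
    return cards


# The example from the slide: three raw values, two distinct cards.
grouped = cards_by_registration(
    [("N151EUB", "38195"), ("N151EUB", "0038195"), ("N151EUB", "56408")]
)
```

Registrations that still map to more than one card after normalisation are the genuinely ambiguous cases that need a manual decision.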

17 17 Registration Number. Registration numbers seemed to be always entered correctly. However, the field Reg Entered did not always tally with the Registration Number; RAM's recommendation was to ignore it.

18 18 Date. Dates were entered very consistently, preserving: the ordering; the distance between dates; the actual date. An important question was: CAN WE PRESUME THE DATE IS ALWAYS ENTERED CORRECTLY? If so, this provides a convenient check on the Mileage, as Date and Mileage should both increase together.

19 19 Fuel. Outlier identification: very small and very large values are easily detected over a large dataset. Take the mean of the sample and flag as outliers any data more than 3 or 4 SDs away from the mean. Very small values, e.g. 0 or 1, assumed to be bogus values. 9999, 999, etc. taken to be bogus values. Some small and large values were mistyped, with either the decimal place occurring too soon (e.g. 38.6 instead of 386) or extra digits added (e.g. 3860 instead of 386).
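The flagging rule on this slide can be sketched in a few lines. The sentinel set and the default of 3 SDs are taken from the slide; the function name, the return format and the choice to compute the mean and SD after removing sentinels are assumptions made for illustration.

```python
import statistics

SENTINELS = {0, 1, 999, 9999}  # placeholder values the slide treats as bogus


def flag_fuel_values(values, n_sd=3.0):
    """Return {index: reason} for fuel entries that look erroneous."""
    flags = {i: "sentinel" for i, v in enumerate(values) if v in SENTINELS}
    kept = [v for i, v in enumerate(values) if i not in flags]
    mean = statistics.mean(kept)
    sd = statistics.stdev(kept)  # sample SD of the non-sentinel values
    for i, v in enumerate(values):
        if i not in flags and sd > 0 and abs(v - mean) > n_sd * sd:
            flags[i] = "outlier"
    return flags


# Hypothetical data: 18 plausible fills, one extra-digit typo, two sentinels.
fuel = [250.0 + i for i in range(18)] + [3860.0, 1, 9999]
flags = flag_fuel_values(fuel)
```

A single large typo inflates the SD it is measured against, so on short series the rule can miss outliers; it relies on the large dataset the slide mentions.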

20 20 Fuel. Difficult errors: e.g. 693392 could be 69392? What if it is 693399? Such data must be flagged as erroneous.

21 21 Mileage Some values were entered as {0,1,999,9999,2,3,5,10,111,1111,123,789, etc} If we can presume that the Date is a sensible value, then in a dataset where there are only a few missing or obviously incorrect values for the Mileage, these values can be amended as follows

22 22 Mileage
Day  Mileage  Spurious?
11   300
12   400
13   500      ?
14   450      ?
We do not know if the day 13 entry is wrong, or day 14. So we can look ahead:

23 23 Mileage
Day  Mileage  Spurious?
11   300
12   400
13   500
14   450      ?
15   510
Or
Day  Mileage  Spurious?
11   300
12   400
13   500      ?
14   450
15   470
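The look-ahead idea in the two cases above can be written as a small rule: when the odometer appears to go backwards, the next reading decides which of the two entries is spurious. This is a sketch under the assumption that dates are correct and at most one of the two readings is wrong; the function name is hypothetical.

```python
def spurious_by_look_ahead(mileage):
    """Return indices of readings judged spurious; mileage is in date order."""
    spurious = []
    for i in range(len(mileage) - 2):
        current, following, ahead = mileage[i], mileage[i + 1], mileage[i + 2]
        if following >= current:
            continue  # odometer did not go backwards at this step
        if ahead >= current:
            spurious.append(i + 1)  # the low reading was the bad entry
        elif ahead >= following:
            spurious.append(i)      # the high reading before it was bad
        # otherwise the look-ahead does not settle it; leave both unflagged
    return spurious


# Days 11 to 15 from the slide, indexed 0 to 4.
first_case = spurious_by_look_ahead([300, 400, 500, 450, 510])   # day 14
second_case = spurious_by_look_ahead([300, 400, 500, 450, 470])  # day 13
```

A drop at the very end of a series cannot be resolved this way, because there is no later reading to look ahead to.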

24 24 Mileage
Trans Quantity (Fuel Added)  Odometer (Mileage)
182.04                       55525
236                          0
290                          1
268.33                       57589
Collapsed to:
Trans Quantity (Fuel Added)   Odometer (Mileage)
182.04                        55525
236 + 290 + 268.33 = 794.33   57589
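The collapsing step shown above can be sketched as carrying fuel forward until the next usable odometer reading. The validity test is passed in as a function because the slides do not state the exact rule; the threshold used in the example is an assumption.

```python
def collapse_fills(rows, is_valid_odometer):
    """rows: list of (fuel_added, odometer) in date order.

    Fuel from rows whose odometer reading is unusable is carried forward and
    added to the next row with a usable reading. Fuel carried past the last
    usable row is dropped.
    """
    collapsed = []
    carried = 0.0
    for fuel, odometer in rows:
        if is_valid_odometer(odometer):
            collapsed.append((round(carried + fuel, 2), odometer))
            carried = 0.0
        else:
            carried += fuel
    return collapsed


# The rows from the slide; readings of 10 or less are treated as placeholders.
rows = [(182.04, 55525), (236, 0), (290, 1), (268.33, 57589)]
collapsed = collapse_fills(rows, lambda odometer: odometer > 10)
```

The result keeps total fuel consistent with the distance between the two valid odometer readings, at the cost of losing the individual fill dates.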

25 25 Mileage. Small and very large values could be ignored. The problem was determining whether any of the remaining data was valid (data validation). This was done by evaluating the degree of correlation between the increasing Date and the supposedly increasing Mileage. Useful approaches for estimating rank-orderedness and correlation between lists: Spearman’s coefficient of rank correlation; Kendall’s Tau.
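Both coefficients can be computed directly from their textbook definitions. The sketch below is a plain-Python illustration, not the project's validator: the Spearman formula assumes no tied values, and the Kendall version is tau-a, which counts tied pairs as neither concordant nor discordant.

```python
from itertools import combinations


def ranks(values):
    """Rank from 1 to n in ascending order; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rank correlation via 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (len(x) * (len(x) - 1) // 2)


days = [11, 12, 13, 14, 15]
mileage = [300, 400, 500, 450, 510]  # one reading out of order
rho, tau = spearman_rho(days, mileage), kendall_tau(days, mileage)
```

A vehicle whose readings are perfectly ordered scores 1.0 on both measures, so values well below 1 point to series that need closer inspection.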

26 26 Data Cleansing

27 27 Ram Energy Data Validator

30 30 Bayesian - Approach. In the Bayesian approach to statistical inference, we express uncertain beliefs about things in terms of probability, e.g. that there is a 50% chance that the average fuel consumption of a vehicle will be less than 30 mpg. We can use probabilities in this way to describe uncertainty about things we do not know, e.g. the amount of fuel in a vehicle’s tank at 10.00am yesterday.

31 31 Bayesian - Approach. Once we accept this view of probability, the principle for learning from data is simple. Before we see the data, we have a probability distribution based on our knowledge up to that point: the prior distribution. When we see the data, our probability distribution changes in the light of the new information in the data: the posterior distribution.

32 32 Bayesian - Approach. The calculation used to get from the prior distribution to the posterior distribution uses Bayes’ theorem, hence Bayesian statistics. The results have a very straightforward interpretation when using this method: the posterior distribution tells us how likely it is that various things are true, after we have used the evidence in the data.
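For reference, the prior-to-posterior step can be written out as Bayes’ theorem for a parameter and data. The symbols are notation chosen here, not taken from the slides: \theta stands for the unknown quantities and y for the observed data.

```latex
p(\theta \mid y)
  = \frac{p(y \mid \theta)\, p(\theta)}{\int p(y \mid \theta')\, p(\theta')\, d\theta'}
  \;\propto\; p(y \mid \theta)\, p(\theta)
```

Here p(\theta) is the prior distribution, p(y \mid \theta) is the likelihood of the data, and p(\theta \mid y) is the posterior distribution.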

33 33 Bayesian - Approach. Different observers can have different prior beliefs, and this means that their posterior distributions will also be different. A common response is to make the prior distribution represent very little information; in practice, the prior then tends to have little effect on the posterior. One advantage of this approach is that it is straightforward to calculate what we expect various things to be after seeing the data. For example, we can calculate a posterior probability distribution for the cost savings of applying the fuel additive to a whole vehicle fleet.
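As an illustration of the fleet example, a posterior for the additive effect can be pushed through to a distribution of cost savings by simulation. Everything numeric below is hypothetical, and the normal shape of the posterior is an assumption; the slides do not give the actual posterior or any fleet figures.

```python
import random
import statistics


def fleet_savings_samples(effect_mean, effect_sd, fleet_miles,
                          price_per_litre, n_samples=10000, seed=0):
    """Draw cost savings implied by a normal posterior on the additive effect.

    effect_* are in litres per mile; a negative effect means less fuel used.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        effect = rng.gauss(effect_mean, effect_sd)
        samples.append(-effect * fleet_miles * price_per_litre)
    return samples


# Hypothetical inputs: effect -0.002 +/- 0.001 litres per mile,
# one million fleet miles, fuel at 1.0 currency unit per litre.
savings = fleet_savings_samples(-0.002, 0.001, 1_000_000, 1.0)
expected_saving = statistics.mean(savings)
prob_any_saving = sum(s > 0 for s in savings) / len(savings)
```

The same samples answer several questions at once: the expected saving, the chance of any saving at all, or the chance of covering the cost of the additive.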

34 34 Bayesian - Model. The basic model used is a regression, with fuel used as the dependent variable and distance travelled as one of the explanatory variables. Each observation corresponds to the time between two successive additions of fuel to the fuel tank. We expect zero fuel to be used if zero distance is travelled, but the amount of fuel used is not necessarily proportional to the distance travelled. For example, fuel efficiency may be greater on longer journeys.

35 35 Bayesian - Model. In the simplest form of the model, we assume that fuel used is proportional to distance travelled. The constant of proportionality is the slope of the line on a graph of fuel used against distance. Various other forms of relationship were also investigated. While distance travelled is the most obvious explanatory variable, there are several other variables and factors which must be taken into account.
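The simplest proportional model has a closed-form Bayesian update if the noise SD is treated as known and the slope is given a normal prior. The sketch below uses that textbook conjugate result; the known-noise assumption, the function name and the numbers are illustrative and are not taken from the authors' models.

```python
import math


def posterior_slope(distances, fuel, prior_mean, prior_sd, noise_sd):
    """Posterior mean and SD of the slope in fuel = slope * distance + noise.

    Assumes noise ~ Normal(0, noise_sd^2) with noise_sd known, and a
    Normal(prior_mean, prior_sd^2) prior on the slope.
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = sum(d * d for d in distances) / noise_sd ** 2
    weighted_sum = sum(d * f for d, f in zip(distances, fuel)) / noise_sd ** 2
    post_precision = prior_precision + data_precision
    post_mean = (prior_mean * prior_precision + weighted_sum) / post_precision
    return post_mean, math.sqrt(1.0 / post_precision)


# Hypothetical intervals: miles travelled and litres used between fills.
mean, sd = posterior_slope([100, 200], [50, 100],
                           prior_mean=0.5, prior_sd=1.0, noise_sd=10.0)
```

With a very wide prior the posterior mean approaches the least-squares slope through the origin, which matches the point on the earlier slide that a vague prior has little effect on the posterior.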

36 36 Bayesian - Factors. Vehicle types: the type of vehicle has an effect. Individual vehicles of the same type may also have different characteristics. The effect of individual vehicles (within a type) was regarded as a random effect, with vehicles seen as a sample from all vehicles of that type.

37 37 Bayesian - Factors Drivers  Driver identified by card number  Drivers closely associated with vehicles  In this case, difficult to separate effects of vehicles from the effects of drivers  However, if this were not the case, then it would be possible to make inferences about individual drivers as well as individual vehicles

38 38 Bayesian - Factors. Time of year: fuel efficiency may be affected by ambient temperature and other meteorological variables. Ideally, meteorological data would be used, and data were obtained for this purpose. But, as a first step, a simple substitute is to use the time of year, e.g. month.

39 39 Bayesian - Factors. Presence of fuel additive: the main question of interest is, “How does the use of the fuel additive affect fuel consumption?”
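One way the factors on the preceding slides could enter the proportional model is as additive contributions to the slope. The equation below is an illustrative form only; the slides do not give the authors' actual specification, and all symbols are notation chosen here.

```latex
f_i = \bigl(\mu_{t(i)} + u_{v(i)} + \gamma_{m(i)} + \delta A_i\bigr)\, d_i + \varepsilon_i,
\qquad u_v \sim N(0, \tau^2), \quad \varepsilon_i \sim N(0, \sigma^2)
```

Here f_i is the fuel used and d_i the distance travelled in interval i, \mu_{t(i)} is the slope for the vehicle type, u_{v(i)} is the random effect of the individual vehicle, \gamma_{m(i)} is the effect of the month, A_i is 1 when the additive is in use and 0 otherwise, and \delta is the additive effect in litres per mile.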

40 40 Bayesian - Complications. Fuel: we do not know how full the fuel tank was before or after fuel was added, nor precisely how much fuel was used between fills. The true tank content is regarded as a latent or “hidden” variable. Such variables can be built into a Bayesian analysis.
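The role of the latent tank content can be made explicit with a simple fuel balance. The symbols are notation chosen here: T_i is the unobserved tank content just after fill i, a_i is the recorded fuel added at fill i, and u_i is the fuel used between fills i-1 and i.

```latex
u_i = T_{i-1} - (T_i - a_i) = a_i + T_{i-1} - T_i
```

Only a_i is observed. If every fill restores the same level, T_i = T_{i-1} and the fuel used equals the fuel added; otherwise the difference T_{i-1} - T_i has to be inferred along with the other unknowns.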

41 41 Bayesian - Complications. Data entry errors: a graph of odometer readings against date for a single vehicle shows the general pattern, together with spurious values. This is built into the model by allowing certain prior probabilities for errors of different types. The analysis can thus “recognise” errors by calculating posterior probabilities that a reading is an error of the various types. Those values which have large posterior probabilities of being erroneous are, in effect, ignored by the rest of the analysis.
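The idea of a posterior error probability can be shown with a two-component mixture for a single reading. This is a deliberately reduced sketch: it has one error type instead of several, assumes a valid reading is normally distributed around an expected value, and assumes an erroneous one is uniform over the possible range. All names and numbers are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_error_probability(reading, expected, sd, prior_error, max_reading):
    """P(reading is an error | reading) by Bayes' theorem.

    Valid readings ~ Normal(expected, sd^2); erroneous readings are assumed
    uniform on [0, max_reading]. The reading must lie within that range.
    """
    like_error = 1.0 / max_reading if 0 <= reading <= max_reading else 0.0
    like_valid = normal_pdf(reading, expected, sd)
    numerator = prior_error * like_error
    denominator = numerator + (1.0 - prior_error) * like_valid
    return numerator / denominator


# Hypothetical odometer check: about 129500 miles expected, 5% prior error rate.
p_placeholder = posterior_error_probability(999, 129500, 300, 0.05, 1_000_000)
p_plausible = posterior_error_probability(129600, 129500, 300, 0.05, 1_000_000)
```

A reading with posterior error probability near 1 contributes almost nothing to the estimate of the slope, which is the sense in which such values are ignored by the rest of the analysis.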

42 42 Bayesian - Conclusions. Prototype Bayesian models were successfully run, demonstrating the feasibility of the approach for this problem. However: need to overcome problems of missing data; uncertainty over when the additive would be expected to have an effect, and over the pattern of this effect; confounding of the additive effect with the effects of other factors, such as the changing seasons.

43 43 Bayesian Results Posterior probability density for the effect of the additive, in litres per mile

44 44 Conclusions. Recommendations: design of better trials and data acquisition; collection of ambient temperatures, etc. Future directions: fraud detection; efficiency of individual drivers/vehicles; patterns of work, optimisation.

