Lufthansa Outlier Detection Methods on Booking Data AGIFORS Reservation and Yield Management Study Group Bangkok May 2001 Ulrich Oppitz.

Presentation transcript:

Lufthansa Outlier Detection Methods on Booking Data AGIFORS Reservation and Yield Management Study Group Bangkok May 2001 Ulrich Oppitz

Outlier Detection Methods on Booking Data, Ulrich Oppitz, May 2001, Lufthansa

Agenda
- Definitions and Theory
- Outlier Detection Methods
- Analysis Method
- Some Words on Quality Measurement
- Results
- Summary
- Literature

Booking data in RM systems can be influenced by many disturbances

Definition: Outliers are data points which differ in their appearance from the majority of the data. (Rousseeuw, 1990)

Caused by:
- system errors
- schedule changes
- special events

Two approaches to cope with outliers:
- robust approach:
  – use robust methods/predictors
- diagnostic approach:
  – identify outliers
  – trim or ignore them
  – apply classical methods/predictors

Best practice for chain processes

If ignored, outliers can affect the quality of the forecasting process significantly

To measure the robustness of a forecast method, Hodges introduced the term breakdown point. (Hodges 1967)

The breakdown point can be loosely defined as the smallest fraction of outliers that seriously offsets the estimate from the true value. (Rousseeuw 1991)

The breakdown point of any regression method based on the least squares technique is 1/n, which means that a single outlier in a set of n data points can ruin the LS estimate.
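The 1/n breakdown point can be illustrated with a small sketch. It assumes an ordinary least-squares line fit written in plain Python; the helper `ls_fit` and the sample numbers are illustrative and do not come from the slides. Corrupting one of ten points is enough to pull the fitted slope far away from the true value of 2.

```python
def ls_fit(xs, ys):
    """Ordinary least-squares line fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Ten points that lie exactly on y = 2x + 1 (made-up data).
xs = list(range(10))
clean = [2.0 * x + 1.0 for x in xs]
slope_clean, _ = ls_fit(xs, clean)

# Replace a single observation by a gross outlier.
corrupted = list(clean)
corrupted[9] = 1000.0
slope_bad, _ = ls_fit(xs, corrupted)

print(slope_clean, slope_bad)
```

Making the single corrupted value larger moves the slope further without bound, which is what a breakdown point of 1/n expresses.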


Z-Score Testing
- calculate the empirical average μ and standard deviation σ from the historical bookings for each DCP
- check whether the number of historical bookings > minimum observations
- tag a value as outlying if it lies outside the following interval:
  – upper threshold: μ + maxSigmaPos * σ
  – lower threshold: μ - maxSigmaNeg * σ
- trim outlying data to the threshold value before updating μ and σ
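The z-score test for a single DCP could be sketched as follows. This is a minimal illustration, not the production logic: the function name, the default parameter values and the use of the population standard deviation (`statistics.pstdev`) are assumptions; only the names maxSigmaPos, maxSigmaNeg and the minimum-observation check come from the slide.

```python
from statistics import mean, pstdev


def z_score_trim(history, new_value, max_sigma_pos=2.0, max_sigma_neg=2.0, min_obs=5):
    """Return (value_to_use, is_outlier) for one DCP.

    history: past bookings at this DCP; new_value: the booking to test.
    Default thresholds are hypothetical, not calibrated values.
    """
    if len(history) <= min_obs:
        # Too few historical observations: do not test at all.
        return new_value, False
    mu = mean(history)
    sigma = pstdev(history)  # assumption: population standard deviation
    upper = mu + max_sigma_pos * sigma
    lower = mu - max_sigma_neg * sigma
    if new_value > upper:
        return upper, True   # trim to the upper threshold
    if new_value < lower:
        return lower, True   # trim to the lower threshold
    return new_value, False


history = [10, 12, 11, 13, 9, 11, 12, 10]
print(z_score_trim(history, 40.0))  # flagged and trimmed
print(z_score_trim(history, 12.0))  # accepted unchanged
```

Keeping separate multipliers for the upper and lower threshold allows asymmetric acceptance ranges. The caller would append the returned (possibly trimmed) value to the history before μ and σ are next updated.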

[Figure: Z-Score Testing. Density function of the normal distribution over bkgs, with the lower bound and upper bound marked.]

Determination Coefficient Testing on Residual Regression
- update exponentially smoothed bookings for each DCP -> reference curve
- check whether the number of historical bookings > minimum observations
- calculate residuals Δbkd(dcp) from the actual bookings and the reference curve
- calculate a linear regression curve reg(dcp) on the residuals Δbkd(dcp)

[Figure: Determination Coefficient Testing on Residual Regression. Residuals Δbkd(dcp) and regression line reg(dcp) plotted against dcp.]

Determination Coefficient Testing on Residual Regression
- calculate the determination coefficient

  $$R^2 = \frac{\sum_{dcp} \left( reg(dcp) - \overline{reg} \right)^2}{\sum_{dcp} \left( \Delta bkd(dcp) - \overline{\Delta bkd} \right)^2}$$

- if $R^2 < minR^2$, tag the DCP with the largest vertical distance to the regression curve as outlying and take it out of the set
- iterate with the cleaned data set
- stop if $R^2 > minR^2$ or the number of outliers > maxOutlier
- reset the outlier taggings if there are more than maxOutlier
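The iteration can be sketched in plain Python as below. This is one possible reading of the slides, with several assumptions: the reference curve is passed in ready-made (the exponential smoothing and the minimum-observation check are omitted), the determination coefficient is taken as explained variance over total variance of the residuals, and the function names and default values for `min_r2` and `max_outlier` are hypothetical.

```python
def r_squared(xs, ys):
    """Fit y = a + b*x by least squares; return (R^2, fitted values)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    fitted = [my + b * (x - mx) for x in xs]
    total = sum((y - my) ** 2 for y in ys)
    explained = sum((f - my) ** 2 for f in fitted)  # mean of fitted equals my
    # Constant residuals are treated as a perfect fit (assumption).
    return (explained / total if total else 1.0), fitted


def dct_outliers(dcps, actual, reference, min_r2=0.5, max_outlier=3):
    """Return the DCPs tagged as outlying, or [] if the tagging is reset."""
    residuals = [a - r for a, r in zip(actual, reference)]
    keep = list(range(len(dcps)))   # indices still in the data set
    tagged = []
    while len(keep) > 2:
        xs = [dcps[i] for i in keep]
        ys = [residuals[i] for i in keep]
        r2, fitted = r_squared(xs, ys)
        if r2 >= min_r2:
            break                   # regression explains the residuals well enough
        # Remove the point with the largest vertical distance to the line.
        worst = max(range(len(keep)), key=lambda k: abs(ys[k] - fitted[k]))
        tagged.append(dcps[keep.pop(worst)])
        if len(tagged) > max_outlier:
            return []               # too many outliers: reset all taggings
    return tagged


dcps = list(range(1, 9))
reference = [0.0] * 8
actual = [float(d) for d in dcps]
actual[4] = 45.0                    # made-up spike at DCP 5
print(dct_outliers(dcps, actual, reference))
```

In this made-up example the residuals are linear in the DCP apart from one spike, so a single removal restores a high determination coefficient and the loop stops.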


The simulation is performed on real booking data
- 42 flight numbers (2 multi-leg flights)
- data type: actual bookings
- data source: PROS IV database
- departure time range: 01Jun May97
- booking classes: FA CDZ HBLGYKTWE
- evaluated DCPs: 1-15
- total flight departures:
- total DCPs:

Analysis method: artificial outlier implantation
1) Preprocessing: outlier cleaning with very conservative parameters (high outlier tagging rates)
2) Different manipulations are performed with predefined probabilities

   Code  Manipulation                    Factor  Probability
   XLA   enlarge all DCPs                x 3.00  P_XLA = 0.01
   XSA   shrink all DCPs                 x 0.33  P_XSA = 0.01
   XL1   enlarge single DCP              x 3.00  P_XL1 = 0.01
   XS1   shrink single DCP               x 0.33  P_XS1 = 0.01
   X-Y   swap booking classes X and Y            P_X-Y =

3) Artificially created outliers are tagged.
4) Apply outlier detection method
5) Evaluation: count number of recognized outliers and non-outliers
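Steps 2 and 3 might look like the sketch below. It assumes a booking curve is a plain list with one value per DCP and that each manipulation is drawn independently; the function name and data layout are illustrative. The class swap X-Y is left out because its probability is not legible in the transcript.

```python
import random

# Manipulation code -> (factor, scope), as listed on the slide.
MANIPULATIONS = {
    "XLA": (3.00, "all"),
    "XSA": (0.33, "all"),
    "XL1": (3.00, "single"),
    "XS1": (0.33, "single"),
}


def implant_outliers(curve, rng, p=0.01):
    """Return (manipulated curve, set of tagged DCP indices).

    curve: bookings per DCP for one flight and booking class.
    rng:   a random.Random instance, passed in for reproducibility.
    p:     probability of applying each manipulation.
    """
    out = list(curve)
    tagged = set()
    for factor, scope in MANIPULATIONS.values():
        if rng.random() >= p:
            continue                      # this manipulation is not applied
        if scope == "all":
            targets = range(len(out))
        else:
            targets = [rng.randrange(len(out))]
        for i in targets:
            out[i] *= factor
            tagged.add(i)                 # remember the artificial outlier
    return out, tagged


rng = random.Random(2001)
curve = [10.0] * 15                       # made-up curve over DCPs 1-15
manipulated, tagged = implant_outliers(curve, rng)
print(len(tagged))
```

The tagged indices serve as ground truth in step 5: every DCP the detection method flags is compared against this set to count recognized outliers and non-outliers.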


The quality measures known in the literature are not sufficient in the RM environment.

Observables:
- True Positives: TP
- True Negatives: TN
- False Positives: FP
- False Negatives: FN

Quality measures:
- sensitivity¹ (masking): $sens = \frac{TP}{TP + FN}$
- specificity¹ (swamping): $spec = \frac{TN}{TN + FP}$
- efficiency¹: $eff = \frac{TP + TN}{TN + FN + TP + FP}$
- temperament: $temp = \frac{TP + FP}{TN + FN + TP + FP}$

¹ (Walczak, 1998)
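As a quick illustration, the four measures can be computed from the confusion counts as follows. The function name and the example counts are made up; the sketch also assumes that the data contain at least one outlier and one non-outlier, otherwise the ratios are undefined.

```python
def quality_measures(tp, tn, fp, fn):
    """Classical quality measures from the four confusion counts."""
    total = tp + tn + fp + fn
    return {
        "sens": tp / (tp + fn),    # share of outliers that were detected
        "spec": tn / (tn + fp),    # share of valid points that were kept
        "eff": (tp + tn) / total,  # share of correct classifications
        "temp": (tp + fp) / total, # share of points tagged as outlying
    }


# Hypothetical counts: 10 true outliers among 200 data points.
m = quality_measures(tp=8, tn=180, fp=10, fn=2)
print(m)
```

With only 5% contamination, efficiency is dominated by the true negatives: missing two of the ten outliers barely lowers it, although sensitivity drops to 0.8.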

Quality Measures for Outlier Detection Methods

For an outlier detection method on booking data it is most important to detect almost all outliers. A few data points that are erroneously taken out of the valid set have less impact.
- weighting of the classification types TP and TN
- dynamic adaptation of the weights to the degree of contamination

Axioms for a quality measure $q$: let $\hat{A}$ denote the complete set of correct classifications and let $A, B \subseteq \hat{A}$. Then
- $0 \le q(A) \le 1$
- $q(A) = 0 \Leftrightarrow A = \emptyset$
- $q(A) = 1 \Leftrightarrow A = \hat{A}$
- $A \subset B \Rightarrow q(A) < q(B)$
- $q(A \cup B) = q(A) + q(B) - q(A \cap B)$

Contamination and Temperament Weighted Efficiency meet the conditions

$$CWE = \frac{\alpha\,TN + (1-\alpha)\,TP}{\alpha\,(TN+FP) + (1-\alpha)\,(TP+FN)} \quad \text{with} \quad \alpha = \frac{TP + FN}{TN + FN + TP + FP} \quad \text{(outlier rate)}$$

$$TWE = \frac{\alpha\,TN + (1-\alpha)\,TP}{\alpha\,(TN+FP) + (1-\alpha)\,(TP+FN)} \quad \text{with} \quad \alpha = \frac{TP + FP}{TN + FN + TP + FP} \quad \text{(temperament)}$$
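Both weighted efficiencies share one formula and differ only in the weight. The sketch below assumes the reading in which the weight (called `alpha` here, since the symbol is not legible in the transcript) multiplies the true negatives and its complement multiplies the true positives; function names and example counts are hypothetical.

```python
def weighted_efficiency(tp, tn, fp, fn, alpha):
    """Efficiency with weight alpha on non-outliers and 1 - alpha on outliers."""
    num = alpha * tn + (1 - alpha) * tp
    den = alpha * (tn + fp) + (1 - alpha) * (tp + fn)
    return num / den


def cwe(tp, tn, fp, fn):
    """Contamination weighted efficiency: alpha is the outlier rate."""
    total = tp + tn + fp + fn
    return weighted_efficiency(tp, tn, fp, fn, (tp + fn) / total)


def twe(tp, tn, fp, fn):
    """Temperament weighted efficiency: alpha is the temperament."""
    total = tp + tn + fp + fn
    return weighted_efficiency(tp, tn, fp, fn, (tp + fp) / total)


# Hypothetical counts: 10 true outliers among 200 data points.
print(cwe(tp=8, tn=180, fp=10, fn=2))
print(twe(tp=8, tn=180, fp=10, fn=2))
```

At low contamination the weight on the true negatives is small, so missed outliers cost far more than a few valid points taken out of the set, which matches the stated priority of detecting almost all outliers.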


[Figure: Sensitivity Analysis on Cleaned Booking Data: temperament for z-score testing]

[Figure: Sensitivity Analysis on Cleaned Booking Data: sensitivity for z-score testing]

[Figure: Sensitivity Analysis on Cleaned Booking Data: specificity for z-score testing]

[Figure: Sensitivity Analysis on Cleaned Booking Data: efficiency for z-score testing]

[Figure: Sensitivity Analysis on Cleaned Booking Data: contamination weighted efficiency (CWE) for z-score testing. Max: (0.9, 0.6, )]

[Figure: Sensitivity Analysis on Cleaned Booking Data: temperament weighted efficiency (TWE) for z-score testing. Max: (0.9, 0.6, )]

[Figure: Sensitivity Analysis on Cleaned Booking Data: temperament for DCT. Axes: min R^2, max outlier]

[Figure: Sensitivity Analysis on Cleaned Booking Data: sensitivity for DCT. Axes: min R^2, max outlier]

[Figure: Sensitivity Analysis on Cleaned Booking Data: specificity for DCT. Axes: min R^2, max outlier]

[Figure: Sensitivity Analysis on Cleaned Booking Data: efficiency for DCT. Axes: min R^2, max outlier]

[Figure: Sensitivity Analysis on Cleaned Booking Data: contamination weighted efficiency (CWE) for DCT. Axes: min R^2, max outlier. Max: (0.45, 14, )]

[Figure: Sensitivity Analysis on Cleaned Booking Data: temperament weighted efficiency (TWE) for DCT. Axes: min R^2, max outlier. Max: (0.5, 14, )]

Raw data analysis delivers more realistic results

Optimal Parameters on Cleaned and Raw Booking Data

z-score testing (ZST)
        cleaned data      raw data
CWE     0.9 / 0.6 ->      / 0.8 ->
TWE     0.9 / 0.6 ->      / 0.9 ->

determination coefficient testing (DCT)
        cleaned data      raw data
CWE     0.45 / 14 ->      / 14 ->
TWE     0.50 / 14 ->      / 13 -> 0.747
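Optimal parameters of this kind can be found with a grid search that maximizes a quality measure over the tagged test data. The sketch below is a strong simplification and rests on several assumptions: it scores z-score testing on a single sample (mean and standard deviation taken from the same values rather than from a per-DCP history), uses CWE as the objective with the weight on the true negatives equal to the outlier rate, and works on made-up numbers. All function names are hypothetical.

```python
from itertools import product
from statistics import mean, pstdev


def cwe(tp, tn, fp, fn):
    """Contamination weighted efficiency; weight alpha = outlier rate."""
    alpha = (tp + fn) / (tp + tn + fp + fn)
    num = alpha * tn + (1 - alpha) * tp
    den = alpha * (tn + fp) + (1 - alpha) * (tp + fn)
    return num / den


def score(values, truth, max_sigma_pos, max_sigma_neg):
    """CWE of z-score testing for one parameter pair."""
    mu, sigma = mean(values), pstdev(values)
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for v, is_outlier in zip(values, truth):
        flagged = v > mu + max_sigma_pos * sigma or v < mu - max_sigma_neg * sigma
        if flagged:
            counts["tp" if is_outlier else "fp"] += 1
        else:
            counts["fn" if is_outlier else "tn"] += 1
    return cwe(**counts)


def best_parameters(values, truth, grid):
    """Parameter pair (maxSigmaPos, maxSigmaNeg) with the highest CWE."""
    return max(product(grid, grid), key=lambda p: score(values, truth, *p))


# Made-up data: nine regular bookings and one implanted outlier.
values = [10, 11, 9, 10, 12, 10, 11, 9, 10, 30]
truth = [False] * 9 + [True]
grid = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
print(best_parameters(values, truth, grid))
```

Searching the upper and lower multiplier independently is what allows asymmetric acceptance ranges to emerge from the calibration.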

Proper parameter calibration is more important than method choice.

[Figure: Comparison on raw data]

Z-score testing on booking changes is more efficient than on booking values.

Optimal Parameters on Raw Booking Data

z-score testing (ZST)
        on bookings       on booking changes
CTW     1.5 / 0.8 ->      / 1.1 ->
DTW     2.2 / 0.9 ->      / 1.5 -> 0.820


Summary
- We defined new quality measures for outlier detection models which enable parameter optimization and the comparison of different methods.
- Symmetric acceptance ranges for z-score testing are a disadvantage
  – potential for improvement by only adjusting parameters
  – revenue impact unknown, but positive
  – low risk
- Clear superiority of z-score testing on cleaned booking data
- Slight superiority of z-score testing on raw booking data
- Parameter optimization offers higher potential for improvement than the choice of method.
- Z-score testing can be improved if applied to booking changes


Literature
- Hodges 1967: J.L. Hodges, Proc. Fifth Berkeley Symp. Math. Stat. Probab., 1967, 1,
- Rousseeuw 1987: P.J. Rousseeuw, A.M. Leroy, Robust Regression and Outlier Detection, Wiley, New York, 1987
- Rousseeuw 1990: P.J. Rousseeuw, Unmasking Multivariate Outliers and Leverage Points (with discussion), Journal of the American Statistical Association, 1990, 85,
- Rousseeuw 1991: P.J. Rousseeuw, Journal of Chemometrics, 1991, 5, 1-20
- Walczak 1998: B. Walczak, D.L. Massart, Multiple Outlier Detection Revisited, Chemometrics and Intelligent Laboratory Systems, 1998, 41, 1-15
