Artificial Intelligence for Data Mining in the Context of Enterprise Systems Thesis Presentation by Real Carbonneau
Overview
- Background
- Research Question
- Data Sources
- Methodology
- Implementation
- Results
- Conclusion
Background
- Information distortion in the supply chain
- Difficult for manufacturers to forecast
Current Solutions
- Exponential Smoothing
- Moving Average
- Trend
- Etc.
- Wide range of software forecasting solutions
- M3 Competition research tests most forecasting solutions and finds the simplest work best
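Two of the traditional methods above can be sketched in a few lines. This is a minimal illustration, not the exact variants benchmarked in the experiment; the demand series and parameter values are invented for the example.

```python
def moving_average_forecast(history, window=6):
    """Forecast the next period as the mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def exponential_smoothing_forecast(history, alpha=0.2):
    """Single exponential smoothing: the level blends each new observation
    with the previous level; the forecast is the final level."""
    level = history[0]
    for x in history[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

demand = [100.0, 120.0, 90.0, 110.0, 105.0, 95.0]  # illustrative monthly demand
print(moving_average_forecast(demand, window=3))
print(exponential_smoothing_forecast(demand, alpha=0.2))
```

The smoothing constant alpha and the window width are the tuning knobs; the slide's "ES20" and "MA6" labels suggest specific settings of this kind.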
Artificial Intelligence
- Universal approximators:
  - Artificial Neural Networks (ANN)
  - Recurrent Neural Networks (RNN)
  - Support Vector Machines (SVM)
- Theoretically, these should be able to match or outperform any traditional forecasting approach.
Neural Networks
- Learn by adjusting the weights of connections
- Based on empirical risk minimization
- Generalization can be improved by:
  - Cross-validation-based early stopping
  - Levenberg-Marquardt with Bayesian regularization
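Validation-based early stopping, one of the generalization aids above, can be sketched with a deliberately tiny model: a single-weight linear fit (y ≈ w·x) trained by gradient descent, with a held-out split deciding when to stop. The data, learning rate, and patience value are all illustrative, not taken from the experiment.

```python
# Toy data: roughly y = 2x plus noise, split into train and validation.
train = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]
valid = [(4.0, 8.1), (5.0, 9.8)]

def mse(w, data):
    return sum((y - w * x) ** 2 for x, y in data) / len(data)

w, lr = 0.0, 0.01
best_w, best_val = w, mse(w, valid)
patience, bad_epochs = 3, 0
for epoch in range(200):
    # Gradient of the training MSE with respect to the single weight w.
    grad = sum(-2 * x * (y - w * x) for x, y in train) / len(train)
    w -= lr * grad
    val = mse(w, valid)
    if val < best_val:      # validation improved: remember this weight
        best_w, best_val, bad_epochs = w, val, 0
    else:                   # validation worsened: count toward stopping
        bad_epochs += 1
        if bad_epochs >= patience:
            break           # stop before the fit overtracks the training data
print(best_w)
```

The same keep-the-best-validation-weights logic applies unchanged when w is a full network weight vector.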
Support Vector Machine
- Learns by separating data in a different feature space with support vectors
- The feature space can often have a higher or lower dimensionality than the input space
- Based on structural risk minimization
- Optimality guaranteed
- A complexity constant controls the power of the machine
Support Vector Machine CV
- 10-fold cross-validation-based optimization of the complexity constant
- More effective than NN because of guaranteed optimality
SVM Complexity Example
- SVM complexity constant optimization based on 10-fold cross validation
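The selection loop behind the example above can be sketched as follows. mySVM itself is not reproduced here; a one-dimensional regularized least-squares fit stands in for the learner (its weight shrinks as C shrinks, so larger C means a more powerful machine), which keeps the 10-fold cross-validation grid search self-contained. The candidate C grid and the data are invented for the illustration.

```python
def fit(train, C):
    """Closed-form regularized fit of y ~ w*x; 1/C acts as the penalty."""
    sxx = sum(x * x for x, y in train)
    sxy = sum(x * y for x, y in train)
    return sxy / (sxx + 1.0 / C)

def mae(w, data):
    return sum(abs(y - w * x) for x, y in data) / len(data)

# Noisy points around y = 2x, purely illustrative.
data = [(i * 0.5, i * 1.0 + (-1) ** i * 0.3) for i in range(1, 31)]

k = 10
folds = [data[i::k] for i in range(k)]  # 10 interleaved folds

best_C, best_err = None, float("inf")
for C in [0.01, 0.1, 1.0, 10.0, 100.0]:
    err = 0.0
    for i in range(k):
        held_out = folds[i]
        rest = [p for j, f in enumerate(folds) if j != i for p in f]
        err += mae(fit(rest, C), held_out)   # score on the unseen fold
    err /= k
    if err < best_err:
        best_C, best_err = C, err
print(best_C, best_err)
```

The winning C is then used to train on the full history before forecasting; the same outer loop applies verbatim when `fit` is a real SVM call.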
Research Question
For a manufacturer at the end of the supply chain who is subject to demand distortion:
- H1: Are AI approaches better on average than traditional approaches (error)?
- H2: Are AI approaches better than traditional approaches (rank)?
- H3: Is the best AI approach better than the best traditional approach?
Data Sources
1. Chocolate Manufacturer (ERP)
2. Toner Cartridge Manufacturer (ERP)
3. Statistics Canada Manufacturing Survey
Methodology
- Experiment using the top 100 series from the 2 manufacturers and a random 100 from StatsCan
- Comparison based on an out-of-sample testing set
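The out-of-sample comparison can be made concrete with a short sketch: each method sees only the history up to time t, forecasts one step ahead, and is scored by mean absolute error (MAE) on the held-out tail. The series, split point, and naive stand-in method are invented for the illustration.

```python
def mae(forecasts, actuals):
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(actuals)

series = [100, 104, 98, 107, 101, 110, 103, 111, 106, 115]  # illustrative
split = 7                       # first 7 points for fitting, last 3 held out

def naive_forecast(history):    # stand-in "method": repeat the last value
    return history[-1]

forecasts, actuals = [], []
for t in range(split, len(series)):
    forecasts.append(naive_forecast(series[:t]))  # only past data is visible
    actuals.append(series[t])
print(mae(forecasts, actuals))
```

Running every competing method through this same loop on the same held-out window is what makes the rank and average-error comparisons in the results tables fair.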
Implementation
- Experiment programmed in MATLAB
- Using existing toolboxes where possible (e.g., NN, ARMA)
- Programming the missing ones
- SVM implemented using mySVM called from MATLAB
Experimental Groups

CONTROL GROUP: Traditional Techniques
- Moving Average
- Trend
- Exponential Smoothing
- Theta Model (Assimakopoulos & Nikolopoulos 1999)
- Auto-Regressive and Moving Average (ARMA) (Box et al. 1994)
- Multiple Linear Regression (Auto-Regressive)

TREATMENT GROUP: Artificial Intelligence Techniques
- Neural Networks
- Recurrent Neural Networks
- Support Vector Machines
Super Wide Model
- Time series are short
- Very noisy because of supply chain distortion
- The Super Wide model combines data from many products
- Much larger amount of data to learn from
- Assumes similar patterns occur in the group of products
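The pooling step above can be sketched as follows: sliding lag windows from many products feed one shared training set, multiplying the observations a single learner sees. The product names, series values, and window width are invented for the illustration.

```python
def windows(series, width):
    """Yield (lagged inputs, next value) pairs from one product's history."""
    for t in range(width, len(series)):
        yield series[t - width:t], series[t]

# Illustrative short, noisy demand histories for several products.
products = {
    "choc_bar": [10, 12, 11, 14, 13, 15, 14],
    "toner_a":  [30, 28, 31, 29, 33, 32, 34],
    "toner_b":  [5, 6, 5, 7, 6, 8, 7],
}

width = 3
X, y = [], []
for name, series in products.items():
    for lags, target in windows(series, width):
        X.append(lags)      # one pooled design matrix across all products
        y.append(target)

print(len(X))  # 3 products x (7 - 3) windows each = 12 pooled examples
```

One model is then fit on the pooled (X, y) instead of one fragile model per 7-point series; this is the extra data that the "assumes similar patterns" bullet pays for.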
Results Table (Chocolate): methods ranked by out-of-sample MAE
1. Treatment: SVM CV_Window (Super Wide)
2. Treatment: SVM CV (Super Wide)
3. Control: MLR (Super Wide)
4. Treatment: ANNBPCV (Super Wide)
5. Control: ES Init
6. Control: ES20
7. Control: Theta ES Init
8. Control: MA6
9. Control: MA
10. Control: ES Avg
11. Control: Theta ES Average
12. Control: MLR
13. Treatment: ANNLMBR (Super Wide)
14. Treatment: RNNLMBR
15. Treatment: ANNLMBR
16. Treatment: SVM CV
17. Treatment: SVM CV_Window
18. Treatment: ANNBPCV
19. Treatment: RNNBPCV
20. Control: ARMA
21. Control: TR
22. Control: TR6
Results Table (Toner): methods ranked by out-of-sample MAE
1. Treatment: SVM CV (Super Wide)
2. Treatment: SVM CV_Window (Super Wide)
3. Control: ES20
4. Control: MA6
5. Control: ES Init
6. Treatment: SVM CV_Window
7. Control: MA
8. Control: MLR (Super Wide)
9. Treatment: SVM CV
10. Control: Theta ES Init
11. Control: ES Avg
12. Control: Theta ES Average
13. Control: MLR
14. Treatment: ANNLMBR (Super Wide)
15. Treatment: RNNBPCV
16. Treatment: RNNLMBR
17. Treatment: ANNLMBR
18. Treatment: ANNBPCV
19. Treatment: ANNBPCV (Super Wide)
20. Control: ARMA
21. Control: TR
22. Control: TR6
Results Table (StatsCan): methods ranked by out-of-sample MAE
1. Treatment: SVM CV_Window (Super Wide)
2. Treatment: SVM CV (Super Wide)
3. Control: MLR
4. Treatment: SVM CV_Window
5. Treatment: SVM CV
6. Control: Theta ES Init
7. Control: ES Init
8. Control: ES Average
9. Control: MA
10. Control: Theta ES Average
11. Control: MLR (Super Wide)
12. Control: MA6
13. Treatment: RNNLMBR
14. Treatment: ANNLMBR
15. Control: ES20
16. Treatment: ANNBPCV (Super Wide)
17. Treatment: ANNLMBR (Super Wide)
18. Treatment: RNNBPCV
19. Treatment: ANNBPCV
20. Control: ARMA
21. Control: TR
22. Control: TR6
Results Discussion
- AI provides a lower forecasting error on average (H1 = Yes)
  - However, this is only because of the extremely poor performance of trend-based forecasting
- Traditional techniques ranked better than AI (H2 = No)
  - Extreme trend error has no impact on rank
- SVM Super Wide performed better than the best traditional approach, ES (H3 = Yes)
  - However, exponential smoothing was found to be the best traditional technique, and no non-super-wide AI technique reliably performed better
Results: SVM Super Wide Details
- SVM Super Wide performed better than all others
- Isolated to the SVM / Super Wide combination only:
  - Other Super Wide models did not reliably perform better than ES
  - Other SVM models did not perform better than ES
- Dimensionality augmentation/reduction (non-linearity) is important:
  - Super Wide SVM performed better than Super Wide MLR
Conclusion
- When unsure, use Exponential Smoothing: it is the simplest and second best.
- Super Wide SVM provides the best performance.
- A cost-benefit analysis by a manufacturer should help decide whether the extra effort is justified.
- If this technique proves useful in practice, it should eventually be built into ERP systems, since it may not be feasible for an SME to build it on its own.
Implications
- Useful for forecasting models that should include more information sources / more variables (economic indicators, product-group performance, marketing campaigns) because:
  - Super Wide = more observations
  - SVM + CV = better generalization
- Not possible with short and noisy time series on their own.