Artificial Intelligence for Data Mining in the Context of Enterprise Systems Thesis Presentation by Real Carbonneau
Overview
- Background
- Research Question
- Data Sources
- Methodology
- Implementation
- Results
- Conclusion
Background
- Information distortion in the supply chain
- Difficult for manufacturers to forecast
Current Solutions
- Exponential Smoothing
- Moving Average
- Trend
- Etc.
- Wide range of software forecasting solutions
- M3 Competition research tests most forecasting solutions and finds the simplest work best
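Two of the traditional methods above can be sketched in a few lines. This is a minimal illustration, not the exact variants benchmarked in the experiment; the demand series and parameter values are invented for the example.

```python
def moving_average_forecast(history, window=6):
    """Forecast the next period as the mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def exponential_smoothing_forecast(history, alpha=0.2):
    """Single exponential smoothing: the level blends each new observation
    with the previous level; the forecast is the final level."""
    level = history[0]
    for x in history[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

demand = [100.0, 120.0, 90.0, 110.0, 105.0, 95.0]  # illustrative monthly demand
print(moving_average_forecast(demand, window=3))
print(exponential_smoothing_forecast(demand, alpha=0.2))
```

The smoothing constant alpha and the window width are the tuning knobs; the slide's "ES20" and "MA6" labels suggest specific settings of this kind.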
Artificial Intelligence
- Universal approximators:
  - Artificial Neural Networks (ANN)
  - Recurrent Neural Networks (RNN)
  - Support Vector Machines (SVM)
- Theoretically, these should be able to match or outperform any traditional forecasting approach.
Neural Networks
- Learn by adjusting the weights of connections
- Based on empirical risk minimization
- Generalization can be improved by:
  - Cross-validation-based early stopping
  - Levenberg-Marquardt with Bayesian regularization
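Validation-based early stopping, one of the generalization aids above, can be sketched with a deliberately tiny model: a single-weight linear fit (y ≈ w·x) trained by gradient descent, with a held-out split deciding when to stop. The data, learning rate, and patience value are all illustrative, not taken from the experiment.

```python
# Toy data: roughly y = 2x plus noise, split into train and validation.
train = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]
valid = [(4.0, 8.1), (5.0, 9.8)]

def mse(w, data):
    return sum((y - w * x) ** 2 for x, y in data) / len(data)

w, lr = 0.0, 0.01
best_w, best_val = w, mse(w, valid)
patience, bad_epochs = 3, 0
for epoch in range(200):
    # Gradient of the training MSE with respect to the single weight w.
    grad = sum(-2 * x * (y - w * x) for x, y in train) / len(train)
    w -= lr * grad
    val = mse(w, valid)
    if val < best_val:      # validation improved: remember this weight
        best_w, best_val, bad_epochs = w, val, 0
    else:                   # validation worsened: count toward stopping
        bad_epochs += 1
        if bad_epochs >= patience:
            break           # stop before the fit overtracks the training data
print(best_w)
```

The same keep-the-best-validation-weights logic applies unchanged when w is a full network weight vector.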
Support Vector Machine
- Learns by separating data in a different feature space with support vectors
- The feature space can often have a higher or lower dimensionality than the input space
- Based on structural risk minimization
- Optimality guaranteed
- A complexity constant controls the power of the machine
Support Vector Machine CV
- 10-fold cross-validation-based optimization of the complexity constant
- More effective than NN because of guaranteed optimality
SVM Complexity Example
- SVM complexity constant optimization based on 10-fold cross validation
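The selection loop behind the example above can be sketched as follows. mySVM itself is not reproduced here; a one-dimensional regularized least-squares fit stands in for the learner (its weight shrinks as C shrinks, so larger C means a more powerful machine), which keeps the 10-fold cross-validation grid search self-contained. The candidate C grid and the data are invented for the illustration.

```python
def fit(train, C):
    """Closed-form regularized fit of y ~ w*x; 1/C acts as the penalty."""
    sxx = sum(x * x for x, y in train)
    sxy = sum(x * y for x, y in train)
    return sxy / (sxx + 1.0 / C)

def mae(w, data):
    return sum(abs(y - w * x) for x, y in data) / len(data)

# Noisy points around y = 2x, purely illustrative.
data = [(i * 0.5, i * 1.0 + (-1) ** i * 0.3) for i in range(1, 31)]

k = 10
folds = [data[i::k] for i in range(k)]  # 10 interleaved folds

best_C, best_err = None, float("inf")
for C in [0.01, 0.1, 1.0, 10.0, 100.0]:
    err = 0.0
    for i in range(k):
        held_out = folds[i]
        rest = [p for j, f in enumerate(folds) if j != i for p in f]
        err += mae(fit(rest, C), held_out)   # score on the unseen fold
    err /= k
    if err < best_err:
        best_C, best_err = C, err
print(best_C, best_err)
```

The winning C is then used to train on the full history before forecasting; the same outer loop applies verbatim when `fit` is a real SVM call.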
Research Question
For a manufacturer at the end of the supply chain who is subject to demand distortion:
- H1: Are AI approaches better on average than traditional approaches (error)?
- H2: Are AI approaches better than traditional approaches (rank)?
- H3: Is the best AI approach better than the best traditional approach?
Data Sources
1. Chocolate Manufacturer (ERP)
2. Toner Cartridge Manufacturer (ERP)
3. Statistics Canada Manufacturing Survey
Methodology
- Experiment using the top 100 series from the 2 manufacturers and a random 100 from StatsCan
- Comparison based on an out-of-sample testing set
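The out-of-sample comparison can be made concrete with a short sketch: each method sees only the history up to time t, forecasts one step ahead, and is scored by mean absolute error (MAE) on the held-out tail. The series, split point, and naive stand-in method are invented for the illustration.

```python
def mae(forecasts, actuals):
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(actuals)

series = [100, 104, 98, 107, 101, 110, 103, 111, 106, 115]  # illustrative
split = 7                       # first 7 points for fitting, last 3 held out

def naive_forecast(history):    # stand-in "method": repeat the last value
    return history[-1]

forecasts, actuals = [], []
for t in range(split, len(series)):
    forecasts.append(naive_forecast(series[:t]))  # only past data is visible
    actuals.append(series[t])
print(mae(forecasts, actuals))
```

Running every competing method through this same loop on the same held-out window is what makes the rank and average-error comparisons in the results tables fair.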
Implementation
- Experiment programmed in MATLAB
- Using existing toolboxes where possible (e.g., NN, ARMA)
- Programming the missing ones
- SVM implemented using mySVM called from MATLAB
Experimental Groups

CONTROL GROUP: Traditional Techniques
- Moving Average
- Trend
- Exponential Smoothing
- Theta Model (Assimakopoulos & Nikolopoulos 1999)
- Auto-Regressive and Moving Average (ARMA) (Box et al. 1994)
- Multiple Linear Regression (Auto-Regressive)

TREATMENT GROUP: Artificial Intelligence Techniques
- Neural Networks
- Recurrent Neural Networks
- Support Vector Machines
Super Wide Model
- Time series are short
- Very noisy because of supply chain distortion
- The Super Wide model combines data from many products
- Much larger amount of data to learn from
- Assumes similar patterns occur in the group of products
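The pooling step above can be sketched as follows: sliding lag windows from many products feed one shared training set, multiplying the observations a single learner sees. The product names, series values, and window width are invented for the illustration.

```python
def windows(series, width):
    """Yield (lagged inputs, next value) pairs from one product's history."""
    for t in range(width, len(series)):
        yield series[t - width:t], series[t]

# Illustrative short, noisy demand histories for several products.
products = {
    "choc_bar": [10, 12, 11, 14, 13, 15, 14],
    "toner_a":  [30, 28, 31, 29, 33, 32, 34],
    "toner_b":  [5, 6, 5, 7, 6, 8, 7],
}

width = 3
X, y = [], []
for name, series in products.items():
    for lags, target in windows(series, width):
        X.append(lags)      # one pooled design matrix across all products
        y.append(target)

print(len(X))  # 3 products x (7 - 3) windows each = 12 pooled examples
```

One model is then fit on the pooled (X, y) instead of one fragile model per 7-point series; this is the extra data that the "assumes similar patterns" bullet pays for.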
Results Table (Chocolate): methods ranked by out-of-sample MAE
1. Treatment: SVM CV_Window (Super Wide)
2. Treatment: SVM CV (Super Wide)
3. Control: MLR (Super Wide)
4. Treatment: ANNBPCV (Super Wide)
5. Control: ES Init
6. Control: ES20
7. Control: Theta ES Init
8. Control: MA6
9. Control: MA
10. Control: ES Avg
11. Control: Theta ES Average
12. Control: MLR
13. Treatment: ANNLMBR (Super Wide)
14. Treatment: RNNLMBR
15. Treatment: ANNLMBR
16. Treatment: SVM CV
17. Treatment: SVM CV_Window
18. Treatment: ANNBPCV
19. Treatment: RNNBPCV
20. Control: ARMA
21. Control: TR
22. Control: TR6
Results Table (Toner): methods ranked by out-of-sample MAE
1. Treatment: SVM CV (Super Wide)
2. Treatment: SVM CV_Window (Super Wide)
3. Control: ES20
4. Control: MA6
5. Control: ES Init
6. Treatment: SVM CV_Window
7. Control: MA
8. Control: MLR (Super Wide)
9. Treatment: SVM CV
10. Control: Theta ES Init
11. Control: ES Avg
12. Control: Theta ES Average
13. Control: MLR
14. Treatment: ANNLMBR (Super Wide)
15. Treatment: RNNBPCV
16. Treatment: RNNLMBR
17. Treatment: ANNLMBR
18. Treatment: ANNBPCV
19. Treatment: ANNBPCV (Super Wide)
20. Control: ARMA
21. Control: TR
22. Control: TR6
Results Table (StatsCan): methods ranked by out-of-sample MAE
1. Treatment: SVM CV_Window (Super Wide)
2. Treatment: SVM CV (Super Wide)
3. Control: MLR
4. Treatment: SVM CV_Window
5. Treatment: SVM CV
6. Control: Theta ES Init
7. Control: ES Init
8. Control: ES Average
9. Control: MA
10. Control: Theta ES Average
11. Control: MLR (Super Wide)
12. Control: MA6
13. Treatment: RNNLMBR
14. Treatment: ANNLMBR
15. Control: ES20
16. Treatment: ANNBPCV (Super Wide)
17. Treatment: ANNLMBR (Super Wide)
18. Treatment: RNNBPCV
19. Treatment: ANNBPCV
20. Control: ARMA
21. Control: TR
22. Control: TR6
Results Discussion
- AI provides a lower forecasting error on average (H1 = Yes)
  - However, this is only because of the extremely poor performance of trend-based forecasting
- Traditional techniques ranked better than AI (H2 = No)
  - Extreme trend error has no impact on rank
- SVM Super Wide performed better than the best traditional approach, ES (H3 = Yes)
  - However, exponential smoothing was found to be the best traditional technique, and no non-super-wide AI technique reliably performed better
Results: SVM Super Wide Details
- SVM Super Wide performed better than all others
- Isolated to the SVM / Super Wide combination only:
  - Other Super Wide models did not reliably perform better than ES
  - Other SVM models did not perform better than ES
- Dimensionality augmentation/reduction (non-linearity) is important:
  - Super Wide SVM performed better than Super Wide MLR
Conclusion
- When unsure, use Exponential Smoothing: it is the simplest and second best.
- Super Wide SVM provides the best performance.
- A cost-benefit analysis by a manufacturer should help decide whether the extra effort is justified.
- If this technique proves useful in practice, it should eventually be built into ERP systems, since it may not be feasible for an SME to build it on its own.
Implications
- Useful for forecasting models that should include more information sources / more variables (economic indicators, product-group performance, marketing campaigns) because:
  - Super Wide = more observations
  - SVM + CV = better generalization
- Not possible with short and noisy time series on their own.