Presentation is loading. Please wait.

Presentation is loading. Please wait.

Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology A data mining approach to the prediction of corporate failure.

Similar presentations


Presentation on theme: "Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology A data mining approach to the prediction of corporate failure."— Presentation transcript:

1 Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology A data mining approach to the prediction of corporate failure Advisor : Dr. Hsu Presenter : ching-wen Hong Author : Feng Yu Lin, Sally McClean

2 Intelligent Database Systems Lab N.Y.U.S.T. I. M. 2 Outline  Motivation  Objection  Classifier technique  The five steps of Data Mining: the SAS SEMMA methodology  Sampling  Data exploration  Data manipulation  Modelling and results  Assessment  Conclusion  My opinion

3 Intelligent Database Systems Lab N.Y.U.S.T. I. M. 3 Motivation  Due to recent changes in the world economy and as more firms, large or small, seem to fail now more than ever corporate failure prediction is of increasing importance. 

4 Intelligent Database Systems Lab N.Y.U.S.T. I. M. 4 Objection  This paper uses a data mining approach to the prediction of corporate failure.

5 Intelligent Database Systems Lab N.Y.U.S.T. I. M. 5 Classifier technique  The classifiers include numerous statistical methods and machine learning methods.  The statistical methods include discriminant analysis (DA) and logistic regression (LG).  The machine learning methods include artificial neural networks (NN) and decision tree method (C5.0).  Another one, the hybrid method is proposed by the author.

6 Intelligent Database Systems Lab N.Y.U.S.T. I. M. 6 The five steps of Data Mining: the SAS SEMMA methodology  Sampling: The data sampling consists of company financial data from the UK.  Explore: Preprocessing of data  Manipulate: Feature selection  Model: The classification models used to the prediction of corporate failure  Assess: Comparison of prediction accuracy

7 Intelligent Database Systems Lab N.Y.U.S.T. I. M. 7 Sampling  The data sampling consists of company financial data from the UK. The financial data were accessed from Datastream/ICV.  The companies are divided into two groups:one is the failed companies group and the other is the nonfailed companies group.  This training sample consists of 690 nonfailed companies and 106 failed companies.(1980-1990)  The test dataset consists of 289 nonfailed companies and 48 failed companies.(1991-1999)

8 Intelligent Database Systems Lab N.Y.U.S.T. I. M. 8

9 Intelligent Database Systems Lab N.Y.U.S.T. I. M. 9 Data exploration  Exploration or preprocessing of data is very important and is sometimes the most time- consuming part.  Exploration of data is included: (1)The data is presented in a ready to use state. (2)The preprocessing of missing data. (3)We must filter out the redundant records such as duplicated data. We decide to delete x719,x734,x735,x761,x766and x792, since these variables include too many missing values.

10 Intelligent Database Systems Lab N.Y.U.S.T. I. M. 10 Data manipulation  One main purpose of data manipulation is to feature selection.  We will use two feature selection methods. The first one is based on financial theory and human judgement( Feature selection І ), the second is based on ANOVA ( Feature selection Ⅱ ),.

11 Intelligent Database Systems Lab N.Y.U.S.T. I. M. 11 Feature selection І- human judgement based on financial theory  Laurent[21],Ezzamuel et al.[13] and Clarke[10] attempted to reduce the wide variety of financial ratios into several major groups (e.g. profitability, liquidity,etc), the suggestion being that a researcher need only select one ratio from each of the groups to obtain an indication of the companies overall performance.  The features selected are from these categories: return on capital employed(ROCE);Turnover to total assets employed(turnover/TA);capital gearing(CG);working capital(WC)ratio.

12 Intelligent Database Systems Lab N.Y.U.S.T. I. M. 12 Feature selection Ⅱ - ANOVA statistical method  To classify failed companies and the nonfailed companies effectively, the predictors should be chosen to enable us to distinguish between these two groups.  We use ANOVA to select the variables that are significantly different between these two groups.  Table 3, the ANOVA output of the training set, we can get the variables indicated by *, which indicate a significantly different between the failed and nonfailed group.

13 Intelligent Database Systems Lab N.Y.U.S.T. I. M. 13

14 Intelligent Database Systems Lab N.Y.U.S.T. I. M. 14 Modelling and results  The classification models used in this study are:  Statistics:DA,LG;  Machine learning methods:decision trees,NN.

15 Intelligent Database Systems Lab N.Y.U.S.T. I. M. 15 Hybrid models combining different classifiers  A hybrid method usually integrates two or more technologies. The purpose of integrating technologies is to strengthen the best features of each.  A hybrid method that combines the best features of several classification models is developed to increase the prediction performance.

16 Intelligent Database Systems Lab N.Y.U.S.T. I. M. 16  Our purpose is to minimize the number of wrong predictions: Min δ=∑ i=1 n (Ti*Oi) s.t. Oi=f(w 1 V i1 + w 2 V i2 + w 3 V i3 +…+ w m V im -θ),i=1,…,n n=the number of classifiers. m=the number of companies in the sample. Ti the actual outcome of ith company in the test sample. Oi the predicted outcome of ith company in the test sample based on the hybrid method Oi=1 if w 1 V i1 + w 2 V i2 + w 3 V i3 +…+ w m V im >θ, Oi=0 else Vij the predicted output of ith company in the test sample by classifier j. wj is the weight of classifier j,Θ is the threshold. *returns 1 if both sides of *are different, *returns 0 if both sides of *are same.

17 Intelligent Database Systems Lab N.Y.U.S.T. I. M. 17  The problem of finding such a combination of classifiers is to try to find (1)the combination of weight,(2)their respective classifiers and (3) the threshold θ such that the miss hit δ between actual output Ti and Oi is as small as possible.

18 Intelligent Database Systems Lab N.Y.U.S.T. I. M. 18 The hybrid algorithm  1.Compte the total hit ratio (total accuracy) for the training sample itself for all m independent classifiers.( w 1, w 2,…, w m )  2.Take the outcome prediction Vij of classifier j for all the companies i in the test sample for j=1,…,m, i=1,…,n  3.Take all the population of classifiers, or subsets of it. Compute w 1 V i1 + w 2 V i2 + w 3 V i3 +…+ w m V im for each companies i in the test sample.

19 Intelligent Database Systems Lab N.Y.U.S.T. I. M. 19  4. if w 1 V i1 + w 2 V i2 + w 3 V i3 +…+ w m V im >θ,then Oi=1, else Oi=0. Adjust the parameter θ, where 0<θ< w 1 + w 2 + w 3 +…+ w m, such that the misclassification δ is as small as possible.

20 Intelligent Database Systems Lab N.Y.U.S.T. I. M. 20  We present three hybrid classifiers. Hybrid1-DA+LG+NN+C5.0 Hybrid2-DA+NN+C5.0 Hybrid3-LG+C5.0

21 Intelligent Database Systems Lab N.Y.U.S.T. I. M. 21

22 Intelligent Database Systems Lab N.Y.U.S.T. I. M. 22 Assessment

23 Intelligent Database Systems Lab N.Y.U.S.T. I. M. 23 Conclusions  For all the models,DA,LG,NN,C5.0 and hybrid classifiers, we found the ANOVA feature selection is better than human judgement feature selection except for DA.  The machine learning methods (NN and decision trees) show better performance than the statistical approach.  We present three hybrid classifiers:Hybrid1-DA+LG +NN+C5.0;Hybrid2-DA+NN+C5.0; Hybrid3- LG + C5.0.The empirical tests show that a hybrid classifiers produces higher prediction accuracy than individual classifiers.

24 Intelligent Database Systems Lab N.Y.U.S.T. I. M. 24 My opinion  Advantage: A hybrid method that combines the best features of several classification models is developed to increase the prediction performance.  Disadvantage: (1) The hybrid algorithm is time-consuming in computing w 1 V i1 + w 2 V i2 + w 3 V i3 +…+ w m V im,i=1,…,n for subsets of classifiers. (2) The hybrid method is not good in the application.


Download ppt "Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology A data mining approach to the prediction of corporate failure."

Similar presentations


Ads by Google