
1
Analytical Benchmarking Meets Data Mining: The SmartDEA Framework, SmartDEA Software, and Case Studies for Industry. Gürdal Ertek. Invited Seminar at A*Star SIMTECH, Singapore, August 2, 2013, Friday

2
Istanbul, Turkey; Singapore

3
Young, high-profile private university on the outskirts of Istanbul, Turkey. First students accepted in 1999.

4
Established by the Sabancı Foundation

5
Sabancı Group

6
Sabancı Family: Sakıp Sabancı, Güler Sabancı,

7
~3000 undergrad & ~500 grad students

8
Highest research income per faculty member among Turkish universities

9
Young, high-profile private university. Established by the Sabancı Foundation. Sabancı Group. Sabancı Family: Sakıp Sabancı, Güler Sabancı, 200+. First students accepted in 1999. ~3000 undergrad & ~500 grad students. Highest research income per faculty member.

10
Dr. Gürdal Ertek: Assistant Professor at Sabancı University, Istanbul, Turkey, since 2002. Ph.D. from the School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, USA. Research areas include – warehousing & material handling – data visualization & data mining

11
Analytical Benchmarking Meets Data Mining: The SmartDEA Framework, SmartDEA Software, and Case Studies for Industry. Gürdal Ertek. Invited Seminar at A*Star SIMTECH, Singapore, August 2, 2013, Friday

12
Motivation: Analytical Benchmarking – application of mathematical and computational methods for benchmarking a group of entities – aims at developing objective and automated methods of benchmarking. The overwhelming majority of the literature focuses on – developing new benchmarking methodologies. An important aspect is often overlooked: – post-analysis of the benchmarking results

13
Motivation: Data Mining – a growing field of computer science – aims at discovering hidden patterns and deriving actionable insights. The overwhelming majority of the literature focuses on – developing more efficient and effective computational algorithms. Important aspects that have not received the attention they deserve: – the quest for practical, actionable knowledge – the use of data mining for post-analysis of the results of other methodologies & algorithms

14
This Seminar. Goals: – SmartDEA Solver framework for integrating analytical benchmarking with data mining – How DEA results should be structured – Meaningful interpretation of DEA results. Case study applications: – Automotive – Wind energy – Apparel retail

15
Research Questions: How can Data Envelopment Analysis (DEA) results be structured such that they can be analyzed using readily available data mining techniques and software tools? (SmartDEA) How can DEA & information visualization be used together? (Case Study 1) Which visualization techniques are appropriate for analyzing DEA results? (Case Study 2) How can DEA and data mining be integrated with the results of other data mining techniques, specifically association mining results? (Case Study 3)

16
Presentation Contents: Background on Data Envelopment Analysis (DEA). SmartDEA framework. Case Studies – Automotive – Wind Energy – Apparel Retail

17
Background

18
Sample DEA Analysis

19
Data Envelopment Analysis (DEA). Data: entities = DMUs (n DMUs). Comparison of DMUs based on their inputs and outputs (m inputs, s outputs). Results: efficiency score between 0 and 1; reference sets; projections.
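To make the three kinds of results concrete, here is a minimal sketch for the special case of one input and one output under constant returns to scale, where the efficiency score reduces to each DMU's output/input ratio divided by the best ratio in the group. The function name `crs_single_ratio` and the sample DMUs are hypothetical and are not part of SmartDEA.

```python
from typing import Dict, List, Tuple

TOL = 1e-12


def crs_single_ratio(dmus: Dict[str, Tuple[float, float]]) -> Dict[str, dict]:
    """One-input, one-output DEA under constant returns to scale.

    dmus maps a DMU name to (input, output); both are assumed positive.
    """
    ratios = {name: y / x for name, (x, y) in dmus.items()}
    best = max(ratios.values())
    # DMUs attaining the best output/input ratio lie on the efficient frontier.
    frontier: List[str] = [n for n, r in ratios.items() if abs(r - best) <= TOL * best]
    results = {}
    for name, (x, y) in dmus.items():
        score = ratios[name] / best  # efficiency score in (0, 1]
        efficient = score >= 1 - TOL
        results[name] = {
            "efficiency": score,
            # An inefficient DMU can be benchmarked against any frontier DMU.
            "reference_set": [name] if efficient else frontier,
            # Input-oriented projection: shrink the input radially, keep the output.
            "projected_input": score * x,
        }
    return results


print(crs_single_ratio({"A": (2.0, 4.0), "B": (4.0, 4.0), "C": (5.0, 10.0)})["B"])
# {'efficiency': 0.5, 'reference_set': ['A', 'C'], 'projected_input': 2.0}
```

DMU B produces the same output as A with twice the input, so it scores 0.5, is benchmarked against the frontier DMUs A and C, and would need to cut its input to 2.0 to become efficient. With several inputs or outputs the scores come from a linear program instead of a ratio.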

20
Basic DEA Models

21
Basic DEA Models: CCR-Input model, CCR-Output model
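The model formulations on this slide are not reproduced in the transcript. For reference, the standard envelopment forms of the CCR (Charnes, Cooper and Rhodes) models for the DMU under evaluation, indexed o, are sketched below using the n DMUs, m inputs and s outputs defined earlier. The symbols x_{ij} (input i of DMU j), y_{rj} (output r of DMU j) and the weights λ_j are introduced here and may differ from the notation on the original slides.

```latex
\begin{aligned}
&\textbf{CCR-Input:}  && \min_{\theta,\lambda}\ \theta \\
&\text{s.t.}          && \sum_{j=1}^{n} \lambda_j x_{ij} \le \theta\, x_{io}, && i = 1,\dots,m \\
&                     && \sum_{j=1}^{n} \lambda_j y_{rj} \ge y_{ro},          && r = 1,\dots,s \\
&                     && \lambda_j \ge 0,                                      && j = 1,\dots,n \\[6pt]
&\textbf{CCR-Output:} && \max_{\eta,\lambda}\ \eta \\
&\text{s.t.}          && \sum_{j=1}^{n} \lambda_j x_{ij} \le x_{io},          && i = 1,\dots,m \\
&                     && \sum_{j=1}^{n} \lambda_j y_{rj} \ge \eta\, y_{ro},   && r = 1,\dots,s \\
&                     && \lambda_j \ge 0,                                      && j = 1,\dots,n
\end{aligned}
```

The input-oriented efficiency score is θ*, the output-oriented score is reported as 1/η*, and under constant returns to scale the two coincide. The DMUs with λ_j* > 0 form the reference set of DMU o.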

22
Basic DEA Models: BCC-Input model, BCC-Output model
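Sketched in envelopment form (the symbols x_{ij}, y_{rj}, λ_j and the evaluated DMU o are introduced here, not taken from the slides), the BCC models of Banker, Charnes and Cooper (1984) differ from the constant-returns CCR models only by the convexity constraint Σλ_j = 1, which imposes variable returns to scale:

```latex
\begin{aligned}
&\textbf{BCC-Input:}  && \min_{\theta,\lambda}\ \theta \\
&\text{s.t.}          && \sum_{j=1}^{n} \lambda_j x_{ij} \le \theta\, x_{io}, && i = 1,\dots,m \\
&                     && \sum_{j=1}^{n} \lambda_j y_{rj} \ge y_{ro},          && r = 1,\dots,s \\
&                     && \sum_{j=1}^{n} \lambda_j = 1, \quad \lambda_j \ge 0, && j = 1,\dots,n \\[6pt]
&\textbf{BCC-Output:} && \max_{\eta,\lambda}\ \eta \\
&\text{s.t.}          && \sum_{j=1}^{n} \lambda_j x_{ij} \le x_{io},          && i = 1,\dots,m \\
&                     && \sum_{j=1}^{n} \lambda_j y_{rj} \ge \eta\, y_{ro},   && r = 1,\dots,s \\
&                     && \sum_{j=1}^{n} \lambda_j = 1, \quad \lambda_j \ge 0, && j = 1,\dots,n
\end{aligned}
```

Because the convexity constraint shrinks the feasible set, a DMU's BCC score is never lower than its CCR score.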

23
Basic DEA Models

24
Analyzing the solutions of DEA through information visualization and data mining techniques: SmartDEA Framework. Alp Eren Akcay, Gürdal Ertek, Gulcin Buyukozkan

25
Research Questions: How can Data Envelopment Analysis (DEA) results be structured such that they can be analyzed using readily available data mining techniques and software tools? (SmartDEA) How can DEA & information visualization be used together? Which visualization techniques are appropriate for analyzing DEA results? How can DEA and data mining be integrated with the results of other data mining techniques, specifically association mining results?

26
Goals: To build a framework for analytical benchmarking and performance evaluation. To design and develop convenient DEA software, SmartDEA.

27
Contribution: To develop a general framework. To help DEA analysts generate important and interesting insights systematically. To integrate DEA results with information visualization techniques.

28
Framework: Integration of DEA results with data mining and information visualization

29
Proposed framework: 1. integrates data mining and information visualization with DEA, 2. generates clean data for mining (data auditing at the DEA modeling stage), 3. allows the incorporation of other data into the process, 4. can accommodate multiple DEA models within the same analysis.
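Point 2 (data auditing at the modeling stage) can be illustrated with a small check that runs before any model is solved. This is a sketch under assumptions: the row-per-DMU layout, the `DMU` key and the function `audit_dea_data` are hypothetical and do not describe SmartDEA's actual checks.

```python
from typing import Any, Dict, List


def audit_dea_data(rows: List[Dict[str, Any]], columns: List[str]) -> List[str]:
    """Return the problems found in DEA input data; an empty list means clean.

    Each row is one DMU; "DMU" holds its name and `columns` lists the model's
    input and output columns. Strictly positive values are required here, a
    common simplifying assumption for the basic CCR/BCC models.
    """
    problems = []
    for idx, row in enumerate(rows):
        name = row.get("DMU", f"row {idx}")
        for col in columns:
            value = row.get(col)
            if value is None:
                problems.append(f"{name}: missing {col}")
            elif not isinstance(value, (int, float)):
                problems.append(f"{name}: non-numeric {col} = {value!r}")
            elif value <= 0:
                problems.append(f"{name}: non-positive {col} = {value}")
    return problems
```

Running such a check at modeling time means the rows that later feed the data mining tools have already been validated once, instead of being cleaned separately for each analysis.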

30
Notation

31
Notation

32
Notation

33
Notation

34
Notation

35
Notation

36
Notation

37
Notation

38
Notation

39
SmartDEA: the developed software

40
Modeling Process: Implemented in the C# language. Results are saved in MS Excel file format. Imported data must follow a specific format.

41
Modeling Process 1- Importing Excel File: – Data requires a certain format

42
Modeling Process 2- Selecting the spreadsheet:

43
Modeling Process 3- Constructing the model:

44
Modeling Process 4- Selecting the DEA Model:

45
Modeling Process 5- Solving and generating the solution file:
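The transcript does not show SmartDEA's solution-file layout. As a hedged sketch of the idea that DEA results should be structured for data mining, the hypothetical function below flattens results into one row per model and DMU, so that several DEA models can share one table. It writes CSV rather than an Excel workbook to stay within the Python standard library, and the result dictionary layout is an assumption.

```python
import csv
import io
from typing import Dict, List


def results_to_csv(results: List[Dict]) -> str:
    """Flatten DEA results into one row per (model, DMU).

    Each result holds "model", "dmu", "efficiency" and "lambdas" (reference
    DMU -> optimal weight). Every reference DMU gets its own Ref_<name> column,
    so reference sets can be filtered, plotted or mined like any other attribute.
    """
    ref_names = sorted({ref for res in results for ref in res["lambdas"]})
    header = ["Model", "DMU", "Efficiency", "IsEfficient"] + [f"Ref_{n}" for n in ref_names]
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    for res in results:
        row = [
            res["model"],
            res["dmu"],
            f"{res['efficiency']:.4f}",
            "Yes" if res["efficiency"] >= 1 - 1e-9 else "No",
        ]
        # A weight of 0.0 marks a DMU that is not in this DMU's reference set.
        row += [res["lambdas"].get(n, 0.0) for n in ref_names]
        writer.writerow(row)
    return buf.getvalue()
```

A file in this shape opens in MS Excel and can be loaded into data mining tools that read CSV, with the efficiency, efficiency flag and reference weights available as ordinary columns.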

46
46

47
Case Study 1: Integrating DEA with Information Visualization for Benchmarking Dealers in the Automotive Industry Dr. Gürdal Ertek, Tuna Çaprak

48
Research Questions: How can Data Envelopment Analysis (DEA) results be structured such that they can be analyzed using readily available data mining techniques and software tools? How can DEA & information visualization be used together? (Case Study 1) Which visualization techniques are appropriate for analyzing DEA results? How can DEA and data mining be integrated with the results of other data mining techniques, specifically association mining results?

49
A New Approach for Benchmarking and Managing TOFAŞ Dealers. Tuna Çaprak, Leaders for Industry Program 07-08, Sabancı University. Gürdal Ertek, Ph.D., Faculty of Engineering and Natural Sciences, Sabancı University

50
A New Approach for Benchmarking and Managing TOFAŞ Dealers

51
Data Envelopment Analysis (DEA): Benchmark independent Decision Making Units (DMUs). Express efficiency with a single score between 0 and 1. Consider multidimensional input / output relations.

52
Information Visualization (InfoViz): Reveal hidden structures. Derive actionable insights. Identify patterns.

53
Information Visualization (InfoViz): Reveal hidden structures. Derive actionable insights. Identify patterns. Develop competitive strategies.

54
Model 1: Measuring Efficiency

55
Model 1: Measuring Efficiency. Inputs: Dealer Expenses, Spare Parts Area, No. of Employees. Output: Revenue (Total). DMU: Dealer

56
Model 2: Measuring Efficiency for TOFAŞ

57
Model 2: Measuring Efficiency for TOFAŞ. Inputs: Dealer Expenses, Spare Parts Area, No. of Employees. Output: Amount Purchased from TOFAŞ (YTL). DMU: Dealer

58
Other Data of Interest on DMUs: Share of TOFAŞ, IsRentEstimated, Cities, No. of Services

59
ANALYSIS and DISCUSSIONS

60
Visualization of results: Miner 3D

61
Visualization of results: Omniscope

62
Future Work: Further data analysis. Technical report and paper. Incorporation of city growth.

63
Special Thanks to … Prof. Muhittin Oral, Hasan Erdoğan, Sinan Südütemiz

64
Case Study 2: Insights into the Efficiencies of On-Shore Wind Turbines: A Data-Centric Analysis. Dr. Gürdal Ertek, Murat Mustafa Tunç, Ece Kurtaraner, Doğancan Kebude

65
Research Questions: How can Data Envelopment Analysis (DEA) results be structured such that they can be analyzed using readily available data mining techniques and software tools? How can DEA & information visualization be used together? Which visualization techniques are appropriate for analyzing DEA results? (Case Study 2) How can DEA and data mining be integrated with the results of other data mining techniques, specifically association mining results?

66
Outline: Wind Turbines. Our Study – Methodology: Data Envelopment Analysis (DEA), Visual Data Analysis, Hypothesis Testing – Analysis and Results – Insights

67
Wind Turbines: Mechatronic devices that convert wind energy into electrical energy via mechanical energy. Features: diameter, aerodynamics, tower height, control devices, location (on-shore / off-shore)

68
Importance of Wind Turbines: Green energy. Worldwide installed wind power capacity – In 1990: 2,160 MW – In 2011: 238,351 MW (Global Wind Energy Council). 16% of Europe's electricity by 2020 (The European Wind Energy Association)

69
Wind Energy in Turkey: 40 GW wind energy potential in the next 20 years. Image Source:

70
Wind Energy in Turkey: MİLRES: 500 kW wind turbine to be designed and made in Turkey – In 2013, output of 500 kW – In 2015, output of 2 MW – Largest-budget civilian R&D project in the history of the Turkish Republic

71
Our Study: Technical data of wind turbines are collected and analysed using the following methodologies: Data Envelopment Analysis (DEA), Visual Data Analysis, Hypothesis Testing. Aims: Determining the efficient wind turbines. Understanding how to make an inefficient turbine efficient by referencing the efficient ones. Benchmarking commercial wind turbines visually and statistically.

72
Literature: First example of: – Benchmarking of commercial wind turbines – Visualisation of reference sets in DEA results as a directed graph. Use of DEA and visualization together: – Ertek et al. (2007) Benchmarking the Turkish apparel retail industry. – Ulus et al. (2006) Financial benchmarking of transportation companies in the New York Stock Exchange (NYSE).

73
Methodologies: Data Envelopment Analysis. Efficiency comparison of Decision Making Units (DMUs) according to – Inputs (lower is better) – Outputs (higher is better). For each DMU: – Efficiency score (between 0 and 1) – Reference sets – Projections

74
Methodologies: Visual Data Analysis. To distinguish different patterns in data and achieve new and useful insights (Keim, 2002). Orange Canvas (software) – Scatter plot. Miner3D (software) – Surface plot

75
Database: Top 10 companies in worldwide market share: 1. Vestas (Denmark) 2. Sinovel (China) 3. Goldwind (China) 4. Gamesa (Spain) 5. Enercon (Germany) 6. GE (USA) 7. Suzlon (India) 8. Guodian (China) 9. Siemens (Germany) 10. Ming Yang (China)

76
DEA Model. Model A: 74 on-shore wind turbine models. Model B: 32 on-shore wind turbine models (low-wind). Inputs: – Diameter (m) – Nominal wind speed (m/s). Outputs: – Nominal Output (V). Other features: – Cut-in wind speed (low/medium/high) – Company

77
DEA Model: BCC output-oriented. SmartDEA Solver software: developed at Sabancı University; reads data from MS Excel and generates results. Visual analysis with Orange Canvas and Miner3D using the efficiency scores.
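The wind-turbine models use two inputs, which calls for a linear-programming solver (for example `scipy.optimize.linprog`). To show what the output-oriented BCC score measures without one, the sketch below assumes a single input (say, rotor diameter) and a single output (nominal output). The function name and data are hypothetical, and this is not how SmartDEA Solver computes its scores.

```python
from itertools import combinations
from typing import List, Tuple


def bcc_output_single(data: List[Tuple[float, float]], o: int) -> float:
    """Output-oriented BCC efficiency of DMU o with ONE input and ONE output.

    Solves  max eta  s.t.  sum(l_j * x_j) <= x_o,  sum(l_j * y_j) >= eta * y_o,
    sum(l_j) = 1,  l_j >= 0.  Apart from non-negativity there are only two
    constraints, so an optimal vertex uses at most two DMUs: enumerating
    single DMUs and pairs is therefore exact. Returns 1 / eta*, in (0, 1].
    """
    x_o, y_o = data[o]
    # Best output reachable by a single DMU using no more input than DMU o.
    best = max(y for x, y in data if x <= x_o)
    # Points on the segment between two DMUs whose inputs bracket x_o.
    for p, q in combinations(data, 2):
        (x1, y1), (x2, y2) = sorted((p, q))
        if x1 <= x_o < x2:
            t = (x_o - x1) / (x2 - x1)  # weight on the larger-input DMU
            best = max(best, (1 - t) * y1 + t * y2)
    return y_o / best


# Hypothetical (input, output) pairs for four turbine models.
turbines = [(1.0, 1.0), (2.0, 4.0), (4.0, 5.0), (3.0, 2.0)]
print(round(bcc_output_single(turbines, 3), 4))  # 0.4444
```

The last turbine uses an input of 3.0, and the variable-returns frontier at that input (halfway between the second and third turbines) reaches an output of 4.5, so its score is 2 / 4.5.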

78
Analysis and Results

79
1 - Efficiency vs Companies

80
2 - Efficiency vs Nominal Output

81
3 - Efficiency vs Cut-in Wind Speed

82
4 - Efficiency vs Diameter

83
5 - Reference Analysis: Which efficient turbine models should inefficient ones take as references? – X axis: efficient turbine model that should be taken as reference – Y axis: DMU name – Size of circle: weight of reference
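The "taken as a reference most often" ranking reported in the insights can be derived from the optimal weights (λ) that a DEA solver returns; the same weights give the circle sizes on this chart. A minimal sketch, assuming the weights are available as a nested dictionary (a hypothetical layout, with made-up turbine IDs):

```python
from collections import Counter
from typing import Dict


def reference_counts(lambdas: Dict[str, Dict[str, float]], tol: float = 1e-9) -> Counter:
    """Count how many other DMUs use each DMU as a reference (lambda > tol).

    lambdas[dmu] maps reference DMU -> optimal weight. An efficient DMU
    typically references only itself, which is not counted.
    """
    counts: Counter = Counter()
    for dmu, refs in lambdas.items():
        for ref, weight in refs.items():
            if ref != dmu and weight > tol:
                counts[ref] += 1
    return counts


lambdas = {
    "T1": {"T1": 1.0},
    "T2": {"T1": 0.6, "T3": 0.4},
    "T3": {"T3": 1.0},
    "T4": {"T3": 1.0},
}
print(reference_counts(lambdas).most_common())  # [('T3', 2), ('T1', 1)]
```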

84
84

85
6 - Reference sets for Model B with yEd software
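yEd opens files in the standard GraphML format, so a reference graph like the one on this slide can be generated from the DEA weights. A minimal sketch using only the Python standard library; the nested weight dictionary and the function name are hypothetical, and no yEd-specific styling is written (the layout would be applied inside yEd).

```python
import xml.etree.ElementTree as ET
from typing import Dict

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def reference_graphml(lambdas: Dict[str, Dict[str, float]], tol: float = 1e-9) -> str:
    """Directed reference graph: one edge DMU -> reference DMU, weighted by lambda."""
    root = ET.Element("graphml", xmlns=GRAPHML_NS)
    ET.SubElement(root, "key", {"id": "w", "for": "edge",
                                "attr.name": "weight", "attr.type": "double"})
    graph = ET.SubElement(root, "graph", id="references", edgedefault="directed")
    nodes = set(lambdas) | {ref for refs in lambdas.values() for ref in refs}
    for node in sorted(nodes):
        ET.SubElement(graph, "node", id=node)
    for dmu in sorted(lambdas):
        for ref, weight in sorted(lambdas[dmu].items()):
            if ref != dmu and weight > tol:  # skip self-references of efficient DMUs
                edge = ET.SubElement(graph, "edge", source=dmu, target=ref)
                ET.SubElement(edge, "data", key="w").text = str(weight)
    return ET.tostring(root, encoding="unicode")
```

Writing the returned string to a `.graphml` file gives a graph in which inefficient models point to the efficient models they should emulate.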

86
7 - Projection Analysis: By what percentage should the models change their inputs and outputs to become efficient? – X axis: percentage change – Y axis: efficiency – Colors: inputs and outputs
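In an output-oriented model the radial projection multiplies every output by η*, and slacks may further reduce inputs or raise outputs; the percentages on this chart are then relative differences between observed and projected values. A minimal sketch; the function name and the turbine values are invented for illustration.

```python
from typing import Dict


def projection_changes(original: Dict[str, float], projected: Dict[str, float]) -> Dict[str, float]:
    """Percentage change needed in each input/output to reach the projected point."""
    return {k: 100.0 * (projected[k] - v) / v for k, v in original.items()}


turbine = {"Diameter": 80.0, "NominalWindSpeed": 12.0, "NominalOutput": 1500.0}
target = {"Diameter": 80.0, "NominalWindSpeed": 11.4, "NominalOutput": 1800.0}
changes = projection_changes(turbine, target)
# Diameter unchanged, nominal wind speed about -5%, nominal output +20%.
```

A model that needs only a positive output change corresponds to the slide's observation that, for most turbines, increasing outputs is enough.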

87
87

88
8 - Miner 3D Surface Plot Analysis

89
Miner 3D Surface Plot Analysis

90
Insights. Efficiency according to companies: – Enercon and GE are the most efficient companies – The efficiencies of the turbines of Goldwind, Ming Yang, Mitsubishi and Siemens are under 60%. Efficiency according to nominal output: – Lower or higher values of nominal output do not affect efficiency – However, outputs around 1.5 MW have higher efficiencies

91
Insights. Efficiency according to cut-in wind speed: – Cut-in speeds of 2 and 2.5 m/s have fewer models; 3, 3.5 and 4 m/s have more models – Cut-in speeds of 3 m/s and above have higher efficiency scores than 2 and 2.5 m/s. Efficiency according to diameter: – The model with the smallest diameter is the most efficient turbine – Efficiency scores of models with diameters between 70 m and 85 m are higher than expected

92
Insights. Reference analysis: – DMUs 15, 20, 27, 61 and 81 are the ones most frequently taken as references. Projection analysis: – Some of the models should both decrease inputs and increase outputs to become efficient – For most of the models it is enough to increase outputs. Miner 3D surface plot analysis: – The input and output parameters of the models in light-colored regions are ideal for higher efficiency

93
Hypothesis Testing: The Kruskal–Wallis test confirmed that: – Efficiency scores and cut-in wind speeds differ significantly across companies.
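The Kruskal–Wallis test ranks all observations together and checks whether the mean ranks differ across groups (here, companies). A standard-library sketch of the H statistic; the function name is hypothetical, and turning H into a p-value is left to a chi-square table or a statistics library.

```python
from typing import Dict, List


def kruskal_wallis_h(groups: Dict[str, List[float]]) -> float:
    """Kruskal-Wallis H statistic (average ranks for ties, with tie correction).

    Under the null hypothesis that all groups come from the same distribution,
    H is approximately chi-square distributed with len(groups) - 1 degrees of freedom.
    """
    pooled = sorted((v, g) for g, vals in groups.items() for v in vals)
    n = len(pooled)
    rank_sums = {g: 0.0 for g in groups}
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # tied values share the mean of 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    h = 12.0 / (n * (n + 1)) * sum(r ** 2 / len(groups[g]) for g, r in rank_sums.items()) - 3 * (n + 1)
    return h / (1 - tie_term / (n ** 3 - n))
```

For real data, `scipy.stats.kruskal(*samples)` computes the same tie-corrected statistic together with a p-value.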

94
References
Cooper, W. W., Seiford, L. M., Tone, K. (2006), Introduction to Data Envelopment Analysis and its Uses, Springer, New York.
Ertek, G., Can, M. A., Ulus, F. (2007), Benchmarking the Turkish apparel retail industry through data envelopment analysis (DEA) and data visualization. In: EUROMA th International Annual EurOMA Conference: Managing Operations in an Expanding, Ankara, Turkey.
Keim, D. A. (2002), Information visualization and visual data mining, IEEE Transactions on Visualization and Computer Graphics, Vol. 8, No. 1, pp.
Ulus, F., Köse, Ö., Ertek, G., Şen, S. (2006), Financial benchmarking of transportation companies in the New York Stock Exchange (NYSE) through data envelopment analysis (DEA) and visualization. In: 4th International Logistics and Supply Chain Congress, İzmir, Turkey.
Weill, L. (2004), Measuring cost efficiency in European banking: a comparison of frontier techniques, Journal of Productivity Analysis, Vol. 21, No. 2, pp.

95
Q&A. Dr. Gürdal Ertek, Murat Mustafa Tunç, Ece Kurtaraner, Doğancan Kebude

96
Case Study 3: Re-Mining Association Mining Results Through Visualization, Data Envelopment Analysis, and Decision Trees. Gürdal Ertek, Murat Mustafa Tunç

97
Research Questions: How can Data Envelopment Analysis (DEA) results be structured such that they can be analyzed using readily available data mining techniques and software tools? How can DEA & information visualization be used together? Which visualization techniques are appropriate for analyzing DEA results? How can DEA and data mining be integrated with the results of other data mining techniques, specifically association mining results? (Case Study 3)

98
Book Chapter: Published in Computational Intelligence Applications in Industrial Engineering – A book edited by Prof. Cengiz Kahraman – Published by Atlantis & Springer

99
Outline: Introduction. Literature. Methodology. Case Study – Data Analysis – Data Visualization – Data Envelopment Analysis – Decision Trees – Classification. Conclusion

100
Introduction: How the results of association mining analysis can be further analyzed using – Data visualization – Data Envelopment Analysis (DEA) – Decision Trees. Visual re-mining of an item considering both – Positive associations – Negative associations

101
Association Mining. Inputs: – Transaction data, where each transaction contains a subset of items. Outputs: – Lists of itemsets that frequently appear together. Primary metrics: – Support is the percentage of transactions in which the items appear together – Confidence is the conditional probability that item B appears in a transaction given that item A already appears in it
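Support and confidence can be computed directly from these definitions. A minimal sketch on a made-up four-basket example (function names and items are hypothetical); note that the case study states its minimum support as a count (100), which corresponds to support multiplied by the number of transactions.

```python
from typing import List, Set


def support(transactions: List[Set[str]], itemset: Set[str]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions: List[Set[str]], a: Set[str], b: Set[str]) -> float:
    """Estimated P(b in transaction | a in transaction) = support(a | b) / support(a)."""
    return support(transactions, a | b) / support(transactions, a)


baskets = [{"shirt", "tie"}, {"shirt", "jeans"}, {"shirt", "tie", "belt"}, {"jeans"}]
print(support(baskets, {"shirt", "tie"}))       # 0.5
print(confidence(baskets, {"shirt"}, {"tie"}))  # 0.6666666666666666
```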

102
Association Mining: A classical application is market basket analysis

103
Graph Visualization: Refers to the drawing of graphs, which consist of – Nodes – Arcs, using special algorithms in order to obtain actionable insights

104
Re-mining: Mining of a new dataset constructed from the results of a data mining process. The goal is – to obtain new insights that could not have been discovered otherwise, and – to characterize, describe, and explain the results of the original data mining process

105
Data Envelopment Analysis: Benchmarks a group of entities through efficiency scores. Entities are called Decision Making Units (DMUs). The efficiency score increases if – the DMU generates higher output using the same input, or – the DMU uses less input for the same output

106
Graph Metrics: Degree is the number of connections of a node. Betweenness centrality counts the shortest paths between other pairs of nodes that pass through the node. Closeness centrality is based on the distances between the node and every other node (higher when these distances are shorter). Eigenvector centrality is higher when the node is connected to other central nodes.

107
Graph Metrics: PageRank increases when the node is linked to other important nodes. The clustering coefficient measures the tendency of a node's neighbors to be connected to each other.
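Three of these metrics are simple enough to compute by hand. A sketch for an undirected graph stored as an adjacency map (a hypothetical representation with made-up nodes); NodeXL, used in the case study, may normalize closeness differently, so values need not match its output exactly. Betweenness, eigenvector centrality and PageRank need more machinery and are omitted.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # undirected graph as an adjacency map


def degree(adj: Graph, v: str) -> int:
    return len(adj[v])


def closeness(adj: Graph, v: str) -> float:
    """(reachable nodes - 1) / sum of BFS distances from v to them."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def clustering(adj: Graph, v: str) -> float:
    """Share of pairs of v's neighbours that are themselves connected."""
    k = len(adj[v])
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


# Triangle A-B-C with D hanging off C.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
```

Here C has degree 3 and the highest closeness, while its clustering coefficient is only 1/3 because D is not linked to A or B.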

108
Decision Trees: Main goal: To identify the nodes that differ considerably from the root node. Each node is split (branched) according to a criterion. Our study uses the ID3 algorithm. Branches are created in the Orange software.
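ID3 chooses, at each node, the attribute with the highest information gain, that is, the largest drop in class entropy after the split. A minimal sketch of that criterion; the attribute names and rows are hypothetical, and Orange's tree learner offers its own split criteria, so this does not describe the exact software configuration used in the study.

```python
from collections import Counter
from math import log2
from typing import Dict, List


def entropy(labels: List[str]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows: List[Dict[str, str]], attr: str, target: str) -> float:
    """Entropy of `target` minus its weighted entropy after splitting on `attr`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


rows = [
    {"Category": "Shirt", "Season": "2007", "PosAssoc": "High"},
    {"Category": "Shirt", "Season": "2007", "PosAssoc": "High"},
    {"Category": "Jeans", "Season": "2007", "PosAssoc": "Low"},
    {"Category": "Jeans", "Season": "2007", "PosAssoc": "Low"},
]
```

In this toy data, splitting on Category separates the classes perfectly (gain of 1 bit), while Season carries no information (gain 0), so ID3 would branch on Category first.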

109
Classification: The dataset is divided into two groups, namely a learning dataset and a test dataset. Classification algorithms are called learners – Naive Bayes – k-Nearest Neighbor (kNN) – C4.5 – Support Vector Machines (SVM) – Decision Trees. The prediction success of each learner is measured through classification accuracy (CA)
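Classification accuracy is the share of test examples whose predicted class matches the true class. A minimal sketch with a k-nearest-neighbour learner written from scratch on toy two-feature data; the data, class labels and function names are hypothetical, and the study itself used the learners in Orange.

```python
from collections import Counter
from math import dist
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (feature vector, class label)


def knn_predict(train: List[Example], x: Sequence[float], k: int = 3) -> str:
    """Majority class among the k training examples closest to x (Euclidean)."""
    nearest = sorted(train, key=lambda ex: dist(ex[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def classification_accuracy(train: List[Example], test: List[Example], k: int = 3) -> float:
    """CA = correctly classified test examples / all test examples."""
    correct = sum(1 for x, y in test if knn_predict(train, x, k) == y)
    return correct / len(test)


train = [((0, 0), "Low"), ((0, 1), "Low"), ((1, 0), "Low"),
         ((5, 5), "High"), ((5, 6), "High"), ((6, 5), "High")]
test = [((0.5, 0.5), "Low"), ((5.5, 5.5), "High"), ((1, 1), "High")]
```

Two of the three test points are classified correctly, so CA is 2/3; comparing such scores across learners is what the final methodology step does.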

110
Methodology: 1. Perform positive association mining. 2. Find negatively associated item pairs from the results of step 1. 3. Compute the percentage of positive associations for each item. 4. Construct two association graphs: (1) shows only positive associations, (2) shows only negative associations. 5. Compute graph metrics for each node.

111
Methodology: 6. Construct the dataset for re-mining. 7. Apply grid layout to the graphs, then visually analyze them. 8. Construct a DEA model to combine the insights and find the most important items. 9. Construct a classification model and decision trees. 10. Apply multiple learners and evaluate classification accuracy.

112
Case Study: Based on real company data from the apparel retail industry – Merchandise group in the men's clothing line – 2007 season

113
Case Study: Company headquartered in Istanbul – 300+ stores in Turkey – 30+ stores in more than 10 countries

114
Case Study: As of Nov. 2010, the U.S. retail industry exceeded $377.5 billion

115
Data Analysis. Step 1: Positive association mining – Min. support value: 100 – Result: 3930 frequent item pairs involving 538 items. Step 2: Negative association mining – Result: 2433 item pairs involving 537 items. Step 3: Percentage of positive associations of each item
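A minimal sketch of step 3, assuming that an item's percentage is its number of positive associations divided by its total number of positive and negative associations (an assumption; the slides do not spell out the formula). The function name and item IDs are made up.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]


def perc_of_positive_assoc(pos_pairs: Iterable[Pair], neg_pairs: Iterable[Pair]) -> Dict[str, float]:
    """Per item: 100 * positive associations / (positive + negative associations)."""
    pos: Counter = Counter()
    neg: Counter = Counter()
    for a, b in pos_pairs:  # each pair counts once for both of its items
        pos[a] += 1
        pos[b] += 1
    for a, b in neg_pairs:
        neg[a] += 1
        neg[b] += 1
    items = set(pos) | set(neg)
    return {i: 100.0 * pos[i] / (pos[i] + neg[i]) for i in items}


perc = perc_of_positive_assoc([("A", "B"), ("A", "C")], [("A", "D")])
# A: 2 of 3 associations positive; B and C: all positive; D: none positive.
```

Items near 100% would appear as the darker nodes and items near 0% as the lighter nodes in the graph visualizations that follow.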

116
Data Analysis. Step 3: Percentage of positive associations of each item

117
Data Analysis. Step 4: Positive and negative association graphs

118
Data Analysis. Step 5: Graph metrics were computed using the NodeXL add-in for MS Excel. Step 6: Dataset formed for re-mining – Each row is an item involved in positive associations – Columns include: unique item number, support count (SupC), StartWeek, EndWeek, LifeTime, MaxPrice, MinPrice, PriceDiff, MerchSubGroup, Category, PercOfPositiveAssoc, graph metrics

119
Data Visualization. Step 7: Grid layout applied for visualization. Color denotes PercOfPositiveAssoc – Lighter items are mostly negatively associated – Darker items are mostly positively associated

120
Data Visualization

121
Data Visualization. Second graph: – Node size represents end-of-season sales prices (MinPrice) – Larger nodes denote higher MinPrice (more typically high-priced items) – Smaller nodes denote lower MinPrice

122
Data Visualization

123
Data Visualization. Third graph: – Node shape represents category. We want to determine whether the items in the upper left region (darker nodes, larger nodes) belong to a particular category type

124
Data Visualization

125
Data Envelopment Analysis (DEA): To analytically integrate the insights found in the visualizations above. Input: – Uniform for each item. Outputs: – Support Count (SupC) – PercOfPositiveAssoc – MinPrice. Output-oriented BCC model

126
Data Envelopment Analysis [Table: columns Item, Eff1*, Eff2**, Input_Auxiliary, Input_LifeTime, PercOfPositiveAssoc, SupC, MinPrice; Yes/No entries; variable roles INPUT / OUTPUT]

127
Conclusions: Our methodology combines – Association mining – Graph theory – Classification – Data Envelopment Analysis – Re-mining. Positive associations are related to graph metric values and item attributes.

128
References
A. Demiriz, G. Ertek, T. Atan and U. Kula, Re-mining item associations: Methodology and a case study in apparel retailing, Decision Support Systems, 52(1), pp (2011).
J.R. Quinlan, Induction of decision trees, Machine Learning, 1(1), pp (1986).
Orange.
E. Alpaydin, Introduction to Machine Learning, The MIT Press (2010).
E.M. Bonsignore, C. Dunne, D. Rotman, M. Smith, T. Capone, D.L. Hansen and B. Shneiderman, First Steps to NetViz Nirvana: Evaluating Social Network Analysis with NodeXL, in International Symposium on Social Intelligence and Networking (2009).
R. Agrawal, T. Imielinski and A.N. Swami, Mining association rules between sets of items in large databases, in SIGMOD Conference, P. Buneman and S. Jajodia (Eds) (1993).

129
References
NodeXL.
A.E. Akcay, G. Ertek and G. Buyukozkan, Analyzing the solutions of DEA through information visualization and data mining techniques: SmartDEA framework, Expert Systems with Applications (2012).
R.D. Banker, A. Charnes and W.W. Cooper, Some models for estimating technical and scale inefficiencies in data envelopment analysis, Management Science, 30(9), pp. 1078–1092 (1984).
G. Ertek and A. Demiriz, A framework for visualizing association mining results, Lecture Notes in Computer Science (LNCS), 4263, pp (2006).
G. Ertek, M. Kaya, C. Kefeli, O. Onur and K. Uzer, Scoring and Predicting Risk Preferences, in Behavior Computing: Modeling, Analysis, Mining and Decision, L. Cao and P. S. Yu (Eds), Springer (2012).
C. Borgelt and R. Kruse, Graphical models: methods for data analysis and mining, Wiley (2002).
E.N. Cinicioglu, G. Ertek, D. Demirer and H.E. Yoruk, A framework for automated association mining over multiple databases, in Innovations in Intelligent Systems and Applications (INISTA), International Symposium, IEEE (2011).

130
References
A. Savasere, E. Omiecinski and S. Navathe, Mining for strong negative associations in a large database of customer transactions, in Data Engineering, Proceedings, 14th International Conference, IEEE (1998).
P.N. Tan, V. Kumar and H. Kuno, in Western Users of SAS Software Conference (2001).
I. Herman, G. Melancon and M.S. Marshall, Graph visualization and navigation in information visualization: A survey, Visualization and Comp. Graphics, 6 (2000).
M. Van Kreveld and B. Speckmann, Graph Drawing, Lecture Notes in Computer Science (LNCS), 7034 (2012).
R. Spence, Information Visualization, ACM Press (2001).
H. Ltifi, B. Ayed, A.M. Alimi and S. Lepreux, Survey of information visualization techniques for exploitation in KDD, in Int. Conf. Comp. Sys. and App. (2009).
C. Chen, Information Visualization, Wiley Interdisciplinary Reviews: Computational Statistics, 2 (2010).
W.W. Cooper, L.M. Seiford and K. Tone, Introduction to Data Envelopment Analysis and Its Uses: With DEA Solver Software and References, Springer (2006).
S. Gattoufi, M. Oral and A. Reisman, Data envelopment analysis literature: A bibliography update ( ), Journal of Socio-Econ. Planning Sci., 38, pp (2004).

131
Analytical Benchmarking Meets Data Mining: The SmartDEA Framework, SmartDEA Software, and Case Studies for Industry. Gürdal Ertek. Invited Seminar at A*Star SIMTECH, Singapore, August 2, 2013, Friday

132
Research Questions: How can Data Envelopment Analysis (DEA) results be structured such that they can be analyzed using readily available data mining techniques and software tools? (SmartDEA) How can DEA & information visualization be used together? (Case Study 1, Automotive) Which visualization techniques are appropriate for analyzing DEA results? (Case Study 2, Wind) How can DEA and data mining be integrated with the results of other data mining techniques, specifically association mining results? (Case Study 3, Apparel Retail)

133
Questions?

134
Thank you. Terima Kasih (Malay: thank you). Teşekkürler (Turkish: thank you). :-)
