1 Top 10 Algorithms in Data Mining Xindong Wu ( 吴信东 ) Department of Computer Science University of Vermont, USA; Hong Kong Polytechnic University; 合肥工业大学计算机与信息学院.

Slides:



Advertisements
Similar presentations
Adjusting Active Basis Model by Regularized Logistic Regression
Advertisements

1 Data Mining: Opportunities and Challenges Xindong Wu University of Vermont, USA; Hefei University of Technology, China ( 合肥工业大学计算机应用长江学者讲座教授 )
CS583 – Data Mining and Text Mining
ICDM'06 Panel 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant (description by C. Faloutsos)
Rakesh Agrawal Ramakrishnan Srikant
2015/6/1Course Introduction1 Welcome! MSCIT 521: Knowledge Discovery and Data Mining Qiang Yang Hong Kong University of Science and Technology
Our New Progress on Frequent/Sequential Pattern Mining We develop new frequent/sequential pattern mining methods Performance study on both synthetic and.
Multi-dimensional Sequential Pattern Mining
Item Selection By “Hub-Authority” Profit Ranking Presented by: Thomas Su.
New Geometric Methods of Mixture Models for Interactive Visualization PIs: Jia Li, Xiaolong (Luke) Zhang, Bruce Lindsay Department of Statistics College.
Machine Learning in R and its use in the statistical offices
Data Mining Association Rules Yao Meng Hongli Li Database II Fall 2002.
Slides are based on Negnevitsky, Pearson Education, Lecture 14 Data mining and knowledge discovery n Introduction, or what is data mining? n Data.
CSE 574 – Artificial Intelligence II Statistical Relational Learning Instructor: Pedro Domingos.
CS 590M Fall 2001: Security Issues in Data Mining Lecture 5: Association Rules, Sequential Associations.
1 Data Mining Techniques Instructor: Ruoming Jin Fall 2006.
Association Rule Mining (Some material adapted from: Mining Sequential Patterns by Karuna Pande Joshi)‏
Pattern-growth Methods for Sequential Pattern Mining: Principles and Extensions Jiawei Han (UIUC) Jian Pei (Simon Fraser Univ.)
An Overview of Our Course:
1 USTC, January 9, 2007 How to Write and Publish Research Papers for the Premier Forums in Knowledge & Data Engineering Xindong Wu Department of Computer.
Oracle Data Mining Ying Zhang. Agenda Data Mining Data Mining Algorithms Oracle DM Demo.
Data Mining: Concepts and Techniques — Chapter 1 — — Introduction —
1 A Topic Modeling Approach and its Integration into the Random Walk Framework for Academic Search 1 Jie Tang, 2 Ruoming Jin, and 1 Jing Zhang 1 Knowledge.
Data Mining Information Retrieval Web Search
August 29, 2015 Data Mining: Concepts and Techniques 1 Chapter 1. Introduction Motivation: Why data mining? What is data mining? Data Mining: On what kind.
A Short Introduction to Sequential Data Mining
What Is Sequential Pattern Mining?
1 Mining with Noise Knowledge: Error Awareness Data Mining Xindong Wu Department of Computer Science University of Vermont, USA; Hong Kong Polytechnic.
CS523 INFORMATION RETRIEVAL COURSE INTRODUCTION YÜCEL SAYGIN SABANCI UNIVERSITY.
Storytelling and Clustering for Cellular Signaling Pathways M. Shahriar Hossain, Monika Akbar, Nicholas F. Polys Department of Computer Science, Virginia.
October 2, 2015 Data Mining: Concepts and Techniques 1 Data Mining: Concepts and Techniques — Chapter 8 — 8.3 Mining sequence patterns in transactional.
1 Multi-dimensional Sequential Pattern Mining Helen Pinto, Jiawei Han, Jian Pei, Ke Wang, Qiming Chen, Umeshwar Dayal ~From: 10th ACM Intednational Conference.
K nearest neighbor classification Presented by Vipin Kumar University of Minnesota Based on discussion in "Intro to Data Mining" by Tan,
Text Feature Extraction. Text Classification Text classification has many applications –Spam detection –Automated tagging of streams of news articles,
Overview of CS Class Jiawei Han Department of Computer Science
ICDM 2003 Review Data Analysis - with comparison between 02 and 03 - Xindong Wu and Alex Tuzhilin Analyzed by Shusaku Tsumoto.
Fast Algorithms for Mining Association Rules Rakesh Agrawal and Ramakrishnan Srikant VLDB '94 presented by kurt partridge cse 590db oct 4, 1999.
University at BuffaloThe State University of New York Lei Shi Department of Computer Science and Engineering State University of New York at Buffalo Frequent.
HOW TO HAVE A GOOD PAPER Tran Minh Quang. What and Why Do We Write? Letter Proposal Report for an assignments Research paper Thesis ….
Tallahassee, Florida, 2016 CIS4930 Introduction to Data Mining Introduction Peixiang Zhao.
Algorithms Emerging IT Fall 2015 Team 1 Avenbaum, Hamilton, Mirilla, Pisano.
February 1, 2016 Data Mining: Concepts and Techniques 1 Data Mining: Concepts and Techniques — Chapter 1 — — Introduction — Prof. Jianlin Cheng Department.
CSCE 5073 Section 001: Data Mining Spring Overview Class hour 12:30 – 1:45pm, Tuesday & Thur, JBHT 239 Office hour 2:00 – 4:00pm, Tuesday & Thur,
Information Security Prof. David Zhang Department of Computing Hong Kong Polytechnic University Tel: PQ809
PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth Jiawei Han, Jian Pei, Helen Pinto, Behzad Mortazavi-Asl, Qiming Chen,
Distinguishing the Forest from the Trees 2006 CAS Ratemaking Seminar Richard Derrig, PhD, Opal Consulting Louise Francis, FCAS,
CSC 4740 / 6740 Fall 2016 Data Mining Instructor: Yubao Wu Fall 2016.
Slides related to: Data Mining: Concepts and Techniques — Chapter 1 and 2 — — Introduction and Data preprocessing — Jiawei Han and Micheline Kamber.
CS583 – Data Mining and Text Mining
Data Mining: Concepts and Techniques — Chapter 1 — — Introduction —
Term Project Proposal By J. H. Wang Apr. 7, 2017.
International Conference on Mathematical Modelling and Computational Methods in Science and Engineering, Alagappa University, Karaikudi, India February.
Large Graph Mining: Power Tools and a Practitioner’s guide
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
Data Mining 101 with Scikit-Learn
Course Summary (Lecture for CS410 Intro Text Info Systems)
Data Mining: Concepts and Techniques (3rd ed
Machine Learning University of Eastern Finland
Beijing University of Technology, China, September 28, 2006
Special Topics in Data Mining Applications Focus on: Text Mining
شبکه های اجتماعی Social Networks
Exploring the Power of Links in Data Mining
Overview Advanced AI 1994 “AI” Turing Award Lectures AI and the Web
Algorithms for Data Analytics
Information retrieval and PageRank
Objectives Data Mining Course
Dept. of Computer Science University of Liverpool
CSCE 4143 Section 001: Data Mining Spring 2019.
Election Donor Records Linkage
Presentation transcript:

1 Top 10 Algorithms in Data Mining Xindong Wu ( 吴信东 ) Department of Computer Science University of Vermont, USA; Hong Kong Polytechnic University; 合肥工业大学计算机与信息学院

2 Top 10 Algorithms in Data Mining: Xindong Wu and Vipin Kumar “ Top 10 Algorithms in Data Mining ” by the IEEE ICDM Conference 1. The 3-step identification process identified candidates in 10 data mining topics 3. The top 10 algorithms 4. Follow-up actions

3 Top 10 Algorithms in Data Mining: Xindong Wu and Vipin Kumar The 3-Step Identification Process 1. Nominations. ACM KDD Innovation Award and IEEE ICDM Research Contributions Award winners were invited in September 2006 to each nominate up to 10 best-known algorithms nEach nomination was asked to come with the following information: (a) the algorithm name, (b) a brief justification, and (c) a representative publication reference nEach nominated algorithm should have been widely cited and used by other researchers in the field, and the nominations from each nominator as a group should have a reasonable representation of the different areas in data mining nAll except one in this distinguished set of award winners responded.

4 Top 10 Algorithms in Data Mining: Xindong Wu and Vipin Kumar The 3-Step Identification Process (2) 2.Verification. Each nomination was verified for its citations on Google Scholar in late October 2006, and those nominations that did not have at least 50 citations were removed. n18 nominations survived and were then organized in 10 topics. 3.Voting by the wider community. –(a) Program Committee members of KDD-06, ICDM '06, and SDM '06 and –(b) ACM KDD Innovation Award and IEEE ICDM Research Contributions Award winners –The top 10 algorithms are ranked by their number of votes, and when there is a tie, the alphabetic order is used.

5 Top 10 Algorithms in Data Mining: Xindong Wu and Vipin Kumar Agenda 1. The 3-step identification process identified candidates (in 10 data mining topics) 3. The top 10 algorithms 4. Follow-up actions

6 Top 10 Algorithms in Data Mining: Xindong Wu and Vipin Kumar 18 Identified Candidates n Classification –#1. C4.5: Quinlan, J. R C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc. –#2. CART: L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth, Belmont, CA, –#3. K Nearest Neighbours (kNN): Hastie, T. and Tibshirani, R Discriminant Adaptive Nearest Neighbor Classification. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI). 18, 6 (Jun. 1996), –#4. Naive Bayes: Hand, D.J., Yu, K., Idiot's Bayes: Not So Stupid After All? Internat. Statist. Rev. 69, n Statistical Learning –#5. SVM: Vapnik, V. N The Nature of Statistical Learning Theory. Springer-Verlag New York, Inc. –#6. EM: McLachlan, G. and Peel, D. (2000). Finite Mixture Models. J. Wiley, New York. n Association Analysis –#7. Apriori: Rakesh Agrawal and Ramakrishnan Srikant. Fast Algorithms for Mining Association Rules. In VLDB '94. –#8. FP-Tree: Han, J., Pei, J., and Yin, Y Mining frequent patterns without candidate generation. In SIGMOD '00. n Link Mining –#9. PageRank: Brin, S. and Page, L The anatomy of a large-scale hypertextual Web search engine. In WWW-7, –#10. HITS: Kleinberg, J. M Authoritative sources in a hyperlinked environment. In Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, 1998.

7 Top 10 Algorithms in Data Mining: Xindong Wu and Vipin Kumar 18 Candidates (2) n Clustering –#11. K-Means: MacQueen, J. B., Some methods for classification and analysis of multivariate observations, in Proc. 5th Berkeley Symp. Mathematical Statistics and Probability, –#12. BIRCH: Zhang, T., Ramakrishnan, R., and Livny, M BIRCH: an efficient data clustering method for very large databases. In SIGMOD '96. n Bagging and Boosting –#13. AdaBoost: Freund, Y. and Schapire, R. E A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55, 1 (Aug. 1997), n Sequential Patterns –#14. GSP: Srikant, R. and Agrawal, R Mining Sequential Patterns: Generalizations and Performance Improvements. In Proceedings of the 5th International Conference on Extending Database Technology, –#15. PrefixSpan: J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal and M-C. Hsu. PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth. In ICDE '01. n Integrated Mining –#16. CBA: Liu, B., Hsu, W. and Ma, Y. M. Integrating classification and association rule mining. KDD- 98. n Rough Sets –#17. Finding reduct: Zdzislaw Pawlak, Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Academic Publishers, Norwell, MA, n Graph Mining –#18. gSpan: Yan, X. and Han, J gSpan: Graph-Based Substructure Pattern Mining. In ICDM '02.

8 Top 10 Algorithms in Data Mining: Xindong Wu and Vipin Kumar Agenda 1. The 3-step identification process identified candidates 3. The top 10 algorithms 4. Follow-up actions

9 Top 10 Algorithms in Data Mining: Xindong Wu and Vipin Kumar The Top 10 Algorithms n #1: C4.5, presented by Hiroshi Motoda n #2: K-Means, presented by Joydeep Ghosh n #3: SVM, presented by Qiang Yang n #4: Apriori, presented by Christos Faloutsos n #5: EM, presented by Joydeep Ghosh n #6: PageRank, presented by Christos Faloutsos n #7: AdaBoost, presented by Zhi-Hua Zhou n #7: kNN, presented by Vipin Kumar n #7: Naive Bayes, presented by Qiang Yang n #10: CART, presented by Dan Steinberg

10 Top 10 Algorithms in Data Mining: Xindong Wu and Vipin Kumar Agenda 1. The 3-step identification process identified candidates 3. The top 10 algorithms 4. Follow-up actions

11 Top 10 Algorithms in Data Mining: Xindong Wu and Vipin Kumar Open Votes for Top Algorithms n Top 3 Algorithms: –C4.5: 52 votes –SVM: 50 votes –Apriori: 33 votes n Top 10 Algorithms –The top 10 algorithms voted from the 18 candidates at the panel are the same as the voting results from the 3-step identification process.

12 Top 10 Algorithms in Data Mining: Xindong Wu and Vipin Kumar Follow-Up Actions n A survey paper on Top 10 Algorithms in Data Mining (X. Wu, V. Kumar, J.R. Quinlan, et al., Knowledge and Information Systems, 14(1), 2008, 1~37) –Written by the original authors and presenters n How to make a good use of these top 10 algorithms? –Curriculum development –A textbook on The Top 10 Algorithms in Data Mining, Chapman and Hall/CRC Press, April 2009 n Various questions on these 10 algorithms? –Why not this algorithm or that topic? n Will the votes change in the future? –Sure, let’s work together to make positive changes!