Classification Data Mining Experiment Department of Computer Science Shenzhen Graduate School Harbin Institute of Technology.


1 Classification Data Mining Experiment Department of Computer Science Shenzhen Graduate School Harbin Institute of Technology

2 Data Mining Resources on the Web
1. A comprehensive site for many KDD resources: http://www.kdnuggets.com/
2. Tutorial-type articles on currently hot topics: http://www.sigkdd.org/
3. The KDD Cup (1997~2010): http://www.sigkdd.org/kddcup/index.php
4. UCI datasets: http://archive.ics.uci.edu/ml/
5. Conferences, journals, and organizations: SIGKDD, ICDM, SIGMOD, SDM, PAKDD; IEEE Transactions on Knowledge and Data Engineering; Data Mining Group

3 Tools Clementine Clementine is a data mining platform originally developed by ISL (Integral Solutions Limited). SPSS acquired ISL in 1999 and went on to integrate and develop Clementine, which became one of its flagship products. The merger of SPSS into IBM followed in 2010. Clementine is a data mining and text analytics workbench used to build predictive models. Its visual interface lets users apply statistical and data mining algorithms without programming.

4 Tools Clementine

5 Workflow1

6 Dataset1: Led7
1. Format: attribute#1, attribute#2, ..., attribute#7, label
2. 3200 instances
3. All attribute values are either 0 or 1
4. Each attribute records whether the corresponding light (segment) is on or off for the displayed decimal digit
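The Led7 record layout described above can be sketched in a few lines of Python. This is only an illustration: the segment order (a to g in the usual seven-segment naming) and the optional `noise` parameter are assumptions, since the slide does not say how the attributes are ordered or whether any bits are flipped.

```python
import random

# Assumed segment order: a=top, b=upper right, c=lower right,
# d=bottom, e=lower left, f=upper left, g=middle.
SEGMENTS = {
    0: (1, 1, 1, 1, 1, 1, 0),
    1: (0, 1, 1, 0, 0, 0, 0),
    2: (1, 1, 0, 1, 1, 0, 1),
    3: (1, 1, 1, 1, 0, 0, 1),
    4: (0, 1, 1, 0, 0, 1, 1),
    5: (1, 0, 1, 1, 0, 1, 1),
    6: (1, 0, 1, 1, 1, 1, 1),
    7: (1, 1, 1, 0, 0, 0, 0),
    8: (1, 1, 1, 1, 1, 1, 1),
    9: (1, 1, 1, 1, 0, 1, 1),
}


def make_instance(digit, noise=0.0, rng=random):
    """Return [attribute#1, ..., attribute#7, label] for one digit.

    `noise` (hypothetical parameter) is the probability that each
    segment is flipped; 0.0 yields the clean pattern.
    """
    bits = [bit ^ 1 if rng.random() < noise else bit
            for bit in SEGMENTS[digit]]
    return bits + [digit]


if __name__ == "__main__":
    for d in range(10):
        print(make_instance(d))
```

Each printed row has the same shape as a line of the dataset: seven 0/1 values followed by the digit label.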

7 Load the file

8 Operations

9 Partitions
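For readers who want to reproduce the partition step outside the tool, a minimal sketch of a random train/test split follows. The function name, the 50/50 default, and the seed are assumptions; the transcript does not record the settings used in the Clementine flow.

```python
import random


def partition(rows, train_fraction=0.5, seed=0):
    """Shuffle a copy of `rows` and split it into (train, test).

    `train_fraction` and `seed` are illustrative defaults, not values
    taken from the slides.
    """
    if not 0.0 <= train_fraction <= 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    rng = random.Random(seed)      # private generator, so the split is repeatable
    shuffled = list(rows)          # leave the caller's list untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    train, test = partition(range(10), train_fraction=0.5, seed=1)
    print(len(train), len(test))
```

Fixing the seed keeps the split identical across runs, which makes later comparisons between classifiers fair.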

10 C5.0

11 View the model

12 Model analysis

13 CHAID

14 View model

15 Dataset2 Label: >50K or <=50K. Attributes: age, workclass, fnlwgt, education, education-num, marital-status, occupation, relationship, race, sex, capital-gain, capital-loss, hours-per-week, native-country
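A record with the attributes listed above could be parsed roughly as follows. Several things here are assumptions rather than facts from the slides: that the file is comma-separated, that the label comes last on each line, and that the attributes in `NUMERIC` hold integers. The sample line is illustrative only.

```python
ATTRIBUTES = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
]

# Assumption: these attributes are integer-valued; the rest are categorical.
NUMERIC = {"age", "fnlwgt", "education-num",
           "capital-gain", "capital-loss", "hours-per-week"}


def parse_record(line):
    """Split one comma-separated line into (attribute dict, label)."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) != len(ATTRIBUTES) + 1:
        raise ValueError("expected %d fields, got %d"
                         % (len(ATTRIBUTES) + 1, len(fields)))
    record = {}
    for name, raw in zip(ATTRIBUTES, fields):
        record[name] = int(raw) if name in NUMERIC else raw
    return record, fields[-1]


if __name__ == "__main__":
    # Illustrative record, not taken from the slides.
    sample = ("45, Private, 100000, Bachelors, 13, Married, Sales, "
              "Husband, White, Male, 0, 0, 40, United-States, >50K")
    print(parse_record(sample))
```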

16 Flow

17 Setting

18 Partitions

19 C5.0 Analysis

20 CHAID Analysis

21 Data cleaning

22 Partition Flow

23 C5.0 and CHAID

24

25 Programming – Use C4.5 or Bayes classifier – Dataset
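As a starting point for the Bayes option, here is a minimal sketch of a naive Bayes classifier for categorical attributes such as the 0/1 values in Led7. The class name and the use of add-one (Laplace) smoothing are choices made for this sketch, not requirements stated in the slides.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes over categorical attributes with add-one smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.total = len(labels)
        # value_counts[(attribute index, label)][value] -> occurrences
        self.value_counts = defaultdict(Counter)
        # distinct values seen per attribute, used in the smoothing term
        self.values = [set() for _ in rows[0]]
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.value_counts[(i, label)][value] += 1
                self.values[i].add(value)
        return self

    def predict(self, row):
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(count / self.total)
            for i, value in enumerate(row):
                seen = self.value_counts[(i, label)][value]
                score += math.log((seen + 1) / (count + len(self.values[i])))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    rows = [(1, 1), (1, 0), (0, 0), (0, 1)]
    labels = ["a", "a", "b", "b"]
    model = CategoricalNaiveBayes().fit(rows, labels)
    print(model.predict((1, 1)))
```

Working in log space avoids underflow when many attribute likelihoods are multiplied together.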

26 Programming Compare your result with the tool.
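One simple way to make that comparison is to score your own predictions on the held-out partition and set the numbers beside what the tool reports. The helper names below are hypothetical, and the slides do not say which metric the comparison should use; accuracy and a confusion count are shown as one reasonable choice.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("cannot score an empty test set")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_counts(y_true, y_pred):
    """Count (true label, predicted label) pairs."""
    return Counter(zip(y_true, y_pred))


if __name__ == "__main__":
    truth = ["a", "b", "a", "b"]
    mine = ["a", "b", "b", "b"]
    print(accuracy(truth, mine))
    print(confusion_counts(truth, mine))
```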

27 Classification Data Mining Experiment Department of Computer Science Shenzhen Graduate School Harbin Institute of Technology

