SLIQ (SUPERVISED LEARNING IN QUEST) STUDENT: NIKOLA TERZIĆ PROFESSOR: VELJKO MILUTINOVIĆ



SLIQ (SUPERVISED LEARNING IN QUEST)
Decision-tree classifier for data mining
Design goals:
    Able to handle large disk-resident training sets
    No restrictions on training-set size

BUILDING TREE
MakeTree(Training Data T)
    Partition(T)
END_MakeTree

Partition(Data S)
    if (all points in S are in the same class) return;
    Evaluate splits for each attribute A;
    Use best split to partition S into S1 and S2;
    Partition(S1);
    Partition(S2);
END_Partition
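The MakeTree/Partition pseudocode above can be sketched in Python roughly as follows. This is a minimal in-memory illustration, not the presenter's C++ implementation: the names `gini`, `best_split`, `partition`, `make_tree` and `classify` are hypothetical, only numeric attributes with tests of the form `A < threshold` are assumed, the input is assumed to be non-empty, and the majority-class fallback for inseparable data is an added guard that the slide does not mention.

```python
from collections import Counter


def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (attribute index, threshold) with the lowest weighted gini, or None."""
    best, best_score = None, float("inf")
    n = len(rows)
    for a in range(len(rows[0])):
        for t in sorted({r[a] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[a] < t]
            right = [y for r, y in zip(rows, labels) if r[a] >= t]
            if not left or not right:
                continue  # a split must put records on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best_score, best = score, (a, t)
    return best


def partition(rows, labels):
    """Recursive Partition(S): stop when S is pure, otherwise split and recurse."""
    if len(set(labels)) == 1:
        return {"leaf": labels[0]}
    split = best_split(rows, labels)
    if split is None:
        # Assumption (not on the slide): identical rows with mixed classes
        # become a majority-class leaf so the recursion terminates.
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    a, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[a] < t]
    right = [(r, y) for r, y in zip(rows, labels) if r[a] >= t]
    return {
        "attr": a,
        "threshold": t,
        "left": partition([r for r, _ in left], [y for _, y in left]),
        "right": partition([r for r, _ in right], [y for _, y in right]),
    }


def make_tree(rows, labels):
    """MakeTree(T): the whole training set is the first partition."""
    return partition(rows, labels)


def classify(tree, row):
    """Walk the tree from the root to a leaf for one record."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["attr"]] < tree["threshold"] else tree["right"]
    return tree["leaf"]
```

This sketch re-scans the data at every node, which is exactly the cost that SLIQ's pre-sorting and attribute lists are meant to avoid; it only illustrates the recursive control flow.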

EVALUATING SPLIT POINTS

PRE-SORTING
Before tree building starts, the values of each attribute are sorted once into an attribute list, so the data does not have to be re-sorted at every node.
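A minimal sketch of this pre-sorting step, assuming the layout described in the SLIQ paper: one attribute list per attribute holding (value, record index) pairs sorted by value, and a single class list holding each record's class label and current leaf. The function name `presort`, the dictionary keys and the use of `0` as the root leaf id are illustrative assumptions; the presenter's implementation performs this step in C++ on the GPU.

```python
def presort(rows, labels):
    """Build per-attribute sorted lists and the class list for a training set.

    rows   : list of tuples, one numeric value per attribute
    labels : list of class labels, parallel to rows
    """
    # Class list: one entry per record; every record starts in the root leaf.
    # The leaf id 0 for the root is an assumption of this sketch.
    class_list = [{"label": y, "leaf": 0} for y in labels]

    # One attribute list per attribute, sorted by value. Each entry keeps the
    # record index so the class list entry can be found during the scan.
    attribute_lists = []
    for a in range(len(rows[0])):
        entries = sorted((r[a], i) for i, r in enumerate(rows))
        attribute_lists.append(entries)
    return attribute_lists, class_list
```

For example, `presort([(30, 65), (23, 15), (40, 75)], ["G", "B", "G"])` yields the first attribute list `[(23, 1), (30, 0), (40, 2)]`, where the second element of each pair points back into the class list.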

FINDING SPLIT POINTS
For each attribute A do
    evaluate splits on attribute A using its attribute list
Keep the split with the lowest GINI index
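For reference, the gini index used to rank splits is usually defined as below. The symbols are assumptions introduced here, not taken from the slides: p_j is the relative frequency of class j in a set S, and a split divides the n records of S into S_1 and S_2 with n_1 and n_2 records.

```latex
\[
  \mathrm{gini}(S) = 1 - \sum_{j} p_j^{2}
\]
\[
  \mathrm{gini}_{\mathrm{split}}(S) = \frac{n_1}{n}\,\mathrm{gini}(S_1) + \frac{n_2}{n}\,\mathrm{gini}(S_2)
\]
% Worked example: S_1 holds 2 records of class a; S_2 holds 1 of class a and 3 of class b.
\[
  \mathrm{gini}(S_1) = 1 - 1^{2} = 0, \qquad
  \mathrm{gini}(S_2) = 1 - \left(\tfrac{1}{4}\right)^{2} - \left(\tfrac{3}{4}\right)^{2} = 0.375
\]
\[
  \mathrm{gini}_{\mathrm{split}}(S) = \tfrac{2}{6}\cdot 0 + \tfrac{4}{6}\cdot 0.375 = 0.25
\]
```

A lower value means purer partitions, which is why the split with the lowest index is kept.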

FINDING SPLIT POINTS
Initialize class histograms of left and right children;
for each record in the attribute list do
    find the corresponding entry in the Class List, and from it the class and leaf node;
    evaluate splitting index for value(A) < record.value;
    update the class histogram in the leaf;
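The scan above can be sketched for a single leaf and a single numeric attribute as follows. This is a simplified illustration under stated assumptions: `best_split_on_attribute` and `gini_from_hist` are hypothetical names, all records are assumed to belong to one leaf (SLIQ itself keeps a histogram per leaf and handles all leaves of a level in the same pass), and a split is only evaluated when the value changes so that ties stay on the same side.

```python
from collections import Counter


def gini_from_hist(hist, n):
    """Gini index computed from a class histogram with n records in total."""
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in hist.values())


def best_split_on_attribute(attribute_list, class_labels):
    """One pass over a pre-sorted attribute list.

    attribute_list : (value, record index) pairs sorted by value
    class_labels   : class label per record index (stand-in for the class list)
    Returns (threshold, score) for the best test `A < threshold`.
    """
    # Initialize histograms: everything starts in the right child.
    left = Counter()
    right = Counter(class_labels[i] for _, i in attribute_list)
    n = len(attribute_list)
    n_left = 0
    best_value, best_score = None, float("inf")
    prev = None

    for value, idx in attribute_list:
        # Evaluate the test A < value. At this point `left` holds exactly the
        # records with a strictly smaller value, provided the value changed.
        if n_left > 0 and value != prev:
            n_right = n - n_left
            score = (n_left * gini_from_hist(left, n_left)
                     + n_right * gini_from_hist(right, n_right)) / n
            if score < best_score:
                best_value, best_score = value, score
        # Update the histograms: move this record from right to left.
        label = class_labels[idx]
        left[label] += 1
        right[label] -= 1
        n_left += 1
        prev = value
    return best_value, best_score
```

Because the histograms are updated incrementally, each candidate split costs time proportional to the number of classes rather than the number of records.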

FINDING SPLIT POINTS

IMPLEMENTATION
C++
Pre-sorting is done on the GPU (CUDA)


RESULTS