Slide 1: Input and Output
Thanks: I. Witten and E. Frank

Slide 2: The weather problem

Conditions for playing an outdoor game:

Outlook   Temperature  Humidity  Windy  Play
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         Normal    False  Yes
...       ...          ...       ...    ...

If outlook = sunny and humidity = high then play = no
If outlook = rainy and windy = true then play = no
If outlook = overcast then play = yes
If humidity = normal then play = yes
If none of the above then play = yes
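The rule set above is ordered: each rule is tried in turn, and the first one that matches (or the final default) decides the class. A minimal sketch in Python; the dictionary-based instance encoding is this sketch's own assumption, with keys taken from the table's attribute names:

```python
# Sketch: the slide's ordered rule set for the weather data as a function.
def play(instance):
    if instance["outlook"] == "sunny" and instance["humidity"] == "high":
        return "no"
    if instance["outlook"] == "rainy" and instance["windy"]:
        return "no"
    if instance["outlook"] == "overcast":
        return "yes"
    if instance["humidity"] == "normal":
        return "yes"
    return "yes"  # "if none of the above then play = yes"

print(play({"outlook": "sunny", "temperature": "hot",
            "humidity": "high", "windy": False}))  # -> no
```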

Slide 3: Classification vs. association rules

A classification rule predicts the value of a pre-specified attribute: the class of an example.

If outlook = sunny and humidity = high then play = no

An association rule predicts the value of an arbitrary attribute, or of a combination of attributes.

If temperature = cool then humidity = normal
If humidity = normal and windy = false then play = yes
If outlook = sunny and play = no then humidity = high
If windy = false and play = no then outlook = sunny and humidity = high
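Because an association rule need not predict the class, it is judged by how often its antecedent applies (coverage) and how often its consequent then also holds (accuracy). A sketch under stated assumptions: the four-row data set below is an illustrative slice of the weather data, not the full table.

```python
# Sketch: coverage and accuracy of an association rule over a data set.
data = [
    {"outlook": "sunny",    "temperature": "hot",  "humidity": "high",   "windy": False, "play": "no"},
    {"outlook": "sunny",    "temperature": "hot",  "humidity": "high",   "windy": True,  "play": "no"},
    {"outlook": "overcast", "temperature": "hot",  "humidity": "high",   "windy": False, "play": "yes"},
    {"outlook": "rainy",    "temperature": "cool", "humidity": "normal", "windy": False, "play": "yes"},
]

def rule_stats(data, antecedent, consequent):
    """Return (coverage, accuracy) of the rule 'if antecedent then consequent'."""
    covered = [x for x in data if all(x[k] == v for k, v in antecedent.items())]
    correct = [x for x in covered if all(x[k] == v for k, v in consequent.items())]
    return len(covered), len(correct) / len(covered) if covered else 0.0

# The consequent can be any attribute, not just the class:
print(rule_stats(data, {"temperature": "cool"}, {"humidity": "normal"}))  # (1, 1.0)
```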

Slide 4: Weather data with mixed attributes

Two attributes (temperature and humidity) now have numeric values:

Outlook   Temperature  Humidity  Windy  Play
Sunny     85           85        False  No
Sunny     80           90        True   No
Overcast  83           86        False  Yes
Rainy     75           80        False  Yes
...       ...          ...       ...    ...

If outlook = sunny and humidity > 83 then play = no
If outlook = rainy and windy = true then play = no
If outlook = overcast then play = yes
If humidity < 85 then play = yes
If none of the above then play = yes
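Numeric attributes change only the form of the tests: equality checks on nominal values become threshold comparisons. A variant of the earlier sketch, again with an assumed dictionary encoding:

```python
# Sketch: the same ordered rule structure, now with numeric humidity tests.
def play(instance):
    if instance["outlook"] == "sunny" and instance["humidity"] > 83:
        return "no"
    if instance["outlook"] == "rainy" and instance["windy"]:
        return "no"
    if instance["outlook"] == "overcast":
        return "yes"
    if instance["humidity"] < 85:
        return "yes"
    return "yes"

print(play({"outlook": "sunny", "temperature": 85,
            "humidity": 85, "windy": False}))  # -> no
```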

Slide 5: The contact lenses data

Age             Spectacle prescription  Astigmatism  Tear production rate  Recommended lenses
Young           Myope                   No           Reduced               None
Young           Myope                   No           Normal                Soft
Young           Myope                   Yes          Reduced               None
Young           Myope                   Yes          Normal                Hard
Young           Hypermetrope            No           Reduced               None
Young           Hypermetrope            No           Normal                Soft
Young           Hypermetrope            Yes          Reduced               None
Young           Hypermetrope            Yes          Normal                Hard
Pre-presbyopic  Myope                   No           Reduced               None
Pre-presbyopic  Myope                   No           Normal                Soft
Pre-presbyopic  Myope                   Yes          Reduced               None
Pre-presbyopic  Myope                   Yes          Normal                Hard
Pre-presbyopic  Hypermetrope            No           Reduced               None
Pre-presbyopic  Hypermetrope            No           Normal                Soft
Pre-presbyopic  Hypermetrope            Yes          Reduced               None
Pre-presbyopic  Hypermetrope            Yes          Normal                None
Presbyopic      Myope                   No           Reduced               None
Presbyopic      Myope                   No           Normal                None
Presbyopic      Myope                   Yes          Reduced               None
Presbyopic      Myope                   Yes          Normal                Hard
Presbyopic      Hypermetrope            No           Reduced               None
Presbyopic      Hypermetrope            No           Normal                Soft
Presbyopic      Hypermetrope            Yes          Reduced               None
Presbyopic      Hypermetrope            Yes          Normal                None

Slide 6: A complete and correct rule set

If tear production rate = reduced then recommendation = none
If age = young and astigmatic = no and tear production rate = normal then recommendation = soft
If age = pre-presbyopic and astigmatic = no and tear production rate = normal then recommendation = soft
If age = presbyopic and spectacle prescription = myope and astigmatic = no then recommendation = none
If spectacle prescription = hypermetrope and astigmatic = no and tear production rate = normal then recommendation = soft
If spectacle prescription = myope and astigmatic = yes and tear production rate = normal then recommendation = hard
If age = young and astigmatic = yes and tear production rate = normal then recommendation = hard
If age = pre-presbyopic and spectacle prescription = hypermetrope and astigmatic = yes then recommendation = none
If age = presbyopic and spectacle prescription = hypermetrope and astigmatic = yes then recommendation = none
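Since every rule in this set assigns a recommendation, the knowledge can be held as data rather than code: an ordered list of (conditions, recommendation) pairs where the first matching rule fires. A sketch; the short attribute keys are this sketch's own naming:

```python
# Sketch: the slide's rule set as ordered (conditions, recommendation) pairs.
RULES = [
    ({"tear_rate": "reduced"}, "none"),
    ({"age": "young", "astigmatic": "no", "tear_rate": "normal"}, "soft"),
    ({"age": "pre-presbyopic", "astigmatic": "no", "tear_rate": "normal"}, "soft"),
    ({"age": "presbyopic", "prescription": "myope", "astigmatic": "no"}, "none"),
    ({"prescription": "hypermetrope", "astigmatic": "no", "tear_rate": "normal"}, "soft"),
    ({"prescription": "myope", "astigmatic": "yes", "tear_rate": "normal"}, "hard"),
    ({"age": "young", "astigmatic": "yes", "tear_rate": "normal"}, "hard"),
    ({"age": "pre-presbyopic", "prescription": "hypermetrope", "astigmatic": "yes"}, "none"),
    ({"age": "presbyopic", "prescription": "hypermetrope", "astigmatic": "yes"}, "none"),
]

def recommend(instance):
    for conditions, lenses in RULES:
        if all(instance[a] == v for a, v in conditions.items()):
            return lenses
    return None  # unreachable: the rule set is complete for this data

print(recommend({"age": "young", "prescription": "myope",
                 "astigmatic": "yes", "tear_rate": "normal"}))  # -> hard
```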

Slide 7: A decision tree for this problem

[Figure: decision tree for the contact lens data]

Slide 8: Predicting CPU performance

Here the class, relative performance (PRP), is numeric. Attributes: cycle time in ns (MYCT), minimum and maximum main memory in KB (MMIN, MMAX), cache size in KB (CACH), and minimum and maximum number of channels (CHMIN, CHMAX).

MYCT  MMIN  MMAX  CACH  CHMIN  CHMAX  PRP
...   ...   ...   ...   ...    ...    ...

Linear regression function (coefficients as given by Witten & Frank):

PRP = -55.9 + 0.0489 MYCT + 0.0153 MMIN + 0.0056 MMAX + 0.6410 CACH - 0.2700 CHMIN + 1.480 CHMAX
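A function of this form is fitted by ordinary least squares. A minimal sketch with NumPy; the eight data rows and target values below are made up for illustration and are not the real CPU data set:

```python
import numpy as np

# Sketch: fit PRP = w0 + w1*MYCT + ... + w6*CHMAX by least squares.
# Columns: MYCT, MMIN, MMAX, CACH, CHMIN, CHMAX. Rows are illustrative.
X = np.array([
    [125,   256,  6000, 256, 16, 128],
    [ 29,  8000, 32000,  32,  8,  32],
    [ 29,  8000, 16000,  32,  8,  16],
    [ 26,  8000, 32000,  64,  8,  32],
    [ 23, 16000, 32000,  64, 16,  32],
    [400,  1000,  3000,   0,  1,   2],
    [ 50,  2000,  8000,  65,  1,   8],
    [167,   524,  2000,   8,  4,  15],
], dtype=float)
prp = np.array([198, 269, 172, 318, 367, 40, 91, 25], dtype=float)

A = np.hstack([np.ones((len(X), 1)), X])     # prepend an intercept column
w, *_ = np.linalg.lstsq(A, prp, rcond=None)  # least-squares weight vector
print("intercept:", round(w[0], 2))
print("weights:  ", np.round(w[1:], 4))
```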

Slide 9: Data from labor negotiations

Attribute                        Type                         1     2     3    ...  40
Duration                         (number of years)            1     2     3         2
Wage increase first year         percentage                   2%    4%    4.3%      4.5
Wage increase second year        percentage                   ?     5%    4.4%      4.0
Wage increase third year         percentage                   ?     ?     ?         ?
Cost of living adjustment        {none, tcf, tc}              none  tcf   ?         none
Working hours per week           (number of hours)            28    35    38        40
Pension                          {none, ret-allw, empl-cntr}  none  ?     ?         ?
Standby pay                      percentage                   ?     13%   ?         ?
Shift-work supplement            percentage                   ?     5%    4%        4
Education allowance              {yes, no}                    yes   ?     ?         ?
Statutory holidays               (number of days)             11    15    12        12
Vacation                         {below-avg, avg, gen}        avg   gen   gen       avg
Long-term disability assistance  {yes, no}                    no    ?     ?         yes
Dental plan contribution         {none, half, full}           none  ?     full      full
Bereavement assistance           {yes, no}                    no    ?     ?         yes
Health plan contribution         {none, half, full}           none  ?     full      half
Acceptability of contract        {good, bad}                  bad   good  good      good
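In practice, mixed-type data like this is loaded by declaring "?" to be the missing-value marker, so numeric and nominal attributes keep their types. A sketch with pandas; the abbreviated column names and the four-contract slice are this sketch's own assumptions:

```python
from io import StringIO
import pandas as pd

# Sketch: read a slice of the labor negotiation data, treating '?' as missing.
raw = """duration,wage_incr_1,cola,working_hours,vacation,acceptability
1,2%,none,28,avg,bad
2,4%,tcf,35,gen,good
3,4.3%,?,38,gen,good
2,4.5,none,40,avg,good"""

df = pd.read_csv(StringIO(raw), na_values="?")  # '?' -> NaN
print(df.dtypes)        # numeric columns inferred; nominal ones stay object
print(df.isna().sum())  # count of missing values per attribute
```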

Slide 10: Decision trees for the labor data

[Figure: decision trees learned from the labor negotiations data]

Slide 11: Instance-based representation

- The simplest form of learning is rote learning: the training instances themselves represent the knowledge.
- To classify a new instance, search the training set for the instance that most closely resembles it.
- This is called instance-based learning; the similarity function defines what is "learned".
- Because all the work is deferred to classification time, it is also called lazy learning.
- Methods: nearest-neighbor, k-nearest-neighbor, ...
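A minimal k-nearest-neighbor sketch over two numeric attributes (temperature and humidity, as in the mixed-attribute weather data); the tuple encoding and the query point are illustrative:

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """train: list of (vector, label) pairs; query: a vector.

    'Training' is just storing the instances -- the lazy part; the
    Euclidean distance below plays the role of the similarity function."""
    nearest = sorted(train, key=lambda vl: math.dist(vl[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((85, 85), "no"), ((80, 90), "no"),
         ((83, 86), "yes"), ((75, 80), "yes")]
print(knn_classify(train, (78, 82), k=3))  # majority vote of 3 neighbors
```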

Slide 12: Learning prototypes / case-based reasoning

Only those instances involved in a decision need to be stored.

Slide 13: Representing clusters I

[Figure: a simple 2-D representation, a Venn diagram, and overlapping clusters]

Slide 14: Representing clusters II

[Figure: probabilistic assignment of instances a, b, c, d, e, f, g, h to clusters, and a dendrogram over the same instances]

NB: "dendron" is the Greek word for tree.
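A dendrogram like the one on this slide is produced by agglomerative hierarchical clustering, which repeatedly merges the closest pair of clusters. A sketch with SciPy; the eight random 2-D points standing in for instances a through h are an assumption of this sketch:

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Sketch: agglomerative clustering of eight points labelled a..h.
points = np.random.RandomState(0).rand(8, 2)
Z = linkage(points, method="single")  # merge tree, nearest pair first

# no_plot=True computes the tree layout without needing matplotlib;
# with matplotlib available, dendrogram(Z, labels=...) draws the figure.
tree = dendrogram(Z, labels=list("abcdefgh"), no_plot=True)
print("leaf order:", tree["ivl"])  # leaves as they'd appear left to right
```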