Artificial Intelligence 8. Supervised and unsupervised learning Japan Advanced Institute of Science and Technology (JAIST) Yoshimasa Tsuruoka

Outline
– Supervised learning
– Naive Bayes classifier
– Unsupervised learning
– Clustering
– Lecture slides

Supervised and unsupervised learning
Supervised learning
– Each instance is assigned a label
– Classification, regression
– Training data need to be created manually
Unsupervised learning
– Each instance is just a vector of attribute values
– Clustering
– Pattern mining

Naive Bayes classifier
Chapter 6.9 of Mitchell, T., Machine Learning (1997)
Naive Bayes classifier
– Outputs probabilities
– Easy to implement
– Assumes conditional independence between features
– Efficient learning and classification

Bayes' theorem
Thomas Bayes (1702–1761)
The reverse conditional probability can be calculated from the original conditional probability and the prior probabilities.
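The equation on the slide did not survive the transcript; in standard notation, Bayes' theorem for a hypothesis h and observed data D is

$$P(h \mid D) = \frac{P(D \mid h)\,P(h)}{P(D)}$$

where $P(h)$ is the prior probability of the hypothesis, $P(D \mid h)$ is the likelihood, and $P(h \mid D)$ is the posterior, i.e. the "reverse" conditional probability.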

Bayes' theorem
Can we know the probability of having cancer from the result of a medical test?

Bayes' theorem
The probability of actually having cancer is not very high.
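The numbers on the slide are not in the transcript; Mitchell (1997), which this lecture follows, uses the illustrative figures $P(\text{cancer}) = 0.008$, $P(+ \mid \text{cancer}) = 0.98$, and $P(+ \mid \neg\text{cancer}) = 0.03$. Assuming those values, a positive test result gives

$$P(\text{cancer} \mid +) = \frac{0.98 \times 0.008}{0.98 \times 0.008 + 0.03 \times 0.992} = \frac{0.0078}{0.0078 + 0.0298} \approx 0.21$$

so even after a positive test, the probability of actually having cancer is only about 21%, because the disease is rare to begin with.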

Naive Bayes classifier
Assume that the features are conditionally independent given the class. Starting from Bayes' theorem, the denominator is constant across classes and can be dropped, and conditional independence factorizes the likelihood.
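Written out (the slide's equations, reconstructed in standard notation): for an instance with feature values $a_1, \dots, a_n$ and candidate classes $v_j$,

$$P(v_j \mid a_1, \dots, a_n) = \frac{P(a_1, \dots, a_n \mid v_j)\,P(v_j)}{P(a_1, \dots, a_n)} \;\propto\; P(v_j) \prod_{i} P(a_i \mid v_j)$$

and the classifier outputs $v_{NB} = \arg\max_{v_j} P(v_j) \prod_i P(a_i \mid v_j)$.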

Training data

Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No

Naive Bayes classifier: classifying a new instance
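The instance shown on the slide is not in the transcript; Mitchell's textbook example, which the surrounding slides follow, classifies the instance (Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong), and the worked numbers below assume that instance.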

Class prior probability
Maximum likelihood estimation
– Just counting the number of occurrences in the training data
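From the 14 training examples above (9 Yes, 5 No), the maximum likelihood estimates are

$$P(\text{PlayTennis}=\text{Yes}) = \frac{9}{14} \approx 0.64, \qquad P(\text{PlayTennis}=\text{No}) = \frac{5}{14} \approx 0.36$$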

Conditional probabilities of features
Maximum likelihood estimation
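Counting in the same table, the feature likelihoods needed for the assumed instance are, for example,

$$P(\text{Wind}=\text{Strong} \mid \text{Yes}) = \frac{3}{9}, \qquad P(\text{Wind}=\text{Strong} \mid \text{No}) = \frac{3}{5}$$

and similarly $P(\text{Sunny} \mid \text{Yes}) = 2/9$, $P(\text{Sunny} \mid \text{No}) = 3/5$, $P(\text{Cool} \mid \text{Yes}) = 3/9$, $P(\text{Cool} \mid \text{No}) = 1/5$, $P(\text{High} \mid \text{Yes}) = 3/9$, $P(\text{High} \mid \text{No}) = 4/5$.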

Class posterior probabilities
Normalize the unnormalized class scores so that they sum to one.
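Putting the pieces together for the assumed instance:

$$P(\text{Yes}) \prod_i P(a_i \mid \text{Yes}) = \frac{9}{14} \cdot \frac{2}{9} \cdot \frac{3}{9} \cdot \frac{3}{9} \cdot \frac{3}{9} \approx 0.0053$$
$$P(\text{No}) \prod_i P(a_i \mid \text{No}) = \frac{5}{14} \cdot \frac{3}{5} \cdot \frac{1}{5} \cdot \frac{4}{5} \cdot \frac{3}{5} \approx 0.0206$$

Normalizing, $P(\text{No} \mid \text{instance}) = 0.0206 / (0.0206 + 0.0053) \approx 0.80$, so the classifier predicts PlayTennis = No.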

Smoothing
Maximum likelihood estimation
– Estimated probabilities are not reliable when $n_c$, the count of the feature value within the class, is small
m-estimate of probability:

$$\frac{n_c + m\,p}{n + m}$$

where $p$ is a prior probability for the estimate and $m$ is the equivalent sample size, i.e. the weight given to the prior.

Text classification with a Naive Bayes classifier
Text classification
– Automatic classification of news articles
– Spam filtering
– Sentiment analysis of product reviews
– etc.

There were doors all round the hall, but they were all locked; and when Alice had been all the way down one side and up the other, trying every door, she walked sadly down the middle, wondering how she was ever to get out again.

Conditional probabilities of words
The probability of, say, the second word of the document being the word "were" cannot be estimated reliably, so we ignore word positions and apply m-estimate smoothing. A code sketch of the resulting classifier follows.
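The slides contain no code; the following is a minimal Python sketch of a position-independent ("bag of words") Naive Bayes text classifier with m-estimate smoothing, using Mitchell's choice of $p = 1/|\text{Vocabulary}|$ as the prior. The function names and the toy data are our own.

```python
import math
from collections import Counter

def train_nb(docs, labels, m=1.0):
    # docs: list of token lists; labels: the class of each document
    classes = sorted(set(labels))
    vocab = {w for doc in docs for w in doc}
    p = 1.0 / len(vocab)                      # prior for the m-estimate
    log_prior, log_cond = {}, {}
    for c in classes:
        class_docs = [d for d, l in zip(docs, labels) if l == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(w for d in class_docs for w in d)
        n = sum(counts.values())              # total word occurrences in class c
        # m-estimate: (n_c + m*p) / (n + m), never zero even for unseen words
        log_cond[c] = {w: math.log((counts[w] + m * p) / (n + m)) for w in vocab}
    return log_prior, log_cond, vocab

def classify(doc, log_prior, log_cond, vocab):
    # Word positions are ignored: every occurrence contributes log P(word | class)
    return max(log_prior,
               key=lambda c: log_prior[c] +
                             sum(log_cond[c][w] for w in doc if w in vocab))

# Toy usage (hypothetical data)
docs = [["free", "money", "now"], ["meeting", "at", "noon"],
        ["free", "offer"], ["project", "meeting"]]
labels = ["spam", "ham", "spam", "ham"]
model = train_nb(docs, labels, m=1.0)
print(classify(["free", "meeting"], *model))
```

Summing log probabilities instead of multiplying raw probabilities avoids numerical underflow on long documents.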

Unsupervised learning
No "correct" output for each instance
Clustering
– Merging "similar" instances into a group
– Hierarchical clustering, k-means, etc.
Pattern mining
– Discovering frequent patterns from a large amount of data
– Association rules, graph mining, etc.

Clustering Organize instances into groups whose members are similar in some way

Agglomerative clustering
Define a distance between every pair of instances
– e.g. cosine similarity (turned into a distance)
Algorithm (a Python sketch follows)
1. Start with every instance representing a singleton cluster
2. Merge the closest two clusters into a single cluster
3. Repeat until all instances are merged into one cluster
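A minimal Python sketch of this loop, assuming Euclidean distance and single-link merging (the linkage choices are defined two slides below); the data and function names are our own.

```python
import math

def euclidean(a, b):
    # Euclidean distance between two points
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_link(c1, c2, points):
    # Single-link cluster distance: the closest pair across the two clusters
    return min(euclidean(points[i], points[j]) for i in c1 for j in c2)

def agglomerative(points, num_clusters=1):
    # Step 1: every instance starts as a singleton cluster (stored as index lists)
    clusters = [[i] for i in range(len(points))]
    # Step 3: repeat until the desired number of clusters remains
    while len(clusters) > num_clusters:
        # Step 2: find and merge the closest two clusters
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]], points))
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters

points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
print(agglomerative(points, num_clusters=2))  # -> [[0, 1, 2, 3], [4]]
```

This naive version is O(n^3); real implementations cache the pairwise-distance matrix and update it after each merge.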

Agglomerative clustering: example (dendrogram figure)

Defining a distance between clusters
– single link: the minimum distance between any pair of instances, one from each cluster
– complete link: the maximum such pairwise distance
– group average: the average of all pairwise distances
– centroid: the distance between the two cluster centroids
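For reference, SciPy's hierarchical clustering offers these four linkage choices; a small demo (the array values are arbitrary):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [10.0, 0.0]])

# The 'method' argument selects the inter-cluster distance:
#   'single'   = single link, 'complete' = complete link,
#   'average'  = group average, 'centroid' = centroid distance
for method in ["single", "complete", "average", "centroid"]:
    Z = linkage(X, method=method)  # Z encodes the dendrogram as a merge table
    print(method, Z[-1])           # the final merge performed for each linkage
```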

k-means algorithm
Objective: minimize the within-cluster sum of squared distances to the centroids $c_1, \dots, c_k$:

$$J = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - c_i \rVert^2$$

Algorithm (a Python sketch follows)
1. Choose k centroids $c_1, \dots, c_k$ randomly
2. Assign each instance to the cluster with the closest centroid
3. Update the centroids and go back to Step 2 (stop when the assignments no longer change)
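A self-contained Python sketch of the loop above, assuming Euclidean distance; the names and toy data are our own. In practice k-means is usually run several times from different random initializations, since the result depends on Step 1.

```python
import random

def squared_dist(a, b):
    # Squared Euclidean distance between two points
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(cluster):
    # Component-wise mean of a non-empty list of points
    n = len(cluster)
    return tuple(sum(p[i] for p in cluster) / n for i in range(len(cluster[0])))

def kmeans(points, k, max_iter=100, seed=0):
    random.seed(seed)
    # Step 1: choose k centroids randomly (here: k distinct data points)
    centroids = random.sample(points, k)
    for _ in range(max_iter):
        # Step 2: assign each instance to the cluster with the closest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 3: update the centroids; stop when nothing moves
        new_centroids = [centroid(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
print(kmeans(points, k=2))
```

Each iteration can only decrease the objective J, so the algorithm always converges, though possibly to a local minimum.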