Classification & Clustering MSE 2400 EaLiCaRA Dr. Tom Way.

Slides:



Advertisements
Similar presentations
Machine Learning (ML) Classification Tim Humphrey LexisNexis 21 June 2001.
Advertisements

Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki
DECISION TREES. Decision trees  One possible representation for hypotheses.
CPSC 502, Lecture 15Slide 1 Introduction to Artificial Intelligence (AI) Computer Science cpsc502, Lecture 15 Nov, 1, 2011 Slide credit: C. Conati, S.
Data Mining Classification: Alternative Techniques
Data Mining Classification: Alternative Techniques
Support Vector Machines
Intelligent Environments1 Computer Science and Engineering University of Texas at Arlington.
An Overview of Machine Learning
Indian Statistical Institute Kolkata
Classification Techniques: Decision Tree Learning
Chapter 7 – Classification and Regression Trees
Classification and Decision Boundaries
K nearest neighbor and Rocchio algorithm
1 Chapter 10 Introduction to Machine Learning. 2 Chapter 10 Contents (1) l Training l Rote Learning l Concept Learning l Hypotheses l General to Specific.
Neural Networks Marco Loog.
Induction of Decision Trees
Data Mining with Decision Trees Lutz Hamel Dept. of Computer Science and Statistics University of Rhode Island.
Introduction to Machine Learning course fall 2007 Lecturer: Amnon Shashua Teaching Assistant: Yevgeny Seldin School of Computer Science and Engineering.
1 MACHINE LEARNING TECHNIQUES IN IMAGE PROCESSING By Kaan Tariman M.S. in Computer Science CSCI 8810 Course Project.
KNN, LVQ, SOM. Instance Based Learning K-Nearest Neighbor Algorithm (LVQ) Learning Vector Quantization (SOM) Self Organizing Maps.
Review Rong Jin. Comparison of Different Classification Models  The goal of all classifiers Predicating class label y for an input x Estimate p(y|x)
Part I: Classification and Bayesian Learning
Methods in Medical Image Analysis Statistics of Pattern Recognition: Classification and Clustering Some content provided by Milos Hauskrecht, University.
DATA MINING : CLASSIFICATION. Classification : Definition  Classification is a supervised learning.  Uses training sets which has correct answers (class.
Copyright R. Weber Machine Learning, Data Mining ISYS370 Dr. R. Weber.
Data Mining Joyeeta Dutta-Moscato July 10, Wherever we have large amounts of data, we have the need for building systems capable of learning information.
Bayesian Networks. Male brain wiring Female brain wiring.
Midterm Review Rao Vemuri 16 Oct Posing a Machine Learning Problem Experience Table – Each row is an instance – Each column is an attribute/feature.
Machine Learning1 Machine Learning: Summary Greg Grudic CSCI-4830.
Using Neural Networks in Database Mining Tino Jimenez CS157B MW 9-10:15 February 19, 2009.
Artificial Neural Nets and AI Connectionism Sub symbolic reasoning.
IE 585 Introduction to Neural Networks. 2 Modeling Continuum Unarticulated Wisdom Articulated Qualitative Models Theoretic (First Principles) Models Empirical.
Outline What Neural Networks are and why they are desirable Historical background Applications Strengths neural networks and advantages Status N.N and.
Chapter 9 – Classification and Regression Trees
NEURAL NETWORKS FOR DATA MINING
Chapter 7 Neural Networks in Data Mining Automatic Model Building (Machine Learning) Artificial Intelligence.
Introduction to machine learning and data mining 1 iCSC2014, Juan López González, University of Oviedo Introduction to machine learning Juan López González.
An Introduction to Support Vector Machine (SVM) Presenter : Ahey Date : 2007/07/20 The slides are based on lecture notes of Prof. 林智仁 and Daniel Yeung.
Data Mining Practical Machine Learning Tools and Techniques Chapter 4: Algorithms: The Basic Methods Section 4.6: Linear Models Rodney Nielsen Many of.
1 Machine Learning 1.Where does machine learning fit in computer science? 2.What is machine learning? 3.Where can machine learning be applied? 4.Should.
Classification Techniques: Bayesian Classification
Learning from observations
ASSESSING LEARNING ALGORITHMS Yılmaz KILIÇASLAN. Assessing the performance of the learning algorithm A learning algorithm is good if it produces hypotheses.
1 Chapter 10 Introduction to Machine Learning. 2 Chapter 10 Contents (1) l Training l Rote Learning l Concept Learning l Hypotheses l General to Specific.
Decision Trees Binary output – easily extendible to multiple output classes. Takes a set of attributes for a given situation or object and outputs a yes/no.
Artificial Intelligence, Expert Systems, and Neural Networks Group 10 Cameron Kinard Leaundre Zeno Heath Carley Megan Wiedmaier.
Classification (slides adapted from Rob Schapire) Eran Segal Weizmann Institute.
Supervised Learning. CS583, Bing Liu, UIC 2 An example application An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc)
Data Mining and Decision Support
COMP 2208 Dr. Long Tran-Thanh University of Southampton Decision Trees.
Copyright Paula Matuszek Kinds of Machine Learning.
Supervised Machine Learning: Classification Techniques Chaleece Sandberg Chris Bradley Kyle Walsh.
WHAT IS DATA MINING?  The process of automatically extracting useful information from large amounts of data.  Uses traditional data analysis techniques.
1 Learning Bias & Clustering Louis Oliphant CS based on slides by Burr H. Settles.
Eick: kNN kNN: A Non-parametric Classification and Prediction Technique Goals of this set of transparencies: 1.Introduce kNN---a popular non-parameric.
BAYESIAN LEARNING. 2 Bayesian Classifiers Bayesian classifiers are statistical classifiers, and are based on Bayes theorem They can calculate the probability.
SUPERVISED AND UNSUPERVISED LEARNING Presentation by Ege Saygıner CENG 784.
Tree and Forest Classification and Regression Tree Bagging of trees Boosting trees Random Forest.
Introduction to Data Mining Clustering & Classification Reference: Tan et al: Introduction to data mining. Some slides are adopted from Tan et al.
Supervise Learning. 2 What is learning? “Learning denotes changes in a system that... enable a system to do the same task more efficiently the next time.”
Network Management Lecture 13. MACHINE LEARNING TECHNIQUES 2 Dr. Atiq Ahmed Université de Balouchistan.
MIRA, SVM, k-NN Lirong Xia. MIRA, SVM, k-NN Lirong Xia.
School of Computer Science & Engineering
Data Mining Lecture 11.
K Nearest Neighbor Classification
Advanced Artificial Intelligence Classification
MACHINE LEARNING TECHNIQUES IN IMAGE PROCESSING
MACHINE LEARNING TECHNIQUES IN IMAGE PROCESSING
MIRA, SVM, k-NN Lirong Xia. MIRA, SVM, k-NN Lirong Xia.
Presentation transcript:

Classification & Clustering MSE 2400 EaLiCaRA Dr. Tom Way

Recall: Machine Learning a branch of artificial intelligence, is about the construction and study of systems that can learn from data. The ability of a computer to improve its own performance through the use of software that employs artificial intelligence techniques to mimic the ways by which humans seem to learn, such as repetition and experience. 2MSE 2400 Evolution & Learning

Statistics Applied Math Computer Science & Engineering Machine learning Machine Learning A very interdisciplinary field with long history. MSE 2400 Evolution & Learning3

Classification & Clustering The general idea… Machine learning techniques to group inputs into (hopefully) distinct categories… …and then to use those categories to identify group membership of new input in the future 4MSE 2400 Evolution & Learning

An example application An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc) of newly admitted patients. A decision is needed: whether to put a new patient in an intensive-care unit. Due to the high cost of ICU, those patients who may survive less than a month are given higher priority. Problem: to predict high-risk patients and discriminate them from low-risk patients. MSE 2400 Evolution & Learning5

Another application A credit card company receives thousands of applications for new cards. Each application contains information about an applicant, –age –Marital status –annual salary –outstanding debts –credit rating –etc. Problem: to decide whether an application should approved, or to classify applications into two categories, approved and not approved. MSE 2400 Evolution & Learning6

Definition of ML Classifier  Definition of Machine Learning from dictionary.com “The ability of a machine to improve its performance based on previous results.”  So, machine learning document classification is “the ability of a machine to improve its document classification performance based on previous results of document classification”. MSE 2400 Evolution & Learning7

You as an ML Classifier Topic 1 words: baseball, owners, sports, selig, ball, bill, indians, isringhausen, mets, minors, players, specter, stadium, power, send, new, bud, comes, compassion, game, headaches, lite, nfl, powerful, strawberry, urges, home, ambassadors, building, calendar, commish, costs, day, dolan, drive, hits, league, little, match, payments, pitch, play, player, red, stadiums, umpire, wife, youth, field, leads Topic 2 words: merger, business, bank, buy, announces, new, acquisition, finance, companies, com, company, disclosure, emm, news, us, acquire, chemical, inc, results, shares, takeover, corporation, european, financial, investment, market, quarter, two, acquires, bancorp, bids, communications, first, mln, purchase, record, stake, west, sale, bid, bn, brief, briefs, capital, control, europe, inculab 8

Use the previous slide’s topics & related words to classify the following titles 1.CYBEX-Trotter merger creates fitness equipment powerhouse 2.WSU RECRUIT CHOOSES BASEBALL INSTEAD OF FOOTBALL 3.FCC chief says merger may help pre-empt Internet regulation 4.Vision of baseball stadium growing 5.Regency Realty Corporation Completes Acquisition Of Branch properties 6.Red Sox to punish All-Star scalpers 7.Canadian high-tech firm poised to make $415-million acquisition 8.Futures-selling hits the Footsie for six 9.A'S NOTEBOOK; Another Young Arm Called Up 10.All-American SportPark Reaches Agreement for Release of Corporate Guarantees 9

Titles & Their Classifications 1.(2) CYBEX-Trotter merger creates fitness equipment powerhouse 2.(1) WSU RECRUIT CHOOSES BASEBALL INSTEAD OF FOOTBALL 3.(2) FCC chief says merger may help pre-empt Internet regulation 4.(1) Vision of baseball stadium growing 5.(2) Regency Realty Corporation Completes Acquisition Of Branch properties 6.(1) Red Sox to punish All-Star scalpers 7.(2) Canadian high-tech firm poised to make $415- million acquisition 8.(2) Futures-selling hits the Footsie for six 9.(1) A'S NOTEBOOK; Another Young Arm Called Up 10.(1) All-American SportPark Reaches Agreement for Release of Corporate Guarantees MSE 2400 Evolution & Learning10

A little math  Canadian high-tech firm poised to make $415-million acquisition 1.Estimate the probablity of a word in a topic by dividing the number of times the word appeared in the topic’s training set by the total number of word occurrences in the topic’s training set. 2.For each topic,T, sum the probability of finding each word of the title in a title that is classified as T. 3.The title is classified as the topic with the largest sum. Title’s evidence of being in Topic 2= Title’s evidence of being in Topic 1= Canadian 1 0: high 0 0: tech 2 0: firm 1 0 poised 0 0: make 0 0: million 4 4: acquisition 10 0 # of words in Topic2 = 1563 # of words in Topic1 = 429 MSE 2400 Evolution & Learning11

Machine learning and our focus Like human learning from past experiences. A computer does not have “experiences”. A computer system learns from data, which represent some “past experiences” of an application domain. Our focus: learn a target function that can be used to predict the values of a discrete class attribute, e.g., approve or not-approved, and high-risk or low risk. The task is commonly called: Supervised learning, classification, or inductive learning. MSE 2400 Evolution & Learning12

Data: A set of data records (also called examples, instances or cases) described by –k attributes: A 1, A 2, … A k. –a class: Each example is labelled with a pre- defined class. Goal: To learn a classification model from the data that can be used to predict the classes of new (future, or test) cases/instances. The data and the goal MSE 2400 Evolution & Learning13

An example: data (loan application) Approved or not MSE 2400 Evolution & Learning14

An example: the learning task Learn a classification model from the data Use the model to classify future loan applications into –Yes (approved) and –No (not approved) What is the class for following case/instance? MSE 2400 Evolution & Learning15

Supervised vs. unsupervised Learning Supervised learning (classification): same thing as learning from examples. –Supervision: The data (observations, measurements, etc.) are labeled with pre- defined classes. It is like a “teacher” gives the classes (supervision). –Test data are classified into these classes too. Unsupervised learning (clustering) –Class labels of the data are unknown –Given a set of data, the task is to establish the existence of classes or clusters in the data MSE 2400 Evolution & Learning16

Supervised learning process: two steps Learning (training): Learn a model using the training data Testing: Test the model using unseen test data to assess the model accuracy MSE 2400 Evolution & Learning17

What do we mean by learning? Given –a data set D, –a task T, and –a performance measure M, a computer system is said to learn from D to perform the task T if after learning the system’s performance on T improves as measured by M. In other words, the learned model helps the system to perform T better as compared to no learning. MSE 2400 Evolution & Learning18

An example Data: Loan application data Task: Predict whether a loan should be approved or not. Performance measure: accuracy. No learning: classify all future applications (test data) to the majority class (i.e., Yes): Accuracy = 9/15 = 60%. We can do better than 60% with learning. MSE 2400 Evolution & Learning19

Fundamental assumption of learning Assumption: The distribution of training examples is identical to the distribution of test examples (including future unseen examples). In practice, this assumption is often violated to certain degree. Strong violations will clearly result in poor classification accuracy. To achieve good accuracy on the test data, training examples must be sufficiently representative of the test data. MSE 2400 Evolution & Learning20

Two categories of machine learning 1.Classification (supervised machine learning): With the class label known, learn the features of the classes to predict a future observation. The learning performance can be evaluated by the prediction error rate. 2.Clustering (unsupervised machine learning) Without knowing the class label, cluster the data according to their similarity and learn the features. Normally the performance is difficult to evaluate and depends on the content of the problem. MSE 2400 Evolution & Learning21

Machine learning classification MSE 2400 Evolution & Learning22

Machine learning clustering MSE 2400 Evolution & Learning23

Examples of ML Classifiers Decision Trees Bayesian Networks Neural Networks Support Vector Machines K-nearest Neighbor Instance-based Classifiers MSE 2400 Evolution & Learning24

Examples of ML algorithms  Decision Trees – Sort instances according to feature values, build into a hierarchy of tests based on entropy or other reasonable metric. Root node is the one that best divides the data. Branches represent the values that a node can assume (the “decision”). To build the tree, apply recursion to determine next best division... MSE 2400 Evolution & Learning25

Decision Trees, an example INPUT: data (symptom) low RBC count yes no size of cells largesmall bleeding? STOP B 12 deficient? yes stop bleeding no STOPgastrin assay positive negative STOPanemia OUTPUT: category (condition) MSE 2400 Evolution & Learning26

Decision Trees: Assessment Advantages: Classification of data based on limiting features is intuitive Handles discrete/categorical features best Limitations: Danger of “overfitting” the data Not the best choice for accuracy MSE 2400 Evolution & Learning27

Examples of ML algorithms  Naïve Bayes – This method computes the probability that a document is about a particular topic, T, using a) the words of the document to be classified and b) the estimated probability of each of these words as they appeared in the set of training documents for the topic, T – like the example previously given. MSE 2400 Evolution & Learning28

Bayesian Networks Graphical algorithm that encodes the joint probability distribution of a data set Captures probabilistic relationships between variables Based on probability that instances (data) belong in each category MSE 2400 Evolution & Learning29

Bayesian Networks, an example MSE 2400 Evolution & Learning30

Bayesian Networks: Assessment Advantages: Takes into account prior information regarding relationships among features Probabilities can be updated based on outcomes Fast!…with respect to learning classification Can handle incomplete sets of data Avoids “overfitting” of data Limitations: Not suitable for data sets with many features Not the best choice for accuracy MSE 2400 Evolution & Learning31

Examples of ML algorithms  Neural networks – During training, a neural network looks at the patterns of features (e.g. words, phrases, or N-grams) that appear in a document of the training set and attempts to produce classifications for the document. If its attempt doesn’t match the set of desired classifications, it adjusts the weights of the connections between neurons. It repeats this process until the attempted classifications match the desired classifications. MSE 2400 Evolution & Learning32

Neural Networks Used for: –Classification –Noise reduction –Prediction Great because: –Able to learn –Able to generalize Kiran –Plaut’s (1996) semantic neural network that could be lesioned and retrained – useful for predicting treatment outcomes Mikkulainen –Evolving neural network that could adapt to the gaming environment – useful learning application MSE 2400 Evolution & Learning33

Neural Networks: Biological Basis MSE 2400 Evolution & Learning34

Feed-forward Neural Network Perceptron : Hidden layer MSE 2400 Evolution & Learning35

Neural Networks: Training Presenting the network with sample data and modifying the weights to better approximate the desired function. Supervised Learning –Supply network with inputs and desired outputs –Initially, the weights are randomly set –Weights modified to reduce difference between actual and desired outputs –Backpropagation MSE 2400 Evolution & Learning36

Backpropagation MSE 2400 Evolution & Learning37

Examples of ML algorithms  Support Vector Machines (SVM) – do supervised learning by analyzing data and recognizing patterns, used for classification tasks. The basic SVM takes a set of input data and predicts, for each given input, which of two possible classes forms the output. MSE 2400 Evolution & Learning38

Support Vector Machines Support vector machines were invented by V. Vapnik and his co-workers in 1970s in Russia and became known to the West in SVMs are linear classifiers that find a hyperplane to separate two class of data, positive and negative. Kernel functions are used for nonlinear separation. SVM not only has a rigorous theoretical foundation, but also performs classification more accurately than most other methods in applications, especially for high dimensional data. It is perhaps the best classifier for text classification. MSE 2400 Evolution & Learning39

Basic SVM concepts Let the set of training examples D be {(x 1, y 1 ), (x 2, y 2 ), …, (x r, y r )}, where x i = (x 1, x 2, …, x n ) is an input vector in a real-valued space X  R n and y i is its class label (output value), y i  {1, -1}. 1: positive class and -1: negative class. SVM finds a linear function of the form (w: weight vector) f(x) =  w  x  + b MSE 2400 Evolution & Learning40

The SVM hyperplane The hyperplane that separates positive and negative training data is  w  x  + b = 0 It is also called the decision boundary (surface). So many possible hyperplanes, which one to choose? MSE 2400 Evolution & Learning41

Examples of ML algorithms  K-nearest neighbor – Creates clusters of data in an interactive fashion… does not use training data… it works on the actual data. MSE 2400 Evolution & Learning42

k-Nearest Neighbor Classification (kNN) Unlike all the previous learning methods, kNN does not build model from the training data. To classify a test instance d, define k- neighborhood P as k nearest neighbors of d Count number n of training instances in P that belong to class c j Estimate Pr(c j |d) as n/k No training is needed. Classification time is linear in training set size for each test case. MSE 2400 Evolution & Learning43

kNNAlgorithm k is usually chosen empirically via a validation set or cross-validation by trying a range of k values. Distance function is crucial, but depends on applications. MSE 2400 Evolution & Learning44

Example: k=6 (6NN) Government Science Arts A new point Pr(science| )? MSE 2400 Evolution & Learning45

Discussions kNN can deal with complex and arbitrary decision boundaries. Despite its simplicity, researchers have shown that the classification accuracy of kNN can be quite strong and in many cases as accurate as those elaborated methods. kNN is slow at the classification time kNN does not produce an understandable model MSE 2400 Evolution & Learning46

Examples of ML algorithms  Instance based – Saves documents of the training set and compares new documents to be classified with the saved documents. The document to be classified gets tagged with the highest scoring classifications. One way to do this is to implement a search engine using the documents of the training set as the document collection. A document to be classified becomes a query/search. A classification, C, is picked if a large number of its training set documents are at the top of the returned answer set. MSE 2400 Evolution & Learning47

Building our own classifiers In an upcoming lab, we will build our own classifiers using the Python programming language 48MSE 2400 Evolution & Learning

How well do ML classifiers work?  A good system will have an accuracy of above 80%.  Strong evidence of how good these systems are is the number of companies in the market place with machine learning document classification systems. MSE 2400 Evolution & Learning49

Advantages  Advantage over classification by humans: Once the system is trained, classification is done automatically with no or little human intervention – saving human resources.  Advantage over classification by humans: Consistent classification.  Advantage over rule based classification: Human resources are not needed to make rules. MSE 2400 Evolution & Learning50

Disadvantages  Disadvantage: Not always obvious why it classified a document in a certain way and not obvious how to keep it from doing the same type of classification in the future (i.e. don’t know how to modify it.)  Disadvantage: Human resources must be used to manually classify documents for the training set. Furthermore, the number and type of document that should be in the training set isn’t straightforward. MSE 2400 Evolution & Learning51

Uses of ML classifiers  Automatically classify things.  Suggest classifications that a human can pick from.  Classify paragraphs or even sentences in a document.  Find important information in a document. For example, rules of law in a case law document, or the facts of the case. MSE 2400 Evolution & Learning52

Challenges  Labeling the documents of the training set.  What is the best way to pick documents for the training set so the machine-learning algorithm produces a classifier with high accuracy?  Which machine-learning algorithm works best on your classification problem? MSE 2400 Evolution & Learning53

The Future of Supervised Learning (1) Generation of synthetic data –A major problem with supervised learning is the necessity of having large amounts of training data to obtain a good result. –Why not create synthetic training data from real, labeled data? –Example: use a 3D model to generate multiple 2D images of some object (such as a face) under different conditions (such as lighting). –Labeling only needs to be done for the 3D model, not for every 2D model. MSE 2400 Evolution & Learning54

The Future of Supervised Learning (2) Future applications –Personal software assistants learning from past usage the evolving interests of their users in order to highlight relevant news (e.g., filtering scientific journals for articles of interest) –Houses learning from experience to optimize energy costs based on the particular usage patterns of their occupants –Analysis of medical records to assess which treatments are more effective for new diseases –Enable robots to better interact with humans MSE 2400 Evolution & Learning55