A Short and Simple Introduction to Linear Discriminants (with almost no math) Jennifer Listgarten, November 2002.

Introduction Linear discriminants are a family of mathematical models that allow us to classify data (such as microarray data) into preset groups (e.g. cancer vs. non-cancer, metastatic vs. non-metastatic, responds well to a drug vs. responds poorly). 'Discriminant' simply means that the model can discriminate between two classes. The meaning of the word 'linear' will become clearer later.

Motivation I Spoke previously at great length about common clustering methods for microarray data (unsupervised learning). Supervised techniques are much more powerful/useful. Linear discriminants (a supervised method) are one of the older, well-studied supervised techniques, both in traditional statistics and in machine learning.

Motivation II Linear discriminants are widely used today in many application domains, including the modeling of various types of biological data. Many classes and sub-classes of techniques are actually linear discriminants (e.g. artificial neural networks, the Fisher discriminant, support vector machines and many more). They provide a very general framework upon which much has been built, i.e. they can be extended to very sophisticated, robust techniques.

eg. Classifying Cancer Patients vs. Healthy Patients from Microarray. Patient_X = (gene_1, gene_2, gene_3, …, gene_N). N (the number of dimensions) is normally larger than 2, so we can't visualize the data. [Figure: scatter plot of cancerous vs. healthy patients.]

For simplicity, pretend that we are only looking at the expression levels of 2 genes. [Figure: gene_1 expression level vs. gene_2 expression level, each axis running from down-regulated to up-regulated, with cancerous and healthy patients plotted.]

eg. Classifying Cancer Patients vs. Healthy Patients from Microarray. Question: How can we build a classifier for this data? [Figure: the same gene_1 vs. gene_2 scatter plot of cancerous and healthy patients.]

eg. Classifying Cancer Patients vs. Healthy Patients from Microarray. Simple Classification Rule:
IF gene_1 < 0 AND gene_2 < 0 THEN person = healthy
IF gene_1 > 0 AND gene_2 > 0 THEN person = cancerous
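To make the rule concrete, here is a minimal sketch of it in Python (the gene names, the threshold of 0 and the example values are just the toy quantities from this slide, not anything fitted to real data):

    def classify_two_genes(gene_1, gene_2):
        """Toy rule from the slide: both genes down -> healthy, both genes up -> cancerous."""
        if gene_1 < 0 and gene_2 < 0:
            return "healthy"
        if gene_1 > 0 and gene_2 > 0:
            return "cancerous"
        return "undecided"  # the simple rule says nothing about the other two quadrants

    print(classify_two_genes(-1.2, -0.4))  # healthy
    print(classify_two_genes(0.8, 1.5))    # cancerous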

eg. Classifying Cancer Patients vs. Healthy Patients from Microarray. If we move away from our simple example with 2 genes to a realistic case with, say, 5000 genes, then: 1. What will these rules look like? 2. How will we find them?
Simple Classification Rule:
IF gene_1 < 0 AND gene_2 < 0 AND … AND gene_5000 < Y THEN person = healthy
IF gene_1 > 0 AND gene_2 > 0 AND … AND gene_5000 > W THEN person = cancerous
Gets a little complicated, unwieldy…

eg. Classifying Cancer Patients vs. Healthy Patients from Microarray. Reformulate the previous rule. SIMPLE RULE: If a data point lies to the 'left' of the line, then 'healthy'; if it lies to the 'right' of the line, then 'cancerous'. It is easier to generalize this line to 5000 genes than it is a list of rules. It is also easier to solve mathematically. [Figure: the scatter plot with a straight line separating the two groups.]

More Than 2 Genes (dimensions)? Easy to Extend.
Line in 2D: x1*C1 + x2*C2 = T
If we had 3 genes, and needed to build a 'line' in 3-dimensional space, then we would be seeking a plane.
Plane in 3D: x1*C1 + x2*C2 + x3*C3 = T
If we were looking in more than 3 dimensions, the 'plane' is called a hyperplane. A hyperplane is simply a generalization of a plane to dimensions higher than 3.
Hyperplane in N dimensions: x1*C1 + x2*C2 + x3*C3 + … + xN*CN = T
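As a hedged illustration of why the extension is easy, the 'which side of the hyperplane' test is a single dot product in code, whatever the number of genes (NumPy is assumed; the coefficients C and threshold T below are invented, not fitted):

    import numpy as np

    def side_of_hyperplane(x, C, T):
        """Return +1 if x1*C1 + ... + xN*CN > T, otherwise -1."""
        return 1 if np.dot(x, C) > T else -1

    # Works identically for 2 genes or 5000 genes; only the lengths of x and C change.
    C = np.array([0.7, -0.3, 1.1])   # hypothetical coefficients for 3 genes
    T = 0.5                          # hypothetical threshold
    x = np.array([1.2, 0.4, -0.1])   # one patient's expression levels
    print(side_of_hyperplane(x, C, T))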

eg. Classifying Cancer Patients vs. Healthy Patients from Microarray. Why is it called 'linear'? The rule of 'which side is the point on' looks, mathematically, like:
gene1*C1 + gene2*C2 > T then cancer
gene1*C1 + gene2*C2 < T then healthy
It is linear in the input (the gene expression levels).

Linear vs. Non-Linear.
Linear: gene1*C1 + gene2*C2 > T or < T
'Logistic' linear discriminant: 1/[1 + exp(-(gene1*C1 + gene2*C2 - T))] > 0.5 or < 0.5 (the sigmoid is a monotone function of the same linear combination, so the decision boundary is still the same line)
Non-linear: gene1^2*C1 + gene2*C2 > T or < T; gene1*gene2*C > T or < T
Mathematically, linear problems are generally much easier to solve than non-linear problems.
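A small sketch of why the 'logistic' version is still linear: the sigmoid is a monotone function of the same linear score, so thresholding it at 0.5 picks out exactly the same line (NumPy assumed; the coefficients are made up):

    import numpy as np

    def linear_rule(x, C, T):
        return np.dot(x, C) > T                        # plain linear discriminant

    def logistic_rule(x, C, T):
        p = 1.0 / (1.0 + np.exp(-(np.dot(x, C) - T)))  # sigmoid of the same linear score
        return p > 0.5                                 # p > 0.5 exactly when dot(x, C) > T

    C, T = np.array([0.7, -0.3]), 0.2                  # hypothetical values
    for x in [np.array([1.0, 0.5]), np.array([-1.0, 0.5])]:
        assert linear_rule(x, C, T) == logistic_rule(x, C, T)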

Back to our Linear Discriminant. There are actually many (infinitely many) lines that 'properly' divide the points. Which is the correct one?

One solution (the one that SVMs use):
1. Find a line that has all the data points on the proper side.
2. Of all lines that satisfy (1), find the one that maximizes the 'margin' (the smallest distance between any point and the line).
3. This is called 'constrained optimization' in mathematics.
[Figure: two candidate separating lines, one with a smaller margin and one with the largest margin.]
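As an aside, here is a minimal sketch of finding that maximum-margin line with an off-the-shelf solver (scikit-learn and NumPy are assumed to be available; the tiny two-gene dataset is invented for illustration, and a very large C is used to approximate the hard-margin case described above):

    import numpy as np
    from sklearn.svm import SVC

    # Two made-up, linearly separable clouds of 'patients' in 2 gene dimensions.
    X = np.array([[-2.0, -1.5], [-1.0, -2.0], [-1.5, -0.5],   # healthy
                  [ 1.5,  2.0], [ 2.0,  1.0], [ 1.0,  1.5]])  # cancerous
    y = np.array([-1, -1, -1, 1, 1, 1])

    clf = SVC(kernel="linear", C=1e6)   # linear kernel, margin-maximizing objective
    clf.fit(X, y)

    print("w =", clf.coef_[0], "b =", clf.intercept_[0])  # the separating line
    print("support vectors:", clf.support_vectors_)       # the points that fix the margin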

Obtaining Different 'Lines': Objective Functions. In general, the line that you end up with depends on some criterion, defined by the 'objective function' (for the SVM, the margin). An objective function is chosen by the modeler, and varies depending on exactly what the modeler is trying to achieve or thinks will work well (e.g. the margin, posterior probabilities, sum-of-squares error, a small weight vector). The function usually has a theoretical foundation (e.g. risk minimization, maximum likelihood, Gaussian processes, zero-mean Gaussian noise).

What if the data looked like this? [Figure: gene_1 vs. gene_2 expression levels with the cancerous and healthy points partly intermingled.] How could we build a suitable line that divides the data nicely? Depends… Is it just a few points that are small 'outliers'? Or is the data simply not amenable to this kind of classification?

[Three scatter plots of cancerous vs. healthy patients:]
Linearly separable data. Can make a great classifier.
Almost linearly separable data. A few outliers; probably can still find a 'good' line.
Not linearly separable data. Inherently, the data cannot be separated by any one line.

Not linearly separable data. Inherently, the data cannot be separated by any one line. If we allow the model to have more than one line (or hyperplane), then maybe we can still form a nice model. Much more complicated. This is one thing that neural networks allow us to do: combine linear discriminants together to form a single classifier (no longer a linear classifier). No time to delve further during this talk.

Not linearly separable data. Now what?? Even with many lines it would be extremely difficult to build a good classifier.

Sometimes Need to Transform the Data. Not linearly separable data in the original coordinates. Need to transform the coordinates: polar coordinates, principal component coordinates, kernel transformation into a higher-dimensional space (support vector machines). In polar coordinates (distance from the centre, i.e. radius, vs. angular degree, i.e. phase), the same data becomes linearly separable.
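A minimal sketch of the polar-coordinate idea (NumPy assumed; the ring-shaped toy data is simulated purely for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    theta = rng.uniform(0, 2 * np.pi, 100)

    # Class A sits near the centre, class B on a ring around it: no straight line
    # in (x, y) separates them.
    inner = np.column_stack([0.5 * np.cos(theta[:50]), 0.5 * np.sin(theta[:50])])
    outer = np.column_stack([3.0 * np.cos(theta[50:]), 3.0 * np.sin(theta[50:])])
    X = np.vstack([inner, outer])

    # Transform to polar coordinates: radius (distance from centre) and phase (angle).
    radius = np.hypot(X[:, 0], X[:, 1])
    phase = np.arctan2(X[:, 1], X[:, 0])

    # In (radius, phase) space a single line, radius = 1.5, now separates the classes.
    print((radius[:50] < 1.5).all(), (radius[50:] > 1.5).all())  # True True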

Caveats. May need to find a subset of the features (genes) on which the data is linearly separable (this is called feature selection). Feature selection is what we call in computer science an NP-complete problem, which means, in layman's terms, that there is no known way to solve it exactly in a reasonable amount of time. Feature selection is an open research problem, and there is a spate of techniques that give approximate solutions; one such filter-style approach is sketched below. Feature selection is essentially mandatory in microarray expression experiments because there is so much noisy, irrelevant data. Also, with microarray data there is a lot of missing data, which introduces difficulties.
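One common family of approximate approaches is a simple 'filter': score each gene independently (for example with a t-statistic between the two classes) and keep the top k. A hedged sketch assuming NumPy and SciPy; the matrix here is random data standing in for a real expression matrix:

    import numpy as np
    from scipy.stats import ttest_ind

    def select_top_genes(X, labels, k):
        """Keep the k genes whose expression differs most between the two classes.

        X: (patients x genes) expression matrix; labels: array of 0/1 class labels.
        This is a filter-style approximation, not an exact solution to feature selection.
        """
        t_stats, _ = ttest_ind(X[labels == 0], X[labels == 1], axis=0)
        ranked = np.argsort(-np.abs(t_stats))  # most discriminative genes first
        return ranked[:k]

    rng = np.random.default_rng(1)
    X = rng.normal(size=(20, 100))             # 20 'patients', 100 'genes'
    labels = np.array([0] * 10 + [1] * 10)
    X[labels == 1, :5] += 2.0                  # make the first 5 genes informative
    print(select_top_genes(X, labels, k=5))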

Other Biological Applications. Gene finding in DNA: the input is part of a DNA strand, the output is whether or not the nucleotide at the centre is inside a gene. Sequence-based gene classification: the input is a gene sequence, the output is a functional class. Protein secondary structure prediction: the input is a sequence of amino acids, the output is the local secondary structure. Protein localization in the cell: the input is an amino acid sequence, the output is the position in the cell (e.g. nucleus, membrane, etc.). Taken from Introduction to Support Vector Machines and Applications to Computational Biology, Jean-Philippe Vert.

Wrap-Up. An intuitive feel for linear discriminants. A widely applicable technique, useful for many problems in Polyomx and many other areas. Difficulties: missing data, feature selection. We have used linear discriminants for our SNP data and microarray data. If interested in knowing more, a great book is Neural Networks for Pattern Recognition, Christopher Bishop, 1995.

Finding the Equation of the Linear Discriminant (How a Single-Layer Neural Network Might Do It).
The discriminant function: y(x) = w·x + b (classify by which side of the hyperplane, i.e. the sign of y(x)).
Eg. sum-of-squares error function (more for regression): E = 1/2 * sum over n of (y(x_n) - t_n)^2, where t_n is the target label for patient n.
Minimize the objective function:
1. Exact solution via matrix algebra, since here E is convex.
2. Iterative algorithms (gradient descent, conjugate gradient, Newton's method, etc.) for cases where E may not be convex.
Can regularize by adding ||w||^2 to E.
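A minimal sketch of option 1, the exact matrix-algebra solution to the regularized sum-of-squares objective (NumPy assumed; the two-gene data is simulated for illustration, and for simplicity the bias weight is regularized along with the rest of w):

    import numpy as np

    def fit_least_squares_discriminant(X, t, ridge=0.0):
        """Minimize E(w) = 1/2 * sum_n (w . x_n - t_n)^2 + ridge * ||w||^2 in closed form.

        X: (patients x features) matrix with a column of ones appended for the bias.
        t: targets, e.g. +1 for cancerous and -1 for healthy.
        """
        A = X.T @ X + 2.0 * ridge * np.eye(X.shape[1])
        return np.linalg.solve(A, X.T @ t)     # the normal equations

    rng = np.random.default_rng(2)
    X = np.vstack([rng.normal(-1, 0.5, (10, 2)), rng.normal(1, 0.5, (10, 2))])
    X = np.hstack([X, np.ones((20, 1))])       # bias column
    t = np.array([-1.0] * 10 + [1.0] * 10)
    w = fit_least_squares_discriminant(X, t, ridge=0.1)
    print("predicted classes:", np.sign(X @ w))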

Finding the Equation of the Linear Discriminant (How an SVM Would Do It).
The discriminant function: y(x) = w·x + b.
The margin is given by: 2/||w||.
Minimize ||w||^2 subject to the following constraints: t_n*(w·x_n + b) >= 1 for every training point n (each point on its proper side, at least a margin's distance from the line).
Use Lagrange multipliers to solve this constrained optimization.
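To make the constrained optimization concrete, here is a minimal sketch that hands the problem to a general-purpose solver (SciPy's SLSQP) rather than working through the Lagrange-multiplier dual that an SVM package would actually use; the four-point toy dataset is invented:

    import numpy as np
    from scipy.optimize import minimize

    # Tiny linearly separable toy problem: 2 genes, labels t_n in {-1, +1}.
    X = np.array([[-1.5, -1.0], [-1.0, -2.0], [1.0, 1.5], [2.0, 1.0]])
    t = np.array([-1.0, -1.0, 1.0, 1.0])

    def objective(params):
        w = params[:2]
        return 0.5 * np.dot(w, w)   # minimizing this is equivalent to minimizing ||w||^2

    constraints = [
        # Each point on its proper side, at least a margin away: t_n * (w . x_n + b) >= 1.
        {"type": "ineq", "fun": lambda p, i=i: t[i] * (np.dot(p[:2], X[i]) + p[2]) - 1.0}
        for i in range(len(X))
    ]

    result = minimize(objective, x0=np.array([1.0, 1.0, 0.0]),
                      method="SLSQP", constraints=constraints)
    w, b = result.x[:2], result.x[2]
    print("w =", w, "b =", b, "margin =", 2.0 / np.linalg.norm(w))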