
1 A Short and Simple Introduction to Linear Discriminants (with almost no math) Jennifer Listgarten, November 2002.

2 Introduction Linear discriminants are a family of mathematical models that allow us to classify data (such as microarray data) into preset groups (e.g. cancer vs. non-cancer, metastatic vs. non-metastatic, responds well to a drug vs. responds poorly). 'Discriminant' simply means that the model has the ability to discriminate between two classes. The meaning of the word 'linear' will become clearer later.

3 Motivation I We spoke previously at great length about common clustering methods for microarray data (unsupervised learning). Supervised techniques are much more powerful and useful. Linear discriminants, a supervised method, are among the older, well-studied supervised techniques, in both traditional statistics and machine learning.

4 Motivation II Linear discriminants are widely used today in many application domains, including the modeling of various types of biological data. Many classes or sub-classes of techniques are actually linear discriminants (e.g. Artificial Neural Networks, the Fisher Discriminant, the Support Vector Machine, and many more). They provide a very general framework upon which much has been built, i.e. they can be extended to very sophisticated, robust techniques.

5 e.g. Classifying Cancer Patients vs. Healthy Patients from Microarray Data
Patient_X = (gene_1, gene_2, gene_3, …, gene_N)
N (the number of dimensions) is normally larger than 2, so we can't visualize the data.
[Figure: cancerous vs. healthy samples.]

6 For simplicity, pretend that we are only looking at the expression levels of 2 genes.
[Figure: scatter plot of cancerous vs. healthy samples, Gene_1 expression level vs. Gene_2 expression level; each axis runs from -5 (down-regulated) to 5 (up-regulated).]

7 e.g. Classifying Cancer Patients vs. Healthy Patients from Microarray Data
Question: How can we build a classifier for this data?
[Figure: the same cancerous vs. healthy scatter plot of Gene_1 vs. Gene_2 expression levels.]

8 e.g. Classifying Cancer Patients vs. Healthy Patients from Microarray Data
Simple Classification Rule:
IF gene_1 < 0 AND gene_2 < 0 THEN person = healthy
IF gene_1 > 0 AND gene_2 > 0 THEN person = cancerous
[Figure: the same scatter plot, with the healthy cluster in the lower-left quadrant and the cancerous cluster in the upper-right.]
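As a rough sketch (not from the original slides), the two-gene rule could be written as a tiny function; the 'uncertain' fall-through for mixed-sign points is an added assumption, since the rule does not cover them:

```python
# A minimal sketch of the two-gene threshold rule above.
def classify(gene_1: float, gene_2: float) -> str:
    if gene_1 < 0 and gene_2 < 0:
        return "healthy"
    if gene_1 > 0 and gene_2 > 0:
        return "cancerous"
    return "uncertain"  # assumption: the slide's rule says nothing about mixed-sign points

print(classify(-2.0, -1.5))  # healthy
print(classify(3.0, 2.2))    # cancerous
```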

9 e.g. Classifying Cancer Patients vs. Healthy Patients from Microarray Data
If we move away from our simple example with 2 genes to a realistic case with, say, 5000 genes, the rule becomes:
IF gene_1 < 0 AND gene_2 < 0 AND … AND gene_5000 < Y THEN person = healthy
IF gene_1 > 0 AND gene_2 > 0 AND … AND gene_5000 > W THEN person = cancerous
1. What will these rules look like?
2. How will we find them?
It gets a little complicated and unwieldy…

10 e.g. Classifying Cancer Patients vs. Healthy Patients from Microarray Data
Reformulate the previous rule.
SIMPLE RULE: If a data point lies to the 'left' of the line, then 'healthy'; if it lies to the 'right' of the line, then 'cancerous'.
It is easier to generalize this line to 5000 genes than it is a list of rules. It is also easier to handle mathematically.
[Figure: the same scatter plot with a separating line drawn between the healthy and cancerous clusters.]

11 More Than 2 Genes (Dimensions)? Easy to Extend
Line in 2D: x_1 C_1 + x_2 C_2 = T
If we had 3 genes and needed to build a 'line' in 3-dimensional space, then we would be seeking a plane.
Plane in 3D: x_1 C_1 + x_2 C_2 + x_3 C_3 = T
If we were looking in more than 3 dimensions, the 'plane' is called a hyperplane. A hyperplane is simply a generalization of a plane to dimensions higher than 3.
Hyperplane in N dimensions: x_1 C_1 + x_2 C_2 + x_3 C_3 + … + x_N C_N = T
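To make the hyperplane formula concrete, here is a small sketch (the coefficients C and threshold T are invented for illustration); the left-hand side is just a dot product, so the same rule extends to any N:

```python
import numpy as np

def side_of_hyperplane(x, C, T):
    # Evaluates x_1*C_1 + ... + x_N*C_N and compares it to T.
    return "above" if np.dot(x, C) > T else "below or on"

C = np.array([0.5, -1.2, 0.3])   # hypothetical coefficients for 3 'genes'
T = 0.1                          # hypothetical threshold
x = np.array([1.0, 0.2, -0.5])   # one sample
print(side_of_hyperplane(x, C, T))
```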

12 e.g. Classifying Cancer Patients vs. Healthy Patients from Microarray Data
Why is it called 'linear'? The rule 'which side of the line is the point on?' looks, mathematically, like:
gene1*C_1 + gene2*C_2 > T, then cancer
gene1*C_1 + gene2*C_2 < T, then healthy
It is linear in the inputs (the gene expression levels).
[Figure: the scatter plot with the separating line; points on one side satisfy > T, points on the other side < T.]

13 Linear vs. Non-Linear
Linear:
gene1*C_1 + gene2*C_2 > T (vs. < T)
1/[1 + exp(-(gene1*C_1 + gene2*C_2 + T))] > 0.5 (vs. < 0.5): the 'logistic' linear discriminant. The sigmoid output is compared to 0.5, so the decision boundary is still the line gene1*C_1 + gene2*C_2 + T = 0.
Non-linear:
gene1^2*C_1 + gene2*C_2 > T (vs. < T)
gene1*gene2*C > T (vs. < T)
Mathematically, linear problems are generally much easier to solve than non-linear problems.
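As a sketch of the logistic variant (the parameter values are made up), note that thresholding the sigmoid at 0.5 gives exactly the same decision boundary as the plain linear rule:

```python
import math

def logistic_discriminant(gene1, gene2, C1, C2, T):
    # Sigmoid of the same linear score; p > 0.5 iff gene1*C1 + gene2*C2 + T > 0.
    p = 1.0 / (1.0 + math.exp(-(gene1 * C1 + gene2 * C2 + T)))
    return "cancer" if p > 0.5 else "healthy"

# Hypothetical parameters, purely for illustration.
print(logistic_discriminant(2.0, 1.0, C1=0.8, C2=0.6, T=-0.5))  # cancer
```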

14 Back to Our Linear Discriminant
There are actually many (infinitely many) lines that 'properly' divide the points. Which is the correct one?
[Figure: the same scatter plot with several candidate separating lines.]

15 One solution (the one SVMs use):
1. Find a line that has all of the data points on the proper side.
2. Of all lines that satisfy (1), find the one that maximizes the 'margin' (the smallest distance between any point and the line).
3. This is called 'constrained optimization' in mathematics.
[Figure: two candidate separating lines, one with a smaller margin and one with the largest margin.]
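For illustration only, here is how one might fit such a maximum-margin line with scikit-learn (the toy data, and the very large C that approximates the hard-margin case, are assumptions, not part of the talk):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[-3, -2], [-2, -3], [-1, -1],   # 'healthy' cluster
              [ 2,  3], [ 3,  2], [ 1,  2]])  # 'cancerous' cluster
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # very large C ~ hard margin
print(clf.coef_, clf.intercept_)              # the line's coefficients and offset
print(clf.predict([[-2, -1], [2, 1]]))        # -> [0 1]
```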

16 Obtaining Different 'Lines': Objective Functions
In general, the line that you end up with depends on some criterion defined by the 'objective function' (for the SVM, the margin).
An objective function is chosen by the modeler and varies depending on exactly what the modeler is trying to achieve or thinks will work well (e.g. margin, posterior probabilities, sum-of-squares error, small weight vector). The function usually has a theoretical foundation (e.g. risk minimization; maximum likelihood under Gaussian processes or zero-mean Gaussian noise).

17 What if the data looked like this?
How could we build a suitable line that divides the data nicely?
It depends… Is it just a few points that are small 'outliers'? Or is the data simply not amenable to this kind of classification?
[Figure: a cancerous vs. healthy scatter plot of Gene_1 vs. Gene_2 expression levels in which the two classes overlap.]

18 [Figure: three scatter plots of cancerous vs. healthy samples.]
Linearly separable data: can make a great classifier.
Almost linearly separable data: a few outliers, but we can probably still find a 'good' line.
Not linearly separable data: inherently, the data cannot be separated by any one line.

19 Not linearly separable data: inherently, the data cannot be separated by any one line.
If we allow the model to have more than one line (or hyperplane), then maybe we can still form a nice model, though it is much more complicated. This is one thing that neural networks allow us to do: combine linear discriminants together to form a single classifier (no longer a linear classifier). No time to delve further during this talk.
[Figure: a two-class data set that requires more than one line to separate.]

20 Not linearly separable data. Now what?? Even with many lines it would be extremely difficult to build a good classifier.
[Figure: a two-class data set that is not linearly separable, even by several lines.]

21 Sometimes Need to Transform the Data
Not linearly separable data: need to transform the coordinates, e.g. polar coordinates, Principal Components coordinates, or a kernel transformation into a higher-dimensional space (support vector machines).
[Figure: the same data re-plotted in polar coordinates, with distance from center (radius) on one axis and angular degree (phase) on the other; in these coordinates the data are linearly separable.]
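A sketch of the polar-coordinate idea (the ring-around-a-blob data are invented): points forming a ring around a central cluster are not linearly separable in (x, y), but after mapping each point to (radius, phase), a single threshold on the radius separates them:

```python
import numpy as np

def to_polar(points):
    x, y = points[:, 0], points[:, 1]
    r = np.hypot(x, y)          # distance from center (radius)
    theta = np.arctan2(y, x)    # angular degree (phase)
    return np.column_stack([r, theta])

rng = np.random.default_rng(0)
inner = rng.normal(scale=0.5, size=(50, 2))        # class A near the origin
angles = rng.uniform(0, 2 * np.pi, 50)
outer = np.column_stack([3 * np.cos(angles),       # class B on a ring of radius 3
                         3 * np.sin(angles)])

# For this toy data, a horizontal line on the radius axis separates the classes.
print(to_polar(inner)[:, 0].max(), to_polar(outer)[:, 0].min())
```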

22 Caveats
May need to find a subset of the data that is linearly separable (called feature selection). Feature selection is what we call, in computer science, an NP-complete problem, which means, in layman's terms, that it is intractable to solve exactly at realistic sizes. Feature selection is an open research problem, and there are many techniques that give approximate solutions (one cheap heuristic is sketched below). Feature selection is mandatory in microarray expression experiments because there is so much noisy, irrelevant data. Also, with microarray data there is much missing data, which introduces further difficulties.
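One cheap approximate heuristic, sketched here purely for illustration (a univariate filter; not necessarily what was used for the data in this talk, and the toy data are invented):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5000))          # 40 patients x 5000 genes (toy data)
y = rng.integers(0, 2, size=40)          # toy class labels
X[:, 10] += y * 3.0                      # plant one informative gene

# Score each gene independently and keep the 20 most discriminative ones.
selector = SelectKBest(f_classif, k=20).fit(X, y)
print(selector.get_support(indices=True))
```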

23 Other Biological Applications
Gene finding in DNA: the input is part of a DNA strand; the output is whether or not the nucleotide at the centre is inside a gene.
Sequence-based gene classification: the input is a gene sequence; the output is a functional class.
Protein secondary structure prediction: the input is a sequence of amino acids; the output is the local secondary structure.
Protein localization in the cell: the input is an amino acid sequence; the output is the position in the cell (e.g. nucleus, membrane, etc.).
Taken from Introduction to Support Vector Machines and Applications to Computational Biology, Jean-Philippe Vert.

24 Wrap-Up
An intuitive feel for linear discriminants.
A widely applicable technique, useful for many problems in Polyomx and many other areas.
Difficulties: missing data, feature selection.
We have used linear discriminants for our SNP data and microarray data.
If interested in knowing more, a great book is Neural Networks for Pattern Recognition, Christopher Bishop, 1995.

25 Finding the Equation of the Linear Discriminant (How a Single-Layer Neural Network Might Do It)
The discriminant function: y(x) = w_1 x_1 + … + w_N x_N - T (the coefficients C_i from the earlier slides are now written as weights w_i).
E.g. the sum-of-squares error function (used more for regression): E = (1/2) * sum_n (y(x_n) - t_n)^2, where t_n is the target class of the n-th training sample.
Minimize the objective function E:
1. Exact solution via matrix algebra, since here E is convex.
2. Iterative algorithms (gradient descent, conjugate gradient, Newton's method, etc.) for cases where E may not be convex.
Can regularize by adding ||w||^2 to E.
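A sketch of route 1 (the toy data are invented): because E is quadratic in the weights, ordinary least squares gives the minimizer in closed form via matrix algebra:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                        # 100 samples, 3 inputs
t = np.where(X @ np.array([1.0, -2.0, 0.5]) > 0, 1.0, -1.0)  # +/-1 targets

Xb = np.hstack([X, np.ones((100, 1))])               # absorb the threshold T as a bias column
w = np.linalg.lstsq(Xb, t, rcond=None)[0]            # minimizes sum_n (y(x_n) - t_n)^2
print(w)                                             # last entry plays the role of -T
```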

26 Finding the Equation of the Linear Discriminant (How an SVM Would Do It)
The discriminant function: y(x) = w_1 x_1 + … + w_N x_N - T.
Minimize ||w||^2 subject to the following constraints: t_n * y(x_n) >= 1 for every training sample n, where t_n = +1 or -1 is the class label (i.e. every point lies on its proper side of the line, at unit distance or more in the scaled units).
The margin is given by: 2 / ||w||, so minimizing ||w||^2 maximizes the margin.
Use Lagrange multipliers to solve this constrained optimization problem.
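To make the constrained optimization concrete, here is a brute-force sketch using a generic solver (real SVMs use Lagrange multipliers and dedicated quadratic-programming routines; the four toy points are invented):

```python
import numpy as np
from scipy.optimize import minimize

X = np.array([[-2.0, -1.0], [-1.0, -2.0], [1.0, 2.0], [2.0, 1.0]])
t = np.array([-1.0, -1.0, 1.0, 1.0])

def objective(p):                 # minimize ||w||^2, where p = (w_1, w_2, b)
    return p[:2] @ p[:2]

constraints = [{"type": "ineq",   # t_n * (w . x_n + b) - 1 >= 0 for each point
                "fun": lambda p, i=i: t[i] * (p[:2] @ X[i] + p[2]) - 1.0}
               for i in range(len(X))]

res = minimize(objective, x0=np.array([1.0, 1.0, 0.0]), constraints=constraints)
w, b = res.x[:2], res.x[2]        # b plays the role of -T from the slide
print(w, b, "margin =", 2.0 / np.linalg.norm(w))  # ~ (1/3, 1/3), 0, margin 3*sqrt(2)
```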

