
1 N.U.S. - January 13, 2006 Gert Lanckriet (gert@ece.ucsd.edu) U.C. San Diego Classification problems with heterogeneous information sources

2 Motivation
Statistical machine learning
–Blends statistics, computer science, signal processing, optimization
–Involves solving large-scale data analysis problems autonomously or in tandem with a human
Challenges:
–Massive scale of data sets
–On-line issues
–Diversity of information sources describing the data

3 Example: web-related applications
Data point = web page
Sources of information about the web page:
–Content: text, images, structure, sounds
–Relation to other web pages: links → network
–Users (log data): click behavior, origin

4 Example: web-related applications
Data point = web page
Sources of information about the web page:
–Content: text, images, structure, sounds
–Relation to other web pages: links → network
–Users (log data): click behavior, origin
Information comes in diverse (heterogeneous) formats

5 Example: bioinformatics
–mRNA expression data
–upstream region data (TF binding sites)
–protein-protein interaction data
–hydrophobicity data
–sequence data (gene, protein)

6 Overview
–Kernel methods
–Classification problems
–Kernel methods with heterogeneous information
–Classification with heterogeneous information (SDP)
–Applications in computational biology

7 Overview
–Kernel methods
–Classification problems
–Kernel methods with heterogeneous information
–Classification with heterogeneous information (SDP)
–Applications in computational biology

8 Kernel-based learning
(figure: data x_1 … x_n → embed data → linear algorithm: SVM, MPM, PCA, CCA, FDA, …)
If data are described by numerical vectors: embedding ~ (non-linear) transformation → non-linear versions of linear algorithms

9 Kernel-based learning
(figure: data x_1 … x_n → embed data → linear algorithm: SVM, MPM, PCA, CCA, FDA, …)
The embedding can be defined for non-vector data

10 Kernel-based learning
Embed the data IMPLICITLY: the kernel matrix entry K_ij is the inner product of the embedded data points i and j, and the inner product measures similarity
Property: any symmetric positive semidefinite matrix specifies a kernel matrix, and every kernel matrix is symmetric positive semidefinite
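
As a concrete aside (not from the slides), this property is easy to check numerically; a minimal sketch with a linear kernel:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((20, 5))   # 20 data points, 5 features
    K = X @ X.T                        # linear kernel: K[i, j] = <x_i, x_j>

    assert np.allclose(K, K.T)                    # K is symmetric
    assert np.linalg.eigvalsh(K).min() > -1e-10   # eigenvalues >= 0, up to round-off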

11 Kernel-based learning
(figure: data x_1 … x_n → embed data)

12 Kernel-based learning
(figure: data x_1 … x_n → kernel design → kernel matrix K → kernel algorithm, i.e. a linear algorithm: SVM, MPM, PCA, CCA, FDA, …)

13 Kernel methods
Unifying learning framework
–connections to statistics, convex optimization, functional analysis
–different data analysis problems can be formulated within this framework: classification, clustering, regression, dimensionality reduction
Many successful applications

14 Kernel methods
Unifying learning framework
–connections to statistics, convex optimization, functional analysis
–different data analysis problems can be formulated within this framework
Many successful applications
–hand-writing recognition
–text classification
–analysis of micro-array data
–face detection
–time series prediction

15 Binary classification
Training data: {(x_i, y_i)}_{i=1…n}
–x_i: description of the i-th object
–y_i ∈ {−1, +1}: label
Problem: design a classification rule such that, given a new x, it predicts y with minimal probability of error
(figure: two example patient records x_1, x_2 with fields HEART, URINE, DNA, BLOOD, SCAN and labels y_1 = −1, y_2 = +1)

16 Binary classification
Find a hyperplane that separates the two classes
Classification rule: which side of the hyperplane does x fall on?
(figure: the two example records x_1, x_2 separated by a hyperplane)

17 Maximal margin classification
Maximize the margin:
–position the hyperplane between the two classes
–such that the 2-norm distance to the closest point from each class is maximized
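
In standard notation (a reconstruction; the slide's equations are not in the transcript), the hard-margin training problem is

    \min_{w,b} \; \tfrac{1}{2}\|w\|_2^2 \quad \text{s.t.} \quad y_i (w^\top x_i + b) \ge 1, \quad i = 1, \dots, n,

whose optimal hyperplane has margin 2 / \|w\|_2.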

18 Maximal margin classification
If not linearly separable:
–allow some errors
–try to maximize the margin for the data points with no error

19 Maximal margin classification: training algorithm
Trade off maximizing the margin against minimizing the error
(figure: correctly classified points, error points, and slack relative to the margin)
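
In standard soft-margin form (again a reconstruction), slack variables \xi_i measure the errors and C trades margin against error:

    \min_{w,b,\xi} \; \tfrac{1}{2}\|w\|_2^2 + C \sum_{i=1}^{n} \xi_i
    \quad \text{s.t.} \quad y_i (w^\top x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0.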

20 Maximal margin classification
Training: a convex optimization problem (QP)
Dual problem (equation on slide)

21 Maximal margin classification
Training: a convex optimization problem (QP)
Dual problem and optimality condition (equations on slide)
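
Reconstructed in standard notation, the dual QP is

    \max_{\alpha} \; \sum_{i=1}^{n} \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i^\top x_j
    \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \quad \sum_{i} \alpha_i y_i = 0,

with optimality condition w = \sum_i \alpha_i y_i x_i.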

22 Maximal margin classification
Training (as above)
Classification rule: classify a new data point x (equation on slide)

23 Maximal margin classification
Training (as above)
Classification rule: classify a new data point x (equation on slide)
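
Reconstructed classification rule; note that a new point x enters only through inner products with training points, which is what allows the kernel trick:

    f(x) = \mathrm{sign}\Big( \sum_i \alpha_i y_i \, x_i^\top x + b \Big)
    \quad \longrightarrow \quad
    f(x) = \mathrm{sign}\Big( \sum_i \alpha_i y_i \, k(x_i, x) + b \Big).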

24 Kernel-based classification
(figure: data x_1 … x_n → kernel design → kernel matrix K → kernel algorithm, where the linear classification algorithm is the support vector machine (SVM))

25 Overview
–Kernel methods
–Classification problems
–Kernel methods with heterogeneous information
–Classification with heterogeneous information (SDP)
–Applications in computational biology

26 Kernel methods with heterogeneous info
Data points: proteins
Information sources: (figure: each source summarized in a kernel matrix with entries K_ij)

27 Kernel methods with heterogeneous info
Data points: proteins
Information sources: (figure: the sources summarized in a single kernel matrix K)

28 Kernel methods with heterogeneous data
Proposed approach
–First focus on every single source j of information individually
–Extract relevant information from source j into K_j
–Design algorithm to learn the optimal K, by “mixing” any number of kernel matrices K_j, for a given learning problem

29 Kernel methods with heterogeneous data
(figure: kernel matrices K_1 and K_2 combined into one kernel K)

30 Proposed approach
–First focus on every single source j of information individually
–Extract relevant information from source j into K_j
–Design algorithm that learns the optimal K, by “mixing” any number of kernel matrices K_j, for a given learning problem
Benefits: focus on kernel design for specific types of information; flexibility; can ignore information irrelevant to the learning task; homogeneous, standardized input

31 Kernel design: classical vector data
Data matrix:
–each row corresponds to a gene (data point)
–each column corresponds to an experiment (mRNA expression level)
Each gene is described by a vector of numbers

32 Kernel design: classical vector data
Inner product and normalized inner product (equations on slide): large for similar vectors, small for dissimilar ones
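
Reconstructed from the standard definitions (the slide's formulas are not in the transcript):

    k(x, z) = x^\top z, \qquad k_{\text{norm}}(x, z) = \frac{x^\top z}{\|x\|_2 \, \|z\|_2}.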

33 Kernel design: classical vector data
A more advanced similarity measure for vector data: the Gaussian kernel
Corresponds to a highly non-linear embedding
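
In its usual form (a reconstruction), the Gaussian kernel with width \sigma is

    k(x, z) = \exp\!\left( -\frac{\|x - z\|_2^2}{2\sigma^2} \right).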

34 (figure only)

35 Kernel design: strings
Data points: proteins, described by variable-length, discrete strings (amino acid sequences)
Kernel design: derive a valid similarity measure based on this non-vector information
Protein 1:
>ICYA_MANSE
GDIFYPGYCPDVKPVNDFDLSAFAGAWHEIAKLPLENENQGKCTIAEYKY
DGKKASVYNSFVSNGVKEYMEGDLEIAPDAKYTKQGKYVMTFKFGQRVVN
LVPWVLATDYKNYAINYMENSHPDKKAHSIHAWILSKSKVLEGNTKEVVD
NVLKTFSHLIDASKFISNDFSEAACQYSTTYSLTGPDRH
Protein 2:
>LACB_BOVIN
MKCLLLALALTCGAQALIVTQTMKGLDIQKVAGTWYSLAMAASDISLLDA
QSAPLRVYVEELKPTPEGDLEILLQKWENGECAQKKIIAEKTKIPAVFKI
DALNENKVLVLDTDYKKYLLFCMENSAEPEQSLACQCLVRTPEVDDEALE
KFDKALKALPMHIRLSFNPTQLEEQCHI

36 Kernel design: strings
String kernels quantify similarity between sequence pairs (the slide marks one pair as more similar and the other as less similar):
Pair 1:
>ICYA_MANSE
GDIFYPGYCPDVKPVNDFDLSAFAGAWHEIAKLPLENENQGKCTIAEYKY
DGKKALVLDTDVSNGVKEYMENSLEIAPDAKYTKQGKYVMTFKFGQRVVN
LVPWVLATDYKNYAINYNCDYHPDKKAHSIHAWILSKSKVLEGNTKEVVD
NVLKTFSHLIDASKFISNDFSEAACQYSTTYSLTGPDRH
>LACB_BOVIN
MKCLLLALALTCGAQALIVTQTMKGLDIQKVAGTWYSLAMAASDISLLDA
QSAPLRVYVEELKPTPEGDLEILLQKWENGECAQKKIIAEKTKIPAVFKI
DALNENKVLVLDTDYKKYLLFCMENSAEPEQSLACQCLVKYVNTFKEALE
KFDKALKALPMHIRLSFNPTQLEEQCHI
Pair 2:
>ICYA_JAKSE
GDIFYPGYCPDVKPVNDFDLSAFAGAWHEIAKLPLENENQGKCTIAEYKY
DGKKASVYNSFVSNGVKEYMEGDLEIAPDAKYTKQGKYVMTFKFGQRVVN
LVPWVLATDYKNYAINYNCDYHPDKKAHSIHAWILSKSKVLEGNTKEVVD
NVLKTFSHLIDASKFISNDFSEAACQYSTTYSLTGPDRH
>LACB_BOVIN
MKCLLLALALTCGAQALIVTQTMKGLDIQKVAGTWYSLAMAASDISLLDA
QSAPLRVYVEELKPTPEGDLEILLQKWENGECAQKKIIAEKTKIPAVFKI
DALNENKVLVLDTDYKKYLDYCMENSAEPEQSLACQCLVRTPEVDDEALE
KFDKALKALPMHIRLSFNPTQLEEQCHI
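
The slides do not say which string kernel was used; as an illustration, a simple k-spectrum kernel counts shared k-mers:

    from collections import Counter

    def spectrum_kernel(s: str, t: str, k: int = 3) -> int:
        # k(s, t) = sum over k-mers a of (count of a in s) * (count of a in t)
        cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
        ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
        return sum(cs[a] * ct[a] for a in cs)

    # toy usage on short fragments of the sequences above
    print(spectrum_kernel("GDIFYPGYCPDVKPVNDFDLSAFAGAWHEIAK",
                          "MKCLLLALALTCGAQALIVTQTMKGLDIQKVA"))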

37 Kernel design: graph
Data points: vertices; information: connectivity described by a graph
Diffusion kernel: establishes similarities between the vertices of a graph, based on the connectivity information
–based on a random walk
–efficiently accounts for all paths connecting two vertices, weighted by path lengths
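
A minimal sketch, assuming the diffusion kernel of Kondor and Lafferty is meant: K = exp(beta * H) with H = A − D, the negative graph Laplacian:

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)  # adjacency of a 4-vertex path graph
    H = A - np.diag(A.sum(axis=1))             # negative Laplacian: A - D
    K = expm(1.0 * H)                          # beta = 1.0 sets the diffusion time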

38 Kernel methods with heterogeneous data
(figure: kernel matrices K_1 and K_2 to be combined into K; but how?)

39 Learning the kernel matrix K
Any symmetric positive semidefinite matrix specifies a kernel matrix → define a cost function to assess the quality of a kernel matrix
Positive semidefinite matrices form a convex cone → restrict to convex cost functions
Learn K from the convex cone of positive semidefinite matrices, according to a convex quality measure

40 Learning the kernel matrix K
Learn K from the convex cone of positive semidefinite matrices, according to a convex quality measure
Semidefinite programming (SDP) deals with optimizing convex cost functions over the convex cone of positive semidefinite matrices (or a convex subset of it)

41 Classification with multiple kernels
Integrate the constructed kernels into K
Large margin classifier (SVM)
Learn K from the convex cone of positive semidefinite matrices (or a convex subset), according to a convex quality measure

42 Classification with multiple kernels
Integrate the constructed kernels: learn a linear combination
Large margin classifier (SVM)
Learn K from the convex cone of positive semidefinite matrices (or a convex subset), according to a convex quality measure
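
Concretely, following the formulation in the associated paper (Lanckriet et al., JMLR 2004), K is parameterized as

    K = \sum_{j=1}^{m} \mu_j K_j, \qquad \mu_j \ge 0, \qquad \mathrm{trace}(K) = c;

nonnegative weights keep K positive semidefinite, and the trace constraint fixes its scale.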

43 Classification with multiple kernels
Integrate the constructed kernels: learn a linear combination
Large margin classifier (SVM): maximize the margin (this is the convex quality measure)

44 Classification with multiple kernels
SVM, one kernel: dual formulation (equation on slide)
SVM, multiple kernels: dual formulation (equation on slide)
Convex (pointwise max of a set of convex functions) → semidefinite programming problem
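
Reconstructed from the standard development: the kernel weights are optimized against the soft-margin dual, giving a min-max problem whose inner objective is linear in \mu, hence the pointwise-max convexity noted on the slide:

    \min_{\mu_j \ge 0,\; \mathrm{trace}(\sum_j \mu_j K_j) = c} \;\;
    \max_{0 \le \alpha \le C,\; \alpha^\top y = 0} \;\;
    \sum_i \alpha_i - \tfrac{1}{2}\, \alpha^\top \mathrm{diag}(y) \Big( \sum_j \mu_j K_j \Big) \mathrm{diag}(y)\, \alpha.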

45 Classification with multiple kernels
SVM, one kernel: dual formulation; SVM, multiple kernels: dual formulation
Need to reformulate the latter in standard SDP format

46 Classification with multiple kernels
Integrate the constructed kernels: learn a linear mix
Large margin classifier (SVM): maximize the margin
Result: an SDP (standard form)
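
For reference, the standard form of a semidefinite program, into which the above is cast:

    \min_{u} \; c^\top u \quad \text{s.t.} \quad F_0 + u_1 F_1 + \dots + u_q F_q \succeq 0, \quad A u = b,

where the F_i are symmetric matrices and \succeq 0 denotes positive semidefiniteness.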

47 Classification with multiple kernels
Integrate the constructed kernels: learn a linear mix
Large margin classifier (SVM): maximize the margin
Theoretical performance guarantees

48 Applications in computational biology
–Yeast membrane protein prediction
–Yeast protein function prediction

49 Yeast Membrane Protein Prediction
Membrane proteins:
–anchor in various cellular membranes
–serve important communicative functions across the membrane
–are important drug targets
About 30% of the proteins are membrane proteins

50 Yeast Membrane Protein Prediction
Information sources:
–protein sequences: SW scores
–protein sequences: BLAST scores
–E-values of Pfam domains
–protein-protein interactions → diffusion kernel
–mRNA expression profiles → Gaussian kernel
–hydropathy profile

51 Yeast Membrane Protein Prediction
All sources are combined into a single kernel matrix K:
–protein sequences: SW scores
–protein sequences: BLAST scores
–E-values of Pfam domains
–protein-protein interactions
–mRNA expression profiles
–hydropathy profile
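
A minimal end-to-end sketch with hypothetical data and hand-fixed weights mu (not the SDP-learned ones): once K = sum_j mu_j K_j is formed, any SVM that accepts a precomputed kernel can train on it:

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    n = 40
    X1 = rng.standard_normal((n, 5))    # "source 1" features (hypothetical)
    X2 = rng.standard_normal((n, 3))    # "source 2" features (hypothetical)
    y = np.where(X1[:, 0] > 0, 1, -1)   # toy labels

    K1, K2 = X1 @ X1.T, X2 @ X2.T       # one kernel matrix per source
    mu = [0.7, 0.3]                     # weights fixed by hand, not learned here
    K = mu[0] * K1 + mu[1] * K2         # combined kernel

    clf = SVC(kernel="precomputed", C=1.0).fit(K, y)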

52 (figure only)

53 Yeast Protein Function Prediction
Five different types of data:
–Pfam domains
–genetic interactions (CYGD)
–physical interactions (CYGD)
–protein-protein interactions (TAP)
–mRNA expression profiles
Compare our approach to an approach using Markov random fields (Deng et al.)
–using the five types of data
–also reporting improved accuracy compared to using any single data type

54 Yeast Protein Function Prediction
(figure: results comparing MRF, SDP/SVM (binary), and SDP/SVM (enriched))

55 Conclusion
Computational and statistical framework to integrate data from heterogeneous information sources
–flexible and unified approach
–within the kernel methodology
–specifically: classification problems
–resulting formulation: semidefinite programming
Applications show that classification performance can be enhanced by integrating diverse genome-wide information sources

