Delbert Dueck Department of Electrical & Computer Engineering University of Toronto July 30, 2008 Society for Mathematical Biology Conference Affinity Propagation: Clustering by Passing Messages Between Data Points



Caravaggio’s “Vocazione di San Matteo” (The Calling of St. Matthew), an interpretation of affinity propagation by Marc Mézard (Laboratoire de Physique Théorique et Modèles Statistiques, Paris): where is the exemplar? Affinity Propagation: Clustering by Passing Messages Between Data Points. Delbert Dueck, Probabilistic and Statistical Inference Lab, Electrical & Computer Engineering, University of Toronto. July 30, 2008, Society for Mathematical Biology Conference.

Exemplar-based clustering
TASK: Identify a subset of data points as exemplars and assign every other data point to one of those exemplars.
INPUTS: A set of real-valued pairwise similarities, {s(i,k)}, between data points, and either the number of exemplars (K) or a real-valued exemplar cost.
OUTPUT: A subset of exemplar data points and an assignment of every other point to an exemplar.
OBJECTIVE FUNCTION: Maximize the sum of similarities between data points and their exemplars, minus the exemplar costs.
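This objective is easy to evaluate directly; a minimal NumPy sketch, where the function name, array layout, and toy similarities are illustrative assumptions and preferences (negative exemplar costs) sit on the diagonal of S:

```python
import numpy as np

def net_similarity(S, labels):
    """Evaluate the exemplar-based clustering objective.

    S:      N x N similarity matrix, with exemplar preferences
            (negative exemplar costs) stored on the diagonal.
    labels: labels[i] is the index of point i's exemplar;
            exemplars point to themselves.
    """
    N = len(labels)
    exemplars = np.flatnonzero(labels == np.arange(N))
    members = np.setdiff1d(np.arange(N), exemplars)
    # Sum of similarities of points to their exemplars...
    score = S[members, labels[members]].sum()
    # ...minus the exemplar costs (here: plus the diagonal preferences).
    score += S[exemplars, exemplars].sum()
    return score

# Toy example: point 0 is the exemplar for all three points.
S = np.array([[-1.0, -5.0, -5.0],
              [-2.0, -1.0, -9.0],
              [-3.0, -9.0, -1.0]])
print(net_similarity(S, np.array([0, 0, 0])))  # -6.0
```

Any candidate exemplar set and assignment can be scored this way, which is what the comparisons later in the talk measure.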

Exemplar-based clustering: why is this an important problem?
- User-specified similarities offer a large amount of flexibility
- The clustering algorithm can be uncoupled from the details of how similarities are computed
- There is potential for significant improvement on existing algorithms

Greedy method: k-medians clustering
1. Randomly choose initial exemplars (data centers)
2. Assign data points to nearest centers
3. For each cluster, pick the best new center
4. Repeat steps 2–3 until convergence: the final set of exemplars (centers)
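This greedy loop is short enough to sketch; a minimal NumPy version operating on a similarity matrix, where the function name, the medoid-style center update, and the convergence test are illustrative assumptions:

```python
import numpy as np

def k_medians(S, K, rng=np.random.default_rng(0), iters=100):
    """Greedy k-medians (medoid-style) clustering on a similarity matrix S."""
    N = S.shape[0]
    centers = np.array(sorted(rng.choice(N, size=K, replace=False)))
    for _ in range(iters):
        # Assign every data point to its most similar center.
        labels = centers[np.argmax(S[:, centers], axis=1)]
        # For each cluster, pick the best new center: the member with the
        # greatest total similarity to the rest of its cluster.
        new_centers = []
        for c in np.unique(labels):
            members = np.flatnonzero(labels == c)
            within = S[np.ix_(members, members)].sum(axis=0)
            new_centers.append(members[np.argmax(within)])
        new_centers = np.array(sorted(new_centers))
        if np.array_equal(new_centers, centers):  # converged
            break
        centers = new_centers
    return centers, centers[np.argmax(S[:, centers], axis=1)]
```

Because the initial exemplars are chosen at random, the method only finds a local optimum, which is why the experiments below run it with huge numbers of random restarts.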

Affinity Propagation How well does k -medians clustering work?

Example: Olivetti face images. The Olivetti face database contains 400 greyscale 64×64 images of 40 people. Similarity is based on sum-of-squared distance using a central 50×50 pixel window. The problem is small enough to find the exact solution.

Olivetti faces: squared error achieved by ONE MILLION runs of k-medians clustering. [Plot: squared error vs. number of clusters, k, comparing the exact solution (LP relaxation plus days of computation) against k-medians clustering with one million random restarts for each k.]

Affinity Propagation Closing the performance gap: AFFINITY PROPAGATION

Science, 16 Feb 2007; joint work with Brendan Frey. One-sentence summary: all data points are simultaneously considered as exemplars, but exchange deterministic messages while a good set of exemplars gradually emerges.

Affinity Propagation: visualization

Affinity Propagation
TASK: Identify a subset of data points as exemplars and assign every other data point to one of those exemplars.
INPUTS: A set of pairwise similarities, {s(i,k)}, where s(i,k) is a real number indicating how well-suited data point k is as an exemplar for data point i, e.g. s(i,k) = −‖x_i − x_k‖², i ≠ k (the similarities need not be metric!); and, for each data point k, a real number s(k,k) indicating the a priori preference that it be chosen as an exemplar, e.g. s(k,k) = p ∀ k.
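Constructing these inputs takes only a few lines; a minimal NumPy sketch, where the toy data and the median-of-off-diagonal-similarities choice of shared preference are illustrative assumptions:

```python
import numpy as np

# Hypothetical 2-D data points; any similarity works, and it need not be metric.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])

# s(i,k) = -||x_i - x_k||^2, the negative squared Euclidean distance.
S = -np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)

# Shared preference s(k,k) = p on the diagonal; the median of the
# off-diagonal similarities yields a moderate number of exemplars.
p = np.median(S[~np.eye(len(X), dtype=bool)])
np.fill_diagonal(S, p)
```

Larger (less negative) preferences produce more exemplars, smaller preferences fewer, which is how the number of clusters is controlled.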

Affinity Propagation: message-passing. Affinity propagation can be viewed as data points exchanging messages amongst themselves. It can be derived as belief propagation (max-product) on a completely-connected factor graph. [Diagram: sending responsibilities, r — data point i sends r(i,k) to candidate exemplar k, informed by availabilities a(i,k′) from competing candidate exemplars k′; sending availabilities, a — candidate exemplar k sends a(i,k) to data point i, informed by responsibilities r(i′,k) from supporting data points i′.]

Affinity Propagation: update equations
Sending responsibilities (data point i → candidate exemplar k, accounting for availabilities a(i,k′) from competing candidate exemplars k′):
  r(i,k) ← s(i,k) − max_{k′≠k} [ a(i,k′) + s(i,k′) ]
Sending availabilities (candidate exemplar k → data point i, accounting for responsibilities r(i′,k) from supporting data points i′):
  a(i,k) ← min{ 0, r(k,k) + Σ_{i′∉{i,k}} max[0, r(i′,k)] },  for i ≠ k
  a(k,k) ← Σ_{i′≠k} max[0, r(i′,k)]
Making decisions: point i chooses as its exemplar the k that maximizes a(i,k) + r(i,k).
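The responsibility and availability updates can be sketched directly in NumPy; a minimal version with damping but without the degeneracy-breaking noise of the full implementation (function name, iteration count, and decision step are illustrative assumptions, not the reference code):

```python
import numpy as np

def affinity_propagation(S, iters=200, damping=0.5):
    """Sketch of the affinity propagation message updates."""
    N = S.shape[0]
    R = np.zeros((N, N))
    A = np.zeros((N, N))
    for _ in range(iters):
        # Responsibilities: r(i,k) <- s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        idx = np.argmax(AS, axis=1)                # strongest competitor per row
        first = AS[np.arange(N), idx]
        AS[np.arange(N), idx] = -np.inf
        second = np.max(AS, axis=1)                # runner-up per row
        Rnew = S - first[:, None]
        Rnew[np.arange(N), idx] = S[np.arange(N), idx] - second
        R = damping * R + (1 - damping) * Rnew     # dampen responsibilities
        # Availabilities: a(i,k) <- min{0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k))}
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, np.diag(R))
        Anew = Rp.sum(axis=0)[None, :] - Rp
        dA = np.diag(Anew).copy()                  # a(k,k): self-availability
        Anew = np.minimum(Anew, 0)
        np.fill_diagonal(Anew, dA)
        A = damping * A + (1 - damping) * Anew     # dampen availabilities
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    labels = exemplars[np.argmax(S[:, exemplars], axis=1)]
    labels[exemplars] = exemplars
    return exemplars, labels
```

On well-separated toy data with the preference set to the median similarity, this finds one exemplar per cluster.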

Affinity Propagation: MATLAB code

N=size(S,1); A=zeros(N,N); R=zeros(N,N);       % initialize messages
S=S+1e-12*randn(N,N)*(max(S(:))-min(S(:)));    % remove degeneracies
lam=0.5;                                       % set damping factor
for iter=1:100,
  Rold=R;                                      % NOW COMPUTE RESPONSIBILITIES
  AS=A+S; [Y,I]=max(AS,[],2);
  for i=1:N, AS(i,I(i))=-realmax; end;
  [Y2,I2]=max(AS,[],2);
  R=S-repmat(Y,[1,N]);
  for i=1:N, R(i,I(i))=S(i,I(i))-Y2(i); end;
  R=(1-lam)*R+lam*Rold;                        % dampen responsibilities
  Aold=A;                                      % NOW COMPUTE AVAILABILITIES
  Rp=max(R,0); for k=1:N, Rp(k,k)=R(k,k); end;
  A=repmat(sum(Rp,1),[N,1])-Rp;
  dA=diag(A); A=min(A,0); for k=1:N, A(k,k)=dA(k); end;
  A=(1-lam)*A+lam*Aold;                        % dampen availabilities
end;
E=R+A;                                         % pseudomarginals
I=find(diag(E)>0); K=length(I);                % indices of exemplars
[tmp c]=max(S(:,I),[],2); c(I)=1:K; idx=I(c);  % assignments

More code available at

Recall Olivetti faces: squared error achieved by 1 million runs of k-medians clustering. [Plot: squared error vs. number of clusters, K, for the exact solution (LP relaxation plus days of computation) and k-medians clustering with one million random restarts for each K.]

Olivetti faces: squared error achieved by Affinity Propagation. [Plot: as before, with one added curve — affinity propagation, one run, 1000 times faster than the 10^6 k-medians runs.]

A survey of applications investigated by other researchers and developers:
- VQ codebook design, Jiang et al., 2007
- Image segmentation, Xiao et al., 2007
- Object classification, Fu et al., 2007
- Finding light sources using images, An et al., 2007
- Microarray analysis, Leone et al., 2007
- Computer network analysis, Code et al., 2007
- Audio-visual data analysis, Zhang et al., 2007
- Protein sequence analysis, Wittkop et al., 2007
- Protein clustering, Lees et al., 2007
- Analysis of cuticular hydrocarbons, Kent et al., 2007
- …

Affinity Propagation: Applications in Bioinformatics

Detecting transcripts (genes) using microarray data (data from Frey et al., Nature Genetics 2005)
[Heat map: DNA activity (low to high) across mouse tissues vs. position in DNA, with example segments i and k marked.]
s(segment i, segment k) = similarity of expression patterns (columns) minus distance between segments in the DNA/genome
s(segment i, garbage) = tunable constant
# segments = 76,000 for chromosome 1
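A similarity of this shape is easy to sketch; the exact functional forms below (squared-difference expression similarity, linear genomic-distance penalty, weight lam) are illustrative assumptions, not the published definition:

```python
import numpy as np

def segment_similarity(expr, pos, lam=1.0):
    """s(i,k): expression-pattern similarity minus a genomic-distance penalty.

    expr: (num_segments, num_tissues) array of expression profiles
    pos:  (num_segments,) genomic coordinates of the segments
    lam:  weight of the distance penalty (illustrative)
    """
    # Similarity of expression patterns across tissues (columns)...
    expr_sim = -np.sum((expr[:, None, :] - expr[None, :, :]) ** 2, axis=2)
    # ...minus the distance between segments along the DNA.
    dist = np.abs(pos[:, None] - pos[None, :])
    return expr_sim - lam * dist
```

The slide's s(segment i, garbage) would be handled separately, as one tunable constant similarity to an artificial "garbage" exemplar.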

Detecting transcripts (genes) using microarray data (data from Frey et al., Nature Genetics 2005)
[Plots: gene reconstruction error vs. number of clusters (“genes”), and true positives (%) vs. false positive rate (%) relative to REFSEQ annotations, comparing affinity propagation, k-medians clustering (10,000 runs), and random guessing.]

Application #2: Yeast gene-deletion strains (presented at RECOMB 2008)
Gene–drug interactions for 1259 drugs on 5985 yeast genes; threshold to a binary interaction matrix.
GOAL: find a small query set of genes on which new drugs could be tested, to predict interactions for the non-query genes. Hold out 10% of drugs as a test set.
s(i,k) = # drugs interacting with both gene i and gene k
[Diagram: binary matrix of yeast genes × drugs, split into training and test drugs, with the query set of genes highlighted.]
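For a 0/1 interaction matrix, this similarity is just a matrix product; a small sketch with a hypothetical matrix M (rows = genes, columns = drugs):

```python
import numpy as np

# Hypothetical binarized interaction matrix: rows = genes, columns = drugs.
M = np.array([[1, 1, 0, 1],
              [1, 0, 0, 1],
              [0, 0, 1, 0]])

# s(i,k) = number of drugs interacting with both gene i and gene k;
# for a 0/1 matrix this is simply the Gram matrix M M^T.
S = M @ M.T
```

The diagonal (each gene's own interaction count) would then be overwritten by the exemplar preferences before clustering.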

Application #2: Yeast gene-deletion strains
[Plot: net similarity (interactions correctly predicted on training data) vs. K (number of strain representatives), comparing affinity propagation with k-medians clustering using the best of 10, 100, 1000, 10,000, and 100,000 restarts.]

Application #2: Yeast gene-deletion strains
[Plot: sensitivity (proportion of interactions correctly predicted in test data) vs. specificity (proportion of non-interactions correctly predicted in test data), for the same methods.]

Application #3: HIV vaccine design (presented at RECOMB 2008)
Some data points are potential treatments (T), corresponding to HIV strain sequences, e.g.
· · · MGARASVLSGGELDRWEKIRLRPGGKKKYQLKHIVWASRELERF · · ·
Other data points are targets (R), corresponding to epitopes that the immune system responds to: the overlapping 9-mer sequence fragments of each strain (MGARASVLS, GARASVLSG, ARASVLSGG, …).
s(T,R) is the similarity of treatment strain T to target fragment R.
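Extracting the overlapping 9-mer targets from a strain is straightforward; a minimal sketch (kmers is a hypothetical helper name):

```python
def kmers(seq, k=9):
    """All overlapping k-mers of a sequence (9-mers here, as on the slide)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

# The first few targets of the example strain fragment:
print(kmers("MGARASVLSGG"))  # ['MGARASVLS', 'GARASVLSG', 'ARASVLSGG']
```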

Application #3: HIV vaccine design
The net similarity of a vaccine portfolio is its coverage: the fraction of database 9-mers the vaccine contains. The highest-possible coverage comes from artificially constructed strains, e.g. Mosaics (Fischer et al., Nature Medicine 2006).
[Table: coverage vs. vaccine portfolio size (K) for natural strains, artificial Mosaic strains (upper bound), affinity propagation, and the greedy method (k-medians variant); recovered pairs of entries for increasing K: 77.34% / 80.84%, 80.14% / 82.74%, 81.62% / 83.64%, 83.53% / 84.83%.]
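Coverage as defined here can be computed from 9-mer sets; a minimal sketch (coverage is a hypothetical helper, and the toy 10-letter strains stand in for real sequences):

```python
def coverage(portfolio, database, k=9):
    """Fraction of the database's k-mers contained in at least one
    portfolio strain (the net similarity of a vaccine portfolio)."""
    db = {s[i:i + k] for s in database for i in range(len(s) - k + 1)}
    cov = {s[i:i + k] for s in portfolio for i in range(len(s) - k + 1)}
    return len(db & cov) / len(db)
```

Artificial Mosaic strains can exceed any natural portfolio because their 9-mer sets are optimized directly rather than restricted to observed strains.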

Summary
- Exemplar-based clustering offers flexibility in choosing similarities between data points, e.g. for non-Euclidean, discrete, or non-metric data spaces
- Affinity propagation achieves better clustering solutions than other methods; the number of exemplars, K, is automatically determined; simple update equations make for easy implementation
- FAST: # binary scalar operations ∝ # input similarities
- Many applications in bioinformatics: microarray data, yeast gene-deletion strains, HIV vaccine design

Acknowledgements
Affinity Propagation: Brendan J. Frey (Electrical & Computer Engineering, University of Toronto)
Detecting transcripts (genes) using microarray data: Tim Hughes + lab (Banting & Best Department of Medical Research, University of Toronto)
Yeast gene-deletion strains: Andrew Emili, Gabe Musso, Guri Giaever (Banting & Best Department of Medical Research, University of Toronto)
HIV vaccine design: Nebojsa Jojic, Vladimir Jojic (Microsoft Research)
Funding for this work provided by:

Affinity Propagation QUESTIONS?

Affinity Propagation

Comparison of affinity propagation, linear programming, the VSH, and k-medians clustering (400 Olivetti face images). [Plot: error vs. number of clusters (k) for the linear program (exact), affinity propagation, the VSH, and k-medians.]

Error and timing comparison of affinity propagation and the VSH (Results from Brusco & Kohn and Frey & Dueck)

Selecting the “right” number of centers. Preferences influence the number of detected centers. Does affinity propagation find the proper number of centers? Yes.