Clustering of DNA Microarray Data
Michael Slifker
CIS 526

DNA Microarrays
Measure gene expression in a sample for thousands of genes simultaneously
Used to compare gene expression among samples
–Between individuals or treatments
–Over time
–Between normal tissue and tumor
–Assess normal biological variation

Microarray Process
Single-stranded DNA is printed onto slide
Extract mRNA from cells
Experimental mRNA sample & reference sample are fluorescently labelled (Cy3-green, Cy5-red)
RNA samples are hybridized onto slide and bind to complementary DNA
Laser scanning: fluorescent labels allow relative levels of bound mRNA to be measured
Gridding, background correction, log-ratio transformation, normalization, analysis (finally!)
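The background correction, log-ratio transformation, and normalization steps named above can be sketched in a few lines. This is only an illustration under stated assumptions: the intensities are made up, a single constant background is subtracted from every spot, and normalization is a simple global median centering. The slides do not say which correction or normalization method was used, and the function names are hypothetical.

```python
import math
from statistics import median

def log_ratios(experimental, reference, background=0.0):
    """Return log2(experimental / reference) per spot after background subtraction.

    Spots whose corrected intensity is not positive in either channel
    are returned as None and treated as missing.
    """
    ratios = []
    for exp, ref in zip(experimental, reference):
        exp_c, ref_c = exp - background, ref - background
        if exp_c > 0 and ref_c > 0:
            ratios.append(math.log2(exp_c / ref_c))
        else:
            ratios.append(None)
    return ratios

def median_center(ratios):
    """Global median normalization: shift so the median log ratio is 0."""
    observed = [r for r in ratios if r is not None]
    shift = median(observed)
    return [None if r is None else r - shift for r in ratios]

# Made-up scanner intensities for five spots (assumption, not real data).
experimental = [1100.0, 500.0, 2100.0, 150.0, 90.0]
reference = [600.0, 500.0, 350.0, 250.0, 1700.0]

raw = log_ratios(experimental, reference, background=100.0)
normalized = median_center(raw)
print(raw)
print(normalized)
```

The log transform makes two-fold up and two-fold down changes symmetric around zero (+1 and -1), which is why distances are usually computed on log ratios rather than raw ratios.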

Red = low expression relative to reference
Green = high expression relative to reference
Yellow = similar expression in two samples
Black = no expression in either sample

Example
DLBCL: diffuse large B-cell lymphoma (Alizadeh et al., 2000)
~18,000 genes x 96 samples of normal and malignant leukocytes
Clinical evidence of great heterogeneity in terms of survival
Question: Are there subclasses of DLBCL that can be discovered by looking at gene expression profiles? (Answer: yes)

Why cluster?
Very large numbers of genes and highly complex systems/pathways render clustering essential for interpretation and visualization
Discover new tumor subclasses
Describe common expression profiles (e.g., cell cycle)

What to cluster?
Clustering genes:
–Look for groups of genes with similar expression profiles; may identify genes that are involved in biochemical pathways
Clustering samples:
–Do clusters conform to known categories?
–Can new structure be discovered (e.g., new subclasses of tumor)?
Clustering both genes and samples at once

Clustering methods
Hierarchical (agglomerative): most common
K-means, PAM (partitioning around medoids)
Self-organizing maps (SOMs)
PCA clustering
Ensemble methods
“Fuzzy” methods: genes can belong to more than one cluster
Model-based methods (e.g., mixtures of Gaussians)
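As an illustration of the first method in the list, here is a minimal sketch of agglomerative hierarchical clustering. It assumes Euclidean distance and average linkage, neither of which the slides prescribe, and uses five made-up expression profiles. The naive search for the closest pair is for readability only; real analyses would use an optimized library routine.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(c1, c2, profiles, dist):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(dist(profiles[i], profiles[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerate(profiles, n_clusters, dist=euclidean):
    """Start with singletons and merge the two closest clusters
    until only n_clusters remain. Returns lists of profile indices."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], profiles, dist)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Made-up log-ratio profiles for five genes across four samples (assumption).
profiles = [
    [2.0, 1.8, -1.5, -1.7],
    [1.9, 2.1, -1.4, -1.9],
    [-1.6, -1.8, 2.0, 1.7],
    [-1.5, -2.0, 1.8, 1.9],
    [0.1, -0.1, 0.0, 0.2],
]
clusters = agglomerate(profiles, n_clusters=3)
print(clusters)
```

To cluster samples instead of genes, the same function can be applied to the transposed matrix, for example `[list(col) for col in zip(*profiles)]`.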

Challenges
Noisy data in high-dimensional space
Many choices of algorithm and algorithmic parameters
–What distance measure?
–What linkage?
–How many clusters?
How can we assess quality/reliability?
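The "What distance measure?" question can be made concrete with a small comparison. The sketch below contrasts Euclidean distance with a correlation-based distance (1 minus the Pearson correlation) on three made-up profiles. The choice of these two measures is an assumption for illustration; the slides do not name specific ones.

```python
import math

def euclidean_distance(a, b):
    """Sensitive to the magnitude of expression changes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson_distance(a, b):
    """1 - Pearson correlation: 0 for identical shape, 2 for opposite shape.

    Assumes neither profile is constant (zero variance would divide by zero).
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)

# Made-up profiles (assumption): b has the same shape as a at ten times
# the amplitude; c has similar magnitude to a but the opposite trend.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [10.0, 20.0, 30.0, 40.0]
gene_c = [4.0, 3.0, 2.0, 1.0]

for name, other in (("b", gene_b), ("c", gene_c)):
    print(name,
          round(euclidean_distance(gene_a, other), 3),
          round(pearson_distance(gene_a, other), 3))
```

Under Euclidean distance gene a is nearer to c, while under correlation distance it is nearer to b, so the two measures would group these genes differently.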

Two main sample clusters can be seen
Genes correspond to two different types of B-cell
Clusters are associated with differential survival beyond traditional clinical indicators

Conclusions
To be useful, clustering of microarray data must ultimately be informed by biology
The large number of genes and the complexity of pathways mean clustering is an essential part of most microarray analyses
There is no “best” method: results depend on the choices of distance, linkage, algorithm, and gene filtering criteria
As much art as science
