Functional Genomics - clustering - classification - promoter analysis - expander tool - example - biclustering.

cDNA Microarrays

Synthesis of Ordered Oligonucleotide Arrays [Figure: light-directed synthesis on a wafer. Light passing through a mask deprotects selected positions, a nucleotide (first T, then C) is coupled there, and the cycle repeats.]

The Raw Data. Expression levels ("Raw Data") form a matrix of genes × experiments. Entries of the Raw Data matrix: ratio values, absolute values, distributions… Row = gene's expression pattern / fingerprint vector. Column = experiment/condition's profile.

Computational Challenges. Normalization: how does one best normalize thousands of signals from the same or different conditions/experiments? Differential expression: identify differentially expressed genes between experiments. Clustering: partition genes into subsets that manifest a similar expression pattern. Classification: given a partition of the conditions into types, classify the types of new conditions. Biclustering: find subsets of genes and conditions that manifest a common expression sub-pattern.

Clustering: Objective. Group elements (genes) into clusters satisfying: Homogeneity: elements inside a cluster are highly similar to each other. Separation: elements from different clusters have low similarity to each other. This needs formal objective functions; most useful versions are NP-hard.

K-means clustering (Lloyd 57, MacQueen 65). Initialize an arbitrary partition P into k clusters. For cluster j and element i ∉ j, let $E_P(i,j)$ be the cost of the solution if i is moved to cluster j. Pick the $E_P(r,s)$ that is minimum; move element r to cluster s if this improves the cost. Repeat until no improvement is possible. Requires knowledge of k. k-median problem: the cost is $\sum_{\text{clusters } p} \sum_{i \in p} d(v_i, c_p)$.
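The single-move local search described on this slide can be sketched roughly as follows. This is a minimal illustration, not the lecturer's implementation: it assumes the cost is the sum of squared Euclidean distances to each cluster centroid, and the names `partition_cost` and `greedy_kmeans` are hypothetical. Moves that would empty a cluster are skipped so that k stays fixed.

```python
import numpy as np

def partition_cost(X, labels, k):
    """Sum of squared distances of every element to its cluster centroid."""
    cost = 0.0
    for p in range(k):
        members = X[labels == p]
        if len(members):
            cost += ((members - members.mean(axis=0)) ** 2).sum()
    return cost

def greedy_kmeans(X, labels, k):
    """Repeatedly apply the single best improving move (element i -> cluster j)."""
    labels = labels.copy()
    current = partition_cost(X, labels, k)
    while True:
        best_cost, best_i, best_j = current, None, None
        for i in range(len(X)):
            home = labels[i]
            if (labels == home).sum() == 1:
                continue  # do not empty a cluster
            for j in range(k):
                if j == home:
                    continue
                labels[i] = j  # tentatively move i to cluster j
                cost = partition_cost(X, labels, k)
                if cost < best_cost - 1e-12:
                    best_cost, best_i, best_j = cost, i, j
            labels[i] = home  # undo the tentative move
        if best_i is None:
            return labels, current  # no improving move is left
        labels[best_i] = best_j
        current = best_cost
```

Recomputing the full cost for every candidate move keeps the sketch short; a practical version would update the cost incrementally.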

Hierarchical Clustering. Form a tree hierarchy of the input elements satisfying: more similar elements are closer along the tree; tree distances reflect element similarity. No explicit partition into clusters. Hierarchical Representations: [Figure: tree representations over elements 1, 3, 4, 2; heights 2.8, 4.5, 5.0.]

Hierarchical Clustering: Average Linkage. Input: distance matrix $D_{ij}$; initially each element is a cluster. $n_r$ = size of cluster r. Find the minimum element $D_{rs}$ in D; merge clusters r, s. Delete elements r, s; add a new element t with $D_{it} = D_{ti} = \frac{n_r}{n_r+n_s} D_{ir} + \frac{n_s}{n_r+n_s} D_{is}$. Repeat.

Hierarchical Clustering: A General Framework. Input: distance matrix $D_{ij}$; initially each element is a cluster. Find the minimum element $D_{rs}$ in D; merge clusters r, s. Delete elements r, s; add a new element t with $D_{it} = D_{ti} = \alpha_r D_{ir} + \alpha_s D_{is} + \gamma |D_{ir} - D_{is}|$.
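A minimal sketch of the average-linkage procedure, assuming a symmetric distance matrix given as a list of lists. The function name `average_linkage`, the toy matrix in the usage note, and the convention of numbering merged clusters n, n+1, ... are illustrative choices, not taken from the slides.

```python
def average_linkage(D):
    """Return the merges as (r, s, distance, new_cluster_size) tuples."""
    n = len(D)
    size = {i: 1 for i in range(n)}  # active clusters and their sizes
    dist = {(i, j): D[i][j] for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(size) > 1:
        # Find the closest pair of active clusters.
        (r, s), d_rs = min(dist.items(), key=lambda kv: kv[1])
        n_r, n_s = size[r], size[s]
        t = next_id
        next_id += 1
        # Distance from every other cluster i to the merged cluster t.
        for i in size:
            if i in (r, s):
                continue
            d_ir = dist[tuple(sorted((i, r)))]
            d_is = dist[tuple(sorted((i, s)))]
            dist[(i, t)] = (n_r * d_ir + n_s * d_is) / (n_r + n_s)
        # Delete r and s, add t.
        dist = {pair: d for pair, d in dist.items()
                if r not in pair and s not in pair}
        del size[r], size[s]
        size[t] = n_r + n_s
        merges.append((r, s, d_rs, size[t]))
    return merges
```

For the hypothetical matrix `[[0, 2, 6, 10], [2, 0, 5, 9], [6, 5, 0, 4], [10, 9, 4, 0]]`, elements 0 and 1 merge at distance 2, elements 2 and 3 at distance 4, and the two resulting clusters at 7.5, the mean of the four cross distances.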

Hierarchical clustering of GE data (Eisen et al., PNAS 1998). Growth response: starved human fibroblast cells, added serum. Monitored levels of 8600 genes over 13 time points. $t_{ij}$ = fluorescence level of target gene i in condition j; $r_{ij}$ = same for the reference. $D_{ij} = \log(t_{ij}/r_{ij})$; $D^*_{ij} = [D_{ij} - E(D_i)]/\mathrm{std}(D_i)$. Similarity of genes k, l: $S_{kl}$. Applied the average linkage method. Ordered leaves by the tree.
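The normalization on this slide can be sketched with NumPy as below. Two points are assumptions rather than content of the slide: the logarithm base (base 2 is used here) and the similarity $S_{kl}$, whose formula did not survive in the transcript; the sketch uses the Pearson correlation, which for standardized rows is the mean of the products $D^*_{kj} D^*_{lj}$. Function names are hypothetical.

```python
import numpy as np

def standardize_log_ratios(t, r):
    """Rows are genes, columns are conditions. Returns D* from the slide."""
    D = np.log2(t / r)                     # log ratio (base 2 assumed)
    mean = D.mean(axis=1, keepdims=True)   # E(D_i), per gene
    std = D.std(axis=1, keepdims=True)     # std(D_i), per gene
    return (D - mean) / std

def similarity(D_star):
    """Gene-by-gene similarity matrix (Pearson correlation, assumed)."""
    return D_star @ D_star.T / D_star.shape[1]
```

Genes whose log ratios rise and fall together get a similarity near 1, and genes with opposite patterns get a similarity near -1, regardless of the magnitude of the change.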

Data randomly permuted within rows (1), columns (2) and both (3)

Yeast GE data

Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES. Science 286:531-537 (Oct 1999). Computational paper: Class Prediction and Discovery Using Gene Expression Data. Donna K. Slonim, Pablo Tamayo, Jill P. Mesirov, Todd R. Golub, Eric S. Lander. Proc. RECOMB 2000. ppt source: Elashof-Horvath UCLA course, Statistical Analysis of DNA Microarray Data, http://www.genetics.ucla.edu/horvathlab/Biostat278/Biostat278.htm

Background: Cancer Classification. Cancer classification is central to cancer treatment. Traditional cancer classification methods: by site, by morphology, etc. Limitations of morphology-based classification: tumors of similar histopathological appearance can have significantly different clinical courses and responses to therapy. Traditionally, cancer classification relied on specific biological insights. Challenges: finer classification of morphologically similar tumors at the molecular level; systematic and unbiased approaches.

Background: Cancer Classification (Continued). Three challenges: Class prediction (classification): assignment of particular tumor samples to already-defined classes. Feature selection: identify the most informative genes for prediction. Class discovery: defining previously unrecognized tumor subtypes (= clusters).

Background: Leukemia. Acute leukemia: variability in clinical outcome and subtle differences in nuclear morphology. Subtypes: acute lymphoblastic leukemia (ALL) or acute myeloid leukemia (AML). ALL subcategories: T-lineage ALL and B-lineage ALL. Particular subtypes of acute leukemia have been found to be associated with specific chromosomal translocations. No single test is currently sufficient to establish the diagnosis; it relies on a combination of tests (morphology, histochemistry, immunophenotyping, etc.). Although usually accurate, leukemia classification remains imperfect and errors do occur.

Objective: develop a systematic approach to cancer classification based on gene expression data from microarrays, using leukemia as a test case. Method: biological samples and microarrays. Learning set: 38 bone marrow samples (27 ALL, 11 AML) obtained from acute leukemia patients at the time of diagnosis. Test set: 34 leukemia samples (24 bone marrow and 10 peripheral blood samples). RNA from cells hybridized to high-density Affymetrix oligo arrays (6817 human genes).

Feature selection: the 50 genes most highly correlated with the AML-ALL distinction.
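A rough sketch of this kind of gene ranking. The slide does not give the correlation measure; the sketch assumes the signal-to-noise statistic from the cited Golub et al. paper, $P(g) = (\mu_1 - \mu_2)/(\sigma_1 + \sigma_2)$, computed per gene from the two class means and standard deviations. The function names, the 0/1 label encoding, and the use of the sample standard deviation (`ddof=1`) are illustrative assumptions.

```python
import numpy as np

def signal_to_noise(X, y):
    """X: genes x samples expression matrix; y: 0/1 class label per sample."""
    X0, X1 = X[:, y == 0], X[:, y == 1]
    mean_diff = X0.mean(axis=1) - X1.mean(axis=1)
    noise = X0.std(axis=1, ddof=1) + X1.std(axis=1, ddof=1)
    return mean_diff / noise

def select_informative_genes(X, y, per_class):
    """Indices of the genes most strongly high in class 0 and in class 1."""
    order = np.argsort(signal_to_noise(X, y))
    high_in_class0 = order[-per_class:][::-1]
    high_in_class1 = order[:per_class]
    return high_in_class0, high_in_class1
```

With `per_class=25` this would return 50 genes in total, 25 for each direction of the distinction.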

Class predictor: the predictor assigned 36 of the 38 samples to either AML or ALL; the remaining 2 were uncertain. All 36 predictions agree with the patients' clinical diagnosis.
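The slide does not describe how the predictor works. As a hedged sketch, the weighted-voting scheme from the cited Golub et al. paper can be written as below: each informative gene votes $a_g (x_g - b_g)$, where $a_g$ is its signal-to-noise weight and $b_g$ the midpoint of the two class means, and a sample is left "uncertain" when the prediction strength falls below a threshold (0.3 in that paper). Function names and the 0/1 class encoding are assumptions made for the illustration.

```python
import numpy as np

def train_weighted_voting(X, y):
    """X: informative genes x samples; y: 0/1 class label per sample."""
    X0, X1 = X[:, y == 0], X[:, y == 1]
    mu0, mu1 = X0.mean(axis=1), X1.mean(axis=1)
    a = (mu0 - mu1) / (X0.std(axis=1, ddof=1) + X1.std(axis=1, ddof=1))
    b = (mu0 + mu1) / 2.0
    return a, b

def predict(a, b, x, threshold=0.3):
    """Return (class or None if uncertain, prediction strength)."""
    votes = a * (x - b)               # positive votes favor class 0
    v0 = votes[votes > 0].sum()
    v1 = -votes[votes < 0].sum()
    total = v0 + v1
    if total == 0:
        return None, 0.0
    strength = abs(v0 - v1) / total
    winner = 0 if v0 > v1 else 1
    return (winner if strength >= threshold else None), strength
```

A sample whose genes split their votes almost evenly gets a low prediction strength and is reported as uncertain rather than forced into a class.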

Promoter analysis. Position Weight Matrix (PWM), a.k.a. Position Specific Scoring Matrix (PSSM). Example:

Position   1    2    3    4    5    6
A         0.1  0.8  0    0.7  0.2  0
C         0    0.1  0.5  0.1  0.4  0.6
G         0    0    0.5  0.1  0.4  0.1
T         0.9  0.1  0    0.1  0    0.3

Sequence                  Best site score
ATGCAGGATACACCGATCGGTA    0.0605
GGAGTAGAGCAAGTCCCGTGA     0.0605
AAGACTCTACAATTATGGCGT     0.0151

Need to set score threshold.
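A small sketch of how a PWM is used to scan a sequence: every 6-base window is scored as the product of the per-position probabilities, and the best window is reported. The matrix values are read from the slide, with the column order and two blank cells reconstructed on the assumption that each column sums to 1. The function names are hypothetical, and only the given strand is scanned.

```python
PWM = {
    "A": [0.1, 0.8, 0.0, 0.7, 0.2, 0.0],
    "C": [0.0, 0.1, 0.5, 0.1, 0.4, 0.6],
    "G": [0.0, 0.0, 0.5, 0.1, 0.4, 0.1],
    "T": [0.9, 0.1, 0.0, 0.1, 0.0, 0.3],
}
WIDTH = 6

def window_score(window):
    """Product of the matrix entries for the bases in one window."""
    score = 1.0
    for position, base in enumerate(window):
        score *= PWM[base][position]
    return score

def best_site(sequence):
    """Return (score, start index, site) of the best-scoring window."""
    candidates = [
        (window_score(sequence[i:i + WIDTH]), i, sequence[i:i + WIDTH])
        for i in range(len(sequence) - WIDTH + 1)
    ]
    return max(candidates)

def has_site(sequence, threshold):
    """A sequence is called a hit if its best window reaches the threshold."""
    return best_site(sequence)[0] >= threshold
```

With these values the best sites in the three example sequences are TACACC, TAGAGC and TACAAT, scoring about 0.0605, 0.0605 and 0.0151. A threshold of 0.05, for instance, would call the first two sequences hits and reject the third.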

Computational approaches to promoter analysis. Look for over-represented binding sites (BSs) in groups of promoters: obtained by clustering expression profiles; of genes with a common known function (e.g. from GO annotations); from ChIP-chip (chip²) data, which requires knowledge of the TF and an antibody; or from a combination of sources. De novo or using known TF signatures.

Location analysis. Ren et al., Science 290:2306-2309 (2000).

Expander: analysis and visualization tool for gene expression data, including: Clustering (CLICK, K-means, SOM, hierarchical); Promoter analysis (PRIMA). Example: Human ATM-dependent Transcriptional Response to Ionizing Radiation. http://www.cs.tau.ac.il/~rshamir A. Maron, R. Sharan

ATM-dependent Transcriptional Response to Ionizing Radiation. The DNA damage response modulates many signaling pathways, including lesion processing, repair, cell cycle checkpoints and apoptotic pathways. The ATM protein kinase is a master regulator of the cellular response to double-strand breaks. Goal: identify the transcriptional network.

Experimental Design. Gene expression profiles: wild-type and Atm-/- mice ± ionizing radiation. Thymus tissue; time points: 0, 30 min, 120 min. Filtering 'responding genes': 1206 genes whose expression level changed by >1.75-fold. Clustering: 6 main clusters generated by the CLICK algorithm. Promoter analysis: NF-κB and p53 found by PRIMA analysis. S. Rashi, R. Elkon, N. Weizman, C. Linhart, N. Amariglio, N. Orlev, G. Sternberg, A. Barzilai, Y. Shiloh
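The filtering step can be sketched as a fold-change filter. The slide gives only the 1.75-fold cutoff; the exact criterion is an assumption here: a gene is kept if, at any time point, its level differs from the untreated baseline by more than the cutoff in either direction. The function name `responding_genes` is hypothetical.

```python
import numpy as np

def responding_genes(baseline, treated, min_fold=1.75):
    """baseline: level per gene; treated: genes x time points.

    Returns the indices of genes changed by more than min_fold
    (up or down) at one or more time points.
    """
    ratio = treated / baseline[:, None]
    fold = np.maximum(ratio, 1.0 / ratio)  # symmetric fold change, >= 1
    return np.flatnonzero((fold > min_fold).any(axis=1))
```

Taking the maximum of the ratio and its inverse treats a 2-fold decrease the same way as a 2-fold increase.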

Major Gene Clusters – Irradiated Thymus. Atm-dependent responding genes: the genes respond to radiation only in wild type.

Major Gene Clusters – Irradiated Thymus. Atm-dependent 2nd wave of responding genes.

Similar response in both genotypes

Clues are in the promoters. [Figure: ATM at the top; a hidden layer of transcription factors (p53, TF-A, TF-B, TF-C, TF-D, and unknown "?" nodes); an observed layer of genes g1 to g13.]

PRIMA: PRomoter Integration in Microarray Analysis. Assumption: co-expression → transcriptional co-regulation → common cis-regulatory promoter elements. Step 1: identification of co-expressed genes using microarray technology and clustering algorithms. Step 2: computational identification of transcription factors whose binding site signatures are significantly over-represented among promoters of co-expressed genes. R. Elkon, C. Linhart, Y. Shiloh
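Step 2 asks whether a binding-site signature is over-represented in a cluster's promoters relative to a background set. The slides do not spell out PRIMA's statistic, so the sketch below substitutes a generic hypergeometric over-representation test together with a simple enrichment factor; the function names and the parameter names `N`, `K`, `n`, `k` are illustrative.

```python
from math import comb

def hypergeom_tail(N, K, n, k):
    """P[X >= k] for X ~ Hypergeometric.

    N: promoters in the background set
    K: background promoters containing a hit for the TF
    n: promoters in the co-expressed cluster
    k: cluster promoters containing a hit
    """
    total = comb(N, n)
    upper = min(K, n)
    favorable = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return favorable / total

def enrichment_factor(N, K, n, k):
    """Hit frequency in the cluster divided by the background frequency."""
    return (k / n) / (K / N)
```

For example, 20 hits among 50 cluster promoters against 100 hits among 1000 background promoters is a 4-fold enrichment; the tail probability says how surprising that count would be if the cluster were a random sample of the background.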

PRIMA - Results

Transcription factor   Enrichment factor   P-value
NF-κB                  5.1                 3.8x10^-8
p53                    4.2                 9.6x10^-7

Hypothesis: NF-κB and p53 mediate the late response to DNA damage.

Expander: analysis and visualization tool for gene expression data. Clustering (CLICK, K-means, SOM, hierarchical); Promoter analysis (PRIMA); Biclustering (SAMBA). http://www.cs.tau.ac.il/~rshamir A. Maron, R. Sharan

Biclustering. Clusters: a global partition of genes according to a common expression pattern across all conditions. But genes have multiple functions, and conditions may be diverse. Bicluster: a subset of genes with similar behavior in a subset of conditions. This gives a finer, local analysis, but must use a reliable statistical model to guarantee specificity. [Figure: genes × conditions matrix.]

SAMBA: Statistical-Algorithmic Method for Bicluster Analysis. Goal: find high-similarity submatrices of the genes × properties matrix. A. Tanay, R. Sharan
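To make the goal concrete, here is a toy exhaustive search, which is not the SAMBA algorithm: the data are assumed to be a small 0/1 matrix (1 = the gene shows the property), a bicluster is taken to be an all-ones submatrix, and the largest one by area is returned. Enumerating all property subsets is only feasible for a handful of columns; the function name `best_biclique` is hypothetical.

```python
from itertools import combinations

def best_biclique(M, min_properties=2):
    """Largest all-ones submatrix of a 0/1 matrix (rows = genes).

    Returns (gene indices, property indices, area).
    """
    n_genes, n_props = len(M), len(M[0])
    best = ((), (), 0)
    for size in range(min_properties, n_props + 1):
        for props in combinations(range(n_props), size):
            # Genes that show every property in the chosen subset.
            genes = tuple(g for g in range(n_genes)
                          if all(M[g][p] for p in props))
            area = len(genes) * len(props)
            if area > best[2]:
                best = (genes, props, area)
    return best
```

Unlike a global clustering, the returned gene set only has to agree on the selected properties, so a gene may belong to several such submatrices.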

Results – Modules. [Figure: genes × properties matrix; property types: GO annotation, TF binding assay, growth sensitivity, gene expression.]
