Combined analysis of ChIP-chip data and sequence data. Harbison et al. CS 466, Saurabh Sinha.

1 Combined analysis of ChIP-chip data and sequence data
Harbison et al.
CS 466, Saurabh Sinha

2 Outline
Transcription factors interpret the regulatory information encoded in DNA to induce or repress gene expression
Comparative genomics has been used to find the regulatory sites in the yeast genome
Looking at sequence alone does not reveal whether a putative site is actually functioning as a binding site
ChIP-chip data (also called “location data”) provides such information
Harbison et al. combine these two types of data

3 ChIP-on-chip
Source: http://www.chiponchip.org/

4 Data
Genome-wide “location analysis” using ChIP-on-chip
Each experiment is done with one TF
203 TFs were profiled in “rich media conditions”
84 of these TFs were also profiled in at least one other condition
Why?
– Binding is not just a function of the presence of the site; it is also a function of the presence of the TF
– A TF may not be present in every condition

5 Data
How were the 84 TFs (to be tested in additional conditions) chosen?
A TF was chosen if there was prior evidence that it plays a role in that additional condition

6 ChIP-on-chip results
11,000 unique interactions between TFs and promoter regions were identified
The results form an m x n matrix, where m is the number of TFs (203) and n is the number of yeast genes (~6000)
11,000 of the entries were “1”, meaning the binding was significant
– The raw binding measurements need post-processing to assess whether each one is statistically significant
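The post-processing step on this slide can be sketched as a simple thresholding of per-interaction p-values. This is only an illustration: the p-values below are made up, the function names are hypothetical, and the cutoff of 0.001 is an assumption rather than something stated on the slide.

```python
# Minimal sketch: turn a TF x gene matrix of binding p-values into a 0/1 matrix.
# All values and names here are hypothetical; the cutoff is an assumed example.
P_THRESHOLD = 0.001

def binarize(pvalues, threshold=P_THRESHOLD):
    """Return a 0/1 matrix with 1 where the binding p-value is at or below the threshold."""
    return [[1 if p <= threshold else 0 for p in row] for row in pvalues]

def count_interactions(binary):
    """Count the entries equal to 1, i.e. the significant TF-promoter interactions."""
    return sum(sum(row) for row in binary)

# Rows are TFs, columns are genes (toy data, 2 TFs x 3 genes).
pvalues = [
    [0.0002, 0.5, 0.03],
    [0.9, 0.0009, 0.0001],
]
binding = binarize(pvalues)
print(binding)
print(count_interactions(binding))
```

On the real data the same count over the 203 x ~6000 matrix would correspond to the 11,000 interactions mentioned above.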

7 The next step: bring in the sequence
Genome-wide “location data” or “binding data” is combined with sequence data
For each TF, collect all sequences bound by it
– These are promoter-length sequences, not exact binding sites
Apply motif-finding programs to estimate what the binding motif is (where the binding sites are)

8 Motif finding
Only consider TFs that bound >= 10 sequences
– 147 such TFs
Run 6 different motif finders on the bound sequences
68,000 motifs discovered!
A large number of these motifs are “variants” of the same motif, i.e., similar to each other
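Collecting the bound sequences for each TF and keeping only TFs with at least 10 of them can be sketched as follows. The data layout (a 0/1 binding matrix plus a dictionary of promoter sequences) and all names are assumptions made for illustration; the motif finders themselves are not shown.

```python
# Minimal sketch, assuming a 0/1 binding matrix (rows = TFs, columns = genes)
# and a hypothetical dict mapping gene name -> promoter sequence.
MIN_BOUND = 10  # cutoff from the slide: a TF must bind at least 10 sequences

def bound_sequences(binding, tf_names, gene_names, promoters, min_bound=MIN_BOUND):
    """Return {TF: [promoter sequences it binds]} for TFs with enough bound sequences."""
    selected = {}
    for tf, row in zip(tf_names, binding):
        seqs = [promoters[gene] for gene, flag in zip(gene_names, row) if flag == 1]
        if len(seqs) >= min_bound:
            selected[tf] = seqs
    return selected

# Toy data: 12 genes, TF_A binds 11 promoters, TF_B binds only 3.
gene_names = [f"gene{i}" for i in range(12)]
promoters = {gene: "ACGT" * 5 for gene in gene_names}
tf_names = ["TF_A", "TF_B"]
binding = [
    [1] * 11 + [0],
    [1] * 3 + [0] * 9,
]
selected = bound_sequences(binding, tf_names, gene_names, promoters)
print(sorted(selected))
print(len(selected["TF_A"]))
```

Each list in `selected` is what would be handed to a motif finder as its input set.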

9 Motif finding
Using clustering of motifs and stringent statistical tests, identify high-confidence motifs from among these 68,000 motifs
High-confidence motifs were found for 116 of the 147 TFs whose bound sequences were analyzed
Now require that the motif also be conserved across other related yeast species
65 TFs with single, high-confidence, phylogenetically conserved motifs were found

10 Motif finding
The 65 motifs were a mix of “known” and novel motifs
– That is, some of the motifs were similar to already known motifs
– 21 TFs’ motifs were new
These 65 motifs, together with other known motifs from the literature, form a compendium of 102 motifs for further analysis

11 Source: Harbison et al. Nature 431, 99-104 (2 September 2004)

12 Next step
We now have motifs for 102 TFs
The next step is to locate the binding sites of each TF in the whole genome
This is equivalent to finding matches to each motif in the whole genome
Finding matches:
– Require a high sequence similarity
– Require phylogenetic conservation
– Require strong binding to that region by the TF
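The three requirements listed above can be sketched as a motif scan followed by two filters. This is a simplified illustration, not the procedure used by Harbison et al.: the log-odds scoring with a uniform background, the score cutoff, the `is_conserved` callback, and the `tf_bound` flag are all assumptions introduced here.

```python
import math

def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds against a uniform background."""
    matrix = []
    for col in counts:
        total = sum(col[b] for b in "ACGT") + 4 * pseudocount
        matrix.append({b: math.log2((col[b] + pseudocount) / total / background)
                       for b in "ACGT"})
    return matrix

def scan(seq, matrix, min_score):
    """Return (start, score) for every window scoring at least min_score (ACGT only)."""
    width = len(matrix)
    hits = []
    for i in range(len(seq) - width + 1):
        score = sum(matrix[j][seq[i + j]] for j in range(width))
        if score >= min_score:
            hits.append((i, score))
    return hits

def map_sites(seq, matrix, min_score, is_conserved, tf_bound):
    """Keep motif matches only if the TF binds the region and the match is conserved."""
    if not tf_bound:
        return []
    return [(i, s) for i, s in scan(seq, matrix, min_score)
            if is_conserved(i, len(matrix))]

# Toy motif with consensus TGAC, built from hypothetical counts.
counts = [{b: (10 if b == c else 0) for b in "ACGT"} for c in "TGAC"]
matrix = log_odds_matrix(counts)
seq = "AATGACGGTGACAA"
conserved_starts = {2}  # hypothetical: only the first match is conserved in related species
hits = map_sites(seq, matrix, 5.0, lambda start, width: start in conserved_starts, True)
print([i for i, _ in scan(seq, matrix, 5.0)])
print([i for i, _ in hits])
```

In this toy sequence the motif matches at two positions, but only the conserved one survives, and nothing is reported when the TF does not bind the region.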

13 Mapping sites in the genome
The “map” gave 3,353 sites (“interactions”) within 1,296 promoters
This is different from simply locating matches to a motif, because TF binding information is also incorporated
Under different conditions, only a subset of the binding sites in the map is actually occupied

14 Source: Harbison et al. Nature 431, 99-104 (2 September 2004)

15 Does the map make sense?
The map tells us which TFs bind which actual sites in the genome, and hence which genes are being regulated
In many cases, the known functions of the genes predicted to be targeted by a TF are consistent with the known function of the TF

16 More insights from the map
Binding sites are not uniformly distributed over the promoter regions
Sharply peaked distribution
Very few sites in the 100 bp immediately upstream of the genes
Most sites (74%) are between 100 and 500 bp upstream of the gene
Source: Harbison et al. Nature 431, 99-104 (2 September 2004)
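A distribution like this one can be summarized from a list of site-to-gene distances. The sketch below uses made-up distances and hypothetical function names; it only shows how a figure such as the 74% above would be computed.

```python
from collections import Counter

def fraction_in_window(distances, low=100, high=500):
    """Fraction of sites whose upstream distance (in bp) lies in [low, high]."""
    if not distances:
        return 0.0
    inside = sum(1 for d in distances if low <= d <= high)
    return inside / len(distances)

def histogram(distances, bin_width=100):
    """Count sites per bin, keyed by the lower edge of each bin."""
    return Counter((d // bin_width) * bin_width for d in distances)

# Hypothetical upstream distances (bp) of ten mapped sites.
distances = [50, 120, 180, 250, 300, 450, 480, 600, 700, 150]
print(fraction_in_window(distances))
print(sorted(histogram(distances).items()))
```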

17 Arrangements of sites
Specific arrangements of binding sites in a promoter
Simple arrangement: one binding site for one TF
Another arrangement: repeats of a particular binding site
– Allows for “graded response”
– Some TFs show a significant preference for repeated sites

18 Source: Harbison et al. Nature 431, 99-104 (2 September 2004)

19 Arrangements of sites
Another arrangement: binding sites for multiple TFs
– “Combinatorial regulation”: in different conditions, different combinations of binding sites (and TFs) direct different gene expression
– Genes whose promoters have such an arrangement of sites are required for multiple pathways and are regulated in an environment-specific fashion

20 Source: Harbison et al. Nature 431, 99-104 (2 September 2004)

21 Arrangements of sites
Another arrangement: binding sites for specific pairs of TFs occur in the same promoter more frequently than expected by chance
– The two TFs perhaps interact physically in doing their job
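One common way to ask whether two TFs share promoters “more frequently than expected by chance” is a hypergeometric tail test. The sketch below is an assumption about how such a test could look, not the statistic reported in the paper; the counts for the two TFs are hypothetical, and only the total of 1,296 promoters is taken from the map described earlier.

```python
from math import comb

def cooccurrence_pvalue(n_promoters, n_a, n_b, n_both):
    """P(overlap >= n_both) when n_a and n_b promoters are chosen independently at random."""
    upper = min(n_a, n_b)
    numerator = sum(comb(n_a, i) * comb(n_promoters - n_a, n_b - i)
                    for i in range(n_both, upper + 1))
    return numerator / comb(n_promoters, n_b)

# Hypothetical counts: TF A has sites in 40 promoters, TF B in 30, and 10 contain both.
N = 1296
expected_overlap = 40 * 30 / N
p = cooccurrence_pvalue(N, 40, 30, 10)
print(round(expected_overlap, 2))
print(p < 1e-5)
```

With these toy numbers the expected overlap is below one promoter, so ten shared promoters would be strong evidence against independence. In a real analysis the test would be repeated over many TF pairs, so a multiple-testing correction would also be needed.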

22 Source: Harbison et al. Nature 431, 99-104 (2 September 2004)

