Genome Biology and Biotechnology


1 Genome Biology and Biotechnology
9. The localizome
Course: Genome Biology (Genoom Biologie), Prof. M. Zabeau
Department of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology (VIB), University of Gent
International course 2005

2 Summary
DNA localizome or DNA interactome
- Genome-wide mapping of DNA-binding proteins
  - Transcription factor binding sites
  - Localization of replication origins
Protein localizome
- High-throughput localization of proteins in cellular compartments

3 Functional Maps or “-omes”
Each functional map relates genes or proteins (1 to n) to a set of "conditions":
- ORFeome: genes
- Phenome: mutational phenotypes
- Transcriptome: expression profiles
- DNA interactome: protein-DNA interactions
- Localizome: cellular, tissue location
- Interactome: protein interactions
- Proteome: proteins
After: Vidal M., Cell, 104, 333 (2001)

4 Genome-wide Analysis of Regulatory Sequences
- Gene expression is regulated by transcription factors that bind selectively to regulatory regions
  - Protein–DNA interactions involve sequence-specific recognition
  - Other factors, such as chromatin structure, may be involved
- Sequence-specific DNA-binding proteins from eukaryotes generally recognize degenerate motifs of 5–10 base pairs
  - Consequently, potential recognition sequences for transcription factors occur frequently throughout the genome
- Genome-wide surveys of in vivo DNA binding provide a platform for determining which of these potential sites are actually occupied
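A rough calculation illustrates why short recognition sequences occur so often. The sketch below assumes a random genome with independent, equally frequent bases and a fully specified (non-degenerate) motif; the genome sizes are approximate round figures chosen for illustration and do not come from the slides. Degenerate motifs would match even more often.

```python
def expected_motif_hits(genome_size_bp: int, motif_length: int) -> float:
    """Expected number of exact matches to one fixed motif, counting both strands.

    Assumes independent, equiprobable bases (a deliberate simplification).
    """
    positions = genome_size_bp - motif_length + 1
    return 2 * positions / 4 ** motif_length


# Approximate genome sizes (assumed round numbers, for illustration only).
YEAST_GENOME_BP = 12_000_000
HUMAN_GENOME_BP = 3_000_000_000

for k in (5, 8, 10):
    yeast_hits = expected_motif_hits(YEAST_GENOME_BP, k)
    human_hits = expected_motif_hits(HUMAN_GENOME_BP, k)
    print(f"{k}-bp motif: ~{yeast_hits:,.0f} sites in yeast, ~{human_hits:,.0f} in human")
```

Even a 10-bp motif is expected to occur by chance in a genome of this size, which is why sequence alone cannot identify the sites that are bound in vivo.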

5 Genome-wide Analysis of Regulatory Sequences
Methods combine
- Large-scale analysis of in vivo protein–DNA crosslinking
- Microarray technology
ChIP-on-chip: Chromatin Immuno-Precipitation on DNA chips
Reprinted from: Biggin M., Nature Genet. 28, 303 (2001)

6 Genome-Wide Location and Function of DNA Binding Proteins
Ren et al., Science, 290, 2306 (2000)
Paper presents
- Proof of principle for microarray-based approaches to determine the genome-wide location of DNA-bound proteins
- A study of the binding sites of two well-known gene-specific transcription activators in yeast: Gal4 and Ste12
- A combination of in vivo DNA-binding data with expression analysis to identify genes whose expression is directly controlled by these transcription factors

7 Chromatin Immuno Precipitation (ChIP) Procedure
- Cells are fixed with formaldehyde, harvested, and sonicated
- DNA fragments cross-linked to a protein of interest are enriched by immunoprecipitation with a specific antibody
- Immunoprecipitated DNA is amplified and labeled with the fluorescent dye Cy5
- Control DNA not enriched by immunoprecipitation is amplified and labeled with a different fluorophore, Cy3
- The two DNAs are mixed and hybridized to a microarray of intergenic sequences
- The relative binding of the protein of interest to each sequence is calculated from the IP-enriched/unenriched fluorescence ratio, averaged over three experiments
Reprinted from: Ren et al., Science, 290, 2306 (2000)
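The ratio calculation in the last step can be sketched as follows. This is a simplified illustration, not the published analysis: Ren et al. used a single-array error model with a weighted average, whereas this sketch takes a plain mean of log2 ratios. The function name and the intensities are hypothetical.

```python
import math
from statistics import mean


def binding_ratio(ip_cy5, control_cy3):
    """Average enrichment of one array spot across replicate experiments.

    ip_cy5      -- Cy5 intensities of IP-enriched DNA, one per replicate
    control_cy3 -- Cy3 intensities of unenriched DNA, one per replicate
    Returns the geometric-mean ratio (the mean is taken on the log2 scale).
    """
    if len(ip_cy5) != len(control_cy3) or not ip_cy5:
        raise ValueError("need matching, non-empty replicate lists")
    log_ratios = [math.log2(ip / ctrl) for ip, ctrl in zip(ip_cy5, control_cy3)]
    return 2 ** mean(log_ratios)


# Hypothetical intensities for one intergenic spot in three experiments.
ratio = binding_ratio([800.0, 1600.0, 3200.0], [100.0, 100.0, 100.0])
print(ratio)  # 16.0
```

Averaging on the log scale keeps a single very bright replicate from dominating the result.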

8 Modified Chromatin Immuno Precipitation (ChIP) Procedure
Close-up of a scanned image of a microarray containing 6361 intergenic-region DNA fragments of the yeast genome (figure label: ChIP-enriched DNA fragment)
Fig. 1. The genome-wide location profiling method. (A) Close-up of a scanned image of a microarray containing DNA fragments representing 6361 intergenic regions of the yeast genome. The arrow points to a spot where the red intensity is over-represented, identifying a region bound in vivo by the protein under investigation. (B) Analysis of Cy3- and Cy5-labeled DNA amplified from 1 ng of yeast genomic DNA using a single-array error model (8). The error model cutoffs for P values equal to 10^-3 and 10^-5 are displayed. (C) Experimental design. For each factor, three independent experiments were performed and each of the three samples was analyzed individually using a single-array error model. The average binding ratio and associated P value from the triplicate experiments were calculated using a weighted average analysis method.
Reprinted from: Ren et al., Science, 290, 2306 (2000)

9 Proof of concept: Gal4 transcription factor
Identification of sites bound by the transcriptional activator Gal4 in the yeast genome and of genes induced by galactose
- Gal4 activates genes necessary for galactose metabolism
- Gal4 is the best-characterized transcription factor in yeast
10 genes were bound by Gal4 and induced in galactose
- 7 genes in the Gal pathway, previously reported to be regulated by Gal4
- 3 novel genes: MTH1, PCL10, and FUR4
Reprinted from: Ren et al., Science, 290, 2306 (2000)

10 Genome-wide location of Gal4 protein
Genes whose promoter regions are bound by Gal4 and whose expression levels were induced at least twofold by galactose
Reprinted from: Ren et al., Science, 290, 2306 (2000)
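The selection described here is an intersection of two gene sets: genes with a bound promoter and genes induced at least twofold. A minimal sketch follows; the function name, the P-value cutoff of 0.001, and all numeric values are assumptions for illustration, and GENE_A and GENE_B are placeholder names.

```python
def direct_targets(binding_p_values, expression_fold_change,
                   p_cutoff=0.001, min_fold=2.0):
    """Genes that are both bound (P below cutoff) and induced (fold >= min_fold)."""
    bound = {gene for gene, p in binding_p_values.items() if p < p_cutoff}
    induced = {gene for gene, fold in expression_fold_change.items() if fold >= min_fold}
    return sorted(bound & induced)


# Hypothetical inputs: binding P values and galactose/glucose expression ratios.
binding = {"GAL1": 1e-6, "MTH1": 2e-4, "GENE_A": 0.2, "GENE_B": 1e-5}
expression = {"GAL1": 50.0, "MTH1": 2.5, "GENE_A": 8.0, "GENE_B": 1.1}

targets = direct_targets(binding, expression)
print(targets)  # ['GAL1', 'MTH1']
```

GENE_A is induced but not bound (an indirect effect), and GENE_B is bound but not induced; neither counts as a direct target under this criterion.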

11 Role of Gal4 in Galactose-dependent Cellular Regulation
The identification of MTH1, PCL10, and FUR4 as Gal4-regulated genes explains how regulation of several different metabolic pathways can be coordinated
- Fur4: increases intracellular pools of uracil
- Pcl10
- MTH1: reduces levels of glucose transporter
Reprinted from: Ren et al., Science, 290, 2306 (2000)

12 Conclusions
- The genes whose expression is controlled directly by transcriptional activators in vivo are identified by a combination of genome-wide location and expression analysis
- Genome-wide location analysis provides information on the binding sites at which proteins reside in the genome under in vivo conditions

13 Genomic Binding Sites of the Yeast Cell-cycle Transcription Factors SBF and MBF
Iyer et al., Nature 409: 533 (2001)
Paper presents
- The use of ChIP and DNA microarrays to define the genomic binding sites of the SBF and MBF transcription factors in vivo
The SBF and MBF transcription factors are active in the initiation of the cell division cycle (G1/S) in yeast
- A few target genes of SBF and MBF are known, but the precise roles of these two transcription factors are unknown
- The two transcription factors are heterodimers that share the Swi6 subunit and differ in their DNA-binding subunit
  - MBF is a heterodimer of Mbp1 and Swi6
  - SBF is a heterodimer of Swi4 and Swi6

14 Genomic targets of SBF and MBF
Figure 3. Genomic targets of SBF and MBF. Percentile ranks of intergenic fragments that meet selection thresholds are indicated (blue–yellow colour scale). Loci with ≥70% overall nucleotide sequence identity to another yeast locus (potentially cross-hybridizing) are indicated (closed circles). The combination of Cy3- and Cy5-labelled probes, the antibody used for IP (if used) and the culture conditions for each experiment are summarized (left panel). Experiments 9, 10, 17 and 18 involved independent crosslinking and IPs. DNA microarrays that included all yeast ORFs and other features, in addition to the intergenic fragments, were used for experiments 3, 8, 13 and 14.
Reprinted from: Iyer et al., Nature 409: 533 (2001)

15 In Vivo Targets of SBF and MBF
The ChIP experiments identified
- 163 possible targets of SBF
- 87 possible targets of MBF
- 43 possible targets of both factors
Support for the possible in vivo targets
- Most of the genes downstream of the putative binding sites show peak expression in G1/S
- Target genes are highly enriched for functions related to DNA replication, budding and the cell cycle
- In vivo binding sites are highly enriched for sequences matching the defined consensus binding sites
Reprinted from: Iyer et al., Nature 409: 533 (2001)

16 Expression Profiles of SBF and MBF Targets
Transcriptome data for synchronized cell cultures
Figure 4. Expression profiles of SBF and MBF targets. a, Expression patterns of SBF and MBF targets are indicated (red–green colour scale). Cell-cycle data are from ref. 12 and sporulation data are from ref. 18. The stages of the cell cycle are: M/G1, yellow; G1, green; S, blue; S/G2, red; and G2/M, orange. Yellow boxes indicate the presence of consensus binding sites in the intergenic sequences upstream of each ORF (right), and the median percentile rank in IPs of the upstream sequences is also indicated (blue–yellow colour scale), as in Fig. 3. For each set of targets, the top panel contains cell-cycle regulated genes, the bottom panel contains genes that are members of divergently transcribed pairs in which the other member was cell-cycle regulated, and the middle panel contains the remainder of the non-cell-cycle regulated genes. b, Average expression profiles of the cell-cycle regulated targets of SBF and MBF, computed by averaging the log2(Cy5/Cy3) ratios. Note the specific induction of MBF targets during sporulation.
Reprinted from: Iyer et al., Nature 409: 533 (2001)

17 Expression Profiles of SBF and MBF Targets
Why are two different transcription factors used to mediate identical transcriptional programmes during the cell-division cycle in yeast?
A possible answer is suggested by differences in the functions of the genes that they regulate
- Many of the targets of SBF have roles in cell-wall biogenesis and budding
- 25% of the MBF target genes have known roles in DNA replication, recombination and repair
The results support a model in which
- SBF is the principal controller of membrane and cell-wall formation
- MBF primarily controls DNA replication
The need for DNA replication and for membrane and cell-wall biogenesis may differ between the mitotic and meiotic cell cycles
Reprinted from: Iyer et al., Nature 409: 533 (2001)

18 A high-resolution map of active promoters in the human genome
Kim et al., Nature 436: (2005)
Paper presents
- A genome-wide map of active promoters in human fibroblast cells, determined by experimentally locating the sites of RNA polymerase II preinitiation complex (PIC) binding
- The map defines 10,567 active promoters corresponding to
  - 6,763 known genes
  - more than 1,196 un-annotated transcriptional units
- A global view of functional relationships in human cells between
  - transcriptional machinery
  - chromatin structure
  - gene expression

19 Identification of active promoters in the human genome
Microarrays cover
- All non-repeat DNA at 100 bp resolution
Pol II preinitiation complex (PIC)
- RNA polymerase II
- Transcription factor IID (TFIID)
- General transcription factors
ChIP of PIC-bound DNA
- Monoclonal antibody against the TAF1 subunit of the complex (TBP-associated factor 1)
Figure 1. Identification and characterization of active promoters in the human genome. a, Outline of the strategy used to map TFIID-binding sites in the genome.
Reprinted from: Kim et al., Nature 436: (2005)

20 Results from TFIID ChIP-on-chip analysis
Figure 1. Identification and characterization of active promoters in the human genome. b, A representative view of the results from TFIID ChIP-on-chip analysis. Top panel, the logarithmic ratio (log2R) of hybridization intensities between TFIID ChIP DNA and a control DNA. Middle panel, RefSeq gene annotation. Bottom panel, a close-up view of two replicate sets of TFIID ChIP-on-chip hybridization signals around the 5' end of the TCFL1 gene. Arrows indicate the position of the TFIID-binding site determined by a peak-finding algorithm.
Reprinted from: Kim et al., Nature 436: (2005)
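The caption mentions a peak-finding algorithm without describing it. As a minimal stand-in, and explicitly not the method used by Kim et al., the sketch below reports probes whose log2R exceeds a threshold and is a local maximum. The function name, the threshold of 1.0, and the signal values are all hypothetical.

```python
def find_peaks(log2_ratios, threshold=1.0):
    """Indices of probes that reach the threshold and are local maxima.

    On a flat-topped peak only the leftmost probe is reported.
    """
    peaks = []
    last = len(log2_ratios) - 1
    for i, value in enumerate(log2_ratios):
        if value < threshold:
            continue
        left = log2_ratios[i - 1] if i > 0 else float("-inf")
        right = log2_ratios[i + 1] if i < last else float("-inf")
        if value > left and value >= right:
            peaks.append(i)
    return peaks


# Hypothetical log2R values for consecutive probes spaced 100 bp apart.
signal = [0.1, 0.3, 1.2, 2.5, 1.8, 0.2, 0.0, 1.1, 0.4]
peak_indices = find_peaks(signal)
print(peak_indices)                     # [3, 7]
print([i * 100 for i in peak_indices])  # offsets in bp from the first probe
```

A production method would also smooth the signal and combine replicates before calling peaks.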

21 Characterization of active promoters
- The 12,150 TFIID-binding sites were matched to the 5' ends of known transcripts in transcript databases
- 87% of the PIC-binding sites were within 2.5 kb of annotated 5' ends of known messenger RNAs
- 8,960 promoters were mapped within annotated boundaries of 6,763 known genes in EnsEMBL
Figure 1. Identification and characterization of active promoters in the human genome. d, e, Venn diagrams showing the number of identified promoters that matched EnsEMBL genes (d) or promoters annotated in DBTSS (e).
Reprinted from: Kim et al., Nature 436: (2005)
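Matching binding sites to annotated 5' ends amounts to a nearest-neighbour lookup within a distance window. The sketch below shows one way to compute the fraction of sites within 2.5 kb of a 5' end on a single chromosome; the function name and the coordinates are hypothetical, and strand and multi-chromosome handling are omitted.

```python
from bisect import bisect_left


def fraction_near_tss(binding_sites, tss_positions, max_distance=2500):
    """Fraction of binding sites within max_distance bp of any annotated 5' end.

    Both inputs are coordinates on the same chromosome.
    """
    tss_sorted = sorted(tss_positions)
    if not binding_sites or not tss_sorted:
        return 0.0
    matched = 0
    for site in binding_sites:
        i = bisect_left(tss_sorted, site)
        # Only the 5' ends immediately left and right of the site can be nearest.
        neighbours = tss_sorted[max(i - 1, 0):i + 1]
        if any(abs(site - tss) <= max_distance for tss in neighbours):
            matched += 1
    return matched / len(binding_sites)


# Hypothetical coordinates (bp).
tss = [10_000, 50_000, 120_000]
sites = [9_000, 53_000, 80_000, 121_000]
print(fraction_near_tss(sites, tss))  # 0.5
```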

22 The chromatin-modification features of the active promoters
Validation of active promoters
- ChIP-on-chip using an anti-RNAP antibody
- ChIP-on-chip analysis using
  - anti-acetylated histone H3 (AcH3) antibodies
  - anti-dimethylated lysine 4 on histone H3 (MeH3K4) antibodies
- AcH3 and MeH3K4 are known epigenetic markers of active genes
Figure 2. The chromatin-modification features of the active promoters. a, Logarithmic ratios of the ChIP-on-chip hybridization intensities (log2R) of probes from 0.5 kb upstream to 0.5 kb downstream of the identified TFIID-binding sites for TFIID, RNAP, AcH3 and MeH3K4 are plotted in a yellow-blue colour scale for 9,328 transcript-matched promoters. The bottom panel shows the colour scale with corresponding log2R values.
Reprinted from: Kim et al., Nature 436: (2005)

23 TFIID, RNAP, AcH3 and MeH3K4 profiles on the promoter of RPS24 gene
Figure 2. The chromatin-modification features of the active promoters. b, A detailed view of TFIID, RNAP, AcH3 and MeH3K4 profiles on the promoter of the RPS24 gene.
Reprinted from: Kim et al., Nature 436: (2005)

24 Additional findings
Promoters of non-coding transcripts
- Are very similar to promoters of protein-coding genes
Promoters of novel genes
- An estimated 13% of human genes remain to be annotated in the genome
Clustering of active promoters
- Co-regulated genes tend to be organized into coordinately regulated domains
Genes using multiple promoters
Reprinted from: Kim et al., Nature 436: (2005)

25 Multiple promoters in human genes
WEE1 gene locus
- Two different transcripts with alternative 5' ends
  - Encoding different proteins
- Two different TFIID-binding sites: two promoters
  - Differential transcription during the cell cycle
Figure 3. Use of multiple promoters by human genes. a, Annotation of the WEE1 gene locus and the corresponding TFIID-binding profile. Black bars over the first and second exons in transcripts indicate the positions of the primers used for analysis of each transcript, using real-time quantitative PCR with reverse transcription (RT-PCR). b, RT-PCR analysis of NM_ and AK transcripts in an asynchronous population of IMR90 cells. c, Real-time quantitative RT-PCR analysis of NM_ and AK transcripts in cell-cycle synchronized populations of IMR90 cells. Transcript levels observed for each cell-cycle phase were normalized to the level observed in the asynchronous population. Error bars represent standard deviation.
Reprinted from: Kim et al., Nature 436: (2005)

26 The transcriptome of a cell line
Functional relationship between transcription machinery and gene expression
- Genome-wide expression profiles were correlated with PIC promoter occupancy
Four general classes of promoters
- Actively transcribed genes
- Weakly expressed genes
- Weakly PIC-bound genes
- Inactive genes
Figure 4. Four distinct classes of promoters define the transcriptome of IMR90 cells. a, A matrix describes the distribution of genes defined by expression and PIC occupancy on the promoter. b, c, Matrices showing the percentages of genes associated with AcH3 (b) or MeH3K4 (c) modification for each of the four classes of genes. Italicized numbers in some boxes represent extrapolation from the 29 ENCODE regions.
Reprinted from: Kim et al., Nature 436: (2005)
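The four classes correspond to the cells of a 2 x 2 matrix of PIC occupancy against detected expression. The sketch below encodes that matrix; the assignment of each label to a combination is an assumption based on the class names (for example, "weakly expressed" taken to mean PIC-bound with no transcript detected), and the gene names and flags are hypothetical.

```python
from collections import Counter


def promoter_class(pic_bound: bool, expressed: bool) -> str:
    """Classify a gene by PIC occupancy at its promoter and transcript detection."""
    if pic_bound and expressed:
        return "actively transcribed"
    if pic_bound:
        return "weakly expressed"   # PIC detected, transcript below detection
    if expressed:
        return "weakly PIC bound"   # transcript detected, PIC below detection
    return "inactive"


# Hypothetical genes with (pic_bound, expressed) flags.
genes = {
    "gene_a": (True, True),
    "gene_b": (True, False),
    "gene_c": (False, True),
    "gene_d": (False, False),
    "gene_e": (True, True),
}

counts = Counter(promoter_class(*flags) for flags in genes.values())
for label, n in counts.items():
    print(label, n)
```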

27 Genome-Wide Distribution of ORC and MCM Proteins in yeast: High-Resolution Mapping of Replication Origins
Wyrick et al., Science, 294, 2357 (2001)
Paper presents
- Genome-wide location analysis to map the DNA replication origins in the 16 yeast chromosomes by determining the binding sites of prereplicative complex proteins

28 Chromosome Replication In Eukaryotic Cells
Chromosome replication initiates from origins of replication distributed along chromosomes
Origins of replication comprise autonomously replicating sequences (ARS)
- ARS contain an 11-bp ARS consensus sequence (ACS)
  - Essential for replication initiation
  - Recognized by the Origin Recognition Complex (ORC)
- The majority of sequence matches to the ACS in the genome do not have ARS activity
Prereplicative complexes at replication origins comprise
- Origin Recognition Complex (ORC) proteins
- Minichromosome Maintenance (MCM) proteins
Reprinted from: Wyrick et al., Science, 294, 2357 (2001)
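Scanning a sequence for ACS matches can be sketched with a regular expression. The slides do not spell out the consensus; the pattern below uses the commonly cited 11-bp form 5'-(A/T)TTTA(T/C)(A/G)TTT(A/T)-3', which is an assumption brought in from the general literature. The function names and test sequence are hypothetical.

```python
import re

# Commonly cited ACS consensus WTTTAYRTTTW (assumed, not taken from the slides).
# The lookahead lets overlapping matches be reported.
ACS_PATTERN = re.compile(r"(?=([AT]TTTA[TC][AG]TTT[AT]))")
ACS_LENGTH = 11
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_acs_matches(sequence: str):
    """Return sorted (start, strand, match) tuples for ACS matches on both strands.

    start is the 0-based leftmost position of the match on the given sequence.
    """
    seq = sequence.upper()
    hits = [(m.start(), "+", m.group(1)) for m in ACS_PATTERN.finditer(seq)]
    for m in ACS_PATTERN.finditer(reverse_complement(seq)):
        start = len(seq) - m.start() - ACS_LENGTH
        hits.append((start, "-", m.group(1)))
    return sorted(hits)


matches = find_acs_matches("GGCATTTATATTTACCG")
print(matches)  # [(3, '+', 'ATTTATATTTA')]
```

Because the pattern is short and AT-rich, it matches many places in an AT-rich genome, which is consistent with the point that most ACS matches lack ARS activity and that binding data are needed to locate real origins.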

29 Prereplicative Complexes At Origins Of Replication
The complexities of duplication. The proteins that form the pre-replication complex (pre-RC) required for the initiation of DNA replication. The ORC binds to origins of replication (oris) in the chromosomes and establishes docking sites for the other protein components of the pre-RC, such as MCM proteins. In metazoan species, geminin, which is degraded during mitosis, inhibits the activity of Cdt1, which is necessary for binding of MCM proteins to the origins of replication.
Reprinted from: Stillman, Science, 294, 2301 (2001)

30 ORC- and MCM-binding sites compared with known ARSs
- High degree of correlation between MCM and ORC binding sites and known ARSs
- Correct identification of 88% of known ARSs
- The method can accurately identify the position of ARSs to a resolution of 1 kb or less
Figure 1. ORC and MCM binding to previously identified replication origins. Average binding ratios (blue/white) of ORC and MCM proteins to the known ARS-containing loci on chromosomes III and VI (ARS308 and ARS604 were not present on the arrays) and some randomly selected loci are shown. Random selection was accomplished with the "randbetween" function in Excel. The "i" preceding the locus name indicates the intergenic region to the right of the gene. Asterisks indicate randomly selected loci adjacent to or within 1 kb of a predicted origin. Data for other known origins are available in Web table 1 (18).
Reprinted from: Wyrick et al., Science, 294, 2357 (2001)

31 Genome-wide Location Of Potential Replication Origins
Identification of 429 potential origins across the entire genome
Figure 2. Genome-wide location of potential replication origins. The genomic position of each probe present on the arrays is plotted to scale as a green bar (Web table 3) (18). The predicted origin-containing loci (pro-ARS) are plotted to scale as a red bar and named systematically (Web table 2) (18). Variations in width and apparent intensities of green or red color reflect different probe lengths, not hybridization ratios. Probes to Watson and Crick ORFs are plotted on the top and bottom rows; intergenic sequences are plotted on the center rows. Asterisks indicate known ARSs that were not identified.
Reprinted from: Wyrick et al., Science, 294, 2357 (2001)

32 Conclusions
- The ChIP-based method identified the majority of origins found in the analysis of genome-wide replication timing in yeast, and provides direct, high-resolution mapping of potential origins
- Similar approaches identified origins in other organisms
  - For example: Coordination of replication and transcription along a Drosophila chromosome, MacAlpine et al., Genes & Dev. 18: (2004)
Reprinted from: Wyrick et al., Science, 294, 2357 (2001)

33 Functional Maps or “-omes”
Each functional map relates genes or proteins (1 to n) to a set of "conditions":
- ORFeome: genes
- Phenome: mutational phenotypes
- Transcriptome: expression profiles
- DNA interactome: protein-DNA interactions
- Localizome: cellular, tissue location
- Interactome: protein interactions
- Proteome: proteins
After: Vidal M., Cell, 104, 333 (2001)

34 Global analysis of protein localization in budding yeast
Huh et al., Nature 425, (2003)
Paper presents
- An approach to define the organization of proteins in the context of cellular compartments, involving the construction and analysis of a collection of yeast strains expressing full-length, chromosomally tagged green fluorescent protein fusion proteins

35 Experimental Strategy
Systematic tagging of yeast ORFs with green fluorescent protein (GFP)
- GFP is fused to the carboxy terminus of each ORF
- Full-length fusion proteins are expressed from their native promoters and chromosomal locations
The collection of yeast strains expressing GFP fusions was analyzed by
- Fluorescence microscopy to determine the primary subcellular localization of the fusion proteins
  - Defines 12 categories
- Co-localization with red fluorescent protein (RFP) markers to refine the subcellular localization
  - Defines 11 additional categories
Reprinted from: Huh et al., Nature 425, (2003)

36 Construction of GFP fusion proteins
For each ORF a pair of PCR primers was designed
- Homologous to the chromosomal insertion site
- Matching a GFP–selectable marker construct
Yeast was transformed with the PCR products to generate
- Strains expressing chromosomally tagged ORFs
Reprinted from: Huh et al., Nature 425, (2003)
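The primer design described above can be sketched as string manipulation: each primer joins a stretch of homology to the insertion site with a segment that anneals to the GFP-marker cassette. The function name, the default homology length of 40 nt, and all sequences below are hypothetical placeholders; the slides do not give these details.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def design_tagging_primers(orf_with_stop, downstream,
                           cassette_fwd, cassette_rev, homology=40):
    """Build the two chimeric primers for C-terminal tagging of one ORF.

    orf_with_stop -- coding sequence including its stop codon
    downstream    -- genomic sequence immediately after the stop codon
    cassette_fwd  -- 5'->3' sequence annealing to the start of the cassette
    cassette_rev  -- 5'->3' sequence annealing to the end of the cassette
    homology      -- length of genomic homology per primer (assumed value)
    """
    coding = orf_with_stop[:-3]  # drop the stop codon so GFP is fused in frame
    if len(coding) < homology or len(downstream) < homology:
        raise ValueError("not enough flanking sequence for the requested homology")
    forward = coding[-homology:] + cassette_fwd
    reverse = reverse_complement(downstream[:homology]) + cassette_rev
    return forward, reverse


# Toy example with a 6-nt homology arm and made-up cassette sequences.
fwd, rev = design_tagging_primers(
    "ATGGCTGCAAAGTAA", "TTGACCGGT", "GGTCGAC", "TCGATGA", homology=6)
print(fwd)  # GCAAAGGGTCGAC
print(rev)  # GGTCAATCGATGA
```

After transformation, homologous recombination at the two arms inserts the cassette directly before the original stop codon, so the fusion stays under the native promoter at the native chromosomal location.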

37 Representative GFP Images
Image panels: nucleus, nuclear periphery, ER, bud neck, mitochondrion, lipid particle
Reprinted from: Huh et al., Nature 425, (2003)

38 GFP and RFP Co-localization Images
Image panel: nucleolar marker
Reprinted from: Huh et al., Nature 425, (2003)

39 Global results
- 22 categories
- Constructed ~6,000 ORF-GFP fusions
- 4,156 had localizable GFP signals (~75% of the yeast proteome)
- Good concordance with data from earlier studies
  - GFP does not affect the location
- Localized 70% of the new (previously unlocalized) proteins
- Major compartments: cytoplasm (30%) and the nucleus (25%)
- 20 other compartments: 44% of the proteins
- Most of the proteins can be located in discrete cellular compartments
Reprinted from: Huh et al., Nature 425, (2003)

40 The proteome of the nucleolus
- Detected 164 proteins in the nucleolus
  - Plus 45 identified in other studies
- Data are consistent with MS analysis of human nucleolar proteins
  - Allows identification of yeast-human orthologs
Reprinted from: Huh et al., Nature 425, (2003)

41 Transcriptional co-regulation and subcellular localization are correlated
Figure labels: 33 transcription modules; co-regulated genes
Reprinted from: Huh et al., Nature 425, (2003)

42 Conclusion
The high-resolution, high-coverage localization data set
- Represents 75% of the yeast proteome, classified into 22 distinct subcellular localization categories
Analysis of these proteins in the context of transcriptional, genetic, and protein–protein interaction data
- Provides a comprehensive view of interactions within and between organelles in eukaryotic cells
- Helps reveal the logic of transcriptional co-regulation
Reprinted from: Huh et al., Nature 425, (2003)

43 Recommended reading
DNA interactome
- Genome-Wide Location of DNA Binding Proteins: Ren et al., Science, 290, 2306 (2000)
- Map of active promoters in the human genome: Kim et al., Nature 436: (2005)
Protein localizome
- Global analysis of protein localization in yeast: Huh et al., Nature 425, (2003)

44 Further reading
Genome-Wide Location of DNA Binding Proteins
- Genomic Binding Sites of the Yeast Cell-cycle Transcription Factors SBF and MBF: Iyer et al., Nature 409: 533 (2001)
- High-Resolution Mapping of Replication Origins: Wyrick et al., Science, 294, 2357 (2001)

