Genome Biology and Biotechnology

1 Genome Biology and Biotechnology
Genome Biology and Biotechnology: 7. The phenome. Prof. M. Zabeau, Department of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology (VIB), University of Gent. International course 2005.

2 Functional Maps or “-omes”
[Figure: functional maps ("-omes") drawn as matrices of n genes or proteins × "conditions": ORFeome (genes), phenome (mutational phenotypes), transcriptome (expression profiles), localizome (cellular and tissue location), DNA interactome (protein-DNA interactions), interactome (protein interactions), proteome (proteins).] After: Vidal M., Cell, 104, 333 (2001)

3 The phenome: genome-wide phenotypic analysis
Classical (forward) genetic screens use saturation mutagenesis to identify all the genes that exhibit a specific phenotype. Drawback: characterizing the gene through positional cloning is slow and laborious. Phenomics platforms use reverse genetics: systematic alteration of gene function to identify the functions of predicted genes. Advantage: the identity of the gene is known beforehand. Phenomics platforms: transposon-based mutant libraries, extensively used in yeast and Arabidopsis; and RNA interference (RNAi)-based mutant libraries, the technology of choice for gene knock-outs.

4 Ross-Macdonald et al., Nature 402: 413 (1999)
Large-scale analysis of the yeast genome by transposon tagging and gene disruption (Ross-Macdonald et al., Nature 402: 413, 1999). The paper presents a transposon-tagging strategy for large-scale analysis of gene function in yeast, used to study phenotypes, gene expression and protein localization simultaneously. It produced a large collection (>11,000 strains) of yeast mutants carrying a transposon inserted in genes, tagging 30% of all yeast genes.

5 Transposon-based Method for Large-scale Functional Genomics
The minitransposon (mTn) is derived from the bacterial transposable element Tn3. It carries a lacZ reporter gene lacking an initiator methionine (no ATG) and upstream promoter sequence, so β-galactosidase (β-gal) is produced only when lacZ is fused in-frame to a protein-coding sequence (gene fusions). It also carries a haemagglutinin (3xHA) epitope tag: recombination between the lox sites produces epitope-tagged proteins. Reprinted from: Ross-Macdonald et al., Nature 402: 413 (1999)

6 Minitransposon mTn–3xHA/lacZ
[Figure: a gene-lacZ fusion protein is converted by Cre-mediated recombination into a gene-3xHA fusion protein.]

7 High Throughput Insertion Mutagenesis
A yeast genomic DNA library was mutagenized with the mTn. The plasmids were digested with NotI and transformed into a diploid yeast strain, where the mutagenized fragments integrated by homologous recombination. Transformants were assayed for β-gal activity. Figure: The mTn insertion project. Most steps were performed using a Robbins Hydra 96-channel dispenser; all strains are maintained in a 96-well format. Reprinted from: Ross-Macdonald et al., Nature 402: 413 (1999)

8 Analysis of the mTn Insertion Strains
Identified 11,232 strains expressing lacZ and sequenced the insertion site in 6,358 of them. 5,442 insertions lie in or within 200 bp of an annotated ORF; together they affect 1,917 different ORFs (~30%). Identified 328 previously non-annotated ORFs: 52% overlap an ORF in the antisense direction, 33% are small ORFs in intergenic regions, and 15% overlap an ORF in the same orientation but in a different frame. These genes were missed in the annotation because of an arbitrary lower size limit of 100 amino acids and because partially overlapping ORFs were not annotated. Reprinted from: Ross-Macdonald et al., Nature 402: 413 (1999)

9 Analysis of Mutant Phenotypes
Phenotypes of essential genes: 14.1% of the insertions are non-viable in haploid strains; these represent genes essential for viability. Large-scale scoring of "other" phenotypes: growth under 20 different growth conditions on "phenotypic macroarrays" (96-well format). Insertions in 407 genes (20%) result in a phenotype different from the wild type, so the majority (80%) of the insertions exhibit no phenotype! Next steps: expand the range of phenotypic assays and use more precise criteria for phenotypic analysis, such as growth rate. Reprinted from: Ross-Macdonald et al., Nature 402: 413 (1999)

10 Phenotypic Macroarray Analysis of Yeast Mutants
Figure 3. Phenotypic macroarray analysis. a, Examples of 21 cm × 21 cm macroarray plates scoring growth on test media: YPD, YPD supplemented with 20 µg ml⁻¹ benomyl, YPD with 66.7 µg ml⁻¹ calcofluor, YPD with 46 µg ml⁻¹ hygromycin, YPGlycerol, YPD supplemented with 12 µg ml⁻¹ calcofluor. Arrows indicate strains mutated for genes functioning in cellular respiration (NDI1) or cell-wall biogenesis (ECM33, SLG1, YOR275C) [17]. b, Macroarray analysis of yeast mutants deficient in oxidative phosphorylation and cell-wall maintenance. Mitochondrial mutants unable to carry out oxidative phosphorylation were identified as white (rather than red) colonies on YPD medium; red pigment formation within ade2 mutant strains requires oxidative phosphorylation [28]. Representative respiratory genes identified within our screen are labelled with arrows (PET122, PET56, OXA1). Genes functioning in cell-wall maintenance were characterized through macroarray analysis of mutants grown on YPD overlaid with agar containing BCIP. BCIP is a chromogenic substrate of alkaline phosphatase; when the enzyme is released from vacuoles during cell lysis, it cleaves BCIP, thereby staining lysed mutants blue. Arrows highlight a sample of genes (IMP2', VRP1, SLA1) [17] involved in cell-wall maintenance. [Figure panels: mutants deficient in oxidative phosphorylation; mutants deficient in cell-wall maintenance.] Reprinted from: Ross-Macdonald et al., Nature 402: 413 (1999)

11 Genomic Scale Analysis of Phenotypes
Phenotypes observed. Expected phenotypes: genes involved in microtubule functions are sensitive to benomyl. Unexpected phenotypes: genes involved in cell-wall biogenesis show stress-related responses. Pleiotropic phenotypes are observed in apparently unrelated assays, for example sensitivity to hydroxyurea, benomyl and calcofluor; pleiotropic mutants are the rule. Many mutants exhibit phenotypes in specific subsets of conditions, and mutants appear to "group" into discrete classes: "pheno-clusters" are groups of mutants with common disruption phenotypes. Reprinted from: Ross-Macdonald et al., Nature 402: 413 (1999)

12 Cluster Analysis of the Phenotypic Data
[Figure: cluster analysis of the phenotypic data; rows: transformants sorted by increasing distance from the cluster average; columns: growth conditions.] Reprinted from: Ross-Macdonald et al., Nature 402: 413 (1999)

13 Cluster Analysis of the Phenotypic Data
Pheno-clusters predict the cellular functions associated with an ORF. For example, the "YPG" cluster contains mutants that do not grow on glycerol; it is highly enriched in genes involved in cellular respiration and can be used to predict the function of uncharacterized genes ("guilt by association"). Assay clusters: "two-dimensional" cluster analysis of the data also groups the phenotypic assays that identify strains with similar phenotypic profiles. The assays for growth in hydroxyurea and MMS are closely associated; both identify mutants defective in DNA metabolism. Reprinted from: Ross-Macdonald et al., Nature 402: 413 (1999)
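The pheno-cluster and assay-cluster ideas can be illustrated with a small sketch. It assumes binary growth-defect scores, a Hamming distance and single-linkage grouping at a fixed threshold; these are simplifying choices, not the clustering procedure used by Ross-Macdonald et al., and the mutant names and profiles are hypothetical.

```python
from __future__ import annotations

from itertools import combinations

# Hypothetical assays and binary phenotype profiles (1 = growth defect), for illustration only.
ASSAYS = ["YPG", "benomyl", "hydroxyurea", "MMS", "calcofluor"]
PROFILES = {
    "mutA": [1, 0, 0, 0, 0],
    "mutB": [1, 0, 0, 0, 0],
    "mutC": [0, 0, 1, 1, 0],
    "mutD": [0, 0, 1, 1, 0],
    "mutE": [0, 1, 0, 0, 1],
}


def hamming(a: list[int], b: list[int]) -> float:
    """Fraction of positions in which two profiles disagree."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def group(profiles: dict[str, list[int]], max_dist: float) -> list[set[str]]:
    """Single-linkage grouping: items linked by a chain of distances <= max_dist share a group."""
    parent = {name: name for name in profiles}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if hamming(profiles[a], profiles[b]) <= max_dist:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in profiles:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


def transpose(profiles: dict[str, list[int]]) -> dict[str, list[int]]:
    """Turn mutant-by-assay profiles into assay-by-mutant profiles (the second dimension)."""
    mutants = list(profiles)
    return {assay: [profiles[m][i] for m in mutants] for i, assay in enumerate(ASSAYS)}


if __name__ == "__main__":
    print("pheno-clusters:", group(PROFILES, max_dist=0.2))
    print("assay clusters:", group(transpose(PROFILES), max_dist=0.2))
```

Grouping the transposed matrix gives the second dimension of the analysis: in this toy data the hydroxyurea and MMS assays fall together because they flag the same mutants. An uncharacterized mutant that landed in the mutA/mutB group would, by guilt by association, be predicted to share their function.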

14 Analysis of Subcellular Localization of Proteins
Subcellular localization of HAT-epitope-tagged proteins by immunofluorescence with antibodies against the HAT epitope, with DNA counterstained by DAPI. Analysis of 1,340 strains: 201 proteins localized to specific cellular compartments (nucleus, nucleolus, mitochondria, plasma membrane, cell neck and spindle pole body), and 214 proteins localized to the cytoplasm (including actin filaments). Figure 5. Immunolocalization of epitope-tagged proteins. Left, examples of immunofluorescence patterns in vegetative cells stained with monoclonal antibody against HA. Right, the same cells stained with the DNA-binding dye 4',6-diamidino-2-phenylindole (DAPI). a, Diffuse cytoplasmic staining in a transformant HAT-tagged through an in-frame mTn insertion event in YLR249W. b, A punctate pattern of cytoplasmic staining resulting from HAT-tagging of Crn1, an actin-binding protein that bundles actin filaments [29]. c, Localization to the plasma membrane. Reprinted from: Ross-Macdonald et al., Nature 402: 413 (1999)

15 Conclusions
In a single mutagenic event, the insertion strategy generates reporter gene fusions, epitope-tagging constructs and insertion alleles. Random approaches are intrinsically limited in achieving saturation mutagenesis, because small genes are less likely to be mutagenized than large genes: to mutagenize 90% of the yeast genes, an additional 30,000 mTn insertions in yeast ORFs would be required, amounting to 5- to 10-fold redundancy. For multicellular organisms, collections of … to … insertions are needed. Reprinted from: Ross-Macdonald et al., Nature 402: 413 (1999)
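The saturation argument can be made concrete with a Poisson model, assuming insertions land uniformly at random in a target of G bp (real transposons show insertion biases). A gene of L bp then receives on average λ = N·L/G of N insertions and is hit at least once with probability 1 − e^(−λ). The sketch uses an assumed target of ~12 Mb (roughly the yeast genome) and hypothetical gene lengths.

```python
import math


def p_hit(n_insertions: int, gene_len: int, target_len: int) -> float:
    """Probability that a gene receives at least one of n uniformly random insertions (Poisson)."""
    expected_hits = n_insertions * gene_len / target_len
    return 1.0 - math.exp(-expected_hits)


def insertions_needed(p_target: float, gene_len: int, target_len: int) -> int:
    """Smallest number of insertions giving P(hit) >= p_target for a gene of gene_len bp."""
    return math.ceil(-math.log(1.0 - p_target) * target_len / gene_len)


if __name__ == "__main__":
    G = 12_000_000  # assumed target size: roughly the 12 Mb yeast genome
    for L in (300, 1_500, 4_500):  # hypothetical small, typical and large ORFs
        print(L, round(p_hit(10_000, L, G), 3), insertions_needed(0.9, L, G))
```

Because the required N scales with G/L, a gene ten times shorter needs roughly ten times more insertions to reach the same hit probability, which is why covering the small genes drives up the redundancy of a random insertion collection.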

16 RNA Interference (RNAi)
The phenomenon was first discovered in transgenic plants. "Antisense-mediated gene silencing": antisense constructs reduce the expression of the cognate gene. "Co-suppression": constructs designed to enhance gene expression occasionally reduce it. "Related" phenomena were later found in C. elegans: small temporal RNAs (stRNAs) control gene expression during development, and stRNAs contain sequences complementary to specific target mRNAs. The broader significance of RNA-mediated gene regulation has become apparent in recent years.

17 RNA-mediated Gene Regulation
Small regulatory RNAs act in two pathways of RNA-mediated gene regulation. The microRNA (miRNA) pathway controls gene expression during development: miRNAs contain sequences complementary to specific target mRNAs and specifically silence one or more target genes. The short interfering RNA (siRNA) pathway is responsible for gene silencing by RNA interference (RNAi): dsRNA triggers destruction of a homologous mRNA that has the same sequence as one of the dsRNA strands. siRNAs can also guide DNA-modifying (methylating) enzymes to the corresponding genomic regions, converting these regions to heterochromatin.

18 RNA-mediated Gene Regulation Pathways
[Figure labels: microRNA pathway, 22 bp dsRNA; short interfering RNA pathway, 21-23 bp dsRNA; heterochromatin.] (Left) Double-stranded RNA molecules about 22 nucleotides in length (stRNAs) regulate the translation of specific mRNAs during development. The antisense strand of the stRNA (blue) forms a characteristic interrupted hybrid with the 3'-untranslated region of the target mRNA (red), which then cannot be translated into protein. (Right) Double-stranded siRNAs are also ~22 nucleotides long. The antisense strand of the siRNA (blue) forms a continuous hybrid with the mRNA target (red), which is then degraded. Both types of small RNA are formed by cleavage of double-stranded RNA precursor molecules by the enzyme Dicer (pale blue) (1, 2). When processing the nonuniform dsRNA precursors of stRNAs, Dicer requires help from ALG-1 and ALG-2 (green), proteins of the RDE family (2). Dicer together with RDE-4 is needed to process long uniform dsRNA precursor molecules into siRNAs. The differing effects of mature stRNAs (2) and siRNAs (14) on the fate of target mRNAs depend on components specific to each pathway. Reprinted from: Ambros V., Science, 293, 811 (2001)

19 RNA-mediated Gene Regulation
RNA-mediated gene regulation is ancient in origin: it evolved before the divergence of plants and animals. The two pathways are interconnected and share molecular components, including the highly conserved nuclease Dicer and small dsRNAs about 21 to 23 nucleotides in length. RNA interference (RNAi) is thought to be a primitive genetic surveillance mechanism that protects cells from viruses. RNAi is well suited for large-scale gene knockout: first pioneered in C. elegans, it is now used in many model organisms.

20 RNA Interference (RNAi) in C. elegans
Injecting antisense or double-stranded RNA into cells interferes with the function of endogenous genes, silencing the corresponding gene. The RNA interference process involves a catalytic or amplification component: only a few molecules of injected dsRNA are required, and injecting dsRNA into the extracellular body cavity of C. elegans silences the gene in the whole animal. Experimentally, gene silencing is achieved in nematodes by feeding worms E. coli expressing dsRNAs.

21 RNA Interference (RNAi) in C. elegans
dsRNA is expressed in E. coli by bidirectional transcription with phage T7 RNA polymerase from two T7 promoters flanking the open reading frame. [Figure: worms fed on wild-type E. coli compared with worms fed on E. coli expressing GFP dsRNA.] Reprinted from: Timmons et al., Nature 395: 854 (1998)
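Two opposing T7 promoters yield dsRNA because their two transcripts are reverse complements of each other and can pair along their whole length. A minimal sketch, assuming the transcripts cover exactly the cloned insert (vector sequence and transcription start/stop details are ignored) and using a hypothetical insert sequence:

```python
from __future__ import annotations

# Watson-Crick complements for RNA bases.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def to_rna(dna: str) -> str:
    """RNA with the same sequence as the given DNA strand (T replaced by U)."""
    return dna.upper().replace("T", "U")


def reverse_complement(rna: str) -> str:
    return rna.translate(RNA_COMPLEMENT)[::-1]


def t7_transcripts(insert_dna: str) -> tuple[str, str]:
    """Transcripts from the two flanking T7 promoters.

    The promoter on one side yields RNA with the top-strand sequence; the promoter on
    the other side transcribes the opposite strand, yielding the reverse complement.
    """
    sense = to_rna(insert_dna)
    return sense, reverse_complement(sense)


def fully_paired(a: str, b: str) -> bool:
    """True if antiparallel strands a and b form a perfect duplex."""
    return len(a) == len(b) and all((x, y) in PAIRS for x, y in zip(a, reversed(b)))


if __name__ == "__main__":
    sense, antisense = t7_transcripts("ATGGCTAAAGGT")  # hypothetical ORF fragment
    print(sense, antisense, fully_paired(sense, antisense))
```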

22 Functional Genomic Analysis of C. elegans Chromosome I by Systematic RNAi
Fraser et al., Nature 408: 325 (2000). The paper presents an RNAi approach to systematically investigate loss-of-function phenotypes of the predicted genes on C. elegans chromosome I by feeding worms E. coli bacteria that express double-stranded RNA. It demonstrates that high-throughput genome-wide RNAi screens can be performed using a library of dsRNA-expressing bacteria. The specificity of RNAi makes it an ideal tool for investigating gene function.

23 Functional Analysis of Chromosome I Genes
Constructed a library of E. coli strains expressing dsRNA for 2,416 predicted genes on chromosome I (87.3% of the predicted genes) and screened it for detectable phenotypes. L3-L4 stage worms were fed for 72 h at 15 °C on bacterial cultures for each targeted gene, and the phenotypes of adults and progeny were scored: embryonic lethal (Emb), 10-100% embryonic lethality; sterile (Ste), brood size ≤ 10 (wild-type worms typically give > 50); progeny sterile (Stp), brood size ≤ 10 in the progeny of fed worms. Reprinted from: Fraser et al., Nature 408: 325 (2000)
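The three scoring criteria above can be written as simple threshold rules. This is a sketch only: the record type and its field names are hypothetical, and allowing several labels per gene is a simplification; the paper's own classification may assign phenotype classes differently.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class FeedingResult:
    """Hypothetical record for one dsRNA feeding experiment (field names are illustrative)."""
    gene: str
    embryonic_lethality: float          # fraction of embryos that fail to hatch, 0-1
    brood_size: int                     # progeny produced by the fed worms
    progeny_brood_size: Optional[int]   # brood size of those progeny, None if not scored


def score(r: FeedingResult) -> set[str]:
    """Apply the Emb / Ste / Stp thresholds quoted on the slide."""
    labels: set[str] = set()
    if r.embryonic_lethality >= 0.10:   # Emb: 10-100% embryonic lethality
        labels.add("Emb")
    if r.brood_size <= 10:              # Ste: wild type typically gives > 50
        labels.add("Ste")
    if r.progeny_brood_size is not None and r.progeny_brood_size <= 10:
        labels.add("Stp")               # Stp: sterile progeny of fed worms
    return labels
```

For example, a gene whose fed worms produce 80 progeny of which 35% fail to hatch would be scored Emb only.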

24 Functional Analysis of Chromosome I Genes
Assigned a phenotype to 13.9% of the genes and confirmed 90% of the known embryonic lethal genes; the number of genes with known phenotypes increased from 70 to 378. Not all genes give an RNAi phenotype: no phenotypes were found for some previously characterized genes, such as genes involved in neuronal function. Highly conserved genes are more likely to have an RNAi phenotype than genes that show no conservation: >72% of genes with an RNAi phenotype have a Drosophila match. Reprinted from: Fraser et al., Nature 408: 325 (2000)

25 Functional Analysis of Chromosome I Genes
Embryonic lethal (Emb) mutants identify essential genes: genes involved in the basal cellular machinery (RNA-binding proteins, chromosome condensation and separation, components of signal transduction pathways) and genes involved in basic metabolic processes. This is the largest class: >60% of the mutants. Uncoordinated and post-embryonic mutants include a high proportion (30% to 40%) of genes of unknown function; the genes that regulate development are still largely unknown. Reprinted from: Fraser et al., Nature 408: 325 (2000)

26 Biochemical Function and RNAi Phenotype
Figure 3. Functional classes of Emb, Ste, Unc and Pep genes. a, Predicted products of genes that gave Ste, Emb, Unc or viable post-embryonic (Pep) RNAi phenotypes were placed into functional classes as described in Methods. Genes whose products could not be accurately classified into any of the eight functional classes were placed into the unknown category (white). Numbers denote the percentage of genes in each functional class; pie charts illustrate these numbers graphically. b, Pie charts show distributions of predicted gene products grouped as follows: basal metabolic category (red) comprises the classes of DNA, RNA, protein and intermediate metabolism; specialized functions (blue) comprises cell-cycle and chromosome dynamics, cell biology and cellular structure, gene-specific transcription factors and signal transduction. Worms show the tissue affected in each phenotypic class shaded in grey. c, Distribution of genes giving rise to non-viable RNAi phenotypes in C. elegans (worm) or to non-viable phenotypes following disruption in S. cerevisiae (yeast). Reprinted from: Fraser et al., Nature 408: 325 (2000)

27 Rual et al., Genome Research 14:2162-2168 (2004)
Toward Improving Caenorhabditis elegans Phenome Mapping With an ORFeome-Based RNAi Library. Rual et al., Genome Research 14: (2004). The paper presents the use of the C. elegans ORFeome as a starting point for high-throughput RNAi with enhanced flexibility, increasing the possibilities for phenome mapping in C. elegans: additional HT-RNAi libraries can be generated to perform gene knockdowns under various conditions.

28 Generating RNAi resources from flexible Gateway ORFeome and promoterome collections
Reprinted from: Rual et al., Genome Research 14: (2004)

29 Screening the ORFeome-RNAi v1.1 Library
The C. elegans ORFeome v1.1 library contains 11,942 ORFs cloned as Gateway Entry clones. The ORFs were transferred into the RNAi Destination vector (a T7 promoter vector). Genome-wide phenotypic analysis by RNAi-by-feeding from the first larval stage revealed phenotypes for 1,066 (10%) of the ORFs tested. Figure 2. Overview of the RNAi screening procedure. (A) Overnight cultures of 96 bacterial clones are inoculated in sixteen 24-well plates (six clones per plate and four wells per clone). After overnight induction of the dsRNA synthesis, three to 10 worms synchronized at the L1 stage (N2 or lin-35 strain) were deposited into the wells. (B) RNAi experiments were performed at 20°C. We observed a wide range of phenotypes across development. Reprinted from: Rual et al., Genome Research 14: (2004)

30 Genome-Wide RNAi Analysis of Growth and Viability in Drosophila Cells
Boutros et al., Science, 303, (2004). The paper presents a high-throughput RNA interference (RNAi) screen of nearly all (91%) predicted Drosophila genes in cultured Drosophila cells, to characterize genes involved in cell growth and viability. Treating cells with dsRNA makes it possible to detect specific loss-of-function phenotypes in a systematic screen. Genome-wide RNAi was performed on two embryonic cell lines, and a quantitative assay of cell death was established, scored as a z score.

31 Genome-wide RNAi screen for viability defects
Fig. 1. Genome-wide RNAi screens result in highly reproducible growth and viability defects. Results for Kc167 cells are shown; similar results were observed for S2R+ cells (22) (table S1). (C) Results from one genome-wide RNAi screen, after 5 days dsRNA treatment. Each RNAi experiment is represented by a shaded box (a single well), arranged by 384-well plates as outlined in upper left. Results in each plate were mean-centered before overall analysis. Gray values indicate z score, with darker shades representing below-average results. Each 384-well plate had four control wells containing either D-IAP1 or the negative controls gfp, Rho1, or no dsRNAs. The D-IAP1 control phenotypes are evident as the dark boxes in the upper left corner of each plate, indicative of dying cells and a lower signal. (D) Example of highly reproducible phenotypes with similar z scores from two independent RNAi screens [enlarged from (C) and from duplicate screen in fig. S2]. Reprinted from: Boutros et al., Science, 303, (2004)

32 Distribution of the frequency of RNAi phenotypes
438 dsRNAs (3%) resulted in significantly reduced cell number, with z scores at least 3 standard deviations below the mean. Fig. 3. Quantitative grouping of RNAi phenotypes. (A) Distribution of the frequency of RNAi phenotypes recovered for each specified range of z scores. We used a z score of three or more standard deviations from the mean as a threshold to select 438 results for further analysis (tables S1 and S10). Reprinted from: Boutros et al., Science, 303, (2004)
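The hit-calling logic described in these captions (per-plate mean-centering, then a z score with a cut-off of 3 standard deviations) might be sketched as follows. Selecting wells at or below −3 (reduced cell number) is an assumption about the sign convention, the plate and well names are hypothetical, and the sketch omits the duplicate screens and control wells used in the actual study.

```python
from __future__ import annotations

import statistics


def mean_center(plates: dict[str, dict[str, float]]) -> dict[str, float]:
    """Subtract each plate's mean from its wells, removing plate-to-plate offsets."""
    centered: dict[str, float] = {}
    for plate_id, wells in plates.items():
        plate_mean = statistics.mean(list(wells.values()))
        for well, value in wells.items():
            centered[f"{plate_id}:{well}"] = value - plate_mean
    return centered


def z_scores(values: dict[str, float]) -> dict[str, float]:
    """Screen-wide z score of every well: (value - mean) / sample standard deviation."""
    data = list(values.values())
    mu, sd = statistics.mean(data), statistics.stdev(data)
    return {key: (v - mu) / sd for key, v in values.items()}


def call_hits(plates: dict[str, dict[str, float]], threshold: float = 3.0) -> list[str]:
    """Wells whose signal lies at least `threshold` standard deviations below the mean."""
    z = z_scores(mean_center(plates))
    return sorted(key for key, v in z.items() if v <= -threshold)
```

Mean-centering first matters: without it, a plate with a uniformly higher signal would inflate the screen-wide standard deviation and could hide genuine low-signal wells on other plates.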

33 Pheno-clusters of quantitative RNAi phenotypes
(D) Classification of quantitative RNAi phenotypes of selected genes (rows) identifies groups of related and new gene functions, as determined from duplicate screens per cell type (columns) and visualized by z score (scale, bottom). Both D-IAP1* (added control) and D-IAP1 (within RNAi library) yield equivalent phenotypes. Reprinted from: Boutros et al., Science, 303, (2004)

34 Genome-wide RNAi screening in Arabidopsis
The Arabidopsis GST (gene-specific sequence tag) Entry clone resource was used to generate a library of hairpin RNA (hpRNA) expression plasmids for large-scale transformation of Arabidopsis. [Figure: hairpin RNA expression construct carrying two copies of the GST.] Reprinted from: Hilson et al., Genome Research 14: (2004)

35 Phenotypes of plants carrying a GST hpRNA transgene targeting a subunit of cellulose synthase
(A) At5g17420 codes for CesA7, a subunit of cellulose synthase. (Top) Five-week-old control wild-type Arabidopsis plants. (Bottom) Eight-week-old T1 plants. The transformed plants grow more slowly than the wild type and have the weak, floppy stems seen in knockout mutants of this gene (scale bar, 6 cm). Reprinted from: Hilson et al., Genome Research 14: (2004)

36 Phenotypes of plants carrying a GST hpRNA transgene targeting a H+-ATPase subunit
(B) At1g20260 codes for the vacuolar-type H+-ATPase subunit B3. Figure and insets show a representative range of phenotypes in T1 plants 7 wk after germination. In particular, the insets document severe dwarfing, particularly in rosette tissue, whereas flower size and development are not similarly affected (scale bars, 1 cm). Reprinted from: Hilson et al., Genome Research 14: (2004)

37 Conclusions
Insertional mutagenesis and RNAi identify the function of only 10 to 20% of the genes. Detecting phenotypes for the other genes is expected to require alternative approaches: different growth conditions (for example, environmental stress) or other genetic backgrounds. Reverse and forward genetics are complementary. Reverse genetics has the advantage of being high-throughput and non-redundant, and the mutant phenotype is automatically connected to a known sequence. Classical forward genetics has the disadvantage that positional cloning is slow and laborious, but some genes are resistant to RNAi while all genes are sensitive to mutagens, and forward screens can also yield gain-of-function mutations.

38 Genome Biology and Biotechnology
Genome Biology and Biotechnology: 8. The transcriptome. International course 2005.

39 Functional Maps or “-omes”
[Figure: functional maps ("-omes") drawn as matrices of n genes or proteins × "conditions": ORFeome (genes), phenome (mutational phenotypes), transcriptome (expression profiles), DNA interactome (protein-DNA interactions), localizome (cellular and tissue location), interactome (protein interactions), proteome (proteins).] After: Vidal M., Cell, 104, 333 (2001)

40 Summary: Transcriptome mapping and transcriptome profiling
Transcriptome mapping: identification of transcribed regions in the genome, experimental confirmation of predicted gene models, and discovery of non-coding RNA genes. The "evolving" transcriptome map shows that the genome contains many more "genes" than just those coding for proteins. Transcriptome profiling: functional characterization of genes based on expression patterns, cluster analysis of expression patterns, identification of co-regulated gene clusters, and classification of tumors.

41 Transcriptome mapping platforms
Large-scale EST sequencing: primarily used to identify protein-coding genes; produces noisy data sets that have been difficult to interpret. Large-scale full-length cDNA sequencing: technically very difficult and laborious; limited to a few model organisms (mouse and human). Microarray technologies: increasingly powerful as the density of microarrays has increased tremendously; they provide the most detailed view of the transcribed regions in the genome.

42 EST Sequencing
[Figure: cloned cDNA with a poly(A) tail between vector sequences; the 5' EST is read from one end and the 3' EST from the other.] 3' or 5' ESTs are sequences of individual cDNA clones; cDNAs are often truncated at the 5' end (not full length). Sequencing is typically done on … to … clones per library, which identifies the 1,000 to 2,000 most abundantly expressed genes. Identifying ~70% of the protein-coding genes requires sequencing several tens or even hundreds of libraries. EST databases typically contain > … ESTs. EST sequence assemblies yield unigene collections: clusters of overlapping sequence reads from the same gene.
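Assembling ESTs into unigene-style clusters amounts to grouping reads that overlap, directly or through other reads. A toy sketch, assuming that sharing one exact k-mer is enough evidence of overlap and that all reads are in the same orientation; real EST clustering pipelines use alignments, quality filtering and both strands. Indexing k-mers avoids comparing every pair of reads.

```python
from __future__ import annotations


def kmers(seq: str, k: int) -> set[str]:
    """All substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_ests(reads: dict[str, str], k: int = 8) -> list[set[str]]:
    """Group reads linked, directly or transitively, by a shared exact k-mer."""
    parent = {name: name for name in reads}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_seen: dict[str, str] = {}  # k-mer -> first read that contained it
    for name, seq in reads.items():
        for kmer in kmers(seq.upper(), k):
            if kmer in first_seen:
                parent[find(name)] = find(first_seen[kmer])
            else:
                first_seen[kmer] = name

    clusters: dict[str, set[str]] = {}
    for name in reads:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())
```

Three reads tiled along one transcript with 10 bp overlaps end up in a single cluster, while an unrelated read forms its own cluster.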

43 Full length cDNA Sequencing
Technically very challenging: special techniques are needed to select full-length cDNA clones (5'-end, i.e. capped-end, selection), and aggressive subtraction/normalization is required to cover "all" genes. Mouse and human "FANTOM" full-length cDNA libraries: large-scale sequencing of >> … million 5'-end and 3'-end sequences, and complete sequencing of > … full-length cDNA clones. Full-length cDNAs define transcriptional units (TUs): segments of the genome from which transcripts are generated. TUs are DNA strand-specific and are typically bounded by a promoter at one end and termination sequences at the other.

44 Transcriptional Units
Transcriptional units (TUs) comprise protein-coding transcripts (genes) and non-coding transcripts (genes?), including alternatively spliced transcripts, transcripts with alternative 5' starts and transcripts with alternative 3' ends. Transcripts are frequently made from both strands; sense and antisense transcripts are considered to come from separate TUs. The transcriptome is much more complex than we have always thought! Reprinted from: The FANTOM consortium, Nature 420, (2002)
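Because TUs are strand-specific, grouping transcripts into TUs can be sketched as merging overlapping transcripts separately on each strand. This is a simplification that assumes genomic overlap on the same strand defines a TU; it is not the FANTOM consortium's actual procedure, and the transcript records are hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple


class Transcript(NamedTuple):
    name: str
    chrom: str
    strand: str  # "+" or "-"
    start: int   # 0-based start, inclusive
    end: int     # end, exclusive


def transcriptional_units(transcripts: list[Transcript]) -> list[tuple[str, str, int, int, list[str]]]:
    """Merge transcripts that overlap on the same chromosome and strand into units."""
    by_strand: dict[tuple[str, str], list[Transcript]] = {}
    for t in transcripts:
        by_strand.setdefault((t.chrom, t.strand), []).append(t)

    units = []
    for (chrom, strand), ts in sorted(by_strand.items()):
        ts.sort(key=lambda t: t.start)
        members, unit_start, unit_end = [ts[0].name], ts[0].start, ts[0].end
        for t in ts[1:]:
            if t.start < unit_end:  # overlaps the growing unit on the same strand
                members.append(t.name)
                unit_end = max(unit_end, t.end)
            else:                   # gap: close the current unit and start a new one
                units.append((chrom, strand, unit_start, unit_end, members))
                members, unit_start, unit_end = [t.name], t.start, t.end
        units.append((chrom, strand, unit_start, unit_end, members))
    return units
```

With tx1 (+, 100-500), tx2 (+, 300-900), tx3 (-, 200-400) and tx4 (+, 1000-1200), tx1 and tx2 form one TU and tx4 another, while the antisense tx3 forms a separate TU even though it overlaps tx1.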

45 The complexity of the transcriptome
[Figure labels: sense and antisense transcripts; protein-coding and non-protein-coding transcripts.] Fig. 1. The complexity of transcription of protein-coding (blue) and noncoding (red) RNA sequences. Transcripts may be derived from either or both strands, and they may be overlapping and interlaced (2, 3, 6, 11, 12). Many transcripts (including some noncoding transcripts) are alternatively spliced. Both exons and introns may transmit information. Many miRNAs and all small nucleolar RNAs in animals are sourced from introns [see (13) for a review]. The range of types and functions of noncoding RNAs is unknown.

46 Mouse transcriptome: the FANTOM 2 transcriptome
60,770 completely sequenced clones, comprising ~… TUs: ~60% coding transcripts (~… genes) and ~40% non-coding transcripts (~… new genes). 29% are spliced. Typical polyadenylation sites indicate RNA Pol II-mediated transcription. Many are antisense transcripts to coding transcripts. Estimate of the complete mouse transcriptome: … transcriptional units, comprising … coding transcriptional units (> … protein-coding genes?) and … non-coding transcriptional units. Reprinted from: The FANTOM consortium, Nature 420, (2002)

47 Experimental annotation of the human genome using microarray technology
Microarrays with 2 probes for each predicted exon were hybridized with a total of 69 cDNA samples; gene models were validated based on correlated exon expression. Reprinted from: Shoemaker et al., Nature 409, 922 (2001)
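The idea behind validation by correlated exon expression is that the exons of one gene should rise and fall together across samples. A minimal sketch, assuming one intensity per exon per sample and a hypothetical median-correlation cut-off of 0.5 (the actual criteria of Shoemaker et al. are not reproduced here): an exon that does not correlate with its sibling exons is flagged as possibly incorrect.

```python
from __future__ import annotations

import math
import statistics


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; assumes neither profile is constant."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def discordant_exons(exons: dict[str, list[float]], min_r: float = 0.5) -> list[str]:
    """Exons whose median correlation with the other exons of the gene is below min_r."""
    flagged = []
    for name, profile in exons.items():
        rs = [pearson(profile, other) for o, other in exons.items() if o != name]
        if statistics.median(rs) < min_r:
            flagged.append(name)
    return flagged
```

The median rather than the mean keeps one bad exon from dragging down the scores of the correct ones; two internally correlated but mutually uncorrelated exon groups would instead hint at merged genes.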

48 Analysis of Chromosome 22 genes
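[Figure: chromosome 22 gene predictions evaluated by exon expression, with examples classified as correct, containing an incorrect exon, merged genes, and ab initio predictions confirmed as correct.] Reprinted from: Shoemaker et al., Nature 409, 922 (2001)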

49 The transcriptional activity of human Chromosome 22
Rinn et al., Genes & Dev. 17: (2003). The paper describes global transcriptional activity in placental RNA, measured with DNA microarrays of 19,525 PCR fragments (300 bp to 1.4 kb) representing nearly all of the unique (nonrepetitive) sequences of human chromosome 22. [Figure: array design; PCR-fragment probes compared with an average exon, scale 1,000-2,000 bp.]

50 The human Chr 22 placental transcriptome
[Figure: placental transcription detected by PCR probes along chromosome 22, compared with annotated genes; a novel gene is marked.] Reprinted from: Rinn et al., Genes & Dev. 17: (2003)

51 The human Chr 22 placental transcriptome
Twice as many sequences are transcribed as previously reported, with as many transcribed sequences in unannotated regions as in annotated regions. Transcripts from unannotated regions include transcripts internal to annotated introns and transcripts antisense to annotated genes. A large portion of the novel transcripts is evolutionarily conserved in the mouse. Reprinted from: Rinn et al., Genes & Dev. 17: (2003)

52 Kampa et al., Genome Res. 13: 331-342 (2003)
Novel RNAs Identified From an In-Depth Analysis of the Transcriptome of Human Chromosomes 21 and 22. Kampa et al., Genome Res. 13: (2003). The paper describes transcriptome analysis of the nonrepetitive regions of chromosomes 21 and 22 in 11 different cell lines, using high-density oligonucleotide arrays of uniformly spaced 25-mer probes at 35 bp resolution. [Figure: array design; probes compared with an average exon, scale 500-1,000 bp.]

53 Transcription maps based on adjacent probes intensities
Transfrags (transcribed fragments) are defined by adjacent probes detecting transcripts. [Figure: transfrags aligned with well-annotated genes.] 80% to 90% of the known genes show alternative splicing. Reprinted from: Kampa et al., Genome Res. 13: (2003)
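Calling transfrags from adjacent probes can be sketched as merging runs of above-threshold probes. The intensity threshold, the maximum gap between positive probes and the minimum transfrag length below are hypothetical parameters, not the values used by Kampa et al.; positions are probe start coordinates on one strand.

```python
from __future__ import annotations


def call_transfrags(
    probes: list[tuple[int, float]],
    threshold: float,
    probe_len: int = 25,
    max_gap: int = 50,
    min_len: int = 70,
) -> list[tuple[int, int]]:
    """Merge runs of above-threshold probes into (start, end) transfrags, end exclusive.

    probes: (start_position, intensity) pairs sorted by position. Two positive probes
    join the same transfrag if their start positions are at most max_gap bp apart;
    transfrags shorter than min_len bp are discarded.
    """
    frags: list[tuple[int, int]] = []
    run_start = run_last = None
    for pos, intensity in probes:
        if intensity < threshold:
            continue
        if run_start is not None and pos - run_last <= max_gap:
            run_last = pos  # extend the current run
        else:
            if run_start is not None:
                frags.append((run_start, run_last + probe_len))
            run_start = run_last = pos  # start a new run
    if run_start is not None:
        frags.append((run_start, run_last + probe_len))
    return [(s, e) for s, e in frags if e - s >= min_len]
```

With 25-mers spaced every 35 bp, a single sub-threshold probe already breaks a run under these settings; relaxing max_gap trades fragmented transfrags against falsely joining neighbouring ones.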

54 Transcriptome maps of Chr 21 and 22
50% of the transcription falls outside known genes; 75% of these transcripts contain no ORFs and are thus non-coding, and ~10% is antisense to known genes. The transcriptome is larger than previously estimated: the total number of transcripts is much larger than the present estimate of 25,000 genes. Reprinted from: Kampa et al., Genome Res. 13: (2003)

55 Bertone et al., Science 306, 2242-2246 (2004)
Global Identification of Human Transcribed Sequences with Genome Tiling Arrays. Bertone et al., Science 306, (2004). The paper presents transcriptome analysis of the nonrepetitive regions of the human genome in human liver tissue RNA, using high-density oligonucleotide arrays of uniformly spaced 36-mer probes at 46 bp resolution: a total of 51,874,… 36-mer probes representing 1.5 Gb of nonrepetitive human genomic DNA. [Figure: array design; sense and antisense probes compared with an average exon, scale 500-1,000 bp.]

56 Annotated genes aligned with microarray fluorescence intensities
[Figure: annotated genes aligned with microarray fluorescence intensities; probe signals compared with exon/intron structure.] Reprinted from: Bertone et al., Science 306, (2004)

57 Identification of Novel Transcription Units
Transcribed regions outside previously annotated exons: identified 8,958 novel transcription units, over half of which were distal to annotated genes. Many transcription units are homologous to mouse genome sequences. Reprinted from: Bertone et al., Science 306, (2004)

58 Cheng et al., Science 308: 1149-1154 (2005)
Transcriptional Maps of 10 Human Chromosomes at 5-Nucleotide Resolution. Cheng et al., Science 308: (2005). The paper presents transcriptome analysis of the nonrepetitive regions of 10 human chromosomes (30% of the genome) in RNA from 8 cell lines, using ultra-high-density oligonucleotide tiling arrays at 5 bp resolution: 25-mer probes overlapping by 20 bp. [Figure: array design; probes compared with an average exon, scale 500-1,000 bp.]
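The tiling designs on these slides differ mainly in probe length and in the spacing between probe starts. A small sketch, assuming "resolution" means start-to-start spacing: 25-mers overlapping by 20 bp start every 5 bp, while 36-mers at 46 bp resolution leave a 10 bp gap between neighbouring probes.

```python
from __future__ import annotations


def tile(seq: str, probe_len: int, step: int) -> list[tuple[int, str]]:
    """Probes of probe_len bp starting every `step` bp (overlap = probe_len - step)."""
    return [(i, seq[i:i + probe_len]) for i in range(0, len(seq) - probe_len + 1, step)]


def probes_needed(region_len: int, probe_len: int, step: int) -> int:
    """Number of probes tile() produces for a region of region_len bp."""
    if region_len < probe_len:
        return 0
    return (region_len - probe_len) // step + 1


if __name__ == "__main__":
    # (probe length, start-to-start spacing) for the designs described on the slides
    designs = {"25-mer, 35 bp": (25, 35), "36-mer, 46 bp": (36, 46), "25-mer, 5 bp": (25, 5)}
    for label, (length, step) in designs.items():
        print(label, "overlap:", length - step, "bp; probes per 10 kb:", probes_needed(10_000, length, step))
```

A negative overlap is a gap between probes; moving from 35 bp to 5 bp spacing multiplies the probe count per region roughly sevenfold.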

59 Correlation of poly A+ transcripts to annotations
More transcription is detected than the annotations account for: 57% of the transcripts are novel, located in unannotated (intergenic and intronic) regions. Novel transcripts frequently overlap other transcripts and can be spliced. Fig. 1. The correlation of detected transcription in one of eight cell lines to annotations along each of the 10 chromosomes is shown for each chromosome individually and as a collective of all chromosomes. The detected transcription was determined using poly A+ cytosolic RNA from each of the eight cell lines. The annotations used in this correlation are defined in (15). The pattern code used in the central pie chart is used in all other pie charts. Reprinted from: Cheng et al., Science 308: (2005)

60 Poly A+ and poly A– transcription in the nucleus and cytosol
Analysis of poly A+ and poly A– transcripts: poly A– transcripts are twice as abundant as poly A+ transcripts, and a large proportion of the transcripts is found exclusively in either the nucleus or the cytoplasm. [Figure panels: cytoplasm, nucleus; poly A–, poly A+] Fig. 3. Distribution of poly A+ and poly A– transcription in the nucleus and cytosol with respect to genome annotations. A four-circle Venn diagram represents proportions of transcribed base pairs in cytosolic poly A+ (cyan), cytosolic poly A– (black), nuclear poly A+ (red), and nuclear poly A– (dark blue). Numbers indicate percentage of total transcription detected in each unique compartment (fig. S9 and Table 1). Pie charts illustrate the distribution of transcribed base pairs detected in each indicated unique compartment among various classes of annotations. The annotations used in this correlation are described in (15). Reprinted from: Cheng et al., Science 308 (2005)
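The Venn-diagram percentages are shares of all transcribed base pairs that are detected in exactly one compartment. A sketch of that bookkeeping with sets of base positions; the helper name and the coordinates are toy values, not data from the paper.

```python
def unique_fractions(compartments: dict[str, set[int]]) -> dict[str, float]:
    """Percentage of all transcribed positions detected in exactly one compartment."""
    union = set().union(*compartments.values())
    result = {}
    for name, positions in compartments.items():
        # Positions seen in any *other* compartment are not unique to this one.
        others = set().union(*(p for n, p in compartments.items() if n != name))
        only_here = positions - others
        result[name] = 100.0 * len(only_here) / len(union)
    return result

# Hypothetical transcribed base positions (toy coordinates, not real data).
data = {
    "cytosolic polyA+": set(range(0, 40)),
    "cytosolic polyA-": set(range(30, 60)),
    "nuclear polyA+": set(range(35, 70)),
    "nuclear polyA-": set(range(65, 100)),
}
fractions = unique_fractions(data)
for name, pct in fractions.items():
    print(f"{name}: {pct:.1f}%")
```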

61 Conclusions Transcriptome mapping experiments show that
Transcriptome mapping experiments show that: a larger percentage of the genome is transcribed than can be accounted for by current genome annotations; the human transcriptome is composed of a network of overlapping transcripts (> 50% of the transcripts); and poly A– RNAs potentially comprise almost half of the human transcriptome. Our understanding of the human transcriptome is still evolving. What are the functions of the non-coding transcripts? Reprinted from: Cheng et al., Science 308 (2005)

62 The complexity of the transcriptome
Fig. 1. The complexity of transcription of protein-coding (blue) and noncoding (red) RNA sequences. Transcripts may be derived from either or both strands, and they may be overlapping and interlaced (2, 3, 6, 11, 12). Many transcripts (including some noncoding transcripts) are alternatively spliced. Both exons and introns may transmit information. Many miRNAs and all small nucleolar RNAs in animals are sourced from introns [see (13) for a review]. The range of types and functions of noncoding RNAs is unknown. Reprinted from: Mattick, Science 309 (2005)

63 A Gene Expression Map for the Euchromatic Genome of Drosophila melanogaster
Stolc et al., Science 306 (2004). The paper presents a transcriptome map of the Drosophila genome using microarrays with 179,972 unique 36-nucleotide probes: 61,371 exon probes for the 13,197 predicted genes, 30,787 splice-junction probes, and 87,814 nonexon probes from intronic and intergenic regions, hybridized with RNA from six developmental stages of the Drosophila life cycle.
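The three probe classes add up exactly to the 179,972 unique probes on the array. A quick check of that arithmetic and of each class's share:

```python
probe_classes = {
    "exon": 61_371,
    "splice junction": 30_787,
    "nonexon (intronic + intergenic)": 87_814,
}
total = sum(probe_classes.values())
print(total)   # 179972

for name, n in probe_classes.items():
    print(f"{name}: {100 * n / total:.1f}% of probes")
```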

64 Genomic expression patterns
93% of all annotated genes were significantly expressed, confirming 2,426 annotated genes not yet validated by an EST sequence. The majority of the genes are developmentally regulated. Reprinted from: Stolc et al., Science 306 (2004)

65 Transcriptome map of Drosophila
41% of intergenic and intronic probes are expressed; one fraction of these does not correspond to exons and may represent putative noncoding transcription units. 15% of the intergenic and intronic probes are developmentally regulated. Alternative splicing: 53% of expressed Drosophila genes exhibit exon skipping, and 46% of genes showed multiple patterns of exon expression, suggesting alternative splicing or alternative promoter usage. Alternative splicing in Drosophila is much more extensive than previously estimated. Reprinted from: Stolc et al., Science 306 (2004)
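A crude way to flag exon skipping from exon-probe data is to look for an internal exon that is called absent in a sample where both neighbouring exons are present. This heuristic, its input format and the presence calls are assumptions for illustration, not the analysis used by Stolc et al.; it deliberately ignores the first and last exons, where absence could instead reflect alternative promoter or polyadenylation site usage.

```python
def skipped_exons(detected: list[list[bool]]) -> set[int]:
    """Return indices of internal exons that look skipped in at least one sample.

    `detected[s][e]` is True when exon e's probe is called present in sample s.
    An internal exon is flagged when it is absent while both neighbours are present.
    """
    flagged = set()
    for sample in detected:
        for e in range(1, len(sample) - 1):
            if sample[e - 1] and not sample[e] and sample[e + 1]:
                flagged.add(e)
    return flagged

# Hypothetical presence calls for a 4-exon gene in three developmental stages.
calls = [
    [True, True, True, True],
    [True, False, True, True],   # exon 1 absent between expressed neighbours
    [True, True, True, False],   # last exon absent: not counted as skipping
]
print(skipped_exons(calls))      # {1}
```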

66 Transcriptome or Gene Expression Profiles
The transcriptome is dynamic: it changes rapidly and dramatically in response to perturbations and environmental stimuli, and during normal cellular events. Changes in the patterns of gene expression provide clues about cellular functions, biochemical pathways and regulatory mechanisms. Transcriptome or gene expression profiling aims to: monitor the expression levels of “all” genes; correlate expression profiles with biological activity; identify genetic networks and pathways; identify the function of unknown genes; and diagnose physiological (disease) states. Reprinted from: Lockhart and Winzeler, Nature 405, 827 (2000)

67 Eukaryotic Transcriptome
Reprinted from: “The Cell”

68 Transcriptome Profiling Platforms
DNA sequencing based methods: DNA sequencing of individual cDNA clones to count the number of times each cDNA is present in a cDNA library; limited resolution, but measures absolute RNA levels. DNA fragment analysis based methods: PCR-based amplification of DNA fragments derived from mRNA or cDNA, whereby each DNA fragment represents a different mRNA; currently used primarily for species that have not (yet) been sequenced. Array-based hybridization methods: hybridization to microarrays carrying gene-specific DNA probes; this has become the best-performing and most widely used platform, and high-resolution exon microarrays allow quantitative analysis of alternatively spliced transcripts.
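Counting sequenced clones gives each mRNA's share of the library directly, without a reference sample, which is why sequencing-based methods measure absolute levels. The resolution limit is also visible: a transcript making up 1 in 10,000 mRNAs is expected to appear only once per 10,000 clones sequenced. A minimal sketch with hypothetical clone-to-gene assignments:

```python
from collections import Counter

def clone_abundance(clone_genes: list[str]) -> dict[str, float]:
    """Estimate relative transcript abundance from sequenced cDNA clones.

    Each sequenced clone is assigned to the gene it matches; the fraction of
    all clones that hit a gene estimates that mRNA's share of the library.
    """
    counts = Counter(clone_genes)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}

# Hypothetical gene assignments for 8 randomly picked clones.
clones = ["ACT1", "ACT1", "ACT1", "TDH3", "TDH3", "PGK1", "ACT1", "CDC28"]
abundance = clone_abundance(clones)
print(abundance["ACT1"])   # 0.5
```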

69 Cluster Analysis and Display of Genome-wide Expression Patterns
Eisen et al., PNAS 95 (1998). The paper presents a method for analyzing and representing genome-wide expression data: cluster analysis with standard statistical algorithms arranges genes according to the similarity of their expression patterns, and the output is displayed graphically, conveying the clustering and the expression data simultaneously in a form intuitive for biologists.

70 Cluster Analysis of Expression Patterns
A logical basis for organizing gene expression data is to group genes with similar patterns of expression, using a mathematical description of similarity that captures similarity in the "shape" of expression profiles. Since there is no a priori knowledge of gene expression patterns, unsupervised methods are favored. Pairwise average-linkage cluster analysis, a form of hierarchical clustering similar to that used in sequence and phylogenetic analysis, yields a similarity tree whose branch lengths reflect the degree of similarity between the objects. Reprinted from: Eisen et al., PNAS 95 (1998)
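A minimal sketch of pairwise average-linkage clustering, assuming 1 minus the Pearson correlation as the distance so that genes with the same profile shape but different amplitude end up close together. Eisen et al. define their own correlation-based similarity score; the exact metric, the helper names and the toy profiles here are assumptions. The naive search over all cluster pairs is fine for a handful of genes but far too slow for a genome.

```python
import math
from itertools import combinations

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two non-constant, equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return sxy / (sx * sy)

def average_linkage(profiles: dict[str, list[float]]):
    """Agglomerative clustering with average linkage on 1 - correlation.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where clusters are frozensets of gene names.
    """
    names = list(profiles)
    # Clamp at 0 so floating-point noise never yields a negative distance.
    dist = {frozenset((a, b)): max(0.0, 1 - pearson(profiles[a], profiles[b]))
            for a, b in combinations(names, 2)}
    clusters = [frozenset([n]) for n in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            # Average linkage: mean distance over all cross-cluster gene pairs.
            d = sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))
            if best is None or d < best[2]:
                best = (c1, c2, d)
        c1, c2, d = best
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 | c2)
        merges.append((c1, c2, d))
    return merges

# Hypothetical log-ratio profiles over four conditions.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],   # same shape as g1, twice the amplitude
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [3.9, 3.1, 2.0, 0.9],   # nearly the same shape as g3
}
merges = average_linkage(profiles)
for a, b, d in merges:
    print(sorted(a), "+", sorted(b), f"distance={d:.3f}")
```

The merge history is the similarity tree: each tuple is an internal node, and its distance gives the height at which the two branches join.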

71 Example: Similarity Tree of CDK Genes
[Figure: similarity tree of plant CDK genes (scale bar 0.1), with leaves for CDKA, CDKB, CDKC, CDKD (CAK) and CDKE sequences carrying At_, Ms_, Le_ and Os_ prefixes]

72 Graphical Representation
Combines clustering with a graphical representation of the primary data, representing each data point with a color that quantitatively reflects the experimental observation (green: down-regulated; red: up-regulated). The images show contiguous patches of color representing groups of genes that share similar expression patterns over multiple conditions. Analysis of the clustered genes shows that they share common functions in cellular processes. Reprinted from: Eisen et al., PNAS 95 (1998)
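The colour coding can be sketched as a linear map from log ratio to red or green intensity, with black for no change. The saturation value of 2.0, the linear scale and the function name are assumptions; the published figures use different saturation ranges per experiment.

```python
def ratio_to_rgb(log_ratio: float, saturation: float = 2.0) -> tuple[int, int, int]:
    """Map a log expression ratio to an (R, G, B) colour.

    Positive values (up-regulated) shade towards red, negative values
    (down-regulated) towards green, and 0 (no change) is black. Values beyond
    +/- `saturation` are clipped to full intensity.
    """
    scaled = max(-1.0, min(1.0, log_ratio / saturation))
    intensity = round(255 * abs(scaled))
    if scaled >= 0:
        return (intensity, 0, 0)
    return (0, intensity, 0)

print(ratio_to_rgb(2.5))    # (255, 0, 0): strongly up-regulated, clipped
print(ratio_to_rgb(-1.0))   # (0, 128, 0): moderately down-regulated
print(ratio_to_rgb(0.0))    # (0, 0, 0): unchanged
```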

73 Graphical Representation
[Figure: clustered expression image; columns are different experimental observations, rows are different genes; two clusters marked (Cluster 1, Cluster 2)] Reprinted from: Eisen et al., PNAS 95 (1998)

74 Cluster Analysis of Combined Yeast Data Sets
[Figure panels: synchronized cell division, sporulation, heat shock, reducing agents, low temperature] Cluster analysis of combined yeast data sets. Data from separate time courses of gene expression in the yeast S. cerevisiae were combined and clustered. Data were drawn from time courses during the following processes: the cell division cycle (9) after synchronization by alpha factor arrest (ALPH; 18 time points), centrifugal elutriation (ELU; 14 time points), and with a temperature-sensitive cdc15 mutant (CDC15; 15 time points); sporulation (10) (SPO, 7 time points plus four additional samples); shock by high temperature (HT, 6 time points); reducing agents (D, 4 time points) and low temperature (C; 4 time points) (P. T. S., J. Cuoczo, C. Kaiser, P.O. B., and D. B., unpublished work); and the diauxic shift (8) (DX, 7 time points). All data were collected by using DNA microarrays with elements representing nearly all of the ORFs from the fully sequenced S. cerevisiae genome (8); all measurements were made against a time 0 reference sample except for the cell-cycle experiments, where an unsynchronized sample was used. All genes (2,467) for which functional annotation was available in the Saccharomyces Genome Database were included (12). The contribution to the gene similarity score of each sample from a given process was weighted by the inverse of the square root of the number of samples analyzed from that process.
The entire clustered image is shown in A; a larger version of this image, along with dendrogram and gene names, is available at … Full gene names are shown for representative clusters containing functionally related genes involved in (B) spindle pole body assembly and function, (C) the proteasome, (D) mRNA splicing, (E) glycolysis, (F) the mitochondrial ribosome, (G) ATP synthesis, (H) chromatin structure, (I) the ribosome and translation, (J) DNA replication, and (K) the tricarboxylic acid cycle and respiration. The full-color range represents log ratios of -1.2 to 1.2 for the cell-cycle experiments, -1.5 to 1.5 for the shock experiments, -2.0 to 2.0 for the diauxic shift, and -3.0 to 3.0 for sporulation. Gene name, functional category, and specific function are from the Saccharomyces Genome Database (13). Cluster I contains 112 ribosomal protein genes, seven translation initiation or elongation factors, three tRNA synthetases, and three genes of apparently unrelated function. Reprinted from: Eisen et al., PNAS 95 (1998)
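The per-process weighting keeps long time courses from dominating the similarity score: a process with 4 samples gets weight 1/sqrt(4) = 0.5 per sample (total 2), while a single-sample process keeps weight 1. A sketch applying such weights to an uncentered correlation; the exact form of the weighted score, the helper names and the sample layout are assumptions.

```python
import math

def process_weights(sample_process: list[str]) -> list[float]:
    """Weight each sample by 1 / sqrt(number of samples from the same process)."""
    counts: dict[str, int] = {}
    for p in sample_process:
        counts[p] = counts.get(p, 0) + 1
    return [1 / math.sqrt(counts[p]) for p in sample_process]

def weighted_similarity(x: list[float], y: list[float], w: list[float]) -> float:
    """Weighted uncentered correlation between two log-ratio profiles."""
    num = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    den = math.sqrt(sum(wi * xi * xi for wi, xi in zip(w, x))
                    * sum(wi * yi * yi for wi, yi in zip(w, y)))
    return num / den

# Hypothetical layout: four cell-cycle (ALPH) samples and one heat-shock (HT) sample.
processes = ["ALPH", "ALPH", "ALPH", "ALPH", "HT"]
w = process_weights(processes)
print(w)   # [0.5, 0.5, 0.5, 0.5, 1.0]

x = [1.0, 1.0, 1.0, 1.0, -2.0]   # two hypothetical genes that agree during
y = [1.0, 1.0, 1.0, 1.0, 2.0]    # the cell cycle but disagree after heat shock
print(weighted_similarity(x, y, [1.0] * 5))   # 0.0 without weighting
print(weighted_similarity(x, y, w))           # about -0.333 with weighting
```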

75 Genes of Similar Function Cluster Together
[Figure: expression clusters of ribosomal protein genes and histone genes] Reprinted from: Eisen et al., PNAS 95 (1998)

76 Global Analysis of the Genetic Network Controlling a Bacterial Cell Cycle
Laub et al., Science 290, 5499 (2000). The paper presents full-genome evidence that bacterial cells use discrete transcription patterns to control cell division, demonstrating that genes involved in a given cell function are activated at the time that function is executed.

77 Cell division in the bacterium Caulobacter crescentus
A complex genetic network controls cell division, DNA replication and the ordered biogenesis of cell structures. Reprinted from: Laub et al., Science 290, 5499 (2000)

78 Microarray Analysis of the Control of cell division
Experimental set-up: DNA microarrays containing 2,966 predicted ORFs were constructed; swarmer cells were isolated and allowed to proceed synchronously through the 150-min cell cycle; RNA was harvested from samples taken at 15-min intervals to identify RNAs whose levels varied as a function of the cell cycle. Using an algorithm to identify expression profiles that varied in a cyclical manner, 553 cell cycle-regulated transcripts were identified, including the 72 genes with previously characterized cell cycle-regulated promoters. Reprinted from: Laub et al., Science 290, 5499 (2000)
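Laub et al. describe their own algorithm for finding cyclic profiles. As a swapped-in illustration, the sketch below scores a profile sampled every 15 min over one 150-min cycle by the fraction of its variance captured by a single sinusoid with that period (the first Fourier component). The score, the function name and any cut-off applied to it are assumptions.

```python
import math

PERIOD = 150    # minutes: length of the synchronized cell cycle
INTERVAL = 15   # minutes between RNA samples
TIMES = list(range(0, PERIOD, INTERVAL))   # 0, 15, ..., 135: one full cycle

def periodicity_score(profile: list[float]) -> float:
    """Fraction of a profile's variance captured by one sinusoid of period PERIOD.

    Uses the first Fourier component of the mean-centred profile: a pure
    single-cycle oscillation scores 1.0, a profile with no such component 0.0.
    """
    n = len(profile)
    mean = sum(profile) / n
    centred = [v - mean for v in profile]
    total = sum(v * v for v in centred)
    if total == 0:
        return 0.0   # constant profile: nothing varies
    c = sum(v * math.cos(2 * math.pi * t / PERIOD) for v, t in zip(centred, TIMES))
    s = sum(v * math.sin(2 * math.pi * t / PERIOD) for v, t in zip(centred, TIMES))
    # For n evenly spaced samples over one period, a pure sinusoid gives
    # c^2 + s^2 = (n / 2) * total, so the ratio below tops out at 1.0.
    return (c * c + s * s) / ((n / 2) * total)

cyclic = [math.cos(2 * math.pi * (t - 45) / PERIOD) for t in TIMES]   # peaks at 45 min
alternating = [1.0, -1.0] * 5                                          # sample-to-sample noise
print(round(periodicity_score(cyclic), 3))        # 1.0
print(round(periodicity_score(alternating), 3))   # 0.0
```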

79 Clustered Expression Profiles for the 553 Cell Cycle-regulated Transcripts. Temporally regulated genes are maximally expressed at specific times throughout the entire cell cycle; genes were induced immediately before or coincident with each cell cycle-regulated event. Figure 1. (B) Clustered expression profiles for the 553 identified cell cycle-regulated transcripts are organized by time of peak expression. Expression profiles for genes are in rows with temporal progression from left to right, as indicated at the top. Ratios are represented using the color scale at the bottom. Expression profiles were clustered using the self-organizing map analysis of the GeneCluster software and plotted using TreeView software. Each cluster is numbered; for an expanded, annotated view of these clusters … Reprinted from: Laub et al., Science 290, 5499 (2000)
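Ordering genes by the time of peak expression can be sketched as below. The published figure was clustered with self-organizing maps in GeneCluster, which this does not reproduce; the function name and the profiles are hypothetical.

```python
def order_by_peak(profiles: dict[str, list[float]], times: list[int]) -> list[tuple[str, int]]:
    """Sort genes by the sampling time at which their expression peaks."""
    peaks = []
    for gene, values in profiles.items():
        peak_index = values.index(max(values))   # first sample with the maximum value
        peaks.append((gene, times[peak_index]))
    return sorted(peaks, key=lambda item: item[1])

times = list(range(0, 150, 15))   # 15-min samples over the 150-min cycle
profiles = {   # hypothetical log ratios, not data from the paper
    "geneA": [0.1, 0.2, 0.3, 1.5, 0.4, 0.1, 0.0, -0.2, -0.3, -0.1],
    "geneB": [1.2, 0.8, 0.1, 0.0, -0.5, -0.6, -0.4, 0.0, 0.3, 0.9],
    "geneC": [-0.4, -0.3, 0.0, 0.2, 0.5, 0.9, 1.8, 1.1, 0.2, -0.3],
}
print(order_by_peak(profiles, times))
# [('geneB', 0), ('geneA', 45), ('geneC', 90)]
```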

80 Profiles of Genes Associated With DNA Replication and Cell Division
Reprinted from: Laub et al., Science 290, 5499 (2000)

81 Expression Profiles of Genes Involved in Flagellar Biogenesis
Genes for flagellar biogenesis are organized in a 4-level transcriptional hierarchy, and the expression of each class of genes is required for expression of all subsequent classes. Pili and flagellar biogenesis are apparently organized as temporal transcriptional cascades. Figure 2. (A to C) Expression profiles of functionally related sets of genes. The flagellar biogenesis genes in (C) are organized in classes I to IV, reflecting the temporal hierarchy of their transcription. The column labeled ctrAts shows the change in expression level for each gene in response to loss of CtrA. Expression levels are color-coded as in Fig. 1. For expanded, annotated profiles of each set of genes … Reprinted from: Laub et al., Science 290, 5499 (2000)

82 Conclusions The global analysis of bacterial cell cycle regulation
has established the outline of the complex genetic circuitry that controls bacterial cell cycle progression. It identified 553 genes whose mRNA levels varied as a function of the cell cycle, demonstrating that (i) genes involved in a given cell function are activated at the time that function is executed, (ii) genes encoding proteins that function in complexes are coexpressed, and (iii) temporal cascades of gene expression control multiprotein structure biogenesis. Reprinted from: Laub et al., Science 290, 5499 (2000)

83 Gene expression profiling predicts clinical outcome of breast cancer
van 't Veer et al., Nature 415, 530 (2002). The paper presents the application of gene expression profiling to identify breast cancer patients who are likely to develop metastases and should therefore receive chemotherapy, exemplifying the clinical applications of microarray technology.

84 Experimental design Microarray hybridizations Data analysis
Microarray hybridizations: oligonucleotide microarrays for human genes were hybridized with RNA isolated from frozen tumor material of 98 selected primary breast cancers: 44 from patients with good prognosis (disease-free for >5 years), 34 from patients with poor prognosis (developed metastases within 5 years), and 20 from patients with BRCA1 or BRCA2 mutations. Data analysis: two-dimensional unsupervised hierarchical clustering of the 98 tumor samples and the 5,000 genes that were significantly regulated. Reprinted from: van 't Veer et al., Nature 415, 530 (2002)

85 Cluster Analysis of 98 Breast Tumours
[Figure labels: good prognosis; poor prognosis] Figure 1 Unsupervised two-dimensional cluster analysis of 98 breast tumours. a, Two-dimensional presentation of transcript ratios for 98 breast tumours. There were 4,968 significant genes across the group. Each row represents a tumour and each column a single gene. As shown in the colour bar, red indicates upregulation, green downregulation, black no change, and grey no data available. The yellow line marks the subdivision into two dominant tumour clusters. b, Selected clinical data for the 98 patients in a: BRCA1 germline mutation carrier (or sporadic patient), ER expression, tumour grade 3 (versus grade 1 and 2), lymphocytic infiltrate, angioinvasion, and metastasis status. White indicates positive, black negative and grey denotes tumours derived from BRCA1 germline carriers who were excluded from the metastasis evaluation. The cluster below the yellow line consists of 36 tumours, of which 34 are ER negative (total 39 ER-negative) and 16 are carriers of the BRCA1 mutation (total 18). Reprinted from: van 't Veer et al., Nature 415, 530 (2002)

86 Prognostic expression markers
Identification of predictive genes with a 3-step supervised classification method: from the 5,000 significantly regulated genes, 231 genes were selected as significantly associated with disease outcome; the 231 genes were rank-ordered by their correlation with outcome; and an optimal set showing the strongest power to classify the tumors was then selected iteratively. The selected 70 genes correctly predict the outcome for 85% of the patients and can be used to decide which patients should receive chemotherapy. Reprinted from: van 't Veer et al., Nature 415, 530 (2002)
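A simplified sketch of the selection idea: rank genes by the absolute correlation of their expression with outcome, keep the top ones, and classify a tumor by how well it correlates with the average good-prognosis profile. The centroid-correlation rule, the 0.5 threshold, the helper names, the gene names and the toy data are assumptions; the real analysis worked on the 78 sporadic tumors (44 + 34) and about 5,000 genes.

```python
import math

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two non-constant, equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rank_genes(expr: dict[str, list[float]], outcome: list[int]) -> list[str]:
    """Rank genes by |correlation| of expression with outcome (1 = metastasis)."""
    return sorted(expr, key=lambda g: abs(pearson(expr[g], outcome)), reverse=True)

def classify(sample: dict[str, float], centroid: dict[str, float],
             genes: list[str], threshold: float = 0.5) -> str:
    """'good' if the sample correlates with the good-prognosis centroid above threshold."""
    r = pearson([sample[g] for g in genes], [centroid[g] for g in genes])
    return "good" if r >= threshold else "poor"

# Hypothetical log ratios for 6 tumors (3 good, then 3 poor prognosis).
outcome = [0, 0, 0, 1, 1, 1]
expr = {
    "g1": [-1.0, -0.8, -0.9, 1.0, 0.9, 1.1],   # rises with metastasis
    "g2": [0.2, -0.1, 0.1, 0.0, 0.1, -0.2],    # weakly related
    "g3": [1.0, 0.7, 0.9, -0.8, -1.0, -0.9],   # falls with metastasis
    "g4": [0.0, 0.5, -0.5, 0.4, -0.4, 0.1],    # unrelated
}
top = rank_genes(expr, outcome)[:3]
good = [i for i, o in enumerate(outcome) if o == 0]
centroid = {g: sum(expr[g][i] for i in good) / len(good) for g in top}

print(top)   # ['g1', 'g3', 'g2']
print(classify({"g1": -0.7, "g2": 0.0, "g3": 0.8}, centroid, top))   # good
print(classify({"g1": 0.9, "g2": -0.1, "g3": -0.8}, centroid, top))  # poor
```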

87 Expression profiles of the 70 predictive genes
[Figure: expression profiles of the 70 predictive genes, with classifier sensitivity and accuracy] Reprinted from: van 't Veer et al., Nature 415, 530 (2002)
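The classifier's sensitivity and accuracy both follow from counting calls. A sketch treating poor prognosis as the positive class (an assumption for this example); the function name and the predictions are hypothetical.

```python
def sensitivity_accuracy(predicted: list[str], actual: list[str],
                         positive: str = "poor") -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); accuracy = correct calls / all calls."""
    tp = sum(p == positive and a == positive for p, a in zip(predicted, actual))
    fn = sum(p != positive and a == positive for p, a in zip(predicted, actual))
    correct = sum(p == a for p, a in zip(predicted, actual))
    return tp / (tp + fn), correct / len(actual)

pred = ["poor", "poor", "good", "good", "poor", "good"]
actual = ["poor", "poor", "poor", "good", "good", "good"]
sens, acc = sensitivity_accuracy(pred, actual)
print(sens, acc)   # both 2/3: 2 of 3 poor-prognosis cases caught, 4 of 6 calls correct
```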

88 Conclusions Microarray-based expression profiling is
Currently the most powerful tool for functional gene analysis: a comprehensive approach to investigate the response of genes under a broad spectrum of conditions, such as genetic backgrounds, perturbations and environmental stimuli. Continued increases in probe density provide more detailed analyses of the different transcripts: alternative promoter usage, alternative splicing and non-coding transcripts.

