Bioinformatics and Evolutionary Genomics High throughput “functional” data / functional genomics / Omics.


1 Bioinformatics and Evolutionary Genomics High throughput “functional” data / functional genomics / Omics

2 High-throughput data on gene function
What do I mean: omics, microarray, ChIP-on-chip
Why are people generating these data?
–Post-genomic era / systems biology: the challenge is to understand the roles of, e.g., the 6,000 gene products in yeast and how they interact to create a eukaryotic organism.
–Because they can: automation is also applied to areas of molecular biology beyond sequencing.
–To have “screens” for the research question at hand, rather than having to test one guess at a time.
What about evolutionary genomics?
Yeast
Accuracy / noise

3 HTP data
What do they mean: experimental knowledge, but what do they mean in terms of, e.g., function?
A deluge
Bioinformatics is needed for basic data handling, and has IMHO only scratched the surface in terms of coming up with biological questions with which we can probe these data

4 Microarray data

5 two conditions often used for “screens”

6 (Correlated) mRNA expression
mRNA levels are systematically measured under a variety of different cellular conditions, and genes are grouped if they show a similar transcriptional response to these conditions.
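The grouping idea on this slide can be sketched in a few lines of Python. This is only an illustration, assuming Pearson correlation as the similarity measure and a fixed cutoff; the gene names, expression values, and the `threshold` parameter are hypothetical and do not come from the slides.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sd_x * sd_y)


def coexpressed_pairs(profiles, threshold=0.9):
    """Return gene pairs whose profiles correlate at or above the threshold."""
    genes = sorted(profiles)
    pairs = []
    for i, gene_a in enumerate(genes):
        for gene_b in genes[i + 1:]:
            r = pearson(profiles[gene_a], profiles[gene_b])
            if r >= threshold:
                pairs.append((gene_a, gene_b, r))
    return pairs


# Hypothetical expression values, one value per cellular condition.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 5.9, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
pairs = coexpressed_pairs(profiles)
```

Here geneA and geneB respond in the same direction across all four conditions and are linked, while geneC is anti-correlated with both and is left out.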

7 Profile Similarity Identifies Sterol-Pathway Disturbance Resulting from Deletion of Uncharacterized ORF YER044c (ERG28) and from Dyclonine Treatment
(A) Prominent gene clusters responding to interference with ergosterol biosynthesis. (B) Comparison of the transcript profile of an erg28Δ strain to that of an erg3Δ strain. (C) Sterol content of wild-type (left) and erg28Δ (right) strains. Hughes et al. 2000 Cell

8 Conventional hierarchical clustering of co-expression data can fail, because genes can play a role in multiple cellular processes and their common regulatory element can only be detected in a subset of experiments. The alternative: detect genes that are co-expressed under a subset of conditions, which yields a comprehensive set of overlapping ‘transcriptional modules’. Ihmels et al. 2002 Nature Genetics
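A rough sketch of what "co-expressed under a subset of conditions" can mean in practice is given below. It is a simplified illustration and not the published procedure from Ihmels et al.: it assumes expression values are log ratios, starts from a seed gene set, keeps the conditions in which the seed responds strongly, and then keeps every gene that follows the seed in those conditions. The function name, the cutoffs, and all values are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def module_from_seed(expr, seed, cond_cut=1.0, gene_cut=1.0):
    """Pick conditions where the seed genes respond, then genes that follow them."""
    n_cond = len(next(iter(expr.values())))
    # Step 1: score each condition by the average response of the seed genes.
    cond_score = [mean([expr[g][c] for g in seed]) for c in range(n_cond)]
    conds = [c for c in range(n_cond) if abs(cond_score[c]) >= cond_cut]
    if not conds:
        return [], []
    # Step 2: score every gene using only the selected conditions,
    # weighting each condition by the seed response in that condition.
    gene_score = {
        g: mean([expr[g][c] * cond_score[c] for c in conds]) for g in expr
    }
    genes = sorted(g for g, s in gene_score.items() if s >= gene_cut)
    return genes, conds


# Hypothetical log-ratio data: g3 follows g1 and g2 only in conditions 0 and 1.
expr = {
    "g1": [2.0, 1.8, 0.1, -0.2],
    "g2": [1.9, 2.1, -0.1, 0.0],
    "g3": [2.2, 1.7, 1.5, -1.6],
    "g4": [0.1, -0.2, 1.6, -1.4],
}
genes, conds = module_from_seed(expr, seed=["g1", "g2"])
```

With this toy input the module contains g1, g2, and g3 over conditions 0 and 1. A correlation over all four conditions would dilute the link to g3, because g3 also responds in conditions 2 and 3 together with g4; running the function with a different seed can place g3 in a second, overlapping module.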

9 Citric acid cycle? Different activity under different experimental conditions

10 Rapid divergence in expression between duplicate genes inferred from microarray & promoter data 0.1 = 3.2 My

11 Clustering conditions where the conditions are genes: yet another way to get to functional “links”

12 Yeast-2-hybrid Pairs of proteins to be tested for interaction are expressed as fusion proteins ('hybrids') in yeast: one protein is fused to a DNA-binding domain, the other to a transcriptional activator domain. Any interaction between them is detected by the formation of a functional transcription factor.

13 Examples from the original Ito publication: (A) autophagy, (B) spindle pole body function, and (C) vesicular transport. Arrows ~ orientation of two-hybrid interaction, beginning from the bait to the prey.

14 Accuracy of Y2H and how to improve it

15 Improving reliability using protein complexes reasoning / internal consistency Internal filtering!

16 Accuracy of Y2H and how to improve it

17 Mass spectrometry of purified complexes.
Individual proteins are tagged and used as 'hooks' to biochemically purify whole protein complexes. These are then separated and their components identified by mass spectrometry.

18

19

20

21 Socio-affinity indices: dotted lines, 5–10; dashed lines, 10–15; plain lines, >15. Bait proteins are shown in bold, and shaded circles around groups of proteins indicate cores and modules. Figure labels: Exosome; Ski; stages in mRNA degradation.

22 pdb Y2H Cellular Function Phylogenetic profile

23 Protein interactions: literature databases
Literature derived, normally manually curated (as opposed to text mining)
Biased?
No new knowledge
Useful for benchmarking & for the study of the evolution of e.g. protein complexes
For example: Munich Information Center for Protein Sequences (MIPS)
Databases that contain literature and omics: Database of Interacting Proteins (DIP), Biomolecular Interaction Network Database (BIND)

24 Systematic screening for lethality of knockouts on a rich medium
The functions of many open reading frames (ORFs) identified in genome-sequencing projects are unknown. New, whole-genome approaches are required to systematically determine their function. A total of 6925 Saccharomyces cerevisiae strains were constructed, by a high-throughput strategy, each with a precise deletion of one of 2026 ORFs. Of the deleted ORFs, 17 percent were essential for viability in rich medium. Winzeler et al. 1999 Science

25 Genetic interactions (synthetic lethal/sick)
Two nonessential genes that cause lethality when mutated at the same time form a synthetic lethal interaction. Such genes are often functionally associated and their encoded proteins may also interact physically. Tong et al. 2001 Science
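The definition on this slide translates directly into a filter over viability data. The sketch below assumes viability has already been scored as a simple yes/no per strain; the gene names, the data, and the function name are hypothetical placeholders, and a real screen would also have to handle the graded "synthetic sick" case.

```python
def synthetic_lethal_pairs(single_viable, double_viable):
    """Return gene pairs that are viable as single deletions but not combined.

    single_viable maps gene -> bool (is the single-deletion strain viable?).
    double_viable maps (gene, gene) -> bool for the double-deletion strain.
    """
    pairs = []
    for (gene_a, gene_b), viable in double_viable.items():
        both_nonessential = (
            single_viable.get(gene_a, False) and single_viable.get(gene_b, False)
        )
        if both_nonessential and not viable:
            pairs.append(tuple(sorted((gene_a, gene_b))))
    return sorted(pairs)


# Hypothetical screen results.
single_viable = {"geneA": True, "geneB": True, "geneC": True, "geneD": False}
double_viable = {
    ("geneB", "geneA"): False,  # both singles viable, double is not
    ("geneA", "geneC"): True,   # double mutant is fine
    ("geneC", "geneD"): False,  # geneD is essential on its own, so not counted
}
lethal = synthetic_lethal_pairs(single_viable, double_viable)
```

Only the geneA/geneB pair qualifies: the geneC/geneD double mutant is also inviable, but that is explained by geneD being essential on its own.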

26

27 One thing we can do with synthetic lethals
Ideker: protein interactions

28 What to do with synthetic lethals? Kelley and Ideker 2005 Nature Biotech

29

30 ChIP-on-chip
Tagged strains (one strain for each regulator).
Microarray for a strain to see which pieces of DNA are found in excess if you isolate the regulator plus bound DNA.
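"Found in excess" can be made concrete as an enrichment ratio per array probe. The sketch below assumes one intensity for the immunoprecipitated sample and one for a control sample per probe, and calls a probe bound when the log2 ratio passes a cutoff. The probe names, intensities, and cutoff are hypothetical, and published analyses typically use replicate-aware error models rather than a plain ratio.

```python
import math


def enriched_probes(ip, control, min_log2_ratio=1.0):
    """Return probes whose IP signal exceeds the control by the given log2 ratio."""
    hits = {}
    for probe, ip_signal in ip.items():
        log_ratio = math.log2(ip_signal / control[probe])
        if log_ratio >= min_log2_ratio:
            hits[probe] = log_ratio
    return hits


# Hypothetical intensities for three intergenic probes.
ip = {"probe1": 800.0, "probe2": 210.0, "probe3": 1500.0}
control = {"probe1": 200.0, "probe2": 200.0, "probe3": 300.0}
hits = enriched_probes(ip, control)
```

With these numbers probe1 and probe3 are at least fourfold enriched and are reported as bound, while probe2 is essentially at background.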

31 GFP localization
Mating of fluorescent protein markers specific for organelles plus fluorescent protein tags for each gene

32 Other functional genomics data: the omes
Quantitative proteomics
Kinome
PTMome
(Almost) all of these data are freely and publicly available
Take-home message: “wow this exists !!!”

33 Bioinformatics for Benchmarking & Integration
[Figure: accuracy (fraction of data confirmed by reference set) versus coverage (fraction of reference set covered by data), for raw data, filtered data, and different parameter choices. Data sets shown: purified complexes (TAP), purified complexes (HMS-PCI), yeast two-hybrid, mRNA co-expression, genomic context, synthetic lethality, and combined evidence (two methods, three methods).]
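The two axes of this benchmark are simple set ratios, which the following sketch computes for one data set against a reference set. It assumes interactions are undirected pairs and that every predicted pair missing from the reference counts as unconfirmed, which is a simplification; the protein names and the function name are hypothetical.

```python
def normalize(pairs):
    """Treat interactions as undirected: (A, B) and (B, A) are the same pair."""
    return {frozenset(pair) for pair in pairs}


def accuracy_and_coverage(predicted, reference):
    """Accuracy: fraction of data confirmed by the reference set.
    Coverage: fraction of the reference set covered by the data."""
    pred = normalize(predicted)
    ref = normalize(reference)
    shared = pred & ref
    accuracy = len(shared) / len(pred) if pred else 0.0
    coverage = len(shared) / len(ref) if ref else 0.0
    return accuracy, coverage


# Hypothetical high-throughput data set and curated reference set.
predicted = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
reference = [("B", "A"), ("C", "B"), ("E", "F"), ("A", "F"), ("C", "F")]
accuracy, coverage = accuracy_and_coverage(predicted, reference)
```

Two of the four predicted pairs appear in the reference (accuracy 0.5), and those two pairs cover two of the five reference interactions (coverage 0.4). Filtering a data set more strictly usually moves it toward higher accuracy and lower coverage, which is the trade-off the figure plots.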

34 Advanced integration

