1 Computational functional genomics Lital Haham Sivan Pearl.

2 Introduction Piles of information but only flakes of knowledge. The existing information: collections of genomic sequences, expression profiles, protein-protein interactions, and many more…

3 Introduction Computational biology strives to extract the maximal possible information from known sequences, by classifying them according to their homologous relationships, predicting their biochemical activity, cellular function, 3-dimensional structures and evolutionary origin.

4 The COG - Clusters of Orthologous Groups of proteins Identification of orthologs is critical for reliable prediction of gene function in newly sequenced genomes. A COG reflects one-to-one, one-to-many and many-to-many orthologous relationships. The purpose of the COG database is to serve as a platform for functional annotation of newly sequenced genomes and for the study of genome evolution.

5 The COG - statistics As of 2003, there are 3307 COGs, including proteins from 43 genomes. The genomes come from Bacteria, Archaea and Eukaryota. The database includes 17 functional groups.

6 The COG - make on your own The COG construction procedure is based on the notion that any group of at least 3 proteins from distant genomes that are more similar to each other than to any other protein from the same genomes is most likely to belong to an orthologous family.

7 The COG - make on your own (1) All-against-all protein sequence comparison. (2) Detect and collapse paralogs. (3) Detect triangles of mutually genome-specific best hits. (4) Merge triangles with a common side to form a COG.
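
The triangle steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the best hits are taken as given (in practice they would come from an all-against-all sequence comparison), paralog collapsing is skipped, and the `"<genome>:<protein>"` id convention, the function names and the toy data are all hypothetical.

```python
from itertools import combinations

def genome_of(protein):
    # Hypothetical id convention: "<genome>:<protein name>".
    return protein.split(":")[0]

def find_triangles(best_hits):
    """Return all triangles of mutual best hits that span three genomes."""
    def mutual(a, b):
        return b in best_hits.get(a, set()) and a in best_hits.get(b, set())

    triangles = []
    for trio in combinations(sorted(best_hits), 3):
        genomes = {genome_of(p) for p in trio}
        if len(genomes) == 3 and all(mutual(a, b) for a, b in combinations(trio, 2)):
            triangles.append(frozenset(trio))
    return triangles

def merge_triangles(triangles):
    """Merge triangles that share a side (two proteins) into clusters."""
    parent = list(range(len(triangles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(triangles)), 2):
        if len(triangles[i] & triangles[j]) >= 2:
            parent[find(i)] = find(j)

    clusters = {}
    for i, triangle in enumerate(triangles):
        clusters.setdefault(find(i), set()).update(triangle)
    return sorted(sorted(c) for c in clusters.values())

# Toy input (made up): each protein maps to its best hits in the other genomes.
best_hits = {
    "A:p": {"B:p", "C:p", "D:p"},
    "B:p": {"A:p", "C:p", "D:p"},
    "C:p": {"A:p", "B:p", "D:p"},
    "D:p": {"A:p", "B:p", "C:p"},
    "A:q": {"B:q", "C:q"},
    "B:q": {"A:q", "C:q"},
    "C:q": {"A:q", "B:q", "D:r"},
    "D:r": {"A:p"},  # its hits are not reciprocated, so it joins no triangle
}
cogs = merge_triangles(find_triangles(best_hits))
print(cogs)
```

In the toy data the four `p` proteins form four triangles that all share sides, so they merge into one cluster of four, while the `q` proteins form a separate three-member cluster.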

8 The COG - make on your own

9 The COG - adding new genomes The COGNITOR program adds new proteins to pre-existing COGs on the basis of multiple best hits. …% of the proteins of prokaryotes could be included.

10 The COG - more applications Detecting missed genes. Convenient for a variety of evolution-oriented analyses of protein families.

11 Methods Experimental methods: biochemical and genetic experiments. Computational methods: homology method (BLAST), mRNA expression, phylogenetic profile, fusion method (Rosetta Stone analysis), gene neighbour method.

12 Homology method The homology method searches for proteins whose amino acid (AA) sequences are similar. …% of a new genome can be assigned some function this way. It involves identification of some molecular function.

13 mRNA expression Analysis of correlated mRNA expression levels makes it possible to establish functional linkages by detecting changes in mRNA expression across different cell types or different environments.
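
One common way to quantify "correlated expression" is the Pearson correlation between expression profiles. The sketch below assumes that choice. The slides do not say which measure was used, and the gene names, expression values and the 0.9 threshold are invented for illustration.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / (sd_x * sd_y)

def coexpressed_pairs(profiles, threshold=0.9):
    """Return gene pairs whose profiles correlate at or above the threshold."""
    return [
        (g1, g2)
        for g1, g2 in combinations(sorted(profiles), 2)
        if pearson(profiles[g1], profiles[g2]) >= threshold
    ]

# Made-up expression levels measured under four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 1.0, 2.0],
}
pairs = coexpressed_pairs(profiles)
print(pairs)
```

Here `geneA` and `geneB` rise together across the conditions and are reported as a candidate functional link, while `geneC` is anti-correlated with both.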

14 Phylogenetic profile Describes the pattern of presence or absence of a particular protein across a set of organisms. Number of possible profiles for n organisms: 2^n. This number far exceeds the number of protein families.

15 Phylogenetic profile Why would two proteins always both be inherited into new species or neither inherited, unless the two function together? If two proteins have the same phylogenetic profile, it is inferred that they have a functional link: engaged in a common pathway or complex.
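
A phylogenetic profile can be represented as a tuple of 0/1 flags, one per genome, and proteins with identical tuples grouped together. The sketch below is a minimal illustration: the genome labels, protein names and presence data are hypothetical, and only exact profile matches are linked.

```python
from collections import defaultdict

def phylogenetic_profile(present_in, genomes):
    """1 if the protein has a homolog in the genome, 0 otherwise."""
    return tuple(int(g in present_in) for g in genomes)

def group_by_profile(presence, genomes):
    """Group proteins that share exactly the same profile."""
    groups = defaultdict(list)
    for protein in sorted(presence):
        groups[phylogenetic_profile(presence[protein], genomes)].append(protein)
    # Only profiles shared by two or more proteins suggest a functional link.
    return {p: members for p, members in groups.items() if len(members) > 1}

# Hypothetical genomes and presence/absence data.
genomes = ["EC", "SC", "BS", "HI"]
presence = {
    "P1": {"EC", "BS"},
    "P2": {"EC", "SC", "HI"},
    "P3": {"EC", "BS"},
    "P4": {"SC"},
    "P5": {"EC", "SC", "HI"},
}
linked = group_by_profile(presence, genomes)
n_possible = 2 ** len(genomes)  # each genome is either present or absent
print(linked, n_possible)
```

With four genomes there are only 16 possible profiles. The method becomes more discriminating as more genomes are added, because chance matches between unrelated proteins become rarer.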

16 Phylogenetic profile

17 Phylogenetic profile - example Analysis of three proteins, RL7, FlgL and His5, according to their phylogenetic profiles. RL7: more than half of the proteins with similar profiles have functions associated with the ribosome. FlgL: more than half are flagellar proteins or cell-wall maintenance proteins. His5: more than half are involved in amino acid metabolism.

18 Phylogenetic profile - example RL7 (ribosome L7); RL15 (ribosome L15); RL17 (ribosome L17); PTH (peptidyl-tRNA hydrolase); RNC (ribonuclease III); PgsA (phospholipid synthesis); YGGH (hypothetical); YBEX (hypothetical); RL34 (ribosome L34); RL36 (ribosome L36); RL27 (ribosome L27); RL25 (ribosome L25); YQCB (hypothetical); YABO (hypothetical); YCEC (hypothetical); RFH (peptide release factor); ClpB (heat shock protein); YJFH (hypothetical); RS14 (ribosome S14); G3P3 (dehydrogenase); RL4 (ribosome L4); NONE (hypothetical); GrpE (co-chaperone); GidB (glucose-inhibited division); RL24 (ribosome L24); DEF (polypeptide deformylase); RL20 (ribosome L20); MesJ (cell cycle protein); RL19 (ribosome L19); RL21 (ribosome L21); RL9 (ribosome L9); SmpB (small protein B)

19 Phylogenetic profile Phylogenetic profiles link proteins with similar keywords.

Keyword                        No. proteins   No. neighbors in keyword group   No. neighbors in random group
Ribosome                       …              …                                …
Transcription                  …              …                                …
tRNA synthase and ligase       26             11                               5
Membrane proteins              25             89                               5
Flagellar                      21             89                               3
Iron, ferric, and ferritin     19             31                               2
Galactose metabolism           18             31                               2
Molybdenum and molybdopterin   12             6                                1
Hypothetical                   …              …                                …

20 Fusion method or the Rosetta stone analysis Some pairs of interacting proteins have homologs in another organism, fused into a single protein chain. When two separate proteins in one organism, A and B, are expressed as a fused protein in some other species, there is a high probability that A and B are linked in function.
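
The fusion idea can be sketched as follows, assuming the sequence-similarity search has already been run and its hits are available as alignment regions on each candidate fused ("Rosetta Stone") protein. The data layout, function name, protein names and coordinates are all hypothetical.

```python
from itertools import combinations

def rosetta_stone_links(hits):
    """Link two query proteins when both align to non-overlapping regions
    of the same protein in another organism.

    hits maps a candidate fused protein to a list of
    (query_protein, start, end) alignment regions on that protein.
    """
    links = {}
    for composite, regions in hits.items():
        for (a, a_start, a_end), (b, b_start, b_end) in combinations(regions, 2):
            non_overlapping = a_end < b_start or b_end < a_start
            if a != b and non_overlapping:
                links.setdefault(tuple(sorted((a, b))), set()).add(composite)
    return links

# Made-up hits: ecoli_A and ecoli_B cover different parts of other_AB,
# whereas ecoli_C and ecoli_D align to the same region of other_C.
hits = {
    "other_AB": [("ecoli_A", 1, 200), ("ecoli_B", 230, 480)],
    "other_C": [("ecoli_C", 1, 300), ("ecoli_D", 50, 290)],
}
fusion_links = rosetta_stone_links(hits)
print(fusion_links)
```

Requiring non-overlapping regions is what separates a fusion of two different proteins from two homologs that simply match the same domain.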

21 Fusion method

22 The Rosetta Stone model

23 Fusion method – what is it good for? Predicts protein pairs that have related biological functions. Predicts potential protein-protein interactions. Can turn up complexes of proteins, or protein pathways.

24 Fusion method – what is it good for?

25 Fusion method The group searched the 4290 protein sequences of the E. coli genome. These proteins could form at most (4290)(4289)/2 pairwise interactions, but far fewer are expected. 6809 candidate pairs of interacting proteins were found.
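
To put the numbers in proportion, the upper bound on the number of pairs and the share covered by the candidates can be computed directly. Only the two figures from the slide are used.

```python
from math import comb

n_proteins = 4290
candidates = 6809

max_pairs = comb(n_proteins, 2)    # (4290)(4289)/2 unordered pairs
fraction = candidates / max_pairs  # share of all possible pairs

print(max_pairs)  # 9199905
print(f"{fraction:.2%}")
```

The candidates therefore cover well under a tenth of a percent of the roughly 9.2 million possible pairs.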

26 Fusion method – validation Look for similar functions in existing annotations, which would imply at least a functional interaction. Of the E. coli pairs found in the Rosetta Stone analysis, 68% share at least one keyword in their annotations, whereas only 15% of randomly selected E. coli protein pairs share a keyword.
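
The keyword check amounts to measuring, for a set of protein pairs, the fraction whose annotations have at least one keyword in common, and comparing predicted pairs with random ones. A minimal sketch follows. The protein names, keywords and pair lists are invented and do not reproduce the 68% and 15% figures.

```python
def fraction_sharing_keyword(pairs, keywords):
    """Fraction of pairs whose annotation keyword sets intersect."""
    if not pairs:
        return 0.0
    shared = sum(
        1 for a, b in pairs if keywords.get(a, set()) & keywords.get(b, set())
    )
    return shared / len(pairs)

# Hypothetical annotation keywords per protein.
keywords = {
    "p1": {"ribosome"},
    "p2": {"ribosome", "translation"},
    "p3": {"flagellar"},
    "p4": {"flagellar", "motility"},
    "p5": {"iron"},
    "p6": {"membrane"},
}
predicted_pairs = [("p1", "p2"), ("p3", "p4"), ("p5", "p6")]
random_pairs = [("p1", "p3"), ("p2", "p5"), ("p3", "p4"), ("p4", "p6")]

predicted_fraction = fraction_sharing_keyword(predicted_pairs, keywords)
random_fraction = fraction_sharing_keyword(random_pairs, keywords)
print(predicted_fraction, random_fraction)
```

A predicted set that shares keywords far more often than the random set is evidence that the method picks up functionally related pairs.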

27 Fusion method – validation In a database of protein pairs that have been found experimentally to interact, 6.4% are linked by Rosetta Stone sequences. The phylogenetic profile method was also applied to the interactions predicted by the fusion method: more than 8 times as many of them were supported by phylogenetic profiles as in randomly chosen sets of interactions.

28 Fusion method – missing pairs False negatives: the interacting proteins were never fused, or the fused protein disappeared during the course of evolution.

29 Fusion method – False alarms False positives: a physical interaction is falsely predicted when the proteins are fused but are only co-regulated and do not interact. The method also cannot distinguish between homologs that bind and those that do not.

30 Fusion method – False alarms The false positive rate in E. coli due to the inability to distinguish homologs is about 82%. To reduce these errors, the “promiscuous” domains were identified and removed during the analysis. By filtering out only 5% of all domains, the majority of falsely predicted interactions can be removed.
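
One possible way to implement such a filter is to rank domains by how many distinct partners they are linked to and drop the top 5%. This is an assumption about the ranking criterion, since the slides do not give one. The domain names and links below are made up.

```python
from collections import defaultdict

def remove_promiscuous(links, fraction=0.05):
    """Drop links that involve the most widely linked domains.

    The top `fraction` of domains, ranked by number of distinct partners,
    is treated as promiscuous. At least one domain is always removed.
    """
    partners = defaultdict(set)
    for a, b in links:
        partners[a].add(b)
        partners[b].add(a)
    # Sort by descending partner count, then by name for reproducibility.
    ranked = sorted(partners, key=lambda d: (-len(partners[d]), d))
    n_remove = max(1, round(fraction * len(ranked)))
    promiscuous = set(ranked[:n_remove])
    kept = [(a, b) for a, b in links if a not in promiscuous and b not in promiscuous]
    return kept, promiscuous

# Made-up example: D0 is linked to every other domain.
hub_links = [("D0", f"D{i}") for i in range(1, 20)]
domain_links = hub_links + [("D1", "D2"), ("D3", "D4")]
kept, promiscuous = remove_promiscuous(domain_links)
print(promiscuous, kept)
```

In this toy case removing a single hub domain out of 20 eliminates 19 of the 21 links, which mirrors the point that a small set of promiscuous domains can account for most of the spurious predictions.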

31 Fusion method – False alarms

32 Neighbour method Functional links between genes can be identified by examining whether the proximity of the genes is conserved across multiple genomes. The method is powerful for uncovering functional linkages in prokaryotes, where operons are common.

33 Neighbour method

34 Neighbour method - definitions Close: two genes are proximate if they lie on the same strand within 300 bp of each other and are transcribed in the same direction. Direct link: two proximate genes that are also proximate in at least two other genomes of different phylogenetic groups. Inferred link: two genes that are not close but whose orthologs are close in at least three other genomes of different phylogenetic groups.
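
These definitions translate fairly directly into code. The sketch below makes simplifying assumptions: orthologs are already known, each phylogenetic group is represented by one genome, "same strand" is taken to imply the same transcription direction, and the `Gene` class, function names and coordinates are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"

def is_close(g1, g2, max_gap=300):
    """Same strand and at most max_gap bp between the two genes."""
    if g1.strand != g2.strand:
        return False
    first, second = sorted((g1, g2), key=lambda g: g.start)
    return second.start - first.end <= max_gap

def classify_link(pair_here, ortholog_pairs):
    """Return "direct", "inferred" or None for a gene pair.

    ortholog_pairs maps a phylogenetic group to the orthologous gene pair
    in one other genome of that group.
    """
    groups_close = sum(is_close(a, b) for a, b in ortholog_pairs.values())
    close_here = is_close(*pair_here)
    if close_here and groups_close >= 2:
        return "direct"
    if not close_here and groups_close >= 3:
        return "inferred"
    return None

# Made-up coordinates for orthologs in three other phylogenetic groups.
orthologs = {
    "groupX": (Gene("a", 5000, 5800, "-"), Gene("b", 5900, 6700, "-")),
    "groupY": (Gene("a", 10, 800, "+"), Gene("b", 1000, 1900, "+")),
    "groupZ": (Gene("a", 300, 1100, "+"), Gene("b", 1150, 2000, "+")),
}
near_pair = (Gene("a", 100, 900, "+"), Gene("b", 950, 1800, "+"))
far_pair = (Gene("a", 100, 900, "+"), Gene("b", 5000, 6000, "+"))

print(classify_link(near_pair, orthologs))  # direct
print(classify_link(far_pair, orthologs))   # inferred
```

Raising the required number of phylogenetic groups makes the predictions more reliable at the cost of finding fewer links.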

35 Neighbour method - definitions

36 Neighbour method Proximity between genes is maintained mostly because it facilitates their co-transfer to another organism. Example: restriction-modification systems.

37 Neighbour method - validation Identify the links between genes that are annotated in KEGG or COG, and calculate the fraction of those links whose genes are in the same functional pathway or category. The functional correspondence correlates with the minimal number of phylogenetic groups in which the proximity is detected.

38 Neighbour method - validation N tradeoff

39 Neighbour method - example

40 Happy end??? The group analyzed the 6,217 proteins of the yeast Saccharomyces cerevisiae by combining several methods. One can expect each protein to be functionally linked to perhaps 5 – 50 other proteins, giving 30,000 – 300,000 biologically meaningful links.

41 Happy end???

42 Networks When methods of detecting functional linkages are applied to all the proteins of an organism, a network of interacting, functionally linked proteins can be traced. As methods for detecting protein linkages improve, it seems likely that most of the proteins will be included in the network.

43 Networks

44 Happy Purim