1 The Genome Gamble, Knowledge or Carnage? Comparative Genomics Leading the Organon Tim Hulsen, Oss, November 11, 2003.

Presentation transcript:

1 The Genome Gamble, Knowledge or Carnage? Comparative Genomics Leading the Organon Tim Hulsen, Oss, November 11, 2003

2 Summary (1) An introduction to orthology and paralogy (2) Orthology determination within eukaryotes (3) Testing the advantages of our ortholog set (4) Using evolutionary conservation of co-expression for function prediction (5) Evolutionary conservation of chromosomal distance and orientation

3 (1) An introduction to orthology and paralogy Homologous genes: genes that have a common ancestor. Orthologous genes: genes that evolved from a common ancestor through a speciation event (equivalents in different species). Paralogous genes: genes that evolved from a common ancestor through a duplication event.

4 Orthology and paralogy explained graphically (from

5 The importance of orthology and paralogy Orthology relationships especially important for function prediction: orthologous genes generally have the same function but in different species Paralogy relationships can be used for function prediction too: paralogous genes are often involved in the same process, but have different molecular functions (e.g. globins)

6 (2) Orthology determination within eukaryotes Not much eukaryotic orthology available at this moment: euKaryotic Orthologous Groups (KOG, NCBI), Inparanoid, OrthoMCL. Existing databases are either too inclusive or too restrictive. Most methods rely on the best bidirectional hit (E-value), while orthology is an evolutionary principle and should be determined using phylogenetic trees!
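
The best-bidirectional-hit criterion mentioned on this slide can be sketched as follows. This is only a minimal illustration of the general idea, not the procedure used by KOG, Inparanoid or OrthoMCL; the input layout (nested dictionaries of E-values), the gene names and the function names are all assumptions made for the example.

```python
def best_hits(scores):
    """Map each query gene to its best hit (lowest E-value) in the other species."""
    return {query: min(hits, key=hits.get) for query, hits in scores.items() if hits}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical E-values: species A genes searched against species B, and the reverse.
a_vs_b = {"a1": {"b1": 1e-50, "b2": 1e-10}, "a2": {"b1": 1e-20, "b2": 1e-5}}
b_vs_a = {"b1": {"a1": 1e-48, "a2": 1e-18}, "b2": {"a1": 1e-9, "a2": 1e-6}}
pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
```

In this toy input only a1 and b1 are each other's best hit; a2 and b2 get no partner, although a tree-based analysis might still relate them. That is the kind of case the slide's argument for phylogenetic trees is about.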

7 Our orthology determination within eukaryotes [Flowchart; recoverable labels in source order: Hs; At, Ce, Dm, Ec, Gt, Hs, Mm, Sc, Sp; Z>20, RH>0.5*QL; 24,263 groups; PHYLOME; SELECTION OF HOMOLOGS; ALIGNMENTS AND TREE; GENOME; GENOMES; TREE SCANNING; LIST; Hs-Mm: 85,848 pairs, Hs-Dm: 55,934 pairs, etc.]

8 Our orthology determination: using phylogenetic trees Example: BMP6 (Bone Morphogenetic Protein 6) --> 5 orthologous relations are defined, all Hs-Mm

9 The ortholog database: Eukaryortho (only accessible from Organon, CMBI and SARA)

10 (3) Testing the advantages of our ortholog set Quality of orthology difficult to test Orthologs should have more or less the same function --> use conservation of function as an orthology benchmark Gene Ontology (GO) database: hierarchical system of function and location descriptions Orthologs are in same functional category when they are in the same 4th level GO Molecular Function class

11 GO molecular function benchmark Molecular function: one of the three ‘subroots’ (together with biological process and cellular location) ‘True’ orthologs should share a 4th level molecular function (here: GO ) Our Hs-Mm ortholog set: 67 % KOG Hs-Mm ortholog set: 51 %
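
The benchmark score on this slide (the percentage of ortholog pairs that share a 4th-level GO class) could be computed along the following lines. This is a sketch under stated assumptions: the class labels are placeholders rather than real GO identifiers, the function name is hypothetical, and excluding pairs that lack an annotation on either side is an assumption the slides do not confirm.

```python
def fraction_sharing_class(pairs, classes):
    """Fraction of annotated gene pairs whose members share at least one class.

    pairs   -- iterable of (gene_a, gene_b) tuples
    classes -- dict mapping a gene to a set of class labels
    Pairs with a missing or empty annotation are skipped (an assumption).
    """
    annotated = [(a, b) for a, b in pairs if classes.get(a) and classes.get(b)]
    if not annotated:
        return 0.0
    shared = sum(1 for a, b in annotated if classes[a] & classes[b])
    return shared / len(annotated)


# Hypothetical human-mouse pairs with placeholder 4th-level class labels.
pairs = [("h1", "m1"), ("h2", "m2"), ("h3", "m3"), ("h4", "m4")]
classes = {
    "h1": {"MF:A"}, "m1": {"MF:A", "MF:B"},
    "h2": {"MF:C"}, "m2": {"MF:D"},
    "h3": {"MF:B"}, "m3": {"MF:B"},
    "h4": {"MF:A"},  # m4 has no annotation, so this pair is skipped
}
score = fraction_sharing_class(pairs, classes)
```

Here two of the three annotated pairs share a class, so the score is 2/3. The same function would apply unchanged to a biological-process benchmark if given process-level classes.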

12 Co-expression benchmark Second method: comparing expression profiles of each orthologous gene pair Using GeneLogic Expressor data set: –Human chips: 3269 samples, fragments, 115 tissue categories, 15 SNOMED tissue categories –Mouse chips: 859 samples, fragments, 25 tissue categories, 12 SNOMED tissue categories

13 SNOMED tissue categories used for co-expression calculation
HUMAN                       MOUSE
1 Blood vessel              1 Blood vessel
2 Cardiovascular system     2 Cardiovascular system
3 Digestive organs          3 Digestive organs
4 Digestive system          4 Digestive system
5 Endocrine gland           -
6 Female genital system     5 Female genital system
7 Hematopoietic system      6 Hematopoietic system
8 Integumentary system      7 Integumentary system
9 Male genital system       8 Male genital system
10 Musculoskeletal system   9 Musculoskeletal system
11 Nervous system           10 Nervous system
12 Product of conception    -
13 Respiratory system       11 Respiratory system
14 Topographic region       -
15 Urinary tract            12 Urinary tract

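14 Calculating the correlation $$ r = \frac{N \sum xy - (\sum x)(\sum y)}{\sqrt{\left(N \sum x^2 - (\sum x)^2\right)\left(N \sum y^2 - (\sum y)^2\right)}} $$ [Example table, values not recoverable: expression per tissue category for Human gene 1 (_s_at), Mouse gene 1 (_at), Human gene 2 (_s_at), Mouse gene 2 (97166_at); --> High correlation, --> Low correlation]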
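
The correlation formula on this slide is the standard Pearson coefficient written with raw sums, and it can be transcribed directly. The sketch below assumes that the two expression profiles have already been reduced to the same ordered list of tissue categories; the function name and the sample values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equally long profiles, using the raw-sum form."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("profiles must be non-empty and of equal length")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return numerator / denominator


# Hypothetical expression values over four shared tissue categories.
human_profile = [1.0, 2.0, 3.0, 4.0]
mouse_profile = [2.0, 4.0, 6.0, 8.0]
r_same = pearson_r(human_profile, mouse_profile)
r_opposite = pearson_r(human_profile, mouse_profile[::-1])
```

A profile that rises in step with the other gives r = 1, and the reversed profile gives r = -1, matching the stated range of -1 <= corr. <= 1.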

15 Co-expression comparison of our ortholog set to the KOG set

16 (4) Using evolutionary conservation of co-expression for function prediction Human: Gene A, Gene B; co-expression = Cab (-1 <= corr. <= 1). Human/Mouse: Gene A’, Gene B’. Ca’b’ >= Cab --> increases probability that A and B are involved in the same process. (Co-expression calculated over 115 tissues in human, 25 in mouse.)

17 GO biological process benchmark Biological process: one of the three ‘subroots’ (together with cellular location and molecular function) Both orthologs and paralogs are often involved in the same process/pathway (=sharing a 4th level biological process, here: GO )

18 Conservation of co-expression used in function prediction

19 The importance of (conserved) co-expression for function prediction Co-expression without conservation can already be used for function prediction. Paralogous conservation gives a 2x higher accuracy. Orthologous conservation gives a 3x or 4x higher accuracy. Alternative for GO Biological Process: KEGG Pathway database --> similar results

20 (5) Evolutionary conservation of chromosomal distance and orientation Human: Gene A, Gene B; distance = Dab (# bp); orientation = Oab (one of three relative orientations); co-expression = Cab (-1 <= corr. <= 1). Human/Mouse: Gene A’, Gene B’. Da’b’ <= Dab, Oa’b’ == Oab, Ca’b’ >= Cab --> increases probability that A and B are involved in the same process. (Co-expression calculated over 115 tissues in human, 25 in mouse.)
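
The three conservation conditions on this slide (distance not larger, orientation equal, co-expression not lower) can be written as a small check. This is a sketch only: the `GenePair` type, its field names, the orientation labels and the sample numbers are hypothetical, and the slides do not say how the three criteria were combined into a prediction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenePair:
    distance_bp: int      # chromosomal distance between the two genes, in bp
    orientation: str      # hypothetical label, e.g. "tandem", "convergent", "divergent"
    coexpression: float   # correlation of the two expression profiles, -1..1


def conservation_flags(pair, conserved_pair):
    """Evaluate the slide's three conditions for pair (A, B) versus (A', B')."""
    return {
        "distance": conserved_pair.distance_bp <= pair.distance_bp,            # Da'b' <= Dab
        "orientation": conserved_pair.orientation == pair.orientation,         # Oa'b' == Oab
        "coexpression": conserved_pair.coexpression >= pair.coexpression,      # Ca'b' >= Cab
    }


# Hypothetical human pair (A, B) and its mouse counterpart (A', B').
human_ab = GenePair(distance_bp=50_000, orientation="tandem", coexpression=0.6)
mouse_ab = GenePair(distance_bp=42_000, orientation="tandem", coexpression=0.7)
flags = conservation_flags(human_ab, mouse_ab)
```

Returning the flags separately rather than a single yes/no keeps the sketch neutral about which criteria matter in a given case, since the deck treats co-expression, distance and orientation as separate lines of evidence.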

21 Function prediction using co-expression and chromosomal distance (without conservation)

22 Conservation of chromosomal distance used in function prediction

23 The importance of chromosomal distance and orientation for function prediction Chromosomal distance in eukaryotes is less important than in prokaryotes (due to the absence of operons). Only genes with distance < 1 Mbp seem to be coregulated. Conservation of relative orientation seems to be important only for very close gene pairs. Only a limited number of genes can be functionally annotated using the conservation of chromosomal distance and orientation.

24 Conclusions Orthologous and paralogous relations can be used to improve function prediction. Our orthologous pairs of Protein World proteins perform better than KOG, in terms of co-expression and involvement in the same process. Chromosomal distance and relative orientation between genes can be used for function prediction too, in a limited number of cases. Future plans: find examples where the function of a protein can be predicted using these methods.

25 Credits Martijn Huynen Peter Groenen Others at Comics Others at Organon Bioinf.