Peer Bork EMBL & MDC Heidelberg & Berlin Proteome analysis in silico.


1 bork@embl.de http://www.bork.embl-heidelberg.de/ Peer Bork EMBL & MDC Heidelberg & Berlin Proteome analysis in silico

2 ‘omes: use and misuse. Original intention, exemplified by the genome: ‘ome – entirety of biomolecular objects (ALL genes etc.); ‘omics – research on an entirety of biomolecular objects; Proteomics – research on the entirety of proteins (so far in an organism), coined at the beginning of the 1990s. Common practice: ‘omics – used to describe large-scale approaches (whereby large is sometimes 1); Proteomics – used for research on many proteins (whereby many might mean 3).

3 Protein profiling and interaction proteomics. Originally two main directions. Protein profiling: establishment of protein inventories under controlled conditions (organelles, tissues, organisms). Interaction proteomics: identification of temporally and spatially defined functional modules formed by proteins. Bioinformatics analysis is essential in both areas.

4 Proteome analysis in silico (Part I, Part II): Protein detection and annotation by homology and orthology (function in 1D); Protein interactions and protein networks (function in 2D); Temporal and spatial considerations (function in 3D+4D)

5 Alternative splicing; Genome annotation; Domain analysis; Protein networks; Literature mining coupled to genomic data (Bork et al. J Mol Biol 1998)

6 70% prediction accuracy is great!

7 Concepts in function prediction. Homology-based (intrinsic molecular features): - Sequence and domain DBs (Blast, Pfam, Smart) - Function transfer by orthology - Feature analysis. Gene context (functional associations): - Gene neighbourhood, fusion, co-occurrence - Shared regulatory elements. Other (residue level, functional class): - Correlated mutations - Interaction threading
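As an illustration of the gene-context idea of co-occurrence named on this slide, the following is a minimal sketch, not the method used in the talk. It assumes each gene is described by the set of genomes in which a homologue was found, and scores gene pairs by Jaccard similarity of those sets. The function names, the toy data and the 0.75 threshold are all hypothetical.

```python
from itertools import combinations


def jaccard(profile_a, profile_b):
    """Jaccard similarity between two sets of genome names."""
    union = profile_a | profile_b
    if not union:
        return 0.0
    return len(profile_a & profile_b) / len(union)


def co_occurring_pairs(profiles, threshold=0.75):
    """Return (gene, gene, score) for pairs with similar presence profiles.

    The threshold is an arbitrary illustrative value, not a published cutoff.
    """
    pairs = []
    for gene_a, gene_b in combinations(sorted(profiles), 2):
        score = jaccard(profiles[gene_a], profiles[gene_b])
        if score >= threshold:
            pairs.append((gene_a, gene_b, score))
    return pairs


# Hypothetical toy data: gene -> genomes in which a homologue was detected.
profiles = {
    "geneA": {"g1", "g2", "g3", "g5"},
    "geneB": {"g1", "g2", "g3", "g5"},
    "geneC": {"g2", "g4"},
    "geneD": {"g1", "g2", "g3"},
}

for gene_a, gene_b, score in co_occurring_pairs(profiles):
    print(gene_a, gene_b, round(score, 2))
```

Genes that are gained and lost together across genomes end up with similar profiles, which is why a high score is read as a hint of a functional association rather than as evidence of homology.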

8 I. Homology-based protein annotation: Homology detection and domain annotation; Metazoan genome annotation: the dark side…; Metazoan proteome analysis: human vs chicken; Evolution of protein function

9 Status of homology based function prediction Many homologues, an increasing number of predictable folds, but tough times for automatic function prediction

10 Molecular Functions have to be defined on a domain basis i.e. separately for each structurally independent unit within a sequence Henikoff et al. 1997 Science 278, 609

11

12 History of signaling domain discovery Systematic discovery by 1) searching ‘in between’ regions 2) starting with repeats Doerks et al. 2002 Genome Res. Ponting et al. 2001 Genome Res.
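Searching the "in between" regions can be pictured as listing the stretches of a protein that no known domain covers, which then become candidates for new domain families. The sketch below is only an assumption about how such a step might look; the function name, the 1-based inclusive coordinates and the `min_length` cutoff of 50 residues are hypothetical.

```python
def in_between_regions(seq_length, domains, min_length=50):
    """Return 1-based inclusive (start, end) stretches not covered by any domain.

    domains is a list of (start, end) tuples; overlapping domains are allowed.
    min_length is an illustrative cutoff for a stretch worth inspecting.
    """
    gaps = []
    position = 1  # first residue not yet covered by a domain
    for start, end in sorted(domains):
        if start - position >= min_length:
            gaps.append((position, start - 1))
        position = max(position, end + 1)
    if seq_length - position + 1 >= min_length:
        gaps.append((position, seq_length))
    return gaps


# Hypothetical protein of 600 residues with three annotated domains.
domains = [(20, 120), (100, 180), (400, 480)]
print(in_between_regions(600, domains))
```

The uncovered stretches would then be used as queries in sensitive similarity searches; that second step is not shown here.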

13 Domain discovery in disease genes

14 SMART (www.smart.embl-heidelberg.de; collaboration with Chris Ponting): - Blast-like input - Access to different databases - Domain annotation & architecture - Alerting

15 SMART (www.smart.embl-heidelberg.de): Digested output - signal sequence, coiled coil and TM - Pfam integrated - comparison of domain context

16 MIT: a putative transport-associated microtubule-binding domain. [Figure: domain architectures of MIT-containing proteins: Calpain7; Spastin; SKD1 protein / VPS4p ATPase (Vacuolar protein sorting factor 4A and 4B); Tobacco mosaic virus helicase domain-binding protein; Sorting nexin 15; RSK-like protein (similar to ribosomal protein S6 kinase); CG8866; Spartin (mutation, MIT, plant-related)] Unifying disorders associated to hereditary spastic paraplegia? Ciccarelli, F. D., et al. Genomics 81(03)437; Patel, H. et al. Nat Genet 31(02)347

17 I. Homology-based genome annotation: Homology detection and domain annotation; Metazoan genome annotation: the dark side… (current section); Metazoan proteome analysis: human vs chicken; Evolution of protein function

18 Number of human genes in time [Figure: estimated number of human genes (in thousands, axis 0-120) between Feb 00 and Apr 01, with estimates attributed to HGS, Incyte and co; textbooks, public opinion; Celera; HGP; others; the basis for the Feb 01 publications is marked. Overlaid: NEMAX50 index, and TecDAX index up to Jan 05.]

19 Improvement of gene cluster predictions. Mouse chr4:94-94.6 Mb p450 (CYP2J) region: 8 genes / 11 pseudogenic fragments. [Figure tracks: Known genes (cyp2j5, cyp2j6, cyp2j9, cyp2j13); ESTs; Twinscan (1 gene); GeneID (3 genes); fgenesh++ (13 genes); ENSEMBL (9 genes)] (comparison performed in 2004)

20 BLAST2GENE finds independent gene copies. BLAST of cyp2j13 protein vs. mouse chr4:94-94.6 Mb: ~150 alignments. [Figure: alignment coordinates before and after grouping by BLAST2GENE] Hundreds of, often considerable, differences to current gene prediction pipelines!

21 Annotation of pseudogenes changes gene numbers. 1. Similarity search in intergenic regions: masking of known repeats and already predicted genes; 1.5-2 million fragments; BLASTX vs nr prot. db, E-value < 0.001; fragments with significant sequence similarity; exclusion of transposon and virus derived sequence; merging of fragments of the same element; regions containing independent elements; closest known protein (first blast hit); GENEWISE; Ka/Ks functionality check. Ca. 20,000 detectable pseudogenes in each of human, mouse, rat. Torrents, Suyama, Bork Genome Res. 13(2003)2550
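Two steps of this pipeline, the E-value cutoff and the merging of fragments of the same element, can be sketched as follows. This is a simplified illustration under stated assumptions, not the published procedure: the `Hit` record, the rule that fragments are merged when they hit the same protein and lie within `max_gap` bases of each other, and the `max_gap` value itself are hypothetical. Only the E-value threshold of 0.001 comes from the slide.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    chrom: str
    start: int
    end: int
    protein: str   # closest known protein (first BLAST hit)
    evalue: float


def significant(hits, max_evalue=1e-3):
    """Keep fragments whose E-value is below the cutoff given on the slide."""
    return [h for h in hits if h.evalue < max_evalue]


def merge_fragments(hits, max_gap=5000):
    """Merge nearby fragments that hit the same protein into one region.

    max_gap is a hypothetical distance, chosen only for illustration.
    """
    regions = []
    for h in sorted(hits, key=lambda h: (h.chrom, h.protein, h.start)):
        last = regions[-1] if regions else None
        if (last is not None and last.chrom == h.chrom
                and last.protein == h.protein
                and h.start - last.end <= max_gap):
            last.end = max(last.end, h.end)
            last.evalue = min(last.evalue, h.evalue)
        else:
            regions.append(Hit(h.chrom, h.start, h.end, h.protein, h.evalue))
    return regions


# Hypothetical toy hits.
hits = [
    Hit("chr1", 1000, 1300, "P1", 1e-10),
    Hit("chr1", 2000, 2400, "P1", 1e-5),
    Hit("chr1", 50000, 50300, "P1", 1e-8),
    Hit("chr1", 1500, 1700, "P2", 0.5),
    Hit("chr2", 100, 400, "P3", 1e-4),
]
regions = merge_fragments(significant(hits))
for region in regions:
    print(region)
```

Each merged region would then be aligned to its closest known protein (GENEWISE on the slide) before any functionality check; those later steps are outside this sketch.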

22 Annotation of pseudogenes changes gene numbers. 2. Consistency check of gene predictions. Still >3000 pseudogenes among the predicted human genes mid 2004 (build 34). [Figure: predicted gene at Mm chr1:7608644-7681026 (80 kb, exons e1-e6) containing two processed pseudogenes, found by Genewise predictions using sptrembl|Q9HBM5 and SwissProt|RS2_RAT; stop codon or frameshift marked] Arrays, chips et al.: 20% off?
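The "stop codon or frameshift" signal used in such a consistency check can be illustrated with a deliberately simple test on a predicted coding sequence. This sketch is an assumption for illustration only: the slide's approach relies on Genewise alignments to known proteins, whereas the hypothetical function below only looks at the predicted sequence itself, using the standard stop codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def consistency_problems(cds):
    """List simple signs that a predicted coding sequence may be a pseudogene."""
    cds = cds.upper()
    problems = []
    if len(cds) % 3 != 0:
        problems.append("length not a multiple of 3 (possible frameshift)")
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    # A stop codon anywhere before the final codon is premature.
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            problems.append(f"premature stop codon {codon} at codon {index + 1}")
    return problems


print(consistency_problems("ATGGCCTAA"))
print(consistency_problems("ATGTAGGCCTAA"))
print(consistency_problems("ATGGCCTA"))
```

An empty list does not prove that a prediction is a functional gene; it only means this particular check found nothing.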

23 Protein diversity: what do we count? 20-40k genes; >100k transcripts; >1000k proteins?

24 Rate of detectable alternative splicing depends on EST coverage and library range [Figure: AS per mRNA (x), axis range 2.0-2.8] Brett et al. Nature Genet. 30(2002)29

25 Boue et al. Bioessays 03

26 Homology-based predictions of exons and alternative transcripts (www.smart.embl-heidelberg.de). SMART domain DB links to genomes

27 Top 10 domains* in human: 30% diff.! (Nature 409 (01)860; Science 291(01)1304)
Domain: human, fly, worm
Immunoglobulin: 765 (381), 140, 64
C2H2 zinc finger: 706 (607), 357, 151
Protein kinase: 575 (501), 319, 437
Rhod.-like GPCR: 569 (616), 97, 358
P-loop NTPase: 433, 198, 183
Rev. transcriptase: 350, 10, 50
RRM (RNA-binding): 300 (224), 157, 96
WD40 (G-protein): 277 (136), 162, 102
Ankyrin repeat: 276 (145), 105, 107
Homeobox: 267 (160), 148, 109
Total no. of genes: 26500 (26500), 13300, 18200
*Only no. of genes given, no. of domains higher; note that only around 90% is sequenced

28 Metazoan genome annotation: an ongoing process and far from complete - >2000 pseudogenes in mammalian gene sets: only now are they about to be included in prediction pipelines - Ca. 150 retro-related genes in mammalian gene sets (>1000 in 2004), but true human genes sometimes suppressed - Annotation of gene clusters needs considerable improvement - Alternative splicing still a major unknown - Considerable human factor in annotation

29 I. Homology-based genome annotation: Homology detection and domain annotation; Metazoan genome annotation: the dark side…; Metazoan proteome analysis: human vs chicken (current section); Evolution of protein function

30 Genome publications: Human: Nature Feb 2001; Mosquito: Science Oct 2002; Mouse: Nature Dec 2002; Rat: Nature Apr 2004; Chicken: Nature Dec 2004. [Figure: phylogenetic tree of human, chimp, mouse, rat, chicken, fugu, mosquito, D.mela. and C.eleg. with divergence times; labels shown: 5, 40, 75, 250MY, 310MY, 450MY, 600-1200MY?]

31 Chicken genome analysis [Figure; values shown: 15%, 45%] Zdobnov et al. Science 02; Hillier et al. Nature 04

32 Chicken genome analysis: orthology and cellular processes. 75.4% identity (median) between chicken and human 1:1 orthologs. Immune response evolves fastest.
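A median identity over 1:1 ortholog pairs can be computed along the following lines. This is a sketch under assumptions: the sequences are taken to be already pairwise aligned with `-` as the gap character, identity is counted over columns where neither sequence has a gap (other definitions use a different denominator), and the toy alignments are invented.

```python
from statistics import median


def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned 1:1 ortholog pairs.
ortholog_pairs = [
    ("MKTAY", "MKSAY"),
    ("MK-LV", "MKALV"),
    ("ACDE", "ACFG"),
]
identities = [percent_identity(a, b) for a, b in ortholog_pairs]
print(identities, median(identities))
```

Grouping the pairs by functional category before taking the median would be one way to compare cellular processes, as the slide does for immune response.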

33 Chicken genome analysis: Innovation and expansion of domain families

34 Orthology analysis reveals more subtle functional changes

35 Evolution by duplication: burst of an olfactory receptor family (chicken vs human) …221 copies in chicken …given ca. 300 ORs in chicken and 450 in human …thought to recognize MHC diversity

36 Chicken genome analysis: Evolution of function by domain accretion Scavenger receptor cysteine-rich domain acquired by a fibrinogen-domain containing protein (identified and displayed by SMART)

37 I. Homology-based genome annotation: Homology detection and domain annotation; Metazoan genome annotation: the dark side…; Metazoan proteome analysis: human vs chicken; Evolution of protein function (current section)

38 Phylogenetic Distribution of orthologs - Losses

39 Gene loss in diptera. Columns: D A P Y W H M (x and - mark each column)
Sterol Metabolism
Squalene monooxygenase (EC 1.14.99.7): --xx-xx
7-dehydrocholesterol reductase (EC 1.3.1.21): --xxxxx
Farnesyl-diphosphate farnesyltransferase (EC 2.5.1.21): --xx-xx
Lanosterol synthase (EC 5.4.99.7): --xx-xx
(row label not recoverable): --xx-xx
3-oxo-5-alpha-steroid 4-dehydrogenase 1 (EC 1.3.99.5): --x-xxx
C-5 sterol desaturase (EC 1.3.3.2), ergosterol biosynthesis: --xx-xx
Cytochrome P450 P51, sterol 14-alpha demethylase: --xx-xx
diminuto/24-dehydrocholesterol reductase ('seladin1'): --x-xxx
Biosynthesis of NAD
Kynureninase (EC 3.7.1.3): ---xxxx
3-hydroxyanthranilate 3,4-dioxygenase (EC 1.13.11.6), synthesis of excitotoxin quinolinic acid: ---xxxx
Quinolinate phosphoribosyltransferase (EC 2.4.2.19): --xx-xx
DNA-methylation and repair
DNA (cytosine-5)-methyltransferase 1: --x--xx
uracil-DNA glycosylases: --x-xxx
DNA-(apurinic or apyrimidinic site) lyase (EC 4.2.99.18): ---xx--
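Patterns such as `--xx-xx` under the column labels D A P Y W H M can be screened programmatically for lineage-specific losses. The sketch below rests on assumptions that the slide does not state explicitly: that `x` means an ortholog is present and `-` that it is absent, and that D and A are the two dipteran columns. The function names are hypothetical; the three example rows are copied from the slide.

```python
SPECIES = ["D", "A", "P", "Y", "W", "H", "M"]  # column labels from the slide


def parse_pattern(pattern):
    """Map each column label to True (x, assumed present) or False (-)."""
    if len(pattern) != len(SPECIES):
        raise ValueError("pattern length must match the number of columns")
    return {label: mark == "x" for label, mark in zip(SPECIES, pattern)}


def lost_in(pattern, lineage, outgroup):
    """True if absent from every lineage column but present in all outgroup columns."""
    presence = parse_pattern(pattern)
    absent_in_lineage = not any(presence[label] for label in lineage)
    present_in_outgroup = all(presence[label] for label in outgroup)
    return absent_in_lineage and present_in_outgroup


rows = {
    "Kynureninase": "---xxxx",
    "7-dehydrocholesterol reductase": "--xxxxx",
    "DNA-(apurinic or apyrimidinic site) lyase": "---xx--",
}
# Assumption: D and A are the dipteran columns, H and M the comparison columns.
for name, pattern in rows.items():
    print(name, lost_in(pattern, lineage=["D", "A"], outgroup=["H", "M"]))
```

Requiring presence in the comparison columns separates a loss specific to one lineage from a gene that is missing more widely, as in the last row.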

40 Functional changes at evolutionary time scales Orthologs mapped onto metazoan phylogeny

41 Summary (homology-based function prediction). Emphasis in homology-based genome annotation shifts from sensitivity (e.g. domain identification) to selectivity issues (orthology assignment for 1:1 function transfer). Metazoan genome annotation is far from complete, and caution is needed when using incomplete and partially erroneous parts lists (e.g. when predicting networks). Yet, with the incoming number of metazoan genomes, our understanding of functional diversification at the protein level will increase dramatically ... although the proteome remains far from being deciphered.

