

Melampsora Genome Annotation and Genome Structure Analysis First Annotation Workshop of the Melampsora Genome Consortium Yao-Cheng Lin Bioinformatics & Evolutionary Genomics VIB Department of Plant Systems Biology, UGent

Overview
- Gene prediction (structural annotation)
- Gene family analysis
- Phylogenetic position of Melampsora

EuGène: gene prediction platform
[Figure: EuGène workflow. Input: genomic sequence. Output: predicted genes and alternative models.]
- Intrinsic information: coding IMM and intronic IMM (content potential for coding, intronic and intergenic regions); translation start site; GT/AG splice sites (FunSiP)
- Extrinsic information: TE & repeat database (RepeatMasker); protein databases (BlastX); EST databases (BlastN, GenomeThreader); Puccinia genomic sequence (TblastX); other prediction programs

Resources for Melampsora gene prediction (EuGene 3.4)
- Gene models for training
  – Previously identified core genes in basidiomycetes
  – Manually curated genes from INRA-Nancy
- Splice site training/prediction
  – FunSiP: developed by Michiel Van Bel, who also helped with the training
- BlastX database
  – 8 basidiomycete proteomes, Fungi RefSeq, SwissProt
- TBLASTX database
  – Puccinia graminis genomic sequence
- EST libraries
  – JGI Sanger sequencing
  – 454 pyrosequencing (the first MIRA assembly)
- Repeat libraries
  – Hadi/Marie-Pierre
  – In-house script, repeats collected from the first run of gene prediction
  – Masked regions from JGI

Gene prediction – comparison of two prediction results

                                            EuGene            JGI
Number of protein-coding genes              17,167            16,694
Coding sequence < 300 aa                    6,989 (40.7%)     8,212 (49.2%)
Average gene length (bp)                    1,742.7           1,685.5
Average coding sequence length (bp)         1,369.7           1,131.4
Average exon length (bp)
Average exon number
Average intron length (bp)
SwissProt support                           6,521 (38.0%)     5,699 (34.1%)
EST support                                 6,152 (35.8%)     6,241 (37.4%)
EST support (< 300 aa)                      1,066             995

Gene prediction – similarity distribution of the two predictions compared with the SwissProt database

Gene prediction – protein length distribution

Example: metallothionein-like protein
- Metallothionein-like protein in Magnaporthe (MMT1)
  – Protein length: 22 amino acids
  – Six cysteine residues
  – Mmt1 mutants lose the ability to cause plant disease
- Difficulties in in silico identification
  – Sequence divergence
  – Short sequences are easily rejected by an E-value cut-off
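Why a 22-residue protein is easily lost at an E-value cut-off can be illustrated with the standard BLAST relation E = m · n · 2^(−S'), where m is the query length, n the database length and S' the bit score. The sketch below is only an illustration: the database size is a hypothetical value, and the effective-length corrections that BLAST applies are ignored.

```python
import math

def evalue(query_len, db_len, bit_score):
    """Expected number of chance hits: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)

def required_bit_score(query_len, db_len, evalue_cutoff):
    """Bit score S' needed for a hit to reach E <= evalue_cutoff."""
    return math.log2(query_len * db_len / evalue_cutoff)

DB_LEN = 100_000_000  # hypothetical database size in residues (assumption)

for m in (22, 300):
    s = required_bit_score(m, DB_LEN, 1e-5)
    # Bits needed per query residue if the whole query aligns.
    print(f"query {m:>3} aa: needs {s:.1f} bits, {s / m:.2f} bits per residue")
```

Under these assumptions the short query must score more than two bits at every position, which only a near-identical match can deliver, whereas a 300-residue protein can pass the same cut-off with a much weaker per-residue signal.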

Overview
- Gene prediction and annotation platform
- Gene family analysis
- Phylogenetic position of Melampsora

Gene family expansion and contraction
- Gene family clustering
  – Similarity search across 12 fungal genomes (10 basidiomycetes, 2 ascomycetes): all-against-all BLASTP, E-value cut-off 1e-5
  – Gene families constructed by TribeMCL with inflation factor 4.0
- Species/lineage-specific gene family expansions
  – The mean gene family size and standard deviation were calculated for all gene families (excluding species-specific families, SSFs, and orphans)
  – To center and normalize the data, the profile matrix was transformed into a matrix of z-scores
- Functional assignment
  – Domain based: RPS-BLAST
  – HMM profile for each family, searched against the SwissProt and NR databases
  – GO terms
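The step from clustered families to a profile matrix could look roughly like the sketch below. It is not the consortium's script: the `species|gene` identifier convention, the function name `family_profile` and the toy families are all assumptions made for illustration. Orphans are taken to be single-gene clusters, and species-specific families to be clusters whose members all come from one genome.

```python
from collections import Counter

def family_profile(families, genomes):
    """Return {family_id: [gene count per genome]} for shared families.

    families: {family_id: [gene ids]} with hypothetical ids "species|gene".
    Orphans (one gene) and species-specific families are skipped.
    """
    profile = {}
    for fam_id, members in families.items():
        counts = Counter(gene.split("|", 1)[0] for gene in members)
        if len(members) < 2:   # orphan: no other family member
            continue
        if len(counts) < 2:    # species-specific family
            continue
        profile[fam_id] = [counts.get(g, 0) for g in genomes]
    return profile

# Toy clusters standing in for TribeMCL output (illustrative only).
families = {
    "F1": ["Mlp|g1", "Mlp|g2", "Pgt|g9"],
    "F2": ["Mlp|g3", "Mlp|g4"],            # species-specific
    "F3": ["Pgt|g7"],                      # orphan
    "F4": ["Mlp|g5", "Pgt|g8", "Uma|g1"],
}
genomes = ["Mlp", "Pgt", "Uma"]
profile = family_profile(families, genomes)
print(profile)
```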

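Protein phylogeny profile / z-score
[Figure: a protein phylogeny profile (gene counts per family for genomes A, B and C, with mean and SD per family) is converted into a z-score profile; species-specific and core gene families are highlighted. Axes: genome, family.]
$$ Z = \frac{\text{gene number} - \text{mean gene number}}{\text{standard deviation}} $$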
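The z-score transformation can be sketched for a single family as follows. This is a minimal illustration, assuming that the mean and standard deviation are taken per family across genomes and that the population standard deviation is used; the slides do not state either detail.

```python
from statistics import mean, pstdev

def zscore_profile(row):
    """Convert one family's gene counts per genome into z-scores."""
    mu = mean(row)
    sd = pstdev(row)  # population SD (assumption; sample SD is also plausible)
    if sd == 0:
        # Family has the same size in every genome: no deviation to report.
        return [0.0] * len(row)
    return [(x - mu) / sd for x in row]

# Toy family with counts in genomes A, B and C.
z = zscore_profile([1, 1, 4])
print([round(v, 3) for v in z])
```

A large positive z-score in one genome flags a family that is expanded there relative to the other genomes, and a large negative one flags a contraction.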

Fungal genome characteristics

Genome                         Genome size (Mb)   Genes     Genes < 300 aa     GC content (%)
Magnaporthe grisea             …                  …,832     5,312 (41.4%)      51.6
Neurospora crassa              …                  …,822     3,445 (35.1%)      49.3
Sporobolomyces roseus          …                  …         …,714 (31.0%)      49.5
Puccinia graminis              …                  …,566     11,319 (55.0%)     43.0
Melampsora larici-populina     …                  …,694     8,212 (49.2%)      42.1
Ustilago maydis                19.7               6,522     1,668 (25.6%)      54.0
Malassezia globosa             8.9                4,286     1,468 (34.3%)      52.0
Postia placenta                …                  …,415     4,629 (37.3%)      52.4
Phanerochaete chrysosporium    …                  …,048     3,579 (35.6%)      53.2
Laccaria bicolor               …                  …,036     10,013 (52.6%)     46.6
Coprinus cinereus              …                  …,544     5,487 (40.5%)      51.6
Cryptococcus neoformans        19.5               7,170     2,372 (33.1%)      …

Molecular divergence of Melampsora with other species

Pairwise comparison                   Mean similarity (%)   Pairs compared
Melampsora / Puccinia                 67.0                  5,101
Melampsora / Sporobolomyces           64.0                  3,498
Melampsora / Schizosaccharomyces      57.3                  2,944
Melampsora / Arabidopsis              53.6                  2,686
Laccaria / Coprinus                   70.9                  6,300

Orphans / species-specific gene families

Difference in average gene family size
[Figure: difference in average gene family size for Neurospora crassa, Magnaporthe grisea, Cryptococcus neoformans, Coprinus cinereus, Laccaria bicolor, Phanerochaete chrysosporium, Postia placenta, Malassezia globosa, Ustilago maydis, Sporobolomyces roseus, Puccinia graminis f. sp. tritici and Melampsora larici-populina.]
*8,035 families in total, excluding species-specific families.

Hierarchical clustering of gene families
[Figure: clustered profiles for N. crassa, M. grisea, S. roseus, P. graminis, M. larici-populina, U. maydis, M. globosa, P. placenta, P. chrysosporium, C. cinereus, L. bicolor and C. neoformans.]
- The 100 most variable profiles were selected based on their standard deviations.
- Red: protein kinase, esterase lipase, Cre recombinase, DNA/RNA helicase, leucine-rich repeat
- Blue: major facilitator superfamily
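Selecting the most variable profiles before clustering might be done along the lines below. This is a sketch under assumptions: variability is measured as the population standard deviation of each family's gene counts across genomes, and the function name `most_variable` and the toy profiles are hypothetical.

```python
from statistics import pstdev

def most_variable(profiles, n=100):
    """Return the ids of the n families with the largest SD across genomes."""
    ranked = sorted(profiles.items(),
                    key=lambda item: pstdev(item[1]),
                    reverse=True)
    return [fam_id for fam_id, _ in ranked[:n]]

# Toy profiles: gene counts per genome for three families.
profiles = {
    "F1": [1, 1, 1],    # identical everywhere, SD = 0
    "F2": [0, 5, 10],   # strongly variable
    "F3": [2, 3, 2],    # mildly variable
}
top = most_variable(profiles, n=2)
print(top)
```

The selected rows would then be passed to whichever hierarchical clustering tool is used for the heat map.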

Overview
- Gene prediction and annotation platform
- Gene family analysis
- Phylogenetic position of Melampsora

Phylogenies of Melampsora
- Construct the Melampsora phylogenetic tree based on FUNYBASE with selected fungal genomes.
- FUNYBASE: single-copy gene families (246 genes) within 21 fungal species (mostly ascomycetes).
- 22 selected species:
  – Ascomycetes: Aspergillus nidulans, Coccidioides immitis, Fusarium graminearum, Mycosphaerella graminicola, Magnaporthe grisea, Neurospora crassa, Nectria haematococca, Pyrenophora tritici-repentis, Stagonospora nodorum, Schizosaccharomyces pombe, Sclerotinia sclerotiorum
  – Basidiomycetes: Coprinus cinereus, Cryptococcus neoformans, Laccaria bicolor, Malassezia globosa, Melampsora larici-populina, Phanerochaete chrysosporium, Puccinia graminis, Postia placenta, Sporobolomyces roseus, Ustilago maydis
  – Zygomycete: Rhizopus oryzae
*new genome; rejected in FUNYBASE

Phylogenies of Melampsora - Method
- 246 HMM models for the conserved protein sequence blocks in FUNYBASE.
- For each genome, run a HMMER search against the whole proteome and retain the protein sequence of the best hit for each model.
- 148 models have a single-copy gene in each of our 22 selected species.
- Concatenate the 148 single-copy orthologs for tree building.
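The selection and concatenation steps can be sketched as below. Everything here is illustrative: the data layout, the function names and the toy values are assumptions, "single-copy" is read as exactly one significant hit per species, and the per-model alignments are assumed to exist already (the slides do not say how they were made). Parsing of real HMMER output is left out.

```python
def best_hit(candidates):
    """Return the (protein_id, score) pair with the highest score."""
    return max(candidates, key=lambda hit: hit[1])

def single_copy_models(hits, species):
    """Models with exactly one hit in every species.

    hits: {species: {model: [(protein_id, score), ...]}} (hypothetical layout)
    """
    shared = set.intersection(*(set(hits[sp]) for sp in species))
    return sorted(m for m in shared
                  if all(len(hits[sp][m]) == 1 for sp in species))

def concatenate(aligned, models, species):
    """Join per-model aligned sequences into one supermatrix row per species."""
    return {sp: "".join(aligned[m][sp] for m in models) for sp in species}

# Toy data for two species and three models.
hits = {
    "Mlp": {"m1": [("p1", 250.0)], "m2": [("p2", 90.0), ("p3", 80.0)], "m3": [("p4", 120.0)]},
    "Pgt": {"m1": [("q1", 240.0)], "m2": [("q2", 95.0)], "m3": [("q3", 110.0)]},
}
species = ["Mlp", "Pgt"]
models = single_copy_models(hits, species)
aligned = {"m1": {"Mlp": "MKT-", "Pgt": "MKTA"}, "m3": {"Mlp": "GG", "Pgt": "GA"}}
supermatrix = concatenate(aligned, models, species)
print(models, supermatrix)
```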

Melampsora in the phylogenetic tree of fungi
[Figure: tree built with phylo_win using the neighbor-joining method with Poisson correction and 500 bootstrap replicates.]
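The distance and resampling steps behind such a tree can be sketched as follows. This is not phylo_win's code: it only shows the Poisson-corrected distance d = −ln(1 − p), with p the proportion of differing sites, and one bootstrap resampling of alignment columns. Skipping gapped positions pairwise and the function names are assumptions.

```python
import math
import random

def p_distance(a, b):
    """Proportion of differing sites, ignoring positions with a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(x != y for x, y in pairs) / len(pairs)

def poisson_distance(a, b):
    """Poisson-corrected distance d = -ln(1 - p)."""
    p = p_distance(a, b)
    if p >= 1.0:
        return math.inf  # saturated: the correction is undefined
    return -math.log(1.0 - p)

def bootstrap_columns(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(next(iter(alignment.values())))
    cols = rng.choices(range(length), k=length)
    return {name: "".join(seq[i] for i in cols)
            for name, seq in alignment.items()}

alignment = {"a": "ACDE", "b": "ACDF"}  # toy aligned sequences
d = poisson_distance(alignment["a"], alignment["b"])
replicate = bootstrap_columns(alignment, random.Random(0))
print(round(d, 4), replicate)
```

Repeating the resampling 500 times, computing a distance matrix and a neighbor-joining tree for each replicate, gives the bootstrap support values shown on the tree.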

Acknowledgements
- Gent: Stephane Rombauts, Michiel Van Bel, Klaas Vandepoele, Kenny Billiau, Thomas Abeel, Pierre Rouzé, Lieven Sterck, Yves Van de Peer
- Nancy: Stéphane Hacquard, Emilie Tisserant, Marie-Pierre Oudot-Le Secq, Sébastien Duplessis, Francis Martin