1 Melampsora Genome Annotation and Genome Structure Analysis First Annotation Workshop of the Melampsora Genome Consortium Yao-Cheng Lin Bioinformatics & Evolutionary Genomics VIB Department of Plant Systems Biology, UGent

2 Overview
Gene prediction (structure annotation)
Gene family analysis
Phylogenetic position of Melampsora

3 EuGène: gene prediction platform
[Diagram: EuGène takes the genomic sequence as input and integrates intrinsic and extrinsic information. Intrinsic information: coding and intronic IMMs (content potential for coding, intronic and intergenic regions). Signals from FunSiP: translation start site and GT/AG splice sites. Extrinsic information: TE & repeat database (RepeatMasker), Puccinia genomic sequence (TblastX), protein databases (BlastX), EST databases (BlastN, GenomeThreader), and other prediction programs. Output: predicted genes and alternative models.]

4 Resources for Melampsora gene prediction
Gene models for training
–Previously identified core genes in basidiomycetes
–Manually curated genes from INRA-Nancy
Splice site training/prediction
–FunSiP: developed by Michiel Van Bel, who also helped with the training
BlastX database
–8 basidiomycete proteomes, Fungi RefSeq, SwissProt
TBLASTX database
–Puccinia graminis genomic sequence
EST libraries
–JGI Sanger sequencing
–454 pyrosequencing (the 1st MIRA assembly)
Repeat libraries
–Hadi/Marie-Pierre
–In-house script, collected from the first run of gene prediction
–Masked area from JGI
EuGene 3.4

5 Gene prediction – comparison of two prediction results
                                      EuGene           JGI
Number of protein coding genes        17,167           16,694
Coding sequence < 300 aa              6,989 (40.7%)    8,212 (49.2%)
Average gene length (bp)              1,742.7          1,685.5
Average coding sequence length (bp)   1,369.7          1,131.4
Average exon length (bp)              261.1            235
Average exon number                   5.3              4.8
Average intron length (bp)            86.9             117.8
SwissProt support                     6,521 (38.0%)    5,699 (34.1%)
EST support                           6,152 (35.8%)    6,241 (37.4%)
EST support (< 300 aa)                1,066            995

6 Gene prediction – similarity distribution of the two predictions compared with the SwissProt database

7 Gene prediction – protein length distribution

8 Example: metallothionein-like protein
Metallothionein-like protein in Magnaporthe
Protein length: 22 amino acids (MMT1)
Six cysteine residues.
Mmt1 mutants lose the ability to cause plant disease.
Difficulties in in silico identification
–Sequence divergence.
–Short sequence, easily rejected by an E-value cut-off.
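
Why a 22-residue protein struggles against an E-value cut-off can be illustrated with the standard bit-score relation E = m · n · 2^(−S'), where m is the query length, n the database size in residues and S' the bit score. The sketch below is only an illustration: the database size (10^8 residues), the score of about 2 bits per matched residue and the 1e-5 threshold are assumed values, and real BLAST additionally applies effective-length corrections.

```python
def expected_hits(query_len: int, db_residues: float, bit_score: float) -> float:
    """E-value from a bit score: E = m * n * 2**(-S')."""
    return query_len * db_residues * 2.0 ** (-bit_score)

DB_RESIDUES = 1e8        # assumed database size in residues (hypothetical)
BITS_PER_RESIDUE = 2.0   # assumed score of a near-perfect match (hypothetical)
CUTOFF = 1e-5            # illustrative E-value threshold

for length in (22, 300):
    # Best case: every residue of the query matches.
    e = expected_hits(length, DB_RESIDUES, BITS_PER_RESIDUE * length)
    print(length, e, "kept" if e <= CUTOFF else "rejected")
```

Under these assumptions even a full-length match of the 22-residue query stays above the threshold, while a 300-residue protein passes easily; sequence divergence only lowers the achievable score further.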

9 Overview
Gene prediction and annotation platform
Gene family analysis
Phylogenetic position of Melampsora

10 Gene family expansion and contraction
Gene family clustering
–Similarity search with 12 fungal genomes (10 basidiomycetes, 2 ascomycetes): all-against-all BLASTP, E-value cutoff 1e-5.
–Gene families constructed by TribeMCL with inflation factor 4.0.
Species/lineage-specific gene family expansions
–The mean gene family size and standard deviation were calculated for all gene families (excluding SSFs and orphans).
–To center and normalize the data, the matrix of the previous profile was transformed into a matrix of z-scores.
Functional assignment
–Domain based: RPS-BLAST
–HMM profile for each family -> search the SwissProt and NR databases.
–GO terms.
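
The E-value filtering step before clustering can be sketched as follows. This is a minimal illustration, not the workshop's actual pipeline: it assumes 12-column tab-separated BLAST output with the E-value in column 11, uses hypothetical protein identifiers, and drops self-hits as a simplifying choice. The resulting (query, subject, E-value) triples are the kind of pairwise input a graph-clustering step would consume.

```python
from typing import Iterable, Iterator, Tuple

EVALUE_CUTOFF = 1e-5  # cut-off quoted on the slide

def blast_to_edges(
    lines: Iterable[str], cutoff: float = EVALUE_CUTOFF
) -> Iterator[Tuple[str, str, float]]:
    """Yield (query, subject, evalue) for hits passing the cut-off.

    Assumes 12-column tabular BLAST output (E-value in column 11).
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        # Self-hits carry no clustering information (illustrative choice).
        if query != subject and evalue <= cutoff:
            yield query, subject, evalue

# Toy input with hypothetical protein identifiers.
toy = [
    "MlpA\tPgtB\t62.0\t300\t114\t0\t1\t300\t1\t300\t1e-80\t290",
    "MlpA\tLbC\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t25",
    "MlpA\tMlpA\t100.0\t300\t0\t0\t1\t300\t1\t300\t0.0\t600",
]
edges = list(blast_to_edges(toy))
print(edges)
```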

11 Protein phylogeny profile / z-score
Protein phylogeny profile (gene number per family in each genome)
Family   A     B     C     Mean   SD
1        5     10    15    10     5
2        4     6     5     5      1
3        20    5     10    11.7   7.6
Z-score profile
Family   A     B     C
1        -1    0     1
2        -1    1     0
3        1.1   -0.9  -0.2
Z = (gene number – mean gene number) / standard deviation
[Figure labels: species-specific gene family, core-gene family; rows are families, columns are genomes]
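
The z-score transformation can be sketched in a few lines. The example uses the three families from the slide; the counts for family 3 are taken as 20, 5 and 10, which is an assumption chosen to be consistent with the mean (11.7) and SD (7.6) shown. The sample standard deviation (n − 1) is used because it matches the slide's SD values.

```python
from statistics import mean, stdev

# Gene counts per family in genomes A, B, C.
profile = {
    1: [5, 10, 15],
    2: [4, 6, 5],
    3: [20, 5, 10],  # assumed counts, consistent with mean 11.7 and SD 7.6
}

def z_scores(counts):
    """Return (count - family mean) / family SD for each genome."""
    mu = mean(counts)
    sd = stdev(counts)  # sample standard deviation (n - 1)
    if sd == 0:
        return [0.0 for _ in counts]  # identical counts in every genome
    return [(c - mu) / sd for c in counts]

z_profile = {fam: [round(z, 1) for z in z_scores(c)] for fam, c in profile.items()}
print(z_profile)
```

A large positive z-score in one genome flags a candidate lineage-specific expansion of that family, and a large negative one a contraction.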

12 Fungi genomes characteristics
Genome                        Genome size (Mb)   Genes    < 300 a.a genes   GC content (%)
Magnaporthe grisea            41.7               12,832   5,312 (41.4%)     51.6
Neurospora crassa             39.23              9,822    3,445 (35.1%)     49.3
Sporobolomyces roseus         21.1               5,536    1,714 (31.0%)     49.5
Puccinia graminis             88.64              20,566   11,319 (55.0%)    43.0
Melampsora larici-populina    101.1              16,694   8,212 (49.2%)     42.1
Ustilago maydis               19.7               6,522    1,668 (25.6%)     54.0
Malassezia globosa            8.9                4,286    1,468 (34.3%)     52.0
Postia placenta               90.9               12,415   4,629 (37.3%)     52.4
Phanerochaete chrysosporium   35.1               10,048   3,579 (35.6%)     53.2
Laccaria bicolor              64.9               19,036   10,013 (52.6%)    46.6
Coprinus cinereus             37.5               13,544   5,487 (40.5%)     51.6
Cryptococcus neoformans       19.5               7,170    2,372 (33.1%)     48.2

13 Molecular divergence of Melampsora with other species
Pairwise comparison                  Mean similarity (%)   Pairs of comparison
Melampsora / Puccinia                67.0                  5,101
Melampsora / Sporobolomyces          64.0                  3,498
Melampsora / Schizosaccharomyces     57.3                  2,944
Melampsora / Arabidopsis             53.6                  2,686
Laccaria / Coprinus                  70.9                  6,300

14 Orphans / Species specific gene families

15 Difference in average gene family size
[Figure; species shown: Neurospora crassa, Magnaporthe grisea, Cryptococcus neoformans, Coprinus cinereus, Laccaria bicolor, Phanerochaete chrysosporium, Postia placenta, Malassezia globosa, Ustilago maydis, Sporobolomyces roseus, Puccinia graminis f. sp. tritici, Melampsora larici-populina]
*8,035 families in total, excluding the species-specific families

16 Hierarchical clustering of gene family
[Figure; species shown: N. crassa, M. grisea, S. roseus, P. graminis, M. larici-populina, U. maydis, M. globosa, P. placenta, P. chrysosporium, C. cinereus, L. bicolor, C. neoformans]
The top 100 most variable profiles, selected by standard deviation, are shown.
Red: protein kinase, esterase lipase, Cre recombinase, DNA/RNA helicase, leucine-rich repeat
Blue: major facilitator superfamily

17 Overview
Gene prediction and annotation platform
Gene family analysis
Phylogenetic position of Melampsora

18 Phylogenies of Melampsora
Construct the Melampsora phylogenetic tree based on FUNYBASE with selected fungal genomes.
FUNYBASE: single-copy gene families (246 genes) within 21 fungal species (mostly ascomycetes).
22 selected species:
–Ascomycete: Aspergillus nidulans, Coccidioides immitis, Fusarium graminearum, Mycosphaerella graminicola, Magnaporthe grisea, Neurospora crassa, Nectria haematococca, Pyrenophora tritici-repentis, Stagonospora nodorum, Schizosaccharomyces pombe, Sclerotinia sclerotiorum.
–Basidiomycete: Coprinus cinereus, Cryptococcus neoformans, Laccaria bicolor, Malassezia globosa, Melampsora larici-populina, Phanerochaete chrysosporium, Puccinia graminis, Postia placenta, Sporobolomyces roseus, Ustilago maydis
–Zygomycete: Rhizopus oryzae
*new genome; rejected in FUNYBASE

19 Phylogenies of Melampsora - Method
246 HMM models for the conserved protein sequence blocks in FUNYBASE.
For each genome, run a HMMER search against the whole proteome and retain the protein sequence of the best hit for each model.
148 models have a single-copy gene in each of our 22 selected species.
Concatenate the 148 single-copy orthologs for tree building.
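
The selection and concatenation steps can be sketched as below. This is a toy illustration with hypothetical species, model names and sequences; it assumes the HMMER hits have already been parsed into a nested dictionary, and it interprets "single-copy" as exactly one matching protein per species. In practice each model's sequences would also be aligned before concatenation.

```python
from typing import Dict, List

# hits[species][model] -> list of protein sequences matching that HMM
# (toy, hypothetical data; real input would come from parsed HMMER output).
Hits = Dict[str, Dict[str, List[str]]]

def single_copy_models(hits: Hits) -> List[str]:
    """Models with exactly one matching protein in every species."""
    species = list(hits)
    shared = set.intersection(*(set(hits[s]) for s in species))
    return sorted(m for m in shared if all(len(hits[s][m]) == 1 for s in species))

def concatenate(hits: Hits) -> Dict[str, str]:
    """Join the single-copy sequences per species in a fixed model order."""
    models = single_copy_models(hits)
    return {s: "".join(hits[s][m][0] for m in models) for s in hits}

toy: Hits = {
    "Melampsora": {"m1": ["MKT"], "m2": ["AAV", "AAL"], "m3": ["GHW"]},
    "Puccinia": {"m1": ["MKS"], "m2": ["AAV"], "m3": ["GHF"]},
}
print(single_copy_models(toy))
print(concatenate(toy))
```

Model m2 is dropped because one species has two matching proteins; keeping the model order fixed ensures that every species contributes the same genes at the same positions of the concatenated sequence.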

20 Melampsora in the phylogenetic tree of fungi, built with phylo_win using the neighbor-joining method with Poisson correction and 500 bootstrap replicates.
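
The Poisson correction converts the observed fraction of differing amino acid sites, p, into a distance d = −ln(1 − p) that accounts for multiple substitutions at the same site. The sketch below is a stand-alone illustration with made-up sequences, not phylo_win's own code; skipping gapped columns is a simplifying assumption.

```python
import math

def poisson_distance(seq_a: str, seq_b: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p) between aligned proteins.

    p is the fraction of differing sites; columns with a gap ('-') in
    either sequence are skipped (a simplifying assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 1.0:
        return math.inf  # distance is undefined when every site differs
    return -math.log(1.0 - p)

# One difference in ten aligned sites (hypothetical sequences).
print(poisson_distance("MKTAYIAKQR", "MKSAYIAKQR"))
```

A matrix of such pairwise distances over the concatenated alignment is what a neighbor-joining algorithm takes as input.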

21 Acknowledgements
Gent: Stephane Rombauts, Michiel Van Bel, Klaas Vandepoele, Kenny Billiau, Thomas Abeel, Pierre Rouzé, Lieven Sterck, Yves Van de Peer
Nancy: Stéphane Hacquard, Emilie Tisserant, Marie-Pierre Oudot-Le Secq, Sébastien Duplessis, Francis Martin

