Christian M Zmasek, PhD Burnham Institute for Medical Research Bioinformatics and Systems Biology www.phylosoft.org www.phyloxml.org.


1 Christian M Zmasek, PhD Burnham Institute for Medical Research Bioinformatics and Systems Biology www.phylosoft.org www.phyloxml.org

2 Phylogenomics
  Original definition:
    - the application of phylogenetic information for gene function analysis (Eisen, 1998)
  Recent usage:
    - species evolution based on whole genome analyses (for example, Dunn et al., 2008)
    - various types of studies at the intersection of genomics and phylogenetics

3 The application of phylogenetic information for gene function analysis
  [Figure: gene trees with sequences from RAT, MOUSE, HUMAN, and CIONA (labels X, Y, Z). Legend: query sequence; orthologous to query; most similar to query; gene duplication.]

4 What information do we need for a phylogenomic analysis (sequence function analysis type)?
  In phylogenomic analyses, tree nodes might be annotated with:
    - Sequence name
    - Species name
    - Duplication: true/false
  Branches might be annotated with:
    - Branch lengths
    - Support values (bootstrap, probability, …)
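The annotations listed on this slide can be pictured as fields on a tree node. The following is a minimal sketch, not something from the talk: the class name `Node`, its field names, and the example sequences are all hypothetical, chosen only to mirror the bullet points above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical tree node carrying the annotations named on the slide."""
    sequence_name: Optional[str] = None
    species_name: Optional[str] = None
    duplication: Optional[bool] = None      # True = duplication, False = speciation
    branch_length: Optional[float] = None   # length of the branch leading to this node
    support: Dict[str, float] = field(default_factory=dict)  # e.g. {"bootstrap": 90}
    children: List["Node"] = field(default_factory=list)


def count_duplications(node: Node) -> int:
    """Count the nodes in the subtree that are marked as gene duplications."""
    own = 1 if node.duplication else 0
    return own + sum(count_duplications(child) for child in node.children)


# Illustrative data only: two rat paralogs joined by a duplication,
# then joined to a human sequence by a speciation.
rat_pair = Node(
    duplication=True,
    support={"bootstrap": 90.0},
    children=[
        Node(sequence_name="Y", species_name="RAT", branch_length=0.1),
        Node(sequence_name="Z", species_name="RAT", branch_length=0.2),
    ],
)
tree = Node(
    duplication=False,
    children=[rat_pair, Node(sequence_name="X", species_name="HUMAN", branch_length=0.3)],
)
n_duplications = count_duplications(tree)
```

Keeping the duplication flag on the node makes questions such as "how many duplications separate two sequences" a plain tree traversal.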

5 What information might we need for other types of phylogenomic analyses?
  - Support values (possibly multiple)
  - Taxonomy information (possibly detailed)
  - Geographic information
  - Host/parasite data (relation between tree nodes)
  - Gene expression values
  - Genomic location
  - Mutations, variation, disease
  - …

6 How is this information processed and stored?
  - Tree topologies are described by nested parentheses: ((A,B),C)
  - Unique tree node labels are mapped to text files, spreadsheets, databases
  - Manual processing of text files with text editors
  - Macros, shell scripts, Perl scripts
  - New Hampshire eXtended (NHX) format
      Adds tags for different fields:
        Species: S=
        Bootstrap support: B=
      Example: ADH2:0.1[&&NHX:S=human:B=90]
      http://www.phylosoft.org/forester/NHX.html
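To make the NHX example concrete, here is a minimal sketch that splits a single NHX node string into its name, branch length, and tags. It handles only the simple shape shown on the slide (`name:length[&&NHX:key=value:...]`); the function name `parse_nhx_node` is hypothetical, and this is not the forester parser.

```python
import re

# name, optional ":length", optional "[&&NHX:...]" comment block
NHX_NODE = re.compile(
    r"^(?P<name>[^:\[]*)"
    r"(?::(?P<length>[^\[]+))?"
    r"(?:\[&&NHX:(?P<tags>[^\]]*)\])?$"
)


def parse_nhx_node(text):
    """Parse one NHX node description into a dict (simple cases only)."""
    match = NHX_NODE.match(text.strip())
    if match is None:
        raise ValueError(f"not a simple NHX node: {text!r}")
    tags = {}
    if match.group("tags"):
        # Tags are colon-separated key=value pairs, e.g. S=human:B=90
        for item in match.group("tags").split(":"):
            key, _, value = item.partition("=")
            tags[key] = value
    length = match.group("length")
    return {
        "name": match.group("name") or None,
        "branch_length": float(length) if length else None,
        "tags": tags,
    }


node = parse_nhx_node("ADH2:0.1[&&NHX:S=human:B=90]")
# node["tags"] maps "S" to "human" and "B" to "90"
```

Tag values stay as strings here, so each consumer decides how to interpret them.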

7 How is this information published?
  - Mostly as images of phylogenetic trees in journals: not suitable as input for further studies!
  - Submission to (publicly accessible) databases is rare

8 Problems with this approach
  - Tedious
  - Error prone
  - Published images are difficult to use as input for further studies
  - Meta-analyses are hard
  - Different, and incompatible, "dialects" of NHX appeared
  - Limited expressiveness

9 phyloXML by example
  Example from Prof. Joe Felsenstein's book "Inferring Phylogenies"
  [phyloXML listing; the markup was lost in extraction. Surviving values, in document order: 0.06, A, 0.102, B, 0.23, C, 0.4]
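The values on this slide (0.06, A, 0.102, B, 0.23, C, 0.4) are consistent with the tree ((A:0.102,B:0.23):0.06,C:0.4). That reading is an assumption, as is the exact markup: the sketch below builds a phyloXML document for it with Python's standard library, using the `phylogeny`, `clade`, `name`, and `branch_length` elements. The helper `clade()` and the short phylogeny name are illustrative.

```python
import xml.etree.ElementTree as ET


def clade(name=None, branch_length=None, children=()):
    """Build a <clade> element; name and branch_length come before child clades."""
    element = ET.Element("clade")
    if name is not None:
        ET.SubElement(element, "name").text = name
    if branch_length is not None:
        ET.SubElement(element, "branch_length").text = str(branch_length)
    element.extend(children)
    return element


# The xmlns attribute is written literally so the output declares the phyloXML namespace.
root = ET.Element("phyloxml", xmlns="http://www.phyloxml.org")
phylogeny = ET.SubElement(root, "phylogeny", rooted="true")
ET.SubElement(phylogeny, "name").text = "example"

# Assumed topology: ((A:0.102,B:0.23):0.06,C:0.4)
phylogeny.append(
    clade(children=[
        clade(branch_length=0.06, children=[clade("A", 0.102), clade("B", 0.23)]),
        clade("C", 0.4),
    ])
)

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Reading the text nodes of the clades back in document order gives the same sequence of values that survives on the slide.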

10 phyloXML
  Important elements:
    - Taxonomy
    - Sequence
    - Confidence
    - Events (duplication, speciation)
    - Property ("custom data")
    - Typed relations (between clades, sequences)
  XSD schema, examples, description, applications: http://www.phyloxml.org/
  Current version: 1.0
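As a rough illustration of how some of these elements appear on a clade, the sketch below parses a small hand-written phyloXML document with the standard library and pulls out taxonomy, sequence, confidence, and events data. The document content (ADH1/ADH2, species names, the bootstrap value) is made up for illustration, and `describe` is a hypothetical helper; consult the XSD schema for the authoritative element definitions.

```python
import xml.etree.ElementTree as ET

NS = {"phy": "http://www.phyloxml.org"}

# Illustrative document, not taken from the talk.
DOC = """<phyloxml xmlns="http://www.phyloxml.org">
  <phylogeny rooted="true">
    <clade>
      <confidence type="bootstrap">90</confidence>
      <events><duplications>1</duplications></events>
      <clade>
        <name>ADH2</name>
        <branch_length>0.1</branch_length>
        <taxonomy><common_name>human</common_name></taxonomy>
        <sequence><symbol>ADH2</symbol></sequence>
      </clade>
      <clade>
        <name>ADH1</name>
        <branch_length>0.2</branch_length>
        <taxonomy><common_name>human</common_name></taxonomy>
        <sequence><symbol>ADH1</symbol></sequence>
      </clade>
    </clade>
  </phylogeny>
</phyloxml>"""


def describe(clade):
    """Collect a few annotations from the direct children of one <clade>."""
    confidence = clade.find("phy:confidence", NS)
    return {
        "name": clade.findtext("phy:name", namespaces=NS),
        "species": clade.findtext("phy:taxonomy/phy:common_name", namespaces=NS),
        "sequence": clade.findtext("phy:sequence/phy:symbol", namespaces=NS),
        "confidence": None if confidence is None
        else (confidence.get("type"), float(confidence.text)),
        "duplications": int(
            clade.findtext("phy:events/phy:duplications", default="0", namespaces=NS)
        ),
    }


root = ET.fromstring(DOC)
clades = [describe(c) for c in root.iter("{http://www.phyloxml.org}clade")]
for info in clades:
    print(info)
```

Because each annotation is its own element, a reader can ignore the ones it does not understand, which is harder to do with ad hoc NHX tags.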

11 Important clade-level elements

12 phyloXML applications/implementations (examples)
  - BioPerl: parser, writer
  - ATV (A Tree Viewer)
      Java-based tree display tool suitable for large (>10 000) and highly decorated phylogenetic/taxonomic trees
      http://www.phylosoft.org/atv
  - phyloxml_converter
      Command line tool to convert Newick (NH), NHX, and Nexus formatted trees to phyloXML
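To give a feel for what a Newick-to-phyloXML conversion involves, here is a toy sketch in Python. It is not phyloxml_converter and makes no claim about how that tool works: it handles only plain Newick with unquoted names and branch lengths (no NHX tags, comments, or Nexus), and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def newick_to_clade(text):
    """Parse simple Newick (names and branch lengths only) into a <clade> element."""
    pos = 0

    def parse_clade():
        nonlocal pos
        element = ET.Element("clade")
        children = []
        if text.startswith("(", pos):
            pos += 1  # consume "("
            while True:
                children.append(parse_clade())
                if text.startswith(",", pos):
                    pos += 1
                elif text.startswith(")", pos):
                    pos += 1
                    break
                else:
                    raise ValueError(f"expected ',' or ')' at position {pos}")
        # Optional node label
        start = pos
        while pos < len(text) and text[pos] not in ",():;":
            pos += 1
        label = text[start:pos].strip()
        if label:
            ET.SubElement(element, "name").text = label
        # Optional ":branch_length"
        if text.startswith(":", pos):
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",();":
                pos += 1
            ET.SubElement(element, "branch_length").text = str(float(text[start:pos]))
        element.extend(children)
        return element

    return parse_clade()


def newick_to_phyloxml(text):
    """Wrap the parsed tree in <phyloxml><phylogeny> and return it as a string."""
    root = ET.Element("phyloxml", xmlns="http://www.phyloxml.org")
    phylogeny = ET.SubElement(root, "phylogeny", rooted="true")
    phylogeny.append(newick_to_clade(text.strip()))
    return ET.tostring(root, encoding="unicode")


converted = newick_to_phyloxml("((A:0.102,B:0.23):0.06,C:0.4);")
print(converted)
```

A real converter also has to deal with quoting, comments, support values, and the NHX dialects mentioned earlier, which is where most of the work lies.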

