DeepFin will Advance The Phylogeny of Fishes: A Research Coordination Network
1. To promote fish phylogenetics (resolve the fish tree!)
2. To develop cyberinfrastructure: a portal for fish phylogenetics (www.deepfin.org) with networking tools and interconnected relational databases
3. To develop educational material on fish biodiversity, fish evolution, and current knowledge of the phylogenetic relationships of fishes

1. To promote fish phylogenetics: how far are we from the tree of all fishes?
Integrate all sources of information:
– Morphology
– Genetics
– Paleontology

Issues with molecular phylogenies based on a single gene or few loci
Low resolution or low support (characters vs. taxa)
Conflicts among trees inferred from different loci
– Analytical reasons (base compositional bias, long-branch attraction, heterotachy)

[Figure: GC% at the 3rd codon position of RAG1 for each taxon, plotted on a 0 to 1 axis with a "Mean" marker. Taxa are grouped as Elasmobranchii, Tetrapoda, Polypteriformes, basal actinopterygians, Osteoglossomorpha, Elopomorpha, Ostariophysi, Clupeomorpha, Protacanthopterygii, Stomiiformes, basal neoteleosts, Paracanthopterygii, and Acanthopterygii. Labeled points: Ogcocephalus, Lophiiformes, Colisa, Arnoglossus, Sparus, Trigla, scorpaenids, Gonostoma, galaxiids, Albula, Albuliformes, Megalops, Elops, Muraenesox, Zeus, Gasterosteus, Engraulis.]

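The quantity plotted for RAG1, GC content at third codon positions, can be computed directly from a coding sequence. The following is a minimal Python sketch, assuming the sequence is in frame (it starts at codon position 1) and that gaps and ambiguity codes are simply skipped. The function name `gc3` and the example sequence are illustrative and are not taken from the RAG1 data.

```python
def gc3(cds: str) -> float:
    """Return the GC fraction at third codon positions of an in-frame CDS."""
    seq = cds.upper()
    # Indices 2, 5, 8, ... are the third positions of successive codons.
    third = [base for base in seq[2::3] if base in "ACGT"]  # skip gaps / ambiguity codes
    if not third:
        raise ValueError("no unambiguous third-position bases")
    return sum(base in "GC" for base in third) / len(third)


# Hypothetical 5-codon sequence; third positions are G, C, G, T, A.
example = "ATGGCCAAGTTTGGA"
print(round(gc3(example), 2))  # 3 of 5 third positions are G or C
```

Values far from the mean across taxa flag lineages whose base composition may bias tree inference for this gene.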
Issues with molecular phylogenies based on a single gene or few loci
Low resolution or low support (characters vs. taxa)
Conflicts among trees inferred from different loci
– Biological reasons (gene tree vs. organismal tree)

Gene trees within organismal trees
Lineage sorting
Gene duplication
Horizontal transfer

Phylogenomics: use many (genome-scale) loci to infer phylogeny
A large number of characters will increase statistical power
Analysis of many independent loci may reduce systematic error
Genome-scale nuclear gene markers will be more likely to represent organismal evolution

How to collect phylogenomic data (from multiple loci)
Using available genome databases (model organisms)
Sequencing cDNA/EST libraries
Directly amplifying and sequencing target fragments from genomic DNA using universal nuclear markers
How can we find new universal nuclear gene markers?

Three criteria to choose good nuclear gene markers*
1) Orthologous genes should be easy to identify and amplify in all taxa of interest. To minimize the chance of mistaken paralogy, we seek only single-copy genes (so, what about gene duplications?)
* Chenhong Li (UNL) and Guoqing Lu (UN-Omaha)

Three criteria to choose good nuclear gene markers
2) The amplicon (i.e., the target sequence amplified by the PCR primers) should be of reasonable size (exons >800 bp).
[Figure: zebrafish elongation factor 1-alpha (ef1a)]

Three criteria to choose good nuclear gene markers
3) The gene should be reasonably conserved, so that universal primers can be designed and the sequences can be easily aligned.
[Figure: Gonadotropin-releasing hormone]

If we agree with these 3 criteria (single copy, long exon, reasonable conservation) for good nuclear markers, randomly testing genes gives a poor chance of finding a good marker (additional criteria are possible).
Instead, directly apply the 3 criteria to screen the genomes of two model organisms, zebrafish (Danio rerio) and pufferfish (Takifugu rubripes).

Scheme of our marker-developing strategy
130 candidate loci were identified in silico.

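The three criteria (single copy, long exon, reasonable conservation) can be read as a filter over annotated exons. The sketch below shows that idea only; it is not the authors' tool. The `Exon` record, its field names, the example values, and the `min_identity` cutoff of 0.70 are all assumptions made for illustration. Only the 800 bp length threshold comes from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    exon_id: str             # hypothetical identifier
    length_bp: int           # exon length in the reference genome
    within_genome_hits: int  # similar sequences elsewhere in the same genome
    cross_identity: float    # identity (0..1) to the best match in the second genome


def screen_exons(exons, min_len_bp=800, min_identity=0.70):
    """Keep exons that look single-copy, long, and conserved.

    min_identity is an assumed cutoff; the slides do not state the value used.
    """
    kept = []
    for e in exons:
        single_copy = e.within_genome_hits == 0  # criterion 1
        long_enough = e.length_bp > min_len_bp   # criterion 2 (exons >800 bp)
        conserved = e.cross_identity >= min_identity  # criterion 3
        if single_copy and long_enough and conserved:
            kept.append(e)
    return kept


# Hypothetical input records, one failing each criterion.
exons = [
    Exon("exon_A", 1200, 0, 0.85),
    Exon("exon_B", 450, 0, 0.90),   # too short
    Exon("exon_C", 2000, 2, 0.88),  # has similar copies in the same genome
    Exon("exon_D", 950, 0, 0.55),   # too divergent between genomes
]
candidates = screen_exons(exons)
print([e.exon_id for e in candidates])
```

In practice the `within_genome_hits` and `cross_identity` fields would come from similarity searches within and between the two genomes; how those searches are run is outside this sketch.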
Distribution of 109 candidate markers in zebrafish chromosomes
109 of the 130 candidates are located on 24 of the 25 chromosomes (21 have no location information).
A chi-square test did not reject a Poisson distribution of these markers across chromosomes (p = 0.0746).

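One simple way to ask whether markers are scattered across chromosomes as a Poisson process would predict is the index-of-dispersion test, sketched below. This is a swapped-in technique, not necessarily the chi-square test used for the p-value reported on the slide. The per-chromosome counts are hypothetical (chosen only to sum to 109 over 25 chromosomes with one empty chromosome), and the helper `chi2_sf_even_df` uses a closed form that is valid only for even degrees of freedom, which holds here (25 - 1 = 24).

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable (closed form, even df only)."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0  # i = 0 term of sum_{i<df/2} half**i / i!
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def dispersion_test(counts):
    """Index-of-dispersion test: under Poisson, sum((c-mean)^2)/mean ~ chi2(n-1)."""
    n = len(counts)
    mean = sum(counts) / n
    stat = sum((c - mean) ** 2 for c in counts) / mean
    df = n - 1
    return stat, df, chi2_sf_even_df(stat, df)


# Hypothetical marker counts per chromosome (NOT the real data).
counts = [6, 4, 5, 3, 7, 4, 2, 5, 6, 3, 4, 8, 5,
          4, 0, 3, 6, 5, 4, 7, 3, 5, 4, 2, 4]
stat, df, p = dispersion_test(counts)
print(f"statistic={stat:.2f}, df={df}, p={p:.3f}")
```

A large p-value means the counts are no more clustered than a Poisson model allows, which is the sense in which the markers can be called widely distributed.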
Summary of the 130 candidate loci
Size range: 802 bp to 5811 bp in zebrafish.
Base composition: GC content ranges from 41.6% to 63.9% in zebrafish.
Identity: between zebrafish and pufferfish, identity of these markers ranges from 77.3% to 93.2%.

Experimental test of the candidate markers
A random sample of 15 candidate markers was examined in 52 ray-finned fish taxa (40 of the 47 orders of Actinopterygii).
PCR primers were designed in conserved regions (nested PCR strategy).
10 of the 15 markers tested were successfully amplified by PCR from genomic DNA in most taxa.

Marker *   Exon ID              PCR fragment size (bp)   No. of PI sites   Average p-distance §
zic1       ENSDARE00000015655   945                      344               0.156
myh6       ENSDARE00000025410   735                      329               0.179
RYR3       ENSDARE00000465292   837                      421               0.210
ptr        ENSDARE00000145053   708                      372               0.205
tbr1       ENSDARE00000055502   723                      313               0.189
ENC1       ENSDARE00000367269   810                      360               0.180
Gylt       ENSDARE00000039808   882                      510               0.211
SH3PX3     ENSDARE00000117872   708                      317               0.167
plagl2     ENSDARE00000136964   690                      345               0.173
sreb2      ENSDARE00000029022   987                      387               0.149
PI, parsimony informative sites; SDR, standard deviation of substitution rates among three codon positions; CI-MP, consistency index; α, gamma distribution shape parameter; RCV, relative composition variability; Treeness, ratio of internal branch length to total branch length.

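Two of the per-marker statistics, the number of parsimony-informative (PI) sites and the average p-distance, can be computed from a nucleotide alignment with a few lines of Python. This is a minimal sketch under stated assumptions: sequences are uppercase, already aligned to equal length, and any character other than A, C, G, T is ignored. The function names and the four-sequence alignment are illustrative, not data from the study.

```python
from collections import Counter
from itertools import combinations


def parsimony_informative_sites(alignment):
    """Count columns with at least two states that each occur in at least two taxa."""
    count = 0
    for column in zip(*alignment):
        states = Counter(base for base in column if base in "ACGT")
        if sum(1 for n in states.values() if n >= 2) >= 2:
            count += 1
    return count


def p_distance(a, b):
    """Proportion of differing sites among positions unambiguous in both sequences."""
    pairs = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def average_p_distance(alignment):
    """Mean p-distance over all pairs of sequences in the alignment."""
    dists = [p_distance(a, b) for a, b in combinations(alignment, 2)]
    return sum(dists) / len(dists)


# Hypothetical toy alignment: columns 2, 4, and 5 (1-based) are informative.
aln = ["ACGTAC", "ACGTTC", "ATGAAC", "ATGATC"]
print(parsimony_informative_sites(aln))
print(round(average_p_distance(aln), 3))
```

Real alignments would be read from FASTA or NEXUS files; the list of strings here stands in for that step.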
Summary
Gene markers that satisfy the three criteria are widely distributed in the zebrafish genome
Ten of the 15 markers tested seem useful for phylogenetic inference; their profiles are comparable to that of the popular RAG1 gene
The strategy is successful!
– The new markers developed will help to infer the tree of ray-finned fishes
– The bioinformatic tool developed can be used in other taxonomic groups (S: similarity may vary)