

wfleabase.org/docs/arthropod-gene-finding/
Unlocated Arthropod genes and ways to find them
Many bug genes are hard to find:
Daphnia’s many tandems were lost for a bit.
Duplicate genes, a bane and a boon.
Genome tile expression picks out many more.
April 2008, Don Gilbert, Genome Informatics Lab, Biology Dept., Indiana University

Environmental stresses find novel genes
Novel Daphnia genes show up under stress.
Novel Drosophila species genes are missed by prediction.

Duplicate genes are common
Daphnia surpasses C. elegans for a rich tandem gene set.
Bugs have many tandem genes.
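To make "tandem genes" concrete, here is a minimal sketch of one way to pick out tandem clusters from an annotated gene list. It is not the method used for the Daphnia analysis. The `Gene` record, the `family` label (assumed to come from some prior homology grouping), and the `max_intervening` and `min_size` parameters are all hypothetical names chosen for illustration.

```python
from collections import namedtuple

# Hypothetical gene record; 'family' is assumed to come from a prior
# homology grouping (for example, a protein family assignment).
Gene = namedtuple("Gene", "name scaffold start end family")


def tandem_clusters(genes, max_intervening=0, min_size=2):
    """Group same-family genes that sit next to each other on a scaffold.

    Two genes of one family are chained into the same cluster when at
    most `max_intervening` other genes lie between them in scaffold order.
    Only clusters with at least `min_size` members are returned.
    """
    by_scaffold = {}
    for gene in genes:
        by_scaffold.setdefault(gene.scaffold, []).append(gene)

    clusters = []
    for scaffold_genes in by_scaffold.values():
        ordered = sorted(scaffold_genes, key=lambda g: g.start)
        open_clusters = {}  # family -> (index of last member, members)
        for index, gene in enumerate(ordered):
            if gene.family is None:
                continue
            previous = open_clusters.get(gene.family)
            if previous is not None and index - previous[0] - 1 <= max_intervening:
                # Close enough to the last family member: extend the cluster.
                previous[1].append(gene)
                open_clusters[gene.family] = (index, previous[1])
            else:
                # Too far away: close the old cluster and start a new one.
                if previous is not None and len(previous[1]) >= min_size:
                    clusters.append(previous[1])
                open_clusters[gene.family] = (index, [gene])
        for _, members in open_clusters.values():
            if len(members) >= min_size:
                clusters.append(members)
    return clusters
```

Allowing one or two intervening genes (`max_intervening=1` or `2`) is a common relaxation, since an unrelated or mispredicted gene model can sit inside a real tandem array.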

Duplicates confuse finders
Prediction errors are common in duplicate gene regions.
None of 13 predictors found all 4 tandems of this D. willistoni (Dwil) P450 cluster, but each gene was correctly predicted by at least one of them.

Duplicates find errors
Duplicates solve a prediction dilemma in Drosophila.
The prediction cline is an artifact of training on D. melanogaster (Dmel). Retraining with D. mojavensis (Dmoj) removes it.

Odorant genes concur
Curation of Drosophila Obp genes also removes the prediction cline.
Vieira et al. (2007), and my own further analysis, recovered genes using PSI-BLAST trained on species Obp genes.
Computational errors are significantly more common in the Far-mel and Mid-mel groups.
Obp genes show no overall gain or loss across groups.

Tile expression finds genes
Daphnia tile expression with gene finding calls 26% of the genome's bases as coding, compared to 17% from gene predictions, or 5, ,000 new genes.
Manak et al. (2006), with D. melanogaster, also found 24% CDS/genome, up from 18% CDS/genome for the reference gene set.
Computational tools need to mature; gene finding from this data is preliminary.
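The "percent coding bases over the genome" figures above are ratios of covered bases to genome size. As a minimal sketch of that arithmetic (not the pipeline used for these slides), the following assumes coding calls are given as 1-based inclusive `(start, end)` intervals per scaffold. The function names and the input layout are hypothetical.

```python
def merged_length(intervals):
    """Total bases covered by 1-based inclusive intervals.

    Overlapping or directly adjacent intervals are merged first, so a
    base called coding by two gene models is counted only once.
    """
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            # Gap before this interval: bank the previous merged run.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def coding_fraction(intervals_by_scaffold, genome_size):
    """Fraction of the genome covered by coding calls."""
    covered = sum(merged_length(iv) for iv in intervals_by_scaffold.values())
    return covered / genome_size
```

Running this once on the predicted gene set and once on the tile-expression calls gives two fractions that can be compared directly, provided both use the same genome size as the denominator.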

Summary: Locating novel genes
1. More genes are expressed in unusual environs, and are specific. Use many environmental, developmental and tissue conditions to see the range of genes via expression. Understand the limits of gene homology.
2. Duplicate genes are common, a problem, and an aid to finding genes. Examine duplicate genes carefully. Tools that distinguish these can be used to find paralogs missed by traditional methods.
3. Near-species training reduces errors and spurious effects. Use same-species and near-species data as much as possible in preparing automated annotations. Be aware of and control for informant species-distance as a source of bias.
4. Genome-wide tile expression finds more genes. As an alternative to EST studies, it has values and drawbacks. Computational methods need to improve to use this data well.

Genome maps on your laptop
Genome data sets that I use are available for your computer. Includes GMOD GBrowse software in a ready-to-run bundle*
* This is fully configured for Intel-MacOSX 10.5, others need further installation. See
Map data (large) are at ftp://eugenes.org/eugenes/gbrowse/databases/
daphnia_pulex : Daphnia genome data from wfleabase.org
nasonia : Wasp gene predictions, homology, EST
tribcas : Tribolium basic gene set from NCBI genomes
drospege : 12 Drosophila genomes
drosmel : Dros. mel rel 5.5 genome with Affymetrix transcriptome data

End note
Acknowledgements
I am grateful for support from NSF (DBI ) and the NIH, including a TeraGrid award, which made this work possible. Daphnia sequencing and portions of the analyses were provided by the DOE Joint Genome Institute and in collaboration with the Daphnia Genomics Consortium (DGC).
References
Gilbert, New and old genes in Drosophila genomes.
Gilbert, Daphnia gene duplicates.
Gilbert, Tandem genes lost + found.
Manak, J.R. et al., unannotated transcription in Dros. mel. Nature Genetics, doi: /ng1875
Vieira, F.G. et al., analysis of the Odorant-Binding genes in Drosophila genomes. Genome Biology, doi: /gb r235