Web Databases for Drosophila: Introduction to FlyBase and Ensembl Database
Wilson Leung, 6/06



Outline
- Introduction to FlyBase
- Introduction to Ensembl
- Using web databases to assist annotation of novel sequences

Introduction to FlyBase Available at

Introduction to FlyBase
- FlyBase is primarily funded by the National Institutes of Health
- The FlyBase consortium includes Drosophila researchers and computer scientists at Harvard University, Indiana University, and the University of Cambridge, plus scientists worldwide
- In addition to the main site at www.flybase.org, there are also many mirror sites

What is FlyBase?
- It is a comprehensive database of genetic and molecular data for many Drosophila species:
  - Information on genes and mutant alleles
  - Expression and function of gene products
  - Genetic, cytological, and molecular map information
  - Data from the Berkeley Drosophila Genome Project
  - Data from the European Drosophila Genome Project

Introduction to Ensembl Available at

What is Ensembl?
- Ensembl is a joint project between the European Bioinformatics Institute (EBI) and the Wellcome Trust Sanger Institute
- Ensembl seeks to develop an automated system for the production and maintenance of annotations on eukaryotic genomes
- These annotations should also be easily accessible to researchers

What is Ensembl? (continued)
- While originally developed for eukaryotes, the Ensembl system has also been used to analyze prokaryotic genomes
  - EBI Genome Review (archaea and bacteria)
- Most recent version is v38 (Apr 2006)
- Genomes available include human, chimp, mouse, dog, C. elegans, fruit fly, honey bee, and mosquito, among others

Ensembl Gene Annotation System
- All Ensembl gene predictions are based on experimental evidence
- Predictions are based on the manually curated UniProt/Swiss-Prot and RefSeq databases
- UTRs are annotated only if they are supported by EMBL mRNA records
- Reference: Val Curwen, et al. The Ensembl Automatic Gene Annotation System. Genome Res., May 2004; 14:

Using Web Databases for Annotation
- FlyBase: list of available species in the FlyBase BLAST service to use in a search for sequences homologous to your query
- Ensembl: the Exon View is used to obtain the sequence of a gene, exon by exon

Motivations for using FlyBase
- Learn the biological functions of the gene of interest
- Use the FlyBase BLAST service to detect sequence homology to Drosophila species or species related to Drosophila

Motivations for using Ensembl
- Obtain records of a gene from multiple databases
- Obtain the coding sequence of each exon of a gene

Walkthrough
- A typical use of web databases is to identify the putative homolog of a D. melanogaster gene
- We have a novel 20 kb sequence from D. erecta
- Using RepeatMasker, we masked all Drosophila-specific repeats in the sequence
- Using blastx, we searched this masked sequence against the Swiss-Prot database
- The blastx results indicate that our sequence is similar to the Paired-box protein (Pax-6) in D. melanogaster
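The blastx step above compares a nucleotide query against a protein database by first translating the query in all six reading frames. The following is a minimal Python sketch of that translation step only, not of BLAST itself. The function names are illustrative, and treating any codon that contains a masked base (N) as "X" is an assumption made for this example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def translate(seq):
    """Translate complete codons; codons not in the table (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to translations."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


if __name__ == "__main__":
    # Toy sequence for illustration only (not from the D. erecta data).
    for frame, peptide in six_frame("ATGGCCTAA").items():
        print(frame, peptide)
```

Each of the six translated frames is what a blastx-style search would compare against the protein database; masked regions contribute only "X" residues and so do not produce spurious hits.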

Function of Pax-6
- Clicking on the accession number of the first hit in the blastx output shows that Pax-6 is also known as eyeless
- We can learn more about eyeless using the FlyBase website
- Type eyeless in the search field, then click on the hit “ey” (#17)

Function of Pax-6 (continued)
- This brings up the gene report for eyeless in D. melanogaster
- We find that eyeless is important for brain and eye development
- It is expressed in the embryo, larva, and adult
- Phenotypic changes in mutants include changes in the antenna, arista, and eye of the fruit fly

Finding Homologs in Other Species
- Click on the BLAST button to access the BLAST service
- Search our masked sequence against the D. melanogaster, D. yakuba, D. mojavensis, and D. virilis genome assemblies using blastn
- Most of the species, other than D. melanogaster, are unannotated. Nonetheless, this search is useful for finding putative orthologs and for discovering regulatory regions using multiple sequence alignments

Using the Ensembl Database
- Navigate to the Ensembl website
- Click on “Drosophila melanogaster” to access the data specific to this species
- In the search box, type the name “eyeless”, then click “Go”
- We find only one match: CG1464 (the eyeless protein)

Transcripts of eyeless
- There are four different isoforms of eyeless in D. melanogaster
- We would typically annotate the most “comprehensive” isoform
  - In this case, isoform D
- The Fruitfly GeneView provides a general overview of the gene structure and function of eyeless
- Links to the FlyBase, RefSeq, Swiss-Prot, and EMBL records of eyeless are also available on this page

Obtaining Transcript Sequence
- Click on “Exon Info” for the transcript CG1464-RD
- This brings us to the exon report for this transcript
  - 9 exons, 3024 bp, 898 residues
- The sequence is shown with each exon in its own block
- The sequence is color-coded:
  - Purple = UTRs
  - Black = coding DNA sequences (CDS)
  - Blue = intronic sequences
  - Green = upstream or downstream sequences
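The figures in the exon report can be sanity-checked with a little arithmetic. The sketch below assumes that the 3024 bp figure is the full transcript length (UTRs included) and that the coding sequence ends with one stop codon; neither assumption is stated on the slide, and the function name is illustrative.

```python
def utr_length(transcript_bp, residues, stop_codon=True):
    """Estimate total UTR length from transcript length and protein length.

    Assumes transcript_bp counts UTRs plus CDS, and that the CDS is
    3 bases per residue plus (optionally) one 3-base stop codon.
    """
    cds_bp = residues * 3 + (3 if stop_codon else 0)
    if cds_bp > transcript_bp:
        raise ValueError("CDS is longer than the transcript; check the inputs")
    return transcript_bp - cds_bp


if __name__ == "__main__":
    # Values reported for CG1464-RD in the exon report.
    cds = 898 * 3 + 3
    print("CDS length (bp):", cds)
    print("UTR length (bp):", utr_length(3024, 898))
```

Under these assumptions the coding sequence accounts for 2697 bp, leaving 327 bp of UTR sequence (the purple regions in the exon report).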

Obtaining Peptide Sequence
- Click on the link “Protein Information” to obtain the peptide sequence of CG1464-RD
- This brings us to the protein report for this transcript
- The “Protein Family” section shows that there are six gene members in this species
- Clicking on the link brings up the Family view, which allows visualization of multiple sequence alignments of the members of this family
- The peptide sequence has the following color code:
  - Black/Blue = alternating text color for exons
  - Red = residue overlaps a splice site
  - Green = synonymous SNP
  - Yellow = non-synonymous SNP

Next Step
- Annotate the exact boundaries of each exon in our D. erecta sequence based on sequence homology to the D. melanogaster eyeless gene
- Use an exon-by-exon BLAST search with BLAST 2 Sequences (bl2seq)

Questions?

Walkthrough example

Determining Exon Boundaries
- Use bl2seq to determine the exon boundaries of the putative ortholog in our D. erecta sequence
- Go to www.ncbi.nlm.nih.gov/blast/ and select bl2seq
- Copy the D. erecta sequence and paste it into the Sequence 1 box
- Copy the first exon of D. melanogaster eyeless and paste it into the Sequence 2 box
- Change the program to tblastx
- Click “BLAST”
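Because the copy-and-paste step is repeated once per exon, it can help to prepare each exon as its own FASTA record beforehand. The following is a small Python sketch of one way to do that; the helper name, the record names, and the short placeholder sequences are all hypothetical and are not taken from the Ensembl exon report.

```python
def to_fasta(name, seq, width=60):
    """Format one sequence as a FASTA record with lines of at most `width` bases."""
    seq = "".join(seq.split()).upper()  # drop whitespace picked up when copying
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder exon sequences for illustration only.
    exons = {
        "eyeless_exon1": "atggcc aacgtt",
        "eyeless_exon2": "ggtacc ttaa",
    }
    for name, seq in exons.items():
        print(to_fasta(name, seq), end="")
```

Each record can then be pasted into the Sequence 2 box in turn, while the D. erecta sequence stays in the Sequence 1 box.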

Determining Exon Boundaries (continued)
- We find that the first exon corresponds to bases in our sequence
- We can repeat the previous steps to locate the other exons in our sequence
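Once every exon has been located, a quick consistency check is to confirm that the exons appear in order and do not overlap. The Python sketch below does this for hits on the forward strand. The coordinates in the example are invented placeholders, not results from the D. erecta search, and the function name is illustrative.

```python
def intron_lengths(hits):
    """Check exon placements and return the intron lengths between them.

    `hits` is a list of (exon_number, start, end) tuples with 1-based,
    inclusive coordinates on the forward strand (an assumption of this
    sketch). Raises ValueError if consecutive exons overlap or are out
    of order along the sequence.
    """
    ordered = sorted(hits, key=lambda h: h[0])
    introns = []
    for (n1, s1, e1), (n2, s2, e2) in zip(ordered, ordered[1:]):
        if s2 <= e1:
            raise ValueError(f"exon {n2} starts before exon {n1} ends")
        introns.append(s2 - e1 - 1)
    return introns


if __name__ == "__main__":
    # Hypothetical coordinates for illustration only.
    placeholder_hits = [(1, 100, 250), (2, 400, 520), (3, 900, 1000)]
    print(intron_lengths(placeholder_hits))
```

An error from this check usually means that one of the exon searches matched the wrong region, so that exon is worth repeating before the annotation is finalized.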