Drosophila Genomics Where are we now? Where are we going? Christopher Shaffer, Wilson Leung, Sarah Elgin Dept of Biology; Washington University in St.

Slides:



Advertisements
Similar presentations
Linkage and Genetic Mapping
Advertisements

GENOME SEQUENCING AND OBJECTIVES
Genomics Chapter 18.
Sequence Similarity Searching Class 4 March 2010.
Genes. Outline  Genes: definitions  Molecular genetics - methodology  Genome Content  Molecular structure of mRNA-coding genes  Genetics  Gene regulation.
Richard, Rochelle, Zohal, Angie
Sequencing Informatics Gabor T. Marth Department of Biology, Boston College BI420 – Introduction to Bioinformatics.
Workshop in Bioinformatics 2010 Class # Class 8 March 2010.
Central Dogma Information storage in biological molecules DNA RNA Protein transcription translation replication.
Zebra Finch Seg Dup Analysis 1.Genome 2.Parameters for Pipeline 3.Analysis.
Sequencing Informatics Gabor T. Marth Department of Biology, Boston College BI420 – Introduction to Bioinformatics.
Human Genome Project. Basic Strategy How to determine the sequence of the roughly 3 billion base pairs of the human genome. Started in Various side.
Applications of Genetics 1. Gene Therapy – Pg 248 & 267 introducing correct gene to “cure” genetic disease 2. Polymerase Chain Reaction Pg making.
Compartmentalized Shotgun Assembly ? ? ? CSA Two stated motivations? ?
A Study of GeneWise with the Drosophila Adh Region Asta Gindulyte CMSC 838 Presentation Authors: Yi Mo, Moira Regelson, and Mike Sievers Paracel Inc.,
Genome sequencing. Vocabulary Bac: Bacterial Artificial Chromosome: cloning vector for yeast Pac, cosmid, fosmid, plasmid: cloning vectors for E. coli.
Human Genome Project Seminal achievement. Scientific milestone. Scientific implications. Social implications.
Sequencing a genome (a) outline the steps involved in sequencing the genome of an organism; (b) outline how gene sequencing allows for genome-wide comparisons.
Microbial Genomes Features Analysis Role of high-throughput sequencing Yeast - the eukaryotic model microbe Databases –TIGR CMR –NCBI Microbial Genomes.
Genome of Drosophila species Olga Dolgova UAB Barcelona, 2008.
Mouse Genome Sequencing
What is comparative genomics? Analyzing & comparing genetic material from different species to study evolution, gene function, and inherited disease Understand.
AP Biology A Lot More Advanced Biotechnology Tools Sequencing.
The Strange Case of the Dot Chromosome of Drosophila Sarah C R Elgin Bio 4342 Copyright 2014, Washington University.
Genome Sequencing in the Legumes Le et al Phylogeny Major sequencing efforts Minor sequencing efforts ~14 MY ~45 MY.
Steps in a genome sequencing project Funding and sequencing strategy source of funding identified / community drive development of sequencing strategy.
Status report on gap closure of the human chromosome 5 BAC map Authentication of C5 BAC maps Map and sequence status Gap status and steps used to close.
Genome databases and webtools for genome analysis Become familiar with microbial genome databases Use some of the tools useful for analyzing genome Visit.
Programs and Web Tools Status Update GEP Alumni Workshop Wilson Leung 08/05/2011.
FINISHING WORKSHOP APRIL 2008 CHROMOSOME 7 THE FRENCH CONTRIBUTION TG216 TG438 T1112 T1355 T1328 T1428 T1962 T1414 T1497 T0676 TM18 CT54 T0966 T0731 TM15.
Chapter 21 Eukaryotic Genome Sequences
Construction of Substitution Matrices
Web Databases for Drosophila Introduction to FlyBase and Ensembl Database Wilson Leung6/06.
Vervet Monkey Genomics: Genome Canada and Génome Québec Physical Map Project J. Wasserscheid, G. Leveque, C. Nagy, C. Pinsonnault, and K. Dewar, McGill.
CS 461b/661b: Bioinformatics Tools and Applications Software Algorithm Mathematical Models Biology Experiments and Data.
Web Databases for Drosophila An introduction to web tools, databases and NCBI BLAST Wilson Leung08/2015.
Overview of the Drosophila modENCODE hybrid assemblies Wilson Leung01/2014.
Genome annotation and search for homologs. Genome of the week Discuss the diversity and features of selected microbial genomes. Link to the paper describing.
Genomics Education Partnership: a flexible approach to implement Genomic teachings and research in the classroom Matthew W. Wadsworth and Consuelo J. Alvarez,
Annotation of Drosophila virilis Chris Shaffer GEP workshop, 2006.
ARGOS (A Replicable Genome InfOrmation System) for FlyBase and wFleaBase Don Gilbert, Hardik Sheth, Vasanth Singan { gilbertd, hsheth, vsingan
Primer on Annotation of Drosophila Genes GEP Workshop – January 2016 Wilson Leung and Chris Shaffer.
Construction of Substitution matrices
Mojavensis: Issues of Polymorphisms Chris Shaffer GEP 2009 Washington University.
What is BLAST? Basic BLAST search What is BLAST?
26 th July 2006 Christine Nicholson, Mapping Core Group Karen McLaren, Finishing Group Leader Wellcome Trust Sanger Institute Sequencing the Gene Space.
The Genomics Education Partnership, 2011 Charles Hauser 1, Wilson Leung 3, Chris Shaffer 3, David Lopatto 2, Sarah Elgin 3 and faculty and students of.
Looking Within Human Genome King abdulaziz university Dr. Nisreen R Tashkandy GENOMICS ; THE PIG PICTURE.
Web Databases for Drosophila
What is BLAST? Basic BLAST search What is BLAST?
Virginia Commonwealth University
Human Genome Project.
Jeong-Hyeon Choi, Sun Kim, Haixu Tang, Justen Andrews, Don G. Gilbert
Basics of BLAST Basic BLAST Search - What is BLAST?
Pre-genomic era: finding your own clones
GEP Annotation Workflow
Genomes and Their Evolution
KEY CONCEPT Entire genomes are sequenced, studied, and compared.
KEY CONCEPT Entire genomes are sequenced, studied, and compared.
Projects from the D. grimshawi
KEY CONCEPT Entire genomes are sequenced, studied, and compared.
Identify D. melanogaster ortholog
Explore Evolution: Instrument for Analysis
Introduction to Sequencing
Sequence the 3 billion base pairs of human
Evolution of Genomes Chapter 21.
KEY CONCEPT Entire genomes are sequenced, studied, and compared.
Human Genome Project Seminal achievement. Scientific milestone.
KEY CONCEPT Entire genomes are sequenced, studied, and compared.
A Lot More Advanced Biotechnology Tools
Presentation transcript:

Drosophila Genomics Where are we now? Where are we going? Christopher Shaffer, Wilson Leung, Sarah Elgin Dept of Biology; Washington University in St. Louis

Chris Shaffer; Washington University 10 DAYS Inexpensive and easy to culture Simple genome, good reference sequence Metazoan: development, behavior, genetic disease Many species: population and evolution data wt gene mutant

Chris Shaffer; Washington University Most Drosophila species have a “dot” chromosome Dot has two parts Centromere Unbanded Banded

Chris Shaffer; Washington University Genomic sequencing Multiple steps to get to high quality data 1.Produce raw data –streamlined; robotic; cheap –good coverage; still many gaps; low quality regions 2.Finishing/sequence improvement –semi-automated and manual labor –expensive, slow

Chris Shaffer; Washington University 12 Drosophila genomes 12 Drosophila genomes have been sequenced. Only melanogaster has been “finished”. All others have been sequenced to much lower quality using a whole genome shotgun approach without any finishing or sequence improvement; many gaps and low quality regions remain.

Chris Shaffer; Washington University Research: Focus on Five Species Drosophila melanogaster Drosophila erecta Drosophila mojavensis Drosophila virilis Drosophila grimshawi

Chris Shaffer; Washington University Drosophila grimshawi After whole genome shotgun sequencing and assembly: scaffolds 214,362,671 total bases 5869 gaps 11,750,600 bases Average gap size: 2002 bases

Chris Shaffer; Washington University D. grimshawi “dot” chromosome Need to find which of the scaffolds make up the dot chromosome –Observations from our sequencing of Dvir suggest most “dot” genes in Dmel will stay on the dot through evolution. –Do BLAST similarity searches –Results identify two scaffolds, and 24861

Chris Shaffer; Washington University D. grimshawi scaffold ~1.0 million bases long Dmel dot gene similarities in entire scaffold This is the first region that will be the focus of the GEP 68 gaps; ~3.27% data missing in these gaps Low quality? Missasemblies?

Chris Shaffer; Washington University D. grimshawi scaffold ~0.1 million bases long Dmel dot gene similarities in the beginning 94kb This is the second region that will be the focus of the GEP 11 gaps; ~4.75% data missing in these gaps Low quality? Missasemblies?

Chris Shaffer; Washington University Obtaining data All reads generated by the NIH-funded genome centers are deposited in the NCBI trace archive

Chris Shaffer; Washington University Drosophila grimshawi traces There are ~3 million reads for D.grimshawi

Chris Shaffer; Washington University The assembly of D. grimshawi The assembly is also available for download. This includes a file with the position of every read. This is the “reads.placed” file. Problem: Trace archive names  read names

Chris Shaffer; Washington University Finding reads for scaffold List of read positions Info on traces Local database of traces with relevant information Trace archive

Chris Shaffer; Washington University Divide and conquer Scaffolds 25011, assembly of 3kb, 10kb and 40kb clones 41 fosmids (40 kb clones) make a “golden path” Reads placed FosmidsTraceProject

Chris Shaffer; Washington University Obtain fosmids (DNA) Fosmids can be ordered from the Drosophila Genomics Resource Center

Chris Shaffer; Washington University Where are we now? 1.Fosmids: DNA ready for sequencing 2.Raw data: “projects” sorted by fosmid 3.Students: learning to finish Time to do some genomics!