NHGRI/NCBI Short-Read Archive: Data Retrieval Gabor T. Marth Boston College Biology Department NCBI/NHGRI Short-Read.

Slides:



Advertisements
Similar presentations
Next-Generation Sequencing: Methodology and Application
Advertisements

Base quality and read quality: How should data quality be measured? Gabor T. Marth Boston College Biology Department 1000 Genomes Meeting Cold Spring Harbor.
SOLiD Sequencing & Data
Next-generation sequencing and PBRC. Next Generation Sequencer Applications DeNovo Sequencing Resequencing, Comparative Genomics Global SNP Analysis Gene.
Bioinformatics for high-throughput DNA sequencing Gabor Marth Boston College Biology New grad student orientation Boston College September 8, 2009.
Biology and Bioinformatics Gabor T. Marth Department of Biology, Boston College BI820 – Seminar in Quantitative and Computational Problems.
General methods of SNP discovery: PolyBayes Gabor T. Marth Department of Biology Boston College Chestnut Hill, MA
Computational Tools for Finding and Interpreting Genetic Variations Gabor T. Marth Department of Biology, Boston College
Next-generation sequencing: informatics & software aspects Gabor T. Marth Boston College Biology Department.
The SOLiD System: Next-Generation Sequencing Overview of the SOLiD System –  Scalable  Accurate Ultra High Throughput  Flexible  Mate Pairs.
Sequencing Informatics Gabor T. Marth Department of Biology, Boston College BI420 – Introduction to Bioinformatics.
Bioinformatics for next-generation DNA sequencing Gabor T. Marth Boston College Biology Department BC Biology new graduate student orientation September.
Evolutionary Genome Biology Gabor T. Marth, D.Sc. Department of Biology, Boston College Medical Genomics Course – Debrecen, Hungary, May 2006.
Next-generation sequencing: informatics & software aspects Gabor T. Marth Boston College Biology Department.
BI420 – Course information Web site: Instructor: Gabor Marth Teaching.
Sequencing Informatics Gabor T. Marth Department of Biology, Boston College BI420 – Introduction to Bioinformatics.
Informatics tools for next-generation sequence analysis Gabor T. Marth Boston College Biology Department University of Michigan October 20, 2008.
A Contract Research and Services Organization. Ideas to Life! A Contract Research and Services Organization  Xcelris is a Specialty Contract Research.
Polymorphism discovery informatics Gabor T. Marth Department of Biology Boston College Chestnut Hill, MA
Sequence Variation Informatics Gabor T. Marth Department of Biology, Boston College BI420 – Introduction to Bioinformatics.
Informatics for next-generation sequence analysis – SNP calling Gabor T. Marth Boston College Biology Department PSB 2008 January
Informatics challenges and computer tools for sequencing 1000s of human genomes Gabor T. Marth Boston College Biology Department Cold Spring Harbor Laboratory.
Diabetes and Endocrinology Research Center The BCM Microarray Core Facility: Closing the Next Generation Gap Alina Raza 1, Mylinh Hoang 1, Gayan De Silva.
Next Now-Generation Genomics: methods and applications for modern disease research Aaron J. Mackey, Ph.D. Center for Public Health.
Whole Exome Sequencing for Variant Discovery and Prioritisation
Influenza Research Database (IRD): A Web-based Resource for Influenza Virus Data and Analysis Victoria Hunt 1 *, R. Burke Squires 1, Jyothi Noronha 1,
ARC Biotechnology Platform: Sequencing for Game Genomics Dr Jasper Rees
Todd J. Treangen, Steven L. Salzberg
Computational research for medical discovery at Boston College Biology Gabor T. Marth Boston College Department of Biology
Genetics-multistep tumorigenesis genomic integrity & cancer Sections from Weinberg’s ‘the biology of Cancer’ Cancer genetics and genomics Selected.
Fig Chapter 12: Genomics. Genomics: the study of whole-genome structure, organization, and function Structural genomics: the physical genome; whole.
June 11, 2013 Intro to Bioinformatics – Assembling a Transcriptome Tom Doak Carrie Ganote National Center for Genome Analysis Support.
High throughput sequencing: informatics & software aspects Gabor T. Marth Boston College Biology Department BI543 Fall 2013 January 29, 2013.
DAN LAWSON BRC 2011 – ANNUAL MEETING UT SOUTHWESTERN MEDICAL CENTER DALLAS, TX SEPTEMBER 2011 Challenges and opportunities of new sequencing technologies.
Next Generation DNA Sequencing
Remember the limitations? –You must know the sequence of the primer sites to use PCR –How do you go about sequencing regions of a genome about which you.
Announcements: Proposal resubmission deadline 4/23 (Thursday).
3/24/2005 TIGP 1 Bioinformatics for Microarray Studies at IBS Pei-Ing Hwang, Ph.D. Mar. 24, 2005.
Gene Expression Omnibus (GEO)
Lecture 6. Functional Genomics: DNA microarrays and re-sequencing individual genomes by hybridization.
Two powerful transgenic techniques Addition of genes by nuclear injection Addition of genes by nuclear injection Foreign DNA injected into pronucleus of.
 CHANGE!! MGL Users Group meetings will now be on the 1 st Monday of each month 3:00-4:00 Room Note the change of time and room.
No reference available
Informatics challenges for next-generation sequence analysis
Variant calling: number of individuals vs. depth of read coverage Gabor T. Marth Boston College Biology Department 1000 Genomes Meeting Cold Spring Harbor.
Next-generation sequencing: the informatics angle
Biotech. Southern Blotting Through a series of steps, DNA that has been separated by electrophoresis is applied to a membrane of nylon or nitrocellulose.
First of all: “Darnit Jim, I’m a doctor not a bioinformatician!”
Next-generation sequencing: the informatics angle Gabor T. Marth Boston College Biology Department CHI Next-Generation Data Analysis meeting Providence,
Computational Biology and Genomics at Boston College Biology Gabor T. Marth Department of Biology, Boston College
SNP Discovery in Whole-Genome Light-Shotgun 454 Pyrosequences Aaron Quinlan 1, Andrew Clark 2, Elaine Mardis 3, Gabor Marth 1 (1) Department of Biology,
Chapter 5 Sequence Assembly: Assembling the Human Genome.
Gene Technologies and Human ApplicationsSection 3 Section 3: Gene Technologies in Detail Preview Bellringer Key Ideas Basic Tools for Genetic Manipulation.
INTERPRETING GENETIC MUTATIONAL DATA FOR CLINICAL ONCOLOGY Ben Ho Park, M.D., Ph.D. Associate Professor of Oncology Johns Hopkins University May 2014.
A brief guide to sequencing Dr Gavin Band Wellcome Trust Advanced Courses; Genomic Epidemiology in Africa, 21 st – 26 th June 2015 Africa Centre for Health.
Reliable Identification of Genomic Variants from RNA-seq Data Robert Piskol, Gokul Ramaswami, Jin Billy Li PRESENTED BY GAYATHRI RAJAN VINEELA GANGALAPUDI.
Canadian Bioinformatics Workshops
From Reads to Results Exome-seq analysis at CCBR
Research Techniques Made Simple: Next-Generation Sequencing:
Introduction to Bioinformatics Resources for DNA Barcoding
The is a Critical Resource for Developing and Refining Trait-Predictive DNA Tests Cameron Peace, Daniel Edge-Garza, Terry Rowland, Paul Sandefur.
Phylogeny - based on whole genome data
Section 3: Gene Technologies in Detail
Rod Eyles1, John Juma1, Morag Ferguson1, Trushar Shah1 1 IITA, Nairobi
DNA-based technology New and old technologies that are utilized in biotechnology DNA cloning DNA libraries Polymerase chain reaction (PCR) Genome sequencing.
Discovery tools for human genetic variations
Genome organization and Bioinformatics
BI820 – Seminar in Quantitative and Computational Problems in Genomics
Biological Databases BI420 – Introduction to Bioinformatics
Schematic representation of a transcriptomic evaluation approach.
Presentation transcript:

NHGRI/NCBI Short-Read Archive: Data Retrieval Gabor T. Marth Boston College Biology Department NCBI/NHGRI Short-Read Archive meeting Bethesda, MD, July 27, 2007

What level of raw data detail is required? images processed traces (color intensities) base sequence + quality values 3 rd party image analysis? 3 rd party base callers?

What metadata is needed for specific analyses? Alignment / Assembly: Attempted read length Potential sequence clipping (quality; vector, linker, primer, barcode) Reference genome sequence for re-sequencing applications (genome / transcriptome; whole genome, target region) General: Machine HW and read processing SW versions Organism Library construction details Genomic DNA, cDNA, bisulfite-treated DNA, etc.? Single-end or paired-end reads?

Metadata (cont’d) SNP discovery: Base quality values Source DNA (diploid genome / PCR vs. clonal; single individual vs. pooled) Sample phenotype / disease status (tumor / normal) Ethnicity? Most read attributes are shared within lane / run; very few individual read-specific attributes. Are read names needed? Structural variation detection: Fragment length range Mate-pair relationship Ploidy

Granularity – atomic units of retrieval single read a lane a run multiple lanes/runs from an individual

Data presentation – searching and browsing How will data users find relevant datasets?

Context-driven retrieval Concessions to serve reads that align to a specific region (e.g. gene) potentially from a number of different runs/lanes?

Retrieval mechanisms Web-based Programmatic (tied in with data views?)

Connection to assembly archive & data formats How to make the connection between reads in the read archive and the assembly archive? Application support for short-read data manipulation (data access libraries)?