Bioinformatics at Molecular Epidemiology - new tools for identifying indels in sequencing data
Kai Ye

Data collection for osteoarthritis, cardiovascular disease and longevity
– Serum parameters
– Cellular characteristics (biobank)
– Skin ageing
– Glycosylation
– Metabonomic
– Transcriptomic
– Genetic (GWAS/sequence)
– Epigenetic
Data integration

Analyses: genetic & epigenetic analyses; biochemical analyses; expression analysis; metabonomic analysis; glycosylation; cell responses
Statistical analysis: Joost Kok, Erik vd Akker, Kai Ye

About me
1995 – 2003: B.S. and M.S. in biology and pharmaceutical science
2004 – 2008: PhD cum laude at Leiden University. Thesis title: Novel algorithms for protein sequence analysis
2008 – 2009: Postdoc at the European Bioinformatics Institute, collaborating with scientists at the Sanger Institute
Currently: assistant professor at MolEpi

A Pindel approach for identifying indels in next-gen sequencing data
– Paired-end reads in next-gen sequencing
– Indel detection algorithms
– Pindel
– Cancer genome project
– 1000 genomes project

Paired-end reads in next-generation sequencing (figure label: ~ insert size)

Mapping paired-end reads (figure label: SNP)
– CNVs: copy number variations
– INDELs: insertions and deletions
– SVs: structural variations

Gapped alignment for small indels
ATCCGTATCACGGTCA-CAGATCAGTCCAGT
ATCCGTATCACGGTCAGCAGATCAGTCCAGT
(the gap marks the indel)
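The gapped alignment on this slide can be read off mechanically. The sketch below is illustrative only: it assumes the ungapped line is the reference and the line containing "-" is the read (the slide does not say which is which), and the function name find_indels is hypothetical.

```python
def find_indels(ref_aln, read_aln):
    """Scan two equal-length aligned strings and report each gap run.

    Returns a list of (kind, ref_pos, bases), where ref_pos is the
    0-based position in the ungapped reference.
    """
    if len(ref_aln) != len(read_aln):
        raise ValueError("aligned strings must have the same length")
    events = []
    ref_pos = 0
    i, n = 0, len(ref_aln)
    while i < n:
        r, s = ref_aln[i], read_aln[i]
        if s == "-" and r != "-":
            # Gap in the read: reference bases are missing from the read.
            j = i
            while j < n and read_aln[j] == "-" and ref_aln[j] != "-":
                j += 1
            events.append(("deletion", ref_pos, ref_aln[i:j]))
            ref_pos += j - i
            i = j
        elif r == "-" and s != "-":
            # Gap in the reference: the read carries extra bases.
            j = i
            while j < n and ref_aln[j] == "-" and read_aln[j] != "-":
                j += 1
            events.append(("insertion", ref_pos, read_aln[i:j]))
            i = j
        else:
            if r != "-":
                ref_pos += 1
            i += 1
    return events


# The two lines from the slide, with the ungapped one taken as reference.
reference = "ATCCGTATCACGGTCAGCAGATCAGTCCAGT"
read = "ATCCGTATCACGGTCA-CAGATCAGTCCAGT"
print(find_indels(reference, read))  # [('deletion', 16, 'G')]
```

Under the opposite assumption (gapped line is the reference), swapping the two arguments reports the same event as a 1-base insertion.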

Read-depth for CNVs

Read-pair approach for SVs (figure panels: no indel, deletion, insertion; each panel compares sample and reference)
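The three panels can be summarised as a comparison between the distance at which a pair maps on the reference and the insert size expected from the library. The sketch below assumes the insert size is described by a mean and a standard deviation and uses a hypothetical cutoff of three standard deviations.

```python
def classify_pair(mapped_span, mean_insert, sd_insert, n_sd=3.0):
    """Classify one read pair by its span on the reference.

    mapped_span : distance between the two mapped ends on the reference
    mean_insert : expected insert size of the library
    sd_insert   : standard deviation of the insert size
    n_sd        : tolerated deviations before a pair counts as discordant
                  (hypothetical default)
    """
    if mapped_span > mean_insert + n_sd * sd_insert:
        # Ends map further apart than expected: the sample lacks sequence.
        return "deletion"
    if mapped_span < mean_insert - n_sd * sd_insert:
        # Ends map closer than expected: the sample has extra sequence.
        return "insertion"
    return "no indel"


for span in (200, 500, 120):
    print(span, classify_pair(span, mean_insert=200, sd_insert=10))
```

A single discordant pair is weak evidence; read-pair methods typically require several pairs supporting the same event.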

Mapping paired-end reads: read-pairs; read-depth; SNP or small indel

Pindel: Deletions (18 May)
Figure: test read against ref; deletion sizes from 1 base to 1 million bases
Labels added over successive builds of the figure:
– Anchor (on the ref)
– 2 x average distance
– Expected maximum deletion size + read length (36)
– reference and sample
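One way to read the labels on the deletion slides: the mapped end of a pair serves as the anchor, one end of the unmapped read is searched for within 2 x average distance of the anchor, and the rest of the read is then searched for within the expected maximum deletion size plus the read length. The sketch below follows that reading on a toy sequence. It is a simplification and not Pindel's implementation: it uses plain substring search (str.find) where Pindel uses a pattern growth algorithm, it works on one strand only, and all names and parameters are hypothetical.

```python
def find_unique(text, pattern, start, end):
    """Return the index of pattern in text[start:end] if it occurs exactly once, else -1."""
    first = text.find(pattern, start, end)
    if first == -1:
        return -1
    if text.find(pattern, first + 1, end) != -1:
        return -1
    return first


def find_deletion(ref, read, anchor_pos, avg_distance, max_del, min_piece=4):
    """Split an unmapped read in two and place both pieces around a deletion.

    Returns (deletion_start, deletion_size) in ref coordinates, or None.
    """
    near_end = min(len(ref), anchor_pos + 2 * avg_distance)
    # Try the longest left piece first, then shorter ones.
    for k in range(len(read) - min_piece, min_piece - 1, -1):
        left, right = read[:k], read[k:]
        left_at = find_unique(ref, left, anchor_pos, near_end)
        if left_at == -1:
            continue
        left_end = left_at + k
        far_end = min(len(ref), left_end + max_del + len(read))
        right_at = find_unique(ref, right, left_end, far_end)
        if right_at == -1:
            continue
        size = right_at - left_end
        if size > 0:
            return (left_end, size)
    return None


# Toy reference; the sample lacks the 8 bases "TTTTCCCC" (ref positions 17..24).
ref = "CCATGGATTACAGGCTATTTTCCCCAGCGTGAACTGGTTGAC"
read = "ACAGGCTA" + "AGCGTGAA"  # read spanning the breakpoint
print(find_deletion(ref, read, anchor_pos=0, avg_distance=20, max_del=10))  # (17, 8)
```

Restricting both searches to small windows around the anchor is what keeps short pieces of a 36-base read from matching elsewhere in the genome by chance.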

African male: NA18507 (Bentley et al., Nature)
– Gb of sequence, ~4 billion paired 35-base reads
– After preprocessing: 56,161,333 pairs of one-end mapped reads
Pindel
– 142, bp insertions
– 162,068 1bp-10kb deletions

Deletion size distribution

Applications
– Cancer genome project
– 1000 genomes project

Cancer genome: COLO-829 cells
– Normal: ~30x paired-end 100bp reads
– Tumor: ~40x paired-end 100bp reads
– Search for somatic (tumor-specific) indels
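The search for somatic indels amounts to keeping calls seen in the tumor but not in the matched normal. A minimal sketch, assuming each call is a (chromosome, position, kind, size) tuple; the tuple layout and the example calls are made up for illustration.

```python
def somatic_indels(tumor_calls, normal_calls):
    """Return tumor calls that are absent from the matched normal sample.

    Each call is a hashable tuple, here (chromosome, position, kind, size).
    Calls present in both samples are treated as germline and dropped.
    """
    germline = set(normal_calls)
    return [call for call in tumor_calls if call not in germline]


# Hypothetical calls, not data from the talk.
normal = [("chr1", 1000, "deletion", 3), ("chr2", 500, "insertion", 1)]
tumor = [
    ("chr1", 1000, "deletion", 3),   # shared with normal: germline
    ("chr7", 4200, "deletion", 15),  # tumor only: somatic candidate
]
print(somatic_indels(tumor, normal))  # [('chr7', 4200, 'deletion', 15)]
```

Exact matching is the simplest choice; with lower coverage in the normal (~30x versus ~40x here), a germline indel can be missed in the normal and appear somatic, so candidates usually need further checking.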

1000 Genomes Project
– Pilot 1: 180 people of 3 major geographic groups (YRI, CEU, CHB and JPT) at low coverage (~4x)
– Pilot 2: the genomes of two families (CEU and YRI; both parents and an adult child) with deep coverage (20x per genome)
– Pilot 3: sequencing the coding regions (exons) of 1,000 genes in 1,000 people with deep coverage (20x)