Bioinformatics at Molecular Epidemiology - new tools for identifying indels in sequencing data Kai Ye


1 Bioinformatics at Molecular Epidemiology - new tools for identifying indels in sequencing data Kai Ye k.ye@lumc.nl

2 Data collection for osteoarthritis, cardiovascular disease and longevity: serum parameters; cellular characteristics (biobank); skin ageing; glycosylation; metabonomic; transcriptomic; genetic (GWAS/sequence); epigenetic; data integration

3 Genetic & epigenetic analyses; biochemical analyses; expression analysis; metabonomic analysis; glycosylation; cell responses; Joost Kok, Erik vd Akker, Kai Ye; statistical analysis

4 About me. 1995 – 2003: B.S. and M.S. in biology and pharmaceutical science. 2004 – 2008: PhD, cum laude, at Leiden University; thesis title: Novel algorithms for protein sequence analysis. 2008 – 2009: postdoc at the European Bioinformatics Institute, collaborating with scientists at the Sanger Institute. Currently: assistant professor at MolEpi

5 A Pindel approach for identifying indels in Next-Gen sequencing data: paired-end reads in Next-Gen sequencing; indel detection algorithms; Pindel; Cancer genome project; 1000 genomes project

6 Paired-end reads in Next Generation sequencing ~ insert size

7 Mapping paired-end reads. SNP; CNVs: copy number variations; INDELs: insertions and deletions; SVs: structural variations

8 Gapped alignment for small indels (the gap marks the indel)
ATCCGTATCACGGTCA-CAGATCAGTCCAGT
ATCCGTATCACGGTCAGCAGATCAGTCCAGT
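
The gapped alignment on this slide can be read off programmatically. The sketch below is a minimal illustration, not part of the original talk: it assumes the row containing the gap is the sequenced read and the other row is the reference (the slide does not label them), and the function name `find_indels` is hypothetical.

```python
def find_indels(ref_aln: str, read_aln: str):
    """Report gap runs in a pairwise gapped alignment.

    Assumption: both rows have equal length and '-' marks a gap.
    A gap in the read row is reported as a deletion, a gap in the
    reference row as an insertion. Positions are 0-based reference
    coordinates.
    """
    if len(ref_aln) != len(read_aln):
        raise ValueError("aligned rows must have the same length")
    events = []
    ref_pos = 0  # number of reference bases consumed so far
    i, n = 0, len(ref_aln)
    while i < n:
        if read_aln[i] == "-":
            j = i
            while j < n and read_aln[j] == "-":
                j += 1
            events.append(("deletion", ref_pos, ref_aln[i:j]))
            ref_pos += j - i
            i = j
        elif ref_aln[i] == "-":
            j = i
            while j < n and ref_aln[j] == "-":
                j += 1
            events.append(("insertion", ref_pos, read_aln[i:j]))
            i = j
        else:
            ref_pos += 1
            i += 1
    return events


# The two rows from the slide (row roles are an assumption).
reference = "ATCCGTATCACGGTCAGCAGATCAGTCCAGT"
read = "ATCCGTATCACGGTCA-CAGATCAGTCCAGT"
print(find_indels(reference, read))
```

Under that assumption the single gap is reported as a 1-base deletion of `G` after the first 16 reference bases.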

9 Read-depth for CNVs
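
As a rough illustration of the read-depth idea, the sketch below averages per-base depth in fixed windows and flags windows whose depth departs from the median window. The window size, the ratio thresholds and the function name `call_cnv_windows` are assumptions made for this example and are not taken from the talk.

```python
from statistics import median


def call_cnv_windows(depths, window=100, loss_ratio=0.75, gain_ratio=1.25):
    """Flag fixed-size windows with unusually low or high mean depth.

    depths: per-base read depth along one chromosome.
    Returns (start, end, label) tuples with 0-based, end-exclusive
    coordinates. Thresholds are illustrative assumptions.
    """
    starts = list(range(0, len(depths), window))
    means = []
    for s in starts:
        chunk = depths[s:s + window]
        means.append(sum(chunk) / len(chunk))
    baseline = median(means)
    if baseline == 0:
        raise ValueError("median window depth is zero; cannot normalise")
    calls = []
    for s, m in zip(starts, means):
        ratio = m / baseline
        end = min(s + window, len(depths))
        if ratio < loss_ratio:
            calls.append((s, end, "loss"))
        elif ratio > gain_ratio:
            calls.append((s, end, "gain"))
    return calls


# Toy depth profile: 30x background, one half-depth and one double-depth window.
toy = [30] * 300 + [15] * 100 + [30] * 300 + [60] * 100 + [30] * 200
print(call_cnv_windows(toy))
```

Real read-depth callers also correct for GC content and mappability, which this sketch leaves out.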

10 Read-pair approach for SVs. Three panels, each comparing sample with reference: no indel; deletion; insertion
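
The three cases on this slide come down to comparing the distance at which the two mates map on the reference with the expected insert size. The sketch below is an assumption-laden illustration: the cut-off of three standard deviations and the function name `classify_pair` are choices made here, not values from the talk.

```python
def classify_pair(mapped_distance, mean_insert, sd_insert, n_sd=3.0):
    """Classify one read pair by its mapped distance on the reference.

    If the sample lacks sequence that the reference has (a deletion),
    the mates map farther apart than expected. If the sample carries
    extra sequence (an insertion), they map closer together.
    The n_sd cut-off is an illustrative assumption.
    """
    upper = mean_insert + n_sd * sd_insert
    lower = mean_insert - n_sd * sd_insert
    if mapped_distance > upper:
        return "deletion"
    if mapped_distance < lower:
        return "insertion"
    return "no indel"


for distance in (200, 500, 120):
    print(distance, classify_pair(distance, mean_insert=200, sd_insert=10))
```

A single pair is weak evidence; in practice several discordant pairs supporting the same event would be required before making a call.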

11 Mapping paired-end reads: read-pairs; read-depth; SNP or small indel

12 Mapping paired-end reads: read-pairs; read-depth; SNP or small indel

13 Pindel: Deletions (test vs. ref; 1 base - 1 million bases)

14 18 May 2015 Pindel: Deletions (ref; anchor)

15 Pindel: Deletions (ref; anchor; 2 x average distance)

16 Pindel: Deletions (ref; anchor; 2 x average distance; expected maximum deletion size + read length (36))

17 Pindel: Deletions (reference; sample)
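
The deletion slides outline a split-read search: one mate maps and serves as the anchor, one end of the unmapped mate is looked for within 2 x the average distance of the anchor, and the other end is looked for within the expected maximum deletion size plus the read length. The sketch below illustrates that idea with plain substring search on a single strand. It is not Pindel's implementation, which uses a pattern growth algorithm; the function name `find_deletion`, its parameters and the toy sequences are all hypothetical.

```python
def find_deletion(ref, read, anchor_pos, avg_distance, max_del):
    """Brute-force split-read search for one deletion (single strand).

    ref: reference sequence; read: the unmapped mate;
    anchor_pos: 0-based reference position of the mapped mate.
    Returns (deletion_start, deletion_length) in 0-based reference
    coordinates, or None if no unique split is found.
    """
    read_len = len(read)
    win_start = anchor_pos
    win_end = min(len(ref), anchor_pos + 2 * avg_distance)
    # Try the longest left part first, then shorter ones.
    for split in range(read_len - 1, 0, -1):
        left, right = read[:split], read[split:]
        left_at = ref.find(left, win_start, win_end)
        if left_at == -1:
            continue
        if ref.find(left, left_at + 1, win_end) != -1:
            continue  # left part is not unique near the anchor
        del_start = left_at + split
        far_end = min(len(ref), del_start + max_del + read_len)
        right_at = ref.find(right, del_start + 1, far_end)
        if right_at == -1:
            continue
        if ref.find(right, right_at + 1, far_end) != -1:
            continue  # right part is not unique in the search range
        return del_start, right_at - del_start
    return None


# Toy reference: 20-base left flank, 10 deleted bases, 20-base right flank.
left_flank = "ACGTTGCAAGCTTAGGCTCA"
deleted = "TTTTTTTTTT"
right_flank = "GGATCCGATTACAGGCATCG"
toy_ref = left_flank + deleted + right_flank
# A read from the sample spans the breakpoint and skips the deleted bases.
toy_read = left_flank[-10:] + right_flank[:10]
print(find_deletion(toy_ref, toy_read, anchor_pos=0, avg_distance=20, max_del=100))
```

Restricting both searches to small windows around the anchor is what keeps the split-read search tractable for short reads, since a 36-base read split in two would rarely be unique across a whole genome.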

18 African male NA18507 (Bentley et al., Nature 2008): 135 Gb of sequence; ~4 billion paired 35-base reads. After preprocessing: 56,161,333 pairs of one-end mapped reads. Pindel: 142,908 1-16 bp insertions; 162,068 1 bp-10 kb deletions

19 Deletion size distribution

20 Applications: Cancer genome project; 1000 genomes project

21 Cancer genome: COLO-829 cells. Normal: ~30x paired-end 100 bp reads. Tumor: ~40x paired-end 100 bp reads. Search for somatic (tumor-specific) indels
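
In its simplest form, searching for tumor-specific indels means keeping the calls made in the tumor sample that are absent from the matched normal. The sketch below shows only that filtering step. The tuple layout `(chromosome, position, type, length)`, the example calls and the function name `somatic_indels` are hypothetical; they are not data or code from the talk.

```python
def somatic_indels(tumor_calls, normal_calls):
    """Keep tumor calls that have no identical call in the normal.

    Each call is a hashable tuple such as
    (chromosome, position, type, length). Exact matching is a
    simplification: real pipelines tolerate small breakpoint
    shifts and check read support in the normal sample.
    """
    normal = set(normal_calls)
    return [call for call in tumor_calls if call not in normal]


# Made-up example calls for illustration only.
tumor = [("chr1", 1000, "DEL", 5), ("chr2", 500, "INS", 2), ("chr3", 42, "DEL", 1)]
normal = [("chr1", 1000, "DEL", 5)]
print(somatic_indels(tumor, normal))
```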

22

23 1000 Genomes Project. Pilot 1: 180 people from 3 major geographic groups (YRI, CEU, CHB and JPT) at low coverage (~4x). Pilot 2: the genomes of two families (CEU and YRI; both parents and an adult child) at deep coverage (20x per genome). Pilot 3: the coding regions (exons) of 1,000 genes in 1,000 people at deep coverage (20x).

24 www.ebi.ac.uk/~kye/pindel k.ye@lumc.nl

