Exploiting SNP polymorphism data. A. Dereeper, G. Sarah, F. Sabot, Y. Hueber. Formation Bio-informatique (bioinformatics training course), 9-13 February 2015.


Tablet
Graphical tool to visualize assemblies. Accepts many formats: ACE, SAM, BAM.

GATK (Genome Analysis ToolKit)
Software package to analyse NGS data. Developed to analyse human resequencing data for medical purposes (1000 Genomes, The Cancer Genome Atlas).
Includes: depth analyses, quality score recalibration, SNP/InDel detection.
Complementary with other packages: SAMtools, Picard Tools, VCFtools, BEDTools.
PREPROCESS:
* Index the human genome (Picard); we used HG18 from UCSC.
* Convert Illumina reads to Fastq format.
* Convert Illumina 1.6 read quality scores to standard Sanger scores.
FOR EACH SAMPLE:
1. Align samples to the genome (BWA); this generates SAI files.
2. Convert SAI to SAM (BWA).
3. Convert SAM to the binary BAM format (SAMtools).
4. Sort the BAM (SAMtools).
5. Index the BAM (SAMtools).
6. Identify target regions for realignment (Genome Analysis Toolkit).
7. Realign the BAM to improve Indel calling (Genome Analysis Toolkit).
8. Reindex the realigned BAM (SAMtools).
9. Call Indels (Genome Analysis Toolkit).
10. Call SNPs (Genome Analysis Toolkit).
11. View aligned reads in BAM/BAI (Integrative Genomics Viewer).
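
Steps 1 to 5 of the per-sample list above can be sketched as a small Python helper that only builds the command lines, without running them. This is an illustration under assumptions, not the trainers' script: the file naming scheme, the single-end `bwa aln`/`bwa samse` mode and the SAMtools 1.x option style (`-o`) are choices made for the example, and the reference is assumed to be indexed beforehand with `bwa index`.

```python
from typing import List


def per_sample_commands(sample: str, reference: str) -> List[List[str]]:
    """Build (but do not run) the commands for steps 1 to 5 for one sample.

    File names are hypothetical: <sample>.fastq is assumed to exist.
    """
    fastq = f"{sample}.fastq"
    sai = f"{sample}.sai"
    sam = f"{sample}.sam"
    bam = f"{sample}.bam"
    sorted_bam = f"{sample}.sorted.bam"
    return [
        ["bwa", "aln", "-f", sai, reference, fastq],         # 1. align, writes SAI
        ["bwa", "samse", "-f", sam, reference, sai, fastq],  # 2. SAI to SAM
        ["samtools", "view", "-b", "-o", bam, sam],          # 3. SAM to BAM
        ["samtools", "sort", "-o", sorted_bam, bam],         # 4. sort BAM
        ["samtools", "index", sorted_bam],                   # 5. index BAM
    ]


# Print the commands for one sample so they can be reviewed before running.
for command in per_sample_commands("RC1", "reference.fasta"):
    print(" ".join(command))
```

Each inner list could be passed to `subprocess.run(command, check=True)` once the tools are installed; exact options depend on the installed BWA and SAMtools versions.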

Workflow diagram: for each sample (RC1, RC2, RC3, RC4, ...), Fastq -> Cutadapt -> Mapping BWA -> Add or Replace Groups -> BAM with read group. All per-sample BAMs -> mergeSam -> Global BAM with read group -> VCF file.

VCF format (Variant Call Format)
##fileformat=VCFv4.0
##fileDate=
##source=myImputationProgramV3.1
##reference=1000GenomesPilot-NCBI36
##phasing=partial
##INFO=
##FILTER=
##FORMAT=
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA
rs G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,
T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3
Purpose: describes the variation at each position, together with the genotype assignments.
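
To make the column layout concrete, here is a minimal Python sketch that splits one VCF data line into its fixed columns, its INFO key/value pairs and its per-sample genotype fields. The record used is illustrative only: the chromosome, position and sample names (`S1`, `S2`) are invented for the example, and real files are usually better read with a dedicated VCF library.

```python
def parse_vcf_line(line, sample_names):
    """Parse one tab-separated VCF data line into a dictionary."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, var_id, ref, alt, qual, flt, info, fmt = fields[:9]

    # INFO is a ';'-separated list of KEY=VALUE pairs or bare flags (e.g. DB).
    info_dict = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        info_dict[key] = value if sep else True

    # FORMAT names the ':'-separated sub-fields of every sample column.
    keys = fmt.split(":")
    genotypes = {
        name: dict(zip(keys, column.split(":")))
        for name, column in zip(sample_names, fields[9:])
    }
    return {
        "chrom": chrom, "pos": int(pos), "id": var_id, "ref": ref,
        "alt": alt.split(","), "qual": qual, "filter": flt,
        "info": info_dict, "genotypes": genotypes,
    }


# Hypothetical record: coordinates and sample names are made up.
example = "chr1\t100\t.\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2\tGT:GQ:DP:HQ\t0|0:48:1:51,51\t1|0:48:8:51,51"
record = parse_vcf_line(example, ["S1", "S2"])
print(record["info"]["DP"], record["genotypes"]["S2"]["GT"])
```

In the GT sub-field, `|` marks a phased genotype and `/` an unphased one, which is why the separator is kept rather than split away.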

Other GATK functionalities
Module DepthOfCoverage: reports the sequencing depth for each gene, each position and each individual.
Module ReadBackedPhasing: establishes, where possible, the associations between alleles (phase and haplotypes) at heterozygous sites.
Figure label: "and not AGG, GGA".

Pileup format
- Another format for variant calling (generated by samtools)
- Describes the alignment position by position (column by column), not read by read (line by line) as in the SAM format
- Used by VarScan-like software (varscan pileup2snp)
- Frequently used for rare, low-frequency variants (e.g. viral populations)
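
The position-by-position layout is what makes low-frequency variants easy to spot. The sketch below counts the bases in the read-bases column of one pileup line, following the usual samtools conventions (`.` and `,` match the reference, `^` plus one character marks a read start, `$` a read end, `+n`/`-n` introduce indel sequences). It is a simplified illustration, not VarScan's algorithm, and the example string is invented.

```python
import re
from collections import Counter


def count_pileup_bases(ref_base: str, bases: str) -> Counter:
    """Count the alleles observed at one position of a pileup line."""
    counts = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":            # read start: skip '^' and the mapping-quality char
            i += 2
        elif c in "$<>":        # read end or reference skip: no base to count
            i += 1
        elif c in "+-":         # indel: skip the length digits and the sequence
            length = re.match(r"\d+", bases[i + 1:]).group()
            i += 1 + len(length) + int(length)
        else:
            if c in ".,":       # match to the reference on either strand
                counts[ref_base.upper()] += 1
            else:               # mismatch (A/C/G/T/N) or '*' deletion placeholder
                counts[c.upper()] += 1
            i += 1
    return counts


# Hypothetical read-bases column for a position whose reference base is G.
counts = count_pileup_bases("G", "..,A^F.a$,+2AT.")
depth = sum(counts.values())
print(counts, "alt frequency:", counts["A"] / depth)
```

With a frequency per allele in hand, a minimum-frequency or minimum-count cutoff can then be applied; the right cutoff depends on depth and error rate and is not given in the slides.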

Gigwa project, for managing massive variant data (GBS, RADSeq, WGRS)
"With NGS arise serious computational challenges in terms of storage, search, sharing, analysis, and data visualization, that redefine some practices in data management."
- Based on NoSQL technology
- Handles VCF files (Variant Call Format) and annotations
- Supports multiple variant types: SNPs, InDels, SSRs, SV
- Powerful genotyping queries
- Easily scalable with MongoDB sharding
- Transparent access
- Takes phasing information into account when importing/exporting in VCF format


SNiPlay: Web application for polymorphism analyses

Upload a VCF file in SNiPlay
Upload a VCF file (+ reference if not available in the genome collection). Select the rice genome. The reference corresponds to mRNA.

SNP annotation using SnpEff

Illumina chip design
Submission file for Illumina. Analysis with BeadStudio software. Cartesian coordinates. Genotyping file.

EggLib library: diversity analysis

Haplotype network
Figure labels: frequent haplotypes; less frequent haplotype; group distribution within a haplotype; distance between 2 haplotypes (number of mutations).
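
The two quantities drawn in a haplotype network, haplotype frequency and the number of mutations between two haplotypes, can be illustrated with a short Python sketch. The haplotype strings are invented, and the distance here is a plain count of differing positions, which may differ from what a given network tool computes.

```python
from collections import Counter
from itertools import combinations


def mutation_distance(hap1: str, hap2: str) -> int:
    """Number of positions at which two equal-length haplotypes differ."""
    if len(hap1) != len(hap2):
        raise ValueError("haplotypes must have the same length")
    return sum(a != b for a, b in zip(hap1, hap2))


# Hypothetical haplotypes observed in five chromosomes.
observed = ["AGG", "AGG", "GGA", "AGA", "AGG"]

# Node size in the network: how often each distinct haplotype occurs.
frequencies = Counter(observed)

# Edge length in the network: mutations separating each pair of haplotypes.
distances = {
    (h1, h2): mutation_distance(h1, h2)
    for h1, h2 in combinations(sorted(frequencies), 2)
}
print(frequencies)
print(distances)
```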

Allele sharing between groups
External file (optional):
Individual, group
Ind1, Table
Ind2, Table
Ind3, Table
Ind4, East
Ind5, East
Ind6, East
Ind7, East
Ind8, West
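
As a rough illustration of how such a group file could be used, the sketch below reads the individual-to-group assignments and reports which alleles at one site are present in every group. The allele data and function names are hypothetical, and this is not a description of how SNiPlay computes allele sharing.

```python
import csv
import io


def read_groups(text: str) -> dict:
    """Read 'individual, group' lines (with a header row) into a dict."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    next(reader)  # skip the header line
    return {row[0]: row[1] for row in reader if row}


def shared_alleles(groups: dict, alleles: dict) -> set:
    """Alleles at one site that are observed in every group."""
    per_group = {}
    for individual, observed in alleles.items():
        per_group.setdefault(groups[individual], set()).update(observed)
    return set.intersection(*per_group.values()) if per_group else set()


group_file = """Individual, group
Ind1, Table
Ind2, Table
Ind4, East
Ind8, West
"""
groups = read_groups(group_file)

# Hypothetical alleles carried by each individual at a single SNP.
site_alleles = {"Ind1": {"A"}, "Ind2": {"A", "G"}, "Ind4": {"G"}, "Ind8": {"A", "G"}}
print(shared_alleles(groups, site_alleles))
```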

GWAS (Genome-Wide Association Studies)
Estimate the association between a marker and a phenotypic trait.
Manhattan plots: display the GWAS test statistics (-log10 p-value) along the chromosomes.
Software: TASSEL, MLMM.
False positives arise from the structure of the studied panel => correction using population structure and kinship.
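
The y-axis of a Manhattan plot is simply the p-value of each marker on a -log10 scale. The sketch below performs that transformation on invented association results; the Bonferroni line is an assumption added for illustration, since the slides do not state which significance threshold is used.

```python
import math


def manhattan_points(results):
    """Convert (chromosome, position, p-value) tuples to -log10(p) scores."""
    return [
        (chrom, pos, -math.log10(p))
        for chrom, pos, p in results
        if p > 0  # log10 is undefined for p == 0
    ]


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """-log10 of the Bonferroni-corrected significance level (an assumption)."""
    return -math.log10(alpha / n_tests)


# Hypothetical results for three markers.
results = [("1", 1500, 0.2), ("1", 9200, 0.001), ("2", 480, 1e-7)]
threshold = bonferroni_threshold(n_tests=1000)
for chrom, pos, score in manhattan_points(results):
    flag = "significant" if score > threshold else ""
    print(chrom, pos, round(score, 2), flag)
```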

Population structure analysis
Test different values of K (estimates of the probability that the samples are structured in K populations).
For the best value of K, the application shows the Q estimates for each individual (admixture percentage).
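
One common way to read Q estimates is to assign each individual to the population with the largest Q value and to flag it as admixed when that value is low. The sketch below does this on invented Q values for K = 3; the 0.8 cutoff is an arbitrary assumption for the example, not a value taken from the course.

```python
def summarize_admixture(q_estimates: dict, threshold: float = 0.8) -> dict:
    """Map each individual to (best population, is_admixed).

    q_estimates: individual -> list of K ancestry proportions summing to ~1.
    threshold:   hypothetical cutoff below which an individual is called admixed.
    """
    summary = {}
    for individual, q in q_estimates.items():
        best = max(range(len(q)), key=lambda k: q[k])
        summary[individual] = (best + 1, q[best] < threshold)
    return summary


# Hypothetical Q estimates for K = 3.
q_estimates = {
    "Ind1": [0.95, 0.03, 0.02],
    "Ind2": [0.10, 0.85, 0.05],
    "Ind3": [0.45, 0.40, 0.15],
}
for individual, (population, admixed) in summarize_admixture(q_estimates).items():
    print(individual, "population", population, "admixed" if admixed else "")
```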

Relatedness between individuals (kinship matrix)
Estimation of relatedness between individuals using a distance matrix.
Software: TASSEL and PLINK.
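
A very simple relatedness measure is the proportion of alleles shared identical by state (IBS) between two individuals. The sketch below builds such a similarity matrix from genotypes coded as 0, 1 or 2 copies of the alternate allele. It is a didactic simplification on invented data and should not be taken as the estimator implemented in TASSEL or PLINK.

```python
def ibs_similarity(g1, g2) -> float:
    """Proportion of alleles shared (IBS) over sites genotyped in both."""
    shared = 0
    compared = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:   # skip missing genotypes
            continue
        shared += 2 - abs(a - b)     # 2, 1 or 0 alleles shared at this site
        compared += 2
    return shared / compared if compared else float("nan")


def similarity_matrix(genotypes: dict) -> dict:
    """Pairwise IBS similarity for every pair of individuals."""
    names = list(genotypes)
    return {
        a: {b: ibs_similarity(genotypes[a], genotypes[b]) for b in names}
        for a in names
    }


# Hypothetical genotypes at four SNPs (None = missing call).
genotypes = {
    "Ind1": [0, 1, 2, 0],
    "Ind2": [0, 1, 2, None],
    "Ind3": [2, 1, 0, 2],
}
matrix = similarity_matrix(genotypes)
print(matrix["Ind1"]["Ind2"], matrix["Ind1"]["Ind3"])
```

A distance matrix follows directly as 1 minus each similarity value.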

Practical (TD): study of root characters using GWAS in Oryza sativa japonica. Influence of a correction using structure and kinship.