
A. Dereeper, G. Sarah, F. Sabot, Y. Hueber. Exploiting SNP polymorphism data. Bioinformatics training course, 9 to 13 February 2015.



2 Tablet
Graphical tool to visualize assemblies.
Accepts many formats: ACE, SAM, BAM.

3 GATK (Genome Analysis ToolKit)
Software package for analysing NGS data, originally developed for human resequencing data for medical purposes (1000 Genomes, The Cancer Genome Atlas).
Includes: depth analyses, quality score recalibration, SNP/InDel detection.
Complementary to other packages: SAMtools, Picard tools, VCFtools, BEDTools.
PREPROCESS:
* Index the human genome (Picard); we used HG18 from UCSC.
* Convert Illumina reads to FASTQ format.
* Convert Illumina 1.6 read quality scores to standard Sanger scores.
FOR EACH SAMPLE:
1. Align the sample to the genome (BWA), generating SAI files.
2. Convert SAI to SAM (BWA).
3. Convert SAM to the binary BAM format (SAMtools).
4. Sort the BAM (SAMtools).
5. Index the BAM (SAMtools).
6. Identify target regions for realignment (Genome Analysis Toolkit).
7. Realign the BAM for better InDel calling (Genome Analysis Toolkit).
8. Reindex the realigned BAM (SAMtools).
9. Call InDels (Genome Analysis Toolkit).
10. Call SNPs (Genome Analysis Toolkit).
11. View the aligned reads in BAM/BAI (Integrative Genomics Viewer).
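The preprocessing step "convert Illumina 1.6 quality scores to Sanger scores" is a simple re-encoding. A minimal sketch, assuming the usual encodings (Illumina 1.3 to 1.7 store Phred scores with an ASCII offset of 64, Sanger/standard FASTQ with an offset of 33); the function name is illustrative, not part of any tool named on the slide:

```python
def illumina16_to_sanger(qual: str) -> str:
    """Convert a Phred+64 quality string (Illumina 1.3-1.7) to Phred+33 (Sanger)."""
    out = []
    for ch in qual:
        phred = ord(ch) - 64          # decode the Phred score from the Illumina offset
        if phred < 0:
            raise ValueError(f"character {ch!r} is not valid Phred+64")
        out.append(chr(phred + 33))   # re-encode it with the Sanger offset
    return "".join(out)


print(illumina16_to_sanger("hhhhB"))  # 'h' = Phred 40, 'B' = Phred 2 -> IIII#
```

Only the quality line of each FASTQ record changes; sequence and header lines are copied unchanged.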

4 Workflow (diagram): for each sample (Fastq RC1, RC2, RC3, RC4, …): Cutadapt → mapping with BWA → Add or Replace Groups → BAM with read group. The per-sample BAM files are merged (mergeSam) into a global BAM with read groups, from which the VCF file is produced.
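Why read groups before merging: once all samples share one global BAM, the RG tag is what tells the variant caller which sample each read comes from. A toy sketch of the "Add or Replace Groups" and "mergeSam" steps, assuming plain-text SAM lines rather than BAM; the function names are hypothetical, and in practice Picard's AddOrReplaceReadGroups and MergeSamFiles perform these steps on BAM files:

```python
def add_read_group(sam_lines, rg_id, sample):
    """Toy AddOrReplaceReadGroups: give every alignment the tag RG:Z:<rg_id>
    (replacing any existing RG tag) and add a matching @RG header line."""
    header, body = [], []
    for line in sam_lines:
        line = line.rstrip("\n")
        if line.startswith("@"):
            if not line.startswith("@RG"):   # old read-group headers are dropped
                header.append(line)
            continue
        fields = line.split("\t")
        # 11 mandatory SAM columns, then optional TAG:TYPE:VALUE fields
        tags = [t for t in fields[11:] if not t.startswith("RG:")]
        body.append("\t".join(fields[:11] + tags + ["RG:Z:" + rg_id]))
    header.append(f"@RG\tID:{rg_id}\tSM:{sample}")
    return header, body


def merge_samples(per_sample):
    """Toy MergeSamFiles: concatenate the alignments of several samples, keeping a
    single copy of identical header lines (e.g. @SQ) and every sample's @RG line."""
    header, body = [], []
    for sample_header, sample_body in per_sample:
        for line in sample_header:
            if line not in header:
                header.append(line)
        body.extend(sample_body)
    return header + body
```

Usage: `merge_samples([add_read_group(sam_rc1, "RC1", "RC1"), add_read_group(sam_rc2, "RC2", "RC2")])` yields one SAM text whose reads are all traceable to their sample. Coordinate sorting of the merged file is omitted here.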

5 VCF format (Variant Call Format)
##fileformat=VCFv4.0
##fileDate=20090805
##source=myImputationProgramV3.1
##reference=1000GenomesPilot-NCBI36
##phasing=partial
##INFO=
##FILTER=
##FORMAT=
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002
20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51
20 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3
Benefit: describes the variation at each position and assigns a genotype to each individual.
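To show how the columns above map to "variation description + genotype assignation", here is a minimal parser for one VCF data line. It assumes tab-separated columns (as in real VCF files; the slide shows spaces) and sample names taken from the #CHROM header; the function names are illustrative and this is not a full VCF reader (no header parsing, no type conversion of INFO values):

```python
def parse_vcf_record(line, sample_names):
    """Split one VCF data line into its fixed fields and per-sample FORMAT values."""
    cols = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, flt, info, fmt = cols[:9]
    info_dict = {}
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        info_dict[key] = value if value else True   # flags such as DB have no value
    keys = fmt.split(":")                           # e.g. GT:GQ:DP:HQ
    samples = {name: dict(zip(keys, col.split(":")))
               for name, col in zip(sample_names, cols[9:])}
    return {"chrom": chrom, "pos": int(pos), "id": vid, "ref": ref,
            "alts": alt.split(","), "qual": qual, "filter": flt,
            "info": info_dict, "samples": samples}


def genotype_alleles(record, sample):
    """Translate a GT such as '1|0' into allele letters (0 = REF, 1.. = ALT)."""
    alleles = [record["ref"]] + record["alts"]
    gt = record["samples"][sample]["GT"].replace("|", "/")
    return [alleles[int(i)] if i != "." else "." for i in gt.split("/")]
```

For the first record above, `genotype_alleles(rec, "NA00002")` gives `['A', 'G']`: the phased genotype 1|0 means the ALT allele A on the first haplotype and the REF allele G on the second.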

6 Other GATK functionalities
Module DepthOfCoverage: reports the sequencing depth for each gene, each position and each individual.
Module ReadBackedPhasing: where possible, establishes the associations between alleles (phase and haplotypes) at heterozygous sites.
(Figure example: "… and not AGG / GGA".)
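The idea behind read-backed phasing can be illustrated on two heterozygous sites: a read covering both sites shows which alleles sit on the same chromosome. The sketch below is a toy majority vote under that assumption, not GATK's ReadBackedPhasing algorithm (which uses base qualities and more than two sites); the function name and inputs are hypothetical:

```python
from collections import Counter

def phase_two_sites(read_pairs, site1, site2):
    """Toy read-backed phasing of two heterozygous sites.

    read_pairs: (allele at site 1, allele at site 2) for each read covering both sites.
    site1, site2: the individual's two alleles at each site, e.g. ("A", "G").
    Returns ((haplotype1, haplotype2), support) or None if the phase is unresolved.
    """
    a1, b1 = site1
    a2, b2 = site2
    support = Counter()
    for pair in read_pairs:
        if pair in ((a1, a2), (b1, b2)):
            support["cis"] += 1          # a1 travels with a2, b1 with b2
        elif pair in ((a1, b2), (b1, a2)):
            support["trans"] += 1        # a1 travels with b2, b1 with a2
        # any other combination contradicts the genotypes (sequencing error): ignored
    total = support["cis"] + support["trans"]
    if total == 0 or support["cis"] == support["trans"]:
        return None                      # no spanning read, or ambiguous evidence
    if support["cis"] > support["trans"]:
        return (a1 + a2, b1 + b2), support["cis"] / total
    return (a1 + b2, b1 + a2), support["trans"] / total
```

With reads AG, AG, GA and one discordant AA, the phase AG / GA is chosen with 75% support, and the alternative pairing AA / GG is rejected.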

7 Pileup format
- Another format used for variant calling (generated by samtools)
- Describes the alignment position by position (one line per reference position), rather than read by read as in the SAM format
- Used by software such as VarScan (varscan pileup2snp)
- Frequently used to detect rare, low-frequency variants (e.g. viral populations)
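Because each pileup line summarizes all reads at one position, allele frequencies (the basis of low-frequency variant detection) can be read directly from its base column. A minimal sketch, assuming the samtools pileup base encoding ('.'/',' = reference, letters = mismatches, '^' + mapping-quality character = read start, '$' = read end, '+N'/'-N' = indels); it is not VarScan's implementation and applies no quality or strand filters:

```python
import re
from collections import Counter

NUCLEOTIDES = set("ACGTN")

def count_pileup_bases(ref, bases):
    """Count the alleles in the base column of one samtools pileup line.
    Indel sequences are skipped; '*' / '#' (deleted base) are counted as '*'."""
    counts = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":                 # read start: '^' is followed by a mapping-quality char
            i += 2
        elif c == "$":               # read end marker
            i += 1
        elif c in "+-":              # indel such as +2AG or -3TTT: skip length and sequence
            length = re.match(r"\d+", bases[i + 1:]).group()
            i += 1 + len(length) + int(length)
        else:
            if c in ".,":            # match to the reference (forward / reverse strand)
                counts[ref.upper()] += 1
            elif c.upper() in NUCLEOTIDES:
                counts[c.upper()] += 1
            elif c in "*#":
                counts["*"] += 1
            i += 1                   # '>' / '<' (reference skips) are simply ignored
    return counts


def variant_frequencies(ref, bases):
    """Return the depth (nucleotides only) and the frequency of each non-reference base."""
    counts = count_pileup_bases(ref, bases)
    depth = sum(n for b, n in counts.items() if b in NUCLEOTIDES)
    freqs = {b: n / depth for b, n in counts.items()
             if b in NUCLEOTIDES and b not in (ref.upper(), "N")}
    return depth, freqs


print(variant_frequencies("A", "..,,T^F.$t+2AG,"))  # (8, {'T': 0.25})
```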

8 Gigwa project, for managing large-scale variant data (GBS, RADSeq, WGRS)
- Based on NoSQL technology
- Handles VCF files (Variant Call Format) and annotations
- Supports multiple variant types: SNPs, InDels, SSRs, SVs
- Powerful genotyping queries
- Easily scalable with MongoDB sharding
- Transparent access
- Takes phasing information into account when importing/exporting in VCF format
« With NGS arise serious computational challenges in terms of storage, search, sharing, analysis, and data visualization, that redefine some practices in data management. »
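As an illustration of what a "genotyping query" means, the sketch below selects variants by a genotype pattern across two groups of samples. It runs over plain in-memory Python dicts, not Gigwa's MongoDB backend or its API; the data layout and the query itself are hypothetical:

```python
def _alleles(gt):
    """Split a VCF-style genotype such as '0/1' or '1|0' into allele indices."""
    return gt.replace("|", "/").split("/")

def is_hom_ref(gt):
    """True for 0/0 (or 0|0); missing genotypes (None) never match."""
    return gt is not None and all(a == "0" for a in _alleles(gt))

def carries_alt(gt):
    """True if at least one called allele is non-reference."""
    return gt is not None and any(a not in ("0", ".") for a in _alleles(gt))

def query_variants(variants, group_a, group_b):
    """Hypothetical query: variants where every sample of group_a is homozygous for the
    reference allele and at least one sample of group_b carries an alternate allele."""
    return [v["id"] for v in variants
            if all(is_hom_ref(v["genotypes"].get(s)) for s in group_a)
            and any(carries_alt(v["genotypes"].get(s)) for s in group_b)]
```

A database such as Gigwa's evaluates this kind of filter server-side so that it scales to millions of variants; the logic of the condition is the same.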

9 http://gigwa.southgreen.fr/gigwa/

10 SNiPlay: Web application for polymorphism analyses http://sniplay.cirad.fr

11 Uploading a VCF file in SNiPlay
Upload a VCF file (plus the reference if it is not available in the genome collection).
Select the rice genome. The reference corresponds to mRNA.

12 SNP annotation using SnpEff
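The core of a coding-effect annotation is translating the reference and mutated codons and comparing the amino acids. A minimal sketch, assuming a forward-strand coding sequence, a 0-based position and the standard genetic code; effect names loosely follow the Sequence Ontology terms SnpEff reports, but this is not SnpEff's implementation (no transcripts, splice sites, UTRs or reverse strand):

```python
BASES = "TCAG"
# standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AA = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
      "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AA[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def snp_effect(cds, pos, alt):
    """Classify a SNP at 0-based position `pos` of a forward-strand coding sequence."""
    offset = pos % 3
    start = pos - offset                       # first base of the affected codon
    ref_codon = cds[start:start + 3].upper()
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        kind = "synonymous_variant"
    elif alt_aa == "*":
        kind = "stop_gained"
    elif ref_aa == "*":
        kind = "stop_lost"
    else:
        kind = "missense_variant"
    return kind, f"{ref_aa}{start // 3 + 1}{alt_aa}"   # e.g. A2T = codon 2, Ala -> Thr
```

For the toy sequence ATG GCT TGG (Met-Ala-Trp), `snp_effect("ATGGCTTGG", 5, "C")` is synonymous (GCT → GCC, both Ala), while position 8 G→A turns TGG into the stop codon TGA.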

13 Illumina chip design (diagram labels): submission file for Illumina; genotyping file; analysis with BeadStudio software; Cartesian coordinates.

14 EggLib library: diversity analysis
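Typical diversity statistics computed by libraries such as EggLib are the number of segregating sites S, nucleotide diversity π and Watterson's θ. The sketch below uses the textbook formulas on a small gap-free alignment; it is not EggLib's API, and the function name is illustrative:

```python
from itertools import combinations

def diversity_stats(seqs):
    """Return (S, pi, theta_W) for aligned, gap-free sequences of equal length.
    pi and theta_W are given per site."""
    n = len(seqs)
    length = len(seqs[0])
    # S: number of columns carrying more than one nucleotide
    s = sum(1 for col in zip(*seqs) if len(set(col)) > 1)
    # pi: mean number of pairwise differences, divided by the sequence length
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    pi = diffs / len(pairs) / length
    # Watterson's theta: S / a1 with a1 = sum_{i=1}^{n-1} 1/i
    a1 = sum(1 / i for i in range(1, n))
    theta_w = s / a1 / length
    return s, pi, theta_w
```

For the four sequences AAAA, AAAT, ACAT, ACAA: S = 2, there are 8 differences over 6 pairs, so π = (8/6)/4 = 1/3, and θ_W = 2 / (1 + 1/2 + 1/3) / 4 = 3/11.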

15 Haplotype network (figure labels): frequent haplotypes; less frequent haplotype; group distribution within this haplotype; distance between two haplotypes (number of mutations).
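The quantities drawn in a haplotype network (node size = haplotype frequency, edge length = number of mutations) can be computed as below. As a simplification, the network is approximated by a minimum spanning tree built with Prim's algorithm; dedicated tools often use median-joining or minimum spanning networks instead, and the function names are illustrative:

```python
from collections import Counter
from itertools import combinations

def hamming(a, b):
    """Number of mutations (differing positions) between two aligned haplotypes."""
    return sum(x != y for x, y in zip(a, b))

def haplotype_network(seqs):
    """Return haplotype frequencies, pairwise distances and the edges
    (haplotype, haplotype, distance) of a minimum spanning tree."""
    freqs = Counter(seqs)
    haps = sorted(freqs, key=lambda h: (-freqs[h], h))   # most frequent first
    dist = {(a, b): hamming(a, b) for a, b in combinations(haps, 2)}
    # Prim's algorithm, grown from the most frequent haplotype
    in_tree, edges = {haps[0]}, []
    while len(in_tree) < len(haps):
        d, a, b = min((hamming(a, b), a, b)
                      for a in in_tree for b in haps if b not in in_tree)
        in_tree.add(b)
        edges.append((a, b, d))
    return freqs, dist, edges
```

Ties between equally short edges are broken alphabetically here, which is arbitrary; a minimum spanning network would keep all of them.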

16 Allele sharing between groups
External file (optional):
Individual, group
Ind1, Table
Ind2, Table
Ind3, Table
Ind4, East
Ind5, East
Ind6, East
Ind7, East
Ind8, West
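Given the optional "individual, group" file above, allele sharing at one SNP amounts to pooling the alleles observed in each group and comparing the sets. A minimal sketch, assuming genotypes written as allele letters (e.g. "A/G"); the function names are hypothetical and not SNiPlay's implementation:

```python
def read_groups(lines):
    """Parse 'individual, group' lines; the first line is a header and is skipped."""
    groups = {}
    for line in lines[1:]:
        ind, grp = (x.strip() for x in line.split(","))
        groups[ind] = grp
    return groups

def allele_sharing(genotypes, groups):
    """genotypes: {individual: 'A/G'} at one SNP. Returns the alleles seen in each group,
    the alleles shared by all groups and the alleles private to each group."""
    by_group = {}
    for ind, gt in genotypes.items():
        if ind not in groups:
            continue                                  # individual not assigned to a group
        alleles = {a for a in gt.replace("|", "/").split("/") if a not in (".", "N")}
        by_group.setdefault(groups[ind], set()).update(alleles)
    shared = set.intersection(*by_group.values()) if by_group else set()
    private = {g: a - set().union(*(o for h, o in by_group.items() if h != g))
               for g, a in by_group.items()}
    return by_group, shared, private
```

For example, with Ind1 = A/A (Table), Ind4 = A/T (East) and Ind8 = A/G (West), allele A is shared by all three groups, T is private to East and G is private to West.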

17 GWAS (Genome-Wide Association Studies)
Estimate the association between a marker and a phenotypic trait.
Manhattan plots: display the GWAS test statistics (-log10 p-value) along the chromosomes.
Software: TASSEL, MLMM.
False positives arise from the structure of the studied panel => correction using population structure and kinship.
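To make the "-log10 p-value per marker" of a Manhattan plot concrete, the sketch below swaps in the simplest association test: a 1-degree-of-freedom allelic chi-square test on case/control allele counts. This is an assumption for illustration only; TASSEL and MLMM fit (mixed) linear models on quantitative traits and correct for structure and kinship, which this naive test does not:

```python
import math

def allelic_chi2_pvalue(case_counts, control_counts):
    """1-df chi-square test on the 2x2 table of (alt, ref) allele counts
    in cases and controls. Returns (chi2, p-value)."""
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0                      # monomorphic marker or empty group
    chi2 = n * (a * d - b * c) ** 2 / denom
    p = math.erfc(math.sqrt(chi2 / 2))       # upper tail of a chi-square with 1 df
    return chi2, p

def manhattan_points(markers):
    """markers: list of (chrom, pos, case_counts, control_counts).
    Returns (chrom, pos, -log10 p) points, as drawn on a Manhattan plot."""
    points = []
    for chrom, pos, cases, controls in markers:
        _, p = allelic_chi2_pvalue(cases, controls)
        points.append((chrom, pos, -math.log10(max(p, 1e-300))))
    return points
```

With 30/70 alternate/reference alleles in cases and 10/90 in controls, χ² = 12.5 and p ≈ 4.1e-4 (about 3.4 on the -log10 scale). Population structure can inflate many such values at once, which is why the correction mentioned on the slide is needed.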

18 Population structure analysis
Test different values of K (estimate the probability that the samples are structured into K populations).
For the best value of K, the application shows the Q estimates for each individual (admixture proportions).
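Once the best K is chosen, the Q estimates can be turned into group assignments, for instance to use them as covariates or to colour the individuals. A minimal sketch; the 0.8 cut-off for calling an individual "admixed" is an arbitrary assumption, not a value from the slides or from any structure software:

```python
def assign_from_q(q_matrix, threshold=0.8):
    """q_matrix: {individual: [q_1, ..., q_K]} membership proportions (each row sums to 1).
    Assign each individual to its main population, or 'admixed' if its largest
    proportion is below `threshold` (illustrative cut-off)."""
    assignment = {}
    for ind, q in q_matrix.items():
        if abs(sum(q) - 1) > 1e-6:
            raise ValueError(f"Q values of {ind} do not sum to 1")
        best = max(range(len(q)), key=q.__getitem__)   # index of the largest proportion
        assignment[ind] = f"pop{best + 1}" if q[best] >= threshold else "admixed"
    return assignment
```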

19 Relatedness between individuals (kinship matrix)
Software: TASSEL and plink.
Estimation of relatedness between individuals using a distance matrix.
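One simple distance matrix of this kind is based on identity by state (IBS): two individuals are similar at a SNP when they share many alleles. The sketch assumes genotypes coded as alternate-allele counts (0, 1, 2, None for missing) and is similar in spirit to plink's IBS distance; the kinship estimators actually used by TASSEL or plink differ in their details:

```python
def ibs_distance_matrix(genotypes):
    """genotypes: {individual: [0, 1, 2, None, ...]} alt-allele counts per SNP.
    Returns {(i, j): 1 - IBS similarity}, where IBS similarity is the mean over
    jointly non-missing SNPs of 1 - |g_i - g_j| / 2."""
    names = sorted(genotypes)
    dist = {}
    for x in names:
        for y in names:
            pairs = [(a, b) for a, b in zip(genotypes[x], genotypes[y])
                     if a is not None and b is not None]
            if not pairs:
                dist[(x, y)] = float("nan")   # no SNP typed in both individuals
                continue
            sim = sum(1 - abs(a - b) / 2 for a, b in pairs) / len(pairs)
            dist[(x, y)] = 1 - sim
    return dist
```

Identical genotypes give a distance of 0; opposite homozygotes (0 vs 2) at every SNP give 1.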

20 Practical session (TD): study of root traits in Oryza sativa japonica using GWAS, and the influence of a correction for structure and kinship.

