Introduction to RAD Acropora millepora.

First thing: start downloading the course archives. See "Downloading course archives".

Plan
- Start downloading and unpacking data
- Set up our reference genome
- Go through some introduction slides
- Go through GATK demonstration files
- User choice: read preparation, de novo, or GATK for real

Population Genetics
[Figure: genetic differentiation vs. distance; sites labelled N, S, O, M, K, W, A; R² = 0.76]

Linkage Mapping
The final map consists of 3816 markers and covers 99.4% (1539 cM) of the scallop genome with a resolution of 0.41 cM.

2bRAD-based linkage mapping in scallop: sex-related chromosomal region
Based on the high-density map, we were able to identify a 2-cM chromosomal region that contained ~60 sex-related loci.

Which loci are under selection across populations?
RAD (Restriction-site Associated DNA): generate and sequence short tags randomly distributed across the genome.
[Figure: Fst along linkage group III, between freshwater populations (orange) and between freshwater and marine populations (black). Bars: significant (bootstrap); dots: SNPs.]

Population genomics: “genome scanning” for signatures of selection
[Figure: statistics plotted against position along the genome. Nucleotide diversity (recent selection); Tajima’s D (not so recent selection); a further panel labelled “recent or continuous selection”; excess LD (very recent selection). Other labels on the figure: coalescent, hard sw., LD2hs, neutr, balancing, neutr, soft sw.]
Hohenlohe et al. 2010, Int. J. Plant Sci. 171:1059-1071.

[Figure: library preparation workflow. Adaptor ligation and indexing (index 1 of 12; mix indices by 12); amplification and barcoding (Barcode2); gel cleanup.]

Restriction Digest
A type IIb restriction enzyme is used to cut fragments out of the genome. The digested fragments represent a random subset of the genome.
Genomic DNA with a BcgI cut site:
5’-  NNNNNNNNNNCGANNNNNNTGCNNNNNNNNNNNN-3’
3’-NNNNNNNNNNNNGCTNNNNNNACGNNNNNNNNNN  -5’
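The digest can be imitated in silico to see which tags a reference sequence would yield. The following is a minimal sketch, not part of the course material: it assumes the BcgI recognition site shown on the slide (CGA, six arbitrary bases, TGC), searches both orientations, and keeps 12 bases on each side of the site so that the window spans both strands of the fragment. The function name `bcgi_tags` and the toy sequence are made up for illustration.

```python
import re

# BcgI recognition site as shown on the slide: CGA, six arbitrary bases, TGC.
# The reverse-complement orientation (GCA N6 TCG) is searched as well.
SITE_PATTERNS = ("CGA[ACGT]{6}TGC", "GCA[ACGT]{6}TCG")
SITE_LENGTH = 12
FLANK = 12  # assumption: 12 bases on each side, giving a 36-base window


def bcgi_tags(genome, flank=FLANK):
    """Return sorted (start, tag) pairs for every site that has full flanks."""
    genome = genome.upper()
    tags = set()
    for pattern in SITE_PATTERNS:
        # The lookahead lets overlapping sites be reported too.
        for match in re.finditer("(?=(" + pattern + "))", genome):
            start = match.start() - flank
            end = match.start() + SITE_LENGTH + flank
            if start >= 0 and end <= len(genome):
                tags.add((start, genome[start:end]))
    return sorted(tags)


# Toy sequence with a single site (hypothetical data).
example = "A" * 15 + "CGA" + "TTTTTT" + "TGC" + "G" * 15
tags = bcgi_tags(example)
print(tags)
```

Sites too close to the end of a sequence are skipped here because they cannot yield a full-length window.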

Ligation Step
Adapters are ligated to the restriction fragments produced by the digest.
Adapter 1 (5Ill-NNRW + anti 5Ill-NNRW) contains degenerate bases to identify PCR duplicates:
5’-CTACACGACGCTCTTCCGATCTNNRWCCNN-3’
5’-GGWYNNAGATCGGAAGAGC/3InvdT1/-3’
Adapter 2 (3Ill-BC[1-12] + anti 3Ill-BC[1-12]) contains a 3’ barcode (shown in red on the original slide):
5’-CAGACGTGTGCTCTTCCGATCTACCANN-3’
5’-TGGTAGATCGGA/3InvdT/
Restriction fragment:
5’-  NNNNNNNNNNCGANNNNNNTGCNNNNNNNNNNNN-3’
3’-NNNNNNNNNNNNGCTNNNNNNACGNNNNNNNNNN  -5’
Ligated product:
5’-CTACACGACGCTCTTCCGATCTNNRWCCNN NNNNNNNNNNCGANNNNNNTGCNNNNNNNNNNNN TGGTAGATCGGA/3InvdT/
/3InvdT1/CGAGAAGGCTAGANNYWGG NNNNNNNNNNNNGCTNNNNNNACGNNNNNNNNNN NNACCATCTAGCCTTCTCGTGTGCAGAC-5’

Using degenerate bases to identify PCR duplicates
Adapter 1 (5Ill-NNRW + anti 5Ill-NNRW):
5’-CTACACGACGCTCTTCCGATCTNNRWCCNN-3’
5’-GGWYNNAGATCGGAAGAGC/3InvdT1/-3’
Degenerate base codes: N = A, T, G, or C; R = A or G; W = A or T; Y = C or T.
Ligated product, with the degenerate bases included in the 5Ill-NNRW adapter:
5’-CTACACGACGCTCTTCCGATCTNNRWCCNN NNNNNNNNNNCGANNNNNNTGCNNNNNNNNNNNN TGGTAGATCGGA/3InvdT/
/3InvdT1/CGAGAAGGCTAGANNYWGG NNNNNNNNNNNNGCTNNNNNNACGNNNNNNNNNN NNACCATCTAGCCTTCTCGTGTGCAGAC-5’
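To make the idea concrete, here is a small sketch of how the degenerate tag could be used downstream. It expands the NNRW pattern with the base codes listed above to count the possible tags, then treats reads that share both the tag and the fragment sequence as PCR duplicates. The read layout (a list of tag and fragment pairs) and the function names are assumptions for illustration, not the course pipeline.

```python
from itertools import product

# Degenerate base codes as listed on the slide.
IUPAC = {"N": "ACGT", "R": "AG", "W": "AT", "Y": "CT"}


def expand(pattern):
    """List every concrete sequence matched by a degenerate pattern."""
    choices = [IUPAC.get(base, base) for base in pattern]
    return ["".join(combo) for combo in product(*choices)]


def drop_pcr_duplicates(reads):
    """Keep the first read seen for each (tag, fragment) pair.

    `reads` is a list of (tag, fragment) tuples; this layout is hypothetical.
    """
    seen = set()
    unique = []
    for tag, fragment in reads:
        if (tag, fragment) not in seen:
            seen.add((tag, fragment))
            unique.append((tag, fragment))
    return unique


tags = expand("NNRW")
print(len(tags))  # 4 * 4 * 2 * 2 possible tags

# Toy reads (hypothetical): the second read repeats the first one exactly.
reads = [
    ("ACGA", "GGTTCGAAACCCTTGC"),
    ("ACGA", "GGTTCGAAACCCTTGC"),
    ("TTAT", "GGTTCGAAACCCTTGC"),
    ("ACGA", "CCAACGATTTGGGTGC"),
]
unique_reads = drop_pcr_duplicates(reads)
print(unique_reads)
```

Two reads with the same fragment but different tags are kept, since they most likely come from different original molecules.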

Amplification (perform on pooled samples)
Primers:
Ill-[1-12]-bc: CAAGCAGAAGACGGCATACGAGAT[barcode]GTGACTGGAGTTCAGACGTGTGCTCTTCCGAT
Mpx2N: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGAT
IC1-P5: AATGATACGGCGACCACCGA
IC2-P7: CAAGCAGAAGACGGCATACGA
Primers positioned on the ligated fragment (Mpx2N side on top, Ill-bc side below):
IC1-P5: AATGATACGGCGACCACCGA
Mpx2N: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGAT
Top strand: 5’-CTACACGACGCTCTTCCGATCTNN NNNNNNNNNNCGANNNNNNTGCNNNNNNNNNNNN TGGTAGATCGGA/3InvdT/
Bottom strand: /3InvdT1/CGAGAAGGCTAGA NNNNNNNNNNNNGCTNNNNNNACGNNNNNNNNNN NNACCATCTAGCCTTCTCGTGTGCAGAC-5’
Ill-[1-12]-bc (written 3’ to 5’): TAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG[barcode]TAGAGCATACGGCAGAAGACGAAC
IC2-P7 (written 3’ to 5’): AGCATACGGCAGAAGACGAAC

Amplification
Primers:
Ill-[1-12]-bc: CAAGCAGAAGACGGCATACGAGAT[2ndbarcode]GTGACTGGAGTTCAGACGTGTGCTCTTCCGAT
Mpx2N: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGAT
IC1-P5: AATGATACGGCGACCACCGA
IC2-P7: CAAGCAGAAGACGGCATACGA
Primers positioned on the ligated fragment (P5 side on top, P7 side below):
IC1-P5: AATGATACGGCGACCACCGA
Mpx2N: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGAT
Top strand: 5’-CTACACGACGCTCTTCCGATCTNN NNNNNNNNNNCGANNNNNNTGCNNNNNNNNNNNN TGGTAGATCGGA/3InvdT/
Bottom strand: /3InvdT1/CGAGAAGGCTAGA NNNNNNNNNNNNGCTNNNNNNACGNNNNNNNNNN NNACCATCTAGCCTTCTCGTGTGCAGAC-5’
Ill-[1-12]-bc (written 3’ to 5’): TAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG[barcode]TAGAGCATACGGCAGAAGACGAAC
IC2-P7 (written 3’ to 5’): AGCATACGGCAGAAGACGAAC

Final Product
Labels on the figure: p5, p7, read primer, sampled genomic DNA, 1st barcode (ligated onto fragment), 2nd barcode (added during PCR).
IC1-P5: AATGATACGGCGACCACCGA
Mpx2N: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGAT
Top strand: 5’-CTACACGACGCTCTTCCGATCTNN NNNNNNNNNNCGANNNNNNTGCNNNNNNNNNNNN TGGTAGATCGGA/3InvdT/
Bottom strand: /3InvdT1/CGAGAAGGCTAGA NNNNNNNNNNNNGCTNNNNNNACGNNNNNNNNNN NNACCATCTAGCCTTCTCGTGTGCAGAC-5’
Ill-[1-12]-bc (written 3’ to 5’): TAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG[barcode]TAGAGCATACGGCAGAAGACGAAC
IC2-P7 (written 3’ to 5’): AGCATACGGCAGAAGACGAAC

Double Barcoding
Labels on the figure: p5, p7, read primer, random bases for duplicate identification, sampled genomic DNA, 1st barcode (ligated onto fragment; ligation indexes), 2nd barcode (added during PCR; PCR barcodes).
Setting unique ligation barcodes to columns and PCR barcodes to rows gives a unique combination for each sample (wells A1, B1, and so on).
If groups are using strip tubes, each strip will be laid down as a row in this picture. Each tube in the strip then gets a different adapter barcode, and the whole strip can be pooled downstream.
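The column and row scheme can be written out as a lookup table. This is an illustrative sketch only: it assumes a 96-well plate with 12 columns (one ligation barcode each, matching 3Ill-BC[1-12]) and 8 rows (one PCR barcode each). The row count and the function name `barcode_layout` are assumptions, not something stated on the slides.

```python
from string import ascii_uppercase

N_LIGATION = 12  # ligation adapters 3Ill-BC[1-12], one per plate column
N_PCR = 8        # assumption: one PCR barcode per row of a 96-well plate


def barcode_layout(n_rows=N_PCR, n_cols=N_LIGATION):
    """Map each well name to its (ligation barcode, PCR barcode) pair."""
    layout = {}
    for row in range(n_rows):
        for col in range(n_cols):
            well = f"{ascii_uppercase[row]}{col + 1}"
            layout[well] = (col + 1, row + 1)
    return layout


layout = barcode_layout()

# Reverse lookup, as needed when assigning sequenced reads back to samples.
lookup = {pair: well for well, pair in layout.items()}

print(layout["A1"], layout["B1"])
print(lookup[(12, 8)])
```

Wells in the same row share a PCR barcode and differ only in the ligation barcode, which is why a whole row (one strip) can be pooled into a single tube before amplification.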

Pooling Samples
Pooling allows 12 uniquely barcoded samples to be prepared in a single tube, which saves work and pipette tips.
Use a different Adapter 2 for each column (1-12).
Pool samples so that each sample in the pool has a different ligation adapter.
5’-CTACACGACGCTCTTCCGATCTNN NNNNNNNNNNCGANNNNNNTGCNNNNNNNNNNNN TGGTAGATCGGA/3InvdT/
/3InvdT1/CGAGAAGGCTAGA NNNNNNNNNNNNGCTNNNNNNACGNNNNNNNNNN NNACCATCTAGCCTTCTCGTGTGCAGAC-5’
[Figure label: Pooled sample 1]

GATK https://www.broadinstitute.org/gatk/guide/best-practices

Genome reference pipeline (GATK) (http://www. broadinstitute
1. Trimming/quality filtering
2. Mapping to genome
3. Realign around indels
4. Primary variant calling
5. Base quality recalibration
6. Secondary variant calling
7. Variant quality recalibration based on genotyping replicates
8. Final filtering
9. Assess quality (heterozygote discovery rate)

Acropora millepora connectivity along the Great Barrier Reef
[Figure: genetic differentiation vs. distance; sites labelled N, S, O, M, K; R² = 0.76]

SAM File Format: Header Lines
@HD: Header line (first line of the file, if present). Gives the version number and sorting information; here, version 1 and unsorted alignments.
@SQ: Reference sequence lines. There are as many of these as there are ‘chromosomes’ in your reference (10 in our case). Each gives the name of a sequence and its length.
@PG: Program line. Gives information about the program used to perform the alignment; in our case, bowtie2 version 2.1.0.
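A header like the one described above can be read with a few lines of code. The sketch below is only an illustration: the example lines are invented to match the description (version 1.0, unsorted, bowtie2 2.1.0), the sequence names and lengths are made up, and only two of the ten @SQ lines are shown. In practice a dedicated library would normally be used instead.

```python
def parse_sam_header(lines):
    """Collect the TAG:VALUE fields of SAM header lines by record type."""
    header = {}
    for line in lines:
        if not line.startswith("@"):
            break  # header lines always come before the alignment lines
        fields = line.rstrip("\n").split("\t")
        record_type = fields[0][1:]  # "HD", "SQ", "PG", ...
        if record_type == "CO":
            continue  # comment lines hold free text, not TAG:VALUE pairs
        tags = dict(field.split(":", 1) for field in fields[1:])
        header.setdefault(record_type, []).append(tags)
    return header


# Hypothetical file content: three kinds of header line, then one alignment.
example = [
    "@HD\tVN:1.0\tSO:unsorted",
    "@SQ\tSN:chr1\tLN:1000",
    "@SQ\tSN:chr2\tLN:2000",
    "@PG\tID:bowtie2\tPN:bowtie2\tVN:2.1.0",
    "read1\t0\tchr1\t1\t42\t4M\t*\t0\t0\tACGT\tIIII",
]
header = parse_sam_header(example)

print(header["HD"][0]["SO"])
for sq in header["SQ"]:
    print(sq["SN"], sq["LN"])
print(header["PG"][0]["PN"], header["PG"][0]["VN"])
```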

SAM File Format: Alignment Lines
Example from our files.
From ‘Sequence Alignment/Map Format Specification’, The SAM/BAM Format Specification Working Group, 2015 (see under additional resources).

VCF File Structure
VCF = variant call format. From ‘The Variant Call Format Specifications’ 2015.

VCF File Structure
VCF = variant call format. There is one row for each variant position. The leading fields describe the variant, and the remaining columns give individual sample data.
Sample data example: this individual is a heterozygote (G/T) with a GQ score of 21, a read depth of 6, and haplotype qualities of 23 and 27. Descriptions of what these mean are included in the header (see previous slide).
From ‘The Variant Call Format Specifications’ 2015.
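The way a sample column is read against the FORMAT column can be sketched as follows. The numbers match the example described above (GQ 21, depth 6, haplotype qualities 23 and 27); the REF and ALT values (A and G,T) and the genotype string are assumed here so that the call decodes to G/T. The function name `decode_sample` is made up, and missing values (".") in GQ, DP, or HQ are not handled in this sketch.

```python
import re


def decode_sample(ref, alt, fmt, sample):
    """Decode one sample column of a VCF row using its FORMAT column."""
    alleles = [ref] + alt.split(",")  # index 0 is REF, 1.. are the ALT alleles
    values = dict(zip(fmt.split(":"), sample.split(":")))
    # Genotype indices are separated by "/" (unphased) or "|" (phased).
    indices = re.split(r"[/|]", values["GT"])
    genotype = [alleles[int(i)] if i != "." else "." for i in indices]
    return {
        "alleles": genotype,
        "heterozygous": len(set(genotype)) > 1,
        "GQ": int(values["GQ"]),
        "DP": int(values["DP"]),
        "HQ": [int(q) for q in values["HQ"].split(",")],
    }


# Assumed row content: REF=A, ALT=G,T, FORMAT=GT:GQ:DP:HQ.
call = decode_sample("A", "G,T", "GT:GQ:DP:HQ", "1|2:21:6:23,27")
print(call)
```

The genotype indices point into the list of alleles, so "1|2" with REF A and ALT G,T means one copy of G and one copy of T.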

Recalibration Plots

Writing your own pipeline