Next Generation Sequencing

Next Generation Sequencing Michelle Luciano www.ccace.ed.ac.uk

Outline
1. What is next generation sequencing?
2. Rare variants and general cognitive ability
3. Rare variants and years of education
4. Sequencing the Wellderly

1. Next generation sequencing
- NGS platforms sequence millions of small DNA fragments in parallel.
- Bioinformatics tools assemble the fragments by mapping the reads to the human reference genome.
- Each of the 3 billion bases is sequenced multiple times; the greater the depth, the more accurate the data.
- Can sequence entire genomes, specific regions (e.g., exomes) or individual genes.
- Detects small base changes (substitutions), insertions and deletions of DNA, large genomic deletions of exons or whole genes, and rearrangements such as inversions and translocations.
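The idea of depth on the slide above can be made concrete with a back-of-envelope estimate: mean depth is roughly the number of reads times the read length, divided by the size of the sequenced target. The sketch below assumes uniform coverage; the function name, the read count and the 150-base read length are illustrative assumptions, not values from the slides.

```python
def mean_depth(n_reads: int, read_length: int, target_size: int) -> float:
    """Average number of times each base is covered, assuming uniform coverage."""
    return n_reads * read_length / target_size


GENOME_SIZE = 3_000_000_000  # approximate human genome size quoted on the slide
READ_LENGTH = 150            # assumed read length, for illustration only

depth = mean_depth(600_000_000, READ_LENGTH, GENOME_SIZE)
print(f"Mean depth: {depth:.0f}x")
```

Real coverage is uneven, so the per-base depth varies around this average.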

Rare Genetic Variants
- < 0.5% frequency in the population.
- Variation is younger.
- Mutations arise every generation at a rate of 1.1-3 × 10⁻⁸ per base. Given ~3 × 10⁹ bases in the human genome, a person should have, on average, 30-100 de novo mutations.
- Increased population specificity: sharing of rare variants is about 10-30% among populations on different continents and 70-80% within the same continent.
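The de novo mutation estimate on the slide above is a simple product of the per-base mutation rate and the genome size. A minimal sketch of that arithmetic, using the slide's own figures (the function name is an assumption):

```python
def expected_de_novo(rate_per_base: float, genome_size: float) -> float:
    """Expected number of new mutations per generation."""
    return rate_per_base * genome_size


GENOME_SIZE = 3e9  # approximate number of bases, as on the slide

low = expected_de_novo(1.1e-8, GENOME_SIZE)   # lower bound of the quoted rate
high = expected_de_novo(3.0e-8, GENOME_SIZE)  # upper bound of the quoted rate
print(f"Expected de novo mutations: about {low:.0f} to {high:.0f}")
```

This gives roughly 33 to 90, in line with the 30-100 range quoted on the slide.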

Rare Genetic Variants - Exomes
- Protein-coding regions of genes; 1-2% of the human genome.
- Stronger negative selection on rare alleles in coding regions than in intergenic regions.
- Deep exome sequencing study (Tennessen et al., 2012) of 1351 Europeans and 1088 African Americans showed:
  - Most variants were rare (86% had a minor allele frequency < 0.5%).
  - 82% of variants were previously unknown.
  - 82% were population specific.
- Due to the additive effects of explosive, recent accelerated population growth and weak purifying selection.

Genetic variation in general cognitive ability (g)
- g is related to fitness traits, so the genetic variation we observe today has likely been affected by directional selection.
- Genetic variation can be maintained when new, mostly deleterious mutations occur at a rate equal to the speed with which they are removed by selection: mutation-selection balance.
- This predicts no common variants of large effect, as supported by GWAS.
- GCTA shows that at least half of the genetic variance is due to common SNPs or rare SNPs in LD with them.
- More recent mutations, including family-specific and private de novo genetic variants, could explain the remaining genetic variance.

2. Rare variants and g

Sample Selection
Generation Scotland: Scottish Family Health Study
- High g: PCA of summed Logical Memory (immediate and delayed), Digit Symbol, Verbal Fluency, and Mill Hill Vocabulary scores.
- The first unrotated principal component explained 42% of the variance; it was used to form the composite score, g.
- Age- and sex-residualised g scores were used.
- Top 76 female g scores and top 74 male g scores selected, 2.34 to 3.97 SD from the mean.
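The extreme-group selection described above can be sketched as follows: standardise the composite scores, then take the top n scorers within each sex. This is only an illustration under stated assumptions; it presumes the scores have already been residualised for age and sex, and the function names and toy data are hypothetical.

```python
from statistics import mean, pstdev


def standardise(scores):
    """Convert raw scores to z-scores (SD units from the mean)."""
    m, s = mean(scores), pstdev(scores)
    return [(x - m) / s for x in scores]


def top_n_per_sex(ids, sexes, z_scores, n):
    """Return the ids of the n highest-scoring individuals within each sex."""
    selected = {}
    for sex in sorted(set(sexes)):
        group = [(z, i) for i, sx, z in zip(ids, sexes, z_scores) if sx == sex]
        group.sort(reverse=True)  # highest z-score first
        selected[sex] = [i for _, i in group[:n]]
    return selected


# Toy data: six hypothetical participants
ids = ["a", "b", "c", "d", "e", "f"]
sexes = ["F", "F", "F", "M", "M", "M"]
z = standardise([1.0, 5.0, 3.0, 2.0, 8.0, 4.0])
print(top_n_per_sex(ids, sexes, z, n=1))
```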

Sample Selection
- Control Group 1 (GS): major depression cases (81) or relatives of cases (27) with age- and sex-residualised g scores < 0.34 SD from the mean.
- Control Group 2 (GS): 223 obesity controls.
- Control Group 3: Scottish cancer patients reporting no education (N = 123) or a high school senior certificate equivalent with low/intermediate SES based on postcode (N = 32).

Exome-sequencing
- Exon capture: Illumina HiSeq machine (average read depth 38x and 86x in Generation Scotland and 39x in cancer controls).
- Read alignment to the 1000G (v37) reference genome (Li & Durbin, 2009).
- Genotype calls: GATK's UnifiedGenotyper (DePristo et al., 2011).
- Putative false positive SNP calls filtered out using GATK's VariantRecalibrator algorithm.
- After quality control filtering, variants were annotated using SnpEff (version 2.0.5; Cingolani et al., 2012).

Analysis
- Case (high g) vs control (low to average g) design.
- Lack of power to detect effects of individual single nucleotide variants (SNVs). Do multiple SNVs in protein-coding genes contribute to the trait of interest?
  - Burden: collapse rare variants into a single burden variable.
  - C-alpha: allows rare variants to be a mixture of phenotypically deleterious, protective and neutral variants.
Biological Pathways Analysis
- GOrilla: enrichment for genes in specific biological pathways.
  - Biological processes: molecular events with a defined beginning and end.
  - Molecular functions: activities that occur at the molecular level.
  - Cellular components: within an anatomical structure or gene product group.
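To illustrate what collapsing rare variants into a single burden variable means, the sketch below sums each individual's minor-allele counts over variants below a frequency threshold. It is a simplified illustration, not the pipeline used in the study; the function name, the genotype coding (0, 1 or 2 minor alleles) and the toy data are assumptions.

```python
def burden_scores(genotypes, mafs, max_maf=0.01):
    """Per-individual count of minor alleles at variants with MAF below max_maf.

    genotypes: one row per individual, one column per variant (0, 1 or 2).
    mafs: minor allele frequency of each variant, in column order.
    """
    rare_columns = [j for j, maf in enumerate(mafs) if maf < max_maf]
    return [sum(row[j] for j in rare_columns) for row in genotypes]


# Toy data: three hypothetical individuals, three variants in one gene
genotypes = [
    [0, 1, 2],
    [1, 0, 0],
    [0, 0, 1],
]
mafs = [0.004, 0.20, 0.008]  # the second variant is common and is excluded

print(burden_scores(genotypes, mafs))
```

The resulting burden variable can then be compared between cases and controls, gene by gene.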

All variants
- Analysis of all SNVs with <1% frequency in the cases versus combined controls included 24,514 gene sets comprising 339,231 variants.
- No significant associations were found after family-wise error rate (FWER) correction.
- Pathways analysis revealed no enrichment for gene ontology terms (16,205 genes associated with a GO term) after FWER correction.
- Similar results for <5% frequency.
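The slides do not say which FWER-controlling procedure was applied. As one common example, the Holm step-down procedure compares the sorted p-values against progressively less strict thresholds. The sketch below is an illustration only; the function name and the p-values are hypothetical.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure: returns, per test, whether it is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value (rank k-1) is compared with alpha / (m - k + 1)
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject


# Hypothetical gene-level p-values
print(holm_reject([0.01, 0.04, 0.03, 0.005]))
```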

Non-Synonymous, Splice and Frameshift Variants
- Variants with frequency <1% (134,751 variants in 20,791 gene sets) or <5% were not significant.

Synonymous Variants
- No gene associations were found for variants with frequency <1% (73,738 variants in 18,533 gene sets) or <5% (84,374 variants in 19,135 gene sets).
- Gene ontology analysis was not significant.

Burden
- Total minor alleles with <1% frequency per individual ranged from 765 to 2,544; high g cases (M = 953.11, SD = 102.74) were higher than controls (M = 933.11, SD = 87.56).
- Total minor alleles with <5% frequency per individual ranged from 2,265 to 4,479; high g cases (M = 2,564.18, SD = 126.76) were higher than controls (M = 2,537.56, SD = 123.56).
- Burden tests including only non-synonymous variants (total N ranging 614-1192 for <5% frequency and 175-705 for <1% frequency) were not significant.

Limitations
- Control samples not ideal.
- Population based: can't identify de novo mutational influences.
- Extreme-trait designs are important for identifying variants that are rare and that have modest to high effect sizes.
Future
- Plomin, Hsu, and Bowen: 1600 from the Study of Mathematically Precocious Youth and 500 recruited online; 4000 controls from the UK10K Project.
- 'Project Einstein' (Rothberg & Tegmark): sequencing of 400 mathematicians and theoretical physicists.

3. Rare variants and years of education

Research aim
- Ultra-rare inherited and de novo disruptive variants in highly constrained (HC) genes are enriched in neurodevelopmental disorders (autism, schizophrenia).
- H1: these variants influence general cognitive abilities, measured indirectly by years of education (YOE).
- 14,133 individuals with whole exome or genome sequencing data.

URVs
- Ultra-rare variants (URVs): variants observed only once (singletons) across each study and not observed in the 60,706 exomes sequenced in the Exome Aggregation Consortium.
- Variant classes, chosen to maximize the expected deleteriousness of the included variants (due to purifying selection):
  - Disruptive: putative loss-of-function variants, including premature stop codons, essential splice site mutations and frameshift indels; 1 or more observed in 25%.
  - Damaging: missense variants classified as damaging by 7 different in silico prediction algorithms; 24%.
  - Negative control: synonymous variants not predicted to change the encoded protein; 78%.
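The URV definition above amounts to a two-step filter: keep variants seen exactly once in the study, then drop any that also appear in the reference panel. A minimal sketch, assuming variants are represented as simple string identifiers; the function name and the example identifiers are hypothetical.

```python
def ultra_rare_variants(study_allele_counts, reference_panel_variants):
    """Variants that are singletons in the study and absent from the reference panel.

    study_allele_counts: dict mapping variant id -> number of copies seen in the study.
    reference_panel_variants: set of variant ids seen in the reference panel.
    """
    return sorted(
        variant
        for variant, count in study_allele_counts.items()
        if count == 1 and variant not in reference_panel_variants
    )


# Hypothetical variant identifiers (chromosome:position:ref:alt)
study_counts = {"1:1000:A:G": 1, "1:2000:C:T": 1, "2:3000:G:A": 5}
reference_panel = {"1:2000:C:T"}

print(ultra_rare_variants(study_counts, reference_panel))
```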

Analysis
- Generalized linear regression model testing for association of YOE with the number of disruptive or damaging URVs in HC genes, controlling for year of birth, sex, the first 10 ancestry principal components, and schizophrenia status.
- Results meta-analysed across studies.
- Gene-expression data used to restrict to genes enriched for brain expression.
- Gene-based burden test implemented in SKAT (sequence kernel association test), using an exome-wide significance threshold of p < 1 × 10⁻⁶.
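The slides do not state how the per-study results were combined. One standard choice is a fixed-effect, inverse-variance-weighted meta-analysis, in which each study's effect estimate is weighted by the inverse of its squared standard error. The sketch below assumes that method; the function name and the input numbers are hypothetical.

```python
from math import sqrt


def inverse_variance_meta(betas, standard_errors):
    """Fixed-effect meta-analysis: pooled effect and its standard error."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se


# Hypothetical per-study effects of one additional URV on years of education
beta, se = inverse_variance_meta([-0.30, -0.20], [0.10, 0.10])
print(f"Pooled effect: {beta:.3f} (SE {se:.3f})")
```

Studies with smaller standard errors contribute more to the pooled estimate.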

Result: 3.1 months less education for each additional mutation

4. Sequencing the Wellderly

Wellderly sequencing study
- Healthy ageing is a complex polygenic trait, related to but distinct from longevity.
- Healthy ageing is associated with decreased genetic risk for select diseases.
- Healthy ageing is potentially linked to protection against cognitive decline.

The Wellderly
- Aged >80 years, with no chronic diseases and not on chronic medication.

Methods
- Whole genome sequences of 600 Wellderly (56x) compared with 1,507 adults from the Inova Translational Medicine Institute (ITMI) aged 20 to 44 years (55x).
- After restricting to >94% European ancestry and a maximum relatedness of 12.5%: 511 Wellderly vs 686 ITMI.
- ~57 million raw variants reduced to 24,205,551 after filtration.

Results
- Longevity variants did not differ in frequency between Wellderly and controls (ITMI or 1000 Genomes European sample).
- No difference in cancer, stroke or type 2 diabetes genetic risk.
- Lower genetic risk for Alzheimer disease (P = 9.84 × 10⁻⁴) and coronary heart disease (P = 2.54 × 10⁻³).
- No common variants associated in GWAS, correcting for population stratification.
- Top region contained SNPs associated with cognitive traits.
- Rare monogenic disease variants (pathogenic cancer and hereditary dementia variants, <0.5% frequency) were unrelated to Wellderly status.

Results
- Rare coding variants (MAF < 1%) tested using the SKAT-O method.
- No genome-wide significant associations (correcting for 10,447 individual gene tests).
- Top gene was COL25A1 (P = 1.56 × 10⁻⁵).
- 9 ultra-rare variants carried by 10 Wellderly individuals: 8 variants observed as singletons and 1 observed in two individuals.
- None of these variants was observed in the ITMI sample.
- Many of the mutations result in highly non-conservative amino acid substitutions.
- COL25A1 is a brain-specific, secreted collagenous protein associated with amyloid plaques.
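To see why the COL25A1 result falls short of significance, the sketch below computes the per-test threshold implied by 10,447 gene tests. It assumes a Bonferroni correction at an overall alpha of 0.05, which the slide does not state explicitly; the function name is an assumption.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


N_GENE_TESTS = 10_447  # number of gene tests reported on the slide
P_COL25A1 = 1.56e-5    # top gene-level p-value reported on the slide

threshold = bonferroni_threshold(0.05, N_GENE_TESTS)
print(f"Threshold: {threshold:.2e}")
print(f"COL25A1 significant: {P_COL25A1 <= threshold}")
```

Under this assumption the threshold is about 4.8 × 10⁻⁶, so the top gene's p-value does not pass it, consistent with the slide's report of no genome-wide significant associations.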

Prospects of NGS
- 100s of TB of data.
- SNP and structural variants.
- Meta-analyses to increase power.
- Functional genomics: gene expression profiling, genome annotation, small ncRNA discovery and profiling, and detection of aberrant transcription.

Questions?