DNA Copy Number Analysis Qunyuan Zhang Division of Statistical Genomics Department of Genetics & Center for Genome Sciences Washington University School.


DNA Copy Number Analysis
Qunyuan Zhang
Division of Statistical Genomics, Department of Genetics & Center for Genome Sciences, Washington University School of Medicine
2010 GEMS Course: M Computational Statistical Genetics

What is Copy Number?
Gene Copy Number
The gene copy number (also "copy number variants" or CNVs) is the number of copies of a particular gene in the genotype of an individual. Recent evidence shows that the gene copy number can be elevated in cancer cells. For instance, the EGFR copy number can be higher than normal in non-small cell lung cancer. … Elevating the copy number of a particular gene can increase the expression of the protein that it encodes. (From Wikipedia)

DNA Copy Number
A Copy Number Variant (CNV) represents a copy number change involving a DNA fragment that is ~1 kilobase or larger. (From Nature Reviews Genetics, Feuk et al.)
- DNA Copy Number ≠ DNA Tandem Repeat Number (e.g. microsatellites, <10 bases)
- DNA Copy Number ≠ RNA Copy Number; RNA Copy Number = Gene Expression Level (DNA → transcription → mRNA)
Copy number is the number of copies of a particular fragment of a nucleic acid chain. In most publications it refers to DNA copy number.

Why study Copy Number?
Motive 1: Genetic Polymorphisms
- restriction fragment length polymorphism (RFLP)
- amplified fragment length polymorphism (AFLP)
- random amplification of polymorphic DNA (RAPD)
- variable number of tandem repeats (VNTR; e.g., mini- and microsatellites)
- single nucleotide polymorphism (SNP)
- presence/absence of transposable elements
- …
- structural alterations (deletions, duplications, insertions, inversions, …)
- DNA copy number variant (CNV)
Association with phenotypes/diseases, genes/genetic factors

Motive 2: Genetic Aberrations in Tumor Cells
Mutation, LOH, Copy Number Aberration (CNA)
- Homologous repeats
- Segmental duplications
- Chromosomal rearrangements
- Duplicative transpositions
- Non-allelic recombinations
- …
[Figure: a normal cell (CN=2) compared with tumor cells showing deletion and amplification; labeled states CN=0, CN=1, CN=2, CN=3, CN=4]

How to measure/quantify Copy Number?
Quantitative Polymerase Chain Reaction (Q-PCR): DNA amplification (dNTPs, primers, Taq polymerase, fluorescent dye); one fragment at a time
- less CN → PCR amplification → less DNA → low fluorescent intensity
- more CN → PCR amplification → more DNA → high fluorescent intensity
Microarray: DNA hybridization; multiple/different fragments in a mixed pool, amplified by PCR (dNTPs, primers, Taq polymerase, fluorescent dye)
- less CN → PCR amplification → less DNA → hybridization to arrayed probes → low intensities
- more CN → PCR amplification → more DNA → hybridization to arrayed probes → high intensities

Array Comparative Genomic Hybridization (CGH)
Tumor: red intensity; Normal: green intensity
- Red < Green: Deletion (CN<2)
- Red > Green: Amplification (CN>2)
- Red = Green: No Alteration (CN=2)
More DNA copy number → more DNA hybridization → higher intensity
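The colour rule above amounts to comparing the tumor-to-normal log ratio with zero. A minimal Python sketch, assuming a hypothetical tolerance `tol` (not given in the slides) to absorb noise around Red = Green:

```python
import math

def call_cgh(red, green, tol=0.2):
    """Classify one array CGH spot from tumor (red) and normal (green) intensity.

    tol is a hypothetical noise tolerance on the log2 ratio, not a value from the slides.
    """
    log_ratio = math.log2(red / green)
    if log_ratio < -tol:
        return "deletion"        # Red < Green, CN < 2
    if log_ratio > tol:
        return "amplification"   # Red > Green, CN > 2
    return "no alteration"       # Red = Green, CN = 2

print([call_cgh(r, g) for r, g in [(500, 1000), (1000, 1000), (2000, 1000)]])
```

In practice the tolerance would have to be chosen from the noise level of the platform.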

SNP Array
Affymetrix Mapping 250K Sty-I chip: ~250K probe sets, ~250K SNPs; probe set (24 probes)
[Figure: tumor vs. normal probe-set intensities, with deletion (CN=1, CN=0), amplification (CN>2) and unchanged (CN=2) regions]
More DNA copy number → more DNA hybridization → higher intensity

Genotyping & Copy Number Calling
- CN=0: 2 copy deletion, genotype (_//_)
- CN=1: 1 copy deletion, genotype (_//B)
- CN=2: Normal, genotype (A//B)
- CN=3: 1 copy amplification, genotype (AA//B)
- CN=4: 2 copy amplification, genotype (AA//BB)

[Figure: genotype clusters labeled BB, BBBB, AB, AABB, AA, A_]

Copy Number Analysis
- Data Pre-processing
- Individual Sample Analysis
- Population Analysis

An Example
1. Finished chips
2. (scanner) Raw image data [.DAT files] (experiment info [.EXP])
3. (image processing software) Probe level raw intensity data [.CEL files]
4. Preprocessing (chip description file [.CDF]): background adjustment, normalization, summarization
5. Summarized intensity data
6. Raw copy number (CN) data [log ratio of tumor/normal intensities]
7. Significance test of CN changes; estimation of CN; smoothing and boundary determination
8. Concurrent regions among population; amplification and deletion frequencies among populations; association analysis

Background Adjustment/Correction
- Reduces unevenness of a single chip
- Makes intensities of different positions on a chip comparable
[Figure: chip image before adjustment and after adjustment]
Corrected intensity: $S' = S - B$, where $S$ is the observed intensity and $B$ is the background intensity
For each region $i$, $B(i)$ = mean of the lowest 2% of intensities in region $i$ (Affymetrix MAS)
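The regional rule above can be sketched in a few lines of Python. This follows the slide's formula literally and is not the full Affymetrix procedure; the function name and the list-of-regions input layout are assumptions made for illustration.

```python
def background_correct(regions, fraction=0.02):
    """Subtract a per-region background from observed intensities.

    regions: list of lists, one list of observed intensities S per chip region
             (hypothetical input layout).
    fraction: share of lowest intensities used to estimate B(i); 2% on the slide.
    """
    corrected = []
    for intensities in regions:
        ordered = sorted(intensities)
        k = max(1, int(len(ordered) * fraction))    # use at least one value
        background = sum(ordered[:k]) / k            # B(i)
        corrected.append([s - background for s in intensities])  # S' = S - B
    return corrected

region = list(range(100, 200))        # 100 toy intensities in one region
print(background_correct([region])[0][:3])
```

Note that the lowest probes end up slightly negative under this literal rule, so a real pipeline would need some floor or smoothing before taking logs.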

Background Adjustment/Correction
- Eliminates non-specific hybridization signal
- Obtains accurate intensity values for specific hybridization
Methods: PM only, PM-MM, Ideal MM, etc.
[Figure labels: quartet; probe set; sense or antisense strands; 25 oligonucleotide probes]

Normalization
- Reduces technical variation between chips
- Makes intensities from different chips comparable
[Figure: intensity distributions before normalization and after normalization]
Methods: Base Line Array (linear), Quantile Normalization, etc.
Standardization: $S' = \frac{S - \text{Mean of } S}{\text{STD of } S}$, so that $S' \sim N(0,1)$
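Two of the normalization options named above can be sketched as follows. This is a minimal illustration, assuming each chip is a plain list of intensities of equal length; ties are not given special treatment, and the function names are made up for the example.

```python
import statistics

def standardize(chip):
    """S' = (S - mean of S) / (STD of S), computed within one chip."""
    mean = statistics.mean(chip)
    sd = statistics.stdev(chip)
    return [(s - mean) / sd for s in chip]

def quantile_normalize(chips):
    """Give every chip the same distribution: the across-chip mean at each rank."""
    n = len(chips[0])
    sorted_chips = [sorted(c) for c in chips]
    reference = [sum(c[r] for c in sorted_chips) / len(chips) for r in range(n)]
    result = []
    for chip in chips:
        order = sorted(range(n), key=lambda idx: chip[idx])  # probe indices by rank
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = reference[rank]
        result.append(normalized)
    return result

print(standardize([1, 2, 3]))
print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
```

Standardization only matches location and scale between chips, whereas quantile normalization forces the whole distribution to agree.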

Summarization
Combines the multiple probe intensities for each probe set to produce a summarized value for subsequent analyses.
- Average methods: PM only or PM-MM, allele specific or non-specific
- Model-based method: Li & Wong, 2001, Gene Expression Index
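The average methods listed above reduce to a one-line mean per probe set. A minimal Python sketch (the function name and arguments are illustrative; the model-based Li & Wong approach is not shown):

```python
def summarize_probe_set(pm, mm=None):
    """Average PM intensities, or PM - MM differences, across one probe set.

    pm, mm: perfect-match and (optional) mismatch intensities for the same probes.
    """
    if mm is None:
        values = list(pm)                           # PM only
    else:
        values = [p - m for p, m in zip(pm, mm)]    # PM - MM
    return sum(values) / len(values)

print(summarize_probe_set([10, 20, 30]))
print(summarize_probe_set([10, 20, 30], [1, 2, 3]))
```

For allele-specific summaries the same function could be applied separately to the A-allele and B-allele probes of a SNP.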

Raw Copy Number Data
- $S$: summarized raw intensity
- $S'$: log transformation, $S' = \log_2(S)$
- Log ratio of sample $i$ to the reference sample: $\mathrm{CN\_log2} = \log_2(S_i / S_{\mathrm{ref}})$
- $\mathrm{CN} = 2\,(S_i / S_{\mathrm{ref}})$
[Figure: distributions of S before log transformation, of log(S) after log transformation, and of raw CN]
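As a worked example of the two formulas above, the following Python sketch computes both quantities for a single SNP; the function name is illustrative only.

```python
import math

def raw_copy_number(s_sample, s_ref):
    """Return (CN_log2, CN) for one SNP from summarized intensities."""
    log_ratio = math.log2(s_sample / s_ref)   # CN_log2 = log2(S_i / S_ref)
    cn = 2 * (s_sample / s_ref)               # CN = 2 (S_i / S_ref)
    return log_ratio, cn

print(raw_copy_number(500, 1000))    # half the reference signal
print(raw_copy_number(1500, 1000))   # 1.5 times the reference signal
```

A sample with half the reference intensity gives a log ratio of -1 and CN = 1, i.e. a one-copy deletion relative to a diploid reference.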

Individual Level Analysis
- Smoothing
- Significance test of amplification and deletion
- Segmentation
- CN estimation

Sliding Window
[Figure: overlapping windows along the chromosome: Window 1, Window 2, Window 3, …, Window k, …, Window N]
Each window (k) contains n consecutive SNPs (k, k+1, k+2, k+3, …, k+n-1)
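A sliding-window mean is one simple way to use these windows for smoothing. The sketch below is a minimal Python version using 0-based indices, so window k covers values k through k+n-1; the function name is illustrative.

```python
def sliding_window_means(values, n):
    """Mean raw CN in each window of n consecutive SNPs."""
    if n <= 0 or n > len(values):
        raise ValueError("window size must be between 1 and len(values)")
    window_sum = sum(values[:n])
    means = [window_sum / n]
    for k in range(1, len(values) - n + 1):
        # slide by one SNP: add the new right-hand value, drop the old left-hand one
        window_sum += values[k + n - 1] - values[k - 1]
        means.append(window_sum / n)
    return means

print(sliding_window_means([1, 2, 3, 4, 5], 3))
```

A series of N SNPs yields N-n+1 windows, so larger windows smooth more but blur short alterations.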

Smoothing (sliding window = 30 SNPs)
[Figure: CN versus position (Mbp) on chromosome 7, with panels for Affymetrix and Illumina]

Significance Test of CN Changes
[Figure: panels of CN, SD, -log P and -log FDR versus position (Mbp)]

Window Selection (FDR < 0.05)
[Figure: CN and -log FDR versus position (Mbp), with epidermal growth factor receptor (EGFR) marked]
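The slides do not say which test or FDR procedure produced these plots. As one plausible sketch, the Python below tests each window's mean log ratio against zero with a normal approximation and then applies the Benjamini-Hochberg adjustment; both choices are assumptions, not the author's stated method.

```python
import math
import statistics

def window_p_value(log_ratios):
    """Two-sided z-test that the mean log ratio in one window is 0.

    Assumes the window is not constant (standard deviation > 0).
    """
    n = len(log_ratios)
    se = statistics.stdev(log_ratios) / math.sqrt(n)
    z = statistics.mean(log_ratios) / se
    return math.erfc(abs(z) / math.sqrt(2))

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):             # walk from largest p to smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

windows = [[0.5, 0.6, 0.4, 0.55, 0.45], [0.1, -0.1, 0.05, -0.05, 0.0]]
fdr = benjamini_hochberg([window_p_value(w) for w in windows])
print([q < 0.05 for q in fdr])   # windows passing the FDR < 0.05 selection
```

With overlapping sliding windows the tests are correlated, so the adjusted values should be read as approximate.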

Segmentation (break the chromosome into CN-homogeneous pieces)
BioConductor R packages:
- DNAcopy package: circular binary segmentation (CBS)
- GLAD package: adaptive weights smoothing (AWS)

CBS Algorithm
Markers along the chromosome: 1, 2, 3, …, i-1, i, i+1, …, j-1, j, j+1, …, n
Iterate until Zc is not significant.
Olshen et al. Biostatistics Oct;5(4):
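The slide gives only the outline of CBS, so the following Python is a simplified sketch rather than the DNAcopy implementation. It assumes the statistic compares the mean of the arc (i, j] with the mean of the remaining markers, takes Zc as the maximum over all arcs, and judges significance by permuting the markers; `alpha`, `n_perm` and `min_len` are hypothetical settings.

```python
import random

def max_arc_stat(x):
    """Return (Zc, (i, j)): the largest arc-versus-rest statistic and its arc x[i:j]."""
    n = len(x)
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
    total = prefix[n]
    best, best_ij = 0.0, (0, n)
    for i in range(n):
        for j in range(i + 1, n + 1):
            k = j - i
            if k == n:
                continue                      # the arc must leave something outside
            inside = (prefix[j] - prefix[i]) / k
            outside = (total - prefix[j] + prefix[i]) / (n - k)
            z = abs(inside - outside) / (1.0 / k + 1.0 / (n - k)) ** 0.5
            if z > best:
                best, best_ij = z, (i, j)
    return best, best_ij

def cbs(x, start=0, alpha=0.01, n_perm=200, rng=None, min_len=4):
    """Recursively split x into segments; returns (start, end) index pairs."""
    rng = rng or random.Random(0)
    n = len(x)
    if n < min_len:
        return [(start, start + n)]
    z_obs, (i, j) = max_arc_stat(x)
    shuffled, exceed = list(x), 0
    for _ in range(n_perm):                   # permutation reference distribution
        rng.shuffle(shuffled)
        if max_arc_stat(shuffled)[0] >= z_obs:
            exceed += 1
    if (exceed + 1) / (n_perm + 1) > alpha:   # Zc not significant: stop splitting
        return [(start, start + n)]
    segments = []
    for a, b in ((0, i), (i, j), (j, n)):     # recurse on the arc and both flanks
        if b > a:
            segments += cbs(x[a:b], start + a, alpha, n_perm, rng, min_len)
    return segments

signal = [0.0] * 10 + [1.0] * 10 + [0.0] * 10   # toy log ratios with one gained block
print(cbs(signal))
```

The exhaustive search over arcs is quadratic in the number of markers, which is fine for a toy example but too slow for whole-genome arrays without the speed-ups used in practice.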

CN Estimation: Hidden Markov Model (HMM)
Software: CNAT, dChip, CNAG
[Figure: along positions …, SNP_i, SNP_i+1, SNP_i+2, SNP_i+3, SNP_i+4, …, the hidden status is the unknown CN (CN=?) and the observed status is the raw CN (log ratio of intensities)]
CN estimation: finding the sequence of CN values that maximizes the likelihood of the observed raw CN.
Algorithm: Viterbi algorithm (can be iterative)
The following information/assumptions are needed:
- Background probabilities: overall probabilities of possible CN values. $P(\mathrm{CN} = x)$; $x = 0, 1, 2, 3, 4, \dots, n$ (usually $n < 10$)
- Transition probabilities: probabilities of the CN value of each SNP conditional on the previous one. $P(\mathrm{CN}_{i+1} = x_i \mid \mathrm{CN}_i = x_j)$; $x = 0, 1, 2, 3, 4, \dots,$ or $n$
- Emission probabilities: probabilities of the observed raw CN value of each SNP conditional on the hidden/unknown/true CN status. $P(\text{log ratio} < x \mid \mathrm{CN} = y) = f(x \mid \mathrm{CN} = y)$; $x$ = any real number; $y = 0, 1, 2, 3, 4, \dots,$ or $n$
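To make the three ingredients concrete, here is a toy Viterbi decoder in Python. Every number in it is a hypothetical choice for illustration: uniform background probabilities, a single `stay` probability for transitions, and Gaussian emissions centred on log2(CN/2) with a common `sd`. CN=0 is left out because its expected log ratio is undefined on this scale. It is not the model used by CNAT, dChip or CNAG.

```python
import math

def viterbi_cn(log_ratios, states=(1, 2, 3, 4), stay=0.99, sd=0.25):
    """Most likely CN sequence for observed log ratios under a toy HMM."""
    means = {s: math.log2(s / 2) for s in states}           # expected log ratio per CN
    log_stay = math.log(stay)
    log_switch = math.log((1 - stay) / (len(states) - 1))
    log_prior = math.log(1 / len(states))                   # uniform background

    def log_emit(x, s):
        # log of a Gaussian density with mean means[s] and standard deviation sd
        return -0.5 * ((x - means[s]) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

    def log_trans(p, s):
        return log_stay if p == s else log_switch

    score = {s: log_prior + log_emit(log_ratios[0], s) for s in states}
    back = []
    for x in log_ratios[1:]:
        new_score, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: score[p] + log_trans(p, s))
            pointers[s] = prev
            new_score[s] = score[prev] + log_trans(prev, s) + log_emit(x, s)
        back.append(pointers)
        score = new_score
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):                         # trace back the best path
        state = pointers[state]
        path.append(state)
    return path[::-1]

observed = [0.0] * 5 + [-1.0] * 5 + [0.58] * 5
print(viterbi_cn(observed))
```

A high `stay` probability discourages state changes, so isolated noisy SNPs are absorbed into the surrounding segment rather than called as separate alterations.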

HMM Results (An Example)
[Figure legend: Black: normal intensities; Red: tumor intensities; Green: tumor minus normal; Blue: HMM-estimated CNs in tumor tissue. Labeled segments: CN=2, CN=1, CN=4, CN=3]

References for Single Sample Analysis
- Hsu et al. Denoising array-based comparative genomic hybridization data using wavelets. Biostatistics 6:
- Hupe et al. Analysis of array CGH data: from signal ratio to gain and loss of DNA regions. Bioinformatics 20:
- Jong et al. Breakpoint identification and smoothing of array comparative genomic hybridization data. Bioinformatics 20:
- Lai et al. Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data. Bioinformatics 21:
- Lai et al. A statistical method to detect chromosomal regions with DNA copy number alterations using SNP-array-based CGH data. Comput Biol Chem 29:
- Olshen et al. Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics 5:
- Picard et al. A statistical approach for array CGH data analysis. BMC Bioinformatics 6: 27.
- Shah et al. Modeling recurrent DNA copy number alterations in array CGH data. Bioinformatics 23: i
- Nilsson et al. Bioinformatics Apr 15;25(8): Epub 2009 Feb

Population Level Analysis
Common/Recurrent Region Identification
[Figure: copy number profiles across samples. Nature 2007, 450]

Genome-wide Raw Copy Number Changes (sliding window plot, averaged over ~400 pairs)

Frequency Test
Permutation test
STAC: Diskin et al., Genome Res 16
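The idea of a frequency-based permutation test can be sketched as below. This is a simplified stand-in and not the STAC algorithm: it assumes a 0/1 aberration call per sample and position, shuffles positions independently within each sample, and compares each observed frequency with the maximum frequency seen in the shuffled data. All names and settings are hypothetical.

```python
import random

def frequency_permutation_test(calls, n_perm=500, seed=0):
    """Per-position p-values for how often an aberration recurs across samples.

    calls: list of per-sample 0/1 lists (1 = aberration called at that position).
    """
    rng = random.Random(seed)
    n_pos = len(calls[0])
    observed = [sum(sample[j] for sample in calls) for j in range(n_pos)]
    exceed = [0] * n_pos
    for _ in range(n_perm):
        shuffled = []
        for sample in calls:
            row = list(sample)
            rng.shuffle(row)                  # break positional alignment across samples
            shuffled.append(row)
        null_max = max(sum(row[j] for row in shuffled) for j in range(n_pos))
        for j in range(n_pos):
            if null_max >= observed[j]:
                exceed[j] += 1
    return [(e + 1) / (n_perm + 1) for e in exceed]

# Toy data: 10 samples, 20 positions; all samples share position 5,
# and each has one additional private aberration.
calls = []
for r in range(10):
    row = [0] * 20
    row[5] = 1
    row[(r + 6) % 20] = 1
    calls.append(row)
p = frequency_permutation_test(calls)
print(p[5], p[6])
```

Using the maximum over positions as the null statistic controls for testing many positions at once, at the cost of power for moderately recurrent regions.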

Amplitude Test
GISTIC: Beroukhim et al., Proc Natl Acad Sci U S A 104; Weir et al., Nature 2007, 450

Population-based One-step Analysis
CMDS Method: Q Zhang et al. Bioinformatics, 2009 doi: /bioinformatics/btp708

References for Multiple Sample Analysis
- (GISTIC) Beroukhim et al. Assessing the significance of chromosomal aberrations in cancer: methodology and application to glioma. Proc Natl Acad Sci U S A 104:
- (STAC) Diskin et al. STAC: A method for testing the significance of DNA copy number aberrations across multiple array-CGH experiments. Genome Res 16:
- (MSA) Guttman et al. Assessing the significance of conserved genomic aberrations using high resolution genomic microarrays. PLoS Genet 3: e143.
- (GFA) Lipson et al. Efficient calculation of interval scores for DNA copy number data analysis. J Comput Biol 13:
- (MAR) Rouveirol et al. Computation of recurrent minimal genomic alterations from array-CGH data. Bioinformatics 22:
- (CMDS) Zhang et al. CMDS: a population-based method for identifying recurrent DNA copy number aberrations in cancer from high-resolution data. Bioinformatics, 2009 doi: /bioinformatics/btp708

Sequencing Data: coverage/depth based analysis
[Figure from Nature Genetics 41, (2009)]

Sequencing Data: paired-end data based analysis
[Figure from Science 2007: Vol pp DOI: /science]

Homework
- Download the data file dsgweb.wustl.edu/qunyuan/data/cn_data.csv
- Use any published or self-developed method/software to analyze/present the data
- Write a report of your analysis
- Send to in two