We processed six samples in triplicate using 11 different array platforms at one or two laboratories. We obtained measures of array signal variability.

Presentation transcript:

We processed six samples in triplicate using 11 different array platforms at one or two laboratories. We obtained measures of array signal variability based on raw data before CNV calling. The data sets were analyzed with one or more CNV calling algorithms to determine the number of calls, between-replicate reproducibility and size distribution. We compared the CNV calls to well-characterized and validated sets of variants, in order to examine the propensity for false-positive and false-negative calls and to estimate the accuracy of CNV boundaries.

Measures of array variability and signal-to-noise ratio
The derivative log2 ratio spread statistic describes the absolute value of the log2 ratio variance from each probe to the next, averaged over the entire genome. The interquartile range is a measure of the dispersion of intensities in the center of the distribution, and is therefore less sensitive to outliers.
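The two statistics above can be illustrated with a minimal sketch. It follows the verbal description literally (mean absolute change in log2 ratio between neighbouring probes, and the interquartile range of a set of values); the function names are hypothetical, the input is assumed to be a list of log2 ratios already sorted by genomic position, and vendor software may scale or estimate these quantities differently.

```python
from statistics import quantiles


def adjacent_spread(log2_ratios):
    """Mean absolute log2-ratio difference between neighbouring probes.

    Assumes the values are ordered by genomic position (an assumption of
    this sketch, not something the array software guarantees).
    """
    if len(log2_ratios) < 2:
        raise ValueError("need at least two probes")
    diffs = [abs(b - a) for a, b in zip(log2_ratios, log2_ratios[1:])]
    return sum(diffs) / len(diffs)


def interquartile_range(values):
    """Distance between the first and third quartiles of the values."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


# A single extreme probe changes the adjacent-probe spread a lot,
# but leaves the interquartile range untouched.
clean = [0, 1, 2, 3, 4]
with_outlier = [0, 1, 2, 3, 400]
print(adjacent_spread(clean), adjacent_spread(with_outlier))
print(interquartile_range(clean), interquartile_range(with_outlier))
```

The comparison at the end shows why the interquartile range is described as less sensitive to outliers: it only looks at the central half of the distribution.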

The data show a correlation between probe length and variability, with longer probes giving less variance in fluorescence intensity. For Affymetrix results, we found large differences in quality control values depending on the baseline reference library used. CGH arrays generally show better signal-to-noise ratios than SNP arrays, probably as a consequence of the longer probes on the former platform. Older arrays tend to perform less well than newer arrays from the same producer. For the Affymetrix 500K platform, experiments performed for the 250K Sty assay failed quality control.

CNV calling
The platforms with higher resolution, as well as those specifically designed for detection of previously annotated CNVs, identified substantially higher numbers of variants than lower-resolution arrays. Arrays that specifically target CNVs detected in previous studies (e.g., Illumina 660W) have a very uniform distribution of the number of probes per CNV call compared to arrays such as Illumina 1M and Affymetrix 6.0. Another aspect of the CNV calls that differs widely between platforms is the ratio of copy number gains to losses. Certain arrays are strongly biased toward detection of deletions, with the Illumina 660W showing the highest ratio of deletions to duplications. For platforms with a higher resolution, a lower proportion of CNVs overlap genes.

Between-replicate CNV reproducibility
A CNV call was considered replicated if there was a reciprocal overlap of at least 80% between CNVs in a pairwise comparison. Reproducibility was measured by calculating the sample-level Jaccard similarity coefficient. The reproducibility is <70% for most platform and algorithm combinations. In general, the most recently released arrays perform better, resulting in more CNV calls reproducibly obtained between replicates.
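The replication criterion and the Jaccard coefficient can be sketched as follows. This is an illustration under stated assumptions, not the study's actual pipeline: calls are represented as hypothetical `(chromosome, start, end)` tuples with half-open coordinates, each call in one replicate is matched to at most one call in the other, and the Jaccard coefficient is taken as matched calls divided by the total number of distinct calls across both replicates.

```python
def reciprocal_overlap(a, b, min_frac=0.8):
    """True if the shared span covers at least min_frac of BOTH calls."""
    if a[0] != b[0]:  # different chromosomes never overlap
        return False
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return False
    return (overlap / (a[2] - a[1]) >= min_frac
            and overlap / (b[2] - b[1]) >= min_frac)


def jaccard(calls_a, calls_b, min_frac=0.8):
    """Sample-level Jaccard coefficient for two replicate call sets."""
    unused = list(calls_b)
    matched = 0
    for a in calls_a:
        for b in unused:
            if reciprocal_overlap(a, b, min_frac):
                unused.remove(b)  # one-to-one matching (assumption)
                matched += 1
                break
    union = len(calls_a) + len(calls_b) - matched
    return matched / union if union else 0.0


replicate_1 = [("chr1", 100, 200), ("chr2", 0, 100)]
replicate_2 = [("chr1", 110, 200)]
print(jaccard(replicate_1, replicate_2))
```

Requiring the overlap fraction on both calls is what makes the criterion reciprocal: a small call nested inside a much larger one does not count as replicated.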

We observed that the variability in CNV calls was larger when different CNV calling algorithms were used on the same raw data than when the same algorithm was used on data from different laboratories. Results originating from different laboratories tend to cluster together, indicating that the site where the experiment was performed has less effect on the resulting data than the choice of platform or algorithm. Interlaboratory variability correlates with reproducibility, and platforms exhibiting high reproducibility in replicates also seem more robust to interlaboratory variability. The exceptions are the Affymetrix arrays, where CNV calls are highly dependent on the reference data set used for analysis. The sample-level concordance of CNV calls between any combination of two algorithms is typically 25–50% within a platform, and even lower for comparisons across platforms.

Reproducibility is generally similar for large (>50 kb) and small (1–50 kb) CNVs. We note that the reproducibility of large CNV calls is disproportionately affected by genome complexity, as they tend to overlap SegDups to a larger extent than small CNVs. The fraction of calls missed by each platform (of the regions detected by at least two other arrays) ranges from 15% for Agilent 2×244K to ~73–77% for Illumina SNP arrays. These differences between platforms may to some extent be explained by overlap with SegDups. We also find that many of the calls missed by SNP arrays but detected by CGH arrays are duplications.
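The missed-call fraction described above can be sketched as below. The sketch assumes that overlapping calls have already been merged into shared region identifiers (a hypothetical preprocessing step) and that each platform is described by the set of region identifiers it detected; the platform names and numbers in the example are made up.

```python
from collections import Counter


def missed_fraction(detected):
    """For each platform, the fraction of regions seen by at least two
    OTHER platforms that the platform itself did not call.

    detected: dict mapping platform name -> set of region identifiers.
    Returns None for a platform when no such reference regions exist.
    """
    result = {}
    for platform, own in detected.items():
        support = Counter()
        for other, regions in detected.items():
            if other != platform:
                support.update(regions)
        reference = {r for r, n in support.items() if n >= 2}
        if reference:
            result[platform] = len(reference - own) / len(reference)
        else:
            result[platform] = None
    return result


# Toy input: four hypothetical platforms, regions labelled 1 to 4.
example = {"A": {1, 2, 3}, "B": {1, 2, 4}, "C": {1, 4}, "D": {2}}
print(missed_fraction(example))
```

Excluding the platform under evaluation when building the reference set keeps a platform from being scored against its own calls.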

Comparison to independent CNV data sets
To estimate the accuracy of CNV calls, we compared the variants from each array and algorithm to existing CNV maps (DGV; CNV calls from HapMap samples; paired-end sequencing). For most platforms, at least 50% of the variants have been previously reported. There is better overlap with regions previously reported by array studies than with regions originating from sequencing studies, which might be expected as all our CNV data stem from arrays. It is important to note that all samples included in the current study have also been interrogated in other studies represented in DGV using different platforms. This likely contributes to a higher overlap than would be found with previously uncharacterized samples.

Estimation of breakpoint accuracy
Algorithms with predefined variant sets (e.g., Birdsuite) and algorithms searching for clustering across samples (such as iPattern) show substantially better reproducibility in breakpoint estimation for common CNVs than do algorithms treating each sample as an independent analysis. The data show that all platforms tend to underestimate the size of CNVs. This might be expected as the results reported for each algorithm correspond to the last probes within the variant showing evidence of a copy number difference. Arrays targeting known CNVs show the highest accuracy, as a result of their high probe density at these loci.
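The link between probe density and size underestimation can be shown with a small sketch. It assumes an idealized, noise-free array in which every probe inside the true variant shows a copy number difference, so the reported interval runs from the first to the last such probe; the function name and coordinates are hypothetical.

```python
from bisect import bisect_left, bisect_right


def reported_span(probe_positions, true_start, true_end):
    """Outermost probes lying inside the true CNV (inclusive coordinates).

    probe_positions must be sorted. Returns None if no probe falls
    inside the variant, i.e. the CNV is invisible to the array.
    """
    lo = bisect_left(probe_positions, true_start)
    hi = bisect_right(probe_positions, true_end) - 1
    if lo > hi:
        return None
    return probe_positions[lo], probe_positions[hi]


true_cnv = (250, 750)
sparse = [100, 300, 500, 700, 900]     # one probe every 200 bp
dense = list(range(100, 1000, 50))     # one probe every 50 bp

print(reported_span(sparse, *true_cnv))  # interval shorter than the true CNV
print(reported_span(dense, *true_cnv))   # boundaries recovered
```

Because the reported interval can never extend beyond the outermost probes inside the variant, the estimated size is at most the true size, and the gap shrinks as probe density at the locus increases.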

Discussion
The newer arrays, with a subset of probes specifically targeting CNVs, outperformed older arrays both in terms of the number of calls and the reproducibility of those calls. Different algorithms give substantially different quantity and quality of CNV calls. Algorithms developed specifically for a certain data type (e.g., Birdsuite for Affymetrix 6.0 and DNA Analytics for Agilent data) generally perform better than platform-independent algorithms (e.g., Nexus Copy Number) or tools that have been readapted for newer versions of an array (e.g., dCHIP on Affymetrix 6.0 data).