Interval Scores for Quality Annotated CGH Data
Doron Lipson (1), Anya Tsalenko (2), Zohar Yakhini (1,2) and Amir Ben-Dor (2)
(1) Technion, Haifa, Israel; (2) Agilent Laboratories, Palo Alto, CA


References
1. Barrett MT, Scheffer A, Ben-Dor A, Sampas N, Lipson D, Kincaid R, Tsang P, Curry B, Baird K, Meltzer PS, Yakhini Z, Bruhn L, and Laderman S. Comparative Genomic Hybridization using Oligonucleotide Microarrays and Total Genomic DNA. PNAS 2004; 101(51).
2. Lipson D, Aumann Y, Ben-Dor A, Linial N, and Yakhini Z. Efficient Calculation of Interval Scores for DNA Copy Number Data Analysis. Ninth Annual International Conference on Research in Computational Molecular Biology (RECOMB 2005), Cambridge, MA.
3. Pollack JR, Sorlie T, Perou CM, Rees CA, Jeffrey SS, Lonning PE, Tibshirani R, Botstein D, Borresen-Dale AL, and Brown PO. Microarray Analysis Reveals a Major Direct Role of DNA Copy Number Alteration in the Transcriptional Program of Human Breast Tumors. PNAS 2002; 99(20).
4. Dehan E, Ben-Dor A, Liao W, Lipson D, Rienstein S, Simansky D, Krupsky M, Yaron P, Friedman E, Rechavi G, Perlman M, Aviram-Goldring A, Bittner M, Yakhini Z, and Kaminski N. Chromosomal Aberrations and Gene Expression Profiles in Non Small Cell Lung Cancer. In preparation.

Most human cancers arise as a result of acquired genomic instability and the subsequent evolution of clonal populations of cells with accumulated genetic errors. Accordingly, most cancers and some premalignant tissues contain multiple genomic abnormalities not present in cells of the normal tissues from which the neoplasias arose. These abnormalities include gains and losses of chromosomal regions that vary extensively in size, up to and including whole chromosomes. Increases in genomic copy number can lead to overexpression of tumor-promoting genes (oncogenes), while losses are associated with the disruption of normal cell regulatory processes (e.g., through the loss of tumor suppressor genes).
The Cancer Genome
Normal human genome: stable diploid copy number, even in most diseases (e.g., cardiovascular, neurological).
Cancer genome: multiple genome-wide chromosomal aberrations, including copy number changes and rearrangements.

Array-based Comparative Genomic Hybridization (aCGH)
DNA copy number alterations have been measured using fluorescence in situ hybridization-based techniques. The development of a genome-wide technique, Comparative Genomic Hybridization (CGH), made it possible to measure jointly the multiple chromosomal alterations present in cancer cells. Differentially labeled tumor and normal DNA are co-hybridized to normal metaphase chromosomes, and the ratios between the two labels allow changes in DNA copy number to be quantified. In a more advanced method, termed array CGH (aCGH), the metaphase chromosomes are replaced by a microarray of thousands of genomic BAC, cDNA or oligonucleotide probes, greatly enhancing the resolution at which changes in DNA copy number can be detected.
Figure: HT-29 colon carcinoma cell line [1].

The Interval Score
Let C = (c_1, …, c_n) be the vector of all log(R/G) measurements along some chromosome. If the target contains an aberration, we expect to see many consecutive positive or negative entries in C; if the target is normal, we expect no localized effects. Intuitively, we look for intervals (sets of consecutive probes) whose signal sums are significantly higher or lower than expected at random. As a null model we assume that no aberration is present in the target, so the variation in C represents only the noise of the measurement. Assuming that the measurement noise along the chromosome is normally distributed and independent for distinct probes, let µ and σ denote the mean and standard deviation of the normal genomic data.
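Under the null model above, the sum of k independent N(µ, σ²) measurements has mean kµ and standard deviation σ√k, so standardizing an interval's sum yields a statistic that is standard normal when no aberration is present. The following is a minimal Python sketch of that standardization, assuming µ and σ are already known (for example, estimated from normal reference data); the function name `interval_z` and the example vector are hypothetical.

```python
import math
from typing import Sequence

def interval_z(c: Sequence[float], start: int, end: int, mu: float, sigma: float) -> float:
    """Standardized sum of c[start..end] (0-based, inclusive) under the null model.

    If the entries are independent N(mu, sigma^2), the result is N(0, 1).
    """
    k = end - start + 1
    s = sum(c[start:end + 1])
    return (s - k * mu) / (sigma * math.sqrt(k))

# Hypothetical log2(R/G) vector with a gained segment at positions 3..5.
c = [0.1, -0.2, 0.0, 1.1, 0.9, 1.0, -0.1]
print(interval_z(c, 3, 5, mu=0.0, sigma=0.5))  # gained segment
print(interval_z(c, 0, 2, mu=0.0, sigma=0.5))  # normal segment
```

With σ = 0.5 the gained segment scores about 3.46 (that is, 6/√3), far in the tail of a standard normal, while the first three probes score close to 0. Strongly negative values would indicate a candidate deletion in the same way.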
Given an interval I spanning k probes, we define its score as:

$$S(I) = \frac{\sum_{i \in I} (c_i - \mu)}{\sigma \sqrt{k}}$$

MaxInterval Algorithm I: LookAhead
Assume we are given:
m: an upper bound on the value of a single element c_i
t: a lower bound on the maximum score
If we are currently considering an interval I = [i, …, i+k−1] with sum s = Σ_{j∈I} c_j, then, for normalized data (µ = 0, σ = 1), the score of I is:

$$S(I) = \frac{s}{\sqrt{k}}$$

The score of an extended interval I' = [i, …, i+k+x−1] is then bounded by:

$$S(I') \le \frac{s + m x}{\sqrt{k + x}}$$

We solve for the first x for which S(I') may exceed t; all shorter extensions of I can be skipped. Complexity: expected O(n^1.5) (unproved).
Figure: I has sum s and length k; its extension I' has sum at most s + mx and length k + x.

Applications: Common Aberrations
Finding common aberrations in a set of samples can be performed directly by using variants of the interval score (see [2] for details).
Figure: Chromosome 3 of 26 lung tumor samples on a mid-density cDNA array; data from Dehan et al. [4]. A common deletion is located in 3p21 and a common amplification in 3q.
Figure: Chromosomes 8 and 11 of 37 breast tumor samples on a mid-density cDNA array; data from Pollack et al. [3]. A common deletion is located in 8p and a common amplification in 11q.

Applications: Single Samples
Figure: Chromosome 16 of the HCT116 colon carcinoma cell line on a high-density oligonucleotide array (n = 5,464); data from Barrett et al. [1]. Marked loci: FRA16B, A2BP1. Axes: genomic position (Mbp) versus log2(ratio).
Figure: Chromosome 17 of several breast carcinoma cell lines on a mid-density cDNA array (n = 364); data from Pollack et al. [3]. Marked locus: ERBB2. Axes: genomic position (Mbp) versus log2(ratio).

Quality Weighted Interval Scores
Consider the vector V = ((c_1, q_1), (c_2, q_2), …, (c_n, q_n)), where at each locus i the number c_i is the measured log(R/G) and the number q_i represents the standard deviation of this particular measurement. For every i, set w_i = q_i^{-2}. For an interval I spanning k probes, compute the weighted mean:

$$\mu_I = \frac{\sum_{i \in I} w_i c_i}{\sum_{i \in I} w_i}$$

together with the variance of individual loci (σ_loci), the variance due to consistency within the interval (σ_con), and, finally, the interval score.
Figure: Chr. 17 of the MDA-MB-453 breast cancer cell-line sample; data from Barrett et al. [1].
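The quality-weighted score described above can be sketched in Python as follows. Only the weights w_i = q_i^{-2} and the weighted mean µ_I follow directly from the text. The exact expressions for σ_loci, σ_con and their combination are not spelled out, so this sketch ASSUMES: σ²_loci = 1/Σw_i (the variance of an inverse-variance weighted mean), σ²_con = the weighted spread of the c_i around µ_I divided by (k − 1), and S(I) = µ_I / √(σ²_loci + σ²_con). The function name `quality_weighted_score` is hypothetical.

```python
import math
from typing import Sequence

def quality_weighted_score(c: Sequence[float], q: Sequence[float], start: int, end: int) -> float:
    """Hypothetical quality-weighted score of the interval c[start..end] (inclusive).

    c[i] is the measured log(R/G) and q[i] its standard deviation.
    """
    cs = c[start:end + 1]
    w = [qi ** -2 for qi in q[start:end + 1]]  # w_i = q_i^(-2)
    total_w = sum(w)
    k = len(cs)
    mean = sum(wi * ci for wi, ci in zip(w, cs)) / total_w  # weighted mean mu_I

    # ASSUMPTION: variance of the weighted mean implied by the per-locus SDs.
    var_loci = 1.0 / total_w
    # ASSUMPTION: weighted spread of the measurements around mu_I, scaled
    # like the variance of a mean; zero for a single probe.
    if k > 1:
        spread = sum(wi * (ci - mean) ** 2 for wi, ci in zip(w, cs)) / total_w
        var_con = spread / (k - 1)
    else:
        var_con = 0.0
    # ASSUMPTION: combine both sources of uncertainty in a z-like score.
    return mean / math.sqrt(var_loci + var_con)

consistent = quality_weighted_score([1.0, 1.0, 1.0, 1.0], [1.0] * 4, 0, 3)
scattered = quality_weighted_score([0.0, 2.0, 0.0, 2.0], [1.0] * 4, 0, 3)
print(consistent, scattered)
```

Both example intervals have the same weighted mean, but under these assumptions the scattered one scores lower (about 1.31 versus 2.0) because the consistency term penalizes disagreement within the interval. A probe with a large q_i has a small weight and therefore contributes little to either the mean or the spread.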
Chr. 17 of MDA-MB-453: an analysis using the simple interval score is compared with an analysis that accounts for both the signal consistency within the interval (σ_con) and the single-locus variance (σ_loci). Note the difference in the aberrations called for the genomic regions 58-75 Mbp and 8-15 Mbp. Radii of the data points are proportional to w_i.

Identification and Mapping of Genomic Alteration Events
A common first step in analyzing DNA copy number data consists of identifying aberrant (amplified or deleted) regions in each individual sample. Given a series of log(R/G) measurements along some genomic region, e.g. a chromosome, we would like to identify intervals within this vector that consistently contain significantly high values (amplifications) or significantly low values (deletions).
Figure: log2(R/G) versus genomic position, with a deletion and an amplification marked.

The MaxInterval Problem
For convenience of algorithmic analysis, we define the MaxInterval problem of finding the maximal scoring interval. Other intervals with high scores may be found by recursively calling this procedure.
Input: a vector C = (c_1, …, c_n).
Output: an interval I ⊆ [1…n] that maximizes S(I).

MaxInterval Algorithm II: Geometric Family Approximation (GFA)
For ε > 0, define a geometric family of intervals Θ, composed of sub-families Θ(j), each parameterized by an interval length k_j and a spacing δ_j.
Figure: intervals from the sub-families Θ(j_1), Θ(j_2) and Θ(j_3).
Theorem [2]: Let I* be the optimal scoring interval, and let J be the leftmost longest interval of Θ fully contained in I*. Then S(J) ≥ S(I*)/α, where α is an approximation factor determined by ε.
Complexity: O(n).

Benchmarking
Benchmarking results of the Exhaustive, LookAhead and GFA algorithms on synthetic vectors of varying lengths. Linear regression suggests that the complexities of the Exhaustive, LookAhead and GFA algorithms are O(n^2), O(n^1.5) and O(n), respectively.
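The three MaxInterval algorithms compared in the benchmark can be sketched in Python as below. This is a sketch under stated assumptions, not the authors' implementation. It assumes the data are already normalized (µ = 0, σ = 1, so S(I) = s/√k) and searches for amplifications; deletions can be found by running it on the negated vector. The LookAhead jump solves (s + mx)/√(k + x) > t as a quadratic in x and then corrects for floating-point rounding. Because every c_i ≤ m, the bound is increasing in x, so all extensions before the jump are safely skipped. For GFA, the concrete family used here is an ASSUMPTION chosen for illustration and is not necessarily the exact family from [2]: lengths k_j = ⌈(1+ε)^j⌉ with start positions spaced δ_j = max(1, ⌊ε·k_j⌋) apart. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Result = Tuple[float, int, int]  # (score, start, end), 0-based and inclusive

def prefix_sums(z: List[float]) -> List[float]:
    p = [0.0]
    for v in z:
        p.append(p[-1] + v)
    return p

def exhaustive(z: List[float]) -> Result:
    """O(n^2): score every interval using prefix sums."""
    p, n = prefix_sums(z), len(z)
    return max(((p[i + k] - p[i]) / math.sqrt(k), i, i + k - 1)
               for i in range(n) for k in range(1, n - i + 1))

def _first_jump(s: float, k: int, m: float, t: float) -> int:
    """Smallest x >= 1 with (s + m*x) / sqrt(k + x) > t (requires m > 0, t > 0)."""
    bound = lambda x: (s + m * x) / math.sqrt(k + x)
    a, b, c = m * m, 2 * s * m - t * t, s * s - t * t * k
    root = (-b + math.sqrt(max(0.0, b * b - 4 * a * c))) / (2 * a)
    x = max(1, math.ceil(root))
    while x > 1 and bound(x - 1) > t:  # undo floating-point overshoot
        x -= 1
    while bound(x) <= t:  # undo floating-point undershoot
        x += 1
    return x

def lookahead(z: List[float]) -> Result:
    """Exact MaxInterval that skips extensions whose upper bound cannot beat t."""
    p, n = prefix_sums(z), len(z)
    m = max(z)  # upper bound on any single element
    best: Result = (m, z.index(m), z.index(m))
    if m <= 0:  # then s/sqrt(k) <= m*sqrt(k) <= m: the best single probe wins
        return best
    t = m  # lower bound on the maximum score
    for i in range(n):
        k = 1
        while i + k <= n:
            s = p[i + k] - p[i]
            if s / math.sqrt(k) > t:
                t = s / math.sqrt(k)
                best = (t, i, i + k - 1)
            k += _first_jump(s, k, m, t)  # skip extensions that cannot exceed t
    return best

def gfa(z: List[float], eps: float = 0.1) -> Result:
    """Approximation: score only a geometric family of intervals (assumed form)."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    p, n = prefix_sums(z), len(z)
    lengths = set()
    j = 0
    while (1 + eps) ** j <= n:  # interval lengths grow geometrically
        lengths.add(math.ceil((1 + eps) ** j))
        j += 1
    best: Result = (-math.inf, 0, 0)
    for k in sorted(lengths):
        step = max(1, math.floor(eps * k))  # spacing proportional to length
        for i in range(0, n - k + 1, step):
            score = (p[i + k] - p[i]) / math.sqrt(k)
            if score > best[0]:
                best = (score, i, i + k - 1)
    return best
```

LookAhead is exact, so on any vector its score should equal the exhaustive optimum. GFA only inspects a subset of intervals, so its score is at most the optimum. Timing the three functions on vectors of increasing length is one way to observe the qualitative trend described above for the benchmark.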