Detection and Compensation of Cross-Hybridization in DNA Microarray Data


Detection and Compensation of Cross-Hybridization in DNA Microarray Data
Jim Huang (1). Joint work with Quaid Morris (1), Tim Hughes (2) and Brendan Frey (1).
(1) Probabilistic and Statistical Inference Group, University of Toronto
(2) Banting & Best Department of Medical Research, University of Toronto

Description and Applications of DNA Microarrays
A microarray is a 2-D array of probes, each with a short DNA sequence (an oligonucleotide sequence) attached. The output of each probe is approximately proportional to the amount of DNA from a given tissue that binds to it. The data for each probe therefore form an N-dimensional expression profile vector, where N is the number of tissues used on the array. DNA microarrays can thus be used to measure the level of gene expression across these N tissues.

Hybridization and cross-hybridization
The binding of two complementary DNA strands is called hybridization. Ideally, an oligonucleotide probe binds only to the DNA sequence for which it was designed and to which it is complementary. However, many DNA sequences are similar to one another and can bind to probes other than their own. This phenomenon is called cross-hybridization.
[Figure: an oligonucleotide probe paired with DNA from a tissue sample. Hybridization: AGCTAGGATAGCTAGGAT paired with its exact complement TCGATCCTATCGATCCTA. Cross-hybridization: ATCTAGAATATCTAGAAT paired with the same sequence TCGATCCTATCGATCCTA, to which it is only partially complementary.]
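The distinction between the two cases can be illustrated with a minimal Python sketch. It uses the sequences exactly as they appear in the slide's figure and compares them position by position, as drawn there (strand orientation is ignored). The function name `count_mismatches` and the assignment of which sequence is the probe are assumptions made for illustration; real binding depends on thermodynamics, not just on a mismatch count.

```python
# Watson-Crick base pairing.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def count_mismatches(sample: str, probe: str) -> int:
    """Count positions where `probe` is not the complement of `sample`.

    Hypothetical helper: sequences are compared position by position,
    as they are drawn in the slide's figure.
    """
    if len(sample) != len(probe):
        raise ValueError("sequences must have equal length")
    return sum(COMPLEMENT[s] != p for s, p in zip(sample, probe))


# Sequences as printed in the figure (assumed roles).
probe = "TCGATCCTATCGATCCTA"
intended_target = "AGCTAGGATAGCTAGGAT"
similar_sequence = "ATCTAGAATATCTAGAAT"

print(count_mismatches(intended_target, probe))   # 0: perfect hybridization
print(count_mismatches(similar_sequence, probe))  # 4: candidate for cross-hybridization
```

A sequence with only a few mismatches is still close enough to the probe's intended target that it may bind, which is what makes cross-hybridization possible.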

The trouble with cross-hybridization
With cross-hybridization, each probe signals the presence of sequences other than the one it was designed for. This skews the observed data away from the expected data.
[Figure: the observed (cross-hybridized) expression profile vector shown as a sum that includes the expected expression profile vector (no cross-hybridization).]

Detecting cross-hybridization (1)
To test whether cross-hybridization affects the gene expression data, we perform a BLAST sequence match on all oligonucleotide probe sequences used on the microarray. Many probes are matched with sequences for which they were not specifically designed.

Detecting cross-hybridization (2)
We compute the Pearson correlation coefficient ρ between the expression profiles of BLAST-matched probes and between the profiles of randomly paired probes. Approximately 33% of the BLAST-matched probe pairs have ρ > 0.95, whereas only 2% of the randomly paired probes have ρ > 0.95. This difference between the two distributions indicates that cross-hybridization has a significant impact on the observed gene expression data.
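The comparison of correlation distributions can be sketched in Python with NumPy. Everything below runs on synthetic data: the array sizes, the noise level, and the helper names `pairwise_pearson` and `fraction_above` are assumptions for illustration, and the "matched" pairs are simulated rather than obtained from BLAST.

```python
import numpy as np


def pairwise_pearson(profiles, pairs):
    """Pearson correlation between rows i and j of `profiles` for each pair."""
    return np.array([np.corrcoef(profiles[i], profiles[j])[0, 1] for i, j in pairs])


def fraction_above(rhos, threshold=0.95):
    """Fraction of correlation values strictly above `threshold`."""
    return float(np.mean(rhos > threshold))


rng = np.random.default_rng(0)
n_probes, n_tissues = 200, 12  # synthetic sizes, not from the source
profiles = rng.normal(size=(n_probes, n_tissues))

# Stand-in for BLAST-matched pairs: (0, 1), (2, 3), ...
matched_pairs = [(2 * k, 2 * k + 1) for k in range(50)]
# Simulate cross-hybridization in some matched pairs by making one
# profile a noisy copy of the other.
for i, j in matched_pairs[:20]:
    profiles[j] = profiles[i] + 0.05 * rng.normal(size=n_tissues)

# Control: probes paired at random.
random_pairs = [tuple(rng.choice(n_probes, size=2, replace=False)) for _ in range(50)]

frac_matched = fraction_above(pairwise_pearson(profiles, matched_pairs))
frac_random = fraction_above(pairwise_pearson(profiles, random_pairs))
print(f"matched pairs with rho > 0.95: {frac_matched:.2f}")
print(f"random pairs with rho > 0.95:  {frac_random:.2f}")
```

A clearly larger fraction for the matched pairs than for the random control is the kind of gap the slide reports (about 33% versus 2% on the real data).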

Compensating for cross-hybridization
We model the observed, cross-hybridized expression profile vector x as the product of a hybridization matrix Λ and an unobserved expression profile vector z that is free of cross-hybridization: x = Λz. The elements λ_ij of Λ are parameterized functions of the Gibbs free energy ΔG_ij between probes i and j. To compensate for cross-hybridization, we use a generalized Expectation-Maximization algorithm that solves for z and Λ iteratively.
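To make the linear model concrete, here is a minimal Python sketch of the special case in which Λ is already known, so that compensation reduces to solving a linear system. This is not the generalized EM procedure described in the slide, which estimates z and Λ jointly. The matrix values, the array sizes, and the function name `compensate` are all hypothetical.

```python
import numpy as np


def compensate(X, Lam):
    """Recover Z from X = Lam @ Z, assuming Lam is known and invertible.

    Hypothetical helper: the slide's method estimates Lam as well.
    """
    return np.linalg.solve(Lam, X)


rng = np.random.default_rng(1)
n_probes, n_tissues = 5, 4  # toy sizes for illustration

# Z: unobserved profiles, one row per probe and one column per tissue.
Z = rng.uniform(0.0, 10.0, size=(n_probes, n_tissues))

# Lam: made-up hybridization matrix. The identity means every probe sees
# its own target; off-diagonal entries add signal from other sequences.
Lam = np.eye(n_probes)
Lam[0, 1] = 0.3  # probe 0 also picks up 30% of sequence 1's signal
Lam[3, 2] = 0.2  # probe 3 also picks up 20% of sequence 2's signal

X = Lam @ Z               # observed, cross-hybridized profiles
Z_hat = compensate(X, Lam)

print(np.allclose(Z_hat, Z))  # True: the clean profiles are recovered
```

In practice Λ is unknown, which is why the slide ties its entries to ΔG_ij and alternates between estimating z and Λ.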