Probe design for microarrays using OligoWiz Rasmus Wernersson, Assistant Professor Center for Biological Sequence Analysis Technical University of Denmark.



Probe design for microarrays
- What is a Probe
- OligoWiz
- Probe Design
- Cross Hybridization and Complexity
- Affinity
- Position

The DNA Array Analysis Pipeline
[Figure: pipeline diagram. Stages shown: Question, Experimental Design, Array design, Probe design, Buy Chip/Array, Sample Preparation, Hybridization, Image analysis, Normalization, Expression Index Calculation, Comparable Gene Expression Data, Statistical Analysis, Fit to Model (time series), Advanced Data Analysis (Clustering, PCA, Classification, Promoter Analysis, Meta analysis, Survival analysis, Regulatory Network).]

An Ideal Probe must
- Discriminate well between its intended target and all other targets in the target pool
- Detect concentration differences under the hybridization conditions

Probe Type comparisons

PCR products
- Advantages: Inexpensive; Linkers can be applied
- Disadvantages: Handling problems; Hard to design to avoid cross-hybridization; Unequal amplification

Oligos
- Advantages: Can be designed for many criteria; Easy to handle; Normalized concentrations; Linkers can be applied
- Disadvantages: Expensive (Dkk per oligo)

Affymetrix GeneChip
- Advantages: High quality data; Standardized arrays; Fast to set up; Multiple probes per gene
- Disadvantages: Expensive; Arrays available for limited number of species

OligoWiz: a tool for flexible probe design

About OligoWiz: How and Who
- OligoWiz 2.0 is a client-server application for designing oligonucleotides for microarrays.
- The OligoWiz client (the graphical interface) is written in Java 1.4 and runs on virtually all platforms.
- The OligoWiz server performs the heavy-duty computation and is hosted on a multi-CPU Altix server at CBS.
- OligoWiz was created by Henrik Bjørn Nielsen and Rasmus Wernersson, both at the Center for Biological Sequence Analysis at the Technical University of Denmark.

About the OligoWiz scores
- All scores are normalized to a value between 0.0 (worst) and 1.0 (best).
- All scores are independent, and each is assigned a user-adjustable weight.
- A total score is calculated as the sum of all weighted scores and is normalized to a value between 0.0 and 1.0.
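The weighting scheme described on this slide can be sketched in a few lines. This is a minimal illustration, not the OligoWiz implementation: the function name `total_score`, the score names, and the choice to normalize by dividing by the sum of the weights are all assumptions made for the example.

```python
def total_score(scores, weights):
    """Combine per-criterion scores (each in [0, 1]) into one total in [0, 1].

    Assumption: the total is normalized by dividing the weighted sum by the
    sum of the weights. The slide does not say how the normalization is done.
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same criteria")
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("at least one weight must be positive")
    weighted = sum(scores[name] * weights[name] for name in scores)
    return weighted / weight_sum


# Hypothetical criteria names and weights, for illustration only.
example = total_score(
    {"cross_hyb": 0.5, "delta_tm": 1.0},
    {"cross_hyb": 1.0, "delta_tm": 3.0},
)
```

With these inputs the weighted sum is 0.5 + 3.0 = 3.5 and the weights sum to 4.0, so the total is 0.875.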

How to Avoid cross-hybridization
From Kane et al. (2000) we learn that a 50-mer probe can detect significant false signal from a target that has >75-80% homology to the 50-mer oligo, or that contains a continuous stretch of >15 complementary bases.
If we have substantial sequence information on the given organism, we can try to avoid this by choosing oligos that are not similar to any other expressed sequences.

Probe Specificity
[Figure: from Hughes et al.]

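Mapping Regions without similarity to other transcripts
[Figure: the sequence we want to design a probe for, drawn 5' to 3' with a 50 bp scale bar. BLAST hits >75% & longer than 15 bp are marked along the sequence; the gaps between them are the regions suitable for probes.]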

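Filtering out self-detecting BLAST hits
Some BLAST hits come from sequences identical or very similar to the query sequence itself. Therefore BLAST hits with homology > 97% and with a 'hit length vs. query length' ratio > 0.8 are not considered.
[Figure: the sequence we want to design an oligo for, drawn 5' to 3' with a 50 bp scale bar, BLAST hits >75% & longer than 15 bp, and one hit labelled "Sequence identical or very similar to the query sequence".]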
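The two thresholds on the mapping and filtering slides can be written as a small filter. This is a sketch under stated assumptions: the `BlastHit` record and its field names are hypothetical, and it simply applies the numbers quoted on the slides (>75% and >15 bp to count as a hit, >97% with a length ratio >0.8 to count as a self-hit).

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """Hypothetical record for one BLAST hit against the query transcript."""
    identity: float   # percent identity of the hit, 0 to 100
    hit_length: int   # aligned length in bp


def is_self_hit(hit, query_length):
    # Identical or near-identical sequence covering most of the query.
    return hit.identity > 97.0 and hit.hit_length / query_length > 0.8


def is_cross_hyb_candidate(hit):
    # Thresholds quoted on the slides: >75% identity and longer than 15 bp.
    return hit.identity > 75.0 and hit.hit_length > 15


def filter_hits(hits, query_length):
    """Keep hits that may cause cross-hybridization, dropping self-hits."""
    return [
        h for h in hits
        if is_cross_hyb_candidate(h) and not is_self_hit(h, query_length)
    ]
```

A hit of 99% identity over 950 bp of a 1000 bp query is treated as the query detecting itself and is dropped, while an 80% hit over 40 bp is kept as a potential cross-hybridization source.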

Cross-hybridization expressed as a 'homology score'
Only BLAST hits that passed filtering are considered.
Let m be the number of BLAST hits considered in position i, and let h = (h_1^i, ..., h_m^i) be the BLAST hits in position i in the oligo.
[Equation not preserved in this transcript], where n is the length of the oligo.
[Figure: an oligo with its BLAST hits drawn beneath it; for each position i the maximum hit is read off on a 0 to 100% scale.]
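The equation itself is missing from this transcript, so the following is only one plausible reading of the slide, not the published formula: take the strongest hit at each position of the oligo, average over the n positions, and turn that into a score where 1.0 means no similarity. The function name and the `(start, end, identity)` hit representation are assumptions for the sketch.

```python
def cross_hyb_score(oligo_length, hits):
    """Score in [0, 1]; 1.0 means no filtered BLAST hit overlaps the oligo.

    hits: iterable of (start, end, identity) in oligo coordinates, with end
    exclusive and identity in percent.
    Assumption: score = 1 - (sum over positions of the max identity) / (100 * n).
    """
    max_identity = [0.0] * oligo_length
    for start, end, identity in hits:
        # Clip each hit to the oligo and keep the strongest hit per position.
        for i in range(max(start, 0), min(end, oligo_length)):
            max_identity[i] = max(max_identity[i], identity)
    return 1.0 - sum(max_identity) / (100.0 * oligo_length)
```

Taking the per-position maximum means that many weak overlapping hits do not add up to more than a single strong hit, which matches the "Max hit in pos. i" label in the figure.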

Similar Affinity for all oligos
Another way of ensuring optimal discrimination between target and non-target under hybridization is to design all the oligos on an array with similar affinity for their targets. This allows the experimentalist to optimize the hybridization conditions for all oligos by choosing the right hybridization temperature and salt concentration.
Melting temperature (Tm) is commonly used as a measure of DNA:DNA or RNA:DNA hybrid affinity.

Melting Temperature difference
[Tm equation not preserved in this transcript], where ΔH (kcal/mol) is the sum of the nearest-neighbor enthalpy changes, A is a constant for helix initiation corrections, ΔS is the sum of the nearest-neighbor entropy changes, R is the gas constant (1.987 cal deg^-1 mol^-1) and Ct is the total molar concentration of strands.
[ΔTm equation not preserved in this transcript], where N is all oligos in all sequences.
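The symbols defined on this slide match the widely used nearest-neighbor expression Tm = ΔH / (A + ΔS + R ln(Ct/4)) - 273.15, usually followed by a salt correction term. The sketch below assumes that form; it is not taken from the slide. The default A = -10.8 cal/(K mol), the omission of the salt term, and the reading of ΔTm as the absolute distance from the mean Tm of all oligos are all assumptions. ΔH and ΔS are passed in as already-summed values so that no parameter table has to be invented here.

```python
import math

GAS_CONSTANT = 1.987  # cal / (K * mol), as given on the slide


def melting_temperature(delta_h_kcal, delta_s_cal, total_strand_conc,
                        helix_init=-10.8):
    """Nearest-neighbor Tm in degrees Celsius (salt correction omitted).

    delta_h_kcal: summed nearest-neighbor enthalpy, kcal/mol (negative)
    delta_s_cal: summed nearest-neighbor entropy, cal/(K*mol) (negative)
    total_strand_conc: Ct, total molar strand concentration
    helix_init: A, helix initiation constant (assumed default)
    """
    denominator = (helix_init + delta_s_cal
                   + GAS_CONSTANT * math.log(total_strand_conc / 4.0))
    # Convert kcal to cal so the units match the entropy terms.
    return 1000.0 * delta_h_kcal / denominator - 273.15


def delta_tm(tm_values):
    """Assumed reading of the slide: distance of each Tm from the mean Tm."""
    mean_tm = sum(tm_values) / len(tm_values)
    return [abs(tm - mean_tm) for tm in tm_values]
```

For example, ΔH = -100 kcal/mol, ΔS = -270 cal/(K mol) and Ct = 4e-6 M give a Tm of roughly 51.3 °C under these assumptions.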

[Figure: Tm distributions for 30-mers and 50-mers]

[Figure: ΔTm distribution for oligo length intervals]

Avoid self-annealing oligos: sensitivity may be influenced
Probes that form strong hybrids with themselves, i.e. probes that fold, should be avoided. However, accurate folding algorithms like those employed by mFOLD or RNAfold are too time-consuming for large-scale folding of oligos.
Time consumption: mFOLD ~2 sec per 30-mer, so ~16 min per gene (500 bp, about 470 candidate 30-mers).

Folding an oligonucleotide: an approximation
- Dynamic programming: alignment to inverted self
- The alignment is based on dinucleotides
- Substitution matrix is based on binding energies
- Minimal loop size border
[Figure: alignment matrix of the oligo against its inverted self, with the dinucleotides AT TG CT CG GT TT along both axes.]
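The idea of aligning an oligo to its own inverted sequence can be illustrated with a much simpler dynamic program than the one on the slide. The sketch below is a deliberate simplification: it scores single base pairs with made-up weights instead of dinucleotide binding energies, and it only finds the best ungapped stem. The names `PAIR_SCORE` and `self_anneal_score` and the default minimal loop size of 3 are assumptions.

```python
# Illustrative pair weights (not binding energies): G:C counts more than A:T.
PAIR_SCORE = {("G", "C"): 3, ("C", "G"): 3, ("A", "T"): 2, ("T", "A"): 2}


def self_anneal_score(seq, min_loop=3):
    """Score of the strongest ungapped hairpin stem in seq (0 if none).

    stem[i][j] holds the score of a stem whose outermost pair is (i, j).
    A pair is only allowed if at least min_loop bases lie between i and j.
    """
    seq = seq.upper()
    n = len(seq)
    stem = [[0] * n for _ in range(n)]
    best = 0
    # Process short spans first so the inner stem is known when extending.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            pair = PAIR_SCORE.get((seq[i], seq[j]))
            if pair is None:
                continue  # bases i and j cannot pair; no stem ends here
            stem[i][j] = pair + stem[i + 1][j - 1]
            best = max(best, stem[i][j])
    return best
```

For `GGGGAAAACCCC` the four G:C pairs around the AAAA loop stack into a stem scoring 12, whereas a poly-A oligo scores 0. A higher score suggests a probe that is more likely to fold on itself.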

Folding a lot of oligos: a fast heuristic implementation
- Full dynamic programming calculation for the first probe
- Dynamic programming calculation for the second probe, and so on through the last probe
[Figure: super-alignment matrix with the dinucleotides AT TG CT CG GT TT along both axes and the minimal loop size border marked.]

[Figure: Reasonable folding prediction compared to mFOLD]

Probes with very common subsequences may result in unspecific signal
If the sub-fractions of an oligo are very common, we define it as 'low-complex'.
Oligo with low-complexity (Human): AAAAAAAGGAGTTTTTTTTCAAAAAACTTTTTAAAAAAGCTTTAGGTTTTTA
Oligo without low-complexity (Human): CGTGACTGACAGCTGACTGCTAGCCATGCAACGTCATAGTACGATGACT

Low-complexity expressed as a score
For a given transcriptome, a list of information content for all 'words' of length wl (8 bp) is calculated:
[Equation not preserved in this transcript], where f(w) is the number of occurrences of a pattern and tf(w) is the total number of patterns of length wl.
A low-complexity score for a given oligo is defined as:
Low-complexity = 1 - norm(...) [argument not preserved in this transcript], where norm is a function that normalizes to a value between 1 and 0, L is the length of the oligo and W_i is the pattern in position i.
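Because the equations are missing from this transcript, the sketch below only illustrates the general shape of such a score and should not be read as the OligoWiz formula. It assumes that the per-word weight is the plain relative frequency f(w)/tf(w) (a stand-in for the information content on the slide), that the oligo's raw value is the sum over all of its words, and that `norm` is min-max scaling over the candidate oligos. All function names are hypothetical.

```python
from collections import Counter


def word_counts(transcripts, wl=8):
    """Count every word of length wl in the background transcriptome."""
    counts = Counter()
    for seq in transcripts:
        for i in range(len(seq) - wl + 1):
            counts[seq[i:i + wl]] += 1
    return counts


def commonness(oligo, counts, wl=8):
    """Sum of relative word frequencies f(w)/tf(w) over the oligo (assumed)."""
    total = sum(counts.values())
    words = (oligo[i:i + wl] for i in range(len(oligo) - wl + 1))
    return sum(counts.get(w, 0) / total for w in words)


def low_complexity_scores(oligos, counts, wl=8):
    """1.0 = least common words among the candidates, 0.0 = most common."""
    raw = [commonness(o, counts, wl) for o in oligos]
    lo, hi = min(raw), max(raw)
    if hi == lo:
        return [1.0] * len(oligos)  # nothing to distinguish the candidates
    return [1.0 - (r - lo) / (hi - lo) for r in raw]
```

The word length is a parameter so that the sketch can be tried on toy sequences; the slide uses 8 bp words.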

Location of Oligo within transcript
Labeling includes reverse transcription of the mRNA and is sensitive to:
- RNA degradation
- Premature termination of cDNA synthesis
- Premature termination of cRNA transcription (IVT)
A 'Position Score' reflecting this (eukaryotes):
Position score = (1 - drp)^(Δ3'end)
where drp is the chance of labeling termination per base and Δ3'end is the distance from the oligo to the 3' end of the transcript.
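The position score can be evaluated directly. The sketch assumes that the exponent is the distance in bases between the oligo and the 3' end of the transcript; the function and parameter names are chosen for this example only.

```python
def position_score(distance_to_3prime, drp):
    """Position score (1 - drp) ** distance for a eukaryotic transcript.

    distance_to_3prime: distance in bases from the oligo to the 3' end
        (assumed meaning of the exponent on the slide)
    drp: chance of labeling termination per base, between 0 and 1
    """
    if distance_to_3prime < 0:
        raise ValueError("distance must be non-negative")
    if not 0.0 <= drp <= 1.0:
        raise ValueError("drp must be a probability")
    return (1.0 - drp) ** distance_to_3prime
```

The score is 1.0 for an oligo at the 3' end and decays geometrically with distance, so a larger drp penalizes 5' probes more strongly.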

Species databases
- 215 species currently available
- The species databases are built from complete genomic sequences, or from UniGene collections in the case of vertebrates.
- The databases are used for: cross-hybridization, low-complexity

Sequence Features
- Intron/Exon structure, UTR regions etc.
- Special purpose arrays
- Example: detecting differential splicing
[Figure: Exon, Intron, Exon]

Annotation String - single letter code
Sequence:   ATGTCTACATATGAAGGTATGTAA
Annotation: (EEEEEEEEEEEEEE)DIIIIIII
- E: Exon
- I: Intron
- (: Start of exon
- ): End of exon
- D: Donor site
- A: Acceptor site
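Because the annotation string has exactly one character per base, features can be pulled out with ordinary string tools. The sketch below extracts the exon segments delimited by `(` and `)`; the function name `exon_segments` is hypothetical and this is not the FeatureExtract or OligoWiz code.

```python
import re


def exon_segments(sequence, annotation):
    """Return (start, end, bases) for each exon marked as '(E...E)'.

    start is inclusive and end is exclusive, both 0-based, and the
    parenthesis positions count as part of the exon.
    """
    if len(sequence) != len(annotation):
        raise ValueError("annotation must have one character per base")
    return [
        (m.start(), m.end(), sequence[m.start():m.end()])
        for m in re.finditer(r"\(E*\)", annotation)
    ]
```

Applied to the sequence and annotation on this slide, it yields a single exon covering the first 16 bases, `ATGTCTACATATGAAG`, followed by the donor site and the intron.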

Extracting annotation from GenBank files
- FeatureExtract server

Exercise
Running OligoWiz 2.0
- Java or better required
- Input data: sequence only (FASTA), or sequence and annotation
- Rule-based placement of multiple probes: distance criteria, annotation criteria
Please go to the exercise web page linked from the course programme.
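To give a feel for what a distance criterion means when placing several probes on one transcript, here is a small greedy sketch. It is an assumption about how such a rule could work, not a description of the OligoWiz placement algorithm; the function name and the `(position, score)` candidate format are invented for the example.

```python
def place_probes(candidates, max_probes, min_distance):
    """Greedily pick the best-scoring probes that are far enough apart.

    candidates: list of (position, total_score) tuples
    max_probes: maximum number of probes to place on the transcript
    min_distance: minimum allowed distance between two chosen positions
    """
    if max_probes <= 0:
        return []
    chosen = []
    # Visit candidates from best to worst total score.
    for pos, score in sorted(candidates, key=lambda c: c[1], reverse=True):
        if all(abs(pos - other) >= min_distance for other, _ in chosen):
            chosen.append((pos, score))
            if len(chosen) == max_probes:
                break
    return sorted(chosen)
```

With candidates at positions 0, 10, 60 and 200, two probes and a minimum distance of 50, the best probe (position 10) is taken first, position 0 is rejected as too close, and position 60 is taken second.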