Microarray Yuki Juan NTUST May 26, 2003.


Content: Biology background of microarray; Design of microarray; The workflow of microarray; Image analysis of microarray; Data analysis of microarray; Discussion

The Biology Background of Microarray: The central dogma of life forms; DNA; RNA; Monitoring the expression of genes

Central Dogma: DNA (replication): --ACGCGA-- / --TGCGCT--; RNA (transcription): --UGCGCU--; Protein (translation): --CYS-ALA--
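The sequences on this slide can be traced with a small sketch. It is only an illustration: the function names are hypothetical, strand orientation is ignored as on the slide, and the codon table holds just the two codons needed here.

```python
# Toy illustration of the central dogma using the slide's sequences.
# Orientation (5'/3') is ignored, as on the slide.
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}
# Partial codon table: only the codons that occur in this example.
CODON_TABLE = {"UGC": "CYS", "GCU": "ALA"}


def replicate(strand):
    """Return the complementary DNA strand."""
    return "".join(DNA_COMPLEMENT[base] for base in strand)


def transcribe(template):
    """Return the RNA complementary to a DNA template strand."""
    return "".join(DNA_TO_RNA[base] for base in template)


def translate(mrna):
    """Translate an mRNA codon by codon using the partial table above."""
    usable = len(mrna) - len(mrna) % 3
    codons = [mrna[i:i + 3] for i in range(0, usable, 3)]
    return "".join(CODON_TABLE[codon] for codon in codons)


template = "ACGCGA"
print(replicate(template))              # TGCGCT
print(transcribe(template))             # UGCGCU
print(translate(transcribe(template)))  # CYSALA
```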

DNA [Diagram: DNA (replication) -> transcription -> RNA -> translation -> Protein]

DNA: The double helix: stable. Nucleotides: A, T, G, C. Base pairs: A – T, G – C. Oligonucleotide: short DNA (tens of nucleotides, or bp). (http://www.nhgri.nih.gov/)

DNA Strand: DNA has a canonical orientation and is read from 5’ to 3’. The strands are antiparallel: one strand runs in the direction opposite to its complement’s. 5’ … TACTGAA … 3’ / 3’ … ATGACTT … 5’
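A minimal sketch of the antiparallel pairing, using the slide's own fragment. The helper name is hypothetical. Reversing the result gives the 3’ to 5’ reading shown on the slide.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq_5_to_3):
    """Return the complementary strand, written 5' to 3' like its input."""
    return "".join(COMPLEMENT[base] for base in reversed(seq_5_to_3))


top = "TACTGAA"                   # 5' ... TACTGAA ... 3'
bottom = reverse_complement(top)  # complement, read 5' to 3'
print(bottom)                     # TTCAGTA
print(bottom[::-1])               # ATGACTT (the same strand read 3' to 5')
```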

Hydrogen Bonds Make DNA Binding Specific: The force between the two bases of a pair is the hydrogen bond. This force lets A pair specifically with T (or U), and C with G. [Figure: paired strands with 5’ and 3’ ends labeled]

RNA [Diagram: DNA (replication) -> transcription -> RNA -> translation -> Protein]

RNA Types: messenger RNA (mRNA); ribosomal RNA (rRNA); transfer RNA (tRNA). A gene is expressed by transcribing DNA into single-stranded mRNA.

RNA (Detailed) (http://www.nhgri.nih.gov/)

Reverse Transcription [Diagram: DNA (replication) -> transcription -> RNA -> translation -> Protein, with reverse transcription marked]: Using reverse transcriptase, we can convert RNA into cDNA.

The Southern Blot: A basic DNA detection technique that has been used for over 30 years. A “known” strand of DNA is deposited on a solid support (e.g. nitrocellulose paper). An “unknown” mixed bag of DNA is labeled (radioactive or fluorescent). The “unknown” DNA solution is allowed to mix with the known DNA (attached to the nitrocellulose paper), then excess solution is washed off. If a copy of the “known” DNA occurs in the “unknown” sample, it will stick (hybridize), and the labeled DNA will be detected on photographic film.

mRNA Represents Gene Function: When we measure the level of an mRNA, we are monitoring the activity of a gene. Thus, if we can measure the levels of all mRNAs, we can study the expression of the whole genome. A microarray yields over 10,000 blot-like measurements in a single experiment, which makes monitoring genome-wide activity possible.

Content: Biology background of microarray; Design of microarray; The workflow of microarray; Image analysis of microarray; Data analysis of microarray; Discussion

Design of Microarray: Microarray in different contexts; The idea of microarray; Main types of array chips

mRNA Levels Compared in Many Different Contexts: Different tissues, same organism (brain v. liver); Same tissue, same organism (tumor v. non-tumor); Same tissue, different organisms (wt v. mutant); Time course experiments (development); Other special designs (e.g. to detect spatial patterns).

Idea of Microarray [Figure: Cell A and Cell B; labeled cDNA from geneX; hybridization to chip; spot of geneX with complementary sequence of colored cDNA] This spot shows red color after scanning.

Over 10,000 Hybridizations Can Be Done at One Time

Several Types of Arrays: Spotted DNA arrays: developed by Pat Brown’s lab at Stanford; PCR products of full-length genes (>100 nt). Affymetrix gene chips: photolithography technology from the computer industry allows building many 25-mers. Ink-jet microarrays from Agilent: 25-60-mers “printed” directly on glass slides; flexible, rapid, but expensive.

Array Fabrication, Spotting: Use PCR to amplify DNA. A robotic "pen" deposits DNA at defined coordinates, approximately 1-10 ng per spot. Experimentation with oligos (40, 70 bp).

This machine can make 48 microarrays simultaneously.

Array Fabrication, Photolithography: Light-activated synthesis; synthesize oligonucleotides on glass slides; 10^7 copies per oligo in a 24 x 24 µm square; use 20 pairs of different 25-mers per gene; perfect match and mismatch.

Array Fabrication Photolithography

Affymetrix Microarrays [Raw image, labeled 1.28 cm and 50 µm]: ~10^7 oligonucleotides; half perfectly match mRNA (PM), half have one mismatch (MM). Raw gene expression is the intensity difference: PM - MM.

Agilent cDNA Microarrays and Oligonucleotide Microarrays: Agilent delivers printed 60-mer microarrays in addition to 25-mer formats. The inkjet process uses standard phosphoramidite chemistry to deliver extremely small volumes (picoliters) of the chemicals to be spotted.
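The PM - MM idea can be sketched as follows. Averaging the differences over a gene's probe pairs is an assumption made for illustration (the slides only state that raw expression is PM - MM), and the intensities are made-up numbers.

```python
def average_difference(pm, mm):
    """Mean of PM - MM over the probe pairs of one gene.

    pm, mm: equal-length lists of perfect-match and mismatch intensities.
    Averaging across pairs is an illustrative assumption.
    """
    if len(pm) != len(mm) or not pm:
        raise ValueError("pm and mm must be non-empty and of equal length")
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)


# Hypothetical intensities for four probe pairs of a single gene.
pm = [1200, 950, 1430, 1100]
mm = [300, 410, 380, 290]
print(average_difference(pm, mm))  # 825.0
```

A negative value is possible when the mismatch probes are brighter than the perfect-match probes.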

Content: Biology background of microarray; Design of microarray; The workflow of microarray; Image analysis of microarray; Data analysis of microarray

The Workflow of Microarray [Flow diagram: sample, RNA extraction, cDNA synthesis and labeling, labeled cDNA; plate, plate preparation, array fabrication, array; hybridization, hybridized array, scanning]

cDNA Synthesis and Direct Labeling

Cy3 and Cy5 cDNA Hybridization onto the Chip: e.g. treatment / control, normal / tumor tissue.
Sample loading 1: Loading from the corner of the cover slip. It is time-consuming and easily produces bubbles.
Sample loading 2: Loading the sample at the center of the array, then lowering the slip smoothly. Faster, with a lower chance of producing bubbles than the previous method.
Sample loading 3: Loading the sample at the side of the array, then putting the slip on. The solution attaches to the slip as soon as the slip touches it, and spreads with the movement of the slip as it is slowly lowered.

Scan: Green: down-regulated; Red: up-regulated; Yellow: equal level

Content: Biology background of microarray; Design of microarray; The workflow of microarray; Image analysis of microarray; Data analysis of microarray; Discussion

Image analysis: To find a spot; Convert feature into numeric data; Image normalization

The Algorithms: 1. Find spots: finds the location of each spot on the microarray. 2. Cookie cutter algorithm: (1) Assume the distribution of pixel intensities is a Gaussian curve. (2) Use the SD or IQR to identify the feature and background of each spot. (3) Calculate statistics for the pixel population.

Interquartile Range (IQR) [Figure: pixel intensity distribution marked at 25%, 50%, and 75%, showing the IQR, K = IQR/2, and the boundaries for rejection at 1.42 IQR]

[Figure: feature or cookie of diameter D, exclusion zone, and local background]
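A sketch of IQR-based pixel rejection for a single spot. It assumes the rejection boundaries lie 1.42 IQR below the 25% quartile and 1.42 IQR above the 75% quartile; the slide does not spell out where the boundaries are anchored, so treat that as an assumption. The pixel values are invented.

```python
import statistics


def reject_outlier_pixels(pixels, factor=1.42):
    """Return the pixels inside [Q1 - factor*IQR, Q3 + factor*IQR].

    The anchoring of the boundaries at Q1 and Q3 is an assumption.
    """
    q1, _, q3 = statistics.quantiles(pixels, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [p for p in pixels if low <= p <= high]


# Hypothetical feature pixels with one bright artifact (500).
pixels = [100, 102, 98, 101, 99, 103, 97, 100, 500]
kept = reject_outlier_pixels(pixels)
print(len(kept), statistics.mean(kept))
```

Statistics such as the mean and standard deviation of the spot are then computed on the retained pixels only.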

Data Quality: Irregular size or shape; Irregular placement; Low intensity; Saturation; Spot variance; Background variance. [Example images: misalignment, artifact, indistinguishable, saturated, bad print]

Convert Feature Into Numeric Value [Table columns: Systematic name; Gene function; Red intensity; Red b.g.; Red b.g.-corrected; Green intensity; Green background; Green b.g.-corrected; (R. b.g.-c)/(G. b.g.-c)]
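The arithmetic behind those columns can be sketched like this. The function name and the choice to return None for spots at or below background are assumptions for illustration, and the intensities are made up.

```python
import math


def corrected_ratio(red, red_bg, green, green_bg):
    """Background-corrected red/green ratio for one spot.

    Returns None when either corrected signal is not above zero,
    since the ratio would be meaningless (an illustrative policy).
    """
    red_corrected = red - red_bg
    green_corrected = green - green_bg
    if red_corrected <= 0 or green_corrected <= 0:
        return None
    return red_corrected / green_corrected


ratio = corrected_ratio(red=1500, red_bg=100, green=800, green_bg=100)
print(ratio)             # 2.0
print(math.log2(ratio))  # 1.0
```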

Data Normalization: Normalize data to correct for variances: Dye bias; Location bias; Intensity bias; Pin bias; Slide bias. Control vs. non-control spots.

Data Normalization [Figure: Uncalibrated, red light under-detected; Calibrated, red and green equally detected]

Data Normalization Assumptions: Overall mean ratio should be 1; Most genes are not differentially expressed; Total intensities of the two dyes are equivalent.
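Under these assumptions, the simplest correction is a global shift of the log ratios so that their mean becomes 0 (a ratio of 1). This is a sketch of that one step only, with invented ratios; it does not handle intensity-dependent bias.

```python
import math


def global_normalize(ratios):
    """Shift log2(R/G) values so that their mean is 0.

    ratios: background-corrected R/G ratios, all positive.
    Returns the normalized log2 ratios.
    """
    log_ratios = [math.log2(r) for r in ratios]
    shift = sum(log_ratios) / len(log_ratios)
    return [m - shift for m in log_ratios]


# Hypothetical slide where red is under-detected by a factor of two.
print(global_normalize([0.5, 0.25, 1.0, 0.5]))  # [0.0, -1.0, 1.0, 0.0]
```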

Intensity Dependent Normalization

After Normalization

Additional Normalization: Pin dependent: similar to the intensity-dependent fit; compute individual lowess fits for each pin group. Within-slide normalization: after pin-dependent normalization, log ratios for each pin are centered around 0; scale the variance for each pin using MAD (median absolute deviation).
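A sketch of the MAD-based scale step. It assumes each pin group's log ratios are already centered around 0, and it divides each group by its MAD relative to the geometric mean of all MADs. That particular scale factor is one common choice and is an assumption here, as are the names and data.

```python
import math
import statistics


def mad(values):
    """Median absolute deviation from the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def scale_pin_groups(groups):
    """Rescale each pin group's log ratios to a common spread.

    groups: dict mapping a pin id to its centered log ratios.
    Fails if a group has a MAD of 0 (no spread at all).
    """
    mads = {pin: mad(vals) for pin, vals in groups.items()}
    geo_mean = math.exp(sum(math.log(m) for m in mads.values()) / len(mads))
    return {
        pin: [v / (mads[pin] / geo_mean) for v in vals]
        for pin, vals in groups.items()
    }


groups = {"pin1": [-0.5, 0.0, 0.5], "pin2": [-2.0, 0.0, 2.0]}
print(scale_pin_groups(groups))
```

After scaling, both hypothetical pin groups span roughly -1 to 1, so no single pin dominates the slide's variance.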

Additional Normalization: Dye swap: Combine relative expression levels without explicit normalization. Compute the lowess fit for (1/2) log2(RR’/(GG’)) vs. (A + A’)/2, where A and A’ are the average log intensities on the two slides. The normalized ratio is log2(R/G) - c(A), where c(A) is the lowess prediction.
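Why a dye swap removes bias can be seen per spot. If M = log2(R/G) on the first slide and M’ = log2(R’/G’) on the dye-swapped slide, the true signal changes sign while the dye bias does not. Half the sum therefore estimates the bias, and half the difference is the bias-free log ratio. The sketch below works on one spot with invented intensities and leaves out the lowess smoothing.

```python
import math


def dye_swap(r, g, r_swap, g_swap):
    """Return (bias estimate, combined log ratio) for one spot.

    bias     = (M + M') / 2 = (1/2) log2(R R' / (G G'))
    combined = (M - M') / 2, in which the dye bias cancels.
    """
    m = math.log2(r / g)
    m_swap = math.log2(r_swap / g_swap)
    return (m + m_swap) / 2, (m - m_swap) / 2


# Hypothetical spot: true log ratio 2, dye bias of +1 on both slides.
bias, combined = dye_swap(r=800, g=100, r_swap=100, g_swap=200)
print(bias, combined)  # 1.0 2.0
```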

Content: Biology background of microarray; Design of microarray; The workflow of microarray; Image analysis of microarray; Data analysis of microarray; Discussion

Data analysis: Data filtering; Fold change analysis; Classification; Clustering; Future direction

Microarray Data Classification [Flow: microarray chips -> images scanned by laser -> datasets -> data mining and analysis; new sample -> prediction]
Gene              Value
D26528_at         193
D26561_cds1_at    -70
D26561_cds2_at    144
D26561_cds3_at    33
D26579_at         318
D26598_at         1764
D26599_at         1537
D26600_at         1204
D28114_at         707

The Threshold of Spots: Filtering: remove genes with insufficient variation, e.g. MaxVal - MinVal < 500 and MaxVal/MinVal < 5. Remove unreliable spots: saturated, non-uniform, background too high, … Remove extreme signals. Statistical filtering (e.g. p-value < 0.01). Reasons: biological; feature reduction for algorithms.

Microarray Data Analysis Types: Differential gene expression: fold change analysis. Classification (supervised): identify disease; predict outcome / select best treatment. Clustering (unsupervised): find new biological classes / refine existing ones; exploration; …
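The variation filter quoted on this slide (MaxVal - MinVal < 500 and MaxVal/MinVal < 5) can be sketched as below. Clipping values to a small positive floor before taking the ratio is an added assumption, needed because expression values can be zero or negative. The gene names and values are hypothetical.

```python
def keep_gene(values, min_diff=500, min_ratio=5, floor=1):
    """Return False for genes with insufficient variation across samples.

    A gene is dropped when both MaxVal - MinVal < min_diff and
    MaxVal / MinVal < min_ratio. `floor` is an illustrative assumption
    that keeps the ratio defined for non-positive values.
    """
    clipped = [max(v, floor) for v in values]
    lo, hi = min(clipped), max(clipped)
    low_variation = (hi - lo) < min_diff and (hi / lo) < min_ratio
    return not low_variation


expression = {
    "geneA": [193, 210, 180],    # flat: dropped
    "geneB": [100, 1764, 707],   # large absolute change: kept
    "geneC": [50, 300, 120],     # six-fold change: kept
}
print([name for name, vals in expression.items() if keep_gene(vals)])
```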

Differential Gene Expression: n-fold change: n typically >= 2; may hold no biological relevance; often too restrictive. 2σ expression: calculate the standard deviation σ; genes with expression more than 2σ away are differentially expressed.
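Both criteria are easy to express on log2 ratios. The sketch assumes "away" means distance from the mean log ratio across all genes on the array, which is an interpretation rather than something the slide states. The data are invented.

```python
import math
import statistics


def fold_change_calls(log_ratios, n_fold=2):
    """Flag genes whose ratio changed at least n-fold in either direction."""
    threshold = math.log2(n_fold)
    return [abs(m) >= threshold for m in log_ratios]


def sigma_calls(log_ratios, n_sigma=2):
    """Flag genes lying more than n_sigma standard deviations from the mean."""
    mean = statistics.mean(log_ratios)
    sd = statistics.stdev(log_ratios)
    return [abs(m - mean) > n_sigma * sd for m in log_ratios]


# Hypothetical log2 ratios: nine quiet genes and one strongly induced gene.
log_ratios = [0.1, -0.2, 0.0, 0.15, -0.1, 0.05, -0.05, 0.1, -0.15, 3.0]
print(fold_change_calls(log_ratios))
print(sigma_calls(log_ratios))
```

In this toy data set both rules flag only the last gene. On real arrays they can disagree, because the 2σ rule adapts to the spread of the array while the fold-change cutoff is fixed.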

Fold Changes: Scatter Plot

Fold Changes: Table

Classification: Multi-Class: Similar approach: select the top genes most correlated to each class; select the best subset using cross-validation; build a single model separating all classes. Advanced: build a separate model for each class vs. the rest; choose the model making the strongest prediction.

Popular Classification Methods: Decision trees/rules: find smallest gene sets, but also false positives. Neural nets: work well if the number of genes is reduced. SVM: good accuracy, does its own gene selection, hard to understand. K-nearest neighbor: robust for a small number of genes. Bayesian nets: simple, robust.

Multi-class Data Example: Brain data, Pomeroy et al. 2002, Nature (415), Jan 2002. 42 examples, about 7,000 genes, 5 classes. Selected the top 100 genes most correlated to each class. Selected the best subset by testing 1, 2, …, 20 gene subsets, with leave-one-out cross-validation for each.

Classification, Other Applications: Combining clinical and genetic data; Outcome / treatment prediction. Age, sex, and stage of disease are useful, e.g. if the data are from a male, it is not ovarian cancer.
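Leave-one-out cross-validation can be sketched with a k-nearest-neighbor classifier. This is not the procedure of the cited study: the classifier choice, the function names, and the two-gene toy data are all assumptions for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k nearest samples.

    train: list of (expression vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(samples, k=3):
    """Hold out each sample in turn, train on the rest, and score it."""
    correct = 0
    for i, (vector, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if knn_predict(rest, vector, k) == label:
            correct += 1
    return correct / len(samples)


# Hypothetical samples described by the expression of two selected genes.
samples = [
    ((1.0, 1.2), "A"), ((0.9, 1.0), "A"), ((1.1, 0.8), "A"),
    ((5.0, 5.1), "B"), ((4.8, 5.2), "B"), ((5.2, 4.9), "B"),
]
print(leave_one_out_accuracy(samples))  # 1.0
```

To choose a gene subset as the slide describes, the same accuracy estimate would be computed for each candidate subset size and the best-scoring one kept.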

Clustering Goals: Find natural classes in the data; Identify new classes / gene correlations; Refine existing taxonomies; Support biological analysis / discovery. Different methods: hierarchical clustering, SOMs, etc.

SOM clustering: SOM: self-organizing maps. Preprocessing: filter away genes with insufficient biological variation; normalize gene expression (across samples) to mean 0, st. dev. 1, for each gene separately. Run the SOM for many iterations. Plot the results.

SOM & K-Means by GeneSpring
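The per-gene normalization step can be sketched as follows. Whether the population or the sample standard deviation is meant is not stated; the population form is used here as an assumption.

```python
import statistics


def standardize_gene(values):
    """Rescale one gene's expression across samples to mean 0, st. dev. 1.

    Uses the population standard deviation (an assumption). Genes with
    zero spread should be filtered out before this step.
    """
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


z = standardize_gene([2.0, 4.0, 6.0])
print(z)
```

After this step every gene contributes a profile shape rather than an absolute level, so genes with similar patterns cluster together regardless of expression magnitude.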

Hierarchical Clustering: The most popular hierarchical clustering method used in microarray data analysis is the so-called agglomerative method, which works with the data in a bottom-up manner. Initially, each data point forms its own cluster, and the algorithm works through the cluster set by repeatedly merging the two clusters that are the most similar or have the shortest distance. The algorithm involves computing the distance or similarity matrix, which has O(N^2) complexity, and thus is not very efficient.
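A naive sketch of the bottom-up procedure. Single linkage (the distance between the closest members of two clusters) and Euclidean distance are assumed, since the slide does not name a linkage rule, and the points are invented. Distances are recomputed on every pass for clarity, which is slower than maintaining a distance matrix.

```python
import math


def single_linkage(cluster_a, cluster_b):
    """Distance between the closest pair of points from two clusters."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)


def agglomerate(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[p] for p in points]  # start with one cluster per point
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
print(agglomerate(points, 3))
# [[(0, 0), (0, 1)], [(5, 5), (5, 6)], [(10, 0)]]
```

Recording the order and distance of each merge, instead of stopping at a fixed cluster count, gives the information needed to draw a dendrogram.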

Hierarchical clustering

Future directions: Algorithms optimized for small samples (the number of samples will remain small for many tasks). Integration with other data: biological networks; medical text; protein data. Cost-sensitive classification algorithms: error cost depends on outcome (don’t want to miss treatable cancer), treatment side effects, etc.

Integrate biological knowledge when analyzing microarray data (from Cheng Li, Harvard SPH) Right picture: Gene Ontology: tool for the unification of biology, Nature Genetics, 25, p25

Content: Biology background of microarray; Design of microarray; The workflow of microarray; Image analysis of microarray; Data analysis of microarray; Discussion

Microarray Potential Applications: Biological discovery: new and better molecular diagnostics; new molecular targets for therapy; finding and refining biological pathways. Mutation and polymorphism detection. Recent examples: molecular diagnosis of leukemia, breast cancer, ...; appropriate treatment for genetic signature; potential new drug targets.

Microarray Limitations: Cross-hybridization of sequences with high identity. Chip-to-chip variation. True measure of abundance? Do mRNA levels reflect protein levels? Generally, microarrays do not “prove” new biology; they simply suggest genes involved in a process, a hypothesis that will require traditional experimental verification. What fold change has biological relevance? Need a cloned EST or some sequence knowledge; rare messages may go undetected. Expensive! Not every lab can afford to repeat experiments. The real limitation is bioinformatics.

Additional Information: Review papers on microarray: Genomics, gene expression and DNA arrays (Nature, June 2000); Microarray - technology review (Nature Cell Biology, Aug. 2001); Magic of Microarray (Scientific American, Feb. 2002). Molecular biology tutorial: http://www.lsic.ucla.edu/ls3/tutorials/

Biological data retrieval systems: Entrez (http://www. ncbi. nlm. nih): A retrieval system for searching a number of interconnected databases at the NCBI. It provides access to: PubMed: the biomedical literature (Medline); GenBank: nucleotide sequence database; Protein sequence database; Structure: three-dimensional macromolecular structures; Genome: complete genome assemblies; PopSet: population study data sets; OMIM: Online Mendelian Inheritance in Man; Taxonomy: organisms in GenBank; Books: online books; ProbeSet: gene expression and microarray datasets; 3D Domains: domains from Entrez Structure; UniSTS: markers and mapping data; SNP: single nucleotide polymorphisms; CDD: conserved domains. Entrez allows users to perform various searches.