13 Reverse Transcription
[Diagram: DNA is copied by replication, transcribed into RNA, and translated into protein; reverse transcription goes from RNA back to DNA.]
Using reverse transcriptase, we can convert RNA into cDNA.
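As a minimal sketch of the base-pairing rule behind reverse transcription, the conversion from an mRNA sequence to its first-strand cDNA can be written in Python. The function name `reverse_transcribe` and the example sequence are illustrative assumptions, not part of the slides, and the sketch models only complementary base pairing, not the enzyme itself.

```python
# Pairing from an RNA template base to the complementary DNA base.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}

def reverse_transcribe(mrna: str) -> str:
    """Return the first-strand cDNA (read 5'->3') complementary to an mRNA (read 5'->3')."""
    # Walk the template backwards so the complementary strand also reads 5'->3'.
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna.upper()))

print(reverse_transcribe("AUGGCC"))  # GGCCAT
```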
14 The Southern Blot
A basic DNA detection technique that has been used for over 30 years, known as the Southern blot:
A “known” strand of DNA is deposited on a solid support (e.g. nitrocellulose paper).
An “unknown” mixed bag of DNA is labeled (radioactive or fluorescent).
The “unknown” DNA solution is allowed to mix with the known DNA (attached to the nitrocellulose paper), then the excess solution is washed off.
If a copy of the “known” DNA occurs in the “unknown” sample, it will stick (hybridize), and the labeled DNA will be detected on photographic film.
15 mRNA Represents Gene Function
When we measure the level of an mRNA, we are monitoring the activity of a gene.
Thus, if we can measure the levels of all mRNAs, we can study the expression of the whole genome.
A microarray gathers the data of many blotting experiments in a single experiment, which makes monitoring genome-wide activity possible.
16 Content
Biology background of microarray
Design of microarray
The workflow of microarray
Image analysis of microarray
Data analysis of microarray
Discussion
17 Design of Microarray
Microarray in different contexts
The idea of microarray
Main types of array chips
18 mRNA Levels Compared in Many Different Contexts
Different tissues, same organism (brain v. liver)
Same tissue, same organism (tumor v. non-tumor)
Same tissue, different organisms (wt v. mutant)
Time course experiments (development)
Other special designs (e.g. to detect spatial patterns)
19 Idea of Microarray
[Diagram: Cell A and Cell B; labeled cDNA from geneX; hybridization to chip]
The spot for geneX carries the sequence complementary to the colored cDNA.
This spot shows red color after scanning.
20 Over 10,000 Hybridizations Can Be Done at One Time
21 Several Types of Arrays
Spotted DNA arrays
  Developed by Pat Brown’s lab at Stanford
  PCR products of full-length genes (>100 nt)
Affymetrix gene chips
  Photolithography technology from the computer industry allows building many 25-mers
Ink-jet microarrays from Agilent
  25-60-mers “printed” directly on glass slides
  Flexible, rapid, but expensive
22 Array Fabrication: Spotting
Use PCR to amplify DNA
Robotic "pen" deposits DNA at defined coordinates
  Approximately 1-10 ng per spot
Experimentation with oligos (40, 70 bp)
23 This machine can make 48 microarrays simultaneously.
24 Array Fabrication: Photolithography
Light-activated synthesis
  Synthesize oligonucleotides on glass slides
  10^7 copies per oligo in a 24 x 24 µm square
Use 20 pairs of different 25-mers per gene
  Perfect match and mismatch
26 Affymetrix Microarrays
[Figure: raw image; dimension labels 1.28 cm and 50 µm]
~10^7 oligonucleotides: half perfectly match the mRNA (PM), half have one mismatch (MM)
Raw gene expression is the intensity difference: PM - MM
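The PM - MM difference can be illustrated with a short Python sketch. Averaging the differences over all probe pairs of a gene is an assumption made here for illustration (the slide only states the per-pair difference), and the function name and intensity values are hypothetical.

```python
from statistics import mean

def average_difference(pm, mm):
    """Mean of PM - MM over the probe pairs of one gene (assumed summary statistic)."""
    if len(pm) != len(mm):
        raise ValueError("each PM probe needs a matching MM probe")
    return mean(p - m for p, m in zip(pm, mm))

# Hypothetical intensities for four probe pairs of one gene.
pm = [1200.0, 950.0, 1100.0, 1020.0]
mm = [300.0, 280.0, 310.0, 290.0]
print(average_difference(pm, mm))  # 772.5
```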
27 Agilent cDNA Microarray and Oligonucleotide Microarray
Agilent delivers printed 60-mer microarrays in addition to 25-mer formats.
The inkjet process uses standard phosphoramidite chemistry to deliver extremely small volumes (picoliters) of the chemicals to be spotted.
29 The Workflow of Microarray
[Flowchart]
Array side: plate -> plate preparation -> array fabrication -> array
Sample side: sample -> RNA extraction -> cDNA synthesis and labeling -> labeled cDNA
Array + labeled cDNA -> hybridization -> hybridized array -> scanning
31 Cy3 and Cy5 cDNA Hybridization onto the Chip
e.g. treatment / control, normal / tumor tissue
Sample loading
1. Loading from the corner of the cover slip
  It is time consuming and easily produces bubbles.
2. Loading the sample at the center of the array, then putting the slip on smoothly
  Faster, with a lower chance of producing bubbles than the first method.
3. Loading the sample at the side of the array, then putting the slip on
  The solution attaches to the slip as soon as the slip touches it, and spreads along with the slip as we slowly lower it.
32 Scan
Green: down-regulated
Red: up-regulated
Yellow: equal level
34 Image Analysis
Finding spots
Converting features into numeric data
Image normalization
35 The Algorithms
1. Find spots: finds the location of each spot on the microarray.
2. Cookie cutter algorithm:
  (1) Assume the distribution of pixel intensities is Gaussian.
  (2) Use the SD or IQR to identify the feature and background of each spot.
  (3) Calculate statistics for the pixel population.
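36 Interquartile Range (IQR)
[Figure: distribution with the 25%, 50%, and 75% quartiles marked; the IQR spans the 25% to 75% quartiles; a boundary for rejection is drawn on each side, labeled 1.42 IQR; further label: DK = IQR/2]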
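The IQR-based rejection step can be sketched in Python as follows. This is a minimal illustration, not the scanner software's actual routine: it assumes the rejection boundaries sit 1.42 IQR below the 25% quartile and 1.42 IQR above the 75% quartile, and the function name and pixel values are hypothetical.

```python
from statistics import mean, quantiles

def reject_outlier_pixels(pixels, k=1.42):
    """Keep pixels inside [Q1 - k*IQR, Q3 + k*IQR] (boundary placement is an assumption)."""
    q1, _median, q3 = quantiles(pixels, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [p for p in pixels if low <= p <= high]

# Hypothetical feature pixels; 500 stands in for a dust speck or similar artifact.
feature_pixels = [100, 102, 98, 101, 99, 103, 97, 100, 500]
kept = reject_outlier_pixels(feature_pixels)
print(len(kept), mean(kept))  # 8 100
```

The statistics for the spot (mean, median, SD) would then be computed from the kept pixels only.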
37 [Figure: spot layout with labels: feature or cookie, D, local background, exclusion zone]
38 Data Quality
Irregular size or shape
Irregular placement
Low intensity
Saturation
Spot variance
Background variance
[Example images: misalignment, artifact, indistinguishable, saturated, bad print]
39 Convert Feature Into Numeric Value
[Table column labels: systematic name; gene function; red intensity; red background (b.g.); red b.g.-corrected; green intensity; green background; green b.g.-corrected; ratio (red b.g.-corrected)/(green b.g.-corrected)]
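The background-corrected ratio in the last column can be computed as in this short Python sketch. The function name, the handling of spots at or below background, and the example numbers are assumptions for illustration.

```python
import math

def background_corrected_ratio(red, red_bg, green, green_bg):
    """Return (red - red_bg) / (green - green_bg), or None if either corrected signal is not positive."""
    r = red - red_bg
    g = green - green_bg
    if r <= 0 or g <= 0:
        return None  # spot is at or below background, so the ratio is not meaningful
    return r / g

ratio = background_corrected_ratio(red=2100, red_bg=100, green=600, green_bg=100)
print(ratio, math.log2(ratio))  # 4.0 2.0
```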
40 Data Normalization
Normalize data to correct for variances:
Dye bias
Location bias
Intensity bias
Pin bias
Slide bias
Control vs. non-control spots
41 Data Normalization
Uncalibrated: red light under-detected
Calibrated: red and green equally detected
42 Data Normalization Assumptions
The overall mean ratio should be 1
Most genes are not differentially expressed
The total intensities of the two dyes are equivalent
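One simple way to act on these assumptions is total intensity normalization, sketched below in Python. The choice to rescale the green channel (rather than the red) and the toy intensities are assumptions for illustration only.

```python
def total_intensity_normalize(red, green):
    """Rescale the green channel so both channels have the same total intensity."""
    factor = sum(red) / sum(green)
    return [g * factor for g in green], factor

# Toy data in which red is under-detected by a factor of two on every spot.
red = [400.0, 800.0, 1600.0, 200.0]
green = [800.0, 1600.0, 3200.0, 400.0]
green_scaled, factor = total_intensity_normalize(red, green)
ratios = [r / g for r, g in zip(red, green_scaled)]
print(factor, ratios)  # 0.5 [1.0, 1.0, 1.0, 1.0]
```

After rescaling, the spot ratios center on 1, which is what the assumptions predict when most genes are not differentially expressed.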
45 Additional Normalization
Pin dependent
  Similar to the intensity-dependent fit
  Compute individual lowess fits for each pin group
Within-slide normalization
  After pin-dependent normalization, log ratios for each pin are centered around 0
  Scale variance for each pin
  Uses MAD (median absolute deviation)
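The MAD-based scaling step can be sketched in Python as below. Scaling each pin group relative to the geometric mean of the per-pin MADs is an assumption chosen for illustration, as are the function names and toy log ratios; the sketch also assumes every pin group has a nonzero MAD.

```python
import math
from statistics import median

def mad(values):
    """Median absolute deviation from the median."""
    center = median(values)
    return median(abs(v - center) for v in values)

def scale_by_pin(log_ratios_by_pin):
    """Rescale each pin group's log ratios so all groups share the same MAD."""
    mads = {pin: mad(vals) for pin, vals in log_ratios_by_pin.items()}
    # Geometric mean of the per-pin MADs serves as the common reference spread (assumption).
    reference = math.exp(sum(math.log(m) for m in mads.values()) / len(mads))
    return {pin: [v * reference / mads[pin] for v in vals]
            for pin, vals in log_ratios_by_pin.items()}

# Toy log ratios, already centered around 0 for each pin group.
pins = {"pin1": [-1.0, -0.5, 0.0, 0.5, 1.0], "pin2": [-4.0, -2.0, 0.0, 2.0, 4.0]}
scaled = scale_by_pin(pins)
print({pin: round(mad(vals), 6) for pin, vals in scaled.items()})
```

After scaling, both pin groups have the same spread, so log ratios from different pins become comparable.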
46 Additional Normalization
Dye swap
  Combine relative expression levels without explicit normalization
  Compute lowess fit for log2(RR’/GG’)/2 vs. log2(A + A’)/2
  Normalized ratio is log2(R/G) - c(A), where c(A) is the lowess prediction
48 Data Analysis
Data filtering
Fold change analysis
Classification
Clustering
Future directions
49 Microarray Data Classification
[Flowchart: microarray chips -> images scanned by laser -> datasets (gene/value table with rows such as D26528_at, D26561_cds1_at, D26561_cds2_at, D26561_cds3_at, D26579_at, D26598_at, D26599_at, D26600_at, D28114_at) -> data mining and analysis; new sample -> prediction]
50 The Threshold of Spots
Filtering: remove genes with insufficient variation
  e.g. MaxVal - MinVal < 500 and MaxVal/MinVal < 5
Remove inadequate spots: saturated, non-uniform, too high background…
Remove extreme signals
Statistical filtering (e.g. p-value < 0.01)
Reasons for filtering
  Biological reasons
  Feature reduction for algorithmic purposes
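The example thresholds on this slide (MaxVal - MinVal < 500 and MaxVal/MinVal < 5) can be applied as in the Python sketch below, reading them as a test for insufficient variation across samples. The function name, the gene labels, and the treatment of non-positive minimum values are assumptions for illustration.

```python
def has_sufficient_variation(values, min_range=500, min_ratio=5):
    """False when max - min < min_range and max / min < min_ratio, True otherwise."""
    lo, hi = min(values), max(values)
    # Treat a non-positive minimum as an unbounded ratio (assumption).
    ratio = hi / lo if lo > 0 else float("inf")
    return not (hi - lo < min_range and ratio < min_ratio)

# Hypothetical expression values across three samples.
genes = {
    "flat": [100, 150, 200],          # range 100, ratio 2: removed
    "variable": [100, 900, 400],      # range 800: kept
    "low_but_ratio": [20, 120, 60],   # range 100, ratio 6: kept
}
kept = [name for name, values in genes.items() if has_sufficient_variation(values)]
print(kept)  # ['variable', 'low_but_ratio']
```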
51 Microarray Data Analysis Types
Differential gene expression
  Fold change analysis
Classification (supervised)
  Identify disease
  Predict outcome / select best treatment
Clustering (unsupervised)
  Find new biological classes / refine existing ones
  Exploration
  …
52 Differential Gene Expression
n-fold change
  n typically >= 2
  May hold no biological relevance
  Often too restrictive
2σ expression
  Calculate the standard deviation σ
  Genes with expression more than 2σ away are differentially expressed
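Both criteria can be sketched in a few lines of Python. Applying the standard-deviation rule to log2 ratios, and measuring distance from the mean log ratio, are assumptions made for this illustration; the function names and ratio values are hypothetical.

```python
import math
from statistics import mean, stdev

def fold_change_calls(ratios, n=2.0):
    """Label each gene 'up' (ratio >= n), 'down' (ratio <= 1/n), or 'unchanged'."""
    return {gene: ("up" if r >= n else "down" if r <= 1 / n else "unchanged")
            for gene, r in ratios.items()}

def two_sigma_calls(ratios, k=2.0):
    """Return genes whose log2 ratio lies more than k standard deviations from the mean."""
    logs = {gene: math.log2(r) for gene, r in ratios.items()}
    mu, sigma = mean(logs.values()), stdev(logs.values())
    return [gene for gene, m in logs.items() if abs(m - mu) > k * sigma]

# Hypothetical red/green ratios for ten genes; only g10 departs strongly from 1.
ratios = {"g1": 1.0, "g2": 1.1, "g3": 0.9, "g4": 1.0, "g5": 1.2,
          "g6": 0.8, "g7": 1.0, "g8": 1.05, "g9": 0.95, "g10": 8.0}
print(fold_change_calls(ratios)["g10"])  # up
print(two_sigma_calls(ratios))           # ['g10']
```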
55 Classification: Multi-Class
Similar approach:
  Select top genes most correlated to each class
  Select best subset using cross-validation
  Build a single model separating all classes
Advanced:
  Build a separate model for each class vs. rest
  Choose the model making the strongest prediction
56 Popular Classification Methods
Decision trees/rules: find smallest gene sets, but also false positives
Neural nets: work well if the number of genes is reduced
SVM: good accuracy, does its own gene selection, hard to understand
K-nearest neighbor: robust for a small number of genes
Bayesian nets: simple, robust
57 Multi-class Data Example
Brain data, Pomeroy et al. 2002, Nature (415), Jan 2002
42 examples, about 7,000 genes, 5 classes
Selected top 100 genes most correlated to each class
Selected best subset by testing 1, 2, …, 20 gene subsets, with leave-one-out cross-validation for each
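Leave-one-out cross-validation itself can be sketched in Python as below. The nearest-centroid classifier is a stand-in chosen for brevity, not the method used in the cited study, and the toy data (six samples, two genes, two classes) are invented for illustration.

```python
from statistics import mean

def nearest_centroid_predict(train_x, train_y, sample):
    """Predict the class whose mean expression profile is closest to the sample."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [mean(column) for column in zip(*rows)]

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(sample, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))

def leave_one_out_accuracy(x, y):
    """Hold out each sample in turn, train on the rest, and score the held-out prediction."""
    correct = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_x, train_y, x[i]) == y[i]:
            correct += 1
    return correct / len(x)

# Toy data: rows are samples, columns are two selected genes.
x = [[1.0, 5.0], [1.2, 4.8], [0.9, 5.1], [5.0, 1.0], [4.8, 1.1], [5.2, 0.9]]
y = ["A", "A", "A", "B", "B", "B"]
print(leave_one_out_accuracy(x, y))  # 1.0
```

To compare gene subsets of different sizes, this accuracy would be computed once per candidate subset; for an unbiased estimate, the gene selection should be repeated inside each held-out round.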
58 Classification – Other Applications
Combining clinical and genetic data
Outcome / treatment prediction
Age, sex, and stage of disease are useful
  e.g. if the data are from a male, the disease is not ovarian cancer
59 Clustering Goals
Find natural classes in the data
Identify new classes / gene correlations
Refine existing taxonomies
Support biological analysis / discovery
Different methods
  Hierarchical clustering, SOMs, etc.
60 SOM Clustering
SOM: self-organizing maps
Preprocessing
  Filter away genes with insufficient biological variation
  Normalize gene expression (across samples) to mean 0, st. dev. 1, for each gene separately
Run SOM for many iterations
Plot the results
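The per-gene normalization in the preprocessing step can be sketched in Python as follows. Using the population standard deviation and mapping constant genes to zeros are assumptions for illustration; the function name and values are hypothetical.

```python
from statistics import mean, pstdev

def standardize_gene(values):
    """Rescale one gene's expression values across samples to mean 0 and st. dev. 1."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # a constant gene carries no variation to scale
    return [(v - mu) / sd for v in values]

# Hypothetical expression of one gene across three samples.
z = standardize_gene([2.0, 4.0, 6.0])
print([round(v, 4) for v in z])  # [-1.2247, 0.0, 1.2247]
```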
62 Hierarchical Clustering
The most popular hierarchical clustering method used in microarray data analysis is the so-called agglomerative method, which works with the data in a bottom-up manner.
Initially, each data point forms its own cluster, and the algorithm works through the cluster sets by repeatedly merging the two that are the most similar or have the shortest distance.
The algorithm involves computing the distance or similarity matrix, which has O(N^2) complexity and thus is not very efficient.
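The bottom-up merging can be sketched in Python as below. Single linkage (the distance between two clusters is the distance between their closest members) and Euclidean distance are assumptions chosen for illustration, and the function name and toy profiles are hypothetical.

```python
import math

def single_linkage(points, n_clusters):
    """Naive agglomerative clustering: merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]  # every point starts as its own cluster

    def cluster_distance(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return sorted(sorted(c) for c in clusters)

# Toy expression profiles: points 0, 1, and 4 lie near the origin, points 2 and 3 near (5, 5).
profiles = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 5.0), (0.05, 0.05)]
print(single_linkage(profiles, 2))  # [[0, 1, 4], [2, 3]]
```

This sketch recomputes pairwise distances on every pass for clarity, so it is slower than an implementation that stores and updates the distance matrix.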
64 Future Directions
Algorithms optimized for small samples (the number of samples will remain small for many tasks)
Integration with other data
  Biological networks
  Medical text
  Protein data
Cost-sensitive classification algorithms
  Error cost depends on outcome (don’t want to miss treatable cancer), treatment side effects, etc.
65 Integrate biological knowledge when analyzing microarray data (from Cheng Li, Harvard SPH)
Right picture: Gene Ontology: tool for the unification of biology, Nature Genetics, 25, p. 25
67 Microarray Potential Applications
Biological discovery
  New and better molecular diagnostics
  New molecular targets for therapy
  Finding and refining biological pathways
Mutation and polymorphism detection
Recent examples
  Molecular diagnosis of leukemia, breast cancer, ...
  Appropriate treatment for genetic signature
  Potential new drug targets
68 Microarray Limitations
Cross-hybridization of sequences with high identity
Chip-to-chip variation
True measure of abundance?
Do mRNA levels reflect protein levels?
Generally, microarrays do not “prove” new biology; they simply suggest genes involved in a process, a hypothesis that will require traditional experimental verification.
What fold change has biological relevance?
Need cloned EST or some sequence knowledge; rare messages may go undetected
Expensive! Not every lab can afford to repeat experiments.
The real limitation is bioinformatics
69 Additional Information
Review papers on microarray
  Genomics, gene expression and DNA arrays (Nature, June 2000)
  Microarray - technology review (Nature Cell Biology, Aug. 2001)
  Magic of Microarray (Scientific American, Feb. 2002)
Molecular biology tutorial
70 Biological data retrieval systems: Entrez http://www. ncbi. nlm. nih
1. A retrieval system for searching a number of inter-connected databases at the NCBI. It provides access to:
  PubMed: the biomedical literature (Medline)
  GenBank: nucleotide sequence database
  Protein sequence database
  Structure: three-dimensional macromolecular structures
  Genome: complete genome assemblies
  PopSet: population study data sets
  OMIM: Online Mendelian Inheritance in Man
  Taxonomy: organisms in GenBank
  Books: online books
  ProbeSet: gene expression and microarray datasets
  3D Domains: domains from Entrez Structure
  UniSTS: markers and mapping data
  SNP: single nucleotide polymorphisms
  CDD: conserved domains
2. Entrez allows users to perform various searches.