
Statistical Analyses of Microarray Data
Rafael A. Irizarry, Department of Biostatistics




1 Statistical Analyses of Microarray Data. Rafael A. Irizarry, Department of Biostatistics, rafa@jhu.edu, http://biosun01.biostat.jhsph.edu/~ririzarr

2 Outline: scientific questions, review of technology, role of statistics, two case studies

3 Scientific Questions: expression, differential expression, expression patterns. “To understand gene function, it is helpful to know when and where it is expressed and…” “…under what circumstances the expression level is affected.” “…questions concerning functional pathways and how cellular components work together to regulate and carry out cellular processes.” Lipshutz et al. (1999), Nature Genetics 21, pp. 20-21

4 What do microarrays do? They interrogate labeled nucleic acid samples: RNA samples from model systems, microdissections, cell lines, or a human tissue bank, and oligonucleotide barcodes (figure labels: kanR, UPTAG, DOWNTAG).

5 How do they do it? Labeled targets hybridize to complementary probes on the array.

6 cDNA Arrays: cDNA clones (probes) → PCR product amplification and purification → printing (0.1 nl/spot) → microarray. The mRNA target is hybridized to the microarray → excitation (laser 1, laser 2) → emission → scanning → overlay images and normalize → analysis.

7 High Density Oligonucleotide Arrays. [Figure: GeneChip probe array (1.28 cm) with >200,000 different complementary probes; each 24 µm probe cell holds millions of copies of a specific oligonucleotide probe, to which the single-stranded, labeled RNA target hybridizes; image of hybridized probe array.] Courtesy of D. Gerhold.

8 Role of Statistics

9 Biological question (differentially expressed genes, sample class prediction, etc.) → Experimental design → Microarray experiment → Quantify expression (image analysis, normalization) → Estimation, testing, clustering, discrimination → Biological verification and interpretation

10 Part of the image of one channel, false-coloured on a scale running from white (very high) and red (high) through yellow and green (medium) to blue (low) and black.

11 Does one size fit all?

12 Segmentation: limitation of the fixed-circle method. [Figure panels: SRG (seeded region growing) vs. fixed circle.] Inside the boundary is spot (foreground, fg); outside is not.
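The fixed-circle rule on this slide can be sketched as follows. This is a minimal illustration, not the software behind these slides: it assumes the spot centre (cx, cy) and the radius are already known, and the function name `fixed_circle_segment` is hypothetical. Any part of an irregularly shaped spot that falls outside the circle is counted as background, which is the limitation the slide points to.

```python
import math


def fixed_circle_segment(image, cx, cy, radius):
    """Split the pixels of one spot window into foreground and background.

    image is a list of rows of pixel intensities; (cx, cy) is the assumed spot
    centre as (column, row) and radius is the fixed circle radius in pixels.
    """
    fg, bg = [], []
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            # Pixels inside (or on) the circle count as spot; all others do not.
            if math.hypot(x - cx, y - cy) <= radius:
                fg.append(value)
            else:
                bg.append(value)
    return fg, bg


# A 5x5 window with one bright pixel in the middle.
window = [[10] * 5 for _ in range(5)]
window[2][2] = 100
fg, bg = fixed_circle_segment(window, cx=2, cy=2, radius=1)
print(len(fg), len(bg))  # 5 20
```

An adaptive method such as seeded region growing instead lets the foreground boundary follow the spot's actual shape.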

13 Some local backgrounds. We use something different again: a smaller, less variable value. [Figure: single channel, grey scale.]

14 Quantification of Expression. For each spot on the slide we calculate Red intensity = Rfg - Rbg and Green intensity = Gfg - Gbg (fg = foreground, bg = background), and combine them in the base-2 log ratio log2(Red intensity / Green intensity). We now have one differential expression measure for each gene on each array.
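The per-spot calculation on this slide, as a minimal sketch. Returning None when background subtraction leaves a non-positive intensity (so the log is undefined) is an assumption of this sketch; analyses differ in how they handle such spots. `log2_ratio` is a hypothetical name.

```python
import math


def log2_ratio(r_fg, r_bg, g_fg, g_bg):
    """Return log2(Red intensity / Green intensity) for one spot, or None.

    Red intensity = Rfg - Rbg and Green intensity = Gfg - Gbg.
    None signals that a background-corrected intensity is not positive
    (an assumed convention, not part of the slide).
    """
    red = r_fg - r_bg
    green = g_fg - g_bg
    if red <= 0 or green <= 0:
        return None
    return math.log2(red / green)


print(log2_ratio(1200, 200, 450, 200))  # 1000 / 250 = 4, so 2.0
```

A value of 2.0 means the gene appears four times more abundant in the red-labeled sample than in the green-labeled one.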

15

16 Top 2.5% of ratios in red, bottom 2.5% of ratios in green. The red/green ratios can be spatially biased.
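The colouring on this slide can be reproduced by ranking the log ratios and flagging the two tails. A minimal sketch; `flag_extremes` is a hypothetical name, and rounding the tail size down to a whole number of spots is an assumption.

```python
def flag_extremes(log_ratios, fraction=0.025):
    """Label spots 'red' (top `fraction` of log ratios), 'green' (bottom) or None."""
    n = len(log_ratios)
    k = int(n * fraction)  # number of spots in each tail, rounded down
    labels = [None] * n
    if k == 0:
        return labels  # too few spots for any to fall in a tail
    order = sorted(range(n), key=lambda i: log_ratios[i])
    for i in order[:k]:
        labels[i] = "green"
    for i in order[-k:]:
        labels[i] = "red"
    return labels


labels = flag_extremes([float(v) for v in range(200)])
print(labels.count("red"), labels.count("green"))  # 5 5
```

Drawing each label at its spot's (row, column) position on the slide gives a picture like this one: if the red and green spots cluster in particular regions rather than being scattered, the ratios are spatially biased.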

17 Another example

18 Oligo Array Image Analysis. There are about 100 pixels per probe cell. These pixel intensities are combined to form one number representing expression for the probe cell's oligonucleotide.
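The slide does not say how the roughly 100 pixel intensities are combined. A minimal sketch, assuming the approach we understand Affymetrix's software to take: discard the pixels on the cell's edge, then take the 75th percentile of the rest. The helper name `probe_cell_intensity`, the one-pixel border and the percentile are all assumptions.

```python
import math


def probe_cell_intensity(pixels, border=1, pct=0.75):
    """Summarise one probe cell's pixel grid as a single number.

    Drops `border` pixels on every edge (assumed less reliable), then returns
    the `pct` quantile of the remaining pixels using the nearest-rank rule.
    """
    inner = [
        value
        for row in pixels[border:len(pixels) - border]
        for value in row[border:len(row) - border]
    ]
    if not inner:
        raise ValueError("no interior pixels left after trimming the border")
    inner.sort()
    rank = max(1, math.ceil(pct * len(inner)))  # 1-based nearest rank
    return inner[rank - 1]


# A 10x10 cell (about 100 pixels) with intensities 0..99.
cell = [[10 * r + c for c in range(10)] for r in range(10)]
print(probe_cell_intensity(cell))  # 68
```

An upper percentile rather than the mean makes the summary less sensitive to a few dim pixels near the cell edge.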

19 Normalization at Probe Level

20

21 Dilution Experiment Data

22

23 PM and MM: perfect match (PM) and mismatch (MM) probe intensities

24 Default until 2002: GeneChip® software uses Avg.diff = (1/|A|) Σ_{j∈A} (PM_j - MM_j), with A a set of “suitable” PM/MM pairs chosen by the software. A log-ratio version is also used. For differential expression, Avg.diffs are compared between chips.
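Avg.diff averages the PM - MM differences over the probe pairs in A: Avg.diff = (1/|A|) Σ_{j∈A} (PM_j - MM_j). A minimal sketch, assuming the caller already knows A and passes it as a list of booleans; how the GeneChip software picks the “suitable” pairs is not reproduced, and `avg_diff` is a hypothetical name.

```python
def avg_diff(pm, mm, suitable):
    """Average of PM_j - MM_j over the probe pairs j in the set A.

    pm, mm: probe-pair intensities for one probe set.
    suitable: booleans marking which pairs belong to A (supplied by the caller,
    an assumption of this sketch).
    """
    diffs = [p - m for p, m, keep in zip(pm, mm, suitable) if keep]
    if not diffs:
        raise ValueError("the set A of suitable pairs is empty")
    return sum(diffs) / len(diffs)


pm = [820.0, 640.0, 1500.0, 300.0]
mm = [410.0, 600.0, 200.0, 350.0]
print(avg_diff(pm, mm, [True, True, False, True]))  # (410 + 40 - 50) / 3
```

Note that individual differences can be negative (MM brighter than PM), so Avg.diff itself can be negative, which is one reason a log scale is awkward for it.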

25 What is the evidence? Lockhart et al., Nature Biotechnology 14 (1996)

26 Two case studies

27 Spike-In Experiments. Add 11 cRNAs from foreign species, at concentrations from 0.5 pM to 100 pM, to the hybridization mixture. Set A: the 11 control cRNAs were spiked in, all at the same concentration, which varied across chips. Set B: the 11 control cRNAs were spiked in, all at different concentrations, which varied across chips. The concentrations were arranged in a 12x12 cyclic Latin square (with 3 replicates).
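A cyclic Latin square is simple to generate: entry (i, j) is (i + j) mod n, so each of the n concentration levels appears exactly once in every row and once in every column. A sketch of a 12x12 layout; which dimension corresponds to chips and which to transcripts, and how the 3 replicates are attached, are assumptions not spelled out on the slide. `cyclic_latin_square` is a hypothetical name.

```python
def cyclic_latin_square(n):
    """Return an n x n cyclic Latin square whose entry (i, j) is (i + j) mod n.

    Each entry is an index into n concentration levels; shifting each row one
    step relative to the previous one gives the cyclic structure.
    """
    return [[(i + j) % n for j in range(n)] for i in range(n)]


square = cyclic_latin_square(12)
for row in square[:3]:
    print(row)
```

Because every level appears once per row and once per column, concentration effects are balanced across both factors of the design.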

28 Set A: Probe Level Data (12 chips)

29 Spike-In B
Probe Set   Conc 1   Conc 2   Rank
BioB-5      100      0.5      1
BioB-3      0.5      25.0     2
BioC-5      2.0      75.0     3
BioB-M      1.0      35.7     4
BioDn-3     1.5      50.0     5
DapX-3      35.7     3.0      6
CreX-3      50.0     5.0      7
CreX-5      12.5     2.0      8
BioC-3      25.0     100      9
DapX-5      5.0      1.5      10
DapX-M      3.0      1.0      11
Later we consider 23 different combinations of concentrations.
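The Rank column appears to order the probe sets by the size of their fold change between the two concentrations. Assuming that reading, the sketch below recomputes it from the Spike-In B concentrations by sorting on |log2(Conc 1 / Conc 2)|; with these values it reproduces ranks 1 to 11. `rank_by_fold_change` is a hypothetical name.

```python
import math

# (probe set, Conc 1, Conc 2) from the Spike-In B table.
SPIKES = [
    ("BioB-5", 100.0, 0.5),
    ("BioB-3", 0.5, 25.0),
    ("BioC-5", 2.0, 75.0),
    ("BioB-M", 1.0, 35.7),
    ("BioDn-3", 1.5, 50.0),
    ("DapX-3", 35.7, 3.0),
    ("CreX-3", 50.0, 5.0),
    ("CreX-5", 12.5, 2.0),
    ("BioC-3", 25.0, 100.0),
    ("DapX-5", 5.0, 1.5),
    ("DapX-M", 3.0, 1.0),
]


def rank_by_fold_change(spikes):
    """Rank probe sets by |log2(Conc 1 / Conc 2)|, largest change first."""
    ordered = sorted(spikes, key=lambda s: abs(math.log2(s[1] / s[2])), reverse=True)
    return {name: rank for rank, (name, _, _) in enumerate(ordered, start=1)}


for name, rank in rank_by_fold_change(SPIKES).items():
    print(rank, name)
```

Using the absolute log fold change treats a 50-fold decrease (BioB-3) and a 50-fold increase as equally large changes.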

30 Observed Ranks GeneAvDiffMAS 5.0Li&WongAvLog(PM-BG) BioB-562771 BioB-3161332 BioC-5746225 BioB-M30363 BioDn-3445274 DapX-3239247967 CreX-33337338611 CreX-5327633439 BioC-3270985721210300 DapX-527091025917 DapX-M16519306

31

32 [Figure: barcoded deletion-pool assay. Panel A: a pool of kanR deletion strains carrying UPTAG and DOWNTAG barcodes is transformed with circular pRS416; Ura+ transformants are selected; genomic DNA is prepared; the barcodes are PCR-amplified into Cy5- and Cy3-labeled PCR products for oligonucleotide array hybridization. Panel B: EcoRI-linearized pRS416 (MCS, CEN/ARS, URA3; ttaa/aatt ends); NHEJ-defective strains.]

33

34 Average Red and Green Scatter Plot

35 Average Red and Green MVA plot
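An MVA plot (M versus A) is a rotated and rescaled version of the log red-versus-green scatter plot: for each spot, M = log2(R/G) is the log ratio and A = (log2 R + log2 G)/2 is the average log intensity. These are the standard definitions; the slides themselves do not state them, and `ma_values` is a hypothetical helper.

```python
import math


def ma_values(red, green):
    """Return (M, A) for one spot: M = log2(R/G), A = (log2 R + log2 G) / 2."""
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive")
    m = math.log2(red) - math.log2(green)
    a = (math.log2(red) + math.log2(green)) / 2
    return m, a


print(ma_values(1024.0, 256.0))  # (2.0, 9.0)
```

Plotting M against A makes intensity-dependent bias visible as a curve in what should be a horizontal band around M = 0.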

36 Histograms

37 QQ-Plot
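A normal QQ-plot compares the sorted data with the quantiles a normal distribution would give at the same positions. A sketch that computes the plotting points with the standard library, assuming the common (i - 0.5)/n plotting positions; `normal_qq_points` is a hypothetical name.

```python
from statistics import NormalDist


def normal_qq_points(values):
    """Pair each sorted value with the matching standard normal quantile.

    Uses plotting positions (i - 0.5) / n for i = 1..n (an assumed convention).
    Points close to a straight line suggest the values are roughly normal.
    """
    xs = sorted(values)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]


points = normal_qq_points([3.0, 1.0, 2.0])
print(points)
```

Heavy tails show up as points bending away from the line at both ends, which is often where the interesting spots lie.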

38 Z-Scores
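The slide shows only the result. One plausible reading, assumed here, is that each spot's log ratio M is standardized by the mean and sample standard deviation over all spots, so that |z| measures how unusual a spot is; a robust variant would use the median and MAD instead. `z_scores` is a hypothetical name.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values: z = (x - mean) / sample standard deviation."""
    mu = mean(values)
    sd = stdev(values)  # requires at least two values
    return [(x - mu) / sd for x in values]


print(z_scores([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]
```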

39 Average Red and Green MVA Plot

40 Average Red and Green Scatter Plot

41 Summary: Simple data exploration is a useful tool for quality assessment. Statistical thinking is helpful for interpretation. Statistical models may help find signals in noise.

42 Acknowledgements. UC Berkeley Stat: Ben Bolstad, Sandrine Dudoit, Terry Speed, Jean Yang. MBG (SOM): Jef Boeke, Siew-Loon Ooi, Marina Lee, Forrest Spencer. Biostatistics: Karl Broman, Leslie Cope, Carlo Coulantoni, Giovanni Parmigiani, Scott Zeger. Gene Logic: Francois Colin, Uwe Scherf’s Group. PGA: Tom Cappola, Skip Garcia, Joshua Hare. WEHI: Bridget Hobbs, Natalie Thorne.

