Statistics for Microarrays: Image Analysis


1 Statistics for Microarrays: Image Analysis. Class web site: http://statwww.epfl.ch/davison/teaching/Microarrays/ETHZ/

2 [Workflow diagram] Biological question (differentially expressed genes, sample class prediction, etc.) → Experimental design → Microarray experiment → Image analysis (16-bit TIFF files → (Rfg, Rbg), (Gfg, Gbg) → R, G) → Normalization → Estimation, Testing, Clustering, Discrimination → Biological verification and interpretation

3 Microarray Experiment

4 Scanner [Diagram labels: laser, PMT, dye, glass slide, objective lens, detector lens, pinhole, beam-splitter]

5 Scanner Process [Diagram] Laser (excitation) → Dye → Photons → PMT (amplification) → Electrons → A/D converter (filtering, time-space averaging) → Signal

6 Yeast genome on a chip

7 Quantification of Expression For each spot on the slide, calculate Red intensity = Rfg - Rbg and Green intensity = Gfg - Gbg (fg = foreground, bg = background), and combine them in the log (base 2) ratio log2(Red/Green)
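The calculation on this slide can be sketched in a few lines of Python. The function name `log_ratio` and the choice to return `None` when a background-corrected intensity is not positive are assumptions made for illustration. The slides do not say how such spots should be handled.

```python
import math

def log_ratio(r_fg, r_bg, g_fg, g_bg):
    """Return log2(Red/Green) from background-corrected intensities, or None if undefined."""
    red = r_fg - r_bg      # Red intensity = Rfg - Rbg
    green = g_fg - g_bg    # Green intensity = Gfg - Gbg
    # Assumption: a non-positive corrected intensity makes the log ratio undefined.
    if red <= 0 or green <= 0:
        return None
    return math.log2(red / green)

print(log_ratio(1200, 200, 600, 100))  # red = 1000, green = 500 -> 1.0
```

A value of 1.0 means the red channel is twice as bright as the green channel after background subtraction.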

8 Practical Problems 1 Comet Tails Likely caused by insufficiently rapid immersion of the slides in the succinic anhydride blocking solution

9 Practical Problems 2 Blotches, unequal spot sizes, overlapping spots

10 Practical Problems 3 High background (2 likely causes): –Insufficient blocking –Precipitation of the labeled probe Weak signals

11 Practical Problems 4 Spot overlap Likely cause: Too much rehydration during post-processing.

12 Practical Problems 5 Dust

13 Images from scanner Resolution –standard 10 µm [currently, max 5 µm] –100 µm spot on chip = 10 pixels in diameter Image format –TIFF (tagged image file format) 16 bit (65,536 levels of gray) –1 cm x 1 cm image at 16 bit = 2 MB (uncompressed) –other formats exist, e.g. SCN (used at Stanford) Separate image for each fluorescent sample –channel 1, channel 2, etc.
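The size figures on this slide follow from simple arithmetic, sketched below. The helper name `image_size_bytes` is hypothetical, and the result counts raw pixel data only, ignoring any TIFF header overhead.

```python
def image_size_bytes(width_cm, height_cm, pixel_um, bits_per_pixel=16):
    """Uncompressed size of one scanned channel, pixel data only."""
    um_per_cm = 10_000
    width_px = round(width_cm * um_per_cm / pixel_um)
    height_px = round(height_cm * um_per_cm / pixel_um)
    return width_px * height_px * bits_per_pixel // 8

# 1 cm x 1 cm at 10 um per pixel: 1000 x 1000 pixels x 2 bytes
print(image_size_bytes(1, 1, 10))   # 2000000
# A 100 um spot at 10 um per pixel spans 10 pixels
print(round(100 / 10))              # 10
```

At the finer 5 µm resolution the same area gives 2000 x 2000 pixels, so the file is four times larger.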

14 Images in analysis software The two 16-bit images (Cy3, Cy5) are compressed into 8-bit images Display fluorescence intensities for both wavelengths using a 24-bit RGB overlay image RGB image : –Blue values (B) are set to 0 –Red values (R) are used for Cy5 intensities –Green values (G) are used for Cy3 intensities Qualitative representation of results
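A minimal sketch of the overlay described here, using NumPy. The slide does not say how the 16-bit images are compressed to 8 bits. Keeping the 8 most significant bits is an assumption, and real packages may apply a different transform. The function names are hypothetical.

```python
import numpy as np

def to_8bit(channel16):
    """Assumed compression: keep the 8 most significant bits of each 16-bit value."""
    return (channel16.astype(np.uint16) >> 8).astype(np.uint8)

def overlay(cy5, cy3):
    """Build a 24-bit RGB image: R = Cy5, G = Cy3, B = 0."""
    rgb = np.zeros(cy5.shape + (3,), dtype=np.uint8)
    rgb[..., 0] = to_8bit(cy5)  # red channel carries Cy5
    rgb[..., 1] = to_8bit(cy3)  # green channel carries Cy3
    return rgb                  # blue channel stays 0

cy5 = np.array([[65535, 0]], dtype=np.uint16)
cy3 = np.array([[65535, 65535]], dtype=np.uint16)
print(overlay(cy5, cy3)[0].tolist())  # [[255, 255, 0], [0, 255, 0]]
```

The first pixel is bright in both channels and displays as yellow. The second is bright only in Cy3 and displays as green.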

15 Images: examples (Cy3, Cy5, pseudo-color overlay)
Spot color | Signal strength | Gene expression
yellow | Control = Treated | unchanged
red | Control < Treated | induced
green | Control > Treated | repressed

16 Steps in Image Processing Addressing (or Gridding) –Assigning coordinates to each spot Segmentation –Classification of pixels as either foreground (signal) or background Information Extraction –Foreground fluorescence intensity pairs (R, G) –Background intensities –Quality measures

17 Addressing This is the process of assigning coordinates to each of the spots. Automating this part of the procedure permits high throughput analysis. 4 by 4 grids 19 by 21 spots per grid

18 Addressing 4 by 4 grids Within the same batch of print runs: estimate translation of grids Other problems: –Misregistration –Rotation –Skew in the array

19 Addressing — Registration

20 Problems in automatic addressing Misregistration of the red and green channels Rotation of the array in the image Skew in the array [Figure: rotation]

21 Addressing (I) Basic structure of images known (determined by the arrayer) Parameters to address spot positions –Separation between rows and columns of grids –Individual translation of grids –Separation between rows and columns of spots within each grid –Small individual translation of spots –Overall position of the array in the image [Figure: ScanAlyze]
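Because the arrayer fixes the basic layout, nominal spot centers can be generated from a handful of these parameters. The sketch below is an illustration under assumptions, not how ScanAlyze or any other package works. The function name, the pixel values, and the reading of "19 by 21" as 19 rows by 21 columns are all hypothetical. The per-grid and per-spot translations would be added as small corrections on top of these nominal positions.

```python
def nominal_spot_centers(origin, grid_shape, grid_pitch, spot_shape, spot_pitch):
    """Map (grid_row, grid_col, spot_row, spot_col) to a nominal (x, y) center in pixels.

    origin: (x, y) of the first spot, i.e. the overall position of the array.
    grid_pitch, spot_pitch: (row separation, column separation) in pixels.
    """
    x0, y0 = origin
    centers = {}
    for gr in range(grid_shape[0]):
        for gc in range(grid_shape[1]):
            for sr in range(spot_shape[0]):
                for sc in range(spot_shape[1]):
                    x = x0 + gc * grid_pitch[1] + sc * spot_pitch[1]
                    y = y0 + gr * grid_pitch[0] + sr * spot_pitch[0]
                    centers[(gr, gc, sr, sc)] = (x, y)
    return centers

# Hypothetical layout: 4 by 4 grids, 19 by 21 spots per grid.
centers = nominal_spot_centers((50, 50), (4, 4), (450, 450), (19, 21), (20, 20))
print(len(centers))  # 6384
```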

22 Addressing (II) The measurement process depends on the addressing procedure Addressing accuracy can be enhanced by allowing user intervention (at the cost of time) Most software systems now provide for both manual and automatic gridding procedures

23 Segmentation Classification of pixels as foreground or background → fluorescence intensities are calculated for each spot as measure of transcript abundance Production of a spot mask: set of foreground pixels for each spot

24 Segmentation Methods Fixed circles Adaptive circles Adaptive shape –Edge detection –Seeded Region Growing (R. Adams and L. Bischof (1994)): Regions grow outwards from seed points preferentially according to the difference between a pixel’s value and the running mean of values in an adjoining region Histogram methods

25 Segmentation Methods in some programs
Fixed circle: ScanAlyze, GenePix, QuantArray
Adaptive circle: GenePix, Dapple, SignalViewer (uses ellipse)
Adaptive shape: Spot, region growing and watershed
Histogram: ImaGene, QuantArray, DeArray and adaptive thresholding

26 Fixed circle segmentation Fits a circle with a constant diameter to all spots in the image Easy to implement The spots should be of the same shape and size May not be good for this example
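A minimal sketch of fixed circle segmentation with NumPy, assuming spot centers are already known from addressing. The function names are hypothetical, and taking the mean of the masked pixels as the foreground value is only one of the summaries the slides mention later.

```python
import numpy as np

def circle_mask(shape, center, radius):
    """Boolean mask of pixels within `radius` of `center` (row, col)."""
    rows, cols = np.ogrid[:shape[0], :shape[1]]
    cy, cx = center
    return (rows - cy) ** 2 + (cols - cx) ** 2 <= radius ** 2

def fixed_circle_foreground(image, centers, radius):
    """Mean foreground intensity per spot, using the same radius for every spot."""
    return [float(image[circle_mask(image.shape, c, radius)].mean()) for c in centers]

# Synthetic spot of radius 3 on a dark background.
image = np.zeros((20, 20))
image[circle_mask(image.shape, (10, 10), 3)] = 100.0
print(fixed_circle_foreground(image, [(10, 10)], radius=3))  # [100.0]
print(fixed_circle_foreground(image, [(10, 10)], radius=5))  # lower: background pixels included
```

The second call shows the weakness noted on the slide: a circle larger than the actual spot pulls background pixels into the foreground estimate.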

27 Adaptive circle segmentation The circle diameter is estimated separately for each spot Dapple finds spots by detecting edges of spots (second derivative) Problematic if spot exhibits oval shapes

28 Limitation of circular segmentation –Small spot –Not circular [Figure: results from SRG]

29 Limitation of fixed circles [Figure panels: SRG, Fixed Circle]

30 Adaptive shape segmentation Specification of starting points or seeds Bonus: already know geometry of array Regions grow outwards from the seed points preferentially according to the difference between a pixel’s value and the running mean of values in an adjoining region
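The growth rule can be illustrated with a simplified sketch. This is not the Spot implementation, nor the full Adams and Bischof algorithm. Each candidate's priority is computed once, from the region mean at the time the pixel is queued, and is not updated afterwards. The function name, the 4-neighbour connectivity, and the label convention (nonzero integer labels, 0 for unassigned) are assumptions.

```python
import heapq
import numpy as np

def seeded_region_growing(image, seeds):
    """seeds: dict mapping a nonzero int label to a list of (row, col) seed pixels."""
    labels = np.zeros(image.shape, dtype=int)
    sums = {lab: 0.0 for lab in seeds}
    counts = {lab: 0 for lab in seeds}
    heap = []

    def add_pixel(r, c, lab):
        labels[r, c] = lab
        sums[lab] += float(image[r, c])
        counts[lab] += 1

    def push_neighbours(r, c, lab):
        mean = sums[lab] / counts[lab]  # running mean of the adjoining region
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            inside = 0 <= rr < image.shape[0] and 0 <= cc < image.shape[1]
            if inside and labels[rr, cc] == 0:
                diff = abs(float(image[rr, cc]) - mean)
                heapq.heappush(heap, (diff, rr, cc, lab))

    for lab, points in seeds.items():
        for r, c in points:
            add_pixel(r, c, lab)
    for lab, points in seeds.items():
        for r, c in points:
            push_neighbours(r, c, lab)

    # Always assign the queued pixel that is closest in value to its neighbouring region.
    while heap:
        _, r, c, lab = heapq.heappop(heap)
        if labels[r, c] != 0:
            continue
        add_pixel(r, c, lab)
        push_neighbours(r, c, lab)
    return labels

image = np.full((7, 7), 10.0)
image[2:5, 2:5] = 100.0
labels = seeded_region_growing(image, {1: [(3, 3)], 2: [(0, 0)]})
print(int((labels == 1).sum()))  # 9 foreground pixels
```

Here label 1 is seeded at the known spot center and label 2 in the background, which is where knowing the array geometry in advance helps.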

31 Seeds

32 Histogram segmentation Choose target mask larger than any spot Fg and bg intensities determined from the histogram of pixel values for pixels within the masked area Example: QuantArray –Background: mean between 5th and 20th percentile –Foreground: mean between 80th and 95th percentile May not work well when a large target mask is set to compensate for variation in spot size [Figure labels: Bkgd, Foreground]
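The percentile rule quoted for QuantArray can be sketched as follows. Reading "mean between the 5th and 20th percentile" as the mean of the pixel values lying between those two percentiles is an assumption, and the function names are hypothetical. This is not QuantArray's actual code.

```python
import numpy as np

def percentile_band_mean(pixels, low, high):
    """Mean of the pixel values lying between the `low` and `high` percentiles."""
    pixels = np.asarray(pixels, dtype=float).ravel()
    lo, hi = np.percentile(pixels, [low, high])
    band = pixels[(pixels >= lo) & (pixels <= hi)]
    return float(band.mean())

def histogram_segmentation(target_pixels):
    """Return (foreground, background) estimates for the pixels in one target mask."""
    background = percentile_band_mean(target_pixels, 5, 20)
    foreground = percentile_band_mean(target_pixels, 80, 95)
    return foreground, background

# Synthetic target mask: 60 background pixels and 40 spot pixels.
pixels = [10.0] * 60 + [200.0] * 40
print(histogram_segmentation(pixels))  # (200.0, 10.0)
```

If the target mask is much larger than the spot, fewer than 20% of its pixels may belong to the spot, and the 80th to 95th percentile band then includes background values. That is the failure mode the slide warns about.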

33 Information Extraction Spot Intensities –mean of pixel intensities –median of pixel intensities –Pixel variation (e.g. IQR) Background values –None –Local –Constant (global) –Morphological opening Quality Information Take the average

34 Background intensity The measured fluorescence intensity includes a contribution of non-specific hybridization and other chemicals on the glass Fluorescence from regions not occupied by DNA should be different from regions occupied by DNA → one solution is to use local negative controls (spotted DNA that should not hybridize)

35 BG: None Do not consider the background –Probably not accurate in many cases, but may be better than some forms of local background determination

36 BG: Local Focusing on small regions surrounding the spot mask Median of pixel values in this region Most software packages implement such an approach [Figure panels: ImaGene; Spot, GenePix; ScanAlyze] By not considering the pixels immediately surrounding the spots, the background estimate is less sensitive to the performance of the segmentation procedure
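One way to picture a local background estimate is a ring around the spot that skips the pixels immediately next to it. The sketch below assumes a circular ring with hypothetical inner and outer radii. The packages named on the slide each use their own region shapes, which are not reproduced here. A real implementation would also exclude pixels belonging to neighbouring spots.

```python
import numpy as np

def local_background(image, center, inner_radius, outer_radius):
    """Median of pixels in a ring around `center` (row, col).

    Pixels closer than `inner_radius` are skipped, so small segmentation
    errors at the spot edge do not leak into the background estimate.
    """
    rows, cols = np.ogrid[:image.shape[0], :image.shape[1]]
    cy, cx = center
    dist2 = (rows - cy) ** 2 + (cols - cx) ** 2
    ring = (dist2 > inner_radius ** 2) & (dist2 <= outer_radius ** 2)
    return float(np.median(image[ring]))

image = np.full((30, 30), 50.0)
image[13:18, 13:18] = 1000.0  # synthetic spot
print(local_background(image, (15, 15), inner_radius=6, outer_radius=10))  # 50.0
```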

37 BG: Constant Global method which subtracts a constant background for all spots Some evidence that the binding of fluorescent dyes to ‘negative control spots’ is lower than the binding to the glass slide → More meaningful to estimate background based on a set of negative control spots –If no negative control spots: approximation of the average background = third percentile of all the spot foreground values
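A minimal sketch of the constant background rule. The slide does not say how the negative control spots are summarized, so using their mean is an assumption, as is the function name. The fallback to the third percentile of the spot foreground values follows the slide.

```python
import numpy as np

def constant_background(spot_foregrounds, negative_controls=None):
    """One background value for the whole slide."""
    if negative_controls is not None and len(negative_controls) > 0:
        # Assumption: summarize the negative control spots by their mean.
        return float(np.mean(negative_controls))
    # No negative controls: third percentile of all spot foreground values.
    return float(np.percentile(spot_foregrounds, 3))

foregrounds = list(range(101))  # synthetic foreground values 0..100
print(constant_background(foregrounds))
print(constant_background(foregrounds, negative_controls=[40.0, 60.0]))  # 50.0
```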

38 BG: Morphological opening Non-linear filtering, used in Spot Use a square structuring element with side length at least twice as large as the spot separation distance Compute local minimum filter, then compute local maximum filter –This removes all spots and generates an image that is an estimate of the background for the entire slide For individual spots, the background is estimated by sampling this background image at the nominal center of the spot Lower, less variable bg estimate
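The two filtering passes can be sketched with NumPy alone. This is an illustration of a morphological opening with a square window, not the code used in Spot. The function names, the odd window size, and the edge-replicating padding are assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _window_filter(image, size, reduce):
    """Apply `reduce` (np.min or np.max) over a size x size window around each pixel."""
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")  # assumption: replicate edge pixels
    windows = sliding_window_view(padded, (size, size))
    return reduce(windows, axis=(-2, -1))

def opening_background(image, size):
    """Local minimum filter followed by local maximum filter; `size` must be odd."""
    eroded = _window_filter(image, size, np.min)
    return _window_filter(eroded, size, np.max)

# Synthetic slide: flat background of 20 with 3 x 3 spots every 4 pixels.
image = np.full((40, 40), 20.0)
for r in range(4, 36, 4):
    for c in range(4, 36, 4):
        image[r:r + 3, c:c + 3] = 1000.0

background = opening_background(image, size=9)  # window >= 2 x spot separation
print(background[5, 5])  # background sampled at the nominal center of the first spot
```

Because every window contains at least one background pixel, the minimum filter erases the spots, and the maximum filter restores the background level without bringing them back.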

39 Background matters From Spot From GenePix

40 Quality Measurements Array –Correlation between spot intensities –Percentage of spots with no signals –Distribution of spot signal area Spot –Signal / Noise ratio –Variation in pixel intensities –Identification of “bad spot” (spots with no signal) Ratio (2 spots combined) –Circularity Flag or weight spots based on these (or other appropriate) criteria

41 Quality of Array Distribution of areas - Judge by eye - Look at variation. (e.g. SD) Cy3 area mean 57 median 56 SD 20.67 Cy5 area mean 59 median 57 SD 24.34

42 Summary The choice of background correction method often has a larger impact on the log-intensity ratios than the segmentation method used The morphological opening method provides a better estimate of background than other methods –Low within- and between-slide variability of the log2(R/G) Background adjustment has a larger impact on low intensity spots [Figure panels: Spot, GenePix, ScanAlyze; axes M = log2(R/G), A = log2 √(RG)]
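A short worked example of M and A, and of why background adjustment matters more for low intensity spots. The function names and the intensity values are made up for illustration, and the same background is subtracted from both channels to keep the arithmetic simple.

```python
import math

def ma_values(red, green):
    """Return (M, A) for positive intensities: M = log2(R/G), A = log2(sqrt(R*G))."""
    m = math.log2(red / green)
    a = 0.5 * math.log2(red * green)  # log2 of the geometric mean
    return m, a

def background_shift(r_fg, g_fg, bg):
    """Change in M caused by subtracting the same background from both channels."""
    m_raw, _ = ma_values(r_fg, g_fg)
    m_adj, _ = ma_values(r_fg - bg, g_fg - bg)
    return m_adj - m_raw

print(ma_values(1024, 256))               # (2.0, 9.0)
print(background_shift(10000, 5000, 100)) # bright spot: small shift in M
print(background_shift(400, 250, 100))    # dim spot: much larger shift in M
```

The same background of 100 barely moves M for the bright spot but changes it substantially for the dim one, because the background is a much larger fraction of the dim spot's signal.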

43 Selected references
Yang, Y. H., Buckley, M. J., Dudoit, S. and Speed, T. P. (2001), ‘Comparisons of methods for image analysis on cDNA microarray data’. Technical report #584, Department of Statistics, University of California, Berkeley. http://www.stat.berkeley.edu/users/terry/zarray/Html/papersindex.html
Yang, Y. H., Buckley, M. J. and Speed, T. P. (2001), ‘Analysis of cDNA microarray images’. Briefings in Bioinformatics, 2 (4), 341-349. Excellent review in concise format!

44 Acknowledgments Terry Speed Michael Buckley Sandrine Dudoit Natalie Roberts Ben Bolstad Brian Stevenson CSIRO Image Analysis Group Ryan Lagerstorm Richard Beare Hugues Talbot Kevin Cheong Matt Callow (LBL) Percy Luu (USB) Dave Lin (USB) Vivian Pang (USB) Elva Diaz (USB)

