Getting Started with CellProfiler Mark-Anthony Bray, Ph.D Imaging Platform, Broad Institute Cambridge, Massachusetts, USA.

1 Getting Started with CellProfiler Mark-Anthony Bray, Ph.D Imaging Platform, Broad Institute Cambridge, Massachusetts, USA

2 Software Overview
Available from
Free, open source (Python)
Software available for Windows, Mac and Linux
Image Analysis & Quantification
Image-centric Data Analysis

3 CellProfiler: Overview
Process large sets of images
Identify and measure objects
Export data for further analysis
Goal: Provide powerful image analysis methods with a user-friendly interface
Philosophy: Measure everything, ask questions later...
Support data analysis based on individual cells

4 CellProfiler: Basic Concepts
Example pipeline
–Load images
–Identify primary, secondary and tertiary objects
–Measure object features
–Export measurement and image data
Glossary
–Pipeline: Steps of image processing in CellProfiler
–Module: One step of a pipeline
–Primary Objects: Key objects used to identify cells
–Secondary Objects: Other parts of cells, attached to primary objects

5 Typical CellProfiler Pipeline Workflow
For image-based assays, the basic objective is always to
–Identify cells/organisms
–Measure feature(s) of interest
The uniqueness of each assay comes in
–Deciding what compartments to identify and how to identify them
–Determining which measure(s) are most useful to identify interesting samples

6 Typical CellProfiler Pipeline Workflow
1. Identify anomalies in the images (for example, uneven illumination of the sample or autofocus errors) and correct when possible
2. Identify individual nuclei in each image (using images of a DNA stain)
3. Identify cell boundaries in each image (using images of a cellular stain, if available)
4. Identify subcellular organelles (using images of the organelle/structure stain)
5. Identify cellular subcompartments as desired (for example, cytoplasm, and nuclear or plasma membrane)
6. Count the cells, measure the size and shape of each compartment or organelle, and measure the intensity and texture of each type of staining in each subcompartment
7. Export measurements to a database, where they are then available for downstream analysis using various computational tools

7 The CellProfiler Interface
Pipeline panel: Displays modules in pipeline
–Modules executed in order from top to bottom
Change module position
Add or remove modules
Module help

8 The CellProfiler Interface
File panel: Displays files in default image folder
Load pipeline by double-clicking on it
View images by double-clicking on the filename

9 The CellProfiler Interface
The figure window has additional menu options
Toolbar menu: Pan, zoom in/out
CellProfiler Image Tools
–Image Tool (also displayed by clicking on image)
–Interactive zoom
–Show pixel data (location, intensity)

10 The CellProfiler Interface
Folder panel: Change default input and output directories
–Usually these should be separate folders
Input folder: Contains images to be analyzed
Output folder: Contains the output file plus exported data and images

11 The CellProfiler Interface
Settings panel: View and change settings for each module
–Clicking on a different module updates the settings view

12 Module Categories
File processing: Image input, file output
Image processing: Often used for pre-processing prior to object identification
Object processing: Identification, modification of objects of interest
Measurement: Collection of measurements from objects of interest
Data Tools: Measurement exploration, measurement output

13 The First Module: LoadImages
Loads an image set, which is a group of related images, in preparation for further processing
Related how? Depending on the imaging device, one file may represent
–One channel at one imaging location
–Multiple channels at one imaging location
–Multiple channels at multiple locations
–Etc…
[Figure panels: DNA, GFP]

14 The First Module: LoadImages
Can use text matching to define the difference between images in a set
All images stained for GFP have the text Channel1- in the name
Same for DNA images (Channel2-)
Assign each image a meaningful name for downstream reference
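The text-matching idea on this slide can be sketched in plain Python. This is only an illustration of the grouping logic, not CellProfiler's own code: the file names, the `channel_names` mapping and the `group_by_channel` helper are hypothetical, and only the `Channel1-` / `Channel2-` tokens come from the slide.

```python
from collections import defaultdict

# Hypothetical file names; the "Channel1-"/"Channel2-" tokens follow the slide.
filenames = [
    "Channel1-01-A-01.tif",
    "Channel2-01-A-01.tif",
    "Channel1-02-A-02.tif",
    "Channel2-02-A-02.tif",
]

# Text that identifies each channel, mapped to a meaningful image name.
channel_names = {"Channel1-": "GFP", "Channel2-": "DNA"}


def group_by_channel(names, patterns):
    """Return {image name: sorted file list} using plain substring matching."""
    groups = defaultdict(list)
    for name in names:
        for text, label in patterns.items():
            if text in name:
                groups[label].append(name)
    return {label: sorted(files) for label, files in groups.items()}


groups = group_by_channel(filenames, channel_names)

# Pair the files position by position to form image sets (one GFP + one DNA).
image_sets = list(zip(groups["GFP"], groups["DNA"]))
```

Pairing by sorted position only works when every channel contributes the same number of files, which is the same constraint the loading tips place on the real module.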

15 Loading Images Into CellProfiler
Standard image formats
–TIFF, GIF, PNG, JPG...
–Cellomics .DIB, .c01
–Movies and image sequences: AVI, STK, multiframe TIFF, FLEX
A few tips
–Use LoadImages to load any number of channels, even in subdirectories
–The total number of files for each channel must match
–Many microscopes produce 12-bit images in a 16-bit format: use RescaleIntensity
–Handle color images by splitting them into RGB components: use ColorToGray
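The 12-bit tip is easier to see with numbers. A 12-bit camera produces values from 0 to 4095; if such an image is stored in a 16-bit file and scaled by the 16-bit maximum (65535), the brightest possible pixel lands at about 0.06 rather than 1.0. The sketch below shows the arithmetic only; `rescale_12bit` is a hypothetical helper, not the RescaleIntensity module itself.

```python
def rescale_12bit(pixels, max_value=4095):
    """Map raw 12-bit values (0..4095) onto the 0..1 range."""
    return [[value / max_value for value in row] for row in pixels]


# Raw values as read from a 16-bit file that really holds 12-bit data.
raw = [[0, 2048, 4095]]

scaled_as_16bit = [[value / 65535 for value in row] for row in raw]
scaled_as_12bit = rescale_12bit(raw)
```

With the 16-bit scaling the brightest pixel is roughly 0.06; after rescaling by the 12-bit maximum it is exactly 1.0.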

16 What Is An Image?
Images from Carolina Wählby

17 Object Identification
Once the images are loaded, how do you find objects of interest?
Step 1: Distinguish the foreground from the background by picking a good threshold
Step 2: Identify objects as regions brighter than the threshold
Step 3: Cut and join objects to improve their shape
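Steps 1 and 2 can be sketched as a threshold followed by connected-component labeling. The code below is a minimal pure-Python illustration under stated assumptions: the image is a small nested list, the threshold is supplied by hand, and neighbors are 4-connected. `label_objects` is a hypothetical helper, and the sketch does not attempt step 3 (cutting and joining).

```python
def label_objects(image, threshold):
    """Label 4-connected regions of pixels brighter than `threshold`.

    Returns (labels, count), where labels has the same shape as image,
    0 marks background and 1..count mark the objects.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] > threshold and labels[r][c] == 0:
                count += 1
                labels[r][c] = count
                stack = [(r, c)]
                # Flood fill: spread the new label to all connected foreground.
                while stack:
                    y, x = stack.pop()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (
                            0 <= ny < rows
                            and 0 <= nx < cols
                            and image[ny][nx] > threshold
                            and labels[ny][nx] == 0
                        ):
                            labels[ny][nx] = count
                            stack.append((ny, nx))
    return labels, count


image = [
    [0.9, 0.8, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.7],
    [0.1, 0.1, 0.1, 0.9],
]
labels, count = label_objects(image, threshold=0.5)
```

Two touching nuclei would come out of this sketch as a single object, which is why a separate de-clumping step is needed.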

18 Primary Object Identification
Many options for thresholding, cut and join methods, etc.

19 Thresholding
Definition: Division of the image into background and foreground
Method: Pick the method that provides the best results
–Otsu: Default; good for readily identifiable foreground / background
–Background, RobustBackground: Good for images that consist mostly of background
What is the best threshold value for dividing the intensity histogram into foreground and background pixels? Here? Or here?
[Figure: intensity histogram; x-axis: pixel values, y-axis: frequency]

20 Thresholding
Correction factor
–Multiplication factor applied to threshold
–Adjusts threshold stringency/leniency
–Setting this factor is empirical
Upper/lower bounds
–Set safety limits on the automatic threshold to guard against false positives
–Helpful for unexpected images: Empty wells, images with dramatic artifacts, etc.
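A rough sketch of how an automatic threshold, a correction factor and safety bounds fit together. The Otsu criterion used here is the textbook one (pick the cut that maximizes between-class variance), written for a flat list of pixel values; it is an illustration and not CellProfiler's implementation. The function names and default parameter values are assumptions.

```python
def otsu_threshold(values):
    """Return the cut between intensity levels that maximizes between-class variance."""
    levels = sorted(set(values))
    best_cut, best_score = levels[0], -1.0
    for low, high in zip(levels, levels[1:]):
        cut = (low + high) / 2
        background = [v for v in values if v <= cut]
        foreground = [v for v in values if v > cut]
        w0 = len(background) / len(values)
        w1 = len(foreground) / len(values)
        mean0 = sum(background) / len(background)
        mean1 = sum(foreground) / len(foreground)
        score = w0 * w1 * (mean0 - mean1) ** 2
        if score > best_score:
            best_cut, best_score = cut, score
    return best_cut


def corrected_threshold(values, correction=1.0, lower=0.0, upper=1.0):
    """Scale the automatic threshold, then clamp it to the safety bounds."""
    raw = otsu_threshold(values) * correction
    return min(max(raw, lower), upper)


normal_well = [0.1, 0.1, 0.2, 0.8, 0.9, 0.9]
empty_well = [0.01, 0.01, 0.02]  # background noise only

t_normal = corrected_threshold(normal_well)
t_strict = corrected_threshold(normal_well, correction=1.2)
t_empty = corrected_threshold(empty_well, lower=0.1)
```

For the empty well the automatic threshold would sit inside the noise and report spurious objects; the lower bound lifts it to 0.1 so that nothing is detected.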

21 Object Separation
Once the foreground objects have been identified, we need to distinguish multiple objects contained in the same clump
Images from Carolina Wählby

22 Object Separation
Two-step process in de-clumping
1. Identification of the objects in a clump
2. Drawing boundaries between the clumped objects
Adjust settings to de-clump objects

23 Object Separation
Clump identification: Two options
–Intensity: Works best if objects are brighter at center, dimmer at edges
–Shape: Works best if objects have indentations where clumps touch (esp. if objects are round)
[Figure labels: Peaks, Indentations]

24 Object Separation
Drawing boundaries: Two options
–Distance: Draws boundary lines midway between object centers
–Intensity: Draws boundary lines at dimmest line between objects
Test mode allows users to view results of all setting combinations

25 Object Separation
Additional separation settings: Adjust these settings if objects are being incorrectly split into pieces or merged together
Smoothing: Increase to reduce intensity irregularities which produce over-segmentation of objects
[Figure panels: Original image; Smoothing filter size = 4; Smoothing filter size = 8]

26 Object Separation
Suppress Local Maxima
–Smallest distance allowed between object intensity peaks to be considered one object rather than a clump
–Decrease to reduce improper merging of objects in clumps
[Figure panels: Original image; Maxima distance = 4; Maxima distance = 8]

27 Object Separation
However…
Adjusting these parameters can produce more improper segmentation than it solves
The proper settings are usually a matter of trial and error
–The automatic settings are a good starting point, though
[Figure panels: Original image; Smoothing filter size = 4; Smoothing filter size = 8]

28 Filtering Invalid Objects
Discard objects that fail the size criterion or touch the image border
See FilterObjects module for more advanced filtering options
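The two discard rules can be sketched on a labeled image (0 for background, a positive integer per object). The slide does not say how size is measured, so pixel area is assumed here as a stand-in; `filter_objects` and its parameters are hypothetical.

```python
def filter_objects(labels, min_area, max_area):
    """Return the labels of objects that pass the size test and avoid the border."""
    rows, cols = len(labels), len(labels[0])
    area = {}
    touches_border = set()
    for r in range(rows):
        for c in range(cols):
            label = labels[r][c]
            if label == 0:
                continue
            area[label] = area.get(label, 0) + 1
            if r in (0, rows - 1) or c in (0, cols - 1):
                touches_border.add(label)
    return sorted(
        label
        for label, size in area.items()
        if min_area <= size <= max_area and label not in touches_border
    )


labels = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 3, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 2, 2, 0, 0, 0],
    [0, 2, 2, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
valid = filter_objects(labels, min_area=2, max_area=10)
```

Object 1 is dropped because it touches the top edge, object 3 because it is a single pixel, and only object 2 is kept.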

29 Primary Object Identification
Colors used to label each segmented object
–Shows if each object has been identified and separated properly
Outlines highlight valid objects
–Green: Valid
–Yellow: Invalid (touching border)
–Red: Invalid (size criterion)
Gives object count as a measurement

30 Secondary Object Identification
Goal: Identify individual cell boundaries by growing primary objects using a staining channel
–Nuclei typically more uniform in shape, more easily separated than cells
Segment nuclei first, then use segmented nuclei to start cell segmentation

31 Secondary Object Identification
Methods
–Distance-N: Ignores image information. Useful in cases where no cell stain is present
–Watershed, Propagation, Distance-B: Use image information. Find dividing lines between objects and background / neighbors
Test mode allows user to view results of all methods
[Figure panels: Propagation, Distance-N]

32 Secondary Object Identification
Regularization: Controls the precise dividing line between cells that touch each other
–Performed by balancing between intensity and distance
–Usually not adjusted
Correction factor, lower/upper bounds on threshold: Same purpose as in IdentifyPrimaryObjects
[Figure panels: Regularization = 0; Regularization =]

33 Tertiary Object Identification
Goal: Identify tertiary objects by removing the primary objects from secondary objects
–Subtract the nuclei objects from cell objects to obtain cytoplasm
[Figure panels: Cells, Nuclei, Cytoplasm]
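The subtraction amounts to a per-pixel "inside the cell but not inside the nucleus" test. The sketch below assumes two binary masks of equal shape; `subtract_objects` is a hypothetical helper, and any boundary handling the real module may apply is left out.

```python
def subtract_objects(cell_mask, nucleus_mask):
    """Return a mask that is True where the cell is present and the nucleus is not."""
    return [
        [bool(cell) and not nucleus for cell, nucleus in zip(cell_row, nucleus_row)]
        for cell_row, nucleus_row in zip(cell_mask, nucleus_mask)
    ]


cells = [
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
]
nuclei = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
cytoplasm = subtract_objects(cells, nuclei)
```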

34 Measurement Modules: Object Morphology
Select the objects to measure

35 Module: MeasureObjectAreaShape
Goal: Measure morphological features such as
–Area
–Perimeter
–Eccentricity
–MajorAxisLength
–MinorAxisLength
–Orientation
–FormFactor: Compactness measure, circle = 1, line = 0
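The "circle = 1, line = 0" behavior of FormFactor follows from its usual definition, 4π × Area / Perimeter². A short check with idealized shapes (exact geometric areas and perimeters rather than pixel counts, which is an assumption of this sketch):

```python
import math


def form_factor(area, perimeter):
    """Compactness: 4*pi*area / perimeter**2 (1 for a circle, near 0 for a line)."""
    return 4 * math.pi * area / perimeter ** 2


radius = 5.0
circle = form_factor(math.pi * radius ** 2, 2 * math.pi * radius)

# A 2 x 2 square: area 4, perimeter 8.
square = form_factor(4.0, 8.0)

# A very thin 100 x 0.01 rectangle approximates a line.
thin = form_factor(100 * 0.01, 2 * (100 + 0.01))
```

The square gives π/4 (about 0.785), and the thin rectangle is close to zero. On real pixelated objects the measured perimeter is approximate, so values will deviate slightly from these ideals.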

36 Measurement Modules: Object Intensity
Select the image to measure from
Select the objects to measure

37 Module: MeasureObjectIntensity
Goal: Measure object intensity features such as
–Integrated intensity: Sum of the pixel intensities within an object
–Mean, median, standard deviation intensities
–Maximal and minimal pixel intensities
–Lower/Upper quartile
The object intensity may be obtained from any image, not just the image used to identify the object
–Example: Ph3 intensity may be measured using the nuclei objects
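A minimal sketch of measuring one channel inside objects found in another. The label image stands in for nuclei identified from a DNA stain and the intensity image for a Ph3 channel; both arrays and the `intensity_features` helper are made up for illustration, and the population standard deviation is an arbitrary choice here.

```python
import statistics


def intensity_features(image, labels, label):
    """Summarize the intensities of `image` inside the object numbered `label`."""
    pixels = [
        image[r][c]
        for r, row in enumerate(labels)
        for c, value in enumerate(row)
        if value == label
    ]
    return {
        "integrated": sum(pixels),
        "mean": statistics.mean(pixels),
        "median": statistics.median(pixels),
        "std": statistics.pstdev(pixels),
        "min": min(pixels),
        "max": max(pixels),
    }


# Objects come from one channel (nuclei), intensities from another (Ph3).
nuclei_labels = [[1, 1, 0], [1, 0, 2]]
ph3_image = [[0.25, 0.5, 0.0], [0.75, 0.0, 1.0]]

features = intensity_features(ph3_image, nuclei_labels, label=1)
```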

38 Measurement Modules: Object Texture
Select the image to measure from
Select the objects to measure
Select the spatial scale

39 MeasureObjectTexture
Goal: Determine whether the staining pattern is smooth on a particular scale
Selection of the appropriate texture scale is essentially empirical
–A higher number measures larger patterns of texture
–Smaller numbers measure more localized (finer) patterns of texture
Can also add several texture modules to the pipeline, each measuring a different texture scale

40 Other Measurement Modules
CalculateMath: Arithmetic operations for measurements
CalculateStatistics: Assay quality (V and Z' factors) and dose response data (EC50) for all measurements
Image-based measures
–MeasureImageAreaOccupied
–MeasureImageGranularity
–MeasureImageIntensity
Object-based measures
–MeasureCorrelation
–MeasureObjectNeighbors
–MeasureRadialDistribution
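For reference, the Z' factor mentioned here is conventionally defined from positive and negative control wells as 1 − 3(σ_pos + σ_neg) / |μ_pos − μ_neg|. The sketch below applies that standard formula to made-up control values; it uses the sample standard deviation, which is an assumption, and it is not taken from the CalculateStatistics source.

```python
import statistics


def z_prime(positive, negative):
    """Z' factor: 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    spread = statistics.stdev(positive) + statistics.stdev(negative)
    window = abs(statistics.mean(positive) - statistics.mean(negative))
    return 1 - 3 * spread / window


# Hypothetical per-well measurements for the two control groups.
positive_controls = [9, 10, 11]
negative_controls = [0, 1, 2]

quality = z_prime(positive_controls, negative_controls)
```

Values close to 1 indicate well-separated controls; values at or below 0 mean the control distributions overlap too much for the measurement to be useful on its own.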

41 Data Export Modules
User may output images or image measurements
Select the objects to export

42 Measurement Display
The average measurements for all objects in the image are displayed in the figure window
However, the individual measurements for each object are stored in the output file

43 Data Export Modules
Goal: Retain images of intermediate image processing steps for quality control or save measurements for later analysis and exploration
SaveImages: Writes an image to a file
–Intermediate images in the pipeline are not saved unless requested
–Choice of many image formats to write; the module can be used as an image format converter
ExportToSpreadsheet: Export measurements as a comma-separated file readable by spreadsheet programs
ExportToDatabase: Export measurements as per-object and per-image tables plus a configuration file for upload to a MySQL database

44 Measurement Export
Some types of analysis can be performed with built-in data tools
–In these cases, the raw output file is used
–Data need not be exported via a module in the pipeline
If you wish to use CellProfiler Analyst for data exploration and phenotype classification, databases are preferred, although local .csv files may also be used
For screens larger than you want to analyze in CellProfiler or a spreadsheet, it is preferable to export the data to a database

45 Illumination Correction
The physical limitations of any microscope produce nonuniformities in the optical path of the sample, microscope, and/or camera
Example: Tiling raw images shows that there is uneven illumination from left to right in each image
–This heterogeneity can lead to inaccurate intensity measurements
–A cell located at (a) is brighter than one at (b) even if the cells have the same amount of fluorescent material
[Figure: tiled raw images with positions (a) and (b) marked]
Carpenter et al, Genome Biology 2006, 7:R100

46 Illumination Correction
Illumination correction ensures that object segmentation and measurements (e.g. DNA content) are more accurate
Carpenter et al, Genome Biology 2006, 7:R100

47 Illumination Correction
Two modules
–CorrectIlluminationCalculate: Creates an illumination correction function
–CorrectIlluminationApply: Applies the function to your images
Available options
–Correct each image individually, or all images together as an ensemble?
–Calculate the illumination function by using foreground pixels or background pixels?
–Apply the function using division or subtraction?
Additional considerations
–Create a new illumination correction function if you image on a different microscope or change plates
–Correct each channel since absolute illumination intensities may differ between channels
–First, create and save the function from the image set, then load and apply it prior to identification
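The "division or subtraction" choice can be sketched as a per-pixel operation between an image and a previously computed illumination function of the same shape. The arrays and the `apply_illumination` helper are hypothetical; how the function itself is estimated is not shown.

```python
def apply_illumination(image, function, method="divide"):
    """Correct `image` with an illumination `function` of the same shape."""
    if method == "divide":
        return [
            [pixel / gain for pixel, gain in zip(image_row, function_row)]
            for image_row, function_row in zip(image, function)
        ]
    if method == "subtract":
        # Clip at zero so corrected intensities never go negative.
        return [
            [max(pixel - offset, 0.0) for pixel, offset in zip(image_row, function_row)]
            for image_row, function_row in zip(image, function)
        ]
    raise ValueError("method must be 'divide' or 'subtract'")


# Two cells with the same true brightness; the right side is lit twice as strongly.
raw = [[0.2, 0.4]]
illumination = [[1.0, 2.0]]

corrected = apply_illumination(raw, illumination, method="divide")
```

After division both pixels read 0.2, so the two cells are again comparable regardless of where they sit in the field of view.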

48 Cluster Computing
For really big screens, it is necessary to process images in batches on a computing cluster.
If processing time is too great on a single computer, then run the pipeline on a cluster
–Download and install CellProfiler on a computing cluster
–Add the ExportToDatabase module
–Add the CreateBatchFiles module to the end of the pipeline and configure it appropriately
–Run the first image cycle locally
–Submit the batches to your cluster for processing
–Check the progress of processing
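Batching comes down to splitting the list of image sets into contiguous ranges, one per cluster job. The helper below is a generic sketch of that bookkeeping; `make_batches`, the 1-based numbering and the batch size are assumptions for illustration and do not describe how CreateBatchFiles or any particular cluster scheduler works.

```python
def make_batches(n_image_sets, batch_size):
    """Split image sets 1..n into (first, last) ranges of at most batch_size."""
    return [
        (start, min(start + batch_size - 1, n_image_sets))
        for start in range(1, n_image_sets + 1, batch_size)
    ]


batches = make_batches(n_image_sets=10, batch_size=4)
```

Each (first, last) pair would then be handed to one job, so the last batch may be smaller than the others.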

49 Data Analysis
At the end of a pipeline, you may have 500+ features per cell
–Size, shape, staining intensity, texture (smoothness), etc.
Remember our Philosophy: Measure everything, ask questions later...

50 Data Analysis
What does this data set look like?
Cytological profile, or Cytoprofile
Shows all the measurements acquired
–For each individual cell
–In every image
–In the entire experiment
[Figure: example cytoprofile of feature values for a single cell]

51 CellProfiler Analyst: Overview
Explore data from large sets of images
Identify interesting subpopulations and see the original images
Identify interesting phenotypes automatically
Goal: Provide the user with a powerful suite of image exploration and machine learning methods

52 The CellProfiler Analyst Interface
CellProfiler Analyst (CPA) allows you to explore the data with a variety of tools
Upon startup, CPA requests a properties file which contains
–Locations of the measurement tables
–How the images are referenced
–Other assorted information

53 Plate Viewer
Displays data in plate layout
–96- or 384-well format
–Measurements are shown as color-coded wells or mouse tool-tips
–Right-clicking on a well reveals a list of images to display

54 Image Viewer
Displays an image referenced by number
Color display
–Colors are assigned to each channel of image data
–Shown as a merged color image
–Toggle channel visibility and color scaling

55 Plotting Tools
Various plotting tools allow the user to explore and sift through the measurements and make discoveries

56 Data Analysis
Why make so many measurements?
–For many screens, only a few measurements are necessary to obtain the phenotype
[Figure: scatter plot; x-axis: DNA content, y-axis: phospho-H3 staining]

57 Data Analysis
Unfortunately, for other phenotypes, the proper features are not so simple to find…
[Figure panels: Wild-type HT29 cells; Cells on the move; Crescent-shaped nuclei; Peas in a pod; Crooked projections; Actin dots at junctions; Long projections; Hyphae-like projections]

58 Data Analysis
Concentrating on single cells allows us to avoid problems of heterogeneous populations, and to detect rare events (such as mitosis)
However, determining which combinations of features and values are appropriate for a phenotype is tedious and impractical
We have included a machine learning classification tool to automatically choose the features and values required to score a rare or subtle phenotype

59 Automated Cell Image Processing
Thousands of wells
10^4 images, 10^3 cells in each: Total of 10^7 cells/experiment
Each cell with cytoprofile
Cytoprofile of 500+ features measured for each cell

60 Iterative Machine Learning
System presents ~500 cells to biologists for scoring
System defines rule based on cytoprofile of scored cells
[Figure: iteration loop; labels: Yes, No, Rule, Iteration]

61 Iterative Machine Learning
Scored cells are sorted by well: Identify samples with a high proportion of positive cells
[Figure labels: 10^7 cells; Rule; Scored]
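Sorting scored cells by well can be sketched as applying a rule to every cell and computing the positive fraction per well. Everything in this example is hypothetical: the feature name `Intensity_Ph3`, the hand-written threshold standing in for a learned rule, and the tiny data set. In the workflow described on these slides the rule comes out of the iterative training with a biologist, not from a fixed cutoff.

```python
from collections import Counter


def positive_fraction_by_well(cells, rule):
    """Return {well: fraction of cells for which rule(profile) is True}."""
    total, positive = Counter(), Counter()
    for well, profile in cells:
        total[well] += 1
        if rule(profile):
            positive[well] += 1
    return {well: positive[well] / total[well] for well in total}


def example_rule(profile):
    """Stand-in for a learned rule: a single threshold on one feature."""
    return profile["Intensity_Ph3"] > 0.5


cells = [
    ("A01", {"Intensity_Ph3": 0.9}),
    ("A01", {"Intensity_Ph3": 0.1}),
    ("A02", {"Intensity_Ph3": 0.2}),
    ("A02", {"Intensity_Ph3": 0.3}),
]

fractions = positive_fraction_by_well(cells, example_rule)

# Wells ranked from the highest to the lowest proportion of positive cells.
ranked = sorted(fractions, key=fractions.get, reverse=True)
```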

62 Final Notes
Where to get help
–Access help from the CellProfiler main window
–Ask for help on the forum

63 The Team
Image assay development: Apply image analysis methods to biological questions
Mark Bray, Anne Carpenter, David Logan
Algorithm development & software engineering: Develop & test new image analysis and data mining methods and create open-source software tools
IT/Administration
Peggy (Margaret) Anthony, Kate Madden, Ray Jones, Vebjørn Ljoså, Auguste Genovesio (begins 2010), Adam Fraser, Carolina Wählby
Lee Kamentsky
Director
