Getting Started with CellProfiler


1 Getting Started with CellProfiler
Mark-Anthony Bray, Ph.D., Imaging Platform, Broad Institute, Cambridge, Massachusetts, USA

2 Software Overview
Available from www.cellprofiler.org. Image analysis and quantification; image-centric data analysis. Free, open source (Python), available for Windows, Mac and Linux.

3 CellProfiler: Overview
Processes large sets of images, identifies and measures objects, and exports data for further analysis. Goal: Provide powerful image analysis methods with a user-friendly interface. Philosophy: Measure everything, ask questions later... Supports data analysis based on individual cells.

4 CellProfiler: Basic Concepts
Example pipeline: Load images; identify primary, secondary and tertiary objects; measure object features; export measurement and image data. Glossary: Pipeline: The sequence of image processing steps in CellProfiler. Module: One step of a pipeline. Primary objects: Key objects used to identify cells (e.g., nuclei). Secondary objects: Other parts of cells, attached to primary objects.

5 Typical CellProfiler Pipeline Workflow
For image-based assays, the basic objective is always to identify cells/organisms and measure the feature(s) of interest. The uniqueness of each assay comes in deciding which compartments to identify and how to identify them, and in determining which measure(s) are most useful for identifying interesting samples.

6 Typical CellProfiler Pipeline Workflow
Identify anomalies in the images (for example, uneven illumination of the sample or autofocus errors) and correct them when possible. Identify individual nuclei in each image (using images of a DNA stain). Identify cell boundaries in each image (using images of a cellular stain, if available). Identify subcellular organelles (using images of the organelle/structure stain). Identify cellular subcompartments as desired (for example, cytoplasm and nuclear or plasma membrane). Count the cells, measure the size and shape of each compartment or organelle, and measure the intensity and texture of each type of staining in each subcompartment. Export the measurements to a database, where they are available for downstream analysis using various computational tools.

7 The CellProfiler Interface
Pipeline panel: Displays the modules in the pipeline. Modules are executed in order from top to bottom. Controls are provided for module help, adding or removing modules, and changing module position.
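The top-to-bottom execution order can be pictured with a minimal Python sketch. The workspace dictionary and the two toy modules below are hypothetical stand-ins, not CellProfiler's internal API; the point is only that each module runs once, in order, and a later module can use whatever an earlier module produced.

```python
from typing import Callable, Dict, List

# Hypothetical "workspace": a dict of named images and measurements shared by all modules.
Workspace = Dict[str, object]
Module = Callable[[Workspace], None]


def run_pipeline(modules: List[Module], workspace: Workspace) -> Workspace:
    """Execute each module once, in list order (top to bottom in the pipeline panel)."""
    for module in modules:
        module(workspace)
    return workspace


def load_images(ws: Workspace) -> None:
    # Stand-in for an image-loading module: stores a tiny 2x2 "DNA" image.
    ws["DNA"] = [[0, 5], [7, 0]]


def count_bright_pixels(ws: Workspace) -> None:
    # Downstream module: reads the image by name and records a measurement.
    image = ws["DNA"]
    ws["BrightPixelCount"] = sum(1 for row in image for value in row if value > 4)


result = run_pipeline([load_images, count_bright_pixels], {})
print(result["BrightPixelCount"])  # prints 2
```

Swapping the two modules would fail, because the counting step would look for an image that has not been loaded yet; this is why module position in the pipeline matters.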

8 The CellProfiler Interface
File panel: Displays the files in the default image folder. Load a pipeline by double-clicking on it; view an image by double-clicking on its filename.

9 The CellProfiler Interface
The figure window has additional menu options. Toolbar menu: Pan, zoom in/out. CellProfiler Image Tools: Image Tool (also displayed by clicking on an image), interactive zoom, show pixel data (location, intensity).

10 The CellProfiler Interface
Folder panel: Change the default input and output directories. Input folder: Contains the images to be analyzed. Output folder: Contains the output file plus exported data and images. Usually these should be separate folders.

11 The CellProfiler Interface
Settings panel: View and change the settings for each module. Clicking on a different module updates the settings view.

12 Module Categories
File processing: Image input, file output. Image processing: Often used for pre-processing prior to object identification. Object processing: Identification and modification of objects of interest. Measurement: Collection of measurements from objects of interest. Data tools: Measurement exploration, measurement output.

13 The First Module: LoadImages
Loads an “image set”, a group of related images (for example, DNA and GFP channels), in preparation for further processing. Related how? Depending on the imaging device, one file may represent one channel at one imaging location, multiple channels at one imaging location, multiple channels at multiple locations, etc.

14 The First Module: LoadImages
Text matching can be used to define the difference between images in a set. For example, all images stained for GFP have the text Channel1- in the name; assign each such image a meaningful name for downstream reference. The same applies to the DNA images (Channel2-).
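A minimal Python sketch of the text-matching idea: files are grouped into image sets by removing the channel tag and using the rest of the name as the key. The file names are hypothetical, and this is one plausible way to pair files, not LoadImages' actual matching logic.

```python
from collections import defaultdict

# Hypothetical file names; only the "Channel1-"/"Channel2-" tags come from the slide.
files = [
    "Channel1-01-A-01.tif", "Channel2-01-A-01.tif",
    "Channel1-02-A-02.tif", "Channel2-02-A-02.tif",
]
# Text to match -> meaningful name used by downstream modules.
channel_names = {"Channel1-": "GFP", "Channel2-": "DNA"}


def group_image_sets(filenames, channel_names):
    """Group files into image sets keyed by the file name with the channel tag removed."""
    image_sets = defaultdict(dict)
    for fname in filenames:
        for tag, name in channel_names.items():
            if tag in fname:
                site_key = fname.replace(tag, "", 1)
                image_sets[site_key][name] = fname
    # Every image set must contain one file per channel (the counts must match).
    incomplete = [key for key, chans in image_sets.items() if len(chans) != len(channel_names)]
    if incomplete:
        raise ValueError(f"Image sets missing a channel: {incomplete}")
    return dict(image_sets)


for site, channels in group_image_sets(files, channel_names).items():
    print(site, channels)
```

The completeness check mirrors the tip that the number of files for each channel must match.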

15 Loading Images Into CellProfiler
Standard image formats: TIFF, GIF, PNG, JPG... Cellomics: .DIB, .c01. Movies and image sequences: AVI, STK, multiframe TIFF, FLEX. A few tips: Use LoadImages to load any number of channels, even in subdirectories. The total number of files for each channel must match. Many microscopes produce 12-bit images in a 16-bit format → RescaleIntensity. Handle color images by splitting them into RGB components → ColorToGray.
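The 12-bit tip comes down to simple arithmetic: if 12-bit data are read as full 16-bit values, even a saturated pixel ends up near 0.06 on a 0 to 1 scale, so the image looks almost black. This Python sketch only illustrates that arithmetic; it is not how RescaleIntensity is implemented.

```python
MAX_12BIT = 2 ** 12 - 1   # 4095: largest value a 12-bit camera can record
MAX_16BIT = 2 ** 16 - 1   # 65535: largest value the 16-bit container can hold


def naive_scale(pixels):
    """Treat the data as true 16-bit: a saturated 12-bit pixel only reaches about 0.06."""
    return [v / MAX_16BIT for v in pixels]


def rescale_12bit(pixels):
    """Rescale by the 12-bit maximum so the full camera range maps onto 0..1."""
    return [v / MAX_12BIT for v in pixels]


raw = [0, 1024, 4095]
print(naive_scale(raw))
print(rescale_12bit(raw))
```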

16 What Is an “Image”?
(Images from Carolina Wählby)

17 Object Identification
Once the images are loaded, how do you find objects of interest? Step 1: Distinguish the foreground from the background by picking a good threshold. Step 2: Identify objects as regions brighter than the threshold. Step 3: Cut and join objects to “improve” their shape.
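Steps 1 and 2 can be sketched in a few lines of Python: pixels above the threshold form the foreground, and each connected foreground region gets its own label. The fixed threshold and the choice of 4-connectivity are assumptions for illustration; Step 3 (cutting and joining) is not shown.

```python
from collections import deque


def label_objects(image, threshold):
    """Label 4-connected regions of pixels brighter than the threshold; return (labels, count)."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] > threshold and labels[r][c] == 0:
                next_label += 1                 # found a new, unlabeled object
                labels[r][c] = next_label
                queue = deque([(r, c)])
                while queue:                    # flood-fill the rest of the object
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] > threshold and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels, next_label


image = [
    [0, 9, 9, 0, 0],
    [0, 9, 0, 0, 8],
    [0, 0, 0, 8, 8],
]
labels, count = label_objects(image, threshold=5)
print(count)  # prints 2
```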

18 Primary Object Identification
Many options are available for thresholding, cut-and-join (declumping) methods, etc.

19 Thresholding
Definition: Division of the image into background and foreground. What is the best threshold value for dividing the intensity histogram (frequency vs. pixel value) into foreground and background pixels? Method: Pick the method that provides the best results. Otsu: Default; good for images with a readily identifiable foreground/background. Background, RobustBackground: Good for images in which most of the image is background.
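To make the histogram picture concrete, here is a sketch of the textbook Otsu method in Python: it tries every candidate threshold and keeps the one that maximizes the between-class variance of the background and foreground pixels. It assumes integer pixel values in 0..255 and is not necessarily identical to CellProfiler's Otsu variant.

```python
def otsu_threshold(pixels, bins=256):
    """Return t such that pixels <= t are background and pixels > t are foreground."""
    hist = [0] * bins
    for v in pixels:
        hist[v] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    weight_bg, sum_bg = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(bins):
        weight_bg += hist[t]            # number of pixels at or below t
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg   # number of pixels above t
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance (unnormalized weights give the same maximizer).
        between_var = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


# A dim background population and a bright foreground population.
pixels = [10] * 50 + [12] * 30 + [200] * 40 + [210] * 20
print(otsu_threshold(pixels))
```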

20 Thresholding
Correction factor: A multiplication factor applied to the threshold that adjusts threshold stringency/leniency. Setting this factor is empirical. Upper/lower bounds: Set safety limits on the automatic threshold to guard against false positives. Helpful for unexpected images: empty wells, images with dramatic artifacts, etc.
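A small Python sketch of how the correction factor and bounds interact. It assumes intensities on a 0 to 1 scale and assumes the factor is applied before the bounds; treat the function and its parameter names as hypothetical.

```python
def final_threshold(auto_threshold, correction_factor=1.0, lower_bound=0.0, upper_bound=1.0):
    """Scale the automatic threshold, then clamp it to the safety bounds."""
    # Correction factor: >1 makes the threshold more stringent, <1 more lenient.
    corrected = auto_threshold * correction_factor
    # Bounds: an odd image (e.g. an empty well) cannot push the threshold outside this range.
    return min(max(corrected, lower_bound), upper_bound)


print(final_threshold(0.2, correction_factor=1.5))    # stricter threshold, about 0.3
print(final_threshold(0.004, lower_bound=0.05))       # empty well: noise-level threshold raised to 0.05
```

Without the lower bound, the empty-well case would use a threshold at noise level and report noise as objects.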

21 Object Separation
Once the foreground objects have been identified, we need to distinguish multiple objects contained in the same “clump”. (Images from Carolina Wählby)

22 Object Separation
Adjust the settings to “de-clump” objects. De-clumping is a two-step process: identification of the objects in a clump, then drawing boundaries between the clumped objects.

23 Object Separation
Clump identification: Two options. Intensity (peaks): Works best if objects are brighter at the center and dimmer at the edges. Shape (indentations): Works best if objects have indentations where clumps touch (especially if objects are round).

24 Object Separation
Drawing boundaries: Two options. Distance: Draws boundary lines midway between object centers. Intensity: Draws boundary lines along the dimmest line between objects. Test mode allows users to view the results of all setting combinations.
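The Distance option can be approximated by assigning every clump pixel to its nearest object center, which places the dividing line midway between centers. This Python sketch is a simplified nearest-center stand-in, not CellProfiler's implementation; the clump and centers are made up.

```python
def split_clump_by_distance(clump_pixels, centers):
    """Assign each (y, x) clump pixel the 1-based index of its nearest center."""
    labels = {}
    for (y, x) in clump_pixels:
        # Squared Euclidean distance to every center; ties go to the lower-numbered object.
        dists = [(y - cy) ** 2 + (x - cx) ** 2 for (cy, cx) in centers]
        labels[(y, x)] = dists.index(min(dists)) + 1
    return labels


# A 1 x 7 clump containing two objects centered at x = 1 and x = 5.
clump = [(0, x) for x in range(7)]
labels = split_clump_by_distance(clump, centers=[(0, 1), (0, 5)])
print([labels[p] for p in clump])  # prints [1, 1, 1, 1, 2, 2, 2]
```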

25 Object Separation
Additional separation settings: Adjust these settings if objects are being incorrectly split into pieces or merged together. Smoothing: Increase the smoothing filter size to reduce intensity irregularities that produce over-segmentation of objects. (Figure panels: original image; smoothing filter size = 4; smoothing filter size = 8.)

26 Object Separation
Suppress local maxima: The minimum allowed distance between object intensity peaks; peaks closer than this are treated as one object rather than as separate objects in a clump. Decrease it to reduce improper merging of objects in clumps. (Figure panels: original image; maxima distance = 4; maxima distance = 8.)
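A greedy Python sketch of local-maxima suppression: peaks are visited from brightest to dimmest, and a peak is dropped if it lies closer than the minimum distance to a peak already kept. The peak list is hypothetical, and CellProfiler's own procedure may differ in detail.

```python
def suppress_local_maxima(peaks, min_distance):
    """peaks: list of (y, x, intensity). Keep the brightest; drop peaks too close to a kept one."""
    kept = []
    for (y, x, intensity) in sorted(peaks, key=lambda p: p[2], reverse=True):
        if all((y - ky) ** 2 + (x - kx) ** 2 >= min_distance ** 2 for ky, kx, _ in kept):
            kept.append((y, x, intensity))
    return kept


peaks = [(10, 10, 200), (10, 13, 180), (10, 25, 190)]
print(len(suppress_local_maxima(peaks, min_distance=4)))  # prints 2: the two close peaks merge
print(len(suppress_local_maxima(peaks, min_distance=2)))  # prints 3: smaller distance, no merging
```

This matches the advice on the slide: lowering the distance keeps nearby peaks as separate objects.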

27 Object Separation
However, adjusting these parameters can introduce more segmentation errors than it fixes. The proper settings are usually a matter of trial and error, though the automatic settings are a good starting point. (Figure panels: original image; smoothing filter size = 4; smoothing filter size = 8.)

28 Filtering Invalid Objects
Discard objects that fail the size criterion or touch the image border, since measurements may not be meaningful for partial objects. See the FilterObjects module for more advanced filtering options.
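Both filtering rules fit in one pass over a label image. This Python sketch assumes area is measured in pixels and that "touching the border" means having any pixel in the first or last row or column; the size limits are hypothetical.

```python
def filter_objects(labels, min_area, max_area):
    """Return the labels that pass the size criterion and do not touch the image border."""
    rows, cols = len(labels), len(labels[0])
    areas = {}
    touching_border = set()
    for r in range(rows):
        for c in range(cols):
            lab = labels[r][c]
            if lab == 0:          # background
                continue
            areas[lab] = areas.get(lab, 0) + 1
            if r in (0, rows - 1) or c in (0, cols - 1):
                touching_border.add(lab)
    return {lab for lab, area in areas.items()
            if min_area <= area <= max_area and lab not in touching_border}


labels = [
    [1, 0, 0, 0, 0],   # object 1 touches the border
    [0, 0, 2, 2, 0],
    [0, 0, 2, 2, 0],   # object 2 is interior with area 4
    [0, 3, 0, 0, 0],   # object 3 is interior but only 1 pixel
    [0, 0, 0, 0, 0],
]
print(filter_objects(labels, min_area=2, max_area=100))  # prints {2}
```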

29 Primary Object Identification
Colors are used to label each segmented object, showing whether each object has been identified and separated properly. Outline colors indicate validity: green, valid; yellow, invalid (touching the border); red, invalid (size criterion). The object count is recorded as a measurement.

30 Secondary Object Identification
Goal: Identify individual cell boundaries by “growing” primary objects using a staining channel. Nuclei are typically more uniform in shape and more easily separated than cells, so segment the nuclei first, then use the segmented nuclei as the starting point for cell segmentation.

31 Secondary Object Identification
Methods: Distance-N: Ignores image information; useful in cases where no cell stain is present. Watershed, Propagation, Distance-B: Use image information to find dividing lines between objects and the background/neighbors. Test mode allows the user to view the results of all methods. (Figure panels: Distance-N; Propagation.)
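Distance-N ignores the image and simply grows each primary object outward. This Python sketch grows labels one 4-connected pixel step at a time for N steps, which approximates distance in city-block units rather than Euclidean distance; that and the tie-breaking rule are assumptions of the sketch.

```python
def expand_labels(labels, n):
    """Grow each labeled object by n pixel steps; already-labeled pixels are never overwritten."""
    rows, cols = len(labels), len(labels[0])
    current = [row[:] for row in labels]
    for _ in range(n):
        nxt = [row[:] for row in current]
        for r in range(rows):
            for c in range(cols):
                if current[r][c] != 0:
                    continue
                # Take the label of the first labeled neighbor found (ties follow this order).
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and current[rr][cc] != 0:
                        nxt[r][c] = current[rr][cc]
                        break
        current = nxt
    return current


nuclei = [[0, 0, 0, 1, 0, 0, 0]]
print(expand_labels(nuclei, 2))  # prints [[0, 1, 1, 1, 1, 1, 0]]
```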

32 Secondary Object Identification
Regularization: Controls the precise dividing line between cells that touch each other, by balancing between intensity and distance. Usually not adjusted. (Figure panels: regularization = ∞; regularization = 0.) Correction factor, lower/upper bounds on threshold: Same purpose as in IdentifyPrimaryObjects.

33 Tertiary Object Identification
Goal: Identify tertiary objects by removing the primary objects from the secondary objects. For example, “subtract” the nuclei objects from the cell objects to obtain the cytoplasm. (Figure panels: cells; nuclei; cytoplasm.)
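On label images, the subtraction is a per-pixel rule: keep the cell's label wherever no nucleus is present. This is a minimal Python sketch; IdentifyTertiaryObjects may differ in details such as how object edges are handled.

```python
def subtract_objects(cells, nuclei):
    """Cytoplasm: keep each cell's label except where any nucleus pixel is present."""
    return [
        [cell if nucleus == 0 else 0 for cell, nucleus in zip(cell_row, nuc_row)]
        for cell_row, nuc_row in zip(cells, nuclei)
    ]


cells = [[1, 1, 1, 1, 1]]
nuclei = [[0, 0, 1, 0, 0]]
print(subtract_objects(cells, nuclei))  # prints [[1, 1, 0, 1, 1]]
```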

34 Measurement Modules: Object Morphology
Select the objects to measure

35 Module: MeasureObjectAreaShape
Goal: Measure morphological features such as Area, Perimeter, Eccentricity, MajorAxisLength, MinorAxisLength, Orientation and FormFactor (a compactness measure: circle = 1, line = 0).
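FormFactor is commonly defined as 4π × Area / Perimeter², which, as far as we know, matches CellProfiler's documented definition. This Python sketch checks the two ends of the scale named on the slide using idealized shapes.

```python
import math


def form_factor(area, perimeter):
    """Compactness: 4*pi*area / perimeter**2 (1 for a circle, approaching 0 for a line)."""
    return 4 * math.pi * area / perimeter ** 2


radius = 10
print(form_factor(math.pi * radius ** 2, 2 * math.pi * radius))  # circle: 1.0 (up to rounding)
print(form_factor(10 * 10, 4 * 10))                              # square: pi/4, about 0.785
print(form_factor(100 * 1, 2 * (100 + 1)))                       # thin 100 x 1 strip: about 0.03
```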

36 Measurement Modules: Object Intensity
Select the image to measure from. Select the objects to measure.

37 Module: MeasureObjectIntensity
Goal: Measure object intensity features such as integrated intensity (the sum of the pixel intensities within an object); mean, median and standard deviation of intensities; maximal and minimal pixel intensities; and lower/upper quartiles. The object intensity may be obtained from any image, not just the image used to identify the object. Example: pH3 (phospho-H3) intensity may be measured using the nuclei objects.
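The sketch below computes these features in Python for one object, taking pixel values from a different image than the one used for segmentation (pH3 intensities inside a nucleus mask, both made up). The dictionary keys are illustrative, and Python's default quartile interpolation may not match CellProfiler's.

```python
import statistics


def pixels_in_object(labels, image, label):
    """Collect the values of `image` at pixels where `labels` equals `label`."""
    return [v for lrow, irow in zip(labels, image) for lab, v in zip(lrow, irow) if lab == label]


def intensity_features(values):
    """Per-object intensity features (needs at least two pixel values for the quartiles)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "IntegratedIntensity": sum(values),
        "MeanIntensity": statistics.mean(values),
        "MedianIntensity": statistics.median(values),
        "StdIntensity": statistics.pstdev(values),
        "MinIntensity": min(values),
        "MaxIntensity": max(values),
        "LowerQuartileIntensity": q1,
        "UpperQuartileIntensity": q3,
    }


nuclei = [[1, 1, 0], [1, 1, 0]]   # nucleus mask from the DNA image
ph3 = [[2, 4, 9], [6, 8, 9]]      # a different channel, measured inside the nucleus
print(intensity_features(pixels_in_object(nuclei, ph3, 1)))
```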

38 Measurement Modules: Object Texture
Select the image to measure from. Select the objects to measure. Select the spatial scale.

39 MeasureObjectTexture
Goal: Determine whether the staining pattern is smooth on a particular scale. Selection of the appropriate texture scale is essentially empirical: a higher number measures larger patterns of texture, while smaller numbers measure more localized (finer) patterns. You can also add several texture modules to the pipeline, each measuring a different texture scale.

40 Other Measurement Modules
CalculateMath: Arithmetic operations on measurements. CalculateStatistics: Assay quality (V and Z' factors) and dose-response data (EC50) for all measurements. Image-based measures: MeasureImageAreaOccupied, MeasureImageGranularity, MeasureImageIntensity. Object-based measures: MeasureCorrelation, MeasureObjectNeighbors, MeasureRadialDistribution.
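For reference, the Z' factor compares the separation of positive and negative controls with their spread: Z' = 1 − 3(σp + σn) / |μp − μn|. This Python sketch computes it for one measurement using population standard deviations (an assumption; CalculateStatistics may compute the spread differently). The control values are made up.

```python
import statistics


def z_prime_factor(positive, negative):
    """Z' = 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|; closer to 1 means better separation."""
    mu_p, mu_n = statistics.mean(positive), statistics.mean(negative)
    sd_p, sd_n = statistics.pstdev(positive), statistics.pstdev(negative)
    return 1 - 3 * (sd_p + sd_n) / abs(mu_p - mu_n)


positive_controls = [9, 10, 11]   # e.g. one measurement averaged per positive-control well
negative_controls = [1, 2, 3]
print(z_prime_factor(positive_controls, negative_controls))  # about 0.39
```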

41 Data Export Modules
The user may output images or image measurements. Select the objects to export.

42 Measurement Display
The average measurements for all objects in the image are displayed in the figure window. However, the individual measurements for each object are stored in the output file.

43 Data Export Modules
Goal: Retain images of intermediate image processing steps for quality control, or save measurements for later analysis and exploration. SaveImages: Writes an image to a file. Intermediate images in the pipeline are not saved unless requested. Many output image formats are available, so the module can also be used as an image format converter. ExportToSpreadsheet: Exports measurements as comma-separated files readable by spreadsheet programs. ExportToDatabase: Exports measurements as per-image and per-object tables, plus a configuration file for upload to a MySQL database.
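A minimal Python sketch of a per-object comma-separated export using the standard csv module: one row per object, keyed by image and object number. The column names are modeled loosely on CellProfiler's naming and the values are made up; this is not ExportToSpreadsheet's code.

```python
import csv
import io

# Hypothetical per-object measurements: one row per object in each image.
rows = [
    {"ImageNumber": 1, "ObjectNumber": 1, "AreaShape_Area": 412},
    {"ImageNumber": 1, "ObjectNumber": 2, "AreaShape_Area": 389},
]
fieldnames = ["ImageNumber", "ObjectNumber", "AreaShape_Area"]


def export_to_csv(rows, fieldnames, stream):
    """Write a header line followed by one comma-separated line per object."""
    writer = csv.DictWriter(stream, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


out = io.StringIO()   # in practice, open("Nuclei.csv", "w", newline="") instead
export_to_csv(rows, fieldnames, out)
print(out.getvalue())
```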

44 Measurement Export
Some types of analysis can be performed with the built-in data tools. In these cases, the raw output file is used, and the data need not be exported via a module in the pipeline. For screens larger than you want to analyze in CellProfiler or a spreadsheet, it is preferable to export the data to a database. If you wish to use CellProfiler Analyst for data exploration and phenotype classification, databases are preferred, although local .csv files may also be used.

45 Illumination Correction
The physical limitations of any microscope produce nonuniformities along the optical path (sample, microscope, and/or camera). Example: Tiling the raw images shows uneven illumination from left to right in each image. This heterogeneity can lead to inaccurate intensity measurements: a cell located at position (a) appears brighter than one at position (b), even if the cells contain the same amount of fluorescent material. Carpenter et al., Genome Biology 2006, 7:R100

46 Illumination Correction
Illumination correction makes object segmentation and measurements (e.g., DNA content) more accurate. Carpenter et al., Genome Biology 2006, 7:R100

47 Illumination Correction
Two modules: CorrectIlluminationCalculate creates an illumination correction function; CorrectIlluminationApply applies the function to your images. Available options: Correct each image individually, or all images together as an ensemble? Calculate the illumination function using foreground pixels or background pixels? Apply the function by division or subtraction? Additional considerations: Create a new illumination correction function if you image on a different microscope or change plates. Correct each channel separately, since absolute illumination intensities may differ between channels. First create and save the function from the image set, then load and apply it prior to object identification.
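A simplified Python sketch of the ensemble, division-based option: average all images pixel by pixel, normalize the result to a mean of 1, then divide each image by it. The real module also smooths the function, and the subtraction option (typically with a background estimate) is not shown; the 2×2 images are made up to mimic a left-to-right gradient.

```python
def ensemble_illumination_function(images):
    """Pixel-wise mean of all images, normalized so the function averages to 1."""
    n = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    avg = [[sum(img[r][c] for img in images) / n for c in range(cols)] for r in range(rows)]
    overall = sum(sum(row) for row in avg) / (rows * cols)
    return [[v / overall for v in row] for row in avg]


def apply_by_division(image, illum):
    """Divide each pixel by the illumination function at that position."""
    return [[p / f for p, f in zip(prow, frow)] for prow, frow in zip(image, illum)]


# Two images that are twice as bright on the right as on the left.
images = [[[2, 4], [2, 4]], [[4, 8], [4, 8]]]
illum = ensemble_illumination_function(images)
print(apply_by_division(images[0], illum))  # left and right pixels now equal (about 3.0)
```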

48 Cluster Computing
If processing time is too great on a single computer, run the pipeline on a cluster: Download and install CellProfiler on a computing cluster. Add the ExportToDatabase module. Add the CreateBatchFiles module to the end of the pipeline and configure it appropriately. Run the first image cycle locally. Submit the batches to your cluster for processing. Check the progress of processing. For really big screens, it is necessary to process images in batches on a computing cluster.

49 Data Analysis
At the end of a pipeline, you may have 500+ features per cell: size, shape, staining intensity, texture (smoothness), etc. Remember our philosophy: “Measure everything, ask questions later...”

50 Data Analysis
What does this data set look like? The cytological profile, or cytoprofile, shows all the measurements acquired for each individual cell, in every image, in the entire experiment. (Figure: measurements plotted by cell #, color scale from -1 to +1.)

51 CellProfiler Analyst: Overview
Explore data from large sets of images. Identify interesting subpopulations and view the original images. Identify interesting phenotypes automatically. Goal: Provide the user with a powerful suite of image exploration and machine learning methods.

52 The CellProfiler Analyst Interface
CellProfiler Analyst (CPA) allows you to explore the data with a variety of tools. Upon startup, CPA requests a properties file, which contains the locations of the measurement tables, how the images are referenced, and other assorted information.

53 Plate Viewer
Displays data in plate layout (96- or 384-well format). Measurements are shown as color-coded wells or mouse tool-tips. Right-clicking on a well reveals a list of images to display.

54 Image Viewer
Displays an image referenced by number. Color display: Colors are assigned to each channel of image data and shown as a merged color image. Toggle channel visibility and color scaling.

55 Plotting Tools
Various plotting tools allow the user to explore and sift through the measurements and make discoveries.

56 Data Analysis
Why make so many measurements? For many screens, only a few measurements are necessary to obtain the phenotype. (Figure: y-axis, phospho-H3 staining; x-axis, DNA content.)

57 Data Analysis
Unfortunately, for other phenotypes, the proper features are not so simple to find. (Figure panels: wild-type HT29 cells; cells on the move; peas in a pod; crescent-shaped nuclei; long projections; crooked projections; actin dots at junctions; hyphae-like projections.)

58 Data Analysis
Concentrating on single cells allows us to avoid the problems of heterogeneous populations and to detect rare events (such as mitosis). However, determining which combinations of features and values are appropriate for a phenotype is tedious and impractical. We have included a machine learning classification tool to automatically choose the features and values required to score a rare or subtle phenotype.

59 Automated Cell Image Processing
Thousands of wells: 10⁴ images with 10³ cells in each, for a total of 10⁷ cells per experiment. Each cell has a cytoprofile of 500+ measured features.

60 Iterative Machine Learning
The system presents ~500 cells to biologists for scoring (yes/no). The system then defines a rule based on the cytoprofiles of the scored cells, and the process is repeated iteratively.

61 Iterative Machine Learning
The rule is applied to score all 10⁷ cells. The scored cells are sorted by well to identify samples with a high proportion of positive cells.
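Once a rule has scored every cell, ranking wells is a simple aggregation. This Python sketch assumes the classifier output is a list of (well, is_positive) pairs; the well names and scores are hypothetical, and CellProfiler Analyst's own scoring output may be organized differently.

```python
from collections import defaultdict


def rank_wells_by_positive_fraction(cell_scores):
    """cell_scores: iterable of (well, is_positive). Returns (well, fraction) pairs, highest first."""
    counts = defaultdict(lambda: [0, 0])   # well -> [positive cells, total cells]
    for well, positive in cell_scores:
        counts[well][1] += 1
        if positive:
            counts[well][0] += 1
    fractions = {well: pos / total for well, (pos, total) in counts.items()}
    return sorted(fractions.items(), key=lambda item: item[1], reverse=True)


scores = [("A01", True), ("A01", False), ("A01", False), ("A01", False),
          ("B02", True), ("B02", True), ("B02", True), ("B02", False)]
print(rank_wells_by_positive_fraction(scores))  # prints [('B02', 0.75), ('A01', 0.25)]
```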

62 Final Notes
Where to get help: Access help from the CellProfiler main window, or ask for help on the CellProfiler.org forum.

63 The Team
Director; Image assay development (apply image analysis methods to biological questions); IT/Administration: Anne Carpenter, Peggy (Margaret) Anthony, David Logan, Mark Bray, Kate Madden. Algorithm development & software engineering (develop and test new image analysis and data mining methods and create open-source software tools): Ray Jones, Vebjørn Ljoså, Adam Fraser, Lee Kamentsky, Carolina Wählby, Auguste Genovesio (begins 2010).

