Ulrike Naumann Ulrike.Naumann1@gmail.com High-throughput flow cytometry data and how to load, transform and visualise data and gate populations in Bioconductor.

Slides:



Advertisements
Similar presentations
© 2013 IBM Corporation Reducing Cost with R in IBM Storage Products Manufacturing Elaine Jones Integrated Supply Chain Engineering.
Advertisements

Applications of one-class classification
Machine Learning Homework
CHAPTER 13: Alpaydin: Kernel Machines
Introduction to Flow Cytometry
© 2011 Deloitte Touche Tohmatsu About me Educational background – Applied Econometrics 4 years statistical modelling experience R experience – 2 years.
An Introduction to Bioconductor Bethany Wolf Statistical Computing I April 4, 2013.
Welcome to a tour of Limb Volumes Professional 6.0 LVP6.0 is the latest version of the world famous and widely used lymphedema tracking and reporting software.
ImageJ tutorial showing the operations needed to calculate air-filled porosity for an example soil column.
Faithful Sampling for Spectral Clustering to Analyze High Throughput Flow Cytometry Data Parisa Shooshtari School of Computing Science, Simon Fraser University,
ETHEM ALPAYDIN © The MIT Press, Lecture Slides for.
Locally Constraint Support Vector Clustering
Prénom Nom Document Analysis: Data Analysis and Clustering Prof. Rolf Ingold, University of Fribourg Master course, spring semester 2008.
Data mining and statistical learning, lecture 4 Outline Regression on a large number of correlated inputs  A few comments about shrinkage methods, such.
Chapter 2 Graphs, Charts, and Tables – Describing Your Data
FLOW CYTOMETRY Dr. MOHAMMED H SAIEMA LDAHR KAAU FACULTY OF APPLIED MEDICAL SCIENCES MEDICAL TECHNOLOGY DEPT. 2 ND YEAR MT INSTROMINTATION EXT
INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, Lecture Slides for.
Patrick Kemmeren Using EP:NG.
1 A Rank-by-Feature Framework for Interactive Exploration of Multidimensional Data Jinwook Seo, Ben Shneiderman University of Maryland Hyun Young Song.
Machine Learning in Simulation-Based Analysis 1 Li-C. Wang, Malgorzata Marek-Sadowska University of California, Santa Barbara.
Hydrologic Statistics
GLIDER
Copyright 2000, Media Cybernetics, L.P. Array-Pro ® Analyzer Software.
Preparing Data for Analysis and Analyzing Spatial Data/ Geoprocessing Class 11 GISG 110.
An Introduction to Bioconductor Bethany Wolf Statistical Computing I April 9, 2014.
Cell viability studies Sepideh Khoshnevis. The Goal To distinguish live cells from dead and apoptotic cells in order to calculate the the percentage of.
Term 2, 2011 Week 1. CONTENTS Types and purposes of graphic representations Spreadsheet software – Producing graphs from numerical data Mathematical functions.
National Immune Monitoring Laboratory University of Montreal, Quebec, Canada Biosystemix Ltd. Biosystemix, Ltd.
SPSS Presented by Chabalala Chabalala Lebohang Kompi Balone Ndaba.
ArrayCluster: an analytic tool for clustering, data visualization and module finder on gene expression profiles 組員:李祥豪 謝紹陽 江建霖.
StAR web server tutorial for ROC Analysis. ROC Analysis ROC Analysis: This module allows the user to input data for several classifiers to be tested.
Qualitative and Quantitative Research Quantitative Deductive: transforms general theory into hypothesis suitable for testing Deductive: transforms general.
Reconstructing 3D mesh from video image sequences supervisor : Mgr. Martin Samuelčik by Martin Bujňák specifications Master thesis
Introduction To Flow Cytometry By Noha Kamel. Flow cytometry is a method of measuring multiple physical and chemical characteristics of particles by optical.
Dr Gihan Gawish Hydrodynamic focusing is a technique used to provide more accurate results from flow cytometers or Coulter counters for determining the.
Basic Principles in Flow Cytometry
 Flow cytometry is a technique for counting, examining, and sorting microscopic particles suspended in a stream of fluid.
Introduction to STATA for Clinical Researchers Jay Bhattacharya August 2007.
Clustering II. 2 Finite Mixtures Model data using a mixture of distributions –Each distribution represents one cluster –Each distribution gives probabilities.
Procedures for managing workflow components Workflow components: A workflow can usually be described using formal or informal flow diagramming techniques,
5-1 ANSYS, Inc. Proprietary © 2009 ANSYS, Inc. All rights reserved. May 28, 2009 Inventory # Chapter 5 Six Sigma.
Flow Cytometry Basic Training. What Is Flow Cytometry? Flow ~ cells in motion Cyto ~ cell Metry ~ measure Measuring properties of cells while in a fluid.
Frame with Cutout Random Load Fatigue. Background and Motivation A thin frame with a cutout has been identified as the critical component in a structure.
Data Science and Big Data Analytics Chap 3: Data Analytics Using R
October 16, 2014Computer Vision Lecture 12: Image Segmentation II 1 Hough Transform The Hough transform is a very general technique for feature detection.
BD FACS CAP™ Profile the expression of surface proteins specific to a cell type –Currently 189 antibodies –Produces data similar to a “protein chip” –Expect.
Digital Image Processing
1 Statistics & R, TiP, 2011/12 Multivariate Methods  Multivariate data  Data display  Principal component analysis Unsupervised learning technique 
1 ITN clinical trial analysis design – Transplantation example Cross-sectional and longitudinal clinical trials to assess tolerance Disease areas: Transplantation.
Whole Slide Image Stitching for Osteosarcoma detection Ovidiu Daescu Colaborators: Bogdan Armaselu and Harish Babu Arunachalam University of Texas at Dallas.
Cluster Active Archive 1 Drift Velocity/Electric Field Comparison: CIS, EDI, EFW, FGM Jonathan Kissi-Ameyaw Cluster Active Archive ESTEC.
1 Berger Jean-Baptiste
Multivariate statistical methods. Multivariate methods multivariate dataset – group of n objects, m variables (as a rule n>m, if possible). confirmation.
Machine Learning Homework Gaining familiarity with Weka, ML tools and algorithms.
Date of download: 6/27/2016 Copyright © 2016 SPIE. All rights reserved. Sample CP-A and CP-B cells labeled with fluorescent protoporphyrin IX (PpIX) in.
1 C.A.L. Bailer-Jones. Machine Learning. Data exploration and dimensionality reduction Machine learning, pattern recognition and statistical data modelling.
High-throughput genomic profiling of tumor-infiltrating leukocytes
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
Exploring Data: Summary Statistics and Visualizations
Flowcytometry.
3.1 Clustering Finding a good clustering of the points is a fundamental issue in computing a representative simplicial complex. Mapper does not place any.
Fitting Curve Models to Edges
Outlier Discovery/Anomaly Detection
ECE539 final project Instructor: Yu Hen Hu Fall 2005
Volume 6, Issue 5, Pages e5 (May 2018)
Tree Net algorithm contruction
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
MapReduce: Simplified Data Processing on Large Clusters
Presentation transcript:

Ulrike Naumann Ulrike.Naumann1@gmail.com High-throughput flow cytometry data and how to load, transform and visualise data and gate populations in Bioconductor (R) Ulrike Naumann Ulrike.Naumann1@gmail.com

Content What is flow cytometry? Bioconductor Software packages in Bioconductor for flow cytometry data Read data into R Transform the data, take out extremes Steps in the analysis process Example of output How to summarise the output visually and interpret it References

Flow Cytometry (FCM) Flow Cytometry is a technique for counting, examining and sorting microscopic particles suspended in a stream of fluid.

Flow Cytometry Cells have been stained with monoclonoal antibodies. Several Detectors are aimed at the point where the stream of fluid passes through a light beam and pick up the reflected/scattered light The detectors differ in what they measure: FSC (Forward Scatter) detector measures correlates with the cell volume and SSC (Side Scatter) measures correlate with the inner complexity of the particle

High-throughput flow cytometry 21st Century technology large numbers of flow cytometric samples can be processed and analysed in a short period of time Challenge: high-dimensional complex data. Manual analysis time-consuming, subjective and error-prone  Development of automatic gating methods

Software for Analysis of Flow Cytometry Data Beckman Coulter Kaluza Software Cellular Symphony Flow Cytometry Software Millipore Flow Cytometry Software SPICE Data Mining & Visualization Software WEASEL Flow Cytometry Software … Today I’m only going to explain briefly how Flow Cytometry Data can be processed using Bioconductor

Bioconductor provides tools for the analysis and comprehension of high-throughput genomic data. uses the R statistical programming language, and is open source and open development Installation of basic Bioconductor packages in R: > source("http://bioconductor.org/biocLite.R") > biocLite() > biocLite("flowCore ") > biocLite("curvHDR") install curvHDR package over the menue. This will also install the following packages: ‘abind’ ‘akima’ ‘magic’ locfit’ ‘ash’ ‘mvtnorm’ ‘feature’ ‘geometry’ ‘hdrcde’ ‘ks’ ‘misc3d’ ‘ptinpoly ‘rgl ‘

Software packages in Bioconductor for flow cytometry data These packages use standard FCS files, including infrastructure, utilities, visualization and semi-automated gating methods for the analysis of flow cytometry data. flowCore: Basic structures for flow cytometry data flowViz: Visualization of flow cytometry flowQ: Quality control for flow cytometry flowStats: Statistical methods for the analysis of flow cytometry data flowUtils: Utilities for flow cytometry flowFP: Fingerprint generation of flow cytometry data, used to facilitate the application of machine learning and data mining tools for flow cytometry. flowTrans: Profile maximum likelihood estimation of parameters for flow cytometry data transformations. iFlow: Tool to explore and visualize flow cytometry

Algorithms for clustering flow cytometry data are found in these packages: flowClust: Robust model-based clustering using a t-mixture model with Box-Cox transformation. flowMeans: Identifies cell populations in Flow Cytometry data using non-parametric clustering and segmented-regression-based change point detection. flowMerge: Merging of mixture components for model-based automated gating of flow cytometry data using the flowClust framework. SamSPECTRAL: Given a matrix of coordinates as input, SamSPECTRAL first builds the communities to sample the data points. A typical workflow using the packages flowCore, flowViz, flowQ and flowStats is described in detail in flowWorkFlow.pdf. The data files used in the workflow can be downloaded from here.

Load the Data Loading data is a complex step, that can take a long time. Flow cytometry experiments typically involve data from several patients several time points a number of antibody stain combinations 1. Understand the structure of the data 2. Read in the data. We decided to create for each patient a separate R workspace file (.Rdata)

Format, and organising the data to read it into R Time of this presentation is too short to give a detailed account so I will go only into some issues we encountered. The available data is a time series. The number of days differ for each participant. The days available differ for each study participant! 10 different Antibody-combinations were used in our data.

Transform the data, take out extremes Determine minima and maxima of each flow cytometry sample. Remove recordings that accumulate on the boundaries (usually upper boundaries) possibly transform samples to reduce their skewness

Gating Flow Cytometry Data with curvHDR - Steps in the analysis process curvHDR – package in R/Bioconductor for gating FCM data and displaying gates Functions: curvHDRfilter and plot The most important parameters of the function curvHDRfilter are the dataset x, HDRlevel, growthFac and signifLevel x a numerical vector or a matrix or data frame having 1-3 columns. HDRlevel number between 0 and 1 corresponding to the level of the highest density region within each high curvature region. growthFac growth factor parameter. High curvature regions are grown to have ‘volume’ growthFac times larger than the original region. signifLevel number between 0 and 1 corresponding to the significance level for curve region determination.

Steps in the analysis process Defaults: HDRlevel = 0.1 growthFac = 5^(d/2) where d is the dimension of the input data; signifLevel = 0.05 Start with trial values of signifLevel and growthFac and HDRlevel, set at defaults. Example: xBiva <- cbind(c(rnorm(1000,-2),rnorm(1000,2)), c(rnorm(1000,-2),rnorm(1000,2))) gate2a <- curvHDRfilter(xBiva)

Example of output Output: xBiva[gate2a$insideFilter,] contains the data inside of the gate. data the input data (for use in plotting). insideFilter logical variable indicating the rows of the input data matrix corresponding to points inside the curvHDR filter. polys the curvHDR filter. Depending on the dimension d this is a list of intervals (d=1), polygons (d=2) or polyhedra (d=3). HDRlevel highest density region level

Example of Output We can visualise the output of curvHDRfilter with the plot command. plot(gate2a) In our actual dataset, we combined respectively 3 gates to obtain our final combined gates – a 2-dim gate on (FSC,SSC), a 1-dim gate Ab3, a 2-dim gate on (Ab1,Ab2). Our intention was to match the results of an already existing analysis (Brinkman 2007). We selectively tested out a number of choices for the parameters HDRlevel, growthFac and signifLevel. Applying the same set of these 3 parameters globally for the entire dataset provided good results.

How to summarise the output visually and interpret it We want obtain for each patient and each patient day and each anti-body combination, a summary of the data that passed through the gate. In curvHDR: get the proportion of gated cells for each patient, each day and each AB combination example: length(xBiva[gate2a$insideFilter,1]) Apply variance stabilising transformation on the proportion data

Summary using Lattice graphics - ggplot

Findings – Signatures for Graft-versus-Host Disease Estimated contrast curves (cellular signatures) arising from fitting the longitudinal data. The shading around each curve corresponds to approximate point-wise 95% confidence intervals.

References Brinkman, R.R., Gasparetto, M., Lee, S.-J.J., Ribickas, A.J., Perkins, J., Janssen, W., Smiley, R. and Smith, C. (2007). High-content flow cytometry and temporal data analysis for defining a cellular signature of graft-versus-host disease. Biology of Blood and Marrow Transplantation, 13, 691–700. Ellis, B., Haaland, P., Hahne, F., Le Meur, F. & Gopalakrishnan, N. (2009). flowCore 1.10.0. Basic structures for flow cytometry data Bioconductor package. http://www.bioconductor.org. Shapiro, H.M. (2003). Practical Flow Cytometry, 4th Edition. John Wiley & Sons, New York. Lo, K., Brinkman, R.R. & Gottardo, R. (2008). Automatic gating of flow cytometry data via robust model-based clustering. Cytometry Part A, 321–332

References Naumann U. & Wand M.P. (2009). Automation in high-content flow cytometry screening. Cytometry Part A, (2009), 75A, 789-797. Naumann U., Luta G. & Wand M.P. (2009). The curvHDR method for gating flow cytometry samples. BMC Bioinformatics, (2010), 11:44, 1-13. Gentleman R., Carey V., Huber W., Irizarry R., Dutoit S., (2005). Bioinformatics and Computational Biology Solutions Using R and Bioconductor Gentleman R., (2008). R Programming for Bioinformatics Aghaeepour N., Finak G., The FlowCAP Consortium, The DREAM Consortium, Hoos H., Mosmann T. R., Brinkman R., Gottardo R. & Scheuermann R. H., (2013). Critical assessment of automated flow cytometry data analysis techniques. Nature Methods (Advanced Online Publication)