BioConductor - R for Microarray Analysis

1 BioConductor - R for Microarray Analysis
Claudio Lottaz
Computational Diagnostics Group, Computational Molecular Biology Department, Max Planck Institute for Molecular Genetics

2 Overview
16-Apr-17
- Introduction
- File formats
- Data structures
- Analysis methods
- Summary

3 The R-Project
- S/S-PLUS: commercial statistics software package
  - origins in the research community (S was developed at Bell Labs), now commercialized
  - serious effort invested in a graphical user interface and the like
- R: free, open-source statistics software package (GNU GPL)
  - an open implementation of the S language
  - still largely compatible with S
  - command-line user interface

4 BioConductor
- R is extensible through packages
  - packages may be written in the R language
  - a programming interface to C is available
- BioConductor is a collection of such packages
  - various contributors
  - various methods for different types of data
  - heterogeneous usage

5 File Formats
- Importing red/green (two-colour) experiment data
  - intensities from image-processing output: .spot or .gpr files (from the Spot or GenePix software)
  - textual information on probes and targets: .gal and .gdl files generated by GenePix
- Importing Affymetrix data
  - reads Affymetrix CEL files
  - needs copyright-protected CDF files for interpretation
- Exporting
  - tab-delimited ASCII files
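
Exporting, as listed above, amounts to writing a genes-by-samples table as tab-delimited ASCII text. The sketch below shows that round trip in Python, used here only as a neutral illustration language. `write_expression_tsv` and `read_expression_tsv` are hypothetical helpers, not BioConductor functions, and the layout (a `gene` column followed by one column per sample) is an assumption. Parsing .gpr or CEL files is not attempted.

```python
import csv
import io


def write_expression_tsv(stream, gene_ids, sample_ids, values):
    """Hypothetical helper: write a genes x samples matrix as tab-delimited text."""
    writer = csv.writer(stream, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene"] + list(sample_ids))  # header row: sample names
    for gene, row in zip(gene_ids, values):
        writer.writerow([gene] + [f"{v:.4f}" for v in row])


def read_expression_tsv(stream):
    """Hypothetical helper: read the same layout back into (genes, samples, values)."""
    reader = csv.reader(stream, delimiter="\t")
    header = next(reader)
    sample_ids = header[1:]
    gene_ids, values = [], []
    for row in reader:
        gene_ids.append(row[0])
        values.append([float(x) for x in row[1:]])
    return gene_ids, sample_ids, values


# Round trip through an in-memory buffer instead of a file on disk.
buf = io.StringIO()
write_expression_tsv(buf, ["g1", "g2"], ["s1", "s2"], [[1.0, 2.5], [0.25, -1.0]])
buf.seek(0)
genes, samples, vals = read_expression_tsv(buf)
```

Plain tab-delimited text keeps the exported data readable by spreadsheets and other tools, at the cost of rounding values to the chosen number of decimals.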

6 Red/Green Specific Data Structures
- marrayLayout objects contain information on
  - probes and their locations
  - housekeeping genes
- marrayRaw objects hold intensities for a batch of arrays
  - red/green, foreground/background
  - information on the applied targets
- marrayNorm objects hold post-normalization data
  - average log intensities, normalized log ratios
  - normalization factors
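
marrayNorm objects store average log intensities and normalized log ratios. Assuming simple background subtraction (other background corrections exist), the usual per-spot quantities are M = log2(R/G) and A = 0.5 * log2(R * G), where R and G are the background-corrected red and green intensities. A minimal Python sketch of that computation follows; `ma_values` is a hypothetical name and this is not the marray implementation.

```python
import math


def ma_values(r_fg, r_bg, g_fg, g_bg):
    """Background-corrected log ratio M and average log intensity A per spot.

    Assumes simple subtraction: R = Rf - Rb, G = Gf - Gb.
    Spots whose corrected intensity is not positive get None (log undefined).
    """
    m_vals, a_vals = [], []
    for rf, rb, gf, gb in zip(r_fg, r_bg, g_fg, g_bg):
        r, g = rf - rb, gf - gb
        if r <= 0 or g <= 0:
            m_vals.append(None)
            a_vals.append(None)
            continue
        m_vals.append(math.log2(r / g))        # M: log ratio red/green
        a_vals.append(0.5 * math.log2(r * g))  # A: mean of log2 R and log2 G
    return m_vals, a_vals


# Three toy spots: red twice as bright, green twice as bright, and a spot
# whose red background exceeds its foreground.
M, A = ma_values([1100, 300, 50], [100, 100, 80], [600, 500, 400], [100, 100, 100])
```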

7 Affymetrix Specific Data Structures
- Cdf objects: chip description
- Cel objects: probe data of one chip
- Cel.container objects: a set of Cel objects
- PPSet objects: all probes for a particular target
- PPSet.container objects: a set of PPSet objects
- For convenience, Plobs (probe level objects) contain a Cdf and a Cel.container object
  - simple to use, but less flexible access
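
The Affymetrix classes above separate the chip description (which cells belong to which target) from the measured intensities. The Python sketch below uses hypothetical dataclasses loosely named after the slide's objects, not the affy package's actual classes, to show how a Cdf-style index regroups chip-wise Cel data into target-wise PPSet data. A plain list of Cel objects stands in for a Cel.container; the probe-set IDs and intensities are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Cdf:
    """Chip description: probe-set (target) ID -> cell indices of its probes."""
    pm_index: dict  # perfect-match cells per target
    mm_index: dict  # mismatch cells per target


@dataclass
class Cel:
    """Probe intensities of one chip, indexed by cell position."""
    name: str
    intensities: list


@dataclass
class PPSet:
    """All PM/MM probe values for one target, one inner list per chip."""
    target: str
    pm: list = field(default_factory=list)
    mm: list = field(default_factory=list)


def build_ppsets(cdf, cels):
    """Regroup chip-wise Cel data into target-wise PPSet objects."""
    ppsets = {}
    for target, pm_idx in cdf.pm_index.items():
        mm_idx = cdf.mm_index[target]
        s = PPSet(target)
        for cel in cels:
            s.pm.append([cel.intensities[i] for i in pm_idx])
            s.mm.append([cel.intensities[i] for i in mm_idx])
        ppsets[target] = s
    return ppsets


# Toy chip with six cells and two targets, measured on two chips.
cdf = Cdf({"probeset_A": [0, 2], "probeset_B": [4]},
          {"probeset_A": [1, 3], "probeset_B": [5]})
cels = [Cel("chip1", [10, 2, 12, 3, 50, 7]), Cel("chip2", [11, 1, 13, 4, 55, 6])]
sets = build_ppsets(cdf, cels)
```

Keeping the index separate from the intensities means the same description serves every chip of a type, which is also why the CDF file is needed to interpret CEL data at all.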

8 Common Data Structures
- exprSet objects hold expression data
  - matrix of expression values and their standard errors
  - link to phenotype data and gene annotations
  - geneNames to identify the genes
- phenoData objects hold phenotype/patient data
  - list of the phenotype variables
  - data matrix: one row per case, one column per variable
- Some packages use their own data structures
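
The point of exprSet is that expression values, their standard errors and the phenotype table stay aligned by sample. A minimal Python sketch of that idea follows; `ExprSet`, `PhenoData` and `subset_samples` are hypothetical and only echo the slide's terms, they are not the Biobase class definitions.

```python
from dataclasses import dataclass


@dataclass
class PhenoData:
    """Phenotype data: one row per sample (case), one column per variable."""
    variables: list  # variable names, e.g. ["status"]
    rows: list       # one list of values per sample

    def column(self, var):
        j = self.variables.index(var)
        return [row[j] for row in self.rows]


@dataclass
class ExprSet:
    """Genes x samples expression matrix with standard errors and phenotypes."""
    gene_names: list
    exprs: list     # exprs[i][j]: expression of gene i in sample j
    se_exprs: list  # standard errors, same shape as exprs
    pheno: PhenoData

    def subset_samples(self, var, value):
        """Keep only samples whose phenotype variable equals value,
        subsetting expressions, errors and phenotypes consistently."""
        keep = [j for j, v in enumerate(self.pheno.column(var)) if v == value]
        return ExprSet(
            self.gene_names,
            [[row[j] for j in keep] for row in self.exprs],
            [[row[j] for j in keep] for row in self.se_exprs],
            PhenoData(self.pheno.variables, [self.pheno.rows[j] for j in keep]),
        )


pd = PhenoData(["status"], [["tumor"], ["normal"], ["tumor"]])
es = ExprSet(["g1", "g2"], [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
             [[0.1, 0.1, 0.1], [0.2, 0.2, 0.2]], pd)
tumors = es.subset_samples("status", "tumor")
```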

9 Utilities
- Utilities for resampling
- Aggregators, e.g. to accumulate results over the folds of a cross-validation
- Summary statistics
- Convenient methods for graphical output
  - histograms, scatter plots, gene locations, boxplots, ...
  - on various subsets of the data
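
To illustrate what an aggregator does during resampling, here is a minimal Python sketch of k-fold cross-validation that accumulates, across folds, how often each gene is selected. `kfold_indices`, `cross_validate` and the `fit_and_select` callback are hypothetical; a `Counter` plays the role of the aggregator.

```python
import random
from collections import Counter


def kfold_indices(n, k, seed=0):
    """Randomly split sample indices 0..n-1 into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_validate(n, k, fit_and_select, seed=0):
    """Run k-fold cross-validation and aggregate results across folds.

    fit_and_select(train_idx, test_idx) returns the genes selected on one
    fold; the Counter accumulates how often each gene was chosen.
    """
    counts = Counter()
    for test in kfold_indices(n, k, seed):
        test_set = set(test)
        train = [i for i in range(n) if i not in test_set]
        counts.update(fit_and_select(train, test))
    return counts


# Toy selector: always picks g1, picks g2 only when sample 0 is in training.
selector = lambda train, test: ["g1"] + (["g2"] if 0 in train else [])
counts = cross_validate(10, 5, selector)
```

Genes that are selected in nearly every fold are the stable ones; the counts can then be plotted or summarized like any other result.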

10 Red/Green Specific Analysis
- Diagnostic plots to find printing, hybridization or scanning artifacts
  - boxplots, scatter plots and spatial images
  - of foreground, background, log-ratio, ...
- Normalization (Yang et al. 2001, 2002)
  - location normalization by local weighted regression, either intensity-dependent or 2D spatial
  - scale normalization using the median absolute deviation (MAD)
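
The two normalization steps above can be sketched as follows, under simplifying assumptions. Intensity-dependent location normalization subtracts a fitted M-versus-A trend; here a plain, non-robust local linear fit with tricube weights stands in for lowess/loess, and the 2D spatial variant is not shown. Scale normalization divides each group (e.g. print-tip group or array) by its MAD relative to the geometric mean of all groups' MADs, which is one common form of MAD scaling. All function names are hypothetical; this is not the marray code.

```python
import math
import statistics


def local_linear_fit(x, y, span=0.4):
    """Non-robust local weighted (tricube) linear regression of y on x,
    evaluated at every x; a simplified stand-in for lowess/loess."""
    n = len(x)
    q = max(2, math.ceil(span * n))  # neighbours used in each local fit
    fitted = []
    for x0 in x:
        near = sorted(range(n), key=lambda i: abs(x[i] - x0))[:q]
        h = max(abs(x[i] - x0) for i in near) or 1.0  # local bandwidth
        w = [(1 - (abs(x[i] - x0) / h) ** 3) ** 3 for i in near]
        sw = sum(w)
        xm = sum(wi * x[i] for wi, i in zip(w, near)) / sw
        ym = sum(wi * y[i] for wi, i in zip(w, near)) / sw
        sxx = sum(wi * (x[i] - xm) ** 2 for wi, i in zip(w, near))
        sxy = sum(wi * (x[i] - xm) * (y[i] - ym) for wi, i in zip(w, near))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ym + slope * (x0 - xm))
    return fitted


def intensity_normalize(m, a, span=0.4):
    """Location normalization: remove the intensity-dependent trend of M on A."""
    trend = local_linear_fit(a, m, span)
    return [mi - ti for mi, ti in zip(m, trend)]


def mad(values):
    """Median absolute deviation (without the 1.4826 consistency factor)."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)


def mad_scale(groups):
    """Scale normalization: divide each group by MAD_i / geometric mean of
    all MADs, so that every group ends up with the same MAD."""
    mads = [mad(g) for g in groups]
    geo = math.exp(sum(math.log(s) for s in mads) / len(mads))
    return [[v / (s / geo) for v in g] for g, s in zip(groups, mads)]
```

Using the geometric mean keeps the overall scale of the data roughly unchanged while equalizing the spread between groups.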

11 Affymetrix Specific Analysis
- Exploring probe-level data (package affy)
  - probe names, perfect match/mismatch intensities, ...
- Normalization (on probe data)
  - MVA plots for Affymetrix data
  - various methods; the default is quantile normalization
- Determining expression levels
  - various methods: Affymetrix (1999), Li & Wong (2001), Irizarry (2002)
  - standard errors are determined for each expression value
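
Quantile normalization, named above as the default, gives every array the same empirical distribution: the mean of the sorted arrays. A minimal Python sketch of the technique follows; `quantile_normalize` is a hypothetical name, it is not the affy implementation, and ties are broken by original order instead of being averaged.

```python
def quantile_normalize(columns):
    """Quantile normalization of equally long columns (one per array)."""
    n = len(columns[0])
    # Reference distribution: mean of the k-th smallest value over all arrays.
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(c[k] for c in sorted_cols) / len(columns) for k in range(n)]
    result = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # positions by rank
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]  # replace each value by its reference quantile
        result.append(out)
    return result


normalized = quantile_normalize([[5, 2, 3, 4], [4, 1, 4, 2], [3, 4, 6, 8]])
```

Each array keeps its own ranking of probes; only the values assigned to each rank are shared.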

12 Common Analysis
- Gene filtering, e.g. to
  - find highly expressed genes
  - find differentially expressed genes (also for more than two groups)
  - find genes whose expression pattern resembles that of a given gene of interest
- Receiver operating characteristic (ROC) analysis
- Annotation: chromosome location, Gene Ontology
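
Two of the filters above can be sketched directly: keep genes that exceed an expression threshold in a minimum fraction of samples, and rank genes by Pearson correlation with a gene of interest. The Python functions below are hypothetical illustrations, and the threshold, fraction and correlation measure are assumptions rather than any package's defaults.

```python
import statistics


def filter_high_expression(exprs, threshold, min_fraction):
    """Indices of genes above `threshold` in at least `min_fraction` of samples."""
    keep = []
    for i, row in enumerate(exprs):
        frac = sum(v > threshold for v in row) / len(row)
        if frac >= min_fraction:
            keep.append(i)
    return keep


def pearson(x, y):
    """Pearson correlation; returns 0.0 for constant profiles."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def most_similar(exprs, gene_of_interest, top=5):
    """Indices of the `top` genes most correlated with the gene of interest."""
    ref = exprs[gene_of_interest]
    scores = [(pearson(row, ref), i)
              for i, row in enumerate(exprs) if i != gene_of_interest]
    scores.sort(reverse=True)
    return [i for _, i in scores[:top]]


exprs = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [5, 5, 6, 5]]
high = filter_high_expression(exprs, 3, 0.5)
similar = most_similar(exprs, 0, top=2)
```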

13 Common Analysis (continued)
- Expression density diagnostics
  - compare distributional shapes gene by gene to find differences between groups
- Multiple hypothesis testing
  - family-wise error rate (FWER), false discovery rate (FDR)
  - minP and maxT procedures, step-up procedures
  - based on various statistics (t, F, Wilcoxon, ...)
  - adjusted p-values for genes declared differentially expressed, obtained through permutation
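
Two of the procedures above can be sketched in a few lines: Benjamini-Hochberg step-up adjusted p-values for FDR control, and single-step maxT adjusted p-values for FWER control, obtained by permuting class labels. The Python below is a hypothetical illustration, not the multtest code. It assumes a Welch t statistic, random permutations instead of complete enumeration, and counts the observed labelling as one permutation.

```python
import random
import statistics


def bh_adjust(pvalues):
    """Benjamini-Hochberg step-up adjusted p-values (controls the FDR)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top  # rank of p-value i in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def t_stat(x, y):
    """Welch two-sample t statistic; 0.0 in the degenerate zero-variance case."""
    se = (statistics.variance(x) / len(x) + statistics.variance(y) / len(y)) ** 0.5
    if se == 0:
        return 0.0
    return (statistics.fmean(x) - statistics.fmean(y)) / se


def maxt_adjust(exprs, labels, n_perm=1000, seed=0):
    """Single-step maxT adjusted p-values (controls the FWER).

    labels are 0/1 class labels; each permutation records max |t| over genes.
    """
    def all_stats(lab):
        return [abs(t_stat([v for v, l in zip(row, lab) if l == 1],
                           [v for v, l in zip(row, lab) if l == 0]))
                for row in exprs]

    observed = all_stats(labels)
    rng = random.Random(seed)
    perm = list(labels)
    exceed = [0] * len(exprs)
    for _ in range(n_perm):
        rng.shuffle(perm)
        max_t = max(all_stats(perm))
        for j, t in enumerate(observed):
            if max_t >= t:
                exceed[j] += 1
    # +1 in numerator and denominator: the observed labelling counts as one permutation.
    return [(c + 1) / (n_perm + 1) for c in exceed]
```

maxT compares each gene against the maximum statistic over all genes, which automatically accounts for correlation between genes; BH is cheaper and less conservative but controls a different error rate.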

14 Summary
- Free software, reproducible methods
  - open source, with references to publications
- Sophisticated methods available
- Rather specific input formats needed; licensing problem with Affymetrix chip description (CDF) files
- Some heterogeneity in implementation
- Blurry definition of the R language

