Overview of Bioconductor Aedín Culhane

1 Overview of Bioconductor Aedín Culhane

2 Bioconductor
Biannual release (normally April and October) to coincide with the R release.
Current: Bioconductor 2.9 (released to coincide with R 2.14).
To install, use the script on the Bioconductor website:
source("")
biocLite()
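The biocLite() installer shown on this slide matches Bioconductor 2.9. Since Bioconductor 3.8 the project recommends the BiocManager package from CRAN instead. The sketch below shows one way to install only the packages that are missing; the helper names missing_packages and install_bioc are made up for this illustration and are not part of any package.

```r
# Return the subset of `pkgs` that is not installed in the current library.
# (Hypothetical helper, for illustration only.)
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Install missing Bioconductor packages with BiocManager (Bioconductor >= 3.8).
# (Hypothetical helper; it needs network access when it is actually called.)
install_bioc <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) == 0) {
    return(invisible(character(0)))
  }
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(to_install)
  invisible(to_install)
}

# Packages that ship with R are always reported as present.
missing_packages(c("stats", "utils"))

# Example call, commented out because it downloads packages:
# install_bioc(c("Biobase", "affy", "GEOquery"))
```

On an old R installation that still pairs with Bioconductor 2.9, the source() plus biocLite() route from the slide is the one to use.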

3 Packages Overview Bioconductor web site BiocViews Task view: Software, Annotation Data, Experimental Data

4 What Packages do I need? This is specific to your data and analysis pipeline, but for examples see: Bioconductor Workshops, Bioconductor Workflows

5 Main types of Annotation Packages
Gene centric (AnnotationDbi packages):
–Organism:
–Technology/Platform: hgu133plus2.db
–Gene sets and pathways (biology level): GO.db or KEGG.db
–.db packages can be queried with SQL or accessed through the annotation package interface (toTable, get, mget)
Genome centric (GenomicFeatures packages):
–Transcriptome level: TxDb.Hsapiens.UCSC.hg19.knownGene
–Generic features: can be generated via GenomicFeatures
biomaRt:
–Query the web-based 'biomart' resource for genes, sequence, SNPs, etc. See

6 Bioconductor resources
Mailing list (sign up for the daily digest)
Documentation and workshop/course material online
–Slides from talks, PDFs of tutorials, R code
Help available for each software package
–Each package MUST contain a vignette (how-to)
Other resources

7 Vignette
Tutorials that provide a worked example of a package
Required in Bioconductor packages
Written in Sweave (Leisch, 2002)
–LaTeX dynamic reports in which R code is embedded and executable
–All R code in a vignette is checked (and executed) by R CMD check
library("Biobase")
library("GOstats") # Load package of interest
openVignette()

8 S4 classes and ExpressionSet Within Bioconductor, you will encounter packages that are structured around S4 object-oriented programming, proposed by John Chambers (developer of S). A class provides a software abstraction of a real-world object. A method performs an action on a class. (Think of a class as a noun and a method as a verb.)

9 Object (S4)
An object is an instance of a class. Descriptions are stored in slots.
slotNames(ob1) lists all slots in an object, or use str().
To access slots:
–slotname(ob1), where the package provides an accessor function for that slot, or
–slot(ob1, "slotname")
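A minimal sketch of these ideas in base R (the methods package), so it runs without any Bioconductor package. The class name Sample, its slots, and the slot values are invented for illustration; only ob1 comes from the slide.

```r
library(methods)

# Define a toy S4 class with two slots (illustrative names only).
setClass("Sample", slots = c(id = "character", values = "numeric"))

# An object is an instance of a class.
ob1 <- new("Sample", id = "LAL4", values = c(5.2, 7.9, 6.1))

slotNames(ob1)        # names of all slots in the object
str(ob1)              # compact view of the slots and their contents

slot(ob1, "id")       # access a slot by name, given as a string
ob1@values            # the @ operator reads the same slot directly

isVirtualClass("Sample")  # FALSE: objects of this class can be created
validObject(ob1)          # check that slot contents match the declared types
```

Package authors usually prefer that users call accessor functions rather than @, because the internal slot layout may change between releases.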

10 Example: ExpressionSet
library(ALL)
data(ALL)
slotNames(ALL)
phenoData(ALL)
class(ALL)
?ExpressionSet
> ALL
ExpressionSet (storageMode: lockedEnvironment)
assayData: features, 128 samples
  element names: exprs
protocolData: none
phenoData
  sampleNames: LAL4 (128 total)
  varLabels: cod diagnosis... date last seen (21 total)
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
  pubMedIds:
Annotation: hgu95av2

11 Methods which act on an S4 class
showMethods(classes="ExpressionSet")
getMethod("write.exprs", "ExpressionSet")
Or, if you wish to see how the package really works, download and look at the source code.
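The same inspection functions can be tried on a toy class in base R. This is a sketch only: the class Sample and the generic meanValue are invented for illustration and do not exist in Bioconductor.

```r
library(methods)

# A toy class (the noun) ...
setClass("Sample", slots = c(id = "character", values = "numeric"))

# ... and a generic function with a method for that class (the verb).
setGeneric("meanValue", function(object) standardGeneric("meanValue"))
setMethod("meanValue", "Sample", function(object) mean(object@values))

ob1 <- new("Sample", id = "LAL4", values = c(2, 4, 6))
meanValue(ob1)                        # dispatches to the Sample method

existsMethod("meanValue", "Sample")   # is a method defined for this class?
showMethods("meanValue")              # list the signatures of one generic
showMethods(classes = "Sample")       # list methods that act on the class
getMethod("meanValue", "Sample")      # print the method definition
```

getMethod() prints the R source of the method, which is often enough to see what a function does without downloading the package source.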

12 Getting Data into R & Bioconductor Aedín Culhane

13 Simple Excel Spreadsheet data
Simple table
–read.table()
–read.csv()
–scan()
However, many data types need more specialized readers. See Technologies on BiocViews.
– ews.html
Large data files. Also see
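A short sketch of the three base R readers listed on this slide. To keep it self-contained it reads from in-line text instead of a file; the probe IDs, sample names, and values are made up. With a real spreadsheet export, pass the file path as the first argument instead of text=.

```r
# Comma-separated table, as exported from a spreadsheet (made-up values).
csv_text <- "probe,sample1,sample2\n1000_at,7.6,7.4\n1001_at,5.0,5.3\n"
expr <- read.csv(text = csv_text, row.names = 1)

# The same table, tab-delimited, read with the more general read.table().
tab_text <- "probe\tsample1\tsample2\n1000_at\t7.6\t7.4\n1001_at\t5.0\t5.3\n"
expr2 <- read.table(text = tab_text, header = TRUE, sep = "\t", row.names = 1)

# Expression data are usually handled as a numeric matrix.
expr_matrix <- as.matrix(expr)
dim(expr_matrix)

# scan() reads a flat vector of values rather than a table.
values <- scan(text = "7.6 7.4 5.0 5.3", quiet = TRUE)
```

Using row.names = 1 turns the first column (here the probe IDs) into row names, so the remaining columns are purely numeric.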

14 Some common data types Microarray SNP NGS May

15 A Microarray Overview

16 Reading Affymetrix Data
library(affy)
require(affy) # Alternative
affybatch <- ReadAffy(celfile.path="[Location of your data]")
eSet <- justRMA()

17 Sample R code

18 ExpressionSet Class in R

19 Assessing Data Quality

20 Public Microarray Data
ArrayExpress: Studies (622,617 profiles)
GEO: 22,735 Studies (558,074 profiles)
Statistics: May 2011

21 R Code

22 More on GEOquery
require(GEOquery)
Let's try to load the GDS810 dataset, which contains data on Alzheimer's disease at various stages of severity.
GDS810 <- getGEO("GDS810")
The getGEO function returns an object of class GEOData. You can get a description of this class like this:
help("GEOData-class")
Meta(GDS810)
Columns(GDS810)
head(Table(GDS810))

23 Affy SNP Arrays

24 Process – Affy SNP Arrays (oligo package)

25 Other Arrays
Illumina
–lumi package
2-color spotted arrays
–limma package
Other arrays
– go-arrays/

26 Next Generation Sequencing Data

27 R Code

28 Exercise
Install the GEOquery package.
Download the dataset GSE1297 using getGEO.
These data will be downloaded as an eSet, so to see the expression data and phenoData, use exprs and pData.
Use arrayQualityMetrics to assess the quality of these data.

29 R basics: Getting help
To get help
–?mean
–help(mean)
–“mean”)
–apropos("mean")
–example(mean)
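A small sketch of how these help functions behave. apropos() returns a character vector of matching object names, so its result can be stored and filtered; the calls that open help pages are wrapped in interactive() here only so the example can also run in a script.

```r
# apropos() searches the names of visible objects; the pattern is a regular
# expression and matching ignores case by default.
hits <- apropos("mean")
head(hits)
"mean" %in% hits

# A regular expression narrows the search, e.g. functions starting with "read.".
readers <- apropos("^read\\.")
head(readers)

# These open a help page or run the documented examples, so they are most
# useful at the R prompt.
if (interactive()) {
  help(mean)      # same as ?mean
  example(mean)   # run the examples from the help page
}
```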

30 With thanks to
