biocLite() This will install a subset of packages by default. Some optional parameters are accepted by the script: pkgs –character vector of Bioconductor packages to install. destdir –the directory where the downloaded packages will be stored. lib –character vector giving the library directories under which packages may be installed. Recycled as needed.">

Presentation is loading. Please wait.

Presentation is loading. Please wait.

Welcome to lecture 7: An introduction to bioconductor IGERT – Sponsored Bioinformatics Workshop Series Michael Janis and Max Kopelevich, Ph.D. Dept. of.

Similar presentations


Presentation on theme: "Welcome to lecture 7: An introduction to bioconductor IGERT – Sponsored Bioinformatics Workshop Series Michael Janis and Max Kopelevich, Ph.D. Dept. of."— Presentation transcript:

1 Welcome to lecture 7: An introduction to bioconductor IGERT – Sponsored Bioinformatics Workshop Series Michael Janis and Max Kopelevich, Ph.D. Dept. of Chemistry & Biochemistry, UCLA

2 What is bioconductor? From the bioconductor webpage (www.bioconductor.org):www.bioconductor.org Bioconductor is an open source and open development software project for the analysis and comprehension of genomic data. The project was started in the Fall of 2001. The Bioconductor core team is based primarily at the Fred Hutchinson Cancer Research Center. Other members come from various US and international institutions.core teamFred Hutchinson Cancer Research Center. Bioconductor is primarily based on the R programming language but we do accept contributions in any programming language. There are two releases of Bioconductor every year (they appear shortly after the corresponding R release). At any one time there is a release version, which corresponds to the released version of R, and a development version, which corresponds to the development version of R. Most users will find the release version appropriate for their needs. In addition there are a large number of meta-data packages available. They are mainly, but not solely oriented towards different types of microarrays.R release versiondevelopment versionmeta-data packages

3 Where to get bioconductor? From the bioconductor website: Right now, the easiest way to install BioConductor packages is using the biocLite.R installation script. Using the R CLI, simply type the following: >source("http://www.bioconductor.org/biocLite.R") > biocLite() This will install a subset of packages by default. Some optional parameters are accepted by the script: pkgs –character vector of Bioconductor packages to install. destdir –the directory where the downloaded packages will be stored. lib –character vector giving the library directories under which packages may be installed. Recycled as needed.

4 Bioconductor principles The following slides were compiled from the bioconductor vignettes and short courses available through the bioconductor website. Of special note are the presentations by Sandrine Dudoit and Robert Gentleman, 2002 (http://www.bioconductor.org/workshops/Summer02Cou rse/RBioC/RBioC.pdf) and the EMBO short course on microarray analysis, with emphasis on the bioconductor project (http://www.bioconductor.org/workshops/EMBO03), also by Sandrine Dudoit and Robert Gentleman, 2003.http://www.bioconductor.org/workshops/Summer02Cou rse/RBioC/RBioC.pdf

5 Bioconductor design Object-oriented class/method design. Allows efficient representation and manipulation of large and complex biological datasets of multiple types. Widgets. Specific, small scale, interactive components providing graphically driven analyses - point & click interface.

6 Bioconductor tools Interactive tools for linking experimental results to annotation and literature WWW resources in real time. E.g. PubMed, GenBank, LocusLink. Scenario: For a list of differentially expressed genes obtained from multtest or genefilter, use the annotate package to retrieve PubMed abstracts for these genes and to generate an HTML report with links to LocusLink for each gene.

7 Bioconductor packages General infrastructure – Biobase – annotate, AnnBuilder – tkWidgets. Pre-processing for Affymetrix data – affy Pre-processing for cDNA data – marrayClasses, marrayInput, marrayNorm, marrayPlots Differential expression

8 Bioconductor training R vignettes system – comprehensive repository of step-by-step tutorials covering a wide variety of computational objectives in /doc subdirectory; – integrated statistical documents intermixing text, code, and code output (textual and graphical); – documents can be automatically updated if either data or analyses are changed. Modular training segments – short courses: lectures and computer labs; – interactive learning and experimentation with the software platform and statistical methodology.

9 R programming, extended In order to deliver high quality software the Bioconductor project relies on a few programming techniques that might not be familiar: – environments and closures; – object oriented programming. We review these here for interested programmers (understanding them is not essential but is often very helpful).

10 Environments and closures An environment is an object that contains bindings between symbols and values. It is very similar to a hash table. Environments can be accessed using the following functions: – ls(env=e) # get a listing. – get(“x”, env=e) # get the value of the object in e with name x. – assign(“x”,y,env=e) # assign to the name x the value y in the environment e.

11 Environments and closures Since these operations are used a great deal in Bioconductor we have provided two helper functions: – multiget – multiassign These functions get and assign multiple values into the specified environment.

12 Environments and closures Environments can be associated with functions: When an environment is associated with a function, then that environment is used to obtain values for any unbound variables. The term closure refers to the coupling of the function body with the enclosing environment. The annotate, genefilter, and other packages take advantage of environments

13 Environments and closures > x <- 4 > e1 <- new.env() > assign(“x”,10, env=e1) > f <- function() x > environment(f) <- e1 > x # returns 4 > f() # returns 10!

14 OOP The Bioconductor project has adopted the OOP paradigm presented in Programming with Data, J. M. Chambers, 1998. Tools for programming using the class/method mechanism are provided in the methods package.

15 OOP A class provides a software abstraction of a real world object. It reflects how we think of certain objects and what information these objects should contain. A class defines the structure, inheritance, and initialization of objects. Classes are defined in terms of slots which contain the relevant data. An object is an instance of a class.

16 OOP A method is a function that performs an action on data (objects). A generic function is a dispatcher, it examines its arguments and determines the appropriate method to invoke. Examples of generic functions include plot summary print

17 OOP To obtain documentation (on-line help) about: – a class: class?classname so, class?exprSet, will display the help file for the exprSet class. – a method: methods?methodname so, methods?print, will display the help file for the print methods.

18 OOP The methods package contains a number of functions for defining new classes and methods (e.g. setClass, setMethod) and for working with these classes and methods. A tutorial is available at http://www.omegahat.org/RSMethods/index.html

19 OOP > setClass(“simple", representation(x="numeric",y="matrix“), prototype = list(x=numeric(),y=matrix(0))) > z <- new("simple", x=1:10, y=matrix(rnorm(50),10,5)) > z@x [1] 1 2 3 4 5 6 7 8 9 10

20 Biobase The Biobase package provides class definitions and other infrastructure tools that will be used by other packages. The two important classes defined in Biobase are: – phenoData: sample level covariate data. – exprSet: the sample level covariate data combined with the expression data and a few other quantities of interest.

21 Biobase: exprSet Slots for the exprSet class: exprs : a matrix of expression measures genes are rows, samples are columns. se.exprs : standard errors for the expression measures, if available. phenoData : an object of class phenoData that describes the samples. annotation : a character vector. description : a character vector. notes : a character vector.

22 Biobase: exprSet One of the most important tasks is to align the expression data and the phenotypic data (and to keep that alignment through the analysis). To achieve this, the exprSet class combines these two data sources into one object, and provides subsetting and access methods that make it easy to manipulate the data while ensuring that they are correctly aligned.

23 Biobase: exprSet A design principle that was adopted for the exprSet and other classes was that they should be closed under the subset operation. So any subsetting, either of rows or columns, will return a valid exprSet object. This makes it easier to use exprSet in other software packages

24 Biobase: exprSet Some methods for the exprSet class: show: controls the printing (you seldom want a few hundred thousand numbers rolling by). subset, [ and $, are both designed to keep correct subsets of the exprs, se.exprs, and phenoData objects. split, splits the exprSet into two or more parts depending on the vector used for splitting.

25 Biobase: exprSet Some methods for the exprSet class: geneNames, retrieves the gene names (row names of exprs). phenoData, pData, and sampleNames provide access to the phenoData slot. write.exprs, writes the expression values to a file for processing or storage.

26 Biobase: phenoData Slots for the phenoData class: pData: a dataframe, where the samples are rows and the variables are columns (this is the standard format). varLabels: a vector containing the variable names (as they appear in pData) and a longer description of the variables.

27 Biobase: phenoData Methods for the phenoData class include: – [, the subset operator; this method ensures that when a subset is taken, both the pData object and the varLabels object have the appropriate subsets taken. – $, extracts the appropriate column of the pData slot (as for a dataframe). – show, a method to control printing, we show only the varLabels (and the size).

28 Biobase: exprSet Some methods for the exprSet class: show: controls the printing (you seldom want a few hundred thousand numbers rolling by). subset, [ and $, are both designed to keep correct subsets of the exprs, se.exprs, and phenoData objects. split, splits the exprSet into two or more parts depending on the vector used for splitting.

29 OOP A method is a function that performs an action on data (objects). A generic function is a dispatcher, it examines its arguments and determines the appropriate method to invoke. Examples of generic functions include plot summary print

30 Let’s take an example simple affymetrix analysis using the Golub data set – AML/ALL leukemia dataset (1999) ALL AML Hu6800 GeneChip 47 ALL AML ALL 25 TEST Train

31 Affy arrays

32 Affymetrix GeneChip Design 5’ 3’ Reference Sequence …TGTGATGGTGGGGAATGGGTCAGAAGGCCTCCGATGCGCCGATTGAGAAT… CCCTTACCCAGTCTTCCGGAGGCTA Perfect Match CCCTTACCCAGTGTTCCGGAGGCTA Mismatch

33 Terminology Each gene or portion of a gene is represented by 1q to 20 oligonucleotides of 25 base-pairs. Probe: an oligonucleotide of 25 base-pairs, i.e., a 25-mer. Perfect match (PM): A 25-mer complementary to a reference sequence of interest (e.g., part of a gene). Mismatch (MM): same as PM but with a single homomeric base change for the middle (13th) base (transversion purine pyrimidine, G C, A T). Probe-pair: a (PM,MM) pair. Probe-pair set: a collection of probe-pairs (1q to 20) related to a common gene or fraction of a gene. Affy ID: an identifier for a probe-pair set. The purpose of the MM probe design is to measure nonspecific binding and background noise.

34 Affymetrix files Main software from Affymetrix company MicroArray Suite - MAS, now version 5. DAT file: Image file, ~10^7 pixels, ~50 MB. CEL file: Cell intensity file, probe level PM and MM values. CDF file: Chip Description File. Describes which probes go in which probe sets and the location of probe-pair sets (genes, gene fragments, ESTs).

35 Expression Measures 10-20K genes represented by 11-20 pairs of probe intensities (PM & MM) Obtain expression measure for each gene on each array by summarizing these pairs Background adjustment and normalization are important issues There are many methods (MAS, RMA, Li/Wong, …)

36 Let’s take an example the Golub training data set We can access this as built-in data For a description of the Golub et al. (1999) dataset type ? golubTrain and to load these data type: > library(golubEsets) > data(golubTrain) > data(golubTest)

37 affy: oligo arrays case study Let’s look at some of the main functions in the affy package for pre-processing Affymetrix microarray data. A number of sample datasets are available in the package; to list these, type data(package="affy"). To load the package: > library(affy) The function ReadAffy is available for reading CEL files. One of the main classes in affy is the AffyBatch class. Other classes include ProbeSet (PM and MM intensities for individual probe sets), Cdf (information contained in a CDF file), and Cel (single array cel intensity data). The object Dilution is an instance of the class AffyBatch.

38 affy: oligo arrays case study > class(Dilution) [1] "AffyBatch" > slotNames(Dilution) [1] "cdfName" "nrow" "ncol" "exprs" "se.exprs" [6] "phenoData" "description" "annotation" "notes" > Dilution AffyBatch object size of arrays=640x640 features (12805 kb) cdf=HG_U95Av2 (12625 affyids) number of samples=4 number of genes=12625 annotation=hgu95av2 notes= > annotation(Dilution) [1] "hgu95av2"

39 affy: oligo arrays case study For a description of the target samples hybridized to the arrays > phenoData(Dilution) phenoData object with 3 variables and 4 cases varLabels liver: amount of liver RNA hybridized to array in micrograms sn19: amount of central nervous system RNA hybridized to array in micrograms scanner: ID number of scanner used > pData(Dilution) liver sn19 scanner 20A 20 0 1 20B 20 0 2 10A 10 0 1 10B 10 0 2

40 affy: oligo arrays case study The exprs slot contains a matrix with columns corresponding to arrays and rows to individual probes on the array. To obtain the matrix of intensities for all four arrays: > e <- exprs(Dilution) > nrow(Dilution) * ncol(Dilution) [1] 409600 > dim(e) [1] 409600 4 You can access probe-level PM and MM intensities using: > PM <- pm(Dilution) > dim(PM) [1] 201800 4 > PM[1:5, ] 20A 20B 10A 10B 1000_at1 468.8 282.3 433.0 198.0 1000_at2 430.0 265.0 308.5 192.8 1000_at3 182.3 115.0 138.0 86.3 1000_at4 930.0 588.0 752.8 392.5 1000_at5 171.0 128.0 152.3 97.8

41 affy: oligo arrays case study To get the probe set names (Affy IDs) > gnames <- geneNames(Dilution) > length(gnames) [1] 12625 > gnames[1:5] [1] "1000_at" "1001_at" "1002_f_at" "1003_s_at" "1004_at" > nrow(e)/length(gnames) [1] 32.44356 As with other microarray objects in Bioconductor packages, you can use subsetting commands for AffyBatch objects > dil1 <- Dilution[1] > class(dil1) [1] "AffyBatch" > dil1 AffyBatch object size of arrays=640x640 features (3204 kb) cdf=HG_U95Av2 (12625 affyids) number of samples=1 number of genes=12625 annotation=hgu95av2 notes=

42 affy: oligo arrays case study

43 To produce boxplots of probe log intensities: > boxplot(Dilution, col = c(2, 2, 3, 3))

44 affy: oligo arrays case study The boxplots and density plots show that the Dilution data needs normalization. As described in the dataset help file and in the phenoData slot (pData(Dilution)), two concentrations of mRNA were used and, for each concentration, two scanners were used. From the plots, we note that scanner effects seem stronger than concentration effects (different colors). Arrays that should be the same are different; arrays that should be different are similar. Because different mRNA concentrations were used, we perform probe-level normalization within concentration groups: > Dil20 <- normalize(Dilution[1:2]) > Dil10 <- normalize(Dilution[3:4]) > normDil <- merge(Dil20, Dil10) Notice how the boxplot now looks better: > boxplot(normDil, col = c(2, 2, 3, 3))

45 affy: oligo arrays case study Having access to PM and MM data can be useful. Let’s look at a plot of PM vs.MM: > plot(mm(Dilution[1]), pm(Dilution[1]), pch = ".", log = "xy") > abline(0, 1, col = "red")

46 Case study: Golub et. al., 1999 The microarray expression measures and clinical variables corresponding to each of the 38 training mRNA samples are stored in an object of class exprSet. For information on the training data golubTrain: > class(golubTrain) [1] "exprSet“ > slotNames(golubTrain) [1] "exprs" "se.exprs" "phenoData" "description" "annotation" [6] "notes“ > golubTrain Expression Set (exprSet) with 7129 genes 38 samples phenoData object with 11 variables and 38 cases varLabels Samples: Samples ALL.AML: ALL.AML BM.PB: BM.PB T.B.cell: T.B.cell …

47 Naïve differential expression: Golub et. al., 1999 We will illustrate some of the tools for reasoning about differential expression. First, we compute two-sample t statistics in this restricted dataset: > library(genefilter) > ALLinds <- which(small$ALL.AML == "ALL") > AMLinds <- which(small$ALL.AML == "AML") > myt <- fastT(exprs(small), ALLinds, AMLinds) We will focus on genes for which the signed t statistic exceeds 2: > bigT 2 > heatmap(exprs(small)[bigT, ])

48 Case study: Golub et. al., 1999 The phenotypic data are stored in a separate, but linked, object of class phenoData. An object of class phenoData is a combination of a data frame containing a number of variables for each array and a list that explains what each variable represents. > phenoTrain <- phenoData(golubTrain) > class(phenoTrain) [1] "phenoData" > slotNames(phenoTrain) [1] "pData" "varLabels" > varLabels(phenoTrain) $Samples [1] "Samples" $ALL.AML [1] "ALL.AML" … > pData(phenoTrain) Samples ALL.AML BM.PB T.B.cell FAB Date Gender pctBlasts Treatment 1 1 ALL BM B-cell 9/4/1996 M NA …

49 Case study: Golub et. al., 1999 The $ operator can be used to extract particular variables from an object of class phenoData. It also can be used directly on the exprSet instance: > table(phenoTrain$ALL.AML) ALL AML 27 11 > table(golubTest$ALL.AML) ALL AML 20 14 Data on only the first 10 genes in the first 3 chips can be obtained using the subsetting operator "[“ > golubTrain[1:10, 1:3] Expression Set (exprSet) with 10 genes 3 samples phenoData object with 11 variables and 3 cases …

50 Annotation: Golub et. al., 1999 One of the largest challenges in analyzing genomic data is associating the experimental data with the available biological metadata, e.g., sequence, gene annotation, chromosomal maps, literature. Bioconductor provides two main packages for this purpose: annotate (end-user) AnnBuilder (developer) These packages can be used for querying databases such as GenBank, GO, LocusLink, and PubMed, from R, and for processing the query results in R.

51 Annotation: Golub et. al., 1999 Using the Golub et al. (1999) dataset as a case study we will now explore annotation resources for the Affymetrix HU6800 chip used in this study. The Bioconductor annotation data package for the HU6800 chip is hu6800 and can be downloaded from the ”Data packages” section of the Bioconductor website. To load the packages needed for annotation type: > library(Biobase) > library(annotate) > library(tkWidgets) > library(geneplotter) > library(golubEsets) > library(hu6800) > library(GO)

52 Annotation: Golub et. al., 1999 Quality control data, counts, etc., for the mappings are available by calling the function hu6800() : > hu6800() Quality control information for hu6800: Date built: Thu Feb 13 11:00:27 2003 Number of probes: 7129 Probe number missmatch: None Probe missmatch: None Mappings found for probe based rda files: hu6800ACCNUM found 7092 of 7129 hu6800CHRLOC found 6467 of 7129 hu6800CHRORI found 6467 of 7129 hu6800CHR found 6882 of 7129 hu6800ENZYME found 1067 of 7129 hu6800GENENAME found 6913 of 7129 hu6800GO found 5731 of 7129 hu6800GRIF found 3533 of 7129 hu6800LOCUSID found 6918 of 7129 hu6800MAP found 6873 of 7129 hu6800PATH found 1448 of 7129 hu6800PMID found 6837 of 7129 …

53 Annotation: Golub et. al., 1999 The main R functions we will use for dealing with environments are ls (to list the names of objects in a specified environment) and get (to search for an R object with a given name and return its value, if found, in the specified environment). We obtain the Affymetrix identifiers using ls and note that there are 7,129 identifiers for the HU6800 chip (this matches the Golub exprSets). > affyID <- ls(env = hu6800CHR) > length(affyID) [1] 7129

54 Annotation: Golub et. al., 1999 We now arbitrarily select the probe set with Affymetrix identifier "U18237_at" and obtain a number of other IDs corresponding to this probe set. Note that there may be several PMIDs, i.e., PubMed abstracts, associated with a given gene. > mygene <- affyID[4001] > mygene [1] "U18237_at" > get(mygene, env = hu6800ACCNUM) [1] "U18237" > get(mygene, env = hu6800LOCUSID) [1] 23456 > get(mygene, env = hu6800SYMBOL) [1] "ABCB10" > get(mygene, env = hu6800GENENAME) [1] "ATP-binding cassette, sub-family B (MDR/TAP), member 10"

55 Annotation: Golub et. al., 1999 In some cases, we need to obtain data on several genes at once. We wrote a special function for this purpose: multiget. > fivegenes <- affyID[6:10] > fivegenes [1] "AB000409_at" "AB000410_s_at" "AB000449_at" "AB000450_at" [5] "AB000460_at" > multiget(fivegenes, env = hu6800PMID) $"AB000409_at" [1] 10859165 9155018 $"AB000410_s_at" [1] 12244119 12189194 12164330 12119232 12117782 12034821 11992556 11927502 [9] 11902834 11837743 10449904 10233168 9681819 9348312 9321410 9223306 [17] 9223305 9207108 9197244 9190902 9187114 $"AB000449_at" [1] 9344656 $"AB000450_at" [1] 9344656 $"AB000460_at" [1] 9734812

56 Annotation: Golub et. al., 1999 A pubMedAbst class structure was defined for handling PubMed abstracts in R. The slots are: > slotNames("pubMedAbst") [1] "authors" "abstText" "articleTitle" "journal" "pubDate" [6] "abstUrl"

57 Annotation: Golub et. al., 1999 The function pm.getabst provides a simpler way to download the specified PubMed abstracts (stored in XML) and create a list of pubMedAbst objects. The following commands can be used to store the PubMed abstracts for 5 genes in a list of objects of class pubMedAbst, to compute the number of abstracts retrieved for each gene, and to print the abstract(s) for the first gene. We can see, for example, that one of the genes has 21 abstracts associated with it: > absts2 <- pm.getabst(fivegenes, "hu6800") > lapply(absts2, length) $"AB000409_at" [1] 2 $"AB000410_s_at" [1] 21

58 Annotation: Golub et. al., 1999 > absts2[[1]] [[1]] list() attr(,"authors") [1] "R Cuesta" "G Laroia" "RJ Schneider" attr(,"abstText") [1] "Inhibition of protein synthesis during heat shock limits accumulation of unfolded attr(,"articleTitle") [1] "Chaperone hsp27 inhibits translation during heat shock by binding eIF4G and facilitating attr(,"journal") [1] "Genes Dev" attr(,"pubDate") [1] "Jun 2000" attr(,"abstUrl") [1] "No URL Provided" attr(,"class") [1] "pubMedAbst"

59 Annotation: Golub et. al., 1999 Another source of data is GenBank. We can interact with GenBank in much the same way as with PubMed; the main function for this purpose is genbank. > gbacc <- multiget(fivegenes, hu6800ACCNUM) > genbank(gbacc, disp = "browser") > gb <- genbank(gbacc[1], disp = "data")

60 Annotation: Golub et. al., 1999 The classes chromLoc and chromLocation are used to keep track of location information for a single gene and a set of genes (entire genome), respectively. The object hu6800ChrClass is an instance of the class chromLocation for the Affymetrix HU6800 chip. > slotNames("chromLoc") [1] "chrom" "position" "strand" > data(hu6800ChrClass) > species(hu6800ChrClass) [1] "Human" > chromNames(hu6800ChrClass) [1] "1" "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "2" "20" "21" "22" [16] "3" "4" "5" "6" "7" "8" "9" "X" "Y"

61 Annotation: Golub et. al., 1999 The geneplotter package provides two main functions for plotting genomic data: cPlot and alongChrom. These functions operate on instances of the class chromLocation. With cPlot, we can render all chromosomes the same length or we can scale them by their relative lengths. > cPlot(hu6800ChrClass, scale = "relative")

62 Annotation: Golub et. al., 1999 We can feed information from these objects into standard R graphing capabilities, such as image: > alongChrom(golubTrain, chrom = 22, specChrom = hu6800ChrClass, plotFormat = "image")

63 marray packages for spotted arrays The four main marray functions available are: marrayClasses. This package contains class definitions and associated methods for pre- and post-normalization intensity data for batches of arrays. Methods are provided for the creation and modification of microarray objects, basic computations, printing, subsetting, and class conversions. marrayInput. This package provides functionality for reading microarray data into R, such as intensity data from image processing output files (e.g.,.spot and.gpr files for the Spot and GenePix packages, respectively) and textual information on probes and targets (e.g., from.gal files and god lists). marrayPlots. This package provides functions for diagnostic plots of microarray spot statistics, such as boxplots, scatterplots, and spatial color images. marrayNorm. This package implements robust adaptive location and scale normalization procedures, which correct for different types of dye biases (e.g., intensity, spatial, plate biases) and allow the use of control sequences spotted onto the array and possibly spiked into the mRNA samples. Normalization is needed to ensure that observed differences in intensities are indeed due to differential expression and not experimental artifacts; fluorescence intensities should therefore be normalized before any analysis that involves comparisons among gene expression measures within or between arrays.

64 Marray: spotted arrays case study To load the packages > library(marrayNorm) We will work with the example sample dataset swirl; for a description of swirl, type ? swirl. To load this dataset: > data(swirl)

65 Marray: spotted arrays case study The object swirl is an instance of the class marrayRaw. Try the following commands to obtain information on this object: > class(swirl) [1] "marrayRaw" > slotNames(swirl) [1] "maRf" "maGf" "maRb" "maGb" "maW" "maLayout" [7] "maGnames" "maTargets" "maNotes" > swirl Pre-normalization intensity data: Object of class marrayRaw. Number of arrays: 4 arrays. A) Layout of spots on the array: Array layout: Object of class marrayLayout. Total number of spots: 8448 Dimensions of grid matrix: 4 rows by 4 cols …

66 Marray: spotted arrays case study To access individual slots: > maLayout(swirl) Array layout: Object of class marrayLayout. Total number of spots: 8448 Dimensions of grid matrix: 4 rows by 4 cols Dimensions of spot matrices: 22 rows by 24 cols Currently working with a subset of 8448 spots. Control spots: There are 2 types of controls : …

67 Marray: spotted arrays case study As with other microarray objects in Bioconductor packages, you can use subsetting commands for marrayRaw objects. For data on the first 100 genes in the second array in the swirl batch: > sw <- swirl[1:100, 2] > class(sw) [1] "marrayRaw" > sw Pre-normalization intensity data: Object of class marrayRaw. Number of arrays: 1 arrays. A) Layout of spots on the array: Array layout: Object of class marrayLayout. Total number of spots: 8448 Dimensions of grid matrix: 4 rows by 4 cols Dimensions of spot matrices: 22 rows by 24 cols Currently working with a subset of 100 spots. …

68 Marray: spotted arrays case study You can access red and green foreground and background intensities, and log ratios as follows: > Rf <- maRf(swirl) > dim(Rf) [1] 8448 4 > Rf[1:5, ] swirl.1.spot swirl.2.spot swirl.3.spot swirl.4.spot [1,] 19538.470 16138.720 2895.1600 14054.5400 [2,] 23619.820 17247.670 2976.6230 20112.2600 [3,] 21579.950 17317.150 2735.6190 12945.8500 [4,] 8905.143 6794.381 318.9524 524.0476 [5,] 8676.095 6043.542 780.6667 304.6190

69 Marray: spotted arrays case study The marrayPlots package provides functions for diagnostic plots of microarray spot statistics, such as boxplots, scatterplots, and spatial color images. To produce a spatial image of background intensities for the Cy3 channel in the third array: > tmp <- maImage(swirl[, 3], x = "maGb", bar = FALSE)

70 Marray: spotted arrays case study To produce a spatial image of log ratios for the first array in the batch: > tmp <- maImage(swirl[, 1], col = maPalette(low = "blue", high = "yellow"), bar = FALSE)

71 Marray: spotted arrays case study To produce boxplots of log ratios by sector for the first array in the batch: > maBoxplot(swirl[, 1])

72 Marray: spotted arrays case study For boxplots of log ratios for all four arrays: > maBoxplot(swirl)

73 Marray: spotted arrays case study The marrayNorm package implements robust adaptive location and scale normalization procedures, which correct for different types of dye biases (e.g., intensity, spatial, plate biases). The main location and scale normalization function is maNormMain. Simpler wrapper functions are provided in maNorm and maNormScale. The functions operate on objects of class marrayRaw (or possibly marrayNorm, if normalization is performed in several steps) and return objects of class marrayNorm. For within-print-tip-group loess location normalization of the batch swirl: > swirl.norm <- maNormMain(swirl)

74 Marray: spotted arrays case study

75 Case study: Golub et. al., 1999 The microarray expression measures and clinical variables corresponding to each of the 38 training mRNA samples are stored in an object of class exprSet. For information on the training data golubTrain: > class(golubTrain) [1] "exprSet“ > slotNames(golubTrain) [1] "exprs" "se.exprs" "phenoData" "description" "annotation" [6] "notes“ > golubTrain Expression Set (exprSet) with 7129 genes 38 samples phenoData object with 11 variables and 38 cases varLabels Samples: Samples ALL.AML: ALL.AML BM.PB: BM.PB T.B.cell: T.B.cell …

76 Thank you! I hope I have been helpful to you this summer. Please always feel free to post questions on the bioinformatics mailing list. Have fun with bioinformatics!!! I need bioinformatics tools… Lots of bioinformatics tools…


Download ppt "Welcome to lecture 7: An introduction to bioconductor IGERT – Sponsored Bioinformatics Workshop Series Michael Janis and Max Kopelevich, Ph.D. Dept. of."

Similar presentations


Ads by Google