
0 Microarray Analysis Using R/Bioconductor
Reddy Gali, Ph.D.

1 Agenda
- Introduction to microarrays
- Workflow of a gene expression microarray experiment
- Publishing microarray data (MIAME format)
- Microarray experimental design
- Public microarray databases
- Microarray preprocessing: quality control and diagnostic analysis

2 Agenda
- Introduction to R/Bioconductor
- Installation of R and Bioconductor packages
- General data analysis and strategies
- Data analysis using affylmGUI

3 Microarray Applications
- Analyze and compare patterns of gene expression
  - before and after an intervention
  - between tissue types
  - between transgenic strains
  - in neighboring cells (laser capture microdissection)
- Find DNA copy-number variations
- SNP detection
- Tool for genotyping
- High-throughput screening tool for drug discovery
- Elucidate gene function (RNAi microarrays; Silva et al., PNAS 2004)
- Investigate interactions between DNA and protein (ChIP-on-chip)

4 Workflow of Gene Expression
- Biological question
- Experimental design
- QC
- Tissue / sample preparation
- Extraction of total RNA
- Probe amplification & labeling
- Microarray hybridization & processing
- Image analysis
- Data analysis
  - Expression measures
  - Normalization
  - Statistical filtering
  - Clustering
  - Pathway analysis
- Biological verification

5 Pitfalls of Microarray Experiment
Gene expression changes detected by microarray analysis cannot be validated by other methods
- Inadequate design
- Data quality is low
- Statistical approach is not adequate
- Expression level of gene is below detection limit
- Change in gene expression is small
- Microarray detection probe is not specific or not sensitive

6 Microarray Processing

7 Two color vs Single color
Homemade microarray (two color): Tissue (normal, diseased) -> Total RNA -> cDNA synthesis -> Cy3 or Cy5 labeled cDNA -> Mixing -> Hybridization -> Raw data output -> Expression ratio
Affymetrix GeneChip (single color): Tissue (normal, diseased) -> Total RNA -> First-strand cDNA synthesis -> Double-stranded cDNA -> in vitro transcription -> Biotin-labeled cRNA -> Hybridization and staining -> Raw data output -> Absolute expression values

8 Affymetrix probe design
- PM (perfect match) and MM (mismatch) probes
- 11 probe pairs per probe set
- Multiple probe sets per gene
Source: Lipshutz et al. (1999), Nature Genetics, 21(1):20-24

9 Questions usually asked
- What kind of technology or microarray should I use?
- How many replicates do I need?
- What is a real replicate?
- Do I need statistical advice?
- Should I do technical replicates?
- Should I do a dye swap?
- Should I pool my samples?
- How do I analyze my dataset?
- What software should I use?

10 Design of Microarray Experiment
- Replicates
  - Goal, resources, technology, quality, design and analysis
  - Two-fold change: 3 replicates
  - Smaller change: 5 replicates
  - Technical replicates and biological replicates
- Sample pooling
  - Amount of sample
  - Replicates of pooled sample
  - No way to find variance between samples
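
The rule of thumb above (more replicates for smaller changes) can be checked with a power calculation. The following is a minimal sketch using base R's `power.t.test`, not a method from the slides. The per-group standard deviation of 0.5 log2 units and the per-gene significance level of 0.05 are assumed values chosen only for illustration, and `replicate_power` is a hypothetical helper name.

```r
# Power of a two-sample t-test for a single gene on the log2 scale.
# Assumed for illustration (not from the slides): SD of 0.5 log2 units per group
# and a significance level of 0.05 without multiple-testing correction.
replicate_power <- function(n, log2_fc, sd = 0.5, sig_level = 0.05) {
  power.t.test(n = n, delta = log2_fc, sd = sd, sig.level = sig_level,
               type = "two.sample", alternative = "two.sided")$power
}

# A two-fold change corresponds to a difference of 1 on the log2 scale.
power_2fold_n3 <- replicate_power(n = 3, log2_fc = 1)
power_small_n3 <- replicate_power(n = 3, log2_fc = log2(1.5))
power_small_n5 <- replicate_power(n = 5, log2_fc = log2(1.5))

print(round(c(fold2_n3 = power_2fold_n3,
              fold1.5_n3 = power_small_n3,
              fold1.5_n5 = power_small_n5), 2))
```

With real data the standard deviation should be estimated from pilot arrays, and the significance level would normally be tightened to account for testing thousands of genes.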

11 MIAME: How to publish
Minimum Information About a Microarray Experiment (MIAME)

12 MIAME – Check list
- Type of experiment: for example, is it a comparison of normal vs. diseased tissue, a time course, or is it designed to study the effects of a gene knock-out?
- Experimental factors: the parameters or conditions tested, such as time, dose, or genetic variation.
- The number of hybridizations performed in the experiment.
- The type of reference used for the hybridizations, if any.
- Hybridization design: if applicable, a description of the comparisons made in each hybridization, whether to a standard reference sample, or between experimental samples. An accompanying diagram or table may be useful.
- Quality control steps taken: for example, replicates or dye swaps.

13 MIAME – Check list
- The origin of the biological sample (for instance, name of the organism, the provider of the sample) and its characteristics: for example, gender, age, developmental stage, strain, or disease state.
- Manipulation of biological samples and protocols used: for example, growth conditions, treatments, separation techniques.
- Protocol for preparing the hybridization extract: for example, the RNA or DNA extraction and purification protocol.
- Labeling protocol(s)
- External controls (spikes)

14 MIAME – Check list
- Type of scanning hardware and software used: this information is appropriate for a materials and methods section.
- Type of image analysis software used: specifications should be stated in the materials and methods.
- A description of the measurements produced by the image-analysis software and a description of which measurements were used in the analysis.
- The complete output of the image analysis before data selection and transformation (spot quantitation matrices).
- Data selection and transformation procedures.
- Final gene expression data table(s) used by the authors to make their conclusions after data selection and transformation (gene expression data matrices).

15 Gene Expression Omnibus (GEO)

16 Public Microarray Databases
- BodyMap
- SMD
- RIKEN
- MGI
- GEO
- CIBEX
- ArrayExpress

17 Microarray Platforms
- Agilent Microarrays: 60-mer format
- CodeLink Bioarrays: 30-mer format
- Affymetrix GeneChips: 25-mer format
- Illumina BeadChips
- NimbleGen: 60-mer format

18 RNA quality
- OD 260/280: 1.8-2
- Electropherograms: degradation, rRNA peaks
- Bioanalyzer graphs

19 Microarray data Mining
- Biological question
- Experimental design
- Microarray experiment
- Pre-processing: image analysis, expression quantification, normalization
- Data analysis: estimation/testing, clustering, classification/prediction
- Biological verification/interpretation

20 Microarray data Mining
- CDF / CEL
- Quality assessment
- Background correction
- Probe-level normalization
- Probe set summary
- Log ratios / log intensities
- Identify genes, clustering, etc.

21 Microarrays – Image Inspection
Microarray:
- Visual inspection of the chip -> scratches, bubbles, uneven hybridization -> outlier detection

22 Diagnostic plots-RNA degradation

23 Box Plots of unnormalized data

24 Raw vs Normalized data
Panels: Raw Data, Normalized Data

25 Histograms of unnormalized data

26 QC stats

27 Why Normalize
It adjusts the individual hybridization intensities to balance them appropriately so that meaningful biological comparisons can be made.
- Unequal quantities of starting RNA
- Differences in labeling or detection efficiencies between the fluorescent dyes used
- Systematic biases in the measured expression levels
- Sample preparation
- Variability in hybridization
- Spatial effects
- Scanner settings
- Experimenter bias
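
To make the idea of balancing intensities across arrays concrete, here is a minimal base-R sketch of quantile normalization, one common between-array method and the one used inside RMA. The slides do not say which normalization they have in mind, so this is an illustration rather than the deck's method. The toy matrix and the function name `quantile_normalize` are made up.

```r
# Quantile normalization: force every array (column) to share one distribution.
# Ties are broken by position here; production implementations usually average them.
quantile_normalize <- function(x) {
  ranks <- apply(x, 2, rank, ties.method = "first")  # rank of each probe within its array
  sorted <- apply(x, 2, sort)                         # each array sorted ascending
  reference <- rowMeans(sorted)                       # mean of the k-th smallest values
  normalized <- apply(ranks, 2, function(r) reference[r])
  dimnames(normalized) <- dimnames(x)
  normalized
}

# Toy log2 intensities: 4 probes (rows) on 3 arrays (columns); values are invented.
raw <- matrix(c(5, 2, 3, 4,
                4, 1, 4, 2,
                3, 4, 6, 8),
              nrow = 4,
              dimnames = list(paste0("probe", 1:4), c("a1", "a2", "a3")))

norm <- quantile_normalize(raw)
print(norm)
print(colMeans(norm))  # identical across arrays after normalization
```

After normalization every array has the same set of values, so box plots of the arrays line up, while the ordering of probes within each array is unchanged.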

28 Data analysis workflow

29 Free Software – Data analysis
- Bioconductor is an open source and open development software project to provide tools for the analysis and comprehension of genomic data.
- TMEV 4.0 is an application that allows the viewing of processed microarray slide representations and the identification of genes and expression patterns of interest.
- dChip: DNA-Chip Analyzer (dChip) is a software package for probe-level (e.g. Affymetrix platform) and high-level analysis of gene expression microarrays and SNP microarrays.

30 R / Bioconductor
R and Bioconductor packages
- R is a comprehensive statistical environment and programming language for professional data analysis and graphical display.
- Bioconductor is an open source and open development software project for the analysis of microarray, sequence and genome data.
- More than 300 Bioconductor packages.

31 R / Bioconductor - Installation

32 OneChannelGUI
- A graphical interface (GUI) for Bioconductor libraries, to be used for quality control, normalization, filtering, statistical validation and data mining for single channel microarrays
- Affymetrix IVT, Human Gene 1.0 ST and exon arrays are implemented
- OneChannelGUI is an add-on Bioconductor package providing a new set of functions extending the capability of the affylmGUI package.

33 Tcl and Tk packages
- ActiveTcl is ActiveState's distribution of Tcl. It is most commonly used for rapid prototyping, scripted applications and GUIs.
- Install Tcl: Tcl/Tk packages, BWidget and Tktable
- Install in the C:\Tcl directory

34 Installing R/ Active Tcl

35 Installing AffylmGUI packages for Affymetrix data
install.packages("affylmGUI", contriburl="...")
source("...")
biocLite("affylmGUI", dependencies=TRUE)
biocLite("affylmGUI")
biocLite("tkrplot")
biocLite("affyPLM")
biocLite("R2HTML")
biocLite("xtable")
library(affylmGUI)

36 AffylmGUI Browser

37 OneChannelGUI Installation
source("...")
biocLite("oneChannelGUI")
biocLite("oneChannelGUI", dependencies=TRUE)
library(oneChannelGUI)

38 OneChannelGUI

39 Target File creation
- Create, with Excel, a tab-delimited file named targets.txt
- The targets file is made of three columns with the following header: Name, FileName, Target
- In column Name, place a brief name (e.g. c1, c2, etc.)
- In column FileName, place the name of the corresponding .CEL file
- In column Target, place the experimental conditions (e.g. control, treatment, etc.)
- Place targets.txt and the .CEL files into a folder (directory)
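
As an alternative to Excel, the same file can be written from R. The sketch below uses base R only. The sample names and .CEL file names are hypothetical placeholders, and a temporary directory stands in for the project folder.

```r
# Build the three-column targets table described above.
# Sample names and .CEL file names are hypothetical placeholders.
targets <- data.frame(
  Name     = c("c1", "c2", "t1", "t2"),
  FileName = c("control_1.CEL", "control_2.CEL", "treated_1.CEL", "treated_2.CEL"),
  Target   = c("control", "control", "treatment", "treatment"),
  stringsAsFactors = FALSE
)

# Written to a temporary folder here; in practice use the folder holding the .CEL files.
project_dir <- tempdir()
targets_path <- file.path(project_dir, "targets.txt")
write.table(targets, targets_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back and list any .CEL files named in it that are not in the folder.
check <- read.delim(targets_path, stringsAsFactors = FALSE)
missing_cel <- check$FileName[!file.exists(file.path(project_dir, check$FileName))]
print(check)
print(missing_cel)  # all four here, because the placeholder files do not exist
```

Checking for missing .CEL files before opening the GUI catches the most common cause of a failed import: a file name in targets.txt that does not match the file on disk.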

40 Target File

41 Working with OnechannelGUI

42 Working with OnechannelGUI
- Click on “File” to start a new project
- Click on “New” to start a new project
- Select 3’ IVT arrays
- Select the working directory that has the .CEL files and the targets.txt file

43 Working with OnechannelGUI

44 Working with OnechannelGUI
Quality control, Statistical analysis, Normalization, Filtering, Biological knowledge extraction, Annotation

45 Quality Control plots
Click on the Quality Control menu

46 QC plots/reports
Work with your data set:
- Plot various QC plots and identify which arrays are not of good quality
- Plot the RNA degradation plot
- Download the affyQCReport package and create a QC report for the dataset you are working on
> library(affyQCReport)
> QCReport(mydata, file="reddy.pdf")

47 Working with OnechannelGUI
Quality control, Statistical analysis, Normalization, Filtering, Biological knowledge extraction, Annotation

48 Probe set summary
Click on the probe set menu and select the probe set summary and normalization option.

49 Normalization

50 Exercise 4
- Calculate probe set summaries with GCRMA and RMA
- Export and save the normalized values
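
For intuition about what a probe set summary is, the sketch below reproduces only the summarization step of RMA, median polish on log2 PM intensities, using base R's `medpolish`. It is not the GUI's code path and it skips background correction and normalization. The probe and array effects are invented numbers.

```r
# Toy log2 PM intensities for one probe set: 5 probes (rows) x 3 arrays (columns).
# Values are invented: a probe affinity effect plus an array expression level.
probe_effect <- c(-1, -0.5, 0, 0.5, 1)
array_level  <- c(8, 9, 10)
pm_log2 <- outer(probe_effect, array_level, "+")
pm_log2[2, 3] <- 14  # one outlying probe, e.g. a scratch on array 3
dimnames(pm_log2) <- list(paste0("probe", 1:5), c("a1", "a2", "a3"))

# Median polish fits: intensity = overall + probe effect + array effect + residual.
fit <- medpolish(pm_log2, trace.iter = FALSE)

# Probe set summary per array = overall level + array (column) effect.
probe_set_summary <- fit$overall + fit$col
naive_summary <- colMeans(pm_log2)

print(probe_set_summary)  # robust to the outlying probe
print(naive_summary)      # the plain mean is pulled up on array 3
```

Because the fit is median based, the single outlying probe ends up in the residuals instead of shifting the summary for array 3, which is the main reason RMA-style summaries are preferred over a plain average of probes.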

51 Working with OnechannelGUI
Quality control, Statistical analysis, Normalization, Filtering, Biological knowledge extraction, Annotation

52 Filtering - OnechannelGUI
Signal features:
- Percentage of intensities greater than a user-defined value
- Interquartile range (IQR) greater than a defined value
Annotation features:
- Specific gene features (e.g. GO term, presence of transcriptional regulatory elements in promoters, etc.)
- Using the Ingenuity pathway knowledge base

53 Filtering
- Perform an IQR filter at 0.25, followed by an intensity filter requiring 50% of the arrays to have an intensity over 100.
- Export the data as a tab-delimited file.
- Question: How many probe sets are left after the first and the second filter?
- Using transcription factors from Ingenuity, create a file containing only the Entrez Gene IDs, without a header, and use it to filter the data set.
- Save the data set.
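
The two signal filters in this exercise can be sketched in base R as follows. This is one plausible reading of the exercise, not oneChannelGUI's own implementation. It assumes the expression matrix is on the log2 scale, that the IQR cut-off of 0.25 applies to those log2 values, and that the intensity cut-off of 100 is on the raw scale (so it is compared as `log2(100)`). The matrix values and probe set names are invented.

```r
# Toy log2 expression matrix: 5 probe sets (rows) x 4 arrays (columns); values invented.
expr_log2 <- rbind(
  ps_flat_low  = c(5.0, 5.1, 5.0, 5.1),    # low and unchanging
  ps_flat_high = c(9.0, 9.1, 9.0, 9.1),    # high but unchanging
  ps_var_low   = c(4.0, 4.2, 6.0, 6.2),    # variable, below 100 on every array
  ps_var_high  = c(8.0, 8.2, 10.0, 10.3),  # variable, above 100 on every array
  ps_var_half  = c(6.0, 6.2, 7.5, 7.8)     # variable, above 100 on half of the arrays
)
colnames(expr_log2) <- paste0("array", 1:4)

# Filter 1: keep probe sets whose interquartile range across arrays exceeds 0.25.
iqr_per_probe_set <- apply(expr_log2, 1, IQR)
after_iqr <- expr_log2[iqr_per_probe_set > 0.25, , drop = FALSE]

# Filter 2: keep probe sets with intensity over 100 on at least 50% of the arrays.
frac_above <- rowMeans(after_iqr > log2(100))
after_intensity <- after_iqr[frac_above >= 0.5, , drop = FALSE]

cat("left after IQR filter:", nrow(after_iqr), "\n")
cat("left after intensity filter:", nrow(after_intensity), "\n")

# Export the surviving probe sets as a tab-delimited file (temporary folder here).
out_file <- file.path(tempdir(), "filtered.txt")
write.table(after_intensity, out_file, sep = "\t", quote = FALSE, col.names = NA)
```

Running the IQR filter first removes probe sets that do not vary, so the intensity filter only has to decide whether the remaining, variable probe sets are expressed above background.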

54 Linear Modeling (Limma)

55 Differential Expression
Computing contrasts builds the differential expression results

56 MA and Volcano plots

57 Labeled table columns: Expression values, P-values, Average intensity, Gene Description, Gene Symbol, Log2 FC, Log-odds statistics, T statistics, AffyID

58 Differential Expression
- Use the “Table of Genes Ranked in order of Differential Expression” to filter the genes and export the normalized expression values.
- Plot differentially expressed genes with raw p-value ≤ 0.05 and an absolute fold change ≥ 1 for the two contrasts.
- Using “Venn Diagram between probe set lists”, evaluate the level of overlap between the two sets. Hint: make two sets from the two contrasts.
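
Outside the GUI, the selection and overlap steps of this exercise amount to a few lines of base R. The sketch below starts from two hypothetical ranked tables whose column names follow limma's `topTable` output (`logFC`, `P.Value`). All values are invented, the `ID` column and the helper `select_de` are illustrative names, and the fold change cut-off of 1 is assumed to be on the log2 scale.

```r
# Hypothetical ranked tables for two contrasts; values are invented.
# logFC is the log2 fold change and P.Value the raw p-value, as in limma's topTable.
contrast1 <- data.frame(
  ID      = c("ps1", "ps2", "ps3", "ps4", "ps5"),
  logFC   = c(2.1, -1.4, 0.6, 1.2, -0.2),
  P.Value = c(0.001, 0.020, 0.010, 0.200, 0.700),
  stringsAsFactors = FALSE
)
contrast2 <- data.frame(
  ID      = c("ps1", "ps2", "ps3", "ps4", "ps5"),
  logFC   = c(1.8, -0.3, 1.5, 1.1, -1.9),
  P.Value = c(0.004, 0.400, 0.030, 0.010, 0.049),
  stringsAsFactors = FALSE
)

# Keep probe sets with raw p-value <= 0.05 and absolute log2 fold change >= 1.
select_de <- function(tab, p_cut = 0.05, lfc_cut = 1) {
  tab$ID[tab$P.Value <= p_cut & abs(tab$logFC) >= lfc_cut]
}
set1 <- select_de(contrast1)
set2 <- select_de(contrast2)

# The three regions of a two-set Venn diagram.
venn_counts <- c(only_contrast1 = length(setdiff(set1, set2)),
                 both           = length(intersect(set1, set2)),
                 only_contrast2 = length(setdiff(set2, set1)))
print(set1)
print(set2)
print(venn_counts)
```

Raw p-values are used here because the exercise asks for them. For a real analysis the adjusted p-values from the same table are the safer choice when thousands of probe sets are tested.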

59 Thank you
http://catalyst.harvard.edu
Reddy Gali, Ph.D.

