Microarray Data Analysis of Illumina Data Using R/Bioconductor
Reddy Gali, Ph.D.
rgali@hms.harvard.edu
submit-c3-bioinformatics@rt.med.harvard.edu
http://catalyst.harvard.edu

Agenda
- Introduction to microarrays
- Workflow of a gene expression microarray experiment
- Microarray experimental design
- Public microarray databases
- Microarray preprocessing: quality control and diagnostic analysis

Agenda
- Introduction to R/Bioconductor
- Installation of R and Bioconductor packages
- General data analysis and strategies
- Data analysis using the lumi package
- Data analysis using the limma package

Workflow of Gene Expression
- Biological question
- Experimental design
- QC
- Tissue / sample preparation
- Extraction of total RNA
- Probe amplification & labeling
- Microarray hybridization & processing
- Image analysis
- Data analysis: expression measures, normalization, statistical filtering, clustering, pathway analysis
- Biological verification

Pitfalls of Microarray Experiment
Gene expression changes detected by microarray analysis cannot be validated by other methods. Possible reasons:
- Inadequate design
- Data quality is low
- Statistical approach is not adequate
- Expression level of the gene is below the detection limit
- Change in gene expression is small
- Microarray detection probe is not specific or not sensitive

Questions usually asked
- What kind of technology or microarray should I use?
- How many replicates do I need?
- What is a real replicate?
- Do I need statistical advice?
- Should I do technical replicates?
- Should I pool my samples?
- How do I analyze my dataset?
- What software should I use?

Design of Microarray Experiment
Replicates:
- Goal, resources, technology, quality, design and analysis
- Two-fold change: 3 replicates
- Smaller change: 5 replicates
- Technical replicates and biological replicates
Sample pooling:
- Amount of sample
- Replicates of pooled sample
- No way to find variance between samples
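The replicate guidance above can be explored with a quick power calculation. The sketch below uses base R's power.t.test. The per-gene standard deviation (0.3 on the log2 scale), the 5% significance level, the 80% power target, and the choice of a 1.5-fold change as the "smaller change" are illustrative assumptions, not values from the slides. A per-gene t-test also ignores multiple testing, so treat the result as a rough guide only.

```r
# Replicates per group needed to detect a given log2 fold change
# with a two-sample t-test. All numeric settings are assumptions.
assumed_sd <- 0.3  # hypothetical per-gene SD on the log2 scale

replicates_needed <- function(log2_fold_change, sd = assumed_sd) {
  result <- power.t.test(delta = log2_fold_change, sd = sd,
                         sig.level = 0.05, power = 0.8)
  ceiling(result$n)  # round up to whole samples
}

n_two_fold <- replicates_needed(log2(2))    # two-fold change
n_smaller  <- replicates_needed(log2(1.5))  # hypothetical smaller change
print(c(two_fold = n_two_fold, smaller_change = n_smaller))
```

A smaller fold change always needs more replicates than a larger one at the same variability. The exact counts depend on the assumed standard deviation.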

Gene Expression Omnibus (GEO)

Public Microarray Databases
- BodyMap - http://bodymap.ims.u-tokyo.ac.jp/
- SMD - http://genome-www5.stanford.edu/
- RIKEN - http://read.gsc.riken.go.jp/
- MGI - http://www.informatics.jax.org/
- GEO - http://www.ncbi.nlm.nih.gov/geo/
- CIBEX - http://cibex.nig.ac.jp/index.jsp
- ArrayExpress - http://www.ebi.ac.uk/microarray-as/ae/

Microarray Platforms
- Agilent Microarrays: 60-mer format
- Codelink Bioarrays: 30-mer format
- Affymetrix GeneChips: 25-mer format
- Illumina BeadChips
- NimbleGen: 60-mer format

Illumina Bead Array Technology
- Silica beads
- Each bead is covered with hundreds of thousands of copies of a specific oligonucleotide

Some Facts
- Each bead carries many copies of one probe, and each array holds, on average, 30 replicates of every bead type.
- Around 10^5 copies of a particular DNA sequence of interest are covalently attached to each bead.
- The DNA sequences (oligonucleotides) attached to the beads are 75 base pairs in length, with 25 base pairs used for decoding and 50 base pairs used for target hybridization.
- A pool of different bead types is created; beads of the same type have the same probe sequence attached.

Box Plots of unnormalized data

Raw vs Normalized data (panels: Raw Data, Normalized Data)

Histograms of unnormalized data

Why Normalize
Normalization adjusts the individual hybridization intensities to balance them appropriately so that meaningful biological comparisons can be made.
- Unequal quantities of starting RNA
- Differences in labeling or detection efficiencies between the fluorescent dyes used
- Systematic biases in the measured expression levels
- Sample preparation
- Variability in hybridization
- Spatial effects
- Scanner settings
- Experimenter bias
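To make the first item concrete, here is a minimal base R sketch of one simple normalization, median centering on the log2 scale. The intensity values are made up for illustration: arrayB carries the same expression pattern as arrayA but with twice the starting RNA. Median centering is used here only as an easy-to-follow technique; it is not the quantile method that lumi applies by default.

```r
# Hypothetical raw intensities: arrayB = 2 x arrayA (unequal starting RNA)
raw <- cbind(arrayA = c(100, 200, 400, 800, 1600),
             arrayB = c(200, 400, 800, 1600, 3200))

# Work on the log2 scale so that a global scaling becomes a constant shift
log_expr <- log2(raw)

# Subtract each array's median so that all arrays share a median of zero
array_medians <- apply(log_expr, 2, median)
centered <- sweep(log_expr, 2, array_medians, "-")

print(centered)
```

After centering, the two columns agree, so the remaining differences between arrays would reflect biology rather than the amount of RNA loaded.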

Free Software - Data analysis
- Bioconductor is an open source and open development software project to provide tools for the analysis and comprehension of genomic data.
- TMEV 4.0 is an application that allows the viewing of processed microarray slide representations and the identification of genes and expression patterns of interest.

R / Bioconductor
R and Bioconductor packages
- R (http://cran.r-project.org/) is a comprehensive statistical environment and programming language for professional data analysis and graphical display.
- Bioconductor (http://www.bioconductor.org/) is an open source and open development software project for the analysis of microarray, sequence and genome data.
- More than 300 Bioconductor packages.
- http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/R_BioCondManual.html

R / Bioconductor - Installation
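The installation steps on this slide were shown as screenshots and are not in the transcript. As a hedged sketch, the function below installs the two packages used later (lumi and limma) through BiocManager, the installer currently distributed by the Bioconductor project. The function name install_workshop_packages is hypothetical, and the original workshop may have used a different installation route.

```r
# Hypothetical helper: install the Bioconductor packages used in this workshop.
# Defined but not called here, because installation needs network access.
install_workshop_packages <- function(pkgs = c("lumi", "limma")) {
  # BiocManager itself is installed from CRAN if it is missing
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(pkgs)
}
```

Calling install_workshop_packages() once in a fresh R session should be enough; afterwards the packages are loaded with library(lumi) and library(limma).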

Preparing R for analysis

Analysis using lumi R package
- Loading data into R/Bioconductor
> library(lumi)
> lumi.Rdata <- lumiR('worshop_data.csv')
- Summary of the loaded data
> lumi.Rdata
- Quality control of loaded data
> summary(lumi.Rdata, 'QC')

> density(lumi.Rdata)

> boxplot(lumi.Rdata)

> MAplot(lumi.Rdata)

> plot(lumi.Rdata, what='sampleRelation')
> plot(lumi.Rdata, what='cv')
> plot(lumi.Rdata, what='outlier')

Variance Stabilization
> lumi.Tdata <- lumiT(lumi.Rdata)
> lumi.VSdata <- plotVST(lumi.Tdata)

Normalization
> lumi.Ndata <- lumiN(lumi.Tdata)
Or do all the default preprocessing in one step
> lumi.N.Q <- lumiExpresso(lumi.Rdata)
- Background correction: bgAdjust
- Variance stabilizing transform method: vst
- Normalization method: quantile
Perform all the QC again
> summary(lumi.Ndata, 'QC')
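To show what the quantile normalization method does, here is a simplified base R sketch. It is not the lumi implementation: the function name quantile_normalize and the example matrix are hypothetical, ties are broken by order of appearance rather than averaged, and missing values are not handled.

```r
# Simplified quantile normalization (illustration only, not lumi's code).
# x: numeric matrix with probes in rows and arrays in columns, no NAs.
quantile_normalize <- function(x) {
  # Rank of every probe within its own array
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Reference distribution: mean of the k-th smallest value across arrays
  reference <- rowMeans(apply(x, 2, sort))
  # Replace each value by the reference value at the same rank
  normalized <- apply(ranks, 2, function(r) reference[r])
  dimnames(normalized) <- dimnames(x)
  normalized
}

# Hypothetical intensities for four probes on three arrays
x <- cbind(s1 = c(2, 5, 4, 3),
           s2 = c(10, 40, 20, 30),
           s3 = c(1.5, 3.0, 2.5, 2.0))
qn <- quantile_normalize(x)
print(qn)
```

After this step every array has exactly the same set of values, and only the ordering of probes within each array differs. This is why box plots of quantile-normalized arrays line up.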

Differential expression
> library(limma)
> design <- model.matrix(~ -1 + factor(c(1, 1, 1, 1, 2, 2, 2, 2)))
> colnames(design) <- c("control", "affected")
> fit <- lmFit(lumi.Ndata, design)
> cont.matrix <- makeContrasts(signature = affected - control, levels = design)
> fit2 <- contrasts.fit(fit, cont.matrix)
> ebFit <- eBayes(fit2)
> results <- topTable(ebFit, number = 100, sort.by = "B", resort.by = "M")
> print(results)
> write.table(topTable(ebFit, coef = 1, adjust = "fdr", sort.by = "B", number = 25000), file = "results.xls", row.names = FALSE, sep = "\t")
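The design matrix and the contrast in the commands above can be checked on a single probe using base R only. In this sketch the eight expression values are invented for illustration, and lm.fit stands in for limma's lmFit; the empirical Bayes moderation done by eBayes is not reproduced. The last lines show that "fdr" in p.adjust is the Benjamini-Hochberg method, using made-up p-values.

```r
# Four control and four affected samples, as in the design above
group <- factor(c(1, 1, 1, 1, 2, 2, 2, 2), labels = c("control", "affected"))
design <- model.matrix(~ -1 + group)
colnames(design) <- levels(group)

# Hypothetical log2 expression values for one probe
expr <- c(7.9, 8.1, 8.0, 8.2, 9.0, 9.2, 8.9, 9.1)

# Without an intercept, each coefficient is simply a group mean
fit <- lm.fit(design, expr)
group_means <- fit$coefficients

# The contrast "affected - control" is the log2 fold change
contrast <- c(control = -1, affected = 1)
log_fc <- sum(contrast * group_means[names(contrast)])
print(log_fc)

# "fdr" and "BH" name the same adjustment in p.adjust
p_values <- c(0.001, 0.01, 0.02, 0.5)  # made-up p-values
print(p.adjust(p_values, method = "fdr"))
```

Because the design has no intercept, the contrast reads directly as the mean of the affected samples minus the mean of the control samples.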

Thank you
Reddy Gali, Ph.D.
rgali@hms.harvard.edu
Phone: 617 432 7471
http://catalyst.harvard.edu