Microarrays 1/31/2018.

Slides:



Advertisements
Similar presentations
Randa Stringer Supervisor: Dr. Guillaume Par é A review of quality control and pre- processing measures for the Illumina 450K BeadChip.
Advertisements

Introduction to Microarray Gene Expression
M. Kathleen Kerr “Design Considerations for Efficient and Effective Microarray Studies” Biometrics 59, ; December 2003 Biostatistics Article Oncology.
Bioconductor in R with a expectation free dataset Transcriptomics - practical 2012.
Microarray Quality Assessment Issues in High-Throughput Data Analysis BIOS Spring 2010 Dr Mark Reimers.
Microarray Normalization
Microarray technology and analysis of gene expression data Hillevi Lindroos.
DNA microarray and array data analysis
DNA Microarray Bioinformatics - #27612 Normalization and Statistical Analysis.
Low-Level Analysis and QC Regional Biases Mark Reimers, NCI.
The Human Genome Project and ~ 100 other genome projects:
DNA Arrays …DNA systematically arrayed at high density, –virtual genomes for expression studies, RNA hybridization to DNA for expression studies, –comparative.
Data analytical issues with high-density oligonucleotide arrays A model for gene expression analysis and data quality assessment.
Information Aspects of Nucleic Acids Measurement Technologies Description of nucleic acid measurement technologies Algorithmic, optimization, data analysis.
ViaLogy Lien Chung Jim Breaux, Ph.D. SoCalBSI 2004 “ Improvements to Microarray Analytical Methods and Development of Differential Expression Toolkit ”
Introduce to Microarray
Genomics I: The Transcriptome RNA Expression Analysis Determining genomewide RNA expression levels.
Microarray Preprocessing
with an emphasis on DNA microarrays
CDNA Microarrays Neil Lawrence. Schedule Today: Introduction and Background 18 th AprilIntroduction and Background 25 th AprilcDNA Mircoarrays 2 nd MayNo.
Introduction to DNA Microarray Technology Steen Knudsen Uma Chandran.
CDNA Microarrays MB206.
Data Type 1: Microarrays
Gene Expression Data Qifang Xu. Outline cDNA Microarray Technology cDNA Microarray Technology Data Representation Data Representation Statistical Analysis.
Probe-Level Data Normalisation: RMA and GC-RMA Sam Robson Images courtesy of Neil Ward, European Application Engineer, Agilent Technologies.
Literature reviews revised is due4/11 (Friday) turn in together: revised paper (with bibliography) and peer review and 1st draft.
Microarray - Leukemia vs. normal GeneChip System.
Bioconductor in R with a expectation free dataset Transcriptomics - practical 2014.
Epigenetic Analysis BIOS Statistics for Systems Biology Spring 2008.
A Short Overview of Microarrays Tex Thompson Spring 2005.
Lawrence Hunter, Ph.D. Director, Computational Bioscience Program University of Colorado School of Medicine
Intro to Microarray Analysis Courtesy of Professor Dan Nettleton Iowa State University (with some edits)
Comparison of Microarray Data Generated from Degraded RNA using Five Different Target Synthesis Methods and Commercial Microarrays Scott Tighe and Tim.
Summarization of Oligonucleotide Expression Arrays BIOS Winter 2010.
Introduction to Statistical Analysis of Gene Expression Data Feng Hong Beespace meeting April 20, 2005.
1 Global expression analysis Monday 10/1: Intro* 1 page Project Overview Due Intro to R lab Wednesday 10/3: Stats & FDR - * read the paper! Monday 10/8:
Design of Micro-arrays Lecture Topic 6. Experimental design Proper experimental design is needed to ensure that questions of interest can be answered.
Idea: measure the amount of mRNA to see which genes are being expressed in (used by) the cell. Measuring protein might be more direct, but is currently.
Introduction to Microarrays Kellie J. Archer, Ph.D. Assistant Professor Department of Biostatistics
ANALYSIS OF GENE EXPRESSION DATA. Gene expression data is a high-throughput data type (like DNA and protein sequences) that requires bioinformatic pattern.
DNA Microarray Overview and Application. Table of Contents Section One : Introduction Section Two : Microarray Technique Section Three : Types of DNA.
Oigonucleotide (Affyx) Array Basics Joseph Nevins Holly Dressman Mike West Duke University.
Gene expression  Introduction to gene expression arrays Microarray Data pre-processing  Introduction to RNA-seq Deep sequencing applications RNA-seq.
Distinguishing active from non active genes: Main principle: DNA hybridization -DNA hybridizes due to base pairing using H-bonds -A/T and C/G and A/U possible.
Statistical Analysis for Expression Experiments Heather Adams BeeSpace Doctoral Forum Thursday May 21, 2009.
Affymetrix User’s Group Meeting Boston, MA May 2005 Keynote Topics: 1. Human genome annotations: emergence of non-coding transcripts -tiling arrays: study.
Microarray: An Introduction
Detecting DNA with DNA probes arrays. DNA sequences can be detected by DNA probes and arrays (= collection of microscopic DNA spots attached to a solid.
The Central Dogma. Life - a recipe for making proteins DNA protein RNA Translation Transcription.
Arrays How do they work ? What are they ?. WT Dwarf Transgenic Other species Arrays are inverted Northerns: Extract target RNA YFG Label probe + hybridise.
Some statistical musings Naomi Altman Penn State 2015 Dagstuhl Workshop.
Transcriptomics History and practice.
Microarray - Leukemia vs. normal GeneChip System.
Microarray Experiment Design and Data Interpretation
Genomic analysis: Toward a new approach in breast cancer management
Functional Genomics in Evolutionary Research
Functional Genomics in Evolutionary Research
Microarray Technology and Applications
 The human genome contains approximately genes.  At any given moment, each of our cells has some combination of these genes turned on & others.
Lecture 11 By Shumaila Azam
Overview Expression data basics Introduction Biological network data
The Basics of Microarray Image Processing
Precision Profiling and Components of Variability Analysis for Affymetrix Microarray Assays Run in a Clinical Context  Thomas M. Daly, Carmen M. Dumaual,
Introduction to Microarrays.
Transcriptomics History and practice.
Getting the numbers comparable
Gene Expression Analysis
Microarray Data Analysis
Data Type 1: Microarrays
Design Issues Lecture Topic 6.
Presentation transcript:

Microarrays 1/31/2018

Acknowledgement This lecture was, in part, designed to be consistent with lecture material from Johns Hopkins University. Please see the link below for more information. http://www.ams.jhu.edu/~dan/550.435/notes/COURSENOTES435.pdf

Central Dogma of Biology Microarrays are a tool that help figure out why, when, and to what extent genes are transcribed

Definitions Gene expression == mRNA abundance Increased expression == induced, or up-regulated Decreased expression == down-regulated Difference in expression == differential expression

Research Questions Which genes are expressed in a given context? Which genes are expressed differently between different contexts? Which genes’ expression correlate with a variable of interest? What do the (differentially) expressed expressed genes tell us about the biological processes of the system?

“A collections of microscopic DNA spots attached to a solid surface” What Is A Microarray? “A collections of microscopic DNA spots attached to a solid surface”

What Is A Microarray?

Microarray Experiments Experimental questions: Disease vs Normal tissue Changes in expression over time Before and after drug treatment Before and after toxin exposure Different tumor subtypes Data driven questions: Which genes are differentially expressed? Which genes are co-expressed? Given gene expression, can we predict a condition?

Impact of Microarrays 82,731 publications in PubMed Breast cancer prognosis tool Breast cancer predicts chemotherapy response Colon cancer assay predicts risk of recurrence Prostate cancer assay predicts risk of progression

Microarray Study Design Biological Question 1. Setup 2. Data pre-processing 3. Data analysis 4. Interpretation

Two Types of Microarrays Spotted arrays High density oligonucleotide arrays

Oligo Arrays (in situ) Also called Affy arrays (orig. Affymetrix) Probe - a unique, short cDNA sequence Probes synthesized on the chip Terminology: probe - an individual 25nt sequence probe cell - millions of copies of a probe placed together probe set - a set of unique probes for a given gene

Oligo Arrays (in situ) Each gene is represented by a set of probes Probe sets Unique to each gene Multiple probe cells tiled across exons

Oligo Arrays (in situ)

Oligo Arrays - Labeling & Scanning Sample cDNA tagged w/ fluorescent marker Sample washed over flowcell Unbound cDNA washed away Luminescence quantified in a scanner

Oligo Arrays - Older Technologies Affymetrix U133A & B Probe cells grouped together, led to biases and artifacts

Oligo Arrays - Older Technologies Used perfect match and mismatch probes Mismatch probes contain...mismatch in probe sequence to account for non-specific binding

Oligo Arrays - Technology Update U133 Affy arrays designed against 2001 draft human genome Mismatch probes didn’t work well Genome reference improved, more genes needed to be profiled Gene ST array created and current

Oligo Arrays - Current Technology Several chips: 3’ IVT - gene expression, measures 3’ UTR Gene - gene expression, tiled sequence Exon - DNA sequence, exon sequences only Tiling - DNA sequence, tiled across genome

Microarray Analysis Methods

Typical Processing Steps

Normalization Statistically adjust a set of expression matrices so they are comparable Includes: Background correction - remove artifacts/noise Data normalization - adjust probe distributions across arrays to ensure comparability Probe summarization - combine probe intensities within probeset to gene-level

Normalization

Old Normalization Method: MAS5 Measured value = Noise + Probe Effects + Signal Background correction Intensity adjustment Examine perfect/mismatch probe ratio Calculate % present AUC The good - usable with single chip, p-value for gene expression The bad - uses mismatch probes, very complicated

State of the Art Normalization: RMA Robust Multi-array Average (RMA) Background correction - kernel density estimation based Normalization - quantile normalization across batch of arrays Summarization - linear additive model controlling for probe effects, gene expression, and error

RMA Normalization

Quality Control - RNA Quality RIN - RNA Integrity Number Ranges 0-9 RNA Degradation Plot RNA degrades 5’ -> 3’ Probe intensity tends to increase accordingly Slope >2 may indicate poor RNA quality

Quality Control - RLE Subtract median intensity across all arrays from each probe Median RLE != 0 -> number of up/down genes not the same Large IQR means many genes are differentially expressed

Quality Control - NUSE Normalized Unscaled Standard Error Arrays w/ large IQR or median NUSE > 1 may be poor quality

Quality Control - PCA Principal Component Analysis Data dimensionality reduction technique Identifies “directions of variance”

Quality Control Rules of Thumb RLE, NUSE, RNA Degradation, and PCA No single metric is indicative of poor quality Flagged samples should be examined before omission Make decisions based on multiple criteria Rule of thumb: Median RLE > 0.10 Median NUSE > 1.05 RIN < 4.0

Correcting for Batch Effects Batch effects - variation in data due to technical effects: Arrays processed on different days The technician who ran the arrays Sample collection site Variance in reagent type/quality Confounding - when batch effects are correlated with condition of interest ComBat - R package to correct batch effects

Correcting for Batch Effects

R Primer and Demonstration

R R is a statistical computing language Free and open source Designed for statistics, data analysis, and visualization Can be run on command line More commonly run with rstudio

Bioconductor

Project Principles

Kindness Patience Acceptance Trust Honesty KPATH

Trust Honesty Patience Acceptance Kindness THPAK

Most ideas are “bad” Many “bad” ideas lead to “good” ideas

You are not your ideas Your ideas are not you

This Room Is A Safe Place To Dissent Remember: KPATH (or THPAK)

Project Setup Suggested project directory structure: project_1/ samples/ ← save sample files here reference/ ← annotations, etc analysis/ ← all code README.txt ← text description of project

Use Relative Paths Portable code in analysis/my_script.R: sample.info <- read.csv(‘../reference/annot.csv’) Nonportable code: sample.info <- read.csv( ‘/projectnb/bf528/group_1/project_1/reference/annot.csv’ )

For Next Time Read project description and paper Familiarize yourself with the Workshop 4 on R, up through the Data Wrangling section http://foundations-in-computational-skills.readthedocs.io