Microarray Data Analysis with R/Bioconductor. Cao Zongfu (曹宗富), CapitalBio Corporation (博奥生物有限公司), 2011.5.28.

Outline
– Introduction to Microarray
– Introduction to R/Bioconductor
– Expression Profiling Analysis Using R/Bioconductor

Introduction to Microarray
DNA
– Array-based SNP detection
– Array-based CNV detection
– DNA methylation microarray
RNA
– Gene expression profiling microarray
– MicroRNA microarray
Protein
Cell
Applications
– Human health: prediction, prevention, personalization
– Species identification (pathogens, bacteria)
– Breeding

Introduction to Microarray
[Figure: microarray workflow: sample, labeling, target, hybridization to probes, image, data analysis]

Introduction to Microarray
Data analysis steps:
– Quality assessment
– Background adjustment: corrects for non-specific hybridization and for noise in the optical detection system
– Normalization: removes systematic differences caused by
  – different efficiencies of reverse transcription, labeling, or hybridization reactions
  – physical problems with the arrays
  – reagent batch effects
  – laboratory conditions
– Summarization: combines the multiple probes that measure one transcript into a single value
– Non-specific filtering
– Differentially expressed genes
– Multiple testing
– Heatmap
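To make the normalization step concrete, here is a minimal base-R sketch of quantile normalization. This is the between-array normalization used by RMA. The function name `quantile_normalize` and the simulated data are illustrative assumptions. Ties are broken by order, which is a simplification. In practice, `preprocessCore::normalize.quantiles()` or `limma::normalizeBetweenArrays(x, method = "quantile")` would be used instead.

```r
# Quantile normalization sketch (base R only, hypothetical helper name):
# give every array (column) the same intensity distribution.
quantile_normalize <- function(x) {
  # x: numeric matrix, probes in rows, arrays in columns, no NAs assumed
  ranks  <- apply(x, 2, rank, ties.method = "first")  # rank within each array
  sorted <- apply(x, 2, sort)                         # sort each array
  ref    <- rowMeans(sorted)                          # reference distribution
  out    <- apply(ranks, 2, function(r) ref[r])       # map ranks back to ref
  dimnames(out) <- dimnames(x)
  out
}

set.seed(1)
raw <- matrix(2^rnorm(20, mean = 8), nrow = 5,
              dimnames = list(paste0("probe", 1:5), paste0("array", 1:4)))
raw[, 2] <- raw[, 2] * 3               # simulate one brighter array
norm <- quantile_normalize(log2(raw))
apply(norm, 2, sort)                   # every column has the same sorted values
```

Each column of `norm` is a permutation of the same reference values. The brightness offset of `array2` therefore disappears, while the within-array ordering of probes is preserved.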

Introduction to R
– Created by Robert C. Gentleman and Ross Ihaka
– R vs. S, SAS, Matlab, Stata
– Started in 1992, first emerged in 1996
– Free, open-source software
– Works alongside Perl, C, Java

Robert C. Gentleman
– (to present) Senior Director, Bioinformatics and Computational Biology, Genentech
– 2004 to Aug. 2009, Adjunct Professor, Department of Statistics, University of Washington, Seattle, WA
– Adjunct Associate Professor, Department of Biostatistics, Harvard University, Boston, MA
– Visiting Professor, University of Ghent, Ghent, Belgium
– Associate Professor, Dana-Farber Cancer Institute and Harvard University, Department of Biostatistics
– 2001: Bioconductor project (NIH)
– Visiting Scholar, Harvard University, School of Public Health, Department of Biostatistics
– Senior Research Fellow, University of Auckland, Clinical Trials Research Unit, Department of Medicine
– Senior Lecturer, University of Auckland, Department of Statistics
– Lecturer, University of Auckland, Department of Mathematics and Statistics
– Developed R at the University of Auckland
– Assistant Professor, University of Waterloo, Department of Statistics and Actuarial Science

Introduction to Bioconductor
R / Bioconductor:
– The Bioconductor project started in 2001. It is overseen by a core team, based primarily at the Fred Hutchinson Cancer Research Center, together with members from other US and international institutions.
– It gained widespread exposure through a 2004 Genome Biology paper.

Introduction to Bioconductor
Background
– Bioconductor provides tools for the analysis and comprehension of high-throughput genomic data.
– Bioconductor uses the R statistical programming language and is open source and open development.
– It has two releases each year, more than 460 packages (as of 2011), and an active user community.

Bioconductor Books
– Bioinformatics and Computational Biology Solutions Using R and Bioconductor
– R Programming for Bioinformatics
– Bioconductor Case Studies

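Install Bioconductor Packages
– Install R
– Install a selection of core Bioconductor packages:
> source("
> biocLite()
– Install a particular package, e.g., limma:
> biocLite("limma")
> biocLite(c("GenomicFeatures", "AnnotationDbi"))   # several packages at once
(Current Bioconductor releases replace biocLite() with BiocManager::install(), e.g. BiocManager::install("limma").)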

Bioconductor Mailing Lists Search Mailing Lists

User Guides and Package Vignettes

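Expression Profiling Analysis
Preprocessing: Oligonucleotide Arrays
library(affy)
cel <- ReadAffy()    # input data: reads the CEL files in the working directory into an AffyBatch
eset <- expresso(cel, bgcorrect.method = "rma", normalize.method = "quantiles",
                 pmcorrect.method = "pmonly", summary.method = "medianpolish")
                     # background adjustment, normalization, summarization
eset <- justRMA()    # RMA straight from the CEL files; faster and lighter on memory
exprs(eset)          # extract the expression matrix
library(simpleaffy)
target <- 100        # MAS5 scaling target (assumed value)
ampli.eset <- call.exprs(cel, "mas5", sc = target)
qcs <- qc(cel, ampli.eset)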
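RMA's summarization step fits a median polish to the log2 probe intensities of each probe set. A minimal base-R sketch of that step, on one toy probe set, uses `stats::medpolish()`. The helper name `summarize_probeset` and the intensity values are made up for illustration. In affy this step runs inside `expresso(..., summary.method = "medianpolish")` or `justRMA()` after background correction and quantile normalization.

```r
# Median-polish summarization sketch (hypothetical helper, base R only):
# collapse the probes of one probe set into one log2 value per array.
summarize_probeset <- function(log_pm) {
  # log_pm: log2 PM intensities, probes in rows, arrays in columns
  fit <- medpolish(log_pm, trace.iter = FALSE)  # probe (row) + array (col) effects
  fit$overall + fit$col                         # array-level expression values
}

log_pm <- rbind(
  p1 = c(7.1, 8.0, 7.0),
  p2 = c(9.2, 10.1, 9.1),
  p3 = c(6.0, 7.0, 5.9),
  p4 = c(8.3, 9.2, 8.2)
)
colnames(log_pm) <- c("A1", "A2", "A3")
round(summarize_probeset(log_pm), 2)
```

Median polish removes probe-specific affinity (the row effects) before reporting array effects. This makes the summary robust to a single outlying probe.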

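Expression Profiling Analysis
Preprocessing: Two-Color Spotted Arrays
library(limma)
targets <- readTargets("targets.txt")                 # sample sheet with a FileName column (hypothetical file)
RG <- read.maimages(targets, source = "genepix")      # input data (GenePix output assumed)
RG <- backgroundCorrect(RG, method = "normexp")       # background adjustment
MA <- normalizeWithinArrays(RG, method = "loess")     # normalize within arrays
MA <- normalizeBetweenArrays(MA, method = "Aquantile") # normalize between arrays
E <- exprs.MA(MA)                                     # extract expression values
E <- avereps(E, ID = MA$genes$ID)                     # summarize replicate spots
plotMA(MA)                                            # MA plot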
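Within-array normalization for two-color arrays works on the log-ratio M = log2(R/G) against the average log intensity A = (log2 R + log2 G)/2. A minimal base-R sketch of a global loess normalization follows, under these assumptions: the helper names and simulated dye bias are hypothetical, and print-tip groups are ignored. limma's `normalizeWithinArrays()` defaults to print-tip loess instead.

```r
# M/A transformation and global loess normalization sketch (base R only).
ma_transform <- function(R, G) {
  list(M = log2(R) - log2(G),           # log-ratio
       A = (log2(R) + log2(G)) / 2)     # average log intensity
}

loess_normalize <- function(M, A, span = 0.3) {
  fit <- loess(M ~ A, span = span, degree = 1)
  M - fitted(fit)                       # subtract the intensity-dependent bias
}

set.seed(2)
G <- 2^runif(500, 6, 14)                              # simulated green channel
R <- G * 2^(0.5 + 0.1 * (log2(G) - 10)) *             # simulated dye bias
     2^rnorm(500, sd = 0.2)                           # plus measurement noise
ma <- ma_transform(R, G)
M_norm <- loess_normalize(ma$M, ma$A)
c(before = median(ma$M), after = median(M_norm))
```

Plotting `ma$M` against `ma$A` before and after (an MA plot) shows the curved trend flattening around zero.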

Expression Profiling Analysis
Non-specific filtering
– Intensity-based
– Variability across samples
– Fraction of Present calls
– R package: genefilter
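A minimal base-R sketch of the first two filters follows. It keeps genes that are bright enough somewhere and whose interquartile range across samples is at least the median IQR, which is similar in spirit to genefilter's IQR-based variance filter. The helper name, the intensity cutoff of 7 (log2 scale), and the simulated data are assumptions. The Present-call filter is omitted because it needs MAS5 detection calls.

```r
# Non-specific filtering sketch (hypothetical helper and thresholds).
nonspecific_filter <- function(x, min_intensity = 7, iqr_quantile = 0.5) {
  # x: log2 expression matrix, genes in rows, samples in columns
  bright   <- apply(x, 1, max) >= min_intensity    # intensity-based filter
  iqrs     <- apply(x, 1, IQR)                     # variability across samples
  variable <- iqrs >= quantile(iqrs, iqr_quantile)
  x[bright & variable, , drop = FALSE]
}

set.seed(3)
expr <- matrix(rnorm(1000 * 6, mean = 8, sd = 0.3), nrow = 1000,
               dimnames = list(paste0("g", 1:1000), paste0("s", 1:6)))
expr[1:50, 4:6]  <- expr[1:50, 4:6] + 2   # 50 genes that change between groups
expr[951:1000, ] <- expr[951:1000, ] - 4  # 50 dim genes near background
kept <- nonspecific_filter(expr)
dim(kept)
```

Filtering before testing reduces the number of hypotheses, which eases the multiple-testing burden in the next step. The filter ignores group labels, hence "non-specific".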

Expression Profiling Analysis
Differentially expressed genes
library(samr)
samr()           # Significance Analysis of Microarrays
library(multtest)
mt.rawp2adjp()   # adjusted p-values for simple multiple testing procedures
library(limma)
lmFit()          # linear model for a series of arrays
eBayes()         # empirical Bayes statistics for differential expression
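A minimal base-R sketch follows. It runs a per-gene two-sample Welch t-test and then applies a Benjamini-Hochberg FDR adjustment with `p.adjust()`, one of the procedures `mt.rawp2adjp()` also offers. This is a plain stand-in, not the moderated statistics of samr or limma's `eBayes()`. The helper name and the simulated data are hypothetical.

```r
# Per-gene t-test plus BH adjustment (hypothetical helper, base R only).
de_ttest <- function(x, group) {
  # x: log2 expression matrix (genes x samples); group: two-level labels
  group <- factor(group)
  stopifnot(nlevels(group) == 2)
  p <- apply(x, 1, function(v) t.test(v ~ group)$p.value)  # Welch t-test
  data.frame(gene = rownames(x), p = p,
             fdr = p.adjust(p, method = "BH"),             # FDR control
             row.names = NULL)
}

set.seed(4)
expr <- matrix(rnorm(200 * 6, mean = 8, sd = 0.2), nrow = 200,
               dimnames = list(paste0("g", 1:200), paste0("s", 1:6)))
expr[1:10, 4:6] <- expr[1:10, 4:6] + 3       # 10 truly changed genes
grp <- c("ctrl", "ctrl", "ctrl", "trt", "trt", "trt")
res <- de_ttest(expr, grp)
head(res[order(res$p), ], 10)
```

With only three arrays per group, the per-gene variance estimate is noisy. This is the motivation for SAM's fudge factor and limma's empirical Bayes shrinkage.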

Expression Profiling Analysis
Clustering and visualization
library(amap)
hcluster()                 # hierarchical clustering; more efficient than hclust()
dist()                     # distance matrix computation (base R stats)
library(ctc)
r2gtr(); r2atr(); r2cdt()  # write gtr, atr, cdt files for TreeView
library(gplots)
heatmap.2()                # extended version of the standard R heatmap()
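A minimal base-R sketch shows the clustering idea behind these packages. It uses 1 minus the Pearson correlation as the gene-to-gene distance, average-linkage `hclust()`, `cutree()` into k groups, and the tree passed to `heatmap()`. The helper name and the two simulated expression patterns are assumptions. amap's `hcluster()` and gplots' `heatmap.2()` offer faster or richer versions of the same steps.

```r
# Hierarchical clustering sketch (hypothetical helper, base R only).
cluster_genes <- function(x, k = 2) {
  d  <- as.dist(1 - cor(t(x)))       # correlation distance between genes (rows)
  hc <- hclust(d, method = "average")
  list(tree = hc, cluster = cutree(hc, k = k))
}

set.seed(5)
up   <- matrix(rep(c(0, 0, 0, 2, 2, 2), 10), nrow = 10, byrow = TRUE)
down <- matrix(rep(c(2, 2, 2, 0, 0, 0), 10), nrow = 10, byrow = TRUE)
expr <- rbind(up, down) + matrix(rnorm(120, sd = 0.3), nrow = 20)
rownames(expr) <- paste0("g", 1:20)
res <- cluster_genes(expr, k = 2)
table(res$cluster)
heatmap(expr, Rowv = as.dendrogram(res$tree), scale = "row")
```

A correlation distance groups genes by the shape of their profiles rather than their absolute level. This is usually what is wanted for co-expression heatmaps.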

Expression Profiling Analysis
Workflow
– Integration
– Independence
Methods
– Write R scripts/functions for each step
– Call the scripts according to the analysis demand, e.g. from DOS with R CMD BATCH SAM.r, or from Perl, etc.
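The "one function per step, call what the analysis needs" approach can be sketched in R as a named list of step functions plus a small driver. The step names, the driver, and the toy matrix are all hypothetical. Real steps would wrap the affy, limma, genefilter, and clustering calls from the previous slides.

```r
# Modular workflow sketch: hypothetical steps and driver, base R only.
steps <- list(
  log2   = function(x) log2(x),
  filter = function(x) x[apply(x, 1, IQR) > 0, , drop = FALSE],  # drop flat genes
  center = function(x) x - rowMeans(x)                           # per-gene centering
)

run_pipeline <- function(x, wanted, steps) {
  for (s in wanted) {
    if (is.null(steps[[s]])) stop("unknown step: ", s)
    x <- steps[[s]](x)
    message("done: ", s, " (", nrow(x), " genes)")
  }
  x
}

# Saved as pipeline.R, the step names could come from the command line, e.g.
#   R CMD BATCH '--args log2 filter' pipeline.R   or   Rscript pipeline.R log2 filter
# and be read there with commandArgs(trailingOnly = TRUE).
raw <- matrix(c(16, 16, 16, 4, 8, 32), nrow = 2, byrow = TRUE,
              dimnames = list(c("gA", "gB"), c("s1", "s2", "s3")))
out <- run_pipeline(raw, c("log2", "filter", "center"), steps)
out
```

Keeping each step independent lets a batch job or a Perl wrapper run any subset in any order. Failures are also reported per step.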

Expression Profiling Analysis
Efficiency
– Time: 8 h vs. 24 h
– Cost: machine vs. people
– Accuracy: reduce human error
– Experience: slaves and slave owners

Thank you! Questions?