BioQUEST / SCALE-IT Module From Omics Data to Knowledge Case 1: Microarrays Namyong Lee Minnesota State University, Mankato Matthew Macauley Clemson University.

Presentation transcript:

BioQUEST / SCALE-IT Module From Omics Data to Knowledge Case 1: Microarrays Namyong Lee Minnesota State University, Mankato Matthew Macauley Clemson University Sumona Mondal Clarkson University Fusheng Tang University of Arkansas, Little Rock

Goals: Provide guidelines for teachers in different disciplines to explore different -omics data. The instructor guides the students through a tutorial of the experimental process, including data retrieval, statistical design and analysis, biological analysis, and model validation.

Module Outline
1. Introduce microarray and RNAseq technology.
2. Locate available public expression data.
3. Formulate questions from the dataset.
4. Design computational and statistical experiments.
5. Interpret the biological significance of identified genes. (UniProt, IntAct, and Reactome will be used.)
6. Validate the biological model (using ATLAS).

Step 1: Introduce gene expression, microarray technology, and RNAseq technology. What is gene expression? How is gene expression measured? Introduce microarrays and RNAseq, then compare and contrast the two.

Step 2: Locate available public expression data. ArrayExpress is a database of gene expression and other microarray data at the European Bioinformatics Institute (EBI).

Sample data set (from EBI ArrayExpress)

Obtaining data: an example. Go to ArrayExpress and search for "colon cancer." Select accession E-GEOD-42368, titled "p53-dependent regulation of gene expression following DNA damage," for Homo sapiens. Download the processed data as a zip file. Create a spreadsheet (e.g., in Excel) and copy the data into it, one sample at a time. Each row should have an ILMN_ID number and then, for each sample, an expression level and a p-value. Sort the data by increasing p-value. Use david.abcc.ncifcrf.gov/ to look up gene names from ILMN_IDs.
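The spreadsheet step above can also be done in a script. The following is a minimal Python sketch, not part of the original module (which works in a spreadsheet and in R): it parses a tab-separated table and sorts the probes by increasing p-value. The probe IDs, column names, and values are made up for illustration; the real processed file from ArrayExpress may be laid out differently.

```python
import csv
import io

# Hypothetical excerpt of a processed-data table. The probe IDs, column
# names, and numbers below are invented for illustration only.
SAMPLE_TSV = (
    "ILMN_ID\tsample1_signal\tsample1_pval\tsample2_signal\tsample2_pval\n"
    "ILMN_0000001\t152.3\t0.0400\t140.9\t0.0300\n"
    "ILMN_0000002\t8450.1\t0.0001\t9012.7\t0.0002\n"
    "ILMN_0000003\t95.6\t0.3500\t101.2\t0.4100\n"
)


def load_rows(text):
    """Parse a tab-separated table into a list of dicts, one per probe."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for record in reader:
        row = {"ILMN_ID": record["ILMN_ID"]}
        for key, value in record.items():
            if key != "ILMN_ID":
                row[key] = float(value)  # signals and p-values are numeric
        rows.append(row)
    return rows


def sort_by_pvalue(rows, pval_column):
    """Return the rows ordered by increasing p-value in the given column."""
    return sorted(rows, key=lambda row: row[pval_column])


rows = load_rows(SAMPLE_TSV)
ranked = sort_by_pvalue(rows, "sample1_pval")
ranked_ids = [row["ILMN_ID"] for row in ranked]
print(ranked_ids)
```

Keeping one row per probe and one pair of columns per sample means that a single sort reorders every sample's values together.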

Preprocessing. Why preprocess? The data may contain non-biological variation. Steps: thresholding; scaling (log transformation); standardization; normalization (quantile normalization); reducing the data set (by pairwise t-test).
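The preprocessing steps listed above can be sketched in a few lines. This is a plain-Python illustration under stated assumptions, not the module's R/Bioconductor code: the threshold floor of 1.0, the base-2 logarithm, and the tie handling in the quantile normalization are choices made here, and the toy numbers are invented.

```python
import math


def threshold(values, floor=1.0):
    """Raise every value below `floor` to `floor` (avoids log of values <= 0)."""
    return [max(v, floor) for v in values]


def log2_transform(values):
    """Scale intensities with a base-2 logarithm."""
    return [math.log2(v) for v in values]


def standardize(values):
    """Center to mean 0 and scale to unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def quantile_normalize(columns):
    """Give every sample (column) the same empirical distribution.

    Ties are broken by position; tie averaging is omitted in this sketch.
    """
    n_genes = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Mean of the k-th smallest value across all samples.
    rank_means = [
        sum(col[k] for col in sorted_cols) / len(columns) for k in range(n_genes)
    ]
    normalized = []
    for col in columns:
        order = sorted(range(n_genes), key=lambda i: col[i])
        new_col = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            new_col[gene_index] = rank_means[rank]
        normalized.append(new_col)
    return normalized


# Toy data: two samples (columns), three genes each.
raw = [[0.5, 8.0, 64.0], [128.0, 2.0, 4.0]]
logged = [log2_transform(threshold(col)) for col in raw]
normalized = quantile_normalize(logged)
z_scores = standardize([1.0, 2.0, 3.0])
```

After quantile normalization both samples contain the same set of values, so remaining differences between samples reflect only the rank of each gene within its sample.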

Step 3: Formulate questions about the data. Are there genes whose expression profiles are correlated with colon cancer? If so, how can we accurately determine which of the samples are cancerous based entirely on gene expression profiles? Can any subtypes be identified by cluster analysis across samples?

Step 4: Computational and statistical experiments with R & Bioconductor. Class Prediction: Develop a multi-gene predictor of the class label for a sample using its gene expression profile (pairwise t-test). Class Discovery: Use various clustering algorithms to discover clusters among samples and genes (K-means, hclust, PAM, …).
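As one way to picture the gene-selection part of class prediction, the sketch below ranks genes by the absolute value of a two-sample t-statistic between the two classes. It is written in Python rather than the R & Bioconductor tools the module uses, and it assumes Welch's unequal-variance form of the t-statistic, which the slides do not specify. Gene names, labels, and expression values are invented.

```python
import math


def welch_t(group_a, group_b):
    """Welch's two-sample t-statistic (does not assume equal variances)."""
    def mean_var_n(xs):
        n = len(xs)
        mean = sum(xs) / n
        var = sum((x - mean) ** 2 for x in xs) / (n - 1)
        return mean, var, n

    mean_a, var_a, n_a = mean_var_n(group_a)
    mean_b, var_b, n_b = mean_var_n(group_b)
    return (mean_a - mean_b) / math.sqrt(var_a / n_a + var_b / n_b)


def rank_genes(expression, labels, top=2):
    """Return the `top` genes with the largest |t| between the two classes."""
    classes = sorted(set(labels))  # exactly two class labels are assumed
    scores = {}
    for gene, values in expression.items():
        a = [v for v, lab in zip(values, labels) if lab == classes[0]]
        b = [v for v, lab in zip(values, labels) if lab == classes[1]]
        scores[gene] = welch_t(a, b)
    ranked = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
    return ranked[:top], scores


# Hypothetical log-scale expression values: three cancer and three normal samples.
labels = ["cancer", "cancer", "cancer", "normal", "normal", "normal"]
expression = {
    "geneA": [9.1, 9.4, 8.9, 5.0, 5.3, 4.8],
    "geneB": [6.0, 6.2, 5.9, 6.1, 5.8, 6.3],
    "geneC": [3.0, 3.4, 2.9, 6.9, 7.2, 7.4],
}
top_genes, scores = rank_genes(expression, labels)
print(top_genes)
```

A positive score here means higher expression in the class that sorts first alphabetically ("cancer" in this toy data). The selected genes would then feed a multi-gene predictor, which this sketch does not build.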

Hierarchical Clustering Results. Figure labels: "Overexpressed in cancer tissues"; "Overexpressed in normal tissues"; Gene 187 (Hsa.9972).
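For readers who want to see the mechanics behind a hierarchical clustering result, here is a minimal Python sketch of agglomerative clustering. The choice of average linkage and Euclidean distance is an assumption made for this illustration; the slides do not say which linkage or distance was used, and the sample names and values are invented.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain.

    The distance between two clusters is the mean pairwise distance
    between their members (average linkage).
    """
    clusters = [[name] for name in profiles]

    def cluster_distance(c1, c2):
        total = sum(euclidean(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [sorted(c) for c in clusters]


# Hypothetical two-gene expression profiles for four samples.
profiles = {
    "tumor1": [9.0, 3.1],
    "tumor2": [9.3, 2.9],
    "normal1": [5.1, 7.0],
    "normal2": [4.8, 7.3],
}
groups = sorted(average_linkage(profiles, 2))
print(groups)
```

Recording the order and height of each merge, instead of stopping at a fixed number of clusters, would give the tree that a dendrogram displays.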

Step 5: Model for Cancer Therapy. Figure labels: NCEH1 20X; ABCBs 2~3X; ABCB7 10X. Does down-regulation of NCEH1 block cancer development?

Step 6: Validation of Model. Search PubMed for NCEH1 and cancer.

Thank you!