BioQUEST / SCALE-IT Module From Omics Data to Knowledge Case 1: Microarrays Namyong Lee Minnesota State University, Mankato Matthew Macauley Clemson University.


1 BioQUEST / SCALE-IT Module: From Omics Data to Knowledge. Case 1: Microarrays. Namyong Lee (Minnesota State University, Mankato), Matthew Macauley (Clemson University), Sumona Mondal (Clarkson University), Fusheng Tang (University of Arkansas, Little Rock)

2 Goals: Provide guidelines for teachers in different disciplines to explore different -omics data. The instructor guides the students through a tutorial of the experimental process, including data retrieval, statistical design and analysis, biological analysis, and model validation.

3 Module Outline
1. Introduce microarray and RNAseq technology.
2. Locate available public expression data.
3. Formulate questions from the dataset.
4. Design computational and statistical experiments.
5. Interpret the biological significance of identified genes (UniProt, IntAct, and Reactome will be used).
6. Validate the biological model (using ATLAS).

4 Step 1: Introduce gene expression, microarray technology, and RNAseq technology. What is gene expression? How is gene expression measured? Introduce microarrays and RNAseq, then compare and contrast the two.

5 Step 2: Locate available public expression data. ArrayExpress is a database of gene expression and other microarray data at the European Bioinformatics Institute (EBI): www.ebi.ac.uk/arrayexpress/

6 Sample data set (from EBI ArrayExpress)

7 Obtaining data: an example. Go to ArrayExpress and search for “colon cancer.” Select accession E-GEOD-42368, titled “p53-dependent regulation of gene expression following DNA damage,” for Homo sapiens. Download the processed data as a zip file. Create a spreadsheet (e.g., Excel) and copy the data into it, one column per sample. Each row should have an ILMN_ID, followed by an expression level and a p-value for each sample. Sort the data by increasing p-value. Use david.abcc.ncifcrf.gov/ to look up gene names from ILMN_IDs.
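The sorting step above can also be done outside a spreadsheet. The following is a minimal sketch in Python (the module itself works in a spreadsheet and later in R), assuming the processed file is tab-delimited with one probe per row. The column names `ID_REF`, `VALUE`, and `Detection Pval`, and every value in the sample text, are hypothetical placeholders, not taken from E-GEOD-42368.

```python
import csv
import io

# Hypothetical excerpt of a processed-data file for one sample.
# Probe IDs, expression levels, and p-values are made up for illustration.
SAMPLE_TSV = (
    "ID_REF\tVALUE\tDetection Pval\n"
    "ILMN_0000001\t812.4\t0.0312\n"
    "ILMN_0000002\t95.1\t0.4870\n"
    "ILMN_0000003\t1540.9\t0.0004\n"
)


def load_probes(text):
    """Parse tab-delimited text into (probe_id, expression, p_value) tuples."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        (row["ID_REF"], float(row["VALUE"]), float(row["Detection Pval"]))
        for row in reader
    ]


def sort_by_p_value(probes):
    """Return the probes ordered by increasing p-value."""
    return sorted(probes, key=lambda probe: probe[2])


probes = sort_by_p_value(load_probes(SAMPLE_TSV))
for probe_id, expression, p_value in probes:
    print(f"{probe_id}\t{expression:.1f}\t{p_value:.4f}")
```

For a real download, the header names would need to be adjusted to match the file, and the text would be read from disk instead of from the inline string.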

8 Preprocessing. Why preprocess? The data may contain non-biological variation. Steps: thresholding; scaling (log transformation); standardization; normalization (quantile normalization); reducing the data set (by pairwise t-test).
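Three of the preprocessing steps listed above can be sketched in a few lines. This is an illustrative Python version, not the module's own R workflow: the function names, the floor value of 1.0, and the toy intensities are all assumptions. Quantile normalization here replaces each value with the mean, across samples, of the values sharing its rank; ties are broken by original position, which is a simplification.

```python
import math


def threshold(values, floor=1.0):
    """Raise every value below `floor` up to `floor` (assumed cutoff)."""
    return [max(v, floor) for v in values]


def log2_transform(values):
    """Put positive intensities on a log2 scale."""
    return [math.log2(v) for v in values]


def quantile_normalize(columns):
    """Give every sample (column) the same distribution of values.

    `columns` is a list of equal-length lists, one list per sample.
    """
    n_rows = len(columns[0])
    sorted_columns = [sorted(column) for column in columns]
    # Mean of the k-th smallest value across all samples.
    rank_means = [
        sum(column[k] for column in sorted_columns) / len(columns)
        for k in range(n_rows)
    ]
    normalized = []
    for column in columns:
        order = sorted(range(n_rows), key=lambda i: column[i])
        result = [0.0] * n_rows
        for rank, row_index in enumerate(order):
            result[row_index] = rank_means[rank]
        normalized.append(result)
    return normalized


# Toy raw intensities: two samples, three probes (made-up numbers).
raw = [[0.5, 16.0, 4.0], [2.0, 64.0, 8.0]]
logged = [log2_transform(threshold(sample)) for sample in raw]
print(quantile_normalize(logged))
```

After normalization each sample contains the same set of values, so remaining differences between samples reflect rank order rather than overall intensity.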

9 Step 3: Formulate questions about the data. Are there genes whose expression profiles are correlated with colon cancer? If so, how can we accurately determine which of the samples are cancerous based entirely on gene expression profiles? Can any subtypes be identified by cluster analysis across samples?

10 Step 4: Computational and statistical experiments with R & Bioconductor. Class Prediction: Develop a multi-gene predictor of the class label for a sample using its gene expression profile (pairwise t-test). Class Discovery: Use various clustering algorithms to discover clusters among samples and genes (K-means, hclust, PAM, …).
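To illustrate the gene-selection idea behind class prediction, here is a small Python sketch that ranks genes by a two-sample t-statistic between two classes. The slides name only a "pairwise t-test" and use R & Bioconductor, so the Welch (unequal-variance) form, the gene labels, and the expression values below are all assumptions for illustration. No p-values are computed here.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch two-sample t-statistic (assumes nonzero variance in the data)."""
    standard_error = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical log-scale expression: gene -> (cancer samples, normal samples).
expression = {
    "gene_A": ([8.1, 7.9, 8.3], [5.0, 5.2, 4.9]),
    "gene_B": ([6.0, 6.4, 5.8], [6.1, 5.9, 6.3]),
    "gene_C": ([3.2, 3.0, 3.1], [4.9, 5.3, 5.1]),
}

# Genes with the largest |t| separate the two classes most clearly.
ranked = sorted(
    expression,
    key=lambda gene: abs(welch_t(*expression[gene])),
    reverse=True,
)
for gene in ranked:
    print(gene, round(welch_t(*expression[gene]), 2))
```

The top-ranked genes would be the candidates for a multi-gene predictor; in practice the t-statistics would be converted to p-values and corrected for multiple testing before any genes are selected.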

11 Hierarchical Clustering Results. Figure labels: over-expressed in cancer tissues; over-expressed in normal tissues; Gene 187 (Hsa.9972).
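For readers who want to see what hierarchical clustering does step by step, the following Python sketch merges samples bottom-up until a requested number of clusters remains. Euclidean distance and average linkage are assumptions, since the slides do not state which distance or linkage was used, and the four two-gene profiles are made up.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(cluster_a, cluster_b, profiles):
    """Mean pairwise distance between the members of two clusters."""
    distances = [
        euclidean(profiles[i], profiles[j])
        for i in cluster_a
        for j in cluster_b
    ]
    return sum(distances) / len(distances)


def agglomerate(profiles, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], profiles)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(sorted(cluster) for cluster in clusters)


# Hypothetical samples, each described by two gene expression values.
samples = [[8.0, 2.0], [7.8, 2.2], [3.0, 6.9], [3.1, 7.2]]
print(agglomerate(samples, 2))
```

Recording the order and distance of each merge, rather than stopping at a fixed number of clusters, is what produces the dendrogram usually drawn beside a heat map.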

12 Step 5: Model for Cancer Therapy. NCEH1 20X; ABCBs 2~3X; ABCB7 10X. Does down-regulation of NCEH1 block cancer development?
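Assuming that figures such as "20X" denote fold changes in expression between conditions, a fold change can be computed from log2-scale data as shown in this short Python sketch. The function name and the expression values are hypothetical and are not taken from the module's dataset.

```python
from statistics import mean


def fold_change(log2_case, log2_control):
    """Fold change of case over control from log2-scale expression values."""
    return 2 ** (mean(log2_case) - mean(log2_control))


# Hypothetical log2 expression values for a single gene.
tumor = [11.0, 11.5, 10.5]
normal = [7.0, 6.5, 7.5]
print(fold_change(tumor, normal))
```

A difference of 4 between the mean log2 values corresponds to a 16-fold change, which is why differences on the log scale are easier to compare across genes than raw intensities.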

13 Step 6: Validation of Model. Search PubMed for NCEH1 and cancer: http://www.ncbi.nlm.nih.gov/pubmed/17052608

14 Thank you!

