Single Sample Expression-Anchored Mechanisms Predict Survival in Head and Neck Cancer Yang et al Presented by Yves A. Lussier MD PhD The University.

Presentation transcript:

Single Sample Expression-Anchored Mechanisms Predict Survival in Head and Neck Cancer Yang et al. 2012 Presented by Yves A. Lussier MD PhD The University of Chicago

Head and Neck Squamous Cell Carcinoma (HNSCC)
- A group of similar cancers
- Spread to lymph nodes
- Strongly related to environmental risk factors: alcohol, smoking, UV light, viruses
- ~40,000 new cases in the US per year
- Hard to predict recurrence
- Early detection is crucial

Computational goals
- Use gene expression data for patient classification and recurrence-free survival analysis
- Improve interpretability of the classifiers by integrating pathway information
- Keep classification performance high
- Standardize across multiple datasets

Sample analysis
- Input: an expression profile of a sample (a vector of real values for each patient)
- Step 1: rank the genes
- Step 2: calculate a score for each gene, using the rank of gene g in sample s and the total number of ranked genes
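The two steps above can be sketched for a single sample as follows. The slide's own formula did not survive, so the weight function used here, r * exp(-r / |G|) with rank 1 assigned to the lowest-expressed gene, is an assumption, and the gene names and values are purely illustrative.

```python
import math


def rank_weights(expression):
    """Map one sample's expression values to rank-based weights.

    expression: dict of gene -> expression value for a single sample.
    Assumption: rank 1 is the lowest-expressed gene and the weight is
    r * exp(-r / |G|); the original slide formula was not preserved.
    """
    # Step 1: rank the genes within this sample (ascending expression).
    genes = sorted(expression, key=lambda g: expression[g])
    n = len(genes)  # total number of ranked genes, |G|
    # Step 2: turn each rank into a weight.
    return {g: r * math.exp(-r / n) for r, g in enumerate(genes, start=1)}


# Hypothetical expression profile for one patient.
sample = {"TP53": 8.1, "EGFR": 11.4, "CDKN2A": 3.2, "MYC": 9.7}
weights = rank_weights(sample)
```

Because only within-sample ranks are used, the weights do not depend on the scale of the expression values, which is what makes the scores comparable across samples.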

Gene set features
- Gene sets: GO terms or KEGG pathways
- Define the normalized centroid (NC) of a gene set to be the average weight of its genes
- Calculate the NC score for each gene set and its complement

Gene set features
- The score of a gene set is the difference between the two NC scores
- This results in the FAIME profile for each sample
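A minimal sketch of the gene set score described above: the mean weight of the genes in the set minus the mean weight of the genes outside it. The per-gene weights are taken as given, and the function names, gene names, and pathway names are hypothetical.

```python
def faime_score(weights, gene_set):
    """NC of the gene set minus NC of its complement for one sample.

    weights: dict of gene -> weight for a single sample.
    gene_set: set of gene identifiers (e.g. one GO term or KEGG pathway).
    """
    members = [w for g, w in weights.items() if g in gene_set]
    others = [w for g, w in weights.items() if g not in gene_set]
    if not members or not others:
        raise ValueError("gene set and its complement must both be non-empty")
    return sum(members) / len(members) - sum(others) / len(others)


def faime_profile(weights, gene_sets):
    """Score every gene set to obtain the per-sample profile."""
    return {name: faime_score(weights, genes) for name, genes in gene_sets.items()}


# Hypothetical weights and gene sets for one sample.
example_weights = {"a": 1.0, "b": 3.0, "c": 2.0, "d": 6.0}
example_sets = {"pathway_1": {"a", "b"}, "pathway_2": {"d"}}
profile = faime_profile(example_weights, example_sets)
```

Only genes present in the sample's weight table contribute, so a gene set that lists unmeasured genes is scored on the measured subset.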

FAIME summary

Differential pathways
- Differential analysis for each pathway/GO feature: healthy FAIME scores vs. sick FAIME scores
- Get empirical p-values by shuffling the labels (1000 repeats)
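The label-shuffling step can be sketched for one pathway feature as below. The slide does not say which statistic is compared, so the absolute difference of group means is an assumption here, as are the function name and the example scores.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def permutation_p_value(healthy, sick, n_perm=1000, seed=0):
    """Empirical p-value for one feature by shuffling case/control labels.

    Assumption: the statistic is the absolute difference of group means.
    """
    rng = random.Random(seed)
    observed = abs(_mean(sick) - _mean(healthy))
    pooled = list(healthy) + list(sick)
    n_healthy = len(healthy)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign labels at random
        diff = abs(_mean(pooled[n_healthy:]) - _mean(pooled[:n_healthy]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical FAIME scores of one pathway in controls and tumors.
healthy_scores = [0.10, 0.20, 0.15, 0.12, 0.18, 0.11]
sick_scores = [0.90, 1.00, 0.95, 1.10, 0.85, 0.97]
p = permutation_p_value(healthy_scores, sick_scores)
```

With 1000 repeats the smallest reportable p-value is about 0.001, which limits how finely many pathways can be ranked after multiple-testing correction.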

Analysis overview 1

Data used
- 8 gene expression (GE) HNSCC datasets: 3 for training, 5 for validation
- Controls can be:
  - Samples from independent healthy individuals
  - Paired samples: a sample from a distant uninvolved site
  - Paired samples: samples from the margins of the tumor

Data used

Comment: preprocessing is not sample based!
- Download raw data (CEL files) from ArrayExpress
- Use MAS5 to get the expression matrix
- Probes -> genes: keep the probe with the largest interquartile range
- Remove genes with negative average expression across samples (Affy's MAS5 implementation should produce non-negative values)
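The probe-to-gene step can be sketched as follows, assuming the expression matrix is already available as per-probe lists of values across samples. The function names, probe identifiers, and the quartile convention (Python's default in `statistics.quantiles`) are assumptions for illustration. The sketch also shows why this step is not sample based: both the interquartile range and the average are computed across all samples.

```python
from statistics import mean, quantiles


def iqr(values):
    """Interquartile range of one probe across all samples."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


def collapse_probes(probe_values, probe_to_gene):
    """Keep, per gene, the probe with the largest IQR, then drop genes
    whose average expression across samples is negative.

    probe_values: dict of probe id -> list of expression values (one per sample).
    probe_to_gene: dict of probe id -> gene symbol.
    """
    best = {}
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # probe without a gene annotation
        spread = iqr(values)
        if gene not in best or spread > best[gene][0]:
            best[gene] = (spread, values)
    return {gene: values for gene, (_, values) in best.items() if mean(values) >= 0}


# Hypothetical probes: two map to GENE_A, one to GENE_B.
probes = {"p1": [1, 2, 3, 4], "p2": [1, 5, 9, 13], "p3": [-3, -1, -2, 0]}
mapping = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"}
genes = collapse_probes(probes, mapping)
```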

Data used

Stability of FAIME scores
- Dataset A: 22 HNSCC, 22 independent controls
- Boxplot for each sample
- Values are pathway scores

Reproducibility tests
- Compared methods:
  - FAIME
  - Pathway p-value calculated using Z-scores
  - Hypergeometric enrichment analysis
  - Differential genes (from the original papers)
  - GSEA (actually, they used GSA)
  - CORG (Lee et al. 2008; Ideker lab)
- Test the overlap after running the methods on datasets A-C

Reproducibility tests

Reproducibility across methods
- Goal: test if FAIME's significant pathways cover the standard analysis
- Idea: use the standard analysis (GSEA or HG) as the gold standard
- Given FAIME's pathways, calculate precision-recall curves based on pathway sizes
- For a threshold k, calculate precision and recall considering only pathways with > k genes
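One way to read the size-thresholded comparison is sketched below: for each k, both the FAIME calls and the gold-standard calls are restricted to pathways with more than k genes before precision and recall are computed. The function name, pathway identifiers, and sizes are hypothetical.

```python
def precision_recall_by_size(faime_hits, gold_hits, pathway_sizes, thresholds):
    """Precision and recall of FAIME's pathways against a gold standard,
    restricted to pathways with more than k genes for each threshold k.

    Returns a list of (k, precision, recall); a value is None when its
    denominator is empty.
    """
    curve = []
    for k in thresholds:
        eligible = {p for p, size in pathway_sizes.items() if size > k}
        predicted = faime_hits & eligible
        gold = gold_hits & eligible
        true_pos = len(predicted & gold)
        precision = true_pos / len(predicted) if predicted else None
        recall = true_pos / len(gold) if gold else None
        curve.append((k, precision, recall))
    return curve


# Hypothetical pathway sizes and significant-pathway calls.
sizes = {"P1": 10, "P2": 25, "P3": 60, "P4": 120}
faime = {"P1", "P2", "P3"}
gold = {"P2", "P3", "P4"}
curve = precision_recall_by_size(faime, gold, sizes, thresholds=[0, 20, 50])
```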

Reproducibility across methods

Validation on independent datasets
- Use the pathways and GO terms that were stable in datasets A-C (57 features)
- Test their ability to classify new datasets (D: n=35; E: n=91)
- Classification: just cluster the samples into two groups (next slides)
- Average-linkage hierarchical clustering
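The unsupervised classification step can be sketched as a small agglomerative clustering that merges clusters by average linkage until two groups remain. The slides do not state the distance measure, so Euclidean distance is an assumption, and the toy profiles stand in for the 57-feature FAIME profiles.

```python
import math


def average_linkage_two_groups(profiles):
    """Agglomerative clustering with average linkage, stopped at two clusters.

    profiles: list of equal-length feature vectors (one per sample).
    Returns two lists of sample indices.
    Assumption: Euclidean distance between profiles.
    """
    clusters = [[i] for i in range(len(profiles))]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        total = sum(math.dist(profiles[i], profiles[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 2:
        _, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical two-feature profiles for five samples.
toy_profiles = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05], [5.0, 5.1], [5.2, 4.9]]
groups = average_linkage_two_groups(toy_profiles)
```

The two groups are then compared with the known tumor and control labels to count errors; the clustering itself never sees the labels.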

Validation on independent datasets
- Figure annotations: extra-cellular events; nuclear events, cell cycle; metabolism
- No errors

Validation on independent datasets: 3 errors

Additional dataset (supplement): 4 errors

Survival analysis
- For datasets E and F, follow-up data are available for the cancer patients
- Cluster the HNSCC samples into two groups using the CLARA algorithm
- Compare the sample clusters based on the survival data (recurrence-free survival)
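Once the samples are split into two clusters, each cluster's recurrence-free survival can be summarized with a Kaplan-Meier estimate, sketched below. This covers only the curve estimation; the slides do not show which test was used to compare the two curves, and the follow-up times and event flags here are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of recurrence-free survival for one cluster.

    times: follow-up time per patient.
    events: 1 if recurrence was observed at that time, 0 if censored.
    Returns a list of (time, survival probability) at each event time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        recurrences = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if recurrences:
            survival *= 1 - recurrences / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored patients leave the risk set
    return curve


# Hypothetical follow-up (months) for the patients in one cluster.
follow_up = [5, 8, 12, 12, 20]
recurred = [1, 0, 1, 1, 0]
km = kaplan_meier(follow_up, recurred)
```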

Survival analysis

Additional dataset (supplement)

Survival analysis: comments
- The same flow with GSEA or Mean-G: survival plots are not significant (p>0.07)
- In the publication of dataset E, another HNSCC-specific clinical test (called HPV) did not produce significant separation

Prostate cancer (Chen et al. 2013, BMC Medical Genomics)
- Analyzed three case-control datasets
- Validation: survival analysis on an independent dataset

FAIME scores Re-scale FAIME scores (not justified):

Tested gene sets
- Gene ontology: ~4000 terms
- Cancer modules (Segal et al. 2004): ~2000 samples covering 22 tumor types, clustered into 454 modules

Reproducibility GO CM FAIME

Survival analysis
- Recurrence: PSA (a prostate cancer blood biomarker) is present, suggesting bad prognosis

Relevant to us?
- Ranking-based analysis can help in the integration of different datasets
- Pathway-level scores provide a standardized set of features across datasets
- The FAIME score can be improved
- The idea of using case-control studies for survival analysis on independent data is a strong validation scheme