IDENTIFYING CANCER SUBTYPES BASED ON SOMATIC MUTATION PROFILE BIOINFORMATICS SEMINAR 2016 SPRING YOUJIN SHIN.

Slides:



Advertisements
Similar presentations
Departments of Medicine and Biostatistics
Advertisements

Wenting Zhou, Weichen Wu, Nathan Palmer, Emily Mower, Noah Daniels, Lenore Cowen, Anselm Blumer Tufts University Microarray Data.
Network-based stratification of tumor mutations Matan Hofree.
By Russell Armstrong Supervisor Mrs Wei Ji Diagnosis Analysis of Lung Cancer by Genome Expression Profiles.
Model and Variable Selections for Personalized Medicine Lu Tian (Northwestern University) Hajime Uno (Kitasato University) Tianxi Cai, Els Goetghebeur,
What is Cluster Analysis?
Genomic signatures to guide the use of chemotherapeutics Authors: Anil Potti et. al Presenter: Jong Cheol Jeong.
Microsatellite Instability Detection by Next Generation Sequencing S.J. Salipante, S.M. Scroggins, H.L. Hampel, E.H. Turner, and C.C. Pritchard September.
Evaluating cell lines as tumor models by comparison of genomic profiles Domcke, S. et al. Nat. Commun 4:2126.
Summary of Quantitative Analysis Neuman and Robson Ch. 11
Gene expression profiling identifies molecular subtypes of gliomas
CS 548 Spring 2015 Showcase by Pankaj Didwania, Sarah Schultz, Mingchen Xie Showcasing Work by Malhotra, Chau, Sun, Hadjipanayis, & Navathe on Temporal.
JM - 1 Introduction to Bioinformatics: Lecture VIII Classification and Supervised Learning Jarek Meller Jarek Meller Division.
Cancer is heterogeneous disease! -> enabled characterization of new tumor subtypes for improving personalized treatment and ultimately achieving better.
Supplemental Figures Loss of circadian clock gene expression is associated with tumor progression in breast cancer Cristina Cadenas 1*, Leonie van de Sandt.
LUNG ADENOCARCINOMAS. CLINICOPATHOLOGICAL STUDY WITH RESPECT TO THE UPCOMING NEW CLASSIFICATION AND EGFR-KRAS MUTATION ANALYSIS IMPLICATIONS. First author:
L 1 Chapter 12 Correlational Designs EDUC 640 Dr. William M. Bauer.
Wednesday, May 13, 2015 Report at 11:30 to Prairieview.
Selection of Patient Samples and Genes for Disease Prognosis Limsoon Wong Institute for Infocomm Research Joint work with Jinyan Li & Huiqing Liu.
11/12/2012ISC471 / HCI571 Isabelle Bichindaritz 1 Prediction.
Various topics Petter Mostad Overview Epidemiology Study types / data types Econometrics Time series data More about sampling –Estimation.
1 Critical Review of Published Microarray Studies for Cancer Outcome and Guidelines on Statistical Analysis and Reporting Authors: A. Dupuy and R.M. Simon.
INCREASED EXPRESSION OF PROTEIN KINASE CK2  SUBUNIT IN HUMAN GASTRIC CARCINOMA Kai-Yuan Lin 1 and Yih-Huei Uen 1,2,3 1 Department of Medical Research,
CS 478 – Tools for Machine Learning and Data Mining Linear and Logistic Regression (Adapted from various sources) (e.g., Luiz Pessoa PY 206 class at Brown.
Introduction to Survival Analysis Utah State University January 28, 2008 Bill Welbourn.
An Overview of Clustering Methods Michael D. Kane, Ph.D.
Journal Club Meeting Sept 13, 2010 Tejaswini Narayanan.
CHFR METHYLATION AS AN EPIGENETIC MARKER FOR RECURRENCE OF COLON CANCER M. D. Anderson Cancer Center, Houston, Texas Motofumi Tanaka, Salil Sethi, Donghui.
A comparative study of survival models for breast cancer prognostication based on microarray data: a single gene beat them all? B. Haibe-Kains, C. Desmedt,
Marshall University School of Medicine Department of Biochemistry and Microbiology BMS 617 Lecture 11: Models Marshall University Genomics Core Facility.
Gene expression. Gene Expression 2 protein RNA DNA.
Pan-cancer analysis of prognostic genes Jordan Anaya Omnes Res, In this study I have used publicly available clinical and.
Supplementary Figure 2: Representative Kaplan-Meier plots of overall survival considering alterations erbB signaling pathway genes and p53 in lung cancer.
INTERPRETING GENETIC MUTATIONAL DATA FOR CLINICAL ONCOLOGY Ben Ho Park, M.D., Ph.D. Associate Professor of Oncology Johns Hopkins University May 2014.
(1) Genotype-Tissue Expression (GTEx) Largest systematic study of genetic regulation in multiple tissues to date 53 tissues, 500+ donors, 9K samples, 180M.
1 A latent information function to extend domain attributes to improve the accuracy of small-data-set forecasting Reporter : Zhao-Wei Luo Che-Jung Chang,Der-Chiang.
Direct method of standardization of indices. Average Values n Mean:  the average of the data  sensitive to outlying data n Median:  the middle of the.
Advances and challenges in computational modeling and statistical learning of biological systems Qi Liu Department of Biomedical Informatics Vanderbilt.
Introduction. We want to see if there is any relationship between the results on exams and the amount of hours used for studies. Person ABCDEFGHIJ Hours/
1 Borgan and Henderson: Event History Methodology Lancaster, September 2006 Session 8.1: Cohort sampling for the Cox model.
Supplementary Table 1. A)Patient details for Fresh Frozen test cohort, B) Patient details for validation cohort Age Median64 Range41-89 Grade
R2 김재민 / Prof. 정재헌 Journal conference 1.
Chapter 15 Inference for Regression. How is this similar to what we have done in the past few chapters?  We have been using statistics to estimate parameters.
Theme 6. Linear regression
Precision Medicine: The Potential for Identifying Personalized Therapies for Patients with Gastro-Esophageal Cancer.
Uterine serous carcinoma is more aggressive than high-grade serous ovarian carcinoma: a retrospective study H. Nagano1, Y. Tachibana1, M. Kawakami1, M.
A graph-based integration of multiple layers of cancer genomics data (Progress Report) Do Kyoon Kim 1.
Lung squamous cell carcinoma
Transcriptional heterogeneity of breast cancer subtypes,
Volume 24, Issue 7, Pages (July 2016)
Gene expression.
Table 1 Characteristics of study population, by pneumococcal vaccination status. From: Prior Pneumococcal Vaccination Is Associated with Reduced Death,
Chapter 14: Correlation and Regression
Claudio Lottaz and Rainer Spang
The Genomics of Cancer and Molecular Testing:
Prepared by Lee Revere and John Large
Volume 145, Issue 3, Pages (June 2017)
Volume 29, Issue 5, Pages (May 2016)
Narrative Reviews Limitations: Subjectivity inherent:
Volume 5, Issue 6, Pages e3 (December 2017)
Patterns of Somatically Acquired Amplifications and Deletions in Apparently Normal Tissues of Ovarian Cancer Patients  Leila Aghili, Jasmine Foo, James.
Volume 152, Issue 1, Pages (January 2019)
Adjuvant chemotherapy after potentially curative resection of metastases from colorectal cancer. A meta-analysis of two randomized trials E Mitry, A Fields,
Tracing the Innate Genetic Evolution and Spatial Heterogeneity in Treatment Naïve Lung Cancer Lesions Jihye Kim Translational Bioinformatics and Cancer.
Patterns of Somatically Acquired Amplifications and Deletions in Apparently Normal Tissues of Ovarian Cancer Patients  Leila Aghili, Jasmine Foo, James.
Figure 1. Identification of three tumour molecular subtypes in CIT and TCGA cohorts. We used CIT multi-omics data ( Figure 1. Identification of.
Nadia Howlader, PhD National Cancer Institute
Claudio Lottaz and Rainer Spang
Molecular definitions of lung adenocarcinoma subtypes.
DO NOT POST #4054 Gene expression Difference (GED) Revealed Immune Function Gene UP- or Down-regulation as Tumor-associated Inflammatory Cell (TAIC) Infiltration.
Presentation transcript:

IDENTIFYING CANCER SUBTYPES BASED ON SOMATIC MUTATION PROFILE BIOINFORMATICS SEMINAR 2016 SPRING YOUJIN SHIN

1. INTRODUCTION  Cancer is a complex and heterogeneous disease, mutating combinations of genes that vary greatly among patients.  We are now able to stratify cancer by genomic contents which are closer to fundamental causalities and effects than the classical stratifications that are based mostly on clinical observations.  However, due to its massiveness, there is an urgent need for the ability to analyze molecular defects in cancers efficiently and systematically. Most prior attempts suffer from systematic errors called the ”batch effect”. A few works provide tumor subtypes which have clinical correlation including patient survival and response to chemotherapy.  In this work, to identify tumor subtypes, we focus on the somatic mutation data. Somatic mutations are identified by comparing a tumor sample with a matched normal sample. Generally, somatic mutations are sparse in that proportion of mutation is small compared to the whole genome size and they are heterogeneous in that there are several types of mutation possible for a given loci in a genome. It has been observed that in several cases patients with same clinical profiles do not share even a single mutation. Only a few patients and mutations of loci have enough number of instances.  The sparse and heterogeneous character of the somatic mutation makes it difficult to determine the similarity and the difference among cancer genome. To overcome this difficulty, we introduce 2 somatic mutation profile and distance pairs.

2. MATERIALS AND METHOD  This work is to.. o Identify tumor subtypes o using k-means clustering o based on 2 different representations of somatic mutation data o With proper distance measurement(Jaccard distance / weighted Euclidean distance).  3 types of somatic mutation data used in this work o Ovarian serouscystadeno carcinoma (OV)- 442 patients with genes o Uterine Corpus Endometrial Carcinoma (UCEC)- 247 patients with genes o Lung adenocarcinoma (LUAD)- 517 patients with genes * data from TCGA * discard the patients having less than 10 mutation

2. MATERIALS AND METHOD A :total number of attributes where X=1 and Y=1 B :total number of attributes where X=0 and Y=1 C :total number of attributes where X=1 and Y=0 D :total number of attributes where X=0 and Y=0 => No meaning

2. MATERIALS AND METHOD number of mutation n in i-th gene maximum number of mutation over all genes set of points that belong to cluster i maximum number of mutation over all genes

2.1 EVALUATION MEASURES

3. EXPERIMENTS  Verify the effectiveness of 2 approaches for somatic mutation profile with appropriate distance measures, o 1) binary profile with the Jaccard distance ( B-Jac) o 2) weighted profile with the Euclidean distance ( W-Euc) o And compare the outputs of the 2 approaches with binary profile with the Euclidean distance ( B-Euc )  Then, use k-means clustering to stratify the profile distance pairs for each tumor types o Clusters having less than 10 patients are discarded

3.1 ASSOCIATION OF SUBTYPES AND CLINICAL FEATURE

3.2 SURVIVAL ANALYSIS  To verify the relationship between the extracted subtypes and patient survival time, o Fit a Cox proportional hazards regression model using R, o Conduct a log likelihood-ratio test to compare the fit of 2 models. o Full model : the subtypes and the clinical features o Baseline model :the clinical features including age, clinical stage, neoplasm histologic grade, and residual surgical resection o Smoking history is included for the LUAD data o In UCEC, a survival analysis is not possible due to the low mortality rates of patients

3.2 SURVIVAL ANALYSIS  B-Euc fail to identify meaningful tumor subtypes that have clinical correlations  The identified subtypes by B-Jac and W-Euc are good indicators for patient survival time  In the case of ovarian cancer(OV), o B-Jac have meaningful predictive power of the survival time (log-rank P-value= 1.09×10 − 3 where the # of subtypes= 8) - The mean survival of the most aggressive tumor subtype is approximately 27 months - The mean survival of the least aggressive tumor subtype is more than 36 months - (Fig 3-(a)). o W-Euc have meaningful predictive power of the survival time (log-rank P-value= 2.23×10 − 3 where # of subtypes= 6) - The mean survival of the most aggressive tumor subtype is approximately 15 months - The mean survival of the least aggressive tumor subtype is more than 41 months - (Fig 3-(b)).

3.2 SURVIVAL ANALYSIS  B-Euc fail to identify meaningful tumor subtypes that have clinical correlations  The identified subtypes by B-Jac and W-Euc are good indicators for patient survival time  In the case of lung cancer(LUAD), o B-Jac have significant predictive power of the survival time (log-rank P-value= 1.08×10 − 7 where # of subtypes= 7) - The mean survival of the most aggressive tumor subtype is approximately 13 months - The mean survival of the least aggressive tumor subtype is more than 22 months - (Fig 4-(a)). o W-Euc have significant predictive power of the survival time (log-rank P-value= 7.09×10 − 8 where # of subtypes= 6) - The mean survival of the most aggressive tumor subtype is approximately ten months. - The mean survival of the least aggressive tumor subtype is more than 20 months - (Fig 4-(b)).

4. CONCLUSION  In this work, we identify tumor subtypes based on somatic mutation profiles.  To resolve the sparsity problem of the somatic mutation data, we use appropriate distance measure for the binary profile and suggest a new profile called weighted somatic mutation profile.  More precisely, Jaccard distance is used for binary mutation profile, and Euclidean distance is used for somatic mutation profile weighted according to the mutation frequency.  For a tumor stratification, k-means clustering is used due to its simplicity and interpretability.  According to the results, for the typical binary somatic mutation profile, the Jaccard distance is more appropriate than the Euclidean distance.  Also, the weighted somatic mutation profile with the Euclidean distance performs comparably to the binary somatic mutation profile with the Jaccard distance according to the predictive power of the clinical features and the survival time analysis.

THANK YOU