Published by Bernard Jacobs. Modified over 9 years ago.
1
De novo discovery of mutated driver pathways in cancer
by Fabio Vandin, Eli Upfal, and Benjamin J. Raphael
Genome Research, 2012
Discussion leader: Matthew Bernstein
Scribe: Kun-Chieh Wang
Computational Network Biology, BMI 826/Computer Sciences 838
https://compnetbiocourse.discovery.wisc.edu
2
Problem overview
Cancer is caused by a genetic mutation, or set of mutations, that leads to uncontrolled cell growth and division.
A driver pathway is a pathway in which a mutation leads to cancer.
A mutation in a driver pathway is called a driver mutation.
All other mutations are called passenger mutations.
Problem statement:
Given: A set of cancer genomes
Goal: Find the driver mutations
3
Challenges
Passenger mutations are difficult to distinguish from driver mutations.
Cancer genomes are highly heterogeneous with respect to both passenger and driver mutations:
– Many combinations of driver mutations may lead to cancer
– We cannot test all combinations of genes
4
Assumptions
As is often done in computational biology, we make some assumptions so that the problem is well defined:
– Driver mutations tend to be rare and thus can be assumed to be mutually exclusive: if a cancer genome has one driver mutation in a pathway, it does not have another in that pathway
– A set of driver mutations should “explain” the global set of cancer genomes, meaning that each cancer genome should have one driver mutation
5
Formulating the objective function
With these assumptions we search for a set of mutations with high:
– Coverage: most patients have at least one mutation in the set of driver mutations
– Exclusivity: a patient has only one driver mutation
Given: A set of cancer genomes
Goal: Find a set of mutations with maximum coverage while maintaining exclusivity
6
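Formulating the objective function
$n$: number of genes
$m$: number of patients
$A$: an $m \times n$ matrix where $A_{ij} = 1$ if gene $j$ is mutated in patient $i$, and $A_{ij} = 0$ otherwise
– $\Gamma(g) = \{i : A_{ig} = 1\}$: the set of patients in which gene $g$ is mutated
– $\Gamma(M) = \bigcup_{g \in M} \Gamma(g)$: the set of patients that have a mutation in some gene in the set of genes $M$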
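The two patient sets above can be sketched in a few lines of Python. This is a minimal illustration, assuming rows of the matrix are patients and columns are genes; the toy matrix and the names `gamma_gene` and `gamma_set` are hypothetical, not from the paper.

```python
# Toy mutation matrix (hypothetical data): rows = patients, columns = genes.
A = [
    [1, 0, 0],  # patient 0: gene 0 mutated
    [0, 1, 0],  # patient 1: gene 1 mutated
    [1, 1, 0],  # patient 2: genes 0 and 1 mutated
    [0, 0, 0],  # patient 3: no mutation in these genes
]

def gamma_gene(A, j):
    """Set of patient indices (rows) in which gene j (column) is mutated."""
    return {i for i, row in enumerate(A) if row[j] == 1}

def gamma_set(A, gene_indices):
    """Set of patients with a mutation in at least one gene of the given set."""
    covered = set()
    for j in gene_indices:
        covered |= gamma_gene(A, j)
    return covered

print(gamma_gene(A, 0))
print(gamma_set(A, [0, 1]))
```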
7
Formulating the objective function
A set of genes $M$ is mutually exclusive if for all pairs of genes $g, g' \in M$ with $g \neq g'$ the following holds:
$\Gamma(g) \cap \Gamma(g') = \emptyset$
An $m \times k$ submatrix of $A$ is mutually exclusive if each row of the submatrix contains at most one value of 1.
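Both views of mutual exclusivity (pairwise disjoint patient sets, and at most one 1 per row of the submatrix) can be checked directly. The sketch below assumes rows are patients and columns are genes; the function names and toy data are hypothetical.

```python
from itertools import combinations

def patients_with(A, j):
    """Set of row indices (patients) in which column j (gene) equals 1."""
    return {i for i, row in enumerate(A) if row[j] == 1}

def exclusive_by_pairs(A, cols):
    """True if the patient sets of every pair of chosen genes are disjoint."""
    return all(
        patients_with(A, g).isdisjoint(patients_with(A, h))
        for g, h in combinations(cols, 2)
    )

def exclusive_by_rows(A, cols):
    """True if every row of the column submatrix has at most one 1."""
    return all(sum(row[j] for j in cols) <= 1 for row in A)

# Hypothetical data: genes 0 and 1 co-occur in patient 2; gene 2 never overlaps.
A = [
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
print(exclusive_by_pairs(A, [0, 2]), exclusive_by_rows(A, [0, 2]))
print(exclusive_by_pairs(A, [0, 1]), exclusive_by_rows(A, [0, 1]))
```

The two checks agree on any binary matrix, since a row with two or more 1s is exactly a patient shared by two genes.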
8
Formulating the objective function
The problem can now be restated mathematically:
Given: A mutation matrix $A$ and an integer $k > 0$
Goal: Find a mutually exclusive $m \times k$ submatrix of $A$ with the largest number of non-zero rows
PROBLEM: Driver mutations may not be measured as mutually exclusive due to experimental error. Furthermore, passenger mutations may co-occur in driver pathways.
9
Formulating the objective function
We must reformulate the problem because the current formulation is too strict.
Instead of strictly mutually exclusive mutations, we attempt to find approximately exclusive mutations:
– most patients have no more than 1 mutation in $M$
This introduces a tradeoff: increasing coverage tends to decrease exclusivity.
Why? We can always increase coverage by adding a new mutation to our set of driver mutations, but this mutation might be highly non-exclusive.
10
Formulating the objective function
To make this problem mathematically well defined, we need to formalize this tradeoff.
We measure the coverage overlap using the following equation:
$\omega(M) = \sum_{g \in M} |\Gamma(g)| - |\Gamma(M)|$
Given 2 genes $g_1$ (red) and $g_2$ (blue), we can visualize this equation as two overlapping regions. The area of the overlap is $\omega(M)$.
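A small worked example may help. It assumes the coverage overlap is the total number of mutations in the gene set minus the number of covered patients, $\omega(M) = \sum_{g \in M} |\Gamma(g)| - |\Gamma(M)|$; the patient labels $p_1, \dots, p_4$ are hypothetical.

```latex
\begin{align*}
\Gamma(g_1) &= \{p_1, p_2, p_3\}, \qquad \Gamma(g_2) = \{p_3, p_4\}, \qquad M = \{g_1, g_2\} \\
\Gamma(M) &= \Gamma(g_1) \cup \Gamma(g_2) = \{p_1, p_2, p_3, p_4\} \\
\omega(M) &= |\Gamma(g_1)| + |\Gamma(g_2)| - |\Gamma(M)| = 3 + 2 - 4 = 1
\end{align*}
```

The single unit of overlap corresponds to patient $p_3$, who is counted once for each of the two genes.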
11
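Formulating the objective function
We measure the tradeoff between coverage and exclusivity with the following measure:
$W(M) = |\Gamma(M)| - \omega(M) = 2|\Gamma(M)| - \sum_{g \in M} |\Gamma(g)|$
– $|\Gamma(M)|$ measures coverage. The higher the better.
– $\omega(M)$ penalizes non-exclusivity. The lower the better.
Given: A mutation matrix $A$ and an integer $k > 0$
Goal: Find an $m \times k$ submatrix $M$ of $A$ that maximizes $W(M)$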
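The measure can be sketched in Python as coverage minus overlap. This assumes the weight is defined as the number of covered patients minus the coverage overlap (total mutations in the set minus covered patients); the dictionary `gamma`, mapping each gene to its set of mutated patients, and all names below are hypothetical.

```python
def coverage(gamma, gene_set):
    """Patients with a mutation in at least one gene of gene_set."""
    covered = set()
    for g in gene_set:
        covered |= gamma[g]
    return covered

def overlap(gamma, gene_set):
    """Coverage overlap: total mutations in the set minus covered patients."""
    total = sum(len(gamma[g]) for g in gene_set)
    return total - len(coverage(gamma, gene_set))

def weight(gamma, gene_set):
    """Tradeoff measure: covered patients minus coverage overlap."""
    return len(coverage(gamma, gene_set)) - overlap(gamma, gene_set)

# Hypothetical data: g1 and g2 share patient 3; g3 is exclusive with both.
gamma = {"g1": {1, 2, 3}, "g2": {3, 4}, "g3": {5}}
print(weight(gamma, ["g1", "g2"]))
print(weight(gamma, ["g1", "g3"]))
```

In this toy data the exclusive pair `g1, g3` scores higher than the overlapping pair `g1, g2` even though both cover four patients, which is the behavior the penalty term is meant to produce.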
12
Formulating the objective function
13
Maximizing the objective function
The authors prove that solving this problem is NP-hard.
Roughly, this means that no efficient exact algorithm is known: in the worst case we may have to try every combination of $k$ genes to find the one that maximizes $W(M)$.
Thus, we require either an algorithm for finding an approximate solution, or a heuristic.
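To get a feel for why exhaustive search is impractical, the number of candidate gene sets of a given size can be counted directly. The values of 20,000 genes and a set size of 6 below are illustrative assumptions, not figures from the slides.

```python
import math

def num_gene_sets(n_genes, k):
    """Number of distinct gene sets of size k drawn from n_genes genes."""
    return math.comb(n_genes, k)

# Illustrative values only: roughly genome-scale gene count, small set size.
print(num_gene_sets(20000, 6))
```

Even for small sets the count grows roughly like n^k / k!, so scoring every candidate quickly becomes infeasible.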
14
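The Greedy Approach
Greedily add the gene that most increases the objective function to the current set of driver mutations until $k$ genes are added:
1. $M \leftarrow \{g_1, g_2\}$, the pair of genes that maximizes $W(\{g_1, g_2\})$
2. for $i = 3, \dots, k$:
   1. $g^* \leftarrow \arg\max_{g} W(M \cup \{g\})$
   2. $M \leftarrow M \cup \{g^*\}$
3. return $M$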
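The greedy procedure can be sketched as follows. This is a minimal illustration, assuming the procedure starts from the best-scoring pair of genes and then repeatedly adds the gene that yields the highest weight, and assuming the weight is twice the number of covered patients minus the total number of mutations in the set. The dictionary `gamma` (gene to set of mutated patients) and the function names are hypothetical.

```python
from itertools import combinations

def weight(gamma, gene_set):
    """W(M) = 2*|covered patients| - sum of per-gene mutation counts."""
    covered = set().union(*(gamma[g] for g in gene_set))
    return 2 * len(covered) - sum(len(gamma[g]) for g in gene_set)

def greedy(gamma, k):
    """Greedy gene-set selection; assumes 2 <= k <= number of genes."""
    genes = sorted(gamma)  # fixed order so ties break deterministically
    # Step 1: start from the pair of genes with the highest weight.
    chosen = list(max(combinations(genes, 2), key=lambda p: weight(gamma, p)))
    # Step 2: add the best remaining gene until k genes are selected.
    while len(chosen) < k:
        candidates = [g for g in genes if g not in chosen]
        best = max(candidates, key=lambda g: weight(gamma, chosen + [g]))
        chosen.append(best)
    return set(chosen)

# Hypothetical data: gene "c" overlaps heavily with "a" and "b".
gamma = {"a": {1, 2, 3}, "b": {4, 5}, "c": {1, 2, 4}, "d": {6}}
print(greedy(gamma, 3))
```

Each step costs one weight evaluation per remaining gene, so the sketch is cheap, but nothing in it guards against an early choice that rules out the best overall set.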
15
Results – Greedy approach
Even with this very naïve approach, we can make interesting guarantees on its accuracy under the gene independence model:
– Gene mutations are independent
– Driver genes have high coverage
– Each driver mutation contributes to the value of $W(M)$
One can prove that under this model, we would need 2,400 patients for the greedy algorithm to maximize the objective function with probability $1 - 10^{-4}$
– This number of patients is not currently available
16
Better idea: MCMC
Markov Chain Monte Carlo (MCMC) is a method for sampling from a complicated joint probability distribution.
Problem:
Given: A joint distribution
Goal: Generate a sample from it
Solution: Form a Markov chain such that its stationary distribution is the distribution of interest
17
Quick review: Markov chains
A Markov chain is a basic model of a stochastic process. It consists of a set of states and probabilities for transitioning from state to state.
Example: (state-transition diagram not reproduced)
The stationary distribution gives the long-run probability of being in each state if we let the random process move from state to state for an infinite amount of time.
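A stationary distribution can be approximated by repeatedly applying the transition probabilities to any starting distribution. The two-state chain below is a hypothetical example chosen for illustration; the function names are assumptions as well.

```python
def step(dist, P):
    """One step of the chain: new_dist[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def stationary(P, iterations=1000):
    """Approximate the stationary distribution by iterating from uniform."""
    dist = [1.0 / len(P)] * len(P)
    for _ in range(iterations):
        dist = step(dist, P)
    return dist

# Hypothetical chain: P[i][j] is the probability of moving from state i to j.
P = [
    [0.9, 0.1],
    [0.5, 0.5],
]
print(stationary(P))
```

For this chain the balance condition 0.1 * pi_0 = 0.5 * pi_1 gives pi = (5/6, 1/6), which the iteration approaches.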
18
The MCMC Approach
Sample sets of genes with probability proportional to $e^{cW(M)}$ for a constant $c > 0$, so that sets with higher $W(M)$ are sampled more often.
We do so by forming a Markov chain in which each state is associated with a set of $k$ genes.
Stochastically transition from state to state. The most frequently visited state is the most likely to have the highest $W(M)$.
19
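The MCMC Approach
More specifically, given the current state $M_t$ we obtain $M_{t+1}$ as follows:
1. Choose a gene $w$ uniformly at random from the global set of genes
2. Choose a gene $v$ uniformly at random from $M_t$
3. Let $P(M_t, w, v) = \min\left[1, e^{cW(M_t - \{v\} + \{w\}) - cW(M_t)}\right]$
4. With probability $P(M_t, w, v)$ set $M_{t+1} = M_t - \{v\} + \{w\}$; otherwise $M_{t+1} = M_t$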
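The transition steps can be sketched as a Metropolis-style swap. This is a hedged illustration rather than the authors' implementation. It assumes the acceptance probability is min(1, exp(c * (W(proposal) - W(current)))) and that the weight is twice the covered patients minus the total mutations in the set. The value of `c`, the toy data, and the choice to stay put when the drawn gene is already in the set are all assumptions of this sketch.

```python
import math
import random
from collections import Counter

def weight(gamma, gene_set):
    """W(M) = 2*|covered patients| - sum of per-gene mutation counts."""
    covered = set().union(*(gamma[g] for g in gene_set))
    return 2 * len(covered) - sum(len(gamma[g]) for g in gene_set)

def mcmc_step(gamma, current, c, rng):
    """Propose swapping one gene out of the set for one drawn from all genes."""
    w = rng.choice(sorted(gamma))    # step 1: candidate gene from all genes
    v = rng.choice(sorted(current))  # step 2: gene to remove from current set
    proposal = (current - {v}) | {w}
    if len(proposal) != len(current):
        return current  # assumption: w already in the set, so keep the state
    diff = weight(gamma, proposal) - weight(gamma, current)
    # steps 3-4: always accept improvements, otherwise accept with exp(c*diff)
    if diff >= 0 or rng.random() < math.exp(c * diff):
        return proposal
    return current

def run_chain(gamma, k, c=0.5, steps=10000, seed=0):
    """Run the chain and count how often each gene set is visited."""
    rng = random.Random(seed)
    state = frozenset(rng.sample(sorted(gamma), k))
    visits = Counter()
    for _ in range(steps):
        state = mcmc_step(gamma, state, c, rng)
        visits[state] += 1
    return visits

# Hypothetical data: gene -> set of patients in which it is mutated.
gamma = {"a": {1, 2, 3}, "b": {4, 5}, "c": {1, 2, 4}, "d": {6}}
visits = run_chain(gamma, k=2, steps=2000, seed=1)
print(visits.most_common(3))
```

Counting every step is a simplification; sampling the chain only every so many iterations reduces the correlation between successive samples.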
20
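The MCMC Approach
With this definition of the transition matrix, the stationary distribution is
$\pi(M) = \dfrac{e^{cW(M)}}{\sum_{R} e^{cW(R)}}$, where the sum runs over all sets $R$ of $k$ genes.
The authors prove that, for a suitable choice of $c$, this Markov chain converges to its stationary distribution quickly (it is rapidly mixing).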
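The claim about the stationary distribution can be checked with a short detailed-balance argument. The derivation assumes a stationary distribution of the form $\pi(M) \propto e^{cW(M)}$, an acceptance probability of $\min[1, e^{cW(M') - cW(M)}]$ for a proposed set $M'$ that differs from $M$ by one swapped gene, and a symmetric proposal probability $q$; the symbols $q$ and $Z$ are introduced here for illustration.

```latex
% Without loss of generality, let W(M') >= W(M), and let Z = \sum_R e^{cW(R)}.
\begin{align*}
\pi(M)\, q \cdot \min\!\left[1, e^{cW(M') - cW(M)}\right]
  &= \frac{e^{cW(M)}}{Z}\, q \cdot 1 \\
\pi(M')\, q \cdot \min\!\left[1, e^{cW(M) - cW(M')}\right]
  &= \frac{e^{cW(M')}}{Z}\, q \cdot e^{cW(M) - cW(M')}
   = \frac{e^{cW(M)}}{Z}\, q
\end{align*}
```

The two flows are equal, so $\pi$ satisfies detailed balance and is therefore stationary for the chain.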
21
Results – Simulated data
Generated 2 simulated datasets:
– A dataset starting from a set of 6 genes
– A dataset consisting of 2 driver pathways
Controlled coverage and exclusivity.
Simulated passenger mutations using observed characteristics in glioblastoma data.
Simulated both single-nucleotide mutations and copy-number aberrations (CNAs).
Ran the MCMC algorithm for $10^7$ iterations and sampled every $10^4$ iterations on each dataset.
22
Results – Simulated data
24
Results – real data
Built mutation matrices from various cancer genome studies.
Searched for gene sets of size $k$.
Once a statistically significant set of mutations was found, the authors removed it from the matrix and re-ran the algorithm to find new sets.
Performed a statistical test. The test statistic was $W(M)$, and the null model was obtained by independently permuting the mutations for each mutation group among the patients
– This preserves the mutation frequency
– The reason for doing this is to assess the significance of the coverage and exclusivity given a fixed mutation frequency
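A permutation test of this kind can be sketched as follows. This is a simplified illustration, not the authors' procedure: it assumes the weight is twice the covered patients minus the total mutations in the set, and it re-scores the same gene set on each permuted dataset. The add-one correction in the p-value and all names are choices made for this sketch.

```python
import random

def weight(gamma, gene_set):
    """W(M) = 2*|covered patients| - sum of per-gene mutation counts."""
    covered = set().union(*(gamma[g] for g in gene_set))
    return 2 * len(covered) - sum(len(gamma[g]) for g in gene_set)

def permutation_p_value(gamma, gene_set, patients, n_perm=1000, seed=0):
    """Fraction of permuted datasets whose weight is at least the observed one."""
    rng = random.Random(seed)
    observed = weight(gamma, gene_set)
    hits = 0
    for _ in range(n_perm):
        # Reassign each gene's mutations to random patients, independently per
        # gene, keeping the number of mutated patients per gene unchanged.
        null = {g: set(rng.sample(patients, len(gamma[g]))) for g in gene_set}
        if weight(null, gene_set) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction (sketch's choice)

# Hypothetical data: three perfectly exclusive genes over nine patients.
patients = list(range(1, 10))
gamma = {"a": {1, 2, 3}, "b": {4, 5, 6}, "c": {7, 8, 9}}
print(permutation_p_value(gamma, ["a", "b", "c"], patients, n_perm=500))
```

Because each gene keeps its mutation count, a small p-value reflects unusually high coverage and exclusivity rather than unusually frequent mutations.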
25
Results – multiple cancer types
26
Results – Lung adenocarcinoma
27
Results – Glioblastoma multiforme
28
Discussion
Is there an underlying network model?
In contrast to nearly every other method that we have discussed in this class, this method does not utilize a biological network such as a protein-protein or protein-DNA interaction network
– Can we incorporate such a network into this method?
Are coverage and exclusivity the best metrics for finding driver mutations?
Does their objective function correctly capture coverage and exclusivity?
What other methods could they have tried in order to solve their combinatorial optimization problem?
How can this method be validated with biological experiments?