# HYPE: Hybrid method for parameter estimation in biochemical models. Anne Poupon, Biology and Bioinformatics of Signalling Systems, PRC, Tours, France.


The question. [Diagram: reaction network linking species A, B, C and y0 to y3 through rate constants k0 to k4.] How can we simulate the evolution of the different quantities as a function of time?

The question. [Diagram: A and B.] This is the topology of the model, also called the static model, inference graph or influence graph.

The question. [Diagram: A and B linked by rate constants k0 and k1.] The dynamical model is the topology plus time-evolution rules, written as ordinary differential equations (ODEs) using the mass action law. [Diagram: the same scheme extended with a third species C.]

The question. [Diagram: A and B linked by k0 and k1.] k0 = 1, k1 = 1, [A](0) = 10, [B](0) = 0.

The question. [Diagram: A and B linked by k0 and k1.] k0 = 2, k1 = 1, [A](0) = 10, [B](0) = 0.
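
A minimal sketch of such a simulation for the two parameter sets above. It assumes that the diagram denotes the reversible reaction A ⇌ B, with k0 as the forward rate and k1 as the backward rate (the arrow directions are not recoverable from the transcript), and it uses a fixed-step Runge-Kutta scheme chosen only for illustration. The function name `simulate` and its arguments are hypothetical.

```python
def simulate(k0, k1, a0=10.0, b0=0.0, t_end=5.0, steps=500):
    """Integrate d[A]/dt = -k0*A + k1*B and d[B]/dt = k0*A - k1*B (assumed topology)."""
    def rhs(a, b):
        flux = k0 * a - k1 * b  # net mass-action flux from A to B
        return -flux, flux

    h = t_end / steps
    a, b = a0, b0
    trajectory = [(0.0, a, b)]
    for i in range(1, steps + 1):
        # classical 4th-order Runge-Kutta step
        da1, db1 = rhs(a, b)
        da2, db2 = rhs(a + 0.5 * h * da1, b + 0.5 * h * db1)
        da3, db3 = rhs(a + 0.5 * h * da2, b + 0.5 * h * db2)
        da4, db4 = rhs(a + h * da3, b + h * db3)
        a += h / 6.0 * (da1 + 2 * da2 + 2 * da3 + da4)
        b += h / 6.0 * (db1 + 2 * db2 + 2 * db3 + db4)
        trajectory.append((i * h, a, b))
    return trajectory


if __name__ == "__main__":
    for k0, k1 in [(1.0, 1.0), (2.0, 1.0)]:
        t, a, b = simulate(k0, k1)[-1]
        print(f"k0={k0}, k1={k1}: [A]({t:.1f})={a:.3f}, [B]({t:.1f})={b:.3f}")
```

Under this assumed topology, changing k0 from 1 to 2 moves the equilibrium from [A] = [B] = 5 to [A] = 10/3, which is the kind of difference the two slides illustrate.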

The question. [Diagram: A and B linked by k0 and k1.] What if we don't know the values of the parameters? Starting from experimental values, we try to find k0 and k1 such that the simulated curves fit the experimental values. That's parameter estimation!

The question. Iterative methodology: start from initial k0, k1 → simulate → compare with experimental values (objective function) → change k0, k1 and simulate again, until the fit is good enough. Done!
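
The loop described above can be sketched as follows. This is not the HYPE algorithm: it is a plain random local search used as a stand-in, it assumes the reversible A ⇌ B reading of the diagram (for which [A](t) has a closed form when [B](0) = 0), and all function names are hypothetical.

```python
import math
import random


def simulate_a(k0, k1, times, a0=10.0):
    """Closed-form [A](t) for the assumed reaction A <=> B with [B](0) = 0."""
    a_eq = a0 * k1 / (k0 + k1)
    return [a_eq + (a0 - a_eq) * math.exp(-(k0 + k1) * t) for t in times]


def objective(params, times, observed):
    """Mean relative difference between simulated and observed [A]."""
    simulated = simulate_a(params[0], params[1], times)
    return sum(abs(s - o) / o for s, o in zip(simulated, observed)) / len(times)


def fit(times, observed, start=(1.0, 1.0), iterations=5000, seed=0):
    rng = random.Random(seed)
    best, best_err = start, objective(start, times, observed)
    for _ in range(iterations):
        # "Change k0, k1": multiplicative random step around the current best
        candidate = tuple(k * math.exp(rng.gauss(0.0, 0.1)) for k in best)
        err = objective(candidate, times, observed)
        if err < best_err:  # keep the change only if the fit improves
            best, best_err = candidate, err
    return best, best_err


if __name__ == "__main__":
    times = [0.1 * i for i in range(1, 21)]
    observed = simulate_a(2.0, 1.0, times)  # synthetic "experimental" values
    print(fit(times, observed))
```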

The question. [Diagram: A and B linked by k0 and k1.] Why do we want to simulate? If we can find the parameters of the model with sufficient precision, we can simulate its behavior in any condition without doing the experiments!

The question. In order to parameterize biochemical models, we need a parameter estimation method that is: fast, so that different topologies can be explored; robust, so that we can make sure the parameter set found is correct and unique; flexible, so that different types of data can be used (dose-response, time series, relative measurements, etc.). However...

Test models. In order to develop the method, we need a benchmark. We cannot use real systems because their parameters are unknown, so we use synthetic models. Synthetic models also allow us to evaluate the influence of experimental uncertainty. We designed the different models to evaluate the importance of 2 features: the number of molecular species, and the range between the smallest and the biggest parameter values.

Test models. [Diagram: species A, B, C and y0 to y3, rate constants k0 to k4.] 4 equations, 8 parameters. The remaining parameters are k5 = A + y0, k6 = B + y1 + y2, k7 = C + y3. Model 1: 1 for all parameters. Model 2: parameters from 2×10^-3 to 1.28×10^2 (5 logs). Model 3: parameters from 5×10^-7 to 1×10^5 (12 logs).

Test models. Model 4: 5 equations, 14 parameters. [Diagram: species A, B, C, D and y0 to y4, rate constants k0 to k9.] The remaining parameters are k10 = A + y0, k11 = B + y1, k12 = C + y2 + y3, k13 = D + y4.

Test models. Model 5: 10 equations, 27 parameters. [Diagram: species A to G and y0 to y9, rate constants labelled k0 to k18.] The remaining parameters are k20 = A + y0, k21 = B + y1 + y2, k22 = C + y3, k23 = D + y4 + y5, k24 = E + y6 + y7, k25 = F + y8, k26 = G + y9.

Test models. Model 6: 16 equations, 42 parameters. k34 = A + y0, k35 = B + y1 + y2, k36 = C + y3 + y4, k37 = D + y5 + y6 + y7, k38 = E + y8 + y9, k39 = F + y10, k40 = G + y11 + y12, k41 = H + y13 + y14 + y15.

Principle. Theoretical parameters → ODE integration → concentrations at chosen time points → add perturbation (pert) → « experimental » data → parameter estimation (HYPE). Two errors are reported: the observed error and the real error.
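
A hedged sketch of the "add perturbation" step. The slide does not say how the perturbation is drawn; the multiplicative Gaussian noise used here, the function names and the example numbers are all assumptions made for illustration.

```python
import random


def perturb(values, stdev, seed=0):
    """Build « experimental » data: each value is multiplied by a factor drawn
    from a Gaussian of mean 1 and standard deviation `stdev` (assumed noise model)."""
    rng = random.Random(seed)
    return [v * rng.gauss(1.0, stdev) for v in values]


def mean_relative_error(simulated, reference):
    """Average relative difference between two series of the same length."""
    return sum(abs(s - r) / abs(r) for s, r in zip(simulated, reference)) / len(reference)


if __name__ == "__main__":
    true_values = [10.0, 7.5, 6.0, 5.2, 5.0]   # hypothetical concentrations
    data = perturb(true_values, stdev=0.1)      # what the estimator gets to see
    simulated = [10.0, 7.4, 6.1, 5.3, 5.0]      # hypothetical fitted model output
    print("observed error:", mean_relative_error(simulated, data))
    print("real error:", mean_relative_error(simulated, true_values))
```

The usage lines show the distinction made later in the talk: the observed error is computed against the perturbed data, the real error against the unperturbed values.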

Objective function. The objective function is what we need to minimize. It is built from the normalized difference between observed and simulated values, in %: summed over all the time points of an observable, divided by the number of time points for this observable, then summed over the observables and divided by the number of observables. It also depends on the standard deviation.
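
The formula itself did not survive in the transcript. The sketch below follows only the verbal description: the normalization by the observed value is an assumption, the result is returned as a fraction rather than a percentage, and the term that depends on the standard deviation is left out because its form is not recoverable. The function name and the dictionary layout are hypothetical.

```python
def objective(observed, simulated):
    """observed, simulated: dict mapping an observable name to the list of its
    values at that observable's time points."""
    total = 0.0
    for name, obs_values in observed.items():
        sim_values = simulated[name]
        # normalized differences, summed over the time points of this observable
        summed = sum(abs(s - o) / abs(o) for s, o in zip(sim_values, obs_values))
        total += summed / len(obs_values)  # divided by the number of time points
    return total / len(observed)           # divided by the number of observables


if __name__ == "__main__":
    observed = {"A": [10.0, 5.0], "B": [2.0]}
    simulated = {"A": [11.0, 5.0], "B": [1.0]}
    print(objective(observed, simulated))
```

Dividing by the number of time points per observable keeps a densely sampled observable from dominating a sparsely sampled one.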

Optimization method. Now we need an optimization method. Evolutionary methods are usually very good at finding extrema in a large and rough solution space! The most popular is the genetic algorithm.

Genetic algorithm. [Figure sequence in the (Parameter 1, Parameter 2) plane: 1 individual = (x, y); mutation; cross-over; new parents.]

Genetic algorithm. Cross-over: global exploration. Mutation: local exploration. After enough generations, all the individuals are close to the global minimum.
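
A minimal sketch of a genetic algorithm with these ingredients (selection, cross-over, mutation, new parents). It is a generic textbook variant, not the implementation used in HYPE; the population size, mutation width and test function are hypothetical choices.

```python
import random


def genetic_minimize(f, bounds, pop_size=40, generations=100, seed=0):
    """Minimize f over a box; an individual is a list with one value per parameter."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=f)
        parents = population[: pop_size // 2]  # selection: keep the best half
        children = []
        while len(parents) + len(children) < pop_size:
            p, q = rng.sample(parents, 2)
            # cross-over (global exploration): each parameter comes from one parent
            child = [rng.choice(pair) for pair in zip(p, q)]
            # mutation (local exploration): small Gaussian move, clipped to the box
            child = [min(hi, max(lo, x + rng.gauss(0.0, 0.05 * (hi - lo))))
                     for x, (lo, hi) in zip(child, bounds)]
            children.append(child)
        population = parents + children        # new parents for the next generation
    return min(population, key=f)


if __name__ == "__main__":
    def bowl(p):
        return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2

    best = genetic_minimize(bowl, [(-5.0, 5.0), (-5.0, 5.0)])
    print(best, bowl(best))
```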

Genetic algorithm.

| Model | Best error |
| --- | --- |
| Model 1 | 0.0045 |
| Model 2 | 0.0076 |
| Model 3 | 0.00018 |
| Model 4 | 0.026 |
| Model 5 | 0.083 |
| Model 6 | 0.17 |

Not so bad... but not that good!

CMA-ES. [Figure sequence in the (Parameter 1, Parameter 2) plane: new parent as a weighted average; best direction; new generation.]
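
To make the "weighted average" and "new generation" steps concrete, here is a deliberately simplified evolution strategy with weighted recombination. It is not CMA-ES: the covariance matrix adaptation that learns the "best direction" is omitted, and the step-size rule (grow on success, shrink on failure) is an assumption made to keep the sketch short. All names and constants are hypothetical.

```python
import math
import random


def weighted_es(f, start, sigma=1.0, lam=12, generations=300, seed=0):
    rng = random.Random(seed)
    mu = lam // 2
    raw = [math.log(mu + 0.5) - math.log(i + 1) for i in range(mu)]
    weights = [w / sum(raw) for w in raw]  # better-ranked offspring weigh more
    mean, f_mean = list(start), f(start)
    for _ in range(generations):
        # new generation: isotropic Gaussian samples around the current parent
        offspring = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean] for _ in range(lam)]
        offspring.sort(key=f)
        # new parent: weighted average of the mu best offspring
        candidate = [sum(w * child[d] for w, child in zip(weights, offspring))
                     for d in range(len(mean))]
        f_candidate = f(candidate)
        if f_candidate < f_mean:
            mean, f_mean = candidate, f_candidate
            sigma *= 1.2   # success: allow larger steps
        else:
            sigma *= 0.8   # failure: search closer to the parent
    return mean, f_mean


if __name__ == "__main__":
    def bowl(p):
        return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2

    print(weighted_es(bowl, [4.0, 3.0]))
```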

Genetic algorithm.

| Model | Genetic Algorithm | CMA-ES |
| --- | --- | --- |
| Model 1 | 0.0045 | 0 |
| Model 2 | 0.0076 | 0 |
| Model 3 | 0.00018 | NC |
| Model 4 | 0.026 | NC |
| Model 5 | 0.083 | NC |
| Model 6 | 0.17 | NC |

When CMA-ES converges it's very good, but...

Hybrid method. Let's try to combine them! The genetic algorithm always converges, but not very close to the solution. CMA-ES doesn't converge often, but when it does, it gets very close.

Hybrid method. Four ways of chaining GA runs and CMA-ES runs, keeping the best CMA-ES result each time: (i) One/Best: GA runs → CMA-ES runs → best; (ii) Average/Best: GA runs → average → CMA-ES runs → best; (iii) Median/Best: GA runs → median → CMA-ES runs → best; (iv) Best/Best: GA runs → best → CMA-ES runs → best.
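
One possible reading of these four schemes is that they differ only in how the CMA-ES starting point is derived from the GA results. The sketch below encodes that reading; it is an interpretation of the diagram, not a confirmed description of HYPE, and the function name, argument layout and strategy labels are hypothetical.

```python
import statistics


def cma_start_point(ga_results, strategy):
    """ga_results: list of (error, parameters) pairs, one per independent GA run."""
    parameter_sets = [params for _, params in ga_results]
    if strategy == "one":       # (i) take a single GA result
        return list(parameter_sets[0])
    if strategy == "average":   # (ii) parameter-wise average over the GA runs
        return [statistics.mean(column) for column in zip(*parameter_sets)]
    if strategy == "median":    # (iii) parameter-wise median over the GA runs
        return [statistics.median(column) for column in zip(*parameter_sets)]
    if strategy == "best":      # (iv) parameters of the GA run with the lowest error
        return list(min(ga_results, key=lambda run: run[0])[1])
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    runs = [(0.3, [1.0, 10.0]), (0.1, [2.0, 20.0]), (0.2, [6.0, 30.0])]
    for name in ("one", "average", "median", "best"):
        print(name, cma_start_point(runs, name))
```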

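Hybrid method.

| Model | Genetic Algorithm | CMA-ES | Hybrid |
| --- | --- | --- | --- |
| Model 1 | 0.0045 | 0 | 0 |
| Model 2 | 0.0076 | 0 | 0 |
| Model 3 | 0.00018 | NC | 0 |
| Model 4 | 0.026 | NC | 2×10^-6 |
| Model 5 | 0.083 | NC | 6.3×10^-4 |
| Model 6 | 0.17 | NC | 1.4×10^-4 |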

Hybrid method. What happens if we have uncertainties on measured values?

| Model | stdev = 0 | stdev = 0.1 | stdev = 0.2 |
| --- | --- | --- | --- |
| Model 1 | 0 | 2.4×10^-3 | 1×10^-2 |
| Model 2 | 0 | 1×10^-4 | 5.4×10^-3 |
| Model 3 | 0 | 6×10^-5 | 1.1×10^-3 |
| Model 4 | 2×10^-6 | 2.7×10^-2 | 3.5×10^-2 |
| Model 5 | 6.3×10^-4 | 3×10^-2 | 7×10^-2 |
| Model 6 | 1.4×10^-4 | 2.6×10^-2 | 4.7×10^-2 |

Real errors: simulations are compared with the real values, not the perturbed ones. The error is significantly lower than the uncertainty on the experimental data.

Comparison with other methods. Protocol: 1 random parameter set, 1 experimental data set. EP ×10, GA ×10, Praxis ×10, PS ×10, HJ, LM, SD: best set of each method (Copasi error), giving 7 parameter sets, then best set among them (Copasi error). SSmGo ×10: best set (SSmGo error). HYPE ×10: best set. The retained sets are compared using the HYPE error.

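Comparison with other methods.

| Method | M1 | M2 | M3 | M4 | M5 | M6 |
| --- | --- | --- | --- | --- | --- | --- |
| EP | 0.039 | 1.3 | 1.53 | 0.64 | 0.1 | 0.56 |
| GA-SR | 0.034 | 0.36 | 0.56 | 0.89 | 0.15 | 0.31 |
| HJ | 0.042 | 0.4 | 3.32 | 0.96 | 0.093 | 0.086 |
| LM | 0.039 | 0.087 | 0.077 | 0.61 | 0.079 | 1.56 |
| Praxis | 0.039 | 0.72 | 5.57 | 0.67 | 0.1 | 0.35 |
| P Swarm | 0.039 | 0.37 | 0.19225 (M3 and M4 run together in the transcript) | | 0.07 | 0.2 |
| SD | 0.025 | 1.93 | 15.42 | 0.62 | 0.47 | 1.36 |
| SSmGo | 0.039 | 0.057441 (M2 and M3 run together in the transcript) | | 0.041 | 0.07 | 7.93 |
| HYPE | 0.011 | 0.0054 | 0.0011 | 0.035 | 0.07 | 0.047 |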

Comparison with other methods. Simulation in control conditions (model 2). [Figure: simulated curves for A, y0, B, y1, y2, C, y3.] Errors: Copasi 0.088, HYPE 0.0054.

Comparison with other methods. Simulation in perturbed conditions (k1/100). [Figure: simulated curves for A, y0, B, y1, y2, C, y3.] Errors: Copasi 0.17, HYPE 0.06.

Comparison with other methods. [Bar chart with significance marks *, **, ***; two bars carry the value labels 0.61 and 7.93.] For all 6 models HYPE is significantly more predictive. If the model is good enough, it is predictive!

A real model... Control of the balance between G protein and β-arrestin pathways at the angiotensin receptor. Heitzler et al. Competing G protein-coupled receptor kinases balance G protein and β-arrestin signaling. Molecular Systems Biology. 01/2012; 8:590.

A real model... What is the situation? 11 equations. 3 observables: DAG in control conditions; PKC in control conditions; ERK in control conditions + 4 perturbed conditions. 32 unknown parameters for control conditions. If we use only control conditions, we don't have enough data! So we use the perturbed conditions, but then we have 55 ODEs and 36 unknown parameters.

A real model... [Figure panels: DAG; PKC; ERK ctl + Ro; ERK ctl + si barr2; ERK ctl + si GRK23; ERK ctl + si GRK56.]

A real model... Is the model predictive? Make a prediction... then do the experiment!

A real model... [Figure: 4 parameter sets with very low error; the plot marks the highest and lowest simulated values.]

A real model... Are estimated parameters reliable? [Figure: estimations, axis log(k10).] Darker bar: value of the parameter in the set. Colored region: values of the parameter for which the error remains small (< 3 fold). For k10, the value is the same in all sets.

A real model

A second type of behavior: different values in the different sets, but we could take 0 in the 4 sets.

A real model. 25 parameters have the same values in the 4 sets. 8 parameters have upper bounds. The 2 remaining parameters have the same values in 3 of the 4 sets.

What next? 3 problems remain. Identifiability of the parameters: can we do something more formal (and more generic!)? Convergence efficiency: only 4 good parameter sets out of 60 estimations. Computation time: one optimisation of the model takes about 3 weeks on a single core.

Identifiability. [Figure: log(observed error) as a function of log(k5); k5 = 0.002.]

Identifiability. k0 = 0.07, k1 = 128, k2 = 24, k3 = 1, k4 = 0.6, k5 = 0.002, k6 = 0.01, k7 = 1.

When we use unperturbed data, we can reach very small errors, and the estimated parameter values are very close to the expected ones. The problem is that in the real world we don't have unperturbed data!

Identifiability Expected 0.0655

Identifiability Expected 1×10^5

Identifiability. Let's come back to the equations... p2 appears in many places, whereas p5 appears only once, as a product with p0.

Identifiability Expected 0.05

Identifiability. Can we do something more formal? Can we express the variation of the error as a function of the variation of the parameters? Write the Taylor expansion: if t is close to a point a, then the concentrations at a + t are:
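
The formula shown on the slide is missing from the transcript. For reference, the standard Taylor series it refers to can be written as below, where the symbol $x_i$ for the concentration of species $i$ is an assumed notation and not necessarily the one used in the talk:

$$x_i(a+t) \;=\; \sum_{n=0}^{\infty} \frac{t^{n}}{n!}\,\frac{d^{n}x_i}{dt^{n}}(a)$$

The series represents $x_i(a+t)$ only for $|t|$ smaller than its radius of convergence.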

Taylor expansion Since we use only mass action laws, each ODE can be written as: Using the ODEs we get:

Taylor expansion. t has to be close to a, but how close? Within the convergence radius. We cannot compute the exact convergence radius, but we can have an upper bound: [equation]. Using [equation], we can show that [equation].

Taylor expansion. This result is very interesting because this upper bound depends neither on i nor on a! Why not use it to integrate the ODEs? ODE integration is the most CPU-consuming task in the process...
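
A hedged sketch of what integrating a mass-action ODE with a Taylor series can look like. It is not the scheme used in HYPE: it treats a single hypothetical reaction A + B → C with rate k[A][B], builds the Taylor coefficients recursively from the ODE (the product [A][B] becomes a Cauchy product of coefficient lists), and advances with a fixed step chosen by hand rather than from a bound on the convergence radius.

```python
def taylor_coefficients(a0, b0, k, order):
    """Taylor coefficients of [A](t) and [B](t) around the current point
    for d[A]/dt = d[B]/dt = -k*[A]*[B]."""
    a, b = [a0], [b0]
    for n in range(order):
        # n-th coefficient of the product [A]*[B] (Cauchy product)
        product_n = sum(a[j] * b[n - j] for j in range(n + 1))
        next_coefficient = -k * product_n / (n + 1)
        a.append(next_coefficient)
        b.append(next_coefficient)
    return a, b


def evaluate(coefficients, h):
    """Evaluate the truncated series at h with Horner's rule."""
    result = 0.0
    for c in reversed(coefficients):
        result = result * h + c
    return result


def integrate(a0, b0, k, t_end, h, order=20):
    """Advance from 0 to t_end with steps of size h, re-expanding at every step."""
    a, b = a0, b0
    for _ in range(round(t_end / h)):
        coeffs_a, coeffs_b = taylor_coefficients(a, b, k, order)
        a, b = evaluate(coeffs_a, h), evaluate(coeffs_b, h)
    return a, b


if __name__ == "__main__":
    # With [A](0) = [B](0) = 1 and k = 1 the exact solution is [A](t) = 1 / (1 + t)
    print(integrate(1.0, 1.0, 1.0, t_end=1.0, h=0.1))
```

In this symmetric test case the series around t = 0 has convergence radius 1/(k[A](0)), so a step of 0.1 is well inside it, while a single step larger than 1 would diverge. This is the situation the question "how close?" is about.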

Taylor expansion. Computation times (min) per good optimisation, Rosenbrock then Taylor, with the two values run together in the transcript: M1 10119.5; M2 5738094; M3 6045273; M4 570533; M5 55151866; M6 8713418158. Globally very satisfying! Less computation time and more convergence. But...

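Taylor expansion. When the convergence radius becomes too small, Rosenbrock is more efficient. Hybrid: if R < 0.05, go back to Rosenbrock.

| Model | Rosenbrock, Taylor (run together in the transcript) | Hybrid |
| --- | --- | --- |
| M1 | 10119.5 | 23 |
| M2 | 5738094 | 1611 |
| M3 | 6045273 | 270 |
| M4 | 570533 | 170 |
| M5 | 55151866 | 1498 |
| M6 | 8713418158 | 6888 |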

« Drifting » parameters What about the « drifting » parameters ?

« Drifting » parameters. We can evaluate the difference between the simulations made using the 2 parameter sets at a given time using a Taylor expansion: [equation]. By definition: [equation]. We can compute the first term: [equation].

« Drifting » parameters. For model 3, i = 0, this term is: [equation]. If p1 doesn't vary and the product p0 p5 keeps its value, then, since p0 is small (5×10^-7), this term is negligible as long as [expression lost in the transcript] remains small. Consequently p0 « wanders » between 10^-11 and 0.1 while p5 varies between 1 and 10^8, but the product reaches the expected value. However, the product cannot be null, or nothing happens in the system!
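
A toy illustration of this drifting, under stated assumptions. The equation dy/dt = p0 (p5 − y) − p1 y used below is hypothetical and is not model 3; it only reproduces the structure described above, where p5 enters through a product with a small p0. The value p1 = 1 is also an assumption. The expected values quoted in the talk (p0 = 5×10^-7, p5 = 1×10^5) give the product 0.05.

```python
import math


def y(t, p0, p1, p5):
    """Closed-form solution of dy/dt = p0*(p5 - y) - p1*y with y(0) = 0 (toy equation)."""
    rate = p0 + p1
    return p0 * p5 / rate * (1.0 - math.exp(-rate * t))


def max_relative_difference(set_a, set_b, times):
    """Largest relative gap between the two simulated curves over the time points."""
    return max(abs(y(t, *set_b) - y(t, *set_a)) / y(t, *set_a) for t in times)


if __name__ == "__main__":
    times = [0.5 * i for i in range(1, 11)]
    # same product p0*p5 = 0.05, p0 small in both sets: the curves are almost identical
    print(max_relative_difference((5e-7, 1.0, 1e5), (5e-9, 1.0, 1e7), times))
    # same product, but p0 no longer small: the curves differ visibly
    print(max_relative_difference((0.5, 1.0, 0.1), (0.005, 1.0, 10.0), times))
```

As long as p0 stays small compared with the other rates, only the product p0 p5 shapes the curve, so an optimizer has no reason to prefer one pair over another.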

Taylor expansion. When one parameter reaches very high values, the convergence radius gets very small. Consequently, we can limit this parameter drifting by adding a small penalty to the error when the convergence radius becomes too low.

| Model | Rosenbrock, Taylor (run together in the transcript) | Hybrid | Hyb + pen |
| --- | --- | --- | --- |
| M1 | 10119.5 | 23 | 17 |
| M2 | 5738094 | 1611 | 1330 |
| M3 | 6045273 | 270 | 280 |
| M4 | 570533 | 170 | 112 |
| M5 | 55151866 | 1498 | 941 |
| M6 | 8713418158 | 6888 | 6627 |

Conclusions. It is possible to reliably estimate the parameters of a model, even in cases where: the standard deviation on experimental data is large; the number of observables is small; the parameters have very different values. In practice the models are identifiable, thanks to mass action! The problem of drifting can be identified and contained to a certain extent. We can get to errors small enough that the model becomes predictive.

People... BIOS group: Domitille Heitzler, Eric Reiter, Pascale Crépieux, Astrid Musnier, Kelly Leon, Guillaume Durand, Nathalie Gallay-Langonne, Laurence Dupuy, Christophe Gauthier, Vincent Piketty. LRI, Orsay: Jérôme Azé, Nikolaus Hansen. INRIA Rocquencourt: Frédérique Clément, François Fages, Aurélien Rizk. Duke University: Robert J. Lefkowitz, Seungkirl Ahn, Jihee Kim, Jonathan D. Violin.
