Simulating Biological Systems in the Stochastic Pi-calculus

This talk presents some ongoing work on a programming language for biological systems. Most of this work was done in collaboration with Luca Cardelli.

Abstract: This talk presents a programming language for simulating computer models of biological systems. The language is based on a computational formalism known as the pi-calculus, and the simulation algorithm is based on standard kinetic theory of physical chemistry. The language will first be presented using a simple graphical notation, which will subsequently be used to model and simulate a range of biological systems, including a genetic oscillator and a peptide editing pathway of the immune system. One of the benefits of the language is its scalability: large models of biological systems can be programmed from simple components in a modular fashion.

Andrew Phillips with Luca Cardelli
Microsoft Research, Cambridge UK

Biological Computing
In many ways, biological systems are like massively parallel, highly complex, error-prone computer systems. For example, the genetic code of a biological organism is stored in digital form as DNA, much like computer code, except that instead of 0s and 1s there are four letters: G, A, T and C. The code is located inside every cell of the organism. A special protein travels along the DNA to read the code, much like a computer reads a sequence of instructions in a program. Not all of the DNA is read at once. Instead, the DNA is divided into sections or genes, in the same way that a computer program is divided into modules. In general, each gene produces a separate strand of RNA, which is more accessible, like loading a module into memory. The RNA produces protein, which folds into different shapes to perform the essential functions of a living organism. Some proteins can decide which genes are switched on and off, and when. They do this by interacting with the DNA, in the same way that a computer program decides what code to execute next in response to signals from its environment.

Modelling Biology
The Human Genome Project:
Map out the complete genetic code in humans
To understand and predict gene and protein behaviour
Like reading the source code of a computer program...
But the functional meaning of the code is still a mystery!
Systems Biology:
Understand and precisely describe the behaviour of biological systems
Two complementary approaches:
Look at experimental results and infer general system properties
Build detailed models of systems and test these in the lab
Biological Modelling:
Conduct virtual experiments, saving time and money
Need tools for modelling complex parallel systems
Should also scale up to very large systems
The beginnings of a biological programming language...

The Human Genome Project had the ambitious goal of mapping out the complete genetic code in humans. Scientists had hoped that by discovering this code, they would be able to understand and predict the behaviour of genes and proteins in the human body. A bit like reading the source code of a computer program in order to find out what it does. Instead, what they got was a long sequence of letters that was difficult to understand. Like studying a sequence of 0s and 1s from a computer program, and trying to find out what the program does. They did not know the set of instructions that the genetic code represented. It was then that the discipline of Systems Biology emerged, whose ultimate goal is to decipher the code of the human genome, in order to understand how we function as a complex system.

Systems Biology can be broadly divided into two complementary approaches. On the one hand, scientists are doing experiments in the lab and studying the results, in order to infer general properties of biological systems. On the other hand, scientists are trying to build detailed models of systems on a computer and then testing these models in the lab. This is the approach that we are focussing on.
Such models can be a powerful tool, since they allow a biologist to simulate a range of experiments on a computer, before testing the most promising ones in the lab, saving considerable time and money. They also clarify how a biological system functions, and are beginning to play a key role in understanding and curing disease. In order to build such models, we need tools that are suitable for modelling complex, parallel biological systems. We also need tools that can scale up to very large systems. This points towards the need for a biological programming language.

Programming Biology
Languages for complex, parallel computer systems:
Languages for complex, parallel biological systems:
stochastic π-calculus

Over the years, there has been considerable research on designing programming languages that are suitable for developing complex parallel computer systems. Interestingly, some of this research is also applicable to biological systems, which are typically highly complex and massively parallel. One example of such a language is the stochastic pi-calculus...

[Computing] In many ways, biological modelling is pushing the boundaries of concurrent programming. From a computer science perspective this is becoming increasingly important, as multi-core processors take over to compensate for limited clock speeds.

Reactions vs. Components
Traditional modelling: model individual reactions
Reaction equations are a de facto standard of biological modelling. In this diagram, each shape represents a protein in the system and each box represents a reaction. For example, the red protein can bind to the yellow protein. Then they can unbind. But what happens if we add a new component, say an orange protein? We need to explicitly state how this protein interacts with every other protein in the system. This can lead to a combinatorial explosion in the number of nodes and edges (e.g. the red protein can bind to the yellow protein in the presence and in the absence of the green protein, and so can the orange protein, etc.).

Traditional modelling: model individual reactions As our model grows we will need to add new proteins, and new reactions.

Large, Connected Reaction Graphs
Eventually we end up with a very large, highly connected reaction graph.

Traditional modelling: model the individual reactions
Stochastic π-calculus: model the components

One way to solve this problem is to decompose a biological system into distinct components, and describe each component separately. This is the approach used by the stochastic pi-calculus. Each component is described by a separate connected graph. For example, we have a graph for the red protein, and one for the yellow protein. Each node in the graph represents a state of the component, and each labelled edge represents a reaction, which can be either unary or binary. Unary reactions such as degradation are labelled with a reaction name. Binary reactions such as binding and unbinding are labelled with a reaction channel preceded by an input (?) or output (!) prefix. Two components can interact by doing a complementary input and output on the same channel. Each reaction name or channel is associated with a corresponding rate, which denotes the speed of the reaction.

Instead of explicitly saying which protein interacts with which other protein, we describe the channels over which a protein can interact. This adds a layer of abstraction to the model, where interactions are determined by looking at edges with complementary inputs and outputs. [Biology: If we think in terms of DNA, we know that complementary DNA sequences can interact. So rather than explicitly saying which sequence interacts with which other sequence, we simply say which sequences are complementary.] In the graph we see that the yellow protein has a particular binding site (?bind). We also see that the red protein has a complementary binding site (!bind), and so does the orange protein. So we deduce that the yellow protein can interact with both the red and orange proteins.
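The matching of complementary inputs and outputs can be sketched in a few lines of Python. This is only an illustration and not part of SPiM: the `components` dictionary and the `possible_interactions` function are hypothetical names, and each protein is reduced to the set of channels it outputs on and the set it inputs on.

```python
from itertools import product

# Hypothetical description of the three proteins from the slide.
# "out" holds channels used with an output (!), "in" channels used with an input (?).
components = {
    "red": {"out": {"bind"}, "in": set()},
    "orange": {"out": {"bind"}, "in": set()},
    "yellow": {"out": set(), "in": {"bind"}},
}

def possible_interactions(comps):
    """Return sorted (sender, receiver, channel) triples with complementary actions."""
    found = []
    for (sender, s_desc), (receiver, r_desc) in product(comps.items(), repeat=2):
        if sender == receiver:
            continue  # this sketch ignores two copies of the same component
        # A channel supports an interaction if one side outputs and the other inputs.
        for channel in sorted(s_desc["out"] & r_desc["in"]):
            found.append((sender, receiver, channel))
    return sorted(found)

interactions = possible_interactions(components)
print(interactions)
```

Adding the orange protein only required one new entry in `components`; the set of interactions was derived rather than listed by hand, which is the point of the component-based view.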

Compositional Modelling
Build complex models incrementally, by direct composition of simpler components.
This allows new components to be added in parallel, without modifying the existing system. As a result, large and complex systems can be defined incrementally, by direct composition of simpler components. Furthermore, we can modify a particular component in a modular way, without modifying the rest of the graph. We can compare this approach to modular programming of computer systems. For small programs it is OK to use linear code. But as programs become larger, the code becomes more difficult to maintain, and modules become indispensable. In the long term, the use of modules in biological modelling could have a similar impact as the use of modules in computer programming.

Model Analysis
A formal programming language
Analysis techniques (types, equivalences, model-checking)
Could help provide insight into fundamental properties of biological systems

The pi-calculus is also a mathematical programming language. There have been decades of research on analysis techniques for the pi-calculus. Many of these techniques are directly applicable to biological modelling, and could help provide insight into fundamental properties of biological systems.

Equivalent Models Can we replace one model with another?
For example, we can use a notion of equivalence to help a modeller decide: when is it ok to replace one model with another? This can be a useful programming tool

Related Work
Stochastic π-calculus proposed by [Priami, 1995]
Used to simulate a range of biological systems:
RTK MAPK pathway [Regev et al., 2001]
Gene Regulation by positive Feedback [Priami et al., 2001]
Cell Cycle Control in Eukaryotes [Lecca and Priami, 2003]
First simulator for stochastic π-calculus [BioSPI]:
A subset of π-calculus with limited choice
Compiles a calculus process to an FCP procedure
Executed by the FCP Logix platform [Silverman et al., 1987]

There has been considerable work on using the stochastic pi-calculus to model biological systems. We are building on this work in our current research.

Graphical Stochastic π-calculus
Display stochastic π-calculus models as graphs [Phillips and Cardelli, 2005]
Helps make the stochastic π-calculus more accessible
Defined a graphical calculus and graphical execution model
Proved equivalent to the stochastic π-calculus [Phillips, Cardelli and Castagna, 2006]

One of our projects was to develop a graphical representation for the stochastic pi-calculus. The main objective was to help make the calculus more accessible to biologists. We defined a graphical calculus and graphical execution model, and proved its equivalence with the stochastic pi-calculus. We also defined translations from graphs to code and back. This allows the graphical and textual representations to be used interchangeably. For example, instead of reading the code on the left, we can look at the diagram on the right, which was automatically generated from the code.

let gLow() = (Low() | gLow())
and Low()= ( new do !pep(low); ?low; Low() or )
let gMed() = (Med() | gMed())
and Med()= ( new do !pep(med); ?med; Med()
let gHigh() = (High() | gHigh())
and High()= ( new do !pep(high); ?high; High()
let gMHC() = | MHCo() )
and gTPN() = gTPN() | TPN() )
and TPN()= ( new do !tpn(uT); ?uT; TPN() or
and MHCo()= do ?pep(u);MHCc_pep(u) or ?tpn(uT);MHCo_TPN(uT) or
and MHCc_pep(u:chan) = do MHCe_pep(u) or !u; MHCo()
and MHCe_pep(u:chan) =
and MHCo_TPN(uT:chan)= do ?pep(u)*a; MHCc_TPN_pep(u,uT) or !uT*v; MHCo()
and MHCc_TPN_pep(u:chan,uT:chan)= do !uT; MHCc_pep(u) or !u*q; MHCo_TPN(uT)

The Stochastic Pi Machine
A simulation algorithm for stochastic π-calculus [Phillips and Cardelli, 2004]
Based on standard theory of chemical kinetics [Gillespie, 1977]
The probability of a reaction is proportional to its rate
Proved correct with respect to the stochastic π-calculus

Another project was to develop a simulation algorithm for the stochastic pi-calculus. The algorithm is based on the standard theory of chemical kinetics. The probability of a particular type of reaction is proportional to the rate of the reaction times the number of reactants. The algorithm was proved correct with respect to the stochastic pi-calculus.
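The selection rule, where the probability of a reaction is proportional to its rate times the number of reactants, can be sketched in Python as follows. This is a minimal illustration in the style of Gillespie's method, not the SPiM implementation: the function names and the dictionary layout are assumptions, and the sketch assumes the two reactants of a binary reaction are different species.

```python
def propensities(reactions, counts):
    """Propensity of each reaction: its rate times the count of every reactant.

    reactions maps a name to (rate, [reactant species]); counts maps species to population.
    Assumption: reactants of one reaction are distinct species.
    """
    result = {}
    for name, (rate, reactants) in reactions.items():
        value = rate
        for species in reactants:
            value *= counts[species]
        result[name] = value
    return result

def choose_reaction(props, u):
    """Pick a reaction with probability proportional to its propensity.

    u is a uniform random number in [0, 1), supplied by the caller.
    Returns None when no reaction is possible.
    """
    total = sum(props.values())
    if total == 0:
        return None
    threshold = u * total
    running = 0.0
    chosen = None
    for name, value in props.items():
        running += value
        chosen = name
        if threshold < running:
            break
    return chosen

# Illustrative populations; the rates are the ionize/deionize rates used later in the talk.
reactions = {
    "ionize": (100.0, ["Na", "Cl"]),
    "deionize": (10.0, ["Na_plus", "Cl_minus"]),
}
counts = {"Na": 2, "Cl": 3, "Na_plus": 1, "Cl_minus": 1}
props = propensities(reactions, counts)
print(props, choose_reaction(props, 0.5))
```

Passing the random number in as `u` keeps the selection step deterministic and easy to check; a simulator would draw it with `random.random()` at every step.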

The SPiM Simulator
Simulation algorithm mapped to functional code (F#)
Used as the basis for implementing the SPiM simulator [Phillips, 2006]
GUI by James Margetson, MSRC
Close correspondence between formal algorithm and functional code
Correct specification improves confidence in simulation results
Used in various research centres (UK, France, Italy, Sweden...)

The simulation algorithm was mapped to functional program code, which was used to implement the SPiM simulator. The simulator is implemented in F#, which was close to the formal specification, allowing rapid implementation. The correctness of the specification improves the confidence in the simulation results. The simulator is now used in various research centres across Europe.

Visualising Simulations in 3D
Generate a 3D view of the interactions Software by Rich Williams, MSRC Used our graphical representation to generate a 3D view of simulations. This is particularly useful for visualising the causality relation between components. Also useful for visualising the large number of interactions, which are calculated on the fly.

Course Outline The Stochastic Pi-calculus Gene Networks
Signalling Networks Immune System Pathway

The Stochastic Pi-calculus
Introductory Tutorial

Calculus Syntax
π ::= ?x(m)                    Input value m on channel x
      !x(n)                    Output value n on channel x
      τr                       Delay at rate r
P ::= π1.P1 + ... + πN.PN      Choice between actions
      P1 | ... | PM            Parallel composition of processes
      X(n)                     Instance of X with arguments n
      new x1,...,xN P          Restriction of names x1,...,xN to P
E ::= X(m) = P                 Definition of X, where fn(P) ⊆ m
      E1, ... ,EN              Union of definitions

The stochastic pi-calculus is essentially a calculus of actions and processes. It was developed for modelling concurrent systems in Computer Science, but can also be used to model biological systems in a natural way. Each process can be used to describe the behaviour of a molecule in the system, such as a gene or protein. The actions describe what the molecules can do. In particular, a molecule can do an input, an output or just a delay.

Delay represents an internal reaction like radioactive decay, or a change of shape. A delay is associated with a rate, which determines the average duration of the reaction, like the rate of radioactive decay, which is used to determine the half-life.

Input and output represent interactions between two molecules, which interact by performing a complementary input and output on the same channel. This can represent two proteins with complementary shapes, or two chemicals that are known to react with each other. Values can also be sent, such as an electron or a phosphate. Note that 3-way reactions are extremely rare: there is a very low probability of three molecules reacting at exactly the same time. Usually two molecules interact, and then a third. The pi-calculus fits this model nicely.

Often a given molecule can do more than one thing, e.g. it can either react with another molecule or decay. This is represented as a choice of actions. We can of course have multiple molecules in parallel, represented as parallel composition, where each parallel process represents a separate molecule. We also want to define different types of molecules, using parameterised “macros” or “procedures”. These are defined in an environment.

Finally, we want to have private bonds, which are used to represent the formation of complexes. These are described later.

Graphical Syntax
Choice: π1.P1 + ... + πN.PN
Parallel: P1 | ... | PM
Instance: X(n), if X(m) = P
Restriction: new x1,...,xN P
Definition: X(m) = P
Union: E1, ... ,EN

We developed a simple graphical representation for the calculus. A choice π1.P1 + ... + πN.PN is displayed as an elliptical node with directed edges to processes P1, ..., PN. Each edge to a process Pi is labelled with an action πi and denotes an alternative execution path in the system. A parallel composition P1 | ... | PN is displayed as a solid rectangular node with directed edges to processes P1, ..., PN. Each edge to a process Pi denotes a concurrent execution path in the system. An instance X(n) of a definition X(m) = P is displayed as a solid rectangular node with a directed edge to process P. The tip of the edge is labelled with the substitution {n/m}. Pairings in the substitution that correspond to the identity function are not shown, and empty substitutions where n = m are omitted altogether. Restricted names are displayed by placing them next to the node of the process. Processes in the environment are displayed next to each other.

Execution: Stochastic Delay
τr.P1 + ... + πN.PN
This shows how we can execute a process of the pi-calculus, presenting the most basic execution rules. Suppose we have a molecule. It can do a number of actions, including an internal action.

Execution: Stochastic Delay
τr.P1 + ... + πN.PN  →  P1

Execution: Interaction
!x(n).P1 + ... + πN.PN | ?x(m).Q1 + ... + πM.QM
Two molecules can interact if they have complementary shape or chemical properties. This is represented as an interaction over a channel.

Execution: Interaction
!x(n).P1 + ... + πN.PN | ?x(m).Q1 + ... + πM.QM  →  P1 | Q1{n/m}

Execution: Binding Interaction
new n (!x(n).P1 + ... + πN.PN) | ?x(m).Q1 + ... + πM.QM

Execution: Binding Interaction
new n (!x(n).P1 + ... + πN.PN) | ?x(m).Q1 + ... + πM.QM  →  new n ( P1 | Q1{n/m} )

Ionization: Na + Cl ⇌ Na+ + Cl-
Chemical examples

let Na() = !ionize; Na_plus()
and Na_plus() = ?deionize; Na()
run Na()

let Cl() = ?ionize; Cl_minus()
and Cl_minus() = !deionize; Cl()
run Cl()

Na can ionize Cl at rate(ionize) = 100 s⁻¹
Cl- can deionize Na+ at rate(deionize) = 10 s⁻¹

Na can ionize Cl by an output on the ionize channel.
run Na()
run Cl()

Cl- can deionize Na+ by an output on the deionize channel.
run Na_plus()
run Cl_minus()

Na and Cl are no longer charged.
run Na()
run Cl()

A number of Na and Cl atoms can be composed in parallel.

One of the Na atoms can ionize one of the Cl atoms.

Additional Na and Cl atoms can interact in parallel.

A Cl- ion can deionize any of the Na+ ions.

These reactions can continue indefinitely...

Virtual Experiment: Na + Cl ⇌ Na+ + Cl-
What happens if we mix 100×Na and 100×Cl? Use a more compact representation to count populations. The colour is proportional to the number of atoms.

One of the Na atoms can ionize one of the Cl atoms.

Additional Na and Cl atoms can interact in parallel.

A Cl- ion can deionize any of the Na+ ions.

Eventually an Equilibrium is reached...

At equilibrium: 100×[Na][Cl] = 10×[Na+][Cl-]
Approximately 24×Na and 76×Na+
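The equilibrium populations follow directly from the balance equation. The sketch below solves it in Python, assuming equal numbers of Na and Cl so that [Na] = [Cl] = x and [Na+] = [Cl-] = total - x; the function name `ionization_equilibrium` is hypothetical.

```python
import math

def ionization_equilibrium(total, k_ionize, k_deionize):
    """Solve k_ionize * x**2 = k_deionize * (total - x)**2 for x = [Na] = [Cl].

    Taking square roots gives x / (total - x) = sqrt(k_deionize / k_ionize).
    Returns (neutral atoms, ions).
    """
    ratio = math.sqrt(k_deionize / k_ionize)
    neutral = total * ratio / (1.0 + ratio)
    return neutral, total - neutral

# 100 atoms of each kind, with the rates from the slides.
na, na_plus = ionization_equilibrium(100, 100.0, 10.0)
print(round(na), round(na_plus))  # prints: 24 76
```

Because ionization is ten times faster than deionization, the balance point lies on the side of the ions: about 24 neutral Na atoms and 76 Na+ ions.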

Binding: H + Cl ⇌ HCl
let H() = new e ( !share(e); H_Bound(e) )
and H_Bound(e) = !e; H()

let Cl() = ?share(e); Cl_Bound(e)
and Cl_Bound(e) = ?e; Cl()

run ( H() | Cl() )

H has a private electron e.
H can share its electron with Cl at rate(share) = 100 s⁻¹
HCl can break its private bond at rate(e) = 10 s⁻¹

H can share its electron with Cl on the share channel.
run ( H() | Cl() )

HCl can break its private bond by synchronising on e.
run new e ( H_Bound(e) | Cl_Bound(e) )

H and Cl are no longer bound.
run ( H() | Cl() )

A number of H and Cl atoms can be composed in parallel.

One of the H atoms can bind with one of the Cl atoms.

Additional H and Cl atoms can bind in parallel.

A single HCl molecule can split into H and Cl atoms.

These reactions can continue indefinitely...

Virtual Experiment: H + Cl ⇌ HCl
As with the previous reaction, we mix 100×H and 100×Cl.

One of the H atoms can bind with one of the Cl atoms.

Additional H and Cl atoms can bind in parallel.

A single HCl molecule can split into H and Cl atoms.

Eventually an equilibrium is reached...

At equilibrium: 100×[H][Cl] = 10×[HCl] Approximately 3×H and 97×HCl
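Unlike the ionization example, the unbinding side of this balance is linear in [HCl], so the equation is a quadratic in the number of free atoms. A small Python sketch, assuming equal numbers of H and Cl so that [H] = [Cl] = x and [HCl] = total - x (the function name `binding_equilibrium` is hypothetical):

```python
import math

def binding_equilibrium(total, k_bind, k_unbind):
    """Solve k_bind * x**2 = k_unbind * (total - x) for free atoms x = [H] = [Cl].

    Rearranged as k_bind*x**2 + k_unbind*x - k_unbind*total = 0, taking the
    positive root of the quadratic. Returns (free atoms, bound molecules).
    """
    a, b, c = k_bind, k_unbind, -k_unbind * total
    free = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return free, total - free

# 100 atoms of each kind, with the rates from the slides.
h, hcl = binding_equilibrium(100, 100.0, 10.0)
print(round(h), round(hcl))  # prints: 3 97
```

Only one HCl molecule is needed for unbinding while binding needs an H and a Cl to meet, so far fewer free atoms remain than in the ionization experiment with the same rates.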

Calculus Syntax
π-calculus: Graphical π-calculus:
[see paper on a graphical representation for the stochastic pi-calculus]

Syntax of Sπ, with processes P, Q, actions π, channels x, y and tuples m, n. In a biological setting, each process typically describes the behaviour of a molecule, such as a gene or protein, and each action describes what a given molecule can do. A delay action τr describes a change in the internal structure of a molecule, such as a radioactive decay or a change in shape. Each delay is associated with a rate r that characterises an exponential distribution. In the case of radioactive decay, the rate determines the half-life of the reaction. Two molecules can interact by performing a complementary input ?x(m) and output !x(n) on a common channel x. This can represent two proteins with complementary shapes, or two chemicals with complementary electronic configurations. In practice, reactions between more than two molecules are extremely rare, since the probability of three or more molecules interacting simultaneously is very low. Thus, the binary interaction model of the stochastic π-calculus fits well with the biological reality. Values m, n can also be sent and received during a reaction, e.g. to represent the transfer of an electron or a phosphate from one molecule to another.

A choice of actions π1.P1 + ... + πN.PN represents the ability of a molecule to react in N different ways, while a parallel composition P1 | ... | PM represents the existence of M molecules in parallel. A definition of the form X(m) = P represents a particular type of molecule X, parameterised by m. The parameters are assumed to contain all of the free names of P, written fn(P) ⊆ m. The definitions are recorded in a constant environment E, which is assumed to contain a single definition for each instance X(n). A process P together with its constant environment E denotes a system in the calculus, written E ⊢ P.

Finally, a restriction new x P is used to represent the formation of complexes between molecules, where a complex of two processes P and Q is modelled as new x (P | Q). The restriction denotes a private channel x on which the two molecules can synchronise to split the complex.

Calculus Semantics
[Figure: equivalence diagram relating P ~ Q and P' ~ Q' through matching actions a]

Interaction labels in Sπ, where fn and bn denote the sets of free and bound names of a label, respectively. Each label denotes an interaction that a given process can perform. The labels for receive ?x(n), send !x(n) and private send !x(y) are defined as in [19]. The label x denotes an interaction on channel x, where the rate of interaction depends on the number of inputs and outputs on the channel. The label keeps track of the channel name so that the rate can be re-calculated whenever new inputs or outputs are added in parallel. Finally, the label r denotes an interaction with constant apparent rate r, such as a stochastic delay or an interaction on a private channel.

Reduction in Sπ. An output !x(n).P can send the value n on channel x and then execute process P (1). An input ?x(m).P can receive a value n on channel x and then execute process P, in which the received value is assigned to m (2). A delay τr.P can perform an internal action with apparent rate r and then execute the process P (3). If a process P can send a value n on channel x and a process Q can receive a value n on channel x then P and Q can interact on x (4). If n is private then the scope of n is extended over the resulting processes, where new n (P' | Q') denotes the formation of a complex between P' and Q' (5). If two processes interact on a private channel x then the apparent rate of the interaction is constant, and is given by R(x, P) (6). Rule (7) allows a private channel to be sent. Finally, rules (8), (9), (10) and (11) allow an action to be performed inside a restriction, a choice, a parallel composition and a definition, respectively. For each of the rules (4), (5) and (10) there exists a symmetric rule (not shown) in which P | Q and P' | Q' are commuted.

Gene Networks
with Luca Cardelli (MSR Cambridge), Ralf Blossey (IRI Lille)

Programming a Biological Clock
This video simulates a computer model of a biological clock, which was engineered in living bacteria. The clock works by producing alternate populations of red, blue and purple proteins. These regular oscillations are used to define a notion of time. Our biological clock works using a similar principle, where a large population of a particular protein is produced every 24 hours.

Gene with Negative Control
Neg(a,b) produces protein b and is blocked by protein a

val transcribe = 0.1
val degrade = 0.001
val unblock = 0.0001
new a@1.0:chan new b@1.0:chan
let Neg(a:chan,b:chan) =
  do delay@transcribe; (Protein(b) | Neg(a,b))
  or ?a; Blocked(a,b)
and Blocked(a:chan,b:chan) = delay@unblock; Neg(a,b)
and Protein(b:chan) =
  do !b; Protein(b)
  or delay@degrade
run Neg(a,b)

transcribe = 0.1, unblock = 0.0001, degrade = 0.001, rate(a,b) = 1.0

The basic building block of our clock is a gene with negative control. The graphical representation on the left of the figure is equivalent to the textual representation on the right.

(bottom) The gene is parameterised by two proteins, a and b. The gene produces protein b, and is blocked by protein a.

(middle) The graph shows the dynamic behaviour of the gene. Each node in the graph represents a separate molecule in a given state. The node shaped like a strand of DNA represents a Neg(a,b). The node shaped like a strand of DNA with a blocked promoter region represents the gene in a blocked state, called Blocked(a,b), and the node shaped like a round blob represents a Protein(b). Each labelled edge in the graph represents a reaction. There are two types of reaction: unary and binary. Unary reactions are labelled with the reaction name. For example, a protein can degrade by doing a degrade reaction. Binary reactions are labelled with the reaction channel preceded by an input (?) or output (!) prefix. For example, a Neg(a,b) can become blocked by doing an input on channel a and a Protein(b) can react by doing an output on channel b. Two molecules can interact by doing a complementary input and output on the same channel. For example, if there was a Protein(a) it could interact with the promoter region of a Neg(a,b), causing the gene to block. This interaction between complementary inputs and outputs allows us to construct networks of genes, as we shall see later on. A branching edge is used to represent the production of a new molecule in parallel. For example, a Neg(a,b) can produce a new protein in parallel by doing a transcribe reaction.

(top) Stochastic behaviour is incorporated into the model by associating each reaction with a corresponding rate. The rate of a unary reaction is equal to the rate of the reaction name. The rate of a binary reaction is equal to the rate of the reaction channel. The rates are used as the basis for a stochastic simulation algorithm, which calculates the probability of all possible reactions at each step and stochastically chooses the next reaction based on these probabilities. What happens if we simulate one copy of the gene?

*** [Type this example into SPiM, hiding the existing text. Add sample and plot directives. Ask students to guess what will happen. Run the simulation.]

[Short version] (middle) The program describes the dynamic behaviour of the gene. Each node in the graph represents a separate molecule in a given state. The gene can be on or off, and the protein can be active or degraded. Each labelled edge in the graph represents a reaction. The gene can produce proteins, which can degrade, or the gene can become blocked and then unblocked.

Gene Simulation Simulation results show evolution over time
Level of protein b fluctuates around 100 A gene by itself produces a steady population of proteins. *** [click on Results icon. Show the sequence of reductions. Mention that the time is calculated based on the rate]

Gene Simulation: 0s
A protein b can be transcribed at rate 0.1

reaction     rate (s⁻¹)
transcribe   0.1
degrade
total        0.1

P(transcribe) = 1

The simulator does one reaction at a time. How does it decide which reaction happens next? The probability of a reaction is proportional to its rate. Add up the rates of all the reactions in the system, in order to calculate the probability. The duration of the reaction is calculated based on the total rate of the system: time = 10*ln(1/n), where 10 = 1/0.1 is the inverse of the total rate and n is a uniformly distributed random number between 0 and 1.
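The duration formula generalises to any total rate: the waiting time is (1/total) × ln(1/n). The Python sketch below is illustrative only; the function name `reaction_duration` is an assumption, and the random number is drawn from (0, 1] so that the logarithm is always defined.

```python
import math
import random

def reaction_duration(total_rate, n):
    """Exponentially distributed waiting time for a system with the given total rate.

    n must be a uniform random number in (0, 1]; the mean duration is 1 / total_rate.
    """
    return (1.0 / total_rate) * math.log(1.0 / n)

def sample_duration(total_rate, rng):
    """Draw one duration, using 1 - random() so that n is never zero."""
    return reaction_duration(total_rate, 1.0 - rng.random())

# With a total rate of 0.1 per second the formula becomes 10 * ln(1/n).
rng = random.Random(0)
samples = [sample_duration(0.1, rng) for _ in range(20000)]
mean_duration = sum(samples) / len(samples)
print(mean_duration)
```

The sample mean should be close to 10 seconds, which matches the time of roughly 10 s at which the first protein appears in the slides that follow.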

Gene Simulation: 10.50401s
Another protein b can be transcribed

reaction     rate (s⁻¹)
transcribe   0.1
degrade      0.001
total        0.101

P(transcribe) = 0.1 / 0.101

Gene Simulation: 19.7126s
And another...

reaction     rate (s⁻¹)
transcribe   0.1
degrade      0.001*2
total        0.102

P(transcribe) = 0.1 / 0.102

Gene Simulation: 26.80166s
A protein b can be degraded at rate 0.001

reaction     rate (s⁻¹)
transcribe   0.1
degrade      0.001*3
total        0.103

P(degrade) = 0.003 / 0.103

Gene Simulation: s
Eventually reach an equilibrium between transcription and degradation

reaction     rate (s⁻¹)
transcribe   0.1
degrade      0.001*2
total        0.102

Gene Simulation: 2980.631s
Equilibrium at about 100 proteins.

reaction     rate (s⁻¹)
transcribe   0.1
degrade      0.001*100
total        0.2

P(transcribe) = 0.1 / 0.2 = P(degrade)
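The rate tables above all follow one pattern, which can be written as a small Python function. This is a sketch for a single unblocked gene only; the constant and function names are hypothetical, while the two rates are the ones given on the slides.

```python
TRANSCRIBE = 0.1   # rate of the transcribe reaction (per second), from the slides
DEGRADE = 0.001    # rate of the degrade reaction per protein (per second), from the slides

def gene_step_probabilities(proteins):
    """Total rate and next-reaction probabilities for one gene and `proteins` copies of b."""
    degrade_total = DEGRADE * proteins      # every protein can degrade independently
    total = TRANSCRIBE + degrade_total
    return {
        "total": total,
        "transcribe": TRANSCRIBE / total,
        "degrade": degrade_total / total,
    }

# Reproduce the rows of the tables for 0, 1, 2, 3 and 100 proteins.
for n in (0, 1, 2, 3, 100):
    p = gene_step_probabilities(n)
    print(n, round(p["total"], 3), round(p["transcribe"], 4), round(p["degrade"], 4))
```

With 100 proteins the two probabilities are equal, so on average the population neither grows nor shrinks, which is why the level of protein b fluctuates around 100.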

Repressilator [Elowitz and Leibler, 2000]
A gene network engineered in live bacteria. Modelled as a simple combination of Neg gates: ( Neg(lac,tet) | Neg(tet,lambda) | Neg(lambda,lac) | Neg(tet,gfp) ) We used our model of a gene to program a biological clock, which was previously engineered in real bacteria. This is a famous experiment, that showed how we can “program” real bacteria by inserting new genes into the bacteria. Done by Elowitz and Leibler. The clock uses a network of three genes similar to the one described previously. An extra gene was also inserted to observe the oscillations. The gene produced a protein that makes the bacteria glow (GFP). Able to build this network using a simple combination of our genes with negative control. Reproduced the observed oscillations in our simulations. *** (We see that GFP population is high when there is no Tet protein, since GFP is inhibited by Tet.) [Details:] This is a famous experiment, that showed how we can “program” real bacteria by inserting new genes into the bacteria. Done by Elowitz and Leibler. Recall that a gene is a sequence of DNA that codes for a particular protein. Each gene has a promoter unit that allows the code to be read. If the promoter region is blocked, say by another protein, the gene can no longer be read and we say that it is “switched off”. Experiment inserts specific genes into bacteria, which produce proteins that “switch each other off”. Here we have three genes, tet, lambda and lac. tet produces a protein that blocks lambda, which in turn produces a protein that blocks lac, which in turn produces a protein that blocks tet. Form a cycle. An additional gene produces a green fluorescent protein (GFP). Makes the bacteria glow. Repressed by the Tet protein. This means that when the Tet protein is produced, it blocks the production of the GFP. Put these together, and what hapens? The bacteria glow on and off. Continue glowing in their descendents. Why? 
The network can be modelled and simulated in the stochastic pi-calculus, using a simple combination of three genes with negative control, as shown on the right. The simulations result in alternate oscillation of protein populations. We also see that GFP is high when there is no Tet protein, since GFP is inhibited by Tet. © 2000 Elowitz, M.B., Leibler, S. A Synthetic Oscillatory Network of Transcriptional Regulators. Nature 403:

Repressilator Simulation
Alternate oscillation of proteins: tet, lac, lambda, tet, ... Why are the oscillations in a particular order?
[Diagram labels: tet, lambda, lac]
This gives rise to oscillations.

Repressilator: Debugging
Understand how the oscillations are produced.
Neg(lac,tet) | Neg(tet,lambda) | Neg(lambda,lac)
[Diagram labels: lac, lambda, tet]
We can debug our model to try to understand why the oscillations are produced in a particular order.
Gene blocked by protein a and produces a protein b
Gene blocked by protein b and produces a protein c
Gene blocked by protein c and produces a protein a

Repressilator: 0s Initially there is one copy of each gene
Populations: x1 x1 x1
Reaction rates (s-1): transcribe 0.1*3; total 0.3
Any one of the proteins can be transcribed at rate 0.1
P(transcribe) = 0.3 / 0.3

Repressilator: s
Populations: x1 x1 x1 x1
Reaction rates (s-1): transcribe 0.1*3; degrade 0.001; tet 1.0; total 1.301
The tet protein can block the lambda gene at rate 1.0
P(tet) = 1.0 / 1.301

Repressilator: 6.329912s Now no lambda protein can be transcribed.
Populations: x1 x1 x1 x1
Reaction rates (s-1): transcribe 0.1*2; unblock 0.0001; degrade 0.001; total 0.2011
But lac protein can still be transcribed at rate 0.1
P(transcribe) = 0.2 / 0.2011

Repressilator: s
Populations: x1 x1 x1 x1 x1
Reaction rates (s-1): transcribe 0.1*2; unblock 0.0001; degrade 0.001*2; lac 1.0; total 1.2021
The lac protein can block the tet gene at rate 1.0
P(lac) = 1.0 / 1.2021
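The rate tables in these debugging steps can be recomputed mechanically from the state of the network. The Python sketch below is an illustration rather than the SPiM implementation: it assumes the rates that appear in the tables (transcribe 0.1, unblock 0.0001, degrade 0.001, binding 1.0), and the function name and state encoding (a set of blocked genes named by their product, plus protein counts) are hypothetical.

```python
TRANSCRIBE, UNBLOCK, DEGRADE, BIND = 0.1, 0.0001, 0.001, 1.0

# Neg(a,b): gene blocked by protein a, producing protein b.
GENES = [("lac", "tet"), ("tet", "lambda"), ("lambda", "lac")]

def rates(blocked, proteins):
    """Return the total rate of each kind of reaction in the given state.

    blocked  -- set of product names whose gene is currently blocked
    proteins -- dict mapping protein name to population
    """
    table = {"transcribe": 0.0, "unblock": 0.0}
    for repressor, product in GENES:
        if product in blocked:
            table["unblock"] += UNBLOCK
        else:
            table["transcribe"] += TRANSCRIBE
            # every copy of the repressor can block this gene
            table[repressor] = BIND * proteins.get(repressor, 0)
    table["degrade"] = DEGRADE * sum(proteins.values())
    table["total"] = sum(table.values())
    return table

# The first four states of the walk-through.
steps = [
    (set(), {}),                                # one copy of each gene
    (set(), {"tet": 1}),                        # a tet protein was transcribed
    ({"lambda"}, {"tet": 1}),                   # tet blocked the lambda gene
    ({"lambda"}, {"tet": 1, "lac": 1}),         # a lac protein was transcribed
]
for blocked, proteins in steps:
    table = rates(blocked, proteins)
    print({name: round(value, 4) for name, value in table.items() if value})
```

The probability of each next reaction is its entry divided by the total, for example 1.0 / 1.301 for tet blocking the lambda gene in the second state.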

Repressilator: s
Populations: x1 x1 x1 x1 x1
Reaction rates (s-1): transcribe 0.1; unblock 0.0001*2; degrade 0.001*2; total 0.1022
Now no tet or lambda protein can be transcribed. A tet protein can degrade at rate 0.001
P(degrade) = /

Repressilator Meanwhile, lots of lac protein is transcribed
Populations: x1 x1 x1 x1
Reaction rates (s-1): transcribe 0.1; unblock 0.0001*2; degrade 0.001; total 0.1012

Repressilator Represents one oscillation cycle
Equilibrium between transcription and degradation
Populations: x1 x1 x1 x100
Reaction rates (s-1): transcribe 0.1; unblock 0.0001*2; degrade 0.001*100; total 0.2002
Eventually, lambda or lac gene unblocks at rate P(unblock) = /

Repressilator Suppose the lac gene unblocks
Populations: x1 x1 x1 x100
Reaction rates (s-1): transcribe 0.1*2; unblock 0.0001; degrade 0.001*100; lac 1.0*100
There is a high probability that it will block immediately
P(lac) = /

Repressilator: s
Populations: x1 x1 x1 x100
Reaction rates (s-1): transcribe 0.1; unblock 0.0001*2; degrade 0.001*100; total 0.2002
Eventually, the lambda gene unblocks at rate P(unblock) = /

Repressilator: 11039.77s There is nothing to block the lambda gene.
Populations: x1 x1 x1 x100
The lambda protein can now take over...
A high probability of oscillating in a particular order.
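The walk-through above follows one possible run of the network by hand. As a rough illustration, the same three Neg gates can be simulated with a standard Gillespie loop in Python. This is a sketch under assumptions, not the SPiM simulator: the rates are those used in the rate tables (transcribe 0.1, unblock 0.0001, degrade 0.001, binding 1.0), a blocking protein is assumed to survive the interaction, and the function and variable names are hypothetical.

```python
import random

TRANSCRIBE, UNBLOCK, DEGRADE, BIND = 0.1, 0.0001, 0.001, 1.0

# Neg(a,b): gene blocked by protein a, producing protein b.
GENES = [("lac", "tet"), ("tet", "lambda"), ("lambda", "lac")]

def simulate(t_end, seed=1):
    """Return a list of (time, protein populations) after each reaction."""
    rng = random.Random(seed)
    blocked = {product: False for _, product in GENES}
    proteins = {product: 0 for _, product in GENES}
    t, trace = 0.0, []
    while True:
        # Collect every reaction that is possible in the current state.
        reactions = []
        for repressor, product in GENES:
            if blocked[product]:
                reactions.append((UNBLOCK, "unblock", product))
            else:
                reactions.append((TRANSCRIBE, "transcribe", product))
                if proteins[repressor] > 0:
                    reactions.append((BIND * proteins[repressor], "block", product))
            if proteins[product] > 0:
                reactions.append((DEGRADE * proteins[product], "degrade", product))
        total = sum(rate for rate, _, _ in reactions)
        t += rng.expovariate(total)
        if t > t_end:
            return trace
        # Choose a reaction with probability rate / total
        # (falls back to the last reaction on floating-point round-off).
        pick = rng.random() * total
        for rate, kind, product in reactions:
            pick -= rate
            if pick < 0:
                break
        if kind == "transcribe":
            proteins[product] += 1
        elif kind == "degrade":
            proteins[product] -= 1
        else:
            blocked[product] = (kind == "block")
        trace.append((t, dict(proteins)))

trace = simulate(t_end=50_000.0)
for name in ("tet", "lac", "lambda"):
    print(name, "peak population:", max(state[name] for _, state in trace))
```

Plotting the three populations in `trace` against time is one way to look for the alternating order discussed above; the exact trajectory depends on the random seed.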

Repressilator Simulation in 3D
Can see the oscillations more clearly in 3D. The red gene produces proteins that switch off the purple gene, which produces proteins that switch off the blue gene, which produces proteins that switch off the red gene, completing the cycle. [Details:] This video shows an example of a gene network, which consists of three genes that inhibit each other. The network is modelled in the stochastic pi-calculus, and displayed in 3D. The three genes in the model produce red, purple and blue proteins, respectively, where the genes are shown at the bottom level and the proteins are shown at the top. The genes can be in two states, either on or off, while the population of proteins can grow or shrink over time. When the gene is switched on, it produces proteins at a steady rate. The proteins can also degrade, so that the population of proteins stabilises when production and degradation are in equilibrium. The proteins can also interact with the genes, switching them on or off. The red gene produces proteins that switch off the purple gene, which produces proteins that switch off the blue gene, which produces proteins that switch off the red gene, completing the cycle. [run through the video, and pause at each point] The video shows how the gene network can produce oscillations in protein levels. Initially there is a large population of red proteins, which switch off the purple gene (1). The blue gene then starts producing proteins, which switch off the red gene (2). Since no more red proteins are produced, the population of red proteins slowly decreases over time, until all of the red proteins are degraded (3). The purple gene then starts producing proteins, which switch off the blue gene (4). Since no more blue proteins are produced, the population of blue proteins slowly decreases over time, until all of the blue proteins are degraded (5). The red gene then starts producing proteins, which switch off the purple gene, completing the cycle. 
We have several genetic oscillators inside our body, including one that controls our biological clock. These oscillators can be quite sophisticated, and computer models can help us understand how they work. *** Further detail: In the demo there are three separate promoter-gene units with negative control, one each for red, purple and blue proteins. The units are arranged in a cycle so that they mutually repress each other, as in the original Repressilator network. There are two types of nodes. Orange nodes represent reactions, and are of fixed size. Each reaction node is labelled with a reaction name, which can be either a delay at a given rate, an input on a channel or an output on a channel. The name of the reaction is displayed when clicking on the corresponding node. Nodes that are not orange represent species, and are of variable size, where the diameter of the node is proportional to the population of the species. The name of the species is displayed when clicking on the corresponding node. In this example there are two levels of species nodes. The nodes at the bottom level represent genes, which can be either on or off, while the nodes at the top level represent proteins, which can have variable populations. There are two types of edges. Yellow edges represent reactions, and go from a source species node to a reaction node, and from a reaction node to zero or more target species nodes. Green edges represent interactions between species, and go from an output reaction node on a given channel to a corresponding input reaction node on the same channel. Consider the red promoter gene unit in the example. The species Neg(lambda,lac) and Blocked(lambda,lac) are on the bottom right and left, respectively, while species Protein(lac) is on the top right. The gene can actively transcribe a protein by doing a delay, represented as a vertical edge to a reaction node "transcribe". 
This node has two outbound edges, one upward edge to Protein(lac) and another downward edge back to Neg(lambda,lac) (unfortunately the downward edge is not clearly visible). The gene can also be switched off by interacting with a blue protein on reaction node "?lambda" and going into a state Blocked(lambda,lac). The blocked gene can spontaneously unblock by doing a delay on reaction node "unblock". The transcribed protein can also degrade by doing a delay on reaction node "degrade", which has no outbound edges. The red protein can also block the purple gene by doing an output on reaction node "!lac" (this is mostly hidden from view, but can be observed by rotating the network in three dimensions). The purple gene is modelled by node Neg(lac,tet) while the blue gene is modelled by node Neg(tet,lambda). The interactions between genes and proteins give rise to alternate oscillations in protein populations.

Programming Better Oscillators
We observe irregular oscillations when degrade << unblock. In some conditions the clock behaves irregularly. The gene can unblock and cause a leaky transcription, which perturbs the signal.

Repressilator with Dimers
Refined model of a gene with cooperative binding. Different implementation, same main program.
Neg(lac,tet) | Neg(tet,lambda) | Neg(lambda,lac)
let Neg(a:chan,b:chan) = ( new Neg2(a,b2,b) )
and Neg2(a:chan,b2:chan,b:chan) = do (Protein(b2,b) | Neg2(a,b2,b)) or ?a; Blocked(a,b2,b)
and Blocked(a:chan,b2:chan,b:chan) = Neg2(a,b2,b)
and Protein(b2:chan, b:chan) = do ?b2; Protein2(b) or !b2 or
and Protein2(b:chan) = do !b; Protein2(b)
We can improve the performance of the clock by changing our model of a gene. In this case, we require two proteins to form a dimer in order to block a gene. We can do this without changing the definition of the network. This allows modular programming.

Dimers Improve Regularity
Proteins form dimers first. Improves regularity of oscillations. Investigating stochasticity, cooperativity, ODEs [Blossey, Cardelli, Phillips, 2007]. We obtain more regular oscillations with this form of cooperativity. The network is less sensitive to leaky transcription, since two proteins are required to block a gene. The results are even better when we require tetramers instead of dimers. So the idea is that if we want to build a biological oscillator, computer simulations can help us to uncover the basic design principles.

Gene with additional Inhibitor
transcribe = 0.1 unblock = degrade = 0.001 rate(a,b,r) = 1.0
val transcribe = 0.1
val degrade = 0.001
val unblock =
new new new
let Negp(a:chan,b:chan,r:chan) = do (Proteinp(b,r) | Negp(a,b,r)) or ?a; Blockedp(a,b,r)
and Blockedp(a:chan,b:chan,r:chan) = Negp(a,b,r)
and Proteinp(b:chan,r:chan) = do !b; Proteinp(b,r) or ?r or
let Inh(r:chan) = !r; Inh(r)
run Negp(a,b,r)
In order to model these logic gates, we defined a gene with an additional input that allows the protein to be inhibited. So we have two levels: the gene itself can be switched off, or the protein it produces can be inhibited. The presence of an inhibitor is used as an input to the system. [run simulation both with and without inhibitor]
[Diagram labels: r, a, b]

Bacteria Logic Gates [Guet et al., 2002]
© 2002 AAAS. Reprinted with permission from Guet et al. Combinatorial Synthesis of Genetic Networks. Science 296 (5572):
Some of the researchers who designed the clock inside bacteria also engineered logic gates!
[Details:] Same three genes: lac, lambda and tet, which produce Lac, Lambda and Tet proteins, respectively. Now, each gene has one of 5 possible promoters. Recall that each promoter allows the gene to be transcribed to a protein. Two of the promoters are blocked by Lac at different strengths, one promoter is blocked by Tet, one is blocked by lambda and one is enhanced by lambda. 3 different genes, each with one of 5 possible promoters, give rise to 5 x 5 x 5 = 125 possible networks of 3 promoter-gene units. Digital input: IPTG neutralises the Lac protein, and aTc neutralises the Tet protein. Digital output: GFP is linked to lambda. So if there is lambda, then there is no GFP and the bacteria do not glow. Put all of these circuits inside bacteria, and looked at those that behaved like boolean gates.
3 genes: tetR, lacI, λcI
5 promoters: PL1, PL2, PT, Pλ-, Pλ+
125 possible networks consisting of 3 promoter-gene units
2 inputs: IPTG (represses Lac), aTc (represses Tet)
1 output: GFP (linked to Pλ-)
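The count of 125 networks follows from choosing one of the 5 promoters independently for each of the 3 genes. A minimal Python sketch of that enumeration is given below; the gene and promoter names are written in ASCII here, and the dictionary layout is only an illustrative choice.

```python
from itertools import product

GENES = ["tetR", "lacI", "lambda_cI"]
PROMOTERS = ["PL1", "PL2", "PT", "Plambda-", "Plambda+"]

# One network = one promoter assigned to each of the three genes.
networks = [
    dict(zip(GENES, choice))
    for choice in product(PROMOTERS, repeat=len(GENES))
]

print(len(networks))   # 5 * 5 * 5 networks
print(networks[0])     # the first assignment in the enumeration
```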

Combinatorial Library of Genes
Can model 125 networks using just 2 modules (Neg, Negp). Used simulation to investigate system behaviour. Can easily refine the modules without rewiring the networks. [Blossey, Cardelli, Phillips, 2006]
let D038() = ( Negp(TetR,TetR,aTc) | Negp(TetR,LacI,IPTG) | Neg(LacI,LambdacI) | Neg(LambdacI,GFP) )
D038()
D038() | Inh(aTc)
D038() | Inh(IPTG)
D038() | Inh(aTc) | Inh(IPTG)
We can model 125 networks with just 2 modules. Illustrates one of the benefits of our approach. Can change the definition of a gate, without changing the top-level network. Computer simulations can help us program logic gates in bacteria. Once you have logic gates, you are one step closer to having a biological computer. In the long term, if we ever want to design biological hardware we will need a way of specifying and verifying our designs. Requires specialised tools for modelling concurrent, stochastic systems.
[Details:] [show simulations with the 4 types of input] Boolean input simulated by presence or absence of inhibitors. Boolean output simulated by presence or absence of GFP. Intuitive boolean analysis does not reproduce the experiment. [explain why, by reverse analysis, we get a contradiction in absence of inhibitors] We need to understand the dynamics of the head feedback loop. [show simulation of cascade without head feedback] Ask audience to guess [show simulation of head feedback, and how we can obtain a fixpoint] [show simulation with different values of unblocking, in absence of inhibitors] [see paper: “A compositional approach to the stochastic dynamics of gene networks”] Shows how the pi-calculus is a useful framework for modelling combinatorics: we can reproduce the combinatorial aspect of the in-vivo experiments. Also shows how insight can be gained simply by playing around with circuits and observing the results. Can formulate hypotheses about behaviour of networks by using different combinations of genes.
The main point was to show how we can build complex networks from simple building blocks. Modular construction made it easier to experiment with different circuits. Simple boolean analysis does not predict the experiments. Need to take into account the dynamics of the circuit, in the presence of noise, in order to explain some of the results. *Here we showed that the unexpected experimental results can be reproduced when the head feedback loop of the tet gene is at a fixpoint. We can reproduce the experimental results. In the case of D038, we can perform experiments in silico to explain the regime of operation of the circuit. In particular, the experimental results are only reproduced in certain regimes. Do not have the rates. But can perform experiments to formulate hypotheses about what conditions are required in order to reproduce the experiments. N.B. Once we have defined the libraries, we can forget about them and just code up the network using calls to the library. Two kinds of programmers: library developers, and library users. Proteins aTc and IPTG act as boolean inputs. GFP as boolean output. Either the bacteria glows or it doesn’t.
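To illustrate why a simple boolean reading of D038 runs into trouble, the sketch below searches for consistent boolean assignments of the circuit. It is a naive steady-state interpretation written for this example, not the analysis from the paper: each Neg or Negp gate is assumed to produce its protein exactly when its repressor is inactive, and an inhibitor (aTc or IPTG) is assumed to inactivate its target protein completely. All names are hypothetical.

```python
from itertools import product

def boolean_fixpoints(aTc, IPTG):
    """Return every boolean assignment consistent with D038 under the
    naive reading: a gene is on exactly when its repressor is inactive."""
    solutions = []
    for tetR, lacI, cI, gfp in product([False, True], repeat=4):
        active_tet = tetR and not aTc     # aTc inhibits the Tet protein
        active_lac = lacI and not IPTG    # IPTG inhibits the Lac protein
        consistent = (
            tetR == (not active_tet)      # Negp(TetR,TetR,aTc): head feedback loop
            and lacI == (not active_tet)  # Negp(TetR,LacI,IPTG)
            and cI == (not active_lac)    # Neg(LacI,LambdacI)
            and gfp == (not cI)           # Neg(LambdacI,GFP)
        )
        if consistent:
            solutions.append({"TetR": tetR, "LacI": lacI, "LambdacI": cI, "GFP": gfp})
    return solutions

for aTc, IPTG in product([False, True], repeat=2):
    print("aTc =", aTc, "IPTG =", IPTG, "->", boolean_fixpoints(aTc, IPTG))
```

Under these assumptions there is no consistent assignment when aTc is absent, because the tet gene would have to be on exactly when it is off. This is the kind of contradiction that motivates looking at the stochastic dynamics of the head feedback loop instead.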

Signalling Networks with Luca Cardelli (MSR Cambridge) Jasmin Fisher (EPFL Lausanne)

Programming a Biological Switch
A biological switch. Takes a small input and produces a large output. Used in cell division, where it is important to have a clear signal. In this simulation, the input signal is a single yellow protein. This gives an increasingly rapid response in the populations of blue, orange and green proteins, respectively (see graph along the bottom). The green protein is used as an output signal. When we remove the yellow input protein, the switch gradually resets (not shown in 3D simulation).

Enzymatic Reactions x1 x1 An enzyme can bind to a substrate

Enzymatic Reactions x1 x1 The enzyme and substrate can unbind r

Enzymatic Reactions x1 x1 The enzyme can bind to the substrate again

Enzymatic Reactions x1 x1 The enzyme can react with the substrate r

Enzymatic Reactions x1 The enzyme is restored and the substrate is transformed into a product x1
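The bind, unbind and react steps shown on these slides can be sketched as a small Gillespie simulation in Python. This is an illustration under assumptions rather than the SPiM model: the three rates are hypothetical nominal values of 1.0, and the function name and the initial populations are chosen only for the example.

```python
import random

def simulate_enzyme(enzymes, substrates, bind=1.0, unbind=1.0, react=1.0, seed=0):
    """Simulate E + S <-> ES -> E + P until all substrate is converted.

    Returns a list of (time, free enzyme, free substrate, complexes, products).
    Requires at least one enzyme, otherwise no reaction can ever fire.
    """
    rng = random.Random(seed)
    free_e, free_s, complexes, products = enzymes, substrates, 0, 0
    t = 0.0
    history = [(t, free_e, free_s, complexes, products)]
    while free_s > 0 or complexes > 0:
        rates = [bind * free_e * free_s, unbind * complexes, react * complexes]
        total = sum(rates)
        t += rng.expovariate(total)
        pick = rng.random() * total
        if pick < rates[0]:                 # the enzyme binds to a substrate
            free_e, free_s, complexes = free_e - 1, free_s - 1, complexes + 1
        elif pick < rates[0] + rates[1]:    # the enzyme and substrate unbind
            free_e, free_s, complexes = free_e + 1, free_s + 1, complexes - 1
        else:                               # the enzyme reacts, releasing a product
            free_e, products, complexes = free_e + 1, products + 1, complexes - 1
        history.append((t, free_e, free_s, complexes, products))
    return history

history = simulate_enzyme(enzymes=1, substrates=10)
print("final state:", history[-1])
```

Throughout the run the enzyme is conserved (free plus bound), and each substrate ends up as exactly one product.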

Mapk Cascade [Huang and Ferrell, 1996]
Signals the presence of an input enzyme. Here is the program.
Populations: x1 x10 x1 x100 x1 x100 x1

Mapk Cascade: Simulations
(a) Rates as in paper. (b) All rates set to 1.0! The system is surprisingly robust to reaction rates.
[Details:] Simulation results for the Mapk cascade. The results were obtained by executing the code, using the SPiM simulator. Simulation (a) was obtained using rates and quantities derived from [8], with rate(ai) = 1.0s−1, rate(di) = rate(ki) = 150.0s−1, starting with one of E1, E2 and KKPase, 120 of KPase, 3 of KKK and 1200 of KK and K. Simulation (b) was obtained by setting all the rates to a nominal value of 1.0, starting with the quantities in Fig. 17. Both simulations exhibit an increase in signal response as the cascade is traversed from KKK to KK and K. Functionally similar response profiles were observed for the output KPP in both simulations, in spite of the differences in simulation conditions.

Mapk Simulation in 3D

Pi-calculus v Reaction Equations

Mapk is a module of EGFR Mapk is a module of EGFR. Understanding this system could play a key role in cancer treatment. Case study in modularity of biological programs. The idea is to build large systems from simpler building blocks.

Immune System Modelling: MHC I Antigen Presentation
Teamed up with leading researchers from Southampton University. with Luca Cardelli (MSR Cambridge) Leonard Goldstein (Cambridge University) Tim Elliott (Southampton University) Joern Werner (Southampton University)

MHC: A Biological Virus Scanner
We used our language to program a key pathway in the immune system, which is able to detect the presence of potentially harmful intruders in the cell, such as viruses or bacteria. [Details:] As a case study, the stochastic pi-calculus can be used to model and simulate MHC I antigen presentation. This is a key pathway of the immune system, which is able to detect the presence of potentially harmful intruders in the cell, such as viruses or bacteria. The detection is performed by a special protein complex called MHC, which regularly scans protein fragments in the cell to check for the presence of potentially harmful intruders, such as viruses or bacteria. ©2005 from Immunobiology, Sixth Edition by Janeway et al. Reproduced by permission of Garland Science/Taylor & Francis LLC.

MHC: A Biological Virus Scanner
The MHC molecule (in yellow) acts like a biological anti-virus mechanism. Regularly scans a sample of the protein fragments (peptides) that are present in the cell. Able to recognise the fragments that are potentially harmful, e.g. from a virus or bacterium living inside the cell. The captured peptides are egressed to the cell surface, and presented as evidence that the cell is infected. The peptides at the surface are recognised as harmful, and the infected cell is destroyed. [Details:] So MHC behaves like an anti-virus mechanism. Regularly scans a sample of the protein fragments that are present in the cell. Some of the peptides do not bind to MHC at all (3), while others bind initially, but are unstable (4). Some bind in a stable manner, and are captured by MHC (5). These correspond to peptides that MHC recognises as potentially harmful, such as peptides from a virus or bacterium living inside the cell. The captured peptides are transported to the cell surface, and presented as evidence that the cell is infected (6). The peptides at the surface are recognised as harmful by the immune system, and the infected cell is destroyed. *** As a case study, the stochastic pi-calculus can be used to model and simulate MHC I antigen presentation. This is a key pathway of the immune system, which is able to detect the presence of potentially harmful intruders in the cell, such as viruses or bacteria. The detection is performed by a special protein complex called MHC, which stands for Major histocompatibility complex. The MHC complex is formed inside a special compartment of the cell, where it scans protein fragments to check for the presence of potentially harmful intruders, such as viruses or bacteria. This video describes MHC I Antigen Presentation. The MHC complex is composed of a variety of proteins, which are assembled inside a special compartment in the cell, known as the Endoplasmic reticulum (ER) (1). 
Once formed, the MHC complex anchors itself to the entrance of this compartment, where it scans a sample of the protein fragments that are present in the cell. These protein fragments are called peptides, and are produced by the recycling machinery of the cell (2). The peptides are transported into the compartment where they can be scanned by MHC, by a process known as “peptide editing”. Some of the peptides do not bind to MHC at all (3), while others bind initially, but are unstable (4). Some bind in a stable manner, and are captured by MHC (5). These correspond to peptides that MHC recognises as potentially harmful, such as peptides from a virus or bacterium living inside the cell. The captured peptides are transported to the cell surface, and presented as evidence that the cell is infected (6). The peptides at the surface are recognised as harmful by the immune system, and the infected cell is destroyed. In this way, the MHC complex behaves much like a computer anti-virus system, which continually scans the contents of the cell to check for infection. ©2005 from Immunobiology, Sixth Edition by Janeway et al. Reproduced by permission of Garland Science/Taylor & Francis LLC.

Investigate the Role of Tapasin
We are currently collaborating with leading researchers in Immunology and Structural Biology from Southampton University (UK), in order to understand how the system functions. In particular, we are trying to understand the role of one of the key molecules in the system, called Tapasin. [Details] Leading researchers in Immunology and Structural Biology from Southampton University (UK) have been studying MHC I antigen presentation for many years, in order to understand how the MHC complex is able to capture the right peptides. In particular, they have been trying to understand the role of Tapasin on peptide editing. [next] We are currently collaborating with these researchers, in order to develop a computer model that captures their current knowledge of MHC I antigen presentation. We are also including additional hypotheses, and running computer simulations of the model. ©2005 from Immunobiology, Sixth Edition by Janeway et al. Reproduced by permission of Garland Science/Taylor & Francis LLC.

Peptide Loading Model Each graph represents a component
We teamed up with these scientists to develop a computer model that captures their knowledge about the system. We also included some additional hypotheses, and ran computer simulations of the model.
[Explain the model] This slide presents a stochastic pi-calculus model of peptide loading by MHC I. Each connected graph in the figure describes the behaviour of a component in the system. There are graphs for peptide (a), tapasin (b) and MHC (c). Each node in the graph represents a component state and each directed edge represents a reaction, which can be either unary or binary. Unary reactions such as degradation are labelled with the reaction rate. Binary reactions such as association and dissociation are labelled with an interaction channel preceded by an input (?) or output (!) prefix, where each channel is associated with a corresponding interaction rate. Two components can interact by doing a complementary input and output on the same channel, where the rate of the interaction is equal to the rate of the channel.
a) Peptides are actively loaded into the ER at rate generate_pep, and are degraded at rate degrade_pep. Each peptide is characterised by a private channel u, whose rate determines its affinity to MHC. We assume an equal supply of three classes of peptides with low, medium and high unbinding rates, resulting in high, medium and low affinity peptides, respectively. A peptide can bind to an MHC complex by doing an output on channel bind. The bound peptide can then unbind from MHC by doing an input on channel u.
b) Tapasin is assembled in the ER at rate generate_TPN, and is degraded at rate degrade_TPN. Each tapasin molecule is characterised by a private channel uT, whose rate determines its affinity to MHC. Tapasin can bind to MHC by doing an output on channel bindT. The bound tapasin can then unbind from MHC by doing an input on channel uT.
c) MHC is assembled in the ER at rate generate_MHC, and is degraded at rate degrade_MHCo. Free MHC can load a peptide with affinity u by doing an input on channel bind. The loaded MHC can then release the peptide by doing an output on channel u. The rate of unbinding is determined by the rate of the channel, which can be low, medium or high, depending on the nature of the loaded peptide. Loaded MHC can egress to the cell surface at rate egress, where it is degraded at rate degrade_MHCe. Free MHC in the ER can also bind to tapasin by doing an input on channel bindT. The bound MHC can then unbind by doing an output on channel uT. The rate of unbinding of empty MHC from tapasin is determined by the rate of channel uT, multiplied by a constant factor v. MHC can also load a peptide in the presence of tapasin. We assume that loaded MHC can only egress in the absence of tapasin, since tapasin has an ER retention signal. We neglect the degradation of empty MHC bound to tapasin, which is in the order of two hours, and of loaded MHC, which is in the order of several hours [].
[explain model of effect of tapasin on loading] We want to study the effect of tapasin on peptide loading, so we allow tapasin to affect all aspects of loading. The effects are modelled by multiplying the rates of binding and unbinding of peptide to MHC by a constant factor. As a result, a number of different functions of tapasin can be modelled in isolation or in combination. For example, an increase in the binding rate of peptides to MHC would model the stabilisation of the open MHC complex. Tapasin can also decrease the unbinding rate of peptides from MHC by improving the stability of closed MHC complexes. Finally, a peptide can increase the affinity of tapasin to MHC, by inducing a conformational change in MHC.

Peptide Loading Model MHC can load a peptide x1 x1

Peptide Loading Model The loaded peptide can escape

Peptide Loading Model The loaded peptide can escape x1 x1

Peptide Loading Model MHC can bind tapasin x1 x1

Peptide Loading Model The bound tapasin can unbind x1 uT
We teamed up with these scientists to develop a computer model that captures their knowledge about the system. We also included some additional hypotheses, and ran computer simulations of the model. [Explain the model] This slide presents a stochastic pi-calculus model of peptide loading by MHC I. Each connected graph in the figure describes the behaviour of a component in the system. There are graphs for peptide (a), tapasin (b) and MHC (c). Each node in the graph represents a component state and each directed edge represents a reaction, which can be either unary or binary. Unary reactions such as degradation are labelled with the reaction rate. Binary reactions such as association and dissociation are labelled with an interaction channel preceeded by an input (?) or output (!) prefix, where each channel is associated with a corresponding interaction rate. Two components can interact by doing a complementary input and output on the same channel, where the rate of the interaction is equal to the rate of the channel. a) Peptides are actively loaded into the ER at rate generate_pep, and are degraded at rate degrade_pep. Each peptide is characterised by a private channel u, whose rate determines its affinity to MHC. We assume an equal supply of three classes of peptides with low, medium and high unbinding rates, resulting in high, medium and low affinity peptides, respectively. A peptide can bind to an MHC complex by doing an output on channel bind. The bound peptide can then unbind from MHC by doing an input on channel u. b) Tapasin is assembled in the ER at rate generate_TPN, and is degraded at rate degrade_TPN. Each tapasin molecule is characterised by a private channel uT, whose rate determines its affinity to MHC. Tapasin can bind to MHC by doing an output on channel bindT. The bound tapasin can then unbind from MHC by doing an input on channel uT. c) MHC is assembled in the ER at rate generate_MHC, and is degraded at rate degrade_MHCo. 
Free MHC can load a peptide with affinity u by doing an input on channel bind. The loaded MHC can then release the peptide by doing an output on channel u. The rate of unbinding is determined by the rate of the channel, which can be low, medium or high, depending on the nature of the loaded peptide. Loaded MHC can egress to the cell surface at rate egress, where it is degraded at rate degrade_MHCe. Free MHC in the ER can also bind to tapasin by doing an input on channel bindT. The bound MHC can then unbind by doing an output on channel uT. The rate of unbinding of empty MHC from tapasin is determined by the rate of channel uT, multiplied by a constant factor v. MHC can also load a peptide in the presence of tapasin. We assume that loaded MHC can only egress in the absence of tapasin, since tapasin has an ER retention signal. We neglect the degradation of empty MHC bound to tapasin, which is in the order of two hours, and of loaded MHC, which is in the order of several hours [].
[explain model of effect of tapasin on loading] We want to study the effect of tapasin on peptide loading, so we allow tapasin to affect all aspects of loading. The effects are modelled by multiplying the rates of binding and unbinding of peptide to MHC by a constant factor. As a result, a number of different functions of tapasin can be modelled in isolation or in combination. For example, an increase in the binding rate of peptides to MHC would model the stabilisation of the open MHC complex. Tapasin can also change the unbinding rate of peptides from MHC by altering the stability of closed MHC complexes. Finally, a peptide can increase the rate at which tapasin unbinds from MHC (the factor v applies only in the absence of peptide), by inducing a conformational change in MHC.

Experimental Setup Assume low, medium and high affinity peptides
We assume an equal supply of three classes of peptides with low, medium and high unbinding rates, resulting in high, medium and low affinity MHC peptide complexes. Note that we do not change the model for MHC. It is parameterised by the class of peptide. Use the high affinity peptide as the representative peptide.

Model Parameters MHC spends < 2h on average in the ER.
Name    Rate (min^-1)  Time (min)  Range  Description
gpep    50             0.02               Active transport of peptides into the ER
dpep    10             0.1                Degradation of free peptides inside the ER
bind    1                                 Binding of peptides to MHC (per molecule)
low     3              0.33               Unbinding of low affinity peptides from MHC
med     1.2            0.83               Unbinding of medium affinity peptides from MHC
high    0.5            2                  Unbinding of high affinity peptides from MHC
gMHC                                      Assembly of MHC complexes inside the ER
dMHCo   0.01           100                Degradation of free MHC inside the ER
dMHCe                                     Degradation of loaded MHC at the cell surface
egress                                    Egression of loaded MHC from the ER
gTPN                                      Production of tapasin inside the ER
dTPN                                      Degradation of free tapasin inside the ER
bindT                                     Binding of tapasin to MHC (per molecule)
uT                                        Unbinding of tapasin from loaded MHC
Rates. The whole process needs to take about 3 hours. The exact values of the rates are not critical, but the proportions are. For example, assume high peptide turnover. Assume 3 classes of peptides. How long does an MHC complex take to assemble, etc.

Tapasin Hypotheses
Tapasin: can increase peptide loading at ER entrance; can destabilise loaded MHC.
Peptide: can increase tapasin unbinding from MHC.
Factor  Value  Range   Description
a       1      1 - 10  Binding of peptides to MHC in presence of tapasin
q       10             Unbinding of peptides from MHC in presence of tapasin
v       0.01           Unbinding of tapasin from MHC in absence of peptide

Peptide Editing Simulation
The simulations were able to reproduce the observed experiments. The MHC presented a large proportion of stable peptides (red), some moderately stable peptides (blue) and very few unstable peptides (yellow) at the cell surface. How does our model achieve this?

Peptide Filtering How does MHC egress the right peptides?
Consider a loaded peptide with unbinding rate u. There is a competition between unbinding and egression:
$P(\mathit{egress}, u) = \frac{\mathit{egress}}{u + \mathit{egress}} = \frac{1}{1 + u/\mathit{egress}}$
$\mathit{egress} \to \infty:\ P(\mathit{egress}, u) \to 1$
$\mathit{egress} \to 0:\ P(\mathit{egress}, u) \approx \mathit{egress}/u \propto 1/u$
How does MHC capture the right peptide? Peptides that are recognised as harmful bind in a stable way to MHC. So ideally, MHC should wait an infinite amount of time before egressing to the cell surface, to be sure to capture the most stable peptides. It cannot wait indefinitely, since a decision needs to be taken quickly. But if MHC acts too quickly, it will take the first peptide that comes along. We need a trade-off between how effective we are and how quickly we can respond. We can calculate the upper bound on filtering as the waiting time tends to infinity, i.e. as the egression rate tends to zero.
[Details:] An open MHC complex can bind a free peptide by doing an input ?bind(u), where the affinity of the peptide is determined by channel u. The loaded MHC complex can either egress, or release the peptide by doing an output !u.
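The competition between unbinding and egression can be read as a race between two exponential clocks. The following is a minimal sketch of that formula; the function name `p_egress` and the dictionary `UNBINDING` are illustrative names, not part of the original model, and the unbinding rates are the ones listed on the Model Parameters slide.

```python
def p_egress(egress: float, u: float) -> float:
    """Probability that a loaded MHC egresses before its peptide unbinds.

    Egression (rate `egress`) and peptide unbinding (rate `u`) are treated
    as two competing exponential events, so egression wins with probability
    egress / (u + egress).
    """
    if egress < 0 or u < 0 or egress + u == 0:
        raise ValueError("rates must be non-negative and not both zero")
    return egress / (u + egress)


# Unbinding rates per minute, taken from the Model Parameters slide.
UNBINDING = {"low": 3.0, "med": 1.2, "high": 0.5}

if __name__ == "__main__":
    # Fast egression: every peptide is presented with probability close to 1.
    # Slow egression: the probability becomes proportional to 1/u.
    for egress in (100.0, 1.0, 0.01):
        row = {name: round(p_egress(egress, u), 4) for name, u in UNBINDING.items()}
        print(f"egress={egress}: {row}")
```

With a very large egression rate the three classes become indistinguishable, while for a very small rate the ratio between any two classes approaches the inverse ratio of their unbinding rates.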

Peptide Discrimination
Consider 3 loaded MHC complexes. Stable peptides are more likely to egress:
$P_i(\mathit{egress}) = P(\mathit{egress}, u_i) / \sum_k P(\mathit{egress}, u_k)$
$\mathit{egress} \to \infty:\ P_i(\mathit{egress}) \to 1 / \sum_k 1 = 1/3$
$\mathit{egress} \to 0:\ P_i(\mathit{egress}) \to (1/u_i) / \sum_k (1/u_k)$
Assume three classes of peptides: stable (high), moderately stable (med) and unstable (low). We can calculate upper and lower bounds of peptide discrimination as a function of waiting time, based on relative probabilities.
[Details:] It is instructive to analyse the relative proportions of MHC peptide complexes with peptides of low, medium and high unbinding rates as a function of the rate of egression. In the simplest case we consider only MHC and peptide, and will add tapasin later. As the rate of egression tends to zero, the proportion of high, medium and low affinity MHC peptide complexes tends to the limiting ratios of 12/19, 5/19 and 2/19, respectively. This is expected, as can be shown by analysing the probabilities of the formation of MHC peptide complexes as a function of the binding, unbinding and egression rates. If we assume that MHC complexes are loaded with an equal proportion of peptides with low, medium and high unbinding rates due to high peptide turnover, the probability of egression for a peptide with unbinding rate u is given by the above equation.
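The two limits quoted on this slide follow directly from the definition of P(egress,u). The derivation below is a sketch that assumes only the formulas already given in the talk, with an equal supply of the three peptide classes:

```latex
\begin{align*}
P_i(\mathit{egress})
  &= \frac{P(\mathit{egress}, u_i)}{\sum_k P(\mathit{egress}, u_k)}
   = \frac{\dfrac{\mathit{egress}}{u_i + \mathit{egress}}}
          {\sum_k \dfrac{\mathit{egress}}{u_k + \mathit{egress}}}
   = \frac{\dfrac{1}{u_i + \mathit{egress}}}
          {\sum_k \dfrac{1}{u_k + \mathit{egress}}} \\[1ex]
\mathit{egress} \to 0:\quad
P_i(\mathit{egress})
  &\to \frac{1/u_i}{\sum_k 1/u_k} \\[1ex]
\mathit{egress} \to \infty:\quad
P_i(\mathit{egress})
  &= \frac{\dfrac{1}{1 + u_i/\mathit{egress}}}
          {\sum_k \dfrac{1}{1 + u_k/\mathit{egress}}}
   \to \frac{1}{\sum_k 1} = \frac{1}{3}
\end{align*}
```

The common factor egress cancels in the ratio, which is why the limit for slow egression is finite even though each individual probability tends to zero.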

Discrimination Upper Bound
Assume low = 6/2, med = 6/5, high = 6/12. Upper bound as egress tends to 0:
$P_i(\mathit{egress}) \to (1/u_i) / \sum_k (1/u_k) = (1/u_i) / (2/6 + 5/6 + 12/6) = (1/u_i) \cdot (6/19)$
affinity   low    med    high
u_i        6/2    6/5    6/12
1/u_i      2/6    5/6    12/6
P_i(∞)     1/3    1/3    1/3
P_i(0)     2/19   5/19   12/19
Graph shows relative proportion of unstable (low), moderately stable (medium) and stable (high) peptides presented at the cell surface, as egression rate tends to zero (i.e. as waiting time tends to infinity). MHC cannot tell the difference between peptides if it egresses immediately. Discrimination improves the longer it waits before egressing. It is able to capture high affinity peptides more effectively.
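The limiting proportions can be checked with exact rational arithmetic. This is a small sketch using Python's standard `fractions` module; the function name `limit_egress_to_zero` is a hypothetical helper introduced only for this illustration.

```python
from fractions import Fraction

# Unbinding rates written as on the slide: low = 6/2, med = 6/5, high = 6/12.
UNBINDING = {
    "low": Fraction(6, 2),
    "med": Fraction(6, 5),
    "high": Fraction(6, 12),
}


def limit_egress_to_zero(unbinding):
    """Limiting proportion of each peptide class as egress tends to 0.

    Implements P_i(0) = (1/u_i) / sum_k (1/u_k) exactly.
    """
    inverse = {name: 1 / rate for name, rate in unbinding.items()}
    total = sum(inverse.values())  # 2/6 + 5/6 + 12/6 = 19/6
    return {name: value / total for name, value in inverse.items()}


if __name__ == "__main__":
    for name, proportion in limit_egress_to_zero(UNBINDING).items():
        print(name, proportion)
```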

Discrimination vs Egression
Calculate peptide discrimination for different values of egress. Assume 1000 uniformly loaded MHC complexes.
egress   low   med   high
100      329   335   337
10       294   342   364
1        182   331   486
0.1      117   279   604
0.01     107   265   629
0        105   263   632
Graph shows relative proportion of unstable (low), moderately stable (medium) and stable (high) peptides presented at the cell surface, as egression rate tends to zero (i.e. as waiting time tends to infinity). MHC cannot tell the difference between peptides if it egresses immediately. Discrimination improves the longer it waits before egressing. It is able to capture high affinity peptides more effectively.
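The numbers in this table can be recomputed from the formula for P_i(egress). The sketch below assumes the unbinding rates 3, 1.2 and 0.5 per minute from the Model Parameters slide and 1000 uniformly loaded complexes; `discrimination` is an illustrative name, not a function from the original simulator.

```python
# Unbinding rates per minute for the three peptide classes.
UNBINDING = (("low", 3.0), ("med", 1.2), ("high", 0.5))


def discrimination(egress, n_complexes=1000):
    """Expected number of egressed complexes per class (low, med, high).

    P(egress, u) = egress / (u + egress); the common factor `egress`
    cancels in the ratio, so weights 1 / (u + egress) are used instead.
    This also makes the limiting case egress = 0 well defined.
    """
    weights = [1.0 / (u + egress) for _, u in UNBINDING]
    total = sum(weights)
    return tuple(round(n_complexes * w / total) for w in weights)


if __name__ == "__main__":
    print("egress  low  med  high")
    for egress in (100.0, 10.0, 1.0, 0.1, 0.01, 0.0):
        print(egress, *discrimination(egress))
```

Because each entry is rounded independently, a row does not always sum to exactly 1000.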

Peptide Editing Simulations
Simulate peptide editing for different values of egress. Can reproduce these results by computer simulation.
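As a rough illustration of what such a simulation does, the sketch below runs a simplified Monte Carlo experiment. It is not the stochastic pi-calculus simulator used in the talk: it ignores generation and degradation, assumes a free MHC is reloaded immediately with a peptide class chosen uniformly at random (the high peptide turnover assumption), and assumes egress > 0. The function name `simulate_editing` and the default seed are hypothetical.

```python
import random

# Unbinding rates per minute for the three peptide classes.
UNBINDING = {"low": 3.0, "med": 1.2, "high": 0.5}


def simulate_editing(egress, n_complexes=1000, seed=0):
    """Count which peptide class each MHC complex carries when it egresses.

    For every loaded complex, two exponential waiting times are sampled:
    one for peptide unbinding (rate u) and one for egression (rate egress).
    If unbinding happens first, the MHC loads a new random peptide and the
    race is repeated.
    """
    rng = random.Random(seed)
    classes = list(UNBINDING)
    presented = {name: 0 for name in classes}
    for _ in range(n_complexes):
        while True:
            peptide = rng.choice(classes)  # equal supply of the three classes
            t_unbind = rng.expovariate(UNBINDING[peptide])
            t_egress = rng.expovariate(egress)
            if t_egress < t_unbind:
                presented[peptide] += 1  # complex reaches the cell surface
                break
            # Otherwise the peptide is released and the MHC tries again.
    return presented


if __name__ == "__main__":
    for egress in (100.0, 1.0, 0.1):
        print(egress, simulate_editing(egress))
```

The counts fluctuate from run to run, but for small egression rates they should be close to the analytical proportions calculated from P_i(egress).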

Peptide Editing Results
Simulation results are comparable to predictions. Can reproduce these results by computer simulation.

Peptide Filtering with Tapasin
Tapasin adds a second filtering stage.
$P(u_T, u) = \frac{u_T}{u q + u_T} = \frac{1}{1 + u q / u_T}$
$P(u_T, \mathit{egress}, u) = P(u_T, u) \cdot P(\mathit{egress}, u)$
$u_T, \mathit{egress} \to \infty:\ P(u_T, \mathit{egress}, u) \to 1 \cdot 1$
$u_T, \mathit{egress} \to 0:\ P(u_T, \mathit{egress}, u) \propto \frac{1}{u q} \cdot \frac{1}{u}$
Tapasin improves the effectiveness of the system by providing an additional filtering step. This increases the upper bound of peptide discrimination.
[Explain with words] Note that the steps are similar to egression. This time, the product factor q is involved. It does not matter whether q or uT is varied, since only the ratio uT/q appears in the formula. We conjecture that q is increased, because of the time constraints.
[Details:] Peptide filtering by MHC I. An open MHC complex can bind a free peptide by doing an input ?bind(u), where the affinity of the peptide is determined by channel u. The loaded MHC complex can either egress, or release the peptide by doing an output !u.

Peptide Discrimination
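Tapasin improves the upper bound on discrimination.
$P_i(u_T, \mathit{egress}) = P(u_T, \mathit{egress}, u_i) / \sum_k P(u_T, \mathit{egress}, u_k)$
$u_T, \mathit{egress} \to \infty:\ P_i(u_T, \mathit{egress}) \to (1 \cdot 1) / \sum_k (1 \cdot 1) = 1/3$
$u_T, \mathit{egress} \to 0:\ P_i(u_T, \mathit{egress}) \to \frac{1}{u_i q} \cdot \frac{1}{u_i} \Big/ \sum_k \frac{1}{u_k q} \cdot \frac{1}{u_k}$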
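As a worked example of the second limit, the factor q cancels between numerator and denominator, leaving a ratio of squared inverse unbinding rates. The values below assume the unbinding rates used elsewhere in the talk (low = 6/2, med = 6/5, high = 6/12):

```latex
\begin{align*}
P_i(u_T, \mathit{egress})
  &\to \frac{\dfrac{1}{u_i q} \cdot \dfrac{1}{u_i}}
            {\sum_k \dfrac{1}{u_k q} \cdot \dfrac{1}{u_k}}
   = \frac{1/u_i^2}{\sum_k 1/u_k^2}
   \qquad (u_T, \mathit{egress} \to 0) \\[1ex]
\frac{1}{u_i^2}
  &= \frac{4}{36},\ \frac{25}{36},\ \frac{144}{36}
   \qquad \text{for } u_i = \tfrac{6}{2},\ \tfrac{6}{5},\ \tfrac{6}{12} \\[1ex]
P_i(0, 0)
  &= \frac{4}{173},\ \frac{25}{173},\ \frac{144}{173}
   \approx 0.023,\ 0.145,\ 0.832
\end{align*}
```

Compared with the single-stage limit of 2/19, 5/19 and 12/19, squaring the inverse rates shifts the distribution further towards the stable peptides.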

Discrimination vs Egression
Calculate peptide discrimination for different values of uT/q and egress. Assume 1000 uniformly loaded MHC complexes.
uT/q   egress   low   med   high
1      1        88    290   623
0.1    1        52    227   721
0.01   1        47    213   740
0.1    0.1      30    170   800
0.01   0.1      27    159   815
0.01   0.01     24    147   829
0      0        23    145   832
Graph shows relative proportion of unstable (low), moderately stable (medium) and stable (high) peptides presented at the cell surface, as egression rate tends to zero (i.e. as waiting time tends to infinity). MHC cannot tell the difference between peptides if it egresses immediately. Discrimination improves the longer it waits before egressing. It is able to capture high affinity peptides more effectively.
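The two-stage table can be recomputed in the same way as the single-stage one. The sketch below assumes the product form P(uT,egress,u) = P(uT,u) P(egress,u) from the talk and the unbinding rates 3, 1.2 and 0.5 per minute; the function name `discrimination_with_tapasin` and its parameter `ut_over_q` are illustrative.

```python
# Unbinding rates per minute for the three peptide classes.
UNBINDING = (("low", 3.0), ("med", 1.2), ("high", 0.5))


def discrimination_with_tapasin(ut_over_q, egress, n_complexes=1000):
    """Expected egressed complexes per class (low, med, high) with tapasin.

    P(uT, u) = uT / (u*q + uT) = (uT/q) / (u + uT/q) and
    P(egress, u) = egress / (u + egress). The factors uT/q and egress are
    the same for every class and cancel in the ratio, so the weights are
    1 / ((u + uT/q) * (u + egress)), which is also valid at 0.
    """
    weights = [1.0 / ((u + ut_over_q) * (u + egress)) for _, u in UNBINDING]
    total = sum(weights)
    return tuple(round(n_complexes * w / total) for w in weights)


if __name__ == "__main__":
    print("uT/q  egress  low  med  high")
    for ut_over_q, egress in ((1, 1), (0.1, 1), (0.01, 1), (0.1, 0.1),
                              (0.01, 0.1), (0.01, 0.01), (0, 0)):
        print(ut_over_q, egress, *discrimination_with_tapasin(ut_over_q, egress))
```

Only the ratio uT/q enters the calculation, so scaling uT and q together leaves every row unchanged.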

Tapasin Improves Discrimination
Max discrimination without tapasin: 105:263:632. Max discrimination with tapasin: 23:145:832. Graph shows relative proportion of unstable (low), relatively stable (medium) and stable (high) peptides presented at the cell surface, as egression rate tends to zero (i.e. as waiting time tends to infinity) in presence of tapasin. Note the additional filtering step. Much better discrimination than in absence of tapasin. MHC is able to capture stable peptides more effectively.

Peptide Editing Simulations
Simulate peptide editing with tapasin, for different values of q and egress. v = 0.0. Start with 1000 tapasin.

Peptide Editing Results
Simulation results are comparable to predictions

MHC Alleles: Model Predictions
Peptide discrimination in absence of tapasin: depends on egress.
Peptide discrimination in presence of tapasin: depends on q and on the ratio of complexes that follow the tapasin pathway.
Cell surface expression in absence of tapasin: depends on dMHCo.
Rate     Range   Effect
egress           Lower egress gives better filtering in absence of tapasin.
dMHCo            Lower degradation gives higher throughput of MHC.
q                Higher q gives better filtering in presence of tapasin.

MHC Alleles: Model Simulations
Explain varying dependence on tapasin. Figure panels: B4402 (Dependent), B2705 (Partially), B4405 (Independent), each shown with No TPN and with 1000 TPN. Used our model to explain some of the genetic variability of MHC among humans.

Peptide Loading: Flytrap Model
MHC I captures peptides like a Venus Flytrap. Leading researchers in Immunology and Structural Biology from Southampton University (UK) have been studying MHC I antigen presentation for many years, in order to understand how the MHC complex is able to capture the right peptides. They suggested that the MHC complex behaves like a Venus flytrap. A peptide can enter an open MHC complex at random, a bit like a fly wandering into a Venus flytrap (2). The complex has a special trigger mechanism, which triggers its closure depending on the shape of the peptide, much like a Venus flytrap, which has a trigger mechanism that only closes when the fly is of a certain size. Peptides that are not recognised by the MHC complex can escape (1), while peptides that are recognised as potentially harmful are captured by the MHC complex and presented at the cell surface (3).
Figure labels: (1) unstable peptide escapes; (2) peptide enters open MHC; (3) stable peptide is captured and presented at cell surface.

Flytrap Model MHC undergoes a conformational change after peptide loading. In the inverted flytrap, filtering occurs by slowing down the rate of closure. There is a greater probability of exchanging low affinity peptides back into the free pool. There will be more low affinity peptides in the open conformation. [Check this experimentally] We see from the simulations that the loaded MHC in open conformation has a higher proportion of low affinity peptides. By slowing closure, we increase the chance of exchanging a low affinity peptide for one of higher affinity, via peptide unbinding. [This hinges on the fact that the pool of peptides has an even distribution of high, low and med affinity peptides.] Principle of pre-equilibrium complex.

Model Equivalence Varying levels of detail. Can obtain similar results with these models. They all have a 2-stage filter process, which is a fundamental property of MHC peptide editing. It turns out that these three models are all weak open bisimulation equivalent (automatically checked in the Mobility Workbench). This indicates that they have similar interactions with their environment, and in some sense preserve the structure of the system. Need to envisage different types of equivalence.

Core model

Adding two new peptides

Adding a conformational change to MHC

Next Steps A functional model of MHC I Antigen presentation
[Cardelli, Elliott, Goldstein, Phillips, Werner] Medical Research Grant (MRC) to complete a more detailed MHC model. Medical Research Grant (MRC) to perform targeted experiments. Obtained two focussed research grants from the Medical Research Council, to fund collaboration and further experiments.
Figure labels: Calreticulin, TAP transporter, ERp57.

Conclusions Biological systems work surprisingly well, though we don’t fully understand why. We still have a lot to learn from nature.
Long-term benefits for medical research: better understand disease; speed up the design of a cure.
Long-term benefits for computing: write more robust concurrent / distributed programs; design and verification of biological computers. Smart drugs? Computers in a test tube...?
Biological modelling is pushing the boundaries of concurrent programming.
Modelling is of particular interest to pharmaceutical companies: shorten the development cycle; detect problems in drug design *before* the drugs are released. The FDA now requires more stringent modelling before any drugs can be approved. Pharmaceutical companies now have an obligation to do more detailed biological modelling.
Biological hardware design requires precise computational modelling, in order to make sure that the hardware will actually work.

Outlook Senior executives of pharmaceutical companies:
“a real need for a modular biological programming language”

Thanks. Questions: *Why is this hard?
In computer programs we have instructions: read this value from this register. In biological systems we don’t have this kind of instruction. There are large numbers of molecules, all interacting in parallel. Interactions have a certain probability of happening. Some proteins fail. It is difficult to predict what will happen next. Because the systems are so massively parallel, it is difficult to know what the design principles are. Imagine a 1000 core CPU. How do you program it? Biology has some decentralised solutions to concurrent programming, which are also very robust to failure.
*What is the call to action? Build a community. Here are some resources for you to try out.
*Mention the bigger context of this research. Trento centre. Partnership with a range of researchers.

References
[Blossey et al., 2006] Blossey, R., Cardelli, L., and Phillips, A. (2006). A compositional approach to the stochastic dynamics of gene networks. Transactions in Computational Systems Biology, 3939:99–122.
[Gillespie, 1977] Gillespie, D. T. (1977). Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem., 81(25):2340–2361.
[Guet et al., 2002] Guet, C. C., Elowitz, M. B., Hsing, W., and Leibler, S. (2002). Combinatorial synthesis of genetic networks. Science.
[Huang and Ferrell, 1996] Huang, C.-Y. F. and Ferrell, J. E. (1996). Ultrasensitivity of the mitogen-activated protein kinase cascade. PNAS, 93:10078–10083.
[Lecca and Priami, 2003] Lecca, P. and Priami, C. (2003). Cell cycle control in eukaryotes: a biospi model. In BioConcur’03. ENTCS.
[Phillips, 2006] Phillips, A. (2006). The Stochastic Pi-Machine: A Simulator for the Stochastic Pi-calculus. Available from
[Phillips and Cardelli, 2004] Phillips, A. and Cardelli, L. (2004). A correct abstract machine for the stochastic pi-calculus. In BioConcur’04. ENTCS.
[Phillips and Cardelli, 2005] Phillips, A. and Cardelli, L. (2005). A graphical representation for the stochastic pi-calculus. In BioConcur’05.

References
[Phillips et al., 2006] Phillips, A., Cardelli, L., and Castagna, G. (2006). A graphical representation for biological processes in the stochastic pi-calculus. Transactions in Computational Systems Biology, 4230:123–152.
[Priami, 1995] Priami, C. (1995). Stochastic π-calculus. The Computer Journal, 38(6):578–589. Proceedings of PAPM’95.
[Priami et al., 2001] Priami, C., Regev, A., Shapiro, E., and Silverman, W. (2001). Application of a stochastic name-passing calculus to representation and simulation of molecular processes. Information Processing Letters, 80:25–31.
[Regev et al., 2001] Regev, A., Silverman, W., and Shapiro, E. (2001). Representation and simulation of biochemical processes using the pi-calculus process algebra. In Altman, R. B., Dunker, A. K., Hunter, L., and Klein, T. E., editors, Pacific Symposium on Biocomputing, volume 6, pages 459–470, Singapore. World Scientific Press.
[Sangiorgi and Walker, 2001] Sangiorgi, D. and Walker, D. (2001). The π-calculus: a Theory of Mobile Processes. Cambridge University Press.
[Silverman et al., 1987] Silverman, W., Hirsch, M., Houri, A., and Shapiro, E. (1987). The logix system user manual, version In Shapiro, E., editor, Concurrent Prolog: Collected Papers (Volume II), pages 46–77. MIT Press, London.
