Paper by: A. Pnueli, O. Shtrichman, M. Siegel. Course: 236814. Submitted to: Professor Orna Grumberg. Submitted by: Yehonatan Rubin.

 This paper is from 1998, so I would like you all to join me on a trip to the past.  I would like to introduce you to a language called DC+.

 DC+ is an intermediate language for multiple synchronous languages.  Advantages: portability; multiple source languages need only one “optimizer”; models written in different languages can be combined.

 Requirement 1: It should be general enough to accommodate a variety of source languages.  Requirement 2: Since it is intended for use in industrial compilers, which are subject to strict performance requirements, its use should neither overly complicate the compilation process nor degrade the performance of the generated code.

 A single intermediate language serves many popular languages.  Very complex, efficient code generators were written for it.  Can we even trust those code generators?

Two options: formal validation of the code generator itself, or validating the semantic equivalence between the original DC+ code and the resulting C code (today's topic). Let's start with why the first option is bad.

 Meaning: full formal verification of the code generator.  Problems: extremely hard to do for industrial-size code generators.  Once finished, it freezes the design. (Who would dare to change something?)

 Instead of validating the code generator once, validate each run.  Meaning: make sure that the resulting C code is semantically equivalent to the initial DC+ code.  This has to be automatic.

[Diagram: the DC+ code and the C code are fed into CVT, which answers either "Approved" or "Not approved".]

 Production of safety-critical systems.  Enables the use of code generation tools in such high-quality systems.  “The combination of automatic code generation and validation …” “…eliminating the need for hand-coding the target code...”.

 Hard question.  First, we will need to understand DC+’s semantics.

 DC+ is synchronous.  It describes a reactive system whose behavior over time is observable as an infinite sequence of states.  State changes are triggered by the arrival of new values for the input variables.  There are no interrupts.

 A DC+ program is a list of constraints on the program variables.  When new values arrive for the input variables, the values of the other variables are determined according to the list of constraints.  At each instant in time, all constraints have to be satisfied by the values that the variables have at that instant.  The list of constraints determines the transition relation of the system.

 There are four kinds of variables:
 Input variables: the trigger for state changes.
 Output variables: observable variables.
 Internal variables: for internal use.
 Register variables: store information about the history of the current computation.

 Both the DC+ program and the generated C program need to be translated into a common semantic domain.  Synchronous transition systems (STS) will be used.

 An STS is a triple S = (V, Θ, p):
 V: a finite set of typed variables. For a DC+ program, these are all the original program variables.
 Θ: a satisfiable assertion characterizing the initial states of system S. For a DC+ program, it describes the initial state.
 p: the transition relation. For a DC+ program, it is obtained by a one-to-one translation of the list of constraints into logical formulas.

 Solutions of p for given values of the input variables determine the values of the remaining variables.  The observable behavior of such a system can be understood as a computation: it starts in an initial state of the system and passes only through legal states.
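The STS idea can be sketched on a toy system. Everything in the block below is a hypothetical illustration, not taken from the paper: the variables `x` (input), `cnt` (register) and `y` (output), the two constraints, and the brute-force "solver" that finds the next state by enumerating candidates.

```python
from itertools import product

# Hypothetical toy system: input x, register cnt in 0..3, output y.
CNT_DOMAIN = range(4)
BOOL_DOMAIN = [False, True]

def theta(s):
    """Initial condition: the counter starts at zero and the output is low."""
    return s["cnt"] == 0 and not s["y"]

def p(s, t):
    """Transition relation: the conjunction of all constraints,
    relating the current state s to the next state t."""
    return (t["cnt"] == ((s["cnt"] + 1) % 4 if t["x"] else s["cnt"])
            and t["y"] == (t["cnt"] == 3))

def step(s, x):
    """Solve p for a given input value by enumerating candidate next states."""
    solutions = [dict(x=x, cnt=c, y=y)
                 for c, y in product(CNT_DOMAIN, BOOL_DOMAIN)
                 if p(s, dict(x=x, cnt=c, y=y))]
    assert len(solutions) == 1  # this toy system is deterministic
    return solutions[0]

def run(inputs):
    """Return the sequence of states visited for a finite input sequence."""
    s = dict(x=False, cnt=0, y=False)
    assert theta(s)
    trace = [s]
    for x in inputs:
        s = step(s, x)
        trace.append(s)
    return trace
```

Calling `run([True, True, True, False])` yields a finite prefix of a computation. Each state in it satisfies every constraint at once, which is the point of the synchronous view.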

 Reminder: DC+ is synchronous.  According to the synchrony hypothesis, there is no time delay between the reception of new values for input variables and the generation of the corresponding output values, so all variables are updated simultaneously.  Slide labels for a step: atomic, bounded, deterministic.

 The result of the code generator is C code.  It has the following structure: it is ANSI C, and it consists of one control loop in which each iteration corresponds to one step of the DC+ program.

 Unlike in DC+, here the variables are not updated simultaneously.  The control loop consumes new values for the input variables and then computes the values of the remaining variables one by one.

 States marked with a bullet correspond to the beginning (and end) of the control loop.  Those states match the states of the original DC+ program.  Intermediate states, where only some variables have been updated, are not depicted since they do not correspond to any state of the DC+ program.
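The shape of such a control loop can be sketched as follows. This is written in Python rather than C for consistency with the other sketches here, and the variables `x`, `cnt` and `y` are hypothetical; it is not actual generator output.

```python
def control_loop(inputs):
    """One loop; each iteration corresponds to one step of the source program."""
    cnt, y = 0, False
    observed = []
    for x in inputs:                   # consume the new input value
        if x:
            cnt = (cnt + 1) % 4        # intermediate state: cnt is new, y is stale
        y = (cnt == 3)                 # now every variable holds its new value
        observed.append((x, cnt, y))   # loop boundary: a "bullet" state
    return observed
```

Only the tuples recorded at the loop boundary are compared with the states of the source program. The moment between the two assignments, where `y` is stale, is never observed.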

 For the purpose of semantic comparison, the C program is also translated into an STS representation.

 Now we have a common semantic domain for both the C and DC+ programs.  We’ll say that: “Program C implements DC+ if for every computation σ of C there exists a computation τ of DC+ such that σ and τ agree state-wise on the values of observable variables, i.e., input and output variables…”
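The definition can be illustrated with a bounded brute-force comparison. This is only a sketch under stated assumptions: both step functions are hypothetical, both are deterministic (so "there exists a computation τ" reduces to comparing the single computation for each input sequence), and checking inputs up to a fixed length is testing, not a proof.

```python
from itertools import product

def abstract_step(cnt, x):
    """Hypothetical source-level step: count up modulo 4, y is high at 3."""
    new_cnt = (cnt + 1) % 4 if x else cnt
    return new_cnt, new_cnt == 3

def concrete_step(left, x):
    """Hypothetical optimized step: counts down in a different internal variable."""
    if x:
        left = 3 if left == 0 else left - 1
    return left, left == 0

def agree_on_observables(max_len=6):
    """Compare the output y state by state for all input sequences up to max_len."""
    for n in range(max_len + 1):
        for inputs in product([False, True], repeat=n):
            cnt, left = 0, 3
            for x in inputs:
                cnt, y_abs = abstract_step(cnt, x)
                left, y_con = concrete_step(left, x)
                if y_abs != y_con:
                    return False
    return True
```

The internal variables `cnt` and `left` never hold the same value, yet the two programs agree on every observable state, which is all the definition asks for.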

 Now we’ll need a mapping between DC+ and C (abstract to concrete).  The code generator in the use case applies more than 100 optimizations.  Thus, the domain of the mapping will be the observable variables (I/O).  The mapping assigns a term over the concrete variables to every abstract variable.

 If both of these proof obligations are found to be valid, we can conclude that C is a correct refinement of the corresponding DC+ program.

 Now we’ll move to the practical part.  CVT’s Architecture.

What we talked about so far

 The right-hand side of the implication is a conjunction.  Since the time it takes to verify a formula with BDD-based tools is worst-case exponential in the size and complexity of the formula, the size of the largest single formula that has to be verified is the bottleneck of the validity checking.  Was this before (practical) SAT solvers?

The paper claims it will “soon explain why”. I couldn’t find where it does.

 Why decomposition?  Verifying each formula is exponential in its size.  Decomposition causes a linear increase in the number of validation tasks and a linear decrease in the size of each task, which means an exponential decrease in the verification time of each formula.  This is why decomposition is so important.
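The arithmetic behind this argument can be sketched with an idealized cost model. The model is an assumption made for illustration: cost is taken to be exactly 2 to the power of the formula size, and each of the decomposed tasks is assumed to shrink to an equal share of the original size, which real formulas need not do.

```python
def bdd_cost(size):
    """Assumed worst-case cost model: exponential in the formula size."""
    return 2 ** size

def monolithic_cost(size):
    """Cost of verifying one formula of the given size in a single task."""
    return bdd_cost(size)

def decomposed_cost(size, parts):
    """Total cost of `parts` tasks, each of size roughly size / parts."""
    return parts * bdd_cost(size // parts)
```

Under this model, a formula of size 40 costs `2**40` units as one task, but only `4 * 2**10 = 4096` units when split into four tasks of size 10.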

 After breaking up the right-hand side, the module returns to the left-hand side of the implication and calculates the “Cone of Influence” (COI): the portion of the formula on the left-hand side that is needed for proving the selected conjuncts on the right-hand side.

 Now we have many pairs of files to be checked (possibly even simultaneously).  Each pair consists of a conjunct from the right-hand side and that conjunct's COI from the left-hand side.
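One common way to approximate a cone of influence is a syntactic closure over shared variables. The sketch below assumes that approach; the slides do not say how CVT computes the COI, and the conjunct names and variable sets are hypothetical.

```python
def cone_of_influence(lhs_conjuncts, target_vars):
    """lhs_conjuncts maps a conjunct name to the set of variables it mentions.
    Returns the names of the conjuncts that share variables, directly or
    transitively, with the variables of the selected right-hand conjunct."""
    needed_vars = set(target_vars)
    cone = set()
    changed = True
    while changed:
        changed = False
        for name, variables in lhs_conjuncts.items():
            if name not in cone and variables & needed_vars:
                cone.add(name)
                needed_vars |= variables   # their variables become relevant too
                changed = True
    return cone

# Hypothetical left-hand side with three conjuncts.
lhs = {"c1": {"a", "b"}, "c2": {"b", "c"}, "c3": {"d", "e"}}
```

For a right-hand conjunct that mentions only `a`, `cone_of_influence(lhs, {"a"})` keeps `c1` and `c2` and drops `c3`, so the verifier receives a smaller formula.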

 Abstraction is needed since we are trying to verify a formula that contains integer and float variables, as well as functions over these variables, using a BDD-based decision procedure for finite-state models.  The abstraction module treats these functions as uninterpreted functions, replacing them by new symbols.

 The faithfulness of this technique depends on two things: the way that the compiler manipulates these functions, and the kind of functions we leave uninterpreted.  Should we interpret more functions?  The more we interpret, the more faithful the model is (it is also hard to interpret complex functions).  The less we interpret, the smaller the model is.

 The abstraction works in an incremental manner.  CVT begins with maximum abstraction: all functions except equalities, Boolean operators and if-then-else are left uninterpreted.  If the proof fails, CVT invokes the next level of abstraction, in which comparison operators on integers (e.g., <) are interpreted as well.  There are no more levels.

 Example of why this is necessary: suppose the compiler rewrites a comparison such as “a < b” into an equivalent form such as “b > a”.  The first level of abstraction will yield a false negative.  The second level of abstraction will yield a true positive.
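The two abstraction levels can be contrasted on a small example. The rewrite of `a < b` into `b > a` is an assumed example, and the brute-force enumeration over a two-element domain stands in for the BDD-based check; none of this is CVT's actual procedure.

```python
from itertools import product

DOMAIN = [1, 2]
PAIRS = list(product(DOMAIN, repeat=2))

def valid_uninterpreted():
    """First level: '<' and '>' are arbitrary predicate symbols lt and gt.
    The equivalence lt(a, b) <-> gt(b, a) must hold for every interpretation."""
    tables = list(product([False, True], repeat=len(PAIRS)))
    for lt_bits, gt_bits in product(tables, repeat=2):
        lt = dict(zip(PAIRS, lt_bits))
        gt = dict(zip(PAIRS, gt_bits))
        for a, b in PAIRS:
            if lt[(a, b)] != gt[(b, a)]:
                return False   # some interpretation separates the two sides
    return True

def valid_interpreted():
    """Second level: comparisons get their usual meaning on integers."""
    return all((a < b) == (b > a) for a, b in PAIRS)
```

`valid_uninterpreted()` reports a failure even though the two programs are equivalent (the false negative), while `valid_interpreted()` succeeds.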

 This leaves us with a quantifier-free first-order logic formula which enjoys the small model property (i.e., it is satisfiable iff it is satisfiable over a finite domain).  Therefore the next issue is the calculation of a finite domain such that the formula is valid if and only if it is valid over all interpretations into this domain.

 Once we have a valid domain, checking whether the formula is satisfiable is a relatively easy thing to do (BDDs).  So, which domain should we use?

Which functions do we interpret?
Level 0: equalities, Boolean operators and if-then-else.
Level 1: also comparison operators on integers (e.g., <).
Only the order is important. If there are n variables, the domain [1..n] contains all possible orderings, so the domain [1..n] is valid.
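The [1..n] argument can be tried out by exhaustive enumeration. The sketch assumes formulas built only from comparisons between variables (no constants, no arithmetic), so that only the relative order of the values matters; the two sample formulas are hypothetical.

```python
from itertools import product

def valid_over_small_domain(formula, n):
    """Check an n-variable order formula for every assignment over [1..n]."""
    return all(formula(*values)
               for values in product(range(1, n + 1), repeat=n))

def transitivity(a, b, c):
    """(a < b and b < c) implies a < c."""
    return not (a < b and b < c) or a < c

def bogus(a, b, c):
    """a < b implies b < c, which is not valid."""
    return not (a < b) or b < c
```

With three variables, `valid_over_small_domain(transitivity, 3)` examines 27 assignments and accepts, while `bogus` is rejected, for instance by the assignment a = 1, b = 2, c = 1.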

That’s R:

 TLV: the verifier module.  It is an SMV-based tool, invoked for each pair of files (as created from the COI).  If the equivalence proof fails, it is possible to isolate the conjunct that caused the failure.

 Case study: a turbine from the SACRES project.  5 units (manually separated).  The DC+ program is a few thousand lines of code with over 1000 variables.

 Unverified conjuncts add a very large Cone Of Influence.

 Thank you
