1 Testing Stochastic Processes Through Reinforcement Learning. François Laviolette, Josée Desharnais, Sami Zhioua. NIPS Workshop, December 9th, 2006.
2 Outline
- Program Verification Problem
- The Approach for trace-equivalence
- Other equivalences
- Application on MDPs
- Conclusion

3 Stochastic Program Verification
- Specification (LMP): an MDP without rewards. The Specification model is available.
- Implementation: available only for interaction (no model).
- Question: how far is the Implementation from the Specification? (a distance or divergence)
[Figure: transition diagrams of the Specification and the Implementation over states s0..s6, with probabilistic transitions such as a[0.5], a[0.3], b[0.9], c.]

4 Trace Equivalence
1. Non-deterministic trace equivalence: two systems are trace equivalent iff they accept the same set of traces.
   Example: T(P) = {a, aa, aac, ac, b, ba, bab, c, cb, cc} and T(Q) = {a, ab, ac, abc, abca, ba, bab, c, ca}, so P and Q are not trace equivalent.
2. Probabilistic trace equivalence: two systems are trace equivalent iff they accept the same set of traces with the same probabilities.
   Example: P accepts trace a with probability 7/12, aa with 5/12, aac with 1/6, bc with 2/3; Q accepts a with probability 1, aa with 1/2, aac with 0, bc with 0.
[Figure: probabilistic transition systems P and Q, with transitions such as a[2/3], a[1/3], b[2/3], a[1/4], a[3/4], b[1/2], c[1/2].]
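The probabilistic definition above can be sketched in code. The encoding below (a dict mapping state to action to a list of (probability, next state) pairs) and the state names are illustrative assumptions, not the paper's representation:

```python
# A minimal sketch of probabilistic trace acceptance. The LMP encoding
# (state -> action -> [(probability, next_state), ...]) is an assumption
# made for illustration only.
def trace_prob(lmp, state, trace):
    """Probability that the process runs the whole trace from `state`."""
    if not trace:
        return 1.0
    action, rest = trace[0], trace[1:]
    return sum(p * trace_prob(lmp, nxt, rest)
               for p, nxt in lmp.get(state, {}).get(action, []))

# P accepts "a" with probability 2/3, Q with probability 1, so the two
# processes are not probabilistically trace equivalent.
P = {"p0": {"a": [(2/3, "p1")]}, "p1": {"b": [(1/2, "p2")]}}
Q = {"q0": {"a": [(1.0, "q1")]}, "q1": {"b": [(1/3, "q2")]}}
print(trace_prob(P, "p0", "a"))   # 2/3
print(trace_prob(Q, "q0", "a"))   # 1.0
print(trace_prob(P, "p0", "ab"))  # 1/3 for both: one trace can agree
print(trace_prob(Q, "q0", "ab"))  # even when the processes differ
```

The last two lines show why equivalence quantifies over all traces: a single trace probability can coincide on processes that other traces distinguish.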

5 Testing (Trace Equivalence)
- The system is a black box. When a button is pushed (action execution), either the button goes down (a transition occurs) or it does not (no transition).
- Grammar (trace equivalence): t ::= ε | a.t
- Observations: when a test is executed, several observations are possible (the set O_t).
  Example: for t = a.b.ε, O_t = {a✗, a✓.b✗, a✓.b✓}; with transitions a[0.2] then b[0.7], the observation a✓.b✓ has probability 0.2 × 0.7 = 0.14.

6 Outline
- Program Verification Problem
- The Approach for trace-equivalence
- Other equivalences
- Application on MDPs
- Conclusion

7 Why Reinforcement Learning?
- Reinforcement learning is particularly efficient in the absence of the full model.
- Reinforcement learning can deal with bigger systems.
- Analogy: LMP ~ MDP; Trace ~ Policy; Divergence ~ Optimal Value (V*)
[Figure: an LMP over states s0..s8 with transitions such as a[0.2], a[0.5], b[0.7], b[0.9], and the corresponding MDP.]

8 A Stochastic Game Towards RL
The game is played against three processes: the Implementation, the Specification, and a clone of the Specification.
- Reward +1 when the Implementation and the Specification give different observations.
- Reward -1 when the Specification and its clone give different observations.
[Figure: transition systems of the Implementation, the Specification, and the Specification clone (states s0..s10, transitions such as a[0.2], a[0.5], b[0.7], c[0.4], c[0.8]), with outcome sequences over Success/Failure such as F S S, S F S, S S F, S S S, F F F.]
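The reward scheme can be simulated for a single button with fixed success probabilities. This one-action toy (a simplifying assumption; real runs traverse whole traces) shows how the clone cancels the Specification's own randomness, and also why the symmetry fix of a later slide is needed:

```python
import random

# One round of a simplified, single-action version of the game (an
# illustrative assumption, not the paper's full construction): push the
# same button on the Implementation, the Specification, and a clone of
# the Specification; +1 if Impl and Spec disagree, -1 if Spec and its
# clone disagree.
def play_round(p_impl, p_spec, rng):
    impl = rng.random() < p_impl
    spec = rng.random() < p_spec
    clone = rng.random() < p_spec
    return (1 if impl != spec else 0) - (1 if spec != clone else 0)

rng = random.Random(0)
n = 200_000
same = sum(play_round(0.8, 0.8, rng) for _ in range(n)) / n
diff = sum(play_round(1.0, 0.8, rng) for _ in range(n)) / n
print(same)  # near 0: an equivalent implementation earns nothing on average
print(diff)  # near -0.12: without the symmetry fix, a genuine difference
             # can even yield negative expected reward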

9 MDP Definition
The MDP is induced from the Specification LMP: its states, actions, and next-state probability distributions come from the Specification (with an added Dead state).
[Figure: the Implementation and Specification LMPs (states s0..s10, transitions such as a[0.2], a[0.5], b[0.7], b[0.9], c[0.7], c[0.8]) and the induced MDP with its Dead state.]

10 Divergence Computation
The divergence is the optimal value V*(s0) of the induced MDP:
- V*(s0) = 0 : equivalent
- V*(s0) = 1 : different
[Figure: the Implementation and Specification LMPs, the outcome sequences over Success/Failure, and the induced MDP with its Dead state.]

11 Symmetry Problem
Fix: create two variants for each action a, a success variant (a✓) and a failure variant (a✗). The agent selects an action and makes a prediction (✓ or ✗), then executes the action:
- if the prediction matches the observation, compute and give the reward;
- if the prediction does not match the observation, give reward 0.
Example: with a[1] in the Implementation and a[0.5] in the Specification and its clone, the probability works out to 0 × 0.5 × 0.5 + 1 × 0.5 × 0.5 = 0.25.

12 The Divergence (with the Symmetry Problem Fixed)
Theorem. Let "Spec" and "Impl" be two LMPs, and M their induced MDP. Then V*(s0) ≥ 0, and V*(s0) = 0 iff "Spec" and "Impl" are trace-equivalent.
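Since the theorem characterizes the divergence as the optimal value V*(s0), any standard MDP solver applies once the induced MDP is built. A generic value-iteration sketch (the textbook algorithm; the toy MDP below is a made-up illustration, not the slides' induced MDP):

```python
# Generic value iteration for V*(s) = max_a sum_s' P(s'|s,a)[R(s,a,s') + g V*(s')].
# Standard textbook algorithm; the toy MDP below is a hypothetical example.
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-12):
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max((sum(p * (R(s, a, s2) + gamma * V[s2])
                            for s2, p in P(s, a).items())
                        for a in actions(s)),
                       default=0.0)  # terminal states keep value 0
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

# Toy MDP: from s0, action "a" reaches a terminal "dead" state, paying 0.12.
V = value_iteration(
    states=["s0", "dead"],
    actions=lambda s: ["a"] if s == "s0" else [],
    P=lambda s, a: {"dead": 1.0},
    R=lambda s, a, s2: 0.12,
)
print(V["s0"])  # 0.12: a strictly positive V*(s0) signals non-equivalence
```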

13 Implementation and PAC Guarantee
- There exists a PAC guarantee for the Q-Learning algorithm, but Fiechter's algorithm has a simpler PAC guarantee.
- Besides, a lower bound can be obtained thanks to the Hoeffding inequality.
Implementation:
- RL algorithm: Q-Learning, γ = 0.8
- Action selection: softmax (temperature decreasing from 0.8 to 0.01)
- Learning rate α decreasing according to the function 1/x
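The slide's exact bound is not legible in the transcript; a generic Hoeffding-style sample bound, under assumed parameters (observed values spanning a range of width 2, accuracy ε, confidence 1 - δ), looks like this:

```python
import math

# Generic Hoeffding sample-complexity bound (an assumption standing in for
# the slide's illegible formula): to estimate a mean of bounded values
# spanning a range of width `width` to accuracy eps with confidence
# 1 - delta, n >= width^2 * ln(2/delta) / (2 * eps^2) samples suffice.
def hoeffding_samples(eps, delta, width=2.0):
    return math.ceil(width**2 * math.log(2.0 / delta) / (2.0 * eps**2))

print(hoeffding_samples(0.05, 0.01))  # a few thousand test executions
```

The quadratic dependence on 1/eps is the practical cost driver: halving the target accuracy quadruples the number of test executions.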

14 Outline
- Program Verification Problem
- The Approach for trace-equivalence
- Other equivalences
- Application on MDPs
- Conclusion

15 Testing (Bisimulation)
- The system is a black box.
- Grammar: t ::= ε | a.t | (t1, …, tn)    (replication, for bisimulation)
  Example: for t = a.(b, b), the possible observations are O_t = {a✗, a✓.(b✗, b✗), a✓.(b✗, b✓), a✓.(b✓, b✗), a✓.(b✓, b✓)}.

16 New Equivalence Notion: "By-Level Equivalence"
[Figure: processes P and Q over actions a, b, c, with probabilities such as b[1/3], c[2/3] in P and a[1/3], a[2/3] in Q.]

17 K-Moment Equivalence
- 1-moment (trace): t ::= ε | a.t
- 2-moment: t ::= ε | a^k.t, k ≤ 2
- 3-moment: t ::= ε | a^k.t, k ≤ 3
X is a random variable such that Pr(X = p_i) is the probability to perform the trace σ and make a transition to a state that accepts action a with probability p_i.
Recall: the kth moment of X is E(X^k) = Σ x_i^k · Pr(X = x_i).
Two systems are "by-level" equivalent iff they are k-moment equivalent for every k.
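The moment computation itself is elementary. A minimal sketch (the (value, probability) encoding is an assumption) with two distributions that share a mean but differ in their second moment, which is exactly the kind of difference k-moment tests are meant to expose:

```python
# k-th moment of a discrete random variable given as (value, probability)
# pairs: E(X^k) = sum x_i^k * Pr(X = x_i). Encoding is assumed for
# illustration.
def kth_moment(dist, k):
    return sum(x**k * p for x, p in dist)

# Same 1st moment, different 2nd moment: 1-moment (trace) tests cannot
# separate these, but 2-moment tests can.
d1 = [(0.5, 1.0)]              # always 0.5
d2 = [(0.0, 0.5), (1.0, 0.5)]  # 0 or 1, each with probability 1/2
print(kth_moment(d1, 1), kth_moment(d2, 1))  # 0.5 0.5
print(kth_moment(d1, 2), kth_moment(d2, 2))  # 0.25 0.5
```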

18 Ready Equivalence and Failure Equivalence
1. Ready equivalence: two systems are ready equivalent iff for any trace tr and any set of actions A, they have the same probability to run tr successfully and reach a process accepting all actions from A.
   Test grammar: t ::= ε | a.t | {a1, …, an}
   Example: P passes (ε, {b,c}) with probability 2/3, Q with probability 1/2.
2. Failure equivalence: two systems are failure equivalent iff for any trace tr and any set of actions A, they have the same probability to run tr successfully and reach a process refusing all actions from A.
   Test grammar: t ::= ε | a.t | {¬a1, …, ¬an}
   Example: P passes (ε, {b,c}) with probability 1/3, Q with probability 1/2.
[Figure: processes P and Q with transitions such as a[1/3], a[2/3], a[1/4], a[3/4], b[1/2], b, c.]

19 Barb Equivalence
1. Barb acceptance: test grammar t ::= ε | a.t | {a1, …, an}a.t (example success probability: 2/3).
2. Barb refusal: test grammar t ::= ε | a.t | {¬a1, …, ¬an}a.t (example success probability: 1/3).
[Figure: processes P and Q with transitions such as a[1/3], a[2/3], a[1/4], a[3/4], b[1/2], b, c.]

20 Outline
- Program Verification Problem
- The Approach for trace-equivalence
- Other equivalences
- Application on MDPs
- Conclusion

21 Application on MDPs
Three cases for the reward space:
- Case 1: the reward space contains 2 values (binary): 0 and 1.
- Case 2: the reward space is small (discrete): {r1, r2, r3, r4, r5}.
- Case 3: the reward space is very large (continuous): w.l.o.g. [0, 1].
[Figure: MDP 1 and MDP 2 over states s0..s9, with actions a, b, c and rewards r1..r8 on transitions.]

22 Application on MDPs
- Case 1 (binary rewards): map reward 0 to Failure (F) and reward 1 to Success (S).
- Case 2 (small discrete reward space {r1, …, r5}): split each action into one variant per reward value (a_r1, …, a_r5, b_r1, …, b_r5), each observed as Success or Failure.
- Case 3 (continuous rewards in [0,1]): pick a reward value ranVal uniformly at random and compare it to the observed reward r, observing Success with probability r and Failure with probability 1 - r. Intuition: r = 3/4 becomes 1 with probability 3/4 and 0 with probability 1/4.
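The Case 3 trick is a one-line Bernoulli reduction; this sketch (function name and parameters are illustrative) checks that the coin flip preserves the expected reward:

```python
import random

# A sketch of the slide's Case 3 trick: replace a continuous reward
# r in [0, 1] by a coin flip that succeeds with probability r. The
# expectation of the binary observation equals the original reward.
def binarize_reward(r, rng):
    return 1 if rng.random() < r else 0  # P(Success) = r for uniform ranVal

rng = random.Random(42)
n = 200_000
avg = sum(binarize_reward(0.75, rng) for _ in range(n)) / n
print(avg)  # close to 0.75
```

This reduces the continuous case to Case 1, so the same Success/Failure game machinery applies unchanged.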

23 Current and Future Work
- Application to different equivalence notions: failure equivalence, ready equivalence, barb equivalence, etc.
- Experimental analysis on realistic systems.
- Applying the approach to compute the divergence between HMMs, POMDPs, and probabilistic automata.
- Studying the properties of the divergence.