Schema-based Program Synthesis and the AutoBayes System, Part II
Johann Schumann, SGT, NASA Ames


Example
- Generate a program that finds the maximum value of a function f(x): max f(x) wrt x
- Two cases: univariate, multivariate
- Note: the function might be given as a formula or as a vector of data

Schemas for univariate optimization

    schema(max F wrt X, C) :- ... as before

    schema(max F wrt X, C) :-
        length(X, 1),
        % F is a vector of data points F(0..n)
        C = let(sequence([
                  assign(mymax, 0),
                  for(idx(I,0,n),
                      if(select(F,I) > mymax,
                         assign(mymax, select(F,I)),
                         skip)...
                ]),
                comment(['The maximum is found by iterating...']),
                mymax).

    schema(max F wrt X, C) :-
        length(X, 1),
        % instantiate numeric solution algorithm
        % e.g., golden section search
        C = ...

    schema(max F wrt X, C) :- ....
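The third schema above only names golden section search. A minimal Python sketch of that numeric method, assuming a unimodal function on a closed interval; the function name `golden_section_max` and its parameters are illustrative and are not AutoBayes code:

```python
import math


def golden_section_max(f, lo, hi, tol=1e-6):
    """Approximate the maximizer of a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # 1/phi, about 0.618
    a, b = lo, hi
    # Two interior probe points with a < c < d < b.
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            # The maximum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # The maximum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0


x_best = golden_section_max(lambda x: -(x - 2.0) ** 2 + 3.0, 0.0, 5.0)
```

Each iteration reuses one of the two previous function values, so only one new evaluation of f is needed per step.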

Schema for univariate optimization
1. build the derivative: df/dx
2. set it to 0: 0 = df/dx
3. solve that equation for x
4. the solution is the desired maximum

    schema(max F wrt X, C) :-      % INPUT (Problem), OUTPUT (Code fragment)
        % guards
        length(X, 1),
        % calculate the first derivative
        simplify(deriv(F, X), DF),
        % solve the equation
        solve(true, x, 0 = DF, S),
        % possibly more checks
        % is that really a maximum?
        simplify(deriv(DF, X), DDF),
        (  solve(true, x, 0 > DDF, _)
        -> true
        ;  writeln('Proof obligation not solved automatically')
        ),
        XP = ['The maximum for', expr(F), 'is calculated...'],
        V = pv_fresh,
        % bind the fresh program variable V to the symbolic solution S
        C = let(assign(V, S, [comment(XP)]), V).
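The four steps can be traced by hand for a quadratic. The sketch below is a Python illustration restricted to f(x) = a*x^2 + b*x + c; the function `maximize_quadratic` is hypothetical and stands in for the general `deriv`/`solve` machinery, which is not reproduced here:

```python
def maximize_quadratic(a, b, c):
    """Maximize f(x) = a*x**2 + b*x + c by the derivative recipe.

    Returns (x_max, f_max) or raises ValueError when a check fails.
    """
    # Step 1: df/dx = 2*a*x + b, kept as (slope, intercept).
    d_slope, d_intercept = 2 * a, b
    if d_slope == 0:
        raise ValueError("guard failed: 0 = df/dx has no unique solution")
    # Steps 2 and 3: solve 0 = 2*a*x + b for x.
    x_star = -d_intercept / d_slope
    # Extra check: the second derivative 2*a must be negative for a maximum.
    if d_slope > 0:
        raise ValueError("proof obligation failed: stationary point is a minimum")
    # Step 4: the solution is the maximum.
    return x_star, a * x_star ** 2 + b * x_star + c


result = maximize_quadratic(-1, 4, 1)
```

For f(x) = -x^2 + 4x + 1 the derivative is -2x + 4, which vanishes at x = 2, and the second derivative -2 is negative, so the check passes.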

Demo
- Generation of multiple programs
  - -maxprog
  - -maxprog N -fastest (coarse approximation)
- Control for numeric solvers
  - pragma schema_control_arbitrary_init_values
  - pragma schema_control_use_generic_optimize
- Tracing pragmas
- The necessity of constraints

Multivariate Optimization
- Task: minimize function F(X) wrt X
- Algorithm (C-like pseudocode):

    double* minimize(F) {
        double* x0 = pick_start();
        int converging = 1;
        while (converging) {
            double step_length = 0.1;
            double step_dir = -gradient(F, x0);
            x1 = x0 + step_length * step_dir;
            if (fabs(F(x1) - F(x0)) < 0.001)
                converging = 0;
            else
                x0 = x1;
        }
        return x0;
    }

- In words:
  - start somewhere
  - go down along the steepest slope
  - when you come to a flat area, return that (local) minimum
- Many design decisions
  - where to start?
  - how to move?
  - when to stop?
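A runnable Python version of the same steepest-descent loop may help make the three design decisions concrete. It is a sketch under assumptions: the gradient is approximated by central differences, the step length is fixed, and the names `numeric_gradient` and `minimize` are illustrative only:

```python
def numeric_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at point x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


def minimize(f, x0, step_length=0.1, tol=1e-9, max_iter=10000):
    """Steepest descent with a fixed step length."""
    x = list(x0)  # where to start: supplied by the caller
    for _ in range(max_iter):
        g = numeric_gradient(f, x)
        # how to move: a fixed-length step against the gradient
        x_new = [xi - step_length * gi for xi, gi in zip(x, g)]
        # when to stop: the function value no longer changes noticeably
        if abs(f(x_new) - f(x)) < tol:
            return x_new
        x = x_new
    return x


def bowl(p):
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2


x_min = minimize(bowl, [0.0, 0.0])
```

Like the pseudocode, this returns a local minimum only; a different start point or step length can change the result on non-convex functions.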

Multivariate Optimization

    schema(max F wrt X, C) :-                         % IN, OUT
        % guards: here none
        length(X,Y), Y > 1,
        % divide and solve subproblems
        schema(getStartValue(F,X), C_Start),          % recursive schema calls
        schema(getStepDirection(F,X), C_Dir),
        schema(getStepSize(F,X), C_Size),
        % assemble code segment
        X0 = pvar_new(X),                             % get a new PROGRAM variable
        C = block([local(X0,double)],
              series([
                assign(X0, C_Start),
                while_converging(X0,
                  assign(X0, +([X0, *([C_Dir, C_Size])])))
              ])).

Multivariate optimization II
- The schemas generate code in an intermediate language
  - procedural elements
  - local variables, lambda blocks
  - sum(..), while_converging(..) --> loops

    X0 = pvar_new(X),
    C = block([local(X0,double)],
          series([
            assign(X0, C_Start),
            while_converging(X0,
              assign(X0, +([X0, *([C_Dir, C_Size])])))
          ])).

- Generated code for max sin(v) wrt v:

    double v_0;
    double E;
    double y;
    v_0 = -99;
    E = 1e10;
    while (E > 0.001) {
        y = sin(v_0);
        v_0 = v_0 - cos(v_0) * 0.01;
        E = fabs(y - sin(v_0));
    }

- Important: variables in specification or program are NOT Prolog variables

Why schema-based synthesis?
- Multiple algorithm variants can be automatically constructed
- The “best” one is chosen by the user or selected via constraints
- (Figure: some possibilities for getStepDir)
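How alternative sub-schemas multiply into program variants can be imitated in Python with generators, which play the role of Prolog backtracking here. This is a sketch, not AutoBayes code: the tuple encoding of intermediate code, the helper names `pick_random_start` and `line_search`, and the fixed variable name `v_0` are all assumptions made for illustration:

```python
from itertools import product


def start_value_schemas(problem):
    yield ("const", 0.0)                  # fixed start value
    yield ("call", "pick_random_start")   # hypothetical runtime helper


def step_direction_schemas(problem):
    yield ("neg", ("gradient", problem["f"], problem["x"]))


def step_size_schemas(problem):
    yield ("const", 0.1)                  # fixed step size
    yield ("call", "line_search")         # hypothetical runtime helper


def multivariate_schema(problem):
    """Yield one intermediate-code term per combination of sub-solutions."""
    if len(problem["x"]) <= 1:            # guard: multivariate problems only
        return
    x0 = "v_0"                            # stands in for a fresh program variable
    for start, direction, size in product(
        start_value_schemas(problem),
        step_direction_schemas(problem),
        step_size_schemas(problem),
    ):
        update = ("assign", x0, ("+", [x0, ("*", [direction, size])]))
        yield ("block", [("local", x0, "double")],
               ("series", [("assign", x0, start),
                           ("while_converging", x0, update)]))


programs = list(multivariate_schema({"f": "F", "x": ["x1", "x2"]}))
```

Two start-value choices times one direction times two step sizes give four candidate programs; a selection step (user choice or constraints) would then pick one.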

AB Schema Hierarchies
- Schemas to break down statistical problems
  - Bayesian independence theorems; these work on Bayesian graphs
- Schemas to solve complex statistical problems
  - instantiate (iterative) clustering algorithms
  - handling of time series problems
- Schemas to solve atomic problems
  - instantiate PDF and maximize (symbolically)
  - instantiate numerical solvers (see last slides)
- Auxiliary schemas
  - initialization of clustering algorithms
  - data pre-processing (e.g., [0..1] normalization)

AB Schema Hierarchy
- Static tree structure
- AB uses two kinds of schemas
  - schemas for probabilistic problems
  - schemas for formulas

Schemas and AB Model
- The AB schemas have to use all information from the input specification, which is stored in the Prolog database (AB model)
- Problem: schemas can modify the model, and these modifications must be undone during backtracking
  - add new statistical variables
  - remove dependencies for subproblems
- Solutions:
  - add the model as parameters: schema(Prob, C, M_in, M_out), and likewise everywhere else
  - keep a model stack (similar to the dynamic calling environments in procedural languages) and use backtrackable asserts/retracts
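The model-stack solution can be imitated outside Prolog. The Python sketch below treats a schema failure as an exception and undoes every model change made during the failed attempt; `ModelStack`, `SchemaFailed`, and the string-encoded facts are hypothetical names for illustration, and only `SchemaFailed` is handled:

```python
from contextlib import contextmanager


class SchemaFailed(Exception):
    """Signals that a schema is not applicable (the analogue of Prolog failure)."""


class ModelStack:
    def __init__(self, facts=()):
        self._stack = [frozenset(facts)]

    @property
    def facts(self):
        return self._stack[-1]

    def add(self, fact):
        self._stack[-1] = self._stack[-1] | {fact}

    def remove(self, fact):
        self._stack[-1] = self._stack[-1] - {fact}

    @contextmanager
    def attempt(self):
        """Run a schema on a copy of the model; undo its changes on failure."""
        self._stack.append(self._stack[-1])
        try:
            yield self
        except SchemaFailed:
            self._stack.pop()              # undo: drop the modified copy
        else:
            committed = self._stack.pop()  # success: keep the changes
            self._stack[-1] = committed


model = ModelStack({"depends(x, mu)"})
with model.attempt():
    model.add("variable(c)")
    raise SchemaFailed()
after_failure = model.facts
with model.attempt():
    model.add("variable(c)")
after_success = model.facts
```

Because each level holds an immutable set, pushing a level is cheap and popping it restores the earlier model exactly.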

Backtrackable Global Stuff
- Global data in Prolog are handled using assert/retract or flags. All other data are local to each clause:

    p(X) :- q(X,Z), r(Z).     % X, Z local to clause

- Asserts are not backtrackable:

    p(X) :- assert(keep(X)), ..., fail.

  The “keep(X)” is kept in the database even after backtracking
- Work-around: add global variables as parameters to all predicates (impractical):

    p(X, GL_in, GL_out) :- GL_out = [keep(X)|GL_in], ...

- Backtrackable bassert/bretract requires some additional low-level C programs (but has clean semantics)

Schema Control
- schema applicability is controlled via guards
- order of application: order in Prolog file
- How to enforce/avoid certain schemas?
  - autobayes pragmas, but that’s not really fun
  - doesn’t work for nested applications:
      inner loop: symbolic solutions only
      outer loop: enable numeric loop
  - generate them all and decide later, or pick “fastest”
- schema control language is a research topic
  - extend declarative AB language
  - how to talk about selection of iterative algorithm in a purely declarative language?

The AB Infrastructure
- term utilities
- rewriting engine
- symbolic system:
  - simplifier
  - abstraction (range, sign, definedness)
  - solver
- pretty printer (code, intermediate language)
- comment generation

Term utilities
- implemented on top of Prolog
- a lot of functional-programming style predicates for
  - lists, sets, bags, relations
  - terms, AC-terms
- operations
  - term_substitute, subsumption, differences between term sets...

Rewriting Engine
- A lot of stuff in AB is done using rewriting (but not all)
- small rewriting engine implemented in Prolog
  - rewriting rules are Prolog clauses
  - conditional rewriting, AC-style rewriting
  - evaluation:
      eager: apply first top-down
      lazy: apply bottom up
  - continuation: pure bottom-up or dove-tailing
  - handle for attachment of prover/constraint solver
  - compilation of rewriting rules for higher efficiency

Rewriting Rules
- Can combine pure rewriting with Prolog programming in the body of the rewrite rule

    % NAME, STRATEGY, PROVER, ASSUMPTIONS, IN, OUT
    trig_simplify('sin-of-0', [eval=lazy|_], _, _, sin(0), 0) :- !.
    trig_simplify('sin-of-pi-over-6', [eval=lazy|_], _, _, sin(*([1/6, pi])), 1/2) :- !.
    trig_simplify('cos^2+sin^2', [eval=eager|_], _, _, +(Args), +([1|Args3])) :-
        select(cos(X)**2, Args, Args2),
        select(sin(X)**2, Args2, Args3),
        !.
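To illustrate what such rules do, here is a small bottom-up rewriter in Python. It is a sketch only: terms are encoded as nested tuples such as ("+", a, b), the rule and function names are made up, and the eager/lazy strategies, provers, and assumptions of the AB engine are not modeled:

```python
def rule_sin_of_zero(t):
    """sin(0) --> 0"""
    return 0 if t == ("sin", 0) else None


def rule_pythagoras(t):
    """+(..., cos(X)**2, ..., sin(X)**2, ...) --> +(1, rest...)"""
    if not (isinstance(t, tuple) and t[0] == "+"):
        return None
    args = list(t[1:])
    for a in args:
        is_cos_squared = (isinstance(a, tuple) and len(a) == 3
                          and a[0] == "**" and a[2] == 2
                          and isinstance(a[1], tuple) and a[1][0] == "cos")
        if is_cos_squared:
            partner = ("**", ("sin", a[1][1]), 2)
            if partner in args:
                rest = list(args)
                rest.remove(a)
                rest.remove(partner)
                return ("+", 1, *rest)
    return None


def rule_fold_numbers(t):
    """A sum of plain numbers --> its value"""
    if isinstance(t, tuple) and t[0] == "+" and all(
            isinstance(a, (int, float)) for a in t[1:]):
        return sum(t[1:])
    return None


def rewrite(term, rules):
    """Rewrite the arguments first, then the term itself, until no rule fires."""
    if isinstance(term, tuple):
        term = (term[0],) + tuple(rewrite(a, rules) for a in term[1:])
    for rule in rules:
        result = rule(term)
        if result is not None:
            return rewrite(result, rules)
    return term


RULES = [rule_sin_of_zero, rule_pythagoras, rule_fold_numbers]
expr = ("+", ("**", ("cos", "x"), 2), ("sin", 0), ("**", ("sin", "x"), 2))
simplified = rewrite(expr, RULES)
```

As in the Prolog version, the Pythagoras rule searches the argument list of an n-ary sum, so the two squared terms need not be adjacent.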

Compilation and Rewriting
- Group and compile rewrite rules (statically)

    ?- rwr_compile(my_simplifications,
                   [trig_simplify, remove_const_rules]).

- Call the rewriting engine

    rwr_cond(my_simplifications, true, S, T).

- Calling with time-out

Symbolic System
- Symbolic system implemented on top of the rewriting engine + Prolog code for solvers, etc.
- assumption-based rewriting
  - X/Y -- (not(Y = 0)) --> X
- simplification (lots of rules)
- calculation of derivatives (deriv(F,X) as operator)
- Taylor-series expansion, ...
- equation solver
  - polynomial solver
  - Gaussian elimination for sets of linear equations
  - sequentialization of equation systems
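The derivative operator can be illustrated with a few textbook rules. The Python sketch below covers only binary + and *, sin, and cos over tuple-encoded terms; `deriv` and `evaluate` are illustrative names, and the result is left unsimplified, which in AB would be the job of the rewriting rules:

```python
import math


def deriv(t, x):
    """Symbolic derivative of term t with respect to the variable named x."""
    if isinstance(t, (int, float)):
        return 0
    if isinstance(t, str):
        return 1 if t == x else 0
    op = t[0]
    if op == "+":
        return ("+", deriv(t[1], x), deriv(t[2], x))
    if op == "*":  # product rule
        return ("+", ("*", deriv(t[1], x), t[2]), ("*", t[1], deriv(t[2], x)))
    if op == "sin":  # chain rule
        return ("*", ("cos", t[1]), deriv(t[1], x))
    if op == "cos":
        return ("*", ("*", -1, ("sin", t[1])), deriv(t[1], x))
    raise ValueError("unsupported operator: %r" % (op,))


def evaluate(t, env):
    """Numerically evaluate a term given variable bindings in env."""
    if isinstance(t, (int, float)):
        return t
    if isinstance(t, str):
        return env[t]
    op = t[0]
    if op == "+":
        return evaluate(t[1], env) + evaluate(t[2], env)
    if op == "*":
        return evaluate(t[1], env) * evaluate(t[2], env)
    if op == "sin":
        return math.sin(evaluate(t[1], env))
    if op == "cos":
        return math.cos(evaluate(t[1], env))
    raise ValueError("unsupported operator: %r" % (op,))


f = ("*", "x", ("sin", "x"))  # f(x) = x * sin(x)
df = deriv(f, "x")            # unsimplified form of sin(x) + x*cos(x)
```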

The AB Intermediate Language
- strict separation between synthesis and code generation
- small procedural intermediate language with some extensions
  - sum(..), prod(..), simul_assign(..), while_converging(...)
  - annotations for comments, and pre/post/inv formulas
- code generator for different languages/targets
  - C++/Octave
  - C/Matlab, C/standalone
  - ADA/SparkADA, Java (both “unsupported/in work/bad shape”)
- Pretty-printer to ASCII, HTML, LaTeX
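The separation between synthesis and code generation can be shown with a toy back end that prints C text from intermediate terms. This Python sketch assumes its own tuple encoding and treats sum as a statement with an explicit target variable; both are simplifications for illustration and do not reproduce the real AB intermediate language:

```python
def emit_expr(e):
    """Render an expression term as C source text."""
    if isinstance(e, (int, float)):
        return repr(e)
    if isinstance(e, str):
        return e
    op = e[0]
    if op in ("+", "-", "*"):
        return "(" + (" %s " % op).join(emit_expr(a) for a in e[1:]) + ")"
    if op == "select":  # array access
        return "%s[%s]" % (emit_expr(e[1]), emit_expr(e[2]))
    raise ValueError("unsupported expression: %r" % (op,))


def emit_stmt(s, indent=0):
    """Render a statement term as a list of C source lines."""
    pad = "    " * indent
    op = s[0]
    if op == "assign":
        return ["%s%s = %s;" % (pad, s[1], emit_expr(s[2]))]
    if op == "series":
        lines = []
        for sub in s[1]:
            lines.extend(emit_stmt(sub, indent))
        return lines
    if op == "sum":  # target = sum of body for idx = lo..hi, expanded to a loop
        _, target, idx, lo, hi, body = s
        return [
            "%s%s = 0;" % (pad, target),
            "%sfor (int %s = %s; %s <= %s; %s++)"
            % (pad, idx, emit_expr(lo), idx, emit_expr(hi), idx),
            "%s    %s += %s;" % (pad, target, emit_expr(body)),
        ]
    raise ValueError("unsupported statement: %r" % (op,))


program = ("series", [
    ("assign", "n", 9),
    ("sum", "s", "i", 0, "n", ("*", ("select", "x", "i"), ("select", "x", "i"))),
])
c_lines = emit_stmt(program)
```

A second back end for another target language would reuse the same terms and replace only these two functions.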

Extending AutoBayes
- some extensions are straightforward: add textbook formulas
- additional symbolic simplification rules might be required
- adding schemas requires substantial work
  - “hard-coded” schema as first step
  - applicability constraints and control
  - functional mechanisms to handle scalar/vector/matrix cases are available
  - support for documentation generation
  - no schema language, Prolog syntax used

Non-Gaussian PDF
- Data characteristics are modeled using probability density functions (PDFs)
- Examples: Gaussian, exponential, ...
- AB contains a number of built-in PDFs, and more can be added (hands-on demo)
- Having multiple PDFs adds a lot of power over libraries

Example
- For clustering, a Gaussian distribution of the data is often assumed
- How about angles, where 0 == 360? With Gaussians you get 5 clusters
- A different distribution (vonMises-Fisher) automatically solves this problem
- In AutoBayes: just replace the “gauss” by “vonmises1”; no programming required
- multiple PDFs in one spec
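The wrap-around problem is easy to reproduce. The Python sketch below compares the arithmetic mean of some angles near 0/360 degrees with the circular mean direction, which is the maximum-likelihood estimate of the location parameter of a von Mises distribution; the function names and sample angles are made up for illustration:

```python
import math


def arithmetic_mean(angles_deg):
    """Mean that ignores the wrap-around at 360 degrees (Gaussian view)."""
    return sum(angles_deg) / len(angles_deg)


def circular_mean(angles_deg):
    """Mean direction of the angles, treated as points on the unit circle."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0


angles = [350.0, 355.0, 5.0, 10.0]   # one tight cluster around 0 == 360
naive = arithmetic_mean(angles)      # lands on the opposite side of the circle
direction = circular_mean(angles)    # close to 0 (equivalently 360)
```

A Gaussian model sees two groups at opposite ends of the number line, while the circular model sees a single cluster.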

Sample Generation
- We have used:
  - MODEL ---> P ---(data)--> parameters
- The model can be read the other way round: generate random data that are consistent with the model
  - MODEL ---> P ---(parameters)--> data
- Very useful for
  - model debugging/development
  - debugging and assessment of synthesized algorithms
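Both reading directions can be tried on the simplest possible model, a single Gaussian. In this Python sketch `sample_model` generates data from known parameters and `estimate_parameters` recovers them; the names, the seed, and the sample size are arbitrary choices for illustration and unrelated to the generated AutoBayes code:

```python
import random
import statistics


def sample_model(mu, sigma, n, seed=0):
    """MODEL ---(parameters)--> data: draw n values from N(mu, sigma^2)."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]


def estimate_parameters(data):
    """MODEL ---(data)--> parameters: maximum-likelihood estimates."""
    return statistics.mean(data), statistics.pstdev(data)


data = sample_model(mu=5.0, sigma=2.0, n=20000, seed=1)
mu_hat, sigma_hat = estimate_parameters(data)
```

If the estimates do not approach the known parameters as n grows, either the model or the estimation algorithm is suspect, which is what makes generated samples useful for debugging.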

AutoBayes and Correctness
- practical synthesis: forget about correct-by-construction; rely instead on
  - detailed math derivations, which can be checked externally (e.g., by Mathematica)
  - literature references in documentation/comments
  - generation of test harness and sample data
  - checking of safety properties (“AutoCert”) [Cade2002 slide set]

AutoBayes as a Prolog Program
- AutoBayes is a pretty large program
  - ~180 Prolog files, 100,000 LoC (with AutoFilter)
- Heavy use of
  - meta-programming (call, etc.)
  - rewriting (using an engine implemented in Prolog)
  - functional programming elements for all sorts of list/vector/array handling
  - backtracking and backtrackable global data structures
  - procedural (non-logical) elements, e.g., file I/O, flags, etc.
- no use of modules, but naming conventions
- everything in SWI Prolog + a few C extensions to handle backtrackable global counters and flags

AutoBayes Weak Points
- The input parser is very inflexible (uses Prolog operators)
- Very bad error messages, often just “no”
- no “schema language”: extending AutoBayes requires both Prolog and domain expertise
- Only primitive control of schema selection: need for a schema-selection mechanism
- not all schemas are fully documented
- large code base, which needs to be maintained

Summary
- AutoBayes is suitable for a wide range of data analysis tasks
- AutoBayes generates customized algorithms
- AutoBayes uses schema-based program synthesis + symbolic logic + functional + procedural elements
- AutoBayes extension: easy to very hard
- AutoBayes debugging: a pain, but explanations and LaTeX output are very helpful
- AutoBayes is NASA OpenSource: bugfixes/extensions always welcome
- AutoBayes has a 160+ page user's manual
- AutoBayes is useful for anything from classroom projects to PhD projects