1 Lecture 11 Implementing Small Languages internal vs. external DSLs, hybrid small DSLs Ras Bodik Shaon Barman Thibaud Hottelier Hack Your Language! CS164:

Slides:



Advertisements
Similar presentations
Case Study: A Recursive Descent Interpreter Implementation in C++ (Source Code Courtesy of Dr. Adam Drozdek) Payap University ICS220 - Data Structures.
Advertisements

1 Regexes vs Regular Expressions; and Recursive Descent Parser Ras Bodik, Thibaud Hottelier, James Ide UC Berkeley CS164: Introduction to Programming Languages.
Compiler Baojian Hua Lexical Analysis (II) Compiler Baojian Hua
ANTLR in SSP Xingzhong Xu Hong Man Aug Outline ANTLR Abstract Syntax Tree Code Equivalence (Code Re-hosting) Future Work.
1 Lecture 12 Implementing Small Languages internal vs. external DSLs, hybrid small DSLs Ras Bodik Ali and Mangpo Hack Your Language! CS164: Introduction.
Programming Types of Testing.
1 Pass Compiler 1. 1.Introduction 1.1 Types of compilers 2.Stages of 1 Pass Compiler 2.1 Lexical analysis 2.2. syntactical analyzer 2.3. Code generation.
CPSC Compiler Tutorial 9 Review of Compiler.
Bellevue University CIS 205: Introduction to Programming Using C++ Lecture 3: Primitive Data Types.
CS 330 Programming Languages 10 / 16 / 2008 Instructor: Michael Eckmann.
1/18 CS 693/793 Lecture 09 Special Topics in Domain Specific Languages CS 693/793-1C Spring 2004 Mo, We, Fr 10:10 – 11:00 CH 430.
CS 330 Programming Languages 09 / 13 / 2007 Instructor: Michael Eckmann.
CS 330 Programming Languages 09 / 18 / 2007 Instructor: Michael Eckmann.
Prof. Bodik CS 164 Lecture 61 Building a Parser II CS164 3:30-5:00 TT 10 Evans.
Prof. Fateman CS 164 Lecture 91 Bottom-Up Parsing Lecture 9.
Compiler Construction
28-Jun-15 Recognizers. 2 Parsers and recognizers Given a grammar (say, in BNF) and a string, A recognizer will tell whether the string belongs to the.
Describing Syntax and Semantics
Guide To UNIX Using Linux Third Edition
Cs164 Prof. Bodik, Fall Symbol Tables and Static Checks Lecture 14.
CSE 413 Programming Languages & Implementation Hal Perkins Autumn 2012 Context-Free Grammars and Parsing 1.
1 Scanning Aaron Bloomfield CS 415 Fall Parsing & Scanning In real compilers the recognizer is split into two phases –Scanner: translate input.
ANTLR.
INTRODUCTION TO COMPUTING CHAPTER NO. 06. Compilers and Language Translation Introduction The Compilation Process Phase 1 – Lexical Analysis Phase 2 –
Created by, Author Name, School Name—State FLUENCY WITH INFORMATION TECNOLOGY Skills, Concepts, and Capabilities.
Chapter 10: Compilers and Language Translation Invitation to Computer Science, Java Version, Third Edition.
DEPARTMENT OF COMPUTER SCIENCE & TECHNOLOGY FACULTY OF SCIENCE & TECHNOLOGY UNIVERSITY OF UWA WELLASSA 1 CST 221 OBJECT ORIENTED PROGRAMMING(OOP) ( 2 CREDITS.
AN IMPLEMENTATION OF A REGULAR EXPRESSION PARSER
1 Top Down Parsing. CS 412/413 Spring 2008Introduction to Compilers2 Outline Top-down parsing SLL(1) grammars Transforming a grammar into SLL(1) form.
CMPS 211 JavaScript Topic 1 JavaScript Syntax. 2Outline Goals and Objectives Goals and Objectives Chapter Headlines Chapter Headlines Introduction Introduction.
Profs. Necula CS 164 Lecture Top-Down Parsing ICOM 4036 Lecture 5.
1 Rake. 2 Automated Build Any non-trivial project needs facility to automate builds –Routine common tasks that need to be carried out several times a.
Chapter Twenty-ThreeModern Programming Languages1 Formal Semantics.
A Little Language for Surveys: Constructing an Internal DSL in Ruby H. Conrad Cunningham Computer and Information Science University of Mississippi.
Interpretation Environments and Evaluation. CS 354 Spring Translation Stages Lexical analysis (scanning) Parsing –Recognizing –Building parse tree.
Review 1.Lexical Analysis 2.Syntax Analysis 3.Semantic Analysis 4.Code Generation 5.Code Optimization.
Formal Semantics Chapter Twenty-ThreeModern Programming Languages, 2nd ed.1.
CS412/413 Introduction to Compilers Radu Rugina Lecture 4: Lexical Analyzers 28 Jan 02.
CS 536 Fall Scanner Construction  Given a single string, automata and regular expressions retuned a Boolean answer: a given string is/is not in.
Lexical Analysis: Finite Automata CS 471 September 5, 2007.
Towards the better software metrics tool motivation and the first experiences Gordana Rakić Zoran Budimac.
CS 460/660 Compiler Construction. Class 01 2 Why Study Compilers? Compilers are important – –Responsible for many aspects of system performance Compilers.
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 3: Introduction to Syntactic Analysis.
. n COMPILERS n n AND n n INTERPRETERS. -Compilers nA compiler is a program thatt reads a program written in one language - the source language- and translates.
Compiler Construction By: Muhammad Nadeem Edited By: M. Bilal Qureshi.
CS 0401 Debugging Hints and Programming Quirks for Java by John C. Ramirez University of Pittsburgh.
Parser Generation Using SLK and Flex++ Copyright © 2015 Curt Hill.
Lexical Analysis – Part II EECS 483 – Lecture 3 University of Michigan Wednesday, September 13, 2006.
CS116 COMPILER ERRORS George Koutsogiannakis 1. How to work with compiler Errors The Compiler provide error messages to help you debug your code. The.
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 2: Lexical Analysis.
Chapter 2 Scanning. Dr.Manal AbdulazizCS463 Ch22 The Scanning Process Lexical analysis or scanning has the task of reading the source program as a file.
Introduction to JavaScript MIS 3502, Spring 2016 Jeremy Shafer Department of MIS Fox School of Business Temple University 2/2/2016.
1 Lecture 14 Data Abstraction Objects, inheritance, prototypes Ras Bodik Shaon Barman Thibaud Hottelier Hack Your Language! CS164: Introduction to Programming.
Overview of Compilation Prepared by Manuel E. Bermúdez, Ph.D. Associate Professor University of Florida Programming Language Principles Lecture 2.
© Peter Andreae Java Programs COMP 102 # T1 Peter Andreae Computer Science Victoria University of Wellington.
LECTURE 5 Scanning. SYNTAX ANALYSIS We know from our previous lectures that the process of verifying the syntax of the program is performed in two stages:
1 Compiler Construction Vana Doufexi office CS dept.
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture Ahmed Ezzat.
Compiler Design (40-414) Main Text Book:
Lecture 2 Lexical Analysis
Programming Languages Translator
Chapter 2 Scanning – Part 1 June 10, 2018 Prof. Abdelaziz Khamis.
RegExps & DFAs CS 536.
-by Nisarg Vasavada (Compiled*)
Compiler Lecture 1 CS510.
Lecture 15 (Notes by P. N. Hilfinger and R. Bodik)
CS 3304 Comparative Languages
Chapter 10: Compilers and Language Translation
Compiler Construction
Lecture 5 Scanning.
Presentation transcript:

1 Lecture 11 Implementing Small Languages internal vs. external DSLs, hybrid small DSLs Ras Bodik Shaon Barman Thibaud Hottelier Hack Your Language! CS164: Introduction to Programming Languages and Compilers, Spring 2012 UC Berkeley

Where are we? Lectures are exploring small languages both design and implementation Lecture 10: regular expressions we’ll finish one last segment today Lecture 11: implementation strategies how to embed a language into a host language Lecture 12: problems solvable with small languages ideas for your final project (start thinking about it) 2

Today Semantic differences between regexes and Res Internal DSLs Hybrid DSLs External DSLs 3

Answer to 2 nd challenge question from L10 Q: Give a JavaScript scenario where tokenizing depends on the context of the parser. That is, lexer cannot tokenize the input entirely prior to parsing. A: In this code fragment, / / could be div’s or a regex: e / f / g 4

Recall from L10: regexes vs REs Regexes are implemented with backtracking This regex requires exponential time to discover that it does not match the input string X==============. regex: X(.+)+X REs are implemented by translation to NFA NFA may be translated to DFA. Resulting DFA requires linear time, ie reads each char once 5

The String Match Problem Consider the problem of detecting whether a pattern (regex or RE) matches an (entire) string match(string, pattern) --> yes/no The regex and RE interpretations of any pattern agree on this problem. That is, both give same answer to this Boolean question Example: X(.+)+X It does not matter whether this regex matches the string X===X with X(.)(..)X or with X(.)(.)(.)X, assigning different values to the ‘+’ in the regex. While there are many possible matches, all we are about is whether any match exists. 6

Let’s now focus on when regex and RE differ Can you think of a question that where they give a different answer? Answer: find a substring 7

Example from Jeff Friedl’s book Imagine you want to parse a config file: filesToCompile=a.cpp b.cpp The regex for this command line format: [a-zA-Z]+=.* Now let’s allow an optional \n-separated 2 nd line: filesToCompile=a.cpp b.cpp \ d.cpp e.h We extend the original regex correspondingly: [a-zA-Z]+=.*(\\\n.*)? This regex does not match our two-line input. Why?

What compiler textbooks don’t teach you The textbook string matching problem is simple: Does a regex r match the entire string s? –a clean statement suitable for theoretical study –here is where regexes and FSMs are equivalent In real life, we face the sub-string matching problem: Given a string s and a regex r, find a substring in s matching r. -tokenization is a series of substring matching problems

Substring matching: careful about semantics Do you see the language design issues? –There may be many matching substrings. –We need to decide which substring to return. It is easy to agree where the substring should start: –the matched substring should be the leftmost match They differ in where the string should end: - there are two schools: RE and regex (see next slide)

Where should the matched string end? Declarative approach: longest of all matches –conceptually, enumerate all matches and return longest Operational approach: define behavior of *, | operators e*match e as many times as possible while allowing the remainder of the regex t o match (greedy semantics) e|eselect leftmost choice while allowing remainder to match [a-zA-Z]+ =.* ( \\ \n.* )? filesToCompile=a.cpp b.cpp \ d.cpp e.h

These are important differences We saw a non-contrived regex can behave differently –personal story: I spent 3 hours debugging a similar regex –despite reading the manual carefully The (greedy) operational semantics of * –does not guarantee longest match (in case you need it) –forces the programmer to reason about backtracking It may seem that backtracking is nice to reason about –because it’s local: no need to consider the entire regex –cognitive load is actually higher, as it breaks composition 12

Where in history of re did things go wrong? It’s tempting to blame perl –but the greedy regex semantics seems older –there are other reasons why backtracking is used Hypothesis 1:creators of re libs knew not that NFA can –can be the target language for compiling regexes –find all matches simultaneously (no backtracking) –be implemented efficiently (convert NFA to DFA) Hypothesis 2: their hands were tied –Ken Thompson’s algorithm for re-to-NFA was patented With backtracking came the greedy semantics –longest match would be expensive (must try all matches) –so semantics was defined greedily, and non-compositionally

Regular Expressions Concepts Syntax tree-directed translation (re to NFA) recognizers: tell strings apart NFA, DFA, regular expressions = equally powerful but \1 (backreference) makes regexes more pwrful Syntax sugar: e+ to e.e* Compositionality: be weary of greedy semantics Metacharacters: characters with special meaning 14

Internal Small Languages a.k.a. internal DSLs 15

Embed your DSL into a host language The host language is an interpreter of the DSL Three levels of embedding where we draw lines is fuzzy (one’s lib is your framework) 1) Library 2) Framework (parameterized library) 3) Language 16

DSL as a library When DSL is implemented as a library, we often don’t think of it as a language even though it defines own abstractions and operations Example: network sockets Socket f = new Socket(mode) f.connect(ipAddress) f.write(buffer) f.close() 17

The library implementation goes very far rfig: formatting DSL embedding into Ruby. see slide 8 in …

The animation in rfig, a Ruby-based language slide!('Overlays', 'Using overlays, we can place things on top of each other.', 'The pivot specifies the relative positions', 'that should be used to align the objects in the overlay.', overlay('0 = 1', hedge.color(red).thickness(2)).pivot(0, 0), staggeredOverlay(true, # True means that old objects disappear 'the elements', 'in this', 'overlay should be centered', nil).pivot(0, 0), cr, pause, # pivot(x, y): -1 = left, 0 = center, +1 = right staggeredOverlay(true, 'whereas the ones', 'here', 'should be right justified', nil).pivot(1, 0), nil) { |slide| slide.label('overlay').signature(8) } 19

DSL as a framework It may be impossible to hide plumbing in a procedure these are limits to procedural abstraction Framework, a library parameterized with client code typically, you register a function with the library library calls this client callback function at a suitable point ex: an action to perform when a user clicks on DOM node 20

Example DSL: jQuery Before jQuery var nodes = document.getElementsByTagName('a'); for (var i = 0; i < nodes.length; i++) { var a = nodes[i]; a.addEventListener('mouseover', function(event) { event.target.style.backgroundColor=‘orange'; }, false ); a.addEventListener('mouseout', function(event) { event.target.style.backgroundColor=‘white'; }, false ); } jQuery abstracts iteration and events jQuery('a').hover( function() { jQuery(this).css('background-color', 'orange'); }, function() { jQuery(this).css('background-color', 'white'); } ); 21

Embedding DSL as a language Hard to say where a framework becomes a language not too important to define the boundary precisely Rules I propose: it’s a language if 1)its abstractions include compile- or run-time checks --- prevents incorrect DSL programs ex: write into a closed socket causes an error 2)we use syntax of host language to create (an illusion) of a dedicated syntax ex: jQuery uses call chaining to pretend it modifes a single object: jQuery('a').hover( … ).css( …) 22

rake rake: an internal DSL, embedded in Ruby Author: Jim Weirich functionality similar to make –has nice extensions, and flexibility, since it's embedded –ie can use any ruby commands even the syntax is close (perhaps better): –embedded in Ruby, so all syntax is legal Ruby 23

Example rake file task :codeGen do # do the code generation end task :compile => :codeGen do # do the compilation end task :dataLoad => :codeGen do # load the test data end task :test => [:compile, :dataLoad] do # run the tests end 24

Ruby syntax rules Ruby procedure call 25

How is rake legal ruby? Deconstructing rake (teaches us a lot about Ruby): task :dataLoad => :codeGen do # load the test data end task :test => [:compile, :dataLoad] do # run the tests end 26

Two kinds of rake tasks File task: dependences between files (as in make) file 'build/dev/rake.html' => 'dev/rake.xml' do |t| require 'paper' maker = PaperMaker.new t.prerequisites[0], t.name maker.run end 27

Two kinds of tasks Rake task: dependences between jobs task :build_refact => [:clean] do target = SITE_DIR + 'refact/' mkdir_p target, QUIET require 'refactoringHome' OutputCapturer.new.run {run_refactoring} end 28

Rake can orthogonalize dependences and rules task :second do #second's body end task :first do #first's body end task :second => :first 29

General rules Sort of like make's %.c : %.o BLIKI = build('bliki/index.html') FileList['bliki/*.xml'].each do |src| file BLIKI => src end file BLIKI do #code to build the bliki end 30

Parsing involved: DSL in a GP language GP: general purpose language 31

Parsing involved: GP in a DSL language GP: general purpose language 32

External DSL Own parser, own interpreter or compiler Examples we have seen: 33

Reading Read the article about the rake DSL 34

Acknowledgements This lecture is based in part on Martin Fowler, “Using the Rake Build Language”Using the Rake Build Language Jeff Friedl, “Mastering Regular Expressions”Mastering Regular Expressions 35