Tools and Analyses for Ambiguous Input Streams Andrew Begel and Susan L. Graham University of California, Berkeley LDTA Workshop - April 3, 2004.

Presentation transcript:

Tools and Analyses for Ambiguous Input Streams Andrew Begel and Susan L. Graham University of California, Berkeley LDTA Workshop - April 3, 2004

Slides 2-5: Harmonia: Language-aware Editing
Programming by Voice (source of ambiguity: human speech)
–Code dictation
–Voice-based editing commands
Program Transformations (source of ambiguity: embedded languages)
–Transformation actions
–Pattern-matching constructs
Each kind of input stream ambiguity requires new language analyses.

Slide 6: Speech Example
Code: for (int i = 0; i < 10; i++ ) { ... }
Spoken: for int i equals zero i less than ten i plus plus

Slides 7-8: Ambiguities
Code: for (int i = 0; i < 10; i++ ) { ... }
Recognized: 4 int eye equals 0 aye less then 10 i plus plus
Questions raised by the recognized words: keyword or number (KW or #, as with "4" for "for")? Which identifier spelling (as with "eye", "aye", and "i")? Keyword or identifier (KW or ID)?

Slide 9: Another Utterance
for times ate equals zero two plus equals one

Slide 10: Many Valid Parses!
Utterance: for times ate equals zero two plus equals one
Candidate parses:
4 * 8 = zero; to += won
for (times; ate == 0; to += 1) { ... }
fore.times(8).equalsZero(2, plus == 1)

Slides 11-12: Embedded Language Example
C and regexps embedded in Flex. Flex rule for identifiers:
[_a-zA-Z]([_a-zA-Z0-9])* i++; RETURN_TOKEN(ID);
Why not this interpretation (with i++ read as part of the regular expression rather than as C code)?
[_a-zA-Z]([_a-zA-Z0-9])* i++ ; RETURN_TOKEN(ID);

Slides 13-17: Legacy Language Example (Fortran)
Do loop: DO 57 I = 3,10
Assignment: DO 57 I = 3, which is the same statement as DO57I = 3 because blanks are not significant
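The Fortran example can be made concrete with a small sketch. It is not the Harmonia lexer: the function name `fortran_readings` and the two regular expressions are assumptions made for illustration, and they cover only the two statement shapes on the slides. The point is that the text up to `=` is identical in both statements, so a left-to-right lexer cannot tell whether `DO` is a keyword until it sees whether a comma follows.

```python
import re


def fortran_readings(stmt: str) -> list[tuple[str, ...]]:
    """Return the readings of a statement under two simplified patterns.

    Blanks are removed first because they are insignificant in
    fixed-form Fortran. Both patterns are illustrative assumptions.
    """
    s = stmt.replace(" ", "").upper()
    readings = []
    # DO loop: DO <label> <variable> = <start> , <end>
    m = re.fullmatch(r"DO(\d+)([A-Z][A-Z0-9]*)=([^,]+),(.+)", s)
    if m:
        readings.append(("do-loop",) + m.groups())
    # Assignment: <identifier> = <expression containing no comma>
    m = re.fullmatch(r"([A-Z][A-Z0-9]*)=([^,]+)", s)
    if m:
        readings.append(("assignment",) + m.groups())
    return readings


print(fortran_readings("DO 57 I = 3,10"))  # [('do-loop', '57', 'I', '3', '10')]
print(fortran_readings("DO 57 I = 3"))     # [('assignment', 'DO57I', '3')]
```

A parser that forks on the two lexical readings of the prefix avoids the unbounded lookahead that this sketch gets for free by matching the whole statement at once.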

Slides 18-19: Legacy Language Example (PL/I)
Non-reserved keywords: IF IF = THEN THEN THEN = ELSE ELSE ELSE = END END
Each word is either a keyword (KW) or an identifier (ID), depending on where it appears.

Slides 20-21: Input Stream Classification

                            | Single Spelling                                  | Multiple Spellings
Single Lexical Category     | Unambiguous                                      | Homophone IDs; lexical misspellings
Multiple Lexical Categories | Non-reserved keywords; ambiguous interpretations | Homophones

Embedded languages fall in all four categories!
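The two axes of the classification can be computed directly from the set of readings a lexer returns for one piece of input. The sketch below is only an illustration: the function `classify` and the representation of a reading as a `(spelling, category)` pair are assumptions, not part of the talk.

```python
def classify(alternatives):
    """Place one input's lexical readings in the two-by-two classification.

    `alternatives` is a list of (spelling, category) pairs, one per reading.
    """
    spellings = {spelling for spelling, _ in alternatives}
    categories = {category for _, category in alternatives}
    column = "single spelling" if len(spellings) == 1 else "multiple spellings"
    row = "single category" if len(categories) == 1 else "multiple categories"
    return (column, row)


# Homophone identifiers: several spellings, all identifiers.
print(classify([("i", "ID"), ("eye", "ID"), ("aye", "ID")]))
# Non-reserved keyword: one spelling, keyword or identifier.
print(classify([("IF", "KW"), ("IF", "ID")]))
```

An input with exactly one reading lands in the unambiguous cell, which is the only case a conventional lexer and parser pair handles.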

Slides 22-23: GLR Analysis Architecture
Pipeline: Lexer, then GLR Parser, then Semantics.
Example input: for (i = 0; i < 10; i++ ) { ... } (the figure shows the tokens FOR and I passing from the lexer to the parser)
The GLR parser handles syntactic ambiguities.

Slides 24-25: Our Contribution: XGLR Analysis Architecture
Pipeline: Lexer, then XGLR Parser, then Semantics.
Example input: for i equals zero... (the figure shows the readings FOR, 4, I, and EYE)
The XGLR parser handles input stream ambiguities.

Slides 26-28: LR Parsing
Parse table (rows are parse states; columns are the lookahead categories ID, KW, and #):

State | ID | KW | #
1     | S2 | S3 | Err
2     | R1 | S4 | Err
3     | S9 | R3 | S7

Input stream: FOR (KW), I (ID), = (KW), 0 (#)
Parse stack: starts in state 1. With lookahead KW the table entry is S3, so the parser shifts FOR and enters state 3.

Slides 29-32: GLR Parsing
Parse table (a cell may now hold more than one action):

State | ID     | KW     | #
1     | S2     | S3, R5 | Err
2     | R1, R2 | S4     | Err
3     | S9     | R3     | S7

Input stream: FOR (KW), I (ID), = (KW), 0 (#)
Parse stack: starts in state 1. With lookahead KW the cell holds both S3 and R5, so the parser forks. One parser shifts FOR into state 3; the other reduces by rule 5, enters state 2, and then shifts FOR into state 4.
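A minimal sketch of the forking step follows. The table contents are one reading of the flattened table on these slides (which cell holds which pair of actions is an assumption), and `TABLE` and `fork` are hypothetical names. Reductions are not executed here, since the slides do not give the grammar rules; the sketch only shows that a GLR parser creates one parser per action in a cell.

```python
# GLR parse table: every cell is a list of actions; an empty list means error.
# Cell contents are an assumed reading of the slide's table.
TABLE = {
    1: {"ID": ["S2"], "KW": ["S3", "R5"], "#": []},
    2: {"ID": ["R1", "R2"], "KW": ["S4"], "#": []},
    3: {"ID": ["S9"], "KW": ["R3"], "#": ["S7"]},
}


def fork(states, category):
    """Pair every live parser state with each action its table cell allows."""
    forks = []
    for state in states:
        for action in TABLE[state][category]:
            forks.append((state, action))
    return forks


print(fork([1], "KW"))  # [(1, 'S3'), (1, 'R5')]
print(fork([1], "#"))   # []
```

A parser whose cell is empty simply disappears, which is how unsuccessful forks are discarded.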

Slide 33: XGLR in Action

                            | Single Spelling | Multiple Spellings
Single Lexical Category     | Not shown       | Example 1
Multiple Lexical Categories | Example 2       | Example 1

Slides 34-39: Parsing Homophones
Input: a word pronounced "for", followed by BAR.
XGLR extension (multiple spellings, single and multiple lexical categories): the lexer returns the spellings FOR, FORE, 4, and FOUR, with categories drawn from ID, KW, and NUM.
XGLR extension: parsers fork due to input ambiguity.
Each parser shifts its now unambiguous input.
The next input (BAR) is lexed unambiguously.
ID is only a valid lookahead for two parsers.
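The homophone walk-through can be sketched as one step function that forks every parser over every lexical reading and drops the combinations that are not valid. Everything concrete here is an assumption: the `FOLLOW` map stands in for a real parse table, the category assigned to each spelling is a guess, and which two parsers survive in the talk's figure is not recoverable from the transcript.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a parse table: which category may follow which.
FOLLOW = {
    "START": {"KW", "ID", "NUM"},
    "KW": {"ID"},
    "NUM": {"ID"},
    "ID": {"="},
}


@dataclass(frozen=True)
class Parser:
    shifted: tuple = ()  # (spelling, category) pairs shifted so far

    @property
    def last(self):
        return self.shifted[-1][1] if self.shifted else "START"


def step(parsers, alternatives):
    """Fork each parser per lexical reading; keep only valid lookaheads."""
    survivors = []
    for parser in parsers:
        for spelling, category in alternatives:
            if category in FOLLOW[parser.last]:
                survivors.append(Parser(parser.shifted + ((spelling, category),)))
    return survivors


# Four readings of the same sound fork one parser into four.
forked = step([Parser()], [("for", "KW"), ("fore", "ID"), ("4", "NUM"), ("four", "ID")])
# The next word has a single reading, valid for only some of the parsers.
alive = step(forked, [("bar", "ID")])
print(len(forked), len(alive))  # 4 2
```

Under these assumed follow sets the readings "for" and "4" survive; a different table would prune differently, but the mechanism is the same.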

Slides 40-41: Parsing Embedded Languages
Example BNF grammar containing languages L and W (the suffix names the lexical language each symbol belongs to):
b_L → loop_L d_W END_L
loop_L → LOOP_L | ε
d_W → WHILE_W NUM_W do_W
do_W → DO_W | ε
Example input: LOOP WHILE 34 END WHILE 56 DO END


Slides 46-60: Parsing Embedded Languages (walk-through on the input LOOP WHILE 34 ...)
1. The parser starts in state 0. The current parse state has an ambiguous lexical language.
2. XGLR extension: fork parsers and assign one to each lexical language (L and W).
3. XGLR extension (single spelling, multiple lexical categories): lex the lookahead in both language L and language W. LOOP is a keyword (KW) in one language and an identifier (ID) in the other.
4. Only LOOP_L is a valid lookahead, and it is shifted; the parser enters state 4.
5. XGLR extension: state 4 has lexer lookaheads only in language W, so the lookahead WHILE is lexed in language W as a keyword (KW).
6. Reduce by rule 2 (producing loop_L) and go to state 1.
7. Shift WHILE into state 2.
8. XGLR extension: lex the lookahead in language W; 34 is a NUM.
9. Shift into state 3, which has an ambiguous lexical language.
10. XGLR extension (single spelling, multiple lexical categories): fork parsers and assign one to each lexical language.
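The idea that the parse state selects which lexical languages to lex the lookahead in can be sketched as follows. The keyword sets are taken from the example grammar's terminals; the per-state language sets for states 0, 3, and 4 follow the slide captions, but the lexer itself (`lex`, `lookaheads`, and the rule that any other word is an identifier) is a hypothetical simplification.

```python
# Keywords of each lexical language, taken from the example grammar's terminals.
KEYWORDS = {"L": {"LOOP", "END"}, "W": {"WHILE", "DO"}}

# Lexical languages valid in each parse state, per the slide captions:
# states 0 and 3 are ambiguous, state 4 expects only language W.
STATE_LANGUAGES = {0: ("L", "W"), 3: ("L", "W"), 4: ("W",)}


def lex(word, language):
    """Lex one word in one language (assumed rule: non-keywords are NUM or ID)."""
    if word in KEYWORDS[language]:
        return (word, "KW", language)
    if word.isdigit():
        return (word, "NUM", language)
    return (word, "ID", language)


def lookaheads(state, word):
    """Lex the lookahead once per lexical language the parse state allows."""
    return [lex(word, language) for language in STATE_LANGUAGES[state]]


print(lookaheads(0, "LOOP"))   # [('LOOP', 'KW', 'L'), ('LOOP', 'ID', 'W')]
print(lookaheads(4, "WHILE"))  # [('WHILE', 'KW', 'W')]
```

Each returned reading would be handed to its own forked parser; readings that are not valid lookaheads in that parser's state are then discarded, as in step 4 of the walk-through.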

Slide 61: GLR Ambiguity Support
1. Fork parser on shift-reduce conflict
2. Fork parser on reduce-reduce conflict

Slides 62-63: XGLR Ambiguity Support
1. Fork parser on shift-reduce conflict
2. Fork parser on reduce-reduce conflict
3. Fork parsers on ambiguous lexical language
   –Single spelling, multiple lexical categories
4. Fork parsers on ambiguous lexical lookahead
   –Single/multiple spellings, multiple lexical categories
   –Shift-shift conflict resolution

Slides 64-65: XGLR Ambiguities
Many GLR programming language specs have only a small, finite number of ambiguities.
XGLR language specs also have finitely many ambiguities, but slightly more.
–Lexical ambiguity due to ambiguous input does result in more ambiguous parse forests
Ambiguity causes parsers to fork.
GLR maintains efficiency by merging parsers when the ambiguity is over.

Slides 66-67: Parser Merging (GLR)
GLR: parsers merge when they are in the same parse state.
(Figure: two parsers that have each shifted DO (KW) reach the same parse state on the next lookahead and merge.)

Slides 68-72: Parser Merging (XGLR)
XGLR: parsers merge when they are in the same parse state and the same lexical state.
(Figure: each parser carries a lexical state, A or W, alongside its parse state; two parsers in the same parse state merge only when these lexical states also match.)

Slides 73-80: Out of Sync Parsers
Input: DO57I=3
XGLR: parsers merge when they are in the same parse state, the same lexical state, and the same input position.
(Figure: one parser lexes DO as a keyword (KW) and continues from 57I=3; the other lexes DO57I as a single identifier (ID) and continues from =3. The two parsers therefore sit at different input positions and advance at different rates.)

Slide 81: Implementation
Keep a map from lookahead to parser, to use when looking for parsers to merge with.
Sort parsers by the position of their lookahead in the input.
–Enables pruning of the map as parsers move past a particular input location
–Extra memory required is bounded by the dynamic separation between the first and last parsers
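One way to picture the merge map and its pruning is sketched below. This is not the Harmonia implementation: the class `MergeIndex`, its method names, and the choice to bucket entries by input position and key them by (parse state, lexical state) are assumptions that simply encode the three merge conditions from the preceding slides.

```python
from collections import defaultdict


class MergeIndex:
    """Index of live parsers, used to find a parser to merge with."""

    def __init__(self):
        # input position -> {(parse_state, lexical_state): parser id}
        self._by_position = defaultdict(dict)

    def find_or_add(self, position, parse_state, lexical_state, parser_id):
        """Return a parser to merge with, or register this one and return None."""
        bucket = self._by_position[position]
        key = (parse_state, lexical_state)
        if key in bucket:
            return bucket[key]
        bucket[key] = parser_id
        return None

    def prune(self, slowest_position):
        """Forget positions that every parser has already moved past."""
        for position in [p for p in self._by_position if p < slowest_position]:
            del self._by_position[position]

    def __len__(self):
        return len(self._by_position)


index = MergeIndex()
index.find_or_add(5, 4, "A", "p1")         # registered, nothing to merge with
print(index.find_or_add(5, 4, "A", "p2"))  # p1: same position, state, lexical state
print(index.find_or_add(5, 4, "W", "p3"))  # None: lexical state differs
```

Because entries are bucketed by input position, the number of buckets alive at any time tracks the distance between the slowest and the fastest parser, which matches the memory bound stated on the slide.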

Slide 82: Related Work
GLR Parsing Algorithm
–Tomita [1985]
–Farshi [1991]
–Rekers [1992]
–Johnstone et al. [2002]
Incremental GLR
–Wagner [1997]
GLR Implementations (that I heard of before today)
–ASF+SDF [1993]
–Elkhound [2004]
–Bison [2003]
–DParser [2002]
–Aycock and Horspool [1999]
Scannerless Parsing (or Context-Free Scanning)
–Salomon and Cormack [1989]
–Visser [1997]
–van den Brand [2002]
Ambiguous Input Streams
–Aycock and Horspool [2001]
Embedded Languages
–ASF+SDF [1997]
–Van de Vanter and Boshernitsan (CodeProcessor) [2000]

Slide 83: Future Work
Semantic analysis of embedded languages
Automated semantic disambiguation

Slide 84: Contributions
1. Generalized GLR to handle input stream ambiguities
2. Classified input stream ambiguities into four categories
3. Implemented the XGLR algorithm in the Harmonia framework
4. Constructed a combined lexer and parser generator to support embedded languages and lexical ambiguities at each stage of analysis
5. Enabled analysis of embedded languages, programming by voice, and legacy languages
