Presentation is loading. Please wait.

Presentation is loading. Please wait.

Tools and Analyses for Ambiguous Input Streams Andrew Begel and Susan L. Graham University of California, Berkeley LDTA Workshop - April 3, 2004.

Similar presentations


Presentation on theme: "Tools and Analyses for Ambiguous Input Streams Andrew Begel and Susan L. Graham University of California, Berkeley LDTA Workshop - April 3, 2004."— Presentation transcript:

1 Tools and Analyses for Ambiguous Input Streams Andrew Begel and Susan L. Graham University of California, Berkeley LDTA Workshop - April 3, 2004

2 2April 3, 2004LDTA 2004 Harmonia: Language-aware Editing Programming by Voice –Code dictation –Voice-based editing commands Program Transformations –Transformation actions –Pattern-matching constructs

3 3April 3, 2004LDTA 2004 Harmonia: Language-aware Editing Programming by Voice –Code dictation –Voice-based editing commands Program Transformations –Transformation actions –Pattern-matching constructs Human Speech

4 4April 3, 2004LDTA 2004 Harmonia: Language-aware Editing Programming by Voice –Code dictation –Voice-based editing commands Program Transformations –Transformation actions –Pattern-matching constructs Human Speech Embedded Languages

5 5April 3, 2004LDTA 2004 Harmonia: Language-aware Editing Programming by Voice –Code dictation –Voice-based editing commands Program Transformations –Transformation actions –Pattern-matching constructs Each kind of input stream ambiguity requires new language analyses Human Speech Embedded Languages

6 6April 3, 2004LDTA 2004 Speech Example for (int i = 0; i < 10; i++ ) {  } for int i equals zero i less than ten i plus plus

7 7April 3, 2004LDTA 2004 Ambiguities for (int i = 0; i < 10; i++ ) {  } 4 int eye equals 0 aye less then 10 i plus plus

8 8April 3, 2004LDTA 2004 Ambiguities for (int i = 0; i < 10; i++ ) {  } 4 int eye equals 0 aye less then 10 i plus plus KW or #? ID Spelling? KW or ID?

9 9April 3, 2004LDTA 2004 Another Utterance for times ate equals zero two plus equals one

10 10April 3, 2004LDTA 2004 Many Valid Parses! 4 * 8 = zero; to += won  for times ate equals zero two plus equals one for (times; ate == 0; to += 1) {  } fore.times(8).equalsZero(2, plus == 1) 

11 11April 3, 2004LDTA 2004 Embedded Language Example C and Regexps embedded in Flex Flex Rule for Identifiers [_a-zA-Z]([_a-zA-Z0-9])* i++; RETURN_TOKEN(ID);

12 12April 3, 2004LDTA 2004 Embedded Language Example C and Regexps embedded in Flex Flex Rule for Identifiers [_a-zA-Z]([_a-zA-Z0-9])* i++; RETURN_TOKEN(ID); Why not this interpretation? [_a-zA-Z]([_a-zA-Z0-9])* i++ ; RETURN_TOKEN(ID);

13 13April 3, 2004LDTA 2004 Legacy Language Example Fortran DO 57 I = 3,10

14 14April 3, 2004LDTA 2004 Legacy Language Example Fortran Do Loop DO 57 I = 3,10

15 15April 3, 2004LDTA 2004 Legacy Language Example Fortran Do Loop DO 57 I = 3,10 DO 57 I = 3

16 16April 3, 2004LDTA 2004 Legacy Language Example Fortran Do Loop DO 57 I = 3,10 Assignment DO 57 I = 3

17 17April 3, 2004LDTA 2004 Legacy Language Example Fortran Do Loop DO 57 I = 3,10 Assignment DO57I = 3

18 18April 3, 2004LDTA 2004 Legacy Language Example PL/I Non-reserved Keywords IF IF = THEN THEN THEN = ELSE ELSE ELSE = END END

19 19April 3, 2004LDTA 2004 Legacy Language Example PL/I Non-reserved Keywords IF IF = THEN THEN THEN = ELSE ELSE ELSE = END END KW ID

20 20April 3, 2004LDTA 2004 Single Spelling Multiple Spellings Single Lexical Category Unambiguous Homophone IDs Lexical misspellings Multiple Lexical Categories Non-reserved keywords Ambiguous interpretations Homophones Input Stream Classification

21 21April 3, 2004LDTA 2004 Single Spelling Multiple Spellings Single Lexical Category Unambiguous Homophone IDs Lexical misspellings Multiple Lexical Categories Non-reserved keywords Ambiguous interpretations Homophones Input Stream Classification Embedded Languages Fall in all Four Categories!

22 22April 3, 2004LDTA 2004 GLR Analysis Architecture Lexer GLR Parser Semantics FORI FOR I for (i = 0; i < 10; i++ ) {  } (

23 23April 3, 2004LDTA 2004 GLR Analysis Architecture Lexer GLR Parser Semantics FORI FOR I for (i = 0; i < 10; i++ ) {  } ( Handles syntactic ambiguities

24 24April 3, 2004LDTA 2004 Our Contribution: XGLR Analysis Architecture Lexer XGLR Parser Semantics for i equals zero... FORI FOR I

25 25April 3, 2004LDTA 2004 Our Contribution: XGLR Analysis Architecture Lexer XGLR Parser Semantics for i equals zero... FORI 4EYE FOR I Handles input stream ambiguities

26 26April 3, 2004LDTA 2004 LR Parsing IDKW# 1S2S3Err 2R1S4Err 3S9R3S7 I ID FOR KW = 1 Parse Table 0 # Input StreamParse Stack

27 27April 3, 2004LDTA 2004 LR Parsing IDKW# 1S2S3Err 2R1S4Err 3S9R3S7 I ID FOR KW = 1 Parse Table 0 # Input StreamParse Stack

28 28April 3, 2004LDTA 2004 LR Parsing IDKW# 1S2S3Err 2R1S4Err 3S9R3S7 I ID = KW 1 Parse Table 0 # Input StreamParse Stack FOR KW 3

29 29April 3, 2004LDTA 2004 GLR Parsing IDKW# 1S2 S3 R5 Err 2 R1 R2 S4Err 3S9R3S7 I ID FOR KW = 1 Parse Table 0 # Input StreamParse Stack

30 30April 3, 2004LDTA 2004 GLR Parsing IDKW# 1S2 S3 R5 Err 2 R1 R2 S4Err 3S9R3S7 I ID FOR KW = 1 Parse Table 0 # Input StreamParse Stack

31 31April 3, 2004LDTA 2004 GLR Parsing IDKW# 1S2 S3 R5 Err 2 R1 R2 S4Err 3S9R3S7 I ID FOR KW = 1 Parse Table 0 # Input StreamParse Stack 2 5

32 32April 3, 2004LDTA 2004 GLR Parsing IDKW# 1S2 S3 R5 Err 2 R1 R2 S4Err 3S9R3S7 I ID = KW 1 Parse Table 0 # Input StreamParse Stack 2 FOR KW 3 FOR KW 4 5

33 33April 3, 2004LDTA 2004 Single Spelling Multiple Spellings Single Lexical Category Not ShownExample 1 Multiple Lexical Categories Example 2Example 1 XGLR in Action

34 34April 3, 2004LDTA 2004 23FOR Parsing Homophones BAR

35 35April 3, 2004LDTA 2004 23FOR FORE 4 ID KW NUM XGLR Extension: Multiple Spellings, Single and Multiple Lexical Categories BAR FOUR

36 36April 3, 2004LDTA 2004 23FOR FORE 4 23 ID KW NUM XGLR Extension: Parsers fork due to input ambiguity BAR FOUR

37 37April 3, 2004LDTA 2004 23 26 FOR FORE 4 29 35 23 ID KW NUM Each parser shifts its now unambiguous input BAR FOUR

38 38April 3, 2004LDTA 2004 23 26 FOR FORE 4 29 35 BAR 23 ID KW NUM The next input is lexed unambiguously FOUR

39 39April 3, 2004LDTA 2004 23 26 FOR FORE 4 29 35 BAR 23 42 49 ID KW NUM ID is only a valid lookahead for two parsers FOUR

40 40April 3, 2004LDTA 2004 Parsing Embedded Languages Example BNF Grammar Contains Languages L and W b L  loop L d W END L loop L  LOOP L |  d W  WHILE W NUM W do W do W  DO W |  L W

41 41April 3, 2004LDTA 2004 Parsing Embedded Languages Example BNF Grammar Contains Languages L and W b L  loop L d W END L loop L  LOOP L |  d W  WHILE W NUM W do W do W  DO W |  LOOP WHILE 34 END WHILE 56 DO END L W

42 42April 3, 2004LDTA 2004

43 43April 3, 2004LDTA 2004

44 44April 3, 2004LDTA 2004

45 45April 3, 2004LDTA 2004

46 46April 3, 2004LDTA 2004 S0 Parsing Embedded Languages LOOPWHILE34

47 47April 3, 2004LDTA 2004 S0LOOPWHILE34 Current parse state has ambiguous lexical language

48 48April 3, 2004LDTA 2004 S 0 0 W L LOOPWHILE34 XGLR Extension: Fork parsers, assign one to each lexical language

49 49April 3, 2004LDTA 2004 S 0 0LOOP W L W L KW ID WHILE34 XGLR Extension: Single spelling, Multiple lexical categories Lex lookahead both in language L and W

50 50April 3, 2004LDTA 2004 S 4 L 0 0LOOP W L W L KW ID WHILE34 Only LOOP L is valid lookahead, and is shifted

51 51April 3, 2004LDTA 2004 S 4 W 0 0LOOP W L W L KW ID WHILE34 XGLR Extension: State 4 has lexer lookaheads only in language W

52 52April 3, 2004LDTA 2004 S 4 W 0 0LOOP W L W L KW ID WHILE 34 KW W Lex lookahead in language W

53 53April 3, 2004LDTA 2004 S 4 W 0 0LOOP W L W L KW ID WHILE 34 KW W 1 W REDUCE by rule 2 and GOTO state 1 loop L

54 54April 3, 2004LDTA 2004 S 4 W 0 0LOOP W L W L KW ID WHILE 34 KW W 1 W loop L

55 55April 3, 2004LDTA 2004 S 4 W 0 0LOOP W L W L KW ID WHILE 34 KW W 1 W 2 W Shift into state 2 loop L

56 56April 3, 2004LDTA 2004 S 4 W 0 0LOOP W L W L KW ID WHILE 34 KW W 1 W 2 W W NUM XGLR Extension: Lex lookahead in language W loop L

57 57April 3, 2004LDTA 2004 S 4 W 0 0LOOP W L W L KW ID WHILE34 KW W 1 W 2 WW NUM loop L

58 58April 3, 2004LDTA 2004 4 W 0 0LOOP W L W L KW ID WHILE34 KW W 1 W 2 WW NUM 3 W Shift into state 3 S loop L

59 59April 3, 2004LDTA 2004 4 W 0 0LOOP W L W L KW ID WHILE34 KW W 1 W 2 WW NUM 3 W Shift into state 3, which has ambiguous lexical language S loop L

60 60April 3, 2004LDTA 2004 4 W 0 0LOOP W L W L KW ID WHILE34 KW W 1 W 2 WW NUM 3 W 3 L XGLR Extension: Single spelling, Multiple lexical categories Fork parsers, assign one to each lexical language S loop L

61 61April 3, 2004LDTA 2004 GLR Ambiguity Support 1. Fork parser on shift-reduce conflict 2. Fork parser on reduce-reduce conflict

62 62April 3, 2004LDTA 2004 XGLR Ambiguity Support 1. Fork parser on shift-reduce conflict 2. Fork parser on reduce-reduce conflict

63 63April 3, 2004LDTA 2004 XGLR Ambiguity Support 1. Fork parser on shift-reduce conflict 2. Fork parser on reduce-reduce conflict 3. Fork parsers on ambiguous lexical language  Single spelling, Multiple lexical categories 4. Fork parsers on ambiguous lexical lookahead  Single/Multiple Spellings, Multiple lexical categories  Shift-shift conflict resolution

64 64April 3, 2004LDTA 2004 XGLR Ambiguities Many GLR programming language specs have finite, few ambiguities XGLR language specs also have finite, but slightly more, ambiguities –Lexical ambiguity due to ambiguous input does result in more ambiguous parse forests

65 65April 3, 2004LDTA 2004 XGLR Ambiguities Many GLR programming language specs have finite, few ambiguities XGLR language specs also have finite, but slightly more, ambiguities –Lexical ambiguity due to ambiguous input does result in more ambiguous parse forests Ambiguity causes parsers to fork GLR maintains efficiency by merging parsers when ambiguity is over

66 66April 3, 2004LDTA 2004 Parser Merging GLR: Parsers merge when in same parse state DO KW 8 31 DO KW 5 57 # 5

67 67April 3, 2004LDTA 2004 Parser Merging GLR: Parsers merge when in same parse state DO KW 8 31 DO KW 5 57 # 4 5 #

68 68April 3, 2004LDTA 2004 Parser Merging XGLR: Parsers merge when in same parse state and same lexical state DO KW 8 31 DO KW 5 57 # A AA AA W 5 A

69 69April 3, 2004LDTA 2004 Parser Merging XGLR: Parsers merge when in same parse state and same lexical state DO KW 8 31 DO KW 5 A AA AA W 5 A 57 # # A W

70 70April 3, 2004LDTA 2004 Parser Merging XGLR: Parsers merge when in same parse state and same lexical state DO KW 8 31 DO KW 5 A AA AA W 5 A 57 # # A W 4 W

71 71April 3, 2004LDTA 2004 Parser Merging XGLR: Parsers merge when in same parse state and same lexical state DO KW 8 31 DO KW 5 A AA AA W 5 A 57 # # A W 4 A

72 72April 3, 2004LDTA 2004 Parser Merging XGLR: Parsers merge when in same parse state and same lexical state DO KW 8 31 DO KW 5 A AA AA W 5 A 57 # # A W 4 A

73 73April 3, 2004LDTA 2004 Out of Sync Parsers DO57I=3 XGLR: Parsers merge when in same parse state and same lexical state and same input position 8 1 A W 5 A

74 74April 3, 2004LDTA 2004 Out of Sync Parsers 57I=3 XGLR: Parsers merge when in same parse state and same lexical state and same input position 8 1 A W 5 A DO KW A DO57I ID W =3

75 75April 3, 2004LDTA 2004 Out of Sync Parsers 57I=3 XGLR: Parsers merge when in same parse state and same lexical state and same input position 8 1 A W 5 A DO KW A DO57I ID W =3 3 A 5 W

76 76April 3, 2004LDTA 2004 Out of Sync Parsers I=3 XGLR: Parsers merge when in same parse state and same lexical state and same input position 8 1 A W 5 A DO KW A DO57I ID W 3 3 A 5 W = KW W 57 ID A

77 77April 3, 2004LDTA 2004 Out of Sync Parsers I=3 XGLR: Parsers merge when in same parse state and same lexical state and same input position 8 1 A W 5 A DO KW A DO57I ID W 3 3 A 5 W = KW W 57 ID A 6 W 4 A

78 78April 3, 2004LDTA 2004 Out of Sync Parsers =3 XGLR: Parsers merge when in same parse state and same lexical state and same input position 8 1 A W 5 A DO KW A DO57I ID W 3 3 A 5 W = KW W 57 ID A 6 W 4 A # W I A

79 79April 3, 2004LDTA 2004 Out of Sync Parsers =3 XGLR: Parsers merge when in same parse state and same lexical state and same input position 8 1 A W 5 A DO KW A DO57I ID W 3 3 A 5 W = KW W 57 ID A 6 W 4 A # W I A 9 W

80 80April 3, 2004LDTA 2004 Out of Sync Parsers =3 XGLR: Parsers merge when in same parse state and same lexical state and same input position 8 1 A W 5 A DO KW A DO57I ID W 3 3 A 5 W = KW W 57 ID A 6 W 4 A # W I A 9 W

81 81April 3, 2004LDTA 2004 Implementation Keep map: lookahead  parser to use when looking for parsers to merge with Sort parsers by position of lookahead in the input –Enables pruning of map as parsers move past a particular input location –Extra memory required is bounded by dynamic separation between first and last parsers

82 82April 3, 2004LDTA 2004 Related Work GLR Parsing Algorithm –Tomita [1985] –Farshi [1991] –Rekers [1992] –Johnstone et. al. [2002] Incremental GLR –Wagner [1997] GLR Implementations (that I heard of before today) –ASF+SDF [1993] –Elkhound [2004] –Bison [2003] –DParser [2002] –Aycock and Horspool [1999] Scannerless Parsing (or Context-Free Scanning) –Salomon and Cormack [1989] –Visser [1997] van den Brand [2002] Ambiguous Input Streams –Aycock and Horspool [2001] Embedded Languages –ASF+SDF [1997] –Van de Vanter and Boshernitsan (CodeProcessor) [2000]

83 83April 3, 2004LDTA 2004 Future Work Semantic Analysis of Embedded Languages Automated Semantic Disambiguation

84 84April 3, 2004LDTA 2004 Contributions 1. Generalized GLR to handle input stream ambiguities 2. Classified input stream ambiguities into four categories 3. Implemented XGLR algorithm in Harmonia framework 4. Constructed combined lexer and parser generator to support embedded languages and lexical ambiguities at each stage of analysis 5. Enabled analysis of embedded languages, programming by voice, and legacy languages

85 85April 3, 2004LDTA 2004


Download ppt "Tools and Analyses for Ambiguous Input Streams Andrew Begel and Susan L. Graham University of California, Berkeley LDTA Workshop - April 3, 2004."

Similar presentations


Ads by Google