Perl 6 Update - PGE and Pugs Dr. Patrick R. Michaud April 26, 2005.


1 Perl 6 Update - PGE and Pugs Dr. Patrick R. Michaud April 26, 2005

2 Rules and Grammars
- Perl 6 completely redesigns the regular expression syntax
- Regular expressions are now "rules"
- Rules can call/embed other rules
- Groups of rules can be combined into Grammars

3 Current events in Perl 6
- Parrot 0.1.2 released
- The Perl Foundation receives $25,000 for completion of Parrot milestones
- New Parrot pumpking - Chip Salzenberg
- New version of Parrot Grammar Engine (PGE / Perl 6 rules) to be released this week
- Pugs - Autrijus Tang
- Perl 6 test suite

4 Pugs
- Perl 6 compiler written in Haskell
- Started by Autrijus Tang
- Compiles directly to Haskell or to Parrot AST
- Being used to develop Perl 6 tests and experiment with Perl 6 design
- Available at http://pugscode.org
- Discussion on perl6-compiler@perl.org mailing list

5 Perl 6 rules / Parrot Grammar Engine
- The heart of the Perl 6 compiler is the Perl/Parrot Grammar Engine (PGE)
- Implements the Perl 6 rules syntax, compiles to Parrot code
- Perl 6 rules compiler currently written in C
- Bootstrap to Perl 6

6 Steps to Perl 6 compiler
- Finish PGE bootstrap in C
- Parse p6 "rule" statements and grammars
- Use p6 rules to define the Perl 6 grammar
- P6 grammar can be used to generate Parrot abstract syntax trees from Perl 6 programs
- Compile, (optimize), execute the abstract syntax tree to get working Perl 6 program
- Use Perl 6 to rewrite the grammar engine in Perl 6 (faster)

7 Current state of PGE
- Handles concatenation, alternation, quantifiers, captures*, subpatterns, subrules
- * Capture semantics redefined in Dec 2004, still not final
- To be added next:
  - Character classes (note: Unicode)
  - Patterns containing scalars, arrays, hashes

8 P6 rule syntax
Changes from Perl 5:
- No more trailing /e, /x, /s options
- [...] denotes non-capturing groups
- ^ and $ are beginning/end of string
- ^^ and $$ are beginning/end of line
- . matches any character, including newline
- \n and \N match newline/non-newline
- # marks a comment (to end of line)
- Quantifiers are *, +, ?, and **{m..n}
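As a rough point of comparison, the anchor and dot changes on this slide can be mimicked with Python's `re` module. This is only an analogy in another language, not Perl 6 or PGE code; the sample text and variable names are made up for illustration.

```python
import re

text = "first line\nsecond line"

# Perl 6 `^` always means start of string; Python's \A is the
# closest anchor whose meaning never depends on flags.
string_start = re.findall(r"\A\w+", text)

# Perl 6 `^^` means start of line; Python needs re.MULTILINE
# to make `^` behave that way.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)

# Perl 6 `.` matches newline too; Python needs re.DOTALL for that.
dot_with_dotall = re.fullmatch(r".*", text, flags=re.DOTALL) is not None
dot_by_default = re.fullmatch(r".*", text) is not None

print(string_start, line_starts, dot_with_dotall, dot_by_default)
```

The point of the Perl 6 design is that these behaviours are spelled differently in the pattern itself instead of being switched by trailing flags.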

9 Character classes
- [aeiou] changed to <[aeiou]>
- [^0-9] now <-[0-9]>
- Properties defined as named classes such as <alpha>
- Combine classes using +/- syntax: <+<alpha>-[aeiou]>
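Python's `re` module has no direct syntax for adding and subtracting character classes, so the following is only a sketch of the same idea (letters minus vowels) using a negative lookahead. It is restricted to ASCII letters, which is an assumption made for brevity; the Perl 6 classes are meant to be Unicode-aware.

```python
import re

# "Any ASCII letter, except a vowel": the lookahead subtracts the
# vowel set from the letter set before the letter is consumed.
consonant = re.compile(r"(?![aeiouAEIOU])[A-Za-z]")

consonants = "".join(consonant.findall("Parrot Grammar Engine"))
print(consonants)
```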

10 Subrules
- Patterns are now called "rules"
- Analogous to subroutines and closures
- Like {...}, /.../ compiles into a "rule" subroutine
- P6 rule statement allows named rules:
    rule ident / [ <alpha> | _ ] \w* /;
- Named rules can be easily used in other rules:
    m / <ident> \:= (.*) /;
    rule expr / <term> [ <infixop> <term> ]* /;
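Python regular expressions cannot call one another by name, but the flavour of reusing a named rule inside a larger one can be approximated by composing pattern strings. The names `IDENT` and `ASSIGN` and the sample input are hypothetical, and this is an analogy, not how PGE implements subrules.

```python
import re

# Stand-in for the slide's `ident` rule: a letter or underscore,
# followed by word characters (ASCII letters only, for brevity).
IDENT = r"[A-Za-z_]\w*"

# Reuse the fragment inside a larger pattern, capturing it by name.
ASSIGN = re.compile(rf"(?P<ident>{IDENT})\s*:=\s*(?P<value>.*)")

m = ASSIGN.match("total := 3 + 4")
print(m.group("ident"), "|", m.group("value"))
```

Unlike a real subrule, the fragment here is pasted in textually, so it cannot be recursive.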

11 Interpolation
- Variables no longer interpolate directly, thus / $var / matches the contents of $var literally, even if it contains rule metacharacters. (No \Q and \E)
- To treat $var as a rule, use / <$var> /
- Interpolated arrays match as an alternation:
    / @cmds /
  is equivalent to
    / [ @cmds[0] | @cmds[1] | @cmds[2] | ... ] /
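The distinction between matching a variable literally and treating it as a pattern also exists in Python, where it has to be made by hand. The sketch below uses `re.escape` for the literal case and joins a list into an alternation for the array case; the variable names and sample strings are illustrative assumptions.

```python
import re

var = "a.b*"

# Literal interpolation (the Perl 6 default): metacharacters are inert.
literal = re.compile(re.escape(var))

# Interpolation as a pattern: the string is compiled as-is.
as_pattern = re.compile(var)

# An array interpolated as an alternation of literal alternatives.
cmds = ["exit", "print", "hello"]
alternation = re.compile("|".join(re.escape(c) for c in cmds))

print(bool(literal.search("xa.b*y")), bool(literal.search("aXbbb")))
print(bool(as_pattern.fullmatch("aXbbb")), bool(alternation.fullmatch("print")))
```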

12 Interpolation, cont'd
- Hashes match the keys of the hash, and the value of the hash is either:
  - Executed if it is a closure
  - Treated as a subrule if it's a string or rule object
  - Succeeds if value is 1
  - Fails for any other value
- Useful for parsed languages:
    rule expr / <term> [ %infixop <term> ]? /

13 Angle brackets <...> introduce various forms of metasyntax
- A leading alphabetic character indicates a subrule or grammatical assertion
- A leading ! negates the match

14
- Leading ' matches a literal string
- Leading " matches an interpolated string
- Leading '+' or '-' are character classes / >/

15
- Leading '(' indicates code assertion:
    / (\d**{1..3}) <( $1 < 256 )> /    # fail if $1 is not less than 256
- A $, @, or % indicates a variable subrule, where each value (or key) is a subrule to be matched
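Python's `re` has no in-pattern code assertions, so the nearest equivalent is to check the captured text after the match. The helper name `match_octet` is hypothetical. One difference worth noting: a failed assertion inside a Perl 6 rule can trigger backtracking, whereas this post-check simply rejects the match.

```python
import re

def match_octet(text):
    """Return a leading run of 1-3 digits if its value is below 256, else None."""
    m = re.match(r"\d{1,3}", text)
    if m and int(m.group()) < 256:
        return m.group()
    return None

print(match_octet("255.0"), match_octet("256"), match_octet("abc"))
```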

16 A cool and somewhat scary example
    %cmd{'^\d+'} = { say "You entered a number" };
    %cmd{'^hello'} = { say "world" };
    %cmd{'^print \s (.*)'} = { say $1; };
    %cmd{'^exit'} = { exit() };
    while =$*IN {
        / <%cmd> / || say "Unrecognized command";
    }
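The same table-driven idea can be sketched in Python with a dictionary that maps patterns to handler functions. This is an analogy only: the function `run_command` is hypothetical, handlers return strings instead of printing, the `exit` command is left out, and trying the keys in insertion order is an assumption, since the slide does not say how the hash keys are ordered when matching.

```python
import re

def run_command(line, table):
    """Call the handler of the first pattern that matches at the start of line."""
    for pattern, handler in table.items():
        m = re.match(pattern, line)  # re.match anchors at the start, like ^
        if m:
            return handler(m)
    return "Unrecognized command"

commands = {
    r"\d+": lambda m: "You entered a number",
    r"hello": lambda m: "world",
    r"print\s(.*)": lambda m: m.group(1),
}

for line in ["42", "hello", "print foo bar", "bye"]:
    print(run_command(line, commands))
```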

17 Backtracking control
- Single colons skip previous atom:
    m/ \( <expr> [ , <expr> ]* : \) /
  (if we don't find closing paren, no point in trying to match fewer <expr>s)
- Two colons break an alternation:
    m:w/ [ if :: <expr> <block> | for :: <list> <block> | loop :: <loop_controls>? <block> ] /
  (once we've found "if", "for", or "loop", no point in trying the other branches of the alternation)
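The effect of the single colon (do not give back what the previous atom matched) can be illustrated in Python with a common emulation: capture inside a lookahead, then consume the capture with a backreference. This is a sketch of the general technique, not PGE's implementation, and the sample pattern is chosen only to make the difference visible.

```python
import re

# Ordinary greedy match: \d+ first takes "123", then gives back "3"
# so that the trailing \d can succeed.
with_backtracking = re.match(r"\d+\d", "123")

# Emulated "no backtracking into \d+": the lookahead captures the
# longest run of digits, \1 consumes exactly that run, and nothing
# can be given back, so the trailing \d has nothing left to match.
without_backtracking = re.match(r"(?=(\d+))\1\d", "123")

print(with_backtracking.group(), without_backtracking)
```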

18 Backtracking control
- Three colons (:::) fail the current rule
- The <commit> assertion fails the entire match (including any rules that called the current rule)
- The <cut> assertion matches successfully, removes the matched portion of the string up to the <cut>, and if backtracked over fails the match entirely
- Useful for throwing away successfully processed input when matching from an input stream
- Like, say, when writing a compiler :-)

19 Backslash
- \L, \U, \Q, \E, \A, \z gone from rules
- \n and \N match newline/not newline
- \s matches any Unicode space
- Backreferences are gone, use $1, $2, $3 (non-interpolated)
- Perl 6 allows defining custom backslash sequences for use in rules

20 Closures
- Anything in curlies is executed as a Perl 6 closure:
    / (\w+) { say "Got $1"; } /

21 Capture semantics
- Captures are different in Perl 6
- The result of a match is a "match object"
- If a match succeeds, the match object has:
  - Boolean value: true
  - Numeric value: 1 (except for global matches)
  - String value: the matched substring
  - Array component: the matched subpatterns
  - Hash component: the matched subrules
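Python's match objects offer a loosely similar split between a truth value, the matched string, positional captures and named captures. The example below is only a comparison point; the group name `greeting` is a made-up label, and Python's match object does not nest the way the Perl 6 one does.

```python
import re

m = re.search(r"(?P<greeting>Scooby) (dooby) (doo)!", "Scooby dooby doo!")

as_bool = bool(m)          # truth value of a successful match
as_string = m.group(0)     # the matched substring
positional = m.groups()    # all captures, in order
named = m.groupdict()      # only the named captures

print(as_bool, as_string, positional, named)
```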

22 Subpattern captures
- Part of a rule in parentheses is a subpattern
- Each subpattern produces its own match object:
    /Scooby (dooby) (doo)!/
            $1      $2
- Quantified subpatterns produce arrays of match objects:
    /Scooby (\w+ \s+)* (doo)!/
            $1         $2
  $1 is a (possibly empty) array of matches
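For contrast, Python's `re` keeps only the last repetition of a quantified group, which is the behaviour the Perl 6 design moves away from. The sketch below shows that, plus one way to recover every repetition by capturing the whole run and splitting it afterwards. The input string is an invented example.

```python
import re

text = "Scooby dooby dabba doo!"

# A quantified group retains only its final repetition.
last_only = re.match(r"Scooby (\w+\s+)*(doo)!", text)

# Capture the whole repeated run, then split it into its parts.
whole_run = re.match(r"Scooby ((?:\w+\s+)*)(doo)!", text)
all_words = re.findall(r"\w+\s+", whole_run.group(1))

print(repr(last_only.group(1)), all_words)
```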

23 Non-capturing groups
- Brackets do not capture, thus they don't result in a match object:
    /Scooby [ (\w+ \s+)* (doo) ]!/
              $1         $2
- Quantified brackets replace nested subpatterns with the last component matched:
    /Scooby [ (\w+ \s+)* (doo) ]+ !/
              $1         $2

24 Nested capturing subpatterns
- Each capturing subpattern introduces a new lexical scope, with nested captures inside the new match object:
    /Scooby ( (\w+ \s+)* (doo) ) !/
              $1[0]      $1[1]

25 Alternations
- Alternations introduce a new lexical scope, thus subpatterns restart counting at zero for each alternative branch (unlike p5):
              $1       $2
    m/ Scooby (dooby)* (doo)!
     | Yabba  (dabba)* (doo) /
              $1       $2
- This avoids lots of empty subpatterns when an alternation doesn't match.
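Python's `re` numbers groups across the whole pattern, much as Perl 5 does, so it gives a convenient picture of the "empty subpatterns" this slide is trying to avoid. The snippet is a comparison in another language, not Perl 6 behaviour.

```python
import re

pattern = re.compile(r"Scooby (dooby)* (doo)!|Yabba (dabba)* (doo)")

m = pattern.match("Yabba dabba doo")

# Groups 1 and 2 belong to the branch that did not match, so they
# come back empty; the useful captures end up in groups 3 and 4.
print(m.groups())
```

Under the Perl 6 scheme described on the slide, the second branch would instead report its captures as $1 and $2.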

26 Subrules
- Subrules capture into a hash keyed by the name of the subrule:
    rule ident / [ <alpha> | _ ] \w* /;
    rule num / \d+ /;
    m/ <ident> \:= <num> /;
  places match objects into $<ident> and $<num>

27 Quantified subrules
- Like subpatterns, quantified subrules produce arrays of matches
- m:w / dir * / produces matches in $ [0], $ [1], etc.
- Nested parens in a subrule capture to the subrule's match object

28 Named captures
- Portions of a match can be captured directly into a match object without a subrule:
    m:w/ $ := \w+, := \d+ /
  captures the first sequence of alphanumerics into $, and digits following the comma into $.

29 Grammars
- Rules can be packaged together into separate name spaces to form Grammars:
    grammar Perl6 {
        rule ident { ... };
        rule term { ... };
        rule expr { ... };
    }

30 :parsetree
- The :parsetree flag to a rule causes the grammar engine to keep all information about a match.
- Thus, one can do something like
    $parse = ($source ~~ Perl6::program);
  to get the entire parsetree for a program (including comments)

31 Questions?

