An Introduction To Antlr

1 An Introduction To Antlr

2 Content: What is Antlr? Why use Antlr? How to use Antlr? Components of an Antlr grammar file: writing the lexer class, writing the parser class. What does Antlr generate? Predicates. Automatic parse tree generation. Tree parsing. Conclusion.

3 What is Antlr? ANTLR (ANother Tool for Language Recognition) is a predicated-LL(k) parser and translator generator. From grammatical descriptions it generates compiler front ends and source-to-source translators in Java, C++, Python or C#.

4 Why Antlr? Antlr lets you write grammars in EBNF for LL(k) parsing, which is much handier than writing LR grammars. The code Antlr generates is much more readable than that of other LR/LL parser generators, which makes debugging much easier. The generated parsers are re-entrant and re-usable. Antlr can output code in multiple languages. It has a lower memory requirement, because it does not simulate a table-driven pushdown automaton the way LALR tools (yacc/bison) do.

5 Why Antlr? (contd.) Antlr generates code in object-oriented languages, so you can inherit the basic functionality of a generated class and add your own. Antlr supports exception handling, which makes error recovery easy. The same meta-language is used to specify the lexer, the parser and the tree parser. Antlr can build an AST from the input token stream.

6 How to use Antlr? calc.g is my Antlr grammar file containing both the lexer and the parser. It contains the parser, named CalcParser:
class CalcParser extends Parser;
and the lexer, named CalcLexer:
class CalcLexer extends Lexer;
Now invoke ANTLR on the grammar file to generate the lexer and parser code (for C++ output the grammar must select the C++ target, e.g. with options { language = "Cpp"; } at the top of the file):
java -cp $ANTLR_HOME/antlr.jar antlr.Tool calc.g
Compile the generated code:
gcc -c -g -I. -I$ANTLR_HOME/lib/cpp -Wall CalcLexer.cpp
gcc -c -g -I. -I$ANTLR_HOME/lib/cpp -Wall CalcParser.cpp

7 How to use Antlr? (contd.) Compile the main function, which creates an instance of the lexer and the parser classes. Example body of main():
CalcLexer lexer(std::cin);
CalcParser parser(lexer);
parser.expr();
gcc -c -g -I. -I$ANTLR_HOME/lib/cpp -Wall main.cpp
Link the generated object files with the Antlr static library to create the parser executable:
gcc main.o CalcLexer.o CalcParser.o $ANTLR_HOME/lib/cpp/src/libantlr.a -lstdc++

8 Writing Parser Class All parser rules must be associated with a parser class. A parser specification in a grammar file often looks like:
{ optional class code preamble }
class YourParserClass extends Parser;
options section
tokens section
{ optional parser class members }
parser rules

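9 Options section The section is preceded by the ‘options’ keyword and contains a series of option/value assignments. For example:
options {
    importVocab = lexerVocab;
    k = 2;
    buildAST = true;
    defaultErrorHandler = true;
}
Here k = 2 gives the parser two tokens of lookahead, and buildAST = true turns on automatic AST construction.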
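To make the k option concrete: k is the number of tokens the generated parser may inspect before committing to an alternative. The following is a hand-written C++ sketch, not ANTLR output; the token codes and the names LookaheadBuffer and predictStatement are hypothetical, while LA()/LT() imitate the accessor names that appear in generated parsers. It shows a decision that needs k = 2: both "ID ID" (a declaration) and "ID =" (an assignment) begin with ID, so only the second token tells them apart.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token type codes, for illustration only.
enum TokenType { EOF_TOKEN = 1, ID = 4, ASSIGN = 5, SEMI = 6 };

struct Token {
    int type;
    std::string text;
};

// A minimal k-token lookahead window over a pre-tokenized input.
class LookaheadBuffer {
public:
    explicit LookaheadBuffer(std::vector<Token> tokens) : toks_(std::move(tokens)) {}

    // LT(i): the i-th upcoming token (1-based); past the end, an EOF token.
    const Token& LT(std::size_t i) const {
        static const Token eof{EOF_TOKEN, "<EOF>"};
        std::size_t idx = pos_ + i - 1;
        return idx < toks_.size() ? toks_[idx] : eof;
    }
    // LA(i): just the type of LT(i).
    int LA(std::size_t i) const { return LT(i).type; }
    // Advance past the current token.
    void consume() { if (pos_ < toks_.size()) ++pos_; }

private:
    std::vector<Token> toks_;
    std::size_t pos_ = 0;
};

enum Decision { DECLARATION, ASSIGNMENT, NO_VIABLE_ALT };

// Both alternatives start with ID, so one token (k = 1) cannot choose;
// the second token decides: "ID ID" is a declaration, "ID =" an assignment.
Decision predictStatement(const LookaheadBuffer& in) {
    if (in.LA(1) == ID && in.LA(2) == ID) return DECLARATION;
    if (in.LA(1) == ID && in.LA(2) == ASSIGN) return ASSIGNMENT;
    return NO_VIABLE_ALT;
}
```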

10 Token section The tokens section lists the keywords that the parser will use in parser rules. For example:
tokens { "void"; "char"; "short"; "int"; ... }

11 Rule Section The structure of an input stream of atoms is specified by a set of mutually referential rules. Each rule has a name, optionally a set of arguments, optionally an init-action, optionally a return value, and one or more alternatives. Each alternative contains a series of elements that specify what to match and where.

12 Rule Section (contd.) The basic form of an ANTLR rule is:
rulename
    : alternative_1
    | alternative_2
    ...
    | alternative_n
    ;
If parameters are required for the rule, use the following form:
rulename[formal parameters] : ... ;

13 Rule Section (contd.) If you want to return a value from the rule, use the returns keyword:
rulename[formal parameters] returns [type id] : ... ;
To pass arguments to a rule reference, use the following form:
rulename : alternative_1[arg1, arg2] ;
If the referenced rule returns a value, capture it by assigning it to a variable, e.g. id=alternative_1[arg1, arg2].
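As a rough picture of what parameters and return values mean in generated code, the hand-written C++ sketch below maps each rule to a member function: the [formal parameters] become function parameters and the returns [type id] value becomes the function's return value. The grammar in the comment and the class ArgsParser are hypothetical; this illustrates the idea and is not code produced by ANTLR. It does no error handling.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hand-written imitation of rules with parameters and return values,
// loosely corresponding to this hypothetical grammar:
//
//   expr returns [int v]               : v=term ( "+" t=term { v += t; } )* ;
//   term returns [int v]               : digits { v = value of digits; } ;
//   scaled[int factor] returns [int v] : e=expr { v = factor * e; } ;
class ArgsParser {
public:
    explicit ArgsParser(std::string input) : in_(std::move(input)) {}

    // scaled[int factor] returns [int v]
    int scaled(int factor) {
        int e = expr();          // e=expr : capture the returned value
        return factor * e;       // user action { v = factor * e; }
    }

    // expr returns [int v]
    int expr() {
        int v = term();
        while (peek() == '+') {  // ( "+" term )*
            ++pos_;
            int t = term();
            v += t;
        }
        return v;
    }

private:
    // term returns [int v] : a run of decimal digits
    int term() {
        int v = 0;
        while (pos_ < in_.size() && std::isdigit(static_cast<unsigned char>(in_[pos_]))) {
            v = v * 10 + (in_[pos_] - '0');
            ++pos_;
        }
        return v;
    }

    char peek() const { return pos_ < in_.size() ? in_[pos_] : '\0'; }

    std::string in_;
    std::size_t pos_ = 0;
};
```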

14 Rule Section (contd.) An init-action can also be specified for a rule. It is placed between the rule header and the colon:
rulename
    { type id; }   // init-action
    : id=alternative_1[arg1, arg2] ... ;

15 Rule Section (contd.) A user action can follow any rule reference. It executes after that rule reference has matched, except while the parser is guessing (evaluating a syntactic predicate), when actions are skipped.
rule : rule_ref1 { /* user code */ } rule_ref2 { /* user code */ } ;
ANTLR supports extended BNF notation through the following four subrule forms:
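Skipping actions while guessing can be sketched in hand-written C++ as follows. The names ActionParser, guessing_ and guessRule are hypothetical; the idea assumed here is that a guessing counter is raised while a predicate is evaluated, each user action is wrapped in a check of that counter, and the input position is rewound after the guess.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class ActionParser {
public:
    explicit ActionParser(std::vector<std::string> toks) : toks_(std::move(toks)) {}

    // rule : WORD { action 1 } WORD { action 2 } ;
    bool rule() {
        if (!matchWord()) return false;
        if (guessing_ == 0) log_.push_back("after first ref");   // user action 1
        if (!matchWord()) return false;
        if (guessing_ == 0) log_.push_back("after second ref");  // user action 2
        return true;
    }

    // Try rule() speculatively: remember the position, parse with actions
    // suppressed, then rewind so the real parse starts from the same place.
    bool guessRule() {
        std::size_t mark = pos_;
        ++guessing_;
        bool ok = rule();
        --guessing_;
        pos_ = mark;
        return ok;
    }

    const std::vector<std::string>& log() const { return log_; }

private:
    // A "word" is any token other than ";".
    bool matchWord() {
        if (pos_ < toks_.size() && toks_[pos_] != ";") { ++pos_; return true; }
        return false;
    }

    std::vector<std::string> toks_;
    std::size_t pos_ = 0;
    int guessing_ = 0;
    std::vector<std::string> log_;
};
```

Calling guessRule() first and rule() afterwards leaves exactly two log entries, both from the real parse.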

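16 Rule Section (contd.)
( P1 | P2 | ... | Pn )      exactly one of the alternatives
( P1 | P2 | ... | Pn )*     zero or more
( P1 | P2 | ... | Pn )+     one or more
( P1 | P2 | ... | Pn )?     optional (zero or one)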
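In a recursive-descent parser each subrule form maps onto ordinary control flow. The following hand-written C++ sketch (a hypothetical SubruleDemo class in which single characters 'a' and 'b' stand in for alternatives P1 and P2) shows one plausible mapping; ANTLR's actual generated code for these forms may be structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

class SubruleDemo {
public:
    explicit SubruleDemo(std::string s) : s_(std::move(s)) {}

    // ( 'a' | 'b' )   : exactly one alternative, chosen by lookahead
    bool group() {
        if (la() == 'a' || la() == 'b') { ++pos_; return true; }
        return false;                       // no viable alternative
    }

    // ( 'a' | 'b' )*  : loop while the lookahead predicts an alternative
    std::size_t star() {
        std::size_t n = 0;
        while (la() == 'a' || la() == 'b') { ++pos_; ++n; }
        return n;
    }

    // ( 'a' | 'b' )+  : one mandatory match, then loop like ( ... )*
    bool plus() {
        if (!group()) return false;
        star();
        return true;
    }

    // ( 'a' | 'b' )?  : match once if predicted, otherwise match nothing
    void optional() {
        if (la() == 'a' || la() == 'b') ++pos_;
    }

    bool atEnd() const { return pos_ == s_.size(); }

private:
    char la() const { return pos_ < s_.size() ? s_[pos_] : '\0'; }

    std::string s_;
    std::size_t pos_ = 0;
};
```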

17 Writing Lexer Class All lexer rules must be associated with a lexer class. A lexer specification in a grammar file often looks like:
{ optional class code preamble }
class YourLexerClass extends Lexer;
options section
tokens section
{ optional lexer class members }
lexer rules
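A lexer class turns characters into tokens. As a stand-in that needs no ANTLR runtime, here is a hand-written C++ sketch of a small calculator lexer; the token names INT, PLUS, STAR, SEMI and END, the lexer rules quoted in the comments, and the function lex are assumptions for illustration only, not what ANTLR generates from calc.g.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical token types for a calculator lexer.
enum CalcToken { INT, PLUS, STAR, SEMI, END };

struct Tok {
    CalcToken type;
    std::string text;
};

// Skip whitespace, then pick a token rule by looking at the next character.
std::vector<Tok> lex(const std::string& src) {
    std::vector<Tok> out;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }          // WS : skipped
        if (std::isdigit(c)) {                           // INT : ('0'..'9')+ ;
            std::size_t start = i;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            out.push_back({INT, src.substr(start, i - start)});
            continue;
        }
        switch (c) {
            case '+': out.push_back({PLUS, "+"}); break;  // PLUS : '+' ;
            case '*': out.push_back({STAR, "*"}); break;  // STAR : '*' ;
            case ';': out.push_back({SEMI, ";"}); break;  // SEMI : ';' ;
            default: throw std::runtime_error("unexpected character");
        }
        ++i;
    }
    out.push_back({END, ""});                            // end-of-input marker
    return out;
}
```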

18 What does Antlr Generate?
Antlr will generate the following files from the calc.g grammar: CalcLexer.hpp, CalcLexer.cpp, CalcParser.hpp, CalcParser.cpp, CalcLexerTokenTypes.hpp and CalcLexerTokenTypes.txt. For every rule, Antlr generates a member function in the parser/lexer class. For example, the code for the rule expr looks very much like this:

19 What does Antlr Generate? (contd.)
void CalcParser::expr() {
    try { // for error handling
        mexpr();
        { // ( ... )*
            for (;;) {
                if ((LA(1) == PLUS)) {
                    match(PLUS);
                }
                else {
                    goto _loop14;
                }
            }
            _loop14:;
        } // ( ... )*
        match(SEMI);
    }
    catch (ANTLR_USE_NAMESPACE(antlr)RecognitionException& ex) {
        // report the error, consume this token and move the token stream
        // forward to a point from which the parser can resume parsing
    }
}
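The generated function is ordinary recursive-descent code. The self-contained C++ sketch below reproduces its shape by hand so it can be compiled without the ANTLR runtime. Since the slide does not show the grammar, it assumes expr is mexpr ( PLUS mexpr )* SEMI and that mexpr is a single INT; the token codes, the class MiniCalcParser and the error counter are hypothetical. A break replaces the generated goto, and the catch block counts the error and skips past the next SEMI as a simple form of recovery.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical token codes.
enum { EOF_ = 1, INT = 4, PLUS = 5, SEMI = 6 };

struct RecognitionError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

class MiniCalcParser {
public:
    explicit MiniCalcParser(std::vector<int> toks) : toks_(std::move(toks)) {}

    // Returns true if expr parsed cleanly; on error it counts the error and
    // skips past the next SEMI so parsing can resume there.
    bool expr() {
        try {
            mexpr();
            for (;;) {                    // ( ... )*
                if (LA(1) == PLUS) {
                    match(PLUS);
                    mexpr();
                } else {
                    break;                // plays the role of goto _loop14
                }
            }
            match(SEMI);
            return true;
        } catch (const RecognitionError&) {
            ++errors_;                                  // "report error"
            while (LA(1) != SEMI && LA(1) != EOF_) consume();
            if (LA(1) == SEMI) consume();               // resynchronize
            return false;
        }
    }

    int errors() const { return errors_; }
    bool atEnd() const { return LA(1) == EOF_; }

private:
    void mexpr() { match(INT); }

    int LA(std::size_t i) const {
        std::size_t idx = pos_ + i - 1;
        return idx < toks_.size() ? toks_[idx] : EOF_;
    }
    void consume() { if (pos_ < toks_.size()) ++pos_; }
    void match(int t) {
        if (LA(1) != t) throw RecognitionError("mismatched token");
        consume();
    }

    std::vector<int> toks_;
    std::size_t pos_ = 0;
    int errors_ = 0;
};
```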

20 Predicates Antlr provides two types of predicates to resolve ambiguities between alternatives. Semantic predicate: a semantic predicate specifies a condition that must be met (at run time) before parsing may proceed. It is specified as {...}?. Example:
stat
    : {isTypeName(LT(1))}? ID ID ";"   // declaration "type varName;"
    | ID "=" expr ";"                  // assignment
    ;
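A semantic predicate is a run-time test that gates an alternative. In the hand-written C++ sketch below, isTypeName consults a hypothetical symbol table of type names; the StatParser class, the string tokens and the reduction of expr to a single identifier are simplifications for illustration.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

enum StatKind { DECLARATION, ASSIGNMENT, SYNTAX_ERROR };

class StatParser {
public:
    explicit StatParser(std::set<std::string> typeNames) : types_(std::move(typeNames)) {}

    // stat : {isTypeName(LT(1))}? ID ID ";"   // declaration
    //      | ID "=" expr ";"                  // assignment (expr is one ID here)
    //      ;
    StatKind stat(const std::vector<std::string>& toks) const {
        if (toks.empty()) return SYNTAX_ERROR;
        if (isTypeName(toks[0])) {                 // predicate gates alternative 1
            bool ok = toks.size() == 3 && toks[2] == ";";
            return ok ? DECLARATION : SYNTAX_ERROR;
        }
        bool ok = toks.size() == 4 && toks[1] == "=" && toks[3] == ";";
        return ok ? ASSIGNMENT : SYNTAX_ERROR;
    }

private:
    bool isTypeName(const std::string& s) const { return types_.count(s) != 0; }

    std::set<std::string> types_;
};
```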

21 Predicates (contd.) Syntactic predicate: a syntactic predicate allows you to use arbitrary lookahead when a parsing decision cannot be made deterministically with finite lookahead. It is specified as ( prediction block ) => production. Example:
stat
    : ( list "=" )=> list "=" list
    | list
    ;
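A syntactic predicate can be implemented by backtracking: remember the input position, try the prediction block, rewind, then parse the chosen alternative for real. The hand-written C++ sketch below does this for the stat rule. It assumes a hypothetical rule list : "[" ( ID | list )* "]" ; whose length is unbounded, so no fixed number of lookahead tokens can reach the "="; the PredParser class is likewise illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class PredParser {
public:
    explicit PredParser(std::vector<std::string> toks) : t_(std::move(toks)) {}

    // stat : ( list "=" )=> list "=" list | list ;
    // Returns "assign", "list" or "error".
    std::string stat() {
        std::size_t mark = pos_;               // remember where we are
        bool predicted = list() && match("="); // try ( list "=" ) speculatively
        pos_ = mark;                           // rewind regardless of outcome
        if (predicted) {
            bool ok = list() && match("=") && list();
            return ok && atEnd() ? "assign" : "error";
        }
        return list() && atEnd() ? "list" : "error";
    }

private:
    // list : "[" ( ID | list )* "]" ;   any token other than [ ] = counts as ID
    bool list() {
        if (!match("[")) return false;
        while (pos_ < t_.size() && t_[pos_] != "]") {
            if (t_[pos_] == "[") {
                if (!list()) return false;     // nested list
            } else if (t_[pos_] == "=") {
                return false;                  // "=" cannot appear inside a list
            } else {
                ++pos_;                        // ID
            }
        }
        return match("]");
    }

    bool match(const std::string& s) {
        if (pos_ < t_.size() && t_[pos_] == s) { ++pos_; return true; }
        return false;
    }

    bool atEnd() const { return pos_ == t_.size(); }

    std::vector<std::string> t_;
    std::size_t pos_ = 0;
};
```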

22 Automatic Parse Tree Generation
ANTLR comes with its own tree data structure. An Antlr tree is an N-ary tree: each node contains a list of child nodes, and each node holds a token with a token type and a text value. To generate trees, set buildAST = true; in the options section, and in each rule mark the token that should become the parent (the root of the subtree) with ^, e.g.
assign : lvalue "="^ expr ;
expr   : term ("+"^ term)* ;
term   : ID ("*"^ ID)* ;
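The effect of ^ can be sketched with a child-sibling tree in plain C++. This is not ANTLR's tree class; Node, makeRoot, buildSum and toStringTree are hypothetical names. buildSum mimics expr : term ("+"^ term)* ; where each "+" becomes the root of the tree built so far, which yields a left-associative tree.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical token types.
enum { PLUS = 1, ID = 2 };

// N-ary tree node in child-sibling form.
struct Node {
    int type;
    std::string text;
    std::shared_ptr<Node> firstChild;
    std::shared_ptr<Node> nextSibling;
};
using NodePtr = std::shared_ptr<Node>;

NodePtr makeNode(int type, const std::string& text) {
    return std::make_shared<Node>(Node{type, text, nullptr, nullptr});
}

// Make 'root' the parent of 'left' and 'right' (left becomes the first child).
NodePtr makeRoot(NodePtr root, NodePtr left, NodePtr right) {
    left->nextSibling = right;
    root->firstChild = left;
    return root;
}

// What  expr : term ("+"^ term)* ;  builds for  t1 + t2 + ... + tn
NodePtr buildSum(const std::vector<std::string>& terms) {
    NodePtr tree = makeNode(ID, terms.at(0));
    for (std::size_t i = 1; i < terms.size(); ++i)
        tree = makeRoot(makeNode(PLUS, "+"), tree, makeNode(ID, terms[i]));
    return tree;
}

// Render as LISP-style text, e.g. ( + a b ), to make the shape visible.
std::string toStringTree(const NodePtr& n) {
    if (!n->firstChild) return n->text;
    std::string s = "( " + n->text;
    for (NodePtr c = n->firstChild; c; c = c->nextSibling) s += " " + toStringTree(c);
    return s + " )";
}
```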

23 Accessing Parse Tree The tree is available from the parser object through the member function getAST() after parsing:
myParser.topRule();
RefAST parseTree = myParser.getAST();
The parse tree information can be accessed through the following member functions of a tree node:
int getType();            // type of the token
std::string getText();    // text of the token
int getNumberOfChildren();
RefAST getFirstChild();
RefAST getNextSibling();
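These accessors are enough to walk the whole tree: start at a node, descend with getFirstChild() and move across with getNextSibling(). Because the ANTLR runtime is not assumed to be available here, the C++ sketch below uses a hypothetical stand-in class, MiniAST, that exposes accessors with the same names, and walks it in pre-order.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for an AST node with the accessor names listed on the slide.
// This is NOT ANTLR's class; it only mimics the interface.
class MiniAST {
public:
    MiniAST(int type, std::string text) : type_(type), text_(std::move(text)) {}

    int getType() const { return type_; }
    std::string getText() const { return text_; }
    std::shared_ptr<MiniAST> getFirstChild() const { return first_; }
    std::shared_ptr<MiniAST> getNextSibling() const { return next_; }

    int getNumberOfChildren() const {
        int n = 0;
        for (auto c = first_; c; c = c->next_) ++n;
        return n;
    }

    // Append a child at the end of the sibling chain.
    void addChild(std::shared_ptr<MiniAST> child) {
        if (!first_) { first_ = std::move(child); return; }
        auto c = first_;
        while (c->next_) c = c->next_;
        c->next_ = std::move(child);
    }

private:
    int type_;
    std::string text_;
    std::shared_ptr<MiniAST> first_, next_;
};

// Pre-order walk: visit a node, then each child via first-child /
// next-sibling links, collecting the token texts.
void preorder(const std::shared_ptr<MiniAST>& node, std::vector<std::string>& out) {
    out.push_back(node->getText());
    for (auto c = node->getFirstChild(); c; c = c->getNextSibling())
        preorder(c, out);
}
```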

24 Customizing AST Generation
Put ‘!’ after a rule name to prevent automatic AST generation for that rule, and build the tree yourself in an action:
term : explicit_mult | implicit_mult ;
explicit_mult : ID MULT^ ID ;
implicit_mult! : left:ID right:ID
    { #implicit_mult = #(#[MULT,"*"], #left, #right); }
    ;
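What the action in implicit_mult does can be mimicked in plain C++: #[MULT,"*"] corresponds to creating a fresh node, and #(root, c1, c2) to hanging two children under a root. All names below (node, tree, explicitMult, implicitMult, sameShape) are hypothetical. The point the sketch checks is that explicit and implicit multiplication end up with the same tree shape, so later passes need not tell them apart.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical token types.
enum { ID = 1, MULT = 2 };

struct Node {
    int type;
    std::string text;
    std::shared_ptr<Node> firstChild, nextSibling;
};
using NodePtr = std::shared_ptr<Node>;

// Counterpart of #[TYPE, "text"]: create a single new node.
NodePtr node(int type, const std::string& text) {
    auto n = std::make_shared<Node>();
    n->type = type;
    n->text = text;
    return n;
}

// Counterpart of #(root, c1, c2): root gets c1 and c2 as its children.
NodePtr tree(NodePtr root, NodePtr c1, NodePtr c2) {
    c1->nextSibling = c2;
    root->firstChild = c1;
    return root;
}

// explicit_mult : ID MULT^ ID ;  the "*" token read from the input is the root
NodePtr explicitMult(const std::string& l, const std::string& op, const std::string& r) {
    return tree(node(MULT, op), node(ID, l), node(ID, r));
}

// implicit_mult! : left:ID right:ID { #implicit_mult = #(#[MULT,"*"], #left, #right); } ;
// No "*" appears in the input, so the action fabricates the MULT root.
NodePtr implicitMult(const std::string& l, const std::string& r) {
    NodePtr left = node(ID, l), right = node(ID, r);   // #left, #right
    return tree(node(MULT, "*"), left, right);
}

// Structural equality over type, text, children and siblings.
bool sameShape(const NodePtr& a, const NodePtr& b) {
    if (!a || !b) return a == b;
    return a->type == b->type && a->text == b->text &&
           sameShape(a->firstChild, b->firstChild) &&
           sameShape(a->nextSibling, b->nextSibling);
}
```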

25 Tree Parser A walker for the AST can also be generated by ANTLR. The walker must be derived from TreeParser:
class myTreeParser extends TreeParser;
Rules are written as in a parser, with #( ... ) denoting a subtree in prefix form: the first element is the root and the remaining elements are its children, e.g.
poly : #(ADD term poly) | term ;
term : INT | ID | #(EXP ID INT) | #(MULT INT #(EXP ID INT)) ;
Actions can be added to each rule, and a rule can create a new, modified AST.
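A tree parser is again recursive descent, but over tree nodes instead of a token stream. The hand-written C++ sketch below walks the poly/term tree grammar and evaluates the polynomial. It assumes that the only ID is the variable and that its value is passed in as x; the token codes and helper functions are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical token types for the polynomial AST.
enum { ADD = 1, MULT = 2, EXP = 3, INT = 4, ID = 5 };

struct Node {
    int type;
    std::string text;
    std::shared_ptr<Node> down, right;   // first child, next sibling
};
using NodePtr = std::shared_ptr<Node>;

NodePtr leaf(int type, const std::string& text) {
    auto n = std::make_shared<Node>();
    n->type = type;
    n->text = text;
    return n;
}

// Build #(ROOT a b).
NodePtr tree(int type, const std::string& text, NodePtr a, NodePtr b) {
    NodePtr root = leaf(type, text);
    a->right = b;
    root->down = a;
    return root;
}

long ipow(long base, long e) { long r = 1; while (e-- > 0) r *= base; return r; }

// term : INT | ID | #(EXP ID INT) | #(MULT INT #(EXP ID INT)) ;
long term(const NodePtr& t, long x) {
    switch (t->type) {
        case INT: return std::stol(t->text);
        case ID:  return x;
        case EXP: return ipow(x, std::stol(t->down->right->text));  // ID ^ INT
        case MULT: return std::stol(t->down->text) * term(t->down->right, x);
        default: throw std::runtime_error("no viable alternative in term");
    }
}

// poly : #(ADD term poly) | term ;
long poly(const NodePtr& t, long x) {
    if (t->type == ADD)
        return term(t->down, x) + poly(t->down->right, x);
    return term(t, x);
}
```

For example, 3*x^2 + x + 5 is the tree #(ADD #(MULT 3 #(EXP x 2)) #(ADD x 5)), which the walker evaluates to 19 at x = 2.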

26 Conclusion ANTLR is a newer and more powerful substitute for the old yacc parser generator. The input language is EBNF based and better organized than yacc input. A lot of free parser code for existing languages is already available. The generated parsers are re-entrant and truly object-oriented. Each rule is available as a separate parser entry point, so the parser is more re-usable. Antlr is already in use at Interra, in e2Vera and Tiger, and we should use it for new projects. There will probably be some porting issues, as the generated code depends heavily on exception handling and templates.

