CS 3304 Comparative Languages

CS 3304 Comparative Languages Lecture 3: Scanning 24 January 2012

Introduction
Syntax: the form or structure of the expressions, statements, and program units.
Semantics: the meaning of the expressions, statements, and program units.
Syntax and semantics provide a language's definition.
Users of a language definition:
  Other language designers.
  Implementers.
  Programmers (the users of the language).
Basic terminology:
  A sentence is a string of characters over some alphabet.
  A language is a set of sentences.
  A lexeme is the lowest-level syntactic unit of a language.
  A token is a category of lexemes (e.g., identifier).

Defining Languages
Recognizers:
  A recognition device reads input strings over the alphabet of the language and decides whether the input strings belong to the language.
  Example: the syntax analysis part of a compiler (scanning).
Generators:
  A device that generates sentences of a language.
  One can determine whether a particular sentence is syntactically correct by comparing it to the structure of the generator.

BNF Fundamentals
In BNF, abstractions are used to represent classes of syntactic structures: they act like syntactic variables (also called nonterminal symbols, or just nonterminals).
Terminals are lexemes or tokens.
A rule has a left-hand side (LHS), which is a nonterminal, and a right-hand side (RHS), which is a string of terminals and/or nonterminals.
Nonterminals are often italic or enclosed in angle brackets.
Examples of BNF rules:
  <ident_list> → identifier | identifier, <ident_list>
  <if_stmt> → if <logic_expr> then <stmt>
Grammar: a finite non-empty set of rules.
A start symbol is a special element of the nonterminals of a grammar.
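The <ident_list> rule above can be read either as a recognizer or as a generator. The following is a minimal sketch in Python, assuming the input has already been scanned into a list of token names ("identifier" and ","). The function names are illustrative and not part of the lecture.

```python
# Grammar rule being illustrated:
#   <ident_list> -> identifier | identifier , <ident_list>
# Assumption: tokens arrive as a list of strings, each "identifier" or ",".

def is_ident_list(tokens):
    """Recognizer: True if the whole token list derives from <ident_list>."""
    if not tokens or tokens[0] != "identifier":
        return False
    if len(tokens) == 1:
        return True  # first alternative: a single identifier
    # second alternative: identifier , <ident_list>
    return tokens[1] == "," and is_ident_list(tokens[2:])


def generate_ident_list(count):
    """Generator: derive the sentence containing `count` identifiers."""
    if count < 1:
        raise ValueError("an <ident_list> contains at least one identifier")
    if count == 1:
        return ["identifier"]
    # apply the recursive alternative once, then derive the rest
    return ["identifier", ","] + generate_ident_list(count - 1)
```

Every sentence produced by generate_ident_list is accepted by is_ident_list, which is the sense in which the two views describe the same language.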

Scanner
A scanner is responsible for:
  Tokenizing source.
  Removing comments.
  (Often) dealing with pragmas (i.e., significant comments).
  Saving text of identifiers, numbers, strings.
  Saving source locations (file, line, column) for error messages.
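Two of these responsibilities, removing comments and saving source locations, can be sketched as follows. This is a simplified illustration, not a full scanner: it assumes Pascal-style { ... } comments and treats every whitespace-separated run of characters as one lexeme. The Token type and function name are hypothetical.

```python
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    text: str
    line: int
    column: int


def words_with_locations(source: str) -> Iterator[Token]:
    """Yield whitespace-separated lexemes with 1-based line and column,
    skipping { ... } comments (assumed comment syntax)."""
    line, col = 1, 1
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch == "\n":
            line, col = line + 1, 1
            i += 1
        elif ch.isspace():
            col += 1
            i += 1
        elif ch == "{":
            # Skip the comment body, but keep counting lines and columns.
            while i < n and source[i] != "}":
                if source[i] == "\n":
                    line, col = line + 1, 1
                else:
                    col += 1
                i += 1
            i += 1  # consume the closing brace
            col += 1
        else:
            start, start_col = i, col
            while i < n and not source[i].isspace() and source[i] != "{":
                i += 1
                col += 1
            yield Token(source[start:i], line, start_col)
```

For example, list(words_with_locations("x := 1 { set x }\ny := 2")) yields six tokens, and the second "y" is reported at line 2, column 1, even though a comment preceded it.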

Scanning Example I
Suppose we are building an ad-hoc (hand-written) scanner for Pascal:
We read the characters one at a time with look-ahead.
If it is one of the one-character tokens { ( ) [ ] < > , ; = + - etc. }, we announce that token.
If it is a ., we look at the next character:
  If that is a dot, we announce .. (the two-character token).
  Otherwise, we announce . and reuse the look-ahead.

Scanning Example II
If it is a <, we look at the next character:
  If that is a =, we announce <=.
  Otherwise, we announce < and reuse the look-ahead, etc.
If it is a letter, we keep reading letters and digits and maybe underscores until we can't anymore:
  Then we check to see if it is a reserved word.
If it is a digit, we keep reading until we find a non-digit:
  If that is not a ., we announce an integer.
  Otherwise, we keep looking for a real number.
  If the character after the . is not a digit, we announce an integer and reuse the . and the look-ahead.
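The ad-hoc rules from the two scanning examples can be sketched in Python as below. This is an illustration under stated assumptions, not the lecture's code: the reserved-word set and the one-character token set are small hypothetical subsets, and the token names ("dotdot", "le", and so on) are invented for the example.

```python
# Hypothetical subsets, chosen only to keep the example short.
RESERVED = {"begin", "end", "if", "then", "else", "while", "do"}
ONE_CHAR = set("()[],;=+-")
DIGITS = "0123456789"


def scan(src):
    """Return a list of (kind, lexeme) pairs using one or two characters of look-ahead."""
    tokens = []
    i, n = 0, len(src)
    while i < n:
        ch = src[i]
        nxt = src[i + 1] if i + 1 < n else ""  # the look-ahead character
        if ch.isspace():
            i += 1
        elif ch == ".":
            if nxt == ".":
                tokens.append(("dotdot", ".."))
                i += 2
            else:
                tokens.append(("dot", "."))
                i += 1  # the look-ahead is reused on the next iteration
        elif ch == "<":
            if nxt == "=":
                tokens.append(("le", "<="))
                i += 2
            else:
                tokens.append(("lt", "<"))
                i += 1
        elif ch in ONE_CHAR:
            tokens.append((ch, ch))
            i += 1
        elif ch.isalpha():
            start = i
            while i < n and (src[i].isalnum() or src[i] == "_"):
                i += 1
            word = src[start:i]
            kind = "reserved" if word.lower() in RESERVED else "identifier"
            tokens.append((kind, word))
        elif ch in DIGITS:
            start = i
            while i < n and src[i] in DIGITS:
                i += 1
            # A real number needs a "." that is itself followed by a digit.
            if i + 1 < n and src[i] == "." and src[i + 1] in DIGITS:
                i += 1
                while i < n and src[i] in DIGITS:
                    i += 1
                tokens.append(("real", src[start:i]))
            else:
                tokens.append(("integer", src[start:i]))
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {i}")
    return tokens
```

Calling scan("3..5") produces an integer, a dotdot, and another integer, because the character after the first dot is not a digit, so the scanner announces the integer and leaves both dots for the next token.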

Deterministic Finite Automaton
Pictorial representation of a scanner for calculator tokens, in the form of a finite automaton.
This is a deterministic finite automaton (DFA):
  Lex, scangen, ANTLR, etc. build these things automatically from a set of regular expressions.
  Specifically, they construct a machine that accepts the language.
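To make "a machine that accepts the language" concrete, here is a small hand-coded DFA, written as a case analysis on the current state. It is only a sketch: it assumes a simplified number syntax (one or more digits, optionally followed by a dot and one or more digits) rather than the full calculator token set from the slide, and the state names are invented.

```python
DIGITS = "0123456789"


def accepts_number(text):
    """Run a DFA for  digit+ ( '.' digit+ )?  over the whole string."""
    state = "start"
    for ch in text:
        if state == "start":
            state = "int" if ch in DIGITS else "dead"
        elif state == "int":
            if ch in DIGITS:
                state = "int"
            elif ch == ".":
                state = "point"
            else:
                state = "dead"
        elif state == "point":
            state = "frac" if ch in DIGITS else "dead"
        elif state == "frac":
            state = "frac" if ch in DIGITS else "dead"
        else:
            return False  # the dead state has no way out
    # "int" and "frac" are the accepting states
    return state in {"int", "frac"}
```

The machine is deterministic because each (state, character) pair leads to exactly one next state, so no backtracking is ever needed.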

The Longest Possible Token Rule
We scan over and over to get one token after another.
Nearly universal rule: always take the longest possible token from the input, thus:
  foobar is foobar and never f or foo or foob.
The rule means you return only when the next character can't be used to continue the current token:
  The next character will generally be saved for the next token.
In some cases, you may need to peek at more than one character of look-ahead in order to know whether to proceed:
  In Pascal, for example, when you have a 3 and you see a dot:
    Do you proceed (in hopes of getting 3.14)?
    Or do you stop (in fear of getting 3..5)?
Regular expressions "generate" a regular language.
DFAs "recognize" a regular language.
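One way to illustrate the longest possible token rule is to try every token pattern at the current position and keep the longest match. The sketch below uses Python's re module for the individual patterns. The pattern list and token names are assumptions made for the example, and real scanner generators achieve the same effect with a single DFA rather than by trying patterns one by one.

```python
import re

# Hypothetical token set; ties are broken in favour of the earlier entry.
TOKEN_PATTERNS = [
    ("real", re.compile(r"[0-9]+\.[0-9]+")),
    ("integer", re.compile(r"[0-9]+")),
    ("identifier", re.compile(r"[A-Za-z][A-Za-z0-9_]*")),
    ("dotdot", re.compile(r"\.\.")),
    ("dot", re.compile(r"\.")),
]


def longest_match(text, pos):
    """Return (kind, lexeme) for the longest token starting at pos, or None."""
    best = None
    for kind, pattern in TOKEN_PATTERNS:
        m = pattern.match(text, pos)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (kind, m.group())
    return best


def tokenize(text):
    """Split text into tokens, always taking the longest possible one."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        found = longest_match(text, pos)
        if found is None:
            raise ValueError(f"no token matches at offset {pos}")
        tokens.append(found)
        pos += len(found[1])
    return tokens
```

With these patterns, tokenize("foobar") gives a single identifier, tokenize("3.14") gives a single real, and tokenize("3..5") gives an integer, a dotdot, and an integer, since the real pattern cannot match "3." without a digit after the dot.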

Building Scanners
Scanners tend to be built three ways:
  Ad-hoc.
  Semi-mechanical pure DFA (usually as nested case statements).
  Table-driven DFA.
Ad-hoc generally yields the fastest, most compact code by doing lots of special-purpose things, though good automatically-generated scanners come very close.
Writing a pure DFA as a set of nested case statements is a surprisingly useful programming technique (Figure 12.1), though it is often easier to use perl, awk, sed, or similar tools.
Table-driven DFA is what lex and scangen produce:
  lex (flex): C code.
  scangen: numeric tables and a separate driver (Figure 2.12).
  ANTLR: Java code.
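The table-driven style separates the automaton (data) from a generic driver (code). The sketch below is a hand-written illustration of that split, not output from lex or scangen. The tables cover only a hypothetical four-token language (integer, real, dot, dotdot), and the driver remembers the last accepting state so it can back up when a longer token fails to materialise.

```python
DIGITS = "0123456789"


def char_class(ch):
    """Map a character to the column label used in the transition table."""
    if ch in DIGITS:
        return "digit"
    if ch == ".":
        return "dot"
    return "other"


# Transition table: state -> {character class -> next state}.
TRANSITIONS = {
    "start": {"digit": "int", "dot": "dot"},
    "int": {"digit": "int", "dot": "point"},
    "point": {"digit": "frac"},
    "frac": {"digit": "frac"},
    "dot": {"dot": "dotdot"},
}

# Accepting states and the token kind each one announces.
ACCEPTING = {"int": "integer", "frac": "real", "dot": "dot", "dotdot": "dotdot"}


def next_token(text, pos):
    """Generic driver: return (kind, lexeme, new_pos) for the longest token at pos."""
    state = "start"
    last_accept = None  # (end offset, kind) of the longest token seen so far
    i = pos
    while i < len(text):
        nxt = TRANSITIONS.get(state, {}).get(char_class(text[i]))
        if nxt is None:
            break  # no transition: stop and fall back to last_accept
        state = nxt
        i += 1
        if state in ACCEPTING:
            last_accept = (i, ACCEPTING[state])
    if last_accept is None:
        raise ValueError(f"no token at offset {pos}")
    end, kind = last_accept
    return kind, text[pos:end], end


def scan_all(text):
    """Tokenize the whole string, skipping whitespace between tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        kind, lexeme, pos = next_token(text, pos)
        tokens.append((kind, lexeme))
    return tokens
```

Changing the language means editing only TRANSITIONS and ACCEPTING. The driver stays the same, which is why generated scanners can ship the tables separately from the driver.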

Summary
BNF and context-free grammars are equivalent meta-languages that are well-suited for describing the syntax of programming languages.
Syntax analysis is a common part of language implementation.
Scanners (lexical analyzers) use pattern matching to isolate small-scale parts of a program.
ANTLR provides support for scanners (lexers), parsers, and tree-parsers.