Introduction to Regular Expressions
EELS Meeting, Dec. 2, 2009
Tom Horton
Dept. of Computer Science, Univ. of Virginia



Basics
• A regular expression defines a pattern
  - Strings match that pattern. (Perhaps many!)
  - Thus the regular expression is shorthand for a set of strings
  - Alternatively: the regex defines a grammar and thus a set of valid strings (statements) for that grammar
• Search / matching with regexes
  - The pattern is applied to one or more strings (words, lines, etc.)
  - Matches or not, or
  - Find next (or all) matching string(s) (e.g. line in a file)
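The three kinds of search listed above (matches or not, find next, find all) can be sketched with Python's standard `re` module. The pattern and the sample lines are made up for illustration; they are not from the talk.

```python
import re

# Hypothetical pattern for illustration: one or more digits.
pattern = re.compile(r"[0-9]+")

# Stand-in for the lines of a file.
lines = ["order 66 shipped", "no numbers here", "7 of 9"]

# "Matches or not": does the pattern occur anywhere in each string?
matches_or_not = [pattern.search(line) is not None for line in lines]

# "Find next": the first match in one string.
first = pattern.search(lines[2])
first_text = first.group() if first else None

# "Find all": every non-overlapping match in one string.
all_in_line = pattern.findall(lines[2])

# grep-style: keep only the lines that contain a match.
matching_lines = [line for line in lines if pattern.search(line)]

print(matches_or_not, first_text, all_in_line, matching_lines)
```

Note that `search` looks for the pattern anywhere in the string, while `fullmatch` asks whether the whole string belongs to the set of strings the pattern describes.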

Website for this Presentation

How to Express Patterns
• Done live on board without slides, and with demo

Theoretical Background
• The following might interest those who want to see how people in math and CS think about the theoretical aspects of such things

Phrase Structured Grammars
• A phrase structured grammar G is a four-tuple (V, T, S, P) where:
  - V is the vocabulary
  - T is the set of terminals
  - S is the start symbol
  - P is the set of productions
• T is a subset of V
• The set V - T is the set N, the non-terminals
• Productions are literally the way in which one string can replace (or produce) another
• The language of G is the set of all strings of terminals derivable from S

Types Of Languages
• Types are distinguished by the form of the productions in the grammars that generate them
• Classification introduced by Chomsky (the Chomsky hierarchy)
• Type 0: unrestricted (phrase-structure) languages
• Type 1: Context-sensitive languages
• Type 2: Context-free languages
  - Productions: LHS is A (i.e., a single non-terminal)
• Type 3: Regular languages
  - Productions: LHS is A; RHS is a or aB, where A and B are non-terminals and a is a terminal

Type 3 Languages
• The REGULAR LANGUAGES (the languages described by REGULAR EXPRESSIONS)
• Productions: LHS is A; RHS is a or aB, where:
  - A and B are non-terminals
  - a is a terminal
  - This form for the RHS defines the REGULAR LANGUAGES
• Simplest kind of formal language structure
• Useful for defining things in CS
  - File name completion
  - Search patterns

Type 3 Language Example
• V = {a, b, A, B, S}
• T = {a, b}
• N = V - T = {A, B, S}
• Start symbol: S
• P:
  - S → aB
  - S → bA
  - A → aA
  - A → a
  - B → bB
  - B → b
• So what is the language? An "a" followed by one or more "b"s, or a "b" followed by one or more "a"s
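To see the language emerge from the productions (S → aB, S → bA, A → aA, A → a, B → bB, B → b), here is a minimal sketch that enumerates every terminal string up to a chosen length. The function name `derive` and the length cutoff are illustrative choices, not part of the talk.

```python
import re
from collections import deque

# Productions of the example grammar; uppercase letters are non-terminals.
PRODUCTIONS = {
    "S": ["aB", "bA"],
    "A": ["aA", "a"],
    "B": ["bB", "b"],
}

def derive(start="S", max_len=5):
    """Return every terminal string of length <= max_len derivable from start."""
    results = set()
    queue = deque([start])
    seen = {start}
    while queue:
        form = queue.popleft()
        # Locate the leftmost non-terminal, if there is one.
        idx = next((i for i, ch in enumerate(form) if ch in PRODUCTIONS), None)
        if idx is None:
            results.add(form)  # only terminals left: a string of the language
            continue
        for rhs in PRODUCTIONS[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            # No production in this grammar shortens a form, so pruning by length is safe.
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return results

language = derive(max_len=5)
# Cross-check against the regular expression ab+ | ba+.
all_match = all(re.fullmatch(r"ab+|ba+", s) for s in language)
print(sorted(language, key=lambda s: (len(s), s)), all_match)
```

Every derived string is an "a" followed by "b"s or a "b" followed by "a"s, which agrees with the description of the language above.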

Quick Bits Of Notation
• x* (aka the Kleene star or closure) means the set of elements with zero or more x's
  - e.g. 'a'* = {ε, a, aa, aaa, aaaa, aaaaa, …}, where ε is the empty string
• x+ means the set of elements with one or more x's
  - e.g. 'a'+ = {a, aa, aaa, aaaa, aaaaa, …}
• x^m means exactly m x's
• x | y means x or y
  - e.g. a | b
• x can be a set, in which case the result is the concatenation of set elements
  - e.g. {'a', 'b'}* = {ε, a, b, aa, ab, ba, bb, aaa, aab, aba, baa, abb, bab, bba, …}
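Since x* and x+ are infinite sets, a program can only list a finite slice of them. The sketch below does that for a set of symbols; the helper names `kleene_star`, `kleene_plus`, and `power` are invented for this illustration.

```python
from itertools import product

def kleene_star(symbols, max_len):
    """Strings over `symbols` of length 0..max_len (a finite slice of symbols*)."""
    out = []
    for n in range(max_len + 1):
        # All concatenations of exactly n symbols, in sorted order.
        out.extend("".join(p) for p in product(sorted(symbols), repeat=n))
    return out

def kleene_plus(symbols, max_len):
    """Like kleene_star, but without the empty string (one or more symbols)."""
    return kleene_star(symbols, max_len)[1:]

def power(x, m):
    """x^m: exactly m copies of x."""
    return x * m

print(kleene_star({"a"}, 3))       # the empty string appears as ''
print(kleene_plus({"a"}, 3))
print(power("a", 4))
print(kleene_star({"a", "b"}, 2))
```

The empty string, written ε in the notation, shows up as `''` in the Python output.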

Quick Bits Of Notation
• These ideas are used to specify regular languages:
  - (a | b | c)*
    Examples include: aabbbccc, abcabc, aaccbbb
  - (ab+ | ba+)
    Examples include: ab, abbb, baaa, ba, baaaaaaa
    This is the example we looked at earlier
• Regular languages occur all the time
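The two expressions carry over almost unchanged to Python's `re` syntax. A quick sketch checking the listed examples with `re.fullmatch`; the non-examples are added here for contrast and are not from the slides.

```python
import re

any_abc = re.compile(r"(a|b|c)*")   # (a | b | c)*
ab_or_ba = re.compile(r"ab+|ba+")   # (ab+ | ba+)

abc_examples = ["aabbbccc", "abcabc", "aaccbbb"]
ab_examples = ["ab", "abbb", "baaa", "ba", "baaaaaaa"]

# fullmatch succeeds only if the entire string is in the language.
abc_ok = all(any_abc.fullmatch(s) for s in abc_examples)
ab_ok = all(ab_or_ba.fullmatch(s) for s in ab_examples)

# Non-examples (chosen for this sketch): a letter outside the alphabet,
# a lone "a", and an alternating string.
abc_rejects = any_abc.fullmatch("abd") is None
ab_rejects = all(ab_or_ba.fullmatch(s) is None for s in ["a", "abab", ""])

print(abc_ok, ab_ok, abc_rejects, ab_rejects)
```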

Finite State Automata
• A finite-state automaton is a five-tuple:
  - I: a set of symbols, the input alphabet (literally the set of input symbols)
  - S: a set of states that it can be in
  - S0: a designated initial state
  - A: a set of designated states called the accepting states
  - N: the next-state function, N: S × I → S
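The five-tuple maps directly onto a small data structure. This is one possible sketch, with the next-state function stored as a lookup table; the class name and the sample automaton (strings over {a, b} that end in b) are hypothetical and chosen only to exercise the definition.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FiniteStateAutomaton:
    alphabet: frozenset    # I: the input alphabet
    states: frozenset      # S: the set of states
    initial: str           # S0: the initial state
    accepting: frozenset   # A: the accepting states
    next_state: dict       # N: table mapping (state, symbol) -> state

    def accepts(self, string):
        """Run the automaton over `string`; accept if it ends in an accepting state."""
        state = self.initial
        for symbol in string:
            if symbol not in self.alphabet:
                return False  # symbol outside the input alphabet
            state = self.next_state[(state, symbol)]
        return state in self.accepting

# Hypothetical example: accept strings over {a, b} whose last symbol is b.
ends_in_b = FiniteStateAutomaton(
    alphabet=frozenset("ab"),
    states=frozenset({"q0", "q1"}),
    initial="q0",
    accepting=frozenset({"q1"}),
    next_state={
        ("q0", "a"): "q0", ("q0", "b"): "q1",
        ("q1", "a"): "q0", ("q1", "b"): "q1",
    },
)

print(ends_in_b.accepts("aab"), ends_in_b.accepts("aba"), ends_in_b.accepts(""))
```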

Example: Vending Machine
[State diagram: states "$0 Deposited", "$0.25 Deposited", "$0.50 Deposited", "$0.75 Deposited", "$1.00 Deposited"; transitions labeled $0.25 and $0.50]
• Based on Epp, page 746 but simpler
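A rough sketch of the vending machine as an automaton. The diagram did not survive in this transcript, so the details here are assumptions: the machine takes quarters and half-dollars, the only accepting state is "$1.00 Deposited", and a coin that would push the total past $1.00 has no transition.

```python
# States are the amount deposited so far, in cents.
STATES = [0, 25, 50, 75, 100]
COINS = [25, 50]     # input alphabet: a quarter or a half-dollar
INITIAL = 0
ACCEPTING = {100}    # assumption: the machine accepts at exactly $1.00

# Next-state table. Assumption: a coin that would overshoot $1.00 has no entry.
NEXT = {(s, c): s + c for s in STATES for c in COINS if s + c <= 100}

def run(coins):
    """Return the final state, or None if some coin has no transition."""
    state = INITIAL
    for coin in coins:
        if (state, coin) not in NEXT:
            return None
        state = NEXT[(state, coin)]
    return state

def accepts(coins):
    """True if the coin sequence ends in the accepting state."""
    return run(coins) in ACCEPTING

print(accepts([25, 25, 50]), accepts([50, 25]), accepts([50, 50, 25]))
```

Leaving the overshoot transitions undefined keeps the table small; a real machine would more likely return the extra coin, which would be modeled as a transition from the $1.00 state back to itself.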


Example: Parity Checking
[State diagram: two states, Odd and Even; one state is labeled as both the initial and the accepting state]
• Example strings:
• This is just a recognizer for strings in a language

Language Recognizers
• Kleene's theorem: The set of languages defined by type 3 (regular) grammars is identical to the set of languages accepted by finite-state automata
  - Thus, for any regular language there is a finite-state automaton that recognizes it
• Another theorem: The set of languages defined by type 2 (context-free) grammars is identical to the set of languages accepted by pushdown automata
  - Thus, for any context-free language, there is a pushdown automaton that recognizes it
  - A pushdown automaton is a finite-state automaton supplemented with a pushdown stack
• Really cool thing: given a grammar for a context-free or regular language, there are programs (parser generators) that will build the automaton for us!
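Kleene's theorem can be illustrated (not proved) on the earlier example ab+ | ba+. The sketch below hand-builds a finite-state automaton for that language and checks that it agrees with Python's regex engine on every string over {a, b} up to a given length. The state names and the exhaustive-check helper are inventions of this sketch.

```python
import re
from itertools import product

REGEX = re.compile(r"ab+|ba+")

# Hand-built automaton for ab+ | ba+. "dead" is a trap state for rejected prefixes.
NEXT = {
    ("start", "a"): "a",    ("start", "b"): "b",
    ("a", "a"): "dead",     ("a", "b"): "ab+",
    ("b", "a"): "ba+",      ("b", "b"): "dead",
    ("ab+", "a"): "dead",   ("ab+", "b"): "ab+",
    ("ba+", "a"): "ba+",    ("ba+", "b"): "dead",
    ("dead", "a"): "dead",  ("dead", "b"): "dead",
}
ACCEPTING = {"ab+", "ba+"}

def dfa_accepts(s):
    """Run the automaton from the start state and test the final state."""
    state = "start"
    for ch in s:
        state = NEXT[(state, ch)]
    return state in ACCEPTING

def agree_up_to(max_len):
    """Compare automaton and regex on all strings over {a, b} of length <= max_len."""
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            s = "".join(chars)
            if dfa_accepts(s) != (REGEX.fullmatch(s) is not None):
                return False
    return True

print(agree_up_to(8))
```

A finite check like this only shows agreement on short strings; the theorem itself guarantees that an equivalent automaton exists for every regular language.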