1 Introduction to Regular Expressions EELS Meeting, Dec. 2 2009 Tom Horton Dept. of Computer Science Univ. of Virginia


1 Introduction to Regular Expressions
EELS Meeting, Dec. 2, 2009
Tom Horton, Dept. of Computer Science, Univ. of Virginia
horton@virginia.edu

2 Basics
- A regular expression defines a pattern
  - Strings match that pattern. (Perhaps many!)
  - Thus the regular expression is shorthand for a set of strings
  - Alternatively: the regex defines a grammar and thus a set of valid strings (statements) for that grammar
- Search / matching with regexes
  - The pattern is applied to one or more strings
    - Words, lines, etc.
  - Matches or not, or
  - Find next (or all) matching string(s) (e.g. line in a file)
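The two uses listed above ("matches or not" and "find all") can be sketched with Python's standard `re` module. The pattern and the sample text are made up for illustration; they are not from the talk.

```python
import re

# Illustrative pattern (an assumption): an 'a' followed by one or more 'b's.
pattern = re.compile(r"ab+")

def matches(s):
    """Matches or not: does the whole string fit the pattern?"""
    return pattern.fullmatch(s) is not None

def matching_lines(text):
    """Find all: return the lines that contain a match anywhere."""
    return [line for line in text.splitlines() if pattern.search(line)]

sample = "abb is here\nnothing\nxxabxx"
print(matches("abbb"))         # True
print(matches("ba"))           # False
print(matching_lines(sample))  # ['abb is here', 'xxabxx']
```

`fullmatch` tests the entire string, while `search` looks for the pattern anywhere inside it; which one is wanted depends on whether the string is a single word or a line of text.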

3 Website for this Presentation
http://www.cs.virginia.edu/~horton/eels

4 How to Express Patterns
- Done live on board without slides, and with demo

5 Theoretical Background
- The following might interest those who want to see how people in math and CS think about the theoretical aspects of regular expressions

6 Phrase Structured Grammars
- A phrase structured grammar G is a four-tuple (V, T, S, P) where:
  - V is the vocabulary
  - T is the set of terminals
  - S is the start symbol
  - P is the set of productions
- T is a subset of V
- The set V − T is the set N, the non-terminals
- Productions are literally the way in which one string can replace (or produce) another
- The language of G is the set of all strings of terminals derivable from S

7 Types Of Languages
- Types are distinguished by the form of the productions in the grammars that generate them
- Classification introduced by Chomsky (the Chomsky hierarchy)
- Type 0
- Type 1: Context-sensitive languages
- Type 2: Context-free languages
  - Productions: LHS is A (i.e., a single non-terminal)
- Type 3: Regular languages
  - Productions: LHS is A; RHS is a or aB, where A and B are non-terminals and a is a terminal

8 Type 3 Languages
- The REGULAR LANGUAGES, i.e. the languages described by REGULAR EXPRESSIONS
- Productions: LHS is A; RHS is a or aB, where:
  - A and B are non-terminals
  - a is a terminal
  - This form for the RHS defines the REGULAR LANGUAGES
- Simplest kind of formal language structure
- Useful for defining things in CS
  - File name completion
  - Search patterns

9 Type 3 Language Example
- V = {a, b, A, B, S}
- T = {a, b}
- N = {A, B, S}
- Start symbol: S
- P:
  S → aB
  S → bA
  A → aA
  A → a
  B → bB
  B → b
- So what is the language? An “a” followed by one or more “b”s, or a “b” followed by one or more “a”s
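One way to check the answer is to expand the six productions mechanically and compare the result with the regular expression `ab+|ba+`. The following is a minimal sketch; the function name `derive` and the length bound are choices made here, not part of the slides.

```python
import re
from collections import deque
from itertools import product

# The six productions of the example grammar; uppercase letters are non-terminals.
PRODUCTIONS = {
    "S": ["aB", "bA"],
    "A": ["aA", "a"],
    "B": ["bB", "b"],
}

def derive(max_len):
    """Return every terminal string of length <= max_len derivable from S."""
    results = set()
    queue = deque(["S"])
    while queue:
        form = queue.popleft()
        # In this grammar a non-terminal can only appear as the last symbol.
        if form[-1].islower():
            results.add(form)
            continue
        for rhs in PRODUCTIONS[form[-1]]:
            new_form = form[:-1] + rhs
            if len(new_form) <= max_len:
                queue.append(new_form)
    return results

# Independent check: filter all strings over {a, b} with the regex.
regex = re.compile(r"ab+|ba+")
by_regex = {
    "".join(p)
    for n in range(1, 5)
    for p in product("ab", repeat=n)
    if regex.fullmatch("".join(p))
}

print(sorted(derive(4)))       # ['ab', 'abb', 'abbb', 'ba', 'baa', 'baaa']
print(derive(4) == by_regex)   # True
```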

10 Quick Bits Of Notation
- x* (aka the Kleene star or closure) means the set of elements with zero or more x’s
  - e.g. ‘a’* = {ε, a, aa, aaa, aaaa, aaaaa, …}, where ε is the empty string
- x+ means the set of elements with one or more x’s
  - e.g. ‘a’+ = {a, aa, aaa, aaaa, aaaaa, …}
- x^m means exactly m x’s
- x | y means x or y
  - e.g. a | b
- x can be a set, in which case the result is concatenation of set elements
  - e.g. {‘a’, ‘b’}* = {ε, a, b, aa, ab, ba, bb, aaa, aab, aba, baa, abb, bab, bba, …}
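The star and plus sets are infinite, but their shortest members can be listed. The sketch below enumerates them up to a length bound; the helper names `star`, `plus`, and `power` are hypothetical, and Python's empty string `''` plays the role of the empty string.

```python
from itertools import product

def star(symbols, max_len):
    """Members of symbols* with at most max_len symbols, shortest first."""
    return ["".join(p) for n in range(max_len + 1) for p in product(symbols, repeat=n)]

def plus(symbols, max_len):
    """Members of symbols+ : the same as star, minus the empty string."""
    return [s for s in star(symbols, max_len) if s]

def power(x, m):
    """x^m: exactly m copies of x."""
    return x * m

print(star(["a"], 3))       # ['', 'a', 'aa', 'aaa']
print(plus(["a"], 3))       # ['a', 'aa', 'aaa']
print(power("a", 4))        # aaaa
print(star(["a", "b"], 2))  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```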

11 Quick Bits Of Notation
- These ideas are used to specify regular languages:
  - (a | b | c)*
    - Examples include: aabbbccc, abcabc, aaccbbb
  - (ab+ | ba+)
    - Examples include: ab, abbb, baaa, ba, baaaaaaa
    - This is the example we looked at earlier
- Regular languages occur all the time
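The slide's example strings can be checked directly in Python, whose `re` syntax happens to use the same `*`, `+`, and `|` operators (written without the spaces). The helper `in_language` is a name chosen here for illustration.

```python
import re

any_abc = re.compile(r"(a|b|c)*")
ab_or_ba = re.compile(r"ab+|ba+")

def in_language(regex, s):
    """True when the whole string s belongs to the language of regex."""
    return regex.fullmatch(s) is not None

for s in ["aabbbccc", "abcabc", "aaccbbb"]:
    print(s, in_language(any_abc, s))    # all True

for s in ["ab", "abbb", "baaa", "ba", "baaaaaaa"]:
    print(s, in_language(ab_or_ba, s))   # all True

print(in_language(any_abc, "abd"))       # False: 'd' is not in the alphabet
print(in_language(ab_or_ba, "abab"))     # False
```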

12 Finite State Automata
- A finite-state automaton is a five-tuple:
  - I: a set of symbols, the input alphabet
    - Literally the set of input symbols
  - S: a set of states that it can be in
  - S_0: a designated initial state
  - A: a set of designated states called the accepting states
  - N: the next-state function, N: S × I → S
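The five-tuple maps naturally onto a small data structure. The sketch below is one possible encoding, with the next-state function stored as a dictionary; the class name and the example machine (binary strings ending in 1) are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Automaton:
    alphabet: set      # I, the input alphabet
    states: set        # S, the set of states
    start: str         # S_0, the initial state
    accepting: set     # A, the accepting states
    next_state: dict   # N, stored as {(state, symbol): state}

    def accepts(self, word):
        """Run the automaton over word and report whether it ends in A."""
        state = self.start
        for symbol in word:
            if symbol not in self.alphabet:
                raise ValueError(f"symbol {symbol!r} is not in the input alphabet")
            state = self.next_state[(state, symbol)]
        return state in self.accepting

# Hypothetical example machine: accepts binary strings that end in 1.
ends_in_one = Automaton(
    alphabet={"0", "1"},
    states={"q0", "q1"},
    start="q0",
    accepting={"q1"},
    next_state={
        ("q0", "0"): "q0", ("q0", "1"): "q1",
        ("q1", "0"): "q0", ("q1", "1"): "q1",
    },
)

print(ends_in_one.accepts("0101"))  # True
print(ends_in_one.accepts("10"))    # False
```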

13 Example: Vending Machine
[State diagram. States: $0 Deposited, $0.25 Deposited, $0.50 Deposited, $0.75 Deposited, $1.00 Deposited. Transitions are labeled $0.25 and $0.50.]
- Based on Epp, page 746 but simpler

14 Example: Vending Machine
[State diagram. States: $0 Deposited, $0.25 Deposited, $0.50 Deposited, $0.75 Deposited, $1.00 Deposited. Transitions are labeled $0.25 and $0.50.]
- Based on Epp, page 746 but simpler

15 Example: Parity Checking
[State diagram. States: Odd and Even; transitions are labeled 1 and 0; one state is marked "Initial And Accepting State".]
- Example strings: 1010101010, 1100110010
- This is just a recognizer for strings in a language
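A parity checker of this kind can be sketched as a transition table. The transcript does not preserve the diagram, so the sketch assumes the usual layout: a 1 switches between Even and Odd, a 0 leaves the state unchanged, and Even is the initial and accepting state.

```python
# Assumed transition table: reading '1' flips the state, reading '0' keeps it.
TRANSITIONS = {
    ("Even", "0"): "Even",
    ("Even", "1"): "Odd",
    ("Odd", "0"): "Odd",
    ("Odd", "1"): "Even",
}

def has_even_parity(bits, start="Even", accepting=("Even",)):
    """Accept when the string contains an even number of 1s (assumed convention)."""
    state = start
    for bit in bits:
        state = TRANSITIONS[(state, bit)]  # KeyError for anything but '0' or '1'
    return state in accepting

print(has_even_parity("1100"))        # True
print(has_even_parity("1010101010"))  # False
print(has_even_parity("1100110010"))  # False
```

Both example strings from the slide contain five 1s, so under these assumptions the machine finishes in the Odd state for each of them.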

16 Language Recognizers
- Kleene’s theorem: The set of languages defined by type 3 (regular) grammars is identical to the set of languages accepted by finite-state automata
  - Thus, for any regular language there is a finite-state automaton that recognizes it
- Another theorem: The set of languages defined by type 2 (context-free) grammars is identical to the set of languages accepted by pushdown automata
  - Thus, for any context-free language, there is a pushdown automaton that recognizes it
- A pushdown automaton is a finite-state automaton supplemented with a pushdown stack
- Really cool thing: given a context-free or regular language, there are programs (parser generators) that will build the automaton for us!
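One direction of the regular-grammar result can be illustrated with a standard textbook construction: each non-terminal becomes a state, a production A → aB becomes a transition from A to B on a, and A → a becomes a transition to one extra accepting state. The sketch below applies it to the type 3 example grammar from slide 9; the function names and the state name `FINAL` are choices made here, and the result is a nondeterministic automaton, simulated by tracking a set of current states.

```python
FINAL = "FINAL"  # extra accepting state added by the construction (name chosen here)

def grammar_to_nfa(productions, start):
    """Build an NFA from productions of the form A -> a or A -> aB."""
    transitions = {}  # (state, symbol) -> set of possible next states
    for lhs, alternatives in productions.items():
        for rhs in alternatives:
            symbol = rhs[0]
            target = rhs[1] if len(rhs) == 2 else FINAL
            transitions.setdefault((lhs, symbol), set()).add(target)
    return transitions, start, {FINAL}

def nfa_accepts(nfa, word):
    """Simulate the NFA by following every possible transition at once."""
    transitions, start, accepting = nfa
    current = {start}
    for symbol in word:
        current = {t for s in current for t in transitions.get((s, symbol), set())}
    return bool(current & accepting)

# The example grammar from slide 9.
productions = {
    "S": ["aB", "bA"],
    "A": ["aA", "a"],
    "B": ["bB", "b"],
}
nfa = grammar_to_nfa(productions, "S")

print(nfa_accepts(nfa, "abbb"))  # True
print(nfa_accepts(nfa, "ba"))    # True
print(nfa_accepts(nfa, "abab"))  # False
```

This only covers grammars whose right-hand sides are exactly one terminal optionally followed by one non-terminal, which is the form given on the slides.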

