Digital State Machines


1 Digital State Machines
Regular Expressions & Languages

2 Chapter Outline
Regular Expressions
Basic Regular Expression Patterns
Disjunction, Grouping and Precedence
Examples
Advanced Operators
Regular Expression Substitution, Memory and ELIZA
Summary
28 November 2018 Veton Këpuska

3 Regular Expressions (RE)
An algebraic description of the languages accepted by finite state automata. Regular expressions define exactly the same languages that the various forms of automata describe: the regular languages. Regular expressions (REs) offer a declarative way to express the strings we want to accept; finite state automata (FSA) do not. REs serve as the input language for many systems that process strings:
Search commands such as UNIX grep (egrep, etc.) for finding strings, as well as web browsers, text-formatting systems, etc. Such search systems convert REs into FSAs (D-FSA or N-FSA).
Lexical-analyzer generators, such as LEX or FLEX.
Compilers, and the language-modeling system of a speech recognizer.
Grammar and spell checkers.

4 FSA, RE and Regular Languages
[Figure: relationship among regular expressions, finite automata, and regular languages]

5 The Operators of Regular Expressions
Regular expressions denote languages. For example, 01*+10* denotes the language consisting of all strings that are either a 0 followed by any number of 1's, {0, 01, 011, 0111, …}, or a 1 followed by any number of 0's, {1, 10, 100, 1000, 10000, …}.
Operations on regular languages that regular expressions represent. Let L1 = {10, 001, 111} and L2 = {ε, 001}. Then:
The union L1 ∪ L2 (also called the disjunction of L1 and L2) is {ε, 10, 001, 111}.
The concatenation is L1L2 = {xy | x ∈ L1, y ∈ L2}; here L1L2 = {10, 001, 111, 10001, 001001, 111001}.
The closure (also called star, *, or Kleene closure) of a language L is L* = L^0 ∪ L^1 ∪ L^2 ∪ …, the union of L^i over all i ≥ 0, where L^i is the concatenation of i copies of L.
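For finite languages, the union and concatenation above can be checked mechanically. Below is a minimal Python sketch. It assumes a language is represented as a set of strings, with the empty string "" standing for ε. The helper names `union` and `concat` are illustrative only.

```python
# Finite languages as Python sets of strings; "" plays the role of ε.
EPS = ""


def union(l1, l2):
    """L1 ∪ L2: every string that is in L1 or in L2."""
    return l1 | l2


def concat(l1, l2):
    """L1L2 = {xy | x in L1, y in L2}."""
    return {x + y for x in l1 for y in l2}


L1 = {"10", "001", "111"}
L2 = {EPS, "001"}

print(sorted(union(L1, L2)))   # ['', '001', '10', '111']
print(sorted(concat(L1, L2)))  # ['001', '001001', '10', '10001', '111', '111001']
```

Because ε is in L2, every string of L1 also appears in L1L2 unchanged.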

6 Example Let L = {0, 11}. L^0 = {ε}, independent of what language L is.
L^1 = L represents the choice of one string from L, so L^0 ∪ L^1 = {ε, 0, 11}. L^2 = {00, 011, 110, 1111}. L^3 = {000, 0011, 0110, 01111, 1100, 11011, 11110, 111111}. To compute L* we must compute L^i for each i; here L^i has 2^i members. The union of the infinitely many terms L^i is generally an infinite language, as L* is in this example.

7 Example Let L = {ε, 0, 00, 000, …}, the set of all strings consisting only of 0's. L is an infinite language. L^0 = {ε}, independent of what language L is. L^1 = L represents the choice of one string from L, so L^0 ∪ L^1 = {ε, 0, 00, 000, 0000, …} = L. L^2 = {ε, 0, 00, 000, 0000, …} = L, L^3 = L, and so on, so L* = L^0 ∪ L^1 ∪ L^2 ∪ … = L. The empty set ∅ is one of only two languages whose closure is not infinite: ∅^0 = {ε} and ∅^i = ∅ for every i ≥ 1, so ∅* = {ε}. (The other is {ε}, since {ε}* = {ε}.) 

8 Distinction of Star (Σ*) and Closure (L*) Operator
Star Σ*: Σ* forms all strings whose symbols are chosen from the alphabet Σ. The closure operator L* is essentially the same, with a subtle difference: Σ is a set of symbols, while L is a set of strings. Let L be the language that contains, for each symbol a in Σ, the string a of length 1. Then Σ* and L* denote the same language.

9 Building Regular Expressions
The algebra of regular expressions follows the pattern of classical algebra. Constants and variables denote languages, and the operators are union, concatenation (product), and star (closure). The language that a regular expression E represents is written L(E). Regular expressions are defined recursively:
BASIS: The constants ε and ∅ are regular expressions, denoting the languages L(ε) = {ε} and L(∅) = ∅ respectively. If a is any symbol, then a is a regular expression, with L(a) = {a}. A variable, typically capitalized and italic (e.g., L), represents any language.

10 Building Regular Expressions
INDUCTION: If E and F are regular expressions, then:
E+F is a regular expression denoting their union: L(E+F) = L(E) ∪ L(F).
EF is a regular expression denoting their concatenation: L(EF) = L(E)L(F). A dot can optionally be used to denote concatenation, on languages or in a regular expression; the regular expression 0.1 is the same as 01 and denotes the language {01}.
E* is a regular expression denoting the closure of L(E): L(E*) = (L(E))*.
(E) is also a regular expression, denoting the same language as E: L((E)) = L(E).
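The recursive definition maps directly onto a recursive function with one case per rule. The following is a minimal sketch and not a standard library API; all class names are hypothetical. It computes the strings of length at most n in L(E) by structural recursion. Parentheses (E) need no separate case, because the tree structure already records the grouping.

```python
from dataclasses import dataclass


# One class per case of the recursive definition (names are illustrative).
@dataclass(frozen=True)
class Eps:        # ε, denoting {ε}
    pass


@dataclass(frozen=True)
class Empty:      # ∅, denoting the empty language
    pass


@dataclass(frozen=True)
class Sym:        # a symbol a, denoting {a}
    a: str


@dataclass(frozen=True)
class Plus:       # E+F, denoting L(E) ∪ L(F)
    e: object
    f: object


@dataclass(frozen=True)
class Cat:        # EF, denoting L(E)L(F)
    e: object
    f: object


@dataclass(frozen=True)
class Star:       # E*, denoting (L(E))*
    e: object


def lang(r, n):
    """All strings of length <= n in L(r), by structural recursion on r."""
    if isinstance(r, Eps):
        return {""}
    if isinstance(r, Empty):
        return set()
    if isinstance(r, Sym):
        return {r.a} if len(r.a) <= n else set()
    if isinstance(r, Plus):
        return lang(r.e, n) | lang(r.f, n)
    if isinstance(r, Cat):
        return {x + y for x in lang(r.e, n) for y in lang(r.f, n) if len(x + y) <= n}
    if isinstance(r, Star):
        base = lang(r.e, n)
        result, frontier = {""}, {""}
        while frontier:  # keep appending strings of L(E) until nothing new fits in n
            frontier = {x + y for x in frontier for y in base if len(x + y) <= n} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {r!r}")


# 01* + 10*
r = Plus(Cat(Sym("0"), Star(Sym("1"))), Cat(Sym("1"), Star(Sym("0"))))
print(sorted(lang(r, 3)))  # ['0', '01', '011', '1', '10', '100']
```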

11 Example Develop a regular expression for the language consisting of the single string 01. 0 and 1 are expressions denoting the languages {0} and {1}; concatenating the two expressions gives the regular expression 01 for the language {01}. As a general rule, if we want a regular expression for the language consisting of only the string w, we use w itself as the regular expression.
Next, write a regular expression for the set of strings that consist of alternating 0's and 1's. From the above, (01)* is a first attempt. Note 1: 01* ≠ (01)*. Note 2: L((01)*) is not exactly what we want, since it misses strings that begin with 1 and/or end with 0. The complete answer is (01)*+(10)*+1(01)*+0(10)*, where the "+" operator denotes the union of the corresponding languages.
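One way to gain confidence in an answer like this is to compare it against a brute-force definition on all short strings. The sketch below assumes Python's `re` syntax, in which the formal union "+" is written "|" and `re.fullmatch` requires the whole string to match. "Alternating" is taken to mean that no two adjacent symbols are equal.

```python
import itertools
import re

# Formal "+" (union) is written "|" in Python's re syntax.
ALTERNATING = re.compile(r"(01)*|(10)*|1(01)*|0(10)*")


def is_alternating(w):
    """True when no two adjacent symbols are equal."""
    return "00" not in w and "11" not in w


for n in range(9):
    for bits in itertools.product("01", repeat=n):
        w = "".join(bits)
        assert (ALTERNATING.fullmatch(w) is not None) == is_alternating(w), w

# Note 1 above: 01* and (01)* differ, e.g. on 0101.
print(bool(re.fullmatch(r"(01)*", "0101")), bool(re.fullmatch(r"01*", "0101")))  # True False
```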

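12 Example Alternate solution: (ε+1)(01)*(ε+0). Note: L(ε+1) = L(ε) ∪ L(1) = {ε} ∪ {1} = {ε, 1}, so ε+1 makes a leading 1 optional; likewise, ε+0 makes a trailing 0 optional.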
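The factored form (ε+1)(01)*(ε+0) should denote the same language as the four-way union. Below is a brute-force sketch that checks this on all binary strings up to length 10. It assumes Python's `re` syntax, where an optional symbol such as ε+1 is written `1?`.

```python
import itertools
import re

UNION_FORM = re.compile(r"(01)*|(10)*|1(01)*|0(10)*")
# (ε+1)(01)*(ε+0): since L(ε+1) = {ε, 1}, the factor ε+1 is written 1? here.
FACTORED_FORM = re.compile(r"1?(01)*0?")

for n in range(11):
    for bits in itertools.product("01", repeat=n):
        w = "".join(bits)
        assert bool(UNION_FORM.fullmatch(w)) == bool(FACTORED_FORM.fullmatch(w)), w
```

A check over finitely many strings supports the equivalence but does not prove it. The algebraic laws later in these slides are the tool for an actual proof.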

13 Precedence of Regular Expression Operators
The * operator has the highest precedence, followed by concatenation (the dot operator), and then union (+). The order of operations can be controlled by grouping with parentheses "()". Example: 01*+1 is grouped as (0(1*))+1. Other groupings, such as (01)*+1 or 0(1*+1), must be written with explicit parentheses.
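The three groupings denote different languages, which becomes visible when a few strings are tested against each. The sketch below assumes Python's `re` syntax, which writes union as "|" and follows the same precedence of star, then concatenation, then union.

```python
import re

# Python writes union as "|"; * binds tightest, then concatenation, then union,
# so 01*|1 means (0(1*))|1.
PATTERNS = {
    "01*+1": r"01*|1",
    "(01)*+1": r"(01)*|1",
    "0(1*+1)": r"0(1*|1)",
}


def accepted(w):
    """Which of the three expressions accept the whole string w."""
    return {name: re.fullmatch(p, w) is not None for name, p in PATTERNS.items()}


for w in ["", "0", "1", "011", "0101"]:
    print(repr(w), accepted(w))
```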

14 Exercise Examples Exercise 3.1.1:
Write regular expressions for the following languages:
a) The set of strings over alphabet {a, b, c} containing at least one a and at least one b. A first attempt such as aba*b*c* covers only strings that begin with ab; what about other combinations? Requiring an adjacent ab or ba, as in (a+b+c)*(ab+ba)(a+b+c)*, still misses strings such as acb. A correct answer is (a+b+c)*a(a+b+c)*b(a+b+c)* + (a+b+c)*b(a+b+c)*a(a+b+c)*.
b) The set of strings of 0's and 1's whose tenth symbol from the right end is 1: (0+1)*1(0+1)(0+1)…(0+1), with nine copies of (0+1) after the 1.
c) The set of strings of 0's and 1's with at most one pair of consecutive 1's (here 111 counts as containing two pairs): (0+10)*(ε+1+11)(0+01)*.
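The three exercise answers below are checked by brute force against direct definitions of the languages. Python translations are used, so "+" becomes "|" and nine copies of (0+1) are written `[01]{9}`. For (c), "at most one pair" is assumed to count overlapping occurrences, so 111 contains two pairs. The expressions checked are (a+b+c)*a(a+b+c)*b(a+b+c)* + (a+b+c)*b(a+b+c)*a(a+b+c)*, then (0+1)*1(0+1)^9, and finally (0+10)*(ε+1+11)(0+01)*.

```python
import itertools
import re

AT_LEAST_A_AND_B = re.compile(r"[abc]*a[abc]*b[abc]*|[abc]*b[abc]*a[abc]*")
TENTH_FROM_RIGHT = re.compile(r"[01]*1[01]{9}")
AT_MOST_ONE_11 = re.compile(r"(0|10)*(1|11)?(0|01)*")


def strings(alphabet, max_len):
    """Every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for t in itertools.product(alphabet, repeat=n):
            yield "".join(t)


def count_11(w):
    """Occurrences of 11, counting overlaps (so 111 has two)."""
    return sum(w[i:i + 2] == "11" for i in range(len(w) - 1))


for w in strings("abc", 6):
    assert bool(AT_LEAST_A_AND_B.fullmatch(w)) == ("a" in w and "b" in w), w
for w in strings("01", 12):
    assert bool(TENTH_FROM_RIGHT.fullmatch(w)) == (len(w) >= 10 and w[-10] == "1"), w
    assert bool(AT_MOST_ONE_11.fullmatch(w)) == (count_11(w) <= 1), w
```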

15 Finite Automata and Regular Expressions
Regular expressions describe languages in a fundamentally different form from finite automata. However, both describe the same set of languages, the regular languages. To show this, we must show two things:
Every language defined by one of these automata is also defined by a regular expression. For this direction, it suffices to assume that the language is accepted by some DFSA.
Every language defined by a regular expression is defined by one of these automata. For this direction, it suffices to show that there is an NFSA with ε-transitions accepting the same language.

16 Finite Automata and Regular Expressions
[Figure: ε-NFSA, NFSA, DFSA, and RE, with the conversions among them] Plan for showing the equivalence of four different notations for regular languages.

17 Converting Regular Expressions to Automata
We can show that every language L that is L(R) for some regular expression R is also L(E) for some ε-NFSA E. Start by showing how to construct automata for the basis expressions: ε, ∅, and single symbols a. Then show how to combine these automata into larger automata that accept the union, concatenation, or closure of the languages accepted by the smaller ones.

18 Converting Regular Expressions to Automata
Theorem: Every language defined by a regular expression is also defined by a finite automaton. Proof: Suppose L = L(R) for a regular expression R. We show that L = L(E) for some ε-NFSA E with exactly one accepting state, no arcs into the initial state, and no arcs out of the accepting state. The proof is by structural induction on R, following the recursive definition of regular expressions.

19 Converting Regular Expressions to Automata
BASIS: For ε, the language of the automaton is {ε}. For ∅, the construction has no path from the start state to the accepting state, so ∅ is the language of the automaton. For a symbol a, the language of the automaton is L(a), which consists of the one string a.

20 Converting Regular Expressions to Automata
INDUCTION: Assume that the statement of the theorem is true for the immediate subexpressions of a given regular expression. There are three cases: the automaton for R+S accepts L(R) ∪ L(S), the automaton for RS accepts L(R)L(S), and the automaton for R* accepts L(R*).

21 Example Convert (0+1)*1(0+1) to an ε-NFSA. [Figure: automata built in stages for (0+1), (0+1)*, and (0+1)*1(0+1)]
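The inductive construction can be sketched in a few dozen lines of Python. In this sketch, states are integers, arcs are (source, label, destination) triples, and ε-arcs carry the label `None`. Every function name is hypothetical. Each case keeps the invariants from the proof: exactly one accepting state, no arcs into the start state, and no arcs out of the accepting state. The last line builds the automaton for the example (0+1)*1(0+1), which accepts strings whose second symbol from the right is 1.

```python
import itertools

EPS = None                 # label used for ε-arcs
_ids = itertools.count()   # source of fresh state numbers


class NFA:
    """ε-NFSA with one start state, one accepting state, and a list of arcs."""

    def __init__(self, start, accept, arcs):
        self.start, self.accept, self.arcs = start, accept, arcs  # arcs: (src, label, dst)


def _fresh():
    return next(_ids), next(_ids)


# BASIS
def epsilon():
    s, f = _fresh()
    return NFA(s, f, [(s, EPS, f)])


def empty():
    s, f = _fresh()
    return NFA(s, f, [])  # no path from start to accept


def symbol(a):
    s, f = _fresh()
    return NFA(s, f, [(s, a, f)])


# INDUCTION
def union(r, s):
    i, f = _fresh()
    arcs = r.arcs + s.arcs + [(i, EPS, r.start), (i, EPS, s.start),
                              (r.accept, EPS, f), (s.accept, EPS, f)]
    return NFA(i, f, arcs)


def concat(r, s):
    # R's accepting state is linked to S's start state by an ε-arc.
    return NFA(r.start, s.accept, r.arcs + s.arcs + [(r.accept, EPS, s.start)])


def star(r):
    i, f = _fresh()
    arcs = r.arcs + [(i, EPS, r.start), (i, EPS, f),
                     (r.accept, EPS, r.start), (r.accept, EPS, f)]
    return NFA(i, f, arcs)


def eclose(nfa, states):
    """All states reachable from `states` using ε-arcs alone."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in nfa.arcs:
            if src == q and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(nfa, w):
    current = eclose(nfa, {nfa.start})
    for c in w:
        current = eclose(nfa, {d for s, label, d in nfa.arcs if s in current and label == c})
    return nfa.accept in current


def zero_or_one():
    return union(symbol("0"), symbol("1"))  # fresh states on every call


# (0+1)*1(0+1)
example = concat(concat(star(zero_or_one()), symbol("1")), zero_or_one())
```

Each occurrence of (0+1) gets its own freshly built sub-automaton. Reusing one object for both occurrences would merge their states.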

22 Applications of Regular Expressions

23 Lexical Analysis (lex, flex, yacc) http://dinosaur.compilertools.net/
Finding Patterns in Text

24 Regular Expressions Formally, a regular expression is an algebraic notation for characterizing a set of strings. Thus regular expressions can be used to specify search strings as well as to define a language in a formal way. A regular expression search requires a pattern that we want to search for and a corpus of text to search through. When we give a search pattern, we assume that the search engine returns each line of the document that contains a match; this is what the UNIX grep command does. We underline the exact part of the string that matches the regular expression. A search can be designed to return all matches to a regular expression or only the first match; we show only the first match.

25 Basic Regular Expression Patterns
The simplest kind of regular expression is a sequence of simple characters, e.g. /woodchuck/, /Buttercup/, /!/.
RE              Example patterns matched
/woodchucks/    “interesting links to woodchucks and lemurs”
/a/             “Mary Ann stopped by Mona’s”
/Claire says,/  “Dagmar, my gift please,” Claire says,”
/song/          “all our pretty songs”
/!/             “You’ve left the burglar behind again!” said Nori
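The grep-style convention of returning each matching line along with its first match can be imitated with Python's `re.search`. Below is a minimal sketch; `grep_first` is a hypothetical helper, not a standard function.

```python
import re


def grep_first(pattern, lines):
    """Like grep: yield each line that contains a match, plus its first match."""
    regex = re.compile(pattern)
    for line in lines:
        m = regex.search(line)
        if m:
            yield line, m.group()


corpus = [
    "interesting links to woodchucks and lemurs",
    "Mary Ann stopped by Mona’s",
    "all our pretty songs",
    "nothing to see",
]
for line, matched in grep_first(r"song", corpus):
    print(f"{matched!r} in {line!r}")
```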

26 Basic Regular Expression Patterns
Regular expressions are case sensitive: /s/ is distinct from /S/, so /woodchucks/ will not match “Woodchucks”. Disjunction of characters uses the square braces “[” and “]”:
RE                Match                    Example patterns
/[wW]oodchuck/    Woodchuck or woodchuck   “Woodchuck”
/[abc]/           ‘a’, ‘b’, or ‘c’         “In uomini, in soldati”
/[1234567890]/    Any digit                “plenty of 7 to 5”

27 Basic Regular Expression Patterns
Specifying a range in regular expressions: “-”
RE        Match                  Example patterns matched
/[A-Z]/   An uppercase letter    “we should call it ‘Drenched Blossoms’”
/[a-z]/   A lowercase letter     “my beans were impatient to be hoed!”
/[0-9]/   A single digit         “Chapter 1: Down the Rabbit Hole”

28 Basic Regular Expression Patterns
Negative specification, i.e., what a single character cannot be: “^”. If the first symbol after the open square brace “[” is “^”, the resulting pattern is negated. Example: /[^a]/ matches any single character (including special characters) except a.
RE        Match (single characters)   Example patterns matched
/[^A-Z]/  Not an uppercase letter     “Oyfn pripetchik”
/[^Ss]/   Neither ‘S’ nor ‘s’         “I have no exquisite reason for ’t”
/[^\.]/   Not a period                “our resident Djinn”
/[e^]/    Either ‘e’ or ‘^’           “look up ^ now”
/a^b/     The pattern ‘a^b’           “look up a^b now”
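The character-class examples from the last three slides can be replayed with Python's `re.search`. In the sketch below, `first_match` is a hypothetical helper. One dialect difference matters here. In Python (as in Perl and egrep), a "^" outside square brackets is always an anchor, so the pattern a^b never matches literally. A literal caret there must be escaped as `\^`.

```python
import re


def first_match(pattern, text):
    """Return the first substring of text matched by pattern, or None."""
    m = re.search(pattern, text)
    return m.group() if m else None


assert first_match(r"woodchucks", "Woodchucks") is None        # case sensitive
assert first_match(r"[wW]oodchuck", "Woodchuck") == "Woodchuck"
assert first_match(r"[abc]", "In uomini, in soldati") == "a"
assert first_match(r"[1234567890]", "plenty of 7 to 5") == "7"
assert first_match(r"[A-Z]", "we should call it 'Drenched Blossoms'") == "D"
assert first_match(r"[^A-Z]", "Oyfn pripetchik") == "y"
assert first_match(r"[^Ss]", "I have no exquisite reason for 't") == "I"
assert first_match(r"[^\.]", "our resident Djinn") == "o"
assert first_match(r"[e^]", "look up ^ now") == "^"
# Outside brackets, ^ is an anchor in Python, so a literal caret needs \^:
assert first_match(r"a^b", "look up a^b now") is None
assert first_match(r"a\^b", "look up a^b now") == "a^b"
```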

29 Basic Regular Expression Patterns
How do we specify both woodchuck and woodchucks? Optional character specification: /?/ means “the preceding character or nothing”.
RE              Match                     Example patterns matched
/woodchucks?/   woodchuck or woodchucks   “woodchuck”
/colou?r/       color or colour           “colour”

30 Basic Regular Expression Patterns
The question mark “?” can be thought of as “zero or one instances of the previous character”; it is a way to specify how many of something we want. Sometimes we need regular expressions that allow repetitions of things. For example, consider the language of (certain) sheep, which consists of strings that look like the following: baa! baaa! baaaa! baaaaa! baaaaaa! …

31 Basic Regular Expression Patterns
Any number of repetitions is specified by “*”, which means “zero or more occurrences of the previous character”. Examples: /aa*/ matches an a followed by zero or more a's; /[ab]*/ matches zero or more a's or b's, so it will match aaaa, abababa, or bbbb.

32 Basic Regular Expression Patterns
We now know enough to specify part of our regular expression for prices: multiple digits. The regular expression for an individual digit is /[0-9]/, and for an integer /[0-9][0-9]*/. Why not just /[0-9]*/? Because it also matches zero digits, i.e., the empty string. Writing “at least one” this way is annoying because it repeats the same pattern, so there is a special character for “one or more”: “+”. The regular expression for an integer then becomes /[0-9]+/, and the regular expression for the sheep language becomes /baaa*!/ or /baa+!/.
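The difference between * and + shows up on the empty string and on the shortest sheep strings. Below is a short sketch using Python's `re.fullmatch`, which requires the pattern to match the entire string.

```python
import re

INTEGER = re.compile(r"[0-9]+")   # equivalent to [0-9][0-9]*
SHEEP = re.compile(r"baa+!")      # baa!, baaa!, baaaa!, ...

assert re.fullmatch(r"[0-9]*", "") is not None   # * also accepts zero digits
assert INTEGER.fullmatch("") is None
assert INTEGER.fullmatch("1999")
for bleat in ["baa!", "baaa!", "baaaaaa!"]:
    assert SHEEP.fullmatch(bleat)
assert SHEEP.fullmatch("ba!") is None
assert re.fullmatch(r"baa*!", "ba!")             # the looser pattern also accepts ba!
```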

33 Basic Regular Expression Patterns
One very important special character is the period /./, a wildcard expression that matches any single character (except a newline). Example: to find any line in which a particular word (for example Veton) appears twice, use /Veton.*Veton/.
RE        Match                             Example patterns
/beg.n/   Any character between beg and n   begin, beg’n, begun

34 Repetition Metacharacters
Example text for every row: “The aerial acceleration alerted the ace pilot”.
*      Matches any number of occurrences of the previous character (zero or more). /ac*e/ matches “ae”, “ace”, “acce”, “accce”.
?      Matches at most one occurrence of the previous character (zero or one). /ac?e/ matches “ae” and “ace”.
+      Matches one or more occurrences of the previous character. /ac+e/ matches “ace”, “acce”, “accce”.
{n}    Matches exactly n occurrences of the previous character. /ac{2}e/ matches “acce”.
{n,}   Matches n or more occurrences of the previous character. /ac{2,}e/ matches “acce”, “accce”, etc.
{n,m}  Matches from n to m occurrences of the previous character. /ac{2,4}e/ matches “acce”, “accce”, and “acccce”.
.      Matches one occurrence of any character except the newline character. /a.e/ matches aae, aAe, abe, aBe, a1e, etc.
.*     Matches any string of characters up to a newline character.
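Applying each repetition pattern to the example sentence shows which of the listed strings actually occur there. The sketch below uses Python's `re.findall`, which returns all non-overlapping matches from left to right.

```python
import re

SENTENCE = "The aerial acceleration alerted the ace pilot"
PATTERNS = [r"ac*e", r"ac?e", r"ac+e", r"ac{2}e", r"ac{2,}e", r"ac{2,4}e", r"a.e"]

# re.findall returns every non-overlapping match, scanning left to right.
matches = {p: re.findall(p, SENTENCE) for p in PATTERNS}
for p, found in matches.items():
    print(p, found)
```

In this sentence, for example, /a.e/ finds “ale” inside “alerted” and “ace”. The “ae” at the start of “aerial” has no character between a and e, so /a.e/ does not match it.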

35 Anchors Anchors are special characters that anchor regular expressions to particular places in a string. The most common anchors are “^”, which matches the start of a line, and “$”, which matches the end of a line. Example: /^The/ matches the word “The” only at the start of a line. The caret “^” has three uses: /^xyz/ matches at the start of a line; [^xyz] inside square brackets denotes negation; and elsewhere, as in /[e^]/, it just means a caret. /⌴$/, where “⌴” stands for the space character, matches a space at the end of a line. /^The dog\.$/ matches a line that contains only the phrase “The dog.”.

36 Anchors /\b/ matches a word boundary; /\B/ matches a non-boundary.
/\bthe\b/ matches the word “the” but not the “the” inside “other”. A word is defined as any sequence of digits, underscores, or letters. /\b99/ will match the string 99 in “There are 99 bottles of beer on the wall” but NOT in “There are 299 bottles of beer on the wall”. It will match the 99 in “$99”, since 99 follows a “$”, which is not a digit, underscore, or letter.

37 Disjunction, Grouping and Precedence.
Suppose we need to search for texts about pets; specifically, we may be interested in cats and dogs. If we want to search for either the string “cat” or the string “dog”, we cannot use any of the constructs introduced so far, because square brackets “[]” match only a single character. A new operator defines disjunction: the pipe symbol “|”. /cat|dog/ matches either the string cat or the string dog.

38 Grouping In many instances it is necessary to group a sequence of characters so that it is treated as a unit. Example: to search for guppy and guppies, use /gupp(y|ies)/. Grouping is also useful in conjunction with the “*” operator, which applies only to the single preceding character, not to a whole sequence. Example: match “Column 1 Column 2 Column 3 …”. /Column⌴[0-9]+⌴*/ matches a single column, “Column 1 ”, while /(Column⌴[0-9]+⌴*)*/ matches “Column 1 Column 2 Column 3 …”.
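The disjunction and grouping examples translate directly to Python's `re` module, writing ⌴ as a literal space. The guppy case also shows that "|" has the lowest precedence: without parentheses, guppy|ies means “guppy” or “ies”.

```python
import re

assert re.findall(r"cat|dog", "my cat chased your dog") == ["cat", "dog"]
assert re.fullmatch(r"gupp(y|ies)", "guppy")
assert re.fullmatch(r"gupp(y|ies)", "guppies")
assert re.fullmatch(r"guppy|ies", "guppies") is None   # | has the lowest precedence

text = "Column 1 Column 2 Column 3 "
assert re.match(r"Column [0-9]+ *", text).group() == "Column 1 "
assert re.match(r"(Column [0-9]+ *)*", text).group() == text
```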

39 Operator Precedence Hierarchy
Operator precedence, from highest to lowest:
Parenthesis             ()
Counters                * + ? {}
Sequences and anchors   ^ $
Disjunction             |

40 Simple Example Problem statement: write an RE to find cases of the English article “the”. /the/ will miss “The”. /[tT]he/ will incorrectly match the embedded “the” in “amalthea”, “Bethesda”, “theology”, etc. /\b[tT]he\b/ solves this. Problem statement: we also want to find “the” when it has underscores or numbers nearby (“the_” or “the25”), which \b treats as word characters. In that case we need to specify instances with no alphabetic letters on either side of “the”: /[^a-zA-Z][tT]he[^a-zA-Z]/. This will not find “the” if it begins (or ends) the line; allowing for both gives /(^|[^a-zA-Z])[tT]he([^a-zA-Z]|$)/.
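The three candidate patterns behave differently on a handful of test strings. The sketch below compares them using Python's `re.search`; the test strings are illustrative.

```python
import re

NAIVE = re.compile(r"[tT]he")
WORD = re.compile(r"\b[tT]he\b")
NO_LETTERS = re.compile(r"(^|[^a-zA-Z])[tT]he([^a-zA-Z]|$)")

for text in ["The end", "theology", "other", "the_1", "the25"]:
    print(repr(text), bool(NAIVE.search(text)), bool(WORD.search(text)),
          bool(NO_LETTERS.search(text)))
```

The naive pattern accepts every string. \b rejects the embedded cases but also rejects “the_1” and “the25”. The letters-only version accepts those two while still rejecting “theology” and “other”.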

41 A More Complex Example Problem statement: build an application to help a user purchase a computer on the Web. The user might want “any PC with more than 1000 MHz and 80 Gb of disk space for less than $1000”. To solve the problem, we must be able to match expressions like 1000 MHz, 1 GHz, and 80 Gb, as well as dollar amounts like $1000.

42 Solution – Dollar Amounts
Regular expression for prices of whole dollar amounts: /\$[0-9]+/. The “$” must be escaped with a backslash, since an unescaped “$” is the end-of-line anchor. Adding fractions of dollars: /\$[0-9]+\.[0-9][0-9]/ or /\$[0-9]+\.[0-9]{2}/. The problem is that this RE matches only “$199.99” and not “$199”. To solve this, make the cents optional and make sure the dollar amount is delimited like a word: /(^|\W)\$[0-9]+(\.[0-9][0-9])?\b/. A leading \b would not work here, since there is no word boundary between a space and a “$”.
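The following sketch uses Python's `re` module to show why the escape and the (^|\W) prefix matter. It writes both groups as non-capturing `(?:...)` groups and wraps the price itself in one capturing group. As a result, `re.findall` returns just the price text rather than the preceding character.

```python
import re

# (?:^|\W) stands in for a leading word boundary; \$ is a literal dollar sign.
PRICE = re.compile(r"(?:^|\W)(\$[0-9]+(?:\.[0-9][0-9])?)\b")

print(PRICE.findall("Buy it for $199 or $199.99 today"))  # ['$199', '$199.99']

# An unescaped $ is the end-of-string anchor, so this can never match here:
assert re.search(r"$[0-9]+", "costs $199") is None
# \b cannot precede "$" after a space: both sides are non-word characters.
assert re.search(r"\b\$[0-9]+", "costs $199") is None
```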

43 Solution: Processor Speed
Processor speed in megahertz (MHz) or gigahertz (GHz): /\b[0-9]+⌴*(MHz|[Mm]egahertz|GHz|[Gg]igahertz)\b/, where ⌴* denotes “zero or more spaces”.

44 Solution: Disk Space Disk space is given in gigabytes (Gb) and memory size in megabytes (Mb).
Memory size: /\b[0-9]+⌴*(M[Bb]|[Mm]egabytes?)\b/
Disk space, which must allow optional fractions: /\b[0-9]+(\.[0-9]+)?⌴*(G[Bb]|[Gg]igabytes?)\b/

45 Solution: Operating Systems and Vendors
/\b(Windows⌴*(XP|Vista)?)\b/ /\b(Mac|Macintosh|Apple)\b/

46 Aliases for common sets of characters
Advanced operators:
RE   Expansion      Match                           Example patterns
\d   [0-9]          Any digit                       “Party of 5”
\D   [^0-9]         Any non-digit                   “Blue moon”
\w   [a-zA-Z0-9_]   Any alphanumeric or underscore  “Daiyu”
\W   [^\w]          A non-word character            “!!!!”
\s   [⌴\r\t\n\f]    Whitespace (space, tab)         “ ”
\S   [^\s]          Non-whitespace                  “in Concord”

47 Literal Matching of Special Characters & “\” Characters
RE   Match                          Example patterns
\*   An asterisk “*”                “K*A*P*L*A*N”
\.   A period “.”                   “Dr. Këpuska, I presume”
\?   A question mark “?”            “Would you like to light my candle?”
\n   A newline
\t   A tab
\r   A carriage return character
Some characters that need to be backslashed “\”.

48 Regular Expression Substitution, Memory, and ELIZA
Substitution is an important use of regular expressions. s/regexp1/regexp2/ allows a string characterized by one regular expression (regexp1) to be replaced by the string given by the second part (regexp2). Example: s/colour/color/. It is also important to be able to refer to a particular subpart of the string matching the first pattern. Example: to change “the 35 boxes” to “the <35> boxes”, use s/([0-9]+)/<\1>/, where “\1” refers to the text matched by the first parenthesized group of the first pattern.
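In Python, the role of s/pattern/replacement/ is played by `re.sub(pattern, replacement, text)`, and `\1` in the replacement refers back to the first group. One difference from sed and Perl is worth knowing. Plain s/// replaces only the first match on a line, whereas `re.sub` replaces every match unless `count=1` is passed.

```python
import re

print(re.sub(r"colour", "color", "my favourite colour"))   # my favourite color
print(re.sub(r"([0-9]+)", r"<\1>", "the 35 boxes"))        # the <35> boxes
print(re.sub(r"([0-9]+)", r"<\1>", "3 of 35", count=1))    # <3> of 35
```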

49 Regular Expression Substitution, Memory, and ELIZA
The parentheses and number operators can also be used to specify that a certain string or expression must occur twice in the text. Example: in “the Xer they were, the Xer they will be”, we want to constrain the two X's to be the same string: /[Tt]he (.*)er they were, the \1er they will be/. This RE will match “The bigger they were, the bigger they will be”, but not “The bigger they were, the faster they will be”. The number operator can be used with other numbers: if you match two different sets of parentheses, \2 means whatever matched the second set. /[Tt]he (.*)er they (.*), the \1er they \2/ will match “The bigger they were, the bigger they were”, but not “The bigger they were, the bigger they will be”.
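Backreferences work the same way inside Python patterns. After a successful match, `m.group(1)` and `m.group(2)` show what registers 1 and 2 captured.

```python
import re

SAME_X = re.compile(r"[Tt]he (.*)er they were, the \1er they will be")
assert SAME_X.search("The bigger they were, the bigger they will be")
assert SAME_X.search("The bigger they were, the faster they will be") is None

TWO_REGISTERS = re.compile(r"[Tt]he (.*)er they (.*), the \1er they \2")
m = TWO_REGISTERS.search("The bigger they were, the bigger they were")
print(m.group(1), m.group(2))   # bigg were
assert TWO_REGISTERS.search("The bigger they were, the bigger they will be") is None
```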

50 Registers Numbered memories are called registers: \1 – register 1

51 ELIZA Substitutions using memory are very useful in implementing simple natural-language understanding programs like ELIZA. Here is an example of a dialog with ELIZA:
User1: Men are all alike.
ELIZA1: IN WHAT WAY
User2: They’re always bugging us about something or other.
ELIZA2: CAN YOU THINK OF A SPECIFIC EXAMPLE
User3: Well, my boyfriend made me come here.
ELIZA3: YOUR BOYFRIEND MADE YOU COME HERE
User4: He says I’m depressed much of the time.
ELIZA4: I AM SORRY TO HEAR YOU ARE DEPRESSED.

52 ELIZA ELIZA worked by having a cascade of regular expression substitutions, each of which matched some part of the input line and changed it. The first substitutions changed all instances of “my” to “YOUR” and “I’m” to “YOU ARE”. The next set of substitutions looked for relevant patterns in the input and created an appropriate output:
s/.* YOU ARE (depressed|sad) .*/I AM SORRY TO HEAR YOU ARE \1/
s/.* YOU ARE (depressed|sad) .*/WHY DO YOU THINK YOU ARE \1/
s/.* all .*/IN WHAT WAY/
s/.* always .*/CAN YOU THINK OF A SPECIFIC EXAMPLE/

53 ELIZA Since multiple substitutions could apply to a given input, substitutions were assigned a rank and were applied in order. Creation of such patterns is addressed in Exercise 2.2.
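The two-stage cascade can be sketched in Python. This is a toy reconstruction of the rules listed above, not ELIZA's actual implementation. The rule lists, `respond`, and the echo fallback are all hypothetical. Ranking is modeled as "the first response rule whose pattern matches the whole line wins". Under that strict ranking, the second "depressed" rule is never reached. Straight ASCII apostrophes are assumed in the input.

```python
import re

# Stage 1: rewrite first-person forms in every input line.
PRONOUN_RULES = [
    (r"\bmy\b", "YOUR"),
    (r"\bI'm\b", "YOU ARE"),
]

# Stage 2: ranked response rules; the first one matching the whole line wins.
RESPONSE_RULES = [
    (r".* YOU ARE (depressed|sad) .*", r"I AM SORRY TO HEAR YOU ARE \1"),
    (r".* YOU ARE (depressed|sad) .*", r"WHY DO YOU THINK YOU ARE \1"),  # shadowed by the rule above
    (r".* all .*", "IN WHAT WAY"),
    (r".* always .*", "CAN YOU THINK OF A SPECIFIC EXAMPLE"),
]


def respond(line):
    for pattern, replacement in PRONOUN_RULES:
        line = re.sub(pattern, replacement, line)
    for pattern, template in RESPONSE_RULES:
        m = re.fullmatch(pattern, line)
        if m:
            return m.expand(template)  # fills in \1 from the match
    return line  # hypothetical fallback: echo the rewritten line


print(respond("Men are all alike."))
print(respond("They're always bugging us about something or other."))
print(respond("He says I'm depressed much of the time."))
```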

54 Algebraic Laws for Regular Expressions

55 Algebraic Laws for Regular Expressions
Regular expressions have a number of laws similar to the laws of arithmetic: a collection of laws that define when two regular expressions are equivalent. In arithmetic: Commutativity, x+y = y+x: switching the order of the operands does not change the result. Associativity, (xy)z = x(yz): the operands can be regrouped when the operator is applied twice.

56 Associativity and Commutativity
For L, M and N languages (defined by regular expressions or, equivalently, by FSA): Commutative law for union: L+M = M+L. Associative law for union: (L+M)+N = L+(M+N). Associative law for concatenation: (LM)N = L(MN).

57 Identities and Annihilators
Arithmetic: Identity: 0 is the identity for addition, 0+x = x+0 = x; 1 is the identity for multiplication, 1x = x1 = x. Annihilator: 0 is the annihilator for multiplication, 0x = x0 = 0.
Regular expressions: Identity for union: ∅+L = L+∅ = L. Identity for concatenation: εL = Lε = L. Annihilator for concatenation: ∅L = L∅ = ∅. These laws are important in the simplification of regular expressions.

58 Distributive Laws Regular Expressions
Arithmetic: A distributive law involves two operators. The most common is the distributive law of multiplication over addition: x(y+z) = xy+xz. Regular expressions: Left distributive law of concatenation over union: L(M+N) = LM+LN. Right distributive law of concatenation over union: (M+N)L = ML+NL.

59 Distributive Laws Theorem: If L, M, and N are any languages, then L(M ∪ N) = LM ∪ LN.
Proof: We show that a string w is in L(M ∪ N) if and only if it is in LM ∪ LN.
(Only if) If w is in L(M ∪ N), then w = xy, where x is in L and y is in M ∪ N, so y is in M or in N. If y is in M, then w = xy is in LM and hence in LM ∪ LN. If y is in N, then w = xy is in LN and hence in LM ∪ LN.
(If) If w is in LM ∪ LN, then w is in LM or in LN. If w is in LM, then w = xy with x in L and y in M, so y is in M ∪ N and w is in L(M ∪ N). If w is in LN, then w = xy with x in L and y in N, so y is in M ∪ N and w is in L(M ∪ N).

60 The Idempotent Law Arithmetic:
Common arithmetic operators are not idempotent: in general, x+x ≠ x and xx ≠ x. Regular expressions: union is idempotent, L+L = L.
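The laws from the last several slides can be spot-checked on small finite languages, with sets of strings standing for languages and "" standing for ε. Passing on a handful of samples does not prove a law; the proof above is what establishes the distributive law. The check does catch a misremembered law, such as treating ∅ as the annihilator for union.

```python
import itertools


def concat(l, m):
    """LM = {xy | x in L, y in M}."""
    return {x + y for x in l for y in m}


EMPTY, EPS = set(), {""}   # the languages ∅ and {ε}
SAMPLES = [set(), {""}, {"0"}, {"1", "01"}, {"", "10"}]

for L, M, N in itertools.product(SAMPLES, repeat=3):
    assert L | M == M | L                                       # commutativity of union
    assert (L | M) | N == L | (M | N)                           # associativity of union
    assert concat(concat(L, M), N) == concat(L, concat(M, N))   # associativity of concatenation
    assert EMPTY | L == L                                       # ∅ is the identity for union
    assert concat(EPS, L) == L == concat(L, EPS)                # ε is the identity for concatenation
    assert concat(EMPTY, L) == EMPTY == concat(L, EMPTY)        # ∅ annihilates concatenation
    assert concat(L, M | N) == concat(L, M) | concat(L, N)      # left distributive law
    assert concat(M | N, L) == concat(M, L) | concat(N, L)      # right distributive law
    assert L | L == L                                           # idempotence of union

# Concatenation, unlike union, is not commutative:
assert concat({"0"}, {"1"}) != concat({"1"}, {"0"})
```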

61 Laws Involving Closures
(L*)* = L*: closing an expression that is already closed does not change the language.
∅* = ε: the closure of ∅ contains only the string ε.
ε* = ε.
L+ = LL* = L*L, since L+ = L + LL + LLL + … and LL* = L(ε + L + LL + …) = L + LL + LLL + LLLL + ….
L* = ε + L + LL + LLL + … = ε + L+, i.e., L* = L+ + ε.
L? = ε + L.
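Closures are infinite, so the sketch below compares them only up to a length bound n. The truncation is exact for these identities: every piece of a string of length at most n also has length at most n. The helper names are illustrative, and passing the check on the sample languages supports the laws without proving them.

```python
def concat_upto(l, m, n):
    """Strings xy with x in L, y in M and len(xy) <= n."""
    return {x + y for x in l for y in m if len(x + y) <= n}


def star_upto(lang, n):
    """Strings of L* of length <= n."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = concat_upto(frontier, lang, n) - result
        result |= frontier
    return result


def plus_upto(lang, n):
    """Strings of L+ = L ∪ LL ∪ LLL ∪ ... of length <= n."""
    frontier = {x for x in lang if len(x) <= n}
    result = set(frontier)
    while frontier:
        frontier = concat_upto(frontier, lang, n) - result
        result |= frontier
    return result


N = 6
for L in [{"0", "11"}, {"", "10"}, {"01", "0"}, {"1"}]:
    S, P = star_upto(L, N), plus_upto(L, N)
    assert star_upto(S, N) == S                               # (L*)* = L*
    assert P == concat_upto(L, S, N) == concat_upto(S, L, N)  # L+ = LL* = L*L
    assert S == P | {""}                                      # L* = L+ + ε

assert star_upto(set(), N) == {""}   # ∅* = ε
assert star_upto({""}, N) == {""}    # ε* = ε
```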

62 End

