Natural Language Processing Lecture 4: Regular Expressions and Automata


1 Natural Language Processing Lecture 4: Regular Expressions and Automata

2 Definitions: A language is a set of strings. String: a sequence of letters, e.g. “cat”, “dog”, “house”, … Strings are defined over an alphabet.

3 Alphabets and Strings: We will use small alphabets.
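A minimal Python sketch of these definitions. The alphabet {a, b} and the small language below are assumptions chosen purely for illustration (the slide's own alphabet did not survive): a string is a sequence of symbols, and a language is simply a set of strings over the alphabet.

```python
# Illustrative alphabet only; the slide's own alphabet is not recoverable.
SIGMA = {"a", "b"}

def is_string_over(s: str, alphabet: set) -> bool:
    """True if every letter of s belongs to the alphabet (the empty string always does)."""
    return all(ch in alphabet for ch in s)

# A language is just a set of strings; this finite one is invented for the example.
language = {"a", "ab", "abba"}

print(is_string_over("abba", SIGMA))  # True
print(is_string_over("cat", SIGMA))   # False
print(all(is_string_over(w, SIGMA) for w in language))  # True
```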

4 Regular Expressions: In computer science, a regular expression (RE) is a language for specifying text search strings. A regular expression is a formula in a special language that specifies a simple class of strings. Formally, a regular expression is an algebraic notation for characterizing a set of strings. RE search requires a pattern that we want to search for and a corpus of texts to search through.
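The two ingredients of an RE search, a pattern and a corpus, can be sketched with Python's standard `re` module. The three-line corpus is invented for the example; note that the search is case-sensitive, so “Cats” does not match the pattern `cat`.

```python
import re

# A tiny illustrative corpus (invented for this example).
corpus = [
    "The cat sat on the mat.",
    "A dog barked.",
    "Cats and dogs are pets.",
]

pattern = re.compile(r"cat")  # the pattern we search for

# Keep every line of the corpus that contains at least one match.
matches = [line for line in corpus if pattern.search(line)]
print(matches)  # ['The cat sat on the mat.']
```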

5 Basic Regular Expression Patterns: Brackets [] specify a disjunction of characters. Brackets [] plus the dash - specify a range.
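A short sketch of brackets and ranges using Python's `re.findall`; the input strings are invented for illustration.

```python
import re

# [wW]: a disjunction, matching either "w" or "W".
either_case = re.findall(r"[wW]oodchuck", "Woodchuck and woodchuck")
# [0-9]: brackets plus a dash give a range, here any single digit.
digits = re.findall(r"[0-9]", "Chapter 1, page 42")
# [A-Z]: any single upper-case letter.
capitals = re.findall(r"[A-Z]", "Natural Language Processing")

print(either_case)  # ['Woodchuck', 'woodchuck']
print(digits)       # ['1', '4', '2']
print(capitals)     # ['N', 'L', 'P']
```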

6 Basic Regular Expression Patterns: The caret ^ means negation when it is the first symbol inside brackets; otherwise it just means a literal ^. The question mark ? marks optionality of the previous expression. The period . specifies any single character.
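A sketch of the caret, the question mark, and the period in Python's `re` module, with invented input strings. One Python-specific detail: by default `.` matches any character except a newline.

```python
import re

# ^ as the first symbol inside brackets negates the set.
not_upper = re.findall(r"[^A-Z]", "Oyfn")
# Anywhere else inside brackets, ^ is just a literal caret.
caret = re.findall(r"[e^]", "a^b")
# ? makes the preceding "s" optional.
plural = re.findall(r"woodchucks?", "woodchuck woodchucks")
# . matches any single character (in Python, any except newline by default).
beg_n = re.findall(r"beg.n", "begin began begun")

print(not_upper)  # ['y', 'f', 'n']
print(caret)      # ['^']
print(plural)     # ['woodchuck', 'woodchucks']
print(beg_n)      # ['begin', 'began', 'begun']
```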

7 Disjunction, Grouping, and Precedence: Disjunction: /cat|dog/. Precedence: /gupp(y|ies)/. To find the English article the: /the/, then /[tT]he/, then /[^a-zA-Z][tT]he[^a-zA-Z]/.
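The three patterns on this slide can be tried out in Python; the sample sentence is invented to show how each refinement of the “the” pattern behaves.

```python
import re

pets = re.findall(r"cat|dog", "a cat, a dog, a bird")

# findall would return only the group's text, so use finditer for whole matches.
fish = [m.group(0) for m in re.finditer(r"gupp(y|ies)", "one guppy, two guppies")]

text = "The other theology student read the paper."
naive = re.findall(r"the", text)                         # also fires inside "other", "theology"
either_case = re.findall(r"[tT]he", text)                # adds the capitalised "The"
bounded = re.findall(r"[^a-zA-Z][tT]he[^a-zA-Z]", text)  # needs a non-letter on each side

print(pets)         # ['cat', 'dog']
print(fish)         # ['guppy', 'guppies']
print(naive)        # ['the', 'the', 'the']
print(either_case)  # ['The', 'the', 'the', 'the']
print(bounded)      # [' the ']
```

The last pattern misses the sentence-initial “The”, because no character precedes it. Python additionally offers the `\b` word-boundary anchor, and `re.findall(r"\b[tT]he\b", text)` returns `['The', 'the']` for this sentence.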

8 Aliases for common sets of characters
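The slide's table did not survive extraction. The sketch below shows the common aliases as Python's `re` module defines them; with `str` patterns Python applies them to Unicode text, so the bracket equivalents in the comments hold for ASCII input.

```python
import re

sample = "Room 101, floor 2"

digits = re.findall(r"\d", sample)       # \d: any digit, like [0-9]
non_digits = re.findall(r"\D", "a1b2")   # \D: any non-digit, like [^0-9]
words = re.findall(r"\w+", sample)       # \w: letter, digit, or underscore, like [a-zA-Z0-9_]
non_word = re.findall(r"\W", "a_b-c")    # \W: anything \w does not match
spaces = re.findall(r"\s", sample)       # \s: whitespace (space, tab, newline, ...)
non_space = re.findall(r"\S+", sample)   # \S: any non-whitespace character

print(digits)       # ['1', '0', '1', '2']
print(non_digits)   # ['a', 'b']
print(words)        # ['Room', '101', 'floor', '2']
print(non_word)     # ['-']
print(len(spaces))  # 3
print(non_space)    # ['Room', '101,', 'floor', '2']
```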

9 Regular expression operators for counting
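The usual counting operators (`*`, `+`, `?`, `{n}`, `{n,m}`, `{n,}`) can be checked against whole strings with `re.fullmatch`. The helper name `matches` is introduced here only for illustration.

```python
import re

def matches(pattern: str, s: str) -> bool:
    """True if the whole string s matches the pattern."""
    return re.fullmatch(pattern, s) is not None

print(matches(r"ba*", "b"))          # True:  * allows zero a's
print(matches(r"ba+", "b"))          # False: + needs at least one a
print(matches(r"colou?r", "color"))  # True:  ? makes the u optional
print(matches(r"a{3}", "aaa"))       # True:  exactly three
print(matches(r"a{2,3}", "aaaa"))    # False: at most three
print(matches(r"a{2,}", "aaaaa"))    # True:  at least two
```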

10 Some characters that need to be backslashed
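A sketch of why special characters need a backslash, using Python's `re` module and invented inputs. Raw strings (`r"..."`) keep Python itself from interpreting the backslashes before the regex engine sees them.

```python
import re

unescaped = re.findall(r"Dr.", "Dr. Drx")    # "." is a wildcard here
escaped = re.findall(r"Dr\.", "Dr. Drx")     # "\." is a literal period
star = re.findall(r"K\*", "K* and K")        # "\*" is a literal asterisk
question = re.findall(r"why\?", "why? why")  # "\?" is a literal question mark
tab = re.findall(r"\t", "col1\tcol2")        # "\t" matches a tab character

print(unescaped)           # ['Dr.', 'Drx']
print(escaped)             # ['Dr.']
print(star)                # ['K*']
print(question)            # ['why?']
print(tab)                 # ['\t']
print(re.escape("Dr."))    # Dr\.  (re.escape inserts the backslashes for you)
```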

11 Finite State Automata: FSAs recognize the regular languages represented by regular expressions. SheepTalk: /baa+!/. An FSA is drawn as a directed graph with labeled nodes and labeled arc transitions. Here there are five states (q0 is the start state, q4 the final state) and 5 transitions: q0 -b-> q1, q1 -a-> q2, q2 -a-> q3, q3 -a-> q3, q3 -!-> q4.
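The SheepTalk expression /baa+!/ can be tested directly with Python's `re` module. `fullmatch` is used because accepting a string means the whole string must match, not just some substring of it.

```python
import re

SHEEP = re.compile(r"baa+!")

# fullmatch requires the entire string to match.
results = {s: SHEEP.fullmatch(s) is not None
           for s in ["baa!", "baaaa!", "ba!", "baa", "aba!b"]}
print(results)
# {'baa!': True, 'baaaa!': True, 'ba!': False, 'baa': False, 'aba!b': False}
```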

12 Formally, an FSA is a 5-tuple consisting of:
$Q$: a finite set of states, here $\{q_0, q_1, q_2, q_3, q_4\}$
$\Sigma$: an alphabet of symbols, here $\{a, b, !\}$
$q_0$: a start state
$F$: a set of final states, $F \subseteq Q$, here $\{q_4\}$
$\delta(q, i)$: a transition function mapping $Q \times \Sigma$ to $Q$
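One way to hold the 5-tuple in code is a small frozen dataclass; the class name `FSA` and its field names are assumptions for this sketch. The transition function is stored as a dictionary, and, as is common for SheepTalk, missing (state, symbol) pairs are read as “no transition”, so δ here is partial rather than total.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FSA:
    """An FSA as the 5-tuple (Q, Sigma, q0, F, delta)."""
    states: frozenset    # Q
    alphabet: frozenset  # Sigma
    start: str           # q0
    finals: frozenset    # F, a subset of Q
    delta: dict          # (state, symbol) -> state; missing keys mean "no transition"

    def __post_init__(self) -> None:
        # Check that the components fit together as the definition requires.
        if self.start not in self.states:
            raise ValueError("start state must be in Q")
        if not self.finals <= self.states:
            raise ValueError("F must be a subset of Q")
        for (q, sym), target in self.delta.items():
            if q not in self.states or target not in self.states or sym not in self.alphabet:
                raise ValueError(f"bad transition {(q, sym)} -> {target}")

sheep = FSA(
    states=frozenset({"q0", "q1", "q2", "q3", "q4"}),
    alphabet=frozenset({"a", "b", "!"}),
    start="q0",
    finals=frozenset({"q4"}),
    delta={
        ("q0", "b"): "q1",
        ("q1", "a"): "q2",
        ("q2", "a"): "q3",
        ("q3", "a"): "q3",  # self-loop: any number of extra a's
        ("q3", "!"): "q4",
    },
)
print(len(sheep.delta))  # 5
```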

13 An FSA recognizes (accepts) the strings of a regular language: baa!, baaa!, baaaa!, … Tape input: aba!b, a rejected input.

14 State Transition Table for SheepTalk (Ø = no transition; state 4 is the final state):
State | b | a | !
0 | 1 | Ø | Ø
1 | Ø | 2 | Ø
2 | Ø | 3 | Ø
3 | Ø | 3 | 4
4 | Ø | Ø | Ø
Accepted strings: baa!, baaa!, baaaa!, baaaaa!, …
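A minimal table-driven deterministic recognizer built from the table above: it reads the tape one symbol at a time, rejects as soon as it meets an empty (Ø) entry, and accepts only if the input ends in the final state. The names `TABLE` and `d_recognize` are illustrative.

```python
# Transition table from the slide; None plays the role of the empty entry (Ø).
TABLE = {
    0: {"b": 1, "a": None, "!": None},
    1: {"b": None, "a": 2, "!": None},
    2: {"b": None, "a": 3, "!": None},
    3: {"b": None, "a": 3, "!": 4},
    4: {"b": None, "a": None, "!": None},
}
ACCEPT_STATES = {4}

def d_recognize(tape: str) -> bool:
    """Run the deterministic SheepTalk machine over the whole tape."""
    state = 0
    for symbol in tape:
        nxt = TABLE[state].get(symbol)  # symbols outside {b, a, !} also give None
        if nxt is None:
            return False                # empty entry: reject immediately
        state = nxt
    return state in ACCEPT_STATES       # end of tape: accept only in a final state

for s in ["baa!", "baaaaa!", "ba!", "aba!b"]:
    print(s, d_recognize(s))
```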

15 Non-Deterministic FSAs for SheepTalk (Figure: two non-deterministic automata for SheepTalk, each with states q0 to q4.)
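The exact arcs of the two diagrams are not recoverable, so this sketch assumes one plausible non-deterministic SheepTalk machine: from q2, reading “a” may either stay in q2 or move to q3. It simulates non-determinism by tracking the set of all states the machine could be in (a parallel, subset-style simulation rather than backtracking search).

```python
# Assumed non-deterministic SheepTalk machine (for illustration only).
NFA_DELTA = {
    ("q0", "b"): {"q1"},
    ("q1", "a"): {"q2"},
    ("q2", "a"): {"q2", "q3"},  # the non-deterministic choice
    ("q3", "!"): {"q4"},
}
NFA_START, NFA_FINALS = "q0", {"q4"}

def nd_recognize(tape: str) -> bool:
    """Accept if at least one path through the NFA ends in a final state."""
    current = {NFA_START}
    for symbol in tape:
        # Union of every state reachable from any current state on this symbol.
        current = set().union(*(NFA_DELTA.get((q, symbol), set()) for q in current))
        if not current:
            return False  # every path is stuck
    return bool(current & NFA_FINALS)

for s in ["baa!", "baaa!", "ba!", "baa"]:
    print(s, nd_recognize(s))
```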

16 Finite Accepter: Input String → Finite Automaton → Output “Accept” or “Reject”

17 Transition Graph: an abba finite accepter, with an initial state, a final (“accept”) state, and labeled state transitions.
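A transition graph maps naturally onto a dictionary of labeled edges. This sketch assumes, as the slide title suggests, that the accepter's language is exactly the string abba; the state names q0 to q4 are hypothetical. An input that has no outgoing arc to follow is rejected.

```python
# Hypothetical state names for an accepter of the single string "abba".
EDGES = {
    ("q0", "a"): "q1",
    ("q1", "b"): "q2",
    ("q2", "b"): "q3",
    ("q3", "a"): "q4",
}
INITIAL, FINAL = "q0", {"q4"}

def accepts(w: str) -> bool:
    """Follow labeled edges from the initial state; accept if we end in a final state."""
    state = INITIAL
    for ch in w:
        if (state, ch) not in EDGES:
            return False  # no outgoing arc for this symbol: reject
        state = EDGES[(state, ch)]
    return state in FINAL

print(accepts("abba"))  # True
print(accepts("abb"))   # False
```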

18 Initial Configuration: Input String

19 12/21/2015 Reading the Input


23 Output: “accept”

24 Rejection


28 Output: “reject”
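The reading-the-input walkthrough can be mimicked by recording each configuration, that is, the current state together with the input not yet read. This sketch again assumes an accepter for the single string abba with hypothetical state names; the function name `trace` is illustrative.

```python
# Hypothetical abba accepter used only to illustrate configurations.
EDGES = {("q0", "a"): "q1", ("q1", "b"): "q2", ("q2", "b"): "q3", ("q3", "a"): "q4"}
INITIAL, FINAL = "q0", {"q4"}

def trace(w: str) -> tuple[list[tuple[str, str]], str]:
    """Return the configurations (state, unread input) visited and the output."""
    state, configs = INITIAL, [(INITIAL, w)]
    for i, ch in enumerate(w):
        if (state, ch) not in EDGES:
            return configs, "reject"  # stuck: no arc for this symbol
        state = EDGES[(state, ch)]
        configs.append((state, w[i + 1:]))
    return configs, ("accept" if state in FINAL else "reject")

for w in ["abba", "abb"]:
    configs, output = trace(w)
    print(configs, output)
```

For abba the last configuration is ('q4', '') and the output is accept; for abb the input runs out in q3, which is not final, so the output is reject.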

29 Another Example


33 Output: “accept”

34 Rejection


38 Output: “reject”

39 Formalities: a Deterministic Finite Accepter (DFA) is a 5-tuple $M = (Q, \Sigma, \delta, q_0, F)$, where $Q$ is the set of states, $\Sigma$ the input alphabet, $\delta$ the transition function, $q_0$ the initial state, and $F$ the set of final states.

40 About Alphabets: An alphabet is a finite set of input symbols. These symbols can, and often will, stand for bigger objects that have internal structure.

41 Input Alphabet

42 Set of States

43 Initial State

44 Set of Final States

45 Transition Function


49 Transition Function

50 Extended Transition Function (reads the entire string)
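The slide's formulas did not survive, so this sketch assumes the standard recursive definition of the extended transition function: δ*(q, λ) = q for the empty string λ, and δ*(q, wσ) = δ(δ*(q, w), σ). The total DFA below (tracking whether an even or odd number of a's has been read) is a hypothetical example, not the one on the slide.

```python
from functools import reduce

# Hypothetical total DFA over {a, b}: tracks the parity of the a's read so far.
DELTA = {
    ("even", "a"): "odd", ("even", "b"): "even",
    ("odd", "a"): "even", ("odd", "b"): "odd",
}

def delta(q: str, sym: str) -> str:
    """One step of the transition function."""
    return DELTA[(q, sym)]

def delta_star(q: str, w: str) -> str:
    """delta*(q, empty) = q;  delta*(q, w + s) = delta(delta*(q, w), s)."""
    if w == "":
        return q
    return delta(delta_star(q, w[:-1]), w[-1])

def delta_star_fold(q: str, w: str) -> str:
    """The same function written as a left fold of delta over the input."""
    return reduce(delta, w, q)

print(delta_star("even", "aab"))       # even
print(delta_star_fold("even", "ab"))   # odd
```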


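54 Observation: reading a string traces a walk in the transition graph; the walk goes from the starting state to the state reached, and its label is the string read.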

55 Example: accept

56 Another Example: accept

57 More Examples: an accepting state and a trap state
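A trap state is a non-final state whose arcs all loop back to itself, so once the machine enters it the input can never be accepted. This sketch completes a partial transition function by sending every missing arc to a trap state; the helper `add_trap_state`, the state name "trap", and the abba example machine are all hypothetical.

```python
def add_trap_state(delta: dict, states: set, alphabet: set, trap: str = "trap") -> dict:
    """Return a total transition function: every missing arc goes to a non-final trap state."""
    total = dict(delta)
    for q in set(states) | {trap}:
        for sym in alphabet:
            total.setdefault((q, sym), trap)  # the trap's own arcs all loop back to it
    return total

# Partial "abba" accepter (hypothetical state names).
partial = {("q0", "a"): "q1", ("q1", "b"): "q2", ("q2", "b"): "q3", ("q3", "a"): "q4"}
states = {"q0", "q1", "q2", "q3", "q4"}
total = add_trap_state(partial, states, {"a", "b"})

def run(w: str, start: str = "q0") -> str:
    """Return the state reached after reading all of w."""
    q = start
    for ch in w:
        q = total[(q, ch)]
    return q

print(run("abba"))  # q4   (final: accept)
print(run("abab"))  # trap (reject)
```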

58 Accepted language = { all strings with prefix … }
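The prefix on the slide did not survive extraction, so this sketch assumes the prefix ab over the alphabet {a, b}, purely for illustration. The DFA checks the first two symbols and then loops in its final state; any wrong symbol sends it to a trap state. The last lines cross-check it against Python's `str.startswith`.

```python
from itertools import product

# Assumed prefix "ab" (hypothetical); "trap" is a non-final trap state.
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "trap",
    ("q1", "b"): "q2", ("q1", "a"): "trap",
    ("q2", "a"): "q2", ("q2", "b"): "q2",        # prefix seen: accept anything after it
    ("trap", "a"): "trap", ("trap", "b"): "trap",
}
FINALS = {"q2"}

def accepts(w: str) -> bool:
    q = "q0"
    for ch in w:
        q = DELTA[(q, ch)]
    return q in FINALS

# Cross-check on every string over {a, b} of length <= 4.
all_short = ["".join(t) for n in range(5) for t in product("ab", repeat=n)]
print(all(accepts(w) == w.startswith("ab") for w in all_short))  # True
```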

59 Accepted language = { all strings without substring … }
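The forbidden substring on the slide did not survive extraction, so this sketch assumes the substring 001 over {0, 1}, purely for illustration. Each state remembers how much of 001 the input currently ends with; seeing the full 001 leads to a trap state, and every other state is final. The state names are hypothetical.

```python
from itertools import product

# States: "e" = no progress, "z" = ends in 0, "zz" = ends in 00, "trap" = 001 seen.
DELTA = {
    ("e", "0"): "z",     ("e", "1"): "e",
    ("z", "0"): "zz",    ("z", "1"): "e",
    ("zz", "0"): "zz",   ("zz", "1"): "trap",
    ("trap", "0"): "trap", ("trap", "1"): "trap",
}
FINALS = {"e", "z", "zz"}

def accepts(w: str) -> bool:
    q = "e"
    for ch in w:
        q = DELTA[(q, ch)]
    return q in FINALS

# Cross-check on every string over {0, 1} of length <= 6.
all_short = ["".join(t) for n in range(7) for t in product("01", repeat=n)]
print(all(accepts(w) == ("001" not in w) for w in all_short))  # True
```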

60 Regular Languages: A language L is regular if there is a DFA M that accepts it, i.e. L = L(M). All regular languages form a language family.
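The language of a DFA is the set of strings that drive it from the initial state into a final state, L(M) = {w ∈ Σ* : δ*(q0, w) ∈ F}. Since L(M) is usually infinite, this sketch lists only its strings up to a given length; the DFA (strings over {a, b} that end in b) is a hypothetical example.

```python
from itertools import product

# Hypothetical DFA over {a, b} accepting the strings that end in "b".
DELTA = {("s0", "a"): "s0", ("s0", "b"): "s1", ("s1", "a"): "s0", ("s1", "b"): "s1"}
START, FINALS, SIGMA = "s0", {"s1"}, "ab"

def language_up_to(n: int) -> list[str]:
    """All strings of length <= n in L(M) = {w : delta*(q0, w) in F}."""
    result = []
    for length in range(n + 1):
        for letters in product(SIGMA, repeat=length):
            q = START
            for ch in letters:
                q = DELTA[(q, ch)]
            if q in FINALS:
                result.append("".join(letters))
    return result

print(language_up_to(2))  # ['b', 'ab', 'bb']
```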

61 Example: The language … is regular:

62 Dollars and Cents

