The Simplest NL Applications: Text Searching and Pattern Matching Read J & M Chapter 2.


1 The Simplest NL Applications: Text Searching and Pattern Matching Read J & M Chapter 2

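2 Searching for a Single String Using a Nondeterministic FSM [Diagram: states 1 to 8 in a chain, with transitions labeled c, o, c, o, n, u, t spelling "coconut"; one further arc label was garbled in extraction, probably Σ.]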
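The machine on this slide can be simulated by tracking the set of states that are active after each input character. The following is a minimal sketch, not the slide author's code. It assumes the start state carries a self-loop on every symbol (the usual construction for substring search), and it numbers states from 0 rather than from 1 as on the slide. The name nfa_search is hypothetical.

```python
def nfa_search(pattern: str, text: str) -> list[int]:
    """Return the end position (exclusive) of every occurrence of pattern in text."""
    # State i means "the last i characters read match pattern[:i]".
    # Assumption: state 0 has a self-loop on every symbol, so it is always active.
    active = {0}
    ends = []
    for pos, ch in enumerate(text):
        nxt = {0}  # the self-loop keeps the start state alive
        for state in active:
            # Advance along the chain only when the next pattern character matches.
            if state < len(pattern) and pattern[state] == ch:
                nxt.add(state + 1)
        if len(pattern) in nxt:  # the final state is active: a match ends here
            ends.append(pos + 1)
        active = nxt
    return ends
```

Calling nfa_search("coconut", "lococonut") reports a single match ending at position 9, the end of the text.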

3 Searching for a Single String Using the Boyer Moore Algorithm
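The slide names the algorithm without showing it. As a rough illustration, here is the Horspool simplification of Boyer-Moore, which keeps only the bad-character shift and omits the good-suffix rule of the full algorithm. The pattern is compared right to left, and on a mismatch the window jumps ahead by a precomputed distance. The function name horspool_search is hypothetical.

```python
def horspool_search(pattern: str, text: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # For each character except the last one in the pattern, record the distance
    # from its rightmost occurrence to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    i = 0  # left edge of the current window in text
    while i + m <= n:
        j = m - 1
        # Compare right to left inside the window.
        while j >= 0 and text[i + j] == pattern[j]:
            j -= 1
        if j < 0:
            return i
        # Shift according to the text character under the window's last position;
        # characters that do not occur in the pattern allow a full-length jump.
        i += shift.get(text[i + m - 1], m)
    return -1
```

The benefit over a left-to-right scan is that many text characters are never examined, because a mismatch near the right end of the window can move the window several positions at once.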

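4 Searching for Multiple Strings [Diagram: the chain of states 1 to 8 for "coconut" (transitions c, o, c, o, n, u, t), plus a second branch through states 2 to 6 with transitions labeled l, o, c, o, s; the remaining arc labels were garbled in extraction.] Example: lococonut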

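5 Converting to a Deterministic FSM [Diagram: the same two-branch machine as on the previous slide, with transitions c, o, c, o, n, u, t and l, o, c, o, s; the remaining arc labels were garbled in extraction.]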
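The conversion can be sketched with the standard subset construction, in which each deterministic state is a set of nondeterministic states. This is an illustration under assumptions: the second search string is taken to be "locos", read off the garbled diagram, and the start state is assumed to loop on every symbol. The names build_dfa and search are hypothetical.

```python
def build_dfa(patterns, alphabet):
    """Subset construction for the multi-string search machine."""
    # An NFA state (k, i) means "i characters of patterns[k] have been matched".
    start = frozenset((k, 0) for k in range(len(patterns)))
    table = {}
    todo = [start]
    while todo:
        current = todo.pop()
        if current in table:
            continue
        table[current] = {}
        for ch in alphabet:
            target = set(start)  # assumed self-loop on the start state
            for k, i in current:
                if i < len(patterns[k]) and patterns[k][i] == ch:
                    target.add((k, i + 1))
            target = frozenset(target)
            table[current][ch] = target
            todo.append(target)
    return start, table


def search(patterns, text):
    """Return (start index, pattern) for every match, in order of match end."""
    alphabet = set("".join(patterns))
    start, table = build_dfa(patterns, alphabet)
    state = start
    hits = []
    for pos, ch in enumerate(text):
        # A character outside the alphabet cannot extend any match.
        state = table[state].get(ch, start)
        for k, i in sorted(state):
            if i == len(patterns[k]):
                hits.append((pos + 1 - i, patterns[k]))
    return hits
```

On the slide's example, search(["coconut", "locos"], "lococonut") finds "coconut" starting at index 2. The deterministic machine follows the "loco" prefix and the "coco" prefix at the same time, which is why the failed "locos" attempt costs no backtracking.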

6 Regular Expressions Two different (but related) uses of the term: Expressions that define all and only the regular languages: (aa ∪ ab ∪ ba ∪ bb)* Expressions in a useful pattern language: Matching ip addresses: S! ([0-9]+ (\. [0-9]+) {3}) ! $1 ! Finding doubled words: \

7 REs: Syntax and Semantics Syntax The regular expressions over an alphabet Σ are all strings over the alphabet Σ ∪ {(, ), ∅, ∪, *} that can be obtained as follows:
1. ∅ and each member of Σ is a regular expression.
2. If α, β are regular expressions, then so is αβ.
3. If α, β are regular expressions, then so is α ∪ β.
4. If α is a regular expression, then so is α*.
5. If α is a regular expression, then so is (α).
6. Nothing else is a regular expression.

8 REs: Syntax and Semantics Regular expressions define languages via a semantic interpretation function we'll call L:
1. L(∅) = ∅ and L(a) = {a} for each a ∈ Σ.
2. If α, β are regular expressions, then L(αβ) = L(α) L(β) = all strings that can be formed by concatenating to some string from L(α) some string from L(β).
3. If α, β are regular expressions, then L(α ∪ β) = L(α) ∪ L(β).
4. If α is a regular expression, then L(α*) = L(α)*.
5. If (α) is a regular expression, then L((α)) = L(α).
A language is regular if and only if it can be described by a regular expression. Note: L is compositional.
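Compositionality means the language of an expression is computed only from the languages of its parts, so L can be written as one recursive function with a case per rule. The sketch below is an illustration, not part of the slides. It represents expressions as nested tuples (a hypothetical encoding) and, because L(α*) is usually infinite, returns only the strings up to a chosen length. Rule 5 needs no case, since parentheses only group and the tuple nesting already records the grouping.

```python
def lang(expr, max_len):
    """Return the strings of length <= max_len in L(expr)."""
    kind = expr[0]
    if kind == "empty":  # rule 1: L(∅) = ∅
        return set()
    if kind == "sym":  # rule 1: L(a) = {a}
        return {expr[1]} if len(expr[1]) <= max_len else set()
    if kind == "cat":  # rule 2: L(αβ) = L(α) L(β)
        left, right = lang(expr[1], max_len), lang(expr[2], max_len)
        return {u + v for u in left for v in right if len(u + v) <= max_len}
    if kind == "union":  # rule 3: L(α ∪ β) = L(α) ∪ L(β)
        return lang(expr[1], max_len) | lang(expr[2], max_len)
    if kind == "star":  # rule 4: L(α*) = L(α)*
        base = lang(expr[1], max_len)
        result = {""}
        frontier = {""}
        # Keep appending strings from L(α) until nothing new fits in max_len.
        while frontier:
            frontier = {
                u + v for u in frontier for v in base if len(u + v) <= max_len
            } - result
            result |= frontier
        return result
    raise ValueError(f"unknown expression kind: {kind!r}")
```

Encoding (aa ∪ ab ∪ ba ∪ bb)* this way and calling lang with max_len 4 yields the even-length strings over {a, b} of length at most 4.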

9 The Importance of Compositionality What is the meaning of: Mary cooked the yujutes. Mary tyroked the yujutes.

10 Morphological Analysis Read J & M Chapter 3 Recognize words Parse words

11 Morphological Parsing Goal: to represent the facts declaratively so that a single representation can be used for both recognition and generation. Note: ^ marks morpheme boundaries. # marks word boundaries.

12 From Lexical to Intermediate Note: All the transducers in the book are described as lexical:intermediate, but they can run the other direction.

13 Where Did reg-noun-stem Come From?

14 We Can Cascade or Compose

15 From Intermediate to Surface For text, we need spelling rules. ε → e / {x, s, z} ^ ___ s # Read this as “Replace ε with e in the context after the /.”
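The effect of this spelling rule on an intermediate form can be approximated with a regular-expression substitution. This is a sketch of the rule's behavior, not a finite-state transducer and not the book's implementation. It inserts e after x, s, or z at a morpheme boundary when the suffix s and the word boundary follow, then erases the boundary markers. The function name to_surface is hypothetical.

```python
import re


def to_surface(intermediate: str) -> str:
    """Map an intermediate form such as 'fox^s#' to its surface spelling."""
    # Insert e after x, s, or z when a morpheme boundary is followed by s#.
    with_e = re.sub(r"([xsz])\^(?=s#)", r"\1e^", intermediate)
    # Drop the morpheme (^) and word (#) boundary markers.
    return with_e.replace("^", "").replace("#", "")
```

For example, to_surface("fox^s#") gives "foxes", while to_surface("cat^s#") gives "cats" because the rule's left context does not match.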

16 Turning the Rule into a Transducer foxes xerox fox#sat

17 Disambiguation - Local Local ambiguities: asses # s# luxury

18 Disambiguation - Harder Sometimes additional knowledge is necessary: foxes: fox +N + PL or fox +V +SG Can we think of nouns that cannot also be verbs?

19 Search For FSMs, we can build a deterministic machine. In other cases, we will have to search: depth-first, or breadth-first (chart parsing). [Parse tree diagram with node labels S, VP, NP, PP, V, PR, N, DET, PREP for the example sentence: I hit the boy with a bat.]

