1
**Regular expressions Day 2**

LING 681.02 Computational Linguistics, Harry Howard, Tulane University

2
**LING 681.02, Prof. Howard, Tulane University**

Course organization

24-Aug-2009, LING 681.02, Prof. Howard, Tulane University

3
Regular expressions SLP 2.1

4
**Questions**

What is a string? A sequence of symbols. In text, a sequence of alphanumeric characters.

What is a regular expression (RE or regex)? A language for specifying text search strings. A search requires a pattern to search for and a corpus to search through.

What is an algebra? A set of elements and a set of operations defined on them, e.g. the set of real numbers and the operations +, -, *, and /.

What is a false positive? A string that is incorrectly matched. False positives decrease accuracy.

What is a false negative? A string that is incorrectly excluded. False negatives decrease coverage.

What is precedence?
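
The difference between a false positive and a false negative can be illustrated with a small search for the word "the". This is a minimal sketch using Python's `re` module; the sample sentence and the variable names `naive` and `refined` are made up for illustration and do not come from the slides.

```python
import re

# Hypothetical sample text, chosen to trigger both kinds of error.
text = "The other day the cat sat there, then the dog bathed."

# Naive pattern: misses the capitalized "The" (a false negative) and
# fires inside "other", "there", "then" and "bathed" (false positives).
naive = re.findall(r"the", text)

# Refined pattern: accept either case and require word boundaries.
refined = re.findall(r"\b[tT]he\b", text)

print(len(naive), naive)
print(len(refined), refined)
```

Adding `[tT]` removes the false negative, and the two `\b` anchors remove the false positives.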

5
**Notation in Perl**

- `*` : 0 or more occurrences of the previous character or RE
- `+` : 1 or more occurrences of the previous character or RE
- `-` : the two ends of a range (inside square brackets)
- `^` : not (negation, inside square brackets) or beginning of line; "caret"
- `?` : the previous character or RE is optional
- `.` : any single character
- `|` : either … or; "pipe"
- `()` : grouping, or put in a register
- `{n}` : n occurrences of the previous character or RE
- `\b` : word boundary
- `\s` : white space (`\w` matches a word character: a letter, digit, or underscore)
- `$` : end of line
- `\1` : back-reference to the text stored in register 1
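
Each operator in this notation can be tried out in Python, whose `re` module follows the same Perl-style syntax. The following is a rough sketch; the sample strings are invented, and the non-capturing group `(?:...)` is used only so that `re.findall` returns whole matches rather than the contents of a register.

```python
import re

# (pattern, sample text) pairs, one per operator; sample texts are made up.
demos = [
    (r"ba*",       "b ba baaa"),       # *   zero or more of the previous item
    (r"ba+",       "b ba baaa"),       # +   one or more of the previous item
    (r"[a-c]",     "abcd"),            # -   range inside brackets
    (r"[^a-c]",    "abcd"),            # ^   negation inside brackets
    (r"^The",      "The end. The"),    # ^   beginning of line outside brackets
    (r"colou?r",   "color colour"),    # ?   previous item is optional
    (r"b.g",       "bag big b g"),     # .   any single character
    (r"cat|dog",   "cat dog cow"),     # |   either ... or
    (r"(?:ab){2}", "ababab"),          # ()  grouping, {n} exactly n repetitions
    (r"\bthe\b",   "the other the"),   # \b  word boundary
    (r"\s",        "a b\tc"),          # \s  white space
    (r"\w+",       "hi, you_2!"),      # \w  word characters
    (r"end$",      "end of the end"),  # $   end of line
]

results = [re.findall(pattern, text) for pattern, text in demos]
for (pattern, text), found in zip(demos, results):
    print(f"{pattern!r:14} {text!r:18} -> {found}")

# \1 refers back to whatever the first parenthesized group matched.
backref = re.search(r"(\w+) \1", "it is is fine").group(0)
print(backref)
```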

6
**Exercise 2.1: REs**

The set of all alphabetic strings.
`[a-zA-Z][a-zA-Z]*` or, equivalently, `[a-zA-Z]+`

The set of all lower case alphabetic strings ending in a b.
`[a-z]*b`

The set of all strings with two consecutive repeated words (e.g., "Humbert Humbert" and "the the" but not "the bug" or "the big bug").
`([a-zA-Z]+)\s+\1`
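
These three answers can be checked in Python. This is a sketch under a few assumptions: `fullmatch` is used for the first two patterns because they describe whole strings, and the names `alphabetic`, `ends_in_b`, `repeated` and `strict` are illustrative. The `strict` variant with added word boundaries is not from the slides; it is included because the slide's pattern also fires on a phrase such as "the theory".

```python
import re

alphabetic = re.compile(r"[a-zA-Z]+")
ends_in_b = re.compile(r"[a-z]*b")
repeated = re.compile(r"([a-zA-Z]+)\s+\1")

# fullmatch succeeds only if the entire string fits the pattern.
print(bool(alphabetic.fullmatch("Humbert")))   # letters only
print(bool(alphabetic.fullmatch("R2D2")))      # contains digits
print(bool(ends_in_b.fullmatch("crab")))
print(bool(ends_in_b.fullmatch("crabs")))

# search looks for the repeated-word pattern anywhere in the string.
for s in ["Humbert Humbert", "the the", "the bug", "the big bug"]:
    print(s, "->", bool(repeated.search(s)))

# Caveat: without word boundaries, "the" + "the" inside "theory" also matches.
print(bool(repeated.search("the theory")))

# Assumed refinement (not in the slides): anchor both ends at word boundaries.
strict = re.compile(r"\b([a-zA-Z]+)\s+\1\b")
print(bool(strict.search("the theory")))
```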

7
**Exercise 2.1: REs, cont.**

The set of all strings from the alphabet a, b such that each a is immediately preceded by and immediately followed by a b.
`(b+(ab+)+)?`

All strings that start at the beginning of the line with an integer and that end at the end of the line with a word.
`^\d+\b.*\b[a-zA-Z]+$`
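
A quick Python check of these two answers, as a sketch with invented test strings. One observation from running it: the first pattern, as written, rejects strings made only of b's (such as "b" or "bb"), which arguably belong to the set because they contain no a at all. The variant `(b+(ab+)*)?` shown below is an assumption of this sketch, not part of the slides.

```python
import re

ab_slide = re.compile(r"(b+(ab+)+)?")
# Assumed variant: also accepts strings consisting only of b's.
ab_variant = re.compile(r"(b+(ab+)*)?")

for s in ["", "bab", "babab", "bbabb", "ab", "ba", "baab", "b", "bb"]:
    print(repr(s), bool(ab_slide.fullmatch(s)), bool(ab_variant.fullmatch(s)))

# Integer at the start of the line, word at the end of the line.
int_word = re.compile(r"^\d+\b.*\b[a-zA-Z]+$")
for s in ["42 is the answer", "3 blind mice", "42", "answer 42", "7up rocks"]:
    print(repr(s), bool(int_word.search(s)))
```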

8
**Exercise 2.1: REs, cont.**

All strings that have both the word grotto and the word raven in them (but not, e.g., words like grottos that merely contain the word grotto).
`\bgrotto\b.*\braven\b|\braven\b.*\bgrotto\b`

Write a pattern that places the first word of an English sentence in a register. Deal with punctuation.
`^[^a-zA-Z]*([a-zA-Z]+)`
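
Both patterns can be exercised in Python as follows. The test sentences and the helper name `first_word` are hypothetical; the register corresponds to `group(1)` of the match object.

```python
import re

both = re.compile(r"\bgrotto\b.*\braven\b|\braven\b.*\bgrotto\b")

for s in [
    "the raven flew into the grotto",
    "the grotto hid a raven",
    "grottos and ravens",
    "a raven near some grottos",
]:
    print(repr(s), bool(both.search(s)))

first = re.compile(r"^[^a-zA-Z]*([a-zA-Z]+)")

def first_word(sentence):
    """Return the first word of the sentence, skipping leading punctuation."""
    m = first.search(sentence)
    return m.group(1) if m else None

print(first_word('"Hello," she said.'))
print(first_word("...and then it rained"))
```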

9
**Exercise 2.2**

    import re
    import string

    # (pattern, replacement) pairs, applied in order
    patterns = [
        (r"\b(i'm|i am)\b", "YOU ARE"),
        (r"\b(i|me)\b", "YOU"),
        (r"\b(my)\b", "YOUR"),
        (r"\b(well,?) ", ""),
        (r".* YOU ARE (depressed|sad) .*", r"I AM SORRY TO HEAR YOU ARE \1"),
        (r".* YOU ARE (depressed|sad) .*", r"WHY DO YOU THINK YOU ARE \1"),
        (r".* all .*", "IN WHAT WAY"),
        (r".* always .*", "CAN YOU THINK OF A SPECIFIC EXAMPLE"),
        # strip any remaining punctuation
        (r"[%s]" % re.escape(string.punctuation), ""),
    ]
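
One way such a pattern list might be driven is to apply each substitution in order with `re.sub`. The sketch below is an assumption about usage, not the course's own solution: the function name `respond`, the lowercasing of the input, and the padding with spaces (so that patterns written with surrounding spaces can match at the edges) are all choices made here for illustration.

```python
import re
import string

patterns = [
    (r"\b(i'm|i am)\b", "YOU ARE"),
    (r"\b(i|me)\b", "YOU"),
    (r"\b(my)\b", "YOUR"),
    (r"\b(well,?) ", ""),
    (r".* YOU ARE (depressed|sad) .*", r"I AM SORRY TO HEAR YOU ARE \1"),
    (r".* YOU ARE (depressed|sad) .*", r"WHY DO YOU THINK YOU ARE \1"),
    (r".* all .*", "IN WHAT WAY"),
    (r".* always .*", "CAN YOU THINK OF A SPECIFIC EXAMPLE"),
    (r"[%s]" % re.escape(string.punctuation), ""),
]

def respond(utterance):
    """Apply every (pattern, replacement) pair in order (hypothetical driver)."""
    # Assumption: input is lowercased and padded with one space on each side.
    text = " " + utterance.lower() + " "
    for pattern, replacement in patterns:
        text = re.sub(pattern, replacement, text)
    return text.strip()

print(respond("I am sad today."))
print(respond("Well, my boyfriend made me come here."))
print(respond("They are always bugging us."))
print(respond("Men are all alike."))
```

Because the substitutions run strictly in order, the second `depressed|sad` rule never fires in this driver: the first one has already rewritten the text. A fuller system would presumably choose among alternative responses.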

10
NLPP

11
**REs in Python**

The re module provides Perl-type regular expression patterns.

NLPP goes into REs in §3.4, p. 97ff.
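
As a brief illustration of the `re` module, the sketch below uses four of its standard entry points: `re.search`, `re.findall`, `re.sub` and `re.compile`. The example sentence and variable names are invented for this purpose.

```python
import re

sentence = "The raven left the grotto at 6 pm and returned at 11 pm."

# search: first match anywhere in the string (None if there is none)
m = re.search(r"\d+", sentence)
print(m.group(0))

# findall: every non-overlapping match, as a list of strings
numbers = re.findall(r"\d+", sentence)
print(numbers)

# sub: replace every match with a replacement string
replaced = re.sub(r"\b[Tt]he\b", "a", sentence)
print(replaced)

# compile: build a reusable pattern object, optionally with flags
word = re.compile(r"[a-z]+", re.IGNORECASE)
words = word.findall(sentence)
print(words[:3])
```

Writing patterns as raw strings (`r"..."`) keeps backslashes such as `\b` and `\d` from being interpreted by Python before the regex engine sees them.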

12
**Next time**

SLP Automata: §2.2-end & Ex. 2.3-end

NLPP: finish §1, do as many of the exercises as you can
