REGULAR EXPRESSIONS 1 DAY 6 - 9/08/14 LING 3820 & 6820 Natural Language Processing Harry Howard Tulane University.

1 REGULAR EXPRESSIONS 1 DAY 6 - 9/08/14 LING 3820 & 6820 Natural Language Processing Harry Howard Tulane University

2 Course organization
http://www.tulane.edu/~howard/LING3820/
The syllabus is under construction.
http://www.tulane.edu/~howard/CompCultEN/
08-Sept-2014, NLP, Prof. Howard, Tulane University

3 Review
The quiz was the review.

4 Open Spyder

5 §4. Regular expressions

6 Regular expressions, or regex
>>> import re
re.findall(pattern, target string)
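As a minimal sketch of the call shape above: re.findall takes the pattern first and the target string second, and returns a list of the non-overlapping matches from left to right. The target string here is a hypothetical example, not one from the slides.

```python
import re

# Hypothetical target string, not taken from the slides.
target = "the cat sat on the mat"

# re.findall returns every non-overlapping match, scanning left to right.
matches = re.findall("at", target)
print(matches)  # ['at', 'at', 'at']

# When nothing matches, the result is an empty list rather than None.
no_matches = re.findall("dog", target)
print(no_matches)  # []
```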

7 4.2. Fixed-length matching

8 The test string
>>> S = '''This above all: to thine own self be true,
... And it must follow, as the night the day,
... Thou canst not then be false to any man.'''

9 Strings as regular expressions
>>> re.findall(' be ', S)
[' be ', ' be ']

10 Match one alternative of a disjunction with |
>>> re.findall(' to | be | it | as ', S)
[' to ', ' be ', ' it ', ' as ', ' be ', ' to ']
>>> set(re.findall(' to | be | it | as ', S))
set([' it ', ' as ', ' to ', ' be '])
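One caveat of padding each alternative with spaces may be worth a quick sketch. Matches cannot overlap, so a space consumed by one match is not available to the next, and a word at the very end of the string has no trailing space to match. The sentence below is a hypothetical example chosen to show this; it is not from the slides.

```python
import re

# Hypothetical sentence in which the two-letter words are adjacent.
text = "I want to be it"

# Same style of pattern as on the slide: each alternative padded with spaces.
found = re.findall(" to | be | it ", text)
print(found)  # [' to ']

# ' to ' consumes the space before 'be', so ' be ' cannot match,
# and 'it' is at the end of the string with no space after it.
```

In the test string of the slides the two-letter words never share a space, which is why all six of them are found there.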

11 Match a group of characters with capturing or non-capturing parentheses, ()
>>> re.findall(' (to|be|it|as) ', S)
['to', 'be', 'it', 'as', 'be', 'to']
>>> re.findall(' (?:to|be|it|as) ', S)
[' to ', ' be ', ' it ', ' as ', ' be ', ' to ']
By default, parentheses capture the string they enclose, and re.findall returns only the captured string. The ?: prefix turns capturing off, so the whole match is returned. For the rest of this discussion, we use capturing parentheses to exclude the spaces from the output.
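The effect of the number of capturing groups on the output of re.findall can be sketched as follows. The test string is retyped here with line breaks between the three verse lines, which is an assumption about how it was entered; the two-group pattern is an illustration added for contrast and does not appear in the slides.

```python
import re

# Test string from the slides (line breaks between verses are assumed).
S = '''This above all: to thine own self be true,
And it must follow, as the night the day,
Thou canst not then be false to any man.'''

# One capturing group: findall returns only the group's text.
one_group = re.findall(' (to|be) ', S)
print(one_group)   # ['to', 'be', 'be', 'to']

# No capturing group: findall returns the whole match, spaces included.
no_group = re.findall(' (?:to|be) ', S)
print(no_group)    # [' to ', ' be ', ' be ', ' to ']

# Two capturing groups: findall returns one tuple per match.
two_groups = re.findall(' (t|b)(o|e) ', S)
print(two_groups)  # [('t', 'o'), ('b', 'e'), ('b', 'e'), ('t', 'o')]
```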

12 Match one character of a range with [] and its negation with [^]
>>> re.findall(' ([a-z][a-z]) ', S)
['to', 'be', 'it', 'as', 'be', 'to']
>>> re.findall(' ([^0-9][^0-9]) ', S)
['to', 'be', 'it', 'as', 'be', 'to']
>>> re.findall(' ([a-e][a-e]) ', S)
['be', 'be']
>>> re.findall(' ([^a-e][^a-e]) ', S)
['to', 'it', 'to']
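Square brackets can also list individual characters instead of a range. As a small sketch under that reading, the patterns below split the two-letter words of the test string by whether they begin with a vowel. The vowel set is an illustration that does not appear in the slides, and the line breaks in S are assumed.

```python
import re

# Test string from the slides (line breaks between verses are assumed).
S = '''This above all: to thine own self be true,
And it must follow, as the night the day,
Thou canst not then be false to any man.'''

# First character is one of the listed vowels, second is any lowercase letter.
vowel_initial = re.findall(' ([aeiou][a-z]) ', S)
print(vowel_initial)      # ['it', 'as']

# First character is anything except a listed vowel.
non_vowel_initial = re.findall(' ([^aeiou][a-z]) ', S)
print(non_vowel_initial)  # ['to', 'be', 'be', 'to']
```

Note that a negated set such as [^aeiou] matches any character not listed, including punctuation and spaces, not only consonants.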

13 Match a number of repetitions of a character with {}
>>> re.findall(' ([a-z]{2}) ', S)
['to', 'be', 'it', 'as', 'be', 'to']
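Changing the number inside the braces changes the fixed length that is matched. The sketch below checks that {2} is equivalent to writing the character class twice and then asks for three-letter words instead. The three-letter query is an added illustration, and the line breaks in S are assumed.

```python
import re

# Test string from the slides (line breaks between verses are assumed).
S = '''This above all: to thine own self be true,
And it must follow, as the night the day,
Thou canst not then be false to any man.'''

# [a-z]{2} is shorthand for [a-z][a-z].
same = re.findall(' ([a-z]{2}) ', S) == re.findall(' ([a-z][a-z]) ', S)
print(same)  # True

# Three lowercase letters between spaces.
three = re.findall(' ([a-z]{3}) ', S)
print(three)  # ['own', 'the', 'the', 'not', 'any']
```

Words such as 'all:' and 'day,' are skipped because the character after the letters is punctuation rather than a space.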

14 Match any character with .
>>> re.findall(' (..) ', S)
['to', 'be', 'it', 'as', 'be', 'to']
>>> re.findall(' (.{2}) ', S)
['to', 'be', 'it', 'as', 'be', 'to']
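"Any character" has one exception in Python's re module: by default the dot does not match a newline, unless the re.DOTALL flag is passed. It does match spaces and punctuation. The strings below are hypothetical examples, not from the slides, and the flag is mentioned only as a sketch of the default behavior.

```python
import re

# Hypothetical two-line string, not from the slides.
text = "ab\ncd"

# By default the dot refuses to match the newline between 'b' and 'c'.
without_flag = re.findall("b.c", text)
print(without_flag)  # []

# With re.DOTALL the dot matches the newline as well.
with_flag = re.findall("b.c", text, flags=re.DOTALL)
print(with_flag)     # ['b\nc']

# The dot is not limited to letters: punctuation matches too.
emoticon = re.findall(" (..) ", "a :) b")
print(emoticon)      # [':)']
```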

15 Next time
4.2.7. and following

