LING 388: Computers and Language


1 LING 388: Computers and Language
Lecture 14

2 Administrivia
Homework 7 out today – due Friday night by midnight

3 Python regex recap
Unicode characters ok in Python 3.x
Summary:
  \w  a character [A-Za-z0-9_]
  \d  [0-9]
  \b  word boundary
  \s  space character [ \t\n\r\f\v]
Operators:
  *    zero or more repeats
  +    one or more repeats
  ( )  grouping
Raw string (avoid escaping \): r"\w+"
Negation:
  \W  anything not in \w
  \D  anything not in \d
Methods:
  m = re.search(pattern, string)   return match object or None
  m = re.match(pattern, string)
  l = re.findall(pattern, string)  return list of strings/tuples
Full Documentation:
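A minimal sketch of how the three methods in the recap differ. The sample string and variable names are made up for illustration and are not from the lecture.

```python
import re

# Hypothetical sample string, not from the slides
text = "Homework 7 is due Friday"

# re.search scans the whole string and returns the first match (or None)
m = re.search(r"\d+", text)
first_number = m.group(0) if m else None

# re.match only succeeds if the pattern matches at the start of the string
at_start = re.match(r"\d+", text)       # None: the text starts with "H"
word_at_start = re.match(r"\w+", text)  # matches "Homework"

# re.findall returns every non-overlapping match as a list of strings
words = re.findall(r"\w+", text)

print(first_number)            # 7
print(at_start)                # None
print(word_at_start.group(0))  # Homework
print(words)                   # ['Homework', '7', 'is', 'due', 'Friday']
```

Because re.search and re.match can return None, it is safest to test the result before calling .group().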

4 Python regex
More examples from

5 The trouble with re.findall()
Only capturing groups (…) are reported
Example:
>>> text = "ababcababababacabd"
>>> import re
>>> re.findall(r'(ab)+', text)
['ab', 'ab', 'ab']
>>> re.findall(r'((ab)+)', text)
[('abab', 'ab'), ('abababab', 'ab'), ('ab', 'ab')]

6 The trouble with re.findall()
Example (using list comprehension; the loop variable is named t so it does not shadow the built-in tuple):
>>> text = "ababcababababacabd"
>>> [t[0] for t in re.findall(r'((ab)+)', text)]
['abab', 'abababab', 'ab']
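Two other ways to get the whole match out of a repeated group, offered as a sketch rather than as the approach used in the lecture: a non-capturing group (?:...) and re.finditer(). Both are standard features of Python's re module; the variable names are illustrative.

```python
import re

text = "ababcababababacabd"

# (?:...) groups without capturing, so findall reports the whole match
whole = re.findall(r"(?:ab)+", text)

# finditer yields match objects; group(0) is always the whole match
whole_iter = [m.group(0) for m in re.finditer(r"(ab)+", text)]

print(whole)       # ['abab', 'abababab', 'ab']
print(whole_iter)  # ['abab', 'abababab', 'ab']
```

The non-capturing form is the shortest fix when the group is only needed for repetition, while finditer is handy when the match positions are also needed.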

7 Review examples
Regex for money:
  $ followed by digits
  comma (for thousands, optional)
  decimal point (optional)
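One possible pattern for the money description above, as a sketch. The exact regex from the lecture is not in the transcript, so the pattern and the sample sentence here are assumptions.

```python
import re

# Assumed pattern, not the one from the slides:
#   \$            a literal dollar sign ($ alone would mean end of line)
#   \d+           one or more digits
#   (?:,\d{3})*   optional thousands groups such as ,000
#   (?:\.\d+)?    optional decimal point followed by digits
money = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?")

# Hypothetical sample sentence
text = "Tickets cost $5, lunch was $12.50, and the car was $23,000."

amounts = money.findall(text)
print(amounts)  # ['$5', '$12.50', '$23,000']
```

The groups are non-capturing so that findall returns the whole amount instead of the last thousands or decimal group.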

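8 Python regex
Other useful meta-characters:
  ^   matches beginning of line (beginning of the string by default; each line with re.MULTILINE)
  $   matches end of line (end of the string by default; each line with re.MULTILINE)
  \n  backreference, where n = group number (\1, \2, ...): must match the same text that group n matched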
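A short sketch of the anchors and a backreference in use. The sample strings and variable names are invented for illustration.

```python
import re

# ^ and $ anchor the pattern to the start and end of the string
all_digits = bool(re.search(r"^\d+$", "2018"))
not_all_digits = bool(re.search(r"^\d+$", "in 2018"))
print(all_digits)      # True
print(not_all_digits)  # False

# With re.MULTILINE, ^ and $ also match at the start and end of each line
lines = "first line\nsecond line"
line_starts = re.findall(r"^\w+", lines, re.MULTILINE)
print(line_starts)  # ['first', 'second']

# \1 must repeat exactly what group 1 matched: here, a doubled word
doubled = re.search(r"\b(\w+) \1\b", "it was the the best")
print(doubled.group(0))  # the the
```

Writing the patterns as raw strings matters here: without the r prefix, "\1" would be read by Python as a character escape rather than passed to the regex engine as a backreference.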

9 Python's re module

10 Python's re module

11 Python's re module

12 Homework 7
What went wrong on the High Street in 2018?
?intlink_from_url= ext/long-reads&link_location=live-reporting-story
hw7.txt
Using regexes in Python:
  Find the numbers in the article. List them. How many of them are there?
  Find all the named entities (approximately everything beginning with an uppercase letter, denoting people, places, organizations etc.), e.g. Toys R Us or New Look. List them. How many of them are there?
  How could you filter out the words at the beginning of each sentence that aren't really named entities? Show your code. How many named entities now?
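A rough sketch of the kind of code the three tasks call for, run on a made-up three-sentence sample rather than on hw7.txt. The patterns and the sentence-start heuristic (drop a candidate only if it is a single word that begins a sentence) are assumptions, not the expected answer.

```python
import re

# Hypothetical sample text; the real assignment uses hw7.txt
text = ("Toys R Us closed 100 stores in 2018. The chain New Look cut 60 shops. "
        "Shoppers spent 3.5% less.")

# Numbers: digits, optionally continued by ,ddd or .dd style groups
numbers = re.findall(r"\d+(?:[.,]\d+)*", text)

# Candidate named entities: runs of capitalized words separated by spaces
entity = re.compile(r"[A-Z]\w*(?: [A-Z]\w*)*")
candidates = [m.group(0) for m in entity.finditer(text)]

def sentence_initial(m):
    # True if the match starts the text or follows sentence-ending punctuation
    return re.search(r"(?:^|[.!?]\s+)$", text[:m.start()]) is not None

# Heuristic: drop single capitalized words that merely begin a sentence
filtered = [m.group(0) for m in entity.finditer(text)
            if not (sentence_initial(m) and " " not in m.group(0))]

print(numbers)     # ['100', '2018', '60', '3.5']
print(candidates)  # ['Toys R Us', 'The', 'New Look', 'Shoppers']
print(filtered)    # ['Toys R Us', 'New Look']
```

The heuristic is deliberately crude: it would also discard a genuine one-word entity that happens to start a sentence, so the counts on a real article are only approximate.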

13 Homework 7
One PDF file
Show your Python work
Submit to me by Friday night

