Methods in Computational Linguistics II with reference to Matt Huenerfauth’s Language Technology material Lecture 4: Matching Things. Regular Expressions.



Today
Regular Expressions
Snippet on Speech Recognition
–At least half of it.

Regular Expressions
Can be viewed as a way to:
–Specify search patterns over a text string
–Design a particular kind of machine, a Finite State Automaton (FSA); we probably won’t cover this today
–Define a formal “language”, i.e. a set of strings

Uses of Regular Expressions
Simple, powerful tools for large-corpus analysis and ‘shallow’ processing
–What word is most likely to begin a sentence?
–What word is most likely to begin a question?
–Are you more or less polite than the people you correspond with?
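As a rough illustration of the first question, the sketch below counts sentence-initial words in a toy string. The corpus text and the rule for what starts a sentence are assumptions made for this example, not part of the lecture.

```python
import re
from collections import Counter

# Hypothetical toy corpus; any plain-text string would do.
corpus = "The cat sat. The dog barked! Did the cat care? The end."

# Crude assumption: a word is sentence-initial if it starts the text
# or follows '.', '!' or '?' plus whitespace.
first_words = re.findall(r"(?:^|[.!?]\s+)(\w+)", corpus)

# Tally the sentence-initial words and pick the most frequent one.
counts = Counter(first_words)
most_common_first = counts.most_common(1)[0]
print(most_common_first)
```

On real text this rule misfires on abbreviations such as "Dr. Smith", so a proper sentence splitter would be the safer choice.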

Definitions
Regular Expression: a formula in algebraic notation for specifying a set of strings
String: any sequence of characters
Regular Expression Search
–Pattern: specifies the set of strings we want to search for
–Corpus: the texts we want to search through

Simple Example

More Examples

And still more examples

Optionality and Repetition
/[Ww]oodchucks?/
/colou?r/
/he{3}/
/(he){3}/
/(he){3,}/
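A small sketch of how these patterns behave in Python, assuming the last pattern is meant as (he){3,}, i.e. three or more repetitions. The helper name accepts is hypothetical; it uses re.fullmatch so that the whole string must match.

```python
import re

def accepts(pattern, s):
    """Return True if the entire string s matches pattern (hypothetical helper)."""
    return re.fullmatch(pattern, s) is not None

# '?' makes the preceding character optional.
print(accepts(r"[Ww]oodchucks?", "woodchuck"))   # True
print(accepts(r"[Ww]oodchucks?", "Woodchucks"))  # True
print(accepts(r"colou?r", "color"))              # True

# '{3}' applies to the preceding item only: 'e' here, the group '(he)' in the next pattern.
print(accepts(r"he{3}", "heee"))                 # True
print(accepts(r"he{3}", "hehehe"))               # False
print(accepts(r"(he){3}", "hehehe"))             # True

# '{3,}' means three or more repetitions.
print(accepts(r"(he){3,}", "hehehehe"))          # True
print(accepts(r"(he){3,}", "hehe"))              # False
```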

Character Groups
Some groups of characters are used very frequently, so the RE language includes shorthands for them
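The slide's table of shorthands is not preserved in this transcript. As a sketch, here are three standard ones in Python's re module (\d for digits, \w for word characters, \s for whitespace), which may or may not be the exact set shown in the lecture. The example string is made up.

```python
import re

# Hypothetical example string.
text = "Room 42, call 555-0199 by 9am"

# \d matches a digit; \d+ matches a run of digits.
digits = re.findall(r"\d+", text)

# \w matches a letter, digit or underscore.
words = re.findall(r"\w+", text)

# \s matches whitespace (space, tab, newline, ...).
chunks = re.split(r"\s+", text)

print(digits)
print(words)
print(chunks)
```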

Special Characters
These enable the matching of multiple occurrences of a pattern

Escape Characters
Sometimes you want to use an asterisk “*” as an asterisk and not as a modifier.
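In Python, a backslash before the asterisk makes it literal, and re.escape does the escaping for you. The example string is an assumption made for illustration.

```python
import re

# Hypothetical example string containing literal asterisks.
text = "2 * 3 = 6 and a*b"

# Escaped: '\*' matches a literal asterisk.
stars = re.findall(r"\*", text)

# Unescaped: 'a*b' means "zero or more a, then b", so only 'b' matches here.
unescaped = re.findall(r"a*b", "a*b")

# re.escape builds a pattern that matches the given string literally.
literal_pattern = re.escape("a*b")
found = re.search(literal_pattern, text).group()

print(stars, unescaped, literal_pattern, found)
```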

RE Matching in Python NLTK
Set up:
–import re
–from nltk.util import re_show
–sent = "colourless green ideas sleep furiously"
re_show(pattern, string)
–shows where the pattern matches
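If NLTK is not installed, a similar view can be approximated with the standard library alone. The helper mark_matches below is a hypothetical stand-in, written on the assumption that re_show marks each match with curly braces by default. It returns the marked string rather than printing it.

```python
import re

def mark_matches(pattern, string, left="{", right="}"):
    """Wrap every match of pattern in string with left/right markers.

    Hypothetical stand-in for nltk.util.re_show, not NLTK's own code.
    """
    return re.sub(pattern, lambda m: left + m.group(0) + right, string)

sent = "colourless green ideas sleep furiously"
marked = mark_matches(r"een|eep", sent)
print(marked)
```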

Substitutions
Replace every l with an s
re.sub('l', 's', sent)
–'cosoursess green ideas sseep furioussy'
re.sub('green', 'red', sent)
–'colourless red ideas sleep furiously'

Findall
re.findall(pattern, sent)
–will return all of the substrings that match the pattern
–re.findall('(green|sleep)', sent)
['green', 'sleep']

Match
Matches from the beginning of the string
re.match(pattern, string)
–Returns: a Match object, or None if the pattern does not match at the start of the string
Match objects contain information about the match

Methods in Match

More Match Methods
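The contents of the two slides on match methods are not preserved in this transcript. As a sketch, here are some standard methods of Python match objects, which may not be exactly the ones the slides listed.

```python
import re

m = re.match(r"(\w+) (\w+)", "colourless green ideas")

whole = m.group()               # the entire matched text
first = m.group(1)              # text captured by the first group
both = m.groups()               # all captured groups as a tuple
start, end = m.start(), m.end() # offsets of the whole match
second_span = m.span(2)         # (start, end) of the second group

print(whole, first, both, start, end, second_span)

# re.match only tries at the start of the string, so this is None.
no_match = re.match(r"green", "colourless green ideas")
print(no_match)
```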

Search
re.search(pattern, string)
–Finds the pattern anywhere in the string.
–re.search(r'\d+', ' 1034 ').group()
'1034'
–re.search(r'\d+', ' abc123 ').group()
'123'
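A short sketch contrasting re.match with re.search on the same input. Both return None on failure, so calling .group() directly raises an error when nothing is found. The helper first_number is hypothetical and shows one way to guard against that.

```python
import re

s = " abc123 "

# match anchors at the start of the string: the leading space is not a digit.
at_start = re.match(r"\d+", s)

# search scans forward until the pattern fits somewhere.
anywhere = re.search(r"\d+", s)

def first_number(text):
    """Return the first run of digits in text, or None (hypothetical helper)."""
    m = re.search(r"\d+", text)
    return m.group() if m else None

print(at_start, anywhere.group(), first_number("no digits here"))
```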

Splitting
'text can be made into lists'.split()
re.split(pattern, string)
–uses the pattern to identify the split point
–re.split(r'\d+', "I want 4 cats and 13 dogs")
["I want ", " cats and ", " dogs"]
–re.split(r'\s*\d+\s*', "I want 4 cats and 13 dogs")
["I want", "cats and", "dogs"]

Joining
' '.join(['lists', 'can', 'be', 'made', 'into', 'strings'])
This simple formatting can be helpful to report results or merge information
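Splitting and joining are often used together. The sketch below splits a sentence on its numbers and joins the pieces back with a separator; the " ... " separator is an arbitrary choice for illustration.

```python
import re

sent = "I want 4 cats and 13 dogs"

# Split on each number together with its surrounding whitespace.
pieces = re.split(r"\s*\d+\s*", sent)

# Join the pieces back into one string with a visible separator.
joined = " ... ".join(pieces)

# Plain whitespace splitting and joining round-trips this sentence.
round_trip = " ".join(sent.split())

print(pieces)
print(joined)
```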

Stemming with Regular Expressions
import re

def stem(word):
    # Non-greedy stem, then an optional known suffix at the end of the word
    regexp = r'^(.*?)(ing|ly|ed|ious|ies|ive|es|s|ment)?$'
    stem, suffix = re.findall(regexp, word)[0]
    return stem
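A quick way to see what this stemmer does is to run it over a few words. The function is repeated here so the block runs on its own; the word list is an assumption made for illustration.

```python
import re

def stem(word):
    # Same pattern as on the slide: a non-greedy stem plus an optional suffix.
    regexp = r'^(.*?)(ing|ly|ed|ious|ies|ive|es|s|ment)?$'
    stem, suffix = re.findall(regexp, word)[0]
    return stem

# Hypothetical test words.
words = ["processing", "ponds", "movement", "language", "lying"]
stems = [stem(w) for w in words]
print(stems)
```

Because the stem is matched non-greedily, the pattern strips the longest suffix it can, which over-stems short words: "lying" comes out as "ly".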

Play with some code

Snippet on Speech Recognition