REGULAR EXPRESSIONS 1
DAY 6 - 9/08/14
LING 3820 & 6820 Natural Language Processing
Harry Howard, Tulane University

Course organization
The syllabus is under construction.
08-Sept-2014, NLP, Prof. Howard, Tulane University

Review
The quiz was the review.

Open Spyder

§4. Regular expressions

Regular expressions, or regex
>>> import re
re.findall(pattern, target string)
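As a minimal sketch of the call pattern above: `re.findall` takes the pattern first and the string to search second, and returns a list of the non-overlapping matches. The target string `text` is made up for illustration and is not from the slides.

```python
import re

# Hypothetical target string, chosen only for illustration.
text = 'to be or not to be'

# findall returns every non-overlapping match, left to right, as a list.
matches = re.findall('be', text)
print(matches)  # ['be', 'be']

# When nothing matches, the result is an empty list rather than an error.
print(re.findall('xyz', text))  # []
```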

4.2. Fixed-length matching

The test string
>>> S = '''This above all: to thine own self be true,
... And it must follow, as the night the day,
... Thou canst not then be false to any man.'''

Strings as regular expressions
>>> re.findall(' be ', S)
[' be ', ' be ']
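The spaces in the pattern matter. The sketch below uses the same test string with a different word, 'the' (a choice made here for illustration, not taken from the slides), to show that a bare string also matches inside longer words.

```python
import re

S = '''This above all: to thine own self be true,
And it must follow, as the night the day,
Thou canst not then be false to any man.'''

# Without the spaces, the pattern also matches inside a longer word ("then").
bare = re.findall('the', S)

# With the spaces, only the free-standing word matches.
spaced = re.findall(' the ', S)

print(len(bare), len(spaced))  # 3 2
```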

Match one alternative of a disjunction with |
>>> re.findall(' to | be | it | as ', S)
[' to ', ' be ', ' it ', ' as ', ' be ', ' to ']
>>> set(re.findall(' to | be | it | as ', S))
set([' it ', ' as ', ' to ', ' be '])
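The `set([...])` display above is how Python 2 prints a set. Python 3 prints the same set with braces, and in either version the order of the elements is arbitrary. A small sketch, assuming Python 3, that sorts the unique matches to get a stable list:

```python
import re

S = '''This above all: to thine own self be true,
And it must follow, as the night the day,
Thou canst not then be false to any man.'''

hits = re.findall(' to | be | it | as ', S)

# A set removes the duplicates; sorting turns it back into an ordered list.
unique = sorted(set(hits))
print(unique)  # [' as ', ' be ', ' it ', ' to ']
```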

Match a group of characters with capturing or non-capturing parentheses, ()
>>> re.findall(' (to|be|it|as) ', S)
['to', 'be', 'it', 'as', 'be', 'to']
>>> re.findall(' (?:to|be|it|as) ', S)
[' to ', ' be ', ' it ', ' as ', ' be ', ' to ']
By default, parentheses capture the string that they match, and re.findall returns only the captured part. The ?: prefix turns capturing off, so the whole match is returned, spaces included. For the rest of this discussion, we use capturing parentheses in order to exclude the spaces from the output.
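To see what "capture" means, it can help to look at a single match object. The sketch below uses `re.search`, which the slides do not introduce; it returns the first match only, and `group(0)` is the whole match while `group(1)` is the part inside the first pair of parentheses.

```python
import re

S = '''This above all: to thine own self be true,
And it must follow, as the night the day,
Thou canst not then be false to any man.'''

# re.search stops at the first match and returns a match object.
m = re.search(' (to|be|it|as) ', S)

whole = m.group(0)     # everything the pattern matched, spaces included
captured = m.group(1)  # only what the parentheses matched

print(repr(whole))     # ' to '
print(repr(captured))  # 'to'
```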

Match one character of a range with [] and its negation with [^]
>>> re.findall(' ([a-z][a-z]) ', S)
['to', 'be', 'it', 'as', 'be', 'to']
>>> re.findall(' ([^0-9][^0-9]) ', S)
['to', 'be', 'it', 'as', 'be', 'to']
>>> re.findall(' ([a-e][a-e]) ', S)
['be', 'be']
>>> re.findall(' ([^a-e][^a-e]) ', S)
['to', 'it', 'to']
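The patterns [a-z][a-z] and [^0-9][^0-9] happen to give the same list on the test string, but they are not equivalent: a negated range accepts every character outside the range, including capitals and punctuation. A sketch with a made-up list of candidates (not from the slides), using `re.fullmatch`, which requires the whole string to match and is available in Python 3.4 and later:

```python
import re

# Hypothetical two-character strings, chosen only for illustration.
candidates = ['to', 'TO', ':)', '42', 'be']

# Two lowercase letters.
lower = [c for c in candidates if re.fullmatch('[a-z][a-z]', c)]

# Two characters that are anything except a digit.
nondigit = [c for c in candidates if re.fullmatch('[^0-9][^0-9]', c)]

print(lower)     # ['to', 'be']
print(nondigit)  # ['to', 'TO', ':)', 'be']
```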

Match a number of repetitions of a character with {}
>>> re.findall(' ([a-z]{2}) ', S)
['to', 'be', 'it', 'as', 'be', 'to']
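Changing the number in the braces changes the length of the match. As a sketch on the same test string, {3} picks out the three-letter lowercase words that have a space on each side, and gives the same result as writing the range three times.

```python
import re

S = '''This above all: to thine own self be true,
And it must follow, as the night the day,
Thou canst not then be false to any man.'''

# Exactly three lowercase letters between two spaces.
three = re.findall(' ([a-z]{3}) ', S)
print(three)  # ['own', 'the', 'the', 'not', 'any']

# The braces are shorthand for repeating the range by hand.
spelled_out = re.findall(' ([a-z][a-z][a-z]) ', S)
print(three == spelled_out)  # True
```

Words such as "all:" and "day," are skipped because the punctuation mark sits where the pattern requires a space.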

Match any character with .
>>> re.findall(' (..) ', S)
['to', 'be', 'it', 'as', 'be', 'to']
>>> re.findall(' (.{2}) ', S)
['to', 'be', 'it', 'as', 'be', 'to']
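One caveat: by default the period matches any character except a newline. A sketch with a made-up string (not from the slides); the `re.DOTALL` flag, which the slides do not cover, lifts that restriction.

```python
import re

# Hypothetical string with a letter, a hyphen and a newline between a and c.
text = 'abc a-c a\nc'

# By default, . does not match the newline in the third candidate.
default = re.findall('a.c', text)
print(default)  # ['abc', 'a-c']

# With re.DOTALL, . matches the newline as well.
dotall = re.findall('a.c', text, flags=re.DOTALL)
print(dotall)  # ['abc', 'a-c', 'a\nc']
```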

Next time
and following