LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 4: 8/30.



Administrivia
Homework 1 note:
– some of you have already submitted Homework 1
– there is some question as to whether you should treat the letter ‘y’ as a consonant or a vowel
– I just used the traditional 5 (orthographic) vowels: a, e, i, o and u
– but either way is fine, up to you, as long as you state your assumptions
– when it comes to palindromes, I’m looking for letter sequences within a word only, e.g. levels or racecar

Regexp: Recap
Grouping
– metacharacters ( and ) delimit a group
– inside a regexp, each group can be referenced using backreferences \1, \2, and so on...
– outside a regexp, each group is stored in a variable $1, $2, and so on...
Example:
    open (F,$ARGV[0]) or die "$ARGV[0] not found!\n";
    while (<F>) {
        print $1, "\n" if (/(\b\w*([aeiou])\2\w*\b)/);
    }
Grouping:
– heed: group 1 ($1) is the whole word heed; group 2 is the first e, which \2 must match again
– books: group 1 ($1) is the whole word books; group 2 is the first o, which \2 must match again
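
The recap example reads from a file; the same pattern can be tried on plain strings. The following is a minimal sketch, assuming a hypothetical helper named first_doubled_vowel_word and made-up test lines that are not from the lecture:

```perl
use strict;
use warnings;

# Hypothetical helper: return the first word that contains a doubled vowel,
# or nothing if the line has no such word.
sub first_doubled_vowel_word {
    my ($line) = @_;
    # Group 1 is the whole word; group 2 is one vowel that \2 must repeat.
    if ($line =~ /(\b\w*([aeiou])\2\w*\b)/) {
        return $1;
    }
    return;
}

# Made-up test lines (assumptions, not lecture data).
for my $line ("take heed", "read books", "no match here") {
    my $word = first_doubled_vowel_word($line);
    print "$line -> ", (defined $word ? $word : "(none)"), "\n";
}
# take heed -> heed
# read books -> books
# no match here -> (none)
```

Note that "read" is skipped: ea is two vowels, but \2 requires the same vowel twice.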

More on Perl and regexps
In the previous example:
    ...
    print $1 if (/(\b\w*([aeiou])\2\w*\b)/);
    ...
by default the match is made against the variable $_
We can also match against a variable of our own choosing using the =~ operator
Example:
    $x = "this string";
    if ($x =~ /^this/) { print "ok" }
Note: =~ returns 1 (to be interpreted as boolean true) when there is a match, '' (false) otherwise
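
To see the two return values directly, a small sketch along these lines may help. The variable names $hit and $miss and the second test string are assumptions added for illustration:

```perl
use strict;
use warnings;

my $x = "this string";

# In scalar context a successful match yields 1 ...
my $hit  = ($x =~ /^this/);
# ... and a failed match yields the empty string, which is false.
my $miss = ($x =~ /^that/);

print "hit is '$hit'\n";     # hit is '1'
print "miss is '$miss'\n";   # miss is ''

# Without =~, the pattern is matched against $_ (set here by the for loop).
for ("this string", "another string") {
    print "'$_' starts with this\n" if /^this/;
}
```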

More on Perl and regexps
Matching is by default case sensitive: this can be changed using the modifier i
    /regexp/i
Example:
– /the/i and
– /[tT][hH][eE]/
are equivalent
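
The equivalence can be spot-checked on a few words. This is only a sketch: the helper name both_results and the word list are assumptions, and agreement on a handful of inputs illustrates the claim rather than proving it.

```perl
use strict;
use warnings;

# Hypothetical helper: return the result of each pattern as 1 or 0.
sub both_results {
    my ($w) = @_;
    my $with_i     = ($w =~ /the/i)         ? 1 : 0;
    my $with_class = ($w =~ /[tT][hH][eE]/) ? 1 : 0;
    return ($with_i, $with_class);
}

# Made-up test words (assumptions).
for my $w ("the", "The", "THE", "tHe", "then", "ten") {
    my ($a, $b) = both_results($w);
    print "$w: /the/i=$a  /[tT][hH][eE]/=$b\n";
}
```

Both columns should agree on every line; only "ten" fails to match.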

Perl and regexps
Normally, Perl takes the first match only
Multiple matches within a string can be made using the g modifier with a loop:
    /regexp/g
Example:
    $x = "the cat sat on the mat";
    while ( $x =~ /the/g ) { print "match!\n" }
prints match! twice
Note: the number 0, the strings '0' and '', the empty list (), and undef are all false in a boolean context. All other values are true.
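
The same loop can count matches instead of printing. A minimal sketch, assuming a hypothetical helper count_matches; \Q...\E is used so the search text is treated literally:

```perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping occurrences of $word in $text.
sub count_matches {
    my ($text, $word) = @_;
    my $count = 0;
    # Each successful /g match in scalar context resumes after the last one;
    # the loop ends when the match fails (returns false).
    $count++ while $text =~ /\Q$word\E/g;
    return $count;
}

print count_matches("the cat sat on the mat", "the"), "\n";   # 2
print count_matches("the cat sat on the mat", "at"), "\n";    # 3
```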

Perl and regexps
For multiple matching cases:
    /regexp/g
Perl must “remember” where in the string it has matched up to
The function pos string can be used to keep track of where it is
Example:
    $x = "heed head book";
    while ($x =~ /([aeiou])\1/g) {
        print "Match ends at position ", pos $x, "\n"
    }
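
Collecting the values of pos makes the bookkeeping visible. A minimal sketch, assuming a hypothetical helper match_end_positions; offsets are counted from 0, so pos is the index just past the match:

```perl
use strict;
use warnings;

# Hypothetical helper: return the end offset of every doubled-vowel match.
sub match_end_positions {
    my ($text) = @_;
    my @ends;
    while ($text =~ /([aeiou])\1/g) {
        push @ends, pos($text);   # offset just past the current match
    }
    return @ends;
}

print join(", ", match_end_positions("heed head book")), "\n";   # 3, 13
```

The ee of heed occupies offsets 1 and 2, so the match ends at 3; the oo of book occupies 11 and 12, so it ends at 13. The ea of head is not a doubled vowel and is skipped.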

Perl and regexps
Substitution:
– s/re1/re2/
– replace the text matched by the 1st regexp (re1) with re2 (re2 is a replacement string, not a pattern; it can refer to groups via $1, $2, ...)
– s/re1/re2/g
– global (g) replacement version
– relevant if re1 occurs more than once in a line
Example:
    $x = "\$150 million was spent";
    $x =~ s/(\d+) million/$1,000,000/;
=~ means apply the expression on the right hand side of =~ to the string referenced on the left hand side
NB. $ in the string needs to be escaped in Perl because $ starts a variable
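
The difference between the plain and the global version shows up once the pattern occurs twice. A minimal sketch; the second test string and the variable names are assumptions added for illustration:

```perl
use strict;
use warnings;

# The slide's example: $1 holds the digits captured by (\d+).
my $x = "\$150 million was spent";
(my $expanded = $x) =~ s/(\d+) million/$1,000,000/;
print "$expanded\n";     # $150,000,000 was spent

# Made-up string with two occurrences (an assumption, not lecture data).
my $text = "5 million here, 7 million there";
(my $first_only = $text) =~ s/(\d+) million/$1,000,000/;    # first match only
(my $all        = $text) =~ s/(\d+) million/$1,000,000/g;   # every match
print "$first_only\n";   # 5,000,000 here, 7 million there
print "$all\n";          # 5,000,000 here, 7,000,000 there
```

The (my $copy = $original) =~ s/.../.../ idiom substitutes in a copy, leaving the original string unchanged.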

Perl and regexps
Homework file wsj2000.txt contains sentences with spaces separating punctuation characters
Let’s try writing a Perl program to eliminate those extra spaces
Example:
– Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .
– Mr. Vinken is chairman of Elsevier N.V. , the Dutch publishing group .
Modified:
– Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.
– Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group.

Perl and regexps
Sample code:
    open (F,$ARGV[0]) or die "$ARGV[0] not found!\n";
    while (<F>) {
        s/ ([.,?;:])/$1/g;
        print;
    }
Example:
– Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .
– Mr. Vinken is chairman of Elsevier N.V. , the Dutch publishing group .
Modified:
– Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.
– Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group.
Concepts: search and replace, grouping, global
print takes $_ as default
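
The substitution can also be tried without the homework file by wrapping it in a function. A minimal sketch, assuming a hypothetical helper tighten_punctuation and input in which each punctuation mark is preceded by a single space:

```perl
use strict;
use warnings;

# Hypothetical helper: delete the space in front of . , ? ; :
sub tighten_punctuation {
    my ($line) = @_;
    # Group 1 captures the punctuation mark so it can be put back via $1.
    $line =~ s/ ([.,?;:])/$1/g;
    return $line;
}

my @sentences = (
    "Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .",
    "Mr. Vinken is chairman of Elsevier N.V. , the Dutch publishing group .",
);
print tighten_punctuation($_), "\n" for @sentences;
# Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.
# Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group.
```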

Perl and regexps
Good start, but more needed...
– `` '' 's % $
– exercise for the reader...
Examples:
– `` We have no useful information on whether users are at risk, '' said James A. Talcott of Boston 's Dana- Farber Cancer Institute.
– Exports in October stood at $ 5.29 billion, a mere 0.7 % increase from a year earlier, while imports increased sharply to $ 5.39 billion, up 20 % from last October.
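
One possible direction for these remaining cases is a short chain of substitutions, one per symbol. This is only a sketch under stated assumptions, not the intended homework answer: the helper name tighten_more is hypothetical, the rules cover just the five symbols listed, and other leftovers such as the space in "Dana- Farber" are deliberately not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: close up the spaces around `` '' 's % and $.
sub tighten_more {
    my ($line) = @_;
    $line =~ s/`` /``/g;           # no space after opening quotes
    $line =~ s/ ''/''/g;           # no space before closing quotes
    $line =~ s/ 's\b/'s/g;         # attach possessive 's to the previous word
    $line =~ s/ %/%/g;             # attach % to the number before it
    $line =~ s/\$ (?=\d)/\$/g;     # attach $ to the number after it
    return $line;
}

print tighten_more(q{Exports stood at $ 5.29 billion, a mere 0.7 % increase}), "\n";
# Exports stood at $5.29 billion, a mere 0.7% increase
print tighten_more(q{`` We have no useful information, '' said Talcott of Boston 's institute.}), "\n";
# ``We have no useful information,'' said Talcott of Boston's institute.
```

The lookahead (?=\d) keeps the $ rule from firing when the dollar sign is not followed by a number.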

Next time
New topic: Regular grammars
Make sure you’ve installed SWI-Prolog...