LING/C SC/PSYC 438/538 Lecture 10 Sandiway Fong.

Administrivia: Homework 4
Perl regex; Python re (import re)
String handling in Python is slightly complicated: use raw strings, e.g. r"\bcat\b"
https://docs.python.org/3/library/re.html
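A small Python sketch of why raw strings matter (the word "cat" and the test sentence are made up for illustration): in an ordinary string literal, \b is turned into a backspace character before re ever sees it, so the pattern silently fails.

```python
import re

# In an ordinary string literal, "\b" is the backspace character (\x08),
# so re never receives a word-boundary escape.
plain = "\bcat\b"
# A raw string keeps the backslash: re receives the two characters \ and b.
raw = r"\bcat\b"

text = "cat concat cat."
print(plain == "\x08cat\x08")    # True
print(re.findall(raw, text))     # ['cat', 'cat']  ("concat" has no boundary before "cat")
print(re.findall(plain, text))   # []  (looks for literal backspace characters)
```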

Regular Expressions to the rescue https://xkcd.com/208/

Regular Expressions from Hell
Email validation: RFC 5322:
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

File I/O Summary
Common (Perl and Python): open, close, filehandle (the concept comes from the underlying OS)
Perl: https://perldoc.perl.org/perlopentut.html
<filehandle> (depends on context: in scalar context it reads one line; in list context it reads the whole file as a list of lines)
print filehandle String
Python: https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files
.read(), .readline(), .readlines() (methods)
.write(String) (adds no newline)
print(*objects, sep=' ', end='\n', file=sys.stdout, flush=False) (function)
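For comparison, a minimal Python sketch of the methods listed above, assuming a throwaway file created in a fresh temporary directory (the file name sample.txt is hypothetical). The with statement calls .close() automatically when the block ends.

```python
import os
import tempfile

# Hypothetical scratch file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")

with open(path, "w", encoding="utf-8") as f:
    f.write("line one\n")        # .write() adds no newline, so we supply it
    print("line two", file=f)    # print() ends with end='\n' by default

with open(path, encoding="utf-8") as f:
    first = f.readline()         # one line, including its trailing '\n'
    rest = f.readlines()         # the remaining lines, as a list

with open(path, encoding="utf-8") as f:
    whole = f.read()             # the entire file as a single string

print(repr(first))  # 'line one\n'
print(rest)         # ['line two\n']
print(repr(whole))  # 'line one\nline two\n'
```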

Homework 4
File: population.txt (Source: Wikipedia)
Contents: rank, name, continent, population (2016), population (2017)
Fields are separated by a tab (\t)

Homework 4: Question 1
Using Perl
Hints: read the file; create hash table(s) indexed by country name containing the following information: continent, 2016 population, 2017 population
Compute and print the country that decreased in population.
Compute and print the country with the smallest positive increase in population.
Print a table of countries in Asia and their 2016 population, ranked by 2016 population.
Print a table of countries in Africa and their 2016 population, ranked inversely by 2016 population.
Hints:
read about split
read about tr: $num =~ tr/,//d deletes the pesky commas in $num
revisit sort parameters: https://perldoc.perl.org/functions/sort.html
if you need to trim whitespace from the ends: $line =~ s/^\s+|\s+$//g;
for nicely-formatted lists, read about printf FORMAT: http://perldoc.perl.org/functions/sprintf.html
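Python analogues of the Perl hints, as a sketch on a made-up line in the population.txt layout (Exampleland and all numbers are hypothetical, not Wikipedia data). str.translate with a deletion table plays the role of tr/,//d, re.sub replaces every match by default, like s///g, and sorted() with a key function stands in for a custom sort block.

```python
import re

# Hypothetical line in the population.txt layout (rank, name, continent, 2016, 2017).
line = "  1\tExampleland\tAsia\t1,234\t1,200  \n"

# Perl: $line =~ s/^\s+|\s+$//g;   re.sub replaces every match by default, like /g.
trimmed = re.sub(r"^\s+|\s+$", "", line)

# Perl: $num =~ tr/,//d;   the third argument of str.maketrans lists characters to delete.
num = "1,234"
digits = num.translate(str.maketrans("", "", ","))

# Perl: sort { $pop{$a} <=> $pop{$b} } keys %pop;   sorted() with a key function.
pop = {"A": 300, "B": 100, "C": 200}  # hypothetical populations
ascending = sorted(pop, key=lambda name: pop[name])
descending = sorted(pop, key=lambda name: pop[name], reverse=True)

print(repr(trimmed))  # '1\tExampleland\tAsia\t1,234\t1,200'
print(int(digits))    # 1234
print(ascending)      # ['B', 'C', 'A']
print(descending)     # ['A', 'C', 'B']
```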

Homework 4: Question 2 Review
Do the same exercise in Python 3 using a dictionary or dictionaries.
These may prove useful: str.strip(), str.replace(), str.split(), sys.argv, int()
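One possible shape for the data-loading part of Question 2, sketched under assumptions: each data row has exactly five tab-separated fields and starts with a numeric rank, and any other line (such as a header) is skipped. The function names load and main are hypothetical; the sketch builds the dictionary and prints each country's population change, but does not answer the homework questions.

```python
import sys


def load(lines):
    """Build {country: (continent, pop2016, pop2017)} from tab-separated lines."""
    table = {}
    for line in lines:
        fields = [field.strip() for field in line.split("\t")]
        # Assumption: data rows have 5 fields and start with a numeric rank.
        if len(fields) != 5 or not fields[0].isdigit():
            continue
        _rank, name, continent, p2016, p2017 = fields
        table[name] = (continent,
                       int(p2016.replace(",", "")),   # strip commas, then convert
                       int(p2017.replace(",", "")))
    return table


def main(argv):
    if len(argv) < 2:
        print(f"usage: {argv[0]} population.txt", file=sys.stderr)
        return
    with open(argv[1], encoding="utf-8") as f:
        table = load(f)
    for name, (continent, p2016, p2017) in table.items():
        print(f"{name}\t{continent}\t{p2017 - p2016}")


if __name__ == "__main__":
    main(sys.argv)
```

Because load() accepts any iterable of lines, it can be tried on a list of strings before pointing it at the real file.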

Homework 4: Question 3 In Your Opinion: which code is simpler?

Homework 4
Usual submission rule: ONE PDF file
Submit code/run/comments
Email subject heading: 438/538 Homework 4 Your Name
Due date: by midnight of next Monday (review in class on Tuesday)

regex Read textbook chapter 2: section 1 on Regular Expressions

Perl regex Read up on the syntax of Perl regular expressions Online tutorials http://perldoc.perl.org/perlrequick.html http://perldoc.perl.org/perlretut.html

Perl regex matching:
$a =~ /foo/ (/…/ contains a regex)
can be used in a conditional: e.g. if ($a =~ /foo/) …, which evaluates to true/false depending on what's in $a
can also be used as a statement: e.g. $a =~ /foo/;
the variable $& contains the match
Perl regex match and substitute:
$a =~ s/foo/bar/
s/…match…/…substitute…/ contains two expressions
it modifies $a by finding the first occurrence of match and replacing it with substitute
s/…match…/…substitute…/g performs a global substitution (every occurrence is replaced)
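The same operations with Python's re module, as a sketch on a made-up string: re.search plays the role of the match in a conditional, the match object's group(0) holds what Perl puts in $&, and re.sub with count=1 mirrors s/// without /g. Because Python strings are immutable, re.sub returns a new string instead of modifying the variable in place.

```python
import re

a = "foo food foo"

m = re.search(r"foo", a)       # Perl: if ($a =~ /foo/) ...
if m:
    print(m.group(0))          # Perl: $& (the matched text), prints foo

once = re.sub(r"foo", "bar", a, count=1)   # Perl: $a =~ s/foo/bar/
every = re.sub(r"foo", "bar", a)           # Perl: $a =~ s/foo/bar/g
print(once)    # bar food foo
print(every)   # bar bard bar
print(a)       # foo food foo  (unchanged: Python strings are immutable)
```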

Perl regex
Most useful with the code template for reading in a file line-by-line:
open($txtfile, $ARGV[0]) or die "$ARGV[0] not found!\n";
while ($line = <$txtfile>) {
    # do RE stuff with $line
}
close($txtfile);
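A Python counterpart of the template, as a sketch. The pattern \S+ing\b stands in for "RE stuff" and is only an example; matching_lines is a hypothetical helper name. Iterating over a file object reads it one line at a time, like <$txtfile> in a while loop, and the with block closes the file.

```python
import re
import sys


def matching_lines(lines, pattern):
    """Yield each line containing a match for pattern (the 'RE stuff')."""
    regex = re.compile(pattern)
    for line in lines:                     # Perl: while ($line = <$txtfile>) { ... }
        if regex.search(line):
            yield line


def main(argv):
    if len(argv) < 2:
        print(f"usage: {argv[0]} FILE", file=sys.stderr)
        return
    try:
        txtfile = open(argv[1], encoding="utf-8")
    except OSError:
        sys.exit(f"{argv[1]} not found!")  # Perl: open(...) or die "... not found!\n";
    with txtfile:                          # closes the file, like close($txtfile);
        for line in matching_lines(txtfile, r"\S+ing\b"):
            print(line, end="")            # each line already ends with "\n"


if __name__ == "__main__":
    main(sys.argv)
```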

Chapter 2: JM character class: Perl lingo

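Chapter 2: JM range: ranges such as [0-9] or [a-z] follow the order of the ASCII table
a backslash plus a lowercase letter names a class (e.g. \d, \s, \w)
the uppercase variant matches everything except that class (e.g. \D, \S, \W)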
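To see ranges and the lowercase/uppercase class pairs in action, here is a short Python sketch on a made-up string. One caveat we believe applies: for str patterns, Python's \d also matches non-ASCII Unicode digits unless re.ASCII is given.

```python
import re

s = "Room 101, floor 3"

ranged = re.findall(r"[0-9]", s)    # a range: every character from 0 to 9 in ASCII order
digits = re.findall(r"\d+", s)      # \d: the digit class
others = re.findall(r"\D+", s)      # \D: anything except a digit
upper = re.findall(r"[A-Z]", s)     # a range over uppercase letters

print(ranged)   # ['1', '0', '1', '3']
print(digits)   # ['101', '3']
print(others)   # ['Room ', ', floor ']
print(upper)    # ['R']
```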

Chapter 2: JM

Chapter 2: JM Sheeptalk
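Sheeptalk in JM is, as we recall it, the language baa!, baaa!, baaaa!, …, which the regex /baa+!/ describes. A Python sketch under that assumption, using fullmatch so that the whole string has to match:

```python
import re

SHEEP = re.compile(r"baa+!")  # b, then a, then one or more a's, then !

results = {s: SHEEP.fullmatch(s) is not None
           for s in ["baa!", "baaaaa!", "ba!", "baa", "baa!!"]}
print(results)
# {'baa!': True, 'baaaaa!': True, 'ba!': False, 'baa': False, 'baa!!': False}
```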

Perl regex \S+ing\b
\s is a whitespace character, so \S is a non-whitespace character
+ is repetition (1 or more)
\b is a word boundary (words are made up of \w characters)
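A quick Python check of \S+ing\b on a made-up sentence. Note that \S also matches punctuation, so "(going" comes back with its parenthesis, while "kings" is rejected because its ing is not followed by a word boundary.

```python
import re

text = "Singing and dancing (going) kings sing."
found = re.findall(r"\S+ing\b", text)
print(found)   # ['Singing', 'dancing', '(going', 'sing']
```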

Perl regex \b or \b{wb} global variables

Perl regex: Unicode and \b \b{wb} Note: global match in while-loop
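Python's built-in re module has no \b{wb}, but re.finditer is the natural counterpart of a global match in a while loop. The sketch below shows plain \b splitting "don't" at the apostrophe; our understanding is that Perl's \b{wb}, which follows the Unicode word-boundary rules, would keep "don't" together.

```python
import re

s = "don't stop"

# Perl: while ($s =~ /\b(\w+)\b/g) { print "$1\n" }   (global match in a while loop)
words = []
for m in re.finditer(r"\b(\w+)\b", s):
    words.append(m.group(1))
print(words)   # ['don', 't', 'stop']
```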

Perl regex: Unicode and \w
\w is [0-9A-Za-z_]; the definition is expanded for Unicode:
use utf8;                   # pragma: the source code is in UTF-8
use open qw(:std :utf8);    # pragma: https://perldoc.perl.org/open.html
my $str = "school école École šola trường स्कूल škole โรงเรียน";
my @words = ($str =~ /(\w+)/g);    # list context
foreach my $word (@words) { print "$word\n" }
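In Python 3, str patterns are Unicode-aware by default (and source files are UTF-8 by default, roughly what use utf8 does in Perl), while re.ASCII restricts \w back to [0-9A-Za-z_]. The sketch uses only the Latin-script words from the slide and normalizes to NFC first; as far as we know, Python's \w does not match combining marks, so words containing them (for example the Devanagari and Thai words) may be split differently than in Perl.

```python
import re
import unicodedata

# Latin-script words from the slide; NFC turns each accented letter into one code point.
s = unicodedata.normalize("NFC", "school école École šola trường")

unicode_words = re.findall(r"\w+", s)            # Unicode \w (the default for str patterns)
ascii_words = re.findall(r"\w+", s, re.ASCII)    # \w restricted to [0-9A-Za-z_]

print(unicode_words)  # ['school', 'école', 'École', 'šola', 'trường']
print(ascii_words)    # ['school', 'cole', 'cole', 'ola', 'tr', 'ng']
```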

Chapter 2: JM

Chapter 2: JM Precedence of operators
Precedence hierarchy: see the table in JM Chapter 2
Example: Column 1 Column 2 Column 3 …
/Column [0-9]+ */ : the * applies only to the preceding space, so this matches a single column label
/(Column [0-9]+ *)*/ : the parentheses group the sequence, so * repeats the whole label
/house(cat(s|)|)/ (| = disjunction; ? = optional)
Perl: in a regular expression, the substring matched by the pattern within a pair of parentheses is stored in the global variables $1 (and $2, and so on)
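A Python sketch of the precedence examples, where group(1) and group(2) play the roles of $1 and $2. As in Perl, a repeated group keeps only its last iteration, and a group that did not take part in the match comes back as None (Perl leaves $2 undefined in that case).

```python
import re

line = "Column 1 Column 2 Column 3"

one = re.match(r"Column [0-9]+ *", line)      # * applies only to the space
many = re.match(r"(Column [0-9]+ *)*", line)  # parentheses make * repeat the whole label
print(repr(one.group(0)))    # 'Column 1 '
print(repr(many.group(0)))   # 'Column 1 Column 2 Column 3'
print(repr(many.group(1)))   # 'Column 3'  (last iteration only)

for word in ["house", "housecat", "housecats"]:
    m = re.fullmatch(r"house(cat(s|)|)", word)
    print(word, repr(m.group(1)), repr(m.group(2)))   # like $1 and $2
# house '' None
# housecat 'cat' ''
# housecats 'cats' 's'
```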

Perl regex: in scalar context, a match returns 1 (true) or "" (the empty string, false) http://perldoc.perl.org/perlretut.html
A shortcut: in list context, matching returns a list (e.g. of the substrings captured by parentheses)
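In Python, re.search returns a match object or None rather than 1 or "", so an "is not None" test gives the true/false answer, and .groups() gives the list-style result. The string below is just an example.

```python
import re

s = "regular expressions"

found = re.search(r"express", s) is not None          # scalar-style: True or False
first, second = re.match(r"(\w+) (\w+)", s).groups()  # Perl: my ($first, $second) = ($s =~ /(\w+) (\w+)/);
print(found, first, second)   # True regular expressions
```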