LING/C SC/PSYC 438/538 Lecture 10 Sandiway Fong.


1 LING/C SC/PSYC 438/538 Lecture 10 Sandiway Fong

2 Administrivia
Homework 4
Perl regex
Python re: import re
slightly complicated string handling: use raw strings
g/3/library/re.html
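Why raw strings matter for Python's re can be seen in a minimal sketch. The sample pattern and text below are illustrative assumptions, not taken from the slides.

```python
import re

# In an ordinary string literal "\b" is the backspace character,
# so the regex engine never sees the word-boundary escape.
plain = "\bcat\b"
# In a raw string the backslash is kept, so re receives \b as written.
raw = r"\bcat\b"

print(len(plain), len(raw))                        # 5 7
print(re.search(raw, "a cat sat") is not None)     # True
print(re.search(plain, "a cat sat") is not None)   # False
```

Writing every pattern as r"..." avoids having to remember which escapes Python itself would consume.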

3 Regular Expressions to the rescue

4 Regular Expressions from Hell
validation: RFC 5322:
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01- 9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

5 File I/O Summary
Common: open, close, filehandle (concept comes from the underlying OS)
Perl:
<filehandle> (context: reads a line or the whole file)
print filehandle String
Python:
.read(), .readline(), .readlines() (methods)
.write(String) (no newline)
print(*objects, sep=' ', end='\n', file=sys.stdout, flush=False) (function)
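The Python half of the summary can be tried out with a short sketch. The scratch file name and its contents are assumptions made up for the demonstration.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w", encoding="utf-8") as fh:
    fh.write("first line\n")          # .write() adds no newline by itself
    print("second line", file=fh)     # print() appends end='\n' by default

with open(path, encoding="utf-8") as fh:
    whole = fh.read()                 # the whole file as one string

with open(path, encoding="utf-8") as fh:
    first = fh.readline()             # one line, newline included
    rest = fh.readlines()             # remaining lines as a list

print(repr(whole))
print(repr(first), rest)
```

The with statement closes the filehandle automatically, which plays the role of the explicit close in the summary.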

6 Homework 4
File: population.txt (Source: Wikipedia)
Contents: rank, name, continent, population (2016), population (2017)
fields are separated by a tab (\t)

7 Homework 4: Question 1
Using Perl:
read the file
create hash table(s) indexed by country name containing the following information: continent/2016 population/2017 population
Compute and print the country that decreased in population.
Compute and print the country with the smallest positive increase in population.
Print a table of countries in Asia and 2016 population ranked by 2016 population
Print a table of countries in Africa and 2016 population ranked inversely by 2016 population
Hints:
read about split
read about tr: $num =~ tr/,//d deletes the pesky commas in $num
revisit sort parameters
if you need to trim whitespace from the ends: $line =~ s/^\s+|\s+$//g;
for nicely-formatted lists, read about printf FORMAT

8 Homework 4: Question 2 Review
Do the same exercise in Python3 using a dictionary or dictionaries.
These may prove useful: str.strip(), str.replace(), str.split(), sys.argv, int()
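How the listed string methods fit together can be sketched on a single line of input. The sample line is made up (the country and numbers are not real data), and the column order is assumed to follow the file description: rank, name, continent, 2016 population, 2017 population.

```python
# Made-up sample line in the assumed tab-separated layout.
line = "  1\tAtlantis\tEurope\t1,234,567\t1,240,000 \n"

# str.strip() trims whitespace at both ends; str.split("\t") cuts on tabs.
rank, name, continent, pop2016, pop2017 = line.strip().split("\t")

# str.replace() removes the commas so that int() can parse the number.
population = {
    name: {
        "continent": continent,
        "2016": int(pop2016.replace(",", "")),
        "2017": int(pop2017.replace(",", "")),
    }
}

print(population)
```

In a script, the file name would typically come from sys.argv[1] and each line of the file would pass through the same steps.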

9 Homework 4: Question 3 In Your Opinion: which code is simpler?

10 Homework 4
Usual submission rule: ONE PDF file
Submit code/run/comments
subject heading: 438/538 Homework 4 Your Name
Due date: by midnight of next Monday (review in class on Tuesday)

11 regex Read textbook chapter 2: section 1 on Regular Expressions

12 Perl regex Read up on the syntax of Perl regular expressions
Online tutorials

13 Perl regex
Perl regex matching:
$a =~ /foo/ (/…/ contains a regex)
can use in a conditional: e.g. if ($a =~ /foo/) … evaluates to true/false depending on what's in $a
can also use as a statement: e.g. $a =~ /foo/;
variable $& contains the match
Perl regex match and substitute:
$a =~ s/foo/bar/
s/…match…/…substitute…/ contains two expressions
will modify $a by looking for a single occurrence of match and replacing that with substitute
s/…match…/…substitute…/g global substitution
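For comparison, a rough Python counterpart of matching and substitution might look like the sketch below. The sample string is an assumption, and note that Python returns a new string rather than modifying the variable in place.

```python
import re

a = "foo fighters love food"

# Matching: re.search returns a match object (truthy) or None.
m = re.search(r"foo", a)
if m:
    print(m.group(0))              # the matched text, like Perl's $&

# Substitution: count=1 replaces a single occurrence, like s/foo/bar/.
once = re.sub(r"foo", "bar", a, count=1)
# Without count, every occurrence is replaced, like s/foo/bar/g.
everywhere = re.sub(r"foo", "bar", a)

print(once)                        # bar fighters love food
print(everywhere)                  # bar fighters love bard
```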

14 Perl regex
Most useful with the code template for reading in a file line-by-line:
open($txtfile, $ARGV[0]) or die "$ARGV[0] not found!\n";
while ($line = <$txtfile>) {
    # do RE stuff with $line
}
close($txtfile);
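The same line-by-line template can be sketched in Python. The function name matching_lines, the scratch file, and its contents are all hypothetical, chosen only so the example runs on its own.

```python
import os
import re
import tempfile

def matching_lines(path, pattern):
    """Return the lines of the file at path that contain a match for pattern."""
    regex = re.compile(pattern)
    hits = []
    with open(path, encoding="utf-8") as txtfile:
        for line in txtfile:                 # one line per iteration
            if regex.search(line):           # do RE stuff with the line
                hits.append(line.rstrip("\n"))
    return hits

# Hypothetical input file, created only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as fh:
    fh.write("no match here\nfoo appears here\nanother foo\n")

print(matching_lines(path, r"foo"))
```

In a command-line script the path would come from sys.argv[1], mirroring $ARGV[0] in the Perl template.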

15 Chapter 2: JM character class: Perl lingo

16 Chapter 2: JM
range (e.g. [a-z]): order as in the ASCII table
backslash + lowercase letter for a class (e.g. \d, \w, \s)
uppercase variant for everything but that class (e.g. \D, \W, \S)
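Ranges and the lowercase/uppercase class shorthands behave the same way in Python's re, so a small sketch can illustrate them. The sample string is an assumption.

```python
import re

s = "Room 42, floor 3!"

digits = re.findall(r"\d", s)            # \d: a digit
non_digits = re.findall(r"\D+", s)       # \D: anything but a digit
lower = re.findall(r"[a-z]+", s)         # range of lowercase letters only
letters = re.findall(r"[A-Za-z]+", s)    # two ranges in one class

print(digits)        # ['4', '2', '3']
print(non_digits)    # ['Room ', ', floor ', '!']
print(lower)         # ['oom', 'floor']
print(letters)       # ['Room', 'floor']
```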

17 Chapter 2: JM

18 Chapter 2: JM Sheeptalk

19 Perl regex
\S+ing\b
\s is a whitespace, so \S is a non-whitespace
+ is repetition (1 or more)
\b is a word boundary (words are made up of \w characters)
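The pattern \S+ing\b works unchanged in Python, so its behaviour can be checked with a short sketch. The sample sentence is an assumption, picked to show that the pattern also catches words like "king" that merely end in "ing".

```python
import re

text = "The king was singing while walking home"

# \S+  one or more non-whitespace characters
# ing  the literal suffix
# \b   a word boundary right after the suffix
words = re.findall(r"\S+ing\b", text)

print(words)   # ['king', 'singing', 'walking']
```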

20 Perl regex \b or \b{wb} global variables

21 Perl regex: Unicode and \b
\b{wb} Note: global match in while-loop

22 Perl regex: Unicode and \w
\w is [0-9A-Za-z_]
Definition is expanded for Unicode:
use utf8;                 # pragma
use open qw(:std :utf8);  # pragma
my $str = "school école École šola trường स्कूल škole โรงเรียน";
@words = ($str =~ /(\w+)/g);   # list context
foreach $word (@words) { print "$word\n" }
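Python 3 also expands \w for Unicode when the pattern is a str, and re.ASCII restricts it to [0-9A-Za-z_]. The sketch below assumes a shortened Latin-script sample, since Python's treatment of combining marks may differ from Perl's and results for scripts such as Devanagari or Thai are not shown here.

```python
import re

# \u00e9 is é and \u0161 is š, written as escapes to avoid encoding surprises.
text = "school \u00e9cole \u0161ola"

unicode_words = re.findall(r"\w+", text)                 # Unicode-aware by default
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)   # \w limited to ASCII

print(unicode_words)   # ['school', 'école', 'šola']
print(ascii_words)     # ['school', 'cole', 'ola']
```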

23 Chapter 2: JM

24 Chapter 2: JM
Precedence of operators
Example: Column 1 Column 2 Column 3 …
/Column [0-9]+ */
/(Column [0-9]+ *)*/
/house(cat(s|)|)/ (| = disjunction; ? = optional)
Perl: in a regular expression, the pattern matched within a pair of parentheses is stored in global variables $1 (and $2 and so on)
Precedence Hierarchy: space
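The effect of parentheses on what * repeats can be checked with a Python sketch of the same patterns. Python exposes captured groups through the match object (group(1), group(2)) instead of global variables, and the test words other than those on the slide are assumptions.

```python
import re

row = "Column 1 Column 2 Column 3"

# Without parentheses, * applies only to the preceding space,
# so the pattern covers one "Column N" plus any trailing spaces.
single = re.match(r"Column [0-9]+ *", row)
print(repr(single.group(0)))           # 'Column 1 '

# With parentheses, * repeats the whole group.
whole = re.fullmatch(r"(Column [0-9]+ *)*", row)
print(whole is not None)               # True
print(whole.group(1))                  # Column 3 (the last repetition)

# Nested groups with empty alternatives make "cat" and "s" optional.
house = re.compile(r"house(cat(s|)|)")
results = {w: house.fullmatch(w) is not None
           for w in ["house", "housecat", "housecats", "housecar"]}
print(results)

m = house.fullmatch("housecats")
print(m.group(1), m.group(2))          # cats s
```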

25 Perl regex
returns 1 (true) or "" (empty if false)
A shortcut: list context for matching returns a list
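A loosely analogous pattern in Python, sketched under the assumption of a made-up sample string: re.search gives a truthy match object or None, and the captured groups can be unpacked as a tuple, which is similar in spirit to a list-context match in Perl.

```python
import re

s = "pages 12-34"

m = re.search(r"(\d+)-(\d+)", s)
print(bool(m))                         # True: a match object is truthy

# Unpack the captured groups in one step.
start, end = m.groups()
print(start, end)                      # 12 34

# A failed match yields None, which is falsy.
miss = re.search(r"(\d+)-(\d+)", "no numbers")
print(miss)                            # None

# re.findall returns all matches as a list.
numbers = re.findall(r"\d+", s)
print(numbers)                         # ['12', '34']
```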

