LING/C SC/PSYC 438/538 Lecture 7 Sandiway Fong
Today's Topics: Homework 2 Review. A note on Windows and Unicode. More basic programming: string operations and file I/O. Reading Homework: Chapter 2 of JM on regular expressions (regex). Perl is great at regexes!
Homework 2 Review (Q1: Perl. Q2: Perl. Q3: Python. Q4: Python.) Q1: Perl. What does @a = 4 x 4 do? Q2: Perl. @l1 = ("a", "e", "i", "o", "u"); @l2 = ("あ", "え", "い", "お", "う"); Write a program that builds a hash that maps romaji to hiragana, e.g. $h{i} should be "い". Give examples to show your program works. Hint: use a loop and shift (or pop). Q3: Python. Do the same for l1 and l2 in Python using zip. Q4: Python. Do the same, without zip, using list comprehensions.
Homework 2 Review Q2: either pop or shift will work here, as long as the same operation is applied to both lists so that the pairs stay aligned.
Homework 2 Review Q3: Python with zip(). Start with the Perl list definitions (and remove the @ and the ;). Note: we're using Python 3 in this course. Python 2.7 doesn't handle non-ASCII characters by default.
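One possible Q3 answer, as a minimal sketch: the lists are the ones given in the homework, while the name h for the resulting dictionary is just a choice made here to mirror the Perl hash.

```python
# Romaji vowels and their hiragana counterparts, copied from the homework.
l1 = ["a", "e", "i", "o", "u"]
l2 = ["あ", "え", "い", "お", "う"]

# zip() pairs the two lists element by element;
# dict() turns those (key, value) pairs into a mapping.
h = dict(zip(l1, l2))

print(h["i"])  # い
print(h)
```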
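Homework 2 Review Q4: Python: use a list comprehension (instead of zip()). Avoiding zip() is tough in Python: e.g. the comprehension {k:v for k,v in zip(l1, l2)} still calls it. Is zip() unavoidable? Note that in Python 3 zip() returns a lazy iterator rather than building a temporary list (Python 2's zip() did build a list, which was then thrown away, i.e. garbage collected).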
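One way to avoid zip() altogether is to index both lists by position. The sketch below assumes the two lists have the same length; the names h, h_pairs and h_enum are hypothetical, chosen only for illustration.

```python
l1 = ["a", "e", "i", "o", "u"]
l2 = ["あ", "え", "い", "お", "う"]

# Dict comprehension over the index positions (no zip involved).
h = {l1[i]: l2[i] for i in range(len(l1))}

# Strictly with a *list* comprehension: build the pairs, then call dict().
h_pairs = dict([(l1[i], l2[i]) for i in range(len(l1))])

# enumerate() gives the index and the key together.
h_enum = {k: l2[i] for i, k in enumerate(l1)}

print(h["i"])  # い
```

All three build the same dictionary; the list-comprehension version is the only one that really does create a temporary list of pairs.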
Unicode and PowerShell Copy from PowerPoint into Notepad, then Save As … with encoding set to UTF-8.
Unicode and PowerShell Windows 10: Default console is not UTF-8 and uses ancient codepage technology (437 = US)! Set it to UTF-8. Note codepage change. Unfortunately, it now understands UTF-8, but fails to print the character!
Unicode and PowerShell Right-click menu bar, then Properties > Font. Consult https://docs.microsoft.com/en-us/typography/font-list/ for the codepages that each font supports.
Unicode and PowerShell Default console font is actually called Consolas. Even the Lucida Console font family is limited.
Unicode and PowerShell Pick a known Japanese font, e.g. MS Mincho, licensed by Microsoft from Ricoh (Japan).
Unicode and PowerShell Et voilà!
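Perl: useful string functions chomp (useful with file I/O) vs. chop: chomp removes a trailing newline only if one is present, whereas chop removes the last character whatever it is. To split a string into an array of words: @words = split " ", $s; Note: multiple spaces (and leading whitespace) are ok with the " " variant, whereas split / /, $s produces empty fields between consecutive spaces.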
Python: .split() String (sentence) splitting is an important part of text processing. Oftentimes we split strings by a regular expression: import re; re.split(regex, s)
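A short sketch of both kinds of splitting. The sample strings and the regex are made up for illustration; only str.split() and re.split() come from the slide.

```python
import re

s = "  A big   cat in Tucson  "

# With no argument, split() breaks on runs of whitespace and
# ignores leading/trailing whitespace.
words = s.split()
print(words)  # ['A', 'big', 'cat', 'in', 'Tucson']

# With an explicit " " separator, every single space counts,
# so empty strings show up in the result.
naive = s.split(" ")
print(naive)

# re.split() splits on a regular expression: here a comma or
# semicolon with optional surrounding whitespace.
fields = re.split(r"\s*[,;]\s*", "romaji, hiragana;katakana")
print(fields)  # ['romaji', 'hiragana', 'katakana']
```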
Perl: useful string functions substr
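Perl: useful string functions Transliterate: tr/matchingcharacters/replacementcharacters/modifiers. The modifiers are optional: c (complement the set of matching characters), d (delete matching characters that have no replacement), s (squeeze runs of the same replaced character into one), r (return the new string and leave the original unchanged).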
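For comparison, Python has no tr operator; a rough analogue (not an exact equivalent) is str.maketrans() together with str.translate(). The sample string and variable names below are hypothetical.

```python
s = "big cat in tucson"

# Roughly like Perl's tr/aeiou/AEIOU/: map each vowel to its uppercase form.
upper_vowels = str.maketrans("aeiou", "AEIOU")
print(s.translate(upper_vowels))  # bIg cAt In tUcsOn

# Roughly like tr/aeiou//d: the third argument lists characters to delete.
drop_vowels = str.maketrans("", "", "aeiou")
print(s.translate(drop_vowels))  # bg ct n tcsn

# Perl's tr also returns the number of characters matched;
# in Python that count has to be computed separately.
vowel_count = sum(s.count(c) for c in "aeiou")
print(vowel_count)  # 5
```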
Perl: useful string functions Perl doesn't have a built-in trim-whitespace-from-both-ends-of-a-string function. It can be mimicked using regex (more later). Python: strings have .strip() (plus .lstrip() and .rstrip()).
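On the Python side, trimming is built in. The sketch below shows the string methods next to a regex version of the same idea; the sample string is made up.

```python
import re

s = "  hello world \n"

print(repr(s.strip()))   # 'hello world'      (both ends)
print(repr(s.lstrip()))  # 'hello world \n'   (left end only)
print(repr(s.rstrip()))  # '  hello world'    (right end only)

# The regex approach: delete whitespace anchored at either end.
trimmed = re.sub(r"^\s+|\s+$", "", s)
print(repr(trimmed))     # 'hello world'
```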
Python: strings Many methods that work on lists also work on strings
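A few list-style operations applied to a string, as an illustrative sketch (the example word is arbitrary). One difference worth noting: unlike lists, strings are immutable.

```python
s = "Tucson"

print(len(s))     # 6       length, as with a list
print(s[0])       # T       indexing
print(s[-1])      # n       negative indexing
print(s[1:4])     # ucs     slicing
print("cs" in s)  # True    membership (substring test for strings)
print(s + "!")    # Tucson! concatenation

# Iteration visits one character at a time.
chars = [c for c in s]
print(chars)      # ['T', 'u', 'c', 's', 'o', 'n']

# Assignment to an index works for lists but not for strings.
try:
    s[0] = "t"
    mutable = True
except TypeError:
    mutable = False
print(mutable)    # False
```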
Python: strings List comprehension: sentence = ['A', 'big', 'cat', 'in', 'Tucson'] [x.lower() for x in sentence] Suppose we want to use .endswith() in a list comprehension: Reference: https://docs.python.org/3.7/library/stdtypes.html#text-sequence-type-str
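Two ways .endswith() might be used in a list comprehension, sketched with the sentence list from the slide: as a filter, or as the value being collected. The result names are hypothetical.

```python
sentence = ['A', 'big', 'cat', 'in', 'Tucson']

# As a filter: keep only the words ending in 'n'.
ends_in_n = [x for x in sentence if x.endswith('n')]
print(ends_in_n)   # ['in', 'Tucson']

# As the collected value: one True/False per word.
flags = [x.endswith('n') for x in sentence]
print(flags)       # [False, False, False, True, True]

# .endswith() also accepts a tuple of alternative suffixes.
g_or_t = [x for x in sentence if x.endswith(('g', 't'))]
print(g_or_t)      # ['big', 'cat']
```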