LING/C SC/PSYC 438/538 Lecture 7 Sandiway Fong.


1 LING/C SC/PSYC 438/538 Lecture 7 Sandiway Fong

2 Today's Topics
Homework 2 Review
A note on Windows and Unicode
More basic programming: string operations and file I/O
Reading Homework: Chapter 2 of JM on regular expressions (regex)
Perl is great at regexes!

3 Homework 2 Review
Q1: Perl. What = 4 x 4 do?
Q2: Perl. @l1 = ("a", "e", "i", "o", "u"); @l2 = ("あ", "え", "い", "お", "う"); Write a program that builds a hash that maps romaji to hiragana, e.g. $h{i} should be "い". Give examples to show your program works. Hint: use a loop and shift (or pop).
Q3: Python. Do the same for l1 and l2 in Python using zip.
Q4: Python. Do the same, without zip, using list comprehensions.

4 Homework 2 Review Q2: either pop or shift will work here…

5 Homework 2 Review Q3: Python with zip()
Start with Perl (and and ;). Note: we're using Python 3 in this course; Python 2.7 doesn't handle non-ASCII characters by default.
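A minimal Python 3 sketch of Q3, assuming the same two lists as in the homework statement. The variable name h simply mirrors the Perl hash name and is otherwise an arbitrary choice; this is one possible answer, not necessarily the one shown on the slide.

```python
# Romaji vowels and the matching hiragana, as given in the homework.
l1 = ["a", "e", "i", "o", "u"]
l2 = ["あ", "え", "い", "お", "う"]

# zip() pairs up items position by position; dict() turns the pairs into a mapping.
h = dict(zip(l1, l2))

# Examples to show it works.
print(h["i"])
for romaji in l1:
    print(romaji, h[romaji])
```

Note that if the two lists had different lengths, zip() would silently stop at the shorter one.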

6 Homework 2 Review Q4: Python: use list comprehension (instead of zip()). Avoiding zip() is tough in Python, e.g. the comprehension {k:v for k,v in zip(l1, l2)}. Is zip() unavoidable? It creates a temporary list (that's thrown away: garbage collected).
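One way to avoid zip() is to index both lists by position. The sketch below assumes l1 and l2 have the same length; it is offered as one possible answer to Q4, and the names pairs, h and h2 are illustrative only.

```python
l1 = ["a", "e", "i", "o", "u"]
l2 = ["あ", "え", "い", "お", "う"]

# A list comprehension over positions builds the (romaji, hiragana) pairs.
pairs = [(l1[i], l2[i]) for i in range(len(l1))]
h = dict(pairs)

# The same idea written directly as a dict comprehension.
h2 = {l1[i]: l2[i] for i in range(len(l1))}

print(h["i"])
print(h == h2)
```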

7 Unicode and PowerShell
copy from PowerPoint into NotePad Save As … with encoding set to UTF-8

8 Unicode and PowerShell
Windows 10: Default console is not UTF-8 and uses ancient codepage technology (437 = US)! Set it to UTF-8. Note codepage change. Unfortunately, it now understands UTF-8, but fails to print the character!

9 Unicode and PowerShell
Right-click menu bar: Properties > Font. Consult soft.com/en-us/typography/font-list/ for the codepages that each font supports.

10 Unicode and PowerShell
Default console font is actually called Consolas. Even the Lucida Console font family is limited.

11 Unicode and PowerShell
Pick a known Japanese font licensed by Microsoft from Ricoh (Japan). MS Mincho

12 Unicode and PowerShell
Et voilà!

13 Perl: useful string functions
chomp (useful with file I/O) vs. chop. To split a string into an array of words: Note: multiple spaces ok with " " variant.
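For comparison in Python, here is a rough sketch of the chomp vs. chop distinction. Python has no functions with these names: chomp and chop below are hypothetical helpers written for illustration, and they only approximate Perl's default behaviour (chomp removes one trailing newline if present; chop removes the last character regardless).

```python
def chomp(line):
    """Remove one trailing newline, if there is one (approximates Perl's chomp)."""
    return line[:-1] if line.endswith("\n") else line

def chop(line):
    """Remove the last character, whatever it is (approximates Perl's chop)."""
    return line[:-1]

with_newline = "a big cat\n"   # what a line read from a file typically looks like
without_newline = "a big cat"  # e.g. the last line of a file with no final newline

print(repr(chomp(with_newline)))
print(repr(chomp(without_newline)))
print(repr(chop(with_newline)))
print(repr(chop(without_newline)))  # chop eats a real character here
```

The last call shows why chomp is the safer choice for file I/O: chop removes the final "t" when there is no newline to remove.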

14 Python: .split() String (sentence) splitting is an important part of text processing. Oftentimes we split strings by a regular expression:
import re
re.split(regex,s)
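A small sketch contrasting the string method .split() with re.split(). The sample strings and the two regular expressions are made up for illustration.

```python
import re

s = "A  big cat   in Tucson"

# With no argument, .split() splits on runs of whitespace.
words = s.split()

# With an explicit " " separator, every single space is a split point,
# so runs of spaces produce empty strings.
pieces = s.split(" ")

# re.split() splits wherever the regex matches; \s+ means one or more whitespace characters.
tokens = re.split(r"\s+", s)

# A regex separator can also cover several delimiters at once.
fields = re.split(r"[,;]\s*", "a, b;c")

print(words)
print(pieces)
print(tokens)
print(fields)
```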

15 Perl: useful string functions
substr

16 Perl: useful string functions
Transliterate: tr/matchingcharacters/replacementcharacters/modifiers
modifiers are optional:
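For comparison, Python's closest counterpart is str.maketrans() together with str.translate(). The sketch below is only an analogy to Perl's tr: Python has no character-range syntax like a-z here, and translate() returns the new string rather than a count. The sample strings are made up.

```python
import string

# Roughly tr/abc/xyz/: map a->x, b->y, c->z.
table = str.maketrans("abc", "xyz")
mapped = "aabbcc cab".translate(table)

# Roughly tr/a-z/A-Z/: the ranges have to be spelled out, here via the string module.
upper_table = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)
shouted = "a big cat".translate(upper_table)

# Roughly tr/aeiou//d: the third argument lists characters to delete.
delete_vowels = str.maketrans("", "", "aeiou")
no_vowels = "Tucson".translate(delete_vowels)

print(mapped)
print(shouted)
print(no_vowels)
```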

17 Perl: useful string functions
Perl doesn't have a built-in trim-whitespace-from-both-ends-of-a-string function. Can be mimicked using regex (more later). Python:
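A short sketch of trimming in Python: the built-in string methods .strip(), .lstrip() and .rstrip(), followed by a regex version that mimics the same effect. The sample string and the particular regex are illustrative choices.

```python
import re

s = "  \t a big cat \n"

both = s.strip()    # whitespace removed from both ends
left = s.lstrip()   # leading whitespace only
right = s.rstrip()  # trailing whitespace only

# Regex mimic: delete whitespace anchored at the start (^) or at the end ($).
mimic = re.sub(r"^\s+|\s+$", "", s)

print(repr(both))
print(repr(left))
print(repr(right))
print(repr(mimic))
```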

18 Python: strings Many methods that work on lists also work on strings
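A few operations that behave the same way on lists and on strings, as a sketch with a made-up example word. One difference worth remembering: strings are immutable, so item assignment such as word[0] = "t" raises a TypeError.

```python
word = "Tucson"
letters = list(word)  # ['T', 'u', 'c', 's', 'o', 'n']

# The same operations work on both the string and the list.
print(len(word), len(letters))
print(word[0], letters[0])          # indexing
print(word[-3:], letters[-3:])      # slicing
print("c" in word, "c" in letters)  # membership
print(word.index("c"), letters.index("c"))
print(sorted(word))                 # sorted() always returns a list

for ch in word:                     # iteration
    print(ch)
```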

19 Python: strings List comprehension:
sentence = ['A', 'big', 'cat', 'in', 'Tucson']
[x.lower() for x in sentence]
Suppose we want to use .endswith() in a list comprehension:
Reference:
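A sketch of .endswith() inside a list comprehension, using the sentence list from this slide. The suffixes tested ('n', and the tuple ('g', 't')) are arbitrary choices for illustration.

```python
sentence = ['A', 'big', 'cat', 'in', 'Tucson']

# Map: apply a method to every word.
lowered = [x.lower() for x in sentence]

# Map to booleans: does each word end with 'n'?
flags = [x.endswith('n') for x in sentence]

# Filter: keep only the words that end with 'n'.
ends_n = [x for x in sentence if x.endswith('n')]

# .endswith() also accepts a tuple of alternative suffixes.
ends_g_or_t = [x for x in sentence if x.endswith(('g', 't'))]

print(lowered)
print(flags)
print(ends_n)
print(ends_g_or_t)
```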

