1 A pair of sometimes useful functions
Function ord returns a character's ordinal, i.e. its character code (Unicode code point).
Function chr does the reverse: it returns the character with the given character code.
>>> ord('ff')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: ord() expected a character, but string of length 2 found
>>> ord('f')
102
>>> ord('.')
46
>>> chr(46)
'.'
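As a small illustration of why the pair is useful together (our own example, not from the slides), ord and chr can shift a lowercase letter a fixed number of places round the alphabet, a Caesar shift. The helper name `shift_letter` is hypothetical, and the sketch assumes Python 3 and letters in 'a' to 'z' only.

```python
def shift_letter(c, k):
    """Shift lowercase letter c by k places, wrapping from 'z' back to 'a'.

    Hypothetical helper; assumes c is a single character in 'a'..'z'.
    """
    # Map 'a'..'z' to 0..25, shift with wrap-around, then map back to a character.
    return chr((ord(c) - ord('a') + k) % 26 + ord('a'))


print(shift_letter('f', 1))   # g
print(shift_letter('z', 3))   # c
print(''.join(shift_letter(c, 13) for c in 'python'))   # clguba
```

Doing the arithmetic on character codes avoids keeping a separate table of the alphabet.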
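2 Command-line arguments
python myprogram.py 10 100 output.txt
sys.argv: [ "myprogram.py", "10", "100", "output.txt" ]
The first item in the sys.argv list is the program name; the following items are the command-line arguments. All items are strings, so numeric arguments must be converted, e.g. with int().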
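A minimal sketch of unpacking the example command line, assuming the layout [program, number, number, output filename]. The function `parse_args` and the usage message are hypothetical, written for Python 3; it takes the list as a parameter so it can be tried without a real command line.

```python
import sys


def parse_args(argv):
    """Split an argv list like the slide's example into typed values.

    Assumes the layout [program, number, number, output filename];
    parse_args is a hypothetical helper, not part of the slides.
    """
    if len(argv) != 4:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit("usage: python %s <n1> <n2> <outfile>" % argv[0])
    # Every sys.argv item is a string, so the numbers need int().
    n1 = int(argv[1])
    n2 = int(argv[2])
    return n1, n2, argv[3]


# What `python myprogram.py 10 100 output.txt` would deliver in sys.argv:
print(parse_args(["myprogram.py", "10", "100", "output.txt"]))  # (10, 100, 'output.txt')
```

Inside myprogram.py itself one would call `parse_args(sys.argv)`.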
3 Long headers in FASTA file
>AC121234 Medicago truncatula clone mth2-19o7, WORKING DRAFT SEQUENCE, 5 unordered pieces.
GGTGAAGGATGAGGATTTGCAAAAGACGGCCTTTAGGACACGTTATGGT
CATTACGAGTACAAAGTGATGCCTTTCGGTGTTACTAAGGCGCCTGGTG
TTTTTATGGAGTACATGAACCG…
…
Some applications can't handle long headers.
Can we write a Python program that "prunes" the headers, leaving just the unique ID?
4 prune.py
(The program listing is not preserved in this transcript; the slide's annotations on it were:)
for line in infile: memory efficient!
Note: each line ends in a newline.
There is no newline in the first field after splitting the line; use print.
Instructions for how to use the function.
The newline is still there: write the line as it is, with no extra newline.
sys.argv
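Since the listing itself is missing, here is a sketch that follows the annotations, written for Python 3. It assumes header lines start with ">" and that the unique ID is the first whitespace-separated field; the names `prune` and `main` are hypothetical, not taken from the original program.

```python
import sys


def prune(infile, outfile):
    """Copy a FASTA file, shortening each header line to its first field (the ID)."""
    # Iterating over the file object reads one line at a time: memory efficient.
    for line in infile:
        if line.startswith(">"):
            # split() discards the trailing newline along with the rest of the
            # header, so print() supplies the newline after the ID.
            print(line.split()[0], file=outfile)
        else:
            # Sequence lines still end in a newline: write them unchanged.
            outfile.write(line)


def main(argv):
    """Command-line entry point: python prune.py <infile> <outfile>."""
    if len(argv) != 3:
        # Instructions for how to use the program.
        sys.exit("usage: python prune.py <infile.fasta> <outfile.fasta>")
    with open(argv[1]) as infile, open(argv[2], "w") as outfile:
        prune(infile, outfile)
```

To run it as a script, add `main(sys.argv)` at the bottom under an `if __name__ == "__main__":` guard. Taking file objects as parameters keeps `prune` easy to test with in-memory files.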
5 Before / after pruning
Before:
>AC121234 Medicago truncatula clone mth2-19o7, WORKING DRAFT SEQUENCE, 5 unordered pieces.
GGTGAAGGATGAGGATTTGCAAAAGACGGCCTTTAGGACACGTTATGGT
CATTACGAGTACAAAGTGATGCCTTTCGGTGTTACTAAGGCGCCTGGTG
TTTTTATGGAGTACATGAACCG…
…
After:
>AC121234
GGTGAAGGATGAGGATTTGCAAAAGACGGCCTTTAGGACACGTTATGGT
CATTACGAGTACAAAGTGATGCCTTTCGGTGTTACTAAGGCGCCTGGTG
TTTTTATGGAGTACATGAACCG…
…
6 Parents Music Resource Center
Concern: crude language in much of today's music.
Task: implement censorship that removes bad words.
More string methods: splitlines, join, replace
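A quick look at the three string methods on made-up text (Python 3):

```python
text = "first line\nsecond line\nthird line"

# splitlines: break a string into a list of lines (the newlines are dropped).
lines = text.splitlines()
print(lines)        # ['first line', 'second line', 'third line']

# replace: return a copy with EVERY occurrence of a substring replaced.
print(lines[1].replace("second", "2nd"))   # 2nd line
print("banana".replace("an", "AN"))        # bANANa

# join: glue a list of strings together, with the given separator between them.
print("\n".join(lines) == text)            # True
```

Note that replace works on substrings, not whole words, and returns a new string; the original is unchanged.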
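7 censorship.py
(The program listing is not preserved in this transcript; the slide's annotations on it were:)
Split the text into a list of lines.
In each line, replace each bad word with BEEP.
If any words were BEEPed, print the line and play one beep per word.
Join the censored lines with newlines and return the full text.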
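A sketch of the steps above in Python 3. The slides' list of bad words is not shown, so the caller passes one in (the words in the example are placeholders), and returning the beep count alongside the text is our own choice. Beeping uses the ASCII BEL character "\a", which many terminals ignore.

```python
def censor(text, bad_words):
    """Replace every occurrence of each bad word with BEEP.

    Returns the censored text and the number of words beeped.
    bad_words is supplied by the caller (the slides' list is not shown).
    """
    censored_lines = []
    total = 0
    # Split the text into a list of lines.
    for line in text.splitlines():
        count = 0
        # In each line, replace each bad word with BEEP.
        for word in bad_words:
            count += line.count(word)
            line = line.replace(word, "BEEP")
        if count > 0:
            # Print the censored line and ring the terminal bell once per word.
            print(line + "\a" * count)
            total += count
        censored_lines.append(line)
    # Join the censored lines with newlines and return the full text.
    return "\n".join(censored_lines), total


print(censor("oh darn it\nall fine\nwhat the heck", ["darn", "heck"]))
print(censor("darned socks", ["darn"]))   # ('BEEPed socks', 1)
```

Because str.replace matches substrings, a word that merely contains a bad word is censored too, exactly the "pBEEPing" effect in the test output.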
8 "In these moments, moments of our lives All the world is ours And this world is so right You and I sharing this time together Sharing the same dream As the time goes by we will find These are the special times Times we'll remember These are the precious times The tender times we'll hold in our hearts forever These are the sweetest times These times together And through it all, one thing will always be true The special times are the times I share with you With each moment, moment passing by We'll make memories that will last all our lives As you and I travel through time together Living this sweet dream And every day we can say.. With each moment, moment pBEEPing by Beeped words: 1 Program tested on two songs. Celine Dion: We find words containing a bad word: not desirable here. See exercise.
9 Ol' stankin BEEP (Hoe) Jank BEEP (Hoe) Suck my BEEP you (Hoe) Ol' fat BEEP (Hoe) But aiight! We finna get these lame BEEP niggaz You see a hoe BEEP nigga, call his BEEP out. Aye! Aye! Stomp his BEEP like (Hoe) Ol' lame BEEP (Hoe) I'ma tell you how it is nigga you betta get the BEEP back cause a nigga like me don't give a BEEP A nigga suppose to gon leave yo BEEP choked You sound like a BEEP yo BEEP I'ma hit we don't give a BEEP cause you is a lame One hitter quitter yo BEEP get popped Back the BEEP up 'fore I show you who reala Whats up wit ya BEEP nigga Ol' sucka BEEP, busta BEEP, cryin to yo momma BEEP I'ma keep up drama I'm a muthaBEEPin plum BEEP See you just a dumb BEEP go on wit yo young BEEP Try me like a sucka but I know you just a lame BEEP In my section they glad to see a nigga that don't give a BEEP Stomp you to the floor and tell you get yo pussy BEEP up Pick that nigga BEEP up, tear his lame BEEP up Niggaz representin Ellenwood time to mBEEP up Throwin blows like Johnny Cage, you think you wanna BEEP wit me Do this BEEP like Pastor Troy Uuh Huh I'm outside hoe Take my BEEPin word I ain't got no reason to lie hoe Beeped words: 34 Crime Mob :
10 Regular Expressions – Motivation
Problem: search a suspicious text for any Danish email address, i.e. an address of the form name@domain.dk.
text1 = "No Danish email here firstname.lastname@example.org *@$@.hls.29! fj3a"
text2 = "But here: email@example.com what a *(.@#$ nice @#*.( el ds"
text3 = "And here perhaps? firstname.lastname@example.org@bogus@dk @.dk a@.dk"
This is cumbersome to do with ordinary string methods.
11 Text2 contains this Danish email address: email@example.com RegExp solution (to be explained later)
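The slides' own solution is not in the transcript, so the following is only a simplified sketch of what such a search might look like, written for Python 3. The pattern is our own assumption and uses features introduced later (character classes, groups, \b); it is not a full email validator. In Python 3, \w also matches letters such as æ, ø and å.

```python
import re

# A simplified pattern for "something@domain.dk" (our own assumption, not the
# slides' solution): a local part of word characters, dots, plus or minus signs,
# then @, one or more dot-separated domain labels, and finally .dk.
DANISH_EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)*\.dk\b")


def find_danish_emails(text):
    """Return all substrings of text that look like Danish email addresses."""
    return DANISH_EMAIL.findall(text)


print(find_danish_emails("Write to jens@ku.dk or a.b@mail.ku.dk today"))
# ['jens@ku.dk', 'a.b@mail.ku.dk']
print(find_danish_emails("Decoys: @.dk a@.dk bogus@dk x@y.com"))
# []
```

The decoys mirror the ones in text3: an empty name or domain, or a missing dot before dk, is rejected.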
12 Regular Expressions
Instead of searching for a specific string, we can search for a text pattern.
– We don't have to search explicitly for 'Monday', 'Tuesday', 'Wednesday', ...: there is a pattern in these search strings.
– A regular expression is a text pattern.
In Python, regular expression processing is provided by the module re.
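To make the weekday idea concrete, one pattern can stand for all seven names. This sketch (Python 3) uses alternation with | and a non-capturing group (?:...), which the slides have not introduced yet; the sample sentence is made up.

```python
import re

# One pattern covering all seven weekday names: an alternation of the
# distinct prefixes, followed by the common suffix "day". The (?:...) group
# is non-capturing, so findall returns the whole matched name.
WEEKDAY = r"(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day"

text = "Meetings on Monday and Friday; nothing today or on Sunday."
print(re.findall(WEEKDAY, text))   # ['Monday', 'Friday', 'Sunday']
```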
13 Example
Simple regular expression: regExp = "football" matches only the string "football".
To search a text for regExp, we can use re.search(regExp, text).
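What re.search gives back (Python 3): a match object when the pattern is found, and None otherwise. The sample texts are our own.

```python
import re

regExp = "football"

match = re.search(regExp, "Here are the football results")
# A successful search returns a match object describing where the pattern was found.
print(match.group(), match.start(), match.end())   # football 13 21

# An unsuccessful search returns None, which counts as false in an if test.
print(re.search(regExp, "python keywords"))          # None
```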
14 Compiling Regular Expressions
re.search(regExp, text) does three things:
1. Compile regExp to a special format (an SRE_Pattern object)
2. Search for this SRE_Pattern in text
3. Return the result as an SRE_Match object (or None if there is no match)
If we need to search for regExp several times, it is more efficient to compile it once and for all:
compiledRE = re.compile(regExp)
1. Now compiledRE is an SRE_Pattern object
compiledRE.search(text)
2. Use the search method of this SRE_Pattern to search text
3. The result is the same kind of SRE_Match object
(The re module also caches recently used patterns, so the saving is often modest. In Python 3.7 and later the two types are named re.Pattern and re.Match.)
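A short sketch of the compile-once pattern (Python 3): the expression is compiled a single time and the resulting pattern object is reused for every text. The list of texts is made up for illustration.

```python
import re

texts = [
    "Here are the football results: Bosnia - Denmark 0-7",
    "We will now give a complete list of python keywords.",
    "Women's football is growing fast.",
]

# Compile once, outside the loop...
compiledRE = re.compile("football")

# ...then reuse the same pattern object for every search.
hits = [t for t in texts if compiledRE.search(t)]
print(len(hits))   # 2
```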
15 Searching for 'football'
import re

text1 = "Here are the football results: Bosnia - Denmark 0-7"
text2 = "We will now give a complete list of python keywords."

regularExpression = "football"
# Compile the regular expression and get the SRE_Pattern object
compiledRE = re.compile(regularExpression)

# Use the same SRE_Pattern object to search both texts and get two SRE_Match
# objects (or None if the search was unsuccessful)
SRE_Match1 = compiledRE.search(text1)
SRE_Match2 = compiledRE.search(text2)

if SRE_Match1:
    print("Text1 contains the substring 'football'")
if SRE_Match2:
    print("Text2 contains the substring 'football'")

Output:
Text1 contains the substring 'football'
16 Building more sophisticated patterns
Metacharacters:
? : matches zero or one occurrence of the expression it follows
+ : matches one or more occurrences of the expression it follows
* : matches zero or more occurrences of the expression it follows
# search for zero or one t, followed by two a's:
regExp1 = "t?aa"
# search for g followed by one or more c's followed by one a:
regExp2 = "gc+a"
# search for ct followed by zero or more g's followed by one a:
regExp3 = "ctg*a"
17 Use the SRE_Pattern objects to search the text and get SRE_Match objects. Output:
Text contains the regular expression t?aa
Text contains the regular expression gc+a
Text contains the regular expression ctg*a
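The searching program itself is not in the transcript. A sketch of what it might look like in Python 3, using a made-up DNA string chosen so that all three expressions match; printing match.group() also shows which substring each quantifier actually matched.

```python
import re

# Hypothetical DNA text; the slide's own text is not in the transcript.
dna = "tcgtaagccatctgga"

regExp1 = "t?aa"    # zero or one t, then two a's
regExp2 = "gc+a"    # g, one or more c's, then a
regExp3 = "ctg*a"   # ct, zero or more g's, then a

for regExp in (regExp1, regExp2, regExp3):
    match = re.compile(regExp).search(dna)
    if match:
        # group() gives the substring the pattern actually matched.
        print("Text contains the regular expression", regExp, "->", match.group())

print(re.search("gc+a", "gaa"))            # None: + needs at least one c
print(re.search("ctg*a", "cta").group())   # cta: * allows zero g's
```

Each quantifier applies only to the single character just before it, so the g in "gc+a" and the ct in "ctg*a" are always required.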