Making Decision : if Statement

Conditions “in”

String methods

Simple if statement
    uin = raw_input("Please enter a sentence: ")
    if "python" in uin.lower():
        print("You mentioned Python.")

    uin = raw_input("Please enter a sentence: ")
    if "computer" in uin.lower():
        print("You mentioned Computer.")
    else:
        print("Didn't see Computer there.")

If statement
    if (condition 1):
        suite 1
    elif (condition 2):
        suite 2
    else:
        ...

    uin = raw_input("Please enter a sentence: ")
    if "computer" in uin.lower():
        print("You mentioned computer.")
    elif "science" in uin.lower():
        print("You mentioned Science!")
    elif "department" in uin.lower():
        print("You mentioned Department")
    else:
        print("You did not say anything about computer science department.")

and or

For loop
    s = raw_input("Enter any string: ")
    vcount = 0
    for c in s:
        if c in "aeiouAIEOU":
            vcount += 1
        print("C is ", c, "; Vowel count:", vcount)

break
    s = raw_input("Please enter a string: ")
    pos = 0
    for c in s:
        if c == " ":
            break
        pos += 1
    print("First space occurred at position", pos)

else with loop
When an else clause sits at the same indentation level as a for or while, it runs only if the loop finishes normally; it is skipped when the loop is ended by break.
    s = raw_input("Please enter a string: ")
    pos = 0
    for c in s:
        if c == " ":
            print("First space occurred at position", pos)
            break
        pos += 1
    else:
        print("No spaces in that string")

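else with loop
    s = raw_input("Please enter a sentence: ")
    while True:
        pos = 0
        for c in s:
            if c == " ":
                print(s[:pos])      # print the word before the space
                s = s[pos+1:]       # keep the rest of the sentence
                break
            pos += 1
        else:
            print(s)                # no space left: print the last word
            break                   # and leave the while loop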

Terminating a loop
    while True:
        s = raw_input("Enter a line (or Enter to quit): ")
        if not s:
            break
        if s.startswith("#"):
            continue
        print("Length", len(s))

Some functions

Sequence container: lists and tuples
Lists and tuples are sequence types: elements are accessed by index
A tuple is used as a non-modifiable sequence

Accessing sequence
Use [ ] to access elements
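
A minimal sketch of index access, assuming made-up values (the names word, langs, and point are illustrative and not from the slides). It uses Python 3 print syntax.

```python
# Strings, lists, and tuples all support [ ] indexing and slicing.
word = "python"                   # string
langs = ["c", "java", "python"]   # list
point = (3, 4)                    # tuple

first_char = word[0]     # indexes start at 0
last_lang = langs[-1]    # negative indexes count from the end
x = point[0]
middle = word[1:4]       # slice: start index included, end index excluded

print(first_char, last_lang, x, middle)
```

The same bracket notation works for every sequence type, which is why code written for a list often works unchanged for a tuple or a string.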

Mutable and immutable sequence
Strings and tuples are immutable
Lists are mutable

Mutable list del append, remove,…
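
The list operations named above can be sketched as follows. The values are invented for illustration, and the tuple part only demonstrates that item assignment on an immutable sequence raises TypeError.

```python
nums = [1, 2, 3]
nums.append(4)    # add to the end       -> [1, 2, 3, 4]
nums.remove(2)    # remove first value 2 -> [1, 3, 4]
del nums[0]       # delete by index      -> [3, 4]

# Tuples (and strings) do not allow item assignment.
point = (1, 2)
try:
    point[0] = 99
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(nums, tuple_changed)
```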

List - split
    string = "this is a test"
    wordlist = string.split()
    print(wordlist)

    uin = raw_input("Enter your string: ")
    words = uin.strip().split()   # no leading or trailing space
    for word in words:
        print(word)

enumerate
    # string enumerate
    text = """this is a test for the count"""
    length = len(text)            # do not name this variable len: it would hide the built-in
    for i, ch in enumerate(text):
        print(i, ch)

    # list enumerate
    lst = ["zero", "one", "two", "three"]
    for i, e in enumerate(lst):
        print(i, e)

    # tuple enumerate
    tuple1 = (1, 2, 3)
    for i, e in enumerate(tuple1):
        print(i, e)

    string = "this is a test"
    wordlist = string.split()
    for word in wordlist:
        print(word)
    for i, ch in enumerate(string):
        print(i, ch)
    for i, word in enumerate(wordlist):
        print(i, word)

Multi line
    string = """this is a long line of string,
    and the last of them"""
    print('string : ', string)
    lineLst = string.split('\n')
    print('lineLst : ', lineLst)
    wordCnt = len(string.split())   # len(string) would count characters, not words
    lineCnt = len(lineLst)
    print('wordCnt = ', wordCnt, ' lineCnt = ', lineCnt)
    for i, ch in enumerate(string):
        print(i, ch)

An Example - if
Write a program that asks the user to input a string. Verify with tests that the string is all in upper case and ends with a period. If either of these tests fails, print an appropriate message. If both tests succeed, print a message that indicates the string is acceptable.
    uin = raw_input("Input a sentence = ")
    if uin.isupper() and uin.endswith('.'):
        print("Input meets both requirements")
    elif uin.isupper() and not uin.endswith('.'):
        print("Input does not end with a period")
    elif uin.endswith('.') and not uin.isupper():
        print("Input is not all upper case")
    else:
        print("Input does not meet both requirements")

An Example - while
Guess a number
    target = 63
    guess = 0
    while guess != target:
        guess = int(raw_input("Guess an integer: "))
        if guess > target:
            print("Too high...")
        elif guess < target:
            print("Too low...")
        else:
            print("Just right!")

An Example -split Make the program read a string from the user and create a list of words from the input. Create two lists, one containing the words that contain at least one upper-case letter and one of the words that don't contain any upper-case letters. Use a single for loop to print out the words with upper-case letters in them, followed by the words with no upper-case letters in them, one word per line.

    uin = raw_input("Input your text : ")
    upper_case = []
    lower_case = []
    words = uin.split()
    for word in words:
        if word.islower():
            lower_case.append(word)
        else:
            upper_case.append(word)
    ordered_list = upper_case + lower_case
    for word in ordered_list:
        print(word)

Set
Similar to lists
Use "in" to test membership
With a larger number of elements, membership tests on a list become slow
No duplicates in sets: {1,2,3…}

    {1,2,3}
    Out[38]: {1, 2, 3}
    set1 = set('123')
    print set1
    set(['1', '3', '2'])
    set2 = set('aeiou')
    print(set2)
    set(['a', 'i', 'e', 'u', 'o'])
    set2.add('this')
    print set2
    set(['a', 'e', 'i', 'o', 'this', 'u'])
    set2.add([1,2,3])
    TypeError Traceback (most recent call last)
    TypeError: unhashable type: 'list'
    set2.add((1,2,3))
    set(['a', 'e', 'i', 'o', (1, 2, 3), 'this', 'u'])

Set – no dup example
    text = """this is a test for set in python
    for showing no duplication test. Will you try this? Your job: do it!"""
    textLst = text.split()
    print('textLst : ', textLst, ' length = ', len(textLst))
    textSet = set(textLst)
    print('textSet : ', textSet, ' length = ', len(textSet))

Set – count distinct words in text, replace punc
    text = """this is a test for set in python for showing no duplication test. Will you try this? Your job: do it!"""
    for punc in "!?,:.":
        text = text.replace(punc, '')
    textLst = text.split()
    textSet = set(textLst)
    print('textSet : ', textSet, ' length = ', len(textSet))

Set operation
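
A short sketch of the usual set operators, with two invented sets a and b (these values are not from the slides):

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

union = a | b            # elements in a or b
intersection = a & b     # elements in both
difference = a - b       # elements in a but not in b
symmetric = a ^ b        # elements in exactly one of the two
is_subset = {3, 4} <= a  # subset test

print(union, intersection, difference, symmetric, is_subset)
```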

Set operation example
    w1 = set(raw_input("Sentence 1: ").lower().split())
    w2 = set(raw_input("Sentence 2: ").lower().split())
    print("Words in both strings", w1 & w2)
    print("Unique to sentence 1:", w1 - w2)
    print("Unique to sentence 2:", w2 - w1)

dict
A structure for storing values against arbitrary keys: {key: value}
    dict = {'ford': 'mustang', 'hyundai': 'sonata', 'toyota': 'camry'}
    dict
    Out[89]: {'ford': 'mustang', 'hyundai': 'sonata', 'toyota': 'camry'}
    dict['ford']
    Out[90]: 'mustang'
    dict['toyota'] = 'lexus'
    dict
    Out[92]: {'ford': 'mustang', 'hyundai': 'sonata', 'toyota': 'lexus'}
    del dict['toyota']
    dict
    Out[94]: {'ford': 'mustang', 'hyundai': 'sonata'}
    dict.keys()
    Out[95]: ['hyundai', 'ford']
    dict.items()
    Out[96]: [('hyundai', 'sonata'), ('ford', 'mustang')]

Print out dict and check membership
    d = {'ford': 'mustang', 'hyundai': 'sonata', 'toyota': 'camry'}
    for k in d.keys():
        print(d[k])
    print d.keys()
    print d.values()
    print d.items()
    print 'ford' in d
    print 'kia' in d

Use of dict : counting number
    text = """This is a test for set in python for showing no duplication test. Will you try this? Your job: do it!"""
    for punc in "!?,:.":
        text = text.replace(punc, '')
    count = {}
    for word in text.lower().split():
        if word in count:
            count[word] += 1
        else:
            count[word] = 1
    for word in sorted(count.keys()):
        print(word, count[word])
    for k, v in enumerate(sorted(count.keys())):
        print(v, k)

dict get() method
The method get() returns the value for the given key. If the key is not available, it returns the default value, which is None unless a second argument is given.
Example
    #!/usr/bin/python
    dict = {'Name': 'Zara', 'Age': 27}
    print "Value : %s" % dict.get('Age')
    print "Value : %s" % dict.get('Sex', "Never")

Use of dict & get() : counting number
    text = """This is a test for set in python for showing no duplication test. Will you try this? Your job: do it!"""
    for punc in "!?,:.":
        text = text.replace(punc, '')
    count = {}
    for word in text.lower().split():
        # if word in count:
        #     count[word] += 1
        # else:
        #     count[word] = 1
        count[word] = count.get(word, 0) + 1
    for word in sorted(count.keys()):
        print(word, count[word])
    for k, v in enumerate(sorted(count.keys())):
        print(v, k)

An Example – set & dict Use set and dict.
Write a while loop that repeatedly creates a list of words from a line of input from the user. Add each word to the set. If the set increases in size (indicating this word has not been processed before), add the word to the dict as a key with the value being the new length of the set. Display the list of words in the dict along with their value, which represents the order in which they were discovered by the program. If the user presses Enter without any text, print Finished and exit.

    words_set = set()
    dict = {}
    while True:
        uin = raw_input("Enter text: ")
        if uin == "":
            print("Finished")
            break
        words = uin.split()
        for word in words:
            length = len(words_set)
            words_set.add(word)
            if len(words_set) > length:       # the set grew, so this word is new
                dict[word] = len(words_set)
        for key in dict.keys():
            print(key, dict[key])

String format()
To produce a formatted value or set of values, use the format() method
A format string holds literal text and replacement fields: "{replacement fields}".format(…)
    "{2},{1},and {0}".format('kim','lee','choi','lee')
    Out[13]: 'choi,lee,and kim'
    "{who} is a smart {what}".format(what = "student", who='kim')
    Out[14]: 'kim is a smart student'
    "I want {1[1]}".format(['kim'],['choi','park'],'lee')
    Out[15]: 'I want park'
    d1={'kim':'student','lee':'prof'}
    d2={'hyundai':'sonata','toyota':'camry'}
    'Hyundai car is {1['hyundai']}'.format(d1,d2)
    File "<ipython-input-18-f6acc81a9124>", line 1
    ^
    SyntaxError: invalid syntax
    'Hyundai car is {1[hyundai]}'.format(d1,d2)
    Out[19]: 'Hyundai car is sonata'

String format
    i = 42
    r = 31.97
    c = 2.2 + 3.3j
    s = "String"
    lst = ["zero", "one", "two", "three", "four", "five"]
    dct = {"Jim": "Dandy", "Stella": "DuBois", 1: "integer"}
    while True:
        fmt = raw_input("Format string: ")
        if not fmt:
            break
        fms = "{"+fmt+"}"
        print("Format:", fms, "output:", fms.format(i, r, c, s, e=lst, f=dct))

    Format string: 0
    ('Format:', '{0}', 'output:', '42')
    Format string: 1
    ('Format:', '{1}', 'output:', '31.97')
    Format string: 2.imag
    ('Format:', '{2.imag}', 'output:', '3.3')
    Format string: 3
    ('Format:', '{3}', 'output:', 'String')
    Format string: e[4]
    ('Format:', '{e[4]}', 'output:', 'four')
    Format string: f[Stella]
    ('Format:', '{f[Stella]}', 'output:', 'DuBois')

alignment
<: left, >: right, ^: center alignment
    data = [("Steve", 59, 202), ("Dorothy", 49, 156),
            ("Simon", 39, 155), ("David", 61, 135)]
    print(' ')
    for row in data:
        print("{0[0]:<12s}{0[1]:4d}{0[2]:4d}".format(row))
        print("{0[0]:>12s}{0[1]:<4d}{0[2]:4d}".format(row))
    for name, age, weight in data:
        print("{0:<12s}{1:^4d}{2:^4d}".format(name, age, weight))

Loop Again
    students = ['kim','lee','choi','park']
    for i, name in enumerate(students):
        print('%s, %s' % (i+1, name))
        print('{0}, {1}'.format(i+1, name))

join
The method join() returns a string in which the string elements of a sequence have been joined by the separator str: str.join(sequence)
Example:
    str = "-"
    seq = ("a", "b", "c")   # This is a sequence of strings.
    print str.join(seq)

    a = ' or '.join(['yes','no'])
    a
    Out[59]: 'yes or no'
    type(a)
    Out[60]: str
    ' or '.join(('yes','no'))
    Out[56]: 'yes or no'
    ' or '.join(['yes','no'])
    Out[57]: 'yes or no'

join-example
    valid_inputs = ['yes', 'no', 'maybe']
    input_query_string = 'Type %s: ' % ' or '.join(valid_inputs)
    while True:
        s = raw_input(input_query_string)
        if s in valid_inputs:
            break
        print("Wrong! Try again.")
    print(s)

Reading/writing file
    f = open('test.txt','w')
    f.write('this is the first line\n')
    f.write('this is the second line\n')
    names = ['choi','kim','lee']
    f.writelines(names)   # writelines() takes a list of strings and writes each element to the file; it adds no newlines
    f.close()

Content of test.txt:
    this is the first line
    this is the second line
    choikimlee

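Read file
    f = open('test.txt','r')
    # Each call continues from the current file position, so try these one at a time on a freshly opened file.
    print f.read()        # the whole file as one string
    print f.readline()    # the next single line
    print f.readlines()   # the remaining lines as a list
    for i, line in enumerate(f):
        print(i+1, line)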

Append file
    f = open('test.txt','a')
    f.write(30*"*")
    f.write('this is new line\n')
    f.write('this is the second line')
    f.close()

Content of test.txt:
    this is the first line
    this is the second line
    choikimlee******************************this is new line

Example
Write a program that uses a while loop to accept input from the user (if the user presses Enter, exit the program). Save the input to a file, then print it. Upon starting, the program will display any previous content of the file.
    import os.path
    fName = 'test.py'
    while True:
        if os.path.isfile(fName):
            f = open(fName,'r').read()
            print(f)
        text = raw_input('Enter text : ')
        if text:
            f = open(fName,'a')
            f.write(text)
            f.close()
        else:
            break

Example : replace, split, strip, loop
Read sentences
Split the text into a list of sentences using split('.')
Split each sentence into a list of phrases using split(',')
Print out each sentence with its phrases and their indexes

    wiki = '''The Beatles were an English rock band that formed in Liverpool,
    in With John Lennon, Paul McCartney, George Harrison, and Ringo Starr, they became widely regarded as the greatest and most influential act of the rock era. Rooted in skiffle, beat, and 1950s rock and roll, the Beatles later experimented with several genres, ranging from pop ballads to psychedelic and hard rock, often incorporating classical elements in innovative ways. In the early 1960s, their enormous popularity first emerged as "Beatlemania", but as their songwriting grew in sophistication they came to be perceived as an embodiment of the ideals shared by the era's sociocultural revolutions.'''
    sentences = wiki.replace("\n","").split('.')
    for index, sentence in enumerate(sentences):
        print('Sentence #' + str(index+1))
        phrases = sentence.split(',')
        for j, phrase in enumerate(phrases):
            phrase1 = phrase.strip()
            print('Phrase '+str(j+1)+':'+ phrase1)
        print(100*"*")

Example : built-in function (chr, ord,reverse(!))
Read a line of input from the user
Write an encoder: new_char = chr(ord(char) + 1)
Reverse the constructed string
    string = raw_input('Message: ')
    lst = []
    for ch in string:
        newCh = chr(ord(ch)+1)   # shift each character by one code point
        lst.append(newCh)
    lst.reverse()                # reverses the list in place
    print(''.join(lst))

function
    def functionName(Argument):
        Body
        return Value

    def avg(lst):
        return sum(lst)/float(len(lst))

    avg([1,2,3,4])   # list
    avg((1,2,3,4))   # tuple
    avg({1,2,3,4})   # set

lambda function
Python uses a lambda notation for anonymous functions
    def square(x):
        return x*x

    def applier(func, x):
        return func(x)

    print applier(square, 7)
    print applier(lambda x : x * x * x, 2)
    f = lambda x, y : x+y
    print f(3,4)

Optional parameter
    def print_option(lst, reverse = False):
        if reverse:
            lst.reverse()   # be careful: not lst = lst.reverse(), because reverse() works in place and returns None
        for i in lst:
            print(i)

    print_option(['i', 'am', 'happy'])
    print("\n")
    print_option(['i','am','happy'], True)

Multiple returns in python
    def sumDiff(x,y):
        return x+y, x-y

    # multiple inputs and use of the map function
    num1, num2 = map(int, raw_input("enter two numbers :").split())
    sum, diff = sumDiff(num1, num2)
    print "Sum : ", sum, " Diff : ", diff

Multiple arguments Use *args to denote multiple arguments
    def multiplier(*lst):
        if not lst:
            return 0
        product = lst[0]
        for num in lst[1:]:
            product *= num
        return product

    print multiplier()
    print multiplier(1,2,3)

Simple function example
A function with three parameters: the first one is required, and the other two have the default values 'b was not entered' and 'c was not entered'
The function prints the values of the three parameters
    def my_func(a, b='b was not entered', c='c was not entered'):
        print(a)
        print(b)
        print(c)

    my_func('test')
    my_func('test','test')
    my_func('test','test','test')
    print my_func

Some python standard library
    import textwrap
    In [153]: textwrap.wrap(' ',7)
    Out[153]: [' ', ' ', ' ', ' ', '890']
    In [154]: textwrap.wrap('this is a very important class',3)
    Out[154]: ['thi', 's', 'is', 'a v', 'ery', 'imp', 'ort', 'ant', 'cla', 'ss']
    In [155]: import time
    In [156]: time.time()
    Out[156]:
    In [157]: time.gmtime()
    Out[157]: time.struct_time(tm_year=2014, tm_mon=3, tm_mday=19, tm_hour=15, tm_min=26, tm_sec=0, tm_wday=2, tm_yday=78, tm_isdst=0)
    In [158]: time.asctime(time.gmtime())
    Out[158]: 'Wed Mar 19 15:26: '

Python namespace
Global namespace
Each function call creates a new local namespace
Argument values are bound to parameter names
    def function(parameter names):
        Body
        return

    function(argument values)
When the function returns, its local namespace is destroyed
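
A minimal sketch of the local namespace, assuming a hypothetical function show (not from the slides). It uses the built-in locals() and globals() to look at the two namespaces.

```python
x = "global x"

def show(x):
    # The parameter x is bound in a new local namespace for this call;
    # it hides the global x without changing it.
    y = "local y"
    return sorted(locals().keys())

local_names = show("argument")   # names that existed inside the call
leaked = "y" in globals()        # y disappeared when the call returned

print(local_names, x, leaked)
```

After the call, the global x still holds its original value, and y is not visible at module level.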

Python module
A module is a collection of statements to be executed
    import modx
The Python interpreter looks for modx.py, and also for its compiled version modx.pyc
We can access a function f in modx with modx.f()
Suppose we import module y, module y imports module z, and module z defines function g(): call it as y.z.g()

Import example
    """moda.py: Imports modb to gain indirect access to the triple function."""
    import modb

    """modb.py: Defines a function that can be used by importing the module."""
    def triple(x):
        """Simple function returns its argument times 3."""
        return x*3

    """importer.py: imports moda and calls a function from a module moda imports."""
    import moda
    print(moda.modb.triple("Yes! "))

__name__
When a module is imported, the interpreter binds the special name __name__ in the module's namespace to the module's name.
When the module is run as a program, __name__ receives the special name "__main__".
    s
    Out[57]: 'i am a boy. you are a girl.she is a mother.'
    In [58]: s.split('.')
    Out[58]: ['i am a boy', ' you are a girl', 'she is a mother', '']
    In [59]: s.split('.',1)
    Out[59]: ['i am a boy', ' you are a girl.she is a mother.']
    In [60]: s.split('.',1)[1]
    Out[60]: ' you are a girl.she is a mother.'

    def commafy(val):
        if len(val) < 4:
            return val
        out = []
        while val:
            out.append(val[-3:])   # take the last three digits
            val = val[:-3]
        return ",".join(reversed(out))

    def commareal(val):
        if "." in val:
            before, after = val.split(".",1)   # before, after = {number | list | tuple|..}
        else:
            before, after = val, "0"
        return "{0}.{1}".format(commafy(before), after)

    # Testing code only ...
    if __name__ == "__main__":
        for i in [0, 1, 12, 123, 1234, 12345]:
            print(i, ":", commafy(str(i)), ":", commareal("{0:.2f}".format(i/1000.0)))

Other imports
import … as
    import other_module as my
Function f of other_module is then accessed as my.f
from … import
    from other_module import function
The function is then called directly as function()
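
Both import forms can be tried with the standard math module. This is only a sketch: math and sqrt stand in for the other_module and function placeholders above.

```python
import math as m          # module bound to the shorter name m
from math import sqrt     # function bound directly in this namespace

a = m.sqrt(16)            # access through the alias
b = sqrt(16)              # call without any module prefix
same = m.sqrt is sqrt     # both names refer to the same function object

print(a, b, same)
```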

sys.path
To see the system path used by the Python interpreter:
When the interpreter looks for a module, it searches the directories in sys.path in order
    import sys
    for p in sys.path:
        print(p)

Functions with keyword arguments
*args : unknown number of arguments
**kwargs : dict parameter
Keyword arguments whose names do not correspond to the name of any parameter
Unmatched arguments are put into a dictionary
    def keywords(**kwargs):
        "Prints the keys and arguments passed through"
        for key in kwargs:
            print("{0}: {1} ".format(key, kwargs[key]))

    def keywords_as_dict(**kwargs):
        "Returns the keyword arguments as a dict"
        return kwargs

    if __name__ == "__main__":
        keywords(guido="Founder of Python", python="Used by NASA and Google")
        print(keywords_as_dict(guido="Founder of Python", python="Used by NASA and Google"))

myfunc.py : example
    def myprint(*students, **books):
        """this is myprint function """
        for title, name in books.items():
            print("{0:<10}: {1}".format(title.capitalize(), name))
        print("{0:*^30}".format(' students '))
        for student in students:
            print(student)

    if __name__ == "__main__":
        myprint("kim","park", publisher = "korean book store", bookname = "python")

From another session, import the module and read its docstring:
    import myfunc
    help(myfunc.myprint)

Function execution
    def add(a,b):
        return a+b

    def sub(a,b):
        return a-b

    dict = {'adder':add, 'subtractor':sub}
    print dict['adder'](3,4)
    print dict['subtractor'](10,5)
    print dict

Arguments in Functions
    # Extra arguments
    def foo(x, y, *args):
        print(x,y,args)

    def bar(x,y=1, **kwargs):
        print x,y,kwargs

    if __name__=="__main__":
        print("{0:-^30}".format("print foo"))
        foo(2,3,'hello',4)
        print("{0:%^30}".format("print bar"))
        bar(1,2,student1 = 'park', student2='kim')
        bar(['ssu','kaist'],student1 = 'park', student2='kim',y='test for 2')

    import sys

    def print_prof(professor, *students, **args):
        print('professor : ' + professor)

    def print_students(professor, *students, **args):
        print('students:')
        for i, student in enumerate(students):
            print("{0}, {1}".format(i+1, student))

    def print_staffs(professor, *students, **args):
        print('staffs:')
        for k, v in args.items():
            print('{0} : {1}'.format(k,v))

    def quit(professor, *students, **args):
        """Terminates the program."""
        print("Quitting the program")
        sys.exit()

    if __name__ == "__main__":
        # A dict of functions plays the role of a switch statement.
        switch = {
            'prof': print_prof,
            'students': print_students,
            'staffs': print_staffs,
            'quit': quit
        }
        options = switch.keys()
        prompt = 'Pick an option from the list ({0}): '.format(', '.join(options))
        while True:
            inp = raw_input(prompt)
            option = switch.get(inp, None)
            if option:
                option('young tack park','student1', ta="wangon lee")
                print('-' * 40)
            else:
                print('Please select a valid option!')

Classes

Namespaces

__init__ and self
__init__
When you create an instance of a class by calling it, the interpreter looks to see whether the class has an __init__() method. If it finds __init__(), it calls that method on the newly created instance. Because it is an instance method call, the new instance is inserted as the first argument to the call. Further, if the call to the class has any arguments, they are passed to __init__() as additional arguments.
self
self must be the first parameter of any object method
It represents the "implicit parameter" (this in Java)
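
A minimal sketch of the calling convention described above, assuming a hypothetical Point class (not from the slides):

```python
class Point:
    def __init__(self, x, y=0):
        # self is the newly created instance; x and y come from the call Point(3, 4)
        self.x = x
        self.y = y

    def shifted(self, dx):
        # self is again the instance the method was called on
        return Point(self.x + dx, self.y)

p = Point(3, 4)            # the interpreter calls Point.__init__(p, 3, 4)
q = p.shifted(2)           # the usual method call
r = Point.shifted(p, 2)    # the same call with the instance passed explicitly

print(q.x, q.y, r.x, r.y)
```

The last two calls are equivalent, which shows that self is an ordinary first parameter that the interpreter fills in for you.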

Simple example
    import sys

    class Dog:
        def __init__(self, name, breed):
            self.name = name
            self.breed = breed

    dogs = []
    while True:
        name = input('Name: ')
        if name == "":
            sys.exit()
        else:
            breed = input('Breed: ')
            dog = Dog(name, breed)
            dogs.append(dog)
            print('DOGS')
            for i, dog in enumerate(dogs):
                print(i, dog.name, dog.breed)
            print('*'*30)

Inheritance
    '''Python Inheritance
    Parent > Child1
    '''
    class Parent:
        race = "asian"
        status = "married"

    class Child1(Parent):
        pass

    class Child2(Parent):
        status = "single"

    print("Child1.status : {0}\nChild2.status : {1}".format(Child1.status, Child2.status))
    print("\nLocally defined variable of Child2 :\n{0}".format(Child2.__dict__))
    print("\nAll variables defined for Child2 :\n{0}".format(dir(Child2)))

polymorphism
The ability for the same code to be used with different types of objects and to behave differently with each
    class Animal:
        def __init__(self, name):   # Constructor of the class
            self.name = name
        def talk(self):             # Abstract method, defined by convention only
            raise NotImplementedError("Subclass must implement abstract method")

    class Cat(Animal):
        def talk(self):
            return 'Meow!'

    class Dog(Animal):
        def talk(self):
            return 'Woof! Woof!'

    animals = [Cat('Missy'), Cat('Mr. Mistoffelees'), Dog('Lassie')]
    for animal in animals:
        print animal.name + ': ' + animal.talk()

    # Calling talk() on the base class raises NotImplementedError:
    # animal = Animal('Mystery Mat')
    # animal.talk()

Overriding When searching for an attribute (including a method), the interpreter first looks in the instance's namespace; next it looks in the namespace of the instance's class; after that it looks in the base classes one by one, raising an AttributeError exception if the attribute is not found. If a class defines a method of the same name as a method of one of its base classes, it is said to override the method of the base class.
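
The lookup order described above (instance, then class, then base classes) can be observed with a small sketch. Base, Derived, kind, and describe are hypothetical names chosen for illustration.

```python
class Base:
    kind = "base"
    def describe(self):
        return "Base.describe"

class Derived(Base):
    def describe(self):            # overrides Base.describe
        return "Derived.describe"

d = Derived()
from_base = d.kind                 # not on the instance or Derived, so found in Base
d.kind = "instance"                # now stored in the instance's own namespace
from_instance = d.kind             # the instance attribute wins
overridden = d.describe()          # found in Derived before Base is searched

try:
    d.missing                      # defined nowhere in the chain
    missing_found = True
except AttributeError:
    missing_found = False

print(from_base, from_instance, overridden, missing_found)
```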

    '''Overriding --- call superclass, super and __str__ for printing'''
    class Car(object):
        def __init__(self, color, cc):
            self.color = color
            self.cc = cc
        def __str__(self):
            return "Car: Color = {0}, Year = {1}".format(self.color, self.cc)

    class Hyundai(Car):
        def __init__(self, color, cc, model):
            Car.__init__(self, color, cc)              # call the superclass by name
            self.model = model
        def __str__(self):
            return "Hyundai Car : Color = {0}, Year = {1}, Model = {2}".\
                format(self.color, self.cc, self.model)

    class Toyota(Car):
        def __init__(self, color, cc, model):
            super(Toyota,self).__init__(color, cc)     # call the superclass through super
            self.model = model
        def __str__(self):
            return "Toyota Car : Color = {0}, Year = {1}, Model = {2}".\
                format(self.color, self.cc, self.model)

    car1 = Car("red",2000)
    car2 = Hyundai("red",2013,"Genesis")
    car3 = Toyota("yellow",2013,"Camry")
    print car1
    print car2
    print car3
    isinstance(car3,Car)
    isinstance(car3,Toyota)

Exception Handling
    courses = {221:'python'}
    try:
        courses[321]
    except KeyError:
        print('KeyError')

    try:
        3/0
    except ZeroDivisionError:
        print('zero division error')

A simple example
    def divide(a, b):
        """ Return result of dividing a by b """
        print("=" * 20)
        print("a: ", a, "/ b: ", b)
        try:
            return a/b
        except TypeError:
            print("Invalid types for division")
        except ZeroDivisionError:
            print("Divide by zero")

    if __name__ == "__main__":
        print(divide(1, "string"))
        print(divide(2, 0))
        print(divide(123, 4))

raise
With a bare 'raise', the exception is re-raised so that some outer handler can deal with it; the same exception is presented to the outer handlers.
Also, the exception specification can be followed by an as clause, which specifies a name to bind to the exception that is being handled.
    def divide(a, b):
        """ Return result of dividing a by b """
        print("=" * 20)
        print("a: ", a, "/ b: ", b)
        try:
            return a/b
        except (ZeroDivisionError, TypeError):
            print("Something went wrong!")
            raise

    if __name__ == "__main__":
        for arg1, arg2 in ((1, "string"), (2, 0), (123, 4)):
            try:
                print(divide(arg1, arg2))
            except Exception as msg:
                print("Problem: {0}".format(msg))

An example
    print('Dividing 10 by an integer')
    while True:
        uin = input('Provide an integer: ')
        if uin == "":
            break
        else:
            try:
                div = int(uin)
                value = 10/div
                print(value)
            except ValueError:
                print('Your input must be an integer')
            except ZeroDivisionError:
                print('Your input must not be zero(0)')

Lecture11 – coordinate.py
    import math

    def sq(x):
        return x*x

    class Coordinate(object):
        def __init__(self, x, y):
            self.x = x
            self.y = y
        def __str__(self):
            return "<"+str(self.x)+","+str(self.y)+">"
        def distance(self, other):
            return math.sqrt(sq(self.x - other.x) + sq(self.y - other.y))

    c = Coordinate(3,4)
    Origin = Coordinate(0,0)

    class intSet(object):
        """An intSet is a set of integers
        The value is represented by a list of ints, self.vals.
        Each int in the set occurs in self.vals exactly once."""

        def __init__(self):
            """Create an empty set of integers"""
            self.vals = []

        def insert(self, e):
            """Assumes e is an integer and inserts e into self"""
            if not e in self.vals:
                self.vals.append(e)

        def member(self, e):
            """Assumes e is an integer
            Returns True if e is in self, and False otherwise"""
            return e in self.vals

        def remove(self, e):
            """Assumes e is an integer and removes e from self
            Raises ValueError if e is not in self"""
            try:
                self.vals.remove(e)
            except:
                raise ValueError(str(e) + ' not found')

        def __str__(self):
            """Returns a string representation of self"""
            self.vals.sort()
            return '{' + ','.join([str(e) for e in self.vals]) + '}'

    # s = intSet()
    # print s
    # s.insert(3)
    # s.insert(4)
    # s.member(3)
    # s.member(5)
    # s.insert(6)
    # s.remove(3)

    # Two more methods of intSet:
        def intersect(self, other):
            res = intSet()
            for e in self.vals:
                if other.member(e):
                    res.insert(e)
            return res

        def __len__(self):
            return len(self.vals)

    class Queue(object):
        def __init__(self):
            self.vals = []

        def insert(self, e):
            self.vals.append(e)

        def remove(self):
            if self.vals == []:
                raise ValueError('ValueError')
            else:
                pop = self.vals[0]        # first in, first out
                self.vals.remove(pop)
                return pop

    import datetime

    class Person(object):
        def __init__(self, name):
            """create a person called name"""
            self.name = name
            self.birthday = None
            self.lastName = name.split(' ')[-1]

        def getLastName(self):
            """return self's last name"""
            return self.lastName

        def setBirthday(self, month, day, year):
            """sets self's birthday to birthDate"""
            self.birthday = datetime.date(year, month, day)

        def getAge(self):
            """returns self's current age in days"""
            if self.birthday == None:
                raise ValueError
            return (datetime.date.today() - self.birthday).days

        def __lt__(self, other):
            """return True if self's name is lexicographically less
            than other's name, and False otherwise"""
            if self.lastName == other.lastName:
                return self.name < other.name
            return self.lastName < other.lastName

        def __str__(self):
            """return self's name"""
            return self.name

    # me = Person("William Eric Grimson")
    # print me
    # foo = 'William Eric Grimson'
    # foo.split(' ')
    # foo.split(' ')[-1]
    # me.getLastName()
    #
    # me.setBirthday(1,2,1927)
    # me.getAge()
    # her = Person("Cher")
    # her.getLastName()
    # plist = [me, her]
    # for p in plist: print p
    # plist.sort()

    class MITPerson(Person):
        nextIdNum = 0   # next ID number to assign

        def __init__(self, name):
            Person.__init__(self, name)   # initialize Person attributes
            # new MITPerson attribute: a unique ID number
            self.idNum = MITPerson.nextIdNum
            MITPerson.nextIdNum += 1

        def getIdNum(self):
            return self.idNum

        # sorting MIT people uses their ID number, not name!
        def __lt__(self, other):
            return self.idNum < other.idNum

    p1 = MITPerson('Eric')
    p2 = MITPerson('John')
    p3 = MITPerson('John')
    p4 = Person('John')
    # print p1
    # p1.getIdNum()
    # p2.getIdNum()
    # p1 < p2
    # p3 < p2
    # p4 < p1
    # p1 < p4

Multiple Inheritance
Python supports multiple inheritance:
    class DerivedClassName(B1, B2, B3):
        <statement-1>
        <statement-N>

Multiple Inheritance
Resolving conflicts (given the expression object.attribute, find the definition of attribute):
Depth-first, left-to-right search of the base classes.
"Depth-first": start at a leaf node in the inheritance hierarchy (the object); search the object first, its class next, then the superclasses in left-to-right order as listed in the definition.

Illustration of Single & Multiple Inheritance
Superclasses: B1 (.a, .b), B2 (.x), B3 (.y, .b)
Derived class: B4 (.x, .q)
Instances: I0 (.x), I1, I2
I0.a and I0.b are defined in B1; I0.x is defined in I0.
I1.a, I1.b and I2.a, I2.b are defined in B1; I1.x and I2.x are defined in B4.
I1.y and I2.y are defined in B3, and so forth.
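
The illustration can be approximated in code. This is a sketch under assumptions: the slide text does not say which bases B4 lists or in what order, so B4(B1, B2, B3) is a guess that matches the stated results, and I0 is assumed to be an instance of B1. Python 3 computes the search order with the C3 linearization; for a hierarchy like this one, with no shared ancestors besides object, it gives the same order as the depth-first, left-to-right rule.

```python
class B1:
    a = "B1.a"
    b = "B1.b"

class B2:
    x = "B2.x"

class B3:
    y = "B3.y"
    b = "B3.b"

class B4(B1, B2, B3):    # assumed base order, not stated on the slide
    x = "B4.x"
    q = "B4.q"

i0 = B1()
i0.x = "I0.x"            # attribute stored on the instance itself
i1 = B4()

# The order in which classes are searched for i1's attributes.
order = [cls.__name__ for cls in B4.__mro__]

print(order)
print(i0.a, i0.x, i1.a, i1.b, i1.x, i1.y)
```

Note that i1.b comes from B1 rather than B3 because B1 appears earlier in the search order.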

List clone
    a = [1,2,3]
    b = a       # b is another name for the same list object
    c = a[:]    # c is a new list holding the same elements
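
A short sketch of what the difference means in practice: a change made through a is visible through b but not through c.

```python
a = [1, 2, 3]
b = a        # alias: both names refer to one list
c = a[:]     # clone: a slice of the whole list is a new list

a.append(4)  # modify the list through the name a

print(b)     # the alias sees the change
print(c)     # the clone keeps the old contents
print(b is a, c is a)
```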

Some tasks on gradeReport
    gradeReport(six00)
    print gradeReport(six00)
    six00.allStudents()
    for s in six00.allStudents():
        print s
    for s in six00.allStudents():
        grades = six00.getGrades(s)
        print("{0} : {1}".format(s, grades))

Generating Random Numbers
Programs often must generate random numbers to simulate events that are based on probability
You can import the random module into your Python program
Use the randrange() function to generate random numbers in a given range
It is not a "true" random number generator; it is a pseudo-random number generator
September 7, 2004 ICP: Chapter 3: Control Structures

Use the import statement to include a module
Modules are files that contain functions that can be used in any program
Import statements are usually at the top of your Python program
Example
    import random

The randrange() function returns a random integer in the range [start..end): from the value start up to but not including end.
Example
    anydigit = random.randrange(10)    # [0..10) -> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
    die = random.randrange(6) + 1      # [1..7) = [1..6] -> 1, 2, 3, 4, 5, 6
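
A minimal sketch that checks the stated ranges and shows the pseudo-random behaviour. The seed value 42 and the sample size are arbitrary choices made only so the run is repeatable.

```python
import random

random.seed(42)   # fixing the seed makes the "random" sequence repeatable
rolls = [random.randrange(6) + 1 for _ in range(1000)]   # die values
digits = [random.randrange(10) for _ in range(1000)]     # single digits

random.seed(42)   # same seed again: the generator restarts the same sequence
rolls_again = [random.randrange(6) + 1 for _ in range(1000)]

print(min(rolls), max(rolls), min(digits), max(digits))
print(rolls == rolls_again)
```

Getting the identical list after reseeding is what "pseudo-random" means here: the numbers come from a deterministic algorithm.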

    import string

    def buildCoder(shift):
        """
        Returns a dict that can apply a Caesar cipher to a letter.
        The cipher is defined by the shift value. Ignores non-letter characters
        like punctuation, numbers, and spaces.
        shift: 0 <= int < 26
        returns: dict
        """
        letters = string.ascii_letters
        dict = {}
        for ch in letters:
            encryptedCh = convert(ch, shift)
            dict[ch] = encryptedCh
        return dict

    def convert(ch, shift):
        if ch.islower():
            val = ord('z')
            base = ord('a')
        elif ch.isupper():
            val = ord('Z')
            base = ord('A')
        ords = ord(ch) + shift
        if ords > val:   # shifted past the last letter: wrap around to the start
            return chr(base + (ords - val) - 1)
        else:
            return chr(ords)

    def applyCoder(text, coder):
        """
        Applies the coder to the text. Returns the encoded text.
        text: string
        coder: dict with mappings of characters to shifted characters
        returns: text after mapping coder chars to original text
        >>> applyCoder("Hello, world!", buildCoder(3))
        'Khoor, zruog!'
        >>> applyCoder("Khoor, zruog!", buildCoder(23))
        'Hello, world!'
        """
        encryptedList = []
        dict = coder
        stringToignore = string.punctuation + ' ' + string.digits + '\n'
        for ch in text:
            if not ch in stringToignore:
                encryptedList.append(dict[ch])
            else:
                encryptedList.append(ch)
        return ''.join(encryptedList)

    def applyShift(text, shift):
        """
        Given a text, returns a new text Caesar shifted by the given shift
        offset. Lower case letters should remain lower case, upper case
        letters should remain upper case, and all other punctuation should
        stay as it is.
        text: string to apply the shift to
        shift: amount to shift the text (0 <= int < 26)
        returns: text after being shifted by specified amount.
        """
        return applyCoder(text, buildCoder(shift))

def findBestShift(wordList, text):
    """
    Finds a shift key that can decrypt the encoded text.
    text: string
    returns: 0 <= int < 26
    """
    numValidWord = 0
    bestShift = 0
    for i in range(1, 26):
        sftedSentence = applyShift(text, i)
        num = validWord(wordList, sftedSentence)
        if num > numValidWord:
            numValidWord = num
            bestShift = i
    return bestShift

def validWord(wordList, sftedSentence):
    # Count how many words of the shifted sentence are real words.
    num = 0
    words = sftedSentence.split(' ')
    for word in words:
        if isWord(wordList, word):
            num += 1
    return num

def decryptStory():
    """
    Using the methods you created in this problem set, decrypt the story
    given by the function getStoryString(). Use the functions getStoryString
    and loadWords to get the raw data you need.
    returns: string - story in plain text
    """
    story = getStoryString()
    wordList = loadWords()
    sftNumber = findBestShift(wordList, story)
    return applyShift(story, sftNumber)

def loadWords():
    print("Loading word list from file...")
    inFile = open(WORDLIST_FILENAME, 'r')
    wordList = inFile.read().split()
    print(" ", len(wordList), "words loaded.")
    return wordList

def isWord(wordList, word):
    """
    Example:
    >>> isWord(wordList, 'bat') returns True
    >>> isWord(wordList, 'asdf') returns False
    """
    word = word.lower()
    word = word.strip("
    return word in wordList

def getStoryString():
    return open(r"C:\Users\Park\Downloads\story.txt", "r").read()
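The whole cipher workflow (shift the text, then recover the key by trying every shift and counting real words) can be sketched in a few self-contained lines. This is a Python 3 sketch, not the problem set's solution: the names shift_text, count_valid_words, and find_best_shift and the two-word word list are assumptions made for illustration.

```python
import string

def shift_text(text, shift):
    """Caesar-shift the letters of text; leave every other character alone."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord('a') + shift) % 26 + ord('a')))
        elif ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord('A') + shift) % 26 + ord('A')))
        else:
            out.append(ch)
    return ''.join(out)

def count_valid_words(words, text):
    """Count the words of text that appear in the set words."""
    return sum(1 for w in text.split()
               if w.strip(string.punctuation).lower() in words)

def find_best_shift(words, text):
    """Try all 26 shifts and keep the one that yields the most real words."""
    best_shift, best_count = 0, -1
    for shift in range(26):
        count = count_valid_words(words, shift_text(text, shift))
        if count > best_count:
            best_shift, best_count = shift, count
    return best_shift

known_words = {"hello", "world"}   # hypothetical stand-in for the real word list
secret = shift_text("Hello, world!", 3)
key = find_best_shift(known_words, secret)
plain = shift_text(secret, key)
print(secret)
print(key, plain)
```

Shifting by 3 and then by 23 returns the original text because the two shifts add up to 26, a full turn of the alphabet.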

class binaryTree(object):
    def __init__(self, value):
        self.value = value
        self.leftBranch = None
        self.rightBranch = None
        self.parent = None
    def setLeftBranch(self, node):
        self.leftBranch = node
    def setRightBranch(self, node):
        self.rightBranch = node
    def setParent(self, parent):
        self.parent = parent
    def getValue(self):
        return self.value
    def getLeftBranch(self):
        return self.leftBranch
    def getRightBranch(self):
        return self.rightBranch
    def getParent(self):
        return self.parent
    def __str__(self):

n5 = binaryTree(5)
n2 = binaryTree(2)
n1 = binaryTree(1)
n4 = binaryTree(4)
n8 = binaryTree(8)
n6 = binaryTree(6)
n7 = binaryTree(7)
n3 = binaryTree(3)

n5.setLeftBranch(n2)
n2.setParent(n5)
n5.setRightBranch(n8)
n8.setParent(n5)
n2.setLeftBranch(n1)
n1.setParent(n2)
n2.setRightBranch(n4)
n4.setParent(n2)
n8.setLeftBranch(n6)
n6.setParent(n8)
n6.setRightBranch(n7)
n7.setParent(n6)
n4.setLeftBranch(n3)
n3.setParent(n4)

print n5.getValue()
print n5.getLeftBranch().getValue()
print n5.getRightBranch().getValue()

def DFSBinary(root, fcn):
    queue = [root]    # used as a stack: new nodes go on the front
    while len(queue) > 0:
        print 'at node ' + str(queue[0].getValue())
        if fcn(queue[0]):
            return True
        else:
            temp = queue.pop(0)
            if temp.getRightBranch():
                queue.insert(0, temp.getRightBranch())
            if temp.getLeftBranch():
                queue.insert(0, temp.getLeftBranch())
    return False

def find6(node):
    return node.getValue() == 6

# test examples
print 'DFS'
DFSBinary(n5, find6)

def BFSBinary(root, fcn):
    queue = [root]    # a true queue: new nodes go on the back
    while len(queue) > 0:
        print 'at node ' + str(queue[0].getValue())
        if fcn(queue[0]):
            return True
        else:
            temp = queue.pop(0)
            if temp.getLeftBranch():
                queue.append(temp.getLeftBranch())
            if temp.getRightBranch():
                queue.append(temp.getRightBranch())
    return False

print ''
print 'BFS'
BFSBinary(n5, find6)
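The only difference between the two searches is where new nodes are added. The Python 3 sketch below records the visiting order of each search on the same eight-node tree. The Node class and the function names dfs_order and bfs_order are assumptions for illustration, standing in for binaryTree, DFSBinary, and BFSBinary.

```python
from collections import deque

class Node(object):
    """Hypothetical minimal tree node, standing in for binaryTree."""
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def dfs_order(root, target):
    """Visit nodes depth-first (left before right) until target is found."""
    stack, visited = [root], []
    while stack:
        node = stack.pop()
        visited.append(node.value)
        if node.value == target:
            break
        # Push right first so that left is popped, and explored, first.
        for child in (node.right, node.left):
            if child is not None:
                stack.append(child)
    return visited

def bfs_order(root, target):
    """Visit nodes level by level until target is found."""
    queue, visited = deque([root]), []
    while queue:
        node = queue.popleft()
        visited.append(node.value)
        if node.value == target:
            break
        for child in (node.left, node.right):
            if child is not None:
                queue.append(child)
    return visited

# Same shape as the tree built from n1..n8 in the slides.
tree = Node(5,
            Node(2, Node(1), Node(4, Node(3))),
            Node(8, Node(6, None, Node(7))))

dfs_visits = dfs_order(tree, 6)
bfs_visits = bfs_order(tree, 6)
print(dfs_visits)
print(bfs_visits)
```

Depth-first search exhausts the left subtree before it ever looks at node 8, while breadth-first search reaches 6 after finishing the two levels above it.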

def lt6(node):
    # True when the target 6 is less than this node's value, so search left
    return node.getValue() > 6

def DFSBinaryOrdered(root, fcn, ltFcn):
    queue = [root]
    while len(queue) > 0:
        print 'at node ' + str(queue[0].getValue())
        if fcn(queue[0]):
            return True
        elif ltFcn(queue[0]):
            temp = queue.pop(0)
            if temp.getLeftBranch():
                queue.insert(0, temp.getLeftBranch())
        else:
            temp = queue.pop(0)
            if temp.getRightBranch():
                queue.insert(0, temp.getRightBranch())
    return False

DFSBinaryOrdered(n5, find6, lt6)

Building decision tree
def buildDTree(sofar, todo):
    if len(todo) == 0:
        return binaryTree(sofar)
    else:
        withelt = buildDTree(sofar + [todo[0]], todo[1:])    # left: take the element
        withoutelt = buildDTree(sofar, todo[1:])             # right: skip the element
        here = binaryTree(sofar)
        here.setLeftBranch(withelt)
        here.setRightBranch(withoutelt)
        return here

tree = buildDTree([], ['a','b'])
print tree.getValue()
print tree.getLeftBranch().getValue()
print tree.getLeftBranch().getLeftBranch().getValue()
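Each leaf of the decision tree is one complete set of take/skip decisions, so the leaves enumerate every subset of the items. The Python 3 sketch below shows this with a hypothetical minimal Node class; build_dtree and leaves are illustrative names, not the slides' code.

```python
class Node(object):
    """Hypothetical minimal node: a value plus left and right children."""
    def __init__(self, value):
        self.value, self.left, self.right = value, None, None

def build_dtree(sofar, todo):
    here = Node(sofar)
    if todo:
        here.left = build_dtree(sofar + [todo[0]], todo[1:])   # take the next item
        here.right = build_dtree(sofar, todo[1:])              # skip the next item
    return here

def leaves(node):
    """Collect the values stored at the leaves, left to right."""
    if node.left is None and node.right is None:
        return [node.value]
    return leaves(node.left) + leaves(node.right)

tree = build_dtree([], ['a', 'b', 'c'])
subsets = leaves(tree)
print(len(subsets))
print(subsets)
```

With n items the tree has 2**n leaves, which is why the later slides look at pruning and at searching without building the tree.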

def DFSBinaryPath(root, fcn):
    queue = [root]
    while len(queue) > 0:
        if fcn(queue[0]):
            return TracePath(queue[0])
        else:
            temp = queue.pop(0)
            if temp.getRightBranch():
                queue.insert(0, temp.getRightBranch())
            if temp.getLeftBranch():
                queue.insert(0, temp.getLeftBranch())
    return False

def TracePath(node):
    # Follow parent links from the found node back up to the root.
    if not node.getParent():
        return [node]
    else:
        return [node] + TracePath(node.getParent())

print ''
print 'DFS path'
pathTo6 = DFSBinaryPath(n5, find6)
print [e.getValue() for e in pathTo6]

Decision Tree
def buildDTree(sofar, todo):
    if len(todo) == 0:
        return binaryTree(sofar)
    else:
        withelt = buildDTree(sofar + [todo[0]], todo[1:])
        withoutelt = buildDTree(sofar, todo[1:])
        here = binaryTree(sofar)
        here.setLeftBranch(withelt)
        here.setRightBranch(withoutelt)
        return here

a = [6,3]
b = [7,2]
c = [8,4]
d = [9,5]
treeTest = buildDTree([], [a,b,c,d])
print treeTest.getValue()
print treeTest.getLeftBranch().getValue()
print treeTest.getRightBranch().getValue()
print treeTest.getLeftBranch().getLeftBranch().getValue()
print treeTest.getLeftBranch().getRightBranch().getValue()

Decision Tree
lst = [(3,'a'),(4,'c'),(5,'b')]
def test(lst):
    firsts = [e[0] for e in lst]
    seconds = [e[1] for e in lst]
    print sum(firsts)
    print ''.join(seconds)
test(lst)

Searching a DFS decision tree
Value : total value of a node
Constraint : maximum space allowed
Queue = [ [[6,3],[7,2]], [[6,3]] ]
Queue = [ [[6,3],[7,2],[8,4]], [[6,3],[7,2]], [[6,3]] ]
Best = [[6,3],[7,2],[9,5]]
[Figure: decision tree over the items (6,3), (7,2), (8,4), (9,5). Each level decides one item: the left branch takes it, the right branch skips it. Branches that would add (9,5) beyond the space limit are marked X. Node labels in the figure include Value=13 Space=5 and Value=13 Space=9.]

def DFSDTree(root, valueFcn, constraintFcn):
    queue = [root]
    best = None
    visited = 0
    while len(queue) > 0:
        visited += 1
        if constraintFcn(queue[0].getValue()):
            if best == None:
                best = queue[0]
                print best.getValue()
            elif valueFcn(queue[0].getValue()) > valueFcn(best.getValue()):
                print 'old best node', best.getValue()
                best = queue[0]
                print 'new best node', best.getValue()
                print
            temp = queue.pop(0)
            if temp.getRightBranch():
                queue.insert(0, temp.getRightBranch())
            if temp.getLeftBranch():
                queue.insert(0, temp.getLeftBranch())
        else:
            queue.pop(0)    # constraint violated: discard without expanding
    print 'visited', visited
    return best

def sumValues(lst):
    vals = [e[0] for e in lst]
    return sum(vals)

def sumWeights(lst):
    wts = [e[1] for e in lst]
    return sum(wts)

def WeightsBelow10(lst):
    return sumWeights(lst) <= 10

a = [6,3]
b = [7,2]
c = [8,4]
d = [9,5]
treeTest = buildDTree([], [a,b,c,d])

print ''
print 'DFS decision tree'
foobar = DFSDTree(treeTest, sumValues, WeightsBelow10)
print foobar.getValue()

def DFSDTree(root, valueFcn, constraintFcn):
    queue = [root]
    best = None
    visited = 0
    while len(queue) > 0:
        visited += 1
        if constraintFcn(queue[0].getValue()):
            if best == None:
                best = queue[0]
                print best.getValue()
            elif valueFcn(queue[0].getValue()) > valueFcn(best.getValue()):
                print 'old best node', best.getValue()
                best = queue[0]
                print 'new best node', best.getValue()
                print
            temp = queue.pop(0)
            if temp.getRightBranch():
                queue.insert(0, temp.getRightBranch())
            if temp.getLeftBranch():
                queue.insert(0, temp.getLeftBranch())
        else:
            print '{0} has been discarded'.format(queue[0].getValue())
            queue.pop(0)
    print 'visited', visited
    return best
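The search above can be condensed into a self-contained Python 3 sketch. It is an illustration under assumptions, not the slides' code: Node, build_dtree, and dfs_best are hypothetical names, items are (value, weight) tuples, and the weight limit of 10 matches WeightsBelow10.

```python
class Node(object):
    """Hypothetical minimal node: a value plus left and right children."""
    def __init__(self, value):
        self.value, self.left, self.right = value, None, None

def build_dtree(sofar, todo):
    here = Node(sofar)
    if todo:
        here.left = build_dtree(sofar + [todo[0]], todo[1:])   # take the item
        here.right = build_dtree(sofar, todo[1:])              # skip the item
    return here

def total_value(items):
    return sum(v for v, w in items)

def total_weight(items):
    return sum(w for v, w in items)

def dfs_best(root, max_weight):
    """Depth-first search for the most valuable node within the weight limit."""
    stack, best, visited = [root], None, 0
    while stack:
        node = stack.pop()
        visited += 1
        if total_weight(node.value) > max_weight:
            continue   # over the limit: descendants never weigh less, so prune
        if best is None or total_value(node.value) > total_value(best.value):
            best = node
        for child in (node.right, node.left):   # left ends up on top of the stack
            if child is not None:
                stack.append(child)
    return best, visited

items = [(6, 3), (7, 2), (8, 4), (9, 5)]
best, visited = dfs_best(build_dtree([], items), 10)
print(best.value, total_value(best.value), total_weight(best.value))
```

The key step is replacing best whenever a feasible node with a strictly larger value is found; without that assignment the search would return the first feasible node it met.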

def DFSDTreeGoodEnough(root, valueFcn, constraintFcn, stopFcn):
    stack = [root]
    best = None
    visited = 0
    while len(stack) > 0:
        visited += 1
        if constraintFcn(stack[0].getValue()):
            if best == None:
                best = stack[0]
                print best.getValue()
            elif valueFcn(stack[0].getValue()) > valueFcn(best.getValue()):
                best = stack[0]
                print best.getValue()
            if stopFcn(best.getValue()):    # good enough: stop searching
                print 'visited', visited
                return best
            temp = stack.pop(0)
            if temp.getRightBranch():
                stack.insert(0, temp.getRightBranch())
            if temp.getLeftBranch():
                stack.insert(0, temp.getLeftBranch())
        else:
            stack.pop(0)
    print 'visited', visited
    return best

Approximate Search
def atLeast15(lst):
    return sumValues(lst) >= 15

print ''
print 'DFS decision tree good enough'
foobar = DFSDTreeGoodEnough(treeTest, sumValues, WeightsBelow10, atLeast15)
print foobar.getValue()
print 'BFS decision tree good enough'
foobarnew = BFSDTreeGoodEnough(treeTest, sumValues, WeightsBelow10, atLeast15)
print foobarnew.getValue()

Tuple again
To write a tuple containing a single value you have to include a comma, even though there is only one value:
tup1 = (50,)
tup1 + (30,)    # (50, 30)

Implicit Search
toConsider : nodes still to be searched
- toConsider[0] : node
- toConsider[0][0] : value
- toConsider[0][1] : space
avail : available space
toConsider == [] - no more to search
avail == 0 - no more space
result = (value, tuple of nodes)

def DTImplicit(toConsider, avail):
    if toConsider == [] or avail == 0:
        result = (0, ())
    elif toConsider[0][1] > avail:
        result = DTImplicit(toConsider[1:], avail)
    else:
        nextItem = toConsider[0]
        withVal, withToTake = DTImplicit(toConsider[1:], avail - nextItem[1])
        withVal += nextItem[0]
        withoutVal, withoutToTake = DTImplicit(toConsider[1:], avail)
        if withVal > withoutVal:
            result = (withVal, withToTake + (nextItem,))
        else:
            result = (withoutVal, withoutToTake)
    return result

stuff = [a,b,c,d]
val, taken = DTImplicit(stuff, 10)
print ''
print 'implicit decision search'
print 'value of stuff'
print val
print 'actual stuff'
print taken

def DTImplicit(toConsider, avail):
    if toConsider == [] or avail == 0:
        result = (0, ())
    elif toConsider[0][1] > avail:
        result = DTImplicit(toConsider[1:], avail)
    else:
        print 'toConsider = ', toConsider
        nextItem = toConsider[0]
        withVal, withToTake = DTImplicit(toConsider[1:], avail - nextItem[1])
        withVal += nextItem[0]
        withoutVal, withoutToTake = DTImplicit(toConsider[1:], avail)
        if withVal > withoutVal:
            result = (withVal, withToTake + (nextItem,))
            print 'Result withVal = {0}'.format(result)
        else:
            result = (withoutVal, withoutToTake)
            print 'Result withoutVal = {0}'.format(result)
    return result
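The implicit search never builds the tree: each recursive call represents one node, and the take/skip branches are the two recursive calls. A Python 3 sketch of the same recursion follows. The name dt_implicit and the use of (value, space) tuples instead of two-element lists are assumptions for illustration.

```python
def dt_implicit(to_consider, avail):
    """Return (best value, tuple of chosen items) without building a tree."""
    if not to_consider or avail == 0:
        return (0, ())
    next_item = to_consider[0]
    rest = to_consider[1:]
    if next_item[1] > avail:
        return dt_implicit(rest, avail)          # item does not fit: skip it
    # Branch 1: take the item and shrink the available space.
    with_val, with_taken = dt_implicit(rest, avail - next_item[1])
    with_val += next_item[0]
    # Branch 2: leave the item out.
    without_val, without_taken = dt_implicit(rest, avail)
    if with_val > without_val:
        return (with_val, with_taken + (next_item,))
    else:
        return (without_val, without_taken)

stuff = [(6, 3), (7, 2), (8, 4), (9, 5)]   # (value, space) pairs from the slides
val, taken = dt_implicit(stuff, 10)
print(val)
print(taken)
```

Without the else branch the second assignment would always overwrite the first, and the function would report the "skip" result even when taking the item is better.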

Acyclic
def DFSBinaryNoLoop(root, fcn):
    queue = [root]
    seen = []
    while len(queue) > 0:
        print 'at node ' + str(queue[0].getValue())
        if fcn(queue[0]):
            return True
        else:
            temp = queue.pop(0)
            seen.append(temp)
            if temp.getRightBranch():
                if not temp.getRightBranch() in seen:
                    queue.insert(0, temp.getRightBranch())
            if temp.getLeftBranch():
                if not temp.getLeftBranch() in seen:
                    queue.insert(0, temp.getLeftBranch())
    return False

Acyclic
n3.setLeftBranch(n5)    # creates a cycle: 5 -> 2 -> 4 -> 3 -> 5
n5.setParent(n3)
DFSBinary(n5, find6)    # never terminates: it keeps going around the cycle
DFSBinaryNoLoop(n5, find6)

2. Suppose treeTest = buildDTree([], [A1,A2,A3]) is executed.
(a) What does getValue() return for the binaryTree that is returned first? [A1,A2,A3]
(b) Draw the binary tree that is created first.
(c) What does getValue() return for the final return value? []

def buildDTree(sofar, todo):
    if len(todo) == 0:
        return binaryTree(sofar)
    else:
        withelt = buildDTree(sofar + [todo[0]], todo[1:])
        withoutelt = buildDTree(sofar, todo[1:])
        here = binaryTree(sofar)
        here.setLeftBranch(withelt)
        here.setRightBranch(withoutelt)
        return here

3. In the code below, valueFcn(Lst) sums the first (value) component of each element of Lst, and constraintFcn(Lst) tests whether the sum of the second components of the elements of Lst is at most 10. Assume a=[6,3], b=[7,2], c=[8,4], d=[9,5], and that treeTest = buildDTree([],[a,b,c,d]) and foobar = DFSDTree(treeTest, sumValues, WeightsBelow10) are executed.
(a) What does getValue() return for the first element of queue? []
(b) What is the argument of the first call to constraintFcn in the program below, and what does constraintFcn return? The argument is [], the sum is 0, and the result is True
(c) After root is popped from queue, what does getValue() return for the first and the second element put into queue? (Hint: in the order they are inserted) [], [[6,3]]

def DFSDTree(root, valueFcn, constraintFcn):
    queue = [root]
    best = None
    visited = 0
    while len(queue) > 0:
        visited += 1
        if constraintFcn(queue[0].getValue()):
            if best == None:
                best = queue[0]
            elif valueFcn(queue[0].getValue()) > valueFcn(best.getValue()):
                best = queue[0]
            temp = queue.pop(0)
            if temp.getRightBranch():
                queue.insert(0, temp.getRightBranch())
            if temp.getLeftBranch():
                queue.insert(0, temp.getLeftBranch())
        else:
            queue.pop(0)
    return best

4. (a) In the program below, what is the first value of nextItem? [6,3]
(b) What are the arguments of the first recursive call DTImplicit(Arg1, Arg2) in the program below? [b,c,d], 7
(c) What is the second value of nextItem? [7,2]
(d) What are the arguments of the second recursive call DTImplicit(Arg1, Arg2)? [c,d], 5

def DTImplicit(toConsider, avail):
    if toConsider == [] or avail == 0:
        result = (0, ())
    elif toConsider[0][1] > avail:
        result = DTImplicit(toConsider[1:], avail)
    else:
        nextItem = toConsider[0]
        withVal, withToTake = DTImplicit(toConsider[1:], avail - nextItem[1])
        withVal += nextItem[0]
        withoutVal, withoutToTake = DTImplicit(toConsider[1:], avail)
        if withVal > withoutVal:
            result = (withVal, withToTake + (nextItem,))
        else:
            result = (withoutVal, withoutToTake)
    return result

stuff = [a,b,c,d]
val, taken = DTImplicit(stuff, 10)

assert
assert 1==1
assert 1==2
AssertionError Traceback (most recent call last)
<ipython-input-7-a174714cc486> in <module>()
----> 1 assert 1==2
AssertionError:

assert 1==2, "Error"
<ipython-input c723d325> in <module>()
----> 1 assert 1==2, "Error"
AssertionError: Error
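The two forms of assert shown above behave the same way inside a function. The Python 3 sketch below uses a hypothetical average function to show a passing assertion, a failing one, and how the optional message ends up in the AssertionError.

```python
def average(scores):
    # The text after the comma becomes the error message when the test is false.
    assert len(scores) > 0, "scores must not be empty"
    return sum(scores) / float(len(scores))

result = average([30, 40])
print(result)

try:
    average([])
except AssertionError as err:
    message = str(err)
    print("AssertionError:", message)
```

Assertions are meant for catching programming mistakes during development; they are skipped when Python runs with the -O option, so they should not replace checks on user input.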

Unit Testing in Python
Kent Beck developed one of the first tools for the unit testing of classes (for Smalltalk)
Beck and Erich Gamma wrote JUnit; PyUnit (Python's unittest module) is modeled on it
A unit test consists of a set of test cases for a given class
Each test case is a method that runs tests on an individual method in the class
The tests take the form of assertions
A test suite can be run whenever a change is made to the class under development

preliminary

unittest

Example
# student.py
class Student(object):
    def __init__(self, name, score):
        self.name = name
        self.score = score
    def getName(self):
        return self.name
    def getScore(self):
        return self.score

# test program
from student import Student
import unittest
import sys

class TestStudent(unittest.TestCase):
    def setUp(self):
        self.student = Student("John",30)
    def testGetName(self):
        self.assertEqual("Jo", self.student.getName())
    def testGetScore(self):
        self.assertEqual(40, self.student.getScore())

if __name__ == "__main__":
    suite = unittest.TestSuite()
    suite.addTest(unittest.makeSuite(TestStudent))
    unittest.TextTestRunner(verbosity=2, stream=sys.stdout).run(suite)

assertEqual et al assertTrue(x, msg) assertFalse(x, msg)
assertEqual(a,b, msg) …
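A minimal sketch of the assertion methods listed above, written for Python 3. It reuses the Student class from the example slide; the test values, the third test method, and the use of unittest.TestLoader with an in-memory report are assumptions chosen so that the sketch runs on its own and all tests pass.

```python
import io
import unittest

class Student(object):
    def __init__(self, name, score):
        self.name = name
        self.score = score
    def getName(self):
        return self.name
    def getScore(self):
        return self.score

class TestStudent(unittest.TestCase):
    def setUp(self):
        # Runs before every test method, so each test gets a fresh object.
        self.student = Student("John", 30)
    def testGetName(self):
        self.assertEqual("John", self.student.getName())
    def testGetScore(self):
        self.assertEqual(30, self.student.getScore())
    def testScoreChecks(self):
        self.assertTrue(self.student.getScore() >= 0, "score should not be negative")
        self.assertFalse(self.student.getName() == "", "name should not be empty")

suite = unittest.TestLoader().loadTestsFromTestCase(TestStudent)
report = io.StringIO()   # collect the runner's text report instead of printing it
result = unittest.TextTestRunner(stream=report, verbosity=2).run(suite)
print(result.testsRun, result.wasSuccessful())
```

The msg argument is only shown when an assertion fails, so it is a good place to say what the expected behavior was.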

class NewsStory(object):
    def __init__(self, guid, title, subject, summary, link):
        self.guid = guid
        self.title = title
        self.subject = subject
        self.summary = summary
        self.link = link
    def getGuid(self):
        return self.guid
    def getTitle(self):
        return self.title
    def getSubject(self):
        return self.subject
    def getSummary(self):
        return self.summary
    def getLink(self):
        return self.link
    def __str__(self):
        return 'id = '+ self.getGuid()+' title = '+ self.getTitle()+ ' subject = '+ self.getSubject() + \
               ' summary = ' + self.getSummary()

import string

class Trigger(object):
    def evaluate(self, story):
        raise NotImplementedError

# TODO: WordTrigger
class WordTrigger(Trigger):
    def __init__(self, word):
        self.word = word
    #modified
    def isWordIn(self, word, text, caseSensitive='no'):
        # Replace punctuation with spaces so "korea," still counts as "korea".
        for i in string.punctuation:
            text = text.replace(i,' ')
        if isinstance(word,list):
            word = ' '.join(word)
        if caseSensitive == 'no':
            word = word.lower()
            text = text.lower()
        wordList = word.split()
        textList = text.split()
        for word in wordList:
            if word in textList:
                continue
            else:
                return False
        return True

word = WordTrigger('dummy')
word.isWordIn('korea', 'this is a test for korea articles')
Out[9]: True
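The matching rule in isWordIn (strip punctuation, optionally ignore case, then compare whole words) can be tried out as a standalone function. This Python 3 sketch uses the hypothetical name is_word_in and a boolean case_sensitive flag instead of the 'no' string; the sample sentences are made up.

```python
import string

def is_word_in(word, text, case_sensitive=False):
    """True if every word of `word` appears as a whole word in `text`."""
    for ch in string.punctuation:
        text = text.replace(ch, ' ')     # "Korea's" becomes "Korea s"
    if not case_sensitive:
        word, text = word.lower(), text.lower()
    text_words = text.split()
    return all(w in text_words for w in word.split())

hit = is_word_in('korea', 'This is a test for Korea articles.')
partial = is_word_in('soft', "Microsoft's new product")
strict = is_word_in('Korea', 'korea', case_sensitive=True)
print(hit, partial, strict)
```

Comparing against the list of split words, rather than using the in operator on the raw string, is what prevents 'soft' from matching inside 'Microsoft'.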

class TitleTrigger(WordTrigger):
    def __init__(self, title):
        self.title = title
    def evaluate(self, story):
        return self.isWordIn(self.title, story.title)
    def getTitle(self):
        return self.title
    def __str__(self):
        return 'TitleTrigger('+ ''.join(self.getTitle()) + ')'

story = NewsStory('id101', 'my title', 'my subject', 'this is my summary', '
print story
id = id101 title = my title subject = my subject summary = this is my summary
titleTrigger = TitleTrigger('title')
titleTrigger.evaluate(story)
Out[18]: True

class SubjectTrigger(WordTrigger):
    def __init__(self, subject):
        self.subject = subject
    def evaluate(self, story):
        return self.isWordIn(self.subject, story.subject, caseSensitive='no')
    def getSubject(self):
        return self.subject
    def __str__(self):
        return 'SubjectTrigger('+ ''.join(self.getSubject()) +')'

story = NewsStory('id101', 'my title', 'my subject', 'this is my summary', '
print story
id = id101 title = my title subject = my subject summary = this is my summary
subjectTrigger = SubjectTrigger('subject')
subjectTrigger.evaluate(story)
Out[23]: True

class SummaryTrigger(WordTrigger):
    def __init__(self, summary):
        self.summary = summary
    def evaluate(self, story):
        return self.isWordIn(self.summary, story.summary)
    def getSummary(self):
        return self.summary
    def __str__(self):
        return 'SummaryTrigger('+ ''.join(self.getSummary()) +')'

story = NewsStory('id101', 'my title', 'my subject', 'this is my summary', '
summaryT = SummaryTrigger('my summary')
summaryT.evaluate(story)
Out[4]: True

class NotTrigger():
    def __init__(self, other):
        self.other = other
    def evaluate(self, story):
        return not self.other.evaluate(story)

class AndTrigger():
    def __init__(self, other1, *other2):
        # Accept either AndTrigger((t1, t2)) or AndTrigger(t1, t2).
        if other2 == () and len(other1) == 2:
            self.other1 = other1[0]
            self.other2 = other1[1]
        else:
            self.other1 = other1
            self.other2 = other2[0]
    def evaluate(self, story):
        return self.other1.evaluate(story) and self.other2.evaluate(story)

class PhraseTrigger():
    def __init__(self, phrase):
        self.phrase = phrase
    def evaluate(self, story):
        for e in [story.getTitle(), story.getSubject(), story.getSummary()]:
            if isinstance(self.phrase, list) and len(self.phrase) == 1:
                self.phrase = self.phrase[0]
            if self.phrase in e:    # exact, case-sensitive substring match
                return True
            else:
                continue
        return False
    def getPhrase(self):
        return self.phrase
    def __str__(self):
        return 'PhraseTrigger('+ ''.join(self.getPhrase()) +')'

phraseT=PhraseTrigger('New York City')
story = NewsStory('1',"In the heart of New York City's famous cafe",'subject','summary','linl') phraseT.evaluate(story) Out[19]: True story = NewsStory('1',"In the heart of New York City's famous cafe",'subject','summary','linl') Out[21]: False story = NewsStory('1',"In the heart of new York City's famous cafe",'subject','summary','linl') Out[23]: False

Write a function, filterStories(stories, triggerlist), that takes in a list of news stories and a list of triggers, and returns a list of only the stories on which any of the triggers fires. The list of stories should be unique: do not include any duplicates in the list. For example, if 2 triggers fire on StoryA, include StoryA in the list only once.

def filterStories(stories, triggerlist):
    filteredStories = []
    copyStories = stories[:]    # iterate over a copy, since stories is mutated below
    for trigger in triggerlist:
        for story in copyStories:
            if trigger.evaluate(story):
                if story in stories:
                    stories.remove(story)
                if not story in filteredStories:
                    filteredStories.append(story)
    return filteredStories

# trigger file
# subject trigger named t1
t1 SUBJECT Top
# title trigger named t2
t2 TITLE Thai
# phrase trigger named t3
t3 PHRASE Twitter
t4 AND t1 t2
ADD t4
ADD t3
t5 SUBJECT Top
t6 TITLE Vietnam
t7 AND t5 t6
ADD t7
t8 TITLE Miley
ADD t8

def readTriggerConfig(filename):
    triggerfile = open(filename, "r")
    allLines = [ line.rstrip() for line in triggerfile.readlines() ]
    lines = []
    for line in allLines:
        # Skip blank lines and comments.
        if len(line) == 0 or line[0] == '#':
            continue
        lines.append(line)

    triggers = []
    triggerMap = {}
    for line in lines:
        linesplit = line.split(" ")
        # Making a new trigger
        if linesplit[0] != "ADD":
            trigger = makeTrigger(triggerMap, linesplit[1], linesplit[2:], linesplit[0])
        # Add the triggers to the list
        else:
            for name in linesplit[1:]:
                triggers.append(triggerMap[name])
    return triggers
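The file format is easier to see when the parsing is separated from trigger construction. The Python 3 sketch below reads a configuration from a string and returns the trigger definitions and the names activated by ADD lines. The function name parse_trigger_config and the sample configuration are assumptions for illustration; it builds plain tuples rather than trigger objects.

```python
CONFIG = """\
# subject trigger named t1
t1 SUBJECT Top
# title trigger named t2
t2 TITLE Thai
t4 AND t1 t2
ADD t4
ADD t2
"""

def parse_trigger_config(text):
    definitions = {}   # trigger name -> (type, list of parameters)
    active = []        # names listed on ADD lines, in file order
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue                       # skip blank lines and comments
        parts = line.split()
        if parts[0] == 'ADD':
            active.extend(parts[1:])
        else:
            definitions[parts[0]] = (parts[1], parts[2:])
    return definitions, active

definitions, active = parse_trigger_config(CONFIG)
print(definitions)
print(active)
```

Note that a trigger must be defined on an earlier line than any AND, OR, or ADD line that refers to it, because the names are looked up as the file is read.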

def makeTrigger(triggerMap, triggerType, params, name):
    """
    Takes in a map of names to trigger instance, the type of trigger to make,
    and the list of parameters to the constructor, and adds a new trigger
    to the trigger map dictionary.
    triggerMap: dictionary with names as keys (strings) and triggers as values
    triggerType: string indicating the type of trigger to make (ex: "TITLE")
    params: list of strings with the inputs to the trigger constructor (ex: ["world"])
    name: a string representing the name of the new trigger (ex: "t1")
    Modifies triggerMap, adding a new key-value pair for this trigger.
    Returns a new instance of a trigger (ex: TitleTrigger, AndTrigger).
    """
    triggerDict = {'TITLE':TitleTrigger, 'SUBJECT': SubjectTrigger, \
                   'SUMMARY':SummaryTrigger, 'PHRASE':PhraseTrigger, \
                   'AND': AndTrigger, 'OR':OrTrigger, 'NOT': NotTrigger}
    if triggerType == 'AND' or triggerType == 'OR':
        # Composite triggers take trigger objects, looked up by name.
        parameters = ()
        for param in params:
            parameters = parameters + (triggerMap[param],)
        trigger = triggerDict[triggerType](parameters)
    else:
        trigger = triggerDict[triggerType](params)
    triggerMap[name] = trigger
    return trigger

def main_thread(master):
    triggerlist = readTriggerConfig("C:/Users/Park/Prob7/triggers.txt")
    print '\n*****************trigger list:\n'
    for trigger in triggerlist:
        print 'trigger = ', trigger
    # Get stories from Google's Top Stories RSS news feed
    stories = process("
    print '\n*********** Google Top Stories : '
    for i, story in enumerate(stories):
        print "\nGoogle Top Story[{0}] = {1}".format(i,story)
    # Get stories from Yahoo's Top Stories RSS news feed
    stories.extend(process("
    #print '\n*********** Extended Top Stories : '
    #for i, story in enumerate(stories):
    #    print "\nExtended Top Story[{0}] = {1}".format(i,story)

class ProblemSet7NewsStory(unittest.TestCase):
    def setUp(self):
        pass
    def testNewsStoryConstructor(self):
        story = NewsStory('', '', '', '', '')
    def testNewsStoryGetGuid(self):
        story = NewsStory('test guid', 'test title', 'test subject', 'test summary', 'test link')
        self.assertEqual(story.getGuid(), 'test guid')
    def testNewsStoryGetTitle(self):
        story = NewsStory('test guid', 'test title', 'test subject', 'test summary', 'test link')
        self.assertEqual(story.getTitle(), 'test title')
    …….

if __name__ == "__main__":
    suite = unittest.TestSuite()
    suite.addTest(unittest.makeSuite(ProblemSet7NewsStory))
    suite.addTest(unittest.makeSuite(ProblemSet7))
    unittest.TextTestRunner(verbosity=2, stream=sys.stdout).run(suite)

Regular expression
A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern.
To avoid confusion when dealing with regular expressions, we use raw strings, written r'expression'
Matching vs Searching: Python offers two different primitive operations based on regular expressions:
match checks for a match only at the beginning of the string
search checks for a match anywhere in the string (this is what Perl does by default).

match
re.match(pattern, string, flags=0)
pattern : the regular expression to be matched.
string : the string that is searched for the pattern at its beginning.
\w : word character
( ) : group

import re
m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
print 'group(0) : ', m.group(0)        # The entire match
print 'group(1) : ', m.group(1)        # The first parenthesized subgroup.
print 'group(2) : ', m.group(2)        # The second parenthesized subgroup.
print 'group(1,2) : ', m.group(1, 2)   # Multiple arguments give us a tuple.
print 'groups(): ', m.groups()
print 'start : ', m.start()
print 'end : ', m.end()
print 'span : ', m.span()

warmup
match = re.search(pat, str)
The re.search() method takes a regular expression pattern and a string and searches for that pattern within the string.
If the search is successful, search() returns a match object; otherwise it returns None.
The example searches for the pattern 'word:' followed by a 3 letter word

import re
str = 'an example word:cat!!'
match = re.search(r'word:\w\w\w', str)
# If-statement after search() tests if it succeeded
if match:
    print 'found', match.group()    ## 'found word:cat'
else:
    print 'did not find'

Basic patterns
a, X, 9 : ordinary characters just match themselves exactly. The meta-characters, which do not match themselves because they have special meanings, are: . ^ $ * + ? { [ ] \ | ( )
. (a period) : matches any single character except newline '\n'
\w (lowercase w) : matches a "word" character: a letter or digit or underbar [a-zA-Z0-9_]. Note that although "word" is the mnemonic for this, it only matches a single word char, not a whole word. \W (upper case W) matches any non-word character.
\b : boundary between word and non-word
\s (lowercase s) : matches a single whitespace character: space, newline, return, tab, form feed [ \n\r\t\f]. \S (upper case S) matches any non-whitespace character.
\t, \n, \r : tab, newline, return
\d : decimal digit [0-9] (some older regex utilities do not support \d, but they all support \w and \s)
^ = start, $ = end : match the start or end of the string
\ : inhibits the "specialness" of a character. So, for example, use \. to match a period or \\ to match a backslash. If you are unsure whether a punctuation character has special meaning, you can put a backslash in front of it to make sure it is treated just as a character.

Basic pattern examples
Search for pattern 'iii' in string 'piiig'. All of the pattern must match, but it may appear anywhere. On success, match.group() is the matched text.

import re
match = re.search(r'iii', 'piiig')      #=> found, match.group() == "iii"
print match.group()
match = re.search(r'igs', 'piiig')      #=> not found, match == None
print match
## . = any char but \n
match = re.search(r'..g', 'piiig')      #=> found, match.group() == "iig"
## \d = digit char, \w = word char
match = re.search(r'\d\d\d', 'p123g')   #=> found, match.group() == "123"
match = re.search(r'\w\w\w', '@@abcd!!')   #=> found, match.group() == "abc"

repetition
Things get more interesting when you use + and * to specify repetition in the pattern
+ : 1 or more occurrences of the pattern to its left, e.g. 'i+' = one or more i's
* : 0 or more occurrences of the pattern to its left
? : match 0 or 1 occurrences of the pattern to its left
Leftmost & Largest
First the search finds the leftmost match for the pattern
Second it tries to use up as much of the string as possible, i.e. + and * go as far as possible (the + and * are said to be "greedy").

import re
## i+ = one or more i's, as many as possible.
match = re.search(r'pi+', 'piiig')      #=> found, match.group() == "piii"
print match.group()
## Finds the first/leftmost solution, and within it drives the +
## as far as possible (aka 'leftmost and largest').
## In this example, note that it does not get to the second set of i's.
match = re.search(r'i+', 'piigiiii')    #=> found, match.group() == "ii"
## \s* = zero or more whitespace chars
## Here look for 3 digits, possibly separated by whitespace.
match = re.search(r'\d\s*\d\s*\d', 'xx1 2   3xx')   #=> found, match.group() == "1 2   3"
match = re.search(r'\d\s*\d\s*\d', 'xx12 3xx')      #=> found, match.group() == "12 3"
match = re.search(r'\d\s*\d\s*\d', 'xx123xx')       #=> found, match.group() == "123"
## ^ = matches the start of string, so this fails:
match = re.search(r'^b\w+', 'foobar')   #=> not found, match == None
print match
## but without the ^ it succeeds:
match = re.search(r'b\w+', 'foobar')    #=> found, match.group() == "bar"
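The repetition rules can be checked directly. The Python 3 sketch below collects a few results into variables; the 'colou?r' pattern and its sample string are added for illustration of the ? operator and are not from the slides.

```python
import re

# + is greedy: it takes as many i's as it can after the p.
greedy = re.search(r'pi+', 'piiig').group()

# Leftmost wins over longest: the first run of i's is returned, not the longer second run.
leftmost = re.search(r'i+', 'piigiiii').group()

# \s* allows zero or more whitespace characters between the digits.
spaced = re.search(r'\d\s*\d\s*\d', 'xx12 3xx').group()

# ? makes the preceding character optional (0 or 1 occurrences).
optional = re.search(r'colou?r', 'my color').group()

# ^ anchors the pattern to the start of the string, so this search fails.
anchored = re.search(r'^b\w+', 'foobar')

print(greedy, leftmost, spaced, optional, anchored)
```

Always test the result of re.search() before calling group(): a failed search returns None, and None has no group() method.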

example
Suppose you want to find the email address inside the string 'xyz alice-b@google.com purple monkey'
Here's an attempt using the pattern r'\w+@\w+'
The search does not get the whole email address in this case because the \w does not match the '-' or '.' in the address

import re
str = 'purple alice-b@google.com monkey dishwasher'
match = re.search(r'\w+@\w+', str)
if match:
    print match.group()    ## 'b@google'

Square brackets in email
Square brackets can be used to indicate a set of chars, so [abc] matches 'a' or 'b' or 'c'
The codes \w, \s etc. work inside square brackets too, with the one exception that dot (.) just means a literal dot.
For the emails problem, the square brackets are an easy way to add '.' and '-' to the set of chars which can appear around the @, giving the pattern r'[\w.-]+@[\w.-]+' to get the whole email address:

match = re.search(r'[\w.-]+@[\w.-]+', str)
if match:
    print match.group()    ## 'alice-b@google.com'

More on brackets
You can also use a dash to indicate a range, so [a-z] matches all lowercase letters.
To use a dash without indicating a range, put the dash last, e.g. [abc-].
An up-hat (^) at the start of a square-bracket set inverts it, so [^ab] means any char except 'a' or 'b'.

Group extraction
The "group" feature of a regular expression allows you to pick out parts of the matching text.
Suppose for the emails problem that we want to extract the username and host separately. To do this, add parentheses ( ) around the username and host in the pattern, like this: r'([\w.-]+)@([\w.-]+)'
In this case, the parentheses do not change what the pattern will match; instead they establish logical "groups" inside of the match text.
On a successful search, match.group(1) is the match text corresponding to the 1st left parenthesis, and match.group(2) is the text corresponding to the 2nd left parenthesis. The plain match.group() is still the whole match text as usual.

import re
str = 'purple alice-b@google.com monkey dishwasher'
match = re.search(r'([\w.-]+)@([\w.-]+)', str)
if match:
    print match.group()     ## 'alice-b@google.com' (the whole match)
    print match.group(1)    ## 'alice-b' (the username, group 1)
    print match.group(2)    ## 'google.com' (the host, group 2)

findall
findall() is probably the single most powerful function in the re module
Above we used re.search() to find the first match for a pattern
findall() finds *all* the matches and returns them as a list of strings, with each string representing one match

import re
## Suppose we have a text with many email addresses
str = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'
## Here re.findall() returns a list of all the found email strings
emails = re.findall(r'[\w\.-]+@[\w\.-]+', str)    ## ['alice@google.com', 'bob@abc.com']
for email in emails:
    # do something with each found email string
    print email

findall and group()
The parenthesis ( ) group mechanism can be combined with findall().
If the pattern includes 2 or more parenthesis groups, then instead of returning a list of strings, findall() returns a list of *tuples*. Each tuple represents one match of the pattern, and inside the tuple is the group(1), group(2) .. data.
So if 2 parenthesis groups are added to the email pattern, then findall() returns a list of tuples, each of length 2, containing the username and host, e.g. ('alice', 'google.com').

import re
## Suppose we have a text with many email addresses
str = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'
tuples = re.findall(r'([\w\.-]+)@([\w\.-]+)', str)
print tuples    ## [('alice', 'google.com'), ('bob', 'abc.com')]
for tuple in tuples:
    print tuple[0]    ## username
    print tuple[1]    ## host
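The three steps (character sets, groups, findall) fit together as in the Python 3 sketch below. The sample text and both addresses are made up for illustration and use the reserved example.com and example.org domains.

```python
import re

# Hypothetical text with two made-up addresses.
text = 'purple alice-b@example.com, blah monkey bob@mail.example.org blah dishwasher'

# No groups in the pattern: findall() returns a list of whole matches.
whole = re.findall(r'[\w.-]+@[\w.-]+', text)

# Two groups in the pattern: findall() returns a list of (username, host) tuples.
pairs = re.findall(r'([\w.-]+)@([\w.-]+)', text)

# search() stops at the first match; groups are read from the match object.
first = re.search(r'([\w.-]+)@([\w.-]+)', text)

print(whole)
print(pairs)
print(first.group(), first.group(1), first.group(2))
```

Inside square brackets the dot is a literal dot, and the dash is placed last so that it is not read as a range.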

re.finditer
Return an iterator over all non-overlapping matches in the string. For each match, the iterator returns a match object.

import re
string = "Once you have accomplished small things, you may attempt great ones safely."
# Return all words beginning with character 'a', as an iterator yielding match objects
it = re.finditer(r"\ba[\w]*", string)
for match in it:
    print "'{g}' was found between the indices {s}".format(g=match.group(), s=match.span())
# 'accomplished' was found between the indices (14, 26)
# 'attempt' was found between the indices (49, 56)

import re
m = re.match(r"(\w+) (\w+), (\w+)", "Isaac Newton, physicist")
print 'group(0) : ', m.group(0)    # The entire match
print 'group() : ', m.group()      # same as above
print 'group(1) : ', m.group(1)    # The first parenthesized subgroup.
print 'group(2) : ', m.group(2)    # The second parenthesized subgroup.
print 'group(3) : ', m.group(3)    # The third parenthesized subgroup
print 'groups(): ', m.groups()

import re
m = re.match(r"\w+ \w+, \w+", "Isaac Newton, physicist")
print 'group(0) : ', m.group(0)    # The entire match
#print 'group(1) : ', m.group(1)   # The first parenthesized subgroup.
#print 'group(2) : ', m.group(2)   # The second parenthesized subgroup.
#print 'group(3) : ', m.group(3)   # The third parenthesized subgroup
print 'groups(): ', m.groups()     # no parentheses in the pattern, so this is ()

(?P<name>...) syntax
If the regular expression uses the (?P<name>...) syntax, the groupN arguments may also be strings identifying groups by their group name

import re
m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+), (?P<major>\w+)", "Isaac Newton, physicist")
print 'group(0) : ', m.group(0)    # The entire match
print 'group(1) : ', m.group(1)    # The first parenthesized subgroup.
print 'group(2) : ', m.group(2)    # The second parenthesized subgroup.
print 'group(3) : ', m.group(3)    # The third parenthesized subgroup
print 'groups(): ', m.groups()
print '\n'
print 'first_name : ', m.group('first_name')
print 'last_name : ', m.group('last_name')
print 'major : ', m.group('major')
print 'groupdict() : ', m.groupdict()

If a group matches multiple times, only the last match is accessible:
    import re
    m = re.match(r'(..)+', 'a1b2c3')
    print 'group(0) : ', m.group(0)
    print 'group(1) : ', m.group(1)
    #print 'group(2) : ', m.group(2)
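The last-match rule is easy to check, and re.findall is one way to recover every repetition. A minimal sketch in Python 3 syntax, reusing the slide's pattern and input; the variable names are illustrative, and using findall on the inner pattern is an assumption about what the reader wants, not something the slides prescribe:

```python
import re

# A repeated group keeps only its final repetition.
m = re.match(r'(..)+', 'a1b2c3')
whole = m.group(0)   # the entire match
last = m.group(1)    # only the last two-character repetition

# Assumption: to see every repetition, match the inner pattern with findall.
pairs = re.findall(r'..', 'a1b2c3')

print(whole)
print(last)
print(pairs)
```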

import re m = re.match(r'(\d+)(\.)(\d+)'," ") # no blanks print 'm.group(0) : ', m.group(0) print 'm.group(1) : ', m.group(1) print 'm.group(2) : ', m.group(2) print 'm.group(3) : ', m.group(3) print 'm.groups() : ', m.groups()

search This function searches for the first occurrence of the RE pattern within string, with optional flags: re.search(pattern, string, flags=0)
    import re
    m = re.match(r'\d+', "hello123extra")
    print 'what is m? ', m
    m1 = re.search(r'\d+', "hello123extra")
    print 'what is m1? ', m1.group(0)
    print m1.groups()
    m2 = re.search(r'(\d+)', "hello123extra456")
    print 'what is m2? ', m2.group(0)
    print m2.groups()
    m3 = re.search(r'(\d)+', "hello123extra456")
    print 'what is m3? ', m3.group(0)
    print m3.groups()

match vs. search
    import re
    line = "Cats are smarter than dogs"
    matchObj = re.match(r'dogs', line)
    if matchObj:
        print "match --> matchObj.group() : ", matchObj.group()
    else:
        print "No match!!"
    searchObj = re.search(r'dogs', line)
    if searchObj:
        print "search --> searchObj.group() : ", searchObj.group()
    else:
        print "Nothing found!!"

match() import re text = ' is phone number' match = re.match(r'\d\d\d-\d\d\d-\d\d\d\d',text) print 'group : ', match.group() print 'start : ', match.start() print 'end : ', match.end() print 'span : ', match.span() The match object returned from an re.match() call always has a start() value of 0 and the span() method also always returns 0 as the first element of the tuple. This is because, as noted earlier, the match() function only returns patterns found at the start of a string.

search() import re text = """While I was at the store in Washington, DC I tried to call on my mobile but accidentally called The person on the line redirected me to which I don't think is a real number. Neither is or Well, I will try (555) again now.""" match = re.search(r'\d\d\d-\d\d\d-\d\d\d\d',text) print 'group : ', match.group() print 'start : ', match.start() print 'end : ', match.end() print 'span : ', match.span()

Python’s Regular Expression Syntax
Most characters match themselves. The regular expression “test” matches the string ‘test’, and only that string.
[^x] matches any one character that is not included in x. “[^abc]” matches any single character except ‘a’, ‘b’, or ‘c’.

[] [x] matches any one of a list of characters.
“[abc]” matches ‘a’, ‘b’, or ‘c’. This is the same as [a-c], which uses a range to express the same set of characters. If you want to match only lowercase letters, your RE would be [a-z].
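The character-class rules can be tried directly with re.findall. A small sketch in Python 3 syntax; the sample strings are hypothetical and chosen only for illustration:

```python
import re

word = "cabbage"  # hypothetical sample input

in_set = re.findall(r'[abc]', word)       # characters that are a, b, or c
in_range = re.findall(r'[a-c]', word)     # the same set written as a range
not_in_set = re.findall(r'[^abc]', word)  # every character outside the set
lower_runs = re.findall(r'[a-z]+', "Hello World 42")  # runs of lowercase letters

print(in_set)
print(not_in_set)
print(lower_runs)
```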

Python’s Regular Expression Syntax
x|y matches x or y “this|that” matches ‘this’ and ‘that’, but not ‘thisthat’.

. (a period) “.” matches any single character.
Parentheses can be used for grouping: “(abc)+” matches ‘abc’, ‘abcabc’, ‘abcabcabc’, etc.
    import re
    m1 = re.search(r'....', '###this is a test ')
    print 'm1.group(0) : ', m1.group(0)
    print 'm1.groups() : ', m1.groups()
    print
    m2 = re.search(r'(....)', '###this is a test ')
    print 'm2.group(0) : ', m2.group(0)
    print 'm2.groups() : ', m2.groups()
    print 'm2.group(1) : ', m2.group(1)
    m3 = re.findall(r'(....)', '###this is a test ')
    print 'm3 : ', m3

( )
    import re
    m1 = re.search(r'(abc)', 'this is a test forabc test and abc')
    print 'm1.group(0) : ', m1.group(0)
    print 'm1.groups() : ', m1.groups()
    print
    m2 = re.search(r'(....)', '###this is a test ')
    print 'm2.group(0) : ', m2.group(0)
    print 'm2.groups() : ', m2.groups()
    print 'm2.group(1) : ', m2.group(1)
    m3 = re.search(r'(....)+', '###this is a test ')
    print 'm3.group(0) : ', m3.group(0)
    print 'm3.groups() : ', m3.groups()
    print 'm3.group(1) : ', m3.group(1)  # return last match
    m4 = re.findall(r'(....)', '###this is a test ')
    print 'm4 : ', m4
    m5 = re.findall(r'(....)+', '###this is a test ')
    print 'm5 : ', m5  # return last match

Python’s Regular Expression Syntax
x* matches zero or more x’s: “a*” matches ‘’, ‘a’, ‘aa’, etc.
x+ matches one or more x’s: “a+” matches ‘a’, ‘aa’, ‘aaa’, etc.
x? matches zero or one x: “a?” matches ‘’ or ‘a’.
x{m,n} matches i x’s, where m ≤ i ≤ n: “a{2,3}” matches ‘aa’ or ‘aaa’.
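The quantifiers can be checked against whole strings. A minimal sketch assuming Python 3.4 or later, where re.fullmatch is available; the helper name fully_matches is hypothetical:

```python
import re

def fully_matches(pattern, text):
    """Return True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# One result per candidate string, in the order listed.
star = [fully_matches(r'a*', s) for s in ('', 'a', 'aa')]
plus = [fully_matches(r'a+', s) for s in ('', 'a', 'aa')]
optional = [fully_matches(r'a?', s) for s in ('', 'a', 'aa')]
bounded = [fully_matches(r'a{2,3}', s) for s in ('a', 'aa', 'aaa', 'aaaa')]

print(star)
print(plus)
print(optional)
print(bounded)
```

The bounded case shows that both limits of {2,3} are inclusive: two and three repetitions match, one and four do not.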

Regular Expression Syntax
“\d” matches any digit; “\D” any non-digit.
“\s” matches any whitespace character; “\S” any non-whitespace character.
“\w” matches any alphanumeric character or underscore; “\W” any other character.
“^” matches the beginning of the string; “$” the end of the string.
“\b” matches a word boundary; “\B” matches a position that is not a word boundary.
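A short sketch of these special sequences in use, in Python 3 syntax. The sample strings are hypothetical and the variable names are only illustrative:

```python
import re

text = "room 101, floor 3"   # hypothetical sample input

digits = re.findall(r'\d+', text)               # runs of digits
tokens = re.split(r'\s+', text)                 # split on whitespace
first_word = re.search(r'^\w+', text).group()   # word characters at the start
last_word = re.search(r'\w+$', text).group()    # word characters at the end
whole_cat = re.findall(r'\bcat\b', "cat concat cat.")  # 'cat' as a whole word only

print(digits)
print(tokens)
print(first_word + " / " + last_word)
print(whole_cat)
```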

re.split Split the source string by the occurrences of the pattern, returning a list containing the resulting substrings.
    import re
    # use ':', '.', and ' ' as delimiters
    print re.split(r'[:. ]+', 'apple Orange:banana tomato')
    # if the pattern uses parentheses, the delimiter is also included in the result
    print re.split(r'([:. ])+', 'apple Orange:banana tomato')
    # when maxsplit is given
    print re.split(r'[:. ]+', 'apple Orange:banana tomato', 2)
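For reference, the three calls can be compared side by side. A sketch in Python 3 syntax using the slide's own input; passing maxsplit by keyword is a stylistic choice here, not something the slide requires:

```python
import re

source = 'apple Orange:banana tomato'

plain = re.split(r'[:. ]+', source)                 # delimiters dropped
with_group = re.split(r'([:. ])+', source)          # captured delimiter kept in the list
limited = re.split(r'[:. ]+', source, maxsplit=2)   # at most two splits

print(plain)
print(with_group)
print(limited)
```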

replace re.sub(pattern, repl, string, count=0)
This method replaces all occurrences of the RE pattern in string with repl, unless count is given, in which case at most count occurrences are replaced. It returns the modified string.
    import re
    phone = " # This is Phone Number"
    # Delete Python-style comments
    num = re.sub(r'#.*$', "", phone)
    print "Phone Num : ", num
    # Remove anything other than digits
    num = re.sub(r'\D', "", phone)
    print "Phone Num : ", num
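A minimal sketch of re.sub with and without count, in Python 3 syntax. The input strings are hypothetical placeholders, not the values from the slide:

```python
import re

line = "555-0100 # office line"   # hypothetical value, not from the slides

no_comment = re.sub(r'#.*$', "", line)            # strip the trailing comment
digits_only = re.sub(r'\D', "", line)             # keep digits only
first_only = re.sub(r'-', "", "a-b-c", count=1)   # replace just the first hyphen

print(repr(no_comment))
print(digits_only)
print(first_only)
```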

import re print re.sub('1', r'\n', ' ')

Empty match Empty matches for the pattern are replaced only when not adjacent to a previous match (since Python 3.7, empty matches adjacent to a previous non-empty match are replaced as well).
    import re
    m = re.sub('x*', '-', 'abc')
    print m
    m1 = re.sub('a*', '-', 'abc')
    print m1
    m2 = re.sub('b*', '-', 'abc')
    print m2

subn() import re text = ' is phone number. (555) is phone number too' match = re.subn(r'\d\d\d-\d\d\d\d','xxx-xxxx', text) print 'group : ', match What if we want to know how many substitutions occurred? Then we can use the subn() function, which returns a two-element tuple containing the result string and the number of substitutions.
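The two-element return value of subn() can be unpacked directly. A small sketch in Python 3 syntax; the phone numbers are hypothetical stand-ins:

```python
import re

text = "call 555-1234 or 555-9876"   # hypothetical sample input

# subn returns (new_string, number_of_substitutions)
masked, n = re.subn(r'\d\d\d-\d\d\d\d', 'xxx-xxxx', text)

print(masked)
print(n)
```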

search vs findall import re
m1 = re.search(r'[ ]+','this is 1346 a test 34444string') print m1.group(0) m2 = re.findall(r'[ ]+','this is 1346 a test 34444string') print m2 import re m1 = re.search('(-{1,2})','prog----gram-files') print 'm1.group(0) : ', m1.group(0) print 'm1.group(1) : ', m1.group(1) #print 'm1.group(2) : ', m1.group(2) m2 = re.findall('(-{1,2})','prog----gram-files') print m2

Use a function to supply the replacement string
    import re
    def dashrepl(matchobj):
        if matchobj.group(0) == "--":
            return "+"
        else:
            return "*"
    m = re.sub('-{1,2}', dashrepl, 'prog----gram-files')
    print m

More examples import re print re.sub(r'@(=+=)*@','xxx','@@')
print # strange? print print print print re.sub('[aeiouAEIOU]','-','I am the happiest man in the world.')

To revise
    import re
    # matches one or more decimal digits
    m1 = re.search(r'([ ])+', '1234')
    # If a group is contained in a part of the pattern
    # that matched multiple times, the last match is returned.
    print m1.group(0)
    print m1.group(1)
    m2 = re.search(r'([ ])+', '1234')
    print m2.group(0)
    print m2.group(1)
    # matches two words separated by any number of spaces
    m3 = re.search(r'([\w])+ +([\w])+','this is a test')
    print 'm3 = ', m3.group(0)
    print 'm3.group(1) : ', m3.group(1)
    print 'm3.groups() : ', m3.groups()
    #print m3
    m4 = re.search(r'(\w)+ + (\w)+','this is a test')
    print 'm4 = ', m4.group(0)
    m5 = re.search(r'(\w)+ + (\w)+','this is a test')
    print 'm5 = ', m5.group(0)
    print 'm5.group(1) : ', m5.group(1)
    print 'm5.groups() : ', m5.groups()

Hadoop launch command $ hadoop

Hadoop launch command $ hadoop fs

ls command $ hadoop fs -ls /

Extracting the sample data $ cd ~/training_materials/developer/data
$ tar zxvf shakespeare.tar.gz

Copying data from local to HDFS $ hadoop fs -put shakespeare /user/training/shakespeare

HDFS Architecture

Creating a directory in HDFS $ hadoop fs -mkdir weblog

Saving stdout output to HDFS through a pipe (stdin)
$ gunzip -c access_log.gz | hadoop fs -put - weblog/access_log

Saving stdout output to HDFS through a pipe (stdin)
$ hadoop fs -mkdir testlog
$ gunzip -c access_log.gz | head -n 5000 | hadoop fs -put - testlog/test_access_log

Deleting a file in HDFS $ hadoop fs -ls shakespeare
$ hadoop fs -rm shakespeare/glossary

Viewing file contents in HDFS $ hadoop fs -cat shakespeare/histories | tail -n 50

Copying a file from HDFS to local $ hadoop fs -get shakespeare/poems ~/shakepoems.txt

MapReduce job flow

Checking the job output in HDFS $ hadoop fs -ls wordcounts
$ hadoop fs -cat wordcounts/part-r-0000 | tail -n 20

NameNode web interface http://localhost:50070

Checking the output through filesystem browsing (1)

Checking the output through filesystem browsing (2)

Some MapReduce Terminology
Job: a “full program”, an execution of a Mapper and Reducer across a data set.
Task: an execution of a Mapper or a Reducer on a slice of data, a.k.a. Task-In-Progress (TIP).
Task Attempt: a particular instance of an attempt to execute a task on a machine.

MapReduce: High Level In our case: circe.rc.usf.edu

Nodes, Trackers, Tasks
The master node runs a JobTracker instance, which accepts job requests from clients.
TaskTracker instances run on slave nodes.
A TaskTracker forks a separate Java process for each task instance.

JobTracker web interface (1) http://localhost:50030

JobTracker web interface (2) http://localhost:50030

Checking job status by job ID in the JobTracker web interface (1)

Checking job status by job ID in the JobTracker web interface (2)

Python execution mode You can run Python programs from files, just like Perl or shell scripts, by typing “python program.py” at the command line. The file can contain just the Python commands. Alternatively, you can invoke the program directly by typing the name of the file, “program.py”, if its first line is something like “#!/usr/bin/python” (as in a shell script; this works as long as the file has execute permissions set).

wordcount mapper
    #!/usr/bin/python
    import sys
    # input comes from STDIN (standard input)
    for line in sys.stdin:
        # remove leading and trailing whitespace
        line = line.strip()
        # split the line into words
        words = line.split()
        # increase counters
        for word in words:
            # write the results to STDOUT (standard output);
            # what we output here will be the input for the
            # Reduce step, i.e. the input for reducer.py
            #
            # tab-delimited; the trivial word count is 1
            print '{0}\t{1}'.format(word, 1)

wordcount reducer
    #!/usr/bin/python
    import sys
    word2count = {}
    # input comes from STDIN
    for line in sys.stdin:
        line = line.strip()
        word, count = line.split('\t', 1)
        # convert count (currently a string) to int
        try:
            count = int(count)
        except ValueError:
            continue
        # add to the running total, or start one for a new word
        try:
            word2count[word] = word2count[word] + count
        except KeyError:
            word2count[word] = count
    for word, count in sorted(word2count.items()):
        print '{0}\t{1}'.format(word, count)
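The mapper and reducer can be tried without a cluster by simulating the map, sort, and reduce steps in memory. A rough sketch in Python 3 syntax; the function names map_words and reduce_counts and the sample lines are hypothetical, and the sorted() call only stands in for Hadoop's shuffle-and-sort phase:

```python
def map_words(lines):
    """Mapper logic: emit one tab-delimited 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.strip().split():
            yield "{0}\t{1}".format(word, 1)

def reduce_counts(records):
    """Reducer logic: sum the counts per word and return sorted (word, count) pairs."""
    word2count = {}
    for record in records:
        word, count = record.strip().split("\t", 1)
        try:
            count = int(count)
        except ValueError:
            continue  # skip records whose count is not a number
        word2count[word] = word2count.get(word, 0) + count
    return sorted(word2count.items())

# Hypothetical input lines standing in for a file in HDFS.
sample = ["the quick brown fox", "jumps over the lazy dog the end"]
shuffled = sorted(map_words(sample))   # stand-in for the shuffle/sort phase
result = reduce_counts(shuffled)

for word, count in result:
    print("{0}\t{1}".format(word, count))
```

On a shell the same check is commonly done with a pipeline of the form cat input | mapper.py | sort | reducer.py, assuming both scripts are executable.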

Hadoop streaming hs mapper.py reducer.py inDir outDir
~]$ more .bashrc
# .bashrc
run_mapreduce() {
    hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming mr1-cdh4.1.1.jar -mapper $1 -reducer $2 -file $1 -file $2 -input $3 -output $4
}
alias hs=run_mapreduce
hs mapper.py reducer.py inDir outDir

Hadoop Streaming using Python
Intelligent Systems, Park Young-Taek (박영택)

Streaming Example #1 Read any text input and compute the average length of all words that start with each character.
Example text input: Now is definitely the time
Example output:
N 3
d 10
i 2
t 3.5

Streaming Example Mapper
The Mapper receives a line of text for each input value. For each word in the line, emit the first letter of the word as a key, and the length of the word as a value.
Example input value: Now is definitely the time
The Mapper should emit:
N 3
i 2
d 10
t 3
t 4

Example Mapper code
    #!/usr/bin/python
    import re
    import sys
    NONALPHA = re.compile(r"\W")
    for input in sys.stdin.readlines():
        for w in NONALPHA.split(input):
            if len(w) > 0:
                # the key is lowercased, so "Now" is emitted under "n"
                print "{0}\t{1}".format(w[0].lower(), str(len(w)))

Regular expression re.compile(pattern, flags=0) Compile a regular expression pattern, returning a pattern object.
\W : any non-alphanumeric character
Example input: Now is definitely the time
NONALPHA = re.compile("\W") is a pattern object for non-alphanumeric characters.
NONALPHA.split(input) returns [“Now”, “is”, “definitely”, “the”, “time”].

Regular expression example
    import re
    text = "Now is definitely the time"
    nonAlpha = re.compile(r"\W")
    print "return re.compile object :", re.findall(nonAlpha, text)
    result = nonAlpha.split(text)
    print "nonAlpha split :", result

Streaming Example Reducer
The Reducer receives the keys in sorted order, and all the values for one key appear together. So, for the previous Mapper output, the Reducer would receive the following:
N 3
d 10
i 2
t 3
t 4
For either type of input, the final output should be:
N 3
d 10
i 2
t 3.5
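The grouping behaviour described here can be sketched in memory with itertools.groupby, which, like the Reducer, relies on equal keys arriving together. This is a Python 3 sketch under stated assumptions: the function names are hypothetical, sorted() stands in for Hadoop's sort phase, and the key is lowercased as in the slides' mapper code, so the key for "Now" comes out as "n":

```python
import re
from itertools import groupby

NONALPHA = re.compile(r"\W")

def map_first_letter(lines):
    """Emit (first letter, word length) for every word in every line."""
    for line in lines:
        for w in NONALPHA.split(line):
            if w:
                yield (w[0].lower(), len(w))

def reduce_average(pairs):
    """Average the lengths per key; pairs must already be sorted by key."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        lengths = [length for _, length in group]
        yield (key, sum(lengths) / float(len(lengths)))

pairs = sorted(map_first_letter(["Now is definitely the time"]))
averages = list(reduce_average(pairs))

for key, avg in averages:
    print("{0}\t{1}".format(key, avg))
```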

Example Reducer code
Reducer input: N 3, d 10, i 2, t 3, t 4
    #!/usr/bin/python
    import sys
    wordcount = 0.0
    lettercount = 0
    key = None
    for input in sys.stdin.readlines():
        input = input.rstrip()
        parts = input.split("\t")
        if len(parts) < 2:
            continue
        newkey = parts[0]
        wordlen = int(parts[1])
        if not key:
            key = newkey
        if key != newkey:
            # key changed: emit the average for the previous key and reset the counters
            print "{0}\t{1}".format(key, str(lettercount / wordcount))
            key = newkey
            wordcount = 0.0
            lettercount = 0
        wordcount = wordcount + 1.0
        lettercount = lettercount + wordlen
    if key != None:
        # emit the average for the last key
        print "{0}\t{1}".format(key, str(lettercount / wordcount))

Running code Run Streaming Example
$hadoop-streaming.jar : /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming mr1-cdh4.2.1.jar
$ hadoop jar $hadoop-streaming.jar -input $inputData/input.txt -output output -mapper mapper.py -reducer reducer.py
Result
$ hadoop fs -cat mrstream/output/part*
N 3
d 10
i 2
t 3.5

Streaming Example #2 Use New York Stock Exchange records of stock prices from We will write a streaming application program that prints the unique list of ticker symbols available in the data. Example Input Data output exchange Stock symbol date Stock price open high low close volume Adj close NYSE AEA 4.42 4.21 4.24 205500 4.54 4.22 4.41 194300 4.55 4.69 4.39 233800 4.65 4.5 182100 4.74 5 4.62 4.66 222700 4.84 4.92 4.68 4.75 194800 … AEA YSI YUM YZC …

Example Mapper code For the reduce function we will use the shell utility uniq, which, when given sorted input, returns the unique set of values.
Example (mapper code) parse.py
    #!/usr/bin/python
    import sys
    while 1:
        line = sys.stdin.readline()
        if line == "":
            break
        fields = line.split(",")
        print "{0}".format(fields[1])
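What uniq contributes as the reducer can be approximated in a few lines. A sketch in Python 3 syntax; the rows are hypothetical comma-separated records (only the second field matters), parse_symbol is an illustrative name, and sorted() stands in for the sort Hadoop performs between map and reduce:

```python
from itertools import groupby

def parse_symbol(line):
    """Mirror parse.py: return the second comma-separated field."""
    return line.split(",")[1]

# Hypothetical rows; only the first two columns matter here.
rows = [
    "NYSE,YUM,30.10",
    "NYSE,AEA,4.42",
    "NYSE,YSI,10.00",
    "NYSE,AEA,4.54",
]

symbols = sorted(parse_symbol(row) for row in rows)     # mapper output after sorting
unique_symbols = [key for key, _ in groupby(symbols)]   # collapse adjacent duplicates, as uniq does

print(unique_symbols)
```

Like uniq, groupby only merges adjacent duplicates, which is why the sort step is required.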

Hadoop Streaming Usage Hadoop Streaming
$ hadoop jar $hadoop-streaming.jar [options]
-input <path> : HDFS input file for the Map step
-output <path> : HDFS output file for the Reduce step
-mapper <cmd | javaClassName> : the streaming command to run
-reducer <cmd | javaClassName> : the streaming command to run
-file <file> : file/dir to be shipped in the job jar file
-combiner <JavaClassName> : the combiner has to be a Java class
…

Hadoop Streaming Unix command … $ /usr/bin/uniq [options]
-c, --count : show how many times each line occurs
-D, --all-repeated : show all duplicated lines
-u, --unique : show only lines that are not duplicated
--help : show help
--version : show the version
…

$hadoop-streaming.jar : /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming mr1-cdh4.2.1.jar
$ hadoop jar $hadoop-streaming.jar -input $inputData/nyse -output output -mapper parse.py -reducer /usr/bin/uniq -file parse.py
Result
$ hadoop fs -cat mrstream/output/part*
AEA
YSI
YUM
YZC
…

Streaming Example #3 Inverted Index ( with wordcount)
Text input – a news article Output doc1 WASHINGTON United States Special Operations troops are forming elite counterterrorism units in four countries in North and West Africa … operations doc1:1 over doc1:1 doc1 doc1:1 month doc1:1 four doc1:1 fighters doc1:1 .

Example Mapper code
    #!/usr/bin/env python
    from sys import stdin
    import re
    doc_id = None
    for line in stdin:
        if not line.strip():
            continue
        if not doc_id:
            doc_id, content = line.split('\t')
        words = re.findall(r'\w+', line)
        for word in words:
            print "{0}\t{1}:1".format(word.lower(), doc_id)

Example Reducer code
    #!/usr/bin/env python
    from sys import stdin
    import re
    index = {}
    for line in stdin:
        word, postings = line.split('\t')
        index.setdefault(word, {})
        for posting in postings.split(','):
            doc_id, count = posting.split(':')
            count = int(count)
            index[word].setdefault(doc_id, 0)
            index[word][doc_id] += count
    for word in index:
        postings_list = ["%s:%d" % (doc_id, index[word][doc_id]) for doc_id in index[word]]
        postings = ','.join(postings_list)
        print "{0}\t{1}".format(word, postings)

$ hs mapper.py reducer.py input output (input path: input/doc1.txt)
$ hadoop fs -cat output/part*
.
officer doc1:2
haram doc1:1
night doc1:3
served doc1:1
because doc1:1
some doc1:4

Streaming Example #4 Find total sales per store Example
Text input(purchases.txt) output date time store item cost payment 09:00 San Jose Men's Clothing 214.05 Amex Fort Worth Women's Clothing 153.57 Visa San Diego Music 66.08 Cash Pittsburgh Pet Supplies 493.51 Discover Omaha Children's Clothing 235.63 MasterCard Stockton 247.18 … store cost Fort Worth 153.57 San Diego San Jose Stockton Omaha Pittsburgh …

Example Mapper code
    #!/usr/bin/python
    # Format of each line is:
    # date\ttime\tstore name\titem description\tcost\tmethod of payment
    #
    # We want elements 2 (store name) and 4 (cost)
    # We need to write them out to standard output, separated by a tab
    import sys
    for line in sys.stdin:
        data = line.strip().split("\t")
        if len(data) == 6:
            date, time, store, item, cost, payment = data
            print "{0}\t{1}".format(store, cost)

Example Reducer code
    #!/usr/bin/python
    import sys
    salesTotal = 0
    oldKey = None
    # Loop around the data
    # It will be in the format key\tval
    # Where key is the store name, val is the sale amount
    #
    # All the sales for a particular store will be presented,
    # then the key will change and we'll be dealing with the next store
    for line in sys.stdin:
        data_mapped = line.strip().split("\t")
        if len(data_mapped) != 2:
            # Something has gone wrong. Skip this line.
            continue
        thisKey, thisSale = data_mapped
        if oldKey and oldKey != thisKey:
            # the store changed: emit the finished total and start a new one
            print "{0}\t{1}".format(oldKey, salesTotal)
            oldKey = thisKey
            salesTotal = 0
        oldKey = thisKey
        salesTotal += float(thisSale)
    if oldKey != None:
        print "{0}\t{1}".format(oldKey, salesTotal)
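The per-store totals can be checked locally before submitting a job. A minimal sketch in Python 3 syntax, assuming tab-separated lines in the six-field format described above; the dates, times, and the third and fourth rows are hypothetical, map_sale is an illustrative name, and sorted() stands in for Hadoop's sort phase:

```python
from itertools import groupby

def map_sale(line):
    """Mapper logic: return (store, cost) for a well-formed line, else None."""
    data = line.strip().split("\t")
    if len(data) == 6:
        date, time, store, item, cost, payment = data
        return (store, float(cost))
    return None

# Hypothetical rows in the purchases.txt layout; the last one is malformed on purpose.
lines = [
    "2012-01-01\t09:00\tSan Jose\tMen's Clothing\t214.05\tAmex",
    "2012-01-01\t09:00\tFort Worth\tWomen's Clothing\t153.57\tVisa",
    "2012-01-01\t09:01\tSan Jose\tMusic\t10.00\tCash",
    "not a valid record",
]

pairs = sorted(p for p in map(map_sale, lines) if p is not None)

# Reducer logic: equal keys are adjacent after sorting, so sum each group.
totals = {
    store: sum(cost for _, cost in group)
    for store, group in groupby(pairs, key=lambda p: p[0])
}

for store in sorted(totals):
    print("{0}\t{1:.2f}".format(store, totals[store]))
```

groupby plays the role of the oldKey/thisKey comparison in the reducer: both depend on all sales for one store arriving consecutively.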
