Presentation is loading. Please wait.

Presentation is loading. Please wait.

Fall 20151 Week 4 CSCI-141 Scott C. Johnson.  Computers can process text as well as numbers ◦ Example: a news agency might want to find all the articles.

Similar presentations


Presentation on theme: "Fall 20151 Week 4 CSCI-141 Scott C. Johnson.  Computers can process text as well as numbers ◦ Example: a news agency might want to find all the articles."— Presentation transcript:

1 Fall 20151 Week 4 CSCI-141 Scott C. Johnson

2  Computers can process text as well as numbers ◦ Example: a news agency might want to find all the articles on Hurricane Katrina as part of the tenth anniversary of this disaster. They would have to search for the words “hurricane and “Katrina”

3  How do we represent text such as: ◦ Words ◦ Sentences ◦ Characters  In programming we call these sequences Strings. ◦ A sequence is a collection or composition of elements in a specific order ◦ For strings, the elements are characters  This includes:  Punctuation  Spaces  Numeric characters ◦ The order is a spelling of a word or structure of a sentence.  Not always true.

4  A string can be: ◦ Empty ◦ Non-empty  Must start with a character and be followed by a string  Definition 1 A string is one of the following: ◦ An empty string ◦ A non-empty string, which has the following parts:  A head, which is a single character, followed by,  A tail, which is a string

5  When processing strings we must be able to check for: ◦ Empty string  def strIsEmpty(str) ◦ The head  def strHead(str) ◦ The tail  def strTail(str) ◦ To construct strings from other strings  def strConcat(str1, str2) ◦ We will implemeant these later….

6  A Python string is: ◦ A sequence of characters ◦ The sequence can be explicitly written between two quotation marks  Python allows single or double quotes ◦ Examples:  ‘Hello World’  “Hello World”  ‘’ (the empty string is a valid string)  ‘a’  “abc”

7  Like numbers, string can be assigned to variables ◦ str = ‘Hello World’  You can asscess parts of strings via indexing ◦ str[n] means give me the n th -1 character of the string str ◦ str[0] == ‘H’ ◦ str[1] == ‘e’ ◦ str[2] == ‘l’ ◦ str[5] == ‘ ‘

8  Another way to access parts of a string is via slicing ◦ str[m:n] means give me the part of the string from character m up to but not including n ◦ Both m and n are optional  If m is omitted then it starts from the beginning of the string  If n is omitted it ends at the end of the string ◦ Examples:  Str = ‘Hello World’  Str[1:4] == ‘ell’  Str[:5] == ‘Hello’  Str[1:] == ‘ello World’

9  You can concatenate string, or put two string together ◦ The plus-sign (+)  Examples ◦ ‘Hello’ + ‘World’ == ‘Hello World’ ◦ ‘a’ + ‘ab’ == ‘aab’ ◦ ‘ab’ + ‘c’ == ‘abc’ ◦ ‘a’ + ‘b’ + ‘c’ == ‘abc’

10  Python strings are immutable ◦ Means parts of strings cannot be changed using the assignment operator ◦ If we assign a new value to a string it replaces the old value  Basically a new string  Example: ◦ Str = ‘abc’ ◦ Str[0] == ‘a’ ◦ Str[0] = ‘z’  We get: ◦ “Traceback (most recent call last): File " ", line 1, in TypeError: ’str’ object does not support item assignment“

11  def strIsEmpty(str): if str == ‘’: return True else: return False  def strHead(str): return str[:1]  def strTail(str): return str[1:]  def strConcat(str1, str2): return str1 + str2

12  Computing length of strings ◦ Python has a function to do this ◦ But we will write our own version so we can:  Learn about strings  And how to process them  Think about this: lengthRec(‘abc’) == 3

13  Lets break down string length as a recursive function ◦ ‘’: the empty string, length is 0 ◦ ‘a’: string with one character, length = 1  Head length = 1  tail length is empty string, length = 0 ◦ ‘ab’: string with two character, length = 2  Head length = 1  tail is string with one character, length = 1 ◦ ‘abc’: string with three characters. Length = 3  Head length = 1  tail is a string with two characters, length = 2  Notice the pattern????

14  The pattern leads us to: def lengthRec(str): if str == ‘’: return 0 else: return 1 + lengthRec(strTail(str))

15  Computing the reversal of a string ◦ reverseRec(‘abc’) == ‘cba’  Lets solve for this like any other recursive function

16  String reversal cases: ◦ ‘’: emptystring reversed is the empty string ◦ ‘a’: a single character reversed is the same string ◦ ‘ab’: two character string  ‘b’ + ‘a’ == ‘ba’  strTail(‘ab’) + strHead(‘ab’) ◦ ‘abc’: three character string  ‘c’ + ‘b’ + ‘a’ == ‘cba’  strTail(strTail(‘abc’)) + strHead(strTail(‘abc’)) + strHead(‘abc’)  strTail(‘bc’) + strHead(‘bc’) + strHead(‘abc’)  Reversal of string ‘bc’ + strHead(‘abc’)

17  From this we get: def reverseRec(str): if str == ‘’: return ‘’ else: return reverseRec(strTail(str) + strHead(str)

18

19

20  Accumulative Length ◦ Recall the recursive form we did earlier:  def lengthRec(str): if str == ‘’: return 0 else: return 1 + lengthRec(strTail(str)) ◦ How can we change this to use an accumulator variable??  Notice we are returning zero for the empty case and 1 + the recursive call for the other case

21  Accumulative Length ◦ We can add the accumulator variable and add 1 for the recursive call and return it for the empty string:  def lengthAccum(str, ac): if str == ‘’: return ac else: return lengthAccum(strTail(str), ac + 1)  Basically:  if we have a head, add one to the acummulator and check the tail  Else return the accumulator variable since we have nothing else to count

22  Accumulative Reverse ◦ Recall the recursive form we did earlier:  def reverseRec(str): if str == ‘’: return ‘’ else: return reverseRec(strTail(str) + strHead(str) ◦ How can we change this to use an accumulator variable??  Notice we are returning the empty string for the empty case and the recursive call + the strHead for the other case

23  Accumulative Reverse ◦ We can add the accumulator variable and add the strHead to the accumulator for each recursive call:  def reverseAccum(str, ac): if str == ‘’: return ac else: return reverseAccum(strTail(str), strHead(str) +ac)

24  We often can replace recursion with iteration ◦ This requires the use of a new type of Python statement ◦ The for loop

25  for loop example ◦ for ch in ‘abc’: print(ch) ◦ This will print: a b c

26  With the for loop when can convert our recursive string operations to iterative forms ◦ Recall the accumulative form: def lengthAccum(str, ac): if str == ‘’: return ac else: return lengthAccum(strTail(str), ac + 1) ◦ basically we ran the function over and over again adding one to ac until we hit the empty string… How could we make that into a for loop???

27  With a for loop we can avoid recursion def lenghtIter(str): ac = 0 for ch in str: ac = 1 + ac return ac

28  This can work for reverse too! ◦ def reverseAccum(str, ac): if str == ‘’: return ac else: return reverseAccum(strTail(str), strHead(str) +ac)  Becomes: ◦ def reverseIter(str): ac = ‘’ for ch in str: ac = hd + ac return ac

29  Some times we want to access parts of a string by index ◦ This means iterating over the range of values 0, 1, 2, …, len(str) -1  len(str) is a built in Python function for string length ◦ To do this we have a special for loop in Python  for i in range(0, len(str))  This says for all character in str starting at index 0 to the last index of the string, do x  It does not have to be the whole string  for I in range(2, 5)… this will do all i’s 2, 3, 4

30  example

31  Example:

32  Say we do not want to type in a long string every time… ◦ For instance we want to find and remove all of the instances of a word in a report…  How can we do that without using an input statement and entering the entire text manually?

33  Python can read files! ◦ Lets look at a basic function to hide a word in a string ◦ def hide(textFileString, hiddenWord): for currentWord in textFileString: if currentWord == hiddenWord: print(‘---’) else: print currentWord

34  How do we do this from a text file and not a string? ◦ To make this problem simple we will assume only one word per line in the file  Do the file reads, the spaces are shown on purpose as _: word1 __word2 ___word3 __word4 _word5 word6  word7

35  How can we read this file? ◦ It is actually pretty easy in Python ◦ for line in open(‘text1.txt’): print(line) ◦ This give us: word1 ◦ __word2 ◦ ___word3 ◦ __word4 ◦ _word5 ◦ word6 ◦ word7 Notice the spaces are still there and it appears to have more space between lines!

36  The extra space between lines is due to: ◦ The print function adds a new line ◦ The original new line from the file is still in the string ◦ If we were to make a single concatenated string we would see the original file contents ◦ str = ‘’ for line in open(‘text1.txt’): str = str + line print(str)  word1 __word2 ___word3 __word4 _word5 word6  word7

37  We can make printing better ◦ print(str, end=‘’)  Print will not generate newlines ◦ We still have an issue for our problem  Newlines are still there from the file ◦ for line in open(‘text1.txt’): print(line == ‘word1’) ◦ We would see false for all line  Even thought word1 looks like it exists  Due to the new line from the file!

38  We can use a Python feature called strip ◦ This removes all whitespace from a string  Whitespace:  Newlines  Spaces ◦ We call it a bit different that other string functions  str.strip()  It returns the ‘stripped’ string

39  Using strip we get  for line in open(‘text1.txt’): print(line.strip() == ‘word1’)  Which results in: True False False False False False False False

40  Using all these ideas we can make the hide and a helper function

41  We orginally made this function to hide a word… ◦ But it can be used to find a word too ◦ We look for, or search for, the word to replace it ◦ We look for all occurrences of the word…  Sometimes we only want to find the first time a word happens

42  Linear Search ◦ The process of visiting each element in a sequence in order stopping when either:  we find what we are looking for  We have visited every element, and not found a match ◦ Often we wish to characterize how long an algorithm takes  Measuring in seconds often depends on the hardware, operating system, etc ◦ We want to avoid such details  How do we do this?

43  We avoid this by focusing on the size of the input, N… ◦ And characterize the time spend as a function of N ◦ This function is called the time complexity  One way to measure performance is to count the number of operations or statements the code makes

44  For the hide function we: ◦ Strip the current word ◦ Compare it to the hidden word ◦ And then prints the results  Since this is a loop, this occurs for each word ◦ Even thought the operations can take different times individually, as the number of words grow this time difference are very small  They are considered a constant amount

45  For a linear search we can see that different size of N can lead to different behaviors and times… ◦ Say the element we are looking for is the first element… it’s very fast…. ◦ What if it is the last element?  We must then look at every element  If n is 10 then the slow case is no problem..  What if it was 10 billion?

46  Typically we are interested in how bad it can get… ◦ Or the Worst-case analysis  Consider the worst-case analysis of the linear search… ◦ For each element we spend some constant time ◦ Plus some fixed time to startup and end the search ◦ If processing of an element takes constant time k  Then search all N elements should take k * N  Similarly the startup and end time is some constant c  So the time to run the linear search is (k * N) + c

47  So the time to run the linear search is (k * N) + c ◦ What are k and c?  Most of the time we simply do not care…. ◦ It is often good enough to know that the time complexity is a linear function with a non-zero slope!  The,mathematical way to ignore this is to say the time complexity of linear search is O(N)  It is pronounced “Order N”  This is known as “Big-O” notation

48  “Big-O” notation ◦ Makes it easy to compare algorithms… ◦ Constant time, like comparing two characters, O(1) ◦ There are some algorithms that are O(N 2 ) and O(N 3 ) ◦ We prefer O(1) to O(N), O(N) to O(N 2 )… ◦ There are possibly time complexities in between these…

49  O(N 2 ) example: def counter(n): for i in range(0,n): for j in range(0, n): print(i * j)  O(N 3 ) example: def counter(n): for i in range(0,n): for j in range(0, n): for k in range(0,n): print(i * j * k)


Download ppt "Fall 20151 Week 4 CSCI-141 Scott C. Johnson.  Computers can process text as well as numbers ◦ Example: a news agency might want to find all the articles."

Similar presentations


Ads by Google