Getting set with Python and NLTK Tuples, Strings, Numeric types.


1 Getting set with Python and NLTK Tuples, Strings, Numeric types

2 Python By now, you should have Python available on a computer you can use. Download nltk: http://nltk.org/install.html From Python (type python at the command line or use your favorite IDE):
– import nltk
– nltk.download()
This opens a window with choices of what to download. Choose "book".

3 Getting started
from nltk.book import *
Let's do some of the basic operations of the nltk:
– Create a concordance from text1 of "monstrous"
– Repeat for another word in another text
– Find words similar to "monstrous" in several texts
– Determine the length of some of the texts
– Get a sorted list of the words in one of the texts. How many distinct words are in the text?

4 Defining a function
>>> def lexical_diversity(text):
...     return len(text) / len(set(text))
Note the indentation. Python does not use braces to mark the boundaries of blocks of code, so the indentation is required. The return statement is what hands the result back to the caller. Do a "lexical_diversity" test on one of the other texts.
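A minimal sketch of the same function run on a small hand-made token list, so it can be tried without downloading the NLTK texts. The list `sample_tokens` is an illustrative assumption, not one of the book's texts, and the code assumes Python 3, where `/` is true division (under Python 2 the same call on ints would truncate to 1).

```python
def lexical_diversity(text):
    """Return the average number of times each distinct token is used."""
    return len(text) / len(set(text))

# Hypothetical token list standing in for an NLTK text such as text1.
sample_tokens = ["the", "cat", "saw", "the", "other", "cat"]

# 6 tokens in total, 4 distinct tokens.
diversity = lexical_diversity(sample_tokens)
print(diversity)
```

A higher value means words are repeated more often on average.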

5 Lists: Review and some operators not mentioned before
A list is a mutable collection of objects of arbitrary type.
– Create a list:
  places = list() or places = []
  places = ["home", "work", "hotel"]
  otherplaces = ['home', 'office', 'restaurant']
– Change a list:
  places.append('restaurant')
  places.insert(0, 'stadium')
  places.remove('work')
  places.extend(otherplaces)
  places.pop()
  places.pop(3)
  places[1] = "beach"
  places.sort()
  places.reverse()
Note the use of single or double quotes. Note the use of () or [].
Not a complete set of operations: these were selected by the text authors.

6 Information about lists
Again, the list of places:
  len(places)
  places[i] (i may be positive or negative)
  "beach" in places
  places.count("home")
  places.index("stadium")
  places.index('home', 0, 4)
  places == otherplaces
  places != otherplaces
  places < otherplaces
  places.index('home')
  places.index('home', 2) (start looking at position 2)
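The queries above can be tried on a small list. This is a sketch only: the contents of `places` here are chosen for illustration and are not the state the list would have after the previous slide's operations.

```python
# Illustrative lists (assumed contents, chosen so 'home' appears twice).
places = ["stadium", "beach", "home", "hotel", "home"]
otherplaces = ["home", "office", "restaurant"]

length = len(places)                   # number of items
last = places[-1]                      # negative index counts from the end
has_beach = "beach" in places          # membership test
home_count = places.count("home")      # how many times 'home' occurs
first_home = places.index("home")      # position of the first 'home'
next_home = places.index("home", 3)    # start looking at position 3
same = places == otherplaces           # element-by-element equality
is_less = places < otherplaces         # compares 'stadium' with 'home' first

print(length, last, has_beach, home_count, first_home, next_home, same, is_less)
```

Note that `index` is a method, so it is called with parentheses; square brackets are for indexing into the list itself.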

7 New lists from old lists
– places[0:3]
– places[1:4:2]
– places + otherplaces (note places + "pub" vs. places + ['pub'])
– places * 2
Creating a list
– range(5, 100, 25) (how many entries?)
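A short sketch of building new lists from an existing one with slice notation (colons between the indices), concatenation, and repetition. The list contents are illustrative, and `list(range(...))` assumes Python 3, where `range` is lazy; in Python 2, `range` already returns a list.

```python
places = ["stadium", "beach", "home", "hotel", "office"]

first_three = places[0:3]      # items at positions 0, 1, 2
every_other = places[1:4:2]    # positions 1 and 3 (step of 2)
with_pub = places + ["pub"]    # list + list gives a new, longer list
doubled = places * 2           # the list repeated twice

# Adding a string to a list is an error: both operands must be lists.
try:
    places + "pub"
    concat_failed = False
except TypeError:
    concat_failed = True

steps = list(range(5, 100, 25))  # start at 5, stop before 100, step 25

print(first_three, every_other, len(with_pub), len(doubled), concat_failed, steps)
```

None of these operations changes `places` itself; each produces a new list.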

8 Immutable objects
Lists are mutable.
– Operations that can change a list: name some.
– Two important types of objects are not mutable: str and tuple.
– A tuple is like a list, but is not mutable. It is a fixed sequence of arbitrary objects, defined with () instead of []:
  grades = ("A", "A-", "B+", "B", "B-", "C+", "C")
– A str (string) is a fixed sequence of characters.
Operations on lists that do not change the list can also be applied to tuple and str. Operations that make changes must create a new copy of the structure to hold the changed version.
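A small sketch of what immutability means in practice: assigning to an element of a tuple fails, while "changing" operations return a new object and leave the original untouched. The variable names other than `grades` are illustrative.

```python
grades = ("A", "A-", "B+")

# Item assignment works on a list but raises TypeError on a tuple.
try:
    grades[0] = "A+"
    tuple_was_changed = True
except TypeError:
    tuple_was_changed = False

# Read-only operations work on tuples just as on lists.
count_a = grades.count("A")
position = grades.index("B+")

# "Changing" a tuple or string really builds a new object.
extended = grades + ("B",)     # note the comma: ("B",) is a one-item tuple
name = "ella lane"
titled = name.title()          # new string; name itself is unchanged

print(tuple_was_changed, count_a, position, extended, name, titled)
```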

9 Strings
Strings are specified using quotes, single or double:
– name1 = "Ella Lane"
– name2 = 'Tom Riley'
If the string contains a quotation mark, it must be distinct from the marks delimiting the string, or be escaped with a backslash:
– part1 = "Ella's toy"
– part2 = 'Tom\'s plane'

10 Methods
In general, methods that do not change the list are also available for str and tuple.
String methods:
>>> message = ("Meet me at the coffee shop. OK?")
>>> message.lower()
'meet me at the coffee shop. ok?'
>>> message.upper()
'MEET ME AT THE COFFEE SHOP. OK?'

11 Immutable, but…
It is possible to create a new string with the same name as a previous string. This leaves the previous string without a label.
>>> note = "walk today"
>>> note
'walk today'
>>> note = "go shopping"
>>> note
'go shopping'
The original string is still there, but cannot be accessed because it no longer has a label.

12 Strings and Lists of Strings
Extract individual words from a string:
>>> words = message.split()
>>> words
['Meet', 'me', 'at', 'the', 'coffee', 'shop.', 'OK?']
Note that there are no spaces in the words in the list. The spaces were used to separate the words and are dropped.
It is OK to split on any separator string:
>>> terms = ("12098,scheduling,of,real,time,10,21,,real time,")
>>> terms
'12098,scheduling,of,real,time,10,21,,real time,'
>>> termslist = terms.split()
>>> termslist
['12098,scheduling,of,real,time,10,21,,real', 'time,']
>>> termslist = terms.split(',')
>>> termslist
['12098', 'scheduling', 'of', 'real', 'time', '10', '21', '', 'real time', '']
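Splitting on a comma leaves empty strings wherever two separators are adjacent or the string ends with one. A minimal sketch of one way to clean those up and join the pieces back together; the names `fields`, `non_empty`, and `rejoined` are illustrative.

```python
terms = "12098,scheduling,of,real,time,10,21,,real time,"

fields = terms.split(",")                  # includes two empty strings
non_empty = [f for f in fields if f]       # keep only the non-empty fields
rejoined = " ".join(non_empty)             # join is the inverse of split

print(len(fields), len(non_empty))
print(rejoined)
```

`join` is called on the separator string and takes the list as its argument, which often surprises newcomers.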

13 String Methods
Methods for strings, not lists:
– terms.isalpha()
– terms.isdigit()
– terms.isspace()
– terms.islower()
– terms.isupper()
– message.lower()
– message.upper()
– message.capitalize()
– message.center(80) (center in 80 places)
– message.ljust(80) (left justify in 80 places)
– message.rjust(80) (right justify in 80 places)
– message.strip() (remove white space on the left and right)
– message.strip(chars) (returns the string with leading and trailing chars removed)
– startnote.replace("Please m", "M")
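A brief sketch exercising several of these methods. The values of `message` and `startnote` are assumptions made for illustration (the slides do not show what `startnote` holds), and a width of 40 is used instead of 80 to keep the output short.

```python
message = "  Meet me at the coffee shop. OK?  "
startnote = "Please meet me at noon"   # hypothetical value

trimmed = message.strip()              # white space removed from both ends
centered = trimmed.center(40)          # padded on both sides to width 40
left = trimmed.ljust(40)               # padded on the right
right = trimmed.rjust(40)              # padded on the left
shorter = startnote.replace("Please m", "M")

all_digits = "12098".isdigit()         # every character is a digit
all_letters = "real time".isalpha()    # the space is not a letter

print(repr(centered))
print(shorter, all_digits, all_letters)
```

Each call returns a new string; `message` and `startnote` are left as they were.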

14 Adding lists sent1 is the first sentence in text1, sent2 the first sentence in text2, etc., each expressed as a list of words. sent1 + sent2 is the list that is the first sentence of text1 followed by the first sentence of text2. Try it: combine the sentences of some texts.

15 Indexing
>>> text4[173]
'awaken'
>>> text4.index('awaken')
173
>>> text4[1000:1100]
['that', 'the', 'propitious', 'smiles', 'of', 'Heaven', 'can', 'never', 'be', 'expected', 'on', 'a', 'nation', 'that', 'disregards', 'the', 'eternal', 'rules', 'of', 'order', 'and', 'right', 'which', 'Heaven', 'itself', 'has', 'ordained', ';', 'and', 'since', 'the', 'preservation', 'of', 'the', 'sacred', 'fire', 'of' …
Slicing
>>> text4[:10]
['Fellow', '-', 'Citizens', 'of', 'the', 'Senate', 'and', 'of', 'the', 'House']
>>> len(text4)
145735
>>> text4[145720:]
['you', '.', 'God', 'bless', 'you', '.', 'And', 'God', 'bless', 'the', 'United', 'States', 'of', 'America', '.']

16
>>> saying = ['After', 'all', 'is', 'said', 'and', 'done', 'more', 'is', 'said', 'than', 'done']
>>> tokens = set(saying)
>>> tokens
set(['and', 'all', 'said', 'is', 'After', 'done', 'than', 'more'])
>>> tokens = sorted(tokens)
>>> tokens
['After', 'all', 'and', 'done', 'is', 'more', 'said', 'than']
>>> tokens[-2:]
['said', 'than']

17 Some statistics on text
Frequency distributions
>>> fdist1 = FreqDist(text1)
>>> fdist1
>>> vocabulary1 = fdist1.keys()
>>> vocabulary1[:50]
[',', 'the', '.', 'of', 'and', 'a', 'to', ';', 'in', 'that', "'", '-', 'his', 'it', 'I', 's', 'is', 'he', 'with', 'was', 'as', '"', 'all', 'for', 'this', '!', 'at', 'by', 'but', 'not', '--', 'him', 'from', 'be', 'on', 'so', 'whale', 'one', 'you', 'had', 'have', 'there', 'But', 'or', 'were', 'now', 'which', '?', 'me', 'like']
>>> fdist1['whale']
906
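The idea of a frequency distribution can be sketched without NLTK by using `collections.Counter` from the standard library as a stand-in for `FreqDist`. This is a substitute technique, not the NLTK call itself, and it is applied here to the short `saying` list from slide 16 rather than to text1.

```python
from collections import Counter

saying = ['After', 'all', 'is', 'said', 'and', 'done',
          'more', 'is', 'said', 'than', 'done']

# Count how often each token occurs (Counter stands in for FreqDist here).
fdist = Counter(saying)

said_count = fdist['said']           # look up one token's count
top_three = fdist.most_common(3)     # (token, count) pairs, most frequent first
distinct = len(fdist)                # number of distinct tokens

print(said_count, distinct)
print(top_three)
```

`most_common` gives the frequency-ordered view that the slide obtains by slicing the keys.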

18 Spot check With a partner, do exercises 2.14, 2.15, 2.16, and 2.17 (Python book) – Half the room do the first and last. The other half do the middle two. Choose a spokesperson to present your answers (one person per problem). Choose another person to be the designated questioner of the other side (anyone can ask a question, but that person must do so).

19 Numeric types
– int: whole numbers, no decimal places
– float: decimal numbers, with a decimal point
– long: arbitrarily long ints. Python converts when needed.
Operations between values of the same type give a result of that type. Operations between an int and a float yield a float.
>>> 3/2
1
>>> 3./2.
1.5
>>> 3/2.
1.5
>>> 3.//2.
1.0
>>> 18%4
2
>>> 18//4
4
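The session above reflects Python 2, where `/` between two ints truncates. As a hedged comparison, this sketch shows the same operators under Python 3 (an assumption about the reader's interpreter), where `/` always performs true division and `//` is used for floor division. Python 3 also has no separate long type: int grows as needed.

```python
true_quotient = 3 / 2        # true division in Python 3
floor_quotient = 3 // 2      # floor division, int result
float_floor = 3.0 // 2.0     # floor division on floats stays a float
remainder = 18 % 4           # remainder after division
whole = 18 // 4              # whole-number part of 18 / 4
mixed = 3 + 2.0              # int combined with float yields float
big = 2 ** 100               # ints are arbitrarily long

print(true_quotient, floor_quotient, float_floor, remainder, whole, mixed)
print(big)
```

Using `//` whenever an integer quotient is intended gives the same answer in both Python versions.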

20 Numeric operators (book slide)

21 Numeric operators (book slide)

22 Numeric operators (book slide)

23 Casting
Convert from one type to another.
>>> str(3.14159)
'3.14159'
>>> int(3.14159)
3
>>> round(3.14159)
3.0
>>> round(3.5)
4.0
>>> round(3.499999999999)
3.0
>>> num = 3.789
>>> num
3.7890000000000001
>>> str(num)
'3.789'
>>> str(num+4)
'7.789'
>>> list(num)
Traceback (most recent call last): File " ", line 1, in TypeError: 'float' object is not iterable
>>> list(str(num))
['3', '.', '7', '8', '9']
>>> tuple(str(num))
('3', '.', '7', '8', '9')
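A sketch of the same conversions under Python 3 (an assumption; the slide shows Python 2 output). Two differences are worth knowing: `round` with one argument returns an int rather than a float, and ties are rounded to the nearest even number.

```python
num = 3.789

as_text = str(num)            # float to string
truncated = int(3.14159)      # int() drops the fractional part
rounded = round(3.14159)      # Python 3 returns an int here
half_a = round(3.5)           # tie goes to the even neighbour, 4
half_b = round(2.5)           # tie goes to the even neighbour, 2
chars = list(str(num))        # a string is iterable, character by character

# A float is not iterable, so it cannot be turned into a list directly.
try:
    list(num)
    float_error = None
except TypeError as err:
    float_error = str(err)

print(as_text, truncated, rounded, half_a, half_b, chars)
print(float_error)
```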

24 Functions We have seen some of these before (book slide)

25 Functions (book slide)

26 Modules
Collections of things that are very handy to have, but not as universally needed as the built-in functions.
>>> from math import pi
>>> pi
3.1415926535897931
>>> import math
>>> math.sqrt(32)*10
56.568542494923804
We will use the nltk module. Once imported, use help( ) for full documentation.

27 Common modules (book slide)

28 Expressions (book slide)
Multi-part operations, including operators and/or function calls. The order of operations is the same as in arithmetic:
– Function evaluation
– Parentheses
– Exponentiation (right to left)
– Multiplication and Division (left to right)
– Addition and Subtraction (left to right)
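A few small expressions that illustrate each precedence rule in turn. The variable names are illustrative, and the division result assumes Python 3 true division.

```python
import math

mult_first = 1 + 2 * 3            # multiplication before addition: 1 + 6
parens_first = (1 + 2) * 3        # parentheses override precedence: 3 * 3
power_chain = 2 ** 3 ** 2         # right to left: 2 ** (3 ** 2) = 2 ** 9
div_mult = 20 / 5 * 2             # left to right: (20 / 5) * 2
sub_chain = 10 - 4 - 3            # left to right: (10 - 4) - 3
with_call = math.sqrt(16) + 2 * 3 # the function call is evaluated first

print(mult_first, parens_first, power_chain, div_mult, sub_chain, with_call)
```

When in doubt, adding parentheses costs nothing and makes the intended order explicit.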

29 Evaluation trees make precedence clear: 1 + 2 * 3 (book slide)

30 Evaluation tree for strings: fullname = firstName + ' ' + lastName (book slide)

31 Boolean (book slide)
Values are False or True.
X      Y      not X  X and Y  X or Y  X == Y  X != Y
False  False  True   False    False   True    False
False  True   True   False    True    False   True
True   False  False  False    True    False   True
True   True   False  True     True    True    False
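The truth table for these operators can be generated rather than memorized. A minimal sketch using `itertools.product` to enumerate every combination of X and Y; the name `rows` is illustrative.

```python
from itertools import product

# Each row holds: X, Y, not X, X and Y, X or Y, X == Y, X != Y
rows = []
for x, y in product([False, True], repeat=2):
    rows.append((x, y, not x, x and y, x or y, x == y, x != y))

print("X Y notX and or == !=")
for row in rows:
    print(" ".join(str(value) for value in row))
```

For Boolean operands, `X != Y` behaves as exclusive or: it is True exactly when the two values differ.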

32 Evaluation tree involving boolean values (book slide)

33 Source code in file Avoid retyping each command each time you run the program. Essential for non-trivial programs. Allows exactly the same program to be run repeatedly: still interpreted, but no accidental changes. Use the print statement to output to the display. The file has a .py extension. Run it by typing python followed by the file name, for example: python termread.py

34 Basic I/O
print
– list of items separated by commas
– automatic newline at end
– forced newline: the character '\n'
raw_input( )
– input from the keyboard
– input comes as a string. Cast it to make it into some other type
input( )
– input comes as a numeric value, int or float

35 Case Study – Date conversion
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
date = raw_input('Enter date (mm-dd-yyyy)')
pieces = date.split('-')
monthVal = months[int(pieces[0]) - 1]
print monthVal + ' ' + pieces[1] + ', ' + pieces[2]
The month number is reduced by 1 because list positions start at 0, so 'Jan' is months[0].
Try it: run it on your machine with a few dates.
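A sketch of the same conversion wrapped in a function so it can be tested without typing at the keyboard. The function name `convert_date` is a hypothetical helper, not from the slides. It is written for Python 3, where `raw_input` has been replaced by `input` and `print` is a function; the month number has 1 subtracted because list indices start at 0.

```python
MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

def convert_date(date):
    """Convert 'mm-dd-yyyy' into a string like 'Feb 14, 2013'."""
    pieces = date.split('-')
    # Month numbers run 1..12 but list positions run 0..11.
    month_name = MONTHS[int(pieces[0]) - 1]
    return month_name + ' ' + pieces[1] + ', ' + pieces[2]

formatted = convert_date('02-14-2013')
print(formatted)
print(convert_date('12-31-1999'))
```

To make it interactive again in Python 3, pass `input('Enter date (mm-dd-yyyy) ')` to the function.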

36 Spot check Again, split the class. Work in pairs – Side by my office do Exercises 2.24 and 2.28 – Other side do Exercises 2.26 and 2.27 Again, designate a person to report on each of the side's results and a person who is the designated question generator for the other side's results – No repeats of individuals from the first set!

37 For Next Week 2.36 – Check now to make sure that you understand it. – Make a .py file, which you will submit. – I will get the Blackboard site ready for an upload.

