
1 Lists 1 Day 10 - 9/17/14 LING 3820 & 6820 Natural Language Processing
Harry Howard Tulane University

2 Course organization http://www.tulane.edu/~howard/LING3820/
The syllabus is under construction. NLP, Prof. Howard, Tulane University 17-Sept-2014

3 Review of regular expressions

4 4.3.4. Summary table
meta-character  matches                                name         notes
a|b             a or b                                 disjunction
(ab)            a followed by b                        grouping     re.findall() outputs only what is in (); use (?:ab) to group without capturing, so that the whole match is output
[ab]            a or b                                 range        [a-z] lowercase, [A-Z] uppercase, [0-9] digits
[^a]            all but a                              negation
a{m,n}          from m to n of a                       repetition   no space after the comma: Python reads a{m, n} as literal text
a{n}            exactly n of a
^a              a at start of S
a$              a at end of S
a+              one or more of a
a+?             one or more of a, as few as possible   lazy +
a*              zero or more of a                      Kleene star
a*?             zero or more of a, as few as possible  lazy *
a?              with or without a                      optionality
a??             with or without a, preferring without  lazy ?
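Several rows of this table can be checked directly with re.findall(). The sketch below is a minimal illustration using made-up sample strings (not from the slides); it contrasts a capturing group with a non-capturing one, greedy + with lazy +?, and the {m,n} repetition.

```python
import re

S = 'abab aab b'

# (ab)+ : findall() returns only what the group captured,
# i.e. the last 'ab' of each run.
captured = re.findall(r'(ab)+', S)

# (?:ab)+ : the group does not capture, so findall() returns the whole match.
whole = re.findall(r'(?:ab)+', S)

T = '<b>bold</b>'
greedy = re.findall(r'<.+>', T)   # + consumes as much as it can
lazy = re.findall(r'<.+?>', T)    # +? stops at the first '>' that works

# {2,3}: words of two or three lowercase letters (no space after the comma).
short = re.findall(r'\b[a-z]{2,3}\b', 'a an the then')

print(captured, whole)   # ['ab', 'ab'] ['abab', 'ab']
print(greedy, lazy)      # ['<b>bold</b>'] ['<b>', '</b>']
print(short)             # ['an', 'the']
```

The greedy/lazy pair is the usual place where the difference shows up in practice: with markup-like text, the greedy version swallows everything between the first '<' and the last '>'.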

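5 4.4. Character classes
class  abbreviates     name                 notes
\w     [a-zA-Z0-9_]    alphanumeric         it’s really alphanumeric and underscore, but we are lazy; in Python 3, str patterns also match non-ASCII letters and digits unless re.ASCII is set
\W     [^a-zA-Z0-9_]   not alphanumeric
\d     [0-9]           digit
\D     [^0-9]          not a digit
\b                     word boundary
\B                     not a word boundary
\t                     horizontal tab
\v                     vertical tab
\n                     newline
\r                     carriage return
\f                     form-feed
\s     [ \t\v\n\r\f]   whitespace
\S     [^ \t\v\n\r\f]  not whitespace
\A     ^               start of string      unlike ^, not affected by re.MULTILINE
\Z     $               end of string        unlike $, does not match before a final newline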
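A minimal sketch of the character classes at work on an illustrative sentence (the string is invented for the example). It also shows the Python 3 detail noted for \w: on str patterns it matches non-ASCII letters unless the re.ASCII flag is given.

```python
import re

S = 'Room 101\tis\non the 2nd floor.'

words = re.findall(r'\w+', S)        # runs of word characters
digits = re.findall(r'\d+', S)       # runs of digits
initials = re.findall(r'\b\w', S)    # the word character right after each word boundary
pieces = re.split(r'\s+', S)         # split at any run of space, \t or \n
at_start = re.search(r'\ARoom', S) is not None   # \A: only at the very start of S

# In Python 3, \w in a str pattern also matches non-ASCII letters;
# the re.ASCII flag restricts it to [a-zA-Z0-9_].
word = 'caf\u00e9'                    # 'café'
unicode_w = re.findall(r'\w+', word)
ascii_w = re.findall(r'\w+', word, flags=re.ASCII)

print(words)      # ['Room', '101', 'is', 'on', 'the', '2nd', 'floor']
print(digits)     # ['101', '2']
print(initials)   # ['R', '1', 'i', 'o', 't', '2', 'f']
print(pieces)     # ['Room', '101', 'is', 'on', 'the', '2nd', 'floor.']
print(at_start, unicode_w, ascii_w)
```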

6 §5. Lists

7 Open Spyder

8 Introduction
In working with re.findall(), you have seen many instances of a collection of strings held within square brackets, such as the one below:
    >>> import re
    >>> S = '''This above all: to thine own self be true,
    ... And it must follow, as the night the day,
    ... Thou canst not then be false to any man.'''
    >>> re.findall(r'\b[a-zA-Z]{4}\b', S)
    ['This', 'self', 'true', 'must', 'Thou', 'then']

9 Definition of list
A list in Python is a sequence of objects delimited by square brackets, [], with the objects separated by commas. Consider this sentence from Shakespeare’s A Midsummer Night’s Dream represented as a list:
    >>> L = ['Love', 'looks', 'not', 'with', 'the', 'eyes', ',', 'but', 'with', 'the', 'mind', '.']
    >>> type(L)
    >>> type(L[0])
L is a list of strings. You may think that a string is also a list of characters, and in ordinary English you would be correct, but in pythonic English the word ‘list’ refers exclusively to a sequence of objects delimited by square brackets. list is also the name of a built-in, list(), so you shouldn't use it as a variable name.
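The difference between a string and a list, and the reason not to reuse the name list, can be seen in a short sketch. The function shadowing_demo() is a hypothetical illustration of the mistake, not something from the course materials.

```python
S = 'Love'
L = ['Love', 'looks', 'not', 'with', 'the', 'eyes', ',',
     'but', 'with', 'the', 'mind', '.']

# A string and a list are both sequences, but they are different types.
print(type(S).__name__, type(L).__name__, type(L[0]).__name__)   # str list str
print(isinstance(S, list))                                        # False

# The built-in list() turns a string into a list of its characters.
chars = list(S)
print(chars)   # ['L', 'o', 'v', 'e']

def shadowing_demo():
    """Hypothetical mistake: naming a variable 'list' hides the built-in."""
    list = ['a', 'b']
    try:
        return list('ab')              # 'list' now refers to the list above
    except TypeError as err:
        return 'TypeError: ' + str(err)

print(shadowing_demo())   # TypeError: 'list' object is not callable
```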

10 An example with numerical objects
    >>> i = 2
    >>> type(i)
    >>> I = [0,1,i,3]
    >>> type(I)
    >>> type(I[0])
    >>> n = 2.3
    >>> type(n)
    >>> N = [2.0,2.1,2.2,n]
    >>> type(N)
    >>> type(N[0])
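A sketch of the same lists as a script. It assumes i = 2, which continues the sequence 0, 1, _, 3; the mixed list M is an extra illustrative example showing that a list need not hold a single type.

```python
i = 2                   # assumed value, continuing the sequence 0, 1, _, 3
I = [0, 1, i, 3]        # a list of ints
n = 2.3
N = [2.0, 2.1, 2.2, n]  # a list of floats

print({type(x).__name__ for x in I})   # {'int'}
print({type(x).__name__ for x in N})   # {'float'}

# Nothing forces a list to hold a single type: each item keeps its own.
M = ['two', 2, 2.0]
mixed = [type(x).__name__ for x in M]
print(mixed)                           # ['str', 'int', 'float']
print(sum(I), N[-1])                   # 6 2.3
```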

11 Most of the operations used on strings work just as well on lists
    >>> len(L)
    >>> sorted(L)
    >>> set(L)
    >>> sorted(set(L))
    >>> len(sorted(set(L)))
    >>> L+['!']
    >>> len(L+['!'])
    >>> L*2
    >>> len(L*2)
    >>> L.count('the')
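To make the parallel explicit, this sketch applies the same functions and operators to a string and to the list L side by side. The string 'mind' is just an illustrative choice.

```python
S = 'mind'
L = ['Love', 'looks', 'not', 'with', 'the', 'eyes', ',',
     'but', 'with', 'the', 'mind', '.']

# The same built-in functions and operators apply to both kinds of sequence.
print(len(S), len(L))                 # 4 12
print(sorted(S))                      # ['d', 'i', 'm', 'n']: sorting a string yields a list
print(S + '!', L + ['!'])             # + joins two strings or two lists
print(S * 2, len(L * 2))              # mindmind 24
print(S.count('i'), L.count('the'))   # 1 2

# + will not mix the two types.
try:
    L + '!'
    mixing_ok = True
except TypeError:
    mixing_ok = False                 # list + str raises TypeError

# set() drops duplicates; sorted() orders by character code,
# so punctuation comes first and capitals precede lowercase letters.
vocabulary = sorted(set(L))
print(vocabulary)
print(len(vocabulary))                # 10
```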

12 String operations work on lists, cont.
    >>> L.count('Love')
    >>> L.count('love')
    >>> L.index('with')
    >>> len(L) - 1 - L[::-1].index('with')
    >>> L[2:]
    >>> L[:2]
    >>> L[-2:]
    >>> L[:-2]
    >>> L[2:-2]
    >>> L[-2:2]
    >>> L[:]
    >>> L[:-1]+['!']
Unlike strings, lists have no rindex() method; the fourth line finds the position of the last 'with' by searching the reversed list L[::-1].
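A sketch of these lookups and slices with the results spelled out. Computing the last index through the reversed list is one common workaround for the missing list.rindex(), not the only one.

```python
L = ['Love', 'looks', 'not', 'with', 'the', 'eyes', ',',
     'but', 'with', 'the', 'mind', '.']

first_with = L.index('with')                    # position of the first 'with'
# Lists have no rindex(); search the reversed list and convert the position back.
last_with = len(L) - 1 - L[::-1].index('with')
print(first_with, last_with)   # 3 8

print(L.count('love'))   # 0: 'love' and 'Love' are different strings
print('love' in L)       # False; L.index('love') would raise ValueError

print(L[:2])             # ['Love', 'looks']
print(L[-2:])            # ['mind', '.']
print(L[-2:2])           # []: the start (index 10) lies after the stop (index 2)

exclaimed = L[:-1] + ['!']   # drop the final '.', append '!'
print(exclaimed[-2:])        # ['mind', '!']
```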

13 5.2.3. How to put lists into random order
It may seem odd to you now, but randomizing the order of a list is sometimes very useful, for instance to break up any ordering in a collection of texts before sampling from it. Python has an amazingly simple tool for randomization, the shuffle() function from the random module. shuffle() reorders a list in place, destroying the list’s original sequence, and returns None rather than a new list.

14 Examples of randomization
The examples below first copy the original lists into new variables with [:] and then randomize the copies, so that the original sequence is not lost:
    >>> from random import shuffle
    >>> rL = L[:]
    >>> shuffle(rL)
    >>> rL
    >>> L
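A minimal sketch of the copy-then-shuffle pattern, plus random.sample() as an alternative that returns a new list instead of working in place. The seed value 3820 is arbitrary and only makes a run repeatable.

```python
import random

L = ['Love', 'looks', 'not', 'with', 'the', 'eyes', ',',
     'but', 'with', 'the', 'mind', '.']

random.seed(3820)          # arbitrary seed, only so that a run can be repeated

rL = L[:]                  # copy first, because shuffle() works in place
result = random.shuffle(rL)
print(result)              # None: shuffle() changes rL and returns nothing
print(rL)                  # same words, new order
print(L)                   # the original order is untouched

# Alternative: random.sample() builds a new shuffled list and leaves L alone.
sL = random.sample(L, len(L))
print(sL)
```

Because shuffle() returns None, writing rL = shuffle(rL) is a common slip: it replaces the list with None.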

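15 Examples of randomization, cont.
    >>> rI = I[:]
    >>> shuffle(rI)
    >>> rI
    >>> rN = N[:]
    >>> shuffle(rN)
    >>> rN
As on the previous slide, the copies are made with [:]. Writing rI = I would merely give the same list a second name, so shuffling rI would scramble I as well.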
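The difference between giving a list a second name and copying it can be checked with the is operator. This sketch uses append() on the alias so that the effect is deterministic; shuffling the alias would change I in the same way.

```python
from random import shuffle

I = [0, 1, 2, 3]

alias = I          # a second name for the same list object
duplicate = I[:]   # a new list holding the same items
print(alias is I, duplicate is I)   # True False

shuffle(duplicate)                  # reorders only the copy
print(I)                            # [0, 1, 2, 3]

alias.append(4)                     # a change made through the alias...
print(I)                            # [0, 1, 2, 3, 4]  ...shows up in I
```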

16 5.3. How to convert between strings and lists
    >>> S1 = 'William Shakespeare'
    >>> S2 = 'William_Shakespeare'
    >>> S3 = 'William'
    >>> u = '_'
    >>> S1.split()
    >>> S2.split()
    >>> S2.split('_')
    >>> S2.split(u)
    >>> list(S3)
The split() method divides the string before the dot into a list of substrings. Called with no argument, it splits at whitespace, so S2, which contains none, comes back whole as a one-item list. To split at some other string, supply that separator as the argument; the separator can also be a variable. When there is no separator at all and you want each character on its own, list() separates the string into a list of its characters.
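A sketch of the split() calls above with their results, plus two details worth knowing: split() with no argument collapses runs of whitespace, while split(' ') splits at every single space, and an optional second argument limits the number of splits. The strings S4 and 'to_be_or_not' are illustrative additions.

```python
S1 = 'William Shakespeare'
S2 = 'William_Shakespeare'
S3 = 'William'
u = '_'

print(S1.split())        # ['William', 'Shakespeare']
print(S2.split())        # ['William_Shakespeare']: no whitespace to split at
print(S2.split(u))       # ['William', 'Shakespeare']
print(list(S3))          # ['W', 'i', 'l', 'l', 'i', 'a', 'm']

# split() with no argument treats any run of whitespace as one break and
# ignores leading/trailing whitespace; split(' ') splits at every single space.
S4 = '  William   Shakespeare '
default_split = S4.split()
space_split = S4.split(' ')
print(default_split)     # ['William', 'Shakespeare']
print(space_split)       # ['', '', 'William', '', '', 'Shakespeare', '']

# The optional second argument caps the number of splits.
print('to_be_or_not'.split(u, 1))   # ['to', 'be_or_not']
```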

17 How to join() a list into a string
    >>> L1 = ['William', 'Shakespeare']
    >>> u = '_'
    >>> ''.join(L1)
    >>> ' '.join(L1)
    >>> u.join(L1)
join() is called on the separator string, the one before the dot, and places it between the items of the list, which must all be strings. If the separator is the empty string, the strings are concatenated without interruption. The separator can be a variable.
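A sketch of join() with the slide's separators, and of the requirement that every item be a string: a list of numbers has to be converted with str() first. The list of numbers is an illustrative addition.

```python
L1 = ['William', 'Shakespeare']
u = '_'

print(''.join(L1))    # WilliamShakespeare
print(' '.join(L1))   # William Shakespeare
print(u.join(L1))     # William_Shakespeare

# join() only accepts strings, so numbers must be converted first.
I = [0, 1, 2, 3]
try:
    '-'.join(I)
except TypeError as err:
    print('TypeError:', err)          # the items are ints, not strings
joined_numbers = '-'.join(str(x) for x in I)
print(joined_numbers)                 # 0-1-2-3
```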

18 The inverse relation between splitting and joining
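The inverse relation can be checked directly: splitting at a separator and joining with the same separator gives back the original string, and joining then splitting gives back the original list. A minimal sketch, with illustrative strings, that also shows the two cases where the round trip is not exact.

```python
u = '_'
S = 'William_Shakespeare'
L1 = ['William', 'Shakespeare']

# Splitting at u and joining with u gives back the original string...
round_trip_string = u.join(S.split(u))
# ...and joining then splitting gives back the original list.
round_trip_list = u.join(L1).split(u)
print(round_trip_string == S, round_trip_list == L1)   # True True

# The list round trip fails if an item already contains the separator.
L2 = ['New_York', 'City']
print(u.join(L2).split(u))        # ['New', 'York', 'City']

# split() with no argument is only partly undone by ' '.join():
# runs of whitespace come back as single spaces.
messy = 'to  be\tor not'
print(' '.join(messy.split()))    # to be or not
```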

19 Next time More on lists

