LING 388: Computers and Language

1 LING 388: Computers and Language
Lecture 19

2 nltk book: Language Processing and Python
Section 2, A Closer Look at Python: Texts as Lists of Words. Assumes sent1, ..., sent9, loaded with from nltk.book import *

3 nltk book: Language Processing and Python
sent2, sent3: + for concatenation
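
A minimal sketch of list concatenation. To keep it runnable without NLTK or its corpora, sent2 and sent3 are typed in by hand here; the values are assumed to match what from nltk.book import * defines.

```python
# Hand-typed stand-ins for the sentences nltk.book provides (assumed values).
sent2 = ['The', 'family', 'of', 'Dashwood', 'had', 'long',
         'been', 'settled', 'in', 'Sussex', '.']
sent3 = ['In', 'the', 'beginning', 'God', 'created', 'the',
         'heaven', 'and', 'the', 'earth', '.']

# + builds a new list; neither operand is changed.
combined = sent2 + sent3
print(len(sent2), len(sent3), len(combined))
```

Because + returns a new list, sent2 and sent3 can still be used unchanged afterwards.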

4 nltk book: Language Processing and Python
.append() adds one item to the end of the list (mutates the list). .append() vs. .extend(): .extend() adds each item of another list to the end of the list (comparison from stackoverflow.com)
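
The difference between the two methods is easiest to see side by side. This is a small sketch on made-up lists (the variable names are illustrative, not from the slides).

```python
# .append() adds its argument as ONE new element.
appended = ['Call', 'me']
appended.append(['Ishmael', '.'])   # the inner list becomes a single item

# .extend() adds each element of its argument.
extended = ['Call', 'me']
extended.extend(['Ishmael', '.'])

# Both methods change the list in place and return None.
result = extended.append('!')

print(appended)
print(extended)
```

Since both methods return None, writing words = words.append(x) throws the list away.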

5 nltk book: Language Processing and Python
Indexing: [<index>]. Slices: [<index>:<index>] (either <index> can be omitted and takes its default value)
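
A short sketch of indexing and slicing on a made-up ten-item list (the list contents are an illustrative assumption).

```python
sent = ['word1', 'word2', 'word3', 'word4', 'word5',
        'word6', 'word7', 'word8', 'word9', 'word10']

first = sent[0]        # indexing starts at 0
last = sent[-1]        # negative indexes count from the end
middle = sent[2:5]     # items at positions 2, 3, 4 (end index is excluded)
head = sent[:3]        # omitted start defaults to 0
tail = sent[7:]        # omitted end defaults to len(sent)

print(first, last, middle, head, tail)
```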

6 nltk book: Language Processing and Python
We know indexing works on strings (as well as lists). Repetition (*) and concatenation (+) work on strings too. String methods: .join(), .split()
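
The string operations listed above, sketched on a sample string (the string itself is just an illustration).

```python
name = 'Monty'

first_char = name[0]          # indexing a string gives a character
prefix = name[:4]             # slicing gives a substring
doubled = name * 2            # repetition
shouted = name + '!'          # concatenation

joined = ' '.join(['Monty', 'Python'])   # list of strings -> one string
pieces = 'Monty Python'.split()          # one string -> list of strings

print(first_char, prefix, doubled, shouted, joined, pieces)
```

Note that .join() is called on the separator string, not on the list.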

7 nltk book: Language Processing and Python
Understanding check: Answer: Last two words by alphabetic sorting…
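
One expression that produces "the last two words by alphabetic sorting" is sketched below. The exact expression used in the check is not in the transcript, so this form is an assumption, and the token list is a hand-typed stand-in.

```python
# Hand-typed stand-in token list (assumed to match sent3 from nltk.book).
tokens = ['In', 'the', 'beginning', 'God', 'created', 'the',
          'heaven', 'and', 'the', 'earth', '.']

vocab = sorted(set(tokens))   # unique tokens in sorted order
last_two = vocab[-2:]         # slice off the final two

print(last_two)
```

Python's default sort puts punctuation and capitalized words before lowercase ones, so "alphabetic" here means character-code order.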

8 nltk book: Language Processing and Python
3.1 Frequency Distributions. Methods: .plot(), .most_common(), .hapaxes()
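
A rough sketch of what a frequency distribution holds, using collections.Counter as a stand-in so that it runs without NLTK. Counter has .most_common() but no .hapaxes() or .plot(), so hapaxes are computed by hand here; the token list is made up.

```python
from collections import Counter

tokens = 'the whale and the sea and the ship'.split()

# Stand-in for nltk.FreqDist(tokens): maps each token to its count.
fdist = Counter(tokens)

top_two = fdist.most_common(2)   # the two most frequent tokens with counts

# Hapaxes are tokens that occur exactly once.
hapaxes = [w for w, count in fdist.items() if count == 1]

print(top_two)
print(hapaxes)
```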

9 nltk book: Language Processing and Python

10 nltk book: Language Processing and Python
Of the most common words, only "whale" is specifically relevant to Moby Dick; the other reported words are generic "English plumbing"

11 nltk book: Language Processing and Python

12 nltk book: Language Processing and Python
Extract long words (using list comprehension):
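
A sketch of the long-word extraction on a small made-up token list. The threshold of 15 characters and the variable names are assumptions for illustration; with NLTK loaded, the set would be built from a text such as text1.

```python
# Made-up tokens standing in for a real text.
tokens = ['Call', 'me', 'Ishmael', 'circumnavigation',
          'uncomfortableness', 'whale', 'circumnavigation']

vocab = set(tokens)   # drop duplicates first

# List comprehension: keep only words longer than 15 characters.
long_words = sorted([w for w in vocab if len(w) > 15])

print(long_words)
```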

13 nltk book: Language Processing and Python
text5: chat corpus. Pick out all the words longer than 7 characters that occur more than 7 times (using list comprehension) and sort them:
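
The same filter can be sketched without the chat corpus. Here collections.Counter stands in for FreqDist and the token list is invented, so the resulting words are not the ones text5 would give.

```python
from collections import Counter

# Invented tokens standing in for text5.
tokens = (['football'] * 8 + ['computer'] * 9 +
          ['question'] * 3 + ['lol'] * 20)

fdist = Counter(tokens)   # stand-in for FreqDist(text5)

# Both conditions must hold: length > 7 AND count > 7.
frequent_long = sorted([w for w in set(tokens)
                        if len(w) > 7 and fdist[w] > 7])

print(frequent_long)
```

"question" is long enough but too rare, and "lol" is frequent but too short, so neither passes.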

14 nltk book: Language Processing and Python
Classes: FreqDist vs. Text

15 nltk book: Language Processing and Python
Word length distribution (3.4 Counting Other Things)
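
A word length distribution counts lengths instead of words. The sketch below uses collections.Counter in place of FreqDist and a made-up token list; the relative frequency is computed by hand because Counter has no .freq() method.

```python
from collections import Counter

tokens = 'Call me Ishmael . Some years ago'.split()

# Count word lengths rather than the words themselves.
lengths = [len(w) for w in tokens]
length_dist = Counter(lengths)

most_common_length, count = length_dist.most_common(1)[0]
share = count / len(lengths)   # relative frequency of that length

print(length_dist)
print(most_common_length, count, share)
```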

16 nltk book: Language Processing and Python
fdistl1.plot() and fdistl1.plot(cumulative=True)
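
To see what a cumulative plot would show without opening a plot window, the running totals can be computed directly. This sketch assumes the cumulative curve is a running sum of counts taken in most-frequent-first order; Counter stands in for FreqDist and the data is made up.

```python
from collections import Counter
from itertools import accumulate

tokens = 'Call me Ishmael . Some years ago'.split()
length_dist = Counter(len(w) for w in tokens)

# Samples ordered from most to least frequent, with their counts.
ordered = length_dist.most_common()
counts = [c for _, c in ordered]

# Running totals: the values a cumulative curve would pass through.
cumulative = list(accumulate(counts))

print(ordered)
print(cumulative)
```

The last cumulative value always equals the total number of tokens counted.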

