LING 3820 & 6820 Natural Language Processing Harry Howard

1 Text statistics 4 Day 26 - 10/27/14
LING 3820 & 6820 Natural Language Processing Harry Howard Tulane University

2 Course organization http://www.tulane.edu/~howard/LING3820/
The syllabus is under construction.
Chapter numbering
3.7. How to deal with non-English characters
4.5. How to create a pattern with Unicode characters
6. Control
NLP, Prof. Howard, Tulane University 24-Oct-2014

3 Open Spyder

4 Review
The quiz was the review.

5 Review of dictionaries & FreqDist
>>> from corpFunctions import textLoader
>>> text = textLoader('Wub.txt')
>>> from nltk.probability import FreqDist
>>> wubFD = FreqDist(word.lower() for word in text)
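As a rough sketch of what the review code computes, the example below counts lowercased words with the standard library's collections.Counter, which NLTK 3's FreqDist is built on. The token list is invented for illustration and is not the contents of Wub.txt; the course's textLoader function is not used here.

```python
from collections import Counter

# Hypothetical tokens standing in for the output of textLoader('Wub.txt').
tokens = ["The", "wub", "looked", "at", "the", "Captain",
          "and", "the", "wub", "spoke"]

# Same pattern as on the slide: lowercase each word, then count it.
counts = Counter(word.lower() for word in tokens)

# Dictionary-style lookup of one word's count.
the_count = counts["the"]

# The two most frequent words, paired with their counts.
top_two = counts.most_common(2)

print(the_count)
print(top_two)
```

Looking up a word that never occurs returns 0 instead of raising a KeyError, which is one practical difference from a plain dictionary.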

6 wubFD.plot(50)

7 wubFD.plot(50, cumulative=True)
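To see what a cumulative plot adds, it can help to compute the running totals by hand. This is a minimal sketch using only the standard library, under the assumption that the cumulative plot shows the running sum of counts taken from the most frequent word downward; the token list is hypothetical.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical tokens; not taken from Wub.txt.
tokens = ["the", "wub", "the", "captain", "the", "wub", "spoke"]
counts = Counter(tokens)

# Counts ordered from most to least frequent.
freqs = [freq for _, freq in counts.most_common()]

# Running total: each entry adds the current count to all earlier ones.
cumulative = list(accumulate(freqs))

print(freqs)
print(cumulative)
```

When every word is included, the last running total equals the number of tokens in the text.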

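8 Tuples
>>> singleton = (1)  # without a trailing comma this is the integer 1; a one-element tuple is written (1,)
>>> double = (1,2)
>>> triple = (1,2,3)
>>> quadruple = (1,2,3,4)
>>> singleton
>>> singleton[0]  # TypeError: an integer cannot be indexed
>>> double[0]
>>> double[1]
>>> double[2]  # IndexError: valid indices are 0 and 1
>>> triple[3]  # IndexError: valid indices are 0 to 2
>>> quadruple[4]  # IndexError: valid indices are 0 to 3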
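The two points the tuple slide turns on, the trailing comma in a one-element tuple and zero-based indexing, can be checked with a short sketch. The variable names below are chosen for illustration only.

```python
# Parentheses alone do not make a tuple; the comma does.
not_a_tuple = (1)
singleton = (1,)
double = (1, 2)

kind_without_comma = type(not_a_tuple).__name__
kind_with_comma = type(singleton).__name__

# Indexing starts at 0, so the only element of a singleton is at index 0.
first = singleton[0]

# A two-element tuple has indices 0 and 1; index 2 is out of range.
try:
    double[2]
    out_of_range = False
except IndexError:
    out_of_range = True

print(kind_without_comma, kind_with_comma, first, out_of_range)
```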

9 How to plot values on a logarithmic scale
The task is to extract the values/outcomes from the frequency distribution and graph them against their rank, without the words themselves. The values must be sorted from high to low so that each value's position reflects its rank.
>>> Y = [v for (k,v) in wubFD.items()]
>>> Y = sorted(Y,reverse=True)
>>> X = range(1,len(Y)+1)
>>> import matplotlib.pyplot as plt
>>> plt.loglog(X,Y)
>>> plt.title("Logarithmic rank-frequency plot of 'Beyond Lies the Wub'")
>>> plt.xlabel('Rank')
>>> plt.ylabel('Frequency')
>>> plt.show()  # may be needed to display the figure, depending on how Spyder is set up
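The rank and frequency lists can be built and inspected without drawing anything. The sketch below uses only the standard library and an invented set of counts in place of wubFD; taking log10 of both coordinates gives the positions the points would occupy on axes that are logarithmic in both directions.

```python
import math

# Hypothetical counts standing in for wubFD; not real data from the story.
counts = {"the": 100, "wub": 10, "captain": 1}

# Frequencies sorted from high to low, so position reflects rank.
Y = sorted(counts.values(), reverse=True)

# Ranks start at 1, not 0.
X = list(range(1, len(Y) + 1))

# Position of each (rank, frequency) point after taking logs of both axes.
log_points = [(math.log10(x), math.log10(y)) for x, y in zip(X, Y)]

for rank, freq in zip(X, Y):
    print(rank, freq)
```

Ranks start at 1 because the logarithm of 0 is undefined, which is why the slide uses range(1, len(Y)+1).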

10 Next time Q7 Conditional frequency

