NLTK & Python Day 4 LING 681.02 Computational Linguistics Harry Howard Tulane University.


31-Aug-2009, LING 681.02, Prof. Howard, Tulane University

Course organization
- I have requested that Python and NLTK be installed on the computers in this room.

NLPP §1 Language processing & Python §1.1 Computing with language

Loading the book's texts
    >>> from nltk.book import *
    *** Introductory Examples for the NLTK Book ***
    Loading text1, ..., text9 and sent1, ..., sent9
    Type the name of the text or sentence to view it.
    Type: 'texts()' or 'sents()' to list the materials.
    text1: Moby Dick by Herman Melville 1851
    text2: Sense and Sensibility by Jane Austen 1811
    text3: The Book of Genesis
    text4: Inaugural Address Corpus
    text5: Chat Corpus
    text6: Monty Python and the Holy Grail
    text7: Wall Street Journal
    text8: Personals Corpus
    text9: The Man Who Was Thursday by G. K. Chesterton 1908
    >>>

Searching text
- Show every token of a word in context, called concordance view:
    text1.concordance("monstrous")
- Show the words that appear in a similar range of contexts:
    text1.similar("monstrous")
- Show the contexts that two words share (both words go in a list):
    text2.common_contexts(["monstrous", "very"])
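
To make the idea of a concordance view concrete, here is a minimal sketch in plain Python. It is not how NLTK implements concordance(); the function name, the width parameter, and the toy token list are all assumptions made for illustration.

```python
def concordance(tokens, word, width=3):
    """Return each occurrence of `word` with up to `width` tokens of context per side."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == word.lower():
            left = tokens[max(0, i - width):i]      # context before the hit
            right = tokens[i + 1:i + 1 + width]     # context after the hit
            hits.append(" ".join(left + [tok] + right))
    return hits

# Toy token list standing in for an NLTK text (invented, not from the book's corpora).
toy = ['a', 'monstrous', 'whale', 'and', 'a', 'most', 'monstrous', 'size', '.']
lines = concordance(toy, 'monstrous', width=2)
for line in lines:
    print(line)
```

Each printed line is one token of the word together with its neighbours, which is the information a concordance view lines up on screen.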

Searching text, cont.
- Plot how far each token of a word is from the beginning of a text:
    text1.dispersion_plot(["monstrous"])
  Needs NumPy & Matplotlib, though it didn't work for me.
- Generate random text:
    text1.generate()
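
If the plot will not run, the numbers behind it can still be inspected. The sketch below collects the offsets (positions from the start of the text) at which a word occurs, which is the kind of data a dispersion plot draws. The helper name offsets and the toy list are assumptions, not part of NLTK.

```python
def offsets(tokens, word):
    """Return the positions (counted from 0) at which `word` occurs in `tokens`."""
    return [i for i, tok in enumerate(tokens) if tok == word]

# Toy token list (invented for illustration).
toy = ['the', 'whale', 'saw', 'the', 'ship', 'and', 'the', 'whale', 'dived']
whale_offsets = offsets(toy, 'whale')
print(whale_offsets)
```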

Counting vocabulary
- Count the word and punctuation tokens in a text:
    len(text1)
- List the distinct words, i.e. the word types, in a text:
    set(text1)
- Count how many types there are in a text:
    len(set(text1))
- Count the tokens of a word type:
    text1.count("smote")
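
The same four operations can be tried on a plain list without loading any corpus. The toy sentence below is invented for illustration. Note that 'the' and 'The' count as different types.

```python
# Toy token list standing in for a text (invented, not from the book's corpora).
toy = ['the', 'cat', 'saw', 'the', 'dog', '.', 'The', 'dog', 'ran', '.']

n_tokens = len(toy)        # every word and punctuation token
types = set(toy)           # distinct tokens only
n_types = len(types)       # size of the vocabulary
n_the = toy.count('the')   # tokens of one type (case-sensitive)

print(n_tokens, n_types, n_the)
```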

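Lexical richness or diversity
- The lexical richness or diversity of a text can be estimated as tokens per type:
    len(text1) / len(set(text1))
- The frequency of a type can be estimated as its tokens per all tokens, as a percentage:
    100 * text1.count('a') / len(text1)
- This is integer division in Python 2, however, so the fractional part is dropped.
- The book (p. 8) avoids this with from __future__ import division. Note the two underscores on each side of future; typing _future_ with single underscores produces an error. In Python 3, / already gives a fractional result.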
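
The difference between the two kinds of division can be seen with small numbers. This sketch assumes Python 3, where / is true division and // is floor division. The counts are hypothetical.

```python
# Hypothetical counts for a small text (not taken from any NLTK corpus).
n_tokens = 10
n_types = 7

true_ratio = n_tokens / n_types     # Python 3: / always gives a float
floor_ratio = n_tokens // n_types   # // drops the fraction, like Python 2's / on integers

# Frequency of a word that occurs 2 times, as a percentage of all tokens.
pct = 100 * 2 / n_tokens

print(true_ratio, floor_ratio, pct)
```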

Making your own function in Python
- To save you from typing the same thing over and over, you can define your own function:
    >>> def lexical_diversity(text):
    ...     return len(text) / len(set(text))
- You call the function by typing its name and filling in the argument, a text name, in the parentheses:
    >>> lexical_diversity(text1)
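
A runnable sketch of the same idea, assuming Python 3 and using toy lists in place of the NLTK texts. The parameter text stands for whatever list is passed in, so one definition serves every text. The second helper, percentage, is an illustrative companion for the frequency calculation.

```python
def lexical_diversity(text):
    """Tokens per type: higher means more repetition of the same words."""
    return len(text) / len(set(text))

def percentage(count, total):
    """Express count as a percentage of total."""
    return 100 * count / total

# Toy texts (invented for illustration).
repetitive = ['a', 'b', 'a', 'b']
varied = ['a', 'b', 'c']

print(lexical_diversity(repetitive))   # 4 tokens over 2 types
print(lexical_diversity(varied))       # 3 tokens over 3 types
print(percentage(repetitive.count('a'), len(repetitive)))
```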

Other functions
- Sort the word types in a text alphabetically:
    sorted(set(text1))
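
One thing to watch for: sorted() orders strings by character code, so punctuation and capitalized words come before lowercase words. The toy list is invented. The key=str.lower variant is a standard Python option, shown here as one possible way to get a case-insensitive order.

```python
# Toy token list (invented for illustration).
toy = ['whale', 'Zebra', 'apple', '.', 'apple']

default_order = sorted(set(toy))                  # punctuation, then capitals, then lowercase
caseless_order = sorted(set(toy), key=str.lower)  # compare words as if all lowercase

print(default_order)
print(caseless_order)
```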

Exercises 1.8 …
4. … How many words are there in text2? How many distinct words are there?
5. Compare the lexical diversity scores for humor and romance fiction in Table 1.1. Which genre is more lexically diverse?
8. Consider the following Python expression: len(set(text4)). State the purpose of this expression. Describe the two steps involved in performing this computation.

NLPP §1.2 A Closer Look at Python: Texts as Lists of Words

The representation of a text
- We will think of a text as nothing more than a sequence of words and punctuation.
- The opening sentence of Moby Dick:
    >>> sent1 = ['Call', 'me', 'Ishmael', '.']
- The bracketed material is known as a list in Python.
- We can inspect it by typing the name.
- How would you find out how many words it has?
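
One way to answer the question, sketched without NLTK: the list is typed in by hand here, and len() counts its elements. The count includes the final period, since punctuation is a token too.

```python
# The opening sentence of Moby Dick as a list of tokens.
sent1 = ['Call', 'me', 'Ishmael', '.']

n_items = len(sent1)   # counts words and punctuation alike

# To count words only, skip tokens that contain no letters or digits.
n_words = len([tok for tok in sent1 if tok.isalnum()])

print(n_items, n_words)
```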

List construction
- Append one list to the end of another with '+', known as concatenation:
    >>> ['Monty', 'Python'] + ['and', 'the', 'Holy', 'Grail']
    ['Monty', 'Python', 'and', 'the', 'Holy', 'Grail']
    >>> sent4 + sent1
    ['Fellow', '-', 'Citizens', 'of', 'the', 'Senate', 'and', 'of', 'the', 'House', 'of', 'Representatives', ':', 'Call', 'me', 'Ishmael', '.']
- Append a single item to a list:
    >>> sent1.append("Some")
    >>> sent1
    ['Call', 'me', 'Ishmael', '.', 'Some']
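
The two operations differ in one respect worth seeing in code: '+' builds a new list and leaves both originals alone, while append() changes the list in place and returns None. A short sketch with hand-typed lists:

```python
sent1 = ['Call', 'me', 'Ishmael', '.']
title = ['Monty', 'Python']

combined = title + sent1        # a brand-new list; title and sent1 are unchanged
result = sent1.append('Some')   # modifies sent1 itself; the return value is None

print(combined)
print(sent1)
print(result)
```

Because append() returns None, writing sent1 = sent1.append('Some') would lose the list.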

List indexing
- Each element in a list is numbered in sequence, a number known as the element's index.
- Show the item that occurs at an index such as 173 in a text:
    >>> text4[173]
    'awaken'
- Show the index of an element's first occurrence:
    >>> text4.index('awaken')
    173
- Show the elements between two indices (slicing). Omitting the first index starts at the beginning; omitting the second runs to the end:
    >>> text5[16715:16735]
    >>> text5[16715:]
    >>> text5[:16735]
- Assign an element to an index in a list:
    >>> sent1[0] = 'First'
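
The same operations on a short hand-typed list, where the results are easy to check by eye. The list is the first ten tokens of sent4 as shown earlier. A slice [m:n] starts at index m and stops just before index n.

```python
words = ['Fellow', '-', 'Citizens', 'of', 'the', 'Senate', 'and', 'of', 'the', 'House']

item = words[2]             # the element at index 2 (the third element)
first_of = words.index('of')  # index of the FIRST occurrence only
middle = words[3:6]         # indices 3, 4, 5
head = words[:3]            # from the beginning up to, not including, index 3
tail = words[7:]            # from index 7 to the end

words[0] = 'First'          # assignment replaces the element at that index

print(item, first_of, middle, head, tail, words[0])
```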

Python counts from 0
- Create a list:
    >>> sent = ['word1', 'word2', 'word3', 'word4', 'word5',
    ...         'word6', 'word7', 'word8', 'word9', 'word10']
- Find the first word:
    >>> sent[0]
    'word1'
- Find the last word:
    >>> sent[9]
    'word10'
- What does sent[10] do?
- It produces a runtime error (IndexError: list index out of range), because the ten elements are numbered 0 through 9.
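
A sketch of the off-by-one situation, with the error caught so the script keeps running. The list is built with a loop instead of typed out. The negative index -1 is standard Python for "last element" and avoids having to know the length.

```python
# Build ['word1', ..., 'word10'].
sent = ['word' + str(n) for n in range(1, 11)]

last = sent[len(sent) - 1]   # the last valid index is length minus one
also_last = sent[-1]         # negative indices count from the end

try:
    sent[10]                 # one past the end
except IndexError as err:
    message = str(err)

print(last, also_last, message)
```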

List exercises

Next time NLPP: finish §1 and do all exercises; do up to Ex 8 in §2