Download presentation

Presentation is loading. Please wait.

Published byBarbra Pierce Modified over 5 years ago

1
Methods in Computational Linguistics II Queens College Lecture 1: Introduction

2
Methods in Computational Linguistics II 2 nd semester of a two semester course providing instruction in –The basics of computer science and programming (via python) –An introduction to techniques in computational linguistics 1

3
My background Research –Speech Synthesis, and Recognition –Prosody (Intonation) –Speech Segmentation –Non-native speech –Political speech, and other paralinguistics Computer Science professor at Queens and CUNY GC. Worked at IBM and Google 2

4
Your Background Name. What are your research interests in linguistics? How do you expect computational linguistics to fit into your work? –Are there techniques or applications that you are particularly looking to learn Programming background? –1 semester? more? Are you simultaneously taking Language Technologies 3

5
Outline NLTK –Overview –Major Capabilities Searching and Sorting. –Linear (Sequential) search –Binary Search –Insertion sort –MergeSort Course Policies Syllabus Review 4

6
NLTK Natural Language Toolkit. A set of utilities in python that facilitate the processing of text. 5

7
NLTK Functionality Accessing corpora String processing Collocation discovery Part of speech tagging Classification and Clustering Evaluation Metrics Chunking Parsing 6

8
NLTK Functionality Semantic interpretation –first order logic, lambda calculus, model checking Probability and estimation WordNet Browsing Chatbots 7

9
NLTK as a resource This range of functionality is quite broad, and not necessarily cohesive. However, there are resources and tools (functions and objects) that underpin most major computational linguistics tasks. 8

10
Major Computational Linguistics Tasks Syntax –Tagging –Parsing Semantics –Information Extraction –Semantic Role Labeling Phonology Sentence Processing Segmentation Summarization Speech Recognition Speech Synthesis Information Retrieval Sentiment Analysis Authorship studies Co-reference resolution 9

11
NLTK Resources NLTK also contained lexical material –Project Gutenberg –WordNet –Penn Treebank (subset) –Named Entity Recognition data –Inaugural addresses –Sentiment data –Names corpus –Switchboard (subset) –TIMIT –Webtext 10

12
Quick Assignment Methods I used NLTK. Homework 0 –Make sure that NLTK is installed and working correctly –Install matplotlib to use nltk’s graphing functions. “Due” asap. 11

13
One Question Pop Quiz Solve for p 12

14
Math Computational Linguistics requires a not- quite-trivial amount of math. Statistics and probabilistic modeling form the pillars underlying these computational techniques. This involves counting and algebra. Machine learning governs the classification and clustering techniques that CL makes heavy use of. –Requires calculus, statistics, linear algebra. 13

15
Math in this course Overview of probability. –Next class Algebra for evaluation, some common features Statistics for Naïve Bayes classification Entropy in Decision Trees 14

16
Outline NLTK –Overview –Major Capabilities Searching and Sorting. –Linear (Sequential) search –Binary Search –Insertion sort –MergeSort Course Policies Syllabus Review 15

17
Data Structures, Algorithms, etc. In computer science, there is a tight relationship between data structures and algorithms In general, the more complex the data structure –the more general or flexible the data and relationships that can be represented –the faster algorithms can run 16

18
Searching and Sorting Searching and sorting is a frequent example of the relationship between algorithm runtimes, and data structuring. Search: identify the location of a value, x, in a list, A. Sort: manipulate a list A, such that the values in A are increasing. A[i] <= A[i+1] 17

19
Sequential Search def search(A, x): for i in xrange(len(A)): if A[i] == x: return i return -1 18

20
How long does sequential search take to run? Best case? Worst case? Average case? 19

21
Binary Search If the list A is in increasing order, large chunks of the list can be be ignored. 20 def search(A, x): top = len(A) bottom = 0 while bottom < top: mid = (top + bottom) / 2 if A[mid] < x: bottom = mid + 1 elif A[mid] > x: top = mid else: return mid return -1

22
How long does binary search take to run? Best Case? Worst Case? Average Case? 21

23
Improvement of Binary Search Binary search is a significant improvement –log n < n However, Binary search requires that A is sorted. How long does it take to sort an Array and how does this impact the total runtime? 22

24
Insertion Sort Sort the list [5, 2, 4, 6, 1, 3] 23 def insertionSort(A): for j in xrange(1, len(A)): key = A[j] i = j - 1 while i > -1 and A[i] > key: A[i + 1] = A[i] i = i - 1 A[i + 1] = key

25
How long does Insertion sort take to run? Best Case? Worst Case? Average Case? 24

26
Can we sort faster? Yes. This requires recursion. We’ll come back to this, but here is a first example. 25

27
Merge Sort 26 def mergeSort(A): if len(A) == 1: return A mid = len(A) / 2 Abottom = mergeSort(A[1:mid]) Atop = mergeSort(A[mid + 1:len(A)]) return merge(Abottom, Atop)

28
Merge 27 def merge(A, B): C = [] i = 0 j = 0 A.append(float('inf')) B.append(float('inf')) for k in xrange(len(A) + len(B)): if A[i] < B[j]: C.append(A[i]) i = i + 1 else: C.append(B[j]) j = j + 1 return C

29
How long does Merge Sort take to run? Hint: This is a (much) harder question. Best Case? Worst Case? Average Case? 28

30
Comparison of run times 29 SortingSearching 0n n*log(n)log(n) How much searching do you need to do to make it worth sorting?

31
Class Structure and Policies Course website: –http://eniac.cs.qc.cuny.edu/andrew/methods2/syllabus.htmlhttp://eniac.cs.qc.cuny.edu/andrew/methods2/syllabus.html Email list –Banner does not have an email function –Put your email address on the sign up sheet. 30

Similar presentations

© 2020 SlidePlayer.com Inc.

All rights reserved.

To make this website work, we log user data and share it with processors. To use this website, you must agree to our Privacy Policy, including cookie policy.

Ads by Google