1 Introduction to Natural Language Processing (600.465) Word Classes: Programming Tips & Tricks AI-lab 2003.11.

2 The Algorithm (review)
Define merge(r,k,l) = (r',C') such that C' = C - {k,l} ∪ {m} (m a new class), and r'(w) = r(w) except for the member words of k and l, for which r'(w) = m.
1. Start with each word in its own class (C = V), r = id.
2. Merge two classes k,l into one, m, such that (k,l) = argmax_{k,l} I_{merge(r,k,l)}(D,E).
3. Set the new (r,C) = merge(r,k,l).
4. Repeat 2 and 3 until |C| reaches a predetermined size.
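The loop above can be sketched directly, before any of the tricks, as a brute-force greedy merger (a minimal illustration in Python; the function names and the tiny corpus are mine, not from the lecture):

```python
import math
from collections import Counter

def mutual_information(bigrams):
    # I(D,E) = sum over non-zero cells of p(l,r) * log(p(l,r) / (p_left(l) * p_right(r)))
    n = sum(bigrams.values())
    left, right = Counter(), Counter()
    for (l, r), c in bigrams.items():
        left[l] += c
        right[r] += c
    return sum(c / n * math.log(n * c / (left[l] * right[r]))
               for (l, r), c in bigrams.items())

def merge_classes(bigrams, a, b):
    # merge(r,k,l): fold class b into class a in the bigram class-count table
    merged = Counter()
    for (l, r), c in bigrams.items():
        merged[(a if l == b else l, a if r == b else r)] += c
    return merged

def greedy_merge(words, target_size):
    # Step 1: each word type is its own class (C = V)
    bigrams = Counter(zip(words, words[1:]))
    classes = sorted(set(words))
    # Steps 2-4: repeatedly take the merge that keeps I(D,E) maximal
    while len(classes) > target_size:
        a, b = max(((x, y) for i, x in enumerate(classes) for y in classes[i + 1:]),
                   key=lambda p: mutual_information(merge_classes(bigrams, *p)))
        bigrams = merge_classes(bigrams, a, b)
        classes.remove(b)
    return classes, bigrams
```

This is exactly the expensive baseline whose cost the next slides analyze and then reduce.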

3 Complexity Issues
Still too complex:
– |V| iterations of steps 2 and 3
– |V|^2 steps to maximize the argmax over k,l (k,l are selected freely from C, which gives on the order of |V|^2 pairs)
– |V|^2 steps to compute I(D,E) (a sum within a sum over all classes, including a log)
– → total: |V|^5
– i.e., for |V| = 100, about 10^10 steps: several hours!
– but |V| ~ 50,000 or more

4 Trick #1: Recomputing the MI the Smart Way: Subtracting...
Bigram count table: when test-merging classes c_2 and c_4, recompute only rows/columns 2 & 4:
– subtract the row and column sums (2 & 4) from the MI sum (watch the intersections!)
– add the sums of the merged counts (row & column)

5 ...and Adding
Add the merged counts. Be careful at the intersections:
– don't forget to add the cell where the merged row and column intersect

6 Trick #2: Precompute the Counts-to-be-Subtracted
The summing loop goes through i,j... but the single row/column sums do not depend on the (resulting sums after the) merge → they can be precomputed:
– only 2k logs to compute at each algorithm iteration, instead of k^2
Then for each "merge-to-be" compute only the add-on sums, plus the "intersection adjustment".

7 Formulas for Tricks #1 and #2
Let's have k classes at a certain iteration. Define:
q_k(l,r) = p_k(l,r) log(p_k(l,r) / (p_kl(l) p_kr(r)))
or, the same but using counts:
q_k(l,r) = c_k(l,r)/N log(N c_k(l,r) / (c_kl(l) c_kr(r)))
Define further (the row + column sum for class a):
s_k(a) = Σ_{l=1..k} q_k(l,a) + Σ_{r=1..k} q_k(a,r) - q_k(a,a)    [- q_k(a,a): intersection adjustment]
Then the subtraction part of Trick #1 amounts to
sub_k(a,b) = s_k(a) + s_k(b) - q_k(a,b) - q_k(b,a)    [s_k: precomputed; - q_k(a,b) - q_k(b,a): remaining intersection adjustment]
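The definitions of q_k and s_k can be computed straight from a count table (a sketch; the dictionary-based representation and function names are mine). A handy sanity check: summing s_k(a) over all classes counts every off-diagonal cell twice and every diagonal cell once, i.e. Σ_a s_k(a) = 2 I_k - Σ_a q_k(a,a).

```python
import math

def q(c, cl, cr, n, l, r):
    # q_k(l,r) = c_k(l,r)/N * log(N c_k(l,r) / (c_kl(l) c_kr(r))); zero cells contribute 0
    v = c.get((l, r), 0)
    return 0.0 if v == 0 else v / n * math.log(n * v / (cl[l] * cr[r]))

def s(c, cl, cr, n, classes, a):
    # s_k(a): row a plus column a, with the diagonal cell q_k(a,a) counted once
    return (sum(q(c, cl, cr, n, l, a) for l in classes)
            + sum(q(c, cl, cr, n, a, r) for r in classes)
            - q(c, cl, cr, n, a, a))
```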

8 Formulas - cont.
The after-merge add-on:
add_k(a,b) = Σ_{l=1..k, l≠a,b} q_k(l,a+b) + Σ_{r=1..k, r≠a,b} q_k(a+b,r) + q_k(a+b,a+b)
What is a+b? Answer: the new (merged) class. Hint: use the definition of q_k as a "macro", with
p_k(a+b,r) = p_k(a,r) + p_k(b,r) (and the same for the other sums, equivalently).
The above sums cannot be precomputed.
The after-merge Mutual Information (I_k is the "old" MI, kept from the previous iteration of the algorithm):
I_k(a,b) (the MI after the merge of classes a,b) = I_k - sub_k(a,b) + add_k(a,b)
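The identity I_k(a,b) = I_k - sub_k(a,b) + add_k(a,b) can be checked numerically against a full recomputation of the MI on the merged table (a sketch; the helper names, and the use of None to stand for the merged class a+b, are my own conventions):

```python
import math
from collections import Counter

def table_mi(c, n):
    # full MI of a bigram class-count table (the "sum within sum" of slide 3)
    cl, cr = Counter(), Counter()
    for (l, r), v in c.items():
        cl[l] += v
        cr[r] += v
    return sum(v / n * math.log(n * v / (cl[l] * cr[r])) for (l, r), v in c.items() if v)

def mi_after_merge(c, n, classes, a, b):
    # I_k - sub_k(a,b) + add_k(a,b), computed from the un-merged table only
    cl, cr = Counter(), Counter()
    for (l, r), v in c.items():
        cl[l] += v
        cr[r] += v
    def q(l, r):
        v = c.get((l, r), 0)
        return 0.0 if v == 0 else v / n * math.log(n * v / (cl[l] * cr[r]))
    def s(x):
        return sum(q(l, x) for l in classes) + sum(q(x, r) for r in classes) - q(x, x)
    sub = s(a) + s(b) - q(a, b) - q(b, a)
    # q_k with the merged class a+b (written as None) on one or both sides:
    # p_k(a+b, r) = p_k(a, r) + p_k(b, r), and likewise for the marginals
    def qm(l, r):
        ls = (a, b) if l is None else (l,)
        rs = (a, b) if r is None else (r,)
        v = sum(c.get((x, y), 0) for x in ls for y in rs)
        lm = cl[a] + cl[b] if l is None else cl[l]
        rm = cr[a] + cr[b] if r is None else cr[r]
        return 0.0 if v == 0 else v / n * math.log(n * v / (lm * rm))
    add = (sum(qm(l, None) for l in classes if l not in (a, b))
           + sum(qm(None, r) for r in classes if r not in (a, b))
           + qm(None, None))
    return table_mi(c, n) - sub + add
```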

9 Trick #3: Ignore Zero Counts
Many bigrams are 0:
– see the paper: on the Canadian Hansards, < 0.1% of the bigrams are non-zero
Create linked lists of the non-zero counts in columns and rows (for a similar effect: use perl's hashes).
Update the links after each merge (after step 3).
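A sketch of the linked-list idea using Python sets and dicts in place of perl's hashes (names mine): keep, for each class, the set of classes it has non-zero bigram counts with, and touch only those cells when merging b into a.

```python
from collections import defaultdict

def build_links(bigrams):
    # row[l]: classes r with c(l,r) > 0; col[r]: classes l with c(l,r) > 0
    row, col = defaultdict(set), defaultdict(set)
    for (l, r), v in bigrams.items():
        if v:
            row[l].add(r)
            col[r].add(l)
    return row, col

def merge_links(bigrams, row, col, a, b):
    # Fold class b into class a, visiting only b's non-zero cells (Trick #3).
    # Handle the (b,b) cell first so the two passes below see only the rest.
    if (b, b) in bigrams:
        bigrams[(a, a)] = bigrams.get((a, a), 0) + bigrams.pop((b, b))
        row[b].discard(b)
        col[b].discard(b)
        row[a].add(a)
        col[a].add(a)
    for r in row.pop(b, set()):          # b's row: cells (b, r)
        bigrams[(a, r)] = bigrams.get((a, r), 0) + bigrams.pop((b, r))
        col[r].discard(b)
        col[r].add(a)
        row[a].add(r)
    for l in col.pop(b, set()):          # b's column: cells (l, b)
        bigrams[(l, a)] = bigrams.get((l, a), 0) + bigrams.pop((l, b))
        row[l].discard(b)
        row[l].add(a)
        col[a].add(l)
```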

10 Trick #4: Use an Updated Loss of MI
We are now down to |V|^4: |V| merges, each merge takes |V|^2 "test-merges", and each test-merge involves on the order of |V| operations (the add_k(i,j) term, foil 8).
Observation: many of the numbers (s_k, q_k) needed to compute the mutual information loss due to a merge of i+j do not change: namely, those which are not in the vicinity of either i or j.
Idea: keep the MI-loss matrix for all pairs of classes, and (after a merge) update only those cells which have been influenced by the merge.

11 Formulas for Trick #4 (s_{k-1}, L_{k-1})
Keep a matrix of "losses" L_k(d,e). Init: L_k(d,e) = sub_k(d,e) - add_k(d,e) [then I_k(d,e) = I_k - L_k(d,e)]
Suppose a,b are now the two classes merged into a. Update (k-1: the index used for the next iteration; i,j ≠ a,b):
s_{k-1}(i) = s_k(i) - q_k(i,a) - q_k(a,i) - q_k(i,b) - q_k(b,i) + q_{k-1}(a,i) + q_{k-1}(i,a)
L_{k-1}(i,j)² = L_k(i,j) - s_k(i) + s_{k-1}(i) - s_k(j) + s_{k-1}(j)
              + q_k(i+j,a) + q_k(a,i+j) + q_k(i+j,b) + q_k(b,i+j)
              - q_{k-1}(i+j,a) - q_{k-1}(a,i+j)
[NB: one may substitute even for s_k, s_{k-1}]
NB: L_k is symmetrical, L_k(d,e) = L_k(e,d) (q_k is not!)
² The update formula for L_{k-1}(l,m) is wrong in the Brown et al. paper.
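Since the footnote warns that the published update formula is wrong, it is worth checking at least the s_{k-1} update numerically: s_{k-1}(i) produced by the update must equal s computed from scratch on the merged table (a brute-force toy check; the helper names are mine, and the marginals are recomputed on every call, which is fine at this scale):

```python
import math
from collections import Counter

def marginals(c):
    cl, cr = Counter(), Counter()
    for (l, r), v in c.items():
        cl[l] += v
        cr[r] += v
    return cl, cr

def q(c, n, l, r):
    # q(l,r) on a given count table; zero cells contribute 0
    cl, cr = marginals(c)
    v = c.get((l, r), 0)
    return 0.0 if v == 0 else v / n * math.log(n * v / (cl[l] * cr[r]))

def s(c, n, classes, a):
    # row a plus column a, diagonal counted once
    return (sum(q(c, n, l, a) for l in classes)
            + sum(q(c, n, a, r) for r in classes) - q(c, n, a, a))

def merge(c, a, b):
    # fold class b into class a
    m = Counter()
    for (l, r), v in c.items():
        m[(a if l == b else l, a if r == b else r)] += v
    return dict(m)
```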

12 Completing Trick #4
s_{k-1}(a) must be computed using the "Init" sum.
L_{k-1}(a,i) = L_{k-1}(i,a) must be computed in a similar way, for all i ≠ a,b.
s_{k-1}(b), L_{k-1}(b,i), L_{k-1}(i,b) are not needed anymore (keep track of such data, i.e. mark every class already merged into some other class and do not use it anymore).
Keep track of the minimal loss during the L_k(i,j) update process (so that the next merge to take is obvious immediately after finishing the update step).

13 Effective Implementation
Data Structures (N = the number of bigrams in the data [fixed]):
– Hist(k): the history of merges; Hist(k) = (a,b) merged when the remaining number of classes was k
– c_k(i,j): bigram class counts [updated]
– c_kl(i), c_kr(i): unigram (marginal) counts [updated]
– L_k(a,b): the table of losses; upper-right triangle [updated]
– s_k(a): the "subtraction" subterms [optionally updated]
– q_k(i,j): the subterms involving a log [optionally updated]
The optionally updated data structures give only a linear improvement in the subsequent steps, but at least s_k(i) is necessary in the initialization phase (1st iteration).

14 Implementation: the Initialization Phase
1 Read the data in, init the counts c_k(l,r); then for all l,r,a,b with a < b:
2 Init the unigram counts: c_kl(l) = Σ_{r=1..k} c_k(l,r), c_kr(r) = Σ_{l=1..k} c_k(l,r)
– complicated? remember, you must take care of the start & end of the data!
3 Init q_k(l,r): use the 2nd (count-based) formula on foil 7:
q_k(l,r) = c_k(l,r)/N log(N c_k(l,r) / (c_kl(l) c_kr(r)))
4 Init s_k(a) = Σ_{l=1..k} q_k(l,a) + Σ_{r=1..k} q_k(a,r) - q_k(a,a)
5 Init L_k(a,b) = s_k(a) + s_k(b) - q_k(a,b) - q_k(b,a) - q_k(a+b,a+b)
- Σ_{l=1..k, l≠a,b} q_k(l,a+b) - Σ_{r=1..k, r≠a,b} q_k(a+b,r)
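The start/end caveat in step 2 can be handled by padding the data with a boundary class, so that the row and column marginals come out consistent (a sketch; the boundary symbol is my choice, not from the lecture):

```python
from collections import Counter

def init_counts(tokens, boundary="<s>"):
    # Steps 1-2: bigram class counts c_k(l,r) and their marginals c_kl, c_kr.
    # Padding with a boundary class gives every real token both neighbours,
    # so sum_r c_k(l,r) and sum_l c_k(l,r) agree with the unigram view.
    seq = [boundary] + tokens + [boundary]
    c = Counter(zip(seq, seq[1:]))
    n = sum(c.values())
    cl, cr = Counter(), Counter()
    for (l, r), v in c.items():
        cl[l] += v
        cr[r] += v
    return c, cl, cr, n
```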

15 Implementation: Select & Update
6 Select the best pair (a,b) to merge into a (watch the candidates while computing L_k(a,b)); save it to Hist(k).
7 Optionally, update q_k(i,j) for all i,j ≠ b to get q_{k-1}(i,j)
– remember those q_k(i,j) values needed for the updates below
8 Optionally, update s_k(i) for all i ≠ b to get s_{k-1}(i)
– again, remember the s_k(i) values for the "loss table" update
9 Update the loss table L_k(i,j) to L_{k-1}(i,j), using the tabulated q_k, q_{k-1}, s_k and s_{k-1} values, or compute the needed q_k(i,j) and q_{k-1}(i,j) values dynamically from the counts:
c_k(i+j,b) = c_k(i,b) + c_k(j,b); c_{k-1}(a,i) = c_k(a+b,i)

16 Towards the Next Iteration
10 During the L_k(i,j) update, keep track of the minimal loss of MI and of the two classes which caused it.
11 Remember this best merge in Hist(k).
12 Get rid of all the s_k, q_k, L_k values.
13 Set k = k - 1; stop if k has reached the predetermined number of classes.
14 Start the next iteration:
– either by the optional updates (steps 7 and 8), or
– by directly updating L_k(i,j) again (step 9).

17 Moving Words Around
Improving the Mutual Information:
– take a word from one class and move it to another (i.e., two classes change: the moved-from and the moved-to), and compute I_new(D,E); keep the change permanent if I_new(D,E) > I(D,E)
– keep moving words until no move improves I(D,E)
Do it at every iteration, or at every m-th iteration.
Use similar "smart" methods as for merging.
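A brute-force version of this word-moving step, before applying any of the "smart" incremental methods, can look like the following hill-climb (a sketch; the function names and the toy assignment are mine):

```python
import math
from collections import Counter

def class_mi(tokens, assign):
    # I(D,E) of the class sequence induced by the word-to-class assignment
    pairs = Counter((assign[l], assign[r]) for l, r in zip(tokens, tokens[1:]))
    n = sum(pairs.values())
    cl, cr = Counter(), Counter()
    for (l, r), v in pairs.items():
        cl[l] += v
        cr[r] += v
    return sum(v / n * math.log(n * v / (cl[l] * cr[r])) for (l, r), v in pairs.items())

def improve_by_moves(tokens, assign):
    # Hill-climb: try moving each word type to each other class;
    # keep a move only if it strictly improves I(D,E); stop when no move helps.
    classes = set(assign.values())
    improved = True
    while improved:
        improved = False
        for w in sorted(set(tokens)):
            best_c, best_i = assign[w], class_mi(tokens, assign)
            for c in classes:
                if c == assign[w]:
                    continue
                trial = dict(assign)
                trial[w] = c
                mi = class_mi(tokens, trial)
                if mi > best_i + 1e-12:
                    best_c, best_i = c, mi
            if best_c != assign[w]:
                assign[w] = best_c
                improved = True
    return assign
```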

18 Using the Hierarchy
The natural form of the classes follows from the sequence of merges, which yields a binary tree over the words, e.g.:
evaluation, assessment, analysis, understanding, opinion

19 Numbering the Classes (within the Hierarchy)
Binary branching: assign 0/1 to the left/right branch at every node of the merge tree (over, e.g., evaluation, assessment, analysis, understanding, opinion), padding short codes with 0.
The prefix determines the class: 00 ~ {evaluation, assessment}
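Given the merge history, the 0/1 numbering can be produced by replaying the merges into a binary tree and reading off the paths (a sketch; here Hist is simplified to a plain list of (a,b) pairs, earliest merge first, with b merged into a, and merging is assumed to have continued down to a single class so that all codes share one root):

```python
def bit_codes(vocab, history):
    # Replay the merges: each merge (a, b) makes a node whose left child is the
    # old subtree of a (branch bit 0) and right child the subtree of b (bit 1).
    tree = {w: w for w in vocab}
    for a, b in history:
        tree[a] = (tree[a], tree.pop(b))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix
    for root in tree.values():
        walk(root, "")
    # pad with 0s on the right so all codes have equal width
    width = max(len(p) for p in codes.values())
    return {w: p.ljust(width, "0") for w, p in codes.items()}
```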