
Using Large-Vocabulary Classifiers William W. Cohen

Using Large-vocabulary Naïve Bayes

Train:
– For each example id, y, x1,…,xd in train: print event-counter update "messages"
– Sort the event-counter update "messages"
– Scan and add the sorted messages, and output the final counter values
Model size: O(|V|)

Test:
– Initialize a HashSet NEEDED and a hashtable C
– For each example id, y, x1,…,xd in test: add x1,…,xd to NEEDED
  (Time: O(n2), n2 = size of test; Memory: same)
– For each event, C(event) in the summed counters: if the event involves a NEEDED term x, read it into C
  (Time: O(n2); Memory: same)
– For each example id, y, x1,…,xd in test: for each y′ in dom(Y), compute log Pr(y′, x1,…,xd) = …. [For assignment]
  (Time: O(n2); Memory: same)

Review of NB algorithms

HW  | Train events | Test events             | Suggested data
1A  | HashMap      | HashMap                 | RCV1
1B  | Msgs → Disk  | HashMap (for subset)    | Wikipedia
--- | Msgs → Disk  | Msgs on Disk (coming….) | ---

Outline
– Even more on stream-and-sort and naïve Bayes
– Another problem: "meaningful" phrase finding
– Implementing phrase finding efficiently
– Some other phrase-related problems

Last Week

How to implement Naïve Bayes:
– Time is linear in the size of the data (one scan!)
– We need to count, e.g., C(X=word ^ Y=label)
How to implement Naïve Bayes with a large vocabulary and small memory:
– General technique: "stream and sort"
– Very little seeking (only in the merges of merge sort)
– Memory-efficient

Flaw: Large-vocabulary Naïve Bayes is Expensive to Use

For each example id, y, x1,…,xd in train:
– Print event-counter update "messages"
Sort the event-counter update "messages"
Scan and add the sorted messages, and output the final counter values
For each example id, y, x1,…,xd in test:
– For each y′ in dom(Y): compute log Pr(y′, x1,…,xd) = …

Model size: min(O(n), O(|V|·|dom(Y)|))

The workaround I suggested

For each example id, y, x1,…,xd in train:
– Print event-counter update "messages"
Sort the event-counter update "messages"
Scan and add the sorted messages, and output the final counter values

Initialize a HashSet NEEDED and a hashtable C
For each example id, y, x1,…,xd in test:
– Add x1,…,xd to NEEDED
For each event, C(event) in the summed counters:
– If the event involves a NEEDED term x, read it into C
For each example id, y, x1,…,xd in test:
– For each y′ in dom(Y): compute log Pr(y′, x1,…,xd) = ….
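A minimal sketch of this workaround in Python. It assumes the summed counters sit in a file of "event<TAB>count" lines (events like "X=w^Y=y" or "Y=y") and test examples in a file of "id<TAB>label<TAB>w1 w2 …" lines; the file names and formats are illustrative assumptions, not from the slides.

    # Pass 1 over the test set: collect the vocabulary we will actually need.
    needed = set()
    with open("test.dat") as f:
        for line in f:
            _id, _y, words = line.rstrip("\n").split("\t")
            needed.update(words.split())

    # One pass over the summed counters: keep a counter only if its event
    # involves a NEEDED term (label-only events "Y=y" are always kept).
    C = {}
    with open("summedCounters.dat") as f:
        for line in f:
            event, count = line.rstrip("\n").split("\t")
            if event.startswith("X="):
                term = event[len("X="):].split("^", 1)[0]
                if term not in needed:
                    continue
            C[event] = int(count)

    # Pass 2 over the test set: classify each example with the (small)
    # in-memory table C, computing log Pr(y', x1,...,xd) per label; omitted.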

Can we do better?

What we'd like

Test data (one record per example):
  id1  w1,1 w1,2 w1,3 …. w1,k1
  id2  w2,1 w2,2 w2,3 ….
  id3  w3,1 w3,2 ….
  id4  w4,1 w4,2 …
  id5  w5,1 w5,2 …
  ...

Event counts (one record per event):
  X=w1^Y=sports
  X=w1^Y=worldNews
  X=..
  X=w2^Y=…
  X=…
  …

What we'd like: each test record annotated with the counters for its own words:
  id1  w1,1 w1,2 w1,3 …. w1,k1   C[X=w1,1^Y=sports]=5245, C[X=w1,1^Y=..], C[X=w1,2^…]
  id2  w2,1 w2,2 w2,3 ….         C[X=w2,1^Y=….]=1054, …, C[X=w2,k2^…]
  id3  w3,1 w3,2 ….              C[X=w3,1^Y=….]=…
  …

Can we do better?

Step 1: group the counters by word w. How: stream and sort:
– For each counter C[X=w^Y=y]=n, print "w C[Y=y]=n"
– Sort, and build a list of the values associated with each key w

The result is like an inverted index:

w        | Counts associated with w
aardvark | C[w^Y=sports]=2
agent    | C[w^Y=sports]=1027, C[w^Y=worldNews]=564
…        | …
zynga    | C[w^Y=sports]=21, C[w^Y=worldNews]=4464
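A minimal sketch of the collection phase of Step 1 in Python, assuming the re-keyed lines "w<TAB>C[Y=y]=n" have already been sorted by word (e.g., with Unix sort) and arrive on stdin; the line format is an assumption.

    import sys
    from itertools import groupby

    def word(line):
        return line.split("\t", 1)[0]

    # Input is sorted by word, so all of a word's counters are adjacent;
    # build one inverted-index-style record per word.
    for w, lines in groupby(sys.stdin, key=word):
        counts = [line.rstrip("\n").split("\t", 1)[1] for line in lines]
        print(w + "\t" + ",".join(counts))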

If these records were in a key-value DB we would know what to do….

Test data (id1 w1,1 w1,2 w1,3 …. w1,k1; id2 …; …) on one side; the record of all event counts for each word (the table above) on the other.

Step 2: stream through the test data and, for each test case idi wi,1 wi,2 wi,3 …. wi,ki, request the event counters needed to classify idi from the event-count DB, then classify using the answers.

Is there a stream-and-sort analog of this request-and-answer pattern?

That is: can Step 2 (request the counters needed for each test case, then classify using the answers) be done without a key-value DB, using only streaming and sorting?

Recall: Stream and Sort Counting: sort the messages so the recipient can stream through them

– Machine A (counting logic): reads example1, example2, example3, … and emits counter-update messages "C[x] += D"
– Machine B: sorts the messages
– Machine C (logic to combine counter updates): streams through the sorted messages (C[x1] += D1, C[x1] += D2, …) and adds up the updates for each key
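A minimal sketch of Machine C's combining step in Python, assuming the sorted update messages arrive on stdin one per line as "x<TAB>delta"; the line format is an assumption.

    import sys
    from itertools import groupby

    def key(line):
        return line.split("\t", 1)[0]

    # Input is sorted, so all updates for one counter are adjacent.
    for x, lines in groupby(sys.stdin, key=key):
        total = sum(int(line.rstrip("\n").split("\t")[1]) for line in lines)
        print(f"{x}\t{total}")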

Is there a stream-and-sort analog of this request-and-answer pattern?

Instead of querying a DB, the classification logic can emit request messages, one per test word:
  w1,1 counters to id1
  w1,2 counters to id1
  …
  wi,j counters to idi
  …

Is there a stream-and-sort analog of this request-and-answer pattern?

Test data, concretely:
  id1  found an aardvark in zynga’s farmville today!
  id2  …
  id3  ….
  …

The classification logic emits one request per test word, with the value prefixed by "~":
  found  ~ctr to id1
  aardvark  ~ctr to id1
  …
  today  ~ctr to id1
  …

"~" is the last printable ASCII character, so with export LC_COLLATE=C it sorts after anything else under Unix sort; for each word, the requests will land right after that word's counter record.

Now combine the counter records with the request messages, and sort.
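A quick check of that sorting claim, as a Python sketch (Python compares strings by code point, which matches Unix sort under LC_COLLATE=C):

    # A counter record and a request for the same word: the "~"-prefixed
    # request sorts after the record, because "~" (0x7E) is the highest
    # printable ASCII character.
    lines = [
        "aardvark\t~ctr to id1",
        "aardvark\tC[w^Y=sports]=2",
        "agent\t~ctr to id345",
        "agent\tC[w^Y=sports]=1027",
    ]
    for line in sorted(lines):
        print(line)
    # aardvark  C[w^Y=sports]=2
    # aardvark  ~ctr to id1
    # agent     C[w^Y=sports]=1027
    # agent     ~ctr to id345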

A stream-and-sort analog of the request-and-answer pattern…

After combining and sorting, each word's counter record is immediately followed by all the requests for that word:

w        | Counts / requests
aardvark | C[w^Y=sports]=2
aardvark | ~ctr to id1
agent    | C[w^Y=sports]=…
agent    | ~ctr to id345
agent    | ~ctr to id9854
agent    | ~ctr to id34742
…        | …
zynga    | C[…]
zynga    | ~ctr to id1

This merged stream is the input to the request-handling logic.

A stream-and-sort analog of the request-and-answer pattern…

Request-handling logic:

    previousKey = somethingImpossible
    for each (key, val) in input:
        if key == previousKey:
            Answer(recordForPrevKey, val)
        else:
            previousKey = key
            recordForPrevKey = val

    define Answer(record, request):
        # request is "~ctr to id"
        find the id in the request
        print "id ~ctr for request is record"
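A runnable version of this logic, sketched in Python: it assumes the combined, sorted stream arrives on stdin as "word<TAB>value" lines, where a word's first line is its counter record and the following lines are requests "~ctr to <id>".

    import sys

    previous_key = None          # "somethingImpossible"
    record_for_prev_key = None

    for line in sys.stdin:
        key, val = line.rstrip("\n").split("\t", 1)
        if key == previous_key:
            # val is a request "~ctr to <id>": answer it with the stored record.
            req_id = val.split()[-1]
            print(f"{req_id}\t~ctr for {key} is {record_for_prev_key}")
        else:
            # val is the counter record for a new word.
            previous_key = key
            record_for_prev_key = val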

A stream-and-sort analog of the request-and-answer pattern…

Running the request-handling logic over the merged stream produces one answer per request:

Output:
  id1  ~ctr for aardvark is C[w^Y=sports]=2
  …
  id1  ~ctr for zynga is ….
  …

A stream-and-sort analog of the request-and-answer pattern…

We now have the answers:
  id1  ~ctr for aardvark is C[w^Y=sports]=2
  …
  id1  ~ctr for zynga is ….
  …

and the original test data:
  id1  found an aardvark in zynga’s farmville today!
  id2  …
  …

How do we get each test example together with its answers? Combine and sort again: ????

What we'd wanted:
  id1  w1,1 w1,2 w1,3 …. w1,k1   C[X=w1,1^Y=sports]=5245, C[X=w1,1^Y=..], C[X=w1,2^…]
  id2  w2,1 w2,2 w2,3 ….         C[X=w2,1^Y=….]=1054, …, C[X=w2,k2^…]
  id3  w3,1 w3,2 ….              C[X=w3,1^Y=….]=…
  …

What we ended up with, after the final combine and sort:

Key | Value
id1 | found aardvark zynga farmville today
id1 | ~ctr for aardvark is C[w^Y=sports]=2
id1 | ~ctr for found is C[w^Y=sports]=1027, C[w^Y=worldNews]=564
id1 | …
id2 | w2,1 w2,2 w2,3 ….
id2 | ~ctr for w2,1 is …
…   | …

That is: each test example streams by together with all the counters needed to classify it.
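A sketch of the final consumer of that stream in Python (roughly what testNBUsingRequests would do), assuming the re-sorted "id<TAB>value" lines group each example's word list with its answered counters; the parsing details are assumptions, and the actual log-probability computation is left as in the assignment.

    import sys
    from itertools import groupby

    def key(line):
        return line.split("\t", 1)[0]

    for ex_id, lines in groupby(sys.stdin, key=key):
        words, counters = [], {}
        for line in lines:
            _id, val = line.rstrip("\n").split("\t", 1)
            if val.startswith("~ctr for "):
                # "~ctr for <word> is <counter record>"
                _, _, w, _, record = val.split(" ", 4)
                counters[w] = record
            else:
                words = val.split()
        # Compute log Pr(y', x1,...,xd) for each label y' from `counters`
        # and print the argmax; omitted here.
        print(f"{ex_id}\tclassified using {len(counters)} counter records")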

Implementation summary

java CountForNB train.dat … > eventCounts.dat
java CountsByWord eventCounts.dat | sort | java CollectRecords > words.dat
java requestWordCounts test.dat | cat - words.dat | sort | java answerWordCountRequests | cat - test.dat | sort | testNBUsingRequests

Step by step:

– train.dat holds the training examples (id1 w1,1 w1,2 w1,3 …. w1,k1; id2 …; …); CountForNB turns them into event counts (X=w1^Y=sports, X=w1^Y=worldNews, …) in eventCounts.dat.
– CountsByWord re-keys the counts by word; after sorting, CollectRecords builds words.dat, with one record per word:

  w        | Counts associated with w
  aardvark | C[w^Y=sports]=2
  agent    | C[w^Y=sports]=1027, C[w^Y=worldNews]=564
  …        | …
  zynga    | C[w^Y=sports]=21, C[w^Y=worldNews]=4464

– requestWordCounts reads test.dat and emits one request per test word (found ~ctr to id1, aardvark ~ctr to id2, …); cat - words.dat | sort merges the requests into the word records, and answerWordCountRequests streams through the merge, printing one answer per request:

  id1  ~ctr for aardvark is C[w^Y=sports]=2
  …
  id1  ~ctr for zynga is ….
  …

– Finally, cat - test.dat | sort merges the answers back with the test data, so testNBUsingRequests sees, for each key id, the example's words plus every counter it needs:

  Key | Value
  id1 | found aardvark zynga farmville today
  id1 | ~ctr for aardvark is C[w^Y=sports]=2
  id1 | ~ctr for found is C[w^Y=sports]=1027, C[w^Y=worldNews]=564
  id2 | w2,1 w2,2 w2,3 …. ~ctr for w2,1 is …
  …   | …
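A sketch of what requestWordCounts might look like in Python, assuming test.dat has one example per line as "id<TAB>w1 w2 …"; the file format is an assumption.

    import sys

    # Emit one request per (word, example) pair, keyed by the word so that
    # after sorting, each request lands next to that word's counter record.
    with open(sys.argv[1]) as f:
        for line in f:
            ex_id, words = line.rstrip("\n").split("\t", 1)
            for w in words.split():
                print(f"{w}\t~ctr to {ex_id}")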