Presentation transcript:

Understanding Errors in Approximate Distributed Latent Dirichlet Allocation
Alexander Ihler, David Newman
Dept. of Computer Science, University of California, Irvine; NICTA Victoria Research Lab, U. Melbourne, Australia

Latent Dirichlet Allocation
Topic models for text corpora:
- Topics are bags of words; documents are mixtures of topics.
[Figure: two example topics ("case, filed, injunction, court, suit" and "security, check, background, privacy, information") and two documents mixing them. Document 1: "filed suit privacy injunction information case case court security injunction security privacy…" Document 2: "suit case injunction filed court filed case court background court suit…"]
- Massive data sets and linear complexity call for parallel or distributed algorithms (Nallapati et al., 2007; Newman et al., 2008; Asuncion et al., 2009; Wang et al., 2009; Yan et al., 2009; …).
Gibbs sampling over the topic assignments z_di:
- Un-collapsed Gibbs sampling is easy to make parallel.
- The collapsed sampler converges more quickly, but is fundamentally sequential: each sample depends (slightly) on all the others.

AD-LDA and modifications
AD-LDA: just run collapsed Gibbs sampling in parallel anyway (from Newman et al., 2008; 2009; and extensions).
- Distribute the documents across P nodes and ignore the dependence between parallel samples.
- (z, a) are local, not shared; (b, c) are copied across all nodes.
In practice this works great: anecdotal examples perform the same as LDA. But can we know how it will do on new data?
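To make the epoch structure concrete, here is a minimal Python sketch, not the authors' code: it assumes the documents and the vocabulary are each split into P contiguous blocks with NumPy, and a hypothetical `sample_block` callback stands in for the collapsed Gibbs updates restricted to one block. In epoch e, the processor owning document block p works on word block (p + e) mod P, so no two concurrent jobs share documents or words.

```python
# Minimal sketch (not from the poster) of the "orthogonal epochs" schedule.
# Assumptions: documents are split into P row blocks and the vocabulary into
# P column blocks; `sample_block` is a hypothetical callback standing in for
# the collapsed Gibbs updates restricted to one block.
import numpy as np

def orthogonal_epoch_schedule(P):
    """Epoch e assigns processor p the pair (doc_block=p, word_block=(p+e) % P),
    so blocks that run concurrently never share a document row or word column."""
    return [[(p, (p + e) % P) for p in range(P)] for e in range(P)]

def one_sweep(num_docs, vocab_size, P, sample_block):
    """One full pass over the data: P epochs, each covering P disjoint blocks."""
    doc_blocks = np.array_split(np.arange(num_docs), P)
    word_blocks = np.array_split(np.arange(vocab_size), P)
    for epoch in orthogonal_epoch_schedule(P):
        # In a real implementation the P jobs in an epoch run in parallel;
        # they are shown sequentially here for clarity.
        for p, w in epoch:
            sample_block(doc_blocks[p], word_blocks[w])

if __name__ == "__main__":
    # With P = 3, epoch 0 covers the main diagonal of blocks and later epochs
    # the shifted diagonals, consistent with the 3x3 block figure on the poster.
    for e, jobs in enumerate(orthogonal_epoch_schedule(3)):
        print(f"epoch {e}: {jobs}")
```

Because each processor keeps its own document block across epochs and the word blocks never collide within an epoch, each slice of the word-topic counts is touched by only one job at a time, matching the statement above that "b" is no longer shared and only "c" is.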
There is no way to know without running sequential LDA.

First modification: additional partitioning.
- Subdivide the data across both documents and words, and organize the computation into orthogonal epochs so that no two concurrent jobs overlap (a scheduling sketch follows this list).
[Figure: the Documents x Words matrix of assignments z_di split into blocks, labeled 1A 2A 3A / 3B 1B 2B / 2C 3C 1C, so that blocks processed in the same epoch share no documents and no words.]
- Less work per epoch, but fewer inconsistencies: "b" is no longer shared, only "c", a bulk, stable quantity; if it were constant, the samples would be exact.

Error bounds
How can we maintain an error bound? Settle for a "per-step" bound: what is the probability of a mistake at each step? It is not cumulative, but on an equal footing with other sources of error. This is still hard: the distributions are changing at each step, and we can't see the other processors until they're done.
Instead, compute a retrospective bound using Hilbert's projective metric (HPM), which has been used to analyze belief propagation and has nice properties. Separate the "constant" counts from the "changing" counts, measure the error in the "constant" part at the end, and use this to bound the error at every step:
- Start with initial counts a, b, c_0 = v_0 + h_0.
- P=1 updates a, b, and v_0 -> v_1.
- AD-LDA: P=2 uses v_0 instead of v_1, and updates a, b, h.
- When done, measure d(v_0, v_1); this is O(T) work.
Properties of the HPM (a numerical sketch appears at the end of this transcript):
- Bounds the L1 norm.
- Adding a non-negative vector h never increases the metric.
- Invariant to inversion, to element-wise scaling, and to scalar normalization.

Scaling and Experimental Results
[Figure: speedup factor vs. number of cores for the KOS, NIPS, and Enron corpora, compared against ideal speedup.]
[Figures: sample error probability vs. number of data, showing the bound and the measured error for the KOS, NIPS, and Enron corpora, each against a reference line.]
- We can also compute the "true" error (just not in parallel); its shape matches that of the error bound, with a peak early on that falls to a steady-state level.
- Parallel efficiency and scaling are similar to AD-LDA, with the same strengths and weaknesses (local vs. shared data).
- Experiments use a shared-memory, multicore implementation and investigate scaling with the data set size N, the number of processors/blocks P, and the number of topics T.
- Scaling is fairly predictable using a simple approximation, though it may deteriorate for very large T.
- Extensions to DP models?
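The retrospective measurement d(v_0, v_1) is cheap to compute. Below is a small numerical sketch written for this transcript rather than taken from the paper: `hilbert_metric` implements the standard log-ratio form of Hilbert's projective metric for strictly positive vectors, and the synthetic vectors `v0` and `v1` stand in for the "changing" counts a processor used versus what they actually became. Only the three invariance properties from the list above are checked; the L1-norm bound and the non-negative-vector property are stated on the poster but not reproduced here.

```python
# Minimal numerical sketch (not the authors' code) of Hilbert's projective
# metric between two strictly positive vectors, the quantity d(v0, v1)
# measured retrospectively at the end of an epoch.
import numpy as np

def hilbert_metric(u, v):
    """d(u, v) = max_i log(u_i / v_i) - min_i log(u_i / v_i).
    Zero exactly when u and v are scalar multiples of each other."""
    r = np.log(u) - np.log(v)
    return r.max() - r.min()

rng = np.random.default_rng(0)
T = 10                                   # number of topics (synthetic example)
v0 = rng.gamma(2.0, size=T)              # "changing" counts a processor used
v1 = v0 + rng.gamma(0.1, size=T)         # what those counts actually became

d = hilbert_metric(v0, v1)
print(f"d(v0, v1) = {d:.4f}")

# The projective invariances listed on the poster, checked numerically:
w = rng.gamma(2.0, size=T)
assert np.isclose(hilbert_metric(3.0 * v0, 0.5 * v1), d)   # scalar normalization
assert np.isclose(hilbert_metric(w * v0, w * v1), d)       # element-wise scaling
assert np.isclose(hilbert_metric(1.0 / v0, 1.0 / v1), d)   # inversion
```

Computing d(v_0, v_1) is a single pass over the T per-topic counts, which is the O(T) work quoted in the error-bounds section.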