1 Problems in large-scale computer vision
David Crandall, School of Informatics and Computing, Indiana University

2 Research questions
Given huge collections of images online, how can we analyze images and non-visual metadata to:
– Help users organize, browse, and search?
– Mine information about the state of the world and human behavior?

3 Common computational problems
1. Image-by-image processing (e.g. recognition)
– Large-scale, but easily parallelizable
2. Iterative algorithms (e.g. learning)
– Sometimes few but long-running iterations
– Sometimes many lightweight iterations
3. Inference on graphs (e.g. reconstruction, learning)
– Small graphs with huge label spaces
– Large graphs with small label spaces
– Large graphs with large label spaces

4 Scene classification
E.g.: find images containing snow in a collection of ~100 million images.
– Typical approach: extract features and run a classifier (typically an SVM) on each image
– We use Hadoop, typically with a trivial reducer, images stored in giant HDFS sequence files, and the C++ MapReduce bindings; the per-image scoring step is sketched below
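A minimal sketch of the per-image work inside such a mapper, assuming features are precomputed and a linear SVM (weights w, bias b) was trained offline. The ImageRecord type, contains_snow function, and toy data are hypothetical; the Hadoop plumbing and HDFS I/O around this loop are elided.

```cpp
// Sketch of the per-image "map" step: score each image with a linear SVM.
// All data here is toy; real features/models would be loaded from HDFS.
#include <cstdio>
#include <vector>

// Hypothetical record: an image ID plus its extracted feature vector.
struct ImageRecord {
    long long id;
    std::vector<double> features;
};

// Linear SVM decision function: sign(w . x + b).
bool contains_snow(const std::vector<double>& w, double b,
                   const std::vector<double>& x) {
    double score = b;
    for (size_t i = 0; i < x.size(); ++i)
        score += w[i] * x[i];
    return score > 0.0;
}

int main() {
    std::vector<double> w = {0.8, -0.3, 0.5};  // toy trained weights
    double b = -0.2;
    std::vector<ImageRecord> batch = {
        {1, {0.9, 0.1, 0.7}},
        {2, {0.1, 0.9, 0.2}},
    };
    // The "map" step emits (image id, label); the reducer is trivial.
    for (const auto& rec : batch)
        std::printf("%lld\t%d\n", rec.id, contains_snow(w, b, rec.features));
    return 0;
}
```

In the real job, the inputs would come from HDFS sequence files and the emitted (id, label) pairs would pass through the trivial reducer unchanged.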

5 Geolocation
Given an image, where on Earth was it taken?
– Match against thousands of place models, or against hundreds of attribute classifiers (e.g. indoor vs. outdoor, city vs. rural, etc.)
– Again we use Hadoop with a trivial reducer
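For illustration, a hedged sketch of the matching step: score one image's features against many linear place models and keep the best. The PlaceModel type and all weights are hypothetical stand-ins; the real place models and attribute classifiers are more involved.

```cpp
// Score an image against many per-place linear models; return the argmax.
// Names and data are toy placeholders, not the actual models.
#include <cstdio>
#include <string>
#include <vector>

struct PlaceModel {
    std::string name;
    std::vector<double> w;  // linear classifier weights
    double b;               // bias
};

// Pick the place whose classifier gives the highest score for features x.
std::string best_place(const std::vector<PlaceModel>& models,
                       const std::vector<double>& x) {
    std::string best;
    double best_score = -1e300;
    for (const auto& m : models) {
        double s = m.b;
        for (size_t i = 0; i < x.size(); ++i) s += m.w[i] * x[i];
        if (s > best_score) { best_score = s; best = m.name; }
    }
    return best;
}

int main() {
    std::vector<PlaceModel> models = {
        {"eiffel_tower", {0.9, 0.1}, 0.0},
        {"times_square", {0.2, 0.8}, 0.0},
    };
    std::vector<double> x = {0.7, 0.3};  // toy image features
    std::printf("%s\n", best_place(models, x).c_str());
    return 0;
}
```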

6 Learning these models
Many recognition approaches use “bags-of-words”: a vector space model over “visual words”. To learn, we need to:
1. Generate a vocabulary of visual words (e.g. with k-means)
2. Extract features from training images
3. Learn a classifier
Our computational approach:
1. For k-means, use iterative MapReduce (Twister – J. Qiu); see the sketch after this list
2. For feature extraction, MapReduce with a trivial reducer
3. For learning classifiers, use off-the-shelf packages (can be quite slow)
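A serial sketch of one k-means (Lloyd) iteration, the unit of work an iterative MapReduce framework like Twister repeats: the nearest-center assignment corresponds to the map phase and the mean recomputation to the reduce phase. Data and dimensions here are toy.

```cpp
// One k-means iteration: assign points to nearest center ("map"),
// then recompute each center as the mean of its points ("reduce").
#include <cstdio>
#include <vector>

using Point = std::vector<double>;

double sqdist(const Point& a, const Point& b) {
    double d = 0.0;
    for (size_t i = 0; i < a.size(); ++i) {
        double t = a[i] - b[i];
        d += t * t;
    }
    return d;
}

void kmeans_step(const std::vector<Point>& pts, std::vector<Point>& centers) {
    size_t k = centers.size(), dim = centers[0].size();
    std::vector<Point> sums(k, Point(dim, 0.0));
    std::vector<int> counts(k, 0);
    for (const auto& p : pts) {
        size_t best = 0;
        for (size_t c = 1; c < k; ++c)
            if (sqdist(p, centers[c]) < sqdist(p, centers[best])) best = c;
        for (size_t i = 0; i < dim; ++i) sums[best][i] += p[i];
        counts[best]++;
    }
    for (size_t c = 0; c < k; ++c)
        if (counts[c] > 0)
            for (size_t i = 0; i < dim; ++i)
                centers[c][i] = sums[c][i] / counts[c];
}

int main() {
    std::vector<Point> pts = {{0, 0}, {0, 1}, {10, 10}, {10, 11}};
    std::vector<Point> centers = {{0, 0}, {5, 5}};
    for (int it = 0; it < 10; ++it) kmeans_step(pts, centers);
    for (const auto& c : centers) std::printf("(%g, %g)\n", c[0], c[1]);
    return 0;
}
```

In the iterative MapReduce setting, the accumulated (sum, count) pairs per center are what flows from mappers to reducers between iterations.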

7 Inference on graphical models
Statistical graphical models are widely used in vision.
– Basic idea: vertices are variables, with some known and some unknown; edges are probabilistic relationships
– Inference is NP-hard in general
Many approximation algorithms are based on message passing, e.g. loopy discrete belief propagation (one message update is sketched below):
– The number of messages is proportional to the number of edges in the graph
– Messages can be large: size depends on the variable label space
– The number of iterations depends (roughly) on the diameter of the graph
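A hedged sketch of a single min-sum (loopy) belief propagation message update over discrete labels; the unary, pairwise, and incoming-message tables are toy.

```cpp
// One min-sum BP message update:
// m_{u->v}[xv] = min_{xu} ( unary_u[xu] + pairwise[xu][xv]
//                           + sum of messages into u, except from v ).
#include <cstdio>
#include <vector>

std::vector<double> update_message(
    const std::vector<double>& unary_u,
    const std::vector<std::vector<double>>& pairwise,
    const std::vector<std::vector<double>>& other_msgs_into_u) {
    size_t Lu = unary_u.size(), Lv = pairwise[0].size();
    std::vector<double> msg(Lv, 1e300);
    for (size_t xu = 0; xu < Lu; ++xu) {
        double base = unary_u[xu];
        for (const auto& m : other_msgs_into_u) base += m[xu];
        for (size_t xv = 0; xv < Lv; ++xv) {
            double c = base + pairwise[xu][xv];
            if (c < msg[xv]) msg[xv] = c;
        }
    }
    return msg;
}

int main() {
    // Two labels each for u and v; Potts-style pairwise cost.
    std::vector<double> unary_u = {0.5, 2.0};
    std::vector<std::vector<double>> pairwise = {{0.0, 1.0}, {1.0, 0.0}};
    std::vector<std::vector<double>> incoming = {{0.1, 0.9}};
    auto m = update_message(unary_u, pairwise, incoming);
    std::printf("m(0)=%g m(1)=%g\n", m[0], m[1]);
    return 0;
}
```

Note the double loop over labels: a naive update costs O(|L|²) per message, which is exactly what the min-convolution trick on slide 12 removes.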

8 Pose and part-based recognition
Represent objects in terms of parts. This can be posed as a graphical-model inference problem:
– Small number of variables (vertices) and constraints (edges), but a large label space (millions and up)
– We use a single-node, multi-threaded implementation with barriers between iterations (pattern sketched below)
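A minimal sketch of that single-node pattern, assuming per-part state stored in a flat array: worker threads update disjoint slices each iteration, and joining the threads serves as the barrier before the next iteration. The thread count and the placeholder update are purely illustrative.

```cpp
// Single-node, multi-threaded iteration pattern with a barrier between
// iterations (realized here by joining the workers each round).
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    const int num_threads = 4, num_iterations = 3;
    std::vector<double> labels(1000, 0.0);  // stand-in for per-part state

    for (int it = 0; it < num_iterations; ++it) {
        std::vector<std::thread> workers;
        size_t chunk = labels.size() / num_threads;
        for (int t = 0; t < num_threads; ++t) {
            size_t lo = t * chunk;
            size_t hi = (t == num_threads - 1) ? labels.size() : lo + chunk;
            workers.emplace_back([&labels, lo, hi] {
                for (size_t i = lo; i < hi; ++i)
                    labels[i] += 1.0;  // placeholder for a message update
            });
        }
        for (auto& w : workers) w.join();  // barrier between iterations
        std::printf("iteration %d done\n", it);
    }
    return 0;
}
```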

9 Fine-grained object recognition
Classify amongst similar objects (e.g. species of birds).
– How can we learn discriminative properties of these objects automatically?
– Model each training image as a node, with edges between all pairs; the goal is to label each image with a feature that is found in all positive examples and no negative examples
We use an off-the-shelf solver, with some additional multi-threading on a single node; it is still very slow.
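For intuition only, here is a brute-force illustration of that labeling objective: enumerate candidate features and keep those present in every positive image and absent from every negative one. This is not the solver actually used (the real problem is posed on the graph above and handled by an off-the-shelf package); all IDs and data are toy.

```cpp
// Brute-force check of the objective: a feature is "discriminative" if it
// appears in all positive images and in no negative image. Toy data only.
#include <cstdio>
#include <set>
#include <vector>

int main() {
    // Each image is represented by the set of visual-feature IDs it contains.
    std::vector<std::set<int>> positives = {{1, 2, 5}, {1, 5, 7}, {1, 5}};
    std::vector<std::set<int>> negatives = {{2, 3}, {5, 6}};

    for (int f = 0; f < 10; ++f) {  // candidate feature IDs
        bool in_all_pos = true, in_no_neg = true;
        for (const auto& img : positives)
            if (!img.count(f)) { in_all_pos = false; break; }
        for (const auto& img : negatives)
            if (img.count(f)) { in_no_neg = false; break; }
        if (in_all_pos && in_no_neg)
            std::printf("feature %d is discriminative\n", f);
    }
    return 0;
}
```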

10 Large-scale 3D reconstruction

11 Pose as inference problem
View reconstruction as statistical inference over a graphical model:
– Vertices are cameras and points
– Edges are relative camera/point correspondences (estimated through point matching)
– Inference: label each image with a camera pose and each point with a 3-d position, such that the constraints are satisfied

12 Computation
Our graphs have ~100,000 vertices, ~1,000,000 edges, and ~100,000 possible discrete labels.
– We reduce computation using exact algorithmic tricks (min-convolutions), from O(|E|·|L|²) to O(|E|·|L|); the idea is sketched below
– Huge amount of data: total message size is >1 TB per iteration
We parallelize using iterative MapReduce:
– Hadoop, plus shell scripts to drive the iterations
– Mappers take in messages from the last iteration and compute outgoing messages
– Reducers collate and route messages
– Messages live on HDFS between iterations
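The min-convolution speedup is easiest to see for the simplest cost family: if the pairwise cost is c·|xu − xv|, the O(|L|²) minimization collapses to two linear sweeps over the labels. This is a hedged sketch of the idea, not the exact cost model used in the reconstruction system (their tricks cover other cost families as well).

```cpp
// Min-convolution in O(L) for a linear pairwise cost:
// m[xv] = min_xu ( f[xu] + c * |xv - xu| ), via forward/backward sweeps.
#include <algorithm>
#include <cstdio>
#include <vector>

std::vector<double> min_convolve_l1(const std::vector<double>& f, double c) {
    std::vector<double> m = f;
    for (size_t i = 1; i < m.size(); ++i)     // forward sweep
        m[i] = std::min(m[i], m[i - 1] + c);
    for (size_t i = m.size() - 1; i-- > 0; )  // backward sweep
        m[i] = std::min(m[i], m[i + 1] + c);
    return m;
}

int main() {
    std::vector<double> f = {5.0, 0.0, 4.0, 9.0, 1.0};
    auto m = min_convolve_l1(f, 2.0);
    for (double v : m) std::printf("%g ", v);  // prints: 2 0 2 3 1
    std::printf("\n");
    return 0;
}
```

With ~100,000 labels per message, replacing the quadratic inner loop with linear sweeps like these is what makes the per-edge message updates tractable.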

13 Common computational problems
1. Image-by-image processing (e.g. recognition)
– Large-scale, but easily parallelizable
2. Iterative algorithms (e.g. learning)
– Sometimes few but long-running iterations
– Sometimes many lightweight iterations
3. Inference on graphs (e.g. reconstruction, learning)
– Small graphs with huge label spaces
– Large graphs with small label spaces
– Large graphs with large label spaces

