CMPT-884 Jan 18, 2010
Video Copy Detection using Hadoop
Presented by: Cameron Harvey, Naghmeh Khodabakhshi
CMPT 820, December 2, 2010
Introduction
Video copy detection is an important tool for detecting copyright infringement.
A copy can be obtained from the original through a set of transformations such as:
- Video cropping or scaling
- Gamma shift
- Blurring
- Addition of a logo
- Changes in quality (noise, frame rate, …)
Copy Detection Process
There are two stages in the copy detection process:
1. Signature extraction: the video content is used to create a unique signature of the reference and the query video.
2. Signature matching: a distance metric compares the signatures of the videos. If they are close enough, the query is considered to be a copy.
SURF: Speeded-Up Robust Features
A method to find and extract a set of interest points from an image.
It is based on SIFT (Scale-Invariant Feature Transform).
Our Solution
We implemented the copy detection algorithm of Roth et al. [1], but parallelized it to speed up processing.
For each frame in the video, we divide the image into a 4x4 grid, giving 16 regions.
[1] G. Roth, R. Laganière, P. Lambert, I. Lakhmiri, and T. Janati. A simple but effective approach to video copy detection. In CRV '10: Proceedings of the 2010 Canadian Conference on Computer and Robot Vision, pages 63–70, Washington, DC, USA, 2010. IEEE Computer Society.
Our Solution (2)
A signature for each frame is created by counting the number of SURF features discovered in each region.
The signature is a 16-dimensional vector.
We compare frames using a distance metric.
If the distance is below a threshold, the frames are considered to be a match.
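The signature and matching steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the interest points are taken as an already computed list of (x, y) coordinates (SURF detection itself is not shown), the L1 distance stands in for the unspecified distance metric, and the names `frame_signature`, `l1_distance`, `frames_match` and the `threshold` value are hypothetical.

```python
from typing import List, Sequence, Tuple

GRID = 4  # 4x4 grid, giving 16 regions per frame


def frame_signature(keypoints: Sequence[Tuple[float, float]],
                    width: int, height: int, grid: int = GRID) -> List[int]:
    """Count interest points per grid cell, in row-major order."""
    counts = [0] * (grid * grid)
    for x, y in keypoints:
        # Clamp so points on the right/bottom edge fall in the last cell.
        col = min(int(x * grid / width), grid - 1)
        row = min(int(y * grid / height), grid - 1)
        counts[row * grid + col] += 1
    return counts


def l1_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of absolute differences (assumed metric, not confirmed by the slides)."""
    return sum(abs(p - q) for p, q in zip(a, b))


def frames_match(a: Sequence[int], b: Sequence[int], threshold: int) -> bool:
    """Frames match when their signature distance is below the threshold."""
    return l1_distance(a, b) < threshold


# Example: four hypothetical interest points in a 100x100 frame.
sig = frame_signature([(10, 10), (90, 10), (90, 90), (95, 95)], 100, 100)
```

Because the signature only counts features per region, it is cheap to store and compare, which is what makes per-frame comparison across long videos practical.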
Our Solution (3)
A video is considered a copy if there is a significant number of consecutive matching frames.
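One simple way to express the "consecutive matching frames" rule is to look for the longest run of matches. The sketch below assumes the query and reference frames are already aligned one-to-one and that per-frame match results are available as booleans. The helper names and the `min_run` parameter are hypothetical, since the slides do not say how many frames count as significant.

```python
from typing import Iterable


def longest_match_run(matches: Iterable[bool]) -> int:
    """Length of the longest run of consecutive matching frames."""
    best = current = 0
    for matched in matches:
        current = current + 1 if matched else 0
        best = max(best, current)
    return best


def is_copy(matches: Iterable[bool], min_run: int) -> bool:
    """Declare a copy when the longest run reaches min_run (assumed rule)."""
    return longest_match_run(matches) >= min_run


# Example: per-frame match results for a short query clip.
frame_matches = [True, True, False, True, True, True, False]
run = longest_match_run(frame_matches)
```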
Hadoop: Map-Reduce
Based on key-value data structures.
MAP: The input to the Map function is a list of key-value pairs. A user-defined function is applied to every element of the list.
REDUCE: The output of the Map is sorted by key and passed to the Reduce function. The Reduce function combines the values that share the same key.
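The map, sort and reduce steps can be imitated in plain Python to show the data flow. This is an in-process sketch, not the Hadoop API and not necessarily how the presenters split the work: `map_phase`, `reduce_phase`, `region_mapper` and `count_reducer` are hypothetical names, and the input assumes each frame arrives as a list of region indices, one per detected interest point.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(records, mapper):
    """Apply the user-defined mapper to every (key, value) input pair."""
    pairs = []
    for key, value in records:
        pairs.extend(mapper(key, value))
    return pairs


def reduce_phase(pairs, reducer):
    """Sort the map output by key, then reduce each group of values."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [(key, reducer(key, [value for _, value in group]))
            for key, group in groupby(ordered, key=itemgetter(0))]


def region_mapper(frame_id, regions):
    # Emit one ((frame, region), 1) pair per interest point.
    return [((frame_id, region), 1) for region in regions]


def count_reducer(key, values):
    # Combine all values that share the same key.
    return sum(values)


# Two hypothetical frames with the region index of each interest point.
records = [(0, [0, 0, 5]), (1, [5, 15])]
counts = reduce_phase(map_phase(records, region_mapper), count_reducer)
```

In a real cluster the map calls run on different nodes and the framework performs the sort and grouping between the two phases.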
Parallelizing the Process
Parallelizing the Process (2)
Results: Gamma Correction
Results: Scaling
Results: Blurring
Results: Hadoop
Signature extraction using a single node: 1 hour, 2 minutes and 11 seconds
Signature extraction using 6 nodes: 12 minutes and 3 seconds
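The two timings above imply a speedup that is easy to work out. The short calculation below uses only the reported times. The helper `to_seconds` is a hypothetical convenience function, and parallel efficiency is taken here as speedup divided by node count.

```python
def to_seconds(hours: int = 0, minutes: int = 0, seconds: int = 0) -> int:
    """Convert a duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


single_node = to_seconds(hours=1, minutes=2, seconds=11)  # 3731 s
six_nodes = to_seconds(minutes=12, seconds=3)             # 723 s

speedup = single_node / six_nodes   # about 5.16x
efficiency = speedup / 6            # about 0.86 of ideal linear scaling

print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.0%}")
```

So six nodes deliver roughly a 5.2x speedup, which is about 86% of the ideal sixfold gain.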
Questions