DVMM Lab, Columbia University | Video Event Recognition
Video Event Recognition: Multi-level Pyramid Matching
Dong Xu and Shih-Fu Chang
Digital Video and Multimedia Lab, Department of Electrical Engineering, Columbia University
*Thanks to Eric Zavesky for preparing the slides

Video Event Recognition: Problem
Applications: online video search and video indexing.
Events are characterized by the evolution of scenes, objects, and actions over time.
56 events are defined in LSCOM.
[Example events: Airplane Flying, Car Exiting]

Video Event Recognition: Challenges
Geometric and photometric variations
Cluttered background
Complex camera motion and object motion

Event Recognition: Object Tracking
Detect the object of interest, track it over time, and model its spatio-temporal dynamics.
Limitation: events without explicit object motion, such as Riot, are hard to detect this way.
[Figure: Object Detection & Localization → Tracking → Inference → “Airplane Landing”?]

Event Recognition: Key-Frame Based Matching
Only the key frame is used for matching: extract low-level features from it, compare them with those of other key frames, and make an overall matching decision.
[Figure: key frame → feature → similarity scores against other key frames]
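As a rough illustration of this baseline, the sketch below scores a query key frame against labeled key frames and returns the label of the most similar one. The slide does not specify a similarity measure or decision rule. Cosine similarity and the nearest-neighbor decision are our assumptions, and the function names and toy features are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def match_keyframe(query: Sequence[float],
                   labeled: List[Tuple[Sequence[float], str]]) -> Tuple[str, float]:
    """Assumed nearest-neighbor decision: label and similarity of the closest key frame."""
    return max(((label, cosine_similarity(query, feat)) for feat, label in labeled),
               key=lambda pair: pair[1])

# Hypothetical 2-D key-frame features, one labeled key frame per event.
labeled = [([1.0, 0.0], "Airplane Flying"), ([0.0, 1.0], "Car Exiting")]
print(match_keyframe([0.9, 0.1], labeled)[0])  # prints "Airplane Flying"
```

Because each shot is reduced to a single frame, the sketch discards any information about how the event unfolds over time.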

Event Recognition: Multi-level Pyramid Matching
[Figure: video frames → feature extraction → concept detectors → EMD distances between clips, computed at multiple pyramid levels]

Content Representation: Low-level Features
Grid color moments: mean (μ), standard deviation (σ), and third moment (γ) of each of the three color channels, computed in every grid cell
Edge direction histogram
Gabor texture
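A minimal pure-Python sketch of grid color moments. It assumes the image is a list of pixel rows, with each pixel a 3-channel tuple already converted to the desired color space. The grid size is left as a free parameter. Computing γ as the signed cube root of the third central moment follows the usual color-moment convention, which is our assumption because the slide does not state it.

```python
import math
from typing import List, Tuple

Pixel = Tuple[float, float, float]

def color_moments(cell: List[Pixel]) -> List[float]:
    """[mu, sigma, gamma] for each of the 3 channels of one grid cell."""
    feats: List[float] = []
    for ch in range(3):
        vals = [p[ch] for p in cell]
        mu = sum(vals) / len(vals)
        sigma = math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals))
        m3 = sum((v - mu) ** 3 for v in vals) / len(vals)
        gamma = math.copysign(abs(m3) ** (1.0 / 3.0), m3)  # signed cube root (assumed)
        feats += [mu, sigma, gamma]
    return feats

def grid_color_moments(image: List[List[Pixel]], rows: int, cols: int) -> List[float]:
    """Concatenate the color moments of every cell in a rows x cols grid."""
    h, w = len(image), len(image[0])
    feats: List[float] = []
    for r in range(rows):
        for c in range(cols):
            cell = [image[y][x]
                    for y in range(r * h // rows, (r + 1) * h // rows)
                    for x in range(c * w // cols, (c + 1) * w // cols)]
            feats += color_moments(cell)
    return feats
```

The feature length is 9 × rows × cols. For example, a 2 × 2 grid yields 36 values.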

Content Representation: Mid-level Semantic Concept Scores
Concept detectors are trained on low-level features, using positive (+) and negative (-) examples from an image database.
Mid-level semantic concept features are more robust than low-level features.
We developed and released 374 semantic concept detectors.
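To show what the mid-level representation looks like, the sketch below maps a low-level feature vector to one score per concept and stacks the scores into a fixed-order vector. The detectors here are hypothetical linear scorers with a sigmoid. This is a stand-in for illustration only, not the trained detectors described on the slide, and all names and toy values are ours.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical detector: (weight vector, bias) of a linear scorer.
Detector = Tuple[List[float], float]

def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def concept_scores(feature: List[float], detectors: Dict[str, Detector]) -> Dict[str, float]:
    """Map one low-level feature vector to a score in [0, 1] per concept."""
    return {name: _sigmoid(sum(wi * xi for wi, xi in zip(w, feature)) + b)
            for name, (w, b) in detectors.items()}

def concept_vector(feature: List[float], detectors: Dict[str, Detector]) -> List[float]:
    """Scores in sorted concept-name order, so every frame shares one dimension order."""
    scores = concept_scores(feature, detectors)
    return [scores[name] for name in sorted(scores)]

detectors = {"Sky": ([1.0, 0.0], -0.5), "Fire": ([0.0, 2.0], -1.0), "Smoke": ([0.5, 0.5], 0.0)}
print(concept_vector([0.5, 0.5], detectors))  # order: Fire, Sky, Smoke
```

Under this representation, a clip becomes a sequence of such concept-score vectors, one per frame.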
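
Earth Mover’s Distance (EMD): Approach
Clip $P = \{(p_1, w_{p_1}), \ldots, (p_m, w_{p_m})\}$ is a supplier with a given amount of goods. Clip $Q = \{(q_1, w_{q_1}), \ldots, (q_n, w_{q_n})\}$ is a receiver with a given limited capacity. Here $p_i$ and $q_j$ are frame features, and $d_{ij}$ is the ground distance between $p_i$ and $q_j$.
Weights: $w_{p_i} = 1/m$ and $w_{q_j} = 1/n$.
The flow $f_{ij}$ is solved by linear programming:
$$\min_{f} \sum_{i=1}^{m}\sum_{j=1}^{n} f_{ij}\, d_{ij} \quad \text{s.t.} \quad f_{ij} \ge 0,\; \sum_{j=1}^{n} f_{ij} \le w_{p_i},\; \sum_{i=1}^{m} f_{ij} \le w_{q_j},\; \sum_{i=1}^{m}\sum_{j=1}^{n} f_{ij} = \min\Big(\sum_{i=1}^{m} w_{p_i}, \sum_{j=1}^{n} w_{q_j}\Big)$$
$$D(P, Q) = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n} f_{ij}\, d_{ij}}{\sum_{i=1}^{m}\sum_{j=1}^{n} f_{ij}}$$
Temporal shift: a frame at the beginning of P can be mapped to a frame at the end of Q.
Scale variations: a frame from P can be mapped to multiple frames in Q. In the illustration, one frame of weight 1 is split into two flows of weight 1/2.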
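A minimal, dependency-free sketch of the EMD above. It assumes uniform weights $1/m$ and $1/n$ and a Euclidean ground distance $d_{ij}$, which we chose for illustration. Instead of calling a general LP solver, it scales all weights by $m \cdot n$ so they become integers: each frame of P supplies n units and each frame of Q accepts m units. It then solves the same transportation problem as a min-cost flow by successive shortest paths.

```python
import math
from typing import List, Optional, Sequence, Tuple

def emd(P: List[Sequence[float]], Q: List[Sequence[float]]) -> float:
    """EMD between clips P (m frames) and Q (n frames), uniform weights assumed."""
    m, n = len(P), len(Q)
    total = m * n  # total scaled mass shipped from P to Q
    src, sink = 0, m + n + 1
    graph: List[List[list]] = [[] for _ in range(m + n + 2)]

    def add_edge(u: int, v: int, cap: int, cost: float) -> None:
        # Each edge stores [target, residual capacity, cost, index of reverse edge].
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    for i in range(m):
        add_edge(src, 1 + i, n, 0.0)  # frame p_i supplies w_{p_i} * m * n = n units
        for j in range(n):
            add_edge(1 + i, 1 + m + j, total, math.dist(P[i], Q[j]))  # cost d_ij
    for j in range(n):
        add_edge(1 + m + j, sink, m, 0.0)  # frame q_j accepts w_{q_j} * m * n = m units

    flow, cost_sum = 0, 0.0
    while flow < total:
        # Bellman-Ford shortest path: residual (reverse) edges can have negative cost.
        dist = [math.inf] * len(graph)
        prev: List[Optional[Tuple[int, int]]] = [None] * len(graph)
        dist[src] = 0.0
        for _ in range(len(graph) - 1):
            changed = False
            for u, edges in enumerate(graph):
                if dist[u] == math.inf:
                    continue
                for k, (v, cap, c, _rev) in enumerate(edges):
                    if cap > 0 and dist[u] + c < dist[v] - 1e-12:
                        dist[v], prev[v] = dist[u] + c, (u, k)
                        changed = True
            if not changed:
                break
        # Push the bottleneck amount along the shortest path and update residuals.
        push, v = total - flow, sink
        while v != src:
            u, k = prev[v]
            push = min(push, graph[u][k][1])
            v = u
        v = sink
        while v != src:
            u, k = prev[v]
            graph[u][k][1] -= push
            graph[v][graph[u][k][3]][1] += push
            v = u
        flow += push
        cost_sum += push * dist[sink]
    # Undo the m*n scaling: sum f_ij d_ij / sum f_ij with sum f_ij = 1.
    return cost_sum / total
```

For example, `emd([[0.0], [5.0]], [[5.0], [0.0]])` is zero. The two clips contain the same frames in opposite order, which is the temporal-shift tolerance described above.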

Multi-level Pyramid Matching: Motivations
One clip consists of several subclips, corresponding to the stages of event evolution.
There is no prior knowledge about the number of stages in an event.
Videos of the same event may include only a subset of the stages.
Solution: multi-level pyramid matching in the temporal domain.

Multi-level Pyramid Matching: Algorithm
Temporally constrained hierarchical agglomerative clustering divides each clip into subclips, producing the Level-0 (whole clip), Level-1, and Level-2 partitions.
At each level, the subclips of two clips are aligned (Level-1 as an example): compute the EMD distance matrix between subclips, then find an integer-value alignment.
Information from the different levels is then fused.
[Figure: an example clip partitioned into subclips labeled Smoke and Fire at Level-0, Level-1, and Level-2]
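The sketch below runs the three steps at toy scale, under several assumptions of ours:
- Subclips are formed by merging only temporally adjacent clusters, using the distance between cluster mean features.
- Alignment takes a precomputed square matrix D of EMD distances between the subclips of two clips, so both clips must have the same number of subclips at that level.
- The one-to-one matching is found by brute force over permutations. This is practical only for a handful of subclips; a Hungarian or LP solver would normally replace it.
- Levels are fused by a weighted average with hypothetical weights. The slide does not say how fusion is done.

All function names are ours.

```python
import itertools
import math
from typing import List, Sequence, Tuple

def temporal_hac(frames: List[Sequence[float]], k: int) -> List[List[int]]:
    """Split a clip into k contiguous subclips (lists of frame indices).

    Temporal constraint: only neighboring clusters may merge, so every subclip
    stays a contiguous run of frames. Linkage = centroid distance (assumed).
    """
    clusters = [[i] for i in range(len(frames))]

    def centroid(cluster: List[int]) -> List[float]:
        dim = len(frames[0])
        return [sum(frames[i][d] for i in cluster) / len(cluster) for d in range(dim)]

    while len(clusters) > k:
        # Merge the closest pair of temporally adjacent clusters.
        t = min(range(len(clusters) - 1),
                key=lambda s: math.dist(centroid(clusters[s]), centroid(clusters[s + 1])))
        clusters[t:t + 2] = [clusters[t] + clusters[t + 1]]
    return clusters

def integer_alignment(D: List[List[float]]) -> Tuple[float, List[int]]:
    """One-to-one subclip alignment minimizing total distance (brute force).

    D[i][j] is the EMD between subclip i of clip P and subclip j of clip Q.
    Returns the mean matched distance and the matching i -> perm[i].
    """
    n = len(D)
    best = min(itertools.permutations(range(n)),
               key=lambda p: sum(D[i][p[i]] for i in range(n)))
    return sum(D[i][best[i]] for i in range(n)) / n, list(best)

def fuse_levels(level_distances: List[float], weights: List[float]) -> float:
    """Weighted average of per-level distances (weights are hypothetical)."""
    return sum(w * d for w, d in zip(weights, level_distances)) / sum(weights)
```

On 1-D frames `[[0.0], [0.1], [5.0], [5.2], [0.05]]` with k = 3, `temporal_hac` returns `[[0, 1], [2, 3], [4]]`. The last frame resembles the first two, but it stays in its own subclip because only adjacent clusters may merge. The figure's Smoke, Fire, Smoke pattern shows the same effect.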

Pyramid Matching: Projected Illustration
[Figure: projected illustration of the first and second stages of shot 1, the first and second stages of shot 2, and negative shots]

Experiments: Key-Frame Based Feature Performance
Dataset: TRECVID 2005
Evaluation metric: average precision (AP)
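A minimal sketch of the standard non-interpolated average precision over a ranked list of shots. It averages the precision measured at the rank of each relevant shot. It assumes every relevant shot appears somewhere in the list; if not, divide by the total number of relevant shots instead. It is not a reproduction of the official TRECVID scoring tools.

```python
from typing import Sequence

def average_precision(ranked_relevance: Sequence[int]) -> float:
    """Non-interpolated AP of a ranked list (1 = relevant shot, 0 = not)."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant item's rank
    return precision_sum / hits if hits else 0.0

print(average_precision([1, 0, 1, 0]))  # (1/1 + 2/3) / 2
```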

Experiments: Performance of EMD on Concept Score Features

Experiments: Benefits of Multi-level Pyramid Fusion

Video Event Recognition: Conclusions
Single-level EMD outperforms the key-frame based method.
Multi-level pyramid matching further improves event detection accuracy.
This is the first systematic study of diverse visual event recognition in the unconstrained broadcast news domain.