Presentation on theme: "Shih-Fu Chang1, Yu-Gang Jiang1,2, Junfeng He1, Elie El Khoury3,"— Presentation transcript:
1Concept Detection: Convergence to Local Features and Opportunities Beyond Shih-Fu Chang1, Yu-Gang Jiang1,2, Junfeng He1, Elie El Khoury3,Chong-Wah Ngo21 DVMM Lab, Columbia University2 City University of Hong Kong3 IRIT, Toulouse, FranceGood morning, my name is Yu-Gang Jiang. I will talk about our experiments in the concept detection task. Our main focus is on discriminative local feature representation… but I will also talk about other opportunities for this task.TRECVID 2008 workshop, NIST
2Overview: 5 components & 6 runs ClassifiersLocal FeatureSVM65Global Feature4374-d fea.CU-VIREO3743Web Images14Let me first give you an overview of our system. There are 5 major components in our system…With these components, we submitted six runs. Our run-6 uses local feature alone, where we optimized the representation choices and our aim is to pursue of the upper limit of local features for concept detection. We add in one more components in each of the other runs to learn the effectiveness of other components…2FilteringFace & Audio
3Overview: overall performance (161)Local feature alone already achieves near top performanceEvery other component contributes incrementally to the final detection
7Representation Choices of BoW Word weighting schemeHow to weight the importance of a word to an image?Spatial informationAre the spatial locations of keypoints useful?
8Weighting Scheme Traditional… Our method – soft weighting Binary, Term frequency (TF), inverse document frequency (IDF)…Our method – soft weighting-- Assign a keypoint to multiple visual words-- weights are determined by keypoint-to- word similarityDetails in:Jiang et al. CIVR 2007.Image from
9Vocabulary Size & Weighting Scheme Soft weightingImprove TF by 10%-20%More accurate to assess the importance of a keypoint
10Spatial Information Partition image into equal-sized regions Concatenate BoW features from the regionsPoor generalizabilityF = (f11, f12, f13, f21, f22, f23, f31, f32, f33)
11Spatial InformationSpatial Information does not help much for concept detection2x2 is a good choice3x3 and 4x4 may cause mismatch problem
12Local Feature Representation Framework K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, L. Van Gool,“A comparison of affine region detectors”, IJCV, vol. 65, pp , 2005.
13Internal Results – Local Features Over TRECVID 2008 Test DataMAP: 0.157Similar!13%
14Failure Cases - I Flower Possible Solutions Small visual area missesFlowerSmall visual areaColoration/texture too similar to background scenePossible SolutionsColor-descriptorClass-specific visual words
15Failure Cases - II Boat_Ship, Airplane_flying Possible Solution missesBoat_Ship, Airplane_flyingLearning biased by background sceneDifficulty from occlusionPossible SolutionFeature selection
16Summary – Local Features BoW with good representation choices achieved very impressive performanceSoft-weighting is very effectiveMultiple spatial layouts are usefulMulti-detectors do not help muchRooms for future improvementClass-specific visual words, feature selection, color-descriptor etc.
20CU-VIREO374 Fusion of Columbia374 and VIREO374 FeatureDimensionColumbia374Grid-based color moment (LUV)225Gabor Texture48Edge Direction Histogram73VIREO374Bag-of-visual-words (soft weighting)500Grid-based Color Moment (Lab)Grid-based Wavelet Texture81Performance of CU-VIREO374 over TRECVID 2006 Test DataCU-VIREO374VIREO374Columbia374Scores on the TRECVID2008 corpora:Yu-Gang Jiang, Akira Yanagawa, Shih-Fu Chang, Chong-Wah Ngo, "Fusing Columbia374 and VIREO-374 for Large Scale Semantic Concept Detection", Columbia University ADVENT Technical Report # , Aug
21Concept Fusion Using CU-VIREO374 Train a SVM for each conceptUsing CU-VIREO374 scores as featuresPerformance improvement is merely 2%Need a better concept fusion model!2.2%Run5: Local+global
23Exploring External Images from Web ProblemSparsity of positive dataConcept Name# Positive shotsClassroom224Harbor195Bridge158Telephone184Emergency_Vehicle88Street1551Dog122Demonstration/Protest134Kitchen250Hand1515Airplane_flying72Mountain239Two_people3630Nighttime424Bus87Boat_Ship437Driver258Flower582Cityscape288Singing366Total # of shots in TV’08 Dev: 36,262
24Challenging IssuesHow to make use of the large amount of “noisily labeled” web images for concept detection?Issue 1: filter the false positive samplesFlickr ImagesGoodBad
25Challenging IssuesHow to make use of the large amount of “noisily labeled” web images for concept detection?Issue 1: filter the false positive samplesIssue 2: overcome the cross-domain problemFlickrTRECVID
26Preliminary Results Web image set: 18,000 from Flickr Results Issue 1: filter the false positive samplesGraph based semi-supervised learningIssue 2: overcome the cross-domain problemWeighted SVMResultsMAP: no difference“Bus”: improve 50%Open Problem!
28Face Detection and Tracking Face Detection (OpenCV Toolbox)Tracking based on face location and skin colorCharacter 1Start FrameEnd FrameBackwardtrackingForwardPt1Pt2...Pt1Pt2xFace DetectionTracking
29“Two_people” Detector 250 frames2 people250 frames/person1 & 150 frames/person2100 frames/1 person & 150 frames/2 peopleDrawbackCannot find person when face is too small or invisible
30Detecting “Singing” based on Audio Vibrato“the variation of the frequency of an musical instrument or of the voice”Harmonic Coefficient HaIt corresponds to the most important trigonometric series of the spectrumHa is higher in the presence of singing voice
31Performance – Face & Audio Improve “two_people” by 4% and “singing” by 8%Simple heuristics help detect specific concepts.
32Conclusions Convergence to Local Features CU-VIREO374 Local feature alone achieved an impressive MAP of 0.157Representation choices are critical for good performanceThe combination of local features and global features introduces a moderate gain (MAP 0.162)CU-VIREO374Useful resource for concept fusion and video searchA better fusion model is neededFace & Audio detectorsSimple heuristics help detect specific conceptsTraining from external web images – open problemUseful for concepts lacking in positive training samplesChallenges:Unreliable labelsDomain differences
33More information at: http://www.ee.columbia.edu/dvmm/ Thank you!