Evaluating Audio Skimming and Frame Rate Acceleration for Summarizing BBC Rushes CIVR July 8, 2008 Mike Christel, Wei-Hao Lin, and Bryan Maher {christel, whlin, bsm}@cs.cmu.edu School of Computer Science Carnegie Mellon University

Talk Outline
- TRECVID 2007 BBC Rushes Summarization Task
- Look at a few Video Summarizations
- Assessment Procedure: Are they any good?
- First Study: 25x, cluster, pz
- Second Study (focus on acceleration): 25x, 50x, 100x, pz
- Discussion

TRECVID 2007 BBC Rushes Summarization
- Video summary is “a condensed version of some information, such that various judgments about the full information can be made using only the summary and taking less time and effort than would be required using the full information source”
- Maximum 4% duration
- Benefits of this TRECVID task: provides a reasonably large video collection to be summarized, a uniform method of creating ground truth, and a uniform scoring mechanism

BBC Rushes
- 42 test videos (+ development ones) from BBC Archive
- Test videos: minimum duration 3.3 minutes, maximum duration 36.4 minutes, mean duration 25 minutes
- Raw (unedited) rush video with a great deal of redundancy (repeated takes), mixed quality audio, “junk” frames

Summary Demonstration

Assessment Procedure

Assessment (Text Inclusions of Prior Slide)
- pan left to right around table with five people eating dinner
- pan right to left around table with five people sitting talking
- curly haired man stands up from the table
- closeup of grey haired lady, dinner table not visible
- grey haired lady across dinner table, green wine bottle visible in foreground
- grey haired lady across dinner table, camera pans right
- grey haired lady across dinner table, green wine bottle not visible in foreground
- partial view of person to the right talking to grey haired lady across dinner table
- closeup of short haired man sitting, without his hands clasped together
- closeup of blonde lady as she stands up, there is a fire in the background
- closeup of curly haired man without a hand on his face
- closeup of curly haired man as he stands up

Assessment Procedure, Grading

Assessment Metrics
- Duration (DU, <= 4% of the target video)
- Assessor time-on-task (TT) judging which ground truth segments were included in the summary
- The fraction of listed text segments from the full video included in the summary as judged by assessor (IN)
- Ease of use to find desired content (EA)
- How redundant was the summary (RE)
…ideal summary would have the smallest DU and TT necessary to achieve sufficient IN performance with high user satisfaction based on subjective EA and RE
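As a rough illustration of the IN metric, the sketch below computes the fraction of ground-truth segments an assessor marked as present in a summary. The function name `inclusion_fraction` and the sample judgments are hypothetical and are not taken from the NIST scoring tools.

```python
def inclusion_fraction(judged_included):
    """Return the fraction of ground-truth segments marked as included.

    judged_included: one boolean per listed text segment, True when the
    assessor judged that segment to be present in the summary.
    """
    if not judged_included:
        raise ValueError("need at least one ground-truth segment")
    return sum(1 for flag in judged_included if flag) / len(judged_included)


# Hypothetical checklist for a 12-item ground-truth list (invented values).
judgments = [True, True, False, True, True, True,
             False, True, False, True, True, False]
score = inclusion_fraction(judgments)
print(f"IN = {score:.2f}")
```

A higher IN means more of the listed segments were recognized in the summary; it says nothing about how long the assessor needed, which is what TT captures.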

First Study: cluster, 25x, pz
- cluster: based on iterative color clustering with junk frame removal, backfilling of unused space, and audio coherence
- pz: cluster-based, but uses the domain knowledge that “pans/zooms” are important: pan or zoom sequences are kept in 1-3 second runs to represent clusters
- 25x: select every 25th frame of the target video to produce a 4% (1/25) video summary with apparent 25x playback (uses the same coherent audio as cluster); note that no junk frame filtering is used
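The 25x selection rule can be sketched as plain frame subsampling. This is a minimal sketch under stated assumptions: the function name `accelerated_frame_indices` is hypothetical, selection is assumed to start at frame 0, and the audio track (which the slide says is reused from the cluster summary rather than sped up) is not modeled.

```python
def accelerated_frame_indices(total_frames, rate):
    """Return indices of the frames kept when every rate-th frame is selected.

    Assumes selection starts at frame 0; the talk does not state the offset.
    """
    if rate < 1:
        raise ValueError("rate must be a positive integer")
    return list(range(0, total_frames, rate))


# Example: a hypothetical 1000-frame clip summarized at 25x.
kept = accelerated_frame_indices(1000, 25)
fraction_kept = len(kept) / 1000
print(len(kept), fraction_kept)
```

Keeping one frame in 25 retains 4% of the frames, which matches the 4% duration cap when the summary plays at the source frame rate.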

Results from TRECVID 2007 Evaluation

              Baseline 1   Baseline 2   cluster
TT (secs.)    105.66       100.48       101.83
IN            0.59         0.58         0.60
EA (5 best)   3.44         3.41         3.37
RE (5 best)   3.52         3.50         3.62

Participants and Results, Study 1
- 4 CMU students and staff following the NIST procedure

Study 1 Discussion
- 25x excellent method to produce summary for high IN
- TT metric for 25x also high
- RE metric poor for 25x (but inversely related to IN…)
- EA for 25x better than cluster (perhaps helped by audio)
- Subjective metrics TT, RE, and EA best for pz

Question Leading to Study 2
- How fast is too fast? (see [Wildemuth 2003] cited in paper)
- 25x? 50x? 100x?
- Will “pz” differentiate from these?

Second Study: 25x, 50x, 100x, pzA
- 25x: as before (every 25th frame, coherent audio)
- 50x: select every 50th frame of target video to produce 2% (1/50) video summary with apparent 50x playback
- 100x: select every 100th frame, 1% summary
- pzA: as before but with audio same as 25x audio, filled to be a 4% summary
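To make the three rates concrete, the arithmetic below converts a source duration into the resulting summary length. It is an illustrative sketch: the helper name `summary_seconds` is hypothetical, and the 25-minute input is the mean test-video duration reported on the BBC Rushes slide.

```python
def summary_seconds(source_minutes, rate):
    """Length in seconds of a summary that keeps every rate-th frame."""
    return source_minutes * 60.0 / rate


MEAN_SOURCE_MINUTES = 25  # mean test-video duration from the BBC Rushes slide

durations = {rate: summary_seconds(MEAN_SOURCE_MINUTES, rate)
             for rate in (25, 50, 100)}
for rate, seconds in durations.items():
    share = 100.0 / rate  # summary length as a percentage of the source
    print(f"{rate}x: {seconds:.0f} s ({share:.0f}% of source)")
```

For the mean-length video this gives summaries of 60, 30, and 15 seconds, so halving the summary at each step also halves the minimum viewing time an assessor must spend.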

Participants and Results, Study 2
- 15 subjects (8 female, 7 male; age range [21, 35] with average age 25.7) following the NIST procedure

Study 2 Discussion
- 25x excellent method to produce summary for high IN (0.73)
- 50x also excellent for high IN (0.68), >> pzA and 100x
- TT metric for 25x also high: 25x and pzA (the two 4% summaries) both significantly slower than 50x and 100x
- RE metric shows 25x worse than pzA
- EA for 100x worse than others (100x has fastest TT)
- 50x produces excellent IN performance at 2/3 the time cost (TT) of 25x
- 100x too fast: IN significantly worse than 50x, EA poor

Discussion
- We believe inclusion of audio narrative along with sped-up video made 25x and 50x more playable; at 100x the audio becomes too short/choppy to contribute well
- 15 subjects for Study 2 not as careful as NIST or Study 1 assessors, e.g., TT of 77.5 vs. 110 or 102 seconds
- If these 15 better reflect true users, time savings important (and hence TT is important metric)
- How will 50x hold up as a baseline? (To be discussed in the context of TRECVID 2008 BBC rushes summarization task: it does well on IN, poor on TT, RE)

Conclusions
- For BBC rushes, 50x works quite well
- Domain knowledge (here, attempting to preserve pans/zooms) did not distinguish itself
  - Improve detector for “significant” pans/zooms
  - Sacrifice coverage for pan/zoom inclusion
- Interactive summary control an area of promise, e.g., 50x until neighborhood of interest found, then pz to see pans/zooms and more detail
Thanks to NIST, BBC, and TRECVID organizers for making this investigation possible. This work supported by the National Science Foundation under Grant Nos. IIS-0205219 and IIS-0705491.
