Using Multiple Synchronized Views, Heymo Kou
Digital video comes from ◦ Smartphones ◦ Notebooks ◦ Webcams ◦ Digital cameras and camcorders ◦ Security and monitoring cameras. It is distributed with advanced streaming technology ◦ Fast Internet access ◦ MPEG-4 format
Searching through categories ◦ Similar to an Internet shopping mall: first search broad categories, then narrower ones, and so on ◦ The user must choose what to browse ◦ and must check whether the selected data matches what they were looking for. Time-consuming! Manual categorizing and annotation ◦ One item at a time?
Too complicated ◦ Lack of efficient algorithms. Time-consuming ◦ Multimedia computation is expensive and grows quickly with data size. Inaccuracy ◦ Video data is increasing exponentially, and manual cataloging has a practical limit ◦ Cataloging is done by hand, so mistakes happen.
◦ MPEG-7 standard ◦ Speech indexing ◦ Shot boundary detection ◦ Time scale modification of audio signals ◦ Storyboards, moving storyboards, and animation ◦ Adaptive accelerating fast playback ◦ Streaming synchronized views
MPEG-7 (Multimedia Content Description Interface) is standardized by ISO/IEC ◦ International Organization for Standardization ◦ International Electrotechnical Commission. It is not a video encoding format; it describes content. It uses XML to store metadata ◦ attached to timecodes in the multimedia. Through these tags ◦ content can be indexed and searched efficiently. Yet, improvement is still needed.
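To illustrate how XML tags attached to timecodes make content searchable, here is a minimal sketch. The element names borrow from MPEG-7 (VideoSegment, MediaTime, MediaTimePoint, FreeTextAnnotation), but the document is a simplified, hypothetical example with namespaces and required wrappers omitted, so it is not schema-valid MPEG-7; the annotation text and time points are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified, MPEG-7-flavoured description (hypothetical content). Namespaces
# and required wrapper elements are omitted: NOT a schema-valid MPEG-7 file.
DESCRIPTION = """
<Mpeg7>
  <Video>
    <TemporalDecomposition>
      <VideoSegment id="seg1">
        <MediaTime><MediaTimePoint>T00:00:00</MediaTimePoint></MediaTime>
        <TextAnnotation><FreeTextAnnotation>news intro anchor</FreeTextAnnotation></TextAnnotation>
      </VideoSegment>
      <VideoSegment id="seg2">
        <MediaTime><MediaTimePoint>T00:01:30</MediaTimePoint></MediaTime>
        <TextAnnotation><FreeTextAnnotation>weather forecast map</FreeTextAnnotation></TextAnnotation>
      </VideoSegment>
      <VideoSegment id="seg3">
        <MediaTime><MediaTimePoint>T00:03:10</MediaTimePoint></MediaTime>
        <TextAnnotation><FreeTextAnnotation>sports highlights</FreeTextAnnotation></TextAnnotation>
      </VideoSegment>
    </TemporalDecomposition>
  </Video>
</Mpeg7>
"""


def build_index(xml_text):
    """Map each lower-cased annotation word to the time points that mention it."""
    root = ET.fromstring(xml_text.strip())
    index = {}
    for seg in root.iter("VideoSegment"):
        time_point = seg.findtext("MediaTime/MediaTimePoint")
        text = seg.findtext("TextAnnotation/FreeTextAnnotation") or ""
        for word in text.lower().split():
            index.setdefault(word, []).append(time_point)
    return index


def search(index, word):
    """Return the time points whose annotation contains `word`."""
    return index.get(word.lower(), [])
```

A player could then seek directly to a returned time point (for example, searching "weather" yields the segment starting at T00:01:30) instead of scanning the whole video.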
Search through speech transcripts ◦ Uses the familiar metaphor of free-text search. Automatic speech recognition (ASR) ◦ The indexed transcript yields semantic information. Main advantage: representation ◦ Speech is made of words, so it maps directly onto text queries.
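The free-text-search idea can be sketched as an inverted index from recognized words to the times they were spoken. This is a minimal sketch assuming the ASR step has already produced `(start_time_seconds, word)` pairs; the transcript below is hypothetical, and real recognizers also report end times and confidence scores, which are ignored here.

```python
from collections import defaultdict

# Hypothetical ASR output: (start time in seconds, recognized word).
ASR_WORDS = [
    (0.0, "welcome"), (0.4, "to"), (0.6, "the"), (0.8, "budget"), (1.3, "debate"),
    (12.5, "the"), (12.7, "budget"), (13.1, "deficit"),
    (40.2, "questions"), (41.0, "on"), (41.3, "healthcare"),
]


def build_inverted_index(words):
    """Map each lower-cased word to the sorted list of times it was spoken."""
    index = defaultdict(list)
    for t, w in words:
        index[w.lower()].append(t)
    for times in index.values():
        times.sort()
    return dict(index)


def search_speech(index, query):
    """Return sorted times where any query word was spoken (simple OR search)."""
    hits = set()
    for term in query.lower().split():
        hits.update(index.get(term, []))
    return sorted(hits)
```

Searching "budget deficit" returns the times 0.8, 12.7, and 13.1 seconds, which a browser can display as jump points into the video.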
Frame, key frame, and shot ◦ Shot: a group of consecutive frames with similar content ◦ [Diagram: start key frame, end key frame, and the animation between them]
Context ◦ Meaningful information within multimedia data. Three levels of video browsing ◦ Browsing a large collection of videos ◦ Browsing a ranked list of videos ◦ Browsing a single video to find relevant segments
Shot Boundary Detection (SBD) algorithm ◦ Completely automatic. Key frames are selected and extracted ◦ Saved as JPEG files. High accuracy and efficiency ◦ Still, false detections remain an unsolved problem.
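A common, simple way to detect shot boundaries is to compare grey-level histograms of consecutive frames and declare a cut when they differ by more than a threshold. The sketch below uses that histogram-difference technique; it is not necessarily the algorithm behind the SBD tool the slide refers to. Frames are assumed to be flat lists of 0 to 255 grey values, the threshold and bin count are illustrative, and picking the middle frame as key frame is one choice among several (saving it as JPEG is left out).

```python
def histogram(frame, bins=16):
    """Normalized grey-level histogram of a frame (flat list of 0-255 values)."""
    counts = [0] * bins
    for p in frame:
        counts[p * bins // 256] += 1
    n = len(frame)
    return [c / n for c in counts]


def hist_distance(h1, h2):
    """Half the L1 distance: 0.0 for identical histograms, 1.0 for disjoint ones."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def detect_shots(frames, threshold=0.5, bins=16):
    """Split frames into shots; return (start, end) frame indices, inclusive."""
    if not frames:
        return []
    shots = []
    start = 0
    prev = histogram(frames[0], bins)
    for i in range(1, len(frames)):
        cur = histogram(frames[i], bins)
        if hist_distance(prev, cur) > threshold:  # abrupt change: a cut
            shots.append((start, i - 1))
            start = i
        prev = cur
    shots.append((start, len(frames) - 1))
    return shots


def key_frames(shots):
    """Pick the middle frame of each shot as its key frame."""
    return [(s + e) // 2 for s, e in shots]
```

Gradual transitions (fades, dissolves) change the histogram only a little per frame and can slip under a fixed threshold, which is one source of the detection errors mentioned above.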
Audio browsing is as important as video browsing ◦ Apart from still images, most digital content has audio. Faster audio browsing is needed. Time scale modification (TSM) speeds up an audio signal ◦ by deleting small audio segments ◦ This works especially well for human speech, which is quasi-periodic.
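Speeding audio up by dropping small segments can be sketched with plain overlap-add (OLA): windowed frames are read from the input at a large hop and written to the output at a smaller hop, so material between frames is discarded while the crossfade hides the joins. This is a minimal sketch with illustrative parameters (512-sample frames, Hann window). Synchronous overlap-add (SOLA) additionally shifts each frame to the best cross-correlation match before adding, which exploits the quasi-periodicity of speech; that alignment step is omitted here, so pitched sounds may show artifacts.

```python
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def ola_speedup(signal, rate, frame_len=512, syn_hop=256):
    """Time-compress `signal` by `rate` (> 1 plays faster) with plain OLA."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    ana_hop = max(1, int(round(syn_hop * rate)))  # read hop > write hop => faster
    win = hann(frame_len)
    n_frames = max(1, (len(signal) - frame_len) // ana_hop + 1)
    out_len = (n_frames - 1) * syn_hop + frame_len
    out = [0.0] * out_len
    norm = [0.0] * out_len  # sum of window weights, used to undo amplitude ripple
    for k in range(n_frames):
        a, s = k * ana_hop, k * syn_hop
        for i in range(frame_len):
            x = signal[a + i] if a + i < len(signal) else 0.0
            out[s + i] += win[i] * x
            norm[s + i] += win[i]
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]
```

For a 16,000-sample input at `rate=2.0`, the output has 8,192 samples, roughly half the original duration.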
Storyboard ◦ A set of one or more pages, each consisting of a two-dimensional array of key frames sorted in chronological order. Animation ◦ A quick slide show in which each key frame is shown for a fixed short period (e.g., 0.6 seconds). Moving storyboard (MSB) ◦ The animated key frames, fully synchronized with the original audio track; each key frame is shown for the entire duration of its shot.
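The three views differ only in how the same key frames are laid out or timed, which a short sketch makes concrete. The `Shot` type, the grid size, and the file names are hypothetical; the 0.6-second frame time comes from the slide's example.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    start: float    # shot start time in seconds
    end: float      # shot end time in seconds
    key_frame: str  # e.g. file name of the extracted key-frame image


def storyboard_pages(shots, rows=3, cols=4):
    """Storyboard: key frames in chronological order on pages of rows x cols cells."""
    per_page = rows * cols
    frames = [s.key_frame for s in sorted(shots, key=lambda s: s.start)]
    return [frames[i:i + per_page] for i in range(0, len(frames), per_page)]


def animation_schedule(shots, frame_time=0.6):
    """Animation: every key frame is shown for the same fixed time.

    Returns (display start time, key frame) pairs."""
    ordered = sorted(shots, key=lambda s: s.start)
    return [(i * frame_time, s.key_frame) for i, s in enumerate(ordered)]


def moving_storyboard_schedule(shots):
    """MSB: each key frame stays on screen for its whole shot, in sync with audio.

    Returns (start, duration, key frame) triples."""
    ordered = sorted(shots, key=lambda s: s.start)
    return [(s.start, s.end - s.start, s.key_frame) for s in ordered]
```

For three shots covering 40 seconds of video, the animation plays in 1.8 seconds, while the moving storyboard still takes the full 40 seconds because it follows the original audio track.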
Very fast video playback (without audio). Ordinary fast-forward uses only a fixed speed ◦ There is a chance of missing an important scene. Adaptive fast playback instead accelerates until a new scene is reached ◦ Requires a relatively small computational load.
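The "accelerate until a new scene is met" rule can be sketched as a per-frame speed controller. This assumes some per-frame change score is already available (for example, the histogram distance between consecutive frames); the threshold, acceleration factor, and speed limits are hypothetical, and a real player would usually adapt speed over time intervals rather than per frame.

```python
def adaptive_speeds(change_scores, threshold=0.3, min_speed=1.0,
                    max_speed=30.0, accel=1.5):
    """Return a playback speed for every frame.

    Speed grows by `accel` per frame while nothing new happens and drops back
    to `min_speed` as soon as a frame's change score exceeds `threshold`.
    """
    speeds = []
    speed = min_speed
    for score in change_scores:
        if score > threshold:
            speed = min_speed                      # new scene: slow down so it is seen
        else:
            speed = min(max_speed, speed * accel)  # quiet stretch: keep accelerating
        speeds.append(speed)
    return speeds
```

For change scores `[0, 0, 0, 0.9, 0, 0]`, the speeds ramp 1.5, 2.25, 3.375, fall back to 1.0 at the change, then ramp again.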
The server preprocesses the media ◦ It keeps the same media encoded at several different speeds. When the user selects another speed ◦ 1. Pause the current media ◦ 2. Open the file with the same content at the selected speed ◦ 3. Seek to the corresponding position ◦ 4. Play the selected view. No extra computation is needed at playback time ◦ However, more storage is required: a tradeoff.
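Step 3 needs the "corresponding position" in the new file. Assuming each pre-encoded copy is uniformly time-compressed by its speed factor, a position p in a copy at speed r corresponds to p·r seconds of original content, so switching is a simple rescaling. The `SpeedView` type and URLs below are hypothetical, and the returned action list only stands in for real player calls.

```python
from dataclasses import dataclass


@dataclass
class SpeedView:
    url: str      # hypothetical URL of a pre-encoded copy
    speed: float  # playback rate baked into this encoding (2.0 = twice as fast)


def content_time(view, file_pos):
    """Position in the original (1x) content for a position inside `view`."""
    return file_pos * view.speed


def switch_view(current, file_pos, target):
    """Steps 1-4: pause, open target, seek to matching position, play.

    Returns the seek position in the target file and the ordered actions."""
    new_pos = content_time(current, file_pos) / target.speed
    actions = [
        ("pause", current.url),
        ("open", target.url),
        ("seek", new_pos),
        ("play", target.url),
    ]
    return new_pos, actions
```

Thirty seconds into the 2x copy corresponds to 60 seconds of content, so the player seeks to 15 seconds in a 4x copy or 60 seconds in the 1x original.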
Multiple videos can be browsed at once. Each video is split into segments of a given length ◦ (e.g., 10 seconds). A strong information scent becomes visible ◦ through the aggregated occurrences in each segment.
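The segment-and-aggregate idea can be sketched as a count matrix: one row per video, one cell per fixed-length segment, each cell counting occurrences. Here "occurrences" are assumed to be timestamps of query hits (for example, from a speech index); the input data, the 10-second default, and the text rendering are illustrative only.

```python
import math


def occurrence_matrix(hits, durations, segment_len=10.0):
    """Count hits per fixed-length segment for several videos.

    hits: dict video_id -> list of hit times (seconds)
    durations: dict video_id -> video length (seconds)
    Returns dict video_id -> list of counts, one per segment.
    """
    matrix = {}
    for vid, length in durations.items():
        n_segments = max(1, math.ceil(length / segment_len))
        counts = [0] * n_segments
        for t in hits.get(vid, []):
            if 0 <= t < length:
                counts[int(t // segment_len)] += 1
        matrix[vid] = counts
    return matrix


def render(matrix):
    """Text rendering: denser characters mark segments with more hits."""
    shades = " .:#"
    lines = []
    for vid, counts in matrix.items():
        row = "".join(shades[min(c, len(shades) - 1)] for c in counts)
        lines.append(f"{vid:>8} |{row}|")
    return "\n".join(lines)
```

Rows with dense characters point the user to the videos, and the parts of those videos, most worth opening.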
View                     | Static visual | Dynamic visual | Audio | Typical speed-up
Full video (w/o TSM)     |               | ○              | ○     | 1–2X
Video Skim               |               | ○              | ○     | 2–20X
Slide show (w/o TSM)     | ○             |                | ○     | 1–2X
Adaptive Fast Playback   |               | ○              |       | 5–30X
Animation                | ○             |                |       | 10–40X
Storyboard, mosaic       | ○             |                |       | NA
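The speed-up ranges translate directly into browsing time: a video of length T viewed at speed-up s takes T/s. This small sketch turns the ranges from the comparison table into a time estimate; storyboard and mosaic are left out because the table gives no rate, and the 60-minute example length is arbitrary.

```python
# Typical speed-up ranges (min, max) from the comparison table.
SPEEDUP = {
    "Full video (w/o TSM)": (1, 2),
    "Video Skim": (2, 20),
    "Slide show (w/o TSM)": (1, 2),
    "Adaptive Fast Playback": (5, 30),
    "Animation": (10, 40),
}


def browsing_minutes(video_minutes, view):
    """Return (fastest, slowest) minutes needed to browse a video with `view`."""
    low, high = SPEEDUP[view]
    return video_minutes / high, video_minutes / low
```

For a 60-minute video, adaptive fast playback needs roughly 2 to 12 minutes and animation roughly 1.5 to 6 minutes.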
Streaming synchronized views and movieDNA ◦ Less computation; multiple videos at once. Adaptive accelerating fast playback ◦ Most useful for analyzing surveillance videos. SBD and TSM ◦ Efficient building blocks for implementing the technologies above. So, what are the current limitations?
Question: What are the two main technologies applied for efficient video browsing? (one for audio, one for visual content) Answer: Shot Boundary Detection (SBD) for visual content and Time Scale Modification (TSM) for audio signals.