Excellence Program (卓越計畫), National Taiwan University: Midterm Report on Content Science for a Media-Rich Life. Ja-Ling Wu, Graduate Institute of Networking and Multimedia, Department of Computer Science and Information Engineering.


1 Midterm Report on Content Science for a Media-Rich Life
Ja-Ling Wu
Graduate Institute of Networking and Multimedia, Department of Computer Science and Information Engineering, National Taiwan University
Excellence Program (卓越計畫), National Taiwan University

2 Content Science for a Media-Rich Life: Multi-modal Content Organization
Goal: re-organize the content to make it more attractive to users and more convenient and efficient to use.
– User-centric: user attention/preference modeling, emphasizing interesting parts while deleting undesired parts, etc.
– Information extraction and selection: initial extraction of the quintessence of digital media.
– Semantic/object-based: content re-organized efficiently, conveniently and semantically, so that it is easier to browse and more efficient to use.
– Mining: knowledge discovery of patterns, rules, constraints, etc.
– Emotion discovery: emotion embedded in content may be helpful in organization; user response to content may be helpful in content re-organization.

3 Multi-modal Content Organization: Overview
Guideline: From Engineering to Science
Diagram labels: Sub-project 1: Content Semantics; Ontology/Taxonomy; Question/Answering; Content Organization and Understanding; Sub-project 3: Interactive User Interface; Content Feature Extraction and Descriptor Generation; Content Data Acquisition and Quality Enhancement; Sub-project 4: Content Processing Platform and Distribution Environment
Results listed: video capture and compression module; video stabilization; quaternion formulation for color science; skeleton-based 3D graphic models; MPEG-7 transcoding hints; emotional features
IEEE Transactions on Signal Processing, July; The Visual Computer, vol. 22, 2006.

4 Multi-modal Content Organization: Overview
Diagram labels: Content Data Acquisition and Quality Enhancement; Content Feature Extraction and Descriptor Generation; Multi-modal Semantic Events Detection; Content Topic Organization; Content Summarization and Title Generation; Content Information Extraction and Selection; Content Understanding and Organization; User Attention Modeling; User Response on Content; User-oriented Technologies; Content Embedded Emotion Discovery; Content Mining and Knowledge Discovery; Content-oriented Technologies; Content Retrieval and Question/Answering: Integrated Demo Systems

5 Outline (diagram labels): Multi-modal Semantic Events Detection; Content Topic Organization; Content Summarization and Title Generation; Content Information Extraction and Selection; Content Understanding and Organization; User Attention Modeling; User Response on Content; User-oriented Technologies; Content Embedded Emotion Discovery; Content Mining and Knowledge Discovery; Content-oriented Technologies

6 User-oriented Technologies: User Attention Modelling (1/3)
User attention
– In multimedia documents, a spatial area or portion that viewers are interested in or pay more attention to than others.
Visual process
– The biological mechanism by which human beings determine where to direct their attention.
– Bottom-up process: attention is involuntarily drawn to visual stimuli.
– Top-down process: attention is voluntarily focused on predefined goals.
Figure label: region of user attention
IEICE Transactions on Information and Systems, July 2005.

7 User-oriented Technologies: User Attention Modelling (2/3)
Examples of feature maps: original frame; intensity contrast; red/green color; blue/yellow color; horizontal motion; vertical motion
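The slides do not say how the feature maps are fused into a region of user attention. As a minimal sketch, assuming the simplest possible rule (normalize each map to [0, 1], then average), the combination could look like the following; the function names and the toy 2x2 maps are hypothetical and only illustrate the idea.

```python
def normalize(fmap):
    """Scale a 2-D feature map (list of rows) linearly to the range [0, 1]."""
    values = [v for row in fmap for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:  # a flat map carries no contrast
        return [[0.0 for _ in row] for row in fmap]
    return [[(v - lo) / (hi - lo) for v in row] for row in fmap]


def combine_feature_maps(feature_maps):
    """Average the normalized maps into one saliency map (assumed fusion rule)."""
    normalized = [normalize(m) for m in feature_maps]
    rows, cols = len(normalized[0]), len(normalized[0][0])
    return [[sum(m[r][c] for m in normalized) / len(normalized)
             for c in range(cols)] for r in range(rows)]


def most_salient(saliency):
    """Return the (row, col) position with the highest saliency value."""
    best = max((v, r, c) for r, row in enumerate(saliency)
               for c, v in enumerate(row))
    return best[1], best[2]


# Toy maps standing in for intensity contrast and horizontal motion.
intensity = [[0, 10], [0, 5]]
motion = [[0, 4], [0, 0]]
saliency = combine_feature_maps([intensity, motion])
peak = most_salient(saliency)
```

A weighted sum, or a rule that favors maps with few strong peaks, would slot into `combine_feature_maps` without changing the rest.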

8 User-oriented Technologies: User Attention Modelling (3/3)
Video demo: Sports; Accident; Man Walking

9 User-oriented Technologies: User Response on Content (1/2)
Physiological signal analysis
– Sensors: ECG; EMG; GSR; respiration; PPG (pulse); EEG
– Physiological sources: stress hormone release; parasympathetic activation; emotion behavior; sympathetic activation; reflex potentiation

10 User-oriented Technologies: User Response on Content (2/2)
An example of brain response analysis with functional MRI: different response patterns when viewing different human faces
– neutral faces
– faces causing subjective emotion

11 Outline (diagram labels): Multi-modal Semantic Events Detection; Content Topic Organization; Content Summarization and Title Generation; Content Information Extraction and Selection; Content Understanding and Organization; User Attention Modeling; User Response on Content; User-oriented Technologies; Content Embedded Emotion Discovery; Content Mining and Knowledge Discovery; Content-oriented Technologies

12 Content-oriented Technologies: Content Information Extraction and Selection (1): Major Faces (1/3)
Definition: the faces that are seen more often in a program.
Major faces imply high-level semantics, e.g., leading actors in movies or star players in sports videos.
IPPR Conference on Computer Graphics and Image Processing, Aug. 2005

13 Content-oriented Technologies: Content Information Extraction and Selection (1): Major Faces (2/3)
Experiment results
– Face extraction results (table columns: Test data, Correct, Missed, False): Test1 (85); Test2 (68) 6422; Test3 (51)
– Face tracking results

14 Content-oriented Technologies: Content Information Extraction and Selection (1): Major Faces (3/3)
Application to news segmentation:
– Commercial filtering is done first.
– The anchorperson is determined.
– News items are segmented accordingly.

15 Content-oriented Technologies: Content Information Extraction and Selection (2): Commercial Detection (1/3)
How can the commercial (CM) segments that have been inserted into a given video be identified? We focus on news/talk show programs.
Figure labels: News; Commercial segment

16 Content-oriented Technologies: Content Information Extraction and Selection (2): Commercial Detection (2/3)
Processing stages:
(a) CM break candidates: shot change detection on the video; label CM break candidates
(b) Boundary refinement
(c) Remove outliers, giving the commercial segments
Other diagram blocks: speech/music discrimination; motion analysis and scene detection; caption detection; cuts per minute
Characteristics of CM: more shot changes; more (or larger) motions; higher volume
Captions:
– CM: overlay texts appear for a short duration
– News: Headlines – always appear in on short
Lecture Notes in Computer Science, vol. 3767, 2005.
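The "cuts per minute" cue (commercials contain more shot changes) can be illustrated with a small sketch. It assumes shot-change timestamps in seconds are already available, and it uses non-overlapping one-minute windows and a hypothetical threshold of 6 cuts; the slides do not give the window length or threshold actually used.

```python
def cuts_per_minute(shot_change_times, start, window=60.0):
    """Count shot changes (in seconds) that fall inside [start, start + window)."""
    return sum(1 for t in shot_change_times if start <= t < start + window)


def candidate_windows(shot_change_times, duration, threshold, window=60.0):
    """Return start times of non-overlapping windows whose cut count meets the threshold."""
    starts = []
    t = 0.0
    while t < duration:
        if cuts_per_minute(shot_change_times, t, window) >= threshold:
            starts.append(t)
        t += window
    return starts


# Hypothetical timestamps: few cuts in the first minute, many in the second.
cuts = [12.0, 47.0, 61.0, 64.5, 70.0, 75.2, 81.0, 90.3, 101.0, 115.0]
candidates = candidate_windows(cuts, duration=180.0, threshold=6)
```

Windows flagged this way are only candidates; the boundary refinement and outlier removal stages would still have to confirm them.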

17 Content-oriented Technologies: Content Information Extraction and Selection (2): Commercial Detection (3/3)
Marking exact boundaries
Figure labels: T as; Tm; shot change point; t; caption ratio difference; volume change; rough boundary; exact boundary

18 Content-oriented Technologies: Semantic Event Detection and Summarization Generation: Baseball Video (1/6)
Pitch Scene Block (PSB): the video segment between two pitch shots. The PSB is the basic unit for baseball events.
Game status tracked: number of scores, number of outs, base-occupation situation.
Figure labels (game progress): pitch shot; bullpen; strike; ball; last pitch, swing; outfield shot; base shot; other shot; no effective play; one effective play occurs; panels (a), (b), (c)
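The PSB definition can be sketched as a grouping over a sequence of shot labels. The label strings below are hypothetical, and the sketch assumes that each block includes its opening pitch shot and that a trailing block not yet closed by another pitch shot is still kept.

```python
def split_into_psbs(shot_labels):
    """Group shots into Pitch Scene Blocks: each block starts at a pitch shot
    and runs up to, but not including, the next pitch shot."""
    blocks = []
    current = None
    for label in shot_labels:
        if label == "pitch":
            if current is not None:
                blocks.append(current)
            current = [label]
        elif current is not None:
            current.append(label)
        # Shots before the first pitch shot belong to no block and are skipped.
    if current is not None:
        blocks.append(current)
    return blocks


shots = ["other", "pitch", "outfield", "base", "pitch", "pitch", "other"]
psbs = split_into_psbs(shots)
```

A block that holds only a pitch shot corresponds to the "no effective play" case, while a block with outfield or base shots suggests that a play occurred.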

19 Content-oriented Technologies: Semantic Event Detection and Summarization Generation: Baseball Video (2/6)
Rule-based decision: the captions of two consecutive pitch shots are compared.
– Caption in the i-th pitch shot: Team1 3, Team2 0; SBO; base-occupation status 010; (#outs1, #score1, #base1) = (0, 0, 1)
– Caption in the (i+1)-th pitch shot: Team1 3, Team2 2; SBO; base-occupation status 000; (#outs2, #score2, #base2) = (0, 2, 0)
– Compare: (∆outs, ∆score, ∆base) = (0, 2, -1), which is classified as Home Run
IEEE International Conference on Multimedia and Expo, 2005.
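The comparison in this example can be reproduced with a short sketch. The `Status` tuple and function names are hypothetical, and the home-run rule is an assumption derived from baseball rules (no new out, bases empty afterwards, batter plus every runner scores) rather than the exact rule set used in the project.

```python
from collections import namedtuple

# bases is the base-occupation string read from the caption, e.g. "010".
Status = namedtuple("Status", ["outs", "score", "bases"])


def status_delta(before, after):
    """Return (delta_outs, delta_score, delta_occupied_bases) between two captions."""
    n_before = before.bases.count("1")
    n_after = after.bases.count("1")
    return (after.outs - before.outs,
            after.score - before.score,
            n_after - n_before)


def is_home_run(before, after):
    """Assumed rule: no new out, bases cleared, batter and all runners score."""
    d_outs, d_score, _ = status_delta(before, after)
    runners = before.bases.count("1")
    return d_outs == 0 and after.bases.count("1") == 0 and d_score == runners + 1


before = Status(outs=0, score=0, bases="010")
after = Status(outs=0, score=2, bases="000")
delta = status_delta(before, after)   # (0, 2, -1), as on the slide
home_run = is_home_run(before, after)
```

Other events would need their own rules over the same three deltas plus the per-base occupancy.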

20 Content-oriented Technologies: Semantic Event Detection and Summarization Generation: Baseball Video (3/6)
Rule-based decision tree (status filtering). Between the i-th and (i+1)-th pitch shots:
– Δo_i: change in #outs
– Δs_i: change in #scores
– Δn_i: change in #occupied bases
– B[j]: whether the j-th base is occupied
Decision nodes (Yes/No branches): Δo_i < 2; Δo_i = 0; Δn_i + Δs_i = 1; Δn_i + Δs_i = 0; Δb_i + Δs_i = 0; Δs_i = 0; Δb_i = 0; B[1] = 1; B[2] = 1; B[3] = 1
Leaf events: 1B; 2B; 3B; HR; nothing; stolen base; out; sacrifice; caught stealing; double play

21 Content-oriented Technologies: Semantic Event Detection and Summarization Generation: Baseball Video (4/6)
Event discrimination: combine visual and speech information to discriminate confused events.
Diagram labels: confused segments; video; speech; key-phrase spotting; model-based detection; confidence calculation (two blocks); events; integrated decision; explicit events; event boundaries

22 Content-oriented Technologies: Semantic Event Detection and Summarization Generation: Baseball Video (5/6)
System framework
– Input: baseball videos
– Shot classification: shot detection; shot classification; predefined field color; field color detection; pitcher detection; color info.; geometric info.; pitch shots; other shots
– Semantic concept detection: game-specific feature extraction; character recognition; symbol detection; baseball rules; rule-based decision; model-based decision; confused concepts; concept pool
– Extended applications: box score generation; game summarization; game highlights; event-on-demand; intelligent browsing
Thirteen concepts are detected: single (1B), double (2B), triple (3B), home run (HR), stolen base (SB), caught stealing (CS), fly out (AO), strikeout (SO), base on balls (walk, BB), sacrifice bunt (SAC), sacrifice fly (SF), double play (DP), and triple play (TP).

23 Content-oriented Technologies: Semantic Event Detection and Summarization Generation: Baseball Video (6/6)
Game summarization and game highlight
Game: Sinon (興農) vs. Uni-President (統一); game time: 3 hours 14 minutes
Items compared: man-made summary; automatic summary; automatic highlight 1; automatic highlight 2
Durations shown: 16 minutes; 3 minutes 25 seconds; 6 minutes; 9 minutes (30 events)
31 events, 25 of which are in the man-made summary. Precision = 0.806, Recall = 0.833
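The reported precision and recall follow from the event counts, assuming the 31 events belong to the automatic summary and the 30 events to the man-made summary (the only reading consistent with the two figures). A minimal check, with hypothetical parameter names:

```python
def precision_recall(n_auto, n_manual, n_common):
    """Precision = common / automatic events; recall = common / man-made events."""
    return n_common / n_auto, n_common / n_manual


precision, recall = precision_recall(n_auto=31, n_manual=30, n_common=25)
print(f"precision={precision:.3f} recall={recall:.3f}")
# prints "precision=0.806 recall=0.833"
```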

24 Content-oriented Technologies: Semantic Event Detection and Summarization Generation: News Video (1/5)
News item detection: once commercials have been filtered out, news items are segmented by the last anchorperson frame (in shot n) and the first anchorperson frame (in shot n+1).
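As a rough illustration of this segmentation, the sketch below works at shot level rather than frame level, which is a simplification. It assumes commercials are already removed, that each shot carries a hypothetical anchorperson flag, and that a news item opens at an anchorperson shot and runs until just before the next anchorperson shot that follows field footage.

```python
def segment_news_items(is_anchor):
    """Return (first_shot, last_shot) index pairs, one per news item.

    is_anchor[i] is True when shot i is an anchorperson shot. Shots before
    the first anchorperson shot are not assigned to any item.
    """
    starts = [i for i, flag in enumerate(is_anchor)
              if flag and (i == 0 or not is_anchor[i - 1])]
    ends = [s - 1 for s in starts[1:]] + [len(is_anchor) - 1]
    return list(zip(starts, ends))


# Hypothetical shot sequence: anchor, two field shots, two anchor shots, ...
flags = [True, False, False, True, True, False, True, False]
items = segment_news_items(flags)
```

Consecutive anchorperson shots are treated as one lead-in, so the run at indices 3 and 4 opens a single item.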

25 Content-oriented Technologies: Semantic Event Detection and Summarization Generation: News Video (2/5)
Figure legend: = Major face
IEEE International Conference on Image Processing, Oct

26 Content-oriented Technologies: Semantic Event Detection and Summarization Generation: News Video (3/5)
Identification of anchorperson shots
– Video cue: major face
– Audio cue: (un-)voiced speech ratio
– Table axes: (Major) Face / Non-Face; Studio / Non-Studio
– Table entries: SB; NS; OS; LEAD
– Legend: SB: sound bite; OS: overlap sound; LEAD: anchorperson shot; NS: nature sound

27 Content-oriented Technologies: Semantic Event Detection and Summarization Generation: News Video (4/5)
Detectable important events and importance measurement
Continuous events: events that can be detected over a period of continuous frames
– Face detection (who)
– Caption detection (what, when)
Instant events: events that can only be detected at a particular instant
– Flashlight (who)
– Zooming operation (what, where)
– Panning operation (where)
– The beginning and ending of a shot

28 Content-oriented Technologies: Semantic Event Detection and Summarization Generation: News Video (5/5)
Guideline for summarization: importance curve
– Conciseness: as short as possible.
– Coverage: covers key points.
– Context: defines terms before using them.
– Coherence: flows naturally and fluidly.
Figure labels: Original; Summary

29 Content Understanding and Organization: Content Topic Organization (1/6)
Multi-media content in the future network era
– The most attractive form of network content will be multi-media, which usually includes speech information.
– The speech information, if included, usually tells the subjects, topics and concepts of the multi-media content, and thus becomes the key for indexing, retrieval and browsing.
Future integrated networks:
– Real-time information: weather, traffic; flight schedules; stock prices; sports scores
– Electronic commerce: virtual banking; on-line transactions; on-line investments
– Knowledge archives: digital libraries; virtual museums
– Intelligent working environment: e-mail processors; intelligent agents; teleconferencing; distance learning
– Private services: personal notebook; business databases; home appliances; network entertainment

30 Content Understanding and Organization: Content Topic Organization (2/6)
Retrieving archives of multi-media/spoken documents
Written documents are better structured and easier to browse:
– in paragraphs with titles
– easily shown on the screen
– easy to judge at a glance whether a document is what the user is looking for
Multi-media/spoken documents are just video/audio signals:
– not easy to show on the screen
– the user can't go through each one from beginning to end while browsing
– better approaches for efficient retrieval/browsing are needed

31 Content Understanding and Organization: Content Topic Organization (3/6)
Multi-media/spoken document understanding and organization
– Key term/named entity extraction from multi-media/spoken documents: very often keywords and/or out-of-vocabulary (OOV) words
– Multi-media/spoken document segmentation: automatically segmenting a spoken document into short paragraphs
– Information extraction for multi-media/spoken documents: extraction of key information such as who, when, where, what and how (4W1H)
– Summarization and title generation for multi-media/spoken documents: automatically generating a summary and a title (in text or speech form) for each short paragraph
– Topic analysis and organization for multi-media/spoken documents: analyzing the subject topics of the short paragraphs and organizing them into graphic structures
IEEE Signal Processing Magazine, Sept

32 Content Understanding and Organization: Content Topic Organization (4/6)
System block diagram for spoken document understanding and organization

33 Content Understanding and Organization: Content Topic Organization (5/6)
Multi-modal dialogue with topic hierarchies
– The user's short query produces too many retrieved spoken documents, which are difficult to show on the screen.
– The system may provide better information about the archive to the user, while the user may specify clearer queries for the system.
Diagram labels: spoken document archive; topic hierarchy; user; proposed system; multi-modal dialogue

34 Content Understanding and Organization: Content Topic Organization (6/6)
System block diagram for dialogue with topic hierarchies

35 Content Understanding and Organization: Content Mining and Knowledge Discovery (1/2)
From semantic event detection to knowledge discovery
Research focuses:
– Discovering time-variant patterns
– Mining data streams
– Labeling unclustered categorical data

36 Content Understanding and Organization: Content Mining and Knowledge Discovery (2/2)
Mining data streams
Stream summarization techniques and stream mining techniques:
– Resource-Aware Mining for Data Streams (RAM-DS)
– Clustering on Demand for Multiple Data Streams (COD)
– Integrating DCT and DWT for Approximating Cube Streams (DAWA)
Hardware-enhanced mining framework
Diagram labels: data streams; hardware stream processor; software stream processor; buffer; synopsis in memory; RAM-DS, COD, DAWA; results
Proceedings of Emerging Information Technology Conference, 2005.

37 Multi-modal Content Organization: Content Embedded Emotion Discovery
Diagram labels: emotion analysis for media; physiological signal analysis; psychological analysis (evaluation); content embedded emotion discovery; user response on content; valence-arousal space

38 Multi-modal Content Organization: Inter-project Relationship
Projects: Content Semantics; Multi-modal Content Organization; Reconfigurable Multimedia SoC; User-centric Interactive Media
Links between projects:
A: (i) Ontologies for broadcast news and baseball programs; (ii) On-line Q&A systems for broadcast news and baseball programs
B: (i) Gesture-based interface for broadcast news and baseball programs; (ii) PDA-based interface for broadcast news and baseball programs
C: (i) Real-time MPEG-1/2 encoder; (ii) Real-time MPEG-2 to MPEG-4 transcoder

39 Multi-modal Content Organization: Post Project Plan
Additional research items in the third and fourth year:
(a) content-oriented technologies
(b) user-oriented technologies
(c) content embedded emotion discovery
(d) content mining and knowledge discovery

