
1 Question-Answering of Large News Video Archives CHUA, Tat-Seng, Yang, Hui, Chaisorn, Lekha & Zhao, Yun-Long School of Computing National University of Singapore Email: chuats@comp.nus.edu.sg Web: http://www.comp.nus.edu.sg/~chuats

2 Outline of Talk Introduction and Motivation News Video Processing & Story Segmentation Video Transcript Correction Question-answering on News Video Results Conclusion

3 Personalized News Video Retrieval
Infotainment, including news video, is one of the major applications of multimedia (MM) technology
In a personalized news video scenario, users interact with the system to request information such as:
  o Show me the latest news video on Iraq
  o Highlights of last night's European football
  o Results are time-specific
Users increasingly want to see video news, supplemented with audio and text
  o and summarized to as much detail as is necessary
In a more futuristic setup, these will be accomplished through natural human-oriented I/O

4 Issues to Resolve
Imprecision of users' queries
  o highlight of football match last night?
Extraction of semantic contents of video:
  o Multi-modality
  o Multi-sources
Segmentation of news video into story units with genre classifications
Summarization of info for viewing at different levels of detail

5 What Kinds of Data Do We Have?
Most research in the past has looked into only one source
  o Example: video and its accompanying audio track, plus ASR
In most real-life applications, information is readily available from multiple sources:
  o Broadcast news: video and audio
  o Web-based news articles (by news stations)
  o On-line wired news (by news agencies)
  o Other general resources: ontologies, dictionaries, etc.
Other types of info increasingly used in the IR community:
  o User models: query logs, user profiles, etc.
A challenge in developing usable systems: how to use these available data effectively, perhaps in a co-training/testing type of framework?
Ignoring these obvious data resources will result in unsatisfactory solutions.

6 Outline of Our Approach
In this talk, I will describe our approach to developing systems that handle large-scale video corpora (TREC video)
Sources of data used:
  o News video itself: visual and audio features, ASR
  o External sources: on-line news articles of the same period
  o General resources: ontology of countries, dictionary (WordNet)
Approach (see architecture):

7 System Architecture of VideoQA: Overview of QA on News Video
[Figure: six-stage architecture diagram, Stage 1 through Stage 6]

8 Outline of Talk Introduction and Motivation News Video Processing & Story Segmentation Video Transcript Correction Question-answering on News Video Results Conclusion

9 Video Story Segmentation for News Video
First basic problem: break the news video into meaningful units based on stories.
Issues:
  o How to classify shots into the correct class/category?
  o How to detect story boundaries?
Most news programs adopt a structure similar to CNN's (?):
Intro -> News -> Com1 -> News -> Finance -> News -> Com2 -> Sports -> News -> Weather

10 Video Story Segmentation for News Video (2)
To help alleviate the estimation problem in statistical learning, we adopt a two-stage process:
  o Stage 1: Shot classification
  o Stage 2: Scene segmentation and classification
The set of features considered:
  o Visual (color histogram, background change)
  o Temporal (motion activity, audio type, shot duration, speaker change)
  o Mid-level (number of faces, shot type, number of text lines and text position, cue phrases)

11 Stage 1: Shot Classification
Divide video sequence into shots
Consider 13 categories of shots:
  o Intro/Highlight
  o Anchor; 2-Anchor; Meeting; Speech
  o Still image shot; Text Scene
  o Sports; Live reporting
  o Finance; Weather; Commercial; Special
Perform classification using Decision Tree (SEE 6.0)

12 Stage 2: Scene Detection
Employ a Hidden Markov Model (HMM) to detect story boundaries
Features (sequence-level features) used at this stage:
  o Shot classes: shot tags
  o Scene change [c/u]
  o Speaker change [c/u]
  o Cue phrases at the beginning of new stories
Input to HMM: [1cc 1uu 1cu .. 2cc 4c 4uu 6uu 6uu .... 2cc ....]
Tested on 120 hours of TREC video, achieving an F1 measure of around 76% in story segmentation
TREC data may be downloadable from the TREC web sites later (?)
(Chaisorn & Chua et al., ICME02, WWW Journal02, TREC03)
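
A minimal sketch of how HMM decoding could mark story boundaries from a sequence of shot tokens, written here as plain Viterbi decoding. The token strings copy the notation of the HMM input on the slide above; the two hidden states ("B" for a shot that begins a story, "I" for a shot inside a story) and every probability are invented for illustration, since the slide gives neither the model topology nor its parameters.

```python
import math

# Hypothetical two-state model: "B" = shot begins a story, "I" = shot inside a story.
STATES = ("B", "I")
START = {"B": 0.9, "I": 0.1}
TRANS = {"B": {"B": 0.1, "I": 0.9}, "I": {"B": 0.3, "I": 0.7}}
# Emission probabilities over shot tokens (illustrative values only).
EMIT = {
    "B": {"1cc": 0.6, "1uu": 0.1, "6uu": 0.1, "4uu": 0.2},
    "I": {"1cc": 0.05, "1uu": 0.25, "6uu": 0.4, "4uu": 0.3},
}

def viterbi(tokens):
    """Return the most likely hidden state sequence for the shot tokens."""
    # Work in log space to avoid underflow on long sequences.
    score = {s: math.log(START[s]) + math.log(EMIT[s][tokens[0]]) for s in STATES}
    back = []
    for tok in tokens[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            score[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][tok])
            pointers[s] = best
        back.append(pointers)
    # Backtrack from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

shots = ["1cc", "6uu", "6uu", "4uu", "1cc", "6uu"]
labels = viterbi(shots)
boundaries = [i for i, s in enumerate(labels) if s == "B"]
print(labels, boundaries)
```

With these toy parameters the decoder places a story boundary at each "1cc" shot; a real system would estimate the parameters from annotated news video.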

13 Outline of Talk Introduction and motivation News Video Processing & Story Segmentation Video Transcript Correction Question-answering on News Video Results Conclusion

14 Text Transcript: from Speech to Text
Need an accurate transcript for QA
  o Not a problem for document or story retrieval
Performance of speech recognition systems
  o Accuracy about 80% for news
  o Most errors are in named entities, which are likely answer targets (ATs)
  o Most such errors are of the substitution type (homonym problem)
  o Examples: "pneumonia" -> "new area"; "Tony Blair" -> "Teddy Bear"
How to correct errors in ATs? Use phonetic sound matching to correct the errors
  o May use a confusion matrix, as used successfully in spoken document retrieval
  o Problem: low precision, since a phrase matches many irrelevant phrases
One solution: limit the scope of the phonetic sound match
  o By utilizing on-line text news of the same period (extract base noun phrases and named entities), which is a reasonable restriction

15 Use of External Resource to Correct Speech Errors
Extract all ATs from on-line news articles: $A_i = (a_{i1}, \ldots, a_{iq})$
Given video transcript $T_i$ with a list of terms $(t_{i1}, \ldots, t_{ip})$
The basic problem is then to select an $a_{ik} \in A_i$ to replace a sequence of terms $s_j$ in $T_i$ that maximizes the probability:
[equation missing from transcript]
where $s_j$ contains one or more consecutive terms in $T_i$
Basic idea: use co-occurrence probabilities and phonetic matching to find the most likely $a_{ik} \in A_i$ to replace the sequence of terms $s_j$ in $T_i$:
  o Extract list of probable ATs using co-occurrence probabilities
  o Matching at phonetic syllable level
  o Matching at confusion syllable string level
(see Wang & Chua, ACL03)

16 Outline of Talk Introduction and Motivation News Video Processing & Story Segmentation Video Transcript Correction Question-answering on News Video Results Conclusion

17 System Architecture of VideoQA: Overview of QA on News Video
(Similar to our text-based QA work: Yang & Chua, SIGIR03)

18 Question Processing
Users typically issue short queries (several keywords):
  o development in North Korea
  o match last night
  o Query is ambiguous!
Analyze the query to extract:
  o Key terms in query
  o Likely answer target
  o NP & NE in query
  o Type of video genre
  o Temporal constraint
  o Duration constraint
Example: football match last night?
  o Key terms: football, match
  o Likely answer target: football team (ORG-NAME)
  o NP & NE: football match
  o Video genre: SPORTS
  o Temporal constraint: LAST-NIGHT
  o Duration constraint: 30 seconds (default)
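
The query analysis on the slide above can be pictured with a small rule-based sketch. The lookup tables (`STOPWORDS`, `TEMPORAL`, `GENRE_CUES`) and the function name `analyze_query` are hypothetical; the slide does not say how the real system derives key terms, genre, or constraints. Only the 30-second default duration and the expected output for the football example come from the slide.

```python
import re

# Hypothetical lookup tables, invented for this sketch.
STOPWORDS = {"the", "of", "in", "on", "a", "an", "is", "what", "are"}
TEMPORAL = {"last night": "LAST-NIGHT", "today": "TODAY", "tomorrow": "TOMORROW"}
GENRE_CUES = {"football": "SPORTS", "match": "SPORTS", "weather": "WEATHER", "gdp": "FINANCE"}
DEFAULT_DURATION_S = 30  # default duration constraint given on the slide

def analyze_query(query):
    """Split a short query into key terms, video genre, and constraints."""
    text = query.lower().strip(" ?!.")
    temporal = None
    for phrase, tag in TEMPORAL.items():
        if phrase in text:
            temporal = tag
            text = text.replace(phrase, " ")  # temporal words are not key terms
    words = [w for w in re.findall(r"[a-z]+", text) if w not in STOPWORDS]
    genres = sorted({GENRE_CUES[w] for w in words if w in GENRE_CUES})
    return {
        "key_terms": words,
        "genre": genres,
        "temporal": temporal,
        "duration_s": DEFAULT_DURATION_S,
    }

result = analyze_query("football match last night?")
print(result)
```

Answer-target typing (ORG-NAME in the slide's example) and noun-phrase extraction are left out; they would need a tagger or chunker that this sketch does not assume.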

19 Query Reinforcement
The query, however, is ambiguous!
  o Use on-line news articles to provide the context (user independent)
Basic idea: given original query $q^{(0)}$:
  o Use the web (or news sites) and a dictionary (WordNet)
  o Find terms (from web articles) that co-occur frequently with $q^{(0)}$
  o Extract semantically related terms from WordNet
  o Add high-probability terms to $q^{(0)}$ to get $q^{(1)}$
Expect $q^{(1)}$ to contain more context terms than $q^{(0)}$
  o For the football example, we expect $q^{(1)}$ to also contain terms like: arsenal, inter milan, soccer, etc. (the big match last night)

20 Query Reinforcement: Another Example
$q^{(0)}$ = What are the symptoms of atypical pneumonia?
$q^{(1)}$ = symptoms, pneumonia, virus, spread, fever, cough, breath, doctor
Use $q^{(1)}$ to retrieve a list of news transcripts at story level

21 Candidate Sentence Extraction
For the retrieved transcript $T_i$, we select sentences $Sent_{ij}$ that best match the user query, based on:
  o noun phrases, $w_{nj}$
  o named entities, $w_{hj}$
  o original query words $q^{(0)}$, $w_{cj}$
  o expanded query words $q^{(1-0)} = q^{(1)} - q^{(0)}$, $w_{ej}$
  o video genre, $w_{vj}$
Final score is: [equation missing from transcript], where $\sum_k \alpha_k = 1$ and $w_{kj} = \{w_{nj}, w_{hj}, w_{cj}, w_{ej}, w_{aj}, w_{vj}\}$
The top K sentences are selected as the candidate answer sentences based on $S_{ij}$
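
The scoring formula itself did not survive in this transcript. As an assumption only, the sketch below treats the final score as a weighted linear sum of the six feature weights, with mixing coefficients that sum to 1 as the slide requires. The coefficient values, the feature names, and the helper names `score_sentence` and `top_k` are all hypothetical.

```python
# Hypothetical mixing coefficients alpha_k; the only constraint taken from the
# slide is that they sum to 1.
ALPHA = {
    "noun_phrase": 0.2,    # w_nj
    "named_entity": 0.2,   # w_hj
    "original": 0.2,       # w_cj
    "expanded": 0.1,       # w_ej
    "answer_target": 0.2,  # w_aj
    "genre": 0.1,          # w_vj
}

def score_sentence(features):
    """Assumed form of S_ij: a weighted sum of per-feature match weights."""
    return sum(ALPHA[k] * features.get(k, 0.0) for k in ALPHA)

def top_k(sentences, k):
    """Return the k highest-scoring candidate sentences."""
    ranked = sorted(sentences, key=lambda s: score_sentence(s["features"]), reverse=True)
    return ranked[:k]

candidates = [
    {"id": "S1", "features": {"expanded": 0.25}},
    {"id": "S2", "features": {"noun_phrase": 1.0, "named_entity": 1.0, "original": 1.0,
                              "expanded": 0.5, "answer_target": 1.0, "genre": 1.0}},
    {"id": "S3", "features": {}},
]
best = top_k(candidates, 1)[0]
print(best["id"], round(score_sentence(best["features"]), 2))
```

Any monotone combination of the same features would rank sentences in a similar way; the linear form is chosen here only because the sum-to-one constraint suggests it.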

22 Outline of Talk Introduction News Video Processing & Story Segmentation Video Transcript Correction Question-answering on News Video Results Conclusion

23 Results
Use 7 days of CNN news video from 13-19 Mar 2003
  o Contained a total of 350 minutes of news video
  o Retrieved about 600 news articles per day from the Alta Vista news web site during these 7 days
Designed 40 factoid questions
  o 28 general questions that are asked every day
  o 12 questions are date-specific
  o Giving a total of 208 questions
Transcript               | Correct Answers | Accuracy
without error correction | 116             | 55.8%
with error correction    | 153             | 73.6%
(To be presented at ACM Multimedia 03)
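
The reported figures can be checked with a few lines of arithmetic. The sketch assumes that each general question is asked once on each of the 7 days and each date-specific question once, which is the reading that reproduces the slide's total of 208.

```python
general, date_specific, days = 28, 12, 7

# Each general question is asked on every day; date-specific ones once each.
total = general * days + date_specific

def accuracy(correct):
    """Percentage of correctly answered questions, to one decimal place."""
    return round(100 * correct / total, 1)

print(total, accuracy(116), accuracy(153))
```

Under that assumption the accuracies match the table: 116 and 153 correct answers out of 208 questions.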

24 Results: Example
Query: What are the symptoms of atypical pneumonia?
The 3-sentence window selected by the QA engine is:
  o S1: He and his two companions are now in isolation and the one hundred and fifty five passengers on the flight were briefly quarantined.
  o S2: Symptoms include high fever, coughing, shortness of breath and difficulty breathing.
  o S3: But health officials say there's no reason to panic.
The video summary example (4 shots) is: [Figure: four video shots]

25 Outline of Talk Introduction News Video Processing & Story Segmentation Video Transcript Correction Question-answering on News Video Results Conclusion

26 Related Work
Research in correcting speech recognition errors (ACL03, EMNLP02)
News story and dialogue segmentation (Columbia U) (ICME03, ACL03)
Question-answering in text (TREC02, SIGIR03)
Informedia Project
  o Uses multi-modality features effectively, esp. speech
  o Insufficient emphasis on external resources
Work on Video-TREC: large-scale testing
Collaboration with Ramesh Jain (Georgia Tech) as part of Video Tagging Project
  o Employ TV-Anytime metadata for news (in collaboration with ETRI Korea)
  o Automatic tagging of TV-Anytime metadata, and use it as basis for video QA

27 Summary
Work is preliminary
  o Many processes need to be automated
Participating in this year's Video-TREC, testing on large-scale corpora (120 hours of news video)
  o On both story segmentation and retrieval
Experience:
  o Story segmentation: content features are important; text or ASR features are less important
  o Retrieval: text or ASR is important; content features help in enhancing precision
Current work:
  o Build an appropriate meta model to encode domain knowledge
  o Use higher-order statistics to analyze data
KEY MESSAGE: must incorporate a domain model and utilize multi-modality, multi-source information

28 THANK YOU

29 Question classification and possible video genres
Answer Target | Likely Video Genre | Example
Human | Anchor, Meeting, Speech, General-news | Who is the Secretary of State of the United States?
Location | Live report, Anchor, General-news | Where is Saddam Hussein hiding?
Organization | Live report, Anchor | Which hospital is the center for SARS treatment in Singapore?
Time | Anchor, General-news | When did the Iraq war start?
Number | Finance | What is the expected GDP of Singapore this year?
Number | Sports, Text-scene | How many points did Yao Ming score?
Number | Weather, Text-scene | What is the highest temperature tomorrow?
Object | Anchor, Still-image, Text-scene | Which kinds of bombs are used in the current Iraq war?
Description | Anchor, Text-scene | What does SARS stand for?

30 Question analysis
Question | What is the score of the football match last night? | What are the symptoms of atypical pneumonia?
$q^{(0)}$ | score, football, match, last, night | symptoms atypical pneumonia
n | football match, last night | symptom, atypical pneumonia
h | football | atypical pneumonia
Answer Target | Number | Description
Video Genre | Sports, Text-scene | General News

31 List of Questions
1. Who is the British Prime Minister?
2. Who is elected to be China's President?
3. Who is the President of the United States?
4. What is the name of the former Premier of China?
5. What is the name of the new Premier of China?
6. Who will pay the heaviest tallies?
7. Who was arrested in Pakistan?
8. Which musician called off his US tour?
9. When will NASA resume shuttle flights?
10. When will Germany, France and Russia meet?
11. When is the funeral of DjinDjic?
12. Which are the three countries involved in the summit today?
13. Where was the summit held?
14. Which city is the capital of Central African Republic?
15. Which are the three major war opponent countries?
16. To whom US withdrew the aid offer?
17. Which country vowed to veto the resolution today?
18. Which country's compromise proposal was rejected by US?
19. Where is Kashmir Hotel?
20. Where did Iraq invite the chief weapons inspectors to?

32 List of Questions (cont.)
21. Which city has the largest anti war demonstration?
22. Where did a AL QUEDA suspect arrested?
23. How many people attended the rally in San Francisco?
24. What is the cost of war?
25. How many people were killed in a Kashmir Hotel?
26. How many people participated in the rally in Madrid?
27. How many people were killed by the new pneumonia?
28. What are the symptoms of the atypical pneumonia?
29. What sanction did President Bush lift?
30. What was the name of the space shuttle broken apart in February?
31. Which rally shows the support for President Bush?
32. What is the official name for the mysterious pneumonia?
33. Which company tests their new passenger profiling system?
34. Name one Jewish holiday.
35. What is British stance?
36. How did Serbs Prime Minister die?
37. How is the anti-war protest in Madrid?
38. How is tomorrow's weather?
39. What is the conflict between US and Turkey?
40. What does the WHO call the new pneumonia?

33 Some Remarks on Story Segmentation Task
Our 2-stage approach helps alleviate the statistical estimation problem: it requires less training data
Similar work done at Columbia U
  o Using maximum entropy method
  o For video segmentation (ICME03) and dialogue segmentation (ACL03)
  o Achieves similar performance
Our current work:
  o Integration of multiple machine learning methods: HMM, ME, heuristic rule methods, and co-training approach
  o Fusion of multiple modal features: visual/audio features, text (speech to text), meta-data + domain knowledge
  o Note: using only the text feature (ASR) performs badly

34 Multi-tier Mapping (Wang, Chua, ACL03)
We perform matching at 2 levels to find the most likely $a_{ik} \in A_i$ to replace the sequence of terms $s_j$ in $T_i$: a) phonetic syllable level; b) confusion syllable string level
At each level, we compute Recall and Precision [equations missing from transcript], where:
  o $LCS(q_i, c_j)$ gives the longest common subsequence (LCS) match between $a_{ik}$ and $s_j$ at phonetic syllable level in the order of their occurrence
  o $M_k = I$ for Level a and b matches, and $M_k$ = coefficients of the confusion matrix for Level c matches
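
A minimal sketch of the LCS-based match named on the slide above. The recall and precision definitions used here (LCS length divided by the candidate length and by the transcript span length, respectively) are assumptions, since the slide's formulas are not in the transcript. The syllable strings are hand-written for the "pneumonia" versus "new area" example and are not the output of any real phonetic converter; the $M_k$ weighting is omitted.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def match_scores(candidate, span):
    """Assumed definitions: recall against the candidate AT, precision against the span."""
    common = lcs_length(candidate, span)
    return common / len(candidate), common / len(span)

# Hypothetical syllable sequences for a candidate AT and a misrecognized span.
candidate = ["nu", "mo", "ni", "a"]   # "pneumonia"
span = ["nu", "e", "ri", "a"]         # "new area"
recall, precision = match_scores(candidate, span)
print(recall, precision)
```

Restricting `candidate` to answer targets extracted from same-period news articles is what keeps the number of such comparisons, and the number of false matches, small.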

35 Query Reinforcement
The query, however, is ambiguous!
  o Use on-line news articles to provide the context (user independent)
Basic idea: given original query $q^{(0)}$:
  o Go to the web (or news sites) to retrieve the top N documents
  o Extract terms with high co-location probabilities with $q^{(0)}$: $C_q$
  o Extract semantically related terms from WordNet: $G_q$ and $S_q$
  o Extra terms to be added: $K_q = C_q +$ ($G_q$ [operator missing from transcript] $S_q$)
  o $q^{(1)} = q^{(0)} +$ {top m terms of $K_q$ with weights $\ge \sigma$}
Expect $q^{(1)}$ to contain more context terms than $q^{(0)}$
  o For the football example, expect $q^{(1)}$ to also contain terms like: real madrid, manchester united, soccer
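
A rough sketch of the co-occurrence part of query reinforcement. It assumes a simple weight, the fraction of query-matching documents that contain a term, in place of the co-location probability, which the slide does not define. The WordNet-derived sets are left out entirely. The function name `reinforce_query`, the toy documents, and the default values of `m` and `sigma` are hypothetical.

```python
from collections import Counter

def reinforce_query(q0, documents, m=3, sigma=0.5, stopwords=frozenset()):
    """Return q1 = q0 plus up to m context terms with weight >= sigma."""
    q0_set = set(q0)
    counts = Counter()
    matching = 0
    for doc in documents:
        tokens = set(doc.lower().split())
        if q0_set & tokens:  # document mentions at least one query term
            matching += 1
            counts.update(tokens - q0_set - stopwords)
    if matching == 0:
        return list(q0)
    # Assumed weight: share of matching documents that contain the term.
    weighted = [(count / matching, term) for term, count in counts.items()]
    weighted = [(w, t) for w, t in weighted if w >= sigma]
    weighted.sort(key=lambda wt: (-wt[0], wt[1]))  # heaviest first, ties by name
    return list(q0) + [t for _, t in weighted[:m]]

docs = [
    "arsenal won the football match",
    "football fans watched arsenal play soccer",
    "stock markets fell sharply",
    "the soccer match ended late",
]
q1 = reinforce_query(["football", "match"], docs, stopwords=frozenset({"the"}))
print(q1)
```

On this toy corpus only "arsenal" and "soccer" clear the threshold, which mirrors the kind of context terms the slide expects for the football query.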

36 Query Reinforcement: Another Example
$q^{(0)}$ = What are the symptoms of atypical pneumonia?
$q^{(1)}$ = symptoms, pneumonia, virus, spread, fever, cough, breath, doctor
Use $q^{(1)}$ to retrieve a list of news transcripts at story level

