Automatic Video Tagging using Content Redundancy Stefan Siersdorfer 1, Jose San Pedro 2, Mark Sanderson 2 1 L3S Research Center, Germany 2 University of.

Presentation transcript:

Automatic Video Tagging using Content Redundancy
Stefan Siersdorfer (1), Jose San Pedro (2), Mark Sanderson (2)
(1) L3S Research Center, Germany; (2) University of Sheffield, UK
SIGIR
Summarized and presented by Hwang Inbeom, IDS Lab., Seoul National University

Copyright © 2009 by CEBT
Large Amount of Data on YouTube
- Traffic to/from YouTube accounts for over 20% of the web total
  - Comprising 60% of videos watched online
- The amount of data is growing beyond human perception
- Effective knowledge mining and retrieval tools are needed

Knowledge Mining and Retrieval
- Making use of human annotation: folksonomy
  - Provides relevant results at a relatively low cost
  - Applications
    - Topic detection and tracking
    - Information filtering
    - Document ranking
    - Etc.
- Content-based retrieval techniques, by contrast, are not yet mature enough
  - Folksonomy-based techniques outperform content-based techniques

Problem: Poorly Annotated YouTube Videos
- Videos are hard to annotate
  - Intellectually expensive process
  - Time-consuming job
- Low-quality tags
  - Often very sparse
  - Lack consistency
  - Present numerous irregularities
- Retrieval and knowledge extraction relying on textual features are therefore difficult to provide

Motivation
- Significant amount of near-duplicate videos
  - Over 25% near-duplicate videos detected in search results
  - Usually regarded as a problem of online video
- The authors treat this redundancy as a feature instead
  - A link between two different videos
  - Redundancies are exploited to obtain richer video annotations

PageRank-like Graph of Videos

PageRank-like Graph of Videos (contd.)
- Overlap graph G_O = (V_O, E_O)

Edge in Graph
- An edge means that videos i and j share redundant visual information
- Three types of links
  - Duplicate videos
  - Part-of relationship
  - Overlapping
[Figure: video i and video j]

Related Work: VisualRank (WWW 2008)
- Builds a graph of images using the visual similarity between two images
- Runs the PageRank algorithm to re-rank images

Automatic Tagging
- A different approach from that of VisualRank
  - Aims to enrich annotations
  - Not to improve search results
- Three methods
  - Simple neighbor-based tagging
  - Overlap redundancy aware tagging
  - TagRank: context-based tag propagation in video graphs

Simple Neighbor-based Tagging
- Transforms G_O into the directed graph G'_O = (V'_O, E'_O) of overlapping videos
- The weighting function w(v_i, v_j) of an edge (i, j) describes to what degree video j is covered by video i
[Figure: video i and video j joined by directed edges weighted w(v_i, v_j) and w(v_j, v_i)]
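
The edge weighting described on this slide can be illustrated with a minimal Python sketch. It assumes that overlap is measured as shared playing time in seconds and that the weight is the shared time divided by the duration of the covered video; the slides only state that the weight expresses how much of video j is covered by video i, so the concrete formula, the function names, and the sample durations are all hypothetical.

```python
def coverage_weight(overlap_seconds: float, target_duration: float) -> float:
    """Fraction of the target video that is covered by the overlap (assumed definition)."""
    if target_duration <= 0:
        raise ValueError("target_duration must be positive")
    return min(overlap_seconds / target_duration, 1.0)


def directed_edges(overlaps, durations):
    """Turn undirected overlaps {(i, j): seconds} into directed weights {(i, j): w}."""
    w = {}
    for (i, j), seconds in overlaps.items():
        w[(i, j)] = coverage_weight(seconds, durations[j])  # how much of j is covered by i
        w[(j, i)] = coverage_weight(seconds, durations[i])  # how much of i is covered by j
    return w


# Hypothetical data: video "a" lasts 100 s, video "b" lasts 50 s, and they share 25 s.
durations = {"a": 100.0, "b": 50.0}
overlaps = {("a", "b"): 25.0}
weights = directed_edges(overlaps, durations)
print(weights)
```

Under these assumptions the same overlap yields two different weights, which is why the undirected overlap graph has to become a directed one: half of the short video is covered, but only a quarter of the long one.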

Simple Neighbor-based Tagging (contd.)
- Gets tag t's relevance score for a video from the information of adjacent videos
  - Weighted sum of the influences of overlapping videos tagged with t
  - Counts only adjacent videos' tags
[Equation not preserved in the transcript; the surviving fragment is a case distinction: "if v_j is tagged with t" / "otherwise"]
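
Since the formula itself did not survive in the transcript, the following Python sketch is only one plausible reading of the bullets: the score of tag t for a video is the sum of the weights of incoming edges from neighbors that carry t, with the case distinction acting as a 0/1 indicator. The edge direction, the function name, and the sample data are assumptions.

```python
def neighbor_relevance(video, tag, weights, tags):
    """Sum the weights of edges (src -> video) whose source video is tagged with `tag`."""
    score = 0.0
    for (src, dst), w in weights.items():
        # Indicator: the neighbor contributes only if it is tagged with `tag`.
        if dst == video and tag in tags.get(src, set()):
            score += w
    return score


# Hypothetical overlap graph: "a" and "b" both overlap with the untagged video "c".
weights = {("a", "c"): 0.5, ("b", "c"): 0.25, ("c", "a"): 0.75}
tags = {"a": {"concert"}, "b": {"concert", "live"}, "c": set()}

print(neighbor_relevance("c", "concert", weights, tags))
print(neighbor_relevance("c", "live", weights, tags))
```

Only direct neighbors are inspected, which matches the bullet stating that only adjacent videos' tags are counted.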

An Example
[Figure: example overlap graph with videos tagged t, showing t's relevance score]

Overlap Redundancy Aware Tagging
- The relevance score can increase sharply if a video has multiple redundant overlaps
- The contribution of a repeated tag is therefore reduced by a relaxation parameter
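
The slides do not give the damping formula, so the Python sketch below is purely illustrative. It assumes that the contributions for a tag are sorted in descending order and that the contribution at rank k is multiplied by alpha to the power k, where alpha is a hypothetical relaxation parameter in (0, 1]. This shows the general idea of discounting repeated evidence and is not necessarily the authors' definition.

```python
def redundancy_aware_relevance(video, tag, weights, tags, alpha=0.5):
    """Sum the neighbor contributions for `tag`, damping each further one by alpha**rank."""
    contributions = sorted(
        (w for (src, dst), w in weights.items()
         if dst == video and tag in tags.get(src, set())),
        reverse=True,  # the strongest overlap counts in full (alpha**0 == 1)
    )
    return sum(w * alpha ** rank for rank, w in enumerate(contributions))


# Hypothetical data: two neighbors of "c" both carry the tag "concert".
weights = {("a", "c"): 0.5, ("b", "c"): 0.25}
tags = {"a": {"concert"}, "b": {"concert"}, "c": set()}

damped = redundancy_aware_relevance("c", "concert", weights, tags, alpha=0.5)
undamped = redundancy_aware_relevance("c", "concert", weights, tags, alpha=1.0)
print(damped, undamped)
```

With alpha set to 1 the sketch falls back to the plain weighted sum of simple neighbor-based tagging; smaller values make each additional redundant overlap count less.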

TagRank
- Tag weight propagates through the overlap graph
- Relevance scores are computed in matrix form
  - TR converges to a fixed value, which is found with the power iteration method
  - Power iteration starts from the original tagging information and runs for a limited number of iterations
    - To keep the original tag relevance
    - To prevent TR(t) from converging to a uniform value
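
The propagation idea can be sketched in Python as a truncated power iteration. The slides do not show the actual matrix, so this sketch assumes the simplest update: every video's new score for tag t is the weighted sum of the scores of the videos pointing to it, starting from 1 for videos originally tagged with t and 0 otherwise. The function name, the update rule, and the sample graph are assumptions.

```python
def tag_rank(tag, videos, weights, tags, iterations):
    """Propagate one tag's scores along weighted edges for a fixed number of steps."""
    # Start from the original tagging information.
    scores = {v: 1.0 if tag in tags.get(v, set()) else 0.0 for v in videos}
    for _ in range(iterations):
        nxt = {v: 0.0 for v in videos}
        for (src, dst), w in weights.items():
            nxt[dst] += w * scores[src]  # one step of a matrix-vector product
        scores = nxt
    return scores


# Hypothetical chain a <-> b <-> c in which only "a" carries the tag.
videos = ["a", "b", "c"]
weights = {("a", "b"): 0.5, ("b", "a"): 0.5, ("b", "c"): 0.5, ("c", "b"): 0.5}
tags = {"a": {"concert"}}

one_step = tag_rank("concert", videos, weights, tags, iterations=1)
two_steps = tag_rank("concert", videos, weights, tags, iterations=2)
print(one_step)
print(two_steps)
```

After one step the tag reaches only the direct neighbor "b"; after two steps it reaches "c", which is two hops away. Each extra iteration spreads the tag further from where it was originally assigned, which is consistent with the slide's point about limiting the number of iterations.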

Evaluation
- Two kinds of evaluation: machine-oriented and human-oriented views
  - Data organization with automatically generated tags
    - Classification
    - Clustering
  - User-based evaluation

Data Collection
- 38,283 videos: initial set C
  - Videos returned for the top 500 general queries
  - Together with the related videos listed with the results
- Redundancy analysis
  - Over 35% of the videos (V_O) overlap with one or more other videos

Data Organization
- Classification with 7 YouTube categories
  - Each contains over 900 videos in V_O
  - Binary classification with SVM
    - Feature vectors constructed from original tags and automatically generated tags
  - Four strategies
    - BaseOrig: only user-provided tags are considered
    - NTag: simple neighbor-based tagging
    - RedNTag: overlap redundancy aware tagging
    - TagRankΓ: TagRank with Γ iterations

Data Organization (contd.)
- Clustering
  - k-Means clustering
  - Partitions videos into k categories
- Neighbor-based tagging and overlap redundancy aware tagging outperform the baseline and the TagRank methods in both experiments

User-based Evaluation
- Assessors rate new tags through a web interface
  - The average rating increases as the tags considered are restricted to those with higher autotag relevance scores

Conclusions
- Content redundancy in social sharing systems can be used to obtain richer annotations
- The additional information obtained by automatic tagging can largely improve the automatic organization of content
  - There is an information gain for users as well
- Future work
  - The authors plan to generalize this work to other domains
    - Photos in Flickr
    - Text in Delicious
  - Analysis and generation of deep tags
    - Tags linked to a small part of a larger media source

Discussion
- Good idea and good formalization
- It would be better if TagRank performed well
  - Considering only neighbors is too naive a method
- How can the overhead of visual processing be dealt with?
- Would the approach scale to all videos on YouTube?