Xiang Li, Lili Mou, Rui Yan, Ming Zhang

StalemateBreaker: A Proactive Content-Introducing Approach to Automatic Human-Computer Conversation
Xiang Li (1), Lili Mou (1), Rui Yan (2), Ming Zhang (1)
(1) School of EECS, Peking University, China
(2) Natural Language Processing Department, Baidu Inc., China

Human-computer Conversation
  One of the most challenging problems in artificial intelligence
  The computer either searches for or synthesizes a reply to an utterance (called a query) issued by a user
  Industrial products include Apple's Siri, Microsoft's Xiaobing, and Baidu's Dumi

Passive or Proactive
  Passive: the system only "responds" to the user
  Proactive: the system introduces new content when a stalemate occurs
  Human-human conversation statistics: [figure not preserved in the transcript]

Mixed-initiative Systems in Vertical vs. Open Domains
  Vertical domain
    Train95, AutoTutor, etc.
    Feasible to manually design rules and templates
    The content to be introduced is nearly certain
  Open domain
    Users are free to say anything
    Impossible to specify rules/templates in advance

Contributions
  The first to address the problem of content introducing in open-domain conversation systems
  A complete pipeline covering when, what, and how to introduce new content
  The Bi-PageRank-HITS reranking algorithm, which emphasizes rich interaction between the conversation context and candidate replies

Architecture
  Stalemate detection
    Keyword filters for fillers like "Err", "Errr", etc.
  Named entity detection
    Named entities reflect users' interests
  Candidate reply retrieval
    Retrieve candidate replies by entities and conversation context
  Selection by reranking
    Candidate replies are reranked by a random-walk-like algorithm
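The stalemate-detection stage can be illustrated with a small keyword filter. This is only a sketch: the slides name "Err" and "Errr" as examples, so the regular expression `FILLER_PATTERN` and the function `is_stalemate` below are hypothetical names and an assumed rule, not the authors' implementation.

```python
import re

# Assumed filler rule: one or more "e" followed by one or more "r",
# optionally trailed by punctuation. The slides only mention "Err" and
# "Errr" as examples; a real system would likely use a larger list.
FILLER_PATTERN = re.compile(r"^\s*e+r+\s*[.!?]*\s*$", re.IGNORECASE)


def is_stalemate(utterance: str) -> bool:
    """Return True when the whole utterance is just a filler such as "Err"."""
    return bool(FILLER_PATTERN.match(utterance))
```

In such a design, a True result would trigger the proactive branch (entity detection, retrieval, reranking), while any other utterance would go through the ordinary passive retrieval system.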

Process Flow
  The system is built upon a conventional retrieval-based conversation system, which is typically passive

Reranking Algorithm
  Two factors: the importance of each query or reply, and the interaction between queries and replies
  Bi-PageRank-HITS: a combination of PageRank and HITS
  Formulate queries and replies as a bipartite graph
  Alternate between the following two steps
    PageRank: rank one side of the bipartite graph
    HITS: propagate information to the other side

Reranking Algorithm
  PageRank step: [equation not preserved in the transcript]
    [·]: column normalization
    M: similarity matrix of either queries or replies
    x, y: prior distributions over queries and replies, uniformly initialized and updated after each HITS step
  HITS step: [equation not preserved in the transcript]
    x: query scores, y: reply scores (see also the PageRank step)
  The weight matrix is updated according to the PageRank scores, given by [equation not preserved in the transcript], where φ(·,·) is a static, text-based relevance score

Reranking Algorithm
  Global iteration over PageRank and HITS
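The alternation between the PageRank and HITS steps can be sketched in plain Python. The exact update equations are not present in this transcript, so everything specific below is an assumption made for illustration: the damping factor, the weight rule `phi[i][j] * pq[i] * pr[j]`, the sum-to-one normalization, and all function names. It shows one plausible reading of "global iteration over PageRank and HITS", not the authors' formulas.

```python
def column_normalize(m):
    """Scale each column to sum to 1; an all-zero column becomes uniform."""
    n_rows, n_cols = len(m), len(m[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        total = sum(m[i][j] for i in range(n_rows))
        for i in range(n_rows):
            out[i][j] = m[i][j] / total if total > 0 else 1.0 / n_rows
    return out


def normalize(v):
    """Scale a non-negative vector to sum to 1 (uniform if it is all zeros)."""
    total = sum(v)
    return [a / total for a in v] if total > 0 else [1.0 / len(v)] * len(v)


def pagerank(sim, prior, damping=0.85, iters=50):
    """PageRank over one side of the graph, with `prior` as the jump target."""
    m = column_normalize(sim)
    n = len(prior)
    p = list(prior)
    for _ in range(iters):
        p = [
            damping * sum(m[i][j] * p[j] for j in range(n))
            + (1.0 - damping) * prior[i]
            for i in range(n)
        ]
    return p


def bi_pagerank_hits(sim_q, sim_r, phi, global_iters=5, tol=1e-6):
    """Alternate PageRank (within each side) and HITS (across sides).

    sim_q, sim_r: similarity matrices among queries and among replies.
    phi: static query-to-reply relevance scores, shape len(sim_q) x len(sim_r).
    Returns (x, y): final query scores and reply scores.
    """
    nq, nr = len(sim_q), len(sim_r)
    x = [1.0 / nq] * nq  # uniform prior over queries
    y = [1.0 / nr] * nr  # uniform prior over replies
    for _ in range(global_iters):
        # PageRank step: rank each side using the current priors.
        pq = pagerank(sim_q, x)
        pr = pagerank(sim_r, y)
        # Assumed weight update: static relevance scaled by both PageRank scores.
        w = [[phi[i][j] * pq[i] * pr[j] for j in range(nr)] for i in range(nq)]
        # HITS step: propagate query scores to replies, then back to queries.
        new_y = normalize([sum(w[i][j] * x[i] for i in range(nq)) for j in range(nr)])
        new_x = normalize([sum(w[i][j] * new_y[j] for j in range(nr)) for i in range(nq)])
        delta = max(abs(a - b) for a, b in zip(new_x + new_y, x + y))
        x, y = new_x, new_y  # updated scores become the next priors
        if delta < tol:
            break
    return x, y
```

Calling `bi_pagerank_hits(sim_q, sim_r, phi)` with small hand-built matrices returns two score lists that each sum to 1; sorting the reply indices by `y` in descending order gives the reranked candidate list. The default of five global iterations is a placeholder choice for this sketch.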

Evaluation
  Dataset: sessions from real-world user conversation logs
  Human evaluation: 1 point = good; 0 points = not good

Analysis
  Parameter: textual information for queries does help identify important queries. In contrast, textual information for replies is harmful.
  Convergence: the 10 randomly chosen samples typically converge quickly, within 3 to 5 global iterations.

Case Study

Thanks for Listening Q & A