“Artificial Intelligence” in my research Seung-won Hwang Department of CSE POSTECH.


2 Recap
Bridging the gap between under- and over-specified user queries. We went through various techniques to support intelligent querying, learned implicitly or automatically from data, from prior users, from the specific user, and from domain knowledge. My research shares the same goal, with some AI techniques applied (e.g., search, machine learning).

3 The Context: Rank Formulation and Rank Processing
A top-k query (e.g., top-3 houses on realtor.com) returns ranked results:

select * from houses order by [ranking function F] limit 3

Rank formulation produces the ranking function F; rank processing executes the query.

4 Overview
For the same query (select * from houses order by [ranking function F] limit 3; e.g., top-3 houses on realtor.com), two concerns structure this talk:
- Usability: rank formulation
- Efficiency: processing algorithms

5 Part I: Rank Processing
Essentially a search problem (as you studied in AI).

6 Limitation of the Naïve Approach
Algorithm: F = min(new, cheap, large), k = 1; a sort step followed by a merge step.
- new (search predicate) x: a:0.90, b:0.80, c:0.70, d:0.60, e:0.50
- cheap (expensive predicate) p_c: d:0.90, a:0.85, b:0.78, c:0.75, e:0.70
- large (expensive predicate) p_l: b:0.90, d:0.90, e:0.80, a:0.75, c:0.20
The merge step probes every expensive predicate on every object, even though only the top-1 answer b:0.78 is needed. Our goal is to schedule the order of probes to minimize the number of probes.
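As a concrete illustration, here is a minimal Python sketch of the naïve approach using the slide's scores (the probe counter is added for illustration): it probes both expensive predicates for every object before merging.

```python
# Naive top-k with expensive predicates: probe every predicate for
# every object, then sort by the combined score min(x, p_c, p_l).
# Scores follow the slide's example; k = 1.

x   = {"a": 0.90, "b": 0.80, "c": 0.70, "d": 0.60, "e": 0.50}  # search predicate
p_c = {"a": 0.85, "b": 0.78, "c": 0.75, "d": 0.90, "e": 0.70}  # expensive: "cheap"
p_l = {"a": 0.75, "b": 0.90, "c": 0.20, "d": 0.90, "e": 0.80}  # expensive: "large"

def naive_topk(k):
    probes = 0
    scores = {}
    for oid in x:
        probes += 2                      # one probe per expensive predicate
        scores[oid] = min(x[oid], p_c[oid], p_l[oid])
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    return top, scores, probes

top, scores, probes = naive_topk(1)
# b wins with min(0.80, 0.78, 0.90) = 0.78, at a cost of 10 probes
```

All 10 expensive probes are paid even though, as the next slides show, 4 suffice.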

7
Viewed as search: the initial state holds upper bounds from x alone (a:0.9, b:0.8, c:0.7, d:0.6, e:0.5); probing pr(a, p_c) = 0.85 and then pr(a, p_l) = 0.75 refines a's bound; the goal state has b confirmed on top. Under global schedule H(p_c, p_l), the probes on a turn out to be wasted work, since b is the answer: unnecessary probes.

OID | x    | p_c | p_l | min(x, p_c, p_l)
a   | 0.90 |     |     |
b   | 0.80 |     |     |
c   | 0.70 |     |     |
d   | 0.60 |     |     |
e   | 0.50 |     |     |

8 Search Strategies?
- Depth-first
- Breadth-first
- Depth-limited / iterative deepening (try every depth limit)
- Bidirectional
- Iterative improvement (greedy / hill climbing)

9 Best-First Search
Determine which node to explore next using an evaluation function. Evaluation function: explore more on the object with the highest "upper-bound score". We could show that this evaluation function minimizes the number of evaluations, by evaluating only when "absolutely necessary".

10 Necessary Probes?
A probe pr(u, p) is necessary if we cannot determine the top-k answers until pr(u, p) is probed, where u is an object and p a predicate. Let the global schedule be H(p_c, p_l). Current state: x gives a:0.90, b:0.80, c:0.70, d:0.60, e:0.50, and the candidate top-1 is b (0.78). Can we decide the top-1 without probing pr(a, p_c)? No: a's upper bound (up to 0.90) still exceeds 0.78, so pr(a, p_c) is necessary.

11
Probe trace under global schedule H(p_c, p_l):
- Initial upper bounds from x: a:0.9, b:0.8, c:0.7, d:0.6, e:0.5
- pr(a, p_c) = 0.85 -> a:0.85, b:0.8, c:0.7, d:0.6, e:0.5
- pr(a, p_l) = 0.75 -> b:0.8, a:0.75, c:0.7, d:0.6, e:0.5
- pr(b, p_c) = 0.78 -> b:0.78, a:0.75, c:0.7, d:0.6, e:0.5
- pr(b, p_l) = 0.90 -> b:0.78 stays on top, fully probed -> top-1: b (0.78)
Only these four probes are needed; all remaining probes (on c, d, e) are unnecessary.
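The trace above can be reproduced with a small best-first sketch (a hypothetical MPro-style Python rendering, not the paper's implementation): objects sit in a heap keyed by upper-bound score, and only the object currently on top receives its next probe.

```python
import heapq

# Slide's example scores and the global schedule H(p_c, p_l).
x   = {"a": 0.90, "b": 0.80, "c": 0.70, "d": 0.60, "e": 0.50}
p_c = {"a": 0.85, "b": 0.78, "c": 0.75, "d": 0.90, "e": 0.70}
p_l = {"a": 0.75, "b": 0.90, "c": 0.20, "d": 0.90, "e": 0.80}
SCHEDULE = [p_c, p_l]

def best_first_top1():
    probes = 0
    # heap entries: (negated upper bound, oid, predicates probed so far)
    heap = [(-s, oid, 0) for oid, s in x.items()]
    heapq.heapify(heap)
    while True:
        neg_ub, oid, done = heapq.heappop(heap)
        if done == len(SCHEDULE):        # on top and fully probed: the answer
            return oid, -neg_ub, probes
        probes += 1                      # probe only when necessary
        new_ub = min(-neg_ub, SCHEDULE[done][oid])
        heapq.heappush(heap, (-new_ub, oid, done + 1))

oid, score, probes = best_first_top1()
# issues exactly pr(a,p_c), pr(a,p_l), pr(b,p_c), pr(b,p_l): 4 probes, vs 10 naively
```

A probe is issued only for the object whose upper bound tops the heap, which is precisely the "absolutely necessary" condition of the previous slide.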

12 Generalization
Top-k scenarios vary in access cost:
- Random access cost r: r = 1 (cheap), r = h (expensive), r = ∞ (impossible)
- Sorted access cost s: s = 1 (cheap), s = h (expensive), s = ∞ (impossible)
Existing algorithms each cover one corner of this space: FA, TA, QuickCombine; CA, SR-Combine; NRA, StreamCombine; MPro [SIGMOD02/TODS]. Unified top-k optimization targets the whole space [ICDE05a/TKDE].

13 Just for Laughs
Like the strong nuclear, electromagnetic, weak nuclear, and gravitational forces being brought under a unified field theory. (Adapted from Hyountaek Yong's presentation.)

14 A unified cost-based approach subsumes FA, TA, NRA, CA, and MPro.

15 Generality: one algorithm for all, across a wide range of scenarios.

16 Adaptivity: optimal for the specific runtime scenario.

17 Cost-based Approach
Cost-based optimization: finding the optimal algorithm M_opt ∈ M for the given scenario, with minimum cost, from a space of algorithms M.
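One way to picture cost-based optimization, as a hypothetical sketch (the per-probe costs and pruning rates below are invented for illustration, not from the paper): enumerate candidate probe schedules, estimate each one's expected cost, and keep the minimum.

```python
from itertools import permutations

# Hypothetical per-probe costs and pruning power of each predicate.
cost = {"p_c": 1.0, "p_l": 5.0}          # cost of one probe
drop = {"p_c": 0.3, "p_l": 0.8}          # fraction of objects a probe prunes

def expected_cost(schedule, n=100):
    """Estimated total probe cost of a schedule over n objects."""
    total, alive = 0.0, float(n)
    for p in schedule:
        total += alive * cost[p]          # probe every object still alive
        alive *= (1 - drop[p])            # pruned objects need no further probes
    return total

# Search the (tiny) space of schedules for the minimum-cost one.
best = min(permutations(cost), key=expected_cost)
# cheap-but-weak p_c first: 100*1 + 70*5 = 450, vs 100*5 + 20*1 = 520
```

The real optimizer searches a much richer space of algorithms, but the shape is the same: a cost model plus a search for the argmin.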

18 Evaluation: Unification and Contrast (v. TA)
- Unification: for a symmetric function, e.g., avg(p_1, p_2), framework NC behaves similarly to TA.
- Contrast: for an asymmetric function, e.g., min(p_1, p_2), NC adapts with different behaviors and outperforms TA.
(Plots: cost surfaces over depth into p_1 and depth into p_2.)

19 Part II: Rank Formulation
Same setting as before: the query (select * from houses order by [ranking function F] limit 3; e.g., top-3 houses on realtor.com) returns ranked results. Part I addressed efficiency (processing algorithms); this part addresses usability (rank formulation).

20 Learning F from Implicit User Interactions
Using machine learning (which you will learn soon!) to combine a quantitative model for efficiency with a qualitative model for usability.
- Quantitative model: the query condition is represented as a mapping F of objects to absolute numerical scores. DB-friendly, since each object attains an absolute score; e.g., F(·) = 0.9 and F(·) = 0.5 (the example objects are shown as house images in the slide).
- Qualitative model: the query condition is represented as a relative ordering of objects. User-friendly, since it relieves the user from specifying absolute scores; e.g., one house is simply preferred over another.

21 A Solution: RankFP (RANK Formulation and Processing)
For usability, a qualitative formulation front-end enables rank formulation by ordering samples. For efficiency, the result is a quantitative ranking function F that can be processed efficiently.
The loop: given an unordered sample S, the user provides a ranking R* over S; function learning learns a new F; if the ranking R_F induced by F disagrees with R* on S, sample selection generates a new S and the loop repeats; otherwise F is handed to rank processing for Q: select * from houses order by F limit k.

22 Task 1: Ranking -> Classification
Challenge: unlike a conventional learning problem of classifying objects into groups, we must learn a desired ordering of all objects.
Solution: transform ranking into classification on pairwise comparisons [Herbrich00]. Ranking view: c > b > d > e > a. Classification view: pairwise comparisons such as a-b, b-c, c-d, d-e, a-c, labeled + or -, fed to a binary classifier F.
[Herbrich00] R. Herbrich, et al. Large margin rank boundaries for ordinal regression. MIT Press, 2000.

23 Task 2: Classification -> Ranking
Challenge: with the pairwise classification function, we still need to process ranking efficiently.
Solution: develop a duality connecting F also as a global per-object ranking function. Suppose F is linear. Then F(u_i - u_j) > 0 iff F(u_i) - F(u_j) > 0, i.e., F(u_i) > F(u_j). So instead of classifying pairs F(a-b), F(a-c), F(a-d), ..., we rank directly with per-object scores F(·), e.g., F(c) > F(b) > F(d) > ...
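A toy end-to-end sketch of both tasks (the 2-d feature vectors are hypothetical, and a plain perceptron stands in for the large-margin learner of [Herbrich00]): train a linear F on pairwise difference vectors, then exploit the duality to rank objects by per-object scores.

```python
# Hypothetical feature vectors for objects a..e; the target
# ranking c > b > d > e > a comes from the slide.
u = {"a": (0.1, 0.2), "b": (0.8, 0.6), "c": (0.9, 0.9),
     "d": (0.5, 0.6), "e": (0.4, 0.3)}
target = ["c", "b", "d", "e", "a"]

# Task 1 (ranking -> classification): every ordered pair (hi, lo)
# yields a positive example u_hi - u_lo for a binary classifier.
pairs = [(target[i], target[j])
         for i in range(len(target)) for j in range(i + 1, len(target))]

# Train a bias-free perceptron so that w . (u_hi - u_lo) > 0.
w = [0.0, 0.0]
for _ in range(100):
    for hi, lo in pairs:
        diff = [u[hi][k] - u[lo][k] for k in range(2)]
        if sum(w[k] * diff[k] for k in range(2)) <= 0:
            w = [w[k] + diff[k] for k in range(2)]

# Task 2 (classification -> ranking): since F is linear,
# F(u_i - u_j) > 0 iff F(u_i) > F(u_j), so score each object
# once and sort, instead of classifying O(n^2) pairs.
def F(obj):
    return sum(w[k] * u[obj][k] for k in range(2))

learned = sorted(u, key=F, reverse=True)
```

With these (linearly separable) features the perceptron converges and `learned` reproduces the target ordering; the sorting step is the duality in action.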

24 Task 3: Active Learning
Find samples that maximize learning effectiveness:
- Selective sampling: resolving ambiguity
- Top sampling: focusing on top results
Achieves >90% accuracy in at most 3 iterations (within 10 ms).
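Selective sampling can be sketched as picking, among candidate pairs, the one the current model is least certain about, i.e., the difference vector with the smallest margin |w . (u_i - u_j)| (the weights, features, and candidate pairs below are hypothetical):

```python
# Hypothetical current model and feature vectors.
w = (0.1, 0.3)
u = {"a": (0.1, 0.2), "b": (0.8, 0.6), "c": (0.9, 0.9),
     "d": (0.5, 0.6), "e": (0.4, 0.3)}
candidates = [("a", "b"), ("b", "d"), ("c", "e")]   # unlabeled pairs

def margin(pair):
    i, j = pair
    # |w . (u_i - u_j)|: small means the model barely separates the pair
    return abs(sum(w[k] * (u[i][k] - u[j][k]) for k in range(2)))

most_ambiguous = min(candidates, key=margin)   # ask the user about this pair
```

Labeling the smallest-margin pair gives the learner the most informative constraint per question, which is why few iterations suffice.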

25 Using Categorization for Intelligent Retrieval
The category structure is created a priori (typically a manual process); at search time, each search result is placed under its pre-assigned category. Susceptible to skew, and hence to information overload.

26 Categorization: Cost-based Optimization
Categorize results automatically and dynamically: generate a labeled, hierarchical category structure on the fly, based on the contents of the tuples in the result set. This does not suffer from the problems of a-priori categorization.
Contributions:
- Exploration/cost models to quantify the information overload a user faces during exploration
- Cost-driven search to find low-cost categorizations
- Experiments to evaluate the models and algorithms

27 Thank You!