Haystack: Per-User Information Environment
Eytan Adar et al., 1999 Conference on Information and Knowledge Management
Presented by Xiao Hu, CS491CXZ

Outline
- What is Haystack
- Haystack and IR
- Data Model
- System Architecture
- Information Gathering
- Problems

What is Haystack?
- Software for organizing and retrieving personal information
- Totally personalized: one user, one Haystack
- A personal digital bookshelf
- A prototype

Haystack and IR
- IR and IF serve all users or groups of users; Haystack serves a single user
- IR: large corpus; precision-recall metric; “expert” relevance judge
  - Haystack instead: personal collection; the user’s satisfaction; a particular user
- IF (collaborative filtering): preferences of similar users; requires explicit user input
  - Haystack instead: focus on searching; specific to one user; can observe the user’s implicit information needs

Haystack Functionality
- Automated data gathering
  - Information maximization: gathering as much information as possible
- Customized information collection
- Adaptation to individual query needs
- An IR system that adapts to its user?

General Data Model
- Accommodate all information
  - arbitrary pieces of data
  - metadata
  - links between them
- Facilitate data growth
  - new data
  - the user’s annotations
  - the user’s information behavior
- A semantic network, enabling
  - full-text searching
  - bibliographic information searching
  - associative searching
  - adaptation to the user
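One way to picture “associative searching” over a semantic network: find the nodes whose text matches the query, then follow links to nearby nodes. The sketch below is an illustration only, assuming nodes are kept in a plain dictionary of texts and a dictionary of outgoing links; the function name `associative_search` and the `max_hops` limit are hypothetical, not part of Haystack.

```python
from collections import deque


def associative_search(texts, links, query, max_hops=1):
    """Return {node id: hop distance} for matching nodes and their linked neighbors.

    texts: dict of node id -> text (hypothetical representation)
    links: dict of node id -> iterable of linked node ids (treated as directed)
    """
    terms = query.lower().split()
    # Seed nodes: every node whose text contains all query terms as words.
    seeds = [n for n, t in texts.items()
             if all(term in t.lower().split() for term in terms)]
    hops = {n: 0 for n in seeds}
    queue = deque(seeds)
    # Breadth-first expansion along links, stopping at max_hops.
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue
        for nxt in links.get(node, ()):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return hops
```

With `max_hops=1`, a query that matches only a paper node also returns the author and venue nodes linked to it, each at distance 1; raising the limit pulls in nodes further away.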

General Data Model (illustration)

General Data Model (summary)
- Inheritance hierarchy rooted at Straw
  - Needle: primitive information
  - Bale: collection of related straws
  - Tie: relationship between straws
- Metadata representation
- Recursive metadata annotation
- Interface between Haystack and external “services”
  - index agents controlling external devices
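A minimal Python sketch of the straw hierarchy. The class names follow the slide; the fields (`value`, `members`, `label`, `ties_out`) and the helper `tie` are hypothetical, not Haystack’s actual implementation. Because a Tie is itself a Straw, a tie can be annotated by another tie, which is one way to get recursive metadata annotation.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count(1)  # hypothetical global id generator


@dataclass(eq=False)
class Straw:
    """Base node of the data model; any straw can be the source of ties."""
    id: int = field(default_factory=lambda: next(_ids), init=False)
    ties_out: list = field(default_factory=list, init=False, repr=False)


@dataclass(eq=False)
class Needle(Straw):
    """Primitive piece of information, e.g. a text body or a date."""
    value: object = None


@dataclass(eq=False)
class Bale(Straw):
    """Collection of related straws."""
    members: list = field(default_factory=list)


@dataclass(eq=False)
class Tie(Straw):
    """Labelled relationship from one straw to another."""
    label: str = ""
    source: Straw = None
    target: Straw = None


def tie(source, label, target):
    """Create a tie and register it on its source straw."""
    t = Tie(label=label, source=source, target=target)
    source.ties_out.append(t)
    return t
```

For example, `tie(doc, "author", author_needle)` links a document to its author, and a second `tie(that_tie, "provenance", note)` records metadata about the link itself.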

System Architecture
- Database and search engine layer
  - an adapter serves as the interface to various engines
- Core Haystack system (root server)
  - data model implementation
  - operating-system-like services
- Client-level services
  - user interface: annotation, querying, browsing
  - proxy services: observing interaction with external information resources
  - data augmenting services: modifying data, adding links, …
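The adapter layer can be sketched as an abstract interface that the core calls, so the underlying IR engine can be swapped. This is a hedged illustration: `SearchEngineAdapter`, `SubstringEngine`, and `CoreServer` are hypothetical names, and the toy engine simply matches query terms as substrings.

```python
from abc import ABC, abstractmethod


class SearchEngineAdapter(ABC):
    """Hypothetical interface the core uses to reach any IR engine."""

    @abstractmethod
    def index(self, doc_id: str, text: str) -> None: ...

    @abstractmethod
    def search(self, query: str) -> list[str]: ...


class SubstringEngine(SearchEngineAdapter):
    """Toy engine: a document matches if it contains every query term as a substring."""

    def __init__(self):
        self.docs = {}

    def index(self, doc_id, text):
        self.docs[doc_id] = text.lower()

    def search(self, query):
        terms = query.lower().split()
        return sorted(d for d, t in self.docs.items() if all(term in t for term in terms))


class CoreServer:
    """Hypothetical core layer that talks to the engine only through the adapter."""

    def __init__(self, engine: SearchEngineAdapter):
        self.engine = engine

    def add_text(self, doc_id, text):
        self.engine.index(doc_id, text)

    def query(self, q):
        return self.engine.search(q)
```

Replacing `SubstringEngine` with another `SearchEngineAdapter` subclass would leave `CoreServer` unchanged, which is the point of putting an adapter between the core and the engines.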

System Architecture (illustration)

Indexing in Haystack
- Straws generate textual information, which the IR system stores
- The information from each straw is treated as one unit of indexing
  - this allows pieces of information to be associated
- Incremental indexing whenever a series of changes happens
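A sketch of incremental indexing with one straw as one indexing unit, assuming a simple in-memory inverted index and whitespace tokenization (both assumptions, not Haystack’s engine). Only the straws named in a batch of changes are re-indexed; `IncrementalIndex` and `apply_changes` are hypothetical names.

```python
from collections import defaultdict


class IncrementalIndex:
    """Inverted index where each straw's text is one indexing unit (sketch)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of straw ids
        self.terms_of = {}                # straw id -> terms currently indexed

    def _remove(self, straw_id):
        # Drop the straw's old postings; delete terms left with no postings.
        for term in self.terms_of.pop(straw_id, ()):
            self.postings[term].discard(straw_id)
            if not self.postings[term]:
                del self.postings[term]

    def apply_changes(self, changes):
        """Re-index only the changed straws (straw id -> new text, or None if deleted)."""
        for straw_id, text in changes.items():
            self._remove(straw_id)
            if text is None:
                continue
            terms = set(text.lower().split())
            self.terms_of[straw_id] = terms
            for term in terms:
                self.postings[term].add(straw_id)

    def lookup(self, term):
        return set(self.postings.get(term.lower(), set()))
```

Batching changes into one `apply_changes` call mirrors the slide’s “whenever a series of changes happens”; the rest of the index is untouched.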

Outline
- What is Haystack
- Haystack and IR
- Data Model
- System Architecture
- Data Gathering
- Problems

Information gathering
- User’s explicit annotation
- User’s behaviors observed by the system
  - interaction with the outside world (WWW, e-mails)
  - interaction with Haystack: building query paths, adapting to the user’s style
- Analyzing the corpus already in Haystack
  - indexing
  - metadata extraction
  - adding links between documents

User’s explicit annotation
- Probably the best information source
- Might not be realistic
- A nicer interface could encourage users
- HCI studies

Observers
- Proxy services: WWW, e-mail proxies
  - recording the web pages the user sees
  - tracing the browsing path
  - recording visiting time
  - …
- Query observer
  - uses query interactions to mold the data model to the user
  - plugs in new data
  - adds links between nodes
  - facilitates retrieval
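A proxy-side observer could record the pages the user sees, the browsing path, and visiting time roughly as below. This is a sketch under stated assumptions: timestamps are in seconds, and visiting time is approximated as the gap until the next request (the last page therefore gets no time). `BrowseObserver`, `record`, and `links` are hypothetical names.

```python
class BrowseObserver:
    """Sketch of a proxy observer: pages seen, browsing path, approximate visiting time."""

    def __init__(self):
        self.path = []   # ordered (url, timestamp) pairs as seen by the proxy
        self.dwell = {}  # url -> total seconds until the following request

    def record(self, url, timestamp):
        if self.path:
            prev_url, prev_ts = self.path[-1]
            # Assumption: visiting time = gap until the next request.
            self.dwell[prev_url] = self.dwell.get(prev_url, 0) + (timestamp - prev_ts)
        self.path.append((url, timestamp))

    def links(self):
        """Consecutive page pairs, i.e. candidate links to add between nodes."""
        urls = [u for u, _ in self.path]
        return list(zip(urls, urls[1:]))
```

Visiting `a`, `b`, `a`, `c` at seconds 0, 30, 50, 100 gives `a` 80 seconds and `b` 20 seconds, plus three candidate links along the path.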

Query Observer
- Integrates queries into the data model
- Query straw
  - a bale containing the query text, the ranks of documents, …
  - attached nodes of matched documents
  - annotations from the user’s choices
  - relevance feedback tuned to a particular user
- Query path
  - a chain of query straws within a single search session
  - useful for future retrievals:
    - presenting similar query terms
    - adapting document relevance by reindexing documents with the text of the query path
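The last point, reindexing documents with the text of the query path, can be sketched as follows, assuming the documents the user chose during a session are the ones that receive the path’s query text and that matching is plain word lookup. `QueryPathReindexer` and `record_path` are hypothetical names.

```python
class QueryPathReindexer:
    """Sketch: fold the text of a query path into the documents the user chose."""

    def __init__(self, docs):
        self.base = dict(docs)              # doc id -> original text
        self.extra = {d: [] for d in docs}  # doc id -> query-path text added later

    def indexed_text(self, doc_id):
        return " ".join([self.base[doc_id], *self.extra[doc_id]]).lower()

    def search(self, query):
        terms = query.lower().split()
        return sorted(d for d in self.base
                      if all(t in self.indexed_text(d).split() for t in terms))

    def record_path(self, queries, chosen_docs):
        """queries: successive query strings in one session; chosen_docs: docs the user picked."""
        path_text = " ".join(queries)
        for d in chosen_docs:
            self.extra[d].append(path_text)
```

After a session in which the user searched “haystack” and then “haystack paper adar” before opening a document, that document matches those words in later searches even though its own text never contained them.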

Information gathering
- User’s explicit annotation
- User’s behaviors observed by the system
  - interaction with the outside world (WWW, e-mails)
  - interaction with Haystack: building query paths, adapting to the user’s style
- Analyzing the corpus already in Haystack
  - indexing
  - metadata extraction
  - adding links between documents
- Data augmenting clients
- Data-driven clients

Data augmenting clients
- Digest existing information and generate new information
- Independent but cooperating
  - Fetch clients
  - Type inference clients
  - Extractor clients
  - Field finder clients
- Triggered by events: data changes in Haystack
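Event-triggered, cooperating clients can be sketched with a small store that queues a change event for every write and delivers it to each subscribed client; a client’s own writes then trigger further clients. The store `EventStore`, the `on_change` hook, and the client behaviors (typing by file extension, taking the first line as a title) are all hypothetical stand-ins for the slide’s type inference and extractor clients.

```python
from collections import deque


class EventStore:
    """Toy store that notifies subscribed clients whenever a field changes."""

    def __init__(self):
        self.records = {}       # straw id -> dict of fields
        self.clients = []
        self.pending = deque()  # queued (straw id, field) change events

    def subscribe(self, client):
        self.clients.append(client)

    def set_field(self, straw_id, key, value):
        self.records.setdefault(straw_id, {})[key] = value
        self.pending.append((straw_id, key))

    def run(self):
        # Deliver events until no client produces further changes.
        while self.pending:
            straw_id, key = self.pending.popleft()
            for client in self.clients:
                client.on_change(self, straw_id, key)


class TypeInferenceClient:
    """Hypothetical rule: infer a type from the file name."""

    def on_change(self, store, straw_id, key):
        if key == "filename":
            name = store.records[straw_id]["filename"]
            store.set_field(straw_id, "type", "text" if name.endswith(".txt") else "unknown")


class ExtractorClient:
    """Hypothetical rule: for text documents, use the first line as the title."""

    def on_change(self, store, straw_id, key):
        rec = store.records[straw_id]
        if key == "type" and rec["type"] == "text" and "body" in rec:
            lines = rec["body"].splitlines()
            if lines:
                store.set_field(straw_id, "title", lines[0])
```

The clients never call each other: the extractor only runs because the type inference client’s write produced a new event, which matches the “independent but cooperating” design.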

Data augmenting clients: example

Summary
- A prototype of a personalized information organization and retrieval system
- Relationship with IR
- General data model: graph, straws, …
- System architecture: three layers (DB, core, clients)
- Data gathering: three approaches

Problems
- Information maximization assumption
  - Is more always better?
  - Built for one user, but has to be prepared for all users
  - What are useful clues?
- Efficiency issues
  - dynamic indexing
  - a slow system (512M memory, 2G disk…)
- Today’s Haystack project: semantic web, RDF, ontology, user interface, …