Individualized Knowledge Access
David Karger, Lynn Andrea Stein, Mark Ackerman, Ralph Swick
Information Access
A key task in Oxygen: help people manage and retrieve information
Three overlapping projects:
  • Haystack: information storage and retrieval application clients
  • Semantic Web: next-generation metadata
  • Volt: collaborative access
Presentation Overview
Motivation
  • Information access behavior and goals
System Design & Architecture
  • Data Model
  • Interacting data and UI components
Working applications
  • Base Haystack
  • FrontPage
  • Volt
Motivation
Problem Scenario
First, I try to solve problems using my own data:
  • Information gathered personally
  • High quality, easy for me to understand
  • Not limited to publicly available content
My organization:
  • Personal annotations and meta-data
  • Choose own subject arrangement
  • Optimize for my kind of searching
Adapts to my needs
Then Turn to a Friend
Leverage
  • They organize information for their own use
  • Let them find things for me too
Shared vocabulary
  • They know me and what I want
Personal expertise
  • They know things not in any library
Trust
  • Their recommendations are good
Last, Turn to the Library/Web
The answer is usually there
  • But hard to find
  • Wish: rearrange to suit my needs
  • Wish: help from my friends in looking
Lessons
Individualized access
  • The best tools adapt to individual ways of organizing and seeking data
Individualized knowledge
  • People know more than they publish
  • That knowledge is useful to them and others
Collaborative use
  • The right incentives lead to sharing and joint use
Haystack
Individualized access
  • My data collection, organization
  • Search tools tuned for me
Collaborate to leverage individual knowledge
  • Access unpublished information in others’ haystacks
  • Self-interest yields public benefit
Lens to personalize access to the world library
  • Rearrange presentation to suit my personal needs
Example
Info on probabilistic models in data mining
  • My Haystack doesn’t know, but “probability” appears in lots of email I got from Tommi Jaakkola
  • Tommi told his Haystack that “Bayesian” refers to “probability models”
  • Tommi has read several papers on Bayesian methods in data mining
  • Some are by Daphne Koller
  • I read/liked other work by Koller
  • My Haystack queries “Daphne Koller Bayes” on Yahoo
  • Tommi’s Haystack can rank the results for me…
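The scenario above can be sketched in a few lines. This is a hypothetical illustration, not Haystack’s actual mechanism: `expand_query` and `rank_with_friend` are invented names, a friend’s vocabulary is assumed to be a plain dict mapping a term to what it refers to, and search results are assumed to be dicts with an `author` field.

```python
# Hypothetical sketch: borrow a friend's vocabulary to widen a query,
# then use the friend's reading history to order what comes back.

def expand_query(terms, friend_vocabulary):
    """Add each of the friend's terms that refers to one of our query terms."""
    expanded = list(terms)
    for term in terms:
        for alias, meaning in friend_vocabulary.items():
            # The friend told their Haystack that `alias` refers to `meaning`.
            if meaning == term and alias not in expanded:
                expanded.append(alias)
    return expanded


def rank_with_friend(results, authors_friend_read):
    """Put results by authors the friend has read first."""
    # False sorts before True, and sorted() is stable, so ties keep
    # the search engine's original order.
    return sorted(results, key=lambda r: r["author"] not in authors_friend_read)


vocabulary = {"Bayesian": "probability models"}  # what Tommi told his Haystack
query = expand_query(["probability models", "data mining"], vocabulary)
# query == ["probability models", "data mining", "Bayesian"]
```

The friend’s Haystack contributes both steps without publishing anything: its vocabulary widens the query, and its reading history reorders the results.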
System Design
Gathering Data
Haystack archives anything
  • Web pages browsed, email sent and received, address book, documents written
And any properties, relationships
  • Text of object (for text search)
  • Author, title, color, citations, quotations, annotations, quality, last usage
Users freely add types, relationships
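Semantic Web
Arbitrary objects, connected by named links
No fixed schema
  • User extensible
Sharable by any application
  • A new “file system”?
[Figure: example graph in which a document (Doc) has a title link to “Haystack”, an author link to “D. Karger”, a type link to “HTML”, and a quality link to “Outstanding”, with a “says” link attached to the quality judgment]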
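A minimal sketch of this data model, assuming an in-memory set of (subject, predicate, object) triples; the class and method names are hypothetical, not Haystack’s API. It encodes the document from the figure and shows that a user can attach a brand-new property without any schema change.

```python
class TripleStore:
    """Schema-free store: every fact is a named link (subject, predicate, object)."""

    def __init__(self):
        self.triples = set()
        # Outgoing links indexed by subject, so one node's arrows are cheap to list.
        self.by_subject = {}

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))
        self.by_subject.setdefault(subject, set()).add((predicate, obj))

    def arrows_from(self, subject):
        """Return the (predicate, object) links leaving one node, sorted."""
        return sorted(self.by_subject.get(subject, set()))


store = TripleStore()
store.add("doc1", "title", "Haystack")
store.add("doc1", "author", "D. Karger")
store.add("doc1", "type", "HTML")
store.add("doc1", "quality", "Outstanding")
# No schema to update: a user-invented property is just another link.
store.add("doc1", "color", "blue")
```

Because predicates are ordinary strings, any application that understands the triple form can read what another application wrote.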
Gathering Data
Active user input
  • Interfaces let the user add data, note relationships
Mining data from prior data
  • Plug-in services opportunistically extract data
Passive observation of user
  • Plug-ins to other interfaces record user actions
Other users
[Figure: architecture diagram with a central Triple Store and surrounding components: Data Extraction Services, Machine Learning Services, Spider, Web Observer Proxy, Mail Observer Proxy, Web Viewer, and Volt Viewer/Editor]
Sample Applications
Because everything uses Semantic Web constructs, a variety of application clients can share information:
  • Web browser: data viewer
  • FrontPage: personalized information filter
  • Volt: collaboration tool
Haystack via Web
Web server interface
Basic operations:
  • Insert objects
  • View objects
  • Queries
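Viewing and querying can both be expressed as patterns over triples. The sketch below assumes triples held in a Python set and uses `None` as a wildcard; viewing an object is then `match(triples, subject=...)`, and chaining two patterns answers a small join (“titles of documents by a given author”). `match` and `titles_by` are hypothetical helpers, and the sample documents are made up.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Yield triples that fit a pattern; None matches anything."""
    for s, p, o in triples:
        if subject is not None and s != subject:
            continue
        if predicate is not None and p != predicate:
            continue
        if obj is not None and o != obj:
            continue
        yield (s, p, o)


def titles_by(triples, author):
    """Two-pattern query: find the author's documents, then their titles."""
    docs = {s for s, _, _ in match(triples, predicate="author", obj=author)}
    return sorted(o for d in docs for _, _, o in match(triples, subject=d, predicate="title"))


# Made-up sample data.
triples = {
    ("doc1", "author", "D. Karger"), ("doc1", "title", "Paper A"),
    ("doc2", "author", "D. Karger"), ("doc2", "title", "Paper B"),
    ("doc3", "author", "Another Author"), ("doc3", "title", "Paper C"),
}
```

A real store would index triples rather than scan them all; the linear scan keeps the pattern semantics easy to see.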
Haystack via Web
The viewer shows one node and its associated arrows.
A service notices we’ve archived a directory, so it archives the objects the directory contains (and so on…)
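The cascade described here fits an event-driven design in which plug-in services are notified of each new triple and may add triples of their own. A minimal sketch, assuming an in-memory store and a hypothetical in-memory directory listing in place of the real file system; `ServiceStore` and `make_directory_service` are invented names. A queue keeps the cascade iterative, and duplicate triples are ignored so services are not re-triggered.

```python
from collections import deque


class ServiceStore:
    """Triple store that notifies every registered plug-in service of each new triple."""

    def __init__(self):
        self.triples = set()
        self.services = []
        self.pending = deque()
        self.draining = False

    def register(self, service):
        self.services.append(service)

    def add(self, s, p, o):
        triple = (s, p, o)
        if triple in self.triples:
            return  # already known: do not re-trigger services
        self.triples.add(triple)
        self.pending.append(triple)
        if self.draining:
            return  # added by a service; the outer add() call will process it
        self.draining = True
        try:
            # Process the queue iteratively so long cascades don't recurse.
            while self.pending:
                current = self.pending.popleft()
                for service in self.services:
                    service(self, *current)
        finally:
            self.draining = False


def make_directory_service(listing):
    """Hypothetical service: when a directory is archived, archive its children too."""
    def service(store, s, p, o):
        if p == "type" and o == "directory":
            for child in listing.get(s, []):
                store.add(s, "contains", child)
                store.add(child, "type", "directory" if child in listing else "file")
    return service


# Hypothetical directory listing standing in for the real file system.
listing = {
    "/home/me": ["/home/me/notes", "/home/me/paper.html"],
    "/home/me/notes": ["/home/me/notes/todo.txt"],
}
store = ServiceStore()
store.register(make_directory_service(listing))
store.add("/home/me", "type", "directory")  # archiving one directory cascades downward
```

Other services (type detection, text extraction) would register the same way and react to the triples this one adds.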
Haystack via Web
Services detect document type, extract relevant metadata
Output can specialize by type of object
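One way a type-detection service might work, sketched with Python’s standard `mimetypes` and `html.parser` modules: guess the type from the file name, then run a type-specific extractor. The function name, the predicate names, and the choice of extractors (the `<title>` element for HTML, the full text for plain text so it can be indexed for search) are assumptions for illustration.

```python
import mimetypes
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collect the text of the first <title> element in an HTML document."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.seen_title = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.seen_title:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.seen_title = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def extract_metadata(name, content):
    """Guess a document's type from its name, then extract type-specific properties."""
    mime, _ = mimetypes.guess_type(name)
    facts = [("type", mime or "application/octet-stream")]
    if mime == "text/html":
        parser = TitleExtractor()
        parser.feed(content)
        parser.close()
        title = "".join(parser.parts).strip()
        if title:
            facts.append(("title", title))
    elif mime == "text/plain":
        facts.append(("text", content))  # keep full text for text search
    return facts
```

Each returned (predicate, value) pair would become a triple on the archived object, which is what lets the viewer specialize its output by type.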
Mediation
Haystack can be a lens for viewing data from the rest of the world
  • Stored content shows what the user knows/likes
  • Selectively spider “good” sites
  • Filter results coming back
Compare to objects the user has liked in the past
  • Can learn over time
Example: personalized news service
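A sketch of the filtering step, assuming results and previously liked objects are plain text and that “compare” means bag-of-words cosine similarity; the slide does not say which similarity Haystack uses, the function names are hypothetical, and the 0.2 threshold is an arbitrary placeholder. Each result is scored by its closest match among the liked objects.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def filter_results(results, liked, threshold=0.2):
    """Keep results that resemble something the user liked, best match first."""
    liked_vectors = [bag_of_words(text) for text in liked]
    scored = []
    for text in results:
        vector = bag_of_words(text)
        score = max((cosine(vector, lv) for lv in liked_vectors), default=0.0)
        if score >= threshold:
            scored.append((score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]
```

Using the closest liked object, rather than an average over all of them, lets a user with several unrelated interests keep results for each one.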
News Service
Scavenges articles from your favorite news sources
  • HTML parsing/extracting services
Over time, learns the types of articles that interest you
  • Prioritizes those for display
Content provider no longer controls the viewing experience
  • No more ads
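Learning which articles interest you is a text-classification problem. The sketch below swaps in a standard technique, naive Bayes with add-one smoothing, trained from hypothetical read/skip feedback; the class and method names are invented, and the slide does not say which learner the news service actually used.

```python
import math
import re
from collections import Counter


class InterestModel:
    """Naive Bayes over article words, trained from what the user read or skipped."""

    def __init__(self):
        self.words = {True: Counter(), False: Counter()}
        self.docs = {True: 0, False: 0}

    @staticmethod
    def tokens(text):
        return re.findall(r"[a-z]+", text.lower())

    def observe(self, article, interested):
        """Record one piece of feedback (e.g. the user opened or skipped an article)."""
        self.docs[interested] += 1
        self.words[interested].update(self.tokens(article))

    def score(self, article):
        """Log-odds that the user is interested; higher means show it earlier."""
        vocab_size = len(set(self.words[True]) | set(self.words[False]))
        totals = {c: sum(self.words[c].values()) for c in (True, False)}
        # Add-one smoothing on the class priors and on every word likelihood
        # (the extra +1 in the denominator reserves mass for unseen words).
        log_odds = math.log((self.docs[True] + 1) / (self.docs[False] + 1))
        for word in self.tokens(article):
            p_yes = (self.words[True][word] + 1) / (totals[True] + vocab_size + 1)
            p_no = (self.words[False][word] + 1) / (totals[False] + vocab_size + 1)
            log_odds += math.log(p_yes / p_no)
        return log_odds

    def prioritize(self, articles):
        """Order articles by predicted interest, most interesting first."""
        return sorted(articles, key=self.score, reverse=True)
```

Feedback accumulates in the counters, so the ordering shifts as the user keeps reading; no retraining pass is needed.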
Personalized News Service
Collaborative Access
Want to leverage others’ work in organizing information
  • No need to “publish” expertise
  • Exposed automatically, without effort
  • Self-interest helps others
Volt
Volt is about collaboration between people
  • The Haystack architecture allows easy collaboration among individuals via Semantic Web references to Haystack objects
  • Individuals share parts of their Haystack
  • Group spaces and shared notebooks
Volt
Collaborators
Those I interact with
  • Frequent mail contact
  • Frequent visits to their home page
Those with shared content
  • And who have the same opinions about content
  • Collaborative filtering techniques
Referrals
Expertise search engine
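For “those with shared content and the same opinions”, a classic collaborative-filtering measure is the Pearson correlation between two users’ ratings on the items both have rated. A minimal sketch, assuming ratings are dicts from item ID to a number; the helper names, the 0.5 cutoff, and the sample ratings used with it are hypothetical.

```python
import math


def pearson(ratings_a, ratings_b):
    """Pearson correlation of two users' ratings over the items both have rated."""
    shared = set(ratings_a) & set(ratings_b)
    if len(shared) < 2:
        return 0.0  # too little overlap to judge agreement
    mean_a = sum(ratings_a[i] for i in shared) / len(shared)
    mean_b = sum(ratings_b[i] for i in shared) / len(shared)
    cov = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in shared)
    var_a = sum((ratings_a[i] - mean_a) ** 2 for i in shared)
    var_b = sum((ratings_b[i] - mean_b) ** 2 for i in shared)
    if var_a == 0 or var_b == 0:
        return 0.0  # someone rated everything the same; correlation undefined
    return cov / math.sqrt(var_a * var_b)


def likely_collaborators(my_ratings, others, min_similarity=0.5):
    """Names of users whose opinions agree with mine, most similar first."""
    scored = [(pearson(my_ratings, ratings), name) for name, ratings in others.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for similarity, name in scored if similarity >= min_similarity]
```

Centering each user’s ratings on their own mean means a harsh rater and a generous rater who rank items the same way still count as agreeing.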
Expertise Beacon
Volt Expertise Beacons
Group spaces and shared notebooks
  • Create individual and group profiles
Profiles can be used to find other people
  • Allows targeted search
  • “Who else is working on this project?”
User controls visibility/privacy
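A sketch of a profile search that honors the owner’s visibility choice. The `Profile` fields, the three visibility levels (public, group, private), the exact-term matching, and the sample people are assumptions for illustration; the slide only states that profiles support targeted search and that the user controls visibility.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical expertise-beacon profile; its owner decides who may see it."""
    owner: str
    topics: set
    visibility: str = "private"  # "public", "group", or "private"
    groups: set = field(default_factory=set)


def visible_to(profile, searcher_groups):
    """Apply the owner's visibility choice to one searcher."""
    if profile.visibility == "public":
        return True
    if profile.visibility == "group":
        return bool(profile.groups & searcher_groups)
    return False  # private profiles never appear in others' searches


def who_else_works_on(terms, profiles, searcher, searcher_groups):
    """Rank visible profiles by how many query terms they share; best first."""
    wanted = {t.lower() for t in terms}
    hits = []
    for profile in profiles:
        if profile.owner == searcher or not visible_to(profile, searcher_groups):
            continue
        overlap = len(wanted & {t.lower() for t in profile.topics})
        if overlap:
            hits.append((-overlap, profile.owner))  # negate so larger overlap sorts first
    return [owner for _, owner in sorted(hits)]


profiles = [
    Profile("ana", {"bayesian", "data mining"}, "public"),
    Profile("ben", {"data mining", "agents"}, "group", {"oxygen"}),
    Profile("cho", {"data mining"}, "private"),
]
```

The visibility check runs before any matching, so a hidden profile cannot leak through its score or its position in the results.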
Summary
Next-generation information access
Semantic Web
  • provides a language and capabilities for meta-data
Haystack
  • teases out individual knowledge,
  • stores it in a coherent fashion, and
  • allows a variety of application clients to leverage individual meta-data
Volt
  • turns individual knowledge into a community resource
More Info