
Memex: A Browsing Assistant for Collaborative Archiving and Mining of Surf Trails Soumen Chakrabarti Sandeep Srivastava Mallela Subramanyam Mitul Tiwari.


2 Memex: A Browsing Assistant for Collaborative Archiving and Mining of Surf Trails Soumen Chakrabarti Sandeep Srivastava Mallela Subramanyam Mitul Tiwari Indian Institute of Technology Bombay

3 IITB 2000 Sources of Web information
- Sources already exploited:
  - Text on pages (keyword search)
  - Links between pages (popularity rating)
  - Topic taxonomies (query expansion)
- Sources not yet exploited enough:
  - Public surfing history
  - Public bookmarks
- Collaboration is central to hypertext
- Lack of trust limits collaboration on the Web

4 Our goals
- Infrastructure to support the spontaneous formation of topic-based collaborative Web communities:
  - Browsing assistant client
  - Community server
- Mining algorithms for personal and community-level topic management and collaborative resource discovery
- Extensible API for plugging in additional hypertext analysis tools

5 1: Create a Memex account (password sent by email); 2: Install the Memex applet signing certificate and visit the applet page; 3: Allow the Memex client to attach to your Web browser; 4: Log on to the Memex server

6 The Memex client applet attaches to the browser (callouts: privacy choice, function tabs)

7 Preparing to import initial bookmarks

8 Bookmarks imported

9 For Memex to suggest an initial topic organization, select all bookmarks…

10 …and send them to the clustering tab

11 Switch to the clustering tab; the URLs to be clustered appear here

12 Submit the URLs to the server-side Memex clustering demon

13 Check later if the server has completed the clustering task

14 Two top-level clusters about software and music
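The slides do not say which algorithm the server-side clustering demon runs. As a rough sketch only, assuming each bookmark is represented by the words on its page, greedy agglomerative clustering by cosine similarity (a stand-in technique, not necessarily what Memex uses) can produce a two-way split like the software/music one. The names `tf_vector`, `cosine` and `agglomerate`, and the sample pages, are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Bag-of-words term counts for one page (lowercased, whitespace-split)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def agglomerate(pages: dict[str, str], k: int) -> list[list[str]]:
    """Merge the two most similar clusters until only k (>= 1) remain.

    A cluster is represented by the sum of its members' term vectors; cosine
    similarity ignores scale, so this behaves like a centroid.
    """
    clusters = [([url], tf_vector(text)) for url, text in pages.items()]
    while len(clusters) > k:
        best = None  # (similarity, i, j) of the most similar pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(clusters[i][1], clusters[j][1])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        merged = (clusters[i][0] + clusters[j][0], clusters[i][1] + clusters[j][1])
        del clusters[j]  # j > i, so index i is still valid afterwards
        clusters[i] = merged
    return [sorted(urls) for urls, _ in clusters]


pages = {
    "python.org": "python programming language software",
    "gnu.org": "free software gnu programming",
    "jazz.com": "jazz music albums",
    "mozart.org": "classical music composer mozart",
}
print(agglomerate(pages, 2))
# [['gnu.org', 'python.org'], ['jazz.com', 'mozart.org']]
```

Stopping the merges at different points yields the nested clusters seen when a cluster is expanded. The naive all-pairs search makes each merge quadratic in the number of clusters, so this is only meant for small bookmark sets.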

15 Expanding the software cluster to study it in more detail

16-18 User can freely reorganize URL placement using cut-and-paste

19 Moving an entire folder from the cluster tab…

20-21 …to the folder tab together with example URLs

22 Folder names can be edited to the user’s taste; this also gives Memex additional clues about the folder’s contents

23-24 New folders can be created to hold clusters found in the cluster tab

25-26 A topic hierarchy that is too detailed for the user can be flattened

27-28 Groups of closely related URLs can be moved back to folders in the folder tab

29 Memex helps the user derive a starting topic hierarchy from unstructured bookmarks

30 The user then continues browsing in multiple sessions. Relevant pages found by other members of the community and made public are available for collaborative surfing

31-32 If permission is granted, the Memex applet monitors the trail that the surfer follows and uploads it to the server for further analysis and mining
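A minimal sketch of the client-side half of this, assuming the applet records each page visit together with its referrer and uploads events in small batches only when the user's privacy setting permits it. `TrailEvent`, `TrailRecorder` and the batch size are hypothetical, and `flush` merely stores batches in a local list where a real client would send them to the server.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TrailEvent:
    """One step of a surf trail: the page reached and the page it was reached from."""
    url: str
    referrer: Optional[str]
    timestamp: float


@dataclass
class TrailRecorder:
    permitted: bool      # the user's privacy choice
    batch_size: int = 3  # hypothetical number of events per upload
    pending: list = field(default_factory=list)
    uploaded: list = field(default_factory=list)  # stand-in for the server

    def on_visit(self, url: str, referrer: Optional[str] = None,
                 ts: Optional[float] = None) -> None:
        if not self.permitted:
            return  # nothing is recorded without permission
        stamp = time.time() if ts is None else ts
        self.pending.append(TrailEvent(url, referrer, stamp))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Hand the pending batch to the 'server' (here: a local list)."""
        if self.pending:
            self.uploaded.append(self.pending)
            self.pending = []


rec = TrailRecorder(permitted=True, batch_size=2)
rec.on_visit("http://a.example/", ts=1.0)
rec.on_visit("http://b.example/", referrer="http://a.example/", ts=2.0)
print(len(rec.uploaded), len(rec.pending))  # 1 0
```

Keeping the referrer with each visit is what turns a flat visit log into a trail that server-side mining can follow link by link.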

33 Such surf trails together with page contents are valuable inputs to the Memex server-side hypertext mining and resource discovery demons

34 In the background, the Memex classifier finds the most suitable folders for each history item. History is never deleted (disk is cheap). When the user refreshes the view, she finds surf history from others and from herself categorized into her familiar topic tree. A ‘?’ indicates that Memex is not sure about a folder assignment. Users can easily correct mistakes, and these corrections form additional valuable training data.
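The slides do not name the classifier. As a hedged sketch, a nearest-centroid text classifier (a swapped-in technique, not necessarily the one Memex uses) shows the three behaviours described: folder names contribute terms as extra evidence, a low best score is reported as unsure (the ‘?’), and a user's correction is simply appended as one more training example. `classify`, `correct` and the 0.2 threshold are hypothetical.

```python
import math
from collections import Counter


def vec(text: str) -> Counter:
    """Bag-of-words term counts (lowercased, whitespace-split)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(text: str, folders: dict[str, list[str]], threshold: float = 0.2):
    """Return (best_folder, sure); sure=False would be displayed as '?'."""
    v = vec(text)
    best_name, best_score = None, -1.0
    for name, examples in folders.items():
        centroid = vec(name)  # the folder name itself is evidence
        for example in examples:
            centroid.update(vec(example))  # add the example's term counts
        score = cosine(v, centroid)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score >= threshold


def correct(folders: dict[str, list[str]], text: str, folder: str) -> None:
    """A user's correction becomes one more training example for that folder."""
    folders.setdefault(folder, []).append(text)


folders = {
    "software": ["python programming language", "gnu free software"],
    "music": ["jazz albums", "classical composer"],
}
print(classify("python software tools", folders))  # ('software', True)
print(classify("weather forecast", folders))       # ('software', False)
```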

35 Automatic collaborative classification also lets users return to a topic-restricted surfing context quickly, and replay the last few surfing actions within that topic of interest.
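Because history is never deleted and every item carries a folder assignment, returning to a topic context reduces to filtering the history by folder and taking the most recent entries. A minimal sketch, where `HistoryItem`, `replay` and the placeholder URLs are hypothetical and timestamps are plain numbers:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HistoryItem:
    url: str
    timestamp: float
    folder: str  # assigned by the classifier, or corrected by the user


def replay(history: list[HistoryItem], folder: str, k: int = 3) -> list[str]:
    """Return the last k URLs visited within `folder`, oldest first."""
    if k <= 0:
        return []  # guard: a [-0:] slice would return the whole list
    in_topic = sorted((h for h in history if h.folder == folder),
                      key=lambda h: h.timestamp)
    return [h.url for h in in_topic[-k:]]


history = [
    HistoryItem("a", 1, "music"), HistoryItem("b", 2, "software"),
    HistoryItem("c", 3, "music"), HistoryItem("d", 4, "music"),
    HistoryItem("e", 5, "music"),
]
print(replay(history, "music"))  # ['c', 'd', 'e']
```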

36 Personalized topic-based history management is far superior to the one-dimensional history list provided by popular browsers

37-38 Users can switch topics with a single click, and browsing is not limited to the linear “back and forward” paradigm supported by browsers.

39-40 A flexible interactive search lets the user locate, from anywhere, any page ever visited using this account, combining content with popularity, site selection and timeliness
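The slides list the ranking signals but not how they are combined. One plausible sketch, assuming a weighted sum of term overlap, a log-damped visit count (as a stand-in for popularity) and an exponential recency decay, with site selection applied as a filter. The weights, the 30-day half-life and all names here are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Visit:
    url: str
    site: str
    text: str
    visit_count: int  # stand-in for popularity
    age_days: float   # time since the page was last visited


def score(v: Visit, terms: list[str], w_content=1.0, w_pop=0.5,
          w_time=0.5, half_life=30.0) -> float:
    words = v.text.lower().split()
    content = sum(words.count(t) for t in terms) / max(len(words), 1)
    popularity = math.log1p(v.visit_count)
    timeliness = 0.5 ** (v.age_days / half_life)  # halves every half_life days
    return w_content * content + w_pop * popularity + w_time * timeliness


def search(visits: list[Visit], query: str,
           sites: Optional[set[str]] = None, top: int = 5) -> list[str]:
    terms = query.lower().split()
    pool = [v for v in visits
            if (sites is None or v.site in sites)
            # require a query term so popularity alone cannot surface a page
            and any(t in v.text.lower().split() for t in terms)]
    pool.sort(key=lambda v: score(v, terms), reverse=True)
    return [v.url for v in pool[:top]]


visits = [
    Visit("u1", "a.com", "python tutorial", 1, 100),
    Visit("u2", "b.com", "python news", 10, 1),
    Visit("u3", "a.com", "music charts", 50, 0),
]
print(search(visits, "python"))                   # ['u2', 'u1']
print(search(visits, "python", sites={"a.com"}))  # ['u1']
```

Treating site selection as a hard filter rather than a weight matches how a user would typically restrict a search to chosen sites; the other three signals trade off against each other through the weights.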

41 Close integration of the Memex client with the browser is non-trivial to implement but adds greatly to comfort and ease of use

42 Memex system diagram
[Figure: the browser visits the applet page, downloads the client JAR and attaches the running client applet, which talks to the Memex server through the Memex client-server protocol and workload-sharing negotiations. Server-side components: event-handler servlets (Search, Folder, Context, Archive), relational metadata, a text index, topic models, and mining demons for taxonomy synthesis, resource discovery, recommendation, classification and clustering.]

43 Document workflow
[Figure: the Memex client in the browser logs page-visit and bookmarking events. New document versions are pushed onto a per-document version queue (NODE table), which is served by the demons in a demon registry: crawler, search indexer, classifier service, clustering service and garbage collector; old versions are popped and discarded.]
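A toy sketch of the version-queue idea in the diagram, assuming each demon is just a callback invoked on every pushed version and the garbage collector keeps only the newest version of each document. `DocumentWorkflow`, `push_version` and `collect_garbage` are hypothetical names; the actual server keeps this state in the NODE table rather than in memory.

```python
from collections import deque
from typing import Any, Callable


class DocumentWorkflow:
    def __init__(self) -> None:
        self.versions: dict[str, deque] = {}  # url -> versions, oldest first
        self.demons: dict[str, Callable[[str, str], Any]] = {}
        self.results: dict[tuple[str, str], Any] = {}  # (demon, url) -> output

    def register(self, name: str, demon: Callable[[str, str], Any]) -> None:
        """Add a demon (indexer, classifier, ...) to the registry."""
        self.demons[name] = demon

    def push_version(self, url: str, content: str) -> None:
        """Queue a newly fetched version and let every registered demon see it."""
        self.versions.setdefault(url, deque()).append(content)
        for name, demon in self.demons.items():
            self.results[(name, url)] = demon(url, content)

    def collect_garbage(self, keep: int = 1) -> None:
        """Pop and discard old versions, keeping the newest `keep` per document."""
        for queue in self.versions.values():
            while len(queue) > keep:
                queue.popleft()


wf = DocumentWorkflow()
wf.register("indexer", lambda url, text: len(text.split()))
wf.push_version("http://x.example/", "old page")
wf.push_version("http://x.example/", "new page text")
wf.collect_garbage()
print(list(wf.versions["http://x.example/"]), wf.results[("indexer", "http://x.example/")])
# ['new page text'] 3
```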

44 Autonomous topic organization
- Bookmarks are often collected into topics
- Surfers use a personal topic organization
- A one-size-fits-all taxonomy is inadequate:
  - Many topics are over-developed for most of us, e.g. http://dmoz.org/Sports/Hockey/Underwater_Hockey/
  - But deeper interests are often underdeveloped
  - Structural reorganization is also desirable
- The best taxonomy depends on community behavior as well as on page content

45 Autonomy and collaboration
- Personalization ≠ picking Yahoo nodes
- Complex relations between topics
- Need a “simplest common ground”:
  - Coalesce similar topics where possible…
  - …without sacrificing individual taste
[Figure: topic trees of User1, User2, User3 and Yahoo built from nodes such as Sports, Hiking, Cycling, Biz, Shops and Bikeshops, illustrating subsumption between topics and tree ‘inversion’ across users.]
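One simple way to make “subsumption” and “coalesce similar topics” concrete, assuming each folder is reduced to the set of URLs it holds: treat two folders as the same topic when their Jaccard overlap is high, and as nested when one is mostly contained in the other. This is an illustrative stand-in, not the method behind the slide; `containment`, `relation` and the 0.8 thresholds are hypothetical.

```python
def containment(a: set[str], b: set[str]) -> float:
    """Fraction of a's URLs that also appear in b."""
    return len(a & b) / len(a) if a else 0.0


def relation(a: set[str], b: set[str], same: float = 0.8, sub: float = 0.8) -> str:
    """Return 'same', 'a_in_b', 'b_in_a' or 'other' for two folders."""
    union = a | b
    jaccard = len(a & b) / len(union) if union else 0.0
    if jaccard >= same:
        return "same"    # candidates to coalesce
    if containment(a, b) >= sub:
        return "a_in_b"  # a looks like a subtopic of b
    if containment(b, a) >= sub:
        return "b_in_a"
    return "other"


hiking = {"trails.example", "hikes.example"}
sports = {"trails.example", "hikes.example", "scores.example", "teams.example"}
print(relation(hiking, sports))  # a_in_b
```

Coalescing only the “same” pairs while merely linking nested ones is one way to honour the slide's constraint of sharing structure without sacrificing individual taste.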

46 Taxonomy synthesis example
- Generating themes makes the map simpler
- But it distorts the contents of the original folders
- Joint optimization gives the best themes
[Figure: folders Entertainment, Studios, Broadcasting and Media hold sites such as kpfa.org, bbc.co.uk, kron.com, channel4.com, kcbs.com, foxmovies.com, miramax.com and lucasfilms.com; links labelled “share document”, “share folder” and “share terms” relate them to the synthesized themes ‘Radio’, ‘Television’ and ‘Movies’.]

47 Summary and project status
- Collaborative resource discovery and topic management system
- Testbed for hypertext mining research
- Signed Java2 client: Netscape 4.5+ available, IE5+ planned
- Server for Unix and Windows: IBM UDB, Berkeley DB, servlets
  - Non-trivial to install and manage
  - Simple-to-use RPMs being planned
- http://www.cse.iitb.ernet.in/~soumen

