Memex: A Browsing Assistant for Collaborative Archiving and Mining of Surf Trails Soumen Chakrabarti Sandeep Srivastava Mallela Subramanyam Mitul Tiwari.

Presentation transcript:

Memex: A Browsing Assistant for Collaborative Archiving and Mining of Surf Trails Soumen Chakrabarti Sandeep Srivastava Mallela Subramanyam Mitul Tiwari Indian Institute of Technology Bombay

IITB 2000 Sources of Web information
• Sources already exploited
  - Text on pages (keyword search)
  - Link between pages (popularity rating)
  - Topic taxonomies (query expansion)
• Sources not exploited enough yet
  - Public surfing history
  - Public bookmarks
• Collaboration is central to hypertext
• Lack of trust limits collaboration on Web

Our goals
• Infrastructure to support spontaneous formation of topic-based collaborative Web communities
  - Browsing assistant client
  - Community server
• Mining algorithms for personal and community level topic management and collaborative resource discovery
• Extensible API for plugging in additional hypertext analysis tools

1: Create a Memex account (password sent by email)
2: Install the Memex applet signing certificate and visit the applet page
3: Allow the Memex client to attach to your Web browser
4: Log on to the Memex server

Memex client applet attaches to browser
[Screenshot labels: Privacy choice; Function tabs]

Preparing to import initial bookmarks

Bookmarks imported

For Memex to suggest an initial topic organization, select all bookmarks…

…and send them to the clustering tab

Switch to the clustering tab. URLs to be clustered appear here.

Submit the URLs to the server-side Memex clustering demon

Check later if the server has completed the clustering task

Two top-level clusters about software and music
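The slides do not say which clustering algorithm the server-side demon runs. As a rough illustration only, the sketch below groups bookmarks with a greedy single-pass scheme over bag-of-words cosine similarity. The bookmark URLs and descriptions, the `threshold` value, and every function name here are assumptions made for the example, not part of Memex.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Lower-cased bag-of-words term counts for one bookmark."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster_bookmarks(bookmarks, threshold=0.2):
    """Greedy single-pass clustering: join the most similar cluster or start a new one.

    `bookmarks` maps URL -> descriptive text; `threshold` is an assumed cut-off.
    """
    clusters = []  # each cluster: {"centroid": Counter, "urls": [...]}
    for url, text in bookmarks.items():
        vec = term_vector(text)
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "urls": [url]})
        else:
            best["centroid"].update(vec)  # fold the new terms into the centroid
            best["urls"].append(url)
    return [c["urls"] for c in clusters]

# Hypothetical bookmarks, chosen to mirror the "software" and "music" clusters.
bookmarks = {
    "http://example.org/linux": "linux kernel software download",
    "http://example.org/gcc": "gcc compiler software download",
    "http://example.org/jazz": "jazz music albums",
    "http://example.org/opera": "opera music recordings",
}
clusters = cluster_bookmarks(bookmarks)
```

With these four bookmarks the sketch yields two clusters, one holding the two software pages and one holding the two music pages. A real system would use page contents rather than short descriptions.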

Expanding the software cluster to study it in more detail

User can freely reorganize URL placement using cut-and-paste

Moving an entire folder from the cluster tab…

…to the folder tab together with example URLs

Folder names can be edited as per taste; this also gives Memex additional clues about the folder’s contents

New folders can be created to hold clusters found in the cluster tab

A topic hierarchy which is too detailed for the user can be flattened

Groups of closely related URLs can be moved back to folders in the folder tab

Memex helps the user derive a starting topic hierarchy from unstructured bookmarks

The user then continues browsing in multiple sessions. Relevant pages found by other members of the community and made public are available for collaborative surfing.

If permission is granted, the Memex applet monitors the trail that the surfer follows and uploads it to the server for further analysis and mining

Such surf trails together with page contents are valuable inputs to the Memex server-side hypertext mining and resource discovery demons
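A minimal sketch of permission-gated trail recording follows. It is not the Memex applet's code. The `Visit` and `TrailRecorder` names, their fields, and the `archiving_allowed` flag (standing in for the privacy choice) are all hypothetical. It only shows that nothing is buffered for upload until the surfer opts in.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Visit:
    url: str
    referrer: Optional[str]  # page the surfer came from, if known
    timestamp: float

@dataclass
class TrailRecorder:
    """Buffers page visits and releases them for upload only with permission."""
    archiving_allowed: bool = False  # stands in for the applet's privacy choice
    pending: List[Visit] = field(default_factory=list)

    def on_page_visit(self, url, referrer, timestamp):
        # Nothing is recorded unless the surfer has granted permission.
        if self.archiving_allowed:
            self.pending.append(Visit(url, referrer, timestamp))

    def flush(self):
        """Return the buffered trail (the batch a client would upload) and clear it."""
        batch, self.pending = self.pending, []
        return batch

recorder = TrailRecorder()
recorder.on_page_visit("http://example.org/a", None, 1.0)  # ignored: no permission yet
recorder.archiving_allowed = True
recorder.on_page_visit("http://example.org/a", None, 2.0)
recorder.on_page_visit("http://example.org/b", "http://example.org/a", 3.0)
trail = recorder.flush()
```

Keeping the referrer with each visit is what turns a flat list of URLs into a trail that server-side mining could follow link by link.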

In the background, the Memex classifier finds the most suitable folder to assign to each history item. History is never deleted (disk is cheap). When the user refreshes the view, surf history from herself and others appears categorized into the user’s familiar topic tree. ‘?’ indicates that Memex is not sure about the folder assignment. Users can easily correct mistakes, and these corrections form additional valuable training data.
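The classification loop described here can be sketched as follows. This is a toy term-overlap classifier, assumed for illustration and not the model Memex uses. The `margin` threshold, the class name, and the training texts are hypothetical. A result with `sure=False` plays the role of the ‘?’ marker, and `correct` shows how a user's fix becomes one more training example.

```python
from collections import Counter, defaultdict

class FolderClassifier:
    """Toy term-overlap classifier; a stand-in, not Memex's actual model."""

    def __init__(self, margin=0.2):
        self.profiles = defaultdict(Counter)  # folder -> term counts seen so far
        self.margin = margin                  # assumed confidence threshold

    def train(self, folder, text):
        self.profiles[folder].update(text.lower().split())

    def score(self, folder, terms):
        profile = self.profiles[folder]
        total = sum(profile.values())
        # Fraction of the folder's term mass that matches the page's terms.
        return sum(profile[t] for t in set(terms)) / total if total else 0.0

    def classify(self, text):
        """Return (folder, sure); sure=False corresponds to the '?' marker."""
        terms = text.lower().split()
        ranked = sorted(((self.score(f, terms), f) for f in self.profiles), reverse=True)
        if not ranked:
            return None, False
        best_score, best = ranked[0]
        runner_up = ranked[1][0] if len(ranked) > 1 else 0.0
        return best, (best_score - runner_up) >= self.margin

    def correct(self, folder, text):
        # A user correction is simply one more labelled training example.
        self.train(folder, text)

clf = FolderClassifier()
clf.train("Software", "linux kernel compiler download")
clf.train("Music", "jazz opera album concert")

confident = clf.classify("new jazz album review")  # clear match with Music
_, sure_before = clf.classify("download concert")  # both folders tie: shown as '?'
clf.correct("Music", "download concert")           # the user files it under Music
after = clf.classify("download concert")
```

After the correction, the same page is assigned to Music with enough margin to drop the ‘?’.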

Automatic collaborative classification also lets users return to a topic-restricted surfing context quickly, and replay the last few surfing actions within that topic of interest.
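Once each history item carries a folder label, topic-restricted replay reduces to a filter over the archive. The sketch below assumes a history of `(timestamp, url, folder)` tuples. That layout and the `replay_context` name are invented for the example.

```python
def replay_context(history, folder, last_n=3):
    """Return the last `last_n` visits filed under `folder`, oldest first.

    `history` is a list of (timestamp, url, folder) tuples; it is only
    filtered here, never modified or deleted.
    """
    in_topic = sorted(h for h in history if h[2] == folder)
    return [url for _, url, _ in in_topic[-last_n:]]

# Hypothetical interleaved history across two topics.
history = [
    (1, "http://example.org/gcc", "Software"),
    (2, "http://example.org/jazz", "Music"),
    (3, "http://example.org/linux", "Software"),
    (4, "http://example.org/opera", "Music"),
    (5, "http://example.org/make", "Software"),
]
software_context = replay_context(history, "Software", last_n=2)
music_context = replay_context(history, "Music")
```

The two calls recover separate per-topic contexts from one interleaved history, which a single linear history list cannot do.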

Personalized topic-based history management is far superior to the one-dimensional history list provided by popular browsers

Users can switch topics with a single click, and browsing is not limited by the linear “back and forward” paradigm supported by browsers.

A flexible interactive search lets the user locate any page ever visited from anywhere using this account, combining content with popularity, site selections and timeliness
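The slides do not give the ranking formula. One plausible way to combine the four signals is sketched below. The `Page` fields, the multiplicative score, the `half_life_days` decay, and the sample pages are all assumptions for illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class Page:
    url: str
    text: str
    visits: int        # community-wide visit count (popularity proxy)
    last_visit: float  # day number of the most recent visit

def search(pages, query, now, site=None, half_life_days=30.0):
    """Rank visited pages for `query`; weights and decay are illustrative only."""
    terms = query.lower().split()
    if not terms:
        return []
    results = []
    for p in pages:
        if site and site not in p.url:
            continue  # site selection
        words = set(p.text.lower().split())
        content = sum(t in words for t in terms) / len(terms)
        if content == 0:
            continue  # no content match at all
        popularity = math.log1p(p.visits)
        recency = 0.5 ** ((now - p.last_visit) / half_life_days)
        results.append((content * (1 + popularity) * (0.5 + recency), p.url))
    return [url for _, url in sorted(results, reverse=True)]

pages = [
    Page("http://example.org/old", "java applet tutorial", visits=1, last_visit=0),
    Page("http://example.org/new", "java applet tutorial", visits=1, last_visit=90),
    Page("http://example.org/hot", "java servlet guide", visits=50, last_visit=90),
    Page("http://other.example.com/x", "java applet signing", visits=500, last_visit=90),
]
on_site = search(pages, "java applet", now=90, site="example.org")
everywhere = search(pages, "java applet", now=90)
```

In this toy data a heavily visited partial match outranks a fresh exact match. Whether that trade-off is desirable depends on the weights, which the sketch makes no attempt to tune.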

Close integration of the Memex client with the browser is non-trivial to implement but adds greatly to comfort and ease of use

Memex system diagram
[Figure labels]
• Browser side: Browser; Client JAR; Running client applet; actions Visit, Download, Attach
• Between client and server: Memex client-server protocol and workload sharing negotiations
• Memex server: Event-handler servlets; Search, Folder, Context, Archive; Relational metadata; Text index
• Mining demons: Topic models, Taxonomy synthesis, Resource discovery, Recommendation, Classification, Clustering

Document workflow
[Figure labels]
• Browser and Memex client: page visit and bookmarking events logged
• NODE table with per-document version queue: push new version; pop and discard old version
• Demon registry: Crawler, Search indexer, Classifier service, Clustering service, Garbage collector
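The figure gives only labels, so the exact queue semantics are not recoverable from the transcript. The sketch below shows one possible reading. It assumes each registered demon processes versions in order, and that an old version is popped and discarded once every demon has moved past it. The `VersionQueue` class and its methods are hypothetical.

```python
from collections import deque

class VersionQueue:
    """Per-document queue of versions consumed by a fixed set of demons."""

    def __init__(self, demons):
        self.versions = deque()              # (version_no, content), oldest first
        self.next_no = 0
        self.done = {d: -1 for d in demons}  # last version each demon finished

    def push(self, content):
        """A page visit or re-crawl pushes a new version."""
        self.versions.append((self.next_no, content))
        self.next_no += 1

    def process(self, demon):
        """Let `demon` consume every version it has not seen; return them."""
        fresh = [(n, c) for n, c in self.versions if n > self.done[demon]]
        if fresh:
            self.done[demon] = fresh[-1][0]
        self._collect_garbage()
        return fresh

    def _collect_garbage(self):
        # Pop and discard old versions that every demon has already processed,
        # always keeping the newest version.
        slowest = min(self.done.values())
        while len(self.versions) > 1 and self.versions[0][0] <= slowest:
            self.versions.popleft()

q = VersionQueue(["indexer", "classifier"])
q.push("v0 text")
q.push("v1 text")
seen_by_indexer = q.process("indexer")        # classifier still needs v0, so v0 stays
seen_by_classifier = q.process("classifier")  # now v0 can be discarded
```

Tracking progress per demon lets a slow service lag behind without blocking the others, at the cost of keeping old versions until the slowest one catches up.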

Autonomous topic organization
• Bookmarks often collected into topics
• Surfers use personal topic organization
• One-size-fits-all taxonomy inadequate
  - Many topics over-developed for most of us
  - But deeper interests often underdeveloped
  - Structure reorganization also desirable
• Best taxonomy depends on community behavior as well as page content

Autonomy and collaboration
• Personalization ≠ picking Yahoo nodes
• Complex relations between topics
• Need “simplest common ground”
  - Coalesce similar topics where possible…
  - …without sacrificing individual taste
[Figure: topic trees of Yahoo, User1, User2 and User3 over the nodes Sports, Hiking, Cycling, Biz, Shops and Bikeshops; annotations: Subsumption, Tree ‘inversion’]

Taxonomy synthesis example
• Generating themes makes map simpler
• But distorts contents of original folders
• Joint optimization gives best themes
[Figure: folders Entertainment, Studios, Broadcasting and Media holding kpfa.org, bbc.co.uk, kron.com, channel4.com, kcbs.com, foxmovies.com, miramax.com and lucasfilms.com; relations: Share document, Share folder, Share terms; themes: ‘Radio’, ‘Television’, ‘Movies’]
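The joint optimization mentioned on this slide is not specified in the transcript. As a much simpler stand-in, the sketch below merges folders from different users into themes using only the "share document" signal (Jaccard overlap of URL sets, grouped with union-find). The assignment of URLs to folders, the user names, and the `threshold` are assumptions. Only the folder names and URLs come from the slide.

```python
from itertools import combinations

def jaccard(a, b):
    """Overlap of two URL sets, 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def synthesize_themes(folders, threshold=0.3):
    """Group folders (possibly from different users) that share enough documents.

    `folders` maps a (user, folder_name) key to a set of URLs.
    """
    parent = {k: k for k in folders}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    for a, b in combinations(folders, 2):
        if jaccard(folders[a], folders[b]) >= threshold:
            parent[find(a)] = find(b)

    themes = {}
    for k in folders:
        themes.setdefault(find(k), set()).update(folders[k])
    return sorted(sorted(urls) for urls in themes.values())

# Hypothetical folder contents; the slide does not say who filed what where.
folders = {
    ("user1", "Studios"): {"foxmovies.com", "miramax.com", "lucasfilms.com"},
    ("user2", "Entertainment"): {"foxmovies.com", "miramax.com"},
    ("user1", "Broadcasting"): {"kpfa.org", "bbc.co.uk", "kron.com"},
    ("user2", "Media"): {"bbc.co.uk", "kron.com", "channel4.com", "kcbs.com"},
}
themes = synthesize_themes(folders)
```

This sketch recovers a movie theme and a single broadcasting theme. Splitting the latter into ‘Radio’ and ‘Television’ would need the other signals shown in the figure, such as shared terms, which are not modelled here.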

Summary and project status
• Collaborative resource discovery and topic management system
• Testbed for hypertext mining research
• Signed Java2 client
  - Netscape 4.5+ available
  - IE5+ planned
• Server for Unix and Windows
  - IBM UDB, Berkeley DB, servlets
  - Non-trivial to install and manage
  - Simple-to-use RPMs being planned