WWW Search and Navigation Mark Levene SCIS, Birkbeck College University of London

2 Talk Overview
– Hypertext and the navigation problem
– The NavigationZone solution
– Problems being researched
– A demonstration

3 Hypertext and Navigation
Long history
– Bush 1945, memex: trail blazing
– Nelson 1965, Xanadu: network of documents
Problem of getting lost in hyperspace
Navigation aids
– Bookmarks
– History
– Overview diagrams
– Recommendations

4 State-of-the-Art Navigation Aids
– Novel user interfaces to visualise web sites
– Clustering (e.g. self-organising maps)
– Web data mining: finding user patterns
– Semi-automated navigation, BestTrail algorithm: motivation to follow …

5 Typical corporate search

6 A typical search scenario
1) Submit a query to a search engine. Is it too broad or too specific? Does it capture my information needs?
2) Select a URL from the result set. Have I made the right choice?
3) Start manual navigation. Where am I? Where have I come from? Where am I going to?
4) Go to (1) to reformulate the query

7 Content-centric approach [diagram of pages a, b, c, d, e]

8 Problems with standard search
– Page-level relevance scoring: sensitive to query terms
– No look-ahead: click and discover
– No context: results are totally isolated
– No navigation support: users are left on their own to find their way

9 Possible solutions (information retrieval)
– Improve basic IR
– Link analysis, e.g. PageRank and HITS
– Metadata tagging: keywords and taxonomies (semantic web)
– Natural language: Q&A, sentence analysis, synonyms
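To make "link analysis" concrete, here is a minimal PageRank sketch using plain power iteration. It is the textbook formulation, not part of the NavZone system; the damping factor 0.85, the fixed iteration count and the toy three-page graph are illustrative assumptions.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping each page to its outlinks."""
    # Collect every page, including pages that only appear as link targets.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Random-jump share: with probability 1 - damping the surfer teleports.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        # Pages with no outlinks (dangling pages) spread their rank over all pages.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        for node in nodes:
            new_rank[node] += damping * dangling / n
        # Each page passes an equal share of its rank along every outlink.
        for source, targets in graph.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: pages a and b both link to c.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": []})
print(max(ranks, key=ranks.get))  # c
```

The ranks always sum to 1, so they can be read as the long-run probability that a random surfer is on each page.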

10 Possible solutions (information seeking)
– Suggestion engines: link and content generation
– Categories and directories: explicit manual construction
– Automatic classification: machine learning techniques

11 Are these feasible?
– Re-architecting corporate information infrastructure is extremely expensive
– Sophisticated approaches are not always intuitive and are yet to be proven
– The same problem recurs every couple of years: mergers and acquisitions

12 There is, actually, a better way!
– Treat sequences of pages, or trails, as first-class citizens for search
– Consider the topology of the area in which you are searching
– Employ navigational aids

13 Context-centric approach [diagram of pages a, b, c, d, e]

14 The information value of a trail is higher than the sum of its parts!

15 Our approach
– Provide information retrieval of the highest quality and, in addition:
– Find out what is beyond the most relevant pages by exploring the area
– Present users with precise and relevant trails
– Provide navigation assistance within the UI

16 NavZone user interface

17 NavZone Usability Study (First Monday paper)
Task: find answers to 5 types of questions
1) Fact finding: What are the term dates?
2) Judgement: Is CSIS a good place to do research?
3) Fact comparison: Which train station is closest to the college?
4) Judgement comparison: Is the research in deptA better than that in deptB?
5) General navigational: How do you get to the checkout?

18 NavZone vs. Google and Compass
Percentage of subjects answering 4 or more questions correctly
– Google: 59%
– Compass: 75%
– NavZone: 83%

19 NavZone is bandwidth green!
Average number of clicks to complete a task
– Google: 44
– Compass: 40
– NavZone: 27

20 Average time taken per task (minutes)
– Compass: 18
– Google: 17
– NavZone: 13
Wilcoxon test: statistically significant

21 The ingredients of the system
– State-of-the-art web crawler
– Highly efficient document indexer
– Competitive IR
– Patent-protected trail engine and UI

22 The main ingredients [architecture diagram; components: crawler (robot, and a parser for HTML, XML, PDF, PostScript, Word and other generic formats), indexer, inverted file, web graph, trail engine, BestTrail, postprocessor, user interface]

23 The crawler
– Pick a URL from the queue
– Download the page
– Parse and extract main features
– Replace the URL in the queue with its outlinks
[diagram: queue (Q) feeding downloaders (D) and parsers (P)]
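The four crawler steps above map onto a breadth-first loop over a URL queue. The sketch below is an illustration under stated assumptions, not the system's crawler: `fetch` is a hypothetical hook (swap in a real HTTP client), only `<a href>` links and visible text are extracted, and a single thread stands in for the parallel downloaders and parsers in the diagram.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Extracts absolute outlinks and visible text from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.outlinks = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.outlinks.append(urljoin(self.base_url, value))

    def handle_data(self, data):
        if data.strip():
            self.words.append(data.strip())


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl; fetch(url) returns HTML or None (hypothetical hook)."""
    queue = deque([seed])
    seen = {seed}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()                 # pick a URL from the queue
        html = fetch(url)                     # download the page
        if html is None:
            continue
        parser = PageParser(url)              # parse and extract main features
        parser.feed(html)
        parser.close()
        pages[url] = {"text": " ".join(parser.words), "outlinks": parser.outlinks}
        for link in parser.outlinks:          # replace the URL with its outlinks
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


# Hypothetical in-memory "web" so the example runs without network access.
web = {
    "http://x.org/": '<a href="a.html">A</a> <a href="b.html">B</a>',
    "http://x.org/a.html": '<p>alpha</p><a href="/">home</a>',
    "http://x.org/b.html": "beta",
}
print(sorted(crawl("http://x.org/", web.get)))
```

The `seen` set keeps a page from being queued twice, which matters on the web because most sites link back to their home page.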

24 The indexer
– Compute page statistics for IR
– Compute page navigability potential (PG)
– Compute page authority ranking (GR)
– Build page summary information
– Build inverted index
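As an illustration of "page statistics for IR" and "inverted index", the sketch below maps each term to the pages containing it, then scores a query with a length-normalised TF-IDF weight. TF-IDF is an assumed stand-in; the slides do not say which IR model the system uses, and the navigability (PG) and authority (GR) scores are not modelled here.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case alphanumeric tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """pages: dict url -> text. Returns (inverted index, page lengths)."""
    index = defaultdict(dict)      # term -> {url: term frequency}
    lengths = {}                   # url -> number of tokens on the page
    for url, text in pages.items():
        counts = Counter(tokenize(text))
        lengths[url] = sum(counts.values())
        for term, tf in counts.items():
            index[term][url] = tf
    return index, lengths


def search(query, index, lengths):
    """Rank pages by summed TF-IDF over the query terms (assumed scoring)."""
    n_pages = len(lengths)
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        # Smoothed IDF: rarer terms weigh more, and the weight never drops to zero.
        idf = math.log(1.0 + n_pages / len(postings))
        for url, tf in postings.items():
            scores[url] += (tf / lengths[url]) * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical pages loosely echoing the usability-study questions.
pages = {
    "p1": "term dates for the college",
    "p2": "research in the department",
    "p3": "train station near the college",
}
index, lengths = build_index(pages)
print([url for url, _ in search("college dates", index, lengths)])  # ['p1', 'p3']
```

Only pages in a query term's posting list are ever touched, which is the reason for building an inverted index rather than scanning every page per query.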

25 The trail engine
– Compute page scores for the query
– Explore the graph from good starting nodes
– Rank candidate trails
– Build the result set
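The four trail-engine steps can be sketched as follows. This is a hypothetical illustration, not the patented BestTrail algorithm. Its assumptions: the best-scoring pages are the starting nodes, exploration enumerates simple paths up to a fixed length, and a trail's score is the sum of its page scores discounted by position (the `decay` parameter).

```python
def trail_score(trail, page_scores, decay=0.8):
    """Hypothetical trail score: page scores discounted by position on the trail."""
    return sum((decay ** i) * page_scores.get(page, 0.0) for i, page in enumerate(trail))


def candidate_trails(graph, start, max_length):
    """Enumerate simple paths (no repeated pages) of up to max_length pages."""
    stack = [[start]]
    while stack:
        trail = stack.pop()
        yield trail
        if len(trail) < max_length:
            for nxt in graph.get(trail[-1], []):
                if nxt not in trail:
                    stack.append(trail + [nxt])


def best_trails(graph, page_scores, num_starts=2, max_length=3, num_trails=3, decay=0.8):
    # 1) Page scores for the query are given; pick the best starting pages.
    starts = sorted(page_scores, key=page_scores.get, reverse=True)[:num_starts]
    # 2) Explore the graph from each starting page.
    candidates = [t for s in starts for t in candidate_trails(graph, s, max_length)]
    # 3) Rank candidate trails, then 4) keep the top ones as the result set.
    candidates.sort(key=lambda t: trail_score(t, page_scores, decay), reverse=True)
    return candidates[:num_trails]


# Hypothetical link graph and query scores.
graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
scores = {"a": 1.0, "b": 0.5, "c": 0.1, "d": 0.9}
print(best_trails(graph, scores))  # [['a', 'b', 'd'], ['a', 'b'], ['a', 'c']]
```

Exhaustive enumeration grows exponentially with trail length, so a practical engine would prune the search, for example with a best-first or beam search over partial trails.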

26 Under Development
– Alternative user interfaces
– Seamless integration with relational databases and file systems
– Data mining and personalisation
– Mobile/PDA support

27 Open Problem
How do we make use of the statistical regularities present in the web to improve search and navigation? Many distributions related to the web graph follow a power law; see Levene et al., "A stochastic model for the evolution of the web", Condensed Matter Archive, cond-mat/ .
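A first step in exploiting such a regularity is measuring it. The sketch below uses the standard maximum-likelihood estimator for the exponent of a continuous power law, alpha = 1 + n / sum(ln(x_i / x_min)) over the samples x_i >= x_min. It is not taken from the cited paper. The synthetic data and the choice of x_min are assumptions, and real degree counts are discrete, so for them this continuous formula is only an approximation.

```python
import math
import random


def powerlaw_alpha(samples, x_min):
    """MLE exponent of a continuous power law p(x) ~ x^-alpha for x >= x_min."""
    tail = [x for x in samples if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)


def sample_powerlaw(alpha, x_min, n, rng):
    """Inverse-transform sampling from the same continuous power law."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


# Synthetic data with a known exponent, used only to check the estimator.
rng = random.Random(0)
data = sample_powerlaw(2.5, 1.0, 20000, rng)
print(round(powerlaw_alpha(data, 1.0), 2))  # expected to be close to 2.5
```

The estimate's standard error is roughly (alpha - 1) / sqrt(n), so with 20,000 samples it should land within a few hundredths of the true exponent.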