Searching The Web
Steve Cassidy, Computing at Macquarie
Centre for Language Technology, Department of Computing, Macquarie University


1 Searching The Web
Steve Cassidy
Centre for Language Technology
Department of Computing
Macquarie University

2 The First Web Page

3 What is the Web?
Documents, text, images, sound
A web of hyperlinks
  – Link one (text) document to others
Easy to join
  – Any Internet user can be a publisher
Anarchic
  – No-one is in charge
Very big

4 The Problem
Much of the information available is text-based
Text is difficult for computers to process
The popular use of computers and the Internet has increased the availability of text-based information
Information Overload

5 The Solution?
Only one of the top four commercial search engines finds itself
The best navigation should make it easy to find almost anything on the web (once all the data is entered)
The Web 1997

6 How do they work?
Two major steps
  – Build an inverted index
  – Match query terms in the index
Problems
  – The web is very big
  – Finding relevant documents
  – Avoiding false hits

7 Inverted Index
[Figure: documents D1, D2 and D3 and the inverted index built from them. Terms appearing in the documents: computer, software, information, language, library, retrieval, filtering.]
Index entries shown:
  computer → D1, D2, D3
  software → D1
  information → D1, D3
  language → D1, D2
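
As a rough illustration of the inverted index on this slide, here is a minimal Python sketch. The function name build_inverted_index and the three document texts are assumptions made for the example; the texts were chosen only so that the postings for computer, software, information and language match the entries shown on the slide.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenisation: lowercase and split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Hypothetical document texts (not taken from the slide).
docs = {
    "D1": "computer software information language",
    "D2": "computer language library retrieval",
    "D3": "computer information retrieval filtering",
}

index = build_inverted_index(docs)
for term in ("computer", "software", "information", "language"):
    print(term, index[term])
```

A real search engine would also normalise words (stemming, stop words) and store positions or weights alongside each document id; this sketch keeps only the term-to-documents mapping.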

8 Building the Index
[Diagram: List of web addresses → Download web page → Parse web page. Parsing yields the web page text, which goes to the Index, and new links, which go back to the list of web addresses.]
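
The loop in the diagram above (take an address, download the page, parse it, index the text, queue the new links) can be sketched in Python as follows. This is only an illustration under stated assumptions: the names PageParser, crawl and FAKE_WEB are invented for the example, and the "download" step is replaced by a lookup in a small in-memory dictionary so that the sketch runs without network access.

```python
from collections import deque
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects the visible text and the href targets of <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def crawl(seed_urls, fetch, max_pages=100):
    """Breadth-first crawl; returns a term -> set of URLs index."""
    queue = deque(seed_urls)      # the list of web addresses
    seen = set(seed_urls)
    index = {}
    pages_done = 0
    while queue and pages_done < max_pages:
        url = queue.popleft()
        html = fetch(url)         # download web page
        if html is None:
            continue
        parser = PageParser()
        parser.feed(html)         # parse web page
        pages_done += 1
        for term in " ".join(parser.text_parts).lower().split():
            index.setdefault(term, set()).add(url)   # index the page text
        for link in parser.links:                    # queue the new links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


# Hypothetical two-page "web" standing in for real downloads.
FAKE_WEB = {
    "a.example": '<p>computer software</p><a href="b.example">next</a>',
    "b.example": '<p>computer library</p><a href="a.example">back</a>',
}

index = crawl(["a.example"], FAKE_WEB.get)
print(sorted(index["computer"]))
```

A production crawler would also resolve relative links, respect robots.txt, and rate-limit its requests; none of that is modelled here.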

9 Building the Index
[Diagram: List of web addresses → Download web page → Parse web page. Parsing yields the web page text, which goes to the Index, and new links, which go back to the list of web addresses.]
Example web page:
<a name="works"> How Google Works
If you aren't interested in learning how Google creates the index and the database of documents that it accesses when processing a query, skip this description. I adapted the following overview from Chris Sherman and Gary Price's wonderful description of How Search Engines Work in Chapter 2 of The Invisible Web (CyberAge Books, 2001).
Google consists of three distinct parts, each of which is run on a distributed network of thousands of low-cost computers and can therefore carry out fast parallel processing. Parallel processing is a method of computation in which many calculations can be performed simultaneously, significantly speeding up data processing.

10 Using the Index
Index entries:
  computer → D1, D2, D3
  software → D1
  information → D1, D3
  language → D1, D2
Query: computer software information
Documents shown for the query: D1 D2 D3, D1 D3, D1
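
One way to read this slide is that the engine looks up each query term in the index and keeps only the documents that appear in every posting list. The sketch below assumes that AND interpretation (the slide does not state it explicitly); the function name search is invented for the example, and the index literal copies the entries from the slide.

```python
def search(index, query):
    """Return the documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return []
    # Unknown terms get an empty posting list, so the result is empty.
    postings = [set(index.get(term, ())) for term in terms]
    # Start from the shortest list so intermediate results stay small.
    postings.sort(key=len)
    result = postings[0]
    for posting in postings[1:]:
        result &= posting
    return sorted(result)


index = {
    "computer": ["D1", "D2", "D3"],
    "software": ["D1"],
    "information": ["D1", "D3"],
    "language": ["D1", "D2"],
}

print(search(index, "computer software information"))  # ['D1']
```

Only D1 appears in all three posting lists, so it is the single match for this query.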

11 Server Farm
http://www.microsoft.com/technet/archive/windows2000serv/plan/hiavsys.mspx
Over 10,000 computers
Each with a copy of the index

12 Relevance
Finding pages with search terms is easy
Which ones are the best?
Google:
  – Text in titles, headings is important
  – Text earlier in the page is important
  – Text of links to this page is important
  – Important pages link to other important pages
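
The last point above, that important pages link to other important pages, is the idea behind link-based ranking. The following is a simplified PageRank-style sketch, not a description of Google's actual ranking: the function name pagerank, the damping value 0.85, the iteration count, and the three-page link graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively score pages so that links from high-scoring pages count more."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page shares its score equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: A and B both link to C, and C links back to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph C scores highest because two pages link to it, A comes second because the high-scoring C links to it, and B, with no incoming links, scores lowest.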

13 Making the Most of Search Engines
Use words likely to appear in the pages you want
Use more query terms to narrow your result
Be brief
Don’t worry about spelling
Use “words in quotes” to search for phrases
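
To see why quoting a phrase narrows the results, here is a small sketch of phrase matching with a positional index, which records where each word occurs as well as which document it occurs in. The function names and the two document texts are assumptions made for the example; real engines use more compact structures.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map term -> document id -> list of word positions."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, term in enumerate(text.lower().split()):
            index[term][doc_id].append(position)
    return index


def phrase_search(index, phrase):
    """Return documents in which the words of the phrase occur consecutively."""
    terms = phrase.lower().split()
    if not terms or any(term not in index for term in terms):
        return []
    hits = []
    for doc_id, starts in index[terms[0]].items():
        for start in starts:
            # Each later word must sit exactly i positions after the first.
            if all(start + i in index[term].get(doc_id, [])
                   for i, term in enumerate(terms)):
                hits.append(doc_id)
                break
    return sorted(hits)


# Hypothetical documents containing the same words in different orders.
docs = {
    "D1": "information retrieval on the web",
    "D2": "retrieval of web information",
}
positional = build_positional_index(docs)
print(phrase_search(positional, "information retrieval"))  # ['D1']
```

Both documents contain the words information and retrieval, but only D1 has them next to each other in that order, so only D1 matches the quoted phrase.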

14 Other Search Engines
www.teoma.com
  – Offers ‘refine your search’
  – Subject specific popularity
www.ask.com
  – Natural language questions
search.yahoo.com

15 The Future
Information Extraction
  – Find all the details of this conference for my diary
Question Answering
  – When did Armstrong land on the moon?
The Semantic Web
  – Exchanging machine readable data

16 Language Technology
SLP148 Language, Logic and Computation
COMP248 Language Technology
COMP249 Web Technology
COMP348 Document Processing and the Semantic Web
COMP349 Spoken Language Dialogue Systems

17 Questions?
