Parallel and Distributed Searching. Lecture Objectives Review Boolean Searching Indicate how Searches may be carried out in parallel Overview Distributed.


1 Parallel and Distributed Searching

2 Lecture Objectives
Review Boolean searching
Indicate how searches may be carried out in parallel
Overview of distributed searching
–Collection partitioning
–Query processing
–Collection/results fusion

3 Boolean Queries
Queries with terms connected by AND, OR and NOT
–(Internet AND retrieval) AND (NOT english)
–“world wide web” OR internet
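As a rough illustration of how the two example queries could be evaluated, the sketch below maps AND, OR and NOT onto set intersection, union and difference over a toy inverted index. The index contents, document ids and the `postings` helper are made up for the example; they are not from the lecture.

```python
# Toy inverted index: term -> set of document ids (hypothetical data).
index = {
    "internet": {1, 2, 3, 5},
    "retrieval": {2, 3, 4},
    "english": {3},
    "world wide web": {4, 6},
}
all_docs = {1, 2, 3, 4, 5, 6}


def postings(term):
    """Return the set of documents that mention the term (empty if unknown)."""
    return index.get(term, set())


# (Internet AND retrieval) AND (NOT english)
q1 = (postings("internet") & postings("retrieval")) & (all_docs - postings("english"))

# "world wide web" OR internet
q2 = postings("world wide web") | postings("internet")

print(sorted(q1))
print(sorted(q2))
```

NOT is taken here as the complement with respect to the whole collection, which is one common reading of the operator.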

4 Advantages
Easy to implement
Allow very precise query specifications
Facilitate parallel execution

5 Disadvantages
People are bad at Boolean algebra
Difficult to interpret in a way that gives effective relevance ranking
Difficult to include sensible query weighting

6 Parallel Searching
Useful in improving performance in very large or heavily used search engines
Break the query down into several subqueries
Execute each at the same time
Combine the results
Share subqueries between different searches
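The steps on this slide can be sketched with a thread pool: each term lookup is a subquery, the lookups run concurrently, and the partial results are intersected. The toy index, the `lookup` and `parallel_and` names, and the use of a cache to stand in for "sharing subqueries between searches" are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

# Toy inverted index: term -> document ids (hypothetical data).
index = {
    "internet": {1, 2, 3, 5},
    "retrieval": {2, 3, 4},
    "english": {3},
}


@lru_cache(maxsize=None)
def lookup(term):
    """One subquery: fetch the postings for a single term.

    The cache lets later searches reuse a subquery that was already run.
    """
    return frozenset(index.get(term, ()))


def parallel_and(terms):
    """Run one lookup per term concurrently, then intersect the results."""
    with ThreadPoolExecutor() as pool:
        partial = list(pool.map(lookup, terms))
    return frozenset.intersection(*partial) if partial else frozenset()


print(sorted(parallel_and(["internet", "retrieval"])))
```

In a real engine the subqueries would be expensive index scans, possibly on different machines; with an in-memory dictionary the threads only illustrate the structure.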

7 Distributed Searching
More about metasearching and turning plain searching into metasearching

8 Distribution Methods
Multiple copies of the collection: mirror sites
Why not split the documents between servers according to their topics?

9 Collection Partitioning
Manual/semi-automatic
Topic partitioning
–medical vs engineering
–books vs CDs
One central index
One index per server

10 Distributed Query Processing
Select collections to search
Distribute the query to the selected collections
Evaluate the query at the selected servers in parallel
Combine the results into a final result
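The four steps above can be sketched end to end with in-memory dictionaries standing in for servers. The collections, document ids, the vocabulary-overlap selection rule and the function names are all hypothetical; real systems would select sources from index statistics and send the query over a network.

```python
from concurrent.futures import ThreadPoolExecutor

# Each "server" holds one collection: document id -> text (hypothetical data).
collections = {
    "medical": {
        "m1": "titanium steel alloys in artificial hips",
        "m2": "heart surgery outcomes",
    },
    "engineering": {
        "e1": "wear of steel bearings",
        "e2": "bridge design loads",
    },
    "music": {"c1": "jazz recordings on cd"},
}


def select(query_terms):
    """Step 1: keep collections whose vocabulary shares a term with the query."""
    selected = []
    for name, docs in collections.items():
        vocab = {word for text in docs.values() for word in text.split()}
        if vocab & set(query_terms):
            selected.append(name)
    return selected


def evaluate(name, query_terms):
    """Step 3: on one server, return documents containing every query term."""
    wanted = set(query_terms)
    return [
        (name, doc_id)
        for doc_id, text in collections[name].items()
        if wanted <= set(text.split())
    ]


def distributed_search(query_terms):
    selected = select(query_terms)
    # Steps 2 and 3: send the query to every selected server in parallel.
    with ThreadPoolExecutor() as pool:
        partial = list(pool.map(lambda n: evaluate(n, query_terms), selected))
    # Step 4: combine the partial results into one list.
    return [hit for hits in partial for hit in hits]


print(distributed_search(["steel"]))
```

The combine step here is plain concatenation in collection order; how to order the merged hits is the results fusion problem.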

11 Source Selection
Obtain global term distribution data
–on the web?
Analyse central index of collection relevance
Missing gems

12 Missing Gems
Example query
–wear characteristics of high titanium steel alloys
–actually occurs in medical collection describing use in artificial hips
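A minimal sketch of source selection from term distribution data, and of how it can produce a missing gem. The central index below records, for each term, how many documents in each collection mention it; the counts, the collection names and the simple "sum of document frequencies" score are invented for illustration and are not the lecture's method.

```python
# Hypothetical central index: term -> {collection: documents containing the term}.
central_index = {
    "wear": {"engineering": 120, "medical": 4},
    "titanium": {"engineering": 80, "medical": 6},
    "steel": {"engineering": 300, "medical": 3},
    "alloys": {"engineering": 150, "medical": 2},
}


def rank_collections(query_terms):
    """Score each collection by summing document frequencies of the query terms."""
    scores = {}
    for term in query_terms:
        for coll, df in central_index.get(term, {}).items():
            scores[coll] = scores.get(coll, 0) + df
    return sorted(scores, key=scores.get, reverse=True)


def select_sources(query_terms, k):
    """Keep only the k best-scoring collections."""
    return rank_collections(query_terms)[:k]


query = ["wear", "titanium", "steel", "alloys"]
print(rank_collections(query))
print(select_sources(query, 1))
```

With `k = 1` only the engineering collection is searched, so a relevant document sitting in the low-scoring medical collection would never be retrieved. That is the missing-gems risk.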

13 Results Fusion
Want to present a single result collected from several sources
Also known as collection fusion because it makes several collections appear as one

14 Results Fusion
How do you put together the results from several web sites/search engines into a single combined result?
Collection at a time
Round robin
Relevance ranked

15 Collection at a Time
Use e.g. tf * idf across each collection to rank the searched collections by relevance
Display the results from the best collection first
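A small sketch of collection-at-a-time fusion, assuming each collection already has a relevance score (for example from tf * idf) and its own ranked result list. The function name, scores and document ids are hypothetical.

```python
def collection_at_a_time(results, collection_scores):
    """Concatenate per-collection result lists, best-scoring collection first.

    results: collection name -> ranked list of document ids
    collection_scores: collection name -> relevance score of that collection
    """
    order = sorted(results, key=lambda name: collection_scores[name], reverse=True)
    fused = []
    for name in order:
        fused.extend(results[name])
    return fused


results = {"medical": ["m1", "m2"], "engineering": ["e1", "e2", "e3"]}
scores = {"medical": 0.2, "engineering": 0.9}
print(collection_at_a_time(results, scores))
```

Every document from the best collection appears before any document from the next one, regardless of the individual document scores.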

16 tf * idf
tf - term frequency
–terms that are frequently mentioned in individual documents improve recall
idf - inverse document frequency
–inversely proportional to the number of documents which mention a term
–prefers discriminating terms
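One common way to compute the weight, shown here as a sketch: tf is the raw count of the term in a document, and idf is the logarithm of (number of documents / number of documents mentioning the term). The slide does not say which variant of idf is meant, so the logarithmic form is an assumption, as are the toy documents and function names.

```python
import math

# Toy collection: document id -> text (hypothetical data).
docs = {
    "d1": "internet retrieval internet search",
    "d2": "internet news",
    "d3": "steel alloys wear",
}


def tf(term, doc_id):
    """Term frequency: how often the term occurs in one document."""
    return docs[doc_id].split().count(term)


def idf(term):
    """Inverse document frequency: log(N / df), or 0.0 for unseen terms."""
    df = sum(1 for text in docs.values() if term in text.split())
    return math.log(len(docs) / df) if df else 0.0


def tf_idf(term, doc_id):
    return tf(term, doc_id) * idf(term)


print(tf_idf("internet", "d1"))
print(tf_idf("steel", "d3"))
```

Here "steel" occurs in one document out of three and "internet" in two, so "steel" gets the larger idf: it is the more discriminating term.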

17 Round Robin
Take the first document from collection 1
Then the first document from collection 2, and so on for each collection
Then the second document from collection 1, and so on
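The interleaving described above can be sketched with `itertools.zip_longest`, which walks the ranked lists rank by rank and copes with lists of different lengths. The function name and the document ids are made up for the example.

```python
from itertools import zip_longest


def round_robin(*ranked_lists):
    """Interleave ranked lists: rank 1 of every list, then rank 2, and so on."""
    skip = object()  # private marker for lists that have run out of documents
    fused = []
    for tier in zip_longest(*ranked_lists, fillvalue=skip):
        fused.extend(doc for doc in tier if doc is not skip)
    return fused


print(round_robin(["a1", "a2", "a3"], ["b1"], ["c1", "c2"]))
```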

18 Relevance-Based Methods
Calculate relevance for the documents returned by each selected source
Try to calculate some global statistics
Use some special measures
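The slide does not name a specific measure, so the following is only one possible sketch: since different sources score on different scales, each source's scores are rescaled to the range 0 to 1 (min-max normalisation) before all hits are merged by score. The normalisation choice, function names and sample scores are assumptions.

```python
def normalise(hits):
    """Rescale one source's (doc, score) pairs to the range 0..1."""
    scores = [score for _, score in hits]
    low, high = min(scores), max(scores)
    span = high - low
    # If every score is equal, treat all hits as equally (fully) relevant.
    return [(doc, (score - low) / span if span else 1.0) for doc, score in hits]


def relevance_merge(results_by_source):
    """Merge hits from all sources into one list ordered by normalised score."""
    merged = []
    for hits in results_by_source.values():
        if hits:
            merged.extend(normalise(hits))
    ranked = sorted(merged, key=lambda hit: hit[1], reverse=True)
    return [doc for doc, _ in ranked]


results = {
    "A": [("a1", 90.0), ("a2", 50.0), ("a3", 10.0)],  # scores on a 0-100 scale
    "B": [("b1", 0.8), ("b2", 0.4), ("b3", 0.2)],     # scores on a 0-1 scale
}
print(relevance_merge(results))
```

Normalising per source is a crude substitute for the global statistics the slide mentions; recomputing scores with a collection-wide idf would be another option.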

19 Other Alternatives
Random
First come, first shown
etc.

20 Conclusions
Parallel searching is one way to speed up searching
Distributing information can help ease/speed searching but has some dangers
There are some solutions to the results fusion problem

