Divide and Conquer: Challenges in Scaling Federated Search. Presented by Abe Lederman, President and CTO, Deep Web Technologies, LLC. SearchEngine Meeting.

1 Divide and Conquer: Challenges in Scaling Federated Search. Presented by Abe Lederman, President and CTO, Deep Web Technologies, LLC. SearchEngine Meeting, 24 April 2006, Boston, MA

2 SEARCH ALL OF THESE SOURCES ONE AT A TIME

3 OR SEARCH THEM ALL AT ONCE

4 Finding the Gold Hidden in the World Wide Web: “Google-type” search engines “pan” the surface web for gold; “Deep Web” search engines go mining for gold

6 Challenges Overview: Managing a large number of sources; Searching a large number of sources in parallel; Organizing and ranking the results returned

7 Challenges of Managing Thousands of Data Sources: Locate Reliable Sources; Categorize Sources by Content; Configure Sources for Searching; Maintain Sources

8 Challenges in Searching Thousands of Sources: Automatically Select Sources to Search; Retrieve Results from Cache; Perform Many Searches in Parallel; Bring Back Best Results
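
The "perform many searches in parallel" step can be illustrated with a minimal sketch. This is not the presenter's implementation; the function name `federated_search` and the `search_fn` connector callback are assumptions made for the example. It fans one query out to every source on a thread pool and records sources that fail instead of letting one failure abort the whole search.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def federated_search(sources, query, search_fn, max_workers=8):
    """Send one query to every source in parallel.

    search_fn(source, query) is a hypothetical connector that returns a
    list of results or raises if the source cannot be reached.
    Returns (merged_results, failed_sources).
    """
    hits, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its source so failures can be attributed.
        futures = {pool.submit(search_fn, src, query): src for src in sources}
        for fut in as_completed(futures):
            src = futures[fut]
            try:
                hits.extend(fut.result())
            except Exception:
                # A slow or broken source should not sink the whole search.
                failed.append(src)
    return hits, failed
```

Results arrive in completion order, so a real system would still need to rank the merged list before showing it to the user.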

9 Source Selection Optimizer (diagram labels): Search Conductor; Source Selection Optimizer; Source Descriptions; Previous Results

10 Caching of Search Results: Reduces the load (cost) of accessing sources. CHALLENGES: Requires a large database; Need to determine how often to update the cache; Works best with lots of users doing similar searches
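
The cache trade-offs on this slide can be made concrete with a small sketch. It assumes a simple time-to-live (TTL) policy for "how often to update the cache" and normalizes queries so that similar searches from different users share an entry. The class name `ResultCache` and the in-memory dictionary are illustrative assumptions; the slide itself calls for a large database.

```python
import time


class ResultCache:
    """In-memory result cache with a time-to-live (assumed policy)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested
        self._store = {}

    @staticmethod
    def _key(source, query):
        # Normalize case and whitespace so near-identical queries share an entry.
        return (source, " ".join(query.lower().split()))

    def get(self, source, query):
        key = self._key(source, query)
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, results = entry
        if self.clock() - stored_at > self.ttl:
            # Entry is stale: drop it so the caller re-queries the source.
            del self._store[key]
            return None
        return results

    def put(self, source, query, results):
        self._store[self._key(source, query)] = (self.clock(), list(results))
```

A short TTL keeps results fresh but sends more traffic to the sources; a long TTL does the opposite.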

11 We Address Scalability Through a Grid-Based Solution: Uses open standards (Web Services, WSDL, SOAP, XML); Runs on distributed nodes; Is platform independent (Java based); Very flexible, providing a framework for integration of various filtering and analysis tools

12 Distributing the Workload as Grid Services

13 Search Conductor (flowchart): Select sources to search; Perform Search; Enough good results? YES: Deliver results to user. NO: Can I get more results from “good” sources? YES: Get Next Results.
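
One plausible reading of the Search Conductor flowchart is a loop that keeps asking productive sources for more results until enough good ones have been collected. The sketch below follows that reading only; the names `conduct_search`, `fetch_page`, and `is_good`, the `enough` threshold, and the rule that a source counts as "good" if its last page produced at least one good result are all assumptions, not details from the slide.

```python
def conduct_search(sources, fetch_page, is_good, enough=10, max_rounds=5):
    """Collect good results, returning to productive sources for more.

    fetch_page(source, page) is a hypothetical connector returning a list
    of results (empty when the source is exhausted).
    is_good(result) is a hypothetical quality test.
    """
    good = []
    active = list(sources)  # "Select sources to search"
    page = 0
    # Stop when there are enough good results, no good sources remain,
    # or the round limit is reached.
    while active and len(good) < enough and page < max_rounds:
        still_good = []
        for src in active:
            batch = fetch_page(src, page)  # "Perform Search" / "Get Next Results"
            kept = [r for r in batch if is_good(r)]
            good.extend(kept)
            if kept:
                # Assumed rule: only sources that just produced good
                # results are asked for another page.
                still_good.append(src)
        active = still_good
        page += 1
    return good[:enough]  # "Deliver results to user"
```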

14 Searching a large number of sources can lead to a flood of results

15 Challenges in Organizing and Ranking Results: Multi-tier Relevance Ranking; User-driven Ranking; Clustering of Results

16 Multi-tier Relevance Ranking: QuickRank – Ranks results based on occurrence of search terms in title, author, and snippet; MetaRank – Ranks results utilizing custom algorithms applied to metadata; DeepRank – Downloads and indexes full-text documents. HEAVY LIFTING REQUIRED!
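
The first tier can be sketched from the slide's one-line description alone: count search-term occurrences in the title, author, and snippet. The actual QuickRank algorithm is not given in the presentation, so the field weights, the tokenizer, and the function names below are assumptions chosen only to illustrate the idea.

```python
import re

# Assumed weights: a title match counts more than a snippet match.
FIELD_WEIGHTS = {"title": 3.0, "author": 2.0, "snippet": 1.0}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def quick_score(result, query):
    """Weighted count of query-term occurrences across the three fields."""
    terms = set(tokenize(query))
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        tokens = tokenize(result.get(field, ""))
        score += weight * sum(1 for t in tokens if t in terms)
    return score


def quick_rank(results, query):
    """Return results ordered from highest to lowest score."""
    return sorted(results, key=lambda r: quick_score(r, query), reverse=True)
```

This tier needs only the fields already present in a result list, which is why it is cheap compared with downloading and indexing full text.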

17 User-driven Ranking: Credibility of source; Date range; Document length; Document type; Geographic proximity; Popularity of document; Reading level; Relevance. Desired: Blending (weighing) of above criteria
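
The desired blending of criteria could, for example, be a weighted average. The sketch below assumes each criterion has already been scored on a 0 to 1 scale and that the user supplies non-negative weights; the function name `blended_score` and the criterion keys are hypothetical.

```python
def blended_score(scores, weights):
    """Weighted average of per-criterion scores.

    scores:  criterion name -> value in [0, 1] (assumed pre-normalized)
    weights: criterion name -> non-negative user-chosen weight
    Criteria missing from `scores` contribute 0.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(w * scores.get(name, 0.0) for name, w in weights.items()) / total


# Example: a user who cares twice as much about relevance as anything else.
user_weights = {"relevance": 2.0, "credibility": 1.0, "recency": 1.0}
doc_a = {"relevance": 0.9, "credibility": 0.5, "recency": 0.1}
doc_b = {"relevance": 0.5, "credibility": 1.0, "recency": 1.0}
ranked = sorted([doc_a, doc_b],
                key=lambda d: blended_score(d, user_weights),
                reverse=True)
```

With these example weights, `doc_b` outranks `doc_a` even though `doc_a` is more relevant, which shows how the blend lets other criteria offset raw relevance.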

18 Clustering

19 A Grand Challenge for Federated Search Source: Walter Warnick, Ph.D., DOE OSTI. Global Discovery: Increasing the Pace of Knowledge Diffusion to Increase the Pace of Science. Presented at the Annual Meeting of the American Association for the Advancement of Science, February 16-20, 2006.

20 Knowledge Diffusion in Action (diagram labels): Global Discovery Search Portal; Math Community, Biology Community, Physics Community; Math Databases, Biology Databases, Physics Databases (each: Research Papers, Correspondence, Conferences); Mathematician’s Scientific Discovery, Biology Researcher’s Scientific Discovery, Physics Scientific Discovery

21 Scaling to the Next Level: Grid of Grids. Each circle = a portal with 10-100 sources. End result is thousands of sources in 2 hops.
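
The "thousands of sources in 2 hops" claim follows from simple multiplication: one hop reaches the portals, a second hop reaches each portal's sources. The slide gives only the 10-100 sources per portal; the portal count of 30 below is an assumption picked for illustration.

```python
def reachable_sources(portals, sources_per_portal):
    """Sources reachable in two hops: top-level grid -> portal -> source."""
    return portals * sources_per_portal


PORTALS = 30  # assumed number of portals; not stated on the slide

low = reachable_sources(PORTALS, 10)    # every portal at the small end
high = reachable_sources(PORTALS, 100)  # every portal at the large end
print(f"{PORTALS} portals reach between {low} and {high} sources")
```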

22 Thank You! Abe Lederman, 122 Longview Drive, Los Alamos, NM 87544; abe@deepwebtech.com; www.deepwebtech.com

