Divide and Conquer: Challenges in Scaling Federated Search Presented by Abe Lederman, President and CTO Deep Web Technologies, LLC SearchEngine Meeting.

Presentation transcript:

Divide and Conquer: Challenges in Scaling Federated Search Presented by Abe Lederman, President and CTO Deep Web Technologies, LLC SearchEngine Meeting 24 April 2006 Boston, MA

SEARCH ALL OF THESE SOURCES ONE AT A TIME

OR SEARCH THEM ALL AT ONCE

Finding the Gold Hidden in the World Wide Web: “Google-type” search engines “pan” the surface web for gold; “Deep Web” search engines go mining for gold.

Challenges Overview: managing a large number of sources; searching a large number of sources in parallel; organizing and ranking the results returned.

Challenges of Managing Thousands of Data Sources: locate reliable sources; categorize sources by content; configure sources for searching; maintain sources.

Challenges in Searching Thousands of Sources: automatically select sources to search; retrieve results from cache; perform many searches in parallel; bring back best results.
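To make "perform many searches in parallel" concrete, here is a minimal sketch using Python's standard thread pool. It is an illustration only, not the system described in the talk: the `sources` mapping (source name to a callable that takes the query and returns a list of hits) is a hypothetical interface, and a failing source is simply skipped.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def federated_search(query, sources, max_workers=8):
    """Send `query` to every source in parallel and merge whatever comes back.

    `sources` maps a source name to a callable (hypothetical interface)
    that takes the query string and returns a list of hits.
    Returns (merged, failed): a list of (source_name, hit) pairs and a
    list of the names of sources that raised an error.
    """
    merged, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one search per source and remember which future is which.
        futures = {pool.submit(fn, query): name for name, fn in sources.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                merged.extend((name, hit) for hit in future.result())
            except Exception:
                # One broken source should not sink the whole search.
                failed.append(name)
    return merged, failed
```

Results arrive in completion order, so the merged list still needs ranking. A production system would also need per-source timeouts, which this sketch omits.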

Source Selection Optimizer (diagram labels): Search Conductor; Source Selection Optimizer; Source Descriptions; Previous Results.
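The slide names two inputs to source selection, source descriptions and previous results, but not how they are combined. The sketch below is one guess at how such a selector could work: it scores each source by the overlap between query terms and its description, plus a small bonus for past usefulness. The function name, the `previous_hits` counts, and the 0.1 weight are all assumptions made for illustration.

```python
def select_sources(query, descriptions, previous_hits, top_k=3):
    """Pick the `top_k` sources that look most promising for `query`.

    descriptions:  source name -> free-text description of its content
    previous_hits: source name -> count of useful results seen earlier
                   (hypothetical bookkeeping, not from the talk)
    """
    terms = set(query.lower().split())
    scored = []
    for name, description in descriptions.items():
        overlap = len(terms & set(description.lower().split()))
        history = previous_hits.get(name, 0)
        score = overlap + 0.1 * history  # 0.1 is an arbitrary illustrative weight
        if score > 0:
            scored.append((score, name))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]
```

Sources with no term overlap and no history score zero and are never searched, which is the point of selecting before searching.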

Caching of Search Results: reduces the load (cost) of accessing sources. Challenges: requires a large database; need to determine how often to update the cache; works best with lots of users doing similar searches.
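The "how often to update the cache" question is often handled with a time-to-live per entry. The following is a small in-memory sketch of that idea, not the database-backed cache the slide implies. The class name, the TTL policy, and the query normalization are assumptions.

```python
import time


class ResultCache:
    """In-memory cache of search results with a fixed time-to-live."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}

    @staticmethod
    def _key(source, query):
        # Normalize case and whitespace so similar searches share an entry.
        return (source, " ".join(query.lower().split()))

    def get(self, source, query):
        key = self._key(source, query)
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, results = entry
        if self.clock() - stored_at > self.ttl:
            del self._store[key]  # stale: force a fresh search of the source
            return None
        return results

    def put(self, source, query, results):
        self._store[self._key(source, query)] = (self.clock(), list(results))
```

A cache hit avoids contacting the source at all, which is where the load reduction comes from. The normalization step reflects the slide's point that caching pays off when many users run similar searches.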

We Address Scalability Through a Grid-Based Solution: uses open standards (Web Services, WSDL, SOAP, XML); runs on distributed nodes; is platform independent (Java based); is very flexible, providing a framework for integrating various filtering and analysis tools.

Distributing the Workload as Grid Services

Search Conductor (flowchart). Steps: Select sources to search; Perform Search; Get Next Results; Deliver results to user. Decision points: Enough good results? (YES/NO); Can I get more results from “good” sources? (YES).
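The flowchart's arrows did not survive in the transcript, so the loop below is only one plausible reading of it: fetch a page from each selected source, stop when there are enough good results, and otherwise keep asking only the sources that are still producing good results. The function name, the page-iterator interface, and the rule for what counts as a "good" source are assumptions.

```python
def conduct_search(sources, is_good, wanted, max_rounds=10):
    """Collect up to `wanted` good results from paged sources.

    sources: source name -> iterator yielding pages (lists of results);
             a hypothetical interface chosen for this sketch.
    is_good: predicate deciding whether a single result is good enough.
    """
    good = []
    active = dict(sources)
    for _ in range(max_rounds):
        # "Enough good results?" If yes (or nothing left to ask), deliver.
        if len(good) >= wanted or not active:
            break
        for name in list(active):
            page = next(active[name], None)  # "Get next results"
            hits = [r for r in (page or []) if is_good(r)]
            good.extend(hits)
            if not hits:
                # Exhausted, or its last page held nothing good:
                # stop treating this source as a "good" one.
                del active[name]
    return good[:wanted]
```

For example, with `sources = {"a": iter([[1, 2], [3, 4]]), "b": iter([[10]])}` and a predicate that accepts even numbers, two rounds are enough to collect three good results.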

Searching a large number of sources can lead to a flood of results

Challenges in Organizing and Ranking Results: multi-tier relevance ranking; user-driven ranking; clustering of results.

Multi-tier Relevance Ranking: QuickRank ranks results based on occurrence of search terms in title, author, and snippet; MetaRank ranks results utilizing custom algorithms applied to metadata; DeepRank downloads and indexes full-text documents. Heavy lifting required!
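As a rough illustration of the first tier, the sketch below scores each result by how often the search terms occur in its title, author, and snippet. The slide gives only that description, so the field weights and the whitespace tokenization here are assumptions, and this should not be read as the actual QuickRank algorithm.

```python
def quick_rank(query, results, weights=None):
    """Order results by weighted counts of query terms in selected fields.

    results: list of dicts with optional "title", "author", "snippet" keys.
    weights: field name -> weight; the defaults are illustrative guesses.
    """
    weights = weights or {"title": 3.0, "author": 2.0, "snippet": 1.0}
    terms = query.lower().split()

    def score(result):
        total = 0.0
        for field, weight in weights.items():
            words = result.get(field, "").lower().split()
            total += weight * sum(words.count(term) for term in terms)
        return total

    # sorted() is stable, so equally scored results keep their original order.
    return sorted(results, key=score, reverse=True)
```

This tier needs only the fields a source returns with each hit, which is why it is cheap compared with downloading and indexing full text.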

User-driven Ranking: credibility of source; date range; document length; document type; geographic proximity; popularity of document; reading level; relevance. Desired: blending (weighting) of the above criteria.
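The blending the slide describes as "desired" could be as simple as a weighted average of per-criterion scores. This sketch assumes each criterion has already been normalized to the range 0 to 1 and that the user supplies the weights; both the function name and that normalization are assumptions.

```python
def blended_score(criteria, weights):
    """Combine per-criterion scores into one number using user-chosen weights.

    criteria: criterion name -> score in [0, 1] for one document
    weights:  criterion name -> non-negative importance set by the user
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one criterion needs a positive weight")
    # Criteria missing for a document contribute a score of zero.
    weighted = sum(weights[name] * criteria.get(name, 0.0) for name in weights)
    return weighted / total_weight
```

For instance, a document with relevance 0.8 and credibility 0.5, weighted 3 to 1 in favor of relevance, scores (3 × 0.8 + 1 × 0.5) / 4 = 0.725.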

Clustering

A Grand Challenge for Federated Search Source: Walter Warnick, Ph.D., DOE OSTI. Global Discovery: Increasing the Pace of Knowledge Diffusion to Increase the Pace of Science. Presented at the Annual Meeting of the American Association for the Advancement of Science, February 16-20, 2006.

Knowledge Diffusion in Action (diagram labels): Mathematician’s Scientific Discovery; Biology Researcher’s Scientific Discovery; Physics Scientific Discovery; Math Databases, Biology Databases, and Physics Databases (each: Research Papers, Correspondence, Conferences); Global Discovery Search Portal; Math Community; Biology Community; Physics Community.

Scaling to the Next Level: Grid of Grids. Each circle = a portal with sources. End result is thousands of sources in 2 hops.

Thank You! Abe Lederman, 122 Longview Drive, Los Alamos, NM