The SearchMaster's Toolbox ECIR Industry Day 01 Apr 2010 David Hawking.

Presentation transcript:


UK Customers
From 2004/5: Staffordshire University, Scottish Care Commission
From 2009: The Electoral Commission, Digital UK, Hargreaves Lansdown
From 2010: London School of Economics and Political Science, Incisive Media, British Medical Journal, East Ayrshire Council, ...

Search is life

Costs of poor search
Butler Group: Up to 10% of salary costs wasted through ineffective search
IDC: A company with 1000 information workers can expect to waste more than $5M p.a. due to poor search
Accenture: A survey of 1000 middle managers found that they spend as long as 2 hrs/day searching for information.

Who's the SearchMaster in your organisation?

Stakeholders expect every SearchMaster to do her duty!
To make external website search work
  – Sales conversions
  – Information dissemination
  – Reduced inquiry handling load
To provide effective search of corporate information
  – Happy, productive employees (plus students and other stakeholders)

Give them the tools and they will do the job!
[Diagram labels: SearchMaster, End-user, Simple, Powerful]

1. The basic search tool
Should:
  – Have good performance out of the box, without weeks of implementation
  – Be simple to configure
  – Avoid features which are too complex to use or set up
  – Be able to cover your content and scale to the necessary level

2. FineTuner
Every search deployment is different
  – Web, database, fileshare, Lotus
The weighting of ranking features must accommodate the differences
Manual tweaking is fraught with danger
  – Fix one query, break a dozen
Make a test file and use a tuning tool to learn feature weightings

Testfile Desiderata
Representative of real workload
  – Need an unbiased sample
Many queries (typically >> 100)
Multiple weighted answers (where applicable)
Redirects
Equivalent answers
See es.csiro.au/C-TEST/
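The desiderata above can be illustrated with a small scoring sketch. Everything here is an assumption for illustration: the testfile layout (a query mapped to answer URLs with weights), the function names, and the weight-divided-by-rank measure are not taken from C-TEST or from Funnelback. Equivalent answers are modelled simply as several URLs carrying the same weight.

```python
from typing import Dict, List

# Hypothetical testfile layout: query -> {answer URL: weight}.
# Equivalent answers share the same weight; weaker answers get a lower one.
Testfile = Dict[str, Dict[str, float]]


def score_query(ranking: List[str], answers: Dict[str, float], depth: int = 10) -> float:
    """Best weight/rank value among the top `depth` results, or 0.0 if no answer appears."""
    best = 0.0
    for rank, url in enumerate(ranking[:depth], start=1):
        weight = answers.get(url, 0.0)
        best = max(best, weight / rank)
    return best


def score_testfile(run: Dict[str, List[str]], testfile: Testfile) -> float:
    """Mean per-query score; a query missing from the run scores 0."""
    if not testfile:
        return 0.0
    total = sum(score_query(run.get(query, []), answers)
                for query, answers in testfile.items())
    return total / len(testfile)
```

A tuning tool could call something like `score_testfile` once per candidate weighting, which is why an affordable, representative testfile matters.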

Academic Research on Evaluation
Masses of academic research
How does it translate to tuning an enterprise search system?
  – Setting good defaults
  – Tuning to specific characteristics in hundreds of customer deployments
Note: the system starts with no user interaction data.
Creation of testfiles must be affordable.

Spreadsheet testfile

LSE Case Study

Sources of testfiles at LSE
A-Z Sitemap (>500 entries)
  – Biased toward anchortext
Keymatches file (>500 entries)
  – Pessimistic
Click data (>250 queries with > t clicks)
  – Biased toward clicks – 100% success!
Pop/crit queries (134 manually judged)
All biased – Use a sampling tool!
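The slide does not describe how the sampling tool works. One plausible reading, sketched below under that assumption, is to draw distinct queries from the raw query log so that frequently submitted queries are more likely to be picked. The function name and parameters are hypothetical.

```python
import random
from typing import List


def sample_queries(query_log: List[str], k: int, seed: int = 0) -> List[str]:
    """Pick up to k distinct queries, favouring those submitted more often.

    query_log holds one entry per submission, so shuffling it and keeping the
    first k distinct queries weights the sample by submission frequency.
    """
    if k <= 0:
        return []
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    shuffled = list(query_log)
    rng.shuffle(shuffled)
    sample: List[str] = []
    seen = set()
    for query in shuffled:
        if query not in seen:
            seen.add(query)
            sample.append(query)
            if len(sample) == k:
                break
    return sample
```

The sampled queries would still need judged answers before they could serve as a testfile.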

Dimension-at-a-time tuning
[Figure: tuning steps 1, 2, 3 plotted on axes dim1 and dim2]
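Dimension-at-a-time tuning can be sketched as a simple coordinate search: hold every weight fixed except one, try candidate values for that one, keep the best, then move to the next dimension. This is a minimal sketch under that assumption, not Funnelback FineTune's actual procedure; `evaluate` stands in for running the testfile and is hypothetical.

```python
from typing import Callable, Dict, List


def tune_one_dimension_at_a_time(
    weights: Dict[str, float],
    candidates: List[float],
    evaluate: Callable[[Dict[str, float]], float],
    passes: int = 2,
) -> Dict[str, float]:
    """Coordinate search: vary one weight at a time and keep any improvement."""
    names = list(weights)
    best = dict(weights)
    best_score = evaluate(best)
    for _ in range(passes):
        for name in names:
            for value in candidates:
                trial = dict(best)
                trial[name] = value
                score = evaluate(trial)
                if score > best_score:  # accept only strict improvements
                    best, best_score = trial, score
    return best
```

Each call to `evaluate` implies running every testfile query, so the number of query executions grows with dimensions, candidate values, passes, and testfile size.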

Popular/Critical Set

Fine Tuning Summary
Tunes a large number of dimensions (Funnelback FineTune covers 38)
Involves millions of query executions
Achieves substantial gains

But why do queries still fail?
Misspelled
  – Europian Conferense oninformation retreival
Query words don't match document
  – door or MOPEM v. manually operated personnel egress mechanism
There is no answer to that question.
  – Maybe there should be
  – Scope issues.

Need more tools!

3. Spelling suggestion tools
Suggestions may be useful even if words are correctly spelled:
  – Carlton furball club → Carlton football club
Suggestions based on whole query, not word-by-word
Don't suggest queries which make no sense in the collection being searched
Autocompletion: Guide users to the best query
Context is king
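A whole-query suggester restricted to the collection might look like the sketch below. It assumes a list of `known_queries` that are known to make sense in the collection (for example, past queries that led to clicks); that list, the function name, and the similarity cutoff are all illustrative assumptions. The matching uses Python's standard `difflib.get_close_matches`.

```python
import difflib
from typing import List, Optional


def suggest_query(query: str, known_queries: List[str], cutoff: float = 0.8) -> Optional[str]:
    """Suggest the closest known query, comparing whole queries rather than single words."""
    normalised = " ".join(query.lower().split())
    # Only queries known to work in this collection are eligible suggestions.
    candidates = [q for q in known_queries if q != normalised]
    matches = difflib.get_close_matches(normalised, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


known = ["carlton football club", "carlton gardens"]
print(suggest_query("Carlton furball club", known))
```

Because the comparison is against whole queries, "furball" is corrected even though it is a correctly spelled word.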

4. Query expansion tools
Manual rules:
  – Rego [registration rego]
  – MOPEM [manually operated personnel egress mechanism door]
Related queries (automatic)
  – Based on co-clicking
Contextual navigation (on-the-fly)
  – Finding superphrases in a deep result set
Faceting (semi-automatic)
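The manual rules above can be sketched as a lookup table applied term by term. How the bracketed rule is meant to be split is an assumption here: MOPEM is taken to expand to one phrase plus the word "door". The rule table and function name are hypothetical.

```python
from typing import Dict, List

# Hypothetical rule table modelled on the two slide examples.
RULES: Dict[str, List[str]] = {
    "rego": ["registration", "rego"],
    "mopem": ["manually operated personnel egress mechanism", "door"],
}


def expand_query(query: str, rules: Dict[str, List[str]]) -> List[str]:
    """Replace each query term by its rule expansion; terms without a rule are kept."""
    expanded: List[str] = []
    for term in query.lower().split():
        for replacement in rules.get(term, [term]):
            if replacement not in expanded:  # avoid repeating a term
                expanded.append(replacement)
    return expanded


print(expand_query("car rego", RULES))
```

A real deployment would pass the expanded terms to the engine as alternatives rather than as extra required words.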

5. Reporting and alerting tools
Reporting on queries which:
  – Produced no results
  – Logged behaviour suggestive of unfulfilment
Alerting when:
  – Submissions of a query (or group of related queries) sharply increase in frequency
For:
  – Business intelligence
  – Triggering creation of, or changes to, content

Query Spike Alerting
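A query spike alert could be sketched as a comparison of today's query counts against a recent daily average. The threshold values, the add-one smoothing, and the function name below are illustrative assumptions, not a description of any product's alerting logic.

```python
from collections import Counter
from typing import Dict, List


def find_spikes(
    history: List[Counter],
    today: Counter,
    factor: float = 3.0,
    min_count: int = 10,
) -> Dict[str, float]:
    """Return queries whose count today is at least `factor` times their smoothed daily average."""
    spikes: Dict[str, float] = {}
    days = max(len(history), 1)
    for query, count in today.items():
        if count < min_count:  # ignore rare queries to limit noise
            continue
        baseline = sum(day[query] for day in history) / days
        ratio = count / (baseline + 1.0)  # +1 avoids division by zero for new queries
        if ratio >= factor:
            spikes[query] = ratio
    return spikes
```

Grouping related queries before counting, as the slides suggest, would need an extra normalisation step that this sketch omits.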

Conclusions
Search is important.
Organisations benefit when someone takes responsibility for effective search: the SearchMaster.
Academic research into evaluation needs careful translation for use in enterprise search tuning.
Further tools are needed to overcome poor queries and missing content.
Thanks to Mike Swanson of Oxfam Australia for the Ned Kelly line.