Gregory Grefenstette Exalead Exalead S.A. © 2009 Search-Based Applications: the Maturation of Search.


1 Gregory Grefenstette Exalead Exalead S.A. © 2009 Search-Based Applications: the Maturation of Search

2 Maturation of Search: Full text indexing; Term weighting; Stemming, morphological analysis; Phrasal indexing; Numerical fields; Format extractors; Indexing schemes; Web search; Link analysis; Freshness; Spam detection; Facets, Categories; Processing schemes; Suggested queries; Multimedia; Database connectors; XML, structured data; Reporting; Search as a Service

3 8 billion URLs, 2 billion images, 200 million videos; Wikipedia, cloud tags; also Labs.exalead.com

4 Two ways to find information: DATABASES VS SEARCH ENGINES

5 Recent Past
DATABASES: Structured Data, Transaction, Precise, All tuples, SQL, Slow
SEARCH ENGINES: Text, Similarity, Ranking, Intuitive, Fast, Partial

6 More Recent
DATABASES: Structured Data, Transactions, Precise, All tuples, SQL, Slow; now adding Top-K, Column store, Map Reduce, Data Cube
SEARCH ENGINES: Text, Similarity, Ranking, Intuitive, Fast, Partial; now adding Connectors, Facets, Map Reduce, Tables

7 Now: DATABASES, SEARCH BASED APPLICATIONS, SEARCH ENGINES

8 Search based Application
An application that uses a search engine component, but whose final purpose is not searching for a document but rather a domain-oriented process result
–Examples: Custom response management; Logistic tracking and tracing; Contextual Advertising; Database reporting after offloading

9 Current situation: Databases are the backbone of search in information systems [Diagram: Front-office users, Database, DataWarehouse, DataMart, BI reports, Business processes]

10 Search-enabled application: Optimized solution for information access [Diagram: Database, DataWarehouse, SearchEngine, Front-office users, BI reports, Business processes]

11 Drawbacks of Using Database Search As a Component

12 Search Based Architecture vs. Standard Architecture

13 Two advantages Search Engine App

14 How does a Search Based Application work?

15 Database converted to Business Items Stored as structured documents
Business items are concrete objects directly understandable by end-users
–Product, Customer, Purchase order, Technical support call
Each business item becomes a document
The straightforward, simple format of the document index allows performance and ease of use
The search engine can offer a rich and powerful query language that allows queries as complex and advanced as SQL despite the flat data model
The search engine must support
–typed fields, intra-field scope search, categories/facets

16 Database into structured documents
Product_ID | Product_Name | Manufacturer_Names
123 | control switch | ACME Inc ; The Control Switch Company ; Karl GmbH
124 | red warning light | …
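The conversion of database rows into one structured document per business item can be sketched as follows. This is a minimal illustration, not Exalead's connector: it assumes a normalized layout with a product table, a manufacturer table and a link table, and the function name `build_documents` and the in-memory tuples are hypothetical.

```python
from collections import defaultdict

# Hypothetical in-memory rows standing in for the database tables.
products = [(123, "control switch"), (124, "red warning light")]
manufacturers = {
    345: "ACME Inc",
    8574: "The Control Switch Company",
    4483: "Karl GmbH",
}
# Link rows (Product_ID, Manufacturer_ID). The manufacturers of product 124
# are elided on the slide, so no link rows are given for it here.
product_manufacturer = [(123, 345), (123, 8574), (123, 4483)]


def build_documents(products, manufacturers, links):
    """Return one flat document (a dict of typed fields) per product."""
    names_by_product = defaultdict(list)
    for product_id, manufacturer_id in links:
        names_by_product[product_id].append(manufacturers[manufacturer_id])
    return [
        {
            "Product_ID": product_id,
            "Product_Name": name,
            # Multi-valued field: each manufacturer name stays a separate value.
            "Manufacturer_Names": names_by_product.get(product_id, []),
        }
        for product_id, name in products
    ]


documents = build_documents(products, manufacturers, product_manufacturer)
```

Keeping `Manufacturer_Names` as a list of separate values, rather than one concatenated string, is what later allows a search to be restricted to a single manufacturer name.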

17 Scope Search
Product_ID | Product_Name
123 | control switch
124 | red warning light

Product_ID | Manufacturer_ID

Manufacturer_ID | Manufacturer_NAME
345 | ACME Inc.
8574 | The Control Switch Company
4483 | Karl GmbH

Product_ID | Product_Name | Manufacturer_Names
123 | control switch | ACME Inc ; The Control Switch Company ; Karl GmbH
124 | red warning light | …

All the manufacturers of a product are aggregated into a single flat document…
… but the manufacturer names can still be searched as individual records with scope search (the query "ACME GmbH" does not match the document here)
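The difference between scope search and a plain search over the flattened field can be illustrated with a small sketch. The functions `scope_match` and `flat_match` and the simple tokenizer are hypothetical, written only to show the idea; they are not the query syntax of any particular engine.

```python
def tokenize(text):
    """Lowercase word tokens with surrounding punctuation stripped."""
    tokens = (t.strip(".,;").lower() for t in text.split())
    return [t for t in tokens if t]


def scope_match(query, values):
    """True only if all query terms occur inside one single field value."""
    terms = set(tokenize(query))
    return any(terms <= set(tokenize(value)) for value in values)


def flat_match(query, values):
    """True if all query terms occur anywhere in the concatenated field."""
    terms = set(tokenize(query))
    all_tokens = set()
    for value in values:
        all_tokens.update(tokenize(value))
    return terms <= all_tokens


manufacturer_names = ["ACME Inc", "The Control Switch Company", "Karl GmbH"]

# "ACME" and "GmbH" belong to two different manufacturers, so a scoped
# query rejects the document while a flat query would accept it.
acme_gmbh_scoped = scope_match("ACME GmbH", manufacturer_names)
acme_gmbh_flat = flat_match("ACME GmbH", manufacturer_names)
karl_gmbh_scoped = scope_match("Karl GmbH", manufacturer_names)
```

Here `acme_gmbh_scoped` is false and `acme_gmbh_flat` is true, which is the false positive that scope search avoids; `karl_gmbh_scoped` is true because both terms fall within the same value.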

18 Hierarchical categories
Product_ID | Color | Brand | Fragile | Nb of wheels | Wheel type
123 | Red | ACME | Y | 3 | 2

Product_ID | Country
123 | France
123 | UK
123 | Germany

Product_ID | Attributes
123 | Color/Red ; Brand/ACME ; Fragile/Y ; Nb_wheels/3 ; Wheel_type/2 ; Country/France ; Country/UK ; Country/Germany
124 | …

Multiple kinds of attributes can be mixed in the same category field. The hierarchical tree structure of the categories preserves the differences between attribute types.
Multi-valued attributes can also be represented by categories.
A single category field can be used to store hundreds or thousands of attribute columns.
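A minimal sketch of how attribute columns and a multi-valued table might be folded into one category field of `Type/Value` paths. The helper names `to_categories` and `values_under` are hypothetical, and the path separator `/` simply follows the notation on the slide.

```python
def to_categories(attributes, multi_valued=None):
    """Encode single- and multi-valued attributes as 'Type/Value' paths."""
    paths = [f"{name}/{value}" for name, value in attributes.items()]
    for name, values in (multi_valued or {}).items():
        paths.extend(f"{name}/{value}" for value in values)
    return paths


def values_under(categories, root):
    """Return the values stored under one attribute type (one subtree)."""
    prefix = root + "/"
    return [c[len(prefix):] for c in categories if c.startswith(prefix)]


# Attribute columns of product 123, plus its rows from the country table.
row = {"Color": "Red", "Brand": "ACME", "Fragile": "Y",
       "Nb_wheels": 3, "Wheel_type": 2}
countries = {"Country": ["France", "UK", "Germany"]}

categories = to_categories(row, countries)
product_countries = values_under(categories, "Country")
```

Because the attribute type is the first path component, adding a new attribute column only adds a new subtree; the document schema itself does not change.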

19 Multi-dimensional facets

20 Multi-dimensional facets
Search results facets provide aggregate values computed on-the-fly with the search results list
–One single search query can return the equivalent of dozens of GROUP BY SQL clauses
–Numerical values associated with facets (count, score, …) can be used to perform complex computations on the results list
Search performance is not affected by the size of the category tree
–Thousands of attribute types can be represented by categories
–Facets are dynamically selected by the search results: the displayed attributes are always consistent with the search query (e.g. color and engine type when searching for a car, screen size and CPU speed when searching for a laptop)
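The claim that one query can return the equivalent of many GROUP BY clauses can be illustrated with a toy sketch: a single pass over the result list counts every category value, grouped by attribute type. The sample documents, the naive `search` function and `compute_facets` are all hypothetical and stand in for a real engine.

```python
from collections import Counter, defaultdict

# Hypothetical indexed documents with a text field and a category field.
docs = [
    {"id": 1, "text": "red car", "categories": ["Color/Red", "Engine/Diesel"]},
    {"id": 2, "text": "blue car", "categories": ["Color/Blue", "Engine/Diesel"]},
    {"id": 3, "text": "laptop", "categories": ["Screen/15in", "CPU/2GHz"]},
]


def search(query, docs):
    """Naive full-text match: keep documents containing the query word."""
    word = query.lower()
    return [d for d in docs if word in d["text"].lower().split()]


def compute_facets(results):
    """One pass over the results yields a count per value of every
    attribute type, similar to one GROUP BY ... COUNT(*) per type."""
    facets = defaultdict(Counter)
    for doc in results:
        for path in doc["categories"]:
            attribute, _, value = path.partition("/")
            facets[attribute][value] += 1
    return dict(facets)


car_facets = compute_facets(search("car", docs))
laptop_facets = compute_facets(search("laptop", docs))
```

Only attribute types present in the matching documents appear in the output, so `car_facets` has `Color` and `Engine` entries while `laptop_facets` has `Screen` and `CPU`, which mirrors the dynamic facet selection described on the slide.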

21 CASE STUDY: LOGISTICS TRACK & TRACE

22 Gefco overview
A subsidiary of French car maker PSA (Peugeot, Citroën)
–Now does most of its business outside of PSA
Logistics operator
–Carries cars from factories to dealers (road, rail)
–Carries freight (parcels; originally spare parts)
–Supply chain and logistic platform design
3.5B, employees, 100 countries

23 The original pain
Classical multi-criteria search over Oracle, 2 million rows
Poor performance despite 2 years of optimization
–Response times measured in minutes
–Users were asked to run only simple queries, preferably at set hours

24 From forms to a search box

25 25

26 New application with operational reporting

27

28 French Post Office
Context
–Project part of the strategic plan of La Poste to improve customer service
–Tracing of the mail
–Tracing of the resolution of incidents
Stakes
–Management of high volumes: 60 million daily records with a 14-day history
–Management of peaks of updates per second
–Internal and external access to the information while respecting confidentiality
Exalead Choice
–Database offloading
–High scalability and management of large volumes
–Open API to provide high-level applications
Partner

29 French Post Office

30 Tracing of incidents
Real-time system
Used as an internal audit tool for the mail
Suggestion of addresses for customers
Search in file numbers, addresses, names, etc.

31 Case Study: RightMove

32 Rightmove: Reduce Costs and Improve Performance through Database
Stats
–2 million real estate ads, 29 million monthly visitors
–Peak throughput 400 queries per second (QPS)
–99.99% availability rate
Gains
–Replaced 30 Oracle CPUs with 9 search CPUs
–Reduced cost of search per 100 queries from £0.06 to £0.01
–Rapid time to market and development independence: new platform to market in 3 months; data connections handled by a built-in ODBC connector; application customization via open, standards-based APIs; IT staff achieved independence to modify or expand functionality
–Easy scaling by adding inexpensive commodity hardware
Exalead Choice
–Improved end-user experience, with simpler, more robust search and more timely data
–Rightmove's new SBA provides more intuitive and more powerful search and navigation features and automatically incorporates data facets
–Data refresh rate of less than 2 minutes

33 Advantages of Search Based Applications

34 SBA functions
SBA users enjoy:
–Multi-criteria search with sub-second response times
–Single-box fuzzy search, like on the web
–Real-time operational reporting with multi-axis analytics on tens of axes simultaneously
–Security-aware data navigation, drill-down and drill-across
–Semantic functions to enhance poorly structured content
This is difficult to achieve on a database
–Heavy software development, heavy hardware and licensing
This is difficult to achieve on a data warehouse
–Usability, delayed data, project length, licensing

35 35

36 Conclusions
Search engines mature
–Structured data, high volume, high speed
Search based Applications offer
–Usage: Search interface familiar to user
–Performance: Search engine geared to search, eases load on database platform
–Agility: Original database design untouched, reconfiguring output lightweight

