A New Content Processing Framework for Search Applications Iain Fletcher

Presentation transcript:

1 A New Content Processing Framework for Search Applications Iain Fletcher

2 Agenda: Briefly About Search Technologies; Key Issues for Enterprise Search; A New Content Processing Framework for Search Applications; How do we use it?; What does it look like?; Use case example

3 Search Technologies overview: The leading IT services company focused on search engines. Consulting, implementation, managed services. Technology independent, working with most of the leading search engines. 90 staff, 250+ customers.

4 Search Technologies overview: San Diego, CA; San Jose, CR; Herndon, VA; Ascot, UK; Boston, MA; Cincinnati, OH

5 Executive team (executive, role: years in the search engine industry, enterprise search industry experience)
Kamran Khan, President & CEO: 18 years (International Sales, VP Sales, Executive)
John Steinhauer, VP Technology: 16 years (Development Management, Project Management, Executive)
Paul Nelson, Chief Architect: 22 years (Development, Innovation, Architecting, Dev. Management)
Graham Charlesworth, VP Europe: 16 years (Business Development, VP Sales, Executive)
Phil Lewis, Tech. Director, Europe: 19 years (Development, Innovation, Architecting, Project Management)
Dennis Tran, VP & Founder: 21 years (International Sales, VP Sales)
John Back, VP Sales: 15 years (Sales, Federal Sales Director)
Iain Fletcher, VP Marketing: 16 years (International Sales, Product Management, VP Marketing)

6 Selected customers

7 A New Content Processing Framework for Search Applications

8 Agenda: Briefly About Search Technologies; Key Issues for Enterprise Search; A New Content Processing Framework for Search Applications; How do we use it?; What does it look like?; Use case example

9 Enterprise Search - An Indifferent Reputation: Major surveys show that no progress has been made during the last 10 years. Searchers are successful in finding what they seek 50% of the time or less (2001, IDC, “Quantifying Enterprise Search”). More than half cannot find the information they need using their enterprise search system (2011, MindMetre/SmartLogic, “Mind the Enterprise Search Gap”).

10 Search Fundamentals

11 Metadata Supports Relevance Ranking

12 Metadata Supports Relevance Ranking: Supported by great metadata! Title, meta description, URL, inbound links, alt tag text, etc. Provided for free by millions of SEO practitioners.
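
To make the link between metadata and ranking concrete, here is a minimal Python sketch of field-weighted term matching. It is an illustration only, not how any particular search engine scores documents: the field names, the weights in `FIELD_WEIGHTS`, and the `score` function are all assumptions made for this example.

```python
from typing import Dict

# Hypothetical weights: a match in a metadata field counts for more
# than a match in the body text.
FIELD_WEIGHTS: Dict[str, float] = {"title": 3.0, "description": 2.0, "body": 1.0}


def score(query: str, doc: Dict[str, str]) -> float:
    """Sum weighted counts of query terms across the document's fields."""
    terms = query.lower().split()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = doc.get(field, "").lower().split()
        total += weight * sum(words.count(term) for term in terms)
    return total


with_metadata = {"title": "Expense policy", "body": "how to file expense claims"}
without_metadata = {"body": "how to file expense claims"}
```

With identical body text, the document that carries a descriptive title scores higher for the query "expense", which is the effect the slide attributes to good metadata.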

13 Key Issues: Almost all modern search functions are driven by data structure.

14 Key Issues: The majority of serious problems in serious search systems are caused by data quality issues. Also, “Big Data” and BI from unstructured data will face the same challenges. Can you trust an analysis if you are unsure of data provenance?

15 Data quality examples: The subscription portal caught out by template information. The Intranet search skewed by a new piece of hardware. The Intranet search where great quality was the problem!

16 Key Issues: Data structure and quality issues are addressed in the indexing pipelines of search engines (cleaning, enriching, normalizing, granularizing...). It is about process as much as technology, and data constantly evolves. Sometimes the built-in indexing pipeline is not good enough (issues with scale, flexibility or transparency). Some search engines don’t really have one. We’ve written our own.
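
The four pipeline activities named on this slide can be sketched as small functions chained together. The Python below is a minimal sketch under stated assumptions: the stage functions, the dictionary-based document shape, and the stage order are hypothetical and chosen for illustration; they are not taken from any search engine's pipeline.

```python
import re
from typing import Callable, Dict, List

Doc = Dict[str, str]
Stage = Callable[[Doc], List[Doc]]  # a stage may emit zero, one, or many records


def clean(doc: Doc) -> List[Doc]:
    """Cleaning: strip markup tags from the body."""
    return [{**doc, "body": re.sub(r"<[^>]+>", "", doc["body"])}]


def normalize(doc: Doc) -> List[Doc]:
    """Normalizing: trim and lower-case the author field."""
    return [{**doc, "author": doc.get("author", "").strip().lower()}]


def granularize(doc: Doc) -> List[Doc]:
    """Granularizing: emit one record per blank-line-separated paragraph."""
    parts = [p.strip() for p in re.split(r"\n\s*\n", doc["body"]) if p.strip()]
    return [{**doc, "id": f"{doc['id']}#{i}", "body": p} for i, p in enumerate(parts)]


def enrich(doc: Doc) -> List[Doc]:
    """Enriching: add a derived word-count field."""
    return [{**doc, "word_count": str(len(doc["body"].split()))}]


def run_pipeline(doc: Doc, stages: List[Stage]) -> List[Doc]:
    """Feed every record produced by one stage into the next stage."""
    docs = [doc]
    for stage in stages:
        docs = [out for d in docs for out in stage(d)]
    return docs


raw = {
    "id": "d1",
    "author": "  Jane DOE ",
    "body": "<p>First part.</p>\n\n<p>Second longer part.</p>",
}
records = run_pipeline(raw, [clean, normalize, granularize, enrich])
```

Each stage returns a list so that granularizing can turn one source document into several index records while the other stages pass records through one-to-one.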

17 Agenda: Briefly About Search Technologies; Key Issues for Enterprise Search; A New Content Processing Framework for Search Applications; How do we use it?; What does it look like?; Use case example

18 Document Processing Methodology for Search (DPMS) – The Philosophy: Understand the Document Model. Understand the User Model (includes business-level requirements). Create the Search Engine Model (search = the pivot point between User and Data). Document everything.

19 DPMS – The Methodology [Diagram of three phases: 1 Assessment, 2 Detailed Analysis, 3 Execution. Diagram labels: Assessment (Search Technologies Architect and Business Analyst); Assessment Report (expert assessment and recommendations); DPMS Analysis (Knowledge Engineer, Business Analyst, etc.); Review (Architect, Domain Experts, Peers); Validation; DMDs; Implementation (Developer); Validate DMDs; Aspire; Search Engine.]

20 DPMS – The Implementation

21 Introducing “Aspire”: Think of it as a stand-alone indexing pipeline with a framework + component architecture. Framework built for scalability, performance and flexibility, and designed to use cloud elasticity. Components built to be autonomous and transparent.

22 Technology Suite: 100% Java. OSGi™, the Dynamic Module System for Java™. Apache Felix, an open source implementation of OSGi. Jetty, an embedded HTTP server. Maven & Maven Repositories, for component deployment.

23 Component Configuration: Any number of document processing pipelines can be used in an application. Disparate data sources will need different treatment. Components can be shared where appropriate. Configurations are easy to change.
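
One way to picture per-source pipelines built from shared components is a registry of components plus a configuration that names which ones each source uses. The Python sketch below is illustrative only and is not Aspire's configuration format: `COMPONENTS`, `PIPELINES`, the source names, and the two component functions are all hypothetical.

```python
from typing import Callable, Dict, List

Doc = Dict[str, str]


def collapse_whitespace(doc: Doc) -> Doc:
    """Shared component: squeeze runs of whitespace in the body."""
    return {**doc, "body": " ".join(doc["body"].split())}


def lowercase_title(doc: Doc) -> Doc:
    """Component used only where titles need case folding."""
    return {**doc, "title": doc["title"].lower()}


# Registry of available components, keyed by name.
COMPONENTS: Dict[str, Callable[[Doc], Doc]] = {
    "collapse_whitespace": collapse_whitespace,
    "lowercase_title": lowercase_title,
}

# Configuration: each data source gets its own pipeline; components are reused.
PIPELINES: Dict[str, List[str]] = {
    "intranet": ["collapse_whitespace", "lowercase_title"],
    "file_share": ["collapse_whitespace"],
}


def process(source: str, doc: Doc) -> Doc:
    """Run the document through the pipeline configured for its source."""
    for name in PIPELINES[source]:
        doc = COMPONENTS[name](doc)
    return doc
```

Because the pipelines are data rather than code, changing how a source is treated means editing a list of component names, which is the kind of easy reconfiguration the slide describes.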

24 Component autonomy: Components communicate via XML. Each component has a known and transparent input and output, and can be tested in isolation. This simplifies problem diagnosis, promotes transparency and controls cost-of-ownership.
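
A component with XML in and XML out can be exercised on its own, without the rest of the pipeline. The following Python sketch shows the idea using the standard library's `xml.etree.ElementTree`; the element names (`doc`, `title`, `titleLength`) and the function `add_title_length` are assumptions for illustration and do not reflect Aspire's actual document schema.

```python
import xml.etree.ElementTree as ET


def add_title_length(xml_in: str) -> str:
    """Read a <doc> element, append a <titleLength> child, return XML text."""
    root = ET.fromstring(xml_in)
    title = (root.findtext("title") or "").strip()
    ET.SubElement(root, "titleLength").text = str(len(title))
    return ET.tostring(root, encoding="unicode")


# Testing in isolation: hand the component a small XML document and
# inspect what comes back, with no other components involved.
output = add_title_length("<doc><title>Hello</title></doc>")
parsed = ET.fromstring(output)
```

Since the input and output are plain XML, a failing document can be captured as a file and replayed against a single component when diagnosing a problem.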

25 Data Quality Monitoring: Components have built-in quarantine systems to monitor data quality. Content is constantly evolving. This provides transparency and enables content issues to be diagnosed and resolved faster.
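
The quarantine idea can be sketched as a component that sets aside documents failing a check, records why, and exposes a rate that can be watched over time. This Python example is a hypothetical illustration: the class `QuarantiningComponent`, its required-fields check, and the rate calculation are assumptions, not a description of how Aspire's quarantine works.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Doc = Dict[str, str]


@dataclass
class QuarantiningComponent:
    """Passes valid documents through; sets aside invalid ones with a reason."""

    required_fields: Tuple[str, ...]
    quarantine: List[Tuple[Doc, str]] = field(default_factory=list)
    passed: int = 0

    def process(self, doc: Doc) -> Optional[Doc]:
        missing = [f for f in self.required_fields if not doc.get(f, "").strip()]
        if missing:
            # Keep the document and the reason so the issue can be diagnosed later.
            self.quarantine.append((doc, "missing: " + ", ".join(missing)))
            return None
        self.passed += 1
        return doc

    def quarantine_rate(self) -> float:
        """Share of documents seen so far that were quarantined."""
        total = self.passed + len(self.quarantine)
        return len(self.quarantine) / total if total else 0.0
```

A rising quarantine rate after a content source changes its templates would be an early signal that the source has evolved and the pipeline needs attention.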

26 The Component Library: Search Technologies maintains a library of components. Currently there are more than 70. Components can be as simple as 3 lines of Groovy script, or as complex as 3rd-party technologies. Many applications can be addressed using existing components + configuration.

27 Component Upgrading: Components can be upgraded in situ from a cloud-based service, without stopping/restarting the system. Helpful in the maintenance of complex or mission-critical systems.

28 Component control: Every component has its own control / status page.

29 A very simple example

30 Security expansion example

31 Patent Assignee Name Normalization
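
The slide itself is only a title, so the following is a guess at the kind of processing involved rather than a record of what was shown. A minimal Python sketch of name normalization, assuming the goal is to map spelling variants of one organization to a single key: the suffix list, the function `normalize_assignee`, and the company names are invented for illustration.

```python
import re

# Hypothetical list of legal-form suffixes to drop from the end of a name.
LEGAL_SUFFIXES = {
    "inc", "incorporated", "corp", "corporation",
    "co", "company", "ltd", "limited", "llc", "gmbh",
}


def normalize_assignee(name: str) -> str:
    """Lower-case, drop punctuation, and strip trailing legal-form suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


variants = ["Acme Widgets, Inc.", "ACME WIDGETS CORPORATION", "Acme Widgets Co., Ltd."]
keys = {normalize_assignee(v) for v in variants}
```

All three variants reduce to the same key, so a navigator or facet built on the normalized field would group them together. Real assignee data would need more than this, for example handling abbreviations and renamed companies.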

32 Complexity example: CPA Global Discover, the world’s leading patent research portal. 80 million patents from 95 patent offices. More than a dozen navigators built. Numerous graphical search results display options. Whole document comparison features.

33 In Summary: Many applications today don’t need this level of diligence, but as data and data dynamism grow, more will. A stand-alone unstructured content processing system can serve multiple applications, and makes sense for some companies. Method. Diligence. Transparency. It’s not rocket science... Applying this approach to enterprise search is a key part of moving user satisfaction forward during the next few years.

34 Thank You! Iain Fletcher