1 CERN IT Department, CH-1211 Geneva 23, Switzerland, www.cern.ch/it, OIS
Andreas Wagner – CERN IT/OIS
Eduardo Alvarez – CERN IT/OIS
Sergio Fernandez – CERN IT/OIS

2 Summary
Introduction to search
Inside CERN Search
New Search Solution
–Concepts, collections, pipelines, stages, architecture
–Search features
Demo
Conclusions and future work

3 What is Search?
Search is the art of balancing three factors:
–Recall: of all the documents that match the query, how many were returned?
–Precision: of the returned documents, how many match the query?
–Relevancy: how well does a document match the query?
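
Recall and precision as defined on this slide can be computed for a single query once the set of truly matching documents is known. The following is a minimal sketch, not part of the original slides; the function name and the document ids are made up for illustration.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for one query.

    returned: ids the search engine gave back
    relevant: ids that truly match the query
    """
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    # Precision: share of returned documents that match the query.
    precision = len(hits) / len(returned) if returned else 0.0
    # Recall: share of matching documents that were returned.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document ids, for illustration only.
returned = ["doc1", "doc2", "doc3", "doc4"]
relevant = ["doc1", "doc2", "doc5"]
precision, recall = precision_recall(returned, relevant)
```

Here two of the four returned documents match (precision 0.5) and two of the three matching documents were found (recall 2/3). Returning more documents tends to raise recall and lower precision, which is the balance the slide refers to.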

4 Enterprise Search
Wide range of document sources:
–Web pages
–File systems
–Databases
–Directories (people and places)
–Document repositories (CDS, EDMS, Indico, …)
–Structured CMS data (SharePoint, Drupal, TWiki)
Variety of metadata
Different access protection schemes
Different retrieval methods and frequencies

5 Enterprise Search
Components of Enterprise Search:
–Search engine / search technology
–Integration within the existing infrastructure (authentication, authorization)
–Document retrieval
  –Not only Web pages
  –Database/XML data (CDS, Indico, phone data)
–Protected documents
  –Access to the document data
  –In addition, information about the ACLs is needed
–Ranking of documents
Enterprise Search is not only a question of the search technology used!
→ collaboration with data owners

6 What about Google?
What makes Google Web search so good:
–Huge Web space analysis capabilities
–Huge amount of usage data used for “voting” on the results → the most popular results are promoted
–Substantial resources to tune and correct results:
  –usage data analysis
  –taking popular events into account
  –hand-edited results for popular single-keyword searches
–Personalized filtering of results
  –Based on: location, preferences, search history, …
The above is valid for all public Web search engines (Yahoo, Bing)
At the same time, Web search is not Enterprise search!

8 Search at CERN?
Why a search service, if…
–Every system usually has its own search system?
Probably one of the best places for this service:
–Quite a lot of different content sources
–High rate of new content
–Solutions are not always optimal
Centralize the search of content

9 CERN Search
A central search solution that provides:
–For users: a single entry point for searching information on several content sources at the same time
–For service providers: a search backend service
  –TWiki, Drupal, SharePoint, JACOW, Groups
Start of project in February 2006: based on a commercial product from FAST (Microsoft subsidiary and market leader)
CERN Search in production since 2007
Present resources: 1 PJAS and some fraction of a staff member

10 CERN Search: Latest Progress
2009: Migration to FAST ESP 5.3
2010: Reorganization of the indexed Web space (improved relevancy)
2010: TWiki protected pages indexed
–Service used as the default TWiki search
1Q 2011: Indico protected documents + material
1Q 2011: Indexing of the SharePoint content
3Q 2011: Migration to FAST Search Server 2010 for SharePoint

11 Overview of Indexed Documents

13 Concepts I
Diagram labels: Document, Pipeline, Processing Stage, Collection, Crawler (Files, Web), Collection A, Collection B

14 Concepts II
Diagram labels: Content API, Query API, Filter API, Connectors (Push & Pull)
Document content flow: document retrieval, document processing, document indexing
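
The concepts on these two slides (documents, collections, pipelines, processing stages) can be pictured as follows. This is a minimal sketch under stated assumptions, not the FAST API: the stage functions, the collection names and the example document are all hypothetical. A pipeline is modelled as an ordered list of stages, and each collection is tied to one pipeline.

```python
from typing import Callable, Dict, List

Document = Dict[str, str]
Stage = Callable[[Document], Document]


def strip_whitespace(doc: Document) -> Document:
    # Hypothetical stage: collapse runs of whitespace in the body.
    return {**doc, "body": " ".join(doc.get("body", "").split())}


def default_title(doc: Document) -> Document:
    # Hypothetical stage: fall back to the URL when no title was extracted.
    if doc.get("title"):
        return doc
    return {**doc, "title": doc.get("url", "untitled")}


def run_pipeline(stages: List[Stage], doc: Document) -> Document:
    # Each stage receives the output of the previous one.
    for stage in stages:
        doc = stage(doc)
    return doc


# Hypothetical collections, each bound to its own pipeline.
PIPELINES: Dict[str, List[Stage]] = {
    "web": [strip_whitespace, default_title],
    "files": [strip_whitespace],
}

raw = {"url": "http://example.org/page", "body": "  CERN   Search \n demo "}
processed = run_pipeline(PIPELINES["web"], raw)
```

After processing, the document would be handed to the indexer; in this sketch `processed` simply carries the cleaned body and the fallback title.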

15 Indexing Protected Content I
To allow indexing of protected content we need to:
–Retrieve the document
  –The search engine needs access to the document
–Obtain the document ACLs
  –To be able to decide who is allowed to find a document
  –Often not trivial, since most systems answer the question “Does a given user have the right to access a given document?” and not “Who has access to a given document?”
  –This is due to often complex permission models, including inheritance, fine granularity of permissions and permissions changing during the document lifecycle
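
The difference between the two questions can be illustrated with a toy permission model. This is only a sketch with an assumed inheritance rule (a document takes the ACL of its nearest ancestor folder that defines one); the folder paths and group names are hypothetical and do not describe any CERN system.

```python
# Hypothetical ACLs defined on folders; documents inherit from the
# nearest ancestor folder that has an entry.
FOLDER_ACLS = {
    "/": {"all-users"},
    "/projects/secret": {"secret-team"},
}


def effective_acl(path):
    """Answer 'who has access to this document?' by walking up the tree."""
    current = path
    while True:
        if current in FOLDER_ACLS:
            return FOLDER_ACLS[current]
        if current == "/":
            return set()
        # Move to the parent folder; the parent of "/x" is "/".
        current = current.rsplit("/", 1)[0] or "/"


def can_access(user_groups, path):
    """Answer 'may this user access this document?', the question most systems expose."""
    return bool(set(user_groups) & effective_acl(path))
```

A search engine needs the output of `effective_acl` at indexing time. A source system that only offers something like `can_access` forces the indexer to reconstruct the inheritance logic itself, which is why the slide calls the task non-trivial.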

16 Indexing Protected Content II
Document processing
–Resolve ACLs to SIDs
–Sent to the indexer with the document
FSA (FAST Security Authorization) component
–Active Directory integration, i.e. based on CERN accounts and e-groups
Diagram labels: Document Repository, Document Processing, Active Directory (Users & Groups), Search Index, CERN Search; flows: Document, ACL, Doc + ACL
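
Resolving ACLs to SIDs during document processing could look roughly like the sketch below. It is an illustration only: the directory is a plain dictionary standing in for Active Directory, the SID strings are invented placeholders, and the field names `acl_sids` and `unresolved` are assumptions rather than FAST field names.

```python
# Hypothetical directory: account or group name -> security identifier.
# The SID values are invented placeholders.
DIRECTORY = {
    "secret-team": "S-1-5-21-1000",
    "all-users": "S-1-5-21-1001",
}


def resolve_acl(acl_names, directory):
    """Map ACL entries to SIDs and report names that cannot be resolved."""
    sids, unknown = [], []
    for name in acl_names:
        sid = directory.get(name)
        if sid is None:
            unknown.append(name)
        else:
            sids.append(sid)
    return sorted(sids), unknown


def prepare_for_indexing(doc, directory):
    """Attach the resolved SIDs so they travel to the indexer with the document."""
    sids, unknown = resolve_acl(doc["acl"], directory)
    return {**doc, "acl_sids": sids, "unresolved": unknown}


doc = {"id": "doc-1", "body": "...", "acl": ["secret-team", "old-group"]}
indexed_doc = prepare_for_indexing(doc, DIRECTORY)
```

Keeping track of unresolved names is a design choice of this sketch: a group that no longer exists in the directory grants access to nobody, so it is dropped from the SID list but kept visible for diagnosis.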

17 Authentication / Authorisation
Authentication by the front end
FSA creates a filter with the expanded user credentials and groups
Diagram labels: Search Front End, CERN Search, Search Index, Active Directory (Users & Groups); flows: Authentication (SSO) & Search, Query & Identity, Group Membership, Query Processing
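
The query-time side can be sketched in the same spirit: the authenticated user is expanded into the set of principals (the account plus its groups), and only documents whose ACL intersects that set are returned. The user names, groups and the tiny in-memory index below are hypothetical, and the real filter is built by the FSA component rather than by code like this.

```python
# Hypothetical group memberships, standing in for Active Directory.
GROUPS_OF = {
    "alice": ["secret-team", "all-users"],
    "bob": ["all-users"],
}

# Hypothetical index entries: each document stores the principals allowed to see it.
INDEX = [
    {"id": "public-page", "text": "cern search help", "acl": {"all-users"}},
    {"id": "secret-plan", "text": "cern search migration plan", "acl": {"secret-team"}},
]


def expand_identity(user):
    """Return the user plus every group the user belongs to."""
    return {user, *GROUPS_OF.get(user, [])}


def secure_search(user, term):
    """Return ids of documents that contain the term and are visible to the user."""
    principals = expand_identity(user)
    return [
        doc["id"]
        for doc in INDEX
        if term in doc["text"].split() and doc["acl"] & principals
    ]
```

With this data, `secure_search("alice", "search")` returns both documents, while the same query for `bob` returns only the public page.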

18 FAST Search for SharePoint: Cluster Architecture

19 Index Profile
Final representation of each document
Set of attributes to index (managed properties):
–Title
–Author
–Last modified date
–ACLs
Define which properties are queryable, refiners or sortable
Define FullTextIndex properties
Define mappings to the FullTextIndex
Flexible
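
One way to picture an index profile is as a list of managed properties, each with flags saying how it may be used. The sketch below is an assumption-laden illustration: the class, the flag names and the chosen flag values are hypothetical and do not reproduce the FS4SP schema or its cmdlets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManagedProperty:
    # Hypothetical flags mirroring the choices listed on the slide.
    name: str
    queryable: bool = True
    refiner: bool = False
    sortable: bool = False
    in_full_text_index: bool = False


# Hypothetical profile built from the attributes named on the slide.
INDEX_PROFILE = [
    ManagedProperty("title", in_full_text_index=True),
    ManagedProperty("author", refiner=True, in_full_text_index=True),
    ManagedProperty("last_modified", sortable=True),
    ManagedProperty("acl"),
]


def names_where(profile, flag):
    """List the property names for which the given flag is set."""
    return [prop.name for prop in profile if getattr(prop, flag)]


refiners = names_where(INDEX_PROFILE, "refiner")
full_text_fields = names_where(INDEX_PROFILE, "in_full_text_index")
```

In this sketch the author can be used to refine results, the modification date to sort them, and only title and author are mapped into the full-text index.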

20 Result Ranking – Rank Profiles

21 Ranking Issues at CERN
Flat Web space
–Lack of metadata (copy-paste, poorly formed HTML meta tags, ...)
–Isolated sites (not many inter-links, only to the CERN main page)
Good experience with well-structured content
–Indico, CDS
How to improve ranking?
–Manual tuning of results: promote, demote
–Modify the rank profile
–Custom processing stage for static rank points
Not easy
–Manpower intensive
–Requires a better understanding of the indexed data
–No magic solution: the rank profile has to be balanced across different collections
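
The interplay between a rank profile and static rank points can be sketched with a toy scorer. Everything here is an assumption made for illustration: the field weights, the per-collection boosts, the additive scoring formula and the sample documents. FAST rank profiles are considerably richer than this.

```python
# Hypothetical rank profile: weight of a term match per field.
RANK_PROFILE = {"title": 3.0, "body": 1.0}

# Hypothetical static rank points awarded per collection.
COLLECTION_BOOST = {"indico": 2.0, "web": 0.0}


def dynamic_score(doc, term):
    """Query-dependent part: weighted count of term occurrences per field."""
    return sum(
        weight * doc.get(field, "").lower().split().count(term)
        for field, weight in RANK_PROFILE.items()
    )


def static_rank_stage(doc):
    """Custom processing stage: attach query-independent rank points."""
    boost = COLLECTION_BOOST.get(doc.get("collection"), 0.0)
    return {**doc, "static_rank": boost}


def rank(docs, term):
    """Order matching documents by dynamic score plus static rank."""
    scored = []
    for doc in docs:
        match = dynamic_score(doc, term)
        if match > 0:
            scored.append((match + doc.get("static_rank", 0.0), doc["id"]))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


docs = [
    {"id": "web-page", "collection": "web", "title": "LHC status", "body": "lhc status page"},
    {"id": "indico-talk", "collection": "indico", "title": "LHC upgrade talk", "body": "slides"},
    {"id": "unrelated", "collection": "indico", "title": "Canteen menu", "body": "lunch"},
]
boosted = [static_rank_stage(doc) for doc in docs]
```

Without static rank the Web page wins for the query `lhc` (4.0 against 3.0); with the boost the Indico entry moves ahead (5.0 against 4.0). The same boost applied to every query is what makes balancing a profile across collections delicate.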

22 Changes in FAST ESP Products
Before, only one product
–FAST ESP 5.3 (standalone product)
Now, several possibilities
–FAST
  –FAST Search Server 2010 for Internal Applications (FSIA)
  –FAST Search Server 2010 for Internet Sites (FSIS)
–Microsoft + FAST
  –FAST Search Server 2010 for SharePoint (FS4SP)
–Same core
–Configuration and OTB pipeline adapted for SharePoint
–Reduced set of tools; others migrated to SharePoint or PowerShell cmdlets

23 FAST Search for SharePoint: Architecture Overview

24 FAST Search for SharePoint Topology
Diagram labels: SharePoint Crawler, SharePoint Sites, Web Sites, File Shares, Exchange public folders, Lotus Notes, FAST Enterprise Crawler, Search Centre

25 Server Architecture
Two systems (Production + Dev)
Using SharePoint Central Service
Production
–1 admin node
–1 crawler + pre-processing node
–4-node index cluster
  –Both roles: indexer and search
  –2 rows
    –Backup
    –Query performance
  –2 columns
    –Easily handles more than 30 million documents
–High reliability on critical components
  –Content distributors, query servers, document processors
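
The rows and columns of the index cluster can be read as replication and partitioning: columns split the index so that each node holds only part of the documents, and rows hold copies of every column for backup and query throughput. The sketch below illustrates that reading only; the hash-based placement and the function names are assumptions, not how FAST actually distributes documents.

```python
import zlib

COLUMNS = 2  # index partitions
ROWS = 2     # replicas of each partition


def column_of(doc_id):
    """Pick a column with a stable hash (an assumed placement rule)."""
    return zlib.crc32(doc_id.encode("utf-8")) % COLUMNS


def build_cluster(doc_ids):
    """Return cluster[row][column] -> set of document ids."""
    cluster = [[set() for _ in range(COLUMNS)] for _ in range(ROWS)]
    for doc_id in doc_ids:
        # Every row stores a full copy, split across the columns.
        for row in range(ROWS):
            cluster[row][column_of(doc_id)].add(doc_id)
    return cluster


def search_all(cluster, row):
    """Query one row: fan out to each of its columns and merge the results."""
    merged = set()
    for partition in cluster[row]:
        merged |= partition
    return merged


doc_ids = [f"doc-{i}" for i in range(100)]
cluster = build_cluster(doc_ids)
```

Any single row can answer a query over the whole index, so a second row can absorb query load or take over if the first fails, while adding columns is what lets the document count grow.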

26 FAST Search for SharePoint: New Features (I)
New query suggestions model
–Based on a dictionary and common user queries
Best Bets & Visual Best Bets
Custom search experience (per user/role)
New management system (Microsoft style)
–SCOM, PowerShell, …
SharePoint integration
Phonetic and nickname search
Thumbnails and previews in results
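
A suggestion model based on a dictionary and on common user queries can be approximated by prefix matching ranked by query frequency. This is a rough sketch of the idea, not the FS4SP implementation; the query log, the dictionary words and the ranking rule are all hypothetical.

```python
from collections import Counter

# Hypothetical inputs: past user queries and a small dictionary.
QUERY_LOG = ["indico", "indico agenda", "edms", "indico", "eduroam", "edms"]
DICTIONARY = ["education", "edh"]


def suggest(prefix, limit=3):
    """Suggest completions: frequent past queries first, dictionary words last."""
    counts = Counter(QUERY_LOG)
    for word in DICTIONARY:
        # Dictionary words take part with a count of zero unless users typed them.
        counts.setdefault(word, 0)
    matches = [query for query in counts if query.startswith(prefix.lower())]
    # Most frequent first; ties are broken alphabetically.
    matches.sort(key=lambda query: (-counts[query], query))
    return matches[:limit]
```

For the prefix `ed` this yields `edms`, `eduroam` and `edh`: the two logged queries come first in order of frequency, followed by a dictionary word.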

27 FAST Search for SharePoint: New Features (II)
Entity extraction
Office Web Apps integration
Relevance improvements based on social behaviour
–Click-through relevancy
Enhanced results refinement
–Deep results refinement
–Based on any managed property
–Similar results
Federated search

28 Migration Process
Migrate pipelines
Adapt retrieval and pre-processing scripts
Port custom processing stages
Migrate the feed process to use SharePoint crawlers (file shares)
Customize the Search Centre to offer the same functionality as the old system
Create general helper tools
–Manage the index profile
–Manage keywords, best bets, …

29 Examples: Best Bets & Visual Best Bets

30 Examples: Visual Refiners

31 Examples: Federated search (Google, Bing, Twitter)

32 Search-Driven Application

35 Conclusions
Successfully migrated all the content from the old system
–Experience with the same technology
Reduced tools and help for content other than SharePoint
But,
–New interesting features, SharePoint integration
–Complete Search Centre
–A larger community behind it
–High cohesion between SharePoint and Search Services

36 Next Steps
Integration with Drupal
–Customized pre-processing, processing, index and query
Index SSO centrally managed sites
–Own SSO crawling, getting ACLs, processing
Continue evolving the new system
–Take advantage of all FS4SP features
  –Office Web Apps, visual refiners, phonetic search, ...
–Together with content providers, improve relevancy, best bets, ...

37 CERN Search: http://cern.ch/search
and also via:
–CERN Intranet & Public Pages
–TWiki
–IT, HR, PH Websites
–JACOW
