14 Mar 05 1 Exploring Verity K2 through Pilot Applications and Taxonomy Development Gordon Campbell Director, IS Strategic Planning & Innovation.


1 14 Mar 05 1 Exploring Verity K2 through Pilot Applications and Taxonomy Development Gordon Campbell Director, IS Strategic Planning & Innovation

2 14 Mar 05 2 sanofi pasteur: The vaccines business of sanofi-aventis Group
sanofi-aventis Group
- Formed in 2004 by the merger of Sanofi-Synthélabo + Aventis
- 2004 Revenues = 25.4 Billion Euros
- 100,000 Employees
- 3rd largest Pharma company in the world, 1st in Europe
sanofi pasteur
- World leader in Vaccines
- 2004 Revenues = 1.6 Billion Euros
- 8,000 Employees
- Heritage includes Louis Pasteur (1890s) and other vaccine pioneers (Merieux, Slee)

3 14 Mar 05 3 sanofi pasteur IS Organization
Global CIO with Global Functional Heads
- CIOs for N. America and France
- CIO – R&D
- CIO – Industrial Operations
- CIO – Commercial Operations (Sales & Marketing)
- CIO – Business Support (Functions – Finance, HR, etc.)
- Director, Global Infrastructure & Operations
- Director, IS Quality
- Director, IS Strategic Planning & Innovation
Director, IS Strategic Planning & Innovation – responsibilities
- Transversal role – bridging functions & technologies
- Manage the Long Range Planning process
- Manage the Global IS Portfolio
- Verity Champion – formulate the strategy and foster appropriate pilots and applications

4 14 Mar 05 4 Verity Experience at sanofi pasteur Pre Verity K2 (through 2003) Limited applications – primarily intranet Verity K2 Acquisition End of 2003 Two primary applications targeted: Improve Intranet search results Global Medical Affairs - share common disease / vaccine information 2004 Verity K2 – Pilots + Applications 4 Pilots to explore taxonomies and multi-repository search Plus 5 Applications Developed with two consultancies: Verity Consulting Services – for French pilots & applications Raritan Technologies, Inc. – for N. American pilots & applications

5 14 Mar 05 5 Search 101 – Basic Concepts Google – a familiar search engine to many Easy to use and results are ranked, often showing the best results near the top of the list (and paid sponsored links on the right). Results are ranked based on Google’s proprietary & typically secret algorithms Users often mention Google when describing the type of search they would like to have

6 14 Mar 05 6 101 - But there is much more Content to Search than what exists on the open Web
Enterprise-generated content is huge …
- Office documents
- eMails
- Database-driven web pages
- Not to mention other types of media (voice, video, etc.)
These estimates are from a 2003 study (UC Berkeley), with some volumes expected to double in 3 yrs.
The increasing dilemma …
- How can I find what I need in a timely fashion?
- Will I be forced to recreate what I can’t find?
Annual Information Volumes (1)
Media Type               Terabytes (2)   Comments
Scholarly publications   6               37,600 titles per year
Searchable Web           167             Openly accessible sites
Office Documents         1,397           10.75 billion pages per year
Deep Web                 91,850          DB driven web sites
eMails (originals)       440,606         31 billion emails sent per day
Hard Disk Drives         1,986,000       44 million items per year
(1) Source: How much information 2003?
(2) Terabyte = 1 million million bytes, or approx. 50,000 trees made into paper and printed.

7 14 Mar 05 7 101 – Navigation & Search are Complementary
Navigation
- Browsing for information is the most common way to locate content of interest
- Typically, information is organized in a hierarchy of folders
- Taxonomies can provide the structure and logic
- But content must still be stored in appropriate locations, with meaningful descriptions (file names, abstracts, etc.)
- As complexity increases, finding content by navigation becomes more and more difficult
- Complexity factors include volume & scope of content, multiple storage repositories, multiple copies of documents, etc.
Search
- Represents the primary alternative to navigation
- Simple text searches are very common in specific content repositories, but they may not produce effective results
- Sophisticated search tools can yield prioritized and comprehensive lists of results … but they require content access, rules and other techniques.

8 14 Mar 05 8 101 - Requirements of a Good Search Engine:
Access - content must be accessible to the search tools
- First, access must be public or the user must have permission to search the site. (NB: Google cannot search secured or protected web sites.)
- Through a pre-established index of the content produced by a crawler or spider (this is the approach used by Google, producing very fast search results).
- Through bots that scan content at the time of the query. Some workers (bots) make use of local search engines often provided with a set of content.
Search Results – what will produce the best set of results?
- A simple text string, possibly with Boolean operators, may locate only exact matches. Boolean skill may be necessary to enhance results.
- Rules-augmented searches can locate many more items that are missed in a simple search, because they recognize synonyms, associated terms, etc.
Ranking Results – vastly improves the value of the search
- Algorithms are used to score items found by the search and rank-order the results, attempting to place the best matches near the top.
- Ranking scores can take into account many factors, such as where the search term is found (keyword list, the title or only the body of text), how often it appears, its proximity to other related key terms, etc.
Note: Bot is common parlance on the Internet for a software program acting as an agent on behalf of a user. Bots interact with other network services intended for people, as if they were real people. One typical use of bots is to gather information. The term is derived from the word robot, reflecting the autonomous character in the "virtual robot"-ness of the concept.
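The ranking factors on this slide can be illustrated with a short Python sketch. It is a toy example only, not Verity's or Google's algorithm: the field weights, the Document class and the function names are assumptions made up for illustration, and proximity scoring is left out for brevity.

```python
from dataclasses import dataclass, field

# Hypothetical weights: a hit in the keyword list counts more than
# a hit in the title, which counts more than a hit in the body.
FIELD_WEIGHTS = {"keywords": 3.0, "title": 2.0, "body": 1.0}


@dataclass
class Document:
    title: str
    body: str
    keywords: list = field(default_factory=list)


def tokenize(text):
    """Lowercase the text and split it into words, dropping simple punctuation."""
    return [word.strip(".,;:()").lower() for word in text.split()]


def score(doc, term):
    """Weighted count of how often the term appears in each field."""
    term = term.lower()
    fields = {
        "keywords": [k.lower() for k in doc.keywords],
        "title": tokenize(doc.title),
        "body": tokenize(doc.body),
    }
    return sum(FIELD_WEIGHTS[name] * tokens.count(term)
               for name, tokens in fields.items())


def rank(docs, term):
    """Return only the matching documents, best score first."""
    scored = [(score(doc, term), doc) for doc in docs]
    hits = [(s, doc) for s, doc in scored if s > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in hits]
```

A document that mentions the term in its keyword list and title ranks above one that mentions it once in the body, which is the behaviour the slide describes.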

9 14 Mar 05 9 101 - The Business Value of Good Search Tools
Business Value = Better Informed Decisions
Taxonomies provide the foundation for vastly improved search results
Key Issues: How long to find needed info? Quality of results? Missing or inaccessible info?
[Diagram labels: Define Business Problem; Identify Key Parameters Impacting decision; Simple Text Search; Parametric Search; Federated or Consolidated Search; Create & Maintain Enterprise Taxonomies; sanofi pasteur Rules; Tag Content; reSearch; Classify Content based on Enterprise Rules; Classified Content; News, Journals, Etc.; Selected & Ranked References; Other Inputs; Consider Inputs & Evaluate Alternatives; Make Decision; Business Decision]

10 14 Mar 05 10 Verity Pilots & Applications so far …
2004 Pilots
- MeSH* Taxonomy Extension: Added depth and granularity on vaccine topics
- Departmental Shared Folder: 2nd Taxonomy study
- Process Development: 3rd build to the taxonomy
- Consolidated IS Content Search: Combine Verity collections from 3 different sources
Applications
- VaccinePlace.com: Public service web site
- Intranet K2 Upgrade: Static HTML pages + attachments
- Global Medical Content: Internal shared access to common disease & vaccine information
- RPI Newsline: Regulatory publications
*MeSH = Medical Subject Heading from the US National Library of Medicine / NIH

11 14 Mar 05 11 US Vaccine Educational Web Sites Corpus of Documents HTML pages of various vaccine information sites, including: Daptacel.com Influenza.com Meningitisvaccine.com Rabies.com Tetanus.org Travelersvaccines.com VaccineProtection.com Business Drivers Increase consumer access to information on vaccine-preventable diseases. Consolidate Internet access to several sites focused on vaccine- preventable diseases Search Approach Simple keyword text search Taxonomy Extensions None

12 14 Mar 05 12 VaccinePlace.com

13 14 Mar 05 13 Simple Text Search Results help VaccinePlace visitors find information quickly …

14 14 Mar 05 14 … or visitors can Browse to learn

15 14 Mar 05 15 But simple text searches are just the beginning – our Model for Improved Search Results includes … Taxonomies Provide the foundation for vastly improved search results But public and commercial taxonomies often lack the richness and knowledge available in the enterprise Our method for developing an enterprise taxonomy included: Use an existing taxonomy as a starting point - MeSH + A Professional Librarian – Hugh McNaught + Subject matter experts + Verity experts – Raritan Technologies, Inc. = Robust Taxonomy + Enterprise Rules Parametric Search Portal Is essential to test the taxonomy / rules effectiveness Provide enhanced access to the documents in the collection *MeSH = Medical Subject Heading from the US National Library of Medicine / NIH

16 14 Mar 05 16 Taxonomy Concepts Taxonomies – what are they? A hierarchical classification of things, or the principles underlying the classification. Almost anything, animate objects, inanimate objects, places, and events, may be classified according to some taxonomic scheme. Why develop and use taxonomies? By developing and applying taxonomies that are specific to the collection(s) of interest, items in the collection(s) can be retrieved faster and easier. The items retrieved will be more relevant and more precise to the query asked. Sources of taxonomies MeSH = Medical Subject Heading from the US National Library of Medicine / NIH Library of Congress and other public domain sources Commercial taxonomies (Factiva, Verity, etc.) Internally developed – can enrich public domain / commercial taxonomies with enterprise knowledge

17 14 Mar 05 17 MeSH* + Reference Manager – 1st Pilot to Launch Development of a sanofi pasteur Taxonomy
Corpus of Documents
- Vaccine related scientific publications
- Abstracts stored in the Reference Manager DB
Business Drivers
- Global Information & Library Sciences desire to significantly improve quality of search results across a broad range of collections
- Recognition that public taxonomies, such as MeSH, are not as rich in vaccine terms as needed
Search Approach
- Parametric + key word on title, abstract + key words
Taxonomy Extensions
- Vaccine nodes of MeSH* taxonomy
- Products
- Companies
- Geography
*MeSH = Medical Subject Heading

18 14 Mar 05 18 MeSH nodes Structure Top Level MeSH D24 Nodes, including Vaccines

19 14 Mar 05 19 Expanded Vaccine Node Poliovirus Vaccine Structure

20 14 Mar 05 20 Verity Intelligent Classifier (VIC) - Provides tools to Enhance the Taxonomy and Create Rules Taxonomy Pane To create & modify the users’ navigation structure Topics Pane To create & modify the rules – synonyms, concepts, relationships, etc.

21 14 Mar 05 21 Poliovirus Vaccine Rules This is the set of Rules for the Poliovirus Vaccine The Inactivated node is expanded in this example. The high level node corresponds with a node in the structure. The rules ‘roll up’ to each higher level. All nodes contain Terms pertaining to the node, and Products used to treat that Virus Verity Query Language Is the syntax used by VIC to create & modify the rules
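The "roll up" behaviour described on this slide can be sketched in Python. This is an illustration only: it is not Verity Query Language and not how VIC stores rules. The Node class, its method names and the sample terms are assumptions, and the matching is a naive case-insensitive substring test.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One taxonomy node with its own terms and optional child nodes."""
    name: str
    terms: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def all_terms(self):
        """Terms of this node plus those of every descendant (the 'roll up')."""
        rolled = {term.lower() for term in self.terms}
        for child in self.children:
            rolled |= child.all_terms()
        return rolled

    def matches(self, text):
        """True if any rolled-up term occurs in the text."""
        lowered = text.lower()
        return any(term in lowered for term in self.all_terms())


# Hypothetical sample nodes; product names would be added to `terms` the same way.
inactivated = Node("Inactivated", ["inactivated poliovirus vaccine", "IPV"])
oral = Node("Oral", ["oral poliovirus vaccine", "OPV"])
poliovirus_vaccine = Node("Poliovirus Vaccine", ["polio vaccine"], [inactivated, oral])
```

With these sample nodes, a document mentioning only "IPV" matches both the Inactivated node and its parent Poliovirus Vaccine node, but not the Oral node.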

22 14 Mar 05 22 MeSH Extension – Product Taxonomy / Flu node Flu vaccine brand names

23 14 Mar 05 23 Parametric Search Portal of the Reference Manager DB based on an Expanded MeSH Taxonomy Company information added to MeSH Product information added to MeSH Geography nodes from MeSH 6016 articles on Viruses

24 14 Mar 05 24 Clicking on a Parameter such as Influenza Vaccine automatically limits results … 5 BCG articles also mentioning Flu Vaccine Results can then be combined with a text search for more precise selections Further breakdowns of the specific context for the hits Titles of the articles meeting the selected parameters are listed here. 455 articles reference the Americas
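The parametric narrowing shown on this slide (pick a parameter, see the remaining breakdowns, then add a text search) can be sketched in Python. The article records, field names and function names below are invented for illustration and do not reflect the Reference Manager data or the Verity portal's implementation.

```python
from collections import Counter

# Hypothetical tagged articles; each parameter holds a set of taxonomy values.
ARTICLES = [
    {"title": "Influenza vaccine uptake in adults",
     "vaccine": {"Influenza Vaccine"}, "region": {"Americas"}},
    {"title": "BCG and influenza vaccine co-administration",
     "vaccine": {"Influenza Vaccine", "BCG Vaccine"}, "region": {"Europe"}},
    {"title": "Poliovirus vaccine schedules",
     "vaccine": {"Poliovirus Vaccine"}, "region": {"Americas"}},
]


def select(articles, **params):
    """Keep only articles tagged with every requested parameter value."""
    return [a for a in articles
            if all(value in a.get(name, set()) for name, value in params.items())]


def breakdown(articles, name):
    """Count how many of the remaining articles carry each value of a parameter."""
    return Counter(value for a in articles for value in a.get(name, set()))


def text_filter(articles, word):
    """Combine the parametric selection with a simple title keyword search."""
    return [a for a in articles if word.lower() in a["title"].lower()]
```

Selecting `vaccine="Influenza Vaccine"` leaves two of the three sample articles; `breakdown` then gives the per-region counts for that reduced set, and `text_filter` narrows it further by keyword.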

25 14 Mar 05 25 Verity Search Capabilities Leverage the Rules Incorporated in the Taxonomies
Search Parameters
- Company: sanofi-aventis, GSK, Wyeth, etc.
- Country: N. Am., Europe, China, etc.
- Franchise: Flu, Pediatric, Traveler, Menactra
- Disease: Flu, Tetanus, Polio, etc.
Rules reflect:
- Synonyms
- Concepts
- Relationships
Text / Keyword / Parametric Search
- Focused Reference Set: includes only articles matching the selected parameters
Federated / Consolidated Multi-source Search (Source 1, Source 2, … Source n)
- Text / key words search across multiple sources & consolidate results in one view
[Diagram labels: sanofi pasteur Taxonomies; sanofi pasteur Rules; VIC; Tagged Verity Collections; Targeted Ranked Results]

26 14 Mar 05 26 Department Shared Folder Application Corpus of Documents Internal documents stored on a shared network drive Contents included a variety of Microsoft Office documents and Adobe Acrobat files Business Drivers Need to locate relevant documents without a detailed knowledge of the folder structure & filing system Search Approach Parametric + key word on title, abstract, key words and full text Taxonomy Extensions Franchises – new nodes & rules (ex: Travel Vaccines) Companies – building on version 1 MeSH

27 14 Mar 05 27 Intranet K2 Enhancement Corpus of Documents All HTML static pages on sanofi pasteur intranet Attachments But not yet contents of applications accessed through the Intranet Business Drivers Need to greatly improve search results on intranet Search Approach Key word on title + full text Results ranked according to standard Verity algorithms Not yet available – Benefits from applying taxonomy rules, synonyms, etc. Benefits from federated searches of applications accessed via Intranet Taxonomy Extensions None yet Approach – for subject areas such as IS, HR, Purchasing, etc. We could acquire commercial or public domain taxonomies Or, we could develop something internally, similar to what was done for vaccines

28 14 Mar 05 28 Intranet – K2 Search Results

29 14 Mar 05 29 IS Content – Consolidated Search Portal Corpus of Documents IS Intranet sites IS shared folders (network drives) IS Exchange Public folders eRooms – not yet included in this pilot Business Drivers Pilot techniques to access content stored in various online repositories Search Approach Key word on content Taxonomy Extensions None yet Exploring public domain & commercial options

30 14 Mar 05 30 Search Access Components – a Summary
Verity Collection
- Description: Full text index of a corpus of documents
- Tools: Verity K2
- Capabilities: Std Verity ranked list of results; required for taxonomy appl.
- Experience to date: Intranet K2 upgrade
Taxonomy
- Description: Hierarchy structure; rules for searching & ranking results
- Tools: VIC
- Capabilities: Structured browsing; synonyms, rules
- Experience to date: 3 pilots – Ref Man DB, Shared Drive, Process Development
Gateways
- Description: Access to proprietary repository formats
- Tools: Verity stds
- Capabilities: Access & index contents, while respecting security
- Experience to date: Documentum (Global Content application)
Workers & Extractors
- Description: Agent using repository’s native search engine
- Tools: Custom Developmt
- Capabilities: Return results; create Verity collection
- Experience to date: eRoom planned 05.
Basic Search Portal
- Description: Simple text search of keywords / contents
- Tools: Verity or Custom Developmt
- Capabilities: Unranked results; possibly limited by a qualifier (date, author).
- Experience to date: VaccinePlace.com; other internal apps
Parametric Search Portal
- Description: Predefined parameters (ex: product, co. name)
- Tools: Verity or Custom
- Capabilities: Reduced set based on parameters
- Experience to date: Ref Man DB (MeSH); Shared Dept Drive; Global Content
Federated Search Portal
- Description: Multiple sources, using native search engines
- Tools: workers
- Capabilities: Combine results from multiple sources
- Experience to date: None yet
Consolidated Search Portal
- Description: Multiple sources, using extract
- Tools: Custom Developmt
- Capabilities: Combined results + taxonomy ranking
- Experience to date: IS Pilot (in progress)
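The federated approach in this summary (query each source through its own search engine, then combine the results) can be sketched in Python. This is a simplified illustration, not the Verity worker interface: the function name, the source callables and the sample data are assumptions, and it assumes scores from different sources are comparable, which real federated search cannot take for granted.

```python
def federated_search(query, sources):
    """Query every source and merge the hits into one ranked list.

    `sources` maps a source name to a callable that takes the query and
    returns (title, score) pairs. A title found in several sources is
    kept once, with its best score and the source that produced it.
    """
    merged = {}
    for source_name, search in sources.items():
        for title, score in search(query):
            best = merged.get(title)
            if best is None or score > best[0]:
                merged[title] = (score, source_name)
    rows = [(title, score, name) for title, (score, name) in merged.items()]
    # Highest score first; ties broken alphabetically by title.
    return sorted(rows, key=lambda row: (-row[1], row[0]))


# Hypothetical stand-ins for each repository's native search engine.
def search_intranet(query):
    return [("IS Policy", 0.9), ("Help Desk FAQ", 0.4)]


def search_shared_drive(query):
    return [("IS Policy", 0.7), ("Server Inventory", 0.6)]


SOURCES = {"intranet": search_intranet, "shared_drive": search_shared_drive}
```

A consolidated portal, by contrast, would first extract content from every source into a single collection and rank it there, so the score-comparability problem does not arise.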

31 14 Mar 05 31 2005 Verity Projects – Applying What we have Learned and Extending our Learning
Extend the sanofi pasteur Enterprise Taxonomy
- Build other non-vaccine nodes (ex: IS, HR, Legal, IO, etc.)
- Apply to other applications, such as the Intranet sites
Add Gateways
- eRoom – worker and extractor to create Verity Collections
Data Discovery – a new application of Verity technology
- Goal – review nature of internal content existing today on network drives, Public Folders, eRooms, etc.
- Identify candidates for archiving / destruction
- Isolate content worth including in Verity collections
R&D Consolidated Search Portal
- Explore needs and develop a business case
- Across a broad array of internal and external sources

32 14 Mar 05 32 Questions? Gordon.Campbell@sanofipasteur.com (570) 839-4277 Swiftwater, PA 18370

