14 Mar 05 1 Exploring Verity K2 through Pilot Applications and Taxonomy Development Gordon Campbell Director, IS Strategic Planning & Innovation.


sanofi pasteur: The vaccines business of sanofi-aventis Group

sanofi-aventis Group
- Formed in 2004 by the merger of Sanofi-Synthélabo + Aventis
- 2004 Revenues = 25.4 Billion Euros
- 100,000 Employees
- 3rd largest Pharma company in the world
- 1st in Europe

sanofi pasteur
- World leader in Vaccines
- 2004 Revenues = 1.6 Billion Euros
- 8,000 Employees
- Heritage includes Louis Pasteur (1890s) and other vaccine pioneers (Merieux, Slee)

sanofi pasteur IS Organization

Global CIO with Global Functional Heads
- CIOs for N. America and France
- CIO – R&D
- CIO – Industrial Operations
- CIO – Commercial Operations (Sales & Marketing)
- CIO – Business Support (Functions – Finance, HR, etc.)
- Director, Global Infrastructure & Operations
- Director, IS Quality
- Director, IS Strategic Planning & Innovation

Director, IS Strategic Planning & Innovation – responsibilities
- Transversal role – bridging functions & technologies
- Manage the Long Range Planning process
- Manage the Global IS Portfolio
- Verity Champion – formulate the strategy and foster appropriate pilots and applications

Verity Experience at sanofi pasteur

Pre Verity K2 (through 2003)
- Limited applications – primarily intranet

Verity K2 Acquisition: End of 2003
- Two primary applications targeted:
  - Improve Intranet search results
  - Global Medical Affairs: share common disease / vaccine information

2004 Verity K2 – Pilots + Applications
- 4 Pilots to explore taxonomies and multi-repository search
- Plus 5 Applications
- Developed with two consultancies:
  - Verity Consulting Services – for French pilots & applications
  - Raritan Technologies, Inc. – for N. American pilots & applications

Search 101 – Basic Concepts

Google – a familiar search engine to many
- Easy to use, and results are ranked, often showing the best results near the top of the list (with paid sponsored links on the right).
- Results are ranked based on Google's proprietary & typically secret algorithms.
- Users often mention Google when describing the type of search they would like to have.

But there is much more Content to Search than what exists on the open Web

Enterprise generated content is huge …
- Office documents
- E-mails
- Database driven web pages
- Not to mention other types of media (voice, video, etc.)

These estimates are from a 2003 study (UC Berkeley), with some volumes expected to double in 3 yrs.

The increasing dilemma …
- How can I find what I need in a timely fashion?
- Will I be forced to recreate what I can't find?

Annual Information Volumes (1)

Media Type | Terabytes (2) | Comments
Scholarly publications | 6 | 37,600 titles per year
Searchable Web | 167 | Openly accessible sites
Office Documents | 1, | billion pages per year
Deep Web | 91,850 | DB driven web sites
E-mails (originals) | 440,606 | 31 billion e-mails sent per day
Hard Disk Drives | 1,986,000 | 44 million items per year

(1) Source: How much information 2003?
(2) Terabyte = 1 million million bytes, or approx 50,000 trees made into paper and printed.

Search 101 – Navigation & Search are Complementary

Navigation
- Browsing for information is the most common way to locate content of interest.
- Typically, information is organized in a hierarchy of folders.
- Taxonomies can provide the structure and logic.
- But content must still be stored in appropriate locations, with meaningful descriptions (file names, abstracts, etc.).
- As complexity increases, finding content by navigation becomes more and more difficult.
- Complexity factors include volume & scope of content, multiple storage repositories, multiple copies of documents, etc.

Search
- Represents the primary alternative to navigation.
- Simple text searches are very common in specific content repositories, but they may not produce effective results.
- Sophisticated search tools can yield prioritized and comprehensive lists of results … but they require content access, rules and other techniques.

Requirements of a Good Search Engine

Access: content must be accessible to the search tools
- First, access must be public or the user must have permission to search the site. (NB: Google cannot search secured or protected web sites.)
- Through a pre-established index of the content produced by a crawler or spider (this is the approach used by Google, producing very fast search results).
- Through bots that scan content at the time of the query. Some workers (bots) make use of local search engines often provided with a set of content.

Search Results: what will produce the best set of results?
- A simple text string, possibly with Boolean operators, may locate only exact matches. Boolean skill may be necessary to enhance results.
- Rules augmented searches can locate many more items that are missed in a simple search, because they recognize synonyms, associated terms, etc.

Ranking Results: vastly improves the value of the search
- Algorithms are used to score items found by the search and rank order the results, attempting to place the best matches near the top.
- Ranking scores can take into account many factors, such as where the search term is found (the keyword list, the title or only the body of text), how often it appears, its proximity to other related key terms, etc.

Note: Bot is common parlance on the Internet for a software program acting as an agent on behalf of a user. Bots interact with other network services intended for people, as if they were real people. One typical use of bots is to gather information. The term is derived from the word robot, reflecting the autonomous character in the "virtual robot"-ness of the concept.
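The difference between a simple text search and a rules augmented, ranked search can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Verity's or Google's algorithm: the synonym group, the field weights and the sample documents are all hypothetical, and proximity scoring is left out for brevity.

```python
from dataclasses import dataclass

# Hypothetical synonym rule; in practice such rules would come from a taxonomy.
SYNONYM_GROUPS = [{"flu", "influenza"}]

# Assumed weights: a hit in the keyword list or title counts for more
# than a hit in the body text.
FIELD_WEIGHTS = {"keywords": 3.0, "title": 2.0, "body": 1.0}


@dataclass
class Document:
    doc_id: str
    keywords: str
    title: str
    body: str


def expand(term):
    """Return the term together with any synonyms defined by the rules."""
    term = term.lower()
    for group in SYNONYM_GROUPS:
        if term in group:
            return group
    return {term}


def score(doc, term):
    """Sum the weighted number of occurrences of the term or a synonym."""
    variants = expand(term)
    total = 0.0
    for field_name, weight in FIELD_WEIGHTS.items():
        words = getattr(doc, field_name).lower().split()
        total += weight * sum(1 for word in words if word in variants)
    return total


def search(docs, term):
    """Return (doc_id, score) pairs for matching documents, best first."""
    scored = [(doc.doc_id, score(doc, term)) for doc in docs]
    hits = [pair for pair in scored if pair[1] > 0]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Made-up sample documents.
DOCS = [
    Document("a", "influenza vaccine", "Influenza vaccine efficacy",
             "trial results for influenza"),
    Document("b", "tetanus", "Tetanus boosters",
             "patients often ask about flu shots too"),
    Document("c", "polio", "Poliovirus vaccine",
             "inactivated vaccine schedule"),
]

print(search(DOCS, "flu"))
```

A plain string match on "flu" would find only document "b". With the synonym rule, document "a" is also found and ranks first because its hits fall in the more heavily weighted keyword and title fields.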

The Business Value of Good Search Tools

Business Value = Better Informed Decisions

Taxonomies provide the foundation for vastly improved search results

Key Issues:
- How long to find needed info?
- Quality of results?
- Missing or inaccessible info?

[Diagram. Decision flow: Define Business Problem; Identify Key Parameters Impacting decision; reSearch; Selected & Ranked References; Consider Inputs & Evaluate Alternatives (with Other Inputs); Make Decision; Business Decision. Search options: Simple Text Search; Parametric Search; Federated or Consolidated Search. Supporting elements: Create & Maintain Enterprise Taxonomies; sanofi pasteur Rules; Tag Content; Classify Content based on Enterprise Rules; Classified Content; News, Journals, Etc.]

Verity Pilots & Applications so far …

2004 Pilots
- MeSH* Taxonomy Extension
  - Added depth and granularity on vaccine topics
- Departmental Shared Folder
  - 2nd Taxonomy study – Process Development
  - 3rd build to the taxonomy
- Consolidated IS Content Search
  - Combine Verity collections from 3 different sources

Applications
- VaccinePlace.com
  - Public service web site
- Intranet K2 Upgrade
  - Static HTML pages + attachments
- Global Medical Content
  - Internal shared access to common disease & vaccine information
- RPI Newsline
  - Regulatory publications

*MeSH = Medical Subject Headings from the US National Library of Medicine / NIH

US Vaccine Educational Web Sites

Corpus of Documents
- HTML pages of various vaccine information sites, including:
  - Daptacel.com
  - Influenza.com
  - Meningitisvaccine.com
  - Rabies.com
  - Tetanus.org
  - Travelersvaccines.com
  - VaccineProtection.com

Business Drivers
- Increase consumer access to information on vaccine-preventable diseases.
- Consolidate Internet access to several sites focused on vaccine-preventable diseases.

Search Approach
- Simple keyword text search

Taxonomy Extensions
- None

VaccinePlace.com

Simple Text Search Results help VaccinePlace visitors find information quickly …

… or visitors can Browse to learn

But simple text searches are just the beginning – our Model for Improved Search Results includes …

Taxonomies
- Provide the foundation for vastly improved search results
- But public and commercial taxonomies often lack the richness and knowledge available in the enterprise
- Our method for developing an enterprise taxonomy included:
    Use an existing taxonomy as a starting point (MeSH*)
  + A Professional Librarian – Hugh McNaught
  + Subject matter experts
  + Verity experts – Raritan Technologies, Inc.
  = Robust Taxonomy + Enterprise Rules

Parametric Search Portal
- Is essential to test the taxonomy / rules effectiveness
- Provides enhanced access to the documents in the collection

*MeSH = Medical Subject Headings from the US National Library of Medicine / NIH

Taxonomy Concepts

Taxonomies – what are they?
- A hierarchical classification of things, or the principles underlying the classification.
- Almost anything (animate objects, inanimate objects, places, and events) may be classified according to some taxonomic scheme.

Why develop and use taxonomies?
- By developing and applying taxonomies that are specific to the collection(s) of interest, items in the collection(s) can be retrieved faster and more easily.
- The items retrieved will be more relevant to the query, and the result set more precise.

Sources of taxonomies
- MeSH = Medical Subject Headings from the US National Library of Medicine / NIH
- Library of Congress and other public domain sources
- Commercial taxonomies (Factiva, Verity, etc.)
- Internally developed – can enrich public domain / commercial taxonomies with enterprise knowledge

MeSH* + Reference Manager – 1st Pilot to Launch Development of a sanofi pasteur Taxonomy

Corpus of Documents
- Vaccine related scientific publications
- Abstracts stored in the Reference Manager DB

Business Drivers
- Global Information & Library Sciences desire to significantly improve quality of search results across a broad range of collections
- Recognition that public taxonomies such as MeSH are not as rich in vaccine terms as needed

Search Approach
- Parametric + key word on title, abstract + key words

Taxonomy Extensions
- Vaccine nodes of MeSH* taxonomy
- Products
- Companies
- Geography

*MeSH = Medical Subject Headings

MeSH nodes Structure
- Top Level MeSH D24 Nodes, including Vaccines

Expanded Vaccine Node
- Poliovirus Vaccine Structure

Verity Intelligent Classifier (VIC) – Provides tools to Enhance the Taxonomy and Create Rules

Taxonomy Pane
- To create & modify the users' navigation structure

Topics Pane
- To create & modify the rules – synonyms, concepts, relationships, etc.

Poliovirus Vaccine Rules

This is the set of Rules for the Poliovirus Vaccine
- The Inactivated node is expanded in this example.
- The high level node corresponds with a node in the structure.
- The rules 'roll up' to each higher level.
- All nodes contain Terms pertaining to the node, and Products used to treat that Virus.

Verity Query Language
- Is the syntax used by VIC to create & modify the rules
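The idea that rules 'roll up' to each higher level can be illustrated with a small tree structure. The sketch below is written in Python rather than Verity Query Language, whose syntax is not shown in these slides. The node names follow the slide where possible, while the "Oral" node and all of the terms are assumptions made for illustration, not the actual sanofi pasteur rules.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    terms: set = field(default_factory=set)
    children: list = field(default_factory=list)

    def rolled_up_terms(self):
        """Return this node's terms plus the terms of every descendant."""
        result = set(self.terms)
        for child in self.children:
            result |= child.rolled_up_terms()
        return result

    def matches(self, text):
        """Simple substring test of the text against all rolled-up terms."""
        lowered = text.lower()
        return any(term in lowered for term in self.rolled_up_terms())


# Illustrative nodes and terms only.
inactivated = Node("Inactivated", {"inactivated poliovirus vaccine", "ipv"})
oral = Node("Oral", {"oral poliovirus vaccine", "opv"})
polio = Node("Poliovirus Vaccine", {"polio vaccine"}, [inactivated, oral])

print(polio.matches("A study of IPV schedules"))
print(inactivated.matches("OPV campaign results"))
```

A document mentioning only "IPV" is classified under the parent Poliovirus Vaccine node because the child's terms roll up, while the Inactivated node does not pick up a document that mentions only the oral vaccine. The substring test is deliberately naive; a real classifier would match on word boundaries and richer rule types.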

MeSH Extension – Product Taxonomy / Flu node
- Flu vaccine brand names

Parametric Search Portal of the Reference Manager DB based on an Expanded MeSH Taxonomy
- Company information added to MeSH
- Product information added to MeSH
- Geography nodes from MeSH
- 6016 articles on Viruses

Clicking on a Parameter such as Influenza Vaccine automatically limits results …
- 5 BCG articles also mentioning Flu Vaccine
- 455 articles reference the Americas
- Further breakdowns of the specific context for the hits
- Titles of the articles meeting the selected parameters are listed here.
- Results can then be combined with a text search for more precise selections
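The behaviour described here (select a parameter, see the remaining counts per parameter value, then narrow further with a text search) can be sketched as follows. The article records, parameter names and helper functions are hypothetical and only stand in for content tagged by the taxonomy rules; this is not how the Verity portal is implemented.

```python
from collections import Counter

# Hypothetical tagged articles; each carries the parameter values that
# classification rules would have assigned.
ARTICLES = [
    {"title": "Flu vaccine uptake in Canada",
     "disease": {"Flu"}, "region": {"Americas"}},
    {"title": "BCG and flu co-administration",
     "disease": {"Flu", "Tuberculosis"}, "region": {"Europe"}},
    {"title": "Tetanus boosters in adults",
     "disease": {"Tetanus"}, "region": {"Americas"}},
]


def filter_by(articles, **selected):
    """Keep articles whose tags include every selected parameter value."""
    return [
        article for article in articles
        if all(value in article[param] for param, value in selected.items())
    ]


def facet_counts(articles, param):
    """Count how many articles carry each value of the given parameter."""
    counts = Counter()
    for article in articles:
        counts.update(article[param])
    return counts


def text_search(articles, word):
    """Narrow a result set further with a simple title keyword match."""
    return [a for a in articles if word.lower() in a["title"].lower()]


flu_articles = filter_by(ARTICLES, disease="Flu")
print(facet_counts(flu_articles, "region"))
print([a["title"] for a in text_search(flu_articles, "BCG")])
```

Selecting disease="Flu" reduces the set to two articles, the region counts show how those two are distributed, and the text search then isolates the one article that also mentions BCG.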

Verity Search Capabilities Leverage the Rules Incorporated in the Taxonomies

Search Parameters
- Company: sanofi-aventis, GSK, Wyeth, etc.
- Country: N. Am., Europe, China, etc.
- Franchise: Flu, Pediatric, Traveler, Menactra
- Disease: Flu, Tetanus, Polio, etc.

Rules reflect:
- Synonyms
- Concepts
- Relationships

Text / Keyword / Parametric Search
- Focused Reference Set: includes only articles matching the selected parameters

Federated / Consolidated Multi-source Search (Source 1, Source 2, … Source n)
- Targeted Ranked Results: text / key words search across multiple sources & consolidate results in one view

[Diagram labels: sanofi pasteur Taxonomies; sanofi pasteur Rules; VIC; Tagged Verity Collections]

Department Shared Folder Application

Corpus of Documents
- Internal documents stored on a shared network drive
- Contents included a variety of Microsoft Office documents and Adobe Acrobat files

Business Drivers
- Need to locate relevant documents without a detailed knowledge of the folder structure & filing system

Search Approach
- Parametric + key word on title, abstract, key words and full text

Taxonomy Extensions
- Franchises – new nodes & rules (ex: Travel Vaccines)
- Companies – building on version 1 MeSH

Intranet K2 Enhancement

Corpus of Documents
- All HTML static pages on sanofi pasteur intranet
- Attachments
- But not yet contents of applications accessed through the Intranet

Business Drivers
- Need to greatly improve search results on intranet

Search Approach
- Key word on title + full text
- Results ranked according to standard Verity algorithms
- Not yet available:
  - Benefits from applying taxonomy rules, synonyms, etc.
  - Benefits from federated searches of applications accessed via Intranet

Taxonomy Extensions
- None yet
- Approach for subject areas such as IS, HR, Purchasing, etc.:
  - We could acquire commercial or public domain taxonomies
  - Or, we could develop something internally, similar to what was done for vaccines

Intranet – K2 Search Results

IS Content – Consolidated Search Portal

Corpus of Documents
- IS Intranet sites
- IS shared folders (network drives)
- IS Exchange Public folders
- eRooms – not yet included in this pilot

Business Drivers
- Pilot techniques to access content stored in various online repositories

Search Approach
- Key word on content

Taxonomy Extensions
- None yet
- Exploring public domain & commercial options

Search Access Components – a Summary

Verity Collection
- Description: Full text index of a corpus of documents
- Tools: Verity K2
- Capabilities: Std Verity ranked list of results; required for taxonomy appl.
- Experience to date: Intranet K2 upgrade

Taxonomy
- Description: Hierarchy structure; rules for searching & ranking results
- Tools: VIC
- Capabilities: Structured browsing; synonyms, rules
- Experience to date: 3 pilots – Ref Man DB, Shared Drive, Process Development

Gateways
- Description: Access to proprietary repository formats
- Tools: Verity stds
- Capabilities: Access & index contents, while respecting security
- Experience to date: Documentum (Global Content application)

Workers & Extractors
- Description: Agent using repository's native search engine
- Tools: Custom development
- Capabilities: Return results; create Verity collection
- Experience to date: eRoom planned 05

Basic Search Portal
- Description: Simple text search of keywords / contents
- Tools: Verity or custom development
- Capabilities: Unranked results; possibly limited by a qualifier (date, author)
- Experience to date: VaccinePlace.com; other internal apps

Parametric Search Portal
- Description: Predefined parameters (ex: product, co. name)
- Tools: Verity or custom
- Capabilities: Reduced set based on parameters
- Experience to date: Ref Man DB (MeSH); Shared Dept Drive; Global Content

Federated Search Portal
- Description: Multiple sources, using native search engines
- Tools: Workers
- Capabilities: Combine results from multiple sources
- Experience to date: None yet

Consolidated Search Portal
- Description: Multiple sources, using extract
- Tools: Custom development
- Capabilities: Combined results + taxonomy ranking
- Experience to date: IS Pilot (in progress)
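One way to picture the federated case, where each source is queried through its own native search engine and the results are combined afterwards, is the sketch below. It assumes each source returns (document, score) pairs on its own scale, and it normalizes those scores so they can be compared; the source names, documents and scores are made up, and the merge strategy is an assumption rather than the method Verity uses.

```python
def normalize(results):
    """Scale one source's scores to the 0..1 range so sources are comparable."""
    if not results:
        return []
    top = max(score for _, score in results)
    return [(doc, score / top if top else 0.0) for doc, score in results]


def federated_merge(results_by_source):
    """Combine per-source result lists into one ranked list.

    A document returned by several sources keeps its best normalized score.
    """
    best = {}
    for source, results in results_by_source.items():
        for doc, score in normalize(results):
            if doc not in best or score > best[doc][0]:
                best[doc] = (score, source)
    # Sort by descending score, then by document name for a stable order.
    ranked = sorted(best.items(), key=lambda item: (-item[1][0], item[0]))
    return [(doc, score, source) for doc, (score, source) in ranked]


# Hypothetical sources and native scores, for illustration only.
RESULTS = {
    "intranet": [("policy.html", 8.0), ("faq.html", 4.0)],
    "shared_drive": [("plan.doc", 50.0), ("faq.html", 40.0)],
}

for row in federated_merge(RESULTS):
    print(row)
```

In the consolidated approach summarized above, content would instead be extracted into a single collection up front, so one ranking (including taxonomy rules) applies to everything and no score normalization across engines is needed.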

2005 Verity Projects – Applying What we have Learned and Extending our Learning

Extend the sanofi pasteur Enterprise Taxonomy
- Build other non-vaccine nodes (ex: IS, HR, Legal, IO, etc.)
- Apply to other applications, such as the Intranet sites

Add Gateways
- eRoom – worker and extractor to create Verity Collections

Data Discovery – a new application of Verity technology
- Goal: review nature of internal content existing today on network drives, Public Folders, eRooms, etc.
- Identify candidates for archiving / destruction
- Isolate content worth including in Verity collections

R&D Consolidated Search Portal
- Explore needs and develop a business case
- Across a broad array of internal and external sources

Questions?

(570)
Swiftwater, PA 18370