Taxonomy Strategies LLC, May 22, 2005. Copyright 2005 Taxonomy Strategies LLC. All rights reserved. Workshop: Why and How to Use Dublin Core for Enterprise-Wide Metadata Applications.


Slide 1: Workshop: Why and How to Use Dublin Core for Enterprise-Wide Metadata Applications
Ron Daniel & Joseph Busch, Taxonomy Strategies

Slide 2: Workshop goals
Taxonomy Strategies LLC: The business of organized information
1. What is the Dublin Core?
2. Answer these enterprise-wide metadata ROI questions:
- What is the value proposition for adding metadata to content?
- Does metadata make content reusable? Findable? Does it improve productivity?
- How can metadata value be measured in a way that quantifies how it contributes to the bottom line?
3. Answer these business process questions:
- How is Dublin Core tagging being done on content to expose metadata to portals, search engines, and other metadata-aware applications?
- How are metadata value spaces (controlled vocabularies) maintained within an enterprise? Across enterprises?
4. Answer these technology questions:
- What tools exist to use Dublin Core and other metadata standards in enterprise information management environments?

Slide 3: Agenda
3:30 Introductions: Us and you
3:45 Background: Metadata & controlled vocabularies
4:00 Dublin Core: Elements, issues, and recommendations
4:30 Dublin Core in the wild: CEN study and remarks
4:45 Enterprise-wide metadata ROI questions
5:00 Break
5:15 ROI (cont.)
5:30 Business processes
6:15 Tools & technologies
6:30 Q&A
6:45 Adjourn

Slide 4: Who we are: Joseph Busch
Over 25 years in the business of organized information:
- Founder, Taxonomy Strategies
- Director, Solutions Architecture, Interwoven
- VP, Infoware, Metacode Technologies (acquired by Interwoven, November 2000)
- Program Manager, Getty Foundation
- Manager, Pricewaterhouse
Metadata and taxonomies community leadership:
- President, American Society for Information Science & Technology
- Director, Dublin Core Metadata Initiative
- Adviser, National Research Council Computer Science and Telecommunications Board
- Reviewer, National Science Foundation Division of Information and Intelligent Systems
- Founder, Networked Knowledge Organization Systems/Services

Slide 5: Who we are: Ron Daniel, Jr.
Over 15 years in the business of metadata & automatic classification:
- Principal, Taxonomy Strategies
- Standards Architect, Interwoven
- Senior Information Scientist, Metacode Technologies (acquired by Interwoven, November 2000)
- Technical Staff Member, Los Alamos National Laboratory
Metadata and taxonomies community leadership:
- Chair, PRISM (Publishers Requirements for Industry Standard Metadata) working group
- Acting chair, XML Linking working group
- Member, RDF working groups
- Co-editor: PRISM, XPointer, 3 IETF RFCs, and Dublin Core 1 & 2 reports

Slide 6: Recent & current projects
Government: Commodity Futures Trading Commission; Defense Intelligence Agency; ERIC; Federal Aviation Administration; Federal Reserve Bank of Atlanta; Forest Service; GSA Office of Citizen Services (www.firstgov.gov); Head Start; Infocomm Development Authority of Singapore; NASA (nasataxonomy.jpl.nasa.gov); Small Business Administration; Social Security Administration; USDA Economic Research Service; USDA e-Government Program (www.usda.gov)
Commercial: Allstate Insurance; Blue Shield of California; Debevoise & Plimpton; Halliburton; Hewlett Packard; Motorola; PeopleSoft; PricewaterhouseCoopers; Siderean Software; Sprint; Time Inc.
Commercial subcontracts: Agency.com (top financial services); Critical Mass (Fortune 50 retailer); Deloitte Consulting (big credit card); Gistics/OTB (direct selling giant)
NGOs: CEN; IDEAlliance; IMF; OCLC

Slide 7: What we do
Organize stuff.

Slide 8: Who are you?
Tell us:
- Your name
- Your organization
- Your job title
- The things you want to get from this workshop


Slide 10: Metadata: Different definitions
- Library & Information Science: Author/Title/Subject; controlled vocabularies for subject codes (e.g., Dewey); authority files for author names
- Database: Tables/Columns/Datatypes/Relationships; references for some values

Slide 11: Metadata: Why it matters
- Adding metadata to unstructured content allows it to be managed like structured content. Applications that use structured content work better.
- Enriching content with structured metadata is critical for supporting search and personalized content delivery.
- Content that has been adequately tagged with metadata can be leveraged in usage tracking, personalization and improved searching.
- Better structure equals better access: Taxonomy serves as a framework for organizing the ever-growing and changing information within a company. The many dimensions of taxonomy can greatly facilitate Web site design, content management, and search engineering. If well done, taxonomy will allow for structured Web content, leading to improved information access.

Slide 12: Metadata: Supports core functions
- Asset metadata (Who): Creator, Publisher, Contributor, Type, Format, Identifier
- Subject metadata (What, Where & Why): Subject, Title, Description, Coverage
- Relational metadata (Links between and to): Source, Relation
- Use metadata (When & How): Date, Language, Rights
Chart: complexity vs. enabled functionality, ranging from a more efficient editorial process to better navigation & discovery.

Slide 13: What is a taxonomy?
Hierarchical classification of things into a tree structure.
Systematics view (Linnaeus): Kingdom > Phylum > Class > Order > Family > Genus > Species, e.g. Animalia > Chordata > Mammalia > Carnivora > Canidae > Canis > C. familiaris …
UNSPSC: Segment > Family > Class > Commodity, e.g. 44 Office Equipment and Accessories and Supplies > 12 Office Supplies > 17 Writing Instruments > 05 Mechanical pencils, 06 Wooden pencils, 07 Colored pencils …
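The UNSPSC path on this slide is also encoded positionally: each level contributes two digits, so Segment 44, Family 12, Class 17 and Commodity 05 concatenate to the 8-digit code 44121705. A minimal sketch, assuming the 8-digit form in which higher levels are zero-filled (e.g. segment 44 written as 44000000); `split_unspsc` is a hypothetical helper, not part of any UNSPSC tool:

```python
LEVELS = ("segment", "family", "class", "commodity")


def split_unspsc(code):
    """Map an 8-digit UNSPSC code to the code of each level on its path."""
    if len(code) != 8 or not code.isdigit():
        raise ValueError(f"expected an 8-digit code, got {code!r}")
    path = {}
    for depth, level in enumerate(LEVELS, start=1):
        # Each level adds two digits; the remaining positions are zero-filled.
        path[level] = code[: 2 * depth].ljust(8, "0")
    return path


mechanical_pencils = split_unspsc("44121705")
```

Because the tree is carried in the code itself, "all writing instruments" becomes a prefix match on `441217` rather than a separate lookup.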


Slide 15: Dublin Core: A little more complicated
Elements: 1. Identifier, 2. Title, 3. Creator, 4. Contributor, 5. Publisher, 6. Subject, 7. Description, 8. Coverage, 9. Format, 10. Type, 11. Date, 12. Relation, 13. Source, 14. Rights, 15. Language
Refinements: Abstract, Access rights, Alternative, Audience, Available, Bibliographic citation, Conforms to, Created, Date accepted, Date copyrighted, Date submitted, Education level, Extent, Has format, Has part, Has version, Is format of, Is part of, Is referenced by, Is replaced by, Is required by, Issued, Is version of, License, Mediator, Medium, Modified, Provenance, References, Replaces, Requires, Rights holder, Spatial, Table of contents, Temporal, Valid
Encodings: Box, DCMIType, DDC, IMT, ISO3166, ISO639-2, LCC, LCSH, MESH, Period, Point, RFC1766, RFC3066, TGN, UDC, URI, W3CDTF
Types: Collection, Dataset, Event, Image, Interactive Resource, Moving Image, Physical Object, Service, Software, Sound, Still Image, Text
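All 15 elements are optional and repeatable, which is why a record can carry two creators. As a sketch only, the snippet below serializes (element, value) pairs into XML using the DC element namespace `http://purl.org/dc/elements/1.1/`; the `<record>` wrapper and the `dc_record` helper are hypothetical, since each application (OAI, RSS 1.0, a CMS export) defines its own container:

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = {
    "identifier", "title", "creator", "contributor", "publisher",
    "subject", "description", "coverage", "format", "type",
    "date", "relation", "source", "rights", "language",
}
ET.register_namespace("dc", DC_NS)  # serialize with the familiar "dc:" prefix


def dc_record(fields):
    """Serialize (element, value) pairs; element names may repeat."""
    root = ET.Element("record")  # hypothetical container element
    for name, value in fields:
        if name not in DC_ELEMENTS:
            raise ValueError(f"not one of the 15 DC elements: {name}")
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


xml_text = dc_record([
    ("title", "Why and How to Use Dublin Core for Enterprise-Wide Metadata Applications"),
    ("creator", "Daniel, Ron, Jr."),
    ("creator", "Busch, Joseph"),
    ("date", "2005-05-22"),
    ("language", "en"),
])
```

Refinements such as `dcterms:modified` live in a separate namespace and would need the same treatment; they are left out to keep the sketch short.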

Slide 16: Dublin Core framework for corporate use
- Not just 15 elements
- A framework to enable cross-resource exploration and use
Dublin Core is the framework for integrating metadata at BellSouth.
Source: Todd Stephens, BellSouth

Slide 17: Metadata: A data specification (a recipe example)
Element | Data type | Length | Req./Repeat | Source | Purpose
Asset metadata
Unique ID | Integer | Fixed | 1 | System supplied | Basic accountability
Recipe Title | String | Variable | 1 | Licensed Content | Text search & results display
Recipe Summary | String | Variable | 1 | Licensed Content | Content
Main Ingredients | List | Variable | ? | Main Ingredients vocabulary | Key index to retrieve & aggregate recipes, & generate shopping list
Subject metadata
Meal Types | List | Variable | * | Meal Types vocab | Browse or group recipes & filter search results
Cuisines | List | Variable | * | Cuisines vocab |
Courses | List | Variable | * | Courses vocab |
Cooking Method | Flag | Fixed | * | Cooking vocab |
Link metadata
Recipe Image | Pointer | Variable | ? | Product Group | Merchandise products
Use metadata
Rating | String | Variable | 1 | Licensed Content | Filter, rank, & evaluate recipes
Release Date | Date | Fixed | 1 | Product Group | Publish & feature new recipes
Legend: ? = 1 or more; * = 0 or more
Dublin Core mappings shown on the slide: dc:identifier, dc:title, dc:description, dcterms:hasPart, dc:date; constant values dc:type=recipe, dc:format=text/html, dc:language=en
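The Req./Repeat column is what makes a data specification checkable. The sketch below validates a record against the cardinalities exactly as the slide's legend defines them ("?" as 1 or more, which differs from the common zero-or-one reading of "?"). The snake_case field names and the `validate` helper are hypothetical stand-ins for the slide's elements:

```python
# Cardinality codes as defined by the slide's legend: (minimum, maximum).
CARDINALITY = {
    "1": (1, 1),      # exactly one
    "?": (1, None),   # 1 or more, per the slide's legend
    "*": (0, None),   # 0 or more
}

RECIPE_SPEC = {
    "unique_id": "1", "title": "1", "summary": "1",
    "main_ingredients": "?", "meal_types": "*", "cuisines": "*",
    "courses": "*", "cooking_method": "*", "recipe_image": "?",
    "rating": "1", "release_date": "1",
}


def validate(record):
    """Return a list of cardinality errors; an empty list means valid."""
    errors = []
    for field, code in RECIPE_SPEC.items():
        low, high = CARDINALITY[code]
        value = record.get(field)
        if value is None:
            values = []
        elif isinstance(value, list):
            values = value
        else:
            values = [value]
        if len(values) < low:
            errors.append(f"{field}: expected at least {low}, got {len(values)}")
        if high is not None and len(values) > high:
            errors.append(f"{field}: expected at most {high}, got {len(values)}")
    for field in record:
        if field not in RECIPE_SPEC:
            errors.append(f"{field}: not in the specification")
    return errors


rec = {
    "unique_id": 101, "title": "Pad Thai", "summary": "Stir-fried rice noodles.",
    "main_ingredients": ["rice noodles", "shrimp"], "cuisines": ["Thai"],
    "recipe_image": ["padthai.jpg"], "rating": "4", "release_date": "2005-05-22",
}
```

Vocabulary checks (is "Thai" in the Cuisines vocab?) would be a second pass against each controlled list.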

Slide 18: Why Dublin Core?
- Dublin Core is a de facto standard across many other systems and standards: RSS (1.0), OAI; and inside organizations: portals, CMS, …
- Mapping to DC elements from most existing schemes is simple. Beware of force-fits.
- Why will metadata already exist? Because search projects, portal integration projects, etc. are creating it or standardizing a mapping.
Diagram layers: per-source data types, access controls, etc.; Dublin Core and similar; taxonomies, vocabularies, ontologies. Source: Todd Stephens, BellSouth

Slide 19: Creator
An entity primarily responsible for making the content of the resource.
In other words: Author, Photographer, Illustrator, …
- Potential refinements by creative role
- Rarely justified
Creators can be persons or organizations.
Key point (reminder): Name variations are a big issue in data quality:
- Ron Daniel
- Ron Daniel, Jr.
- Ron Daniel Jr.
- R.E. Daniel
- Ronald Daniel
- Ronald Ellison Daniel, Jr.
- Daniel, R.
Name fields may contain other information:
- Case, W. R. (NASA Goddard Space Flight Center, Greenbelt, MD, United States)
Best practice: Validate names against LDAP or another authority file.
Refinements: None
Encodings: None
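Validating against an authority file usually means mapping every known variant to one preferred heading and routing anything unknown to a person. A minimal sketch, assuming a hypothetical in-memory authority file (in practice it might be exported from LDAP); the lookup key only ignores case, punctuation, spacing and parenthesized affiliations, so "R.E. Daniel" matches only because it is listed as a variant:

```python
import re

# Hypothetical authority file: preferred heading -> known variant forms.
AUTHORITY = {
    "Daniel, Ron, Jr.": [
        "Ron Daniel", "Ron Daniel, Jr.", "Ron Daniel Jr.", "R.E. Daniel",
        "Ronald Daniel", "Ronald Ellison Daniel, Jr.", "Daniel, R.",
    ],
}


def name_key(name):
    """Lookup key ignoring case, punctuation, spacing and (affiliations)."""
    name = re.sub(r"\(.*?\)", " ", name)
    return " ".join(re.findall(r"[^\W\d_]+", name.lower()))  # letters only


def build_index(authority):
    index = {}
    for heading, variants in authority.items():
        for form in [heading, *variants]:
            key = name_key(form)
            if index.setdefault(key, heading) != heading:
                raise ValueError(f"{form!r} matches more than one heading")
    return index


INDEX = build_index(AUTHORITY)


def resolve(name):
    """Preferred heading, or None to send the name to manual review."""
    return INDEX.get(name_key(name))
```

Note the limitation: this lookup maps every "Ron Daniel" to the same heading, so it cannot tell two different people with that name apart, which is exactly the case the next slide raises.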

Slide 20: Example: Name mismatches
One of these things is not like the other:
- Ron Daniel, Jr. and Carl Lagoze; Distributed Active Relationships in the Warwick Framework
- Hojung Cha and Ron Daniel; Simulated Behavior of Large Scale SCI Rings and Tori
- Ron Daniel; High Performance Haptic and Teleoperative Interfaces
Differences may not matter. If they do:
- This error cannot be reliably detected automatically.
- Authority files and an error-correction procedure are needed.

Slide 21: Contributor
An entity responsible for making contributions to the content of the resource.
In practice, rarely used:
- Difficult to distinguish from Creator.
- Adds UI complexity for no real gain.
Best practice? Recommendation: Don't use.
Refinements: None
Encodings: None

Slide 22: Publisher
An entity responsible for making the resource available.
Problems:
- All the name-handling issues of Creator.
- Hierarchy of publishers (Bureau, Agency, Department, …)
Refinements: None
Encodings: None

Slide 23: Title
A name given to the resource.
Issues:
- Hierarchical titles, e.g. Conceptual Structures: Information Processing in Mind and Machine (The Systems Programming Series)
- Untitled works
- Metaphysics
Refinements: Alternative
Encodings: None

Slide 24: Identifier
An unambiguous reference to the resource within a given context.
Best practice: URL. Future best practice: URI?
Problems:
- Metaphysics
- Personalized URLs
- Multiple identifiers for the same content
- Non-standard resolution mechanisms for URIs
Recommendation: Plan how to introduce long-lived URLs.
Refinements: Bibliographic Citation
Encodings: URI

Slide 25: Date
A date associated with an event in the life cycle of the resource.
Woefully underspecified. Typically the publication or last modification date.
Best practice: YYYY-MM-DD
Refinements: Created, Valid, Available, Issued, Modified, Date Accepted, Date Copyrighted, Date Submitted
Encodings: DCMI Period, W3C DTF (profile of ISO 8601)
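The W3C DTF profile permits YYYY, YYYY-MM and YYYY-MM-DD, plus full date-times (hh:mm, hh:mm:ss or hh:mm:ss.s) that must end in a time zone designator (Z or ±hh:mm). A sketch of a validator for tagging QC; `is_w3cdtf` is a hypothetical helper, and it rejects impossible calendar dates by handing them to `datetime.date`:

```python
import re
from datetime import date

# W3C DTF: year, optional month, optional day, optional time + required TZD.
W3CDTF = re.compile(
    r"(?P<year>\d{4})"
    r"(?:-(?P<month>\d{2})"
    r"(?:-(?P<day>\d{2})"
    r"(?:T(?P<hour>\d{2}):(?P<minute>\d{2})"
    r"(?::(?P<second>\d{2})(?:\.\d+)?)?"
    r"(?:Z|[+-]\d{2}:\d{2}))?"
    r")?"
    r")?"
)


def is_w3cdtf(value):
    m = W3CDTF.fullmatch(value)
    if m is None:
        return False
    try:
        # date() rejects impossible dates such as 2005-02-30 or month 13.
        date(int(m["year"]), int(m["month"] or 1), int(m["day"] or 1))
    except ValueError:
        return False
    if m["hour"] is not None:
        if int(m["hour"]) > 23 or int(m["minute"]) > 59:
            return False
        if m["second"] is not None and int(m["second"]) > 59:
            return False
    return True
```

A date such as "05/22/2005" fails, which is the point: the slide's YYYY-MM-DD best practice is the date-only subset of this profile.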

Slide 26: Subject
The topic of the content of the resource.
Best practice: Use pre-defined subject schemes, not user-selected keywords.
- Supported encodings probably not useful for most corporate needs
Factor Subject into separate facets:
- People, places, organizations, events, objects, services
- Industry sectors
- Content types, audiences, functions
- Topic
Some of the facets are already defined in DC (Coverage, Type) or DCTERMS (Audience).
Refinements: None
Encodings: DDC, LCC, LCSH, MESH, UDC

Slide 27: Coverage
The extent or scope of the content of the resource.
In other words: places and times as topics.
Key point: Locations are important in SOME environments, irrelevant in others. Time periods as subjects are rarely important in commercial work.
Best practice: ISO …
Refinements: Spatial, Temporal
Encodings: Box (for Spatial), ISO3166 (for Spatial), Point (for Spatial), TGN (for Spatial), W3CDTF (for Temporal)

Slide 28: Description
An account of the content of the resource.
In other words: an abstract or summary.
Key point: What's the cost/benefit tradeoff for creating descriptions?
- Quality of auto-generated descriptions is low.
- For search results, hit highlighting is probably better.
Refinements: Abstract, Table of Contents
Encodings: None

Slide 29: Type
The nature or genre of the content of the resource.
Best current practice: Create a custom list of content types and use that list for the values.
- Try to avoid image, audio, and other format names in the list of content types; they can be derived from Format.
- No broadly acceptable list yet found.
Refinements: None
Encodings: DCMI Type

Slide 30: Format
The physical or digital manifestation of the resource.
In other words: the file format.
Best practice: Internet Media Types
Outliers: file sizes, dimensions of physical objects
Refinements: Extent, Medium
Encodings: IMT
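Because Format takes Internet Media Types, it is one of the cheapest elements to fill automatically. A sketch using Python's standard `mimetypes` module, which guesses a media type from the file extension; `dc_format` is a hypothetical helper, and the guess is only as good as the extension and the local type tables:

```python
import mimetypes


def dc_format(filename):
    """Guess an Internet Media Type for dc:format from a file name.

    Returns None when the extension is unknown, so the value can be
    supplied by a person instead of guessed.
    """
    media_type, _encoding = mimetypes.guess_type(filename)  # encoding e.g. "gzip"
    return media_type
```

For example, `dc_format("report.pdf")` gives `"application/pdf"`, while a file with no extension yields `None` and should go to manual tagging.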

Slide 31: Language
A language of the intellectual content of the resource.
Best practice: ISO 639, RFC 3066
Dialect codes: advanced practice
Refinements: None
Encodings: ISO639-2, RFC1766, RFC3066
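RFC 3066 defines a language tag as a primary subtag of 1 to 8 letters followed by zero or more hyphen-separated subtags of 1 to 8 letters or digits (so "en", "en-US", "de-CH"). A syntax-only sketch; `is_rfc3066` is a hypothetical helper, and because it checks shape rather than registered codes, it also accepts "english", so in practice it should be paired with an ISO 639 code list:

```python
import re

# RFC 3066 syntax: 1*8ALPHA *("-" 1*8(ALPHA / DIGIT))
LANG_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_rfc3066(tag):
    """True if the tag is syntactically valid (registration not checked)."""
    return LANG_TAG.fullmatch(tag) is not None
```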

Slide 32: Relation
A reference to a related resource.
Very weak meaning: not even as strong as "See also".
Best practice: Use a refinement element and URLs.
Refinements: Is Version Of, Has Version, Is Replaced By, Replaces, Is Required By, Requires, Is Part Of, Has Part, Is Referenced By, References, Is Format Of, Has Format, Conforms To
Encodings: URI

Slide 33: Source
A reference to a resource from which the present resource is derived.
Original intent was for derivative works.
Frequently abused to provide bibliographic information for items extracted from a larger work, such as articles from a journal.
Refinements: None
Encodings: URI

Slide 34: Rights
Information about rights held in and over the resource.
Could be a copyright statement, or a list of groups with access rights, or …
Refinements: Access Rights, License
Encodings: None


Slide 36: CEN/ISSS Workshop on Dublin Core. Guidance information for the deployment of Dublin Core metadata in Corporate Environments: /businessdomains/isss/cwa/cwa15247.asp

Slide 37: Dublin Core: CEN/ISSS Workshop on Dublin Core Metadata, corporate uses
Participants: Applied Information Technique; AstraZeneca; BBC; BellSouth; Cisco; DaimlerChrysler; Giunti Labs; GSK; Halliburton; HP; IBM; Intel; John Wiley & Sons; Lilly; PeopleSoft; Rohm and Haas; SAP; Software AG; Unisys

Slide 38: How is Dublin Core used in corporate environments?
Chart. Base: 20 corporate information managers. Source: CEN/ISSS Workshop on Dublin Core, Guidance information for the deployment of Dublin Core metadata in Corporate Environments.

Slide 39: Taxonomy: e-Forms example
Facets (each backed by a controlled vocabulary):
- Agency: 0001 Legislative; 1000 Judicial; 1100 Executive Office of Pres; 0003 Exec Depts (1200 Agriculture, 1300 Commerce, 9700 Defense, 9100 Education, 8900 Energy, 7500 HHS, 7000 DHS, 8600 HUD, 1400 Interior, 1500 Justice, 1600 Labor, 1900 State, 6900 Transport, 2000 Treasury, 3600 Veterans); Ind Agencies; Intl Orgs
- Form Type: Application, Approval, Claim, Information request, Information submission, Instructions, Legal filing, Payment, Procurement, Renewal, Reservation, Service request, Test, Other input, Other transaction
- Keyword Topic: Agriculture & food, Commerce, Communications, Education, Energy, Env pro, Foreign rels, Govt, Health & safety, Housing & comm dev, Labor, Law, Named grps, National def, Nat resources, Recreation, Sci & tech, Social pgms, Transport
- Audience: All, General, Citizen, Business, Govt, Employee, Native American, Non-resident, Tourist, Special group
- Industry Impact: 00 Generic, 11 Agriculture, 21 Mining, 22 Utilities, 23 Construct, Manuf, 42 Wholesale, Retail, Trans, 51 Info, 52 Finance, 54 Profession, 55 Mgmt, 56 Support, 61 Education, 62 Health Care, 71 Arts, 72 Hospitality, 81 Other Services, 92 Public Admin
- Jurisdiction: Federal, State +, Local +, Other +
- BRM Impact: Citizen Srvcs, Social Srvs, Defense, Disasters, Econ Dev, Education, Energy, Env Mgmt, Law Enf, Judicial, Correctional, Health, Security, Income Sec, Intelligence, Intl Affairs, Nat Resour, Transport, Workforce, Science; Delivery, Support, Management

Slide 40: How is Dublin Core extended?
Chart. Base: 20 corporate information managers. Source: CEN/ISSS Workshop on Dublin Core, Guidance information for the deployment of Dublin Core metadata in Corporate Environments.

Slide 41: Custom business process document types? Ouch!
Oil & gas services company document types:
- analysis, appraisals, assessments, forecasts, predictions
- agendas, plans, designs, schedules, workflow
- applications, proposals, requests, requirements
- permits, consents, approvals, rejections, certificates
- work orders, correspondence
- auditing, compliance, testing, inspections, operations reports
- lessons learned, after-action reviews, meeting minutes, FAQs
- policies, procedures, training manuals, standards, best practices
- research notes, journal articles
- newsletters, bulletins, press releases
- ads, brochures, data sheets, technical notes, case studies, price lists
- checklists, templates, forms, logos, branding
- software, database forms

Slide 42: The power of taxonomy facets
4 independent categories of 10 nodes each have the same discriminatory power as one hierarchy of 10,000 nodes (10^4).
- Easier to maintain
- Can be easier to navigate
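The arithmetic behind this slide, assuming each item is tagged with one value from each facet: the number of distinct combinations is the product of the facet sizes, while the number of terms an editor maintains is only their sum. `facet_stats` is a hypothetical helper for comparing designs:

```python
from math import prod


def facet_stats(nodes_per_facet):
    """Return (distinct combinations, terms to maintain) for a facet design."""
    return prod(nodes_per_facet), sum(nodes_per_facet)


combos, terms = facet_stats([10, 10, 10, 10])
```

Four facets of 10 give 10,000 combinations from just 40 maintained terms, which is the "easier to maintain" claim in numbers.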

Slide 43: Taxonomic metadata example: Form SS-4, Employer Identification Number (EIN)
Facet | Values
Agency | IRS
Content Type | Information Submission
Industry Impact | Generic
Jurisdiction | Federal
Programs & Services | Support Delivery of Services/General Government/Taxation Management
Keyword Topic | Commerce/Employment taxes
Audience | Business


Slide 45: Fundamentals of metadata ROI
- Tagging content with metadata and a taxonomy is a cost, not a benefit.
- There is no benefit without exposing the tagged content to users in some way that cuts costs or improves revenues.
- Putting metadata and a taxonomy into operation requires UI changes and/or backend system changes, as well as data changes.
- You need to determine those changes, and their costs, as part of the ROI.

Slide 46: Common metadata ROI scenarios
- Catalog site: increased sales; increased productivity
- Customer support: cutting costs; increased sales
- Compliance: avoiding penalties
- Knowledge worker productivity: less time searching, more time working
- Executive mandate: no ROI study, just someone with a vision and a budget

Slide 47: Metadata ROI: Catalog site
Screenshot callouts: guided navigation; 2-3 clicks to product; no dead ends.

Slide 48: Metadata ROI: Catalog site
- Increased sales: product findability; product cross-sells and up-sells; customer loyalty
- 1-5% increase in sales: $57.6B sales (04) gives $600M to $2B/year; $2.1B net income (04) gives $21M to $105M/year
- 1-5% increase in productivity: $50K average cost per employee, 310,400 employees (04), gives $155M to $776M/year
- Enterprise portal cost: $6M
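The slide's ranges can be recomputed from its own inputs. The sketch below works in integer dollars with whole percentages so the arithmetic is exact; `pct_range` is a hypothetical helper. The net income ($21M to $105M) and productivity ($155.2M to $776M) ranges match the slide, but 1-5% of $57.6B works out to $576M to $2.88B, so the slide's "$2B" upper bound looks like a rounding or transcription slip:

```python
def pct_range(base, low_pct, high_pct):
    """(low, high) impact of a whole-percentage range on an integer dollar base."""
    return base * low_pct // 100, base * high_pct // 100


SALES_2004 = 57_600_000_000        # $57.6B sales (04)
NET_INCOME_2004 = 2_100_000_000    # $2.1B net income (04)
EMPLOYEES_2004 = 310_400
COST_PER_EMPLOYEE = 50_000         # average cost per employee
PORTAL_COST = 6_000_000            # enterprise portal cost

sales_gain = pct_range(SALES_2004, 1, 5)
income_gain = pct_range(NET_INCOME_2004, 1, 5)
productivity_gain = pct_range(EMPLOYEES_2004 * COST_PER_EMPLOYEE, 1, 5)

for label, (low, high) in [("sales", sales_gain),
                           ("net income", income_gain),
                           ("productivity", productivity_gain)]:
    print(f"{label}: ${low / 1e6:,.1f}M to ${high / 1e6:,.1f}M per year")
```

Even the low end of each range is several times the $6M portal cost, which is the comparison the slide invites.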

Slide 49: Metadata ROI: Customer support model
Screenshot callouts:
- Policy categories for browsing
- Type-and-go search for specific policies
- Good search results for policy topics, e.g., pets
- Refine search offered with results
- Help on the search page, not a click away

Slide 50: Metadata ROI: Customer support model
- Self service: fewer customer calls; faster, more accurate CSR responses through better information access
- 25-50% service efficiency increase: 300K customer service calls per month at $6 cost per call gives $5.4M to $10.8M/yr
- Manual processing: 100,000 documents, 2 pages per document, $4 per page gives $800K
- 1-5% increased sales: $18.6B sales (04) gives $186M to $930M/year; against ($761M) net income (04), that gives ($575M) to $169M/year
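Recomputing this slide's figures shows how they fit together. Two readings are assumptions, chosen because they reproduce the slide's numbers: call volume is annualized over 12 months, and the "($575M) to $169M/year" line is the 2004 net loss of $761M plus the 1-5% sales increase, as if the added sales flowed straight to net income with no incremental cost. `pct_range` is a hypothetical helper:

```python
def pct_range(base, low_pct, high_pct):
    """(low, high) impact of a whole-percentage range on an integer dollar base."""
    return base * low_pct // 100, base * high_pct // 100


CALLS_PER_MONTH = 300_000
COST_PER_CALL = 6
annual_call_cost = CALLS_PER_MONTH * 12 * COST_PER_CALL    # $21.6M per year
call_savings = pct_range(annual_call_cost, 25, 50)          # 25-50% efficiency

# documents x pages per document x dollars per page
manual_processing_cost = 100_000 * 2 * 4

SALES_2004 = 18_600_000_000     # $18.6B sales (04)
NET_INCOME_2004 = -761_000_000  # the slide's ($761M), i.e. a loss
sales_gain = pct_range(SALES_2004, 1, 5)
net_income_with_gain = tuple(NET_INCOME_2004 + g for g in sales_gain)
```

Under these assumptions the call savings come to $5.4M to $10.8M a year and the adjusted net income to ($575M) to $169M, matching the slide.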

Slide 51: Metadata ROI: Compliance
- Avoiding penalties for breaching regulations: SOX: up to 5 years in jail; SOX: up to $5M
- Following required procedures
- Loss of company: $100B revenue (00)
- Loss of partner companies: Arthur Andersen

Slide 52: "Knowledge workers spend up to 2.5 hours each day looking for information … but find what they are looking for only 40% of the time." (Kit Sims Taylor)

Slide 53: High cost of not finding information
- "The amount of time wasted in futile searching for vital information is enormous, leading to staggering costs …" (Sue Feldman)
High cost of poor classification:
- "Poor classification costs a 10,000-user organization $10M each year, about $1,000 per employee." (Jakob Nielsen, useit.com)
But better search by itself is a weak ROI.

Slide 54: Knowledge workers spend more time re-creating existing content than creating new content. (Kit Sims Taylor)
Chart values: 26%, 9%

Slide 55: Metadata ROI: Productivity
- Decreased cost to market: decreased development cost; increased R&D productivity; reduced time for sales & marketing
- 1-5% decrease in drug development cost ($800M/drug) gives $8M to $16M/drug
- 5-10% increase in R&D productivity (R&D at 13% of revenue; $39B in sales (04)) gives $254M to $507M/year
- 10-20% decrease in time for sales & marketing (13% of revenue) gives $254M to $507M/year
- Enterprise document management system cost: $10M
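Recomputing this slide from its stated inputs only partly reproduces it. Assuming both R&D and sales & marketing run at 13% of $39B revenue, and that a 10-20% cut in sales & marketing time scales that spend proportionally, the R&D range ($253.5M to $507M) matches the slide, but 1-5% of $800M is $8M to $40M per drug (the slide's "$8M to $16M" corresponds to 1-2%), and 10-20% of sales & marketing spend is $507M to $1.014B rather than the repeated "$254M to $507M". The slide may rest on assumptions it does not state; `pct_range` is a hypothetical helper:

```python
def pct_range(base, low_pct, high_pct):
    """(low, high) impact of a whole-percentage range on an integer dollar base."""
    return base * low_pct // 100, base * high_pct // 100


DRUG_DEVELOPMENT_COST = 800_000_000   # $800M per drug
SALES_2004 = 39_000_000_000           # $39B in sales (04)
RD_PCT = 13                           # R&D spend, % of revenue
SALES_MARKETING_PCT = 13              # sales & marketing spend, % of revenue

rd_spend = SALES_2004 * RD_PCT // 100
sm_spend = SALES_2004 * SALES_MARKETING_PCT // 100

dev_savings = pct_range(DRUG_DEVELOPMENT_COST, 1, 5)   # per drug
rd_gain = pct_range(rd_spend, 5, 10)                   # per year
sm_savings = pct_range(sm_spend, 10, 20)               # per year
```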

Slide 56: Metadata FAQ: Executive mandate is key
- There is no ROI out of the box.
- Just someone with a vision … and the budget to make it happen.
- What's really needed? Demos and proofs of value, so that a stronger cost/benefit argument can be made for continuing the work.

Slide 57: Metadata FAQ: How do you sell it?
- Don't sell metadata or taxonomy; sell the vision of what you want to be able to do.
- Clearly understand what the problem is and what the opportunities are.
- Do the calculus (costs and benefits).
- Design the taxonomy (in terms of level of effort, LOE) in relation to the value at hand.


Slide 59: Overview of metadata practices
- Identify the team.
- Use (or map to) Dublin Core for basic information.
- Extend with custom elements for specific facts.
- Use pre-existing, standard vocabularies as much as possible: ISO country codes for locations; product & service info from the ERP system; validate author names with the LDAP directory.
- Design a QC process: start with an error-correction process, then get more formal on error detection. Large-scale ontologies may be valuable in automated error detection.

Slide 60: Factor Subject into smaller facets
- Size: DMOZ tries to organize all web content and has more than 600k categories! Difficult to navigate and maintain; hidden facet structure.
- Classification schemes vs. taxonomies

Slide 61: Sources for 7 common vocabularies
Vocabulary | Definition | Potential sources
Organization | Organizational structure. | FIPS 95-2, U.S. Government Manual, your organizational structure, etc.
Content Type | Structured list of the various types of content being managed or used. | DC Types, AGLS Document Type, AAT Information Forms, records management policy, etc.
Industry | Broad market categories such as lines of business, life events, or industry codes. | FIPS 66, SIC, NAICS, etc.
Location | Place of operations or constituencies. | FIPS 5-2, FIPS 55-3, ISO 3166, UN Statistics Div, US Postal Service, etc.
Function | Functions and processes performed to accomplish mission and goals. | FEA Business Reference Model, Enterprise Ontology, AAT Functions, etc.
Topic | Business topics relevant to your mission and goals. | Federal Register Thesaurus, NAL Agricultural Thesaurus, LCSH, etc.
Audience | Subset of constituents to whom a piece of content is directed or intended to be used. | GEM, ERIC Thesaurus, IEEE LOM, etc.
Products and Services | Names of products/programs & services. | ERP system, your products and services, etc.
Dublin Core elements noted on the slide: dc:publisher, dc:type, dc:coverage, dc:subject, dcterms:audience

Slide 62: Cheap and Easy Metadata
- Some fields will be constant across a collection.
- In the context of a single collection those elements add no value, but they add tremendous value when many collections are brought together into one place, and they are cheap to create and validate.
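One way to get this cheap metadata is to store the constant values once per collection and stamp them onto every item when collections are combined. A minimal sketch; the collections, titles and the `merge_collections` helper are hypothetical, and a value set on an individual item is allowed to override the collection default:

```python
def merge_collections(collections):
    """Flatten (defaults, items) pairs into one list of records.

    Collection-level values such as dc:publisher or dc:type are copied
    onto every item; values set on an item take precedence.
    """
    merged = []
    for defaults, items in collections:
        for item in items:
            merged.append({**defaults, **item})
    return merged


hr_policies = (
    {"dc:publisher": "Human Resources", "dc:type": "Policy", "dc:language": "en"},
    [{"dc:title": "Travel Policy"}, {"dc:title": "Leave Policy"}],
)
data_sheets = (
    {"dc:publisher": "Marketing", "dc:type": "Data sheet", "dc:language": "en"},
    [{"dc:title": "Model X Data Sheet", "dc:language": "de"}],
)

records = merge_collections([hr_policies, data_sheets])
policies = [r["dc:title"] for r in records if r["dc:type"] == "Policy"]
```

Within either collection alone, `dc:type` tells you nothing; once merged, it is what lets a search filter to policies.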

Slide 63: Taxonomy Business Processes
- Taxonomies must change, gradually, over time if they are to remain relevant.
- Maintenance processes need to be specified so that changes are based on rational cost/benefit decisions.
- A team will need to maintain the taxonomy on a part-time basis.
- The taxonomy team reports to some other steering committee.

Slide 64: Controlled Vocabulary Governance Environment
Diagram components: syndicated terminologies (ISO, other external); internal sources (ERP, other internal); Vocabulary Management System; custodians; notifications; change requests & responses; other controlled items; definitions about the controlled vocabulary; published CVs and STs; consuming applications (Intranet Search, Intranet Nav., Web CMS, Archives, ERMS, DAM, …).
1. Syndicated terminologies change on their own schedule.
2. CV team decides when to update CVs.
3. Team adds value via mappings, translations, synonyms, training materials, etc.
4. Updated versions of CVs are published to consuming applications.

Slide 65: Other Controlled Items
The taxonomy team will have additional items to manage:
- Charter, goals, performance measures
- Editorial rules
- Team processes
- Tagger training materials (manual and automatic)
- Outreach & ROI communication plan: website, presentations, announcements
- Roadmap

Slide 66: Taxonomy governance | Generic team charter
The Taxonomy Team is responsible for maintaining:
- The Taxonomy, a multi-faceted classification scheme
- Associated taxonomy materials, such as: Editorial Style Guide; Taxonomy Training Materials; Metadata Standard
- Team rules and procedures (subject to CIO review)
The team evaluates the costs and benefits of suggested changes.
The Taxonomy Team will:
- Manage the relationship between providers of source vocabularies and consumers of the Taxonomy
- Identify new opportunities for use of the Taxonomy across the Enterprise to improve information management practices
- Promote awareness and use of the Taxonomy

Slide 67: Other Controlled Items: Editorial Rules
To ensure consistent style, rules are needed. Issues commonly addressed in the rules:
- Sources of Terms
- Abbreviations
- Ampersands
- Capitalization
- Continuations (More… or Other…)
- Duplicate Terms
- Hierarchy and Polyhierarchy
- Languages and Character Sets
- Length Limits
- Other: Allowed or Forbidden?
- Plural vs. Singular Forms
- Relation Types and Limits
- Scope Notes
- Serial Comma
- Spaces
- Synonyms and Acronyms
- Term Arrangement (Alphabetic or …)
- Term Label Order (Direct vs. Inverted)
The rules must also address what to do when rules conflict: which are more important?
Rule name | Editorial rule
Use Existing Vocabularies | Other things being equal, reusing an existing vocabulary is preferred to creating a new one.
Ampersands | The character '&' is preferred to the word "and" in term labels. Example: Use Type: Manuals & Forms, not Manuals and Forms.
Special Characters | Retain accented characters in term labels. Example: España
Serial Comma | If a category name includes more than two items, separate the items by commas. The last item is separated by the character '&', which IS NOT preceded by a comma. Example: Education, Learning & Employment, not Education, Learning, & Employment.
Capitalization | Use title case (all words except articles are capitalized). Example: Education, Learning & Employment, NOT Education, learning & employment, NOT EDUCATION, LEARNING & EMPLOYMENT, NOT education, learning & employment.
……
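Several of these rules are mechanical enough to check before a term label enters the taxonomy. A sketch covering only the Ampersands, Serial Comma and Capitalization rules as worded on this slide; `check_label` is a hypothetical helper, it treats only "a", "an" and "the" as articles, and it allows single-word all-caps labels so acronyms are not flagged:

```python
ARTICLES = {"a", "an", "the"}


def check_label(label):
    """Return editorial-rule violations for one term label (empty if none)."""
    problems = []
    words = label.split()
    # Ampersands: '&' is preferred to the word 'and'.
    if any(w.strip(",").lower() == "and" for w in words):
        problems.append("use '&' rather than 'and'")
    # Serial comma: the final '&' is not preceded by a comma.
    if ", &" in label:
        problems.append("no comma before the final '&'")
    # Capitalization: title case, so a multi-word label in all capitals fails.
    if len(words) > 1 and label.isupper():
        problems.append("do not write the label in all capitals")
    for i, word in enumerate(words):
        core = word.strip(",")
        if not core[:1].isalpha() or core.lower() == "and":
            continue  # skip '&', numbers, and 'and' (already reported)
        if i > 0 and core.lower() in ARTICLES:
            continue  # articles after the first word may stay lower case
        if not core[0].isupper():
            problems.append(f"capitalize {core!r}")
    return problems
```

For example, `check_label("Education, Learning, & Employment")` reports only the serial-comma problem, and "España" passes untouched, in line with the Special Characters rule.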

68 Roles in Two Taxonomy Governance Teams
First team:
- Executive Sponsor
  - Advocate for the taxonomy team
- Business Lead
  - Keeps the team on track with larger business objectives
  - Balances cost/benefit issues to decide appropriate levels of effort (specialists help in estimating costs)
  - Obtains needed resources if those in the team can't accomplish a particular task
- Technical Specialist
  - Estimates costs of proposed changes in terms of amount of data to be retagged, additional storage and processing burden, software changes, etc.
  - Helps obtain data from various systems
- Content Specialist
  - Team's liaison to content creators
  - Estimates costs of proposed changes in terms of editorial process changes, additional or reduced workload, etc.
  - Small-scale metadata QA responsibility
- Taxonomy Specialist
  - Suggests potential taxonomy changes based on analysis of query logs and indexer feedback
  - Makes edits to the taxonomy and installs them into the system with the aid of the IT specialist
- Content Owner
  - Reality check on process change suggestions
Team structure at a different org.:
- Business Lead
- Custodians
  - Responsible for content in a specific CV (controlled vocabulary)
- Training Representative
  - Develops communications plan, training materials
- Work Practices Representative
  - Develops processes, monitors adherence
- IT Representative
  - Backups, administration of the CV tool
- Info. Mgmt. Representative
  - Provides CV expertise and a tie-in with the larger IM effort in the organization

69 Taxonomy governance | Where changes come from
[Diagram components: End User, Application UI, Application Logic, Tagging Logic, Taxonomy, Content, Firewall, Tagging UI, Tagging Staff, Taxonomy Editor, Taxonomy Team.]
Sources of change:
- Staff notes missing concepts
- Query log analysis
- Requests from other parts of the organization
Team considerations:
1. Business goals
2. Changes in user experience
3. Retagging cost
Recommendations by Editor:
1. Small taxonomy changes (labels, synonyms)
2. Large taxonomy changes (retagging, application changes)
3. New best bets content
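The split between small changes (labels, synonyms) and large changes (retagging) can be made concrete by counting how many documents a proposed change touches, which feeds the "Retagging cost" consideration. This is a minimal sketch assuming documents are tagged with stable term IDs rather than labels, so a relabel or added synonym needs no retagging; the change kinds, `ChangeRequest` and `triage` are hypothetical names, not part of any tool named in the workshop.

```python
from dataclasses import dataclass

# Hypothetical change types, following the slide's split between small
# changes (labels, synonyms) and large changes that force retagging.
SMALL_CHANGES = {"relabel", "add_synonym"}
LARGE_CHANGES = {"merge", "split", "delete"}


@dataclass
class ChangeRequest:
    kind: str          # one of SMALL_CHANGES or LARGE_CHANGES
    terms: list[str]   # IDs of the taxonomy terms the change touches
    source: str        # e.g. "query log analysis", "tagging staff"


def triage(change: ChangeRequest, doc_tags: dict[str, list[str]]) -> tuple[str, int]:
    """Return ('small' or 'large', number of documents that need retagging)."""
    if change.kind in SMALL_CHANGES:
        # Assumes documents store term IDs, so label/synonym edits leave them valid.
        return "small", 0
    if change.kind not in LARGE_CHANGES:
        raise ValueError(f"unknown change kind: {change.kind}")
    affected = set(change.terms)
    to_retag = sum(1 for tags in doc_tags.values() if affected & set(tags))
    return "large", to_retag


docs = {"d1": ["T1", "T2"], "d2": ["T2"], "d3": ["T3"]}
print(triage(ChangeRequest("relabel", ["T2"], "query log analysis"), docs))
print(triage(ChangeRequest("merge", ["T2", "T3"], "tagging staff"), docs))
```

The retagging count is only one input; business goals and user-experience impact still have to be weighed by the team.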

70 Principles
- Basic facets with identified items: people, places, projects, instruments, missions, organizations, … Note that these are not subjective subjects; they are objective objects.
- Clearly identify the Custodians of the facets, and the process for maintaining and publishing them.
- Subjective views can be laid on top of the objective facts, but should be in a different namespace so they are clearly distinguishable. For example, labels like Anarchist or Prime Minister can be applied to the same person at different times (e.g. Nelson Mandela).
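One way to keep objective facet items and subjective views apart is to give them different record types: facet entries carry a stable ID owned by a custodian, while subjective labels live in a separate namespace and carry a validity period, so the same person can hold different labels at different times. The sketch below is a hypothetical data model; the class names, the "view:editorial" namespace, the example person and the labels and dates are illustrative only.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Entity:
    """An objective item in a basic facet (person, place, project, ...)."""
    facet: str   # e.g. "people"
    id: str      # stable identifier maintained by the facet's custodian
    name: str


@dataclass(frozen=True)
class ViewLabel:
    """A subjective label, kept in its own namespace and valid for a period."""
    namespace: str               # e.g. "view:editorial", never a facet name
    entity_id: str
    label: str
    start: date
    end: Optional[date] = None   # None means the label still applies

    def applies_on(self, day: date) -> bool:
        return self.start <= day and (self.end is None or day <= self.end)


def labels_on(views: list[ViewLabel], entity_id: str, day: date) -> list[str]:
    """Subjective labels attached to an entity on a given day."""
    return [v.label for v in views if v.entity_id == entity_id and v.applies_on(day)]


person = Entity("people", "p-001", "Example Person")  # hypothetical entity
views = [
    ViewLabel("view:editorial", "p-001", "Activist", date(1960, 1, 1), date(1989, 12, 31)),
    ViewLabel("view:editorial", "p-001", "Head of Government", date(1990, 1, 1)),
]
print(labels_on(views, person.id, date(1975, 6, 1)))
```

Because the entity record never changes when a view label is added or expires, the facet can be maintained and published by its custodian independently of whoever owns the subjective views.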

71 Enterprise Portal challenges when organizing content
- Multiple subject domains across the enterprise
  - Vocabularies vary
  - Granularity varies
  - Unstructured information represents about 80% of the total
- Information is stored in complex ways
  - Multiple physical locations
  - Many different formats
- Tagging is time-consuming and requires SME involvement
- The portal doesn't solve the content access problem
  - "Knowledge is power" syndrome
  - Incentives to share knowledge don't exist
  - Free flow of information TO the portal might be inhibited
- Content silo mentality changes slowly
  - What content has changed? What exists? What has been discontinued?
  - Lack of awareness of other initiatives

72 Challenges when organizing content on enterprise portals
- Lack of content standardization and consistency
  - Content messages vary among departments
  - How do users know which message is correct?
- Re-usability is low to non-existent
- Costs of content creation, management and delivery may not change when the portal is implemented:
  - Similar subjects, BUT
  - Diverse media
  - Diverse tools
  - Different users
- How will personalization be implemented?
- How will existing site taxonomies be leveraged?
- Taxonomy creation may surface holes in content

73 Agenda
3:30 Introductions: Us and you
3:45 Background: Metadata & controlled vocabularies
4:00 Dublin Core: Elements, issues, and recommendations
4:30 Dublin Core in the wild: CEN study and remarks
4:45 Enterprise-wide metadata ROI questions
5:00 Break
5:15 ROI (Cont.)
5:30 Business processes
6:15 Tools & technologies
6:30 Q&A
6:45 Adjourn

74 Methods used to create & maintain metadata
[Chart: methods used to create & maintain metadata; base: 20 corporate information managers.]
Source: CEN/ISSS Workshop on Dublin Core – Guidance information for the deployment of Dublin Core metadata in Corporate Environments

75 The Tagging Problem
- How are we going to populate metadata elements with complete and consistent values?
- What can we expect to get from automatic classifiers?
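"Complete and consistent" can be measured directly on a batch of records: completeness as the share of records that carry each required element, consistency as the number of values that fall outside the element's controlled value space. The sketch below assumes records are dictionaries mapping Dublin Core element names to lists of values. The `type` value space is the DCMI Type Vocabulary; the required-element list and the `subject` terms are a hypothetical local policy.

```python
from collections import Counter

# Hypothetical local policy: elements every record must carry.
REQUIRED = ("title", "creator", "date", "subject", "type")

# Controlled value spaces. "type" uses the DCMI Type Vocabulary;
# "subject" stands in for a hypothetical enterprise taxonomy facet.
CONTROLLED = {
    "type": {"Collection", "Dataset", "Event", "Image", "InteractiveResource",
             "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
             "StillImage", "Text"},
    "subject": {"Finance", "Human Resources", "Legal"},
}


def audit(records: list[dict[str, list[str]]]) -> tuple[dict[str, float], Counter]:
    """Completeness per required element, and counts of out-of-vocabulary values."""
    if not records:
        raise ValueError("no records to audit")
    missing = Counter()
    invalid = Counter()
    for rec in records:
        for element in REQUIRED:
            if not rec.get(element):
                missing[element] += 1
        for element, allowed in CONTROLLED.items():
            for value in rec.get(element, []):
                if value not in allowed:
                    invalid[element] += 1
    completeness = {e: 1 - missing[e] / len(records) for e in REQUIRED}
    return completeness, invalid


records = [
    {"title": ["Q1 budget"], "creator": ["Finance Dept."], "date": ["2005-05-22"],
     "subject": ["Finance"], "type": ["Text"]},
    {"title": ["Org chart"], "type": ["Document"]},
]
completeness, invalid = audit(records)
print(completeness, dict(invalid))
```

Running the same audit before and after introducing a tagging tool gives a simple, repeatable way to see whether metadata quality actually improved.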

76 Tagging
- Province of authors (SMEs) or editors?
- Taxonomy is often highly granular to meet task and re-use needs.
- Vocabulary depends on the originating department.
- The more tags there are (and the more values for each tag), the more hooks to the content.
- If there are too many, authors will resist and use general tags (if available).
- Automatic classification tools exist and are valuable, but their results are not as good as what humans can do. Semi-automated is best. The degree of human involvement is a cost/benefit tradeoff.
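In a semi-automated setup, the degree of human involvement is often controlled by confidence thresholds on the classifier's suggestions: high-confidence terms are applied automatically, mid-range terms go to a human, and low-confidence terms are dropped. The sketch below assumes the classifier returns (term, score) pairs with scores in [0, 1]; the function name and default thresholds are hypothetical and would be tuned against the cost of reviewer time.

```python
def route(suggestions: list[tuple[str, float]],
          accept_at: float = 0.9,
          review_at: float = 0.5) -> dict[str, list[str]]:
    """Split (term, confidence) pairs from an automatic classifier into terms
    applied automatically, terms queued for a human, and terms dropped."""
    if not 0.0 <= review_at <= accept_at <= 1.0:
        raise ValueError("expected 0 <= review_at <= accept_at <= 1")
    buckets = {"accepted": [], "review": [], "dropped": []}
    for term, score in suggestions:
        if score >= accept_at:
            buckets["accepted"].append(term)
        elif score >= review_at:
            buckets["review"].append(term)
        else:
            buckets["dropped"].append(term)
    return buckets


print(route([("Finance", 0.95), ("Tax", 0.70), ("Sports", 0.20)]))
```

Raising `accept_at` or lowering `review_at` widens the review band, which buys accuracy at the price of more human effort; that is the cost/benefit tradeoff in a single pair of parameters.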

77 Automatic categorization vendors | Analyst viewpoint
[Chart: vendors positioned by Accuracy Level (low to high) against Content Volumes (low to high).]

78 Considerations in automatic classifier performance
- Classification performance is measured by inter-cataloger agreement
  - Trained librarians agree less than 80% of the time
  - Errors are subtle differences in judgment, or big goofs
- Automatic classification struggles to match human performance
  - Exception: entity recognition can exceed human performance
- Classifier performance is limited by the algorithms available, which are in turn limited by development effort
- Very wide variance in one vendor's performance depending on who does the implementation, and how much time they have to do it
1) 80/20 tradeoff, where 20% of the effort gives 80% of the performance.
2) Smart implementation of inexpensive tools will outperform naive implementations of world-class tools.
[Chart: Accuracy vs. Development Effort / Licensing Expense; labeled points: Regexps, Trained Librarians; annotation: potential performance gain.]
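Inter-cataloger agreement needs a concrete formula before it can be compared across catalogers or against a classifier. The sketch below uses one common choice for multi-term tagging, the Jaccard overlap of the two term sets averaged over shared documents; this is an assumption, not necessarily the measure behind the "less than 80%" figure (Cohen's kappa is a frequent alternative for single-label assignments). The example tag sets are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two sets of subject terms (1.0 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_agreement(tags_a: dict[str, set[str]], tags_b: dict[str, set[str]]) -> float:
    """Average per-document agreement over documents both catalogers tagged."""
    shared = tags_a.keys() & tags_b.keys()
    if not shared:
        raise ValueError("no documents in common")
    return sum(jaccard(tags_a[d], tags_b[d]) for d in shared) / len(shared)


librarian = {"d1": {"Tax", "Finance"}, "d2": {"HR"}}
classifier = {"d1": {"Finance"}, "d2": {"HR"}}
print(mean_agreement(librarian, classifier))  # 0.75
```

The same function scores librarian vs. librarian and librarian vs. classifier, so the human agreement rate can serve as the realistic ceiling for the automatic tool.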

79 Tagging tool example: Interwoven MetaTagger
- Manual form fill-in with check boxes, pull-down lists, etc.
- Auto keyword & summarization

80 Tagging tool example: Interwoven MetaTagger
- Auto-categorization
- Parse & lookup (recognize names)
- Rules & pattern matching
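The "parse & lookup" and "rules & pattern matching" techniques can be illustrated generically: a name authority maps known surface forms to entity IDs, and regular-expression rules map trigger words to categories. This is a minimal sketch of the techniques only, not MetaTagger's API or configuration; the lookup table, entity IDs and category rules are all hypothetical.

```python
import re

# Hypothetical name authority for "parse & lookup": surface form -> entity ID.
NAME_LOOKUP = {
    "interwoven": "org:interwoven",
    "getty foundation": "org:getty-foundation",
}

# Hypothetical category rules for "rules & pattern matching".
RULES = {
    "Finance": re.compile(r"\b(budget|invoice|revenue)s?\b", re.IGNORECASE),
    "Human Resources": re.compile(r"\b(hiring|payroll|benefits)\b", re.IGNORECASE),
}


def tag(text: str) -> dict[str, list[str]]:
    """Suggest entity IDs and categories for one document."""
    lowered = text.lower()
    # Whole-word lookup of each known name in the lower-cased text.
    entities = sorted({
        entity_id for name, entity_id in NAME_LOOKUP.items()
        if re.search(r"\b" + re.escape(name) + r"\b", lowered)
    })
    categories = sorted(cat for cat, pattern in RULES.items() if pattern.search(text))
    return {"entities": entities, "categories": categories}


print(tag("Interwoven reported higher revenue this quarter."))
```

Lookup against a maintained name list is the kind of entity recognition that can exceed human consistency, while hand-written rules sit at the inexpensive "Regexps" end of the accuracy/effort curve.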

81 Metadata tagging workflows
- Even purely automatic metatagging systems need a manual error correction procedure, and should add a QA sampling mechanism.
- Tagging models:
  - Author-generated
  - Central librarians
  - Hybrid: central auto-tagging service, distributed manual review and correction
[Diagram: sample of author-generated metadata workflow. Elements: Compose in Template; Submit to CMS; Tagging Tool automatically fills in metadata; Approve/Edit metadata; Analyst/Editor: Review content, Problem? (Y/N); Copywriter: Copy Edit content, Problem? (Y/N); outputs: Hard Copy, Web site; Sys Admin.]
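A QA sampling mechanism can be as simple as drawing a random sample of tagged documents for manual review and tracking how often the reviewer has to correct the metadata. The sketch below is a hypothetical illustration: the 5% rate, the minimum sample of 10, and the function names are assumptions to be set by local policy.

```python
import random


def qa_sample(doc_ids, rate: float = 0.05, minimum: int = 10, seed=None) -> list:
    """Pick a random sample of tagged documents for manual metadata review."""
    ids = list(doc_ids)
    # At least `minimum` documents, but never more than are available.
    k = min(len(ids), max(minimum, round(rate * len(ids))))
    return random.Random(seed).sample(ids, k)


def correction_rate(reviewed: dict[str, bool]) -> float:
    """Share of sampled documents whose metadata the reviewer had to change."""
    if not reviewed:
        return 0.0
    return sum(reviewed.values()) / len(reviewed)


sample = qa_sample(range(1000), seed=42)
print(len(sample))  # 50
```

A rising correction rate is an early signal that the automatic tagger, the taxonomy, or the tagging guidelines need attention, and it feeds directly into the change process described under taxonomy governance.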

82 Automatic categorization vendors | Pragmatic viewpoint
[Chart: vendors positioned by Accuracy Level (low to high) against Content Volumes (low to high).]

83 Seven practical rules for taxonomies
1. Incremental, extensible process that identifies and enables users, and engages stakeholders.
2. Quick implementation that provides measurable results as quickly as possible.
3. Not monolithic: has separately maintainable facets.
4. Re-uses existing IP as much as possible.
5. A means to an end, and not the end in itself.
6. Not perfect, but it does the job it is supposed to do, such as improving search and navigation.
7. Improved over time, and maintained.

84 Agenda
3:30 Introductions: Us and you
3:45 Background: Metadata & controlled vocabularies
4:00 Dublin Core: Elements, issues, and recommendations
4:30 Dublin Core in the wild: CEN study and remarks
4:45 Enterprise-wide metadata ROI questions
5:00 Break
5:15 ROI (Cont.)
5:30 Business processes
6:15 Tools & technologies
6:30 Summary, Q&A
6:45 Adjourn

85 Summary: Categorize with a purpose
- What is the problem you are trying to solve?
  - Improve search
  - Browse for content on an enterprise-wide portal
  - Enable business users to syndicate content
  - Otherwise provide the basis for content re-use
- How will you control the cost of creating and maintaining the metadata needed to solve these problems?
  - CMS with metadata tagging products
  - Semi-automated classification
  - Taxonomy editing tools
  - Guided navigation tools

86 Contact Info: Ron Daniel, Joseph Busch

