Taxonomies and Meta Data for Business Impact

Presentation transcript:

1 Taxonomies and Meta Data for Business Impact
April 13, 2005 Theresa Regli, Molecular, Inc. Ron Daniel, Jr., Taxonomy Strategies LLC

2 Agenda 1:30 Welcome and Introductions
1:40 Taxonomy Definitions and Examples 2:10 Business Case and Motivations 2:30 Case Study: NASA 2:45 Tagging and Tools 3:00 Break 3:15 Running a Taxonomy Project 3:45 Taxonomy Maintenance and Governance 4:15 Case Study: PC Connection 4:30 Summary and Discussion 5:00 Adjourn

3 Who we are: Ron Daniel, Jr.
Over 15 years in the business of metadata & automatic classification
Principal, Taxonomy Strategies
Standards Architect, Interwoven
Senior Information Scientist, Metacode Technologies (acquired by Interwoven, November 2000)
Technical Staff Member, Los Alamos National Laboratory
Metadata and taxonomies community leadership:
Chair, PRISM (Publishers Requirements for Industry Standard Metadata) working group
Acting chair, XML Linking working group
Member, RDF working groups
Co-editor, PRISM, XPointer, 3 IETF RFCs, and Dublin Core 1 & 2 reports

4 Recent & current projects
Government: Commodity Futures Trading Commission; Defense Intelligence Agency; ERIC; Federal Aviation Administration; Federal Reserve Bank Atlanta; Forest Service; Goddard Space Flight Center; Head Start; Infocomm Development Authority of Singapore; NASA (nasataxonomy.jpl.nasa.gov); Small Business Administration; Social Security Administration; U.S.D.A. Economic Research Service; U.S.D.A. e-Government Program; U.S. G.S.A. Office of Citizen Services
Commercial: Allstate Insurance; Blue Shield of California; Halliburton; Hewlett Packard; Motorola; PeopleSoft; PricewaterhouseCoopers; Sprint; Time Inc.
Commercial subcontracts: Critical Mass (Fortune 50 retailer); Deloitte Consulting (top credit card issuer); Gistics (direct selling giant)
NGOs: CEN; IDEAlliance; OCLC

Notes:
Allstate Insurance. Prepared and facilitated a taxonomy development process workshop.
Blue Shield of California. Developed a multi-phased enterprise taxonomy roadmap which laid out content projects over time, and the content classification facets to be developed to support those projects.
CEN (European Committee for Standardization). Developed best practice guidelines on metadata and knowledge management through interviews and workshops with European and American companies.
Critical Mass (subcontract). Revised the existing product taxonomy, identified an optimum set of attributes to support flexible product merchandising, and developed a prototype to expose the new taxonomy in a faceted search and navigation interface for a Fortune 50 retailer.
Defense Intelligence Agency. Prepared a recommended approach and functional specifications for an automatic categorization system to organize current intelligence reports.
Deloitte Consulting (subcontract). Defined and validated use cases with business users, then developed and tested metadata models and taxonomy outlines to support enterprise document management initiatives for a top credit card issuer.
Federal Aviation Administration. Developed an agency-wide metadata standard and faceted taxonomy for the FAA based on Dublin Core.
Forest Service. Conducted an agency-wide content audit using information retrieval spiders to systematically extract information and populate a metadata database.
Halliburton. Consolidated multiple taxonomies developed for separate product lines, and developed rules and procedures for applying the taxonomy to content and delivering it through a portal application.
Hewlett Packard. Developed a taxonomy roadmap with recommended subject metadata facets, recommended controlled vocabularies, and taxonomy governance procedures for the business-to-employee portal.
IDEAlliance. Developed a generic content aggregation XML DTD.
Motorola. Prepared metadata and taxonomy recommendations to develop and support a content supply chain.
OCLC. Developed an outreach template to lobby software vendors to implement support for Dublin Core metadata.
PeopleSoft. Developed a core metadata specification and faceted taxonomy based on Dublin Core.
NASA (Jet Propulsion Laboratory contract). Completed a classification scheme that encompasses all of NASA web content, and provided three specializations for internal and external portal projects; then refined the taxonomy and developed standard formats that support its use.
NASA Goddard Space Flight Center. Reviewed GSFC taxonomies and their potential for integration into the NASA Taxonomy.
Small Business Administration. Developed taxonomy specifications for populating the metadata for the GSA Business Gateway e-Forms initiative.
Social Security Administration. Prepared taxonomy recommendations for tagging web content with Dublin Core metadata, and developed a new vision for flexible site organization, site navigation & site search that uses explicit metadata.
Time Inc. Developed a specific content aggregation XML DTD.
U.S.D.A. Economic Research Service. Developed a scalable metadata framework to enable content reuse, and handle changes in business goals, customer needs, and retrieval concerns.
U.S.D.A. e-Government Program. Validated a core metadata specification and faceted taxonomy based on Dublin Core, the National Agricultural Library Thesaurus (NALT), and other pertinent vocabulary standards to be used as the standard for agency-wide information management.
U.S. General Services Administration Office of Citizen Services. Developed the content architecture for FirstGov (the official portal of the U.S. government), including an XML content model, and metadata and vocabulary specifications to facilitate a reference implementation of a content management system application.

5 Who we are: Theresa Regli
Over a decade of experience in cross-media publishing and content management
7 years of consulting
4 years in "traditional" media: newspapers, publishing
Brought many New England newspapers online in the mid-90s
Principal Consultant, CM and User Experience, Molecular
Focus on users / customers and how they interact with and use information; industry education and conferences
Background in linguistics
Named "one to watch" in 2005 by CMS Watch
Passion for how people, cultures, and businesses use words and language

6 About Molecular
Offerings designed to help organizations leverage technology to increase revenues and decrease costs
10+ years of Internet professional services expertise
120+ consulting professionals
Integrated service offerings:
- Digital strategy
- User experience design/redesign
- Development & implementation
- Multi-site integration
- Multi-channel integration

7 Agenda 1:30 Welcome and Introductions
1:40 Taxonomy Definitions and Examples 2:10 Business Case and Motivations 2:30 Case Study: NASA 2:45 Tagging and Tools 3:00 Break 3:15 Running a Taxonomy Project 3:45 Taxonomy Maintenance and Governance 4:15 Case Study: PC Connection 4:30 Summary and Discussion 5:00 Adjourn

8 Setting the stage: some definitions…
What is Knowledge Management?
The process through which firms generate value from their intellectual assets
The efficient sharing of knowledge across the enterprise: not focused on presentation
Often incorrectly used synonymously with CM

What is Document Management?
The effective storage and retrieval of documents
Traditionally not concerned with the creation of new content/documents
Often incorrectly used synonymously with CM; some of the tools have evolved towards CM

9 Some more definitions…
What is Content Management?
The integration of various technologies and processes to manage content, from conception through deployment
The management of the content lifecycle: create, approve, tag, publish

What is Enterprise Content Management?
A vendor/analyst term covering all content across the firm (web, catalog, digital, etc.)
The integration of various systems to create one unified, "virtualized" system (CRM, financial, marketing, etc.)
Typically thought of as a strategy, not an implementation

10 Putting it all together: golf anyone?
[Diagram: a golf analogy for the progression from knowledge sharing to ECM. A caddy provides advice; the caddy tells other caddies, and they provide advice too (knowledge sharing). The caddy master collects the advice and creates a tip booklet for all caddies (knowledge management). The owner implements an 'Online Caddy' system and a Personal Cart system at 10 courses (KM to ECM). Supporting artifacts such as the course tip sheet, Golfdigest.com, and course yardage books span content management and document management.]

11 What makes DM, CM and ECM possible?
Taxonomy
A framework for organizing information based on user needs
A "law" for categorizing information

Meta Data
Information about content: "data about the data"
The categories, sub-categories and terms that make up a taxonomy are often employed as meta data
Meta data is leveraged by a CMS to find and display content easily and consistently
Enables more precise search results and personalization

12 Foundations for ECM Success: Key Terms
Facets allow for a more complex classification structure, where categories are applied to the information like keywords. Thus, information about a subject can be "approached" and found in different ways. For example:

Hypertension
- Publications / Medical / Journal of Hypertension
- Diseases / Cardiovascular / Hypertension
- Associations / Medical / American Society of Hypertension

Red Rock Crab
- Animals / Invertebrates / Crustaceans
- World / Seas / Pacific
- World / Land / Australasia
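To make the facet idea concrete, here is a minimal sketch in Python of how multi-facet classification lets the same item be reached along different paths. The facet names and items mirror the slide's examples; the data structure is our illustration, not any particular product's API.

```python
# Minimal sketch of faceted classification: each item carries values in
# several independent facets, so it can be "approached" from any of them.
# Facet names and items are illustrative, not from a real system.

ITEMS = {
    "Journal of Hypertension": {
        "Publications": "Medical",
        "Subject": "Hypertension",
    },
    "American Society of Hypertension": {
        "Associations": "Medical",
        "Subject": "Hypertension",
    },
    "Red Rock Crab": {
        "Animals": "Crustaceans",
        "World/Seas": "Pacific",
        "World/Land": "Australasia",
    },
}

def find(facet, value):
    """Return every item tagged with the given value in the given facet."""
    return [name for name, facets in ITEMS.items()
            if facets.get(facet) == value]

# The same item is reachable through different facets:
print(find("Subject", "Hypertension"))   # both hypertension resources
print(find("World/Seas", "Pacific"))     # ['Red Rock Crab']
```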

13 Foundations for ECM Success: Key Terms
Synonym Ring: A set of words/phrases that can be used interchangeably for searching (hypertension, high blood pressure)
Thesaurus: A tool that controls synonyms and identifies the relationships among terms
Controlled Vocabulary: A list of preferred and variant terms, with relationships (hierarchical and associative) defined. A taxonomy is a type of controlled vocabulary.
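A short sketch of how a synonym ring might drive query expansion at search time; the rings shown (including the "htn" abbreviation) are invented examples, not a real medical vocabulary.

```python
# Minimal sketch of a synonym ring used for search: any term in the ring
# expands the query to all of its interchangeable variants.

SYNONYM_RINGS = [
    {"hypertension", "high blood pressure", "htn"},
    {"myocardial infarction", "heart attack", "mi"},
]

def expand(query: str) -> set[str]:
    """Expand a query term to every term in its synonym ring."""
    q = query.lower()
    for ring in SYNONYM_RINGS:
        if q in ring:
            return ring
    return {q}

print(expand("high blood pressure"))
# -> {'hypertension', 'high blood pressure', 'htn'}
```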

14 Sample Taxonomies

15 The Library of Congress
A) General Works
B) Philosophy, Psychology, Religion
C) History: Auxiliary Sciences
D) History: General and Old World
E) History: United States
F) History: Western Hemisphere
G) Geography, Anthropology, Recreation
H) Social Sciences
J) Political Science
K) Law
L) Education
M) Music
N) Fine Arts
P) Literature & Languages
Q) Science
R) Medicine
S) Agriculture
T) Technology
U) Military Science
V) Naval Science
Z) Bibliography & Library Science

While both taxonomies are used in libraries, note how the differences in classification specifically accommodate audience and subject matter.

16 Category Facets Meta data (rheumatoid is a type of arthritis)
Enables user-intuitive presentation of information

17

18 Taxonomy as Multi-Faceted Browsing Tool

19 Epicurious, First Facet
Browse > Picnics

20 Epicurious, Second Facet
Browse > Picnics > Poultry

21 Agenda 1:30 Welcome and Introductions
1:40 Taxonomy Definitions and Examples 2:10 Business Case and Motivations 2:30 Case Study: NASA 2:45 Tagging and Tools 3:00 Break 3:15 Running a Taxonomy Project 3:45 Taxonomy Maintenance and Governance 4:15 Case Study: PC Connection 4:30 Summary and Discussion 5:00 Adjourn

22 Business Case and Motivations for Taxonomies
We divide taxonomy projects into three problems: the ROI Problem, the Tagging Problem, and the Taxonomy Problem The ROI Problem: How are we going to use content, metadata, and taxonomies in applications to obtain business benefits?

23 What technology analysts have said
"Adding metadata to unstructured content allows it to be managed like structured content. Applications that use structured content work better."
"Enriching content with structured metadata is critical for supporting search and personalized content delivery."
"Content that has been adequately tagged with metadata can be leveraged in usage tracking, personalization and improved searching."
"Better structure equals better access: Taxonomy serves as a framework for organizing the ever-growing and changing information within a company. The many dimensions of taxonomy can greatly facilitate Web site design, content management, and search engineering. If well done, taxonomy will allow for structured Web content, leading to improved information access."

24 Metadata specification – a recipe example
Asset Metadata:
- Unique ID: Integer, fixed length, required (1), system supplied. Purpose: basic accountability.
- Recipe Title: String, variable length; source: licensed content. Purpose: text search & results display.
- Recipe Summary: source: content.
- Main Ingredients: List, ? (1 or more); source: Main Ingredients vocabulary. Purpose: key index to retrieve & aggregate recipes, & generate shopping list.

Subject Metadata:
- Meal Types: * (0 or more); source: Meal Types vocabulary. Purpose: browse or group recipes & filter search results.
- Cuisines
- Courses: source: Courses vocabulary.
- Cooking Method: Flag; source: Cooking vocabulary.

Link Metadata:
- Recipe Image: Pointer.
- Product Group: Purpose: merchandize products.

Use Metadata:
- Rating: Purpose: filter, rank, & evaluate recipes.
- Release Date: Date. Purpose: publish & feature new recipes.

Legend: ? = 1 or more; * = 0 or more
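As an illustration of the specification above, here is a sketch of one recipe record as a Python dataclass. Field names follow the table; all values and defaults are invented, and a real CMS would enforce the controlled vocabularies and requiredness rules listed above.

```python
# A sketch of one record conforming to the recipe metadata specification:
# system-supplied asset metadata, subject metadata drawn from controlled
# vocabularies, and link/use metadata. Values are invented examples.

from dataclasses import dataclass, field
from datetime import date

@dataclass
class Recipe:
    unique_id: int                      # system supplied
    title: str                          # licensed content
    summary: str
    main_ingredients: list[str]         # 1+ values, Main Ingredients vocab
    meal_types: list[str] = field(default_factory=list)  # 0+ values
    cuisines: list[str] = field(default_factory=list)
    courses: list[str] = field(default_factory=list)
    cooking_method: str | None = None   # Cooking vocabulary
    image: str | None = None            # pointer to an image asset
    rating: float | None = None
    release_date: date | None = None

r = Recipe(
    unique_id=1042,
    title="Grilled Lemon Chicken",
    summary="A quick weeknight grill recipe.",
    main_ingredients=["chicken", "lemon"],
    meal_types=["dinner"],
    cuisines=["Mediterranean"],
    courses=["main"],
    cooking_method="grill",
    release_date=date(2005, 4, 13),
)
```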

25 Fundamentals of taxonomy ROI
Building and maintaining a taxonomy, and tagging content with it, are costs. They are not benefits.
There is no benefit without exposing the tagged content to users in some way that cuts costs or improves revenues.
Putting a new taxonomy into operation requires UI changes and/or backend system changes, as well as data changes.
Every metadata field costs money, time, and goodwill.
You need to determine those changes, and their costs, as part of the taxonomy ROI.

26 Common taxonomy ROI scenarios
Catalog site - ROI based on increased sales through improved:
- Product findability
- Product cross-sells and up-sells
- Customer loyalty
Call center - ROI based on cutting costs through:
- Fewer customer calls due to improved website self-service
- Faster, more accurate CSR responses through better information access
Compliance - ROI based on:
- Avoiding penalties for breaching regulations
- Following required procedures (e.g. medical claims)
Knowledge worker productivity - ROI based on cutting costs through:
- Less time searching for things
- Less time recreating existing materials, with knock-on benefits of less confusion and reduced storage and backup costs
Executive mandate - no ROI at the start, just someone with a vision and the budget to make it happen

27 Taxonomy Justification | Knowledge Worker Productivity
Huge cost to the user & organization in finding information (time, frustration, precision):
"15%-30% of an employee's time is spent looking for information, and they find it only 50% of the time" (IDC Research, on the business drivers for building a taxonomy)
Sun's usability experts calculated that 21,000 employees were wasting an average of six minutes per day due to inconsistent intranet navigation structures. When lost time was multiplied by staff salaries, the estimated productivity loss exceeded $10M per year (Web Design and Development, Jakob Nielsen)
Managers spend 17% of their time (6 weeks a year) searching for information (Information Ecology, Thomas Davenport & Laurence Prusak)
Lost learning value: related products, services, projects, people
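The Sun figure is easy to reproduce as back-of-envelope arithmetic. The workday count and loaded hourly cost below are our assumptions, chosen only to show how quickly minutes-per-day losses compound past the quoted $10M.

```python
# Back-of-envelope reproduction of the Sun intranet figure quoted above:
# minutes lost per day, times employees, times loaded salary.

employees = 21_000
minutes_lost_per_day = 6
work_days_per_year = 230        # assumption
loaded_cost_per_hour = 25.0     # assumption, USD

hours_lost = employees * minutes_lost_per_day / 60 * work_days_per_year
annual_loss = hours_lost * loaded_cost_per_hour
print(f"${annual_loss:,.0f} per year")   # $12,075,000: consistent with >$10M
```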

28 Challenges of organizing content on enterprise portals (1)
Multiple subject domains across the enterprise: vocabularies vary; granularity varies
Unstructured information represents about 80%
Information is stored in complex ways: multiple physical locations; many different formats
Tagging is time-consuming and requires SME involvement
A portal doesn't solve the content access problem
"Knowledge is power" syndrome: incentives to share knowledge don't exist; the free flow of information TO the portal might be inhibited
Content silo mentality changes slowly: What content has changed? What exists? What has been discontinued? Lack of awareness of other initiatives

Notes: The complexity of information storage makes it a significant challenge to integrate all the data stores to act as a single seamless repository. Content silos result in poor communication among groups, and lots of extra work because one group doesn't know what another is doing or has already done. Yahoo employs a completely manual approach to tagging: all content is considered by SMEs.

29 Challenges of organizing content on enterprise portals (2)
Lack of content standardization and consistency: content messages vary among departments; how do users know which message is correct?
Re-usability low to non-existent
Costs of content creation, management and delivery may not change when the portal is implemented: similar subjects, BUT diverse media, diverse tools, different users
How will personalization be implemented?
How will existing site taxonomies be leveraged?
Taxonomy creation may surface "holes" in content

30 FAQ – How do you sell it?
Don't sell the taxonomy; sell the vision of what you want to be able to do
Clearly understand what the problem is and what the opportunities are
Do the calculus (costs and benefits)
Design the taxonomy (in terms of level of effort) in relation to the value at hand

31 Agenda 1:30 Welcome and Introductions
1:40 Taxonomy Definitions and Examples 2:10 Business Case and Motivations 2:30 Case Study: NASA 2:45 Tagging and Tools 3:00 Break 3:15 Running a Taxonomy Project 3:45 Taxonomy Maintenance and Governance 4:15 Case Study: PC Connection 4:30 Summary and Discussion 5:00 Adjourn

32 NASA Taxonomy Project Goal: Enable Knowledge Discovery
Make it easy for various audiences to find relevant information from NASA programs quickly
Provide easy access to NASA resources found on the Web
Share knowledge by enabling users to easily find links to databases and tools
Provide search results targeted to user interests
Enable content to move through the enterprise to where it is needed most
Comply with the E-Government Act of 2002
Be a leading participant in Federal XML projects

33 NASA Taxonomy Project Goal: Develop Best Practices
A design process that:
- Incorporates existing federal and industry terminology standards like NASA AFS, NASA CMS, FEA BRM, NAICS, and IEEE LOM
- Provides a product for the NASA XML namespace registry
- Complies with metadata standards like Z39.19, ISO 2709, and Dublin Core
These practices are believed to increase interoperability and extensibility.

34 Development Process: Interviews
More than 70 interviews were conducted across the NASA complex, categorized by type: Administrators 26%, Researchers/Engineers 22%, Public 18%, Projects 13%, Funders 4%, Scientists (remainder). Projects, engineering & science together accounted for 52%.

35 Scale of NASA Taxonomy

Facet               # Terms   Source
Audiences           62        Custom
Business Purpose    96        Existing
Competencies        169
Content Types
Industries          22
Instruments         56        Semi
Locations           106
Missions/Projects   648
Organizations       323
Subject Categories  78
Total               1656

Facets combine, so millions of documents can be finely categorized with a relatively small number of values.
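A quick illustration of why facets scale: summing the term counts above gives roughly 1,600 values, yet even picking just one value per facet yields an astronomically large number of distinct combinations.

```python
# Why facets combine so well: a modest total number of terms supports an
# enormous number of distinct classifications. Term counts are taken from
# the table above (Content Types' count is not shown there, so omitted).

from math import prod

facet_sizes = {
    "Audiences": 62, "Business Purpose": 96, "Competencies": 169,
    "Industries": 22, "Instruments": 56, "Locations": 106,
    "Missions/Projects": 648, "Organizations": 323,
    "Subject Categories": 78,
}

total_terms = sum(facet_sizes.values())      # 1,560 of the 1,656 total
combinations = prod(facet_sizes.values())    # one value chosen per facet
print(f"{total_terms} terms yield {combinations:.2e} single-value combinations")
```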

36 NASA Taxonomy Web Site (http://nasataxonomy.jpl.nasa.gov)
Background and training materials
Link to XML DTDs and Schema
Link to Metadata Specification
Links to Controlled Vocabularies

37 Benefits of Approach Facets and Use of Standards made it possible to respond to three unexpected needs during and after the project: Search demo Semantic search demo Integration with detailed vocabularies

38 Example | NASA Taxonomy Search Prototype
[Screenshot: current search state; facets, values, and counts.]

39 Example 1 | NASA Taxonomy Search Prototype
¾ of the way through the project, a request was made to see a demo of the taxonomy in action:
- The taxonomy was represented in RDF
- Metadata was scraped from a few repositories around NASA (~220k records) and converted to RDF
- Some metadata was automatically created with simple keyword matches
- The RDF was loaded into the Seamark search tool
Time: approx. 2 man-weeks. Additional cost: $0.
Result: a useful demo that illustrated new facts
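A guess at what "simple keyword matches" could look like: scan each record's text for controlled-vocabulary terms and emit facet/value assignments, which are readily serialized as RDF triples. The vocabulary slices and the sample record are invented; the actual NASA pipeline and the Seamark loading step are not shown.

```python
# Sketch of keyword-match auto-tagging: vocabulary hits become
# (record, facet, term) triples, easily serialized to RDF.

VOCAB = {
    "Missions/Projects": ["Cassini", "Hubble", "Mars Pathfinder"],
    "Subject Categories": ["propulsion", "astrobiology"],
}

def auto_tag(record_id: str, text: str):
    """Yield (record, facet, term) for each vocabulary term found in text."""
    lower = text.lower()
    for facet, terms in VOCAB.items():
        for term in terms:
            if term.lower() in lower:
                yield (record_id, facet, term)

doc = "Cassini instrument report touching on propulsion issues."
for triple in auto_tag("doc-001", doc):
    print(triple)
# ('doc-001', 'Missions/Projects', 'Cassini')
# ('doc-001', 'Subject Categories', 'propulsion')
```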

40 Example 2 | Semantic Search
After the project was over, another project was doing 'semantic search'
They heard about the NASA Taxonomy
They downloaded the RDF file for the Missions & Projects vocabulary, mapped it to their RDF/OWL tool, and used it to answer questions about different types of missions
They did not have to ask any questions or request any data changes
Courtesy Dean Allemang, TopQuadrant; Robert Brummett, NASA HORM

41 Example 3 | Local Extension
After the project, JPL wanted to incorporate content from additional repositories
Existing metadata was easily mapped as extensions to the NASA taxonomy
The RDF mapping allowed the search tool to make immediate use of the metadata.

42 Agenda 1:30 Welcome and Introductions
1:40 Taxonomy Definitions and Examples 2:10 Business Case and Motivations 2:30 Case Study: NASA 2:45 Tagging and Tools 3:00 Break 3:15 Running a Taxonomy Project 3:45 Taxonomy Maintenance and Governance 4:15 Case Study: PC Connection 4:30 Summary and Discussion 5:00 Adjourn

43 The Tagging Problem How are we going to populate metadata elements with complete and consistent values? What can we expect to get from automatic classifiers?

44 Tagging
Province of authors (SMEs) or editors?
The taxonomy is often highly granular to meet task and re-use needs
Vocabulary is dependent on the originating department
The more tags there are (and the more values for each tag), the more hooks into the content
If there are too many, authors will resist and use "general" tags (if available)
Automatic classification tools exist, and are valuable, but their results are not as good as what humans can do; "semi-automated" is best
The degree of human involvement is a cost/benefit tradeoff

45 Automatic categorization vendors | Analyst viewpoint
[Chart: vendors plotted by content volume (low to high) vs. accuracy level (low to high).]
Scalability requires simple creation of granular metadata and taxonomies. Better content architecture means more accurate categorization, and more precise content delivery. Surprisingly, most organizations are better off buying tools from the lower-left quadrant: their absolute accuracy is less, but it comes with a lot of other features (UI, versioning, workflow, storage) that provide the basis for building a QA process.

46 Considerations in Automatic Classifier Performance
Classification performance is measured by "inter-cataloger agreement": trained librarians agree less than 80% of the time
Errors are subtle differences in judgment, or big goofs
Automatic classification struggles to match human performance. Exception: entity recognition can exceed human performance
Classifier performance is limited by the algorithms available, which is limited by development effort
There is very wide variance in one vendor's performance depending on who does the implementation, and how much time they have to do it
[Chart: accuracy vs. development effort/licensing expense, running from regexps up to trained librarians, with a potential performance gain region.]
An 80/20 tradeoff, where 20% of effort gives 80% of performance: smart implementation of inexpensive tools will outperform naive implementations of world-class tools.
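To show the cheap end of the accuracy-vs-effort curve, here is a minimal regex-rule classifier of the kind the chart labels "regexps". The categories and patterns are invented; the point is that a handful of rules plus a human fallback can be a defensible starting implementation.

```python
# Illustration of the "regexps" end of the cost/accuracy curve: a few
# hand-written patterns capture the easy cases; everything else is
# routed to a human reviewer.

import re

RULES = [
    (re.compile(r"\b(invoice|purchase order)\b", re.I), "Finance"),
    (re.compile(r"\b(press release|media contact)\b", re.I), "PR"),
    (re.compile(r"\b(failure|anomaly) report\b", re.I), "Engineering"),
]

def classify(text: str) -> str:
    for pattern, label in RULES:
        if pattern.search(text):
            return label
    return "Unclassified"   # route to a human reviewer

print(classify("Anomaly report for test stand 3"))  # Engineering
```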

47 Tagging tool example | Interwoven MetaTagger
Manual form fill-in w/ check boxes, pull-down lists, etc. Auto keyword & summarization

48 Tagging tool example | Interwoven MetaTagger
Auto-categorization Rules & pattern matching Parse & lookup (recognize names)

49 Metadata tagging workflows
Even 'purely' automatic meta-tagging systems need a manual error-correction procedure, and should add a QA sampling mechanism
Tagging models:
- Author-generated
- Central librarians
- Hybrid: central auto-tagging service, with distributed manual review and correction
[Workflow diagram, a sample 'author-generated' metadata workflow: a copywriter composes in a template and submits to the CMS; the tagging tool automatically fills in metadata; an analyst reviews the content and approves/edits the metadata; an editor copy-edits; output goes to hard copy and the web site; problems loop back for rework.]
Notes: Even if an organization has automatic meta-tagging tools, it needs an exception-handling process for correcting errors, and should have a QA process for monitoring at least a sample of the automatic meta tags. There are two main models for adding metadata, author-generated vs. centralized librarians, and both have problems: author-generated has spotty coverage and quality, while centralized staff can be difficult to justify financially. Hence the hybrid approach.
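A sketch of the hybrid model with a QA sampling mechanism: every document is auto-tagged, and a fixed fraction is routed to human review regardless of classifier confidence. The auto_tagger stub and the 10% rate are placeholders, not recommendations.

```python
# Sketch of hybrid tagging: central auto-tagging plus a random QA sample
# sent to human reviewers, whose corrections feed back into the rules.

import random

def auto_tagger(doc_text: str) -> dict:
    return {"subject": "unknown"}       # stand-in for a real classifier

REVIEW_QUEUE = []
QA_SAMPLE_RATE = 0.10                   # review 10% even when "confident"

def tag(doc_id: str, text: str) -> dict:
    metadata = auto_tagger(text)
    if random.random() < QA_SAMPLE_RATE:
        REVIEW_QUEUE.append((doc_id, metadata))   # human spot-check
    return metadata

for i in range(100):
    tag(f"doc-{i}", "...")
print(f"{len(REVIEW_QUEUE)} of 100 documents routed to manual QA (~10 expected)")
```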

50 Automatic categorization vendors | Pragmatic viewpoint
[Chart: vendors plotted by content volume (low to high) vs. accuracy level (low to high).]
Scalability requires simple creation of granular metadata and taxonomies. Better content architecture means more accurate categorization, and more precise content delivery. Surprisingly, most organizations are better off buying tools from the lower-left quadrant: their absolute accuracy is less, but it comes with a lot of other features (UI, versioning, workflow, storage) that provide the basis for building a QA process.

51 Seven practical rules for taxonomies
1. An incremental, extensible process that identifies and enables users, and engages stakeholders
2. Quick implementation that provides measurable results as quickly as possible
3. Not monolithic: has separately maintainable facets
4. Re-uses existing IP as much as possible
5. A means to an end, not the end in itself
6. Not perfect, but it does the job it is supposed to do, such as improving search and navigation
7. Improved over time, and maintained

52 Agenda 1:30 Welcome and Introductions
1:40 Taxonomy Definitions and Examples 2:10 Business Case and Motivations 2:30 Case Study: NASA 2:45 Tagging and Tools 3:00 Break 3:15 Running a Taxonomy Project 3:45 Taxonomy Maintenance and Governance 4:15 Case Study: PC Connection 4:30 Summary and Discussion 5:00 Adjourn

53 Agenda 1:30 Welcome and Introductions
1:40 Taxonomy Definitions and Examples 2:10 Business Case and Motivations 2:30 Case Study: NASA 2:45 Tagging and Tools 3:00 Break 3:15 Running a Taxonomy Project 3:45 Taxonomy Maintenance and Governance 4:15 Case Study: PC Connection 4:30 Summary and Discussion 5:00 Adjourn

54 Typical Project Timeline*
Weeks 1-2: Kick-off and Prep
Weeks 3-4: Interviews
Weeks 5-6: Content Analysis
Weeks 7-8: Document Requirements
Weeks 9-10: Develop & Validate Taxonomy
Weeks 11-12: Implementation
Caveat: this will vary greatly based on the complexity of the content and the organization

55 Seven phases of taxonomy and metadata design
1. Identify Objectives: conduct interviews
2. Inventory Content: ID sources, spider assets & extract metadata
3. Specify Metadata: define fields & purpose
4. Model Content: define content chunks & XML DTDs
5. Specify Vocabularies: compile controlled vocabularies
6. Specify Procedures: develop workflow, rules & procedures
7. Train Staff: develop materials & train staff
Notes: We tend to break the job of developing the metadata and taxonomy into seven steps in order to talk about them. But in reality, there are some prerequisites, and things are not as simple as the seven steps might imply.

56 Seven phases of taxonomy and metadata design
The seven phases are applied iteratively across four stages:
- Plan & Prototype: Project Team
- Alpha Dev & Test: Stakeholders and SMEs
- Beta Dev & Test: Friendly Users
- Final Dev & Test: Audiences

Across the stages, each phase repeats and deepens: interview the core team and stakeholders, then alpha users, then beta users; manually tag a small sample, then use the alpha CMS to tag a larger sample, then the beta CMS on a larger one still; start with UI sketches and off-the-shelf rules, review tagged samples and default procedures, revise and bake them into the alpha CMS, modify and extend workflows for beta, and finalize for 1.0; gather additional sources, if any; tailor the default training materials, then finalize them and train staff.

Notes: The seven steps need to be applied iteratively in planning, alpha, beta, and final iterations. Don't take these exact task descriptions, durations, and intervals too seriously, because they will vary from one project to another. The main point is that to do a good job you need to take an initial taxonomy, apply it to some content, then evaluate and modify accordingly. You then apply it to a larger sample of content the next time. The nice thing about this is that you can start small, and in the larger iterations you can help test the CMS and feed into the alpha and beta testing processes. Synergy!
A good chunk of time needs to be spent at the start gathering requirements, looking at sample materials, and trying some things manually before starting to code them into a CMS (or other system). The alpha development stage will probably take the most time, with beta and final being smaller and smaller modifications to the heart of the system, but larger documentation and training efforts.
I can just hear people saying that this will take a long time. Well, guess what? You are right, it will. But do you know the average length of time for a successful CMS implementation? Do you know the relative frequency of successful implementations where people started out thinking it would take a long time, versus those where people thought they could just "stand it up" because their vendor told them so? It is much safer to plan on doing things in an iterative fashion, learning from early stages and adjusting, than to plan on getting everything right the first time when there are so many unknowns.

57 Project Prep | Key Considerations
What is the level of knowledge about taxonomy in the company as a whole?
What are the most important priorities for the taxonomy?
How much do I know about the subject matter? How much ramp-up do I need?
How many types of content will I need to consider?
How much content is there (quantity-wise)?
How many stakeholders and subject matter experts (SMEs) are there? How are they organized? (e.g. one "owner/SME" per product line?)
What types of politics or challenges exist today between groups of owners/subject matter experts? Will they debate and/or argue over terminology or what should be classified where?

58 Project Prep | Key Considerations
Does any of the terminology need to be created from scratch or re-written?
What kind of data store will the taxonomy be used in? (Database? XML repository?)
Has any user feedback been received so far (internal or external, formal or informal) as to what users like and don't like about finding the company's information?
Is there a product database of any sort in existence today? What product characteristics are accounted for? (name, description, number, etc.)
If there is a web site, how is it organized today? (e.g. products, solutions, roles, etc.)
How will users tag content using this taxonomy? Do they have that software/interface in place today? Will we need to train users to tag content?

59 Content Analysis | Steps and Approaches
Conduct stakeholder interviews to determine project goals and success metrics (be sure to be prepared with your own!)
Conduct an industry competitive analysis if appropriate
Review content and create a high-level inventory
Determine the terms the business uses to categorize information (top-down approach)
Determine the terms employees use when seeking information (bottom-up approach)
Gather all terms / categories / content types
Check against the original content inventory to ensure everything is accounted for

60 Example | Document Topic Inventory

61 Example | Product Topic Inventory

62 Taxonomy creation process | Steps and Approaches
SME analysis of content to determine categories and/or tags
Workshops with SMEs and stakeholders to gain additional understanding of content
Card sorting exercises with business users or end customers to determine intuitive clustering and category names
Auto-generation of a "rough" taxonomy via a software tool
Refinement with SMEs and taxonomy experts
Iterative taxonomy creation over a period of several weeks, depending on the size and scope of the effort
Validation of the taxonomy via user testing

63 Taxonomy creation process | Best practices
Be aware of the competition: how they name and categorize products
Involve engineers early: ensure that the taxonomy you're creating can be used with the technology
Be aware of key parties' viewpoints
After determining the high-level categories, have a midpoint check-in with stakeholders to ensure you're on the right track and to build ongoing consensus
For the purposes of web design, leverage sample page layouts to show how categorization and tagging will affect page layout and content
Remember that taxonomies must evolve and progress as your business changes

64 Agenda 1:30 Welcome and Introductions
1:40 Taxonomy Definitions and Examples 2:10 Business Case and Motivations 2:30 Case Study: NASA 2:45 Tagging and Tools 3:00 Break 3:15 Running a Taxonomy Project 3:45 Taxonomy Maintenance and Governance 4:15 Case Study: PC Connection 4:30 Summary and Discussion 5:00 Adjourn

65 Taxonomy Business Processes
Taxonomies must change, gradually, over time if they are to remain relevant Maintenance processes need to be specified so that the changes are based on rational cost/benefit decisions A team will need to maintain the taxonomy on a part-time basis Taxonomy team reports into CM governance or steering committee

66 Taxonomy governance | Change process overview
[Diagram: the taxonomy governance environment sits between CV sources and CV consumers.]
1: External controlled vocabularies (CVs) change on their own schedule.
2: The Taxonomy Team decides when to update snapshots of external CVs, maintaining working copies of CVs and working papers in project archives using a taxonomy tool.
3: The team adds value to the snapshots via definitions, synonyms, classification rules, training materials, etc.
4: Updated versions of the CVs are published to consumers.
CV sources: external standard vocabularies; internally created vocabularies; CVs from other NASA sources (e.g. NASA Competencies, Portal Subject Codes).
CV consumers: site search tool; portal; web CMS; DAM; DMS; tagging/metatagging tool; search UI.

67 Taxonomy governance | Generic team charter
The Taxonomy Team is responsible for maintaining:
- The Taxonomy, a multi-faceted classification scheme
- Associated taxonomy materials, such as the Editorial Style Guide, Taxonomy Training Materials, and Metadata Standard
- Team rules and procedures (subject to CIO review)
The committee will consider the costs and benefits of each suggested change
The Taxonomy Team will:
- Manage the relationship between providers of source vocabularies and consumers of the Taxonomy
- Identify new opportunities for use of the Taxonomy across the Enterprise to improve information management practices
- Promote awareness and use of the Taxonomy

68 Taxonomy governance team | Generic roles
Executive Sponsor: advocate for the taxonomy team
Business Lead: keeps the committee on track with larger business objectives; balances cost/benefit issues to decide appropriate levels of effort (specialists help in estimating costs); obtains needed resources if those on the committee can't accomplish a particular task
Technical Specialist: estimates costs of proposed changes in terms of the amount of data to be retagged, additional storage and processing burden, software changes, etc.; helps obtain data from various systems
Content Specialist: the committee's liaison to content creators; estimates costs of proposed changes in terms of editorial process changes, additional or reduced workload, etc.
Taxonomy Specialist: suggests potential taxonomy changes based on analysis of query logs and indexer feedback; makes edits to the taxonomy and installs it into the system with the aid of an IT specialist
Content Owner: reality check on process change suggestions

69 Taxonomy governance | Where changes come from
[Diagram: change requests flow from application and tagging UIs, through application and tagging logic, to the taxonomy editor and steering committee.]
Where changes come from:
- End users: query log analysis reveals missing concepts
- Tagging staff: notes on missing concepts; recommendations by the editor
- Taxonomy editor experience
- Requests from other parts of the organization
Kinds of changes:
- Small taxonomy changes (labels, synonyms)
- Large taxonomy changes (retagging, application changes)
- New "best bets" content
Notes: The taxonomy must be changed over time. Suggestions for changes can come from users (through query log analysis) and from staff (via feedback forms). A governance structure is needed to make sure changes are justified. Committee considerations: business goals; changes in user experience; retagging cost.

70 Taxonomy governance | Taxonomy maintenance workflow
[Workflow diagram: an analyst suggests a new name/category in the taxonomy tool; an editor reviews the new name; a copywriter copy-edits it; if a problem is found at any step it loops back, otherwise the sys admin adds it to the enterprise taxonomy.]

71 Sample Taxonomy Editor: Data Harmony
[Screenshot: hierarchy browser pane and standard term info pane.]

72 Taxonomy editing tools vendors
An immature industry: no vendors in the upper-right quadrant!
Most popular taxonomy editor? MS Excel
[Chart: vendors plotted by ability to execute vs. completeness of vision, with Niche Players and Visionaries quadrants. High-functionality tools carry high cost ($100k!); Excel is widely used, cheap, and single-user.]

73 Measuring Metadata and Taxonomy Quality
Taxonomy development is an iterative process Elicit feedback via walk-throughs, tagging samples, and card sorting exercises Use both qualitative and quantitative methods, and remain flexible throughout

74 Taxonomy testing | Qualitative methods
Process Validation Walk-throughs: show and explain the approach; consistency to rules; appropriateness to task
Usability Testing: contextual analysis; tasks are completed successfully; time to complete a task is reduced
User Satisfaction Survey: reaction to the new interface; reaction to search results
Tagging samples: tag sample content with the taxonomy; content 'fit'; fills out the content inventory; training materials for people & algorithms; basis for quantitative methods

75 Quantitative Method | How evenly does it divide the content?
Background: documents do not distribute uniformly across categories; a Zipf (1/x) distribution is the expected behavior; the 80/20 rule in action (actually a 70/20 rule)
Methodology: part of an alpha test of 'content type' for a corporate intranet; 115 URLs selected at random from the search index were manually categorized; inaccessible files and 'junk' were removed
Results: results were slightly more uniform than the Zipf distribution, which is better than expected
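The Zipf comparison can be scripted directly: for each category rank r, the expected share is proportional to 1/r, normalized by the harmonic number. The observed counts below are invented to total 115, echoing the study's sample size; only the method is the point.

```python
# Compare observed category counts to the Zipf (1/rank) expectation.
# Observed counts are an invented illustration summing to 115 documents.

observed = [38, 22, 15, 11, 9, 7, 6, 4, 2, 1]   # docs per category, sorted
n = len(observed)
harmonic = sum(1 / r for r in range(1, n + 1))   # normalizer for 1/rank
total = sum(observed)

for rank, count in enumerate(observed, start=1):
    expected = total / (rank * harmonic)
    print(f"rank {rank:2d}: observed {count:3d}, Zipf expects {expected:5.1f}")
```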

76 Quantitative Method | How intuitive (repeatable) are the categorizations?
Methodology: closed card sort, for an alpha test of a grocery site. 15 testers put each of 100 best-selling products into one of 10 pre-defined categories. Categories where fewer than 14 of 15 testers put a product into the same category were flagged.

% of Testers   Cumulative % of Products
15/15          54%
14/15          70%
13/15          77%
12/15          83%
11/15          85%
<11/15         100%

Results: "Cocoa Drinks – Powder" is best categorized in both "Beverages" and "Grocery". In the trade, "Corn Tortillas" are a Dairy item!

77 Quantitative Method | How does taxonomy “shape” match that of content?
Background: hierarchical taxonomies allow comparison of "fit" between content and taxonomy areas
Methodology: 25,380 resources tagged with a taxonomy of 179 terms (avg. of 2 terms per resource); counts of terms and documents summed within the taxonomy hierarchy
Results: roughly Zipf distributed (top 20 terms: 79%; top 30 terms: 87%); mismatches between term% and document% flagged

Term Group                                % Terms   % Docs
Administrators                            7.8       15.8
Community Groups                          2.8       1.8
Counselors                                3.4       1.4
Federal Funds Recipients and Applicants   9.5       34.4
Librarians                                1.1
News Media                                0.6       3.1
Other                                     7.3       2.0
Parents and Families                      6.0
Policymakers                              4.5       11.5
Researchers                               2.2       3.6
School Support Staff                      0.2
Student Financial Aid Providers           1.7       0.7
Students                                  27.4      7.0
Teachers                                  25.1      11.4

Source: courtesy Keith Stubbs, U.S. Dept. of Education
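A sketch of the mismatch-flagging step, using four rows from the table above. The 2x divergence threshold is our assumption; the idea is simply to surface areas where the taxonomy spends many terms on few documents, or vice versa.

```python
# Flag taxonomy areas whose share of terms diverges sharply from their
# share of documents. Percentages come from the table above; the 2x
# threshold is an assumed flagging rule.

shares = {                       # (term %, doc %)
    "Administrators": (7.8, 15.8),
    "Federal Funds Recipients and Applicants": (9.5, 34.4),
    "Students": (27.4, 7.0),
    "Teachers": (25.1, 11.4),
}

for group, (term_pct, doc_pct) in shares.items():
    ratio = doc_pct / term_pct
    if ratio > 2 or ratio < 0.5:          # assumed mismatch threshold
        kind = "too few terms" if ratio > 2 else "too many terms"
        print(f"{group}: {kind} (terms {term_pct}%, docs {doc_pct}%)")
```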

78 Metadata Maturity Model
Taxonomy governance processes must fit the organization
As consultants, we notice different levels of maturity in the business processes around content management, taxonomy, and metadata
Honestly assess your organization's metadata maturity in order to design appropriate governance processes
We are starting to define a maturity model, similar to the CMM model in the software world:
- Initial: ad hoc; each project begins from scratch
- Repeatable: procedures defined and used, but not standardized across the organization, or misapplied to projects
- Defined: standard processes are tailored for project needs; strategic training for long-range goals is in place
- Managed: projects managed using quantitative quality measures; the process itself is measured and controlled
- Optimizing: continual process improvement; extremely accurate project estimation

79 Purpose of Maturity Model
Estimating the maturity of an organization’s information management processes tells us: How involved the taxonomy development and maintenance process should be Overly sophisticated processes will fail What to recommend as first steps Maturity is not a goal, it is a characterization of an organization’s methods for achieving particular goals Mature processes have expenses which must be justified by consequent cost savings or revenue gains IT Maturity may not be core to your business

80 Metadata Maturity Scorecard
Maturity levels (columns): Initial, Repeatable, Defined, Managed, Optimizing
Assessment areas (rows):
- Organizational Structure: Executive Sponsorship*; Budgeting; Hiring & Training
- Quality Assurance: Manual Processes [1]; Automated Processes
- Project Management: Estimating & Scheduling; Cost Control; Project Methodology [2]
- Design and Execution: Planning; Design Excellence; Development Maturity
[1] X is starting to examine search query logs, which is an important first step in improving search. But this is only an isolated example.
[2] IT has a project methodology they are trying to use across all projects. But not all business units have project methodologies.

81 Metadata Maturity Quick Quiz
What process is in place to examine query logs?
Is there a process for adding directories and content to the repository, or do people just do what they want?
Is there an organization-wide metadata standard, such as an extension of the Dublin Core, used by search tools, multiple repositories, etc.?
Are system features and metadata fields added based on cost/benefit analysis, rather than on what is easy to do with the current tools?
Who is breathing down my neck to improve search on our intranet?
Is there an ongoing data cleansing procedure to look for ROT (Redundant, Obsolete, Trivial content)?
Is there an established QA procedure for ensuring metadata accuracy and conformance?
Are there established qualitative and quantitative measures of metadata quality?
Is there a centralized metadata group with tools and services offered around the organization?
Are there hiring and training practices especially for metadata and taxonomy positions?
Have features been removed from the metadata standard?
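For the first quiz question, a minimal example of what "examining query logs" can mean in practice: count repeated zero-result queries, which point directly at missing synonyms or missing content. The (query, result_count) log format is an assumption for illustration.

```python
# Find the most frequent zero-result queries in a search log: each one
# is a candidate for a new synonym, taxonomy term, or content gap.

from collections import Counter

log = [("hypertension", 42), ("high blood presure", 0),
       ("htn guidelines", 0), ("hypertension", 42),
       ("blood pressure", 17), ("htn guidelines", 0)]

zero_hits = Counter(q for q, n in log if n == 0)
for query, freq in zero_hits.most_common():
    print(f"{freq}x zero results: {query!r}")   # synonym/content candidates
```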

82 Agenda 1:30 Welcome and Introductions
1:40 Taxonomy Definitions and Examples 2:10 Business Case and Motivations 2:30 Case Study: NASA 2:45 Tagging and Tools 3:00 Break 3:15 Running a Taxonomy Project 3:45 Taxonomy Maintenance and Governance 4:15 Case Study: PC Connection 4:30 Summary and Discussion 5:00 Adjourn

83 Example: PC Connection
We recently redesigned the web site of PC Connection. These are a few shots of their old web sites. As you can see, there aren’t a lot of ways to find products. Note how the navigation on the left is cluttered with various functionality. They were applying an excellent search and filter technology called Endeca, which leveraged the categories and product attributes to allow users to filter products by different criteria to find information. However, as we learned in usability testing, the design of the filters to narrow the list of results was not prominent or intuitive. Then, once the user got into a product page, the filtering criteria disappeared, and users were not able to continue to refine if need be. We set out to leverage the product taxonomy and refine the user experience of finding a product.

84 Leveraging Technology: Endeca
Drop-down menus: more visible; more traditional, to match user expectations
Challenge: give the "results" more real estate but keep the filters prominent
Notes: This is the very important product category page showing the Endeca filters and how the user can drill down as they look for a product. Client feedback was that the results here did not have enough horizontal real estate.

85 PC Connection: The Solution
A DHTML "slider" applied for second-level navigation, exposing all product attributes for easy filtering
Validated solution: powerful comparisons between the first and second usability tests (e.g., 5 out of 8 participants used filters in the first test; 10 out of 10 used filters in the second test)
We then presented our recommended approach, using a DHTML script to make the filters slide out to the left and hide on mouse click.

86 PC Connection: Results
All product categories consistently accessible Drop-down menus with product attributes facilitate ease of filtering Easier to use different facets of taxonomy to find desired products Customers use, rather than struggle with, navigation

87 Agenda 1:30 Welcome and Introductions
1:40 Taxonomy Definitions and Examples 2:10 Business Case and Motivations 2:30 Case Study 2:45 Tagging and Tools 3:00 Break 3:15 Running a Taxonomy Project 3:45 Taxonomy Maintenance and Governance 4:15 Case Study 4:30 Summary and Discussion 5:00 Adjourn

88 Lessons Learned: Taxonomies for Business Impact
Content is no longer king: the user is Understand how your users/customers want to interact with information before designing your taxonomy and the user interface Carry those user needs through to the back-end data structure and front-end user interface Empower the user with the categories and content attributes they need to filter and find what they want Leverage UE design best practices like usability testing to determine needs and validate taxonomy and interface design Remember that taxonomy is a “snapshot in time”: keep it up to date, let it evolve

89 What is the problem you are trying to solve?
Summary
What is the problem you are trying to solve?
- Improve search (or findability)
- Browse for content on an enterprise-wide portal
- Enable business users to syndicate content
- Otherwise provide the basis for content re-use
- Comply with regulations
What data and metadata do you need to solve it?
Where will you get the data and metadata?
How will you control the cost of creating and maintaining the data and metadata needed to solve these problems?
- A CMS with metadata tagging products
- Semi-automated classification
- Taxonomy editing tools
- An appropriate governance process

90 Agenda 1:30 Welcome and Introductions
1:40 Taxonomy Definitions and Examples 2:10 Business Case and Motivations 2:30 Case Study 2:45 Tagging and Tools 3:00 Break 3:15 Running a Taxonomy Project 3:45 Taxonomy Maintenance and Governance 4:15 Case Study 4:30 Summary and Discussion 5:00 Adjourn

91 Thank you! Theresa Regli Ron Daniel

