Presentation transcript: "Metadata & Taxonomies for a More Flexible Information Architecture"
1 Metadata & Taxonomies for a More Flexible Information Architecture
  Information Architecture Summit, March 16, 2002
  Amy J. Warner, Ph.D.
2 Outline
  What I’ll cover:
  - Metadata and IA.
  - Metadata schema.
  - Vocabulary development.
  Underlying themes:
  - Standards.
  - Reality.
  - Some IR (information retrieval) issues.
3 What is Metadata?
  Metadata is structured data which describes the characteristics of a resource. It shares many similar characteristics to the cataloguing that takes place in libraries, museums and archives.
  (Chris Taylor, University of Queensland)
4 Types & Functions of Metadata
  Introduction to Metadata, Getty Information Institute
5 Confusing Terminology
  Controlled vocabularies:
  - Subject headings: traditionally employed in libraries to tag (index) the topics of books and other library materials.
  - Thesauri: traditionally employed in abstracting & indexing services to tag (index) the topics of journal articles and other scholarly material in a given subject area (e.g., medicine, engineering).
  - Taxonomies: the classification of different organisms into mutually exclusive categories based on ranks such as phylum and species.
7 Metadata & IA
  - Users: determine how target audience(s) search for and use information.
  - Content: identify patterns in content.
  - Business context: determine how stakeholders want to organize and present their information.
8 IA ‘Generations’
  - ‘Brochureware’
  - Pages served from database
  - Metadata-driven website
  - CMS
9 Metadata in Metadata-Driven Websites
  Sample metadata record:
  Author    Title  DocType      Audience   URL
  J. Jones  xxxx   White Paper  Employees
  [Diagram labels: Metadata Records, Content]
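
A record like the one on this slide could be modeled as a small typed structure. The sketch below is only an illustration under stated assumptions: the class name `MetadataRecord`, the Python field names, and the sample URL are hypothetical; only the column labels and the sample values (including the placeholder title "xxxx") come from the slide.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class MetadataRecord:
    """One metadata record describing one content item (fields follow the slide's columns)."""
    author: str
    title: str
    doc_type: str
    audience: str
    url: str  # locator for the content item the record describes


# Values mirror the slide's sample row; the URL is a made-up placeholder.
record = MetadataRecord(
    author="J. Jones",
    title="xxxx",
    doc_type="White Paper",
    audience="Employees",
    url="https://example.com/white-paper",
)

# Field-by-field view, e.g. for loading into a database row.
print(asdict(record))
```

Keeping the record separate from the content it describes is what lets the same item be listed, filtered, or routed by any of its fields.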
10 Two Parts to Generating a Metadata Schema
  - Decisions about indexable parameters (attributes, aspects) of documents; these correspond to fields in the database records.
  - Decisions about the elements (terms, descriptors, subject headings, tags) that these fields contain.
11 Two Possibilities
  Content already exists:
  - Identify content that exists (content inventory).
  Most or all content does not exist:
  - Use ‘wish lists’ to identify desired content.
  To do a content inventory, go to those who are going to develop, own, and maintain the content.
12 Content Analysis
  Look for patterns, similarities:
  - Logical: themes, sensitivity, specialization.
  - Physical: formats, dynamic vs. static (dated vs. rarely updated).
  Look for relationships: note connections between content (parent-child, sibling, dependencies).
  Begin to create groupings.
13 Generating a Metadata Table
  The beginning of a metadata-driven website.
  - Determine the major indexable parameters or attributes for each major document type in your sample.
  - Determine what major types of rules or general guidelines your indexing system will follow for each attribute.
  - Create an X-by-Y table.
  - Put indexable attributes on the X axis and the rules on the Y axis.
  - Fill in the decisions you make about each rule application in the individual cells of the table.
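
The X-by-Y table described on this slide might be sketched as a nested mapping, with attributes across the top and one row per rule. Everything specific here is an assumption made for illustration: the attribute list is borrowed from the sample record on slide 9, and the three rules and their yes/no decisions are invented, not taken from the talk.

```python
# Attributes (X axis) and rules (Y axis); all names and decisions are illustrative assumptions.
attributes = ["Author", "Title", "DocType", "Audience"]
rules = ["Required?", "Controlled vocabulary?", "Repeatable?"]

# Each cell records the decision for one rule applied to one attribute.
decisions = {
    "Required?": {"Author": "yes", "Title": "yes", "DocType": "yes", "Audience": "no"},
    "Controlled vocabulary?": {"Author": "no", "Title": "no", "DocType": "yes", "Audience": "yes"},
    "Repeatable?": {"Author": "yes", "Title": "no", "DocType": "no", "Audience": "yes"},
}


def render_table(attributes, rules, decisions):
    """Return the X-by-Y table as plain text: a header row, then one row per rule."""
    width = max(len(name) for name in attributes + rules) + 2
    header = "".ljust(width) + "".join(a.ljust(width) for a in attributes)
    rows = [
        rule.ljust(width) + "".join(decisions[rule][a].ljust(width) for a in attributes)
        for rule in rules
    ]
    return "\n".join([header] + rows)


table = render_table(attributes, rules, decisions)
print(table)
```

Filling every cell forces an explicit decision for each attribute and rule pair, which is the point of the exercise.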
15 User and Stakeholder Involvement
  - When organizing content, start with the content, generate the metadata, and then evaluate with users and stakeholders.
  - When organizing entities (e.g., products, projects) where content is not the major focus, start with stakeholders and users to determine metadata.
16 Identify Terms
  - Published reference materials: thesauri, classification schemes, encyclopedias, dictionaries, glossaries, indexes.
  - Content: representative sample of web site / intranet.
  - Users: search log analysis, surveys, interviews.
  - Experts: authors, subject experts.
17 Organize Terms
  - Define preferred terms.
  - Link synonyms and variants. (Synonym rings)
  - Group preferred terms by subject.
  - Identify broader and narrower terms. (Taxonomies / hierarchies)
  - Identify related terms. (Thesauri)
18 Variant Terms
  Variant terms provide the user with entry points into the vocabulary.
  Synonyms (same meaning):
    cats USE felines
    helicopters USE whirlybirds
  Lexical variants (different word forms):
    paediatrics USE pediatrics
    BK USE Burger King
  Quasi-synonyms (treated as equivalent):
    generic posting: beagle USE dog
    antonyms/continuum: wetness USE dryness
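
The USE references on this slide can be read as a lookup from variant term to preferred term. The following is a minimal sketch, assuming case-insensitive matching and pass-through for unknown terms (both choices are assumptions, as is the function name `preferred_term`); the term pairs themselves are copied from the slide.

```python
# Entry vocabulary: variant term (lowercased) -> preferred term, as listed on the slide.
USE = {
    "cats": "felines",
    "helicopters": "whirlybirds",
    "paediatrics": "pediatrics",
    "bk": "Burger King",
    "beagle": "dog",
    "wetness": "dryness",
}


def preferred_term(term: str) -> str:
    """Resolve a user's term to its preferred term; unknown terms pass through unchanged."""
    cleaned = term.strip()
    return USE.get(cleaned.lower(), cleaned)


for entry in ["Paediatrics", "BK", "beagle", "forests"]:
    print(entry, "->", preferred_term(entry))
```

A table like this is what lets a user who types any of the variants land on the single term used for tagging.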
19 Term Specificity
  Assuming a good entry vocabulary, increased term specificity allows for improved precision without hurting recall (but costs grow fast).
  Vocabulary A: United States
  Vocabulary B: United States, California, San Diego
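
The precision and recall claim can be checked on a toy collection. This is a sketch under stated assumptions: the three documents, their subjects, and the user's information need are invented for illustration, and the entry vocabulary is assumed to send a San Diego query to "United States" in Vocabulary A and to "San Diego" in Vocabulary B.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple:
    """Precision = hits / retrieved; recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical documents: id -> the place each one is actually about.
ABOUT = {"d1": "San Diego", "d2": "California", "d3": "United States"}

# Vocabulary A has a single term, so every document receives the same tag.
TAGS_A = {doc: "United States" for doc in ABOUT}
# Vocabulary B has all three terms, so each document receives its most specific tag.
TAGS_B = dict(ABOUT)


def search(tags: dict, term: str) -> set:
    """Return the ids of documents tagged with exactly this term."""
    return {doc for doc, tag in tags.items() if tag == term}


relevant = {"d1"}  # the user wants material about San Diego
result_a = precision_recall(search(TAGS_A, "United States"), relevant)
result_b = precision_recall(search(TAGS_B, "San Diego"), relevant)
print("Vocabulary A:", result_a)
print("Vocabulary B:", result_b)
```

In this toy case both vocabularies find the relevant document, but only the more specific one avoids returning the two unrelated documents alongside it. The cost the slide mentions shows up as the extra terms that have to be defined and applied.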
20 Compound Terms
  Article title: “Software for Information Architects”
21 Facets
  Facets of a topic: things (entities), concepts, processes, people, organizations, occupations, etc.
  Facets of documents: topic, audience, intellectual level, form, type, language, date.
  [Diagram labels: Aspects of Documents to Index, Controlled Vocabular(ies)]
22 Facet Analysis
  - Facets come from content inventory, intuition, and users.
  - Break domain into logical categories or chunks based on how documents need to be managed (both for system and for search).
23 Polyhierarchy
  Strict hierarchies:
  - Each term appears in only one place in the hierarchy.
  - Essential for placement of physical objects.
  Polyhierarchies:
  - Terms cross-listed in multiple categories.
  - Accepts complex nature of reality.
24 Polyhierarchy
  Medical Subject Headings (MeSH):
  - Compound terms needed to manage 6 million documents in Medline.
  - High level of pre-coordination forces polyhierarchy.
  - Terms may have more than one BT.
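
The difference between a strict hierarchy and a polyhierarchy comes down to whether a term may have more than one broader term (BT). The sketch below illustrates that with a made-up vocabulary; the vehicle terms are hypothetical and are not MeSH headings, and the function names are assumptions.

```python
# term -> list of broader terms (BT). Hypothetical vocabulary for illustration only.
BT = {
    "Electric cars": ["Cars", "Electric vehicles"],  # two BTs: a polyhierarchy
    "Cars": ["Vehicles"],
    "Electric vehicles": ["Vehicles"],
    "Vehicles": [],
}


def ancestors(term: str, bt: dict = BT) -> set:
    """Collect every broader term reachable from `term`, following all BT links."""
    found = set()
    stack = list(bt.get(term, []))
    while stack:
        broader = stack.pop()
        if broader not in found:
            found.add(broader)
            stack.extend(bt.get(broader, []))
    return found


def is_strict_hierarchy(bt: dict = BT) -> bool:
    """True only if no term has more than one broader term."""
    return all(len(parents) <= 1 for parents in bt.values())


print(sorted(ancestors("Electric cars")))
print("strict hierarchy:", is_strict_hierarchy())
```

Because "Electric cars" sits under two parents, a user browsing down from either "Cars" or "Electric vehicles" reaches it, which is the cross-listing the slides describe.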
25 Facets, Coordination, Specificity
26 Semantic Relationships
  Equivalence: Use/Used For (USE/UF)
  - Leads from variants to the preferred term.
  - e.g., prams USE baby carriages
27 Semantic Relationships
  Hierarchical: Broader Term/Narrower Term (BT/NT)
  Types:
  - Generic (class/species, inheritance): Vertebrata NT Amphibia
  - Whole-part (associative unless exclusive): Ear NT Vestibular Apparatus
  - Instance (proper name): Seas NT Mediterranean Sea
28 Semantic Relationships
  Associative: Related Term (RT, See Also)
  - Non-hierarchical and non-equivalent.
  - Relation should be “strongly implied”.
  - e.g., hammers RT nails
29 Associative Relationships
  - Field of study and object of study: Forestry RT Forests
  - Process and its agent: Temperature Control RT Thermostat
  - Concepts and their properties: Poisons RT Toxicity
  - Action and product of action: Weaving RT Cloth
  - Concepts linked by causal dependence: Bereavement RT Death
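
The three relationship types from the preceding slides (equivalence, hierarchical, associative) can be held in one small structure that keeps each link reciprocal. This is a sketch rather than a reference design: the class `Thesaurus` and its method names are assumptions, while the example terms are taken from the slides.

```python
from collections import defaultdict


class Thesaurus:
    """Minimal store for USE/UF, BT/NT and RT links, each recorded in both directions."""

    def __init__(self):
        self.use = {}                     # variant -> preferred term (USE)
        self.used_for = defaultdict(set)  # preferred term -> variants (UF)
        self.broader = defaultdict(set)   # term -> broader terms (BT)
        self.narrower = defaultdict(set)  # term -> narrower terms (NT)
        self.related = defaultdict(set)   # term -> related terms (RT)

    def add_equivalence(self, variant: str, preferred: str) -> None:
        self.use[variant] = preferred
        self.used_for[preferred].add(variant)

    def add_hierarchy(self, broader: str, narrower: str) -> None:
        self.narrower[broader].add(narrower)
        self.broader[narrower].add(broader)

    def add_related(self, a: str, b: str) -> None:
        # RT is symmetric, so store it from both sides.
        self.related[a].add(b)
        self.related[b].add(a)


t = Thesaurus()
t.add_equivalence("prams", "baby carriages")
t.add_hierarchy("Vertebrata", "Amphibia")
t.add_hierarchy("Seas", "Mediterranean Sea")
t.add_related("hammers", "nails")
t.add_related("Forestry", "Forests")

print(t.use["prams"])
print(sorted(t.broader["Amphibia"]))
print(sorted(t.related["nails"]))
```

Storing the reciprocal at write time means a lookup from either end of a relationship needs no extra search.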
30 Leveraging the Thesaurus
  User interface:
  - Generate browsable indexes (site-wide, sub-site, specialized authority lists).
  - Enable field-specific searching (filters, zones, sorting).
  - Support personalization (map profile to vocabulary).
  Behind the scenes:
  - Enable efficient content management.
  - Support decentralized tagging.
31 Uses of Metadata-Driven Website
  - Routing
  - Search
  - Navigation
32 Routing
  Document stream (from individual contributors or syndication service) -> metadata filter (profile or filter) -> document subset
33 Generalizations about Routing
  - Can be ‘push’ or ‘pull’.
  - Can be driven by various metadata elements (e.g., audience, topic, etc.).
  - May have both internal and external metadata schemes to consider; mapping may be an important issue.
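
Routing as described on these two slides amounts to passing a document stream through a metadata filter defined by a profile. A minimal sketch follows, assuming documents are plain dictionaries and a profile maps each metadata element to the set of values it accepts; the function name `route`, the sample documents, and the profile are all hypothetical.

```python
def route(document_stream, profile):
    """Yield only the documents whose metadata matches every element named in the profile."""
    for doc in document_stream:
        if all(doc.get(field) in allowed for field, allowed in profile.items()):
            yield doc


# Hypothetical incoming documents with audience and topic metadata.
stream = [
    {"title": "Benefits update", "audience": "Employees", "topic": "HR"},
    {"title": "Quarterly results", "audience": "Investors", "topic": "Finance"},
    {"title": "Expense policy", "audience": "Employees", "topic": "Finance"},
]

# Hypothetical profile: employee-facing material on HR or Finance.
profile = {"audience": {"Employees"}, "topic": {"Finance", "HR"}}

subset = list(route(stream, profile))
print([doc["title"] for doc in subset])
```

The same filter serves 'push' (run it as documents arrive) or 'pull' (run it when the user asks). If the incoming stream uses an external metadata scheme, its values would need mapping to the internal vocabulary before the comparison.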
34 Searching
  [Diagram: User Query, Metadata Records, Databases, Document Subset (http://….)]
42 Generalizations about Search & Navigation
  - The relationship between the metadata and search engine capabilities is crucial.
  - Controlled vocabulary and keyword searching are often both enabled.
  - Navigation and search are often both provided as complements to each other.
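
Enabling controlled-vocabulary and keyword searching side by side could look like the sketch below. It assumes each metadata record carries a free-text title and a list of controlled subject terms. The records, the URLs, and the function names are hypothetical; the single USE entry reuses the "prams" example from slide 26.

```python
# Entry vocabulary: variant -> preferred term.
USE = {"prams": "baby carriages"}

# Hypothetical metadata records; titles and URLs are made up for illustration.
RECORDS = [
    {"title": "Choosing a pram for city streets", "subjects": ["baby carriages"], "url": "https://example.com/a"},
    {"title": "Stroller safety checklist", "subjects": ["baby carriages"], "url": "https://example.com/b"},
    {"title": "Car seat installation", "subjects": ["car seats"], "url": "https://example.com/c"},
]


def controlled_search(query, records=RECORDS, use=USE):
    """Map the query to its preferred term, then match it against the subject field."""
    term = use.get(query.lower(), query.lower())
    return [r["url"] for r in records if term in r["subjects"]]


def keyword_search(query, records=RECORDS):
    """Plain substring match against the free-text title."""
    q = query.lower()
    return [r["url"] for r in records if q in r["title"].lower()]


def search(query, records=RECORDS, use=USE):
    """Union of both searches: controlled-vocabulary hits first, then new keyword hits."""
    results = controlled_search(query, records, use)
    for url in keyword_search(query, records):
        if url not in results:
            results.append(url)
    return results


print(search("prams"))
print(search("checklist"))
```

The controlled search finds the stroller checklist for a "prams" query even though the word never appears in its title, while the keyword search still catches words such as "checklist" that the vocabulary does not cover.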
43 Contact: Amy J. Warner, Ph.D.
  Questions? firstname.lastname@example.org