Alternatives to Metadata IMT 589 February 25, 2006
IMT589 - Applied and Structural Metadata
Ways to Express Meaning: for people & machines
  [Figure: diagram of ways to express meaning; only the labels survive]
  Labels: Terms; General Logic
  Groupings: Glossaries / Controlled Vocabularies; Informal Taxonomies and Thesauri; Data and Document Metamodels; Formal Knowledge Bases & Inference
  Examples: ‘ordinary’ Glossaries; structured Glossaries; Data Dictionaries (EDI); ad hoc Hierarchies (Yahoo!); Thesauri; Principled, informal taxonomies; formal Taxonomies; XML DTDs; XML Schema; DB Schema; Data Models (UML, STEP); Frames (OKBC); Restricted Logics (OWL, Flogic)
  Michael Uschold. Copyright © 2004 Boeing. All rights reserved. Boeing Technology | Phantom Works | E&IT | Mathematics and Computing Technology
Web of Trust?
  Jenkins article describes RDF as method for achieving “Web of Trust”
  After this quarter, do you see any barriers to this vision?
  How far did the team in this article get toward that vision?
  Do you think the “keyword” element is sufficient to establish the vision?
Generic vs. Specific Ontologies
  [Figure: ontology diagram; only the labels survive]
  Layers: Upper Ontology; Domain Ontology
  Concepts: Thing; Individual; Spatial Thing; Temporal Thing; Event; Collection; Hydraulic System; Fuel System; Pumping; Hydraulic Pump; Aircraft Engine Driven Pump; Pump; Mechanical Device; Engine; Jet Engine; Fuel Pump; Fuel Filter
  Relationships: has-part; done-by; part-of; connected-to; supplies-fuel-to
  Legend: Generalization; Other Relationships
  Figure credit: Michael Uschold. Copyright © 2004 Boeing.
Automatic Indexing
  Rule-based systems
    Legacy from early AI days
    Require intensive upfront effort to build
    Usually quite domain-specific
    Don’t tend to scale well
  Bayesian
    Relies on similar document types for good success
    Requires a training sequence
    Problems with scaling again
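A rough illustration of the Bayesian approach listed on the Automatic Indexing slide: a minimal naive Bayes sketch that learns from a small training sequence and then assigns an index term to new text. The training documents, the index terms, and the function names `train` and `assign_term` are all invented for illustration; they do not come from any system discussed in class.

```python
import math
from collections import Counter, defaultdict

def train(labeled_docs):
    """Count words per index term from (text, term) training pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, term in labeled_docs:
        doc_counts[term] += 1
        word_counts[term].update(text.lower().split())
    return word_counts, doc_counts

def assign_term(text, word_counts, doc_counts):
    """Return the index term with the highest log posterior for the text."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(doc_counts.values())
    best_term, best_score = None, float("-inf")
    for term in doc_counts:
        # Prior: share of training documents carrying this term
        score = math.log(doc_counts[term] / total_docs)
        total_words = sum(word_counts[term].values())
        for word in text.lower().split():
            # Add-one smoothing so an unseen word does not rule a term out
            score += math.log(
                (word_counts[term][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_term, best_score = term, score
    return best_term

# Hypothetical training sequence (invented for illustration)
TRAINING = [
    ("hydraulic pump pressure failure", "Hydraulic System"),
    ("hydraulic fluid leak in pump", "Hydraulic System"),
    ("fuel filter clogged", "Fuel System"),
    ("fuel pump supplies fuel to engine", "Fuel System"),
]

word_counts, doc_counts = train(TRAINING)
print(assign_term("leak in hydraulic line", word_counts, doc_counts))
```

The sketch also shows why similar document types matter: the classifier only knows the vocabulary it saw during training, so text from an unrelated domain is scored almost entirely by the smoothing term.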
Automatic Indexing
  Natural language approaches
    Require sophisticated processing techniques to obtain word matches
    Highly compute-intensive
    Again, problems with scaling
  Other approaches
    Clustering algorithms
    Latent Semantic Indexing
Another Example of Cost
  Johns Hopkins study baselined manual cleanup of author names: 7 minutes per name
  Automatic cleanup took 8 seconds per record but was only successful 58% of the time
  Conclusion: automated tools are a good assist, but not a solution
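The "good assist, but not a solution" conclusion can be checked with some quick arithmetic. The sketch below uses the three figures from the slide and adds two assumptions that are not in the study as quoted here: each record holds one author name, and every record the automated pass fails on is then cleaned up by hand at the full manual rate.

```python
MANUAL_SECONDS = 7 * 60    # 7 minutes per name (from the slide)
AUTO_SECONDS = 8           # 8 seconds per record (from the slide)
AUTO_SUCCESS_RATE = 0.58   # automated success rate (from the slide)

def expected_seconds_per_record(manual=MANUAL_SECONDS,
                                auto=AUTO_SECONDS,
                                success=AUTO_SUCCESS_RATE):
    """Assumed workflow: every record gets an automated pass,
    and the failures fall back to manual cleanup."""
    return auto + (1 - success) * manual

hybrid = expected_seconds_per_record()
print(f"manual only:      {MANUAL_SECONDS} s per record")
print(f"automated assist: {hybrid:.1f} s per record")
print(f"time saved:       {1 - hybrid / MANUAL_SECONDS:.0%}")
```

Under those assumptions the assisted workflow averages about 184 seconds per record instead of 420, a saving of roughly 56%. The manual fallback still accounts for almost all of the remaining time, which is why the tool helps but does not remove the human effort.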
Google
  Uses inherent characteristics of HTML markup to build associations
  Relies on human linking for relevance
  Enhances with markup characteristics
  New approach, based on widespread adoption of a simple standard
  Relies on large body of self-referring content for success
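To make "relies on human linking for relevance" concrete, here is a simplified PageRank-style iteration over a toy link graph. This is a teaching sketch, not Google's production ranking: the slide does not name the algorithm, the damping factor of 0.85 is a conventional assumption, and the four-page graph and the function name `link_rank` are invented for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by inbound links with a simplified PageRank-style iteration."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split across its outbound links
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outbound links spreads its score evenly
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Hypothetical link graph: page -> pages it links to
LINKS = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

ranks = link_rank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Page "c" comes out on top because three other pages link to it, and "a" follows because the highly ranked "c" links to it. The method only works when there are plenty of links to count, which ties back to the last point on the slide about needing a large body of self-referring content.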
Semantic Web
  Ambitious undertaking to provide context for everything
  Example of automated metadata generation dependent on existing classification scheme
  High processing overhead for large quantities
  Probably not sufficient for precise access in local content sets
  Shirky’s cautions reflect the realities of the world, but it’s a noble goal
Where Does Metadata Fit?
  "We tend to think that the hard problems are the big ones. So we believe that searching the Web is hard because it's so huge. But I've been thinking lately that the really hard problems are actually the ones in the middle. In the middle, many algorithms don't work that well with moderate document sets, context becomes more important, interaction is critical, and you can't get the user "in the ballpark" anymore--you have to get them right to the thing they're looking for."
  Karl Fast
Last Words
  All MSIM students are experts in Information Management
  All experts in Information Management love Metadata
  Therefore, all MSIM students love Metadata
  Randy Pinol, IMT589, 2006