The Future of Metadata Denise Bedford World Bank Presentation to Fall Metadata Forum November 2, 2005 Department of Homeland Security.


1 The Future of Metadata Denise Bedford World Bank Presentation to Fall Metadata Forum November 2, 2005 Department of Homeland Security

2 Meta-Future
Most of our information use and access today is based on an anonymous access model
It is increasingly clear that anonymous access to information and the packaging of information for single-use contexts is neither sufficient for users nor an efficient use of development/engineering resources
We need to think in terms of contextualization and sensitization of information so that it can be used in any context where it pertains
In the future, information will flow – information, not the systems in which it lives or was created, will be our focus
Information needs to be agile and mobile – it needs to be sensitized to the contexts in which it might be used, to the interests of those who might use it, and to the applications that might consume it

3 Meta-Future
Envision a future like that described in the Netcentric Information Models formulated by the Dept. of Defense
Information is created, tagged, posted and shared
Any applications or users can – according to security privileges – use any information they can find, in any application they need to use to do their work
Technology becomes increasingly invisible but more logic based
More and different kinds of information such as reference sources need to be managed and maintained
This meta-future is heavily dependent upon the existence of rich, conceptual, sensitized, meaningful metadata
This future is now – it is simply a practical view of the Semantic Web

4 The problem with metadata
This future sounds wonderful and the contextualization vision is exciting but there’s just one problem… metadata
Metadata…
–Is expensive and time consuming to create
–Is sometimes subjective and not granular enough
–Doesn’t always address the ways that users and systems think about the information it describes
–May not tell us enough about the information to trust it
–May address only one context – the context for which it is created
–May live only in the source application where it was created
–May not be as accessible as the information asset
If a Meta-Future depends on metadata, we have to solve these problems

5 The problem with technologies
Many of the tools are so tightly integrated, you might generate rich metadata, but it will not make your information agile or mobile
Statistical clustering engines do not get us to persistent meaning or contextualization. Clustering engines are great for thresholding or pattern tracings, but they will not generate the kind of metadata we need to realize this future
We need semantic engines at the base of all our metadata efforts, and these engines need to be available in multiple languages -- semantics vary by language
Magic black box approaches are neither meaningful nor sustainable -- you need to have access to the programs through a user-friendly interface so you can adapt them to your environment without having to have programming knowledge
You need to have several different kinds of technologies to do what I’m going to describe today – not just one tool

6 Understanding the Dimensions of Contextualization
Diagram labels: Content Dimension User Dimension Information Diffusion (Context Sensitive – Group) Information Gathering & Transformation (Context Sensitive – Person) Topic Scheme Business Activity Scheme Centralized Collections Content Elements & Structure (XML) Content Metadata Ideas & Tacit Knowledge Content Quality Management Topic Thesaurus Anonymous Access (Context Free) Institutional Roles Institutional Profiles Communities Of Practice Communities SDI Social Groups Social Group Profiles Individual Profiles Individual Profiles Browsing Parametric Searching By Source Searching By Tools Programmatic Metadata Capture Results Clustering Text Classification Personal SDI Social Group SDI Individual Discovery Individual Learning Task Oriented SDI Directories of Expertise Concept Filtering Threshold Filtering User-User Profile Matching Sense Making Content Repurposing Collaborative Filtering Content Aggregation Recommender Engines Publishing Syndication Engines Business Process Awareness Community Building Social Filtering Knowledge Sharing Advisory Services Q&A Systems Concept Extraction Task Filtering Results Sorting Searching Country Scheme Region Scheme Bank’s Business Language Collection Development Policy Translation Systems Organizational Entities Client Profiles Partner Profiles Authorization Rules Authentication Rules Metadata Management Context Dimension Workflow Management Online Training

7 Vision of Contextualization
We need to address metadata challenges not in a traditional way but in the future context – with the idea that metadata is contextualizable and sensitized – to support information agility and mobility
In order to achieve contextualization you need to have ‘extreme metadata’
–Metadata about the information
–Metadata about the user
–Metadata about the context
–Rich metadata designed to meet many functional requirements
–Metadata in multiple languages
Metadata needs to be ‘interpretable’ for and in a context
–Reference sources not only for traditional metadata but for all of the relationships and logic that are present in an ontology (simply different kinds of taxonomy representations)
–Metadata must reflect any context or interest that a user might express
–Still need to have some control over metadata in order to make it understandable in different contexts

8 New View of Ontology
Diagram labels: Content Entity1 Content Elements Content Metadata Topic Class Scheme Business Process Scheme Thesaurus Country Names Region Names Skill Sets/Competencies Standard Statistical Variables Has values uses Has Contains User Has relationship to Has Meaning in Context Contextual Matrix & Sensing Contextual Logic uses Hierarchy Flat Taxonomy Network Taxonomy Profile Has Business Rule Logic Has values Content Parts Has Metadata Has Faceted Taxonomy Ring Taxonomy People Referenced Orgs Referenced Metadata

9 Getting to Rich Metadata
Given the future demand for rich, contextualizable metadata, and all of the traditional drawbacks… how will we achieve this future?
We need to look for a different model for creating and sustaining metadata and reference sources
We need to teach technologies how to capture the metadata we need and how to maintain our reference sources
I’d like to show you an example of how we might achieve that future
Please keep in mind that I’m showing you an example of what is possible – Enterprise Search, Authority Control/Entity Discovery

10 Fueling Semantic Search With Metadata
Or… if Metadata is Dead, Semantic Web and Semantic Search Are Dead

11 Flat taxonomy Hierarchical taxonomy Ring taxonomy Fielded Search = Faceted Taxonomy

12 Ring Taxonomy Network Taxonomy Metadata

13 More explicit View of faceted taxonomy

14 Building and Maintaining Taxonomies
Moving towards automated metadata generation means that catalogers shift their effort to reviewing the metadata generated and to more fully developing and maintaining subject headings/thesauri and classification schemes as part of a suite of categorization tools
Level of effort shifts to training and developing the tools and away from original cataloging and metadata capture
Continue to work closely with subject experts to define the controlled vocabularies and classification schemes
It means that you have to have a metadata infrastructure that looks something like that ontology we just reviewed
There is no silver bullet ontology tool out there that will do this work for you – your knowledge and skills are critical

15 Metadata Capture Methods Identification/ Distinction Use Management Compliant Document Management Human Capture Programmatic Capture Inherit from System Context Extrapolate from Business Rules Search & Browse

16 Smart Use of Technologies
Sample structure – Bank Topics Classification Scheme (hierarchical taxonomy)
–Oracle data classes used to represent Topic Classification scheme
  → hierarchical taxonomy as reference source for the attribute – Topic
  → used for Browse, Search, Content Syndication, Personalization
–1st challenge is to architect the hierarchy correctly
  → 3 distinct data classes, not a tree structure with inheritance
  → Allows you to use the three data classes for distinct functions across systems but still enforce relationships across the classes
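One way to picture the "3 distinct data classes" design is as three separate tables linked by enforced references rather than one self-referencing tree. The sketch below is only an illustration under stated assumptions: it uses SQLite from the Python standard library in place of Oracle, and the table names, column names and sample values are hypothetical, not taken from the Bank's schema.

```python
import sqlite3

# Hypothetical schema: three separate classes, each usable on its own,
# with foreign keys enforcing the relationships across them.
SCHEMA = """
CREATE TABLE topic (
    topic_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE
);
CREATE TABLE subtopic (
    subtopic_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    topic_id    INTEGER NOT NULL REFERENCES topic(topic_id)
);
CREATE TABLE subsubtopic (
    subsubtopic_id INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    subtopic_id    INTEGER NOT NULL REFERENCES subtopic(subtopic_id)
);
"""


def build_topic_store():
    """Create an in-memory store with one illustrative branch of the scheme."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    # Sample values are made up for the example.
    conn.execute("INSERT INTO topic VALUES (1, 'Education')")
    conn.execute("INSERT INTO subtopic VALUES (10, 'Primary Education', 1)")
    conn.execute("INSERT INTO subsubtopic VALUES (100, 'Teacher Training', 10)")
    conn.commit()
    return conn


def topic_path(conn, subsubtopic_name):
    """Return (topic, subtopic, subsubtopic) for a sub-subtopic name, or None."""
    return conn.execute(
        """
        SELECT t.name, s.name, ss.name
        FROM subsubtopic ss
        JOIN subtopic s ON s.subtopic_id = ss.subtopic_id
        JOIN topic t ON t.topic_id = s.topic_id
        WHERE ss.name = ?
        """,
        (subsubtopic_name,),
    ).fetchone()
```

Each table can be queried by itself (for example, a browse screen that only needs topics), while an attempt to insert a subtopic that points at a non-existent topic is rejected by the database.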

17 Relationships across data classes 3 Oracle Data classes

18 Topic data class

19 Subtopic Data Class

20 Subsubtopic Data class

21 Categorizing and Indexing Content
Let’s look at how we’re categorizing our content to this structure automatically
Topic classification, geographical region assignment, keywording examples
Can apply this approach to any kind of content
Enables us to build a robust metadata repository model, with strong metadata quality, to move towards SI at the functional level
Also note that we can do this across many languages

22 Semantic Analysis Using The Technologies to Best Advantage
Semantic analysis tools which support concept extraction, categorization, summarization and pattern matching rules engines
Teragram works in 23 languages
Use categorization to capture Topics, Business Activities, Regions, Sectors, Themes, etc.
Use Concept Extraction to capture keywords
Use Rules Engine to capture Loan #, Credit #, Project ID, Trust Fund #, etc.
Use Summarization to generate a ‘gist’ of the content

23 How does semantic analysis work?

24 Semantic Analysis Basics
Once you have made some sense of the sentence (decompose), reconstruct entities for information extraction (compose)
–Identify names and other fixed form expressions – people, organizations, actions, relationships, places
–Identify basic noun groups, verb groups, formatting elements, logic statements
–Construct complex noun groups and verb groups
–Identify event structures
–Identify common elements and associate

25 Leveraging the Topic Structure
Each subtopic is a knowledge domain (hierarchical taxonomy)
Each subtopic has an extensive concept level definition (1,000 – 5,000+ concepts)
Concepts are controlled vocabularies in their raw form (flat taxonomy)
Concepts with relationships (extensive per new Z39.19 standard) comprise a semantic network (network taxonomy)
Categorization tools work with topic structure & concept definitions to categorize and index content
The following screen illustrates how that same structure is embedded into the Teragram profile to support categorization
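To make the idea of categorizing against concept definitions concrete, here is a minimal sketch in which each subtopic carries a flat list of concepts and a document is assigned to the subtopics whose concepts it mentions often enough. This is not how Teragram works internally; the subtopic names, concept lists and the `min_hits` threshold are all assumptions made for the illustration, and a real definition would hold thousands of concepts.

```python
import re
from collections import Counter

# Hypothetical subtopics with tiny concept lists (flat controlled vocabularies).
SUBTOPIC_CONCEPTS = {
    "Primary Education": {"primary school", "teacher training", "enrollment"},
    "Water Supply": {"water supply", "sanitation", "irrigation"},
}


def categorize(text, min_hits=2):
    """Return subtopics whose concepts occur at least min_hits times, best first."""
    normalized = re.sub(r"\s+", " ", text.lower())
    scores = Counter()
    for subtopic, concepts in SUBTOPIC_CONCEPTS.items():
        for concept in concepts:
            # Whole-phrase match, so a concept is not counted inside a longer word.
            pattern = r"\b" + re.escape(concept) + r"\b"
            scores[subtopic] += len(re.findall(pattern, normalized))
    return [name for name, hits in scores.most_common() if hits >= min_hits]
```

For a sentence that mentions teacher training, enrollment and primary school but sanitation only once, the sketch returns only "Primary Education", because "Water Supply" stays below the threshold.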

26 Subtopics Domain concepts or controlled vocabulary

27 Extensive operators allow us to write grammatical rules to manage typical semantic problems

28 Concept based rules engine allows us to define patterns to capture other kinds of data

29 Example of use of Authority Control to capture country names but extract ‘authorized’ version of country name
Example of use of a gazetteer + concept extraction + rules engine to support semantic interoperability
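The authority-control idea can be sketched as a small gazetteer that maps every variant spelling found in text to a single authorized form. The entries and the choice of authorized names below are hypothetical examples, not the Bank's reference source, and the matching is plain pattern matching rather than the concept extraction shown on the slide.

```python
import re

# Hypothetical gazetteer: variant (lower-cased) -> authorized form.
GAZETTEER = {
    "kyrgyzstan": "Kyrgyz Republic",
    "kyrgyz republic": "Kyrgyz Republic",
    "viet nam": "Vietnam",
    "vietnam": "Vietnam",
    "burma": "Myanmar",
    "myanmar": "Myanmar",
}

# Longest variants first so multi-word names are tried before shorter ones.
_VARIANTS = sorted(GAZETTEER, key=len, reverse=True)
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(v) for v in _VARIANTS) + r")\b",
    re.IGNORECASE,
)


def extract_countries(text):
    """Return authorized country names mentioned in text, in order, without repeats."""
    found = []
    for match in _PATTERN.finditer(text):
        authorized = GAZETTEER[match.group(1).lower()]
        if authorized not in found:
            found.append(authorized)
    return found
```

Whatever variant an author used, the metadata record ends up holding the same authorized value, which is what lets different systems agree on what a country field means.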

30 Use of concept extraction + rules engine to capture Loan #, Credit #, Project ID#
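Rule-based capture of identifiers can be approximated with pattern rules. The sketch below assumes identifier shapes purely for illustration (a "P" followed by six digits for a project ID, "TF" plus digits for a trust fund, and a number after the words "Loan" or "Credit"); the slides do not state the real formats, and a production rules engine would combine patterns with concept context.

```python
import re

# Assumed identifier shapes, for illustration only.
RULES = {
    "project_id": re.compile(r"\bP\d{6}\b"),
    "loan_number": re.compile(r"\bLoan\s+(?:No\.?\s*)?(\d{4,5})\b", re.IGNORECASE),
    "credit_number": re.compile(r"\bCredit\s+(?:No\.?\s*)?(\d{4,5})\b", re.IGNORECASE),
    "trust_fund_number": re.compile(r"\bTF\d{5,6}\b"),
}


def capture_identifiers(text):
    """Return {field: [values]} for every rule that matches, without repeats."""
    found = {}
    for field, pattern in RULES.items():
        values = []
        for match in pattern.finditer(text):
            # Rules with a capture group keep only the number; others keep the whole match.
            value = match.group(1) if pattern.groups else match.group(0)
            if value not in values:
                values.append(value)
        if values:
            found[field] = values
    return found
```

Fields with no match are simply left out of the result, so downstream code can tell "not found" apart from an empty value.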

31

32

33 Overview of Process & Tools
Activity | Approach | Tools
Create new facet | Human review & consultation, data structures, governance | Oracle DBMS, in future Metadata Repository tools (ISO 11179); Oracle representation of data classes
Create new class | Human review & harmonization of existing information structures; tool based discovery of new structures through clustering & extraction | Teragram dynamic concept extraction using grammars, categorization, clustering; Oracle representation of data classes
Create new concept | Create training sets working with experts, identify & integrate existing vocabularies | Teragram concept extraction, Oracle representation of values
Create new relationship | Human relationship creation, augmented by technological discovery | Teragram clustering engine, MultiTes Thesaurus Management System, Oracle copy of thesaurus relationships
Create new metadata | Enterprise Profile Development with human review in some cases, no review in others; Metadata in the language of the document/content | Teragram enterprise profile leveraging concept extraction, categorization, and summarization

34 Enterprise Profile Development & Maintenance
Diagram labels: Enterprise Metadata Profile Concept Extraction Technology Country Organization Name People Name Series Name/Collection Title Author/Creator Title Publisher Standard Statistical Variable Version/Edition Categorization Technology Topic Categorization Business Function Categorization Region Categorization Sector Categorization Theme Categorization Rule-Based Capture Project ID Trust Fund # Loan # Credit # Series # Publication Date Language Summarization e-CDS Reference Sources for Country, Region, Topics Business Function, Keywords, Project ID, People, Organization Data Governance Process for Topics, Business Function, Country, Region, Keywords, People, Organizations, Project ID Teragram Team TK240 Client ISP IRIS ImageBank Factiva JOLIS E-Journals Enterprise Profile Creation and Maintenance UCM Service Requests Update & Change Requests

35 Enterprise Metadata Capture – Functional Reference Model
Diagram labels: ImageBank Integration Content Capture ISP Integration Enterprise Profile Development & Maintenance XML Wrapped Metadata Dedicated Server – Teragram Semantic Engine – Concept Extraction, Categorization, Clustering, Rule Based Engine, Language Detection APIs & Integration APIs & Integration Content Capture XML Wrapped Metadata Factiva Metadata Database IRIS Integration APIs & Integration Enterprise Metadata Capture Strategy TK240 Client XML Output e-CDS Reference Sources APIs & Technical Integration Content Owners Business Analyst IDU Indexers SITRC Librarians IRIS Functional Team
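The "XML Wrapped Metadata" boxes in the model suggest that captured values travel between systems as an XML record. A minimal sketch of such wrapping follows; the `metadata` root element and the field names are hypothetical, since the slides do not show the actual XML layout.

```python
import xml.etree.ElementTree as ET


def wrap_metadata(record):
    """Serialize a record (field -> str or list of str) as an XML string.

    Element names are taken directly from the record's keys, so the keys
    must be valid XML names. The 'metadata' root is an assumed name.
    """
    root = ET.Element("metadata")
    for field, value in record.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            # Repeatable fields become repeated elements.
            ET.SubElement(root, field).text = item
    return ET.tostring(root, encoding="unicode")
```

Because the record is plain XML, any consuming application can parse it back without knowing which engine produced the values.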

36 Impacts & Outcomes
Information Access impacts
–Increased precision of search
–Better control over recall
–Searching like we talk
–Exact match searching – known item searching will work better
–Metadata based searching now begins to resemble full-text searching but with all the advantages of structure & context, and a significant reduction in the amount of noise
Productivity Improvements
–Can now assign deep metadata to all kinds of content
–Remove the human review aspect from the metadata capture
–Reduce unit times where human review is still used
Information Quality impacts
–All metadata carries the information architecture with it
–Apply quality metrics at the metadata level to eliminate need to build ‘fuzzy search architectures’ – these rarely scale or improve in performance
–Use the technologies to identify and fix problems with our data
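For readers who want the precision and recall claims in measurable terms, the standard definitions are: precision is the share of retrieved items that are relevant, and recall is the share of relevant items that were retrieved. The helper below is a generic sketch of those definitions, with made-up document identifiers in the usage note; it is not a measurement from the Bank's search system.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result set against a known relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Guard against empty sets so the function never divides by zero.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, if a search returns four documents of which two are relevant, and three relevant documents exist in total, precision is 0.5 and recall is two thirds.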

37 In Progress Impacts
Same methodology can be leveraged to develop a structure of lines of business, entities prominent in particular domains, relationships among entities in a domain, standard statistical variables, etc.
The richer the metadata and the more fully elaborated the reference structures, the closer we come to understanding at a system level what is happening in a particular domain at any point in time
It is this overall structure which can then be leveraged in other contexts, perhaps even a counter-terrorism context, to threshold events
Without metadata, though, no information asset can be secured while its importance is still known
Without metadata, no information is agile or mobile

38 Thank You. Questions & Discussions

