Enterprise Taxonomies – Finding the LCD to Support Interoperability

Denise A. D. Bedford, Ph.D.
Senior Information Officer, Information Solutions Group, World Bank

Making sense of all the terms
What do we mean by an ontology? What do we mean by a taxonomy? What do we mean by semantics, and how do they relate? What is architecture, and what kinds of architectures do you need to think about? How does the approach you take impact your ability to achieve interoperability?

Context
Before I start down this path, let me provide a little context for what we are trying to do, why, and how. You have all heard about the Bank’s experiences in knowledge management. What you may not have heard about is the primary lesson we learned in that initiative: the basic component of any KM environment is an enterprise-wide, integrated information foundation. This foundation supports a KLE, the repurposing of content, and implementation of the Disclosure Policy.

Big Picture
[Diagram: Enterprise Functional Architecture]
Applications: site-specific searching; Publications Catalog; World Bank Catalog/Enterprise Search; recommender engines; personal profiles; portal content syndication; browse and navigation structures.
Core: a Metadata Repository of Bank Standard Metadata, with reference tables (topics, countries, document types), transformation rules, and data governance bodies.
Sources, each feeding the repository through its own metadata extract: IRIS Doc Mgmt System, Web Content Mgmt., Board Documents metadata, IRAMS metadata, JOLIS metadata, InfoShop metadata.
Supporting technologies: concept extraction, categorization and summarization.
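
The core idea of this picture is that every source system contributes a metadata extract, and transformation rules plus shared reference tables turn those extracts into one standard record shape that every application can use. The sketch below illustrates that flow under stated assumptions: the source names IRIS and JOLIS come from the diagram, but the field names, codes and the three-field standard schema are hypothetical.

```python
# Hypothetical reference tables; the Bank's actual tables are not shown in the slides.
TOPICS = {"ENV": "Environment", "EDU": "Education"}
COUNTRIES = {"VN": "Vietnam", "KE": "Kenya"}

# Hypothetical transformation rules: each source's field names mapped onto an
# assumed standard schema (title, topic, country).
FIELD_MAPS = {
    "IRIS": {"doc_title": "title", "subj": "topic", "ctry": "country"},
    "JOLIS": {"Title": "title", "SubjectCode": "topic", "CountryCode": "country"},
}


def normalize(source: str, record: dict) -> dict:
    """Map a source record onto the standard schema and check codes against reference tables."""
    mapping = FIELD_MAPS[source]
    std = {mapping[key]: value for key, value in record.items() if key in mapping}
    if std.get("topic") not in TOPICS or std.get("country") not in COUNTRIES:
        raise ValueError(f"{source} record uses an unknown code: {record}")
    std["source"] = source
    return std


# One shared repository fed by per-source metadata extracts.
repository = [
    normalize("IRIS", {"doc_title": "Water note", "subj": "ENV", "ctry": "VN"}),
    normalize("JOLIS", {"Title": "School survey", "SubjectCode": "EDU", "CountryCode": "KE"}),
]

# Every application queries the same standard fields, regardless of source.
environment_titles = [r["title"] for r in repository if r["topic"] == "ENV"]
```

The business systems keep their own field names; only the transformation rules and reference tables are shared, which is what lets search, syndication and browse applications sit on one repository.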

How to get there
What approach should we take? We have many business processes, each with an entirely appropriate business system supporting it; 10,000 experts working in 30 subject domains and 30 business lines, generating all kinds of content in at least six languages; and 60 years of knowledge in various formats, most of it in print. Will an ontology work? Probably, but what does it look like?

Ontology - Definitions
In the context of knowledge sharing, an ontology means a specification of a conceptualization.
It is a description or specification of the concepts and relationships that can exist for an agent or a community of agents.
It is a set of concept definitions, or definitions of a formal vocabulary.
Ontological commitment is an agreement to use a vocabulary in a way that is consistent with the specifications.

Our interpretation of the definition
An ontology is an architecture that can be developed to define the structural aspects of a domain: a complex architecture that supports resource description, categorization/classification, and semantics. Semantics pertains to language, including its sense-making, lexical and morphological aspects. Semantics is sometimes confused with information architecture, and the result is suboptimal information, functional, technical and presentation architectures.

Facet as Core Component
[Diagram: facets arranged around a core component: Domain Lexicon or Thesaurus; Topic Classification; Languages; Lines of Business Classification]

Taxonomies, ontologies, topics, oh my!
What do you think of when you hear the word "taxonomy"? A hierarchy? That is partially, but not completely, correct. There are four types of taxonomies, but one typically overshadows the others. A quick review of taxonomies follows.

Definition of a taxonomy
“A system for naming and organizing things into groups that share similar characteristics.”
Taxonomy: review of basic concepts and approaches. Essential concepts include content metadata and metadata repositories, navigation architectures, search architectures, and portal architectures.
Information architectures in portals and web environments are increasingly complex, integrating technologies to deliver value to users. Seamless integration is the goal. Taxonomies are critical data structures that help us build a sustainable foundation, integrate efficiently, and manage the complexity.
The whole information architecture is more adaptable and sustainable over time if it adheres to established standards and best practices. It is also more cost-effective if new technologies currently being researched or under development can be integrated without re-engineering the base.
Like the content and processes they support, each component of a portal has a life cycle.
[Diagram labels: Taxonomy, Architectures, Semantics]

Facet Taxonomy Architecture
A faceted taxonomy architecture looks like a star. Each node in the star structure is associated with the object in the center.
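
A minimal sketch of the star shape, assuming a Python data model: the described object sits in the center and each facet is an independent spoke holding one or more values. The class name `DescribedObject` and the facet names and values are hypothetical illustrations loosely based on the facets named on the "Facet as Core Component" slide.

```python
from dataclasses import dataclass, field


@dataclass
class DescribedObject:
    """Center of the star: any object (document, dataset, print item)."""
    object_id: str
    # Spokes of the star: facet name -> set of values assigned for that facet.
    facets: dict[str, set[str]] = field(default_factory=dict)

    def tag(self, facet: str, value: str) -> None:
        # Each facet is independent; adding a value never restructures the other spokes.
        self.facets.setdefault(facet, set()).add(value)


doc = DescribedObject("doc-001")
doc.tag("topic", "Energy")
doc.tag("line_of_business", "Infrastructure")
doc.tag("language", "English")
doc.tag("language", "French")
```

Because no facet is nested under another, a new facet can be added later without touching the existing ones.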

Flat Taxonomy Architecture
Energy, Environment, Education, Economics, Transport, Trade, Labor, Agriculture
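
A flat taxonomy is a single-level controlled list with no relationships between its terms, so about the only thing it can do is constrain which values are allowed. A minimal sketch, using the terms from the slide; the function name `validate_tags` is hypothetical.

```python
# The flat taxonomy from the slide: one level, no parent/child relations.
FLAT_TAXONOMY = frozenset({
    "Energy", "Environment", "Education", "Economics",
    "Transport", "Trade", "Labor", "Agriculture",
})


def validate_tags(tags: list[str]) -> list[str]:
    """Return the tags that are NOT in the controlled list."""
    return [t for t in tags if t not in FLAT_TAXONOMY]
```

For example, `validate_tags(["Energy", "Health", "Trade"])` flags only `"Health"`.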

Hierarchical Taxonomy Architecture
A hierarchical taxonomy is represented as a tree architecture. The tree consists of nodes and links, and the links become ‘associations’ with meaning. Meanings in a hierarchy are fairly limited in scope: group membership, type and instance. In a hierarchical taxonomy, a node can have only one parent.
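
The single-parent rule is what makes the structure a tree. The sketch below, assuming a simple child-to-parent map, enforces that rule and walks a node's path up to the root; the class name and the example terms are hypothetical.

```python
class Hierarchy:
    """A tree: every node except the root has exactly one parent."""

    def __init__(self, root: str) -> None:
        self.root = root
        self.parent: dict[str, str] = {}

    def add(self, child: str, parent: str) -> None:
        if parent != self.root and parent not in self.parent:
            raise KeyError(f"unknown parent: {parent}")
        if child == self.root or child in self.parent:
            # A second parent would turn the tree into a network (plex) structure.
            raise ValueError(f"{child} already has a parent")
        self.parent[child] = parent

    def path(self, node: str) -> list[str]:
        """Walk from a node up to the root."""
        steps = [node]
        while node != self.root:
            node = self.parent[node]
            steps.append(node)
        return steps


h = Hierarchy("Sectors")
h.add("Energy", "Sectors")
h.add("Renewable energy", "Energy")
h.add("Solar power", "Renewable energy")
```

Here `h.path("Solar power")` returns `["Solar power", "Renewable energy", "Energy", "Sectors"]`, and any attempt to give "Solar power" a second parent raises `ValueError`.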

Critical Distinction
It is critical to understand the difference between a classification hierarchy, which is used to build collections of objects that fit the definition of a class, and a concept hierarchy, which builds broader/narrower relationships between concepts. The reason will become clear as we walk through the different build approaches: the distinction affects the ‘edge’ condition for domain contextuality.
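
The two structures answer different questions: a classification hierarchy answers "which objects belong in this class?", while a concept hierarchy answers "which concepts are broader than this one?". A minimal sketch of each, with hypothetical class names, concepts and document IDs:

```python
# Classification hierarchy: classes collect objects (documents) that fit the class definition.
class_members = {
    "Energy": {"doc-001"},
    "Energy/Renewable": {"doc-002"},
}
class_parent = {"Energy/Renewable": "Energy"}

# Concept hierarchy: broader/narrower relations between concepts, independent of any document.
broader = {"solar power": "renewable energy", "renewable energy": "energy"}


def documents_in(cls: str) -> set[str]:
    """Members of a class plus members of all its subclasses."""
    docs = set(class_members.get(cls, set()))
    for child, parent in class_parent.items():
        if parent == cls:
            docs |= documents_in(child)
    return docs


def all_broader(concept: str) -> list[str]:
    """Transitive broader concepts, e.g. for expanding a search query."""
    chain = []
    while concept in broader:
        concept = broader[concept]
        chain.append(concept)
    return chain
```

Here `documents_in("Energy")` collects `{"doc-001", "doc-002"}`, while `all_broader("solar power")` returns `["renewable energy", "energy"]` without reference to any document.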

Network Taxonomy Architecture
A network taxonomy is a plex architecture. Each node can have more than one parent, and any item in a plex structure can be linked to any other item. In plex structures, links can be meaningful and different.
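
A minimal sketch of a plex structure, assuming typed links stored per node: any node can link to any other, a node can have several "broader" parents, and each link keeps its own meaning. The class name, relation names and terms are hypothetical.

```python
from collections import defaultdict


class ConceptNetwork:
    """Plex structure: any node may link to any other, and each link carries a type."""

    def __init__(self) -> None:
        self.links: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def link(self, source: str, relation: str, target: str) -> None:
        self.links[source].add((relation, target))

    def related(self, node: str, relation: str) -> set[str]:
        """Targets reachable from a node through one type of link."""
        return {target for rel, target in self.links[node] if rel == relation}


net = ConceptNetwork()
# "Microfinance" has two parents, which a strict tree would not allow.
net.link("Microfinance", "broader", "Finance")
net.link("Microfinance", "broader", "Poverty reduction")
net.link("Microfinance", "related", "Small enterprises")
```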

Critical Distinction
It is critical to understand where you have the ability to define the meaning of the linkages, or relationships, between concepts in this structure. Linkages carry different meanings, the most primitive of which is statistical co-occurrence. If you are using semantic engines that can expose verb groups (links) as well as noun groups (nodes), and you begin to categorize the types of verb groups (VGs) associated with distinct noun groups (NGs), then you are taking the first step towards sense-making. If you use only Bayesian or regression analysis instead of building meaning into links over time, you will shift the meaning each time you add content.
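
The last point can be seen with even the simplest statistical link. The sketch below uses a Jaccard co-occurrence score as a stand-in for the Bayesian or regression models mentioned on the slide (a deliberate simplification, not the models themselves) and a hypothetical toy corpus. The "water"/"sanitation" link weakens purely because unrelated documents were added.

```python
def cooccurrence_strength(docs: list[set[str]], a: str, b: str) -> float:
    """Jaccard association: docs containing both terms / docs containing either."""
    both = sum(1 for d in docs if a in d and b in d)
    either = sum(1 for d in docs if a in d or b in d)
    return both / either if either else 0.0


corpus = [
    {"water", "sanitation"},
    {"water", "sanitation", "health"},
    {"water", "irrigation"},
]
before = cooccurrence_strength(corpus, "water", "sanitation")  # 2 of 3 docs

# New content arrives; nothing about sanitation has changed.
corpus += [{"water", "irrigation"}, {"water", "energy"}]
after = cooccurrence_strength(corpus, "water", "sanitation")  # 2 of 5 docs
```

The score drops from 2/3 to 0.4. A link whose meaning is defined explicitly (for example, a typed relation) would not move.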

Beyond basics
Are there gaps or lapses when we move from theory into practice?

So, how do you build an ontology or an enterprise content architecture?
Does it matter where you begin? What do you mean by the ‘top’ of the ontology? Is it the ‘top’, or starting point, in the interface? Or by ‘top’ do you mean the core components in the overall architecture? How will the approach you select facilitate or constrain your ability to grow the ontology over time? Let’s look at a typical approach taken around the country today.

Does where you start matter?
How does your decision of ‘top’ or ‘core’ impact interoperability later? What happens when you start with categorization or classification (hierarchy)? What happens when you start with faceted description? What happens when you start with network structure?

Starting with a hierarchy,…
This is how most ontology builds begin today. Most of the ontologies I have seen begin with the programmatic topical clustering of concepts, which is different from classification of content. Clustering of concepts is based on statistical modeling of single-word concepts that appear in a training set of documents, rather than on noun-phrase and verb-phrase decomposition and recomposition.

Starting with a hierarchy,…
Morphological rules are applied after clustering to compensate for co-occurring words that are actually multiword concepts. Concepts may or may not be contextualized, that is, identified by the type of concept. The cluster structure can be shallow or deep, depending on statistical associations. Clusters are progressively refined through statistical concentration.
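
To make the after-the-fact repair concrete: single-word clustering sees "human" and "capital" as two concepts, and a rule applied afterwards has to glue them back together. A minimal sketch, assuming a hypothetical lexicon of known multiword concepts (for example, drawn from a domain thesaurus):

```python
# Hypothetical lexicon of known multiword concepts.
MULTIWORD = {("human", "capital"), ("climate", "change"), ("social", "protection")}


def merge_multiword(tokens: list[str]) -> list[str]:
    """Re-join adjacent single-word tokens that form a known multiword concept."""
    out, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in MULTIWORD:
            out.append(" ".join(pair))
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out
```

For example, `["investing", "in", "human", "capital"]` becomes `["investing", "in", "human capital"]`. Any statistics computed before the merge were computed on the wrong units, which is why the slide contrasts this with noun-phrase extraction up front.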

Topic Hierarchies – Two Actions: Creating Structures & Classifying Content
Topic classification structures can be created in three ways: the structures are created dynamically using statistical models applied to electronic content; humans create the structures and content is dynamically associated with them; or humans create the structures and also define the rules for associating content with each class.
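
The third option, where humans define both the classes and the rules, can be sketched as follows. The class names, the rule format (each rule is a set of words that must all appear, and any one rule is enough) and the naive whitespace tokenization are all assumptions for illustration.

```python
# Hypothetical human-defined classes, each with human-defined rules.
RULES = {
    "Energy/Renewable": [{"solar", "wind"}, {"renewable"}],
    "Energy/Fossil": [{"coal"}, {"oil", "gas"}],
}


def classify(text: str) -> list[str]:
    """Assign a document to every class for which any rule's words all occur."""
    words = set(text.lower().split())  # naive tokenization; punctuation is not stripped
    return sorted(cls for cls, rules in RULES.items()
                  if any(rule <= words for rule in rules))
```

Because the rules are explicit, the class structure does not shift when new content arrives; only the assignment of the new documents does.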

Creating a Faceted Classification Structure
When the structure grows beyond two dimensions, i.e., transforms from a hierarchy into a faceted taxonomy, how do you meet application-level requirements without (1) compromising the hierarchy or (2) building extensive, unique code to create the new application layer? What if the blue classes and facets all represented policy statements? How would the user find them?

What happens when you add in the Real Semantics?
Morphological rule and sense problems; semantics describing different aspects of policies.
When you add the real semantics into the structure, you create a semantic network structure at each classification node (depending on how you defined your classes), and potentially for each facet attached to a node. In a faceted structure, you may need to apply the semantic network to common facets that are attached to different nodes at different levels of the classification scheme, or else miss some facets that logically use the semantics.

Taking that approach,…
You’ve built an architecture that is potentially confounded and disjointed at all levels, so how will you build a better contextual search system? You’re applying the solution to content at multiple levels in the architecture, which defies the OHIO principle even from a system perspective. The architecture shifts each time you add new content, so users have to relearn the interface every time they use it.

Taking that approach,…
You’ve excluded content that has electronic surrogates but is only available in print. Because information value varies by domain, this could be a significant gap. The information, functional, technical and presentation architectures are too tightly connected: there is no flexibility to design an open and efficient technical architecture over time, to repurpose the information architecture, or to design presentation architectures for different audiences. If you used one tool to build this complex architecture and a better component comes along, how much value will you lose?

Taking the hierarchy approach,…
The best you can do is throw a full-text search engine against all of the architecture and the content, which results in a complete loss of context. Rather than leveraging domain lexicons, which are already well defined for established domains or can easily be developed for emerging ones, we revert to kindergarten grammars. Maintenance and use are a nightmare at all levels. Multilingual access requires either an exponentially complex solution (to do it well) or a simple and inherently suboptimal one.

Starting with Network Structure
This means starting at the very bottom: preferably at the concept level, but too often only at the word level. This approach is used to dynamically discover the ‘clusters’ described earlier, and you can use it to discover clusters that can be converted into classes. However, if you use it to try to discover classes across domains, a simple statistical approach will surface what is statistically significant (the very generic concepts common to all domains) and will submerge the domain-specific concepts.
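
A toy illustration of the submerging effect, using a hypothetical two-domain corpus. Ranking terms by document frequency across the pooled corpus (the "simple statistical approach") puts the generic terms on top; the second function, a plain set difference that is not taken from the slides, only shows that the domain terms were there all along.

```python
from collections import Counter

# Hypothetical toy corpus: each document is a set of terms, grouped by domain.
corpus = {
    "health": [{"project", "report", "malaria"}, {"project", "report", "vaccine"}],
    "energy": [{"project", "report", "grid"}, {"project", "report", "tariff"}],
}


def pooled_top_terms(k: int) -> list[str]:
    """Rank by document frequency across ALL domains; ties broken alphabetically."""
    df = Counter(t for docs in corpus.values() for d in docs for t in d)
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def domain_specific_terms(domain: str) -> list[str]:
    """Terms that occur in this domain's documents but in no other domain."""
    own = {t for d in corpus[domain] for t in d}
    others = {t for dom, docs in corpus.items() if dom != domain for d in docs for t in d}
    return sorted(own - others)
```

The pooled ranking surfaces `"project"` and `"report"`, while `"malaria"` and `"vaccine"` only appear when the health domain is treated as its own context.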

Beginning at the Concept Level – Network Structures
What happens to the structure when you add new documents? What happens to concepts when you move old documents out of current index structures, or when a new concept is introduced? How do you apply and manage this structure explicitly? How do you integrate it into an existing architecture?

End Game Strategy
These two approaches assume navigation: they assume either that we can present more or deeper paths whenever a problem arises, or that we push all of the options to the interface for selection. This approach also assumes that the architecture is explicit, so what is implicit, and where is it in the overall design? In terms of system efficiency, application and maintenance, this approach is a nightmare. Sure, you can find a way to solve problems at each level or each extension, but you end up with something potentially worse than what we used to call ‘spaghetti code’.

Starting with Faceted Structure
Using the facet as the core enables multiple ‘top’ and ‘deep’ ontologies; in effect, it builds the structure around the access points for all kinds of users. Look at your users and define the End Game:
An individual citizen looking for IRS information that pertains to her situation
A Homeland Security analyst looking for linkages across events, products, names and countries (each is a facet that provides a context)
A DoD program officer and a DARPA program officer looking for proposed or funded programs with common elements
The Dept. of the Interior looking for successful techniques to maintain brush growth in Western forests


Taking the facet approach,…
Anchor the basic description framework on objects, any and all kinds of objects. This allows you to:
extend the architecture up and down, from ‘whole’ to ‘components within a container’ to creating ‘containers of wholes’;
integrate electronic and print content;
apply class scheme values to objects, but maintain the schemes as distinct sources outside of any single application;
develop deep domain semantics and apply them at the facet level, again maintained as distinct sources outside of any single application.
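
A minimal sketch of two of these points, under stated assumptions: objects store only codes, and each facet vocabulary is kept as a separate source (here simply separate dictionaries, all hypothetical), so print and electronic items sit side by side and users can enter through any facet or combination of facets. The function `find` is a hypothetical helper, not part of any named system.

```python
# Hypothetical facet vocabularies, each maintained as its own source.
VOCABULARIES = {
    "topic": {"T01": "Energy", "T02": "Education"},
    "country": {"KE": "Kenya", "VN": "Vietnam"},
    "format": {"P": "Print", "E": "Electronic"},
}

# Objects of any kind store only codes, so a vocabulary can be revised without touching them.
OBJECTS = [
    {"id": "doc-1", "topic": "T01", "country": "KE", "format": "E"},
    {"id": "book-7", "topic": "T01", "country": "VN", "format": "P"},
    {"id": "doc-9", "topic": "T02", "country": "KE", "format": "E"},
]


def find(**criteria: str) -> list[str]:
    """Enter through any facet, or any combination of facets, using display labels."""
    hits = []
    for obj in OBJECTS:
        if all(VOCABULARIES[f][obj[f]] == label for f, label in criteria.items()):
            hits.append(obj["id"])
    return hits
```

`find(topic="Energy")` returns both the electronic document and the print book, and `find(country="Kenya", format="Electronic")` combines two access points. Renaming a label in `VOCABULARIES` changes what users see without editing a single object record.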

Back to Interoperability
Which of these approaches has a better chance of achieving true interoperability in dynamic environments? Which of these approaches has a better chance of fitting into and sustaining an Enterprise Architecture to achieve integration across source systems, at any level? Can we separate the information, functional, technical and presentation architectures to get there? The key question is: how do you anchor your ontology, or your enterprise taxonomy structure?

Recommended Model
[Table: Model | Description | Recommendation]

Hierarchy
Description: Begins with a structure that has constraints and intuitive meanings for users, but is a major constraint to interoperability, with an impact on usability. Complex coding and technical architecture are required to support multiple access points for users, and access-point proliferation across domains increases complexity. Facets can be added at different class levels, but users become confused when they are explicitly implemented. While it may appear to pull like content together at a high level, it can scatter content very quickly as you walk down the hierarchy. Impacts business systems.
Recommendation: Not recommended

Network
Description: The basic approach surfaces commonality across domains, which may be the most generic of concepts. Cannot effectively be explicitly implemented across domains; too detailed for explicit implementation without a context.
Recommendation: Best managed with an overall classification scheme when implemented across domains

Facet
Description: Supports extensive reference description of all kinds of objects. Supports interoperability at the access-point level. Supports interoperability or distinction of classification. Supports integration and harmonization at the network level. No impact to business systems.
Recommendation: Recommended

Interoperable Enterprise Functional Architecture
[Diagram: the Enterprise Functional Architecture (applications; the Metadata Repository of Bank Standard Metadata with reference tables for topics, countries and document types, transformation rules and data governance bodies; metadata extracts from IRIS Doc Mgmt System, Web Content Mgmt., Board Documents, IRAMS, JOLIS and InfoShop), annotated with the labels "Architecture Interoperability" and "Semantic Interoperability"]

Technical Architecture for Interoperability
[Diagram: technical architecture connecting content contributors and end users to content systems. Labeled service layers: Delivery; Metadata Management and Security Services; Content Access Services; Content Management Services; Content Integration and Archives Services. Other component labels: ePublish, PDS, access rules, view, multilingual search, search, syndication, browsing, notification, workflow, create/delete, check in/out, versioning, declare, retention schedule, Business Activity, Topic Class Scheme, Series Names, thesaurus, relate, harmonize, concept extraction rules evaluator, connector, adapter, monitors, logs. Business systems: SAP (R/3, BW), Notes/Domino, PeopleSoft, iLAP. Stores: archives (store over time); repositories of documents, images, audio and data records; metadata warehouse.]

Metadata as Faceted Taxonomy
[Diagram: metadata purposes (identification/distinction, search and browse, compliant document management, use management) set against taxonomy types (flat, hierarchical, network, faceted)]

Understanding the behavior of facets
Each facet in the enterprise structure may be understood as an ontology. There is an architecture and there are semantics, and we have commitments to use both. Implementing those commitments, though, requires a different strategy for each facet, depending on how it behaves. Back to taxonomies for a minute….

Types of Tools to Use
[Diagram: tool types (pattern matching and rule-based capture; classification; concept extraction and clustering) set against metadata purposes (identification/distinction, search and browse, compliant document management, use management)]
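
Pattern matching and rule-based capture is the most mechanical of these tool types; a plausible reading of the diagram is that it serves identification metadata such as report numbers and dates. A minimal sketch, assuming hypothetical identifier formats (the slides do not give the Bank's actual numbering schemes):

```python
import re

# Hypothetical identification patterns; real numbering schemes are not given in the slides.
PATTERNS = {
    "report_number": re.compile(r"\bReport No\.\s*(\d{4,6}-[A-Z]{2,3})\b"),
    "date": re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
}


def capture(text: str) -> dict[str, str]:
    """Rule-based capture: the first match for each identification pattern."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            found[name] = match.group(1)
    return found
```

For example, `capture("Report No. 12345-KE, issued 2005-11-02")` yields both a report number and a date, while text without identifiers yields an empty dictionary. Facets that behave less predictably, such as topics, would need classification or concept extraction instead.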

Given your End Game,…
What kind of interoperability do you need? What kind of an enterprise architecture do you need to support it? What do you have to work with? What kinds of tools do you need to build it? What kinds of tools do you need to sustain it?