EA Knowledge Discovery: Deriving EA Models from Unstructured and Semi-Structured Text. Andy Hoskinson, Unisys Corporation. August 17, 2004.

Presentation transcript:

EA Knowledge Discovery Deriving EA Models from Unstructured and Semi-Structured Text Andy Hoskinson, Unisys Corporation August 17, 2004

2 Purpose
- Describe an alternate method of “jumpstarting” EA baseline discovery using knowledge discovery tools and techniques.
- Discuss how to use knowledge discovery techniques to:
  - "Unlock" EA knowledge buried in existing knowledge repositories (such as an agency website or intranet)
  - Extract this knowledge into EA models
  - Publish the resulting models to a GOTS or COTS EA repository product (such as EAMS, Popkin SA, Adaptive, or Metis) for further processing

3 Agenda
- What is EA Knowledge Discovery?
- Why is this capability important?
- How do I implement this capability?
  - Process
  - Tools
- What are some of the limitations of automated EA Knowledge Discovery?

4 What is “EA Knowledge Discovery?”
- Knowledge Discovery:
  - The “non-trivial extraction of implicit, unknown, and potentially useful information from data” *
  - Identifies and extracts trends and patterns from data, and transforms them into useful and understandable information
- EA Knowledge Discovery:
  - "Unlocking" EA knowledge buried in existing information collections (such as an agency website or intranet)
  - Deriving EA models from unstructured or semi-structured text
  - The resulting models can then be published to a GOTS or COTS EA repository product (such as EAMS, Popkin SA, Adaptive, or Metis) for further processing
* Definition obtained from

5 Why is This Capability Important?
- Assists in automating the labor-intensive EA baseline discovery effort
  - Decreases EA baseline discovery level-of-effort from staff-months to staff-weeks
  - Leverages EA-related information buried in existing information sources (e.g., website or intranet)
  - Reduces the need for "data calls" and face-to-face data collection interviews
- Frees up your EA budget for more strategically important activities, e.g.:
  - Model validation
  - Target architecture development
  - Gap analysis and migration planning
  - Governance activities
- Relatively straightforward to implement: numerous COTS products exist to support this capability

6 How Do I Implement This Capability?
- Step 1: Identify Suitable Info Source
- Step 2: Extract and Index Concepts
- Step 3: “Connect the Dots” between Related Concepts
- Step 4: “Tag” Concepts Using an EA Metamodel
- Step 5: Publish to an EA Repository
- Step 6: Review, Edit, and Validate

7 Step 1: Identify a Suitable Information Source…
- …Containing EA-related data, e.g.:
  - Business areas, functions, processes, and events
  - Business operating units, locations, stakeholders, and key personnel
  - Important work products and data
  - Information systems and technology
- Appropriate sources include:
  - Agency website
  - Enterprise portal or Intranet
- Example: PA PowerPort website and eGovernment portal

8 Step 2: Extract and Index Concepts
- Crawl the information resource(s) in question (e.g., website or intranet) using a "spider"
- Retrieve all documents
- For each document, build a concept index by parsing its text into a vector of phrases
- Save the concept index to persistent storage (e.g., a database or knowledge base)
- Example: Concept index for PA PowerPort website, showing concepts, document frequencies, and term frequencies.
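
The indexing part of this step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tooling used in the presentation: it assumes the documents have already been crawled and are available as plain text, it treats stop words and punctuation as phrase boundaries, and the names `STOP_WORDS`, `extract_phrases`, and `build_concept_index` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real deployment would maintain a much larger one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is", "on"}


def _ngrams(words, max_len):
    """Return every run of 1..max_len consecutive words as a phrase."""
    return [
        " ".join(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    ]


def extract_phrases(text, max_len=2):
    """Parse text into a list of candidate concept phrases.

    Punctuation and stop words are treated as phrase boundaries (an assumption).
    """
    phrases = []
    for sentence in re.split(r"[.!?;:,\n]+", text.lower()):
        run = []
        for word in re.findall(r"[a-z][a-z\-]*", sentence):
            if word in STOP_WORDS:
                phrases.extend(_ngrams(run, max_len))
                run = []
            else:
                run.append(word)
        phrases.extend(_ngrams(run, max_len))
    return phrases


def build_concept_index(documents):
    """Build a concept index from a dict of document id -> text.

    Each entry records the document frequency (df), the total term
    frequency (tf), and the set of documents containing the concept.
    """
    index = {}
    for doc_id, text in documents.items():
        for phrase, count in Counter(extract_phrases(text)).items():
            entry = index.setdefault(phrase, {"df": 0, "tf": 0, "docs": set()})
            entry["df"] += 1
            entry["tf"] += count
            entry["docs"].add(doc_id)
    return index


# Made-up sample documents standing in for crawled pages.
sample_docs = {
    "d1": "The permit office issues permits. Permit office hours.",
    "d2": "Contact the permit office.",
}
sample_index = build_concept_index(sample_docs)
```

Persisting the index (the last bullet of the step) is left out; the `docs` sets are kept because co-occurrence analysis in the next step needs to know which documents mention each concept.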

9 Step 3: “Connect the Dots” between Related Concepts
- Infer relationships between concepts using a process of concept correlation
  - Concept correlation: percentage of documents in which two concepts co-occur
  - Threshold established to determine whether a strong relationship exists (i.e., > 90% concept correlation == EA model association)
- With entities (concepts) and relationships established, we now have the preliminary makings of a model
- Example: Concept index for PA PowerPort website, organized into a hierarchical model (taxonomy).
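
Concept correlation as described above can be sketched as follows. The slide does not say which set of documents the percentage is taken over, so this sketch assumes the denominator is the number of documents that mention at least one of the two concepts; other choices are equally plausible. The 90% threshold comes from the slide, while the function names and the sample data are hypothetical.

```python
from itertools import combinations


def concept_correlation(docs_a, docs_b):
    """Fraction of documents in which both concepts occur.

    Assumption: the denominator is the set of documents containing
    either concept; the source does not specify this.
    """
    union = docs_a | docs_b
    if not union:
        return 0.0
    return len(docs_a & docs_b) / len(union)


def infer_associations(postings, threshold=0.9):
    """Return (concept_a, concept_b, score) for pairs above the threshold.

    postings maps each concept to the set of document ids that mention it.
    """
    associations = []
    for a, b in combinations(sorted(postings), 2):
        score = concept_correlation(postings[a], postings[b])
        if score > threshold:
            associations.append((a, b, score))
    return associations


# Made-up postings for three concepts across three documents.
sample_postings = {
    "permit": {1, 2, 3},
    "license": {1, 2, 3},
    "fishing": {3},
}
sample_associations = infer_associations(sample_postings)
```

Comparing every pair of concepts grows quadratically with the size of the index, so a real implementation would likely prune rare concepts first.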

10 Step 4: “Tag” Concepts Using an EA Metamodel
- Programmatically tag each concept as an instance of an EA metamodel class
- Programmatically enforce constraints required by the metamodel
- Numerous techniques:
  - Bayesian text classification
  - Custom vocabularies (dictionaries, thesauri, etc.)
  - Keyword-based similarity coefficients (e.g., Dice, Jaccard, cosine)
- Example: PA PowerPort concepts tagged as EA metamodel types.
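
As one illustration of the keyword-based option, the sketch below scores a concept's terms against a small custom vocabulary per metamodel class, using the Jaccard, Dice, and cosine coefficients on sets of terms. The class names, the vocabularies, and the `min_score` cutoff are all hypothetical; a real metamodel and its dictionaries would be far richer, and metamodel constraint enforcement is not shown.

```python
import math


def jaccard(a, b):
    """Jaccard coefficient: |A and B| / |A or B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0


def dice(a, b):
    """Dice coefficient: 2 * |A and B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 0.0


def cosine(a, b):
    """Cosine similarity of binary term vectors: |A and B| / sqrt(|A| * |B|)."""
    a, b = set(a), set(b)
    return len(a & b) / math.sqrt(len(a) * len(b)) if (a and b) else 0.0


# Hypothetical metamodel classes with hypothetical keyword vocabularies.
VOCABULARIES = {
    "BusinessProcess": {"process", "permit", "application", "review", "approval"},
    "InformationSystem": {"system", "database", "server", "portal", "software"},
    "OrganizationUnit": {"department", "office", "bureau", "division", "agency"},
}


def tag_concept(terms, similarity=jaccard, min_score=0.1):
    """Return the best-matching metamodel class for a set of terms, or None."""
    best_class, best_score = None, 0.0
    for class_name, vocabulary in VOCABULARIES.items():
        score = similarity(terms, vocabulary)
        if score > best_score:
            best_class, best_score = class_name, score
    return best_class if best_score >= min_score else None


sample_tag = tag_concept({"permit", "review", "process"})
untagged = tag_concept({"weather"})
```

Concepts that match no vocabulary come back as `None`, which leaves them for manual tagging during the review step rather than forcing a guess.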

11 Step 5: Publish to an EA Repository
- Populate EA repository with tagged concepts and associations using the appropriate data access API
- Popular EA repository products include:
  - Metis
  - Adaptive
  - EAMS
  - Popkin System Architect
- Example: An EAMS repository populated with EA models constructed from the PA PowerPort tagged concept index.

12 Step 6: Review, Edit, and Validate
- Review the resulting EA models for completeness, consistency, and accuracy
- Revise and edit as needed
- Validate completed models with stakeholders
- Example: A UML class diagram constructed from the tagged PA PowerPort concept index, opened for editing in Rational XDE.

13 COTS Vendors Providing Knowledge Discovery Tools and Capabilities
- Autonomy
- ClearForest
- Convera
- Inxight
- Stratify
- Verity

14 What are the Limitations of this Technology?
- Not a silver bullet: labor investment is still required to review, edit, and validate extracted models
- Knowledge discovery technology is usually fairly expensive to purchase and operate
- Knowledge discovery products require more “tuning” than one might think (e.g., maintaining stop word lists)
- This technique works best when used at the initial stages of an EA baseline discovery effort to help “jumpstart” the process

For More Information… Please contact Andy Hoskinson at