EA Knowledge Discovery Deriving EA Models from Unstructured and Semi-Structured Text Andy Hoskinson, Unisys Corporation August.


1 EA Knowledge Discovery: Deriving EA Models from Unstructured and Semi-Structured Text
Andy Hoskinson, Unisys Corporation
andy.hoskinson@unisys.com
August 17, 2004

2 Purpose
- Describe an alternate method of “jumpstarting” EA baseline discovery using knowledge discovery tools and techniques.
- Discuss how to use knowledge discovery techniques to:
  - "Unlock" EA knowledge buried in existing knowledge repositories (such as an agency website or intranet)
  - Extract this knowledge into EA models
  - Publish the resulting models to a GOTS or COTS EA repository product (such as EAMS, Popkin SA, Adaptive, or Metis) for further processing

3 Agenda
- What is EA Knowledge Discovery?
- Why is this capability important?
- How do I implement this capability?
  - Process
  - Tools
- What are some of the limitations of automated EA Knowledge Discovery?

4 What is “EA Knowledge Discovery”?
- Knowledge Discovery:
  - The “non-trivial extraction of implicit, unknown, and potentially useful information from data” *
  - Identifies and extracts trends and patterns from data, and transforms them into useful and understandable information
- EA Knowledge Discovery:
  - "Unlocking" EA knowledge buried in existing information collections (such as an agency website or intranet)
  - Deriving EA models from unstructured or semi-structured text
  - The resulting models can then be published to a GOTS or COTS EA repository product (such as EAMS, Popkin SA, Adaptive, or Metis) for further processing
* Definition obtained from http://www.acm.org/crossroads/xrds5-2/kdd.html

5 Why is This Capability Important?
- Helps automate the labor-intensive EA baseline discovery effort
  - Decreases the EA baseline discovery level of effort from staff-months to staff-weeks
- Leverages EA-related information buried in existing information sources (e.g., website or intranet)
  - Reduces the need for "data calls" and face-to-face data collection interviews
- Frees up your EA budget for more strategically important activities, e.g.:
  - Model validation
  - Target architecture development
  - Gap analysis and migration planning
  - Governance activities
- Relatively straightforward to implement:
  - Numerous COTS products exist to support this capability

6 How Do I Implement This Capability?
- Step 1: Identify Suitable Info Source
- Step 2: Extract and Index Concepts
- Step 3: “Connect the Dots” between Related Concepts
- Step 4: “Tag” Concepts Using an EA Metamodel
- Step 5: Publish to an EA Repository
- Step 6: Review, Edit, and Validate

7 Step 1: Identify a Suitable Information Source…
- …containing EA-related data, e.g.:
  - Business areas, functions, processes, and events
  - Business operating units, locations, stakeholders, and key personnel
  - Important work products and data
  - Information systems and technology
- Appropriate sources include:
  - Agency website
  - Enterprise portal or intranet
- Example: PA PowerPort website and eGovernment portal

8 Step 2: Extract and Index Concepts
- Crawl the information resource(s) in question (e.g., website or intranet) using a "spider"
- Retrieve all documents
- For each document, build a concept index by parsing its text into a vector of phrases
- Save the concept index to persistent storage (e.g., a database or knowledge base)
- Example: Concept index for PA PowerPort website, showing concepts, document frequencies, and term frequencies.
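
The indexing part of Step 2 can be sketched roughly as follows. This is a minimal illustration, not the method used by any of the tools named in this deck: it assumes the crawl has already produced a mapping of document IDs to plain text, treats one- and two-word phrases as "concepts", and uses a tiny hypothetical stop word list. The names `extract_phrases`, `build_concept_index`, and `STOP_WORDS` are invented for the example, and saving the index to persistent storage is left out.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is", "on"}


def extract_phrases(text, max_len=2):
    """Parse text into lowercase word n-grams of 1..max_len words.

    Any n-gram containing a stop word is skipped.
    """
    words = re.findall(r"[a-z]+", text.lower())
    phrases = []
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if not any(w in STOP_WORDS for w in gram):
                phrases.append(" ".join(gram))
    return phrases


def build_concept_index(documents):
    """Build term frequencies, document frequencies, and postings.

    documents: dict mapping a document ID to its extracted text.
    """
    term_freq = Counter()   # total occurrences of each concept
    doc_freq = Counter()    # number of documents containing each concept
    postings = {}           # concept -> set of document IDs containing it
    for doc_id, text in documents.items():
        phrases = extract_phrases(text)
        term_freq.update(phrases)
        for phrase in set(phrases):
            doc_freq[phrase] += 1
            postings.setdefault(phrase, set()).add(doc_id)
    return {"term_freq": term_freq, "doc_freq": doc_freq, "postings": postings}


if __name__ == "__main__":
    docs = {
        "d1": "Apply for a business license",
        "d2": "Renew the business license online",
    }
    index = build_concept_index(docs)
    for concept, df in index["doc_freq"].most_common():
        print(concept, df, index["term_freq"][concept])
```

The `postings` map (concept to the set of documents containing it) is kept alongside the frequencies because co-occurrence analysis needs to know which documents each concept appears in.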

9 Step 3: “Connect the Dots” between Related Concepts
- Infer relationships between concepts using a process of concept correlation
  - Concept correlation: percentage of documents in which two concepts co-occur
  - A threshold is established to determine whether a strong relationship exists (i.e., a concept correlation above 90% is treated as an EA model association)
- With entities (concepts) and relationships established, we now have the preliminary makings of a model
- Example: Concept index for PA PowerPort website, organized into a hierarchical model (taxonomy).
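
A rough sketch of the concept correlation idea in Step 3 follows. The slide does not say which set of documents the percentage is taken over, so this example assumes one plausible reading: of the documents that contain either concept, the fraction that contain both. The function names and the sample data are hypothetical; only the 90% threshold comes from the slide.

```python
from itertools import combinations


def concept_correlation(docs_a, docs_b):
    """Fraction of documents containing either concept that contain both.

    Assumption: the denominator is the union of the two document sets;
    the slide leaves this unspecified.
    """
    union = docs_a | docs_b
    if not union:
        return 0.0
    return len(docs_a & docs_b) / len(union)


def infer_associations(postings, threshold=0.9):
    """Return (concept_a, concept_b, score) for pairs above the threshold.

    postings: dict mapping each concept to the set of document IDs
    in which it appears.
    """
    associations = []
    for a, b in combinations(sorted(postings), 2):
        score = concept_correlation(postings[a], postings[b])
        if score > threshold:
            associations.append((a, b, score))
    return associations


if __name__ == "__main__":
    sample = {
        "license": {1, 2, 3},
        "permit": {1, 2, 3},
        "tax": {3, 4},
    }
    print(infer_associations(sample))
```

Comparing every pair of concepts is quadratic in the number of concepts, so a real index would likely need pruning (for example, dropping rare concepts first) before this step.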

10 Step 4: “Tag” Concepts Using an EA Metamodel
- Programmatically tag each concept as an instance of an EA metamodel class
- Programmatically enforce constraints required by the metamodel
- Numerous techniques:
  - Bayesian text classification
  - Custom vocabularies (dictionaries, thesauri, etc.)
  - Keyword-based similarity coefficients (e.g., Dice, Jaccard, cosine, etc.)
- Example: PA PowerPort concepts tagged as EA metamodel types.
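
One way to picture the tagging in Step 4 is a custom vocabulary per metamodel class combined with a Jaccard similarity coefficient, two of the techniques listed above. The sketch below is only an illustration: the class names (`OrganizationUnit`, `BusinessProcess`, `InformationSystem`), the vocabularies, and the `Unclassified` fallback are all assumptions made up for the example, not part of any particular EA metamodel, and metamodel constraint enforcement is not shown.

```python
def jaccard(a, b):
    """Jaccard coefficient of two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical vocabularies keyed by hypothetical metamodel class names.
VOCABULARIES = {
    "OrganizationUnit": {"department", "bureau", "office", "agency"},
    "BusinessProcess": {"licensing", "registration", "renewal", "permitting"},
    "InformationSystem": {"system", "portal", "database", "application"},
}


def tag_concept(concept, vocabularies=VOCABULARIES):
    """Return (class_name, score) for the best-matching metamodel class.

    The concept's words are compared with each class vocabulary; a concept
    that shares no words with any vocabulary is tagged "Unclassified".
    """
    words = set(concept.lower().split())
    best_class, best_score = "Unclassified", 0.0
    for class_name, vocabulary in vocabularies.items():
        score = jaccard(words, vocabulary)
        if score > best_score:
            best_class, best_score = class_name, score
    return best_class, best_score


if __name__ == "__main__":
    for concept in ("department of revenue", "vehicle registration renewal", "fishing"):
        print(concept, tag_concept(concept))
```

Because the scores from such small vocabularies are low and coarse, the tags produced this way are best treated as suggestions for a human reviewer.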

11 Step 5: Publish to an EA Repository
- Populate EA repository with tagged concepts and associations using the appropriate data access API
- Popular EA repository products include:
  - Metis
  - Adaptive
  - EAMS
  - Popkin System Architect
- Example: An EAMS repository populated with EA models constructed from the PA PowerPort tagged concept index.

12 Step 6: Review, Edit, and Validate
- Review the resulting EA models for completeness, consistency, and accuracy
- Revise and edit as needed
- Validate completed models with stakeholders
- Example: A UML class diagram constructed from the tagged PA PowerPort concept index, opened for editing in Rational XDE.

13 COTS Vendors Providing Knowledge Discovery Tools and Capabilities
- Autonomy
- ClearForest
- Convera
- Inxight
- Stratify
- Verity

14 What are the Limitations of this Technology?
- Not a silver bullet:
  - Labor investment still required to review, edit, and validate extracted models
  - Knowledge discovery technology is usually fairly expensive to purchase and operate
  - Knowledge discovery products require more “tuning” than one might think (e.g., maintaining stop word lists, etc.)
- This technique works best when used at the initial stages of an EA baseline discovery effort to help “jumpstart” the process

15 For More Information… Please contact Andy Hoskinson at andy.hoskinson@unisys.com

