Automated Traceability of Key Success Factors through Lifecycle Documents
Chuck Rehberg, Chief Scientist
Semantic Insights™ (A division of Trigent Software, Inc.)
20-Nov-2011
Executive Summary
Problem Addressed
–The need to verify that high-level requirements are transformed and reflected through all levels of project documents.
Solution Proposed
–A system that, given a list of Key Success Factors and a potentially large set of documents, generates a “Traceability Report” mapping each Key Success Factor to specific statements in specific sections of specific documents.
Solution Requirements
–Domain-specific semantic data (Dictionary, Ontology, Experiences, Language)
–A list of Key Success Factors stated in natural language
–A set of documents
New Technologies Employed (recently patented or patent pending)
–Natural language processing: non-statistical, dictionary-driven, with word sense disambiguation (WSD)
–Meaning maps (multiple ways of saying the same thing)
–Generation of focused, high-speed, rules-based document “readers”
–Natural language report generation
The basic process
1. Identify Key Success Factors.
–Key Success Factors include natural language statements of Mission, Goals and other Requirements, often taken directly from the initial program documents.
2. Provide domain-specific semantic data.
–This includes: Dictionary, Ontology, Experience and Language.
3. Provide access to the document corpus to be automatically read and analyzed.
4. Generate the desired report.
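The four steps above can be pictured as a small pipeline. The Python sketch below is a hypothetical illustration, not the system's implementation: naive content-word overlap stands in for the dictionary-driven readers, a stop-word list stands in for the domain-specific semantic data, and the names `trace`, `Location`, and `STOP_WORDS` are invented for this example. What it does show is the shape of the result: each Key Success Factor mapped to specific statements in specific sections of specific documents.

```python
import re
from dataclasses import dataclass

# Hypothetical sketch: a stop-word list stands in for real semantic data.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "are",
              "that", "which", "for", "across", "be", "shall"}


@dataclass(frozen=True)
class Location:
    """Where a supporting statement was found."""
    document: str
    section: str
    statement: str


def content_words(text: str) -> set[str]:
    """Lower-cased word tokens with stop words removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def trace(ksfs: list[str],
          corpus: dict[str, dict[str, list[str]]],
          min_overlap: int = 2) -> dict[str, list[Location]]:
    """Map each KSF to every statement sharing at least `min_overlap` content words.

    `corpus` is assumed to be {document: {section: [statement, ...]}}.
    """
    report: dict[str, list[Location]] = {ksf: [] for ksf in ksfs}
    for ksf in ksfs:
        wanted = content_words(ksf)
        for document, sections in corpus.items():
            for section, statements in sections.items():
                for statement in statements:
                    if len(wanted & content_words(statement)) >= min_overlap:
                        report[ksf].append(Location(document, section, statement))
    return report


# Invented example data, not taken from the DoD documents.
example_ksfs = ["Data are visible across the enterprise."]
example_corpus = {"OV-1.doc": {"2.1": ["Enterprise data shall be visible to all users.",
                                       "The network is protected."]}}
for ksf, hits in trace(example_ksfs, example_corpus).items():
    for loc in hits:
        print(f"{ksf} -> {loc.document}, section {loc.section}: {loc.statement}")
```

Word overlap both misses paraphrases and accepts coincidental matches, which is the gap that meaning maps and domain-specific semantic data are meant to close.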
Sample “Key Success Factors”
1. Function as one unified DoD Enterprise, creating an information advantage for our people and mission partners.
2. Provide a rich information sharing environment in which data and services are visible, accessible, understandable, and trusted across the enterprise.
3. Provide an available and protected network infrastructure (the GIG) that enables responsive information-centric operations using dynamic and interoperable communications and computing capabilities.
4. Drive the fundamental concepts of net-centricity across all missions of the Department of Defense to ensure that all applicable DoD programs, regardless of Component or portfolio, comply with the DoD net-centric vision and enable agile, collaborative net-centric information sharing.
Unpacking the Semantics of “Key Success Factors” (KSF)
Each KSF statement embodies a number of semantically distinct assertions. For example, from the preceding list:
–“Provide a rich information sharing environment in which data and services are visible, accessible, understandable, and trusted across the enterprise.”
The system unpacks this KSF statement into these basic requirements:
–Environment provides information.
–Information is shared.
–Services are understandable across the enterprise.
–Services are accessible across the enterprise.
–Services are visible across the enterprise.
–Services are trusted across the enterprise.
–Data are understandable across the enterprise.
–Data are accessible across the enterprise.
–Data are visible across the enterprise.
–Data are trusted across the enterprise.
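For this particular sentence shape (coordinated subjects with coordinated predicate adjectives), the unpacking is essentially a cross product of subjects and properties. The sketch below assumes that one construction and uses a regular expression where the real system uses natural language processing; `COORDINATION`, `split_coordination`, and `unpack` are hypothetical names. It recovers the eight Data/Services assertions above, but not “Environment provides information.” or “Information is shared.”, which require an actual parse of the noun phrase.

```python
import re
from itertools import product

# Hypothetical pattern for one construction only:
#   "<S1> and <S2> are <P1>, <P2>, ..., and <Pn> across the <X>"
COORDINATION = re.compile(
    r"(?P<subj>\w+(?: and \w+)*) are (?P<preds>.+?) (?P<mod>across the \w+)")


def split_coordination(phrase: str) -> list[str]:
    """Split 'a, b, and c' or 'a and b' into ['a', 'b', 'c']."""
    parts = re.split(r",\s*(?:and\s+)?|\s+and\s+", phrase)
    return [p.strip() for p in parts if p.strip()]


def unpack(ksf: str) -> list[str]:
    """Distribute coordinated subjects over coordinated predicates."""
    m = COORDINATION.search(ksf)
    if m is None:
        return [ksf]  # construction not recognised: keep the statement whole
    subjects = split_coordination(m["subj"])
    predicates = split_coordination(m["preds"])
    return [f"{s.capitalize()} are {p} {m['mod']}."
            for s, p in product(subjects, predicates)]


ksf = ("Provide a rich information sharing environment in which data and services "
       "are visible, accessible, understandable, and trusted across the enterprise.")
for assertion in unpack(ksf):
    print(assertion)
```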
Providing Domain-specific Semantic data (continuing the previous example)
Beyond their normal everyday meanings, you may need to specify domain-specific semantics for these terms:
–Environment
–Information
–“share” as in “to share information”
–Services
–Enterprise
–understandable
–accessible
–visible
–Data
Such semantic information includes:
–Linguistic metadata such as “part of speech” and usage
–An Ontology specifying relevant generalizations, specializations, composition, and relationships to other concepts
Note: The following basic demo uses only the predefined English dictionary and grows the initial Ontology dynamically.
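One way to hold this kind of semantic information is a concept record carrying linguistic metadata alongside its relations. The sketch below assumes a minimal schema of our own (`Concept`, `Ontology`, `kind_of`, and `has_part` are hypothetical names, not the product's data model) and shows how generalization and composition can be followed transitively. The entries are illustrative, loosely drawn from the sample KSFs.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One hypothetical ontology entry: linguistic metadata plus relations."""
    name: str
    part_of_speech: str                                # linguistic metadata
    usage: str = ""                                    # free-text usage note
    kind_of: set[str] = field(default_factory=set)     # generalizations
    has_part: set[str] = field(default_factory=set)    # composition


class Ontology:
    def __init__(self, concepts: list[Concept]) -> None:
        self.by_name = {c.name: c for c in concepts}

    def _closure(self, name: str, relation: str) -> set[str]:
        """Everything reachable from `name` by repeatedly following `relation`."""
        found: set[str] = set()
        stack = [name]
        while stack:
            concept = self.by_name.get(stack.pop())
            if concept is None:
                continue  # term known only as a relation target
            for target in getattr(concept, relation):
                if target not in found:
                    found.add(target)
                    stack.append(target)
        return found

    def generalizations(self, name: str) -> set[str]:
        return self._closure(name, "kind_of")

    def components(self, name: str) -> set[str]:
        return self._closure(name, "has_part")


# Illustrative entries, not the product's actual ontology.
onto = Ontology([
    Concept("enterprise", "noun", has_part={"network infrastructure"}),
    Concept("network infrastructure", "noun", has_part={"computing capabilities"}),
    Concept("GIG", "noun", kind_of={"network infrastructure"}),
    Concept("accessible", "adjective", usage="X is accessible across Y"),
])
print(onto.generalizations("GIG"))  # {'network infrastructure'}
print(onto.components("enterprise"))
```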
High-speed machine reading
Readers
–The system generates special-purpose, high-speed readers capable of quickly “reading” a large set of documents.
–The goal of a reader is to identify statements that semantically overlap each of your Key Success Factors.
–Domain-specific semantic information is used to increase the accuracy of the high-speed reader's results.
Implications and Inferences
–The system further uses domain-specific knowledge to find statements that imply support for your Key Success Factors.
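A minimal sketch of a reader built from a meaning map, assuming that each basic assertion is paired with a few hand-written surface patterns (the “multiple ways of saying the same thing”) and that the ontology supplies specializations of the subject. Statements matching the subject itself count as direct support; statements matching a specialization count as implied support. `MEANING_MAP`, `SPECIALIZATIONS`, `build_reader`, and `read` are hypothetical, and regular expressions stand in for the generated rules-based readers.

```python
import re

# Hypothetical meaning map: each (subject, property) assertion lists several
# surface patterns that say roughly the same thing; {subj} is filled in later.
MEANING_MAP = {
    ("data", "visible across the enterprise"): [
        r"\b{subj}\b.*\b(?:visible|discoverable)\b.*\benterprise\b",
        r"\benterprise\b.*\b(?:see|discover|find)\b.*\b{subj}\b",
    ],
}

# Hypothetical domain knowledge: kinds of "data".
SPECIALIZATIONS = {"data": ["user profiles", "logistics records"]}


def build_reader(subject: str, prop: str) -> list[tuple[re.Pattern, str]]:
    """Compile every surface pattern for the subject and each specialization."""
    terms = [(subject, "direct")] + [(t, "implied") for t in SPECIALIZATIONS.get(subject, [])]
    reader = []
    for term, support in terms:
        for pattern in MEANING_MAP[(subject, prop)]:
            regex = re.compile(pattern.format(subj=re.escape(term)), re.IGNORECASE)
            reader.append((regex, support))
    return reader


def read(statements: list[str], reader: list[tuple[re.Pattern, str]]) -> list[tuple[str, str]]:
    """Return (statement, "direct" | "implied") for each statement the reader accepts."""
    hits = []
    for statement in statements:
        for regex, support in reader:
            if regex.search(statement):
                hits.append((statement, support))
                break  # direct patterns are compiled first, so they win
    return hits


# Invented example statements.
reader = build_reader("data", "visible across the enterprise")
statements = [
    "All data shall be discoverable by users across the enterprise.",
    "Users across the enterprise can find user profiles through the portal.",
    "The network is protected.",
]
for statement, support in read(statements, reader):
    print(support, "|", statement)
```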
Select Architecture Documents to Read
Architecture/
  Appendix B_Draft OV-5a_IEA xxxxxxxxxxxx.doc
  Appendix F_Draft GIG 2.0 Alignment with DoD IEA xxxxxxxxxx.doc
  AV-1_Initial Draft DoD IEA xxxxxxxxxxxxx.doc
  CV-1 (rev1)_Initial Draft DoD IEA xxxxxxxxxxxxx.doc
  CV-2_Initial Draft DoD IEA xxxxxxxxxxxx.doc
  DoD EIEA AV-1_Vxxxxxxxxxxxx.doc
  Draft Activity Decomposition Overview (OV-5a)_IEA xxxxxxxxxx.pptx
  Draft Document Framework Description_IEA xxxxxxxxxxxxx.doc
  Draft IE Capabilities Taxonomy (CV-2)_IEA xxxxxxxxxxxxxxxxx.doc
  Draft IE Capability Vision (CV-1)_IEA xxxxxxxxxxxxxxxxx.doc
  Draft IE Operational Concept (OV-1)_IEA xxxxxxxxxxxxxxxxx.doc
  Draft Integrated Dictionary (AV-2)_IEA xxxxxxxxxxx.xlsx
  Draft Integrated Document_IEA xxxxxxxxxxxxxxxxxxxxxxx.doc
  Draft Operational Viewpoint_IEA xxxxxxxxxxxxxxxxxxxx.doc
  Draft Overview and Summary (AV-1)_IEA xxxxxxxxxxxxxxxxxxx.doc
  Draft Updated EA Compliance Req_IEA xxxxxxxxxxxxxxxxxxx.doc
  Operational Context Initial Draft DoD IEA xxxxxxxxxxxxxxx.doc
  OV-1_Initial Draft DoD IEA xxxxxxxxxxxxxxx.doc
  OV-5a_Initial Draft DoD IEA xxxxxxxxxxxxxxx.doc
  OV-6a_Initial Draft DoD IEA xxxxxxxxxxxxxxx.doc
  Revised EA Compliance Initial Draft DoD IEA xxxxxxxxxxxxxxx.doc
Report on a set of Architecture Documents showing the mapping within one document (Draft Integrated Document_IEA v2_Sep Deliverable_20110916.doc).
Report on a set of Architecture Documents showing all references to a selected KSF.
The PDF report follows the same format as the online report.
Generated Bibliography
Bibliography
[ 1 ] 1321800012561.doc. Retrieved on 11/20/2011 10:58:38, from http://192.168.2.104/DoD_IT_Source/Architecture/Draft IE Operational Concept (OV-1)_IEA xxxxxxxxxxxxx.doc
[ 2 ] 1321799657677.doc. Retrieved on 11/20/2011 10:59:15, from http://192.168.2.104/DoD_IT_Source/Architecture/Draft Integrated Document_IEA xxxxxxxxxxxxxxxx.doc
[ 3 ] 1321800012569.doc. Retrieved on 11/20/2011 11:18:59, from http://192.168.2.104/DoD_IT_Source/Architecture/Draft Operational Viewpoint_IEA xxxxxxxxxxxxxxxx.doc
[ 4 ] 1321799650176.doc. Retrieved on 11/20/2011 11:20:46, from http://192.168.2.104/DoD_IT_Source/Architecture/OV-6a_Initial Draft DoD IEA xxxxxxxxxxxxxxx.doc
[...]
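Each bibliography entry follows a fixed template: entry number, cached file name, retrieval timestamp, and source URL. A small formatter reproducing that layout might look like the sketch below; `Source` and `format_bibliography` are hypothetical names, and the URL in the example is a placeholder rather than a real source document.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Source:
    """Hypothetical record for one retrieved document."""
    cached_name: str      # name of the locally cached copy
    retrieved: datetime   # when the document was fetched
    url: str              # where it was fetched from


def format_bibliography(sources: list[Source]) -> str:
    """Render entries as '[ n ] name. Retrieved on MM/DD/YYYY HH:MM:SS, from url'."""
    lines = ["Bibliography"]
    for number, src in enumerate(sources, start=1):
        when = src.retrieved.strftime("%m/%d/%Y %H:%M:%S")
        lines.append(f"[ {number} ] {src.cached_name}. Retrieved on {when}, from {src.url}")
    return "\n".join(lines)


print(format_bibliography([
    Source("1321800012561.doc", datetime(2011, 11, 20, 10, 58, 38),
           "http://example.org/Architecture/sample.doc"),  # placeholder URL
]))
```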
However, the information source may refer to new terms and new concepts
SIRA combines advanced linguistics and semantics to discover and learn newly encountered items, for example by using linguistic placeholders such as “the unknown thing” (#?#).
This investigation:
–Environment provides information.
–Services are understandable across the enterprise.
–Services are accessible across the enterprise.
–Services are visible across the enterprise.
–Services are trusted across the enterprise.
–Data are understandable across the enterprise.
–Data are accessible across the enterprise.
–Data are visible across the enterprise.
–Data are trusted across the enterprise.
Becomes:
–#?# are accessible across the #?#.
–#?# are trusted across the #?#.
–#?# are understandable across the #?#.
–#?# are visible across the #?#.
–#?# provides information.
The investigation now includes anything that asserts these relationships.
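Mechanically, the rewrite above replaces each domain term with #?# and merges the assertions that become identical. In the sketch below, the set of terms to generalize is supplied by hand (an assumption; the deck does not say how SIRA chooses them), and `generalize` is a hypothetical name. Sorting the merged set happens to reproduce the order shown above.

```python
import re

PLACEHOLDER = "#?#"


def generalize(assertions: list[str], known_terms: set[str]) -> list[str]:
    """Replace each known domain term with #?# and merge the resulting duplicates."""
    generalized = set()
    for assertion in assertions:
        text = assertion
        # Longest terms first, so a multi-word term is not split by a shorter one.
        for term in sorted(known_terms, key=len, reverse=True):
            text = re.sub(rf"\b{re.escape(term)}\b", PLACEHOLDER, text, flags=re.IGNORECASE)
        generalized.add(text)
    return sorted(generalized)


investigation = [
    "Environment provides information.",
    "Services are understandable across the enterprise.",
    "Services are accessible across the enterprise.",
    "Services are visible across the enterprise.",
    "Services are trusted across the enterprise.",
    "Data are understandable across the enterprise.",
    "Data are accessible across the enterprise.",
    "Data are visible across the enterprise.",
    "Data are trusted across the enterprise.",
]
# Hypothetical hand-picked terms to turn into "the unknown thing".
for line in generalize(investigation, {"environment", "services", "data", "enterprise"}):
    print(line)
```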
Applied to a Portfolio of existing Systems
Portfolio/
  03a_CDD for GFM DI Incr xxxxxxxxxxxxxxxxxx.pdf
  2007-07-23 xxxxxxxxx CDD.doc
  a4542xx.pdf
  a5265xx.pdf
  Army DIMHRS CDD xxxxxxxx.pdf
  DAI CDD Appendices approved xxxxxxxxxxx.pdf
  Document-NECC CDD xxxxxxxxxxxxx.doc
  DRAFT NCES CDD xxxxxxxxxx.doc
  Final CDD draft xxxxxxxxx.doc
  GCSS-A_MS B_CDD_xxxxxxxxx.doc
  GCSS_FoS_MA_ICD_xxxxxxxxxx.pdf
  GIGMAICX.pdf
  Joint_JET-_NN_CDD_Appendix_xxxxxxxxxxxxxxxxxxx.doc
  Lightweight FSP CDD xxxxxxxxxxx.doc
  NSWCDD-MP-xxxxxxxxxxx.pdf
  Unmanned Systems ICD Draft xxxxxxx.doc
Over 2,000 pages of source documents become 30 pages of points relevant to the investigation, with a bibliography and hyperlinks.
The need for more accuracy
By using “the unknown thing” (#?#), you can identify items that represent the specific kinds of items desired. For example (samples from the previous report):
–Environment
    Cloud Computing
    SOA
–Information
    Survival Information
    potential or impending attack based on intelligence law enforcement and open source information
    Information on security relevant events
–Services
    NECS Services
–Enterprise
    The GI Analytical Environment
    GCSS-Army
–Data
    User Profile
However, #?# can also identify items that may be outside our interest. Perhaps this is not relevant:
–“Program-specific assessments from this literature will provide tailored information.”
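Once an investigation contains #?#, each placeholder behaves like a capture slot, and the items listed above are whatever filled those slots. The sketch below compiles a placeholder pattern into a regular expression with one group per #?#. It assumes an unknown thing is a run of plain or hyphenated words, which also means leading words of a sentence can be swept into the first slot; `pattern_to_regex`, `extract_bindings`, `UNKNOWN`, and the example sentences are hypothetical.

```python
import re

PLACEHOLDER = "#?#"
# Assumption: an unknown thing is a run of plain or hyphenated words.
UNKNOWN = r"([\w-]+(?: [\w-]+)*)"


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn '#?# are visible across the #?#.' into a regex with one group per #?#."""
    literal_pieces = [re.escape(piece) for piece in pattern.rstrip(".").split(PLACEHOLDER)]
    return re.compile(UNKNOWN.join(literal_pieces), re.IGNORECASE)


def extract_bindings(patterns: list[str], sentences: list[str]) -> dict[str, list[tuple[str, ...]]]:
    """For each pattern, collect what its #?# slots matched in each sentence."""
    bindings: dict[str, list[tuple[str, ...]]] = {p: [] for p in patterns}
    for pattern in patterns:
        regex = pattern_to_regex(pattern)
        for sentence in sentences:
            match = regex.search(sentence)
            if match:
                bindings[pattern].append(match.groups())
    return bindings


# Invented example sentences.
patterns = ["#?# provides information.", "#?# are visible across the #?#."]
sentences = [
    "Cloud Computing provides information to deployed users.",
    "NECS Services are visible across the enterprise.",
]
for pattern, found in extract_bindings(patterns, sentences).items():
    print(pattern, found)
```

Nothing in the pattern itself distinguishes a relevant binding from an irrelevant one, so the bindings still need review, or additional ontology knowledge, before they are trusted.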
Enhance the Ontology to increase accuracy
SIRA naturally uses your investigation and information sources to extend the current Ontology automatically.
However, many relationships, such as generalization, specialization, instantiation, and composition, are often not explicitly stated in the text, yet they may be required to identify relevant information.
When you add semantic information to the Ontology, the system automatically extends subsequent semantic research to include these concepts and terms (and their synonyms) where appropriate.
In short, you can introduce a concept and its corresponding terms, define the relevant semantic relationships and linguistic metadata, and begin using the term in your investigation right away.
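The payoff of adding a concept is query expansion: a search for a general term also covers its synonyms and everything declared as a kind of it. A minimal sketch under that assumption follows; `Ontology`, `add_concept`, and `expand` are hypothetical names, and the entries are illustrative (Cloud Computing and GCSS-Army appear in the sample report as items found for Environment and Enterprise).

```python
from collections import defaultdict


class Ontology:
    """Hypothetical ontology supporting run-time concept addition and term expansion."""

    def __init__(self) -> None:
        self.synonyms = defaultdict(set)   # term -> equivalent terms
        self.narrower = defaultdict(set)   # concept -> direct specializations/instances

    def add_concept(self, term: str, synonyms=(), kind_of=()) -> None:
        """Introduce a term with its synonyms and the broader concepts it specializes."""
        self.synonyms[term].update(synonyms)
        for broader in kind_of:
            self.narrower[broader].add(term)

    def expand(self, term: str) -> set[str]:
        """The term, its synonyms, and every transitive specialization with its synonyms."""
        terms: set[str] = set()
        seen: set[str] = set()
        stack = [term]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            terms.add(current)
            terms |= self.synonyms.get(current, set())
            stack.extend(self.narrower.get(current, ()))
        return terms


# Illustrative entries added at run time.
onto = Ontology()
onto.add_concept("cloud computing", synonyms={"cloud"}, kind_of={"environment"})
onto.add_concept("private cloud", kind_of={"cloud computing"})
onto.add_concept("GCSS-Army", synonyms={"Global Combat Support System-Army"},
                 kind_of={"enterprise"})
print(sorted(onto.expand("environment")))  # ['cloud', 'cloud computing', 'environment', 'private cloud']
```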
Reporting and Queries
As a result of machine reading, the information source documents are semantically indexed relative to the investigation.
The system supports interactive queries of this semantic index.
Report templates and content can be defined dynamically to render the query results.
All semantic information can be exported.
“Key Success Factors” are just one example of an investigation; there is no restriction on the nature and content of an investigation.
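The two report views shown earlier, all KSF mappings within one document and all references to one selected KSF, can be read as two lookups over the same set of reader hits. The sketch below assumes a simple in-memory index; `Hit`, `SemanticIndex`, `references_to`, and `mapping_within` are hypothetical names, and the data is invented.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    ksf: str
    document: str
    section: str
    statement: str


class SemanticIndex:
    """Hypothetical in-memory index of reader hits, queryable by KSF or by document."""

    def __init__(self) -> None:
        self._by_ksf = defaultdict(list)   # KSF -> [Hit]
        self._by_doc = defaultdict(list)   # document -> [Hit]

    def add(self, hit: Hit) -> None:
        self._by_ksf[hit.ksf].append(hit)
        self._by_doc[hit.document].append(hit)

    def references_to(self, ksf: str) -> list[Hit]:
        """Every indexed statement, in any document, that supports one KSF."""
        return list(self._by_ksf.get(ksf, []))

    def mapping_within(self, document: str) -> dict[str, list[Hit]]:
        """KSF -> supporting statements, restricted to a single document."""
        mapping: dict[str, list[Hit]] = {}
        for hit in self._by_doc.get(document, []):
            mapping.setdefault(hit.ksf, []).append(hit)
        return mapping


# Invented example hits.
index = SemanticIndex()
index.add(Hit("KSF 2", "OV-1.doc", "2.1", "Enterprise data shall be visible to all users."))
index.add(Hit("KSF 2", "CV-2.doc", "3.4", "Services are discoverable across the enterprise."))
index.add(Hit("KSF 3", "OV-1.doc", "4.2", "The GIG shall be protected."))
print(len(index.references_to("KSF 2")))         # 2
print(sorted(index.mapping_within("OV-1.doc")))  # ['KSF 2', 'KSF 3']
```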
Who we are
–Semantic Insights is the R&D division of Trigent Software, Inc. (www.trigent.com).
–We focus on developing semantics-based information products that produce high-value results for general users with little or no training.
–Visit us at www.semanticinsights.com.
Chuck Rehberg
As CTO at Trigent Software and Chief Scientist at Semantic Insights, Chuck Rehberg has developed patented high-performance rules engine technology and advanced natural language processing technologies that empower a new generation of semantic research solutions. Chuck has more than twenty-five years in the high-tech industry, developing leading-edge solutions in the areas of Artificial Intelligence, Semantic Technologies, analysis, and large-scale configuration software.