Knowledge Modeling from Software Documentation By Madhuri Gopal, G.S Mahalakshmi V.Vani Vijayan.


Agenda
- Objective
- Project overview
- Design principles
- Technology stack
- Approach and methodology
- Execution framework
- Modules covered
- Results

Objective
The objective of this presentation is to understand the nuances of converting existing software documentation into an intelligent knowledge representation.

Project Overview: Background
- Traditional development, deployment and maintenance of conventional software applications require higher quality with shorter time-to-market cycles to reap the benefits of customer delight.
- This involves a formal, explicit and conventional representation of the knowledge base shared across stakeholders.
- Existing SDLC documents do not cater to any intelligent extraction and interpretation, either for downstream applications or for enhancements.
- There is a growing need for effective and efficient utilization of software artifacts to deliver enhanced traceability to changing future needs.

Challenges in the existing systems
- More than 90% of existing software documentation is in the form of text.
- Knowledge engineers create knowledge representations from scratch, which makes it difficult to reuse and enhance existing representations.
- Existing knowledge representation techniques require domain knowledge and have a steep learning curve.
- Differences in the conceptualization of the domain model lead to inconsistencies in its representation.

Design Principles
- Open/Closed Principle: software entities such as classes, modules and functions should be open for extension but closed for modification.
- Dependency Inversion Principle: high-level modules should not depend on low-level modules; both should depend on abstractions. Abstractions should not depend on details; details should depend on abstractions.

Design Principles contd…
- Single Responsibility Principle: a class should have only one reason to change.
- Liskov Substitution Principle: derived types must be completely substitutable for their base types.
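As an illustration of how these four principles might shape a text-processing pipeline, here is a minimal sketch. The class names (`Stage`, `Lowercase`, `CollapseWhitespace`, `Pipeline`) are hypothetical and do not come from the project; the stages are trivial stand-ins, not the NLP modules described in the slides.

```python
from abc import ABC, abstractmethod
from typing import Sequence


class Stage(ABC):
    """Abstraction that both the pipeline and the concrete stages depend on."""

    @abstractmethod
    def run(self, text: str) -> str:
        ...


class Lowercase(Stage):
    """Hypothetical stage with a single responsibility: case folding."""

    def run(self, text: str) -> str:
        return text.lower()


class CollapseWhitespace(Stage):
    """Hypothetical stage with a single responsibility: whitespace cleanup."""

    def run(self, text: str) -> str:
        return " ".join(text.split())


class Pipeline:
    """High-level module that knows only the Stage abstraction."""

    def __init__(self, stages: Sequence[Stage]) -> None:
        self.stages = list(stages)

    def run(self, text: str) -> str:
        # Any Stage subtype can be substituted here without changing Pipeline.
        for stage in self.stages:
            text = stage.run(text)
        return text


pipeline = Pipeline([Lowercase(), CollapseWhitespace()])
result = pipeline.run("The  System SHALL   log errors")
```

Adding a new stage means writing a new `Stage` subclass and leaving `Pipeline` untouched, which is the open/closed idea in miniature.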

Technology Stack
The system follows a two-tier architecture.
- Front end: Java
- Back end: Files

Development Hardware
- Processor: Intel(R) Core™ 2 Duo CPU, 2.00 GHz
- Memory (RAM): 4 GB
- System type: 32-bit operating system
Tools used
- CoreNLP: Stanford package for Natural Language Processing (NLP)
- ConExp: open-source tool for creating formal concept lattices

Approach and Methodology
The software prototyping (incremental prototyping) methodology is used for development. The final product is built as separate prototypes, which are merged into an overall design at the end.
The steps are:
a) Identification of basic requirements
b) Development of the initial prototype
c) Review of the prototype
d) Revision and enhancement of the prototype

Overall Architecture

Modules covered
1. Part-of-Speech (POS) tagging using a Maximum Entropy based tagger algorithm.
2. Lemmatization to reduce the relevant terms extracted by POS tagging to their lemma forms.
3. Named Entity Recognition (NER) using Conditional Random Fields (CRF) with Gibbs sampling for entity identification and extraction.
4. Parsing to determine the grammatical structure w.r.t. Formal Parsed Grammar using a factored model.

Modules covered contd…
5. Co-reference resolution using tiers of deterministic models to determine the relative importance of different terms.
6. Querying and manipulation of natural language text.
7. Formal Concept Analysis to derive the relationships between attributes and objects, and also between attributes.
8. Conversion of the formal concept lattice to XML for extraction of the knowledge representation.

Input Sources
Software engineering documents that are part of the MIL-STD-498 Software Development Standard are used as input, consisting of:
- Computer Operation Manual (COM)
- Computer Programming Manual (CPM)
- Database Design Description (DBDD)
- Firmware Support Manual (FSM)
- Interface Design Description (IDD)
- Interface Requirements Specifications (IRS)
- Operational Concept Description (OCD)
- Software Centre Operator Manual (SCOM)
- Software Design Description (SDD)
- Software User Manual (SUM)
- Software Version Description (SVD)

Input Sources contd…
- Software Development Plan (SDP)
- Software Input/Output Manual (SIOM)
- Software Installation Plan (SIP)
- Software Product Specification (SPS)
- Software Requirements Specification (SRS)
- System/Subsystem Design Description
- System/Subsystem Specification
- Software Test Description (STD)
- Software Test Plan
- Software Test Report (STR)
- Software Transition Plan (STrP)

Algorithm
Step 1:
    Tagger1 = POS_Tagging_Function(SRS)
    Tagger2 = POS_Tagging_Function(SDD)
    Tagger3 = POS_Tagging_Function(STD)
Step 2:
    Lemma_Form1 = Lemma_construction(Tagger1)
    Lemma_Form2 = Lemma_construction(Tagger2)
    Lemma_Form3 = Lemma_construction(Tagger3)
Step 3:
    NER1 = CRF_Gibbs_Function(Lemma_Form1)
    NER2 = CRF_Gibbs_Function(Lemma_Form2)
    NER3 = CRF_Gibbs_Function(Lemma_Form3)
Step 4:
    Parse1 = Parser(NER1)
    Parse2 = Parser(NER2)
    Parse3 = Parser(NER3)

Algorithm contd…
Step 5:
    CoRef1 = Coreference_Resolution(Parse1)
    CoRef2 = Coreference_Resolution(Parse2)
    CoRef3 = Coreference_Resolution(Parse3)
Step 6:
    TREE_NODE = Query_Manipulation_function(CoRef1, CoRef2, CoRef3)
Step 7:
    Concept_Lattice = FCA(context, concept, TREE_NODE)
Step 8:
    XML_DOC = XML_Convert(Concept_Lattice)
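The data flow of Steps 1 to 5 can be sketched as one chain of stages applied independently to each document. In this minimal sketch every stage is a stand-in that only records its own name; `make_stage`, `STAGES` and `annotate` are hypothetical names, and no real NLP is performed.

```python
from functools import reduce


def make_stage(name):
    """Build a stand-in stage that appends its name to the annotation trail."""
    def stage(annotations):
        return annotations + [name]
    return stage


# Steps 1-5 in order: POS tagging, lemmatization, NER, parsing, coreference.
STAGES = [make_stage(n) for n in ("pos", "lemma", "ner", "parse", "coref")]


def annotate(document_name):
    """Run one document through every stage, feeding each output to the next."""
    return reduce(lambda acc, stage: stage(acc), STAGES, [document_name])


# Each document gets its own chain; Step 6 would then merge the three results.
annotated = {doc: annotate(doc) for doc in ("SRS", "SDD", "STD")}
```

Keeping the chain per document makes it explicit that the parse of the SDD is derived from the SDD's own entity annotations, not from those of the SRS.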

Implementation Steps
The algorithm is mapped to the following series of steps.
Collection of existing software documents:
a) Software Requirements Specification (SRS): this document contains a set of use cases that describe system-user interaction, together with non-functional requirements such as design constraints and quality standards.
b) Software Design Document (SDD): the SDD shows how the software system will be structured to represent the software components, interfaces and data necessary for the implementation phase.
c) Software Testing Document (STD): it specifies the form of a set of documents for use in different stages of software testing.

Implementation Steps contd…
Extraction of relevant knowledge from the SRS, SDD and STD using the following sequence of natural language processing steps:
- POS tagging
- Lemmatization
- Named Entity Recognition
- Syntactic parsing
- Coreference resolution
Input: SRS, SDD, STD
Output: Annotated text corpora

Annotated SRS

Annotated SDD Annotated STD

Implementation Steps contd…
Querying and manipulation of annotated text corpora and conversion to tree data structures
This step uses query manipulation tools to extract the relevant knowledge from the annotated text corpora. The verb subject, verb object and PP complement pairs are extracted, and the syntactic dependencies verb subject - verb - verb object and verb - PP complement are exploited to derive a meaningful hierarchical relationship.
Input: Annotated SRS, SDD, STD
Output: Tree data structure representation
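One way to picture the conversion to a tree is to group each verb's dependents by relation. The sketch below assumes the parser output has already been reduced to (verb, relation, dependent) triples; the triples, the relation labels and the `build_tree` helper are invented for illustration and are not taken from the project.

```python
from collections import defaultdict

# Hypothetical dependency triples: (head verb, relation, dependent).
dependencies = [
    ("validate", "subject", "system"),
    ("validate", "object", "password"),
    ("validate", "pp_complement", "against policy"),
    ("store", "subject", "system"),
    ("store", "object", "record"),
]


def build_tree(deps):
    """Group dependents under each verb: verb -> relation -> [dependents]."""
    tree = defaultdict(lambda: defaultdict(list))
    for verb, relation, dependent in deps:
        tree[verb][relation].append(dependent)
    # Convert to plain dicts so the result prints and compares cleanly.
    return {verb: dict(relations) for verb, relations in tree.items()}


tree = build_tree(dependencies)
```

Each verb becomes a root whose children are its subject, object and PP complement, which is the shape the later concept-analysis step needs.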

Implementation Steps contd…
Formation of the concept lattice using Formal Concept Analysis
The hierarchical information and syntactic dependencies obtained by NLP give a relationship in which the verbs act as the set of objects and the verb-subject, verb-object and verb-PP complement act as the set of attributes. This relationship is written as a matrix and given as input to ConExp, which transforms the matrix into a concept lattice.
Input: Tree data structure representation
Output: Formal concept lattice
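To make the object/attribute matrix concrete, the sketch below enumerates the formal concepts of a tiny, invented context by brute force. It is not ConExp's algorithm and makes no claim about how ConExp works internally; the verbs, the attribute labels and the helper names are assumptions for illustration. The enumeration is exponential in the number of attributes, so it only suits toy inputs.

```python
from itertools import combinations

# Hypothetical formal context: objects (verbs) mapped to their attributes.
context = {
    "validate": {"subj:system", "obj:password"},
    "store": {"subj:system", "obj:record"},
    "display": {"subj:system"},
}
all_attributes = set().union(*context.values())


def extent(attrs):
    """Objects that have every attribute in attrs."""
    return frozenset(o for o, a in context.items() if attrs <= a)


def intent(objs):
    """Attributes shared by every object in objs."""
    if not objs:
        return frozenset(all_attributes)
    return frozenset(set.intersection(*(context[o] for o in objs)))


def concepts():
    """Enumerate all (extent, intent) pairs by closing every attribute subset."""
    found = set()
    for size in range(len(all_attributes) + 1):
        for attrs in combinations(sorted(all_attributes), size):
            objs = extent(set(attrs))
            found.add((objs, intent(objs)))
    return found


lattice = concepts()
```

In this toy context the concept containing all three verbs carries only the shared attribute `subj:system`, and the concept carrying every attribute contains no verb at all.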

Formal Concept Lattice

- The topmost element indicates the object that has no attributes.
- The bottommost element indicates the object that has all attributes.
- The nodes in blue indicate the objects.
- The nodes in orange depict the attributes.

Implementation Steps contd…
Conversion of the formal concept lattice to XML
The set of all attributes and their values is extracted for each object. This provides an intermediate representation of the concept hierarchy before it is transformed into a knowledge representation.
Input: Formal concept lattice
Output: XML format

Implementation Steps contd…
Pseudocode for conversion of the formal concept lattice to XML
Let n be the total number of objects and m be the total number of attributes.
For j = 1 to n
    For k = 1 to m
        If Ak is an attribute of Ij,
            add Ak to the list of attributes of Ij
    Form the XML element with head = Ij and its list of attributes
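The loop above could be realized along the following lines with Python's standard `xml.etree.ElementTree` module. The element names (`concepts`, `object`, `attribute`), the sample data and the `to_xml` helper are assumptions made for this sketch; the slides do not specify the XML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical object -> attributes mapping read off a concept lattice.
objects = {
    "validate": ["subj:system", "obj:password"],
    "store": ["subj:system", "obj:record"],
}


def to_xml(mapping):
    """Emit one <object> element per object, with one <attribute> child each."""
    root = ET.Element("concepts")
    for name, attributes in mapping.items():
        obj = ET.SubElement(root, "object", name=name)
        for attribute in attributes:
            ET.SubElement(obj, "attribute").text = attribute
    return root


root = to_xml(objects)
xml_text = ET.tostring(root, encoding="unicode")
```

Here the object name is stored as an XML attribute on the head element and each lattice attribute becomes a child element, which keeps the list of attributes easy to iterate downstream.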

Conclusion
Software documentation practices vary among different organizations.
- 53% of organizations deliver consistent software to the maintenance phase.
- 16% update their documentation at all levels.
- 53% of organizations have their user manuals consistent with the system state.
- 42% revise and modify regression test case repositories.
- 11% achieve full traceability amongst system documents, and only 5% have achieved traceability of change.
On average, a software cost saving of % is expected to be achieved, depending on the size and complexity of the software documentation.

Thank You