Information Extraction from Documents for Automating Software Testing by Patricia Lutsky. Presented by Ramiro Lopez.


1 Information Extraction from Documents for Automating Software Testing by Patricia Lutsky. Presented by Ramiro Lopez

2 Outline
• Why there is a need for a natural-language-based system for extracting information from documents
• Alternative ways of extracting information from documents
• System design and implementation details
• Experimental results

3 Motivation for SIFT
• What is SIFT? SIFT stands for Specification Information From Text.
• Various documents in software engineering are written in natural language.
• Examples: requirements and specification documents, user manuals.
• Software engineering documents tend to be written in a very particular way, with specific sections and subsections; that is, they are semi-structured.

4 What does SIFT do?
• SIFT is essentially an automated testing tool.
• It extracts specification-level information, generates tests from that information, and adds them to the set of existing test cases.
• The tests are then run to check that the system conforms to the documentation.

5 Alternative ways of extracting information from documents
• Use a controlled language for requirements specifications.
• Parse natural-language texts about testing in their entirety and generate test scripts.
• Extract specific facts about system specifications, but no specific testable facts.

6 What is unique about SIFT?
• Extracts specific testable facts from semi-structured documents.
• Uses XML, which separates content information from presentation formats, to give the document a consistent structure.
• Does not pursue full-text understanding, thus avoiding issues related to the endless ways of saying the same thing.


8 How to use SIFT
• Identify concepts that can be extracted for testing.
• Examine a document to find out how it is organized and to find the different sentence types.
• Encode the sentence types in a grammar.
• Create XML tags to give the document a consistent structure.
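As a rough illustration of the last step, the sketch below shows how XML tagging can make the sentences that describe an argument easy to locate. The tag names (manual, routine, argument, description) and the routine name CREATE_BUFFER are invented for this example and are not taken from SIFT; only the BUFQUO sentence comes from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: tag names and the routine name are assumptions
# made for illustration, not SIFT's actual tag set.
DOC = """
<manual>
  <routine name="CREATE_BUFFER">
    <argument name="BUFQUO">
      <description>The maximum value you can specify with the BUFQUO argument is 65355.</description>
    </argument>
    <argument name="BUFNAM">
      <description>The name of the buffer.</description>
    </argument>
  </routine>
</manual>
"""


def sentences_by_argument(xml_text):
    """Return (routine, argument, sentence) triples from the tagged document."""
    root = ET.fromstring(xml_text.strip())
    triples = []
    for routine in root.iter("routine"):
        for argument in routine.iter("argument"):
            for description in argument.iter("description"):
                text = (description.text or "").strip()
                triples.append((routine.get("name"), argument.get("name"), text))
    return triples


if __name__ == "__main__":
    for triple in sentences_by_argument(DOC):
        print(triple)
```

Because the structure is explicit, a later parsing step only has to deal with the sentence inside each description element, not with the layout of the manual.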

9 XML tag examples

10 Example of how a sentence is processed
• Natural-language specification: "The maximum value you can specify with the BUFQUO argument is 65355."
• The parser translates this into a canonical sentence, "The maximum value for BUFQUO is 65355", and a canonical form, (maximum_value BUFQUO 65355).
• The canonical form (maximum_value BUFQUO 65355) is then mechanically converted into actual code (a test case) and added to the system.
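The flow on this slide can be sketched in a few lines of Python. This is a minimal sketch, not SIFT's implementation: it assumes a single sentence type matched by a regular expression, and it assumes that a maximum_value fact is turned into two boundary tests (the limit itself and the limit plus one). The function names and the test-case dictionary layout are hypothetical.

```python
import re

# Assumed sentence type; a real grammar would cover many more patterns.
MAX_VALUE = re.compile(
    r"The maximum value you can specify with the (?P<arg>\w+) argument "
    r"is (?P<value>\d+)\.?$"
)


def to_canonical(sentence):
    """Return ('maximum_value', arg, value), or None for an unknown sentence type."""
    match = MAX_VALUE.match(sentence.strip())
    if match is None:
        return None
    return ("maximum_value", match.group("arg"), int(match.group("value")))


def to_test_cases(fact):
    """Turn a maximum_value fact into two boundary tests (an assumed strategy)."""
    kind, arg, limit = fact
    if kind != "maximum_value":
        raise ValueError(f"unsupported fact type: {kind}")
    return [
        {"argument": arg, "value": limit, "expect": "accepted"},
        {"argument": arg, "value": limit + 1, "expect": "rejected"},
    ]


if __name__ == "__main__":
    fact = to_canonical(
        "The maximum value you can specify with the BUFQUO argument is 65355"
    )
    print(fact)
    for case in to_test_cases(fact):
        print(case)
```

Sentences that do not match a known type simply yield None, which mirrors the decision not to pursue full-text understanding.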

11 Example of a rule in a grammar
• Suppose you have two structurally equivalent sentences: "The box is on the counter." and "The glass is under the counter."
• Both would be covered by a single grammar rule: NounPhrase is Preposition NounPhrase
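To make the rule concrete, here is a small hand-written recognizer for it. It is only a sketch: the word lists are assumptions, a noun phrase is simplified to a determiner followed by one word, and nothing here reflects how SIFT's parser is actually built.

```python
# Assumed vocabulary, just large enough for the two example sentences.
DETERMINERS = {"the", "a", "an"}
PREPOSITIONS = {"on", "under", "in", "over"}


def parse_noun_phrase(tokens, pos):
    """NounPhrase -> Determiner Noun. Return (noun, next_pos) or None."""
    if pos + 1 < len(tokens) and tokens[pos] in DETERMINERS:
        return tokens[pos + 1], pos + 2
    return None


def parse_sentence(sentence):
    """Sentence -> NounPhrase 'is' Preposition NounPhrase.

    Return (preposition, subject, object), or None if the rule does not match.
    """
    tokens = sentence.lower().rstrip(".").split()
    subject = parse_noun_phrase(tokens, 0)
    if subject is None:
        return None
    subject_noun, pos = subject
    if pos + 1 >= len(tokens):
        return None
    if tokens[pos] != "is" or tokens[pos + 1] not in PREPOSITIONS:
        return None
    preposition = tokens[pos + 1]
    obj = parse_noun_phrase(tokens, pos + 2)
    if obj is None or obj[1] != len(tokens):
        return None
    return (preposition, subject_noun, obj[0])


if __name__ == "__main__":
    print(parse_sentence("The box is on the counter."))
    print(parse_sentence("The glass is under the counter."))
    print(parse_sentence("The box is red."))
```

Both example sentences match the same rule and differ only in the words filling each slot, which is what lets one rule stand in for a whole sentence type.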

12 When can SIFT be used?
• On long-term projects where the documentation will go through many versions
• On semi-structured documents that are organized in a predefined way
• On documents written in a consistent style
• In domains that have many similar semantic entities (example: methods that have arguments)

13 Experimental Results
• SIFT was used to extract information from an operating system's reference manual.
• The developers had identified a total of 174 tests.
• SIFT was able to find 25 of the 174, or about 14%.

14 Final thoughts
• SIFT is only a proof-of-concept testing tool, but it has the potential to save developers time on trivial test cases.
• I think the natural-language approach is error-prone and costly, because people may not follow a consistent writing style.
• Deciding on a standard template that limits the choices of structure in a document might be more useful: people would be forced to follow the standard, so tests would be less likely to be missed because of an inconsistent writing style.

