Evolving Trace Links across Versions of a Software System in Safety-Critical Domain Mona Rahimi , Jane Cleland-Huang DePaul University.


Motivation
Safety-critical systems and rigorous safety analysis. Trace link quality degrades as the system evolves. Maintaining trace links is arduous, error-prone and costly. Move towards adoption of agile and the big freeze problem. Ongoing analysis of the traceability coverage. Reuse the existing traceability information as the software evolves. Constantly retain the safety of the system.
Safety-critical systems, as we know, are those whose failure could result in loss of life or severe damage to property or the environment. Examples are pacemakers, autopilot systems in aircraft and infusion pumps. These systems therefore require rigorous safety analysis to demonstrate their safety, using techniques such as failure modes analysis, fault tree analysis and assurance cases. Traceability provides critical support for numerous software engineering activities, including safety analysis, yet the quality of trace links degrades as the system changes and evolves over time. And as we know, maintaining these trace links, especially in safety-critical projects, can be a difficult, error-prone and costly process. As a result, traceability data is not trusted. In one study, Mader et al. examined 10 different submissions for FDA medical device approval and identified 9 traceability problems that affected regulators' ability to evaluate product safety. Traceability data is trusted even less as the system evolves over time. The problem of trace link maintenance grows even larger now that the trend is towards the adoption of agile methods. For all these reasons, there is a need for more automated trace link maintenance, which can also facilitate continual evaluation of the system's safety.

Assurance Case
A means to structure the reasoning that the system is safe. To automatically generate a new assurance case after a change:
1. Use documented artefacts and the trace links between them to automatically construct candidate assurance cases.
2. Detect the initial impact point and propagate the change.
3. Automatically generate new trace links after the change and then use them to construct the new assurance case.
4. Decide whether the new argument is still convincing for the safety of the system.
To start, we focused on automatically generating new trace links after a change is introduced.
Now imagine that we have a safety case. A safety case provides a means to structure the reasoning that the system works as expected; the argument could even be a fault tree. Let's assume we have such an argument, which reasons that a system is safe, and then over time a change happens in the system. This change can be anything: adding a new requirement, modifying a constraint value, adding a new class to the source code, promoting a method to a class, or even introducing a new artefact type. Depending on the extent of the change, we might need to modify some parts of the existing argument, reconstruct the whole argument reusing the trace information we already had, or add a new argument to evaluate the safety of the system after the change. In one sentence: we need to analyze the impact of the change on the existing safety case and reconstruct a convincing argument that the new system is still safe. What if this could happen automatically? We have a safety case, a change happens, and here is the new safety case. Cool, huh? To do that, we need to: first, automate the construction of safety cases using the documented artefacts of the system and the trace links between them; second, detect the initial impact point in the system and propagate the change; third, automatically update the trace links after the change and use the new trace links to construct a new safety model; and finally, decide whether the new argument is still convincing for the safety of the system after the change.
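The talk does not say how a change is propagated from the initial impact point. As a minimal sketch only, assuming trace links are stored as pairs of artifact identifiers and that a change can travel along a link in either direction, propagation could be approximated by plain reachability. The function name `propagate_change` and all artifact identifiers below are hypothetical.

```python
from collections import deque


def propagate_change(trace_links, impact_point):
    """Return every artifact reachable from impact_point via trace links.

    Assumption (not from the talk): links are (source, target) pairs and
    impact is propagated in both directions along each link.
    """
    neighbours = {}
    for source, target in trace_links:
        neighbours.setdefault(source, set()).add(target)
        neighbours.setdefault(target, set()).add(source)

    affected = {impact_point}
    queue = deque([impact_point])
    while queue:
        artifact = queue.popleft()
        for nxt in neighbours.get(artifact, ()):
            if nxt not in affected:
                affected.add(nxt)
                queue.append(nxt)
    return affected


# Hypothetical artifacts: one hazard, two requirements, two classes.
links = [
    ("hazard:H1", "req:R1"),
    ("req:R1", "class:Pump"),
    ("req:R2", "class:Alarm"),
]
affected = propagate_change(links, "class:Pump")
```

With these made-up links, a change to `class:Pump` reaches `req:R1` and `hazard:H1` but not the unrelated `req:R2` and `class:Alarm`. A real impact analysis would likely be more selective than full reachability.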

Calculus of Change
Constant debridement of stale links as change happens. Trace slice vs. trace matrix. Continually having fresh links at all times.
In other words, we divide the big problem into a smaller problem, which we call "constant debridement of stale links". In medicine, "debridement" refers to removing dead tissue; in our problem, it refers to an engine that moves a trace link to a pool of untrusted links as soon as the link becomes suspect due to a change. This pool of untrusted trace link evidence can later be used to construct trusted trace links. So the general idea is a clear separation of trusted and untrusted trace links. As you see on the left side of the image, in this work we propose the use of trace slices instead of traditional trace matrices. These trace slices can be easily generated from a set of matrices, and we believe these slice-sized chunks provide clearer information for certifiers and fit better with the notion of assurance cases. So in one sentence, the idea is to constantly have fresh links through a calculus of change: a dynamic environment that continually separates trusted trace links from untrusted ones and moves suspicious links, together with their supporting trace information, to a pool.
Jane Cleland-Huang, Mona Rahimi, Patrick Mader: Achieving lightweight trustworthy traceability. SIGSOFT FSE 2014
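The separation of trusted and untrusted links described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' engine: links are modelled as (requirement, class) pairs, a link is treated as suspect whenever either end changes, and the class name `TraceStore` and its methods are hypothetical.

```python
class TraceStore:
    """Keeps trusted trace links apart from a pool of untrusted ones."""

    def __init__(self, links):
        self.trusted = set(links)  # links currently believed to be valid
        self.untrusted = set()     # suspect links, retained as evidence

    def debride(self, changed_artifact):
        """Move every trusted link touching the changed artifact to the pool.

        Assumption: any link with the changed artifact at either end
        becomes suspect. The links are moved, not discarded, so they can
        later serve as evidence when constructing trusted links.
        """
        suspect = {link for link in self.trusted if changed_artifact in link}
        self.trusted -= suspect
        self.untrusted |= suspect
        return suspect


# Hypothetical requirement and class names.
store = TraceStore({("R1", "Pump"), ("R1", "Alarm"), ("R2", "Alarm")})
suspect = store.debride("Alarm")
```

After the call, only `("R1", "Pump")` remains trusted, and the two links involving `Alarm` sit in the untrusted pool.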

Trace Link Evolver
Trace Link Evolver (TLE): a solution to automatically evolve requirements-to-code trace links. It detects change scenarios using 3 open source tools (srcML, java-callgraph and Refactoring Crawler), information retrieval techniques and our own scripts written in Java. Each change scenario corresponds to a set of trace link evolution heuristics.
In order to achieve our long-term goal, a calculus of change across all structured artifacts, we start by presenting TLE (Trace Link Evolver). TLE automatically evolves trace links between requirements and source code as changes are introduced to the system. To start, we focused on source code because developers frequently refactor and change code without updating the links. As you can see in the image, we take an initial set of requirements, source code and trace links, as well as modified versions of the requirements and source code. We then identify the change scenarios that have occurred between the two versions. We use 3 open source tools (srcML, java-callgraph and Refactoring Crawler), information retrieval techniques, plus our own heuristics written in Java to identify the change scenarios. Each of these refactoring or change scenarios is then associated with a corresponding set of trace link evolution heuristics, which determines exactly which links have to be created and which have to be decayed.
Mona Rahimi, William Goss, Jane Cleland-Huang: Evolving requirements-to-code trace links across versions of a software system. FSE 2015: under revision

Detecting Change Scenarios
Understand the change scenario to evolve trace links accurately. 6 different scenarios for a newly added class in the new version. Similar scenarios for a deleted class and for added, deleted or modified methods.
We need to understand the rules of change, or change scenarios, so that we can debride trace links accurately. For example, just knowing that a new class has been added is not enough to evolve trace links accurately. We need to know the exact scenario behind each change. Here you see 6 different scenarios that can apply to a newly added class. Part (a) represents the case where the newly added class Ci+1 has been added because new functionality is introduced. In part (b) it has been extracted from an old class in the older version. In part (c) the newly added class is the result of merging two old classes. In part (d) it is a promoted old method. In parts (e) and (f) it is an extracted subclass or superclass. Depending on which of these change scenarios is the case, the heuristics to evolve the trace links vary. So after TLE has identified the exact change scenario, it automatically applies the corresponding set of trace link evolution heuristics and evolves the links. We also have a similar set of change scenarios for deleted classes and for added, deleted or modified methods.

Change Scenarios
Focused only on those refactorings that would be likely to trigger changes to trace links and are more familiar to programmers. Focused on detecting these 24 scenarios.
Although its primary purpose is to help developers improve the structure of the code, refactoring is also a good vocabulary for describing the changes that might happen in source code. Among the more than 70 refactorings Martin Fowler lists in his popular book, we focused only on identifying those which would be likely to trigger changes to trace links and are also more familiar to programmers. We came down to identifying the 24 change scenarios in the source code that you can see in this table, including new functionality, obsolete functionality, merged methods, rename class, etc.
Mona Rahimi, William Goss, Jane Cleland-Huang: Evolving requirements-to-code trace links across versions of a software system. FSE 2015: under review

Heuristics to Determine Change Scenarios
Scenario 1: new class added for new functionality. Rule 10, Rule 40. Scenario 3: new class is the merging result of two old classes. Rule 7, Rule 15, Rule 23, Rule 29, Rule 46.
OK, how do we identify these 24 change scenarios in the source code? We defined a total of 49 artifact properties to be assessed. These artifact properties are categorized into 6 groups:
Group 1: examines lower-level changes, such as whether a class has been added or deleted or a method has been modified.
Group 2: checks for a link or similarity between classes.
Group 3: checks for a link or similarity between classes and requirements.
Group 4: checks for methods in the classes.
Group 5: checks for associations between classes.
Group 6: checks for the existence of classes, methods and requirements.
Each of the 24 change scenarios is then defined through a subset of these 49 properties. For example, the first 6 scenarios are the 6 scenarios for a newly added class that I discussed earlier in my slides. For all 6, the first property is true: there exists a class in the new version i+1 which did not exist in the older version i. We then want to identify which of these 6 scenarios applies to the newly added class. For example, to assess scenario 1, where the class has been added because of new functionality, rule 10 and rule 40 need to hold. Rule 10 says that if the class is added due to new functionality, there should be a similarity between the newly added class and the newly added requirements. Rule 40 says there should not exist any old class which is similar to the new class. If these rules hold, then we say scenario 1 is recognized as the scenario behind the change.
Another example: to assess scenario 3, where the newly added class is the result of merging two old classes, rules 7, 15, 23, 29 and 46 need to hold. Rule 7 states that two old classes should exist which are similar to the newly added class. Property 15 states that the requirements similar to the newly added class are a subset of the union of the requirements of the two old classes. Property 23 states that there should exist two old classes whose combined methods are a superset of all methods in the new class. Property 29 states that the classes associated with the new class must be a subset of all associations of the two old classes. Finally, property 46 states that the two identified old classes no longer exist in the new version. If all properties are found to be true, then change scenario S3 is recognized and we update the trace links according to our trace link evolution heuristics, which I will explain in the next slide.
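The idea that each scenario is defined by a subset of properties that must all hold can be sketched as a simple lookup. The property numbers for scenarios 1 and 3 are taken from the talk; everything else is an assumption for illustration: the dictionary layout, the function name `detect_scenarios`, and the premise that each property has already been evaluated to a boolean for one newly added class. How the properties themselves are computed is not shown.

```python
# Required properties per scenario, as stated in the talk: property 1
# (the class is new in version i+1) plus the scenario-specific rules.
SCENARIOS = {
    "S1: new class adds new functionality": {1, 10, 40},
    "S3: new class merges two old classes": {1, 7, 15, 23, 29, 46},
}


def detect_scenarios(property_values):
    """Return the scenarios whose required properties are all True.

    property_values maps a property number to a bool for one newly added
    class; properties missing from the mapping are treated as False
    (an assumption of this sketch).
    """
    return [
        name
        for name, required in SCENARIOS.items()
        if all(property_values.get(number, False) for number in required)
    ]


# Hypothetical evaluation result for one new class.
observed = {1: True, 7: True, 15: True, 23: True, 29: True, 46: True,
            10: False, 40: False}
detected = detect_scenarios(observed)
```

Here only scenario 3 is recognized, because rules 10 and 40 do not hold for the example class.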

Link Evolution Heuristics
Scenario 1 is detected: create trace links from the new class to the newly added requirements which are similar to it. Scenario 3 is detected: create trace links to the requirements already linked to the two old merged classes, and debride (decay) the links to the two old classes.
Finally, we define link evolution heuristics for all 24 change scenarios we detect in the source code. These heuristics specify exactly which links should be created and deleted when a specific scenario is detected. Take the same scenario 1 and scenario 3 I mentioned earlier. If scenario 1 is detected, which was a new class due to new functionality, then we create trace links between the newly added class and the newly added requirements which are similar to it. When I speak of similarity, I am talking about the vector space model (VSM), the algebraic model used in information retrieval. It represents text documents as vectors of identifiers, such as index terms, and calculates the similarity of two vectors from the cosine of the angle between them. If scenario 3 is detected, where the newly added class is the result of merging two old classes, then we create trace links between the newly added class and the requirements already linked to the two old merged classes, and we debride, or decay, the links to the two old classes.
Mona Rahimi, William Goss, Jane Cleland-Huang: Evolving requirements-to-code trace links across versions of a software system. FSE 2015: under revision
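To make the two heuristics and the similarity measure concrete, here is a small sketch. It is not TLE's implementation. It assumes raw term-frequency vectors over whitespace-separated tokens (no stemming and no tf-idf weighting, which a real VSM setup would typically add), links stored as (requirement, class) pairs, and a hypothetical similarity threshold of 0.3. All function and artifact names are made up.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between raw term-frequency vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def evolve_scenario_1(links, new_class, class_text, new_requirements,
                      threshold=0.3):
    """Scenario 1: link the new class to similar newly added requirements.

    new_requirements maps requirement id -> requirement text.
    The threshold value is an assumption of this sketch.
    """
    created = {
        (req, new_class)
        for req, text in new_requirements.items()
        if cosine_similarity(text, class_text) >= threshold
    }
    return links | created


def evolve_scenario_3(links, new_class, old_a, old_b):
    """Scenario 3: the new class inherits the requirements of the two
    merged classes, and the links to the old classes are removed."""
    stale = {(req, cls) for req, cls in links if cls in (old_a, old_b)}
    inherited = {(req, new_class) for req, _ in stale}
    return (links - stale) | inherited


merged = evolve_scenario_3({("R1", "A"), ("R2", "B"), ("R3", "C")}, "M", "A", "B")
added = evolve_scenario_1(
    set(), "ReportExporter", "pdf report exporter export",
    {"R9": "export report as pdf", "R10": "login with password"},
)
```

In the merge example, `R1` and `R2` end up linked to the new class `M` while `R3` keeps its link to `C`. In the new-functionality example, only `R9` shares terms with the class text, so only that link is created.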

Evaluation (Controlled Experiment)
Datasets: 11 revisions of two Java applications. Domain Analysis App: 237 LOC. DOTS File Generator: 446 LOC. Baseline: VSM, LDA or LSI. Metrics: recall, precision and F2. Generated links with VSM: 0.69, 0.35 and 0.55. Evolved links with TLE: 0.96, 0.92 and 0.95.
We evaluated TLE by conducting a controlled experiment and also by using it across two versions of an open-source project. In the controlled experiment, our first evaluation, we used Trace Link Evolver (TLE) to evolve trace links for 11 revisions of two Java applications. The first application, or dataset (Domain Analysis Application), has 237 lines of code. It reads a set of domain-related files and a set of general documents, performs natural language part-of-speech tagging using Qtag and then outputs a list of domain-specific noun phrases. The second dataset (DOTS File Generator) has 446 lines of code. It reads a set of topics generated by LDA, uses the vector space model (VSM) to compute similarities between topics and generates a graph structure with GraphViz. We recruited 11 Java developers and asked them to spend 90 minutes modifying either of the applications. Then we used TLE to evolve trace links for the revised version and compared the results to the baseline. Our baseline was the state of practice, which is to use information retrieval techniques such as VSM, LDA or LSI. We selected VSM because it has been shown to perform better across a broad spectrum of datasets. The metrics we used were recall, precision and the F2 measure, and you see the average recall of 0.69 for the baseline and 0.96 for TLE, the average precision of …
Mona Rahimi, William Goss, Jane Cleland-Huang: Evolving requirements-to-code trace links across versions of a software system. FSE 2015: under review
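For readers unfamiliar with the F2 measure used above, it is the standard F-beta score with beta = 2, which weights recall more heavily than precision. The sketch below uses the usual textbook definition; the function name `f_beta` is hypothetical, and the talk does not say how its averages were aggregated.

```python
def f_beta(precision, recall, beta=2.0):
    """Standard F-beta score; beta=2 gives the F2 measure."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when nothing was retrieved
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Average precision and recall reported for the links evolved with TLE.
tle_f2 = f_beta(precision=0.92, recall=0.96)
```

Plugging in the reported TLE averages gives roughly 0.95, in line with the slide. Because the reported figures are averages over revisions, recomputing F2 from an averaged precision and recall need not match an averaged F2 exactly.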

Evaluation (Open Source Project)
To evaluate TLE at a larger scale, we selected Apache Cassandra, a Java-based open source distributed database management system designed to handle large amounts of data. We selected two consecutive versions of Cassandra, 1.0 and 1.1, which include 486 and 536 classes. We selected 24 classes in the older version which had significant changes. We then associated these 24 classes with features of Cassandra using the system's documentation. As a control, we also selected 5 random classes with no significant changes between the two versions. We used TLE to evolve the links to the features for the newer version. As you see in the table, the first two columns show the original classes and the associated features, and the 3rd column shows the new or modified classes in the new version. The 4th column shows the description of the change, the 5th column shows the change scenario detected by TLE, and the 6th column shows the link action associated with the detected scenario. For example, the first row says class cfoutputformat is responsible for loading data into Cassandra from a Hadoop job, and TLE tells us the newly added class bulkoutputformat has been extracted from the old class cfoutputformat, which is scenario 2. It then creates a new link from the original feature to this new class. To evaluate the result, we searched the Cassandra commit logs and release notes and found the exact relationship between the two classes and the explanation of the change. In this case, the reason for the extracted class was to improve the throughput of big jobs. And as you can see, every single trace link evolution was confirmed as correct.

Conclusion and Future Work
The ultimate goal is to automatically generate safety cases after a change is introduced. To achieve this, we propose a calculus of change to update trace links continually. To start, we implemented TLE to evolve trace links between source code and requirements after a change happens. We evaluated TLE through a controlled experiment and also across two versions of Apache Cassandra. Future work: using evolutionary algorithms to detect scenarios of change. TLE over multiple versions of a software system. TLE against other programming languages.
To ultimately be able to generate safety cases automatically for safety-critical systems after any change, we proposed a calculus of change which automatically debrides decayed trace links and creates new ones after the change. As a proof of concept, we implemented the Trace Link Evolver, or TLE, for evolving traces between source code and requirements when a change happens. We illustrated the effectiveness of TLE in a controlled experiment and also against the Apache Cassandra database management system. There are several directions for future work. One is that, as the distance between versions of a software system grows, it gets difficult to determine the path of refactoring, so we want to investigate the use of evolutionary algorithms for detecting the change scenarios. We also want to apply TLE over multiple versions of a software system and to evaluate it against other programming languages.
