Dependency Tracking in Software Systems
Presented by: Ashgan Fararooy
Related Papers
– Supporting Software Evolution Analysis with Historical Dependencies and Defect Information (ICSM 2008)
– A Flexible Framework to Support Collaborative Software Evolution Analysis (CSMR 2008)
– Mining Software Repositories for Traceability Links (ICPC 2007)
– Tracking Objects to Detect Feature Dependencies (ICPC 2007)
– Software Repositories: A Source for Traceability Links (TEFSE-GTC 2007)
– Mining Version Archives for Co-changed Lines (ICSE 2006)
– Understanding Semantic Impact of Source Code Changes: an Empirical Study
Mining Version Archives for Co-changed Lines
Thomas Zimmermann, Sunghun Kim, Andreas Zeller, E. James Whitehead Jr.
(ICSE 2006)
Abstract
– Research on co-change has frequently investigated files, classes, or methods
– This paper presents a first study of co-change at the level of lines
– It introduces the annotation graph, which captures how lines evolve over time
– The result is more fine-grained software evolution information, based on lines
Overview
– Co-change: items that are changed together are related to each other
– Applies at any granularity: modules, files, classes, methods
– What about more fine-grained items, such as blocks and lines?
Co-Change in More Fine-Grained Items
– Seemed infeasible
– Lines are hard to identify across different versions
– Line numbers are not suitable identifiers
– The annotation feature of SCM systems is not enough
– Line content is not a good identifier either
Annotation Graph
Definition:
– A multipartite graph in which each part corresponds to one version of a file
– Within each part/version, every line is represented by a single node
– An edge between two nodes indicates that one line originates from the other, by modification or movement
– Node labels (e.g. a bold node) indicate a changed line
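The definition above can be pictured as a small data structure. The following is only a minimal sketch under assumptions: the names `LineNode`, `AnnotationGraph`, `add_edge`, `mark_changed`, `origins`, and `is_changed` are hypothetical and do not come from the paper, and edges are stored from a node back to the nodes it originates from.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LineNode:
    """One line of one version of a file (hypothetical representation)."""
    revision: int  # index of the part/version the node belongs to
    line_no: int   # position of the line within that version


@dataclass
class AnnotationGraph:
    # Maps a node to the set of nodes in the previous version it originates from.
    predecessors: dict = field(default_factory=dict)
    # Nodes carrying the "changed" label (the bold nodes on the slide).
    changed: set = field(default_factory=set)

    def add_edge(self, older: LineNode, newer: LineNode) -> None:
        """Record that `newer` originates from `older`."""
        # Assumption of this sketch: edges only connect adjacent versions.
        if newer.revision != older.revision + 1:
            raise ValueError("edges connect adjacent versions only")
        self.predecessors.setdefault(newer, set()).add(older)

    def mark_changed(self, node: LineNode) -> None:
        self.changed.add(node)

    def origins(self, node: LineNode) -> set:
        return set(self.predecessors.get(node, set()))

    def is_changed(self, node: LineNode) -> bool:
        return node in self.changed
```

For example, a line that was line 3 in version 1 and became line 4 in version 2 after a modification would be two nodes joined by one edge, with the later node labeled as changed.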
Annotation Graph
Construction:
– Compare each pair of subsequent revisions of a file
– Use the GNU diff tool to compute the textual differences
– The diff tool returns a list of regions (“hunks”) that differ between the two files
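The hunk list can be illustrated without leaving Python. Note that this sketch swaps in the standard-library `difflib.SequenceMatcher` for GNU diff, so the exact hunk boundaries may differ from what GNU diff reports; the function name `diff_hunks` is hypothetical.

```python
import difflib


def diff_hunks(old_lines, new_lines):
    """Return the regions that differ between two revisions.

    Each hunk is (tag, i1, i2, j1, j2): old_lines[i1:i2] corresponds to
    new_lines[j1:j2], and tag is 'replace', 'delete', or 'insert'.
    """
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    # 'equal' regions are unchanged, so they are not hunks.
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


old = ["a", "b", "c"]
new = ["a", "B", "c", "d"]
hunks = diff_hunks(old, new)
# One modified region (line index 1) and one inserted region (line index 3).
```

Here `replace` plays the role of a modification, `insert` of an addition, and `delete` of a deletion.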
Annotation Graph
Three different kinds of changes:
– Modifications
    – Result in a complete bipartite subgraph between the old and the new lines of the hunk
– Additions
    – Do not result in any edges
    – The positions of the following lines are updated
– Deletions
    – Have the same effect as additions
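The effect of each kind of change on the edge set can be sketched as follows. The function name `hunk_edges` and the string labels for the change kinds are assumptions made for illustration only.

```python
from itertools import product


def hunk_edges(kind, old_line_numbers, new_line_numbers):
    """Edges a single hunk contributes, as (old_line, new_line) pairs."""
    if kind == "modification":
        # Complete bipartite subgraph: every old line of the hunk is
        # connected to every new line of the hunk.
        return set(product(old_line_numbers, new_line_numbers))
    if kind in ("addition", "deletion"):
        # Added and deleted lines get no edges; only the positions of
        # the lines that follow the hunk shift.
        return set()
    raise ValueError(f"unknown kind of change: {kind}")


# A modification that turns old lines 2-3 into new lines 2-4
# contributes 2 * 3 = 6 edges.
edges = hunk_edges("modification", range(2, 4), range(2, 5))
```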
Annotation Graph
Computation:
– Creates nodes for each revision and each line
– Two approaches
    1. Forward-directed
    2. Backward-directed
Annotation Graph
Computation (Forward-Directed Algorithm):
– Iterate over all pairs of subsequent revisions
– For each pair, compute the differences (hunks)
– Process the hunks to create edges
    – Exactly one edge between the two nodes of each unchanged line
    – All possible edges for modified lines
    – No edges for inserted and deleted lines
– Label as changed the nodes of the later revision that belong to modifications and additions
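The steps above can be sketched end to end. This is an illustration under assumptions, not the authors' implementation: it uses `difflib.SequenceMatcher` in place of GNU diff, represents a node as a `(revision_index, line_index)` tuple, and the name `build_annotation_graph` is hypothetical.

```python
import difflib


def build_annotation_graph(revisions):
    """Forward-directed construction over a list of revisions.

    `revisions` is a list of revisions, each a list of line strings.
    Returns (edges, changed): edges are (older_node, newer_node) pairs
    and changed is the set of labeled nodes.
    """
    edges, changed = set(), set()
    for rev in range(len(revisions) - 1):
        old, new = revisions[rev], revisions[rev + 1]
        matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines: exactly one edge per line.
                for offset in range(i2 - i1):
                    edges.add(((rev, i1 + offset), (rev + 1, j1 + offset)))
            elif tag == "replace":
                # Modified lines: all possible edges within the hunk.
                for i in range(i1, i2):
                    for j in range(j1, j2):
                        edges.add(((rev, i), (rev + 1, j)))
                changed.update((rev + 1, j) for j in range(j1, j2))
            elif tag == "insert":
                # Added lines: no edges, but the new nodes are labeled.
                changed.update((rev + 1, j) for j in range(j1, j2))
            # Deleted lines: no edges and nothing to label in the later revision.
    return edges, changed


edges, changed = build_annotation_graph([["a", "b", "c"], ["a", "B", "c", "d"]])
```

In this small run, the unchanged lines `a` and `c` each get one edge, the modified line gets one edge (a 1-by-1 bipartite subgraph) plus a label, and the added line `d` gets only a label.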
Annotation Graph
Problem:
– Some changes modify large parts of a file
– Such changes result in a large number of edges
– This is not reasonable for evolution analysis
Annotation Graph
– Treat large modifications as combined deletions and additions
– No edges are created in the annotation graph for them
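A small sketch shows why this helps: a modification hunk of m old lines and n new lines contributes m × n edges, while a deletion plus an addition contributes none. The slides do not state here how a modification is recognized as large, so the size cutoff `max_lines` below is purely a placeholder assumption, as are the function names.

```python
def classify_hunk(old_len, new_len, max_lines=4):
    """Decide how a hunk is treated (max_lines is a hypothetical cutoff)."""
    if old_len == 0 and new_len == 0:
        raise ValueError("empty hunk")
    if old_len == 0:
        return ["addition"]
    if new_len == 0:
        return ["deletion"]
    if max(old_len, new_len) > max_lines:
        # Large modification: handled as a deletion followed by an addition.
        return ["deletion", "addition"]
    return ["modification"]


def edge_count(old_len, new_len, max_lines=4):
    """Number of edges the hunk adds to the annotation graph."""
    if classify_hunk(old_len, new_len, max_lines) == ["modification"]:
        return old_len * new_len  # complete bipartite subgraph
    return 0


small = edge_count(2, 3)      # kept as a modification
large = edge_count(200, 300)  # would be 200 * 300 = 60000 edges if kept
```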
Annotation Graph Recognizing Large Modifications :
Annotating Lines
Comparison:
– Most SCM systems have an annotation feature that provides, for each line, information on its latest change
– Annotation graphs can be used to obtain the same information
– Furthermore, they provide information on all past changes
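One way to read all past changes of a line off an annotation graph is to walk its edges backwards and collect the revisions of labeled nodes. The sketch below assumes nodes are `(revision, line)` tuples and edges are `(older_node, newer_node)` pairs; this representation and the name `line_history` are assumptions for illustration.

```python
from collections import defaultdict


def line_history(edges, changed, node):
    """Sorted revisions in which `node` or any of its ancestors was changed."""
    predecessors = defaultdict(set)
    for older, newer in edges:
        predecessors[newer].add(older)

    revisions, seen, stack = set(), set(), [node]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in changed:
            revisions.add(current[0])  # first tuple element is the revision
        stack.extend(predecessors[current])
    return sorted(revisions)


# Line 0 was modified in revisions 1 and 2 and left untouched in revision 3.
edges = {((0, 0), (1, 0)), ((1, 0), (2, 0)), ((2, 0), (3, 0))}
changed = {(1, 0), (2, 0)}
history = line_history(edges, changed, (3, 0))
```

The last entry of the history corresponds to what a per-line SCM annotation would report; the earlier entries are the additional information the graph provides.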
Life Cycle of Lines
Investigated the life cycle of lines for the Eclipse project:
– How frequently are lines changed?
    – Computed the change count for each line
    – The change count is the number of distinct revisions in the line's annotation
– How many developers change a line?
– What are the most frequently changed lines?
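These three questions reduce to simple counting once each line's annotation is known. In the sketch below, the input shapes (`annotations` mapping a line to the revisions that changed it, `authors` mapping a revision to its developer), the function names, and the sample data are all made up for illustration.

```python
def line_statistics(annotations, authors):
    """Map each line to (change_count, developer_count)."""
    stats = {}
    for line, revisions in annotations.items():
        distinct = set(revisions)
        # Change count: number of distinct revisions in the annotation.
        change_count = len(distinct)
        developer_count = len({authors[rev] for rev in distinct})
        stats[line] = (change_count, developer_count)
    return stats


def most_changed(stats, n):
    """The n lines with the highest change count (ties broken by line id)."""
    return sorted(stats, key=lambda line: (-stats[line][0], line))[:n]


# Hypothetical sample data.
annotations = {"A.java:10": [1, 2, 2, 5], "A.java:11": [1], "B.java:7": [3, 5]}
authors = {1: "alice", 2: "bob", 3: "alice", 5: "alice"}
stats = line_statistics(annotations, authors)
top = most_changed(stats, 2)
```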
Finding Related Lines
– Computed related lines using frequent pattern mining
– Used transaction ids instead of revision ids
– Used the Apriori algorithm
– Inferred useful association rules
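The mining step can be sketched with a compact, unoptimized Apriori over transactions, where each transaction is the set of lines changed together. The function names, the thresholds, and the sample line identifiers are assumptions; rules are restricted to a single consequent to keep the example short.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = {item for t in transactions for item in t}
    level = {frozenset([item]) for item in items}
    level = {s for s in level if support(s) >= min_support}
    frequent, k = {}, 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        level = {c for c in candidates if support(c) >= min_support}
    return frequent


def association_rules(frequent, min_confidence):
    """Rules (antecedent, consequent, confidence) with a single consequent."""
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue
        for item in itemset:
            antecedent = itemset - {item}
            confidence = count / frequent[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, item, confidence))
    return rules


# Hypothetical transactions: sets of lines changed together.
transactions = [
    {"A.java:10", "A.java:42"},
    {"A.java:10", "A.java:42", "B.java:7"},
    {"A.java:10"},
]
frequent = apriori(transactions, min_support=2)
rules = association_rules(frequent, min_confidence=0.9)
```

In this toy data, every transaction that changes `A.java:42` also changes `A.java:10`, so the only rule that passes the confidence threshold is "changed `A.java:42`, therefore also changed `A.java:10`".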
Thank you