University of Waterloo Four “interesting” ways in which history can teach us about software Michael W. Godfrey * Xinyi Dong Cory Kapser Lijie Zou Software.

Presentation transcript:

University of Waterloo Four “interesting” ways in which history can teach us about software Michael W. Godfrey * Xinyi Dong Cory Kapser Lijie Zou Software Architecture Group (SWAG) University of Waterloo *Currently on sabbatical at Sun Microsystems

1. Longitudinal case studies of growth and evolution
Studied several OSSs, esp. Linux kernel:
– Looked for “evolutionary narratives” to explain observable historical phenomena
Methodology:
– Analyze individual tarball versions
– Build hierarchical metrics data model
– Generate graphs, look for interesting lumps under the carpet, try to answer why

1. Longitudinal case studies of growth and evolution
[Diagram: Source code, Metrics data, Analysis scripts, MS Excel; stages: Extraction / analysis, Exploration]
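The "hierarchical metrics data model" step can be illustrated with a minimal Python sketch. It assumes each version has already been unpacked and reduced to a mapping from file path to lines of code; the function names `rollup` and `growth` and the sample paths are hypothetical, and this is not the authors' analysis scripts.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def rollup(file_loc):
    """Aggregate per-file LOC up every ancestor directory ('.' is the root)."""
    totals = defaultdict(int)
    for path, loc in file_loc.items():
        # parents of "drivers/net/eth.c" are "drivers/net", "drivers", "."
        for parent in PurePosixPath(path).parents:
            totals[str(parent)] += loc
    return dict(totals)


def growth(versions):
    """Return [(version, total LOC, change from previous version)] in order."""
    rows, prev = [], None
    for name, file_loc in versions:
        total = sum(file_loc.values())
        rows.append((name, total, None if prev is None else total - prev))
        prev = total
    return rows


# Hypothetical data standing in for two unpacked tarball versions.
v1 = {"kernel/sched.c": 100, "drivers/net/eth.c": 200}
v2 = {"kernel/sched.c": 120, "drivers/net/eth.c": 200, "drivers/net/wifi.c": 300}
```

With this data, `growth([("v1", v1), ("v2", v2)])` reports totals of 300 and 620, and `rollup(v2)` shows that the `drivers` subtree accounts for 500 of the 620 lines. The per-directory rollup is the kind of view one might graph when hunting for "lumps under the carpet".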

2. Case studies of origin analysis
Reasoning about structural change
– (moving, renaming, merging, splitting, etc.)
– Try to reconstruct what happened
– Formalized several “change patterns”, e.g., service consolidation
Methodology:
– Consider consecutive pairs of versions:
    Entity analysis – metrics-based clone detection
    Relationship analysis – compare relational images (calls, called-by, uses, extends, etc.)
– Create evolutionary record of what happened:
    what evolved from what, and how/why
[Diagram: g with y, x, z in V_old; f with y, x, z in V_new; “???”]

2. Case studies of origin analysis
[Diagram: Source code, ER model, Metrics data, cppx / Understand / Beagle; stages: Extraction / analysis, Exploration]
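The two analyses named in the methodology (entity analysis on metrics, relationship analysis on relational images) can be sketched roughly as follows. This is an illustration under assumptions, not the Beagle implementation: the entity dictionaries, the Euclidean distance, the Jaccard overlap, and the ranking rule are all choices made for this example.

```python
import math


def metric_distance(a, b):
    """Euclidean distance between two equal-length metric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def jaccard(s, t):
    """Overlap of two relation sets; defined as 1.0 when both are empty."""
    union = s | t
    return len(s & t) / len(union) if union else 1.0


def rank_origins(new_entity, old_entities):
    """Rank vanished old entities as possible origins of a new entity.

    Each entity is a dict with 'name', 'metrics' (tuple of numbers),
    'calls' and 'called_by' (sets of entity names).
    Returns (name, relation overlap, metric distance) tuples, best first.
    """
    scored = []
    for old in old_entities:
        rel = (jaccard(new_entity["calls"], old["calls"])
               + jaccard(new_entity["called_by"], old["called_by"])) / 2
        dist = metric_distance(new_entity["metrics"], old["metrics"])
        scored.append((old["name"], rel, dist))
    # Most shared relations first; ties broken by closest metrics.
    return sorted(scored, key=lambda row: (-row[1], row[2]))


# Hypothetical entities: f appears in V_new, g and h vanished from V_old.
f = {"name": "f", "metrics": (10, 2), "calls": {"x", "y"}, "called_by": {"z"}}
g = {"name": "g", "metrics": (11, 2), "calls": {"x", "y"}, "called_by": {"z"}}
h = {"name": "h", "metrics": (10, 2), "calls": {"q"}, "called_by": {"z"}}
```

Here `rank_origins(f, [h, g])` places `g` first because it shares all of `f`'s callers and callees, even though `h` has identical metrics. Ranking relations ahead of metrics is only one possible design; a real tool would likely let the analyst weigh the two kinds of evidence.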

3. Case studies of code cloning
Motivation:
– Lots of research in clone detection, but more on algorithms and tools than on case studies and comprehension
    What kinds of cloning are there? Why does cloning happen?
    What kinds are the most/least harmful?
    Do different clone kinds have different precision / recall numbers? Different algorithms?
– Future work: track clone evolution
    Do related bugs get fixed? Does cloned code have more bugs?
Methodology:
1. Use CCFinder on source to find initial clone pairs.
2. Use ctags to map out source files into “entity regions”
    – Consecutive typedefs, fcn prototypes, var defs
    – Individual macros, structs, unions, enums, fcn defs
3. Map (abstract up) clone pairs to the source code regions

3. Case studies of code cloning
Methodology:
4. Filter different region kinds according to observed heuristics
    – C structs often look alike; without these filters, parameterized string matching returns many more false positives between structs than, say, between functions.
5. Sort clones by location:
    – Same region, same file, same directory, or different directory
6. … and entity kind:
    – Fcn to fcn
    – structures (enum, union, struct)
    – macro
    – heterogeneous (different region kinds)
    – misc. clones
7. … and even more detailed criteria:
    – Function initialization / finalization clones, …
8. Navigate and investigate using CICS gui, look for patterns
    – Cross-subsystem clones seem to vary more over time
    – Intra-subsystem clones are usually function clones

3. Case studies of code cloning
[Diagram: Source code, CCFinder, ctags, Custom filters and sorter, Taxonomized clone pairs, CICS gui; stages: Extraction / analysis, Exploration]
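Steps 3, 5 and 6 of the methodology (mapping clone pairs to entity regions, then sorting by location and entity kind) can be sketched as below. The region tuples are assumed to have been extracted already; the tuple layout, the kind labels, and the function names are hypothetical and do not reflect the actual output format of ctags or CCFinder.

```python
import posixpath

STRUCTURE_KINDS = {"enum", "union", "struct"}


def find_region(regions, file, line):
    """Return the (file, start, end, kind) region containing file:line, or None."""
    for region in regions:
        r_file, start, end, _kind = region
        if r_file == file and start <= line <= end:
            return region
    return None


def location_class(a, b):
    """Classify a clone pair by where its two regions live."""
    if a == b:
        return "same region"
    if a[0] == b[0]:
        return "same file"
    if posixpath.dirname(a[0]) == posixpath.dirname(b[0]):
        return "same directory"
    return "different directory"


def kind_class(a, b):
    """Classify a clone pair by the entity kinds of its two regions."""
    ka, kb = a[3], b[3]
    if ka == kb == "function":
        return "fcn to fcn"
    if ka in STRUCTURE_KINDS and kb in STRUCTURE_KINDS:
        return "structures"
    if ka == kb == "macro":
        return "macro"
    if ka != kb:
        return "heterogeneous"
    return "misc"


# Hypothetical entity regions: (file, first line, last line, kind).
regions = [
    ("fs/a.c", 1, 20, "function"),
    ("fs/a.c", 30, 40, "struct"),
    ("fs/b.c", 5, 25, "function"),
    ("net/c.c", 1, 9, "function"),
]
```

A clone pair reported at `fs/a.c:10` and `fs/b.c:7` maps to two function regions in the same directory, so it would be filed as a "same directory", "fcn to fcn" clone. Treating any mix of enum, union and struct as "structures" before checking for heterogeneous pairs is an interpretation of the slide's categories, not a documented rule.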

4. Longitudinal case studies of software manufacturing-related artifacts
Q: How much maintenance effort is put into SM artifacts, relative to the system as a whole?
Studying six OSSs:
– GCC, PostgreSQL, kepler, ant, mycore, midworld
All used CVS; we examined their logs
We looked for SM artifacts (Makefile, build.xml, SConscript) and compared them to non-SM artifacts

4. Longitudinal case studies of software manufacturing-related artifacts
Some results:
– Between 58 and 81% of the core developers contributed changes to SM artifacts
– SM artifacts were responsible for 3-10% of the number of changes made
    Up to 20% of the total LOC changed (GCC)
Open questions:
– How difficult is it to maintain these artifacts?
– Do different SM tools require different amounts of effort?

4. Longitudinal case studies of software manufacturing-related artifacts
[Diagram: CVS repos, Metrics data, Analysis scripts, MS Excel; stages: Extraction / analysis, Exploration]
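The comparison of SM and non-SM artifacts can be sketched as follows, assuming the CVS logs have already been reduced to (author, path, lines changed) records. The record layout, the function names, and exact-basename matching are assumptions made for this example. The sketch also counts every author rather than only core developers, which the study singled out.

```python
import posixpath

# File names the slides list as software manufacturing (SM) artifacts.
SM_NAMES = {"Makefile", "build.xml", "SConscript"}


def is_sm_artifact(path):
    """True when the file's base name is exactly one of the SM names."""
    return posixpath.basename(path) in SM_NAMES


def sm_summary(changes):
    """Summarize SM effort from (author, path, lines_changed) records."""
    changes = list(changes)
    if not changes:
        raise ValueError("no change records")
    sm = [c for c in changes if is_sm_artifact(c[1])]
    authors = {c[0] for c in changes}
    sm_authors = {c[0] for c in sm}
    total_loc = sum(c[2] for c in changes)
    sm_loc = sum(c[2] for c in sm)
    return {
        "change_share": len(sm) / len(changes),
        "loc_share": sm_loc / total_loc if total_loc else 0.0,
        "author_share": len(sm_authors) / len(authors),
    }


# Hypothetical change records, not data from the study.
changes = [
    ("ann", "src/Makefile", 10),
    ("bob", "src/main.c", 70),
    ("ann", "src/util.c", 20),
    ("cat", "build.xml", 20),
]
```

For these four records, `sm_summary(changes)` gives a change share of 0.5, a LOC share of 0.25, and an author share of two thirds. Exact-name matching misses variants such as `Makefile.in`, so a real study would need a broader rule for what counts as an SM artifact.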

Dimensions of studies
Single version vs. consecutive version pairs vs. longitudinal study
Coarsely vs. finely grained detail
Intermediate representation of artifacts:
– Raw code vs. metrics vs. ER-like semantic model
– Navigable representation of system architecture; auto-abstraction of info at arbitrary levels

Challenges in this field
1. Dealing with scale
    “Big system analysis” times “many versions”
    Research tools often live at the bleeding edge, are slow, and produce voluminous detail
2. Automation
    Research tools are often buggy and require handholding
    Often hard to automate multiple analyses

Challenges in this field
3. Artifact linkage and analysis granularity
    Repositories (CVS, Unix fs) often store only source code, with no special understanding of, say, where a particular method resides.
    (How) should we make them smarter? e.g., ctags and CCFinder
4. [Your thoughts?]