A Heuristic Approach Towards Solving the Software Clustering Problem (ICSM03), Brian S. Mitchell

1 A Heuristic Approach Towards Solving the Software Clustering Problem
ICSM03
Brian S. Mitchell
Department of Computer Science, College of Engineering
Drexel University, Philadelphia, PA, USA

Drexel University Software Engineering Research Group (SERG)

2 Understanding Large Systems is HARD
Example: Red Hat Linux 7.1
- Kernel: 1,400 modules, 2.5M LOC
- System: 350K modules, 30M LOC
- Languages: > 19 (including scripting)
(1) Manual analysis is tedious and error prone.
(2) Source code analysis approaches create large repositories.
(3) Software clustering approaches create abstract representations.

3 Software Clustering
Software clustering simplifies program maintenance and program understanding. The abstract views produced by software clustering techniques can help developers fix defects in, or add features to, existing software systems.

4 Software Clustering Environments
A software clustering environment, whether the Bunch tool or another tool, requires a representation of the system, a clustering algorithm, a way to represent the results, and a way to compare results. Bunch partitions a software graph and uses a fitness function called MQ (Modularization Quality) to evaluate the quality of each partition.
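To make "a fitness function that scores a partition" concrete, here is a minimal sketch of one MQ formulation from the Bunch literature, often called TurboMQ, for unweighted edges. It is an assumption that this is the exact MQ used in any particular Bunch run; the names `turbo_mq`, `Edge`, and the dict-based partition are hypothetical. Each cluster contributes a cluster factor 2μ/(2μ + ε), where μ counts edges inside the cluster and ε counts edges crossing its boundary; MQ is the sum of the cluster factors.

```python
from collections import defaultdict

# Hypothetical representation: an MDG is a list of directed (source, target)
# edges; a partition maps each module name to a cluster id.
Edge = tuple[str, str]


def turbo_mq(edges: list[Edge], partition: dict[str, int]) -> float:
    """TurboMQ-style score (assumed variant): sum of per-cluster factors.

    For cluster i: CF_i = 2*mu_i / (2*mu_i + eps_i), where mu_i counts edges
    with both ends in i and eps_i counts edges with exactly one end in i.
    Clusters with no internal edges contribute 0.
    """
    intra: dict[int, int] = defaultdict(int)  # mu_i
    inter: dict[int, int] = defaultdict(int)  # eps_i (incoming + outgoing)
    for src, dst in edges:
        ci, cj = partition[src], partition[dst]
        if ci == cj:
            intra[ci] += 1
        else:
            # A crossing edge penalizes both clusters it touches.
            inter[ci] += 1
            inter[cj] += 1
    mq = 0.0
    for c in set(partition.values()):
        mu = intra[c]
        if mu > 0:
            mq += 2 * mu / (2 * mu + inter[c])
    return mq


# Two 3-module groups joined by a single dependency (c -> d).
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e"), ("e", "f"), ("c", "d")]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(round(turbo_mq(edges, split), 3))                # 1.657
print(turbo_mq(edges, {m: 0 for m in split}))          # 1.0
```

Putting every module in one cluster scores 1.0, while the natural two-cluster split scores 6/7 + 4/5 ≈ 1.657, so the measure rewards cohesive clusters with few dependencies between them.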

5 Software Clustering Techniques
The reverse engineering community has studied a variety of software clustering techniques:
- Source code component similarity (or dissimilarity)
- Concept analysis
- Subsystem patterns
- Implementation-specific information
My research contribution was applying search techniques to the software clustering problem, and improving the state of practice for evaluating software clustering results.

6 Problem: There Are Too Many Partitions to Search All of Them
The number of partitions (ways to cluster a system) of a software graph grows very quickly as the number of modules in the system increases. The number of ways to partition n modules into k non-empty clusters is given by the recurrence below, and summing over k gives the total number of partitions:

$$
S_{n,k} =
\begin{cases}
1 & \text{if } k = 1 \text{ or } k = n \\
S_{n-1,k-1} + k\,S_{n-1,k} & \text{otherwise}
\end{cases}
\qquad
\text{partitions}(n) = \sum_{k=1}^{n} S_{n,k}
$$

Modules → partitions: 1 → 1, 2 → 2, 3 → 5, 4 → 15, 5 → 52, 6 → …
A 15-module system is about the limit for performing exhaustive analysis.
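The recurrence can be evaluated directly to see how fast the search space grows. This is a small sketch; the function names `stirling2` and `partition_count` are illustrative, and the totals it produces are the Bell numbers.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Ways to split n modules into exactly k non-empty clusters (S_{n,k})."""
    if k < 1 or k > n:
        return 0
    if k == 1 or k == n:
        return 1
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)


def partition_count(n: int) -> int:
    """Total number of ways to cluster n modules: the sum of S_{n,k} over k."""
    return sum(stirling2(n, k) for k in range(1, n + 1))


print([partition_count(n) for n in range(1, 9)])
# [1, 2, 5, 15, 52, 203, 877, 4140]
print(partition_count(15))  # 1382958545
```

An 8-module graph already has 4,140 partitions, and a 15-module graph has more than 1.3 billion, which is why exhaustive search stops being practical at around that size.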

7 Applying Heuristic Search Techniques to the Software Clustering Problem
[Figure: source code (e.g. void main() { printf("hello"); }) is processed by source code analysis tools (Acacia, Chava) into an MDG (Module Dependency Graph) with modules M1 to M8. Software clustering search algorithms (hill climbing, genetic algorithm, simulated annealing) explore the search space, the set of all MDG partitions (4,140 partitions in total for this 8-module MDG), to find a "good" MDG partition.]
Note that a "good" partition may not be an optimal solution.
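Of the three search algorithms listed, hill climbing is the simplest to sketch. The following is a minimal next-ascent hill climber, not Bunch's implementation: the neighborhood (move one module to another existing cluster or to a new one), the random start point, and the TurboMQ-style score are all assumptions made for illustration, and `mq`, `neighbors`, and `hill_climb` are hypothetical names.

```python
import random
from collections import defaultdict


def mq(edges, partition):
    """TurboMQ-style score (assumed variant): sum over clusters of 2*mu / (2*mu + eps)."""
    intra, inter = defaultdict(int), defaultdict(int)
    for src, dst in edges:
        a, b = partition[src], partition[dst]
        if a == b:
            intra[a] += 1          # mu: edge inside one cluster
        else:
            inter[a] += 1          # eps: edge crossing a cluster boundary,
            inter[b] += 1          # counted against both clusters
    return sum(2 * intra[c] / (2 * intra[c] + inter[c])
               for c in set(partition.values()) if intra[c] > 0)


def neighbors(partition):
    """Yield partitions that differ by moving one module to another (or a new) cluster."""
    clusters = sorted(set(partition.values()))
    new_id = max(clusters) + 1
    for module in sorted(partition):
        for target in clusters + [new_id]:
            if target != partition[module]:
                moved = dict(partition)
                moved[module] = target
                yield moved


def hill_climb(edges, modules, seed=0):
    """Next-ascent hill climbing from a random partition; stops at a local optimum."""
    rng = random.Random(seed)
    # Random start point: each module gets one of len(modules) cluster ids.
    current = {m: rng.randrange(len(modules)) for m in modules}
    score = mq(edges, current)
    improved = True
    while improved:
        improved = False
        for candidate in neighbors(current):
            cand_score = mq(edges, candidate)
            if cand_score > score + 1e-12:  # ignore float noise from relabelling
                current, score = candidate, cand_score
                improved = True
                break  # take the first improving neighbor, then rescan
    return current, score


edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e"), ("e", "f"), ("c", "d")]
modules = sorted({m for edge in edges for m in edge})
best, best_mq = hill_climb(edges, modules, seed=1)
print(best, round(best_mq, 3))
```

Because the start point is random, different seeds can stop at different local optima, which matches the slide's caveat that a "good" partition is not necessarily optimal; running several climbs and keeping the best result is a common remedy.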

8 Software Developed as Part of My Ph.D. Research
- Bunch: an automatic clustering tool
- CRAFT: a reference decomposition generator
Both tools also have a documented API to support integration into other tools.

9 Bunch Example
JUnit is a unit testing framework for Java (its framework package is shown).
[Figure: the MDG, the random start point, and a solution, each labeled with its MQ value; nodes shown include Assert, TestCase, TestResult, CompFailure, and TestFailure.]
(My dissertation discusses several MQ measurements.)

10 Clustering Large Software Systems Efficiently
Our goal was to cluster large and interesting systems in a reasonable amount of time:
- Linux kernel: > 1,000 modules in ~90 seconds
- Swing framework: > 450 classes in ~20 seconds
- Kerberos: > 500 modules in ~35 seconds
Other popular systems examined: Xerces, Apache HTTP Server, Jigsaw HTTP Server, Mozilla, Ant, …
Overall, we examined over 50 reference systems during the course of my Ph.D. research.
Because source code analysis and clustering are separate activities, Bunch can cluster software developed in any programming language.
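The language independence comes from the clustering step only ever seeing a dependency graph. As a sketch of that boundary, the parser below reads a plain-text edge list, one `source target [weight]` dependency per line, that any extractor (such as the Acacia and Chava tools mentioned for source code analysis) could emit. The exact file format, the `#` comment convention, and the name `parse_mdg` are assumptions for illustration, not Bunch's documented input specification.

```python
from collections import defaultdict


def parse_mdg(text: str) -> dict[tuple[str, str], float]:
    """Parse an assumed edge-list format: 'source target [weight]' per line.

    Blank lines and lines starting with '#' are skipped; a missing weight
    defaults to 1.0, and repeated edges have their weights summed.
    """
    edges: dict[tuple[str, str], float] = defaultdict(float)
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        if len(fields) not in (2, 3):
            raise ValueError(f"line {lineno}: expected 'source target [weight]'")
        weight = float(fields[2]) if len(fields) == 3 else 1.0
        edges[(fields[0], fields[1])] += weight
    return dict(edges)


mdg_text = """\
main.c util.c
# dependencies extracted from any language end up in the same form
util.c io.c 2
main.c util.c
"""
print(parse_mdg(mdg_text))
```

Nothing in the parser knows which language `main.c` or `util.c` came from; only the extractor does.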

11 Research into Evaluating Software Clustering Results
- Most software clustering results are evaluated subjectively.
- A reference decomposition is available for a limited set of well-studied systems, but for many systems no benchmark decomposition exists for comparison.
- WCRE'01: this paper described the CRAFT system, which generates a reasonable reference decomposition by highlighting similarities in a collection of software clustering results.
- One important aspect of evaluation is being able to compare software clustering results with each other.
- ICSM'01: this paper introduced two similarity measurements, MeCl and EdgeSim.
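The idea behind EdgeSim can be sketched in a few lines: an edge "agrees" between two partitions when it is an intra-cluster edge in both or an inter-cluster edge in both, and the similarity is the share of edge weight that agrees. This follows our reading of the ICSM'01 measure and should be checked against the paper before use; MeCl is not sketched here. The name `edge_sim`, the dict-of-weights representation, and returning 100 for an edgeless graph are assumptions.

```python
def edge_sim(edges, part_a, part_b):
    """EdgeSim-style similarity (0-100) between two partitions of the same graph.

    `edges` maps (source, target) -> weight. An edge agrees when it is
    intra-cluster in both partitions or inter-cluster in both.
    """
    total = sum(edges.values())
    if total == 0:
        return 100.0  # assumed convention: nothing to disagree about
    agree = sum(
        w for (src, dst), w in edges.items()
        if (part_a[src] == part_a[dst]) == (part_b[src] == part_b[dst])
    )
    return 100.0 * agree / total


edges = {("x", "y"): 1.0, ("z", "w"): 1.0, ("y", "z"): 1.0, ("x", "w"): 1.0}
two_clusters = {"x": 0, "y": 0, "z": 1, "w": 1}
one_cluster = {m: 0 for m in two_clusters}
print(edge_sim(edges, two_clusters, one_cluster))  # 50.0
```

Because only "same cluster or not" is compared, the measure ignores cluster labels: relabeling the clusters of a partition does not change its similarity to anything.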

12 What's Been Done Since Completing My Ph.D. Research
- Applying a formal architectural constraint language (ISF) to software clustering results to reverse engineer the software architecture of a system
- Modeling the search landscape to better understand why Bunch produces consistent results despite the size of the search space
- Integrating Bunch's software clustering services into the RePortal online reverse engineering portal
- Supporting GXL as both an input and an output representation in Bunch
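GXL (Graph eXchange Language) is an XML format for exchanging graphs between tools. As a rough sketch of what a clustered MDG could look like in GXL, the function below writes each module as a `node` carrying a hypothetical `cluster` attribute and each dependency as a directed `edge`. The attribute name, the layout, and `to_gxl` are assumptions, not Bunch's actual GXL schema; module names must also be valid XML IDs, and a complete GXL file would normally declare the GXL DTD, which is omitted here.

```python
import xml.etree.ElementTree as ET


def to_gxl(edges, partition, graph_id="mdg"):
    """Serialize a clustered MDG as a minimal GXL document (assumed layout)."""
    gxl = ET.Element("gxl")
    graph = ET.SubElement(gxl, "graph", id=graph_id, edgemode="directed")
    for module in sorted(partition):
        node = ET.SubElement(graph, "node", id=module)
        # Hypothetical attribute recording the cluster assignment.
        attr = ET.SubElement(node, "attr", name="cluster")
        ET.SubElement(attr, "int").text = str(partition[module])
    for src, dst in edges:
        # 'from' is a Python keyword, so pass the attributes as a dict.
        ET.SubElement(graph, "edge", attrib={"from": src, "to": dst})
    return ET.tostring(gxl, encoding="unicode")


print(to_gxl([("main.c", "util.c"), ("util.c", "io.c")],
             {"main.c": 0, "util.c": 0, "io.c": 1}))
```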

13 Additional Research Opportunities Identified in My Thesis
- Improved visualization services
- Clustering the dynamic behavior of systems
- Clustering distributed and heterogeneous systems
- Investigating other heuristics appropriate for clustering software systems
- Investigating other representations of the systems being clustered

14 Summary
- Applied search techniques to the software clustering problem
- Developed software clustering algorithms and software to cluster large and interesting systems efficiently
- Developed software and techniques to improve the state of practice for evaluating software clustering results

15 Recognition
Special thanks to:
- My advisor: Dr. Spiros Mancoridis
- My committee: Dr. J. Johnson, Dr. C. Rorres, Dr. A. Shokoufandeh, Dr. R. Chen, and Dr. L. Perkovic (former member)
- My sponsors: AT&T Research, Sun Microsystems, DARPA, NSF, US Army
- Bunch project contributors: D. Doval, M. Traverso, S. Mancoridis
- Dr. E. Gansner and Dr. R. Chen (AT&T Labs - Research), for test data and validation of Bunch's clustering results
- The gang at the SERG lab

16 Questions / More Information
Reverse Engineering at Drexel
- Bunch: software clustering tool
- CRAFT: benchmark generation tool
- RePortal: online reverse engineering portal
Where to download and evaluate