Software Clustering.

Slides:



Advertisements
Similar presentations
Ch:8 Design Concepts S.W Design should have following quality attribute: Functionality Usability Reliability Performance Supportability (extensibility,
Advertisements

Design Concepts and Principles
Chapter 13 Design Concepts and Principles
Solutions to Review Questions. 4.1 Define object, class and instance. The UML Glossary gives these definitions: Object: an instance of a class. Class:
Introduction To System Analysis and Design
1 SYSTEM and MODULE DESIGN Elements and Definitions.
Creating Architectural Descriptions. Outline Standardizing architectural descriptions: The IEEE has published, “Recommended Practice for Architectural.
© 2006 Pearson Addison-Wesley. All rights reserved2-1 Chapter 2 Principles of Programming & Software Engineering.
The Re-engineering and Reuse of Software
Chapter 1 Database Systems. Good decisions require good information derived from raw facts Data is managed most efficiently when stored in a database.
OO Analysis and Design CMPS OOA/OOD Cursory explanation of OOP emphasizes ▫ Syntax  classes, inheritance, message passing, virtual, static Most.
1 Using Heuristic Search Techniques to Extract Design Abstractions from Source Code The Genetic and Evolutionary Computation Conference (GECCO'02). Brian.
Chapter 10 Architectural Design
Lesson 7 Guide for Software Design Description (SDD)
1/19 Component Design On-demand Learning Series Software Engineering of Web Application - Principles of Good Component Design Hunan University, Software.
“Enhancing Reuse with Information Hiding” ITT Proceedings of the Workshop on Reusability in Programming, 1983 Reprinted in Software Reusability, Volume.
AL-MAAREFA COLLEGE FOR SCIENCE AND TECHNOLOGY INFO 232: DATABASE SYSTEMS CHAPTER 1 DATABASE SYSTEMS (Cont’d) Instructor Ms. Arwa Binsaleh.
SOFTWARE DESIGN.
SOFTWARE DESIGN Design Concepts Design is a meaningful engineering representation of something that is to be built It can be traced to a customer’s requirements.
Software Design Deriving a solution which satisfies software requirements.
1 A Heuristic Approach Towards Solving the Software Clustering Problem ICSM03 Brian S. Mitchell /
Design Concepts and Principles Instructor: Dr. Jerry Gao.
Design Concepts By Deepika Chaudhary.
Software Engineering Principles. SE Principles Principles are statements describing desirable properties of the product and process.
Design Principle & Patterns by A.Surasit Samaisut Copyrights : All Rights Reserved.
1 CMPT 275 High Level Design Phase Modularization.
SOFTWARE DESIGN. INTRODUCTION There are 3 distinct types of activities in design 1.External design 2.Architectural design 3.Detailed design Architectural.
1 The Search Landscape of Graph Partitioning Problems using Coupling and Cohesion as the Clustering Criteria Brian S. Mitchell & Spiros Mancoridis
1 CSCD 326 Data Structures I Software Design. 2 The Software Life Cycle 1. Specification 2. Design 3. Risk Analysis 4. Verification 5. Coding 6. Testing.
Software Design Process
© 2006 Pearson Addison-Wesley. All rights reserved 2-1 Chapter 2 Principles of Programming & Software Engineering.
Software Design Chapter 11. Purposes of Design Design transforms the specification into a design. The end purpose of the design is to produce detail and.
Review of Parnas’ Criteria for Decomposing Systems into Modules Zheng Wang, Yuan Zhang Michigan State University 04/19/2002.
Finite State Machines (FSM) OR Finite State Automation (FSA) - are models of the behaviors of a system or a complex object, with a limited number of defined.
Week 6: Software Design HNDIT Software Engineering Software Design Learning Outcomes  Understand the activities involved in the Design process.
Design Engineering 1. Analysis  Design 2 Characteristics of good design 3 The design must implement all of the explicit requirements contained in the.
Principles of Programming & Software Engineering
Chapter 1: Software design
7. Modular and structured design
CompSci 280 S Introduction to Software Development
Basic Concepts in Software Design
Types for Programs and Proofs
Object-Oriented Software Engineering Using UML, Patterns, and Java,
which satisfies software requirements
Lecture 9- Design Concepts and Principles
Basic Concepts in Software Design
Software Design Mr. Manoj Kumar Kar.
Software Quality Engineering
MultiRefactor: Automated Refactoring To Improve Software Quality
Lecture 2 of Computer Science II
Abstract descriptions of systems whose requirements are being analysed
Chapter 10: Process Implementation with Executable Models
Chapter 19: Interfaces and Components
CIS 375 Bruce R. Maxim UM-Dearborn
Chapter 5 Designing the Architecture Shari L. Pfleeger Joanne M. Atlee
Objective of This Course
CS223: Software Engineering
Multiple Aspect Modeling of the Synchronous Language Signal
Lecture 9- Design Concepts and Principles
Chapter 13 Quality Management
CSE403 Software Engineering Autumn 2000 Design (Overview)
An Introduction to Software Architecture
Chapter 19: Interfaces and Components
Chapter 19: Interfaces and Components
Architecture Recovery
Chapter 2. Problem Solving and Software Engineering
Coupling Interaction: It occurs due to methods of a class invoking methods of other classes. Component Coupling: refers to interaction between two classes.
Interfaces and Components
Evaluating Software Clustering Algorithms
Chapter 19: Interfaces and Components
Presentation transcript:

Software Clustering

Understanding the Structure of Programs is Difficult Developers create sophisticated applications that are complex and involve a large number of interconnected components. Result: Program understanding is difficult Goal: Use automated techniques to help developers understand the structure of software systems. 12/8/2018 COSC6431

Common Problems Creating a good mental model of the structure of a complex system. Keeping a mental model consistent with changes that occur as the system evolves. These problems are exacerbated by: non-existent or inconsistent design documentation high rate of turnover among IT professionals Assumption: Understanding the structure of a systems software is valuable for maintainers. 12/8/2018 COSC6431

Solutions Automatic: Use software clustering techniques to decompose the structure of software systems into meaningful subsystems. Subsystems help developers navigate through the numerous software components and their interconnections. Manual: Use notations such as UML to specify the software structure. 12/8/2018 COSC6431

A Software Clustering Primer Directed graphs are commonly used to represent the structure of software. Assume that this graph consists of a finite set of components (nodes): classes, modules, files, packages, etc. and relationships (edges) between components: inherit, import, include, call, instantiate, etc. Problem: How do we partition the nodes of the graph into clusters (subsystems)? 12/8/2018 COSC6431

Software Clustering Challenges There are many ways to partition a graph into clusters. How do we create efficient algorithms to find partitions of the graph that are representative of a system’s structure? How do we distinguish between “good” partitions, and “bad” partitions? 12/8/2018 COSC6431

How Hard is this Problem? If every partition of the graph is considered, the number of partitions that will need to be investigated is: The above recursive equation grows exponentially with respect to the number of nodes (n) in the graph (each partition 1kn clusters). Sn,k for some values of n: 1=1; 5=52; 10=115,975; 15=1,382,958,545; 20=51,724,158,235,372 12/8/2018 COSC6431

Some solutions Enumerating every possible partition of the software structure graph is not practical. Heuristics can be used to reduce the number of partitions: Searching algorithms Knowledge about the source code Names,directory structure, designer input Remove entities that provide little structural value Libraries, omnipresent nodes Result is sub-optimal, but often adequate. 12/8/2018 COSC6431

Why is clustering useful? Helps new developers create a mental model of the software structure. Especially useful in the absence of experts or accurate design documentation. Helps developers understand the structure of legacy software. Enables developers to compare the documented structure with the automatically created (actual) structure. 12/8/2018 COSC6431

Example (before) 12/8/2018 COSC6431

Example (after) Main Error handler Tokenizer Parser Rewriter Semantics Generator Dictionary 12/8/2018 COSC6431

Modern Relevance of Software Clustering Clustering has been studied for many years in the fields of mathematics, science and engineering. Clustering research in software engineering increased because of Y2K and the ‘webifying’ of legacy systems. New clustering approaches have been developed, and classical clustering techniques have been modified to work with software structures. 12/8/2018 COSC6431

Creating Clusters at Design Time Parnas (1972) – Information Hiding Hide program “secrets” behind interfaces A manual form of clustering Object Oriented Design (Booch, 1994) Objects group (cluster) related data and operations that act upon the data. Booch suggests principles that are commonly used in clustering research: Abstraction Encapsulation Hierarchies & Modularity 12/8/2018 COSC6431

Software Clustering Research Clustering Procedures/Functions into Modules Clustering Modules/Classes into Subsystems Evaluating clustering algorithms Measuring distance between partitions Algorithm stability 12/8/2018 COSC6431

Clustering Techniques There are many different clustering techniques, but they all need to consider (Wiggerts, 1997): Representation: The entities and relationships to be clustered Similarity: What determines the degree of similarity between the software entities Algorithms: Algorithms that use the similarity measurement to make clustering decisions 12/8/2018 COSC6431

Representation There are many choices based on the desired granularity of recovered system design Entities may be variables/procedures or modules/classes. What types of relationships will be considered? Will the relationships be weighted? 12/8/2018 COSC6431

Similarity Similarity measurements are used to determine the degree of “similarity” between a pair of entities Different types: Association coefficients: Based on common features that exist (or do not exist) between a pair of entities Most common type of similarity measurement Distance measures: Measure of the degree of dissimilarity between entities. 12/8/2018 COSC6431

Example Similarity Measurement Classical similarity measurements: Entity j 1 a: Number of common features in entity i and entity j 1 a b b: Number of features unique to entity j Entity i c: Number of features unique to entity i c d d: Number of features absent in both entity i and entity j Anquetil et. al. (1999) compared the Simple and Jaccard algorithms and found that overall the Jaccard algorithm produced better results. 12/8/2018 COSC6431

Agglomerative hierarchical algorithm Start by creating one cluster for each object Join the two most similar objects into one cluster Continue joining the two most similar objects/clusters until everything is in one cluster What you get is a dendrogram… 12/8/2018 COSC6431

Dendrogram example A B C D E 0.2 C D E 0.4 Similarity A B 0.5 D E 0.7 12/8/2018 COSC6431

Cut height By choosing to “cut” the dendrogram at a particular height, we can create a partition of the set of objects, e.g. a cut height of 0.45 in the previous example would give us 3 clusters Finding an appropriate cut height is a tough problem Heuristics, such as the number of clusters, are usually employed 12/8/2018 COSC6431

Update rule How to determine the similarity between two already formed clusters (or an object and a cluster) Many possibilities Minimum of all pair-wise similarities Maximum of all pair-wise similarities Weighted or unweighted averages 12/8/2018 COSC6431

Data Bindings Hutchens & Basili (1985) A data binding classifies the similarity between two procedures based on the common variables that are within the static scope of the two procedures. Useful for clustering procedures and variables into modules. Uses hierarchical clustering algorithms to form clusters from the data bindings. Addressed several aspects of clustering Stability Consistency between a clustered view and a designers view 12/8/2018 COSC6431

Machine Learning Schwanke (1991) Arch is a semi-automatic clustering technique that is based on using machine learning to maximize cohesion and minimize coupling between software components. Maverick analysis is a unique feature of Arch where misplaced procedures are relocated to more appropriate modules. Maverick procedures share many features with procedures in other modules. 12/8/2018 COSC6431

Concept Analysis Lindig & Snelting (1997) Used for clustering procedures and variables into modules. A concept is defined as C=(P,V), where P is a set of procedures V is a set of variables All procedures in P use only variables in V All variables in V are only used by procedures in P A set of concepts can be represented as a lattice. The lattice can be transformed into a “tree-like” structure to form the modules. 12/8/2018 COSC6431

Example All procedures below a lattice node V1 V2 V3 V4 V5 V6 V7 V8 V3,V4 P1 X X P2 X X X V1,V2 P1 V5 P2 V6,V7,V8 P3 P3 X X X X X P4 X X X X X X P4 All procedures below a lattice node use the variables in the node All variables above a lattice node are used by the procedures in the node 12/8/2018 COSC6431

The Rigi Tool Müller et. al. (1992) Clusters are subsystems (collections of modules) Rigi – a semi-automatic clustering tool Clustering based on heuristics such as measuring the relative strength between subsystems Interconnection Strength (IS) measurement Other interesting research aspects: Omnipresent modules Use of module and directory names to make clustering decisions (further researched by Anquetil et. al.) 12/8/2018 COSC6431

Automatic Clustering Choi & Scacchi (1990) Goal is to automatically restructure (cluster) legacy systems. Build resource flow graph (RFG) Nodes are modules. An edge is placed from node A to node B if module A provides one or more resources to module B. Clustering approach is based on partitioning the RFG by finding articulation points in the graph. 12/8/2018 COSC6431

Data Mining Clustering Montes de Oca & Carver (1994) Apply data mining techniques that have been developed for databases to software clustering Data mining can find non-trivial relationships between elements in a database. Software Clustering can find non-obvious relationships between source code components. Data mining can find interesting relationships in databases without upfront knowledge of the objects being studied Developers who want to cluster are typically not familiar with the structure of the system. 12/8/2018 COSC6431

Data Mining Clustering Montes de Oca & Carver (1994) Data mining techniques are designed to work with a large amount of information efficiently Most clustering tools are very slow because of the complexity of the software clustering problem. 12/8/2018 COSC6431

Optimization-based Clustering Mancoridis et. al. (1998) “Treat automatic clustering as an optimization problem” Automatic clustering technique is implemented as a Java tool called Bunch. Bunch is fully automatic, but can exploit designer knowledge when it is available. Partitions a Module Dependency Graph into a subsystem hierarchy. Like Arch, Bunch attempts to maximize cohesion and minimize coupling. 12/8/2018 COSC6431

Using Names of Source Files Anquetil & Lethbridge (1999) Anquetil and Lethbridge did research on using the names of source files to determine similarity. Technique includes dictionary lookup and substring analysis. Using file names produced good results for the systems that were studied. 12/8/2018 COSC6431

Subsystem patterns Tzerpos & Holt (2000) Subsystems must be familiar to the developers Good names are important Subsystems need to have a relatively small number of contents (otherwise further decomposition is required) More details to follow… 12/8/2018 COSC6431