Modeling the Search Landscape of Metaheuristic Software Clustering Algorithms
Dagstuhl – Software Architecture
Brian S. Mitchell


Slide 1: Modeling the Search Landscape of Metaheuristic Software Clustering Algorithms
Dagstuhl – Software Architecture
Brian S. Mitchell, bmitchell@drexel.edu, http://www.mcs.drexel.edu/~bmitchel
Department of Computer Science, College of Engineering, Drexel University, Philadelphia, PA 19104, USA

Slide 2: Understanding Large Systems is HARD
Drexel University Software Engineering Research Group (SERG), http://serg.mcs.drexel.edu
- Example: RedHat Linux 7.1 [http://www.dwheeler.com/sloc]
  - Kernel: 1,400 modules, 2.5M LOC
  - Full system: 350K modules, 30M LOC
  - Languages: > 19 (including scripting)
- Manual analysis is tedious and error prone
- Source code analysis approaches create large repositories
- Software clustering approaches create abstract representations

Slide 3: Software Clustering
- The Bunch tool requires a representation...
- ...a clustering algorithm...
- ...and a way to represent results
- Researchers have examined many different approaches for software clustering

Slide 4: Search-Based Software Clustering with Bunch
- Bunch uses metaheuristic search algorithms for software clustering
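The slides do not reproduce Bunch's algorithms in detail. As a minimal sketch of the general idea only, here is a steepest-ascent hill climb over module-to-cluster assignments: start from a random partition of the MDG and repeatedly relocate single modules while the fitness improves. The fitness follows the TurboMQ-style cluster factor described in the Bunch literature, but the move operator and exact formulation here are assumptions, not Bunch's actual implementation.

```python
import random

def mq(edges, assign):
    # TurboMQ-style fitness: each cluster contributes 2*intra / (2*intra + inter),
    # where intra counts edges inside the cluster and inter counts edges crossing
    # its boundary. An approximation of the cluster factor from the Bunch papers.
    intra, inter = {}, {}
    for u, v in edges:
        cu, cv = assign[u], assign[v]
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
        else:
            inter[cu] = inter.get(cu, 0) + 1
            inter[cv] = inter.get(cv, 0) + 1
    return sum(2 * intra[c] / (2 * intra[c] + inter.get(c, 0)) for c in intra)

def hill_climb(modules, edges, seed=0):
    rng = random.Random(seed)
    # Random start point: every module dropped into a random cluster.
    assign = {m: rng.randrange(len(modules)) for m in modules}
    improved = True
    while improved:
        improved = False
        for m in modules:
            current = assign[m]
            best_c, best_s = current, mq(edges, assign)
            # Neighboring partitions: relocate m to any existing cluster,
            # or to a brand-new singleton cluster.
            for c in set(assign.values()) | {max(assign.values()) + 1}:
                assign[m] = c
                s = mq(edges, assign)
                if s > best_s:
                    best_c, best_s = c, s
            assign[m] = best_c
            improved = improved or best_c != current
    return assign
```

Because the start point is random, repeated runs can land in different local optima, which is exactly why the similarity of Bunch's results across runs is worth investigating.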

Slide 5: Bunch Example
- The MDG
- The random start point
- The solution

Slide 6: Evaluating Bunch's Results
- Observation: Bunch produces similar results across runs
  - This is desirable, but unexpected considering the use of metaheuristic search algorithms
- Some evaluation has been done:
  - "Good enough" via empirical studies
  - Similarity analysis [WCRE01, ICSM01]
  - Comparison to spectral clustering techniques [WCRE02]
- We were intrigued to investigate why Bunch's results are consistently similar
- Bunch produces a "family" of related results

Slide 7: The Search Landscape
- Cluster a system many times and look for patterns in the clustering results that provide insight into the search space
- The Search Landscape Modeler takes an MDG, runs the Bunch tool, and analyzes the clustering results along two dimensions:
  - Structural landscape: what common properties, if any, do the MDG partitions share?
  - Similarity landscape: how similar are the contents of the MDG partitions?
- Can modeling the search space be useful for evaluation?

Slide 8: The Structural Landscape – What do we Expect?
- The structural landscape is modeled using a series of views: MQ vs. number of clusters, intra-edge density, MQ value, and number of clusters
- Expectations:
  - A relationship between MQ and the number of clusters; neither should vary widely across clustering runs
  - A good result should consistently produce a high percentage of intra-edges (edges that start and end in the same cluster)
  - Repeated clustering runs should produce similar MQ results
  - The number of clusters should remain relatively consistent across multiple clustering runs
- Bunch's final results are compared against the initial random partitioned MDG
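Each of these views reduces one clustering run to a few numbers. A minimal sketch of two of them, cluster count and intra-edge density (the helper name and dictionary keys are illustrative, not part of Bunch's tooling):

```python
def structural_view(edges, assign):
    # One row of the structural landscape for a single clustering run:
    # how many clusters the partition has, and what fraction of the MDG's
    # edges are intra-edges (start and end in the same cluster).
    intra = sum(1 for u, v in edges if assign[u] == assign[v])
    return {
        "num_clusters": len(set(assign.values())),
        "intra_edge_density": intra / len(edges),
    }
```

Computing this row for many independent clustering runs and plotting the results yields the kind of views described on the slide; stable values across runs would match the stated expectations.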

Slide 9: The Similarity Landscape – What do we Expect?
- Each edge is either an intra-edge (both endpoints in the same cluster) or an inter-edge (endpoints in different clusters)
- Procedure:
  1. Create a counter C for each edge, initialized to zero
  2. Cluster the system many times; for each run, increment an edge's counter if it is an intra-edge
  3. After all runs, determine P, the percentage of runs in which each edge appears as an intra-edge
- Aggregate the P values by level of agreement: None (large dissimilarity), Low (moderate dissimilarity), Medium, High (very similar)
- Our expectation: edges fall at the extremes, being either not similar or very similar
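The three-step procedure above can be sketched directly. The numeric thresholds in `agreement_level` are assumptions for illustration; the slide names the buckets but gives no cut-offs.

```python
from collections import Counter

def edge_agreement(edges, runs):
    # Steps 1-3 from the slide: count, per edge, the runs in which it lands
    # inside a single cluster, then convert the count to a fraction P.
    count = Counter()
    for assign in runs:          # one module -> cluster map per clustering run
        for u, v in edges:
            if assign[u] == assign[v]:
                count[(u, v)] += 1
    return {(u, v): count[(u, v)] / len(runs) for u, v in edges}

def agreement_level(p):
    # Aggregation into the slide's None/Low/Medium/High buckets.
    # The thresholds below are assumed, not taken from the slides.
    if p == 0.0:
        return "none"
    if p < 0.4:
        return "low"
    if p < 0.8:
        return "medium"
    return "high"
```

Tallying how many edges fall into each bucket gives the histograms shown on the similarity-landscape slides.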

Slide 10: Case Study

  System       Modules  Relations  Description
  Telnet            28         81  Terminal emulator
  PHP               62        191  Internet scripting language
  Bash              92        901  Unix terminal environment
  Lynx             148      1,745  Text-based HTML browser
  Bunch            220        764  Software clustering tool
  Swing            413      1,513  Standard Java user interface framework
  Kerberos 5       558      3,793  Security services infrastructure

We also looked at 6 randomly generated MDGs.

Slide 11: Structural Landscape (1)
- The independent samples were ordered by MQ to highlight relationships that would not otherwise be obvious.

Slide 12: Structural Landscape (2)

Slide 13: Structural Landscape (3) – Random MDGs

Slide 14: Structural Landscape (4) – Random MDGs

Slide 15: Structural Landscape – Observations
- There was significant commonality across the clustering results, with many desirable aspects
- A lot of commonality between the random and open source systems
- Some additional variability in the MQ vs. cluster size relationship for the random MDGs
- More variability in the clustering results for the random graphs with higher edge densities

Slide 16: Similarity Landscape (1)
[Bar chart: percentage of edges at each agreement level (Zero, Low, Medium, High) for the open source systems and the random MDGs]

Slide 17: Similarity Landscape (2)
[Bar chart: percentage of edges at each agreement level (Zero, Low, Medium, High) for the open source systems, the low-density random MDGs, and the high-density random MDGs]

Slide 18: Observations – Similarity Landscape
- The open source systems exhibited the expected trends: high dissimilarity and high similarity, with little medium similarity
- The random MDGs had much higher medium similarity and almost no high similarity
- We think this may be due to isomorphism in the clustering results
  - Why: the structural landscape showed variability in the number of clusters among partitions with similar MQ

Slide 19: Conclusions
- Ideally, evaluation would compare Bunch's results to a benchmark
  - Not possible: graph partitioning is NP-hard
  - Empirical feedback indicates that the results are "good enough"
- Until now, no investigation had been performed into why Bunch produces consistent results
- The search landscape model provided a lot of intuition into Bunch's behavior
  - We examined both the structural and similarity aspects of the search landscape
- The search landscape approach seems appropriate for modeling other metaheuristic search algorithms
