Things Structural Clones Tell that Simple Clones Don’t
Hamid Abdul Basit
Software Clones
Simple clones – the same or similar code fragments
Structural clones – higher-level, larger similarities
– Similarity of code and similarity of structure
Simple clones 3
4 Structural Clones
Structural Clones
[Figure: a collaborative structural clone. Two features, CreateTask and CreateProject, repeat the same configuration across three layers. User Interface: CreateTask.UI CreateTask() and CreateProject.UI CreateProject(). Business Logic: CreateTask.BL ValidateTask() and CreateProject.BL ValidateProject(). Database: Task.DB AddTask() with the Task Table and Project.DB AddProject() with the Project Table. Edge labels in the figure: visualizes, executes, accesses.]
When are structural clones useful?
Showing a bigger picture of the similarity situation – the forest rather than the trees
Finding refactoring opportunities
Architecture recovery, program understanding and maintenance
– Structural clones often represent application domain or design concepts
Re-engineering for reuse
– The bigger the clones, the better for reuse
Some benefits for plagiarism detection
A structure is a graph
[Figure: a graph with nodes A, B, C, D and edges labeled w, x, y, z]
Entities {A,B,C,D}
Relationships {w,x,y,z}
Entities
Physically defined
– Code fragments, files, web pages, directories
Semantically defined
– Methods, classes, packages
Conceptually defined
– Components, sub-systems
Relationships
Physical co-location
– Same file, same directory
Runtime
– Message passing
– Hyperlink between web pages
Design level
– Inheritance
– Association
– Composition
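The "structure is a graph" view can be sketched as a small labeled graph. This is only an illustration: the class `StructureGraph`, its methods, and the edge endpoints chosen for w, x, y, z are assumptions, since the slides name the entities and relationships but not which entities each relationship connects.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a structure: entities are nodes, relationships are labeled edges.
final class StructureGraph {
    // One labeled, directed relationship between two entities.
    static final class Relationship {
        final String label;
        final String from;
        final String to;

        Relationship(String label, String from, String to) {
            this.label = label;
            this.from = from;
            this.to = to;
        }
    }

    private final Set<String> entities = new LinkedHashSet<>();
    private final List<Relationship> relationships = new ArrayList<>();

    // Adds a relationship and registers both endpoints as entities.
    void relate(String label, String from, String to) {
        entities.add(from);
        entities.add(to);
        relationships.add(new Relationship(label, from, to));
    }

    Set<String> entities() {
        return entities;
    }

    List<Relationship> relationships() {
        return relationships;
    }

    // Labels of all relationships that start at the given entity.
    List<String> labelsFrom(String entity) {
        List<String> labels = new ArrayList<>();
        for (Relationship r : relationships) {
            if (r.from.equals(entity)) {
                labels.add(r.label);
            }
        }
        return labels;
    }

    // Entities {A,B,C,D} and relationships {w,x,y,z}; the endpoints are assumed.
    static StructureGraph example() {
        StructureGraph g = new StructureGraph();
        g.relate("w", "A", "B");
        g.relate("x", "A", "D");
        g.relate("y", "D", "C");
        g.relate("z", "C", "B");
        return g;
    }

    public static void main(String[] args) {
        StructureGraph g = example();
        System.out.println(g.entities() + " " + g.labelsFrom("A"));
    }
}
```

An entity here could stand for a file, method, or component, and a label for co-location, message passing, or inheritance, depending on the level at which the structure is defined.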
Structural Clones
[Figure: structures S4, S5, S6, S7 and S8, each made of entities (e4a to e4d, e5a to e5d, e6a to e6d, e7a to e7d, e8a to e8d) connected by relationships labeled w, x, y, z]
Observation
Higher-level similarities are composed of lower-level similarities
They can be recovered by finding repeating configurations of lower-level similarities
Detecting Structural Clones
Simple clones in files → clone patterns
[Figure: example lists of simple clone IDs: ",4,8,10,11,12"; "1,4,7,8,10,11,12"; "2,5,9,13,15"]
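The step from "simple clones in files" to "clone patterns" can be sketched as follows. This is a minimal illustration that assumes a plain pairwise intersection of clone-class ID sets; it is not presented as the mining algorithm Clone Miner uses. The class `ClonePatternSketch`, the file names, and the ID sets are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: group files by the set of simple clone classes they share.
final class ClonePatternSketch {
    // Maps each shared ID set (a "clone pattern") to the files that contain it.
    static Map<Set<Integer>, Set<String>> findPatterns(
            Map<String, Set<Integer>> clonesByFile, int minSize) {
        Map<Set<Integer>, Set<String>> patterns = new LinkedHashMap<>();
        List<String> files = new ArrayList<>(clonesByFile.keySet());
        for (int i = 0; i < files.size(); i++) {
            for (int j = i + 1; j < files.size(); j++) {
                // Simple clone classes present in both files.
                Set<Integer> shared = new TreeSet<>(clonesByFile.get(files.get(i)));
                shared.retainAll(clonesByFile.get(files.get(j)));
                if (shared.size() >= minSize) {
                    Set<String> group = patterns.computeIfAbsent(shared, k -> new TreeSet<>());
                    group.add(files.get(i));
                    group.add(files.get(j));
                }
            }
        }
        return patterns;
    }

    // Made-up input: file name -> IDs of the simple clone classes found in it.
    static Map<Set<Integer>, Set<String>> demo() {
        Map<String, Set<Integer>> clonesByFile = new LinkedHashMap<>();
        clonesByFile.put("F1.java", new TreeSet<>(Arrays.asList(1, 4, 7, 8)));
        clonesByFile.put("F2.java", new TreeSet<>(Arrays.asList(4, 7, 8, 9)));
        clonesByFile.put("F3.java", new TreeSet<>(Arrays.asList(2, 5, 6)));
        clonesByFile.put("F4.java", new TreeSet<>(Arrays.asList(2, 5, 6, 9)));
        return findPatterns(clonesByFile, 2);
    }

    public static void main(String[] args) {
        // Prints {[4, 7, 8]=[F1.java, F2.java], [2, 5, 6]=[F3.java, F4.java]}
        System.out.println(demo());
    }
}
```

Files that share a sufficiently large pattern are candidates for a Simple Clone Structure across files. The pairwise loop is quadratic in the number of files, so it only suits small inputs.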
Detecting Structural Clones – File Analysis
Detecting Structural Clones – Directory Analysis
Detecting Structural Clones – File Level Structural Clone Across Directories
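Simple Clone Structure (SCS) Across Files
[Figure: simple clone instances a1, b1, c1 and a2, b2, c2 in files F1 and F2]
File Clone Class (FCC)
[Figure: files F1 and F2]
File Clone Structure (FCS) Across / Within Directories
[Figure: files F1, F2, X1, X2, Y1, Y2, Z1, Z2 in directories D1 and D2]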
STRUCTURAL CLONES DETECTED BY CLONE MINER
Simple Clone Structures (SCS) Across Methods
Simple Clone Structures (SCS) Across Files
Method Clone Classes (MCC)
Method Clone Structures (MCS) Across Files
File Clone Classes (FCC)
File Clone Structures (FCS) Across Directories
File Clone Structures (FCS) Within Directories
In earlier work
We hypothesized the benefits of structural clones
– Re-engineering for reuse, architecture recovery, …
Defined structural clones
Implemented Clone Miner – a structural clone detector
Did an initial empirical evaluation
In the work presented here
How frequent are the different types of structural clones?
Are structural clones more meaningful for program understanding and design recovery than simple clones?
What is the value added by structural clone detection in identifying refactoring opportunities?
Case Study Systems
Apache Ivy
Apache Ant
Columba
Dnsjava
Javax-Swing ds/index.html
JFreeChart
ANTLR
DrJava
FreeCol
JEdit
JHotDraw
Systems’ Overview
[Table: Tokens, Files, Lines Of Code and Methods for Apache-Ivy, Apache-Ant, Columba, DnsJava, Javax-Swing, JFreeChart, Antlr, DrJava, FreeCol, JEdit and JHotDraw, with AVERAGE and TOTAL columns]
Simple Clone Classes (SCC)
Metric | AVERAGE | TOTAL
SCC | |
SCC Instances | |
Average Length of SCC Instance | 48 |
Methods Containing SCC | |
% Methods Containing SCC | 22 |
Files Containing SCC | |
% Files Containing SCC | 53 |
Directories Containing SCC | 51 | 504
% Directories Containing SCC | 83 |
% Methods Containing SCC 25
1. How frequent are the different types of structural clones?
Structural Clones are Frequent
Simple clones tend to occur in groups
– 56% of simple clones are within structural clones
There are fewer structural clones than simple clones
Simple Clone Classes (SCC) and Simple Clone Structures (SCS) 28
Simple Clone Structures (SCS) Across Files
Metric | AVERAGE | TOTAL
SCS | |
SCS Instances | |
Average Instances In SCS | 3 |
Average Token Count (ATC) in SCS | 87 |
Average % Cover (APC) in SCS | 9 |
SCC Covered By SCS | |
% SCC Covered By SCS | 54 |
Methods Containing SCS | |
% Methods Containing SCS | 14 |
Files Containing SCS | |
% Files Containing SCS | 48 |
Directories Containing SCS | 48 | 467
% Directories Containing SCS | 80 |
% SCC Covered By SCS Across Files 30
Simple Clone Structures (SCS) Within Files
Metric | AVERAGE | TOTAL
SCS | |
SCS Instances | |
Average Instances In SCS | 3 |
SCC Covered By SCS | |
% SCC Covered By SCS | 55 |
Methods Containing SCS | |
% Methods Containing SCS | 15 |
Files Containing SCS | |
% Files Containing SCS | 36 |
Directories Containing SCS | 41 | 413
% Directories Containing SCS | 72 |
% Files Containing SCS Within Files 32
Simple Clone Structures (SCS) Across Methods
Metric | AVERAGE | TOTAL
SCS | |
SCS Instances | |
Average Instances In SCS | 3 |
Average Token Count (ATC) in SCS | 55 |
Average % Cover (APC) in SCS | 53 |
SCC Covered By SCS | |
% SCC Covered By SCS | 86 |
Methods Containing SCS | |
% Methods Containing SCS | 21 |
Files Containing SCS | |
% Files Containing SCS | 52 |
Directories Containing SCS | 50 | 492
% Directories Containing SCS | 82 |
% SCC Covered By SCS Across Methods 34
Method Clone Classes (MCC) and Method Clone Structures (MCS) 35
Method Clone Classes (MCC)
Metric | AVERAGE | TOTAL
MCC | |
MCC Instances | |
Average Instances In MCC | 2 |
SCC Covered By MCC | |
% SCC Covered By MCC | 23 |
Methods Covered By MCC | |
% Methods Covered By MCC | 7 |
Files Containing MCC | |
% Files Containing MCC | 31 |
Directories Containing MCC | 39 | 385
% Directories Containing MCC | 66 |
Method Clone Structures (MCS) Across Files
Metric | AVERAGE | TOTAL
MCS | 22 | 230
MCS Instances | 58 | 612
Average Instances In MCS | 2 |
MCC Covered By MCS | 45 | 484
% MCC Covered By MCS | 4 |
Methods Covered By MCS | |
% Methods Covered By MCS | 2 |
Files Containing MCS | 36 | 384
% Files Containing MCS | 6 |
Directories Containing MCS | 10 | 106
% Directories Containing MCS | 23 |
File Clone Classes (FCC) and File Clone Structures (FCS) 38
File Clone Classes (FCC)
Metric | AVERAGE | TOTAL
FCC | 10 | 112
FCC Instances | 29 | 314
Average Instances In FCC | 3 |
SCC Covered By FCC | 69 | 755
% SCC Covered By FCC | 5 |
Files Containing FCC | 28 | 307
% Files Containing FCC | 4 |
Directories Containing FCC | 10 | 110
% Directories Containing FCC | 23 |
File Clone Structures (FCS) Across Directories
Metric | AVERAGE | TOTAL
FCS | 4 | 40
FCS Instances | 10 | 98
Average Instances In FCS | 2 |
FCC Covered By FCS | 4 | 38
% FCC Covered By FCS | 40 |
Files Containing FCS | 12 | 116
% Files Containing FCS | 2 |
Directories Containing FCS | 7 | 64
% Directories Containing FCS | 16 |
File Clone Structures (FCS) Within Directories
Metric | AVERAGE | TOTAL
FCS | 8 | 86
FCS Instances | 25 | 267
Average Instances In FCS | 2 |
FCC Covered By FCS | 8 | 83
% FCC Covered By FCS | 66 |
Files Covered By FCS | 20 | 218
% Files Covered By FCS | 3 |
Directories Containing FCS | 6 | 64
% Directories Containing FCS | 11 |
2. Are structural clones more meaningful for program understanding and design recovery than simple clones?
Improved Program Understanding and Design Recovery
The analysis is more qualitative than quantitative
– Anecdotal evidence: interesting examples of various types of structural clones
Larger program parts recovered as clones from a system are expected to be more meaningful than smaller ones
High-level structural clones such as FCC and FCS appear to be very useful for design recovery: because of their size, they highlight design-level similarities between parts of the system
FCC Examples from Apache-Ant
File Names | TC | PC
FTP.java | 4586 | 52%
FTPTaskMirrorImpl.java | 4576 | 63%
TarFileSetTest.java | 451 | 90%
ZipFileSetTest.java | 451 | 90%
FCC examples from FreeCol
File Names | TC | PC
ColonialAIPlayer.java | 5130 | 76%
StandardAIPlayer.java | 5084 | 52%
ReportCargoPanel.java | 682 | 74%
ReportNavalPanel.java | 686 | 69%
FCC examples from DrJava
File Names | TC | PC
BackSlashTest.java | 2050 | 84%
SingleQuoteTest.java | 2050 | 88%
FCC examples from JFreeChart
File Names | TC | PC
CombinedDomainCategoryPlot.java | 1008 | 50%
CombinedDomainXYPlot.java | 1005 | 52%
CombinedDomainXYPlot.java | 1182 | 61%
CombinedRangeCategoryPlot.java | 1185 | 71%
CombinedRangeXYPlot.java | 1182 | 63%
FCC Example from Javax-Swing
File Names | TC | PC
MultiButtonUI.java | 644 | 89%
MultiColorChooserUI.java | 644 | 89%
MultiComboBoxUI.java | 795 | 87%
MultiDesktopIconUI.java | 644 | 89%
MultiDesktopPaneUI.java | 644 | 89%
MultiFileChooserUI.java | 833 | 75%
MultiInternalFrameUI.java | 644 | 89%
MultiLabelUI.java | 644 | 89%
MultiListUI.java | 708 | 73%
MultiMenuBarUI.java | 644 | 89%
MultiMenuItemUI.java | 644 | 88%
MultiOptionPaneUI.java | 737 | 87%
MultiPanelUI.java | 644 | 89%
MultiPopupMenuUI.java | 702 | 79%
MultiProgressBarUI.java | 644 | 89%
FCC Example from Javax-Swing (continued)
File Names | TC | PC
MultiScrollBarUI.java | 644 | 89%
MultiScrollPaneUI.java | 644 | 89%
MultiSeparatorUI.java | 644 | 89%
MultiSliderUI.java | 644 | 89%
MultiSpinnerUI.java | 644 | 89%
MultiSplitPaneUI.java | 862 | 80%
MultiTabbedPaneUI.java | 708 | 74%
MultiTableHeaderUI.java | 644 | 89%
MultiTableUI.java | 644 | 89%
MultiTextUI.java | 798 | 53%
MultiToolBarUI.java | 644 | 89%
MultiToolTipUI.java | 644 | 89%
MultiTreeUI.java | 859 | 61%
MultiViewportUI.java | 644 | 89%
MultiRootPaneUI.java | 644 | 89%
FCS Example from JFreeChart
Directory | TC | PC | File Name
renderer/category/ | 786 | 51% | GradientBarPainter.java
renderer/xy/ | 786 | 51% | GradientXYBarPainter.java
renderer/category/ | 53 | 77% | BarPainter.java
renderer/xy/ | 53 | 76% | XYBarPainter.java
FCS Example from JFreeChart
[Figure: GradientBarPainter.java implements BarPainter.java, and GradientXYBarPainter.java implements XYBarPainter.java. GradientBarPainter.java and GradientXYBarPainter.java are labeled "Clones 51%, 786 tokens"; BarPainter.java and XYBarPainter.java are labeled "Clones 77%, 53 tokens".]
FCS Within Directory in Columba
File Names | TC | PC
DownAction.java | 247 | 65%
UpAction.java | 247 | 63%
NextMessageAction.java | 199 | 62%
PreviousMessageAction.java | 199 | 62%
NextUnreadMessageAction.java | 229 | 71%
PreviousUnreadMessageAction.java | 229 | 71%
3. What is the value added by structural clone detection in identifying refactoring opportunities?
Better Refactoring Help
Analysis of structural clones helps locate high-level duplication that can be restructured or refactored
The Form Template Method refactoring unifies similar methods that follow the same high-level algorithm but vary in implementation, using the Template Method design pattern
Simple clones appear as candidates for the Extract Method refactoring; MCS Across Files, however, could also indicate possible applications of the Extract Superclass refactoring
Template method design pattern 55
A structural clone suitable for Form Template Method Refactoring (a toy example)
traverse(){ backtrack=false; cur.i++; cur.next=n; checkSent(true); } init(); cur = findNext(); jumptToNext(); serialize(cur) send(cur); updatePos(-1);
processGraph(){ setEdge(cur); cur.next=cur.i; checkSent(true); } init(); cur = findNext(); jumptToNext(); serialize(cur) send(cur); updatePos(1);
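One way the toy example could look after Form Template Method is sketched below. The shared calls go into a final template method and the differing statements into overridable steps. The order in which shared and varying statements interleave is an assumption, because the slide layout does not preserve it. The class names `GraphStep`, `Traverse`, `ProcessGraph` and the hooks `prepare()` and `direction()` are hypothetical, and each statement is recorded as a string so the sketch runs without a real graph.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class holding the skeleton shared by both toy methods.
abstract class GraphStep {
    // Records each executed step instead of touching a real graph.
    protected final List<String> trace = new ArrayList<>();

    // Template method: the invariant sequence, with two hooks for the varying parts.
    public final List<String> run() {
        trace.add("init()");
        trace.add("cur = findNext()");
        prepare();                                   // varying statements (assumed position)
        trace.add("jumptToNext()");
        trace.add("serialize(cur)");
        trace.add("send(cur)");
        trace.add("checkSent(true)");
        trace.add("updatePos(" + direction() + ")"); // varying argument
        return trace;
    }

    protected abstract void prepare();

    protected abstract int direction();
}

// Variation taken from traverse() in the toy example.
final class Traverse extends GraphStep {
    @Override
    protected void prepare() {
        trace.add("backtrack=false");
        trace.add("cur.i++");
        trace.add("cur.next=n");
    }

    @Override
    protected int direction() {
        return -1;
    }
}

// Variation taken from processGraph() in the toy example.
final class ProcessGraph extends GraphStep {
    @Override
    protected void prepare() {
        trace.add("setEdge(cur)");
        trace.add("cur.next=cur.i");
    }

    @Override
    protected int direction() {
        return 1;
    }
}

final class TemplateMethodDemo {
    public static void main(String[] args) {
        System.out.println(new Traverse().run());
        System.out.println(new ProcessGraph().run());
    }
}
```

Declaring `run()` final keeps the shared algorithm in one place, so a subclass can only change the steps that actually differ.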
Target for Form Template Method refactoring from Javax-Swing

protected void layoutVScrollbar(JScrollBar sb) {
    Dimension sbSize = sb.getSize();
    Insets sbInsets = sb.getInsets();

    // Width and left edge of the buttons and thumb.
    int itemW = sbSize.width - (sbInsets.left + sbInsets.right);
    int itemX = sbInsets.left;

    // Nominal button locations, assuming their preferred size fits.
    boolean squareButtons = DefaultLookup.getBoolean(
        scrollbar, this, "ScrollBar.squareButtons", false);
    int decrButtonH = squareButtons ? itemW : decrButton.getPreferredSize().height;
    int decrButtonY = sbInsets.top;
    int incrButtonH = squareButtons ? itemW : incrButton.getPreferredSize().height;
    int incrButtonY = sbSize.height - (sbInsets.bottom + incrButtonH);

    // Height left for the track after subtracting buttons and insets.
    int sbInsetsH = sbInsets.top + sbInsets.bottom;
    int sbButtonsH = decrButtonH + incrButtonH;
    float trackH = sbSize.height - (sbInsetsH + sbButtonsH);

    // Height and origin of the thumb.
    float min = sb.getMinimum();
    float extent = sb.getVisibleAmount();
    float range = sb.getMaximum() - min;
    float value = getValue(sb);
    int thumbH = (range <= 0)
        ? getMaximumThumbSize().height : (int)(trackH * (extent / range));
    thumbH = Math.max(thumbH, getMinimumThumbSize().height);
    thumbH = Math.min(thumbH, getMaximumThumbSize().height);
    int thumbY = incrButtonY - thumbH;
    if (value < (sb.getMaximum() - sb.getVisibleAmount())) {
        float thumbRange = trackH - thumbH;
        thumbY = (int)(0.5f + (thumbRange * ((value - min) / (range - extent))));
        thumbY += decrButtonY + decrButtonH;
    }

    // If the buttons do not fit, give each half of the available space.
    int sbAvailButtonH = (sbSize.height - sbInsetsH);
    if (sbAvailButtonH < sbButtonsH) {
        incrButtonH = decrButtonH = sbAvailButtonH / 2;
        incrButtonY = sbSize.height - (sbInsets.bottom + incrButtonH);
    }
    decrButton.setBounds(itemX, decrButtonY, itemW, decrButtonH);
    incrButton.setBounds(itemX, incrButtonY, itemW, incrButtonH);

    // Update the track rectangle.
    int itrackY = decrButtonY + decrButtonH;
    int itrackH = incrButtonY - itrackY;
    trackRect.setBounds(itemX, itrackY, itemW, itrackH);

    // Zero the thumb bounds if it cannot fit; otherwise keep it between the buttons.
    if (thumbH >= (int)trackH) {
        setThumbBounds(0, 0, 0, 0);
    } else {
        if ((thumbY + thumbH) > incrButtonY) {
            thumbY = incrButtonY - thumbH;
        }
        if (thumbY < (decrButtonY + decrButtonH)) {
            thumbY = decrButtonY + decrButtonH + 1;
        }
        setThumbBounds(itemX, thumbY, itemW, thumbH);
    }
}

protected void layoutHScrollbar(JScrollBar sb) {
    Dimension sbSize = sb.getSize();
    Insets sbInsets = sb.getInsets();

    // Height and top edge of the buttons and thumb.
    int itemH = sbSize.height - (sbInsets.top + sbInsets.bottom);
    int itemY = sbInsets.top;
    boolean ltr = sb.getComponentOrientation().isLeftToRight();

    // Nominal button locations, assuming their preferred size fits.
    boolean squareButtons = DefaultLookup.getBoolean(
        scrollbar, this, "ScrollBar.squareButtons", false);
    int leftButtonW = squareButtons ? itemH : decrButton.getPreferredSize().width;
    int rightButtonW = squareButtons ? itemH : incrButton.getPreferredSize().width;
    if (!ltr) {
        int temp = leftButtonW;
        leftButtonW = rightButtonW;
        rightButtonW = temp;
    }
    int leftButtonX = sbInsets.left;
    int rightButtonX = sbSize.width - (sbInsets.right + rightButtonW);

    // Width left for the track after subtracting buttons and insets.
    int sbInsetsW = sbInsets.left + sbInsets.right;
    int sbButtonsW = leftButtonW + rightButtonW;
    float trackW = sbSize.width - (sbInsetsW + sbButtonsW);

    // Width and origin of the thumb.
    float min = sb.getMinimum();
    float max = sb.getMaximum();
    float extent = sb.getVisibleAmount();
    float range = max - min;
    float value = getValue(sb);
    int thumbW = (range <= 0)
        ? getMaximumThumbSize().width : (int)(trackW * (extent / range));
    thumbW = Math.max(thumbW, getMinimumThumbSize().width);
    thumbW = Math.min(thumbW, getMaximumThumbSize().width);
    int thumbX = ltr ? rightButtonX - thumbW : leftButtonX + leftButtonW;
    if (value < (max - sb.getVisibleAmount())) {
        float thumbRange = trackW - thumbW;
        if (ltr) {
            thumbX = (int)(0.5f + (thumbRange * ((value - min) / (range - extent))));
        } else {
            thumbX = (int)(0.5f + (thumbRange * ((max - extent - value) / (range - extent))));
        }
        thumbX += leftButtonX + leftButtonW;
    }

    // If the buttons do not fit, give each half of the available space.
    int sbAvailButtonW = (sbSize.width - sbInsetsW);
    if (sbAvailButtonW < sbButtonsW) {
        rightButtonW = leftButtonW = sbAvailButtonW / 2;
        rightButtonX = sbSize.width - (sbInsets.right + rightButtonW);
    }
    (ltr ? decrButton : incrButton).setBounds(leftButtonX, itemY, leftButtonW, itemH);
    (ltr ? incrButton : decrButton).setBounds(rightButtonX, itemY, rightButtonW, itemH);

    // Update the track rectangle.
    int itrackX = leftButtonX + leftButtonW;
    int itrackW = rightButtonX - itrackX;
    trackRect.setBounds(itrackX, itemY, itrackW, itemH);

    // Zero the thumb bounds if it cannot fit; otherwise keep it between the buttons.
    if (thumbW >= (int)trackW) {
        setThumbBounds(0, 0, 0, 0);
    } else {
        if (thumbX + thumbW > rightButtonX) {
            thumbX = rightButtonX - thumbW;
        }
        if (thumbX < leftButtonX + leftButtonW) {
            thumbX = leftButtonX + leftButtonW + 1;
        }
        setThumbBounds(thumbX, itemY, thumbW, itemH);
    }
}
Candidates for Form Template Method from JFreeChart
Classes | Methods
GroupedStackedBarRenderer, StackedBarRenderer | drawItem()
XYAreaRenderer, XYAreaRenderer2 | drawItem()
LineAndShapeRenderer, ScatterRenderer | getLegendItem()
DateAxis, PeriodAxis | valueToJava2D()
ComparableObjectSeries, XYSeries | add()
CategoryPlot, XYPlot | readObject()
BarRenderer, LevelRenderer | drawItem()
DateAxis, NumberAxis | refreshTicksHorizontal()
XYBubbleRenderer, XYLineAndShapeRenderer | getLegendItem()
MiddlePinNeedle, PinNeedle | drawNeedle()
ComparableObjectSeries, XYSeries | hashCode()
AreaRenderer, BoxAndWhiskerRenderer | getLegendItem()
XYAreaRenderer, XYStepAreaRenderer | drawItem()
CompassPlot, PiePlot | setSeriesNeedle()
ColumnArrangement, FlowArrangement | arrangeNF()
LogAxis, NumberAxis | selectVerticalAutoTickUnit()
PaintMap, StrokeMap | equals()
XYLineAndShapeRenderer, XYShapeRenderer | drawSecondaryPass()
Minute, Second | parseMinute()
Extract Superclass candidate from JFreeChart
CombinedDomainCategoryPlot.java: CombinedDomainCategoryPlot, getGap, setGap, add, remove, getSubplots, findSubplot, zoomRangeAxes, calculateAxisSpace, draw, setFixedRangeAxisSpaceForSubplots, setOrientation, getDataRange, getLegendItems, getCategories, getCategoriesForAxis, handleClick, plotChanged, equals, clone
CombinedDomainXYPlot.java: CombinedDomainXYPlot, getPlotType, setOrientation, getDataRange, getGap, setGap, add, remove, getSubplots, calculateAxisSpace, draw, getLegendItems, zoomRangeAxes, findSubplot, setRenderer, setFixedRangeAxisSpace, setFixedRangeAxisSpaceForSubplots, handleClick, plotChanged, equals, clone
Extract Superclass and Structural Clones
The two files contain a number of similarly named methods, but only through structural clone analysis could we find the methods that are also significantly similar in content
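The name-level comparison can be sketched with plain set operations over the method names listed for CombinedDomainCategoryPlot and CombinedDomainXYPlot (constructors omitted). The class `SharedMethods` and its helpers are hypothetical. The sketch compares names only, so it does not show which methods are also similar in content; that is the part structural clone analysis adds.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: name-level overlap between two classes' methods.
final class SharedMethods {
    // Method names as listed for CombinedDomainCategoryPlot.
    static final Set<String> CATEGORY_PLOT = new TreeSet<>(Arrays.asList(
        "getGap", "setGap", "add", "remove", "getSubplots", "findSubplot",
        "zoomRangeAxes", "calculateAxisSpace", "draw",
        "setFixedRangeAxisSpaceForSubplots", "setOrientation", "getDataRange",
        "getLegendItems", "getCategories", "getCategoriesForAxis",
        "handleClick", "plotChanged", "equals", "clone"));

    // Method names as listed for CombinedDomainXYPlot.
    static final Set<String> XY_PLOT = new TreeSet<>(Arrays.asList(
        "getPlotType", "setOrientation", "getDataRange", "getGap", "setGap",
        "add", "remove", "getSubplots", "calculateAxisSpace", "draw",
        "getLegendItems", "zoomRangeAxes", "findSubplot", "setRenderer",
        "setFixedRangeAxisSpace", "setFixedRangeAxisSpaceForSubplots",
        "handleClick", "plotChanged", "equals", "clone"));

    // Names present in both classes: first-pass candidates for a common superclass.
    static Set<String> shared(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    // Names present in a but not in b: these would stay in the subclass.
    static Set<String> onlyIn(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.removeAll(b);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(shared(CATEGORY_PLOT, XY_PLOT).size()); // prints 17
        System.out.println(onlyIn(CATEGORY_PLOT, XY_PLOT));
        System.out.println(onlyIn(XY_PLOT, CATEGORY_PLOT));
    }
}
```

A shared name is only a hint: two methods called `draw` may differ completely, so each candidate still needs a content comparison before it is pulled up.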
Another Candidate for Extract Superclass from JFreeChart
FlowArrangement: FlowArrangement, add, arrange, arrangeFN, arrangeFR, arrangeFF, arrangeRR, arrangeRF, arrangeRN, arrangeNN, arrangeNF, clear, equals
ColumnArrangement: ColumnArrangement, add, arrange, arrangeFF, arrangeRR, arrangeRF, arrangeNN, arrangeNF, clear, equals
More Candidates for Extract Superclass from JFreeChart
BarRenderer, LevelRenderer
LogAxis, NumberAxis
AbstractCategoryItemRenderer, AbstractXYItemRenderer
MilliSecond, Second
PaintMap, StrokeMap
CombinedDomainXYPlot, CombinedRangeXYPlot
Minute, Second
CombinedRangeCategoryPlot, CombinedDomainCategoryPlot
CombinedRangeCategoryPlot, CombinedRangeXYPlot, CombinedDomainCategoryPlot
DefaultIntervalXYDataset, DefaultXYDataset, DefaultXYZDataset
Conclusions
Structural clones often represent important design concepts
Structural clone detection becomes an aid in design recovery
Structural clones show a bigger picture of code duplication and guide the choice of the refactoring technique applicable in a given situation
Future Work
Classification of structural clones
Detection of other types of structural clones
Better integration of structural clones with architecture recovery and re-modularization techniques
Better visualization of structural clones
Management of structural clones with meta-programming
Re-engineering for reuse
Thank you