Identifying Objects Using Cluster and Concept Analysis Arie van Deursen Tobias Kuipers CWI, The Netherlands.


2 Identifying Objects Using Cluster and Concept Analysis
Arie van Deursen, Tobias Kuipers
CWI, The Netherlands

3 Motivation
Legacy code incomprehensible
–Lack of structure
Case: >100,000 LOC Banking System
–Cobol + VSAM data files
Customer wanted OO redesign
Data central to the system

4 General Plan
Find interesting data
–Data selection
–Candidate attributes
Find interesting functionality
–Program selection (procedure)
–Candidate methods
Combine the two
–Candidate classes

5 Input Selection
Domain related v. Implementation specific
Persistent data stores
–Only records written to/read from file
–Refine by CRUD (Create/Read/Update/Delete)
–Records too big for one class
Analysis of Program Call Graph
–high fan-out: control-programs
–high fan-in: low-level technical
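The call-graph filter on this slide can be sketched as follows. This is a minimal illustration, not the authors' tooling: the program names, the call pairs, and both thresholds are hypothetical. It counts fan-in and fan-out per program and keeps only programs that are neither control programs (high fan-out) nor low-level technical routines (high fan-in).

```python
from collections import defaultdict


def fan_metrics(calls):
    """Return {program: (fan_in, fan_out)} for a list of (caller, callee) pairs."""
    callers = defaultdict(set)   # callee -> programs that call it
    callees = defaultdict(set)   # caller -> programs it calls
    for caller, callee in calls:
        callees[caller].add(callee)
        callers[callee].add(caller)
    programs = set(callers) | set(callees)
    return {p: (len(callers[p]), len(callees[p])) for p in programs}


def candidate_programs(calls, max_fan_in, max_fan_out):
    """Keep programs below both thresholds (the thresholds are assumptions)."""
    metrics = fan_metrics(calls)
    return sorted(
        p for p, (fan_in, fan_out) in metrics.items()
        if fan_in <= max_fan_in and fan_out <= max_fan_out
    )


# Hypothetical call graph: MAIN dispatches, DATEFMT is a shared utility.
CALLS = [
    ("MAIN", "UPDCUST"), ("MAIN", "UPDMORT"), ("MAIN", "PRINT"),
    ("UPDCUST", "DATEFMT"), ("UPDMORT", "DATEFMT"), ("PRINT", "DATEFMT"),
]

print(candidate_programs(CALLS, max_fan_in=2, max_fan_out=2))
```

With these thresholds MAIN is dropped for its fan-out and DATEFMT for its fan-in; what counts as "high" would have to be tuned per system.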

6 Combining Data & Functionality
Cluster analysis -- technique for finding groups in data
–Relies on metrics to compare distance between data items
Concept analysis -- for finding groups too
–Relies on maximal subsets of data items sharing a set of features

7 Cluster Analysis
Calculate a distance (similarity) number between every pair of data items (record fields)
Use clustering on these distances to find a hierarchy
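A small sketch of these two steps, under stated assumptions: each record field is described by a 0/1 vector saying which of four programs (P1 to P4) use it, the distance is Euclidean, and the hierarchy is built with single-linkage agglomerative clustering. The usage table is hypothetical (loosely modelled on the name/address example in the slides), and the slides do not say which metric or linkage the authors used.

```python
from itertools import combinations
from math import dist

# Hypothetical usage table: field -> used by (P1, P2, P3, P4), 1 = used.
USAGE = {
    "Name": (1, 0, 0, 1), "Title": (1, 0, 0, 1),
    "Initial": (1, 0, 0, 1), "Prefix": (1, 0, 0, 1),
    "Number": (0, 1, 1, 1), "Nb-Ext": (0, 1, 1, 1), "Zipcode": (0, 1, 1, 1),
    "City": (0, 1, 0, 1), "Street": (0, 0, 1, 1),
}


def single_linkage(points):
    """Merge the two closest clusters until one remains.

    Returns a list of (distance, merged_cluster) in merge order, which is
    the information a dendrogram displays.
    """
    clusters = [frozenset([name]) for name in sorted(points)]
    merges = []
    while len(clusters) > 1:
        # Single linkage: cluster distance = smallest pairwise item distance.
        d, a, b = min(
            (
                (min(dist(points[x], points[y]) for x in ca for y in cb), ca, cb)
                for ca, cb in combinations(clusters, 2)
            ),
            key=lambda t: t[0],
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((d, a | b))
    return merges


for distance, cluster in single_linkage(USAGE):
    print(f"{distance:.3f}  {sorted(cluster)}")
```

Fields with identical usage merge at distance 0, so the name fields and the Number/Nb-Ext/Zipcode fields each form a cluster before anything else is joined.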

8-14 Dendrogram (built up step by step over seven slides)
[Figure: dendrogram, labelled "01", over the fields Name, Title, Initial, Prefix, Number, Nb-Ext, Zipcode, City, Street; annotation on slides 10 and 11: "Distance is 1"]

15 Dendrogram from Real Data
[Figure: dendrogram with axis ticks 0, 1, 2 over the fields Amount, AccountOfficeName, BankCity, IntAccount, OfficeType, PaymentKind, RelationNr, ChangeDate, TitleCd, Prefix, Initial, ZipCd, CountyCd, StreetNr, MortSeqNr, MortNr, City, Street, Name]

16 Concept Analysis
Relies on maximal subsets of data items sharing a set of features
Concept analysis finds a lattice
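To make "maximal subsets of data items sharing a set of features" concrete, here is a minimal sketch that enumerates all concepts of a small context. It assumes that items are record fields and features are the programs using them; the usage table is hypothetical, and the brute-force enumeration is chosen for readability, not because the slides name it.

```python
# Hypothetical context: field (item) -> set of programs (features) that use it.
USES = {
    "Name": {"P1", "P4"}, "Title": {"P1", "P4"},
    "Initial": {"P1", "P4"}, "Prefix": {"P1", "P4"},
    "Number": {"P2", "P3", "P4"}, "Nb-Ext": {"P2", "P3", "P4"},
    "Zipcode": {"P2", "P3", "P4"},
    "City": {"P2", "P4"}, "Street": {"P3", "P4"},
}


def concepts(uses):
    """Return all (items, features) concepts of the context.

    A concept pairs a set of items with the set of features they all share,
    such that neither set can be enlarged without shrinking the other.
    """
    items = frozenset(uses)
    features = sorted(set().union(*uses.values()))
    # Every concept's item set is an intersection of per-feature item sets,
    # so close the family of item sets under intersection.
    extents = {items}
    for f in features:
        column = frozenset(i for i in items if f in uses[i])
        extents |= {column & e for e in extents}
    # Pair each item set with the features common to all of its items.
    return [
        (e, frozenset(f for f in features if all(f in uses[i] for i in e)))
        for e in extents
    ]


for extent, intent in sorted(concepts(USES), key=lambda c: -len(c[0])):
    print(sorted(intent), sorted(extent))
```

Ordering the resulting concepts by inclusion of their item sets gives the lattice: the top concept holds all items, the bottom concept holds all features.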

17-20 Concept Lattice (built up step by step over four slides)
[Figure: concept lattice between "top" and "bottom" (All Variables); each node pairs a set of features (programs P1, P2, P3, P4) with a set of items (field names). Items shown: Name, Title, Initial, Prefix, Number, Nb-Ext, Zipcode, Street, City. Feature sets shown include "P3 P4" next to Street and "P2 P4" next to City.]

21 Real Concept Lattice
[Figure: concept lattice from real data, with node labels A to X and 1 to 14]

22 Concluding Remarks
Variable Selection - Input filtering
Records are natural starting point in data-intensive applications
–Legacy/Cobol domain
Records are too big: Decompose them
Cluster analysis v. Concept analysis

23 Cluster v Concept Analysis
Multiple partitionings
–Clustering does not show all possibilities
Items in multiple groups
Features and clusters
–Origin of cluster decision is lost
Concept more efficient computationally
Clustering needs more filtering

24 Questions

25 Current Approaches
Subsystem classification techniques
–Survey, Lakhotia 97. Don’t work for Cobol, Cimitile 99
Record as data part of a class
–Newcomb & Kotik (‘95) take level 01 records, Fergen et al (94) compare structure of records for reuse
Manual Methodology
–Sneed (‘92) provides manual methodology for migration of code, Sneed & Nyári (‘95) derive ‘OO’ documentation from legacy.

