1 Reuse of a repository of conceptual schemas in a large scale project Carlo Batini University of Milano Bicocca, Italy

1 Reuse of a repository of conceptual schemas in a large-scale project. Carlo Batini, University of Milano Bicocca, Italy. batini@disco.unimib.it

2 Goal and contents
Use of an existing repository of schemas representing the relevant information managed in the Central Public Administration:
– 500 main databases
– 500 conceptual schemas organized in a repository
– 5,000 entities and 10,000 attributes
Produce the corresponding repository for a group of regional and local administrations in Piedmont:
– 450 relational schemas available
– About 15,000 relational tables
Human resources available:
– 2 person-years
Heuristics and related methodology
Experiments
Recent developments

3 Organization of the Central and Local Public Administration in Italy

4 Organization of Central and Local Public Administration in Italy
Central PA
– 50 Ministries and other Agencies
Local PA
– 21 Regions
– More than 100 Provinces
– More than 8,000 municipalities

5 Organization of the central PA Repository

6 An example of a repository in the small [Diagram labels: Company, Sales, Production, Department structure; entities include FLOOR, DEP, EMP, CITY, ITEM, ORD, SELLER, PUR, WARE, CLERK, ENGIN and WARR]

7 An example of a repository in the small [Diagram: the schemas of the previous slide plus an integration step producing a schema with entities WARR, ITEM, ORDER, FLOOR, DEPARTM, EMPLOYEE, CITY, SELLER, CLERK, ENGINEER, PURCH and WARE]

8 An example of a repository in the small [Diagram: as before, with abstraction steps added; the additional schemas use entities such as ITEM, ORDER, DEPART, EMPLOYEE, CITY, SELLER, PURCH, and ITEM, DEP, D-E, EMP-DATA, ORD-DATA, CITY]

9 An example of a repository in the small [Diagram: as before, with integration, view and abstraction links all shown]

10 Views not represented [Diagram: the same repository with integration and abstraction links only]

11 Only some abstractions represented [Diagram: the same repository with integration links and a subset of the abstraction links]

12 Sparse approach [Diagram: schemas S1, S2, S3, S4, S5, S6, S7, S8; integrated schemas SI123, SI456, SI78; top schema SI12345678]
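
The schema names on this slide suggest a small tree in which only some integrated schemas are stored. The following is a minimal sketch under that reading, not the project's implementation: the grouping of S1 to S8 under SI123, SI456 and SI78 is inferred from the names alone, and the `children` mapping and `leaves` helper are hypothetical.

```python
# Hypothetical encoding of the sparse repository: each integrated schema
# lists the schemas it was integrated from (grouping inferred from the names).
children = {
    "SI12345678": ["SI123", "SI456", "SI78"],
    "SI123": ["S1", "S2", "S3"],
    "SI456": ["S4", "S5", "S6"],
    "SI78": ["S7", "S8"],
}


def leaves(schema):
    """Return the basic schemas reachable from a (possibly integrated) schema."""
    if schema not in children:  # a basic schema has no children
        return [schema]
    result = []
    for child in children[schema]:
        result.extend(leaves(child))
    return result
```

Calling `leaves("SI78")` walks one level down, while `leaves("SI12345678")` walks the whole tree; schemas that are not keys of `children` are treated as basic.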

13 Structure of the Central PA Repository: Social security, Justice, Environment, Health

14 Structure of the Central PA Repository: Social security, Justice, Environment, Health. Abstract schemas: 50. Basic schemas: 500.

15 The whole repository of schemas [Diagram: integrated diagrams of the 1st, 2nd and 3rd level PA database over area schemas including Services (general, direct, social and economic), Resources (financial, instrumental and real estate, human), Social security, Justice, Defence, Foreign affairs, Health, Education, Culture, Labour, Production, Communication and transports, Statistics and Certification; numeric labels such as 2/93, 37/336 and 10/178 are attached to the schemas]

16 The top level schema of the repository: Individual, Document, Legal person, Subject, Property, Place

17 Input knowledge for the production of the repository of local conceptual schemas
Central Public Administration: conceptual schemas (abstract schemas, basic schemas)
Local Public Administration: logical schemas
Output: repository of local conceptual schemas

18 Conjecture (1) and strategy (2)
1. Knowledge appearing in the abstract schemas of the Central PA Repository should also appear, unchanged, in the Local PA Repository.
2. Knowledge appearing in the basic schemas of the Central PA Repository should be changed or updated according to the knowledge appearing in the local logical schemas.

19 Using a more compact representation
Abstract schemas
Basic schemas
Generalization hierarchies of
- Individual
- Legal person
- Document
- Place
- Property

20 A fragment of the generalization hierarchy for Individual: Individual; Employment; Unemployed; Employed; Dependant; Autonomous; In search of employment; Retired; State pension retired; Private pension retired; Early retired; Disability retired; Education; …

21 Input knowledge for the production of the Repository of local conceptual schemas
Central Public Administration: conceptual schemas (abstract schemas, basic schemas)
Generalization hierarchies of Individual, Legal person, Document, Place, Property
Local Public Administration: logical schemas
Output: repository of local conceptual schemas

22 The two phases of the methodology [Diagram: automatic local schema construction yields a draft schema; a manual step by the domain expert yields the final schema]

23 The methodology at a glance
Phase 1
– 1. Extract entities
– 2. Add generalizations
– 3. Extract relationships
– 4. Add relationships related to integrity constraints
Phase 2: Domain expert step

24 Step 1: Extract entities
Inputs: generalization hierarchies of Individual, Legal person, Document, Place, Property; relational local PA schemas
Output: draft schema

25-29 Step 1: Extract entities [Diagram, built up over five slides: tables and attributes on one side, generalization hierarchies on the other; entities E1, E2 and E3 are identified one after another]
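
The slides do not say how tables and attributes are compared with the hierarchies. As a minimal sketch, assuming plain name matching after normalization (an assumption, not the authors' stated heuristic), step 1 could look like the following; `normalize`, `extract_entities` and the sample table names are all hypothetical.

```python
def normalize(name):
    """Lower-case a name and drop separators so 'LEGAL_PERSON' matches 'Legal person'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def extract_entities(tables, hierarchy_concepts):
    """Map each hierarchy concept found among table or attribute names to its tables.

    tables: dict of table name -> list of attribute names (assumed input shape).
    hierarchy_concepts: concept names taken from the generalization hierarchies.
    """
    by_key = {normalize(c): c for c in hierarchy_concepts}
    entities = {}
    for table, attributes in tables.items():
        for name in [table, *attributes]:
            concept = by_key.get(normalize(name))
            if concept is not None:
                entities.setdefault(concept, set()).add(table)
    return entities


# Hypothetical relational schema: only two of the three tables mention a concept.
tables = {
    "EMPLOYED": ["id", "name"],
    "OFFICE": ["id", "legal_person"],
    "TMP_CODES": ["code"],
}
entities = extract_entities(tables, ["Employed", "Legal person", "Retired"])
```

Tables such as `TMP_CODES` that match no concept are simply left out of the draft, which is one way the low completeness reported later could arise.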

30 Step 2: Add generalizations
Inputs: generalization hierarchies of Individual, Legal person, Document, Place, Property; draft schema (E1, E2, E3)
Output: new draft schema

31 Add generalizations [Diagram: tables and attributes; entities E1, E2, E3 in the draft schema and in the new draft schema]
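
One way to read step 2 is that every extracted entity pulls in its ancestors from the generalization hierarchies. The sketch below assumes exactly that; the `parent_of` links are hypothetical (the nesting of the Individual hierarchy is not recoverable from the transcript), and `add_generalizations` is an illustrative name.

```python
# Hypothetical child -> parent links taken from a generalization hierarchy.
parent_of = {
    "Employed": "Individual",
    "Retired": "Individual",
    "Early retired": "Retired",
}


def add_generalizations(entities, parent_of):
    """Return (all_entities, is_a_links) after adding every ancestor of each entity."""
    all_entities = set(entities)
    links = set()
    for entity in entities:
        current = entity
        while current in parent_of:  # climb until a hierarchy root is reached
            parent = parent_of[current]
            links.add((current, parent))
            all_entities.add(parent)
            current = parent
    return all_entities, links


all_entities, links = add_generalizations({"Employed", "Early retired"}, parent_of)
```

The sketch assumes the hierarchy is acyclic; a cyclic `parent_of` mapping would make the loop run forever.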

32 Step 3: Extract relationships
Inputs: draft schema (E1, E2, E3); basic schemas of the central PA repository (Social security, Justice, Environment, Health)
Output: new draft schema

33-37 Extract relationships [Diagram, built up over five slides, showing the entities E1, E2 and E3]
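
A minimal sketch of step 3, assuming a relationship is copied from a central basic schema whenever both of its participants already appear in the draft schema. That selection rule is an assumption, and the triple format, the function name and the sample relationships are hypothetical.

```python
def extract_relationships(entities, central_relationships):
    """Keep the central relationships whose two participants are both in the draft.

    central_relationships: iterable of (name, entity_a, entity_b) triples
    (assumed shape for relationships found in the central basic schemas).
    """
    return [
        (name, a, b)
        for name, a, b in central_relationships
        if a in entities and b in entities
    ]


# Hypothetical relationships from central basic schemas.
central = [
    ("Born", "Individual", "Place"),
    ("Owns", "Legal person", "Property"),
]
draft = extract_relationships({"Individual", "Place", "Document"}, central)
```

Here only the first relationship survives, because "Legal person" and "Property" are not among the draft entities.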

38 Step 4: Add relationships related to integrity constraints
Inputs: draft schema (E1, E2, E3); referential integrity constraints (K2, K3)
Output: final draft schema

39 Add relationships related to integrity constraints [Diagram: tables and attributes; labels K2 and K3; entities E1, E2, E3]
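
Step 4 can be sketched as follows, assuming each documented referential integrity constraint is available as a pair of table names and that step 1 left a table-to-entity mapping. Both input shapes and all names below are hypothetical.

```python
def relationships_from_foreign_keys(foreign_keys, table_to_entity):
    """Turn each foreign key between two mapped tables into a draft relationship.

    foreign_keys: iterable of (referencing_table, referenced_table) pairs.
    table_to_entity: dict mapping a table name to the entity extracted from it.
    """
    relationships = set()
    for source, target in foreign_keys:
        if source in table_to_entity and target in table_to_entity:
            relationships.add((table_to_entity[source], table_to_entity[target]))
    return relationships


table_to_entity = {"T1": "E1", "T2": "E2", "T3": "E3"}
foreign_keys = [("T1", "T2"), ("T1", "T3"), ("T3", "T9")]  # T9 has no entity
rels = relationships_from_foreign_keys(foreign_keys, table_to_entity)
```

A constraint whose target table was never mapped to an entity, like `T9` here, contributes nothing, so undocumented or unmatched constraints translate directly into missing relationships.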

40 Experiments

41 Experiments on 9 databases in 3 areas
Domain / Type of administration: Region, Province, Municipality
Territory: x x x
Business: x
Health: x

42 Relevant qualities of the process: correctness
Correctness of the conceptual schema with respect to the “true” one, i.e. the schema that the domain expert could obtain directly through a traditional analysis or a reverse engineering activity.
Correctness is measured with an approximate, indirect metric: the percentage of new or deleted concepts in the schema produced by the expert at the end of step 5, compared with the concepts produced in the semi-automatic steps 1-4.
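
The slide does not fix the denominator of this metric. The sketch below assumes correctness is the share of draft concepts left unchanged, with new and deleted concepts both counted against the number of concepts in the semi-automatic draft; that reading and the function name are assumptions.

```python
def correctness(draft_concepts, expert_concepts):
    """Percentage of the draft left unchanged by the expert (assumed definition)."""
    draft, expert = set(draft_concepts), set(expert_concepts)
    new = expert - draft      # concepts the expert had to add
    deleted = draft - expert  # concepts the expert removed
    changed = len(new) + len(deleted)
    return 100.0 * (len(draft) - changed) / len(draft)


# Hypothetical example: 10 draft concepts, the expert drops one and adds one.
draft = {f"C{i}" for i in range(10)}
expert = (draft - {"C0"}) | {"C10"}
score = correctness(draft, expert)
```

Under this definition the value can fall below zero when the expert changes more concepts than the draft contained, which is one reason to treat it only as an approximate indicator.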

43 Relevant qualities of the process: completeness
Completeness of the conceptual schema with respect to the corresponding reengineered logical schema.
Completeness is measured as the percentage of tables captured in steps 1-5, relative to the total number of tables after excluding those that carry no relevant information, such as redundant tables, tables of codes, etc.
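
The completeness measure described here can be written down directly. This is a small sketch with hypothetical table names; only the formula (captured tables over relevant tables) comes from the slide.

```python
def completeness(captured_tables, all_tables, excluded_tables):
    """Percentage of relevant tables captured in the conceptual schema."""
    relevant = set(all_tables) - set(excluded_tables)
    captured = set(captured_tables) & relevant
    return 100.0 * len(captured) / len(relevant)


# Hypothetical database: six tables, two of which are code or redundant tables.
all_tables = ["T1", "T2", "T3", "T4", "CODES", "T1_COPY"]
excluded = ["CODES", "T1_COPY"]
pct = completeness(["T1", "T2"], all_tables, excluded)
```

Excluded tables are removed from both numerator and denominator, so capturing a code table by accident does not inflate the score.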

44 Results
Correctness: more than 80%.
Completeness: only 50% of tables are captured.
Completeness decreases significantly when the referential integrity constraints are undocumented or only partially documented.
Another cause of reduced completeness is the static nature of the generalization hierarchies used in step 1, together with the unequal semantic richness with which related top-level concepts are represented. For instance, in the initial Subject hierarchy, 20 concepts represent individuals, while only 3 represent legal persons.
An improvement we are applying is to update the hierarchies incrementally with abstract concepts generated by the domain expert during the process.

45 Resources
For a basic/abstract schema of the central PA repository: ½ person-month
For a basic schema of the local PA repository: 1 person-day
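
As a rough cross-check against the goals slide (450 relational schemas, 2 person-years available), the per-schema effort can be multiplied out. The number of working days per year is not given in the slides; 220 is an assumption made only for this sketch.

```python
LOCAL_SCHEMAS = 450               # relational schemas available (goals slide)
PERSON_DAYS_PER_LOCAL_SCHEMA = 1  # effort per local basic schema (this slide)
WORKING_DAYS_PER_YEAR = 220       # assumption, not stated in the slides

person_days = LOCAL_SCHEMAS * PERSON_DAYS_PER_LOCAL_SCHEMA
person_years = person_days / WORKING_DAYS_PER_YEAR
```

Under that assumption the total comes to 450 person-days, a little over two person-years, which is of the same order as the resources quoted at the start of the talk.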

46 Present developments

47 Heuristics for abstract schemas [Diagram: Level 1 to Level 4; initial schema and enriched schema]

48-54 Heuristics for abstract schemas - 1 to 7 [Diagram on each slide: Level 1 to Level 4; enriched schema]

55 Abstract schemas obtained from the basic schema
[Diagram, first schema: Subject, Individual, Italian citizen, Legal person, Business, Document, Registry act, Grant, Concession rule, Project budget, Procedure, Source, Canceled grant, Paid off grant, Awarded grant]
[Diagram, second schema: Subject, Individual, Italian citizen, Legal person, Business, Document, Registry act, Rule]
[Diagram, third schema: Subject, Individual, Italian citizen, Legal person, Business, Document, Rule]

56 Strategies for building abstract local schemas
Strategy 1: an abstraction step followed by an integration step
Strategy 2: abstraction and integration performed together
[Diagram labels: Actual LPA repository, Step 1, Step 2]

57 Leftover

58 The structure of the cooperative architecture [Diagram labels: Basic services, Transport services; for each administration: Processes, Exported data, Exported services, Internal applications, Internal DBs]

59 Experimental results
Step | # of tables extracted | % of tables extracted
Create entities | 172 | 30
Add constraints | 219 | 41
Domain expert check | 275 | 51

