Union Catalog Architecture Tsach Moshkovits, Development Team Leader Olybris, Ex Libris Seminar 2005 Kos, April 2005.

1 Union Catalog Architecture Tsach Moshkovits, Development Team Leader Olybris, Ex Libris Seminar 2005 Kos, April 2005

2 Union Catalog Architecture 2 Overview The Union Catalog is a sophisticated mechanism that supports the integration of disparate libraries into a single environment. By environment, we mean a unified User view, rather than a single database or a merged index.

3 Union Catalog Architecture 3 Overview The following will be discussed in this session: Union catalog structure; Union catalog vs. Unified catalog; Equivalency; Merge.

4 Union Catalog Architecture 4 A Unified Catalog Usually, a Union catalog involves a catalog where all Equivalent records are merged into one new record. In this scenario, the original records are not saved, and the index is built on the merged version of the records. Obviously, the merged record must include information about its different parts to allow navigation from the record to remote resources.

5 Union Catalog Architecture 5 Unified Catalog Drawbacks Match and Merge is performed at load time, record by record. This is a slow process when additional resources are added. A new resource may not be available until the slow load process is completely finished. Updating a record is complex, since it may require more than just updating its merged record. This is true because the equivalence relation is not necessarily transitive.

6 Union Catalog Architecture 6 Unified Catalog Drawbacks Merging becomes even more problematic if the merge algorithm does not preserve all data for every source record. In such a case, any match and merge process must re-access all remote resources to retrieve all original records. It is also impossible to update the unified catalog with a standard Cataloging GUI.

7 Union Catalog Architecture 7 Unified Catalog Structure – Virtual Approach [Diagram: ALEPH Union Catalog. Contributors feed three levels: A, Import (Load / Catalog; New/Update/Delete) into Original Records and Indices; B, Create Equivalence, producing the Equivalence Table (Z120); C, Merge “Just in Time.”]

8 Union Catalog Architecture 8 Union Structure – Level A Records are stored as distinct entities in the database. Records can be loaded from an external resource or cataloged with the ALEPH Cataloging module. Records from an external resource can hold an identifier to the external resource to allow simple updating or navigation to an external resource. Indices are created using the standard ALEPH indexing scheme.

9 Union Catalog Architecture 9 Union Structure – Level B An Equivalence table is created by mapping each record to its equivalent records. The equivalence relation is not necessarily transitive. This table can be recreated any time, leaving the records intact.
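As an illustration of Level B, here is a minimal sketch of an equivalence table held as a Python dictionary. The record numbers, the `matched_pairs` list, and the function name are hypothetical; this is not the actual Z120 layout. It shows how a non-transitive relation can be stored: record 2 is equivalent to both 1 and 3, while 1 and 3 are not equivalent to each other.

```python
# Hypothetical pairwise match results; in the real system these would come
# from the table-defined match program.
matched_pairs = [(1, 2), (2, 3)]
record_numbers = [1, 2, 3, 4]

def build_equivalence_table(records, pairs):
    """Map each record number to the set of records it is equivalent to."""
    # Every record starts out equivalent to itself, so no entry is ever empty.
    table = {number: {number} for number in records}
    for a, b in pairs:
        table[a].add(b)
        table[b].add(a)
    return table

# The table is derived data: it can be rebuilt at any time without
# touching the records themselves.
equivalence = build_equivalence_table(record_numbers, matched_pairs)
```

Because only direct matches are stored, no transitive closure is computed, which is what lets record 1 and record 3 stay unrelated.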

10 Union Catalog Architecture 10 Union Structure – Level C Result sets will be de-duplicated to contain only one record per group of equivalents. Browse lists will de-duplicate their counters to count only one record per group of equivalents. User View uses on-the-fly Merge to present a single record that is built from a group of equivalents. The Merge algorithm can vary from user to user.
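The result-set de-duplication described here could be sketched as follows. This is an assumption about one possible approach, not the ALEPH implementation: the function name and the dictionary form of the equivalence table are hypothetical, and the sketch simply keeps the first record encountered from each group.

```python
def deduplicate(result_set, equivalence):
    """Keep the first record seen from each group of equivalents."""
    kept = []
    covered = set()
    for number in result_set:
        if number in covered:
            # An equivalent of this record is already in the output.
            continue
        kept.append(number)
        covered.add(number)
        covered.update(equivalence.get(number, set()))
    return kept

# Hypothetical data: records 1 and 2 are equivalent, record 3 stands alone.
equivalence = {1: {1, 2}, 2: {1, 2}, 3: {3}}
hits = [2, 1, 3]
unique_hits = deduplicate(hits, equivalence)
```

The same idea would apply to browse-list counters: count the length of the de-duplicated list rather than the raw hit list.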

11 Union Catalog Architecture 11 Virtual Approach Advantages It is simple to update a record by unlinking it from the Equivalence table and marking it as “New.” This action breaks all existing connections in the group. A new record is simply inserted as equivalent only to itself. In all cases, the data of each record stays intact in the database.
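A rough sketch of the update step, under one reading of the slide: every member of the updated record's group is reset to be equivalent only to itself and is queued as “New” for later re-evaluation. The function name, the dictionary table, and the queue are all hypothetical, and the real system may queue only the updated record.

```python
def unlink_for_update(number, equivalence, new_queue):
    """Break the group of an updated record and queue its members as New."""
    group = set(equivalence.get(number, set())) | {number}
    for member in sorted(group):
        # Each member falls back to being equivalent only to itself.
        equivalence[member] = {member}
        if member not in new_queue:
            new_queue.append(member)

# Hypothetical state: records 1 and 2 form a group, record 3 stands alone.
equivalence = {1: {1, 2}, 2: {1, 2}, 3: {3}}
new_queue = []
unlink_for_update(1, equivalence, new_queue)
```

Only the equivalence entries change; the bibliographic data of records 1 and 2 is never modified by this step.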

12 Union Catalog Architecture 12 Virtual Approach Advantages A separate job runs on all equivalency tables marked as “New.” The job ensures that records in a group are evaluated for their real equivalency. It takes no longer to load external resources here than it does to load and index in ALEPH.

13 Union Catalog Architecture 13 Virtual Approach Advantages The worst-case effect of an update, insert, or delete is that, between the time a record is updated and the time its equivalency entries are (re)created, the group of equivalent records appears as non-equivalent. There is 100% uptime.

14 Union Catalog Architecture 14 Virtual Approach Advantages The same uptime considerations apply if the match algorithm is to be changed. Changing the merge algorithm has absolutely no effect, since it is executed “just in time.”

15 Union Catalog Architecture 15 Equivalency Table Creation An equivalency table is created for each record in the database, and initially points to the record itself. Pool selection: The equivalency search is limited to a certain number of candidates. This is usually done on a direct index, such as ISBN, ISSN, or LCCN, and is therefore relatively fast. If the number of candidates exceeds a certain limit, the record itself will be considered as the only candidate.
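Pool selection might look roughly like the sketch below. It assumes records are plain dictionaries with an `isbn` list and that the direct index is an in-memory mapping; the names `build_isbn_index`, `select_pool`, and `MAX_CANDIDATES` are hypothetical, and the limit value is invented for illustration.

```python
from collections import defaultdict

MAX_CANDIDATES = 3  # hypothetical limit; the real value is site-defined

def build_isbn_index(records):
    """Direct index: ISBN -> set of record numbers carrying that ISBN."""
    index = defaultdict(set)
    for number, record in records.items():
        for isbn in record.get("isbn", []):
            index[isbn].add(number)
    return index

def select_pool(number, records, index, limit=MAX_CANDIDATES):
    """Return the candidate pool for one record, always including itself."""
    pool = {number}
    for isbn in records[number].get("isbn", []):
        pool |= index[isbn]
    if len(pool) > limit:
        # Too many candidates: fall back to the record alone.
        return {number}
    return pool

records = {
    1: {"isbn": ["0306406152"]},
    2: {"isbn": ["0306406152"]},
    3: {"isbn": ["0131103628"]},
}
isbn_index = build_isbn_index(records)
pool_for_1 = select_pool(1, records, isbn_index)
```

Using an exact-key lookup keeps the pool small, so the more expensive field-by-field match only runs on a handful of candidates.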

16 Union Catalog Architecture 16 Equivalency Table Creation Final match: The equivalent records from the pool are found. Matching and conflicting fields are searched. Matching adds a positive weight, while conflicts add a negative weight. The total weight is checked against a threshold.

17 Union Catalog Architecture 17 Equivalency Table Creation When both stages are complete, each record has a Z120 record, holding the numbers of all equivalent records. Z120 is never empty. It holds the record’s own number if no equivalencies are found. Both the pool selection program and the match program are table-defined, not hard-coded.

18 Union Catalog Architecture 18 Merge When a user wants to view a record, a merge is done on all the records in its equivalency table, combining them into a single display. No merged record actually exists in the database. This is a virtual display created on request.

19 Union Catalog Architecture 19 Merge A merged record display is built by taking the “basic” fields from the preferred record and adding other fields from each of its equivalent records. The preferred record is selected by assigning weights to all the equivalent records based on table-defined criteria, and the top weight wins. The merge program is also table-defined.
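A minimal sketch of an on-the-fly merge, assuming records are dictionaries mapping a field tag to a list of values. Which tags count as “basic” is table-defined in the real system; the `BASIC_TAGS` set and the function name below are hypothetical.

```python
BASIC_TAGS = {"100", "245"}  # hypothetical choice of "basic" fields

def merge_for_display(preferred, others, basic_tags=BASIC_TAGS):
    """Build a display record; nothing is written back to the database."""
    # Start from a copy of the preferred record so the original stays intact.
    merged = {tag: list(values) for tag, values in preferred.items()}
    for record in others:
        for tag, values in record.items():
            if tag in basic_tags:
                # Basic fields come only from the preferred record.
                continue
            target = merged.setdefault(tag, [])
            for value in values:
                if value not in target:
                    target.append(value)
    return merged

preferred = {"245": ["Title A"], "650": ["Libraries"]}
other = {"245": ["Title a."], "650": ["Libraries", "Catalogs"], "852": ["Branch 2"]}
display = merge_for_display(preferred, [other])
```

Since the merged record exists only for the duration of the request, a different `basic_tags` set or merge routine can be used per user without reloading anything.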

20 Union Catalog Architecture 20 Implementation The union_global_param table defines the programs (algorithms) used for the different Union Catalog tasks.

!   1 2          3                    4
!!!!!-!-!!!!!!!!!!!!!!!!!!!!-!!!!!!!!!!!!!!!!!!!!
USM90 B candidate_prog       union_candidate_cdl
USM90 B match_prog           union_match_cdl
USM90 B preferred_prog       union_preferred_cdl
USM90 B merge_prog           union_merge_aleph
USM90 B normalize_prog       union_normalize_cdl

21 Union Catalog Architecture 21 Preferred Table – An Example

!!!!!-!!!!!!-!!!!!!!!!!-!!!!!!!!!!!!!!!!!!!!!!!!!-!!!
LDR   F05-01 EQUAL      d                         -10
LDR   F17-01 NOT-EQUAL  1,2,3,4,5,7,8,u,z         001
100##        PRESENT                              001
110##        PRESENT                              001
111##        PRESENT                              001
130##        PRESENT                              001

The table defines a value for each field. All values are added according to the specifications in the middle columns. The record with the highest value is selected as the preferred record.
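The preferred-record selection can be sketched from the example rows. The rule tuples below transcribe the table; everything else is an assumption made for illustration: that `F05-01` means one character at zero-based leader offset 5 (as in MARC leader numbering), that records are dictionaries keyed by tag, and the names `RULES`, `make_leader`, `preference_weight`, and `choose_preferred`.

```python
# Rules transcribed from the example table: (tag, offset, operator, values, weight).
RULES = [
    ("LDR", 5, "EQUAL", {"d"}, -10),
    ("LDR", 17, "NOT-EQUAL", set("1,2,3,4,5,7,8,u,z".split(",")), 1),
    ("100", None, "PRESENT", None, 1),
    ("110", None, "PRESENT", None, 1),
    ("111", None, "PRESENT", None, 1),
    ("130", None, "PRESENT", None, 1),
]

def make_leader(status, level):
    """Hypothetical helper: a 24-character leader with offsets 5 and 17 set."""
    return "00000" + status + "am a2200000" + level + "a 4500"

def preference_weight(record, rules=RULES):
    """Add up the weights of every rule the record satisfies."""
    total = 0
    for tag, offset, operator, values, weight in rules:
        if operator == "PRESENT":
            hit = tag in record
        else:
            char = record.get(tag, "")[offset:offset + 1]
            hit = (char in values) if operator == "EQUAL" else (char not in values)
        if hit:
            total += weight
    return total

def choose_preferred(records):
    """Return the record number with the highest total weight."""
    return max(records, key=lambda number: preference_weight(records[number]))

records = {
    1: {"LDR": make_leader("n", " "), "100": "Author, A."},
    2: {"LDR": make_leader("d", " "), "100": "Author, A."},
    3: {"LDR": make_leader("n", "7")},
}
preferred_number = choose_preferred(records)
```

Under these assumptions a deleted record (status `d`) is pushed far down by the -10 weight, while a full-level record with a main entry collects the small positive weights.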

22 Union Catalog Architecture 22 Match Table – An Example

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!-!-!!!>
date exact match                               + 200
date within 2                                  - 025
date mismatch                                  - 250
short title match                              + 450
full title match                               + 600
full title occur within                        + 350
full title mismatch                            - 600
full title keywords                            + 450
full title keywords order                      + 050
260b exact match                               + 100
260b occur within                              + 100
260b mismatch                                  - 025

The cumulative sum is compared against a defined threshold.

23 Union Catalog Architecture 23 Match Table – An Example Different fields are compared to determine whether two records match. For each field, if a match is found, the plus value is added to the total match weight. Otherwise, the minus value is applied to the total match weight. The threshold in the first line defines the weight above which two records are considered a match.
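The weighting scheme can be sketched with a subset of the example weights (date, full title, and 260b). The weights are copied from the example table as given; the threshold value, the text normalization, the dictionary record layout, and all function names are hypothetical, since the source does not show them.

```python
THRESHOLD = 700  # hypothetical; the real threshold is set in the match table

def normalize(text):
    """Hypothetical normalization: lower-case and collapse whitespace."""
    return " ".join(text.lower().split())

def date_weight(a, b):
    if a == b:
        return 200   # date exact match
    if abs(a - b) <= 2:
        return -25   # date within 2
    return -250      # date mismatch

def title_weight(a, b):
    # full title match / full title mismatch
    return 600 if normalize(a) == normalize(b) else -600

def publisher_weight(a, b):
    # 260b exact match / 260b mismatch
    return 100 if normalize(a) == normalize(b) else -25

def match_weight(rec_a, rec_b):
    """Sum the positive and negative weights over the compared fields."""
    return (date_weight(rec_a["date"], rec_b["date"])
            + title_weight(rec_a["title"], rec_b["title"])
            + publisher_weight(rec_a["publisher"], rec_b["publisher"]))

def is_match(rec_a, rec_b, threshold=THRESHOLD):
    return match_weight(rec_a, rec_b) > threshold

a = {"date": 2005, "title": "Union Catalog Architecture", "publisher": "Ex Libris"}
b = {"date": 2005, "title": "union catalog  architecture", "publisher": "Ex Libris"}
c = {"date": 2004, "title": "Union Catalog Architecture", "publisher": "Other"}
```

With these inputs, `a` and `b` score 200 + 600 + 100 = 900 and pass the assumed threshold, while `a` and `c` score -25 + 600 - 25 = 550 and do not.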

24 Union Catalog Architecture 24 Workflow Illustration [Diagram labels: Contributors; Resources; queue of new/updated records; Single BIB record; BIB's pool of candidates; BIB's pool of matched records (= equiv table).]

25 Union Catalog Architecture 25 Two Types of Union Catalogs “Union Catalog”: on top of a Bibliographic + Holdings database. “Union View”: on top of an ALEPH 500 administrative database.

26 Union Catalog Architecture 26 Bibliographic and Holdings Database [Diagram labels: SOURCE 1, SOURCE 2, SOURCE 3; Normalize records; UNION CATALOG; JUMP.]

27 Union Catalog Architecture 27 Bibliographic and Holdings Database When records are loaded from various resources, fixes are done to normalize their structure and data. Checks could be performed prior to the load so that incompatible records are rejected.

28 Union Catalog Architecture 28 Bibliographic and Holdings Database [Screenshot labels: Jump to original; View in union holdings.]

29 Union Catalog Architecture 29 ALEPH 500 Database [Diagram labels: ADM 1, ADM 2, ADM 3; BIB 1, BIB 2, BIB 3; Librarian View; Union Catalog - User View.]

30 Union Catalog Architecture 30 ALEPH 500 Database Records are managed in standard ALEPH 500 in a single BIB and ADM library, but separately per sub-library or administrative unit. The Staff User view does not change from an administrative GUI perspective. A user (patron) has a unified view on the PAC.

31 Union Catalog Architecture 31 ALEPH 500 Database

32 Union Catalog Architecture 32 ALEPH 500 Database

33 Union Catalog Architecture 33 ALEPH 500 Database

