flexElink
Winter presentation, 26 February 2002
Flexible linking (and formatting) management software
Hector Sanchez
Universitat Jaume I, Computer Engineering
CERN ETT-DH
Hector Sanchez 26 February CERN
Contents
- Introduction
- Project overview: definition, scenarios, architecture, technology
- Main features
- Benefits & results
Introduction
- Links in the scope of flexElink
  - A link is a reference to the fulltext version of, or an Internet resource related to, a given bibliographic record (not necessarily a URL)
- Stored vs. generated links
  - Generated links considerably reduce maintenance: the system knows when to create a link and builds it from the bibliographic data
- Link managers: SetLink, GoDirect, Dynamic Format
Project goals
- New link management tool
- Improvement of the formatting tool
- Integration of the existing link manager (LM) technologies already used at CDS
- Ability to adapt to new situations and needs
  - Independent of the formatter
  - Works over different types of input
  - Covers all the formatting functions needed
- Reduced maintenance
  - Avoid 'hard-coded' maintenance
  - Easy to use for CDS clients
Scenario 1: Brief formats
- Input: a batch of records in OAI MARC XML
- Output: each original XML record together with its HTML version
[Diagram: Bibliographic DB (ALEPH), 'CERN MARC', OAI MARC XML, flexElink, OAI MARC XML* and HTML, SQL, Consultation DB (MySQL)]
Scenario 2: Detailed formats
- Input: a record in OAI MARC XML
- Output: an HTML version to be displayed, or PHP to be saved to a file
[Diagram: CDS search, flexElink, OAI MARC XML, HTML page with links to fulltext & references, PHP file, SetLink output, pre-generated references inclusion, Consultation DB (MySQL)]
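Architecture overview
- Components: Record Separator, Variable Extractor, Behaviour Processor, Link Manager, Web configuration interface
- Repositories: extraction rules, link repository, behaviour repository
[Diagram: input records enter the Record Separator, which passes each individual record to the Variable Extractor (driven by the extraction rules). The resulting internal variables feed the Behaviour Processor (behaviour repository), which asks the Link Manager (link repository) to solve links and produces the text output. Admins maintain the rules and repositories through the web configuration interface.]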
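The data flow in the architecture diagram can be sketched as a chain of pluggable stages. This is a minimal illustration, not the flexElink implementation. The class `Pipeline`, the toy stages and the variable shape are hypothetical. Each stage is passed in as a callable, which mirrors the claim that any component can be substituted without affecting the rest. Following the Link Manager slide, the link solver receives only a link name and parameter values, not the internal variables.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical shape of the internal variables: variable name -> list of values,
# each value being a dict of named fields (e.g. {"name": "Racah, Giulio"}).
Variables = Dict[str, List[Dict[str, str]]]
# The Link Manager only receives a link name plus parameter values.
LinkSolver = Callable[[str, List[str]], List[str]]


class Pipeline:
    """Wires the four stages together. Each stage is a plain callable,
    so any one of them can be replaced without touching the others."""

    def __init__(
        self,
        separate: Callable[[str], Iterable[str]],
        extract: Callable[[str], Variables],
        process: Callable[[Variables, LinkSolver], str],
        solve_links: LinkSolver,
    ) -> None:
        self.separate = separate
        self.extract = extract
        self.process = process
        self.solve_links = solve_links

    def run(self, raw_input: str) -> List[str]:
        outputs = []
        for record in self.separate(raw_input):  # Record Separator
            variables = self.extract(record)  # Variable Extractor
            # Behaviour Processor; it may call the Link Manager to solve links
            outputs.append(self.process(variables, self.solve_links))
        return outputs


# Toy stages, for illustration only
def split_lines(text: str) -> List[str]:
    return [line for line in text.splitlines() if line.strip()]


def extract_title(record: str) -> Variables:
    return {"title": [{"value": record}]}


def no_links(name: str, params: List[str]) -> List[str]:
    return []


def render(variables: Variables, solve_links: LinkSolver) -> str:
    title = variables["title"][0]["value"]
    links = solve_links("FULLTEXT", [title])
    return f"{title} ({len(links)} links)"


pipeline = Pipeline(split_lines, extract_title, render, no_links)
print(pipeline.run("Record A\nRecord B\n"))
# ['Record A (0 links)', 'Record B (0 links)']
```

To switch to a database input, for example, only `separate` and `extract` would need to change.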
Technology
- Object-oriented analysis and design
- Implementation tools: 100% open source & freeware
- Component based: delegation and collaboration lead to more decoupled and reusable software
  - Almost any part of the system can be substituted, modified or extended without affecting the rest
Main features: Internal variables
- Maps the values in the input OAI MARC XML records to internal variables
- The mapping is configured through the Extraction Rules
  - They tell the extraction module which values to extract from the input and to which variables to map them
- Makes the rest of the configuration independent of the input
- Developed for OAI MARC XML, but can be adapted to other input types (e.g. a DB) by specialising the extraction module
Main features: Internal variables
OAI MARC XML extraction rules example
[Figure: an input record whose author fields contain "Racah, Giulio" and "Guignard, G" (the latter with the role "editor"), and an extraction rule that defines the variable author with the fields name and editor]
Resulting internal variable: author
  Value #0: name = Racah, Giulio
  Value #1: name = Guignard, G; editor = editor
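The extraction step can be sketched in a few lines. This is a hedged illustration, not flexElink's rule format. It assumes an un-namespaced OAI MARC fragment using `varfield id` / `subfield label` elements. The rule dictionary `RULES` is hypothetical, and so are the MARC tags and subfield labels it uses (100/700, `a` for the name, `e` for the role). The output is built to reproduce the author variable shown above.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical extraction rule: variable "author" gets one value per matching
# varfield; each value maps a field name to a MARC subfield label.
# Tags 100/700 and labels a/e are illustrative assumptions.
RULES = {
    "author": {
        "tags": ["100", "700"],
        "fields": {"name": "a", "editor": "e"},
    }
}

SAMPLE = """<oai_marc>
  <varfield id="100" i1=" " i2=" ">
    <subfield label="a">Racah, Giulio</subfield>
  </varfield>
  <varfield id="700" i1=" " i2=" ">
    <subfield label="a">Guignard, G</subfield>
    <subfield label="e">editor</subfield>
  </varfield>
</oai_marc>"""


def extract(xml_text: str, rules: dict) -> Dict[str, List[Dict[str, str]]]:
    root = ET.fromstring(xml_text)
    variables: Dict[str, List[Dict[str, str]]] = {}
    for var_name, rule in rules.items():
        values = []
        for varfield in root.iter("varfield"):  # document order
            if varfield.get("id") not in rule["tags"]:
                continue
            # Subfield label -> text for this occurrence of the field
            subfields = {sf.get("label"): (sf.text or "") for sf in varfield.iter("subfield")}
            value = {f: subfields[lbl] for f, lbl in rule["fields"].items() if lbl in subfields}
            if value:
                values.append(value)
        variables[var_name] = values
    return variables


print(extract(SAMPLE, RULES))
# {'author': [{'name': 'Racah, Giulio'}, {'name': 'Guignard, G', 'editor': 'editor'}]}
```

Everything downstream works only with the `variables` dictionary. Supporting another input type would mean replacing `extract`, which matches the point about specialising the extraction module.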
Main features: Behaviours
- Behaviour: describes how the input has to be processed to achieve the desired output
- Support for multiple behaviours
- Structure of a behaviour:
  Condition 1: actions
  Condition 2: actions
- Condition: an expression; its associated actions are applied only if it is TRUE for the current input record
- Action: a set of statements describing how the output is built (e.g. formats) when the corresponding condition is met
- Conditions and actions are expressed in the Evaluation Language
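The condition/action structure can be modelled as an ordered list of pairs. This is a minimal sketch, and several details are assumptions. The slides do not say whether evaluation stops at the first condition that holds, so that choice is exposed as a hypothetical `first_match` flag. Python lambdas stand in for Evaluation Language expressions. `Behaviour`, the `BRIEF` example and the variable names are illustrative only.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical simplified variables: name -> list of string values
Variables = Dict[str, List[str]]
Condition = Callable[[Variables], bool]
Action = Callable[[Variables], str]


class Behaviour:
    """A named, ordered list of (condition, actions) pairs."""

    def __init__(self, name: str, rules: List[Tuple[Condition, List[Action]]],
                 first_match: bool = True) -> None:
        self.name = name
        self.rules = rules
        # Assumption: by default, stop at the first condition that holds.
        self.first_match = first_match

    def apply(self, variables: Variables) -> str:
        output: List[str] = []
        for condition, actions in self.rules:
            if condition(variables):
                output.extend(action(variables) for action in actions)
                if self.first_match:
                    break
        return "".join(output)


brief = Behaviour("BRIEF", [
    (lambda v: v.get("base") == ["27"], [lambda v: "Standard: ", lambda v: v["title"][0]]),
    (lambda v: True, [lambda v: v["title"][0]]),  # catch-all condition
])

print(brief.apply({"base": ["27"], "title": ["Example standard"]}))  # Standard: Example standard
print(brief.apply({"base": ["1"], "title": ["Example paper"]}))  # Example paper
```

With `first_match=False`, every condition that holds would contribute its actions to the output.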
Main features: Evaluation Language
- Specially designed for flexElink
- Context-free grammar
- Extensible via User Defined Functions (UDFs): operations defined in PHP
- Simple knowledge base management
- Allows interaction with the Link Manager
- Reusability of expressions through Formats
- Gives access to the internal variables
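Extensibility through UDFs boils down to a name-to-function table that the interpreter consults when it meets a call it does not know natively. In flexElink the UDFs are written in PHP. The sketch below shows the same idea in Python. `UDFRegistry` and the `upper` function are hypothetical and are not part of flexElink.

```python
from typing import Callable, Dict


class UDFRegistry:
    """Maps UDF names used in expressions to host-language functions."""

    def __init__(self) -> None:
        self._functions: Dict[str, Callable[..., str]] = {}

    def register(self, name: str) -> Callable[[Callable[..., str]], Callable[..., str]]:
        def decorator(func: Callable[..., str]) -> Callable[..., str]:
            self._functions[name] = func
            return func
        return decorator

    def call(self, name: str, *args: str) -> str:
        # The interpreter would route any call it does not know natively here.
        if name not in self._functions:
            raise NameError(f"unknown UDF: {name}")
        return self._functions[name](*args)


udfs = UDFRegistry()


@udfs.register("upper")  # hypothetical UDF, for illustration only
def upper(text: str) -> str:
    return text.upper()


print(udfs.call("upper", "racah, giulio"))  # RACAH, GIULIO
```

Adding a new operation requires no change to the grammar, only a new registration.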
Main features: Behaviours
Simple behaviour example
Behaviour: SIMPLE
  Condition: $909C0.b="27"
  Actions: " " $245.a " " forall($0248.a){ rep_prefix(" – ") $0248.a separator("; ") }
  Condition: ""=""
  Actions: " " $245.a " " forall($100.a){ rep_prefix("– Authors: ") $100.a separator("; ") }
UDFs: rep_prefix, separator
Internal variables: 100.a = author name; 245.a = title; 0248.a = standard reference; 909C0.b = base number
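The SIMPLE behaviour can be approximated in Python to show what each branch produces. The semantics are assumptions, because the slide does not define them. `forall` is assumed to emit its `rep_prefix` text once, and only if the variable has at least one value, then the values joined by `separator`. `""=""` is treated as an always-true fallback that is reached only when the first condition fails. The function names and sample data are hypothetical. The HTML markup between the quoted strings, lost in the original, is omitted.

```python
from typing import Dict, List


def forall(values: List[str], prefix: str = "", separator: str = "") -> str:
    """Assumed semantics: emit `prefix` once if there is at least one value,
    then the values joined by `separator`; no values yields ''."""
    if not values:
        return ""
    return prefix + separator.join(values)


def simple_behaviour(v: Dict[str, List[str]]) -> str:
    title = " ".join(v.get("245.a", []))
    # Condition 1: $909C0.b="27" -> title plus standard references
    if v.get("909C0.b") == ["27"]:
        return title + forall(v.get("0248.a", []), " – ", "; ")
    # Condition 2: ""="" is always true -> title plus authors
    return title + forall(v.get("100.a", []), " – Authors: ", "; ")


print(simple_behaviour({"909C0.b": ["27"], "245.a": ["Example title"],
                        "0248.a": ["REF-1", "REF-2"]}))
# Example title – REF-1; REF-2
print(simple_behaviour({"909C0.b": ["1"], "245.a": ["Example title"],
                        "100.a": ["Racah, Giulio", "Guignard, G"]}))
# Example title – Authors: Racah, Giulio; Guignard, G
```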
Main features: Link Manager
- Generates links from stored rules
  - These rules are also expressed in the Evaluation Language
- Supports different types of link solving
  - External linking: just generates the link from the rules
  - Internal linking: the link is always a file; the LM checks its existence, access rights, available formats, etc.
- Extensible: the LM is a framework to which new linking logic can be added
- Independent of the formatter: it has no access to the internal variables and receives its data as parameters
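The framework idea can be sketched as a table of stored link rules plus pluggable solvers, one per kind of linking. This is a hypothetical design, not flexElink's API. `LinkManager`, `external_solver`, `internal_solver`, the template syntax and the example URL are all assumptions. The external solver only fills in a template. The internal solver keeps only the formats whose file exists and is readable. As on the slide, callers pass plain parameters, not internal variables.

```python
import os
import tempfile
from typing import Callable, Dict, List

# A solver receives the stored rule and the parameters passed by the caller.
Solver = Callable[[dict, List[str]], List[str]]


class LinkManager:
    """Hypothetical framework: stored link rules plus pluggable solvers."""

    def __init__(self) -> None:
        self._solvers: Dict[str, Solver] = {}
        self._rules: Dict[str, dict] = {}

    def register_solver(self, kind: str, solver: Solver) -> None:
        self._solvers[kind] = solver  # new linking logic plugs in here

    def define(self, name: str, kind: str, **rule: object) -> None:
        self._rules[name] = {"kind": kind, **rule}

    def solve(self, name: str, *params: str) -> List[str]:
        # Only parameters are passed in: no access to internal variables.
        rule = self._rules[name]
        return self._solvers[rule["kind"]](rule, list(params))


def external_solver(rule: dict, params: List[str]) -> List[str]:
    # External linking: just build the link from the rule's template.
    return [rule["template"].format(*params)]


def internal_solver(rule: dict, params: List[str]) -> List[str]:
    # Internal linking: the link is a file, so keep only the formats
    # whose file exists and is readable.
    found = []
    for fmt in rule["formats"]:
        path = rule["template"].format(*params, fmt=fmt)
        if os.path.isfile(path) and os.access(path, os.R_OK):
            found.append(path)
    return found


lm = LinkManager()
lm.register_solver("external", external_solver)
lm.register_solver("internal", internal_solver)
lm.define("RECORD", "external", template="http://example.org/record?base={0}&id={1}")
print(lm.solve("RECORD", "27", "123"))  # ['http://example.org/record?base=27&id=123']

with tempfile.TemporaryDirectory() as tmp:
    # Only the PDF exists, so only the PDF link is returned.
    open(os.path.join(tmp, "27-123.pdf"), "w").close()
    lm.define("FULLTEXT", "internal",
              template=os.path.join(tmp, "{0}-{1}.{fmt}"), formats=["pdf", "ps"])
    fulltext = lm.solve("FULLTEXT", "27", "123")
    print(fulltext)  # a single path ending in 27-123.pdf
```

Supporting a new kind of linking would mean writing one more solver function and registering it. The rest of the manager stays unchanged.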
Main features: Link Manager
Example: simple link definition and access from the Evaluation Language
Generation of records with already solved fulltext links
[Figure: FULLTEXT link definition]
Link manager call:
  " " $245.a " "
  link("FULLTEXT", $base, $categ, $id){
    " Fulltext access: "
    forall($link){ " [" $link.format_id "] " }
  }
  else{ "No link found" }
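On the calling side, the `link(...){ ... } else{ ... }` construct amounts to "ask the Link Manager, render each solved link, or fall back when none is found". The sketch below assumes that the Link Manager returns one dictionary per available format, carrying a `format_id` key. `render_fulltext`, `fake_link` and the category value are hypothetical, and the stub stands in for a real Link Manager.

```python
from typing import Callable, Dict, List

# Hypothetical: the Link Manager returns one dict per available format.
LinkCall = Callable[..., List[Dict[str, str]]]


def render_fulltext(link: LinkCall, base: str, categ: str, rec_id: str) -> str:
    links = link("FULLTEXT", base, categ, rec_id)
    if not links:  # the else{ "No link found" } branch
        return "No link found"
    # forall($link){ " [" $link.format_id "] " }
    return "Fulltext access:" + "".join(f" [{item['format_id']}]" for item in links)


def fake_link(name: str, base: str, categ: str, rec_id: str) -> List[Dict[str, str]]:
    # Stub standing in for the Link Manager: record "1" has two formats.
    return [{"format_id": "PDF"}, {"format_id": "PS"}] if rec_id == "1" else []


print(render_fulltext(fake_link, "27", "ARTICLE", "1"))  # Fulltext access: [PDF] [PS]
print(render_fulltext(fake_link, "27", "ARTICLE", "2"))  # No link found
```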
Benefits
- More modular and specialised CDS Search
- The OO approach eases maintenance and allows future extensibility
- Only one way of configuring formats and links
  - All the configuration is kept in a DB, separate from the logic
  - Different configuration views can be generated
- The Search Engine knows nothing about linking or formatting
[Diagram: users send queries to the Search Engine and receive results; the Search Engine obtains formats and links from flexElink, which is driven by the format/link configuration]
Results
- Already in successful use for:
  - Pre-generated CDS Search BRIEF formats
  - On-the-fly creation of CDS Search DETAILED formats
  - HTML pages of the references extracted from fulltexts
- Speed optimisation (test over 15,000 records)
  - BRIEF format creation (average): 0.05 s/record
  - DETAILED format creation (average): 0.15 s/record
- Testing for the future replacement of GoDirect and SetLink
  - GoDirect: 91% of journals migrated 'automatically'
  - SetLink: ready for defining new fulltext rules
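To put the per-record averages in perspective, they can be scaled to the whole test set. This assumes the averages apply uniformly across all 15,000 records. The resulting totals are derived estimates, not separately reported measurements.

```latex
\begin{aligned}
T_{\text{brief}}    &\approx 15\,000 \times 0.05\ \text{s} = 750\ \text{s} = 12.5\ \text{min}\\
T_{\text{detailed}} &\approx 15\,000 \times 0.15\ \text{s} = 2\,250\ \text{s} = 37.5\ \text{min}
\end{aligned}
```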