© Keith G Jeffery, Anne G S Asserson GL 10 Amsterdam 2008 200812 1 Keith G Jeffery Director, IT & International Strategy, STFC


1 INTEREST: INTERoperation for Exploitation, Science and Technology
Keith G Jeffery, Director, IT & International Strategy, STFC (keith.jeffery@stfc.ac.uk)
Anne G S Asserson, Research Department, University of Bergen (anne.asserson@fa.uib.no)
© Keith G Jeffery, Anne G S Asserson, GL 10 Amsterdam 2008 200812

2 Authors
Anne Asserson, UiB
Keith G Jeffery, STFC-RAL

3 Structure
Background
The Hypothesis
  – Remote Wrapper
  – Local Wrapper
  – Catalog
  – Catalog Plus Pull (ERGO2++)
  – Full CERIF
  – Harvesting
Conclusion

4 Background: GL
Grey literature is important, but it is only a small component of the total research information environment and must be seen in the context of the overall research process
Grey literature is a product
To understand the product, we need information on the sources and the process, i.e. the research context
Do not try to obtain information through a fog, working backwards from GL metadata
Get it moving forwards through the research process; much GL metadata is then derived directly and consistently

5 Background: Access
Interoperation: homogeneous access to distributed heterogeneous information
  – Query against schema (of user)
  – Translation to other schemas (of sources)
  – Answer reconciled to original schema (of user)
  – With a common interoperation format: n interfaces
  – Without one: n(n-1) interfaces
Utilise one common interoperation format [character set, language, syntax, semantics]
The alternative is Google-like, where the end-user has to do the translations and reconciliations
This does not scale
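The interface counts on the Access slide can be checked with a few lines of Python. This is only an illustrative sketch: the function names are made up here, and "interface" is assumed to mean one converter per source when a common format exists, or one directed converter per ordered pair of sources when it does not.

```python
def interfaces_with_common_format(n: int) -> int:
    """Each of the n sources needs a single converter to and from the common format."""
    return n


def interfaces_pairwise(n: int) -> int:
    """Without a common format, each source needs a converter to each of the other n - 1."""
    return n * (n - 1)


# Compare how the two counts grow with the number of sources.
for n in (3, 10, 50):
    print(n, interfaces_with_common_format(n), interfaces_pairwise(n))
```

The pairwise count grows quadratically, which is the scaling argument for a single common interoperation format.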

6 Background: Metadata
Grey literature repositories can be interoperated without CERIF-CRIS using OAI-PMH and DC (OAIster)
Grey literature repositories provide better recall and relevance when interlinked via CERIF-CRIS
  – research context; formal syntax, declared semantics
Metadata
  – Schema, Navigational, Associative {descriptive, restrictive, supportive}
The key to everything is quality metadata
  – input validation, query/retrieval, relationship linking, INTEROPERATION
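As a rough illustration of the OAI-PMH and DC route mentioned on the Metadata slide, the sketch below builds a harvesting request URL. The `ListRecords` verb and the `oai_dc` metadata prefix are part of the OAI-PMH protocol; the base URL and the function name are hypothetical, and no request is actually sent.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL (Dublin Core by default)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec  # optional selective harvesting by set
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint, used only for illustration.
url = list_records_url("https://repository.example.org/oai")
print(url)
```

A request like this returns flat Dublin Core records only; the research context (projects, persons, organisational units) that CERIF-CRIS links in is not part of it.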

7 Background: CERIF, EU Recommendation to Member States
[Diagram of CERIF entities: PROJECT; ORGUNIT; PERSON; Skills; CV; General Facility Particular Equipment; Contact; Results Publication; Results Patent; Results Product; Service; Funding Programme; Event; Classification; Prize/Award]

8 Result Publication Instance Diagram
[Diagram: Person A, Publication X, Project P and OrgUnits O, M and N, linked by the roles member, employee, part of, owns IPR, author and project leader]
Metadata in CERIF-CRIS is much richer than in the usual repository

9 CERIF-CRIS + Repositories at 1 institution
[Diagram components: End-User; CRIS (research context: projects, persons, organisational units, funding, products, patents, publications, facilities, equipment, events); OA Repository (hypermedia documents); e-Research repository (datasets and software). Link labels: CERIF, OAI-PMH, various protocols]

10 …and multiple institutions
[Diagram: an End-User and Institutions A, B and C, each with a CRIS, an OA repository and an e-Research repository]

11 Hypothesis
Comparison of possible architectures for interoperation of grey repositories
  – (of publications or data and software)
leads inexorably to the conclusion that CERIF should be used either:
  – as the native storage format,
  – as the storage format of a derived data warehouse (transformed copy of the CRIS), or
  – as the export format converted from the CRIS native format using a wrapper.

12 Remote Wrapper
[Diagram labels: user, query form, presentation form, presentation converter, integration, query converter, answer converter, query schema, schemas, dispatcher, receiver, addresses, network, LAN]

13 Remote Wrapper
the user needs only a web browser and a simple query form
the host has to write a query converter
the host has to write an answer (XML?) converter (to a specific XML DTD?)
the query expressivity is very limited
the user client has to write an integrator for the answers
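The division of labour in the remote wrapper can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class and function names, the field mappings and the sample records are all hypothetical, queries are reduced to exact field matches, and the network is replaced by direct calls.

```python
from typing import Dict, List

Record = Dict[str, str]


class HostWrapper:
    """Hypothetical host-side wrapper: converts queries in, answers out."""

    def __init__(self, field_map: Dict[str, str], rows: List[Record]):
        self.field_map = field_map  # shared query-form field -> local field
        self.rows = rows            # stand-in for the host's native database

    def convert_query(self, query: Record) -> Record:
        """Query converter: rename shared fields to local ones."""
        return {self.field_map[k]: v for k, v in query.items()}

    def convert_answer(self, row: Record) -> Record:
        """Answer converter: rename local fields back to shared ones."""
        reverse = {local: shared for shared, local in self.field_map.items()}
        return {reverse[k]: v for k, v in row.items() if k in reverse}

    def ask(self, query: Record) -> List[Record]:
        local = self.convert_query(query)
        hits = [r for r in self.rows
                if all(r.get(k) == v for k, v in local.items())]
        return [self.convert_answer(r) for r in hits]


def integrate(answers: List[List[Record]]) -> List[Record]:
    """Client-side integrator: flatten per-host answers, drop exact duplicates."""
    seen, merged = set(), []
    for host_answer in answers:
        for rec in host_answer:
            key = tuple(sorted(rec.items()))
            if key not in seen:
                seen.add(key)
                merged.append(rec)
    return merged


# Two hypothetical hosts holding the same record under different local schemas.
host_a = HostWrapper({"author": "creator", "title": "name"},
                     [{"creator": "Person A", "name": "Publication X"}])
host_b = HostWrapper({"author": "forfatter", "title": "tittel"},
                     [{"forfatter": "Person A", "tittel": "Publication X"}])

results = integrate([h.ask({"author": "Person A"}) for h in (host_a, host_b)])
print(results)
```

Even this toy version shows why expressivity is limited: the query form can only use fields that every host's converter knows how to map.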

14 Local Wrapper
[Diagram labels: user, query form, presentation form, query converter, presentation converter, integration, schemas, dispatcher, receiver, addresses, network, LAN]

15 Local Wrapper
each host has only to supply and update its schema to the client (all clients if there is not a central query server)
each host has no software to provide except receiver and dispatcher
the client (if it is a central service) has a very large workload
if there is no central service then each client has to have all schemas supplied and updated
the client software has to include a complex query refiner
the client software has to include multiple complex query converters
the client software has to include a complex answer integrator
the client software has to include a presentation converter (complexity depends on specification of presentation required and complexity of the answer structure)
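In the local wrapper the schema knowledge moves to the client. The sketch below shows one possible shape for that, assuming a simple client-side registry that hosts update and that the client uses to produce one converted query per host. The registry, function names and field mappings are hypothetical.

```python
from typing import Dict

# Hypothetical client-side registry: host name -> (shared field -> local field).
schema_registry: Dict[str, Dict[str, str]] = {}


def register_schema(host: str, field_map: Dict[str, str]) -> None:
    """Called whenever a host supplies or updates its schema."""
    schema_registry[host] = dict(field_map)


def convert_for_all_hosts(query: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Build one local-schema query per host; skip hosts lacking a queried field."""
    per_host = {}
    for host, field_map in schema_registry.items():
        if all(field in field_map for field in query):
            per_host[host] = {field_map[f]: v for f, v in query.items()}
    return per_host


register_schema("hostA", {"author": "creator", "title": "name"})
register_schema("hostB", {"title": "tittel"})  # this host exposes no author field

queries = convert_for_all_hosts({"author": "Person A"})
print(queries)
```

Every client (or the central service) has to keep such a registry current for every host, which is where the large client workload noted above comes from.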

16 Catalog
[Diagram, construction phase from each host: CRIS, converter, (standard) schema, loader, dispatcher, receiver, network, CERIF metadata catalog. Retrieve phase by user: user, LAN, query form, query, CERIF metadata catalog, hit list (user phase 1)]

17 Catalog
simple query on union catalog (which may be centralised or replicated)
possibly not all required entities and attributes in catalog
effort to populate catalog; requires converter at each host to supply CERIF metadata
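The two phases of the catalog architecture can be sketched as a loader plus a query over a union catalog. This is a simplified illustration: the entity labels and attribute names are plain placeholders rather than actual CERIF names, and the classes, converter and sample record are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CatalogEntry:
    host: str
    entity: str                  # simplified label such as "publication"
    attributes: Dict[str, str]


@dataclass
class UnionCatalog:
    entries: List[CatalogEntry] = field(default_factory=list)

    def load(self, host: str, native_records: List[dict],
             convert: Callable[[str, dict], CatalogEntry]) -> None:
        """Construction phase: each host converts its records and loads them."""
        for rec in native_records:
            self.entries.append(convert(host, rec))

    def query(self, entity: str, **constraints: str) -> List[CatalogEntry]:
        """Retrieve phase: one simple query against the single catalog schema."""
        return [e for e in self.entries
                if e.entity == entity
                and all(e.attributes.get(k) == v for k, v in constraints.items())]


def convert_host_a(host: str, rec: dict) -> CatalogEntry:
    """Hypothetical per-host converter from a native record to a catalog entry."""
    return CatalogEntry(host=host, entity="publication",
                        attributes={"title": rec["name"], "author": rec["creator"]})


catalog = UnionCatalog()
catalog.load("hostA", [{"name": "Publication X", "creator": "Person A"}], convert_host_a)

hits = catalog.query("publication", author="Person A")
missing = catalog.query("publication", funder="Programme F")  # attribute never loaded
print(len(hits), len(missing))
```

The second query illustrates the stated limitation: an attribute that no host converter supplied simply cannot be matched in the catalog.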

18 Catalog Plus Pull (ERGO2++)
[Diagram, user phase 1: query form, query, CERIF metadata catalog, hit list processing. User phase 2: unique id query, presentation form. Other labels: dispatcher, receiver, addresses, network, LAN]

19 Catalog Plus Pull (ERGO2++)
advantage of simplicity as for catalog-only architecture
advantage of additional information provision
disadvantage that additional information is heterogeneous (unless converted to CERIF export data model)
disadvantage of hosts having to maintain entries representing their database content in the CERIF metadata catalog
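A minimal sketch of the two user phases, assuming the catalog stores a unique id and a host name per entry and that each host answers a unique-id query in its own native shape. All identifiers, host names and record contents below are hypothetical placeholders.

```python
from typing import Callable, Dict, List

# Phase 1 data: a hypothetical metadata catalog keyed by unique id.
catalog: Dict[str, Dict[str, str]] = {
    "hostA:1": {"host": "hostA", "title": "Publication X"},
    "hostB:7": {"host": "hostB", "title": "Publication X, extended"},
}

# Phase 2 data: each host returns further detail in its own native structure.
host_fetchers: Dict[str, Callable[[str], dict]] = {
    "hostA": lambda uid: {"uid": uid, "creator": "Person A", "year": 2008},
    "hostB": lambda uid: {"id": uid, "forfatter": ["Person A"], "aar": "2008"},
}


def phase1_hit_list(term: str) -> List[str]:
    """Phase 1: query the catalog and return the unique ids of matching entries."""
    return [uid for uid, meta in catalog.items()
            if term.lower() in meta["title"].lower()]


def phase2_pull(uid: str) -> dict:
    """Phase 2: send a unique-id query to the host that owns the entry."""
    host = catalog[uid]["host"]
    return host_fetchers[host](uid)


hits = phase1_hit_list("publication x")
details = [phase2_pull(uid) for uid in hits]
print(details)
```

The two pulled records carry the same kind of information under different keys and types, which is the heterogeneity the slide lists as a disadvantage unless hosts convert to the CERIF export data model.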

20 Full CERIF
[Diagram labels: user, query form, presentation form, query, dispatcher, receiver, addresses, network, LAN]

21 Full CERIF
very simple and easy to use for the end-user
each host has to either run a full CERIF model database or provide a full CERIF model version of the host database

22 Harvesting (construction phase)
[Diagram labels: CRIS, non-CERIF CRISs, converter, HTML pages, network, crawling robot, catalog of documents with associative descriptive metadata]

23 Harvesting (search phase)
[Diagram, user phase 1: query form, query, harvester associative descriptive metadata catalog, hit list processing. User phase 2: URL query, HTML pages from CRIS, presentation form. Other labels: dispatcher, receiver, addresses, network, LAN]

24 Harvesting
The host has to provide a copy of the database as web pages, available to the search robot and to subsequent accesses based on clicks on the URL in the metadata.
The query is based on the existence of term(s); constraining by entity or attribute is not possible (without sophisticated XML form processing).
The results are unstructured and delivered one page at a time (click on the URL in the metadata catalog to see the page); this inhibits statistical processing or report generation.
It is easy to implement and maintain (although the database may be ~2 weeks out of date) and has a familiar interface for many WWW users.
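The term-existence querying described above can be sketched with a small inverted index over harvested page text. The URLs, page contents and function names are hypothetical, and a real harvester would crawl and parse HTML rather than take a dictionary of strings.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def build_index(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Construction phase: map each lower-cased term to the URLs containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for url, text in pages.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(url)
    return index


def search(index: Dict[str, Set[str]], *terms: str) -> List[str]:
    """Search phase: return URLs of pages in which every term occurs."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return sorted(set.intersection(*sets)) if sets else []


# Hypothetical harvested pages from a CRIS exposed as HTML.
pages = {
    "http://cris.example.org/p1": "Project P. Project leader: Person A",
    "http://cris.example.org/p2": "Publication X. Author: Person A",
}
index = build_index(pages)

by_term = search(index, "person")
narrowed = search(index, "person", "author")
print(by_term, narrowed)
```

A search for "person" returns both pages whether Person A appears as an author or as a project leader; the index has no notion of entity or attribute, so the only way to narrow the result is to add more terms.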

25 Conclusion
To interoperate grey repositories, link to a CRIS
Best: Full CERIF architecture
Else: wrap CRIS to interoperate using CERIF

