NEEO project EC Final review meeting Gateway and portal 23 March 2010 Benoit Pauwels Université Libre de Bruxelles, Belgium 1.


1 NEEO project
EC Final review meeting
Gateway and portal
23 March 2010
Benoit Pauwels, Université Libre de Bruxelles, Belgium

2 Plan
- Overview of technical infrastructure
- EO as a network of data providers – descriptive metadata
- EO as a network of data providers – usage statistics
- Added value services
  - Publication lists
  - Enriched metadata
  - Full-text searching
  - Multilinguality
- Collaboration with RePEc
- EO gateway and portal

3 [Diagram: technical infrastructure. Labels recovered from the figure: Meresco; Metadata Harvester; Objects HTTP Crawler; Metadata; Lucene; EO portal; Homemade - FOSS; Exporter engine; Homemade - FOSS; Logs; OAI-PMH; RSS/Atom; Other portals; SRU; RePEc; SRU; Enrichment service; OAI-PMH; DIDL / MODS; SWUP]

4 Descriptive metadata exchange format
Desired EO functionality | Technical decision
Facetted search & find experience | Normalized/normalizable metadata
APA formatted citations | Granular metadata
Publication list per EO author | Unambiguous identification of authors
Full text indexing/searching | Unambiguous links to full texts
Enrichment of metadata (JEL, datasets, citations) | Extensible metadata format

5 Descriptive metadata exchange format
- DIDL – XML container structure that can hold semantically distinct metadata
  - Descriptive, object files (by-ref), splash page, enriched metadata
  - Based on existing container structure defined by SurfShare
- MODS (3.2) – granular descriptive metadata
  - Based on existing metadata structure defined by SurfShare
- DAI – Unambiguous identification of authors
  - National or institution-unique persistent identifier
- Continuous aim of standardization at a level that surpasses the NEEO project
  - NEEO adaptations fed back to SurfShare

6 EO descriptive metadata model
DIDL[1]
  Item[1]
    Descriptor/Identifier (persistent identifier)
    Descriptor/modified
    Item[1..∞] (of type descriptiveMetadata)
      Descriptor/type (« descriptiveMetadata »)
      Descriptor/Identifier (persistent identifier)
      Descriptor/modified
      Component/Resource -- representation by value (XML)
    Item[0..∞] (of type objectFile)
      Descriptor/type (« objectFile »)
      Descriptor/Identifier (persistent identifier)
      Descriptor/modified
      Component/Resource -- representation by ref. (URL)
    Item[0..1] (of type humanStartPage)
      Descriptor/type (« humanStartPage »)
      Component/Resource -- representation by ref. (URL)
- Publication is described as a complex (compound) object with a persistent identifier
- Aggregation of 3 types of components
  - descriptiveMetadata (MODS)
  - objectFiles
  - humanStartPage
- Extensible: additional items can be stored within the complex object
- MODS contains DAI of EO author
- Semantic Web, Linked Data: OAI-ORE ready
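The cardinalities on this slide (at least one descriptiveMetadata item, any number of objectFiles, at most one humanStartPage) can be illustrated with a small in-memory model. This is only a sketch: the class and field names are hypothetical, it does not produce DIDL XML, and the placement of the identifier and modified descriptors follows one reading of the flattened slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DescriptiveMetadata:
    identifier: str   # persistent identifier of this metadata item
    modified: str     # last-modified timestamp, kept as a plain string here
    mods_xml: str     # MODS record, held by value


@dataclass
class ObjectFile:
    identifier: str
    modified: str
    url: str          # object file, held by reference


@dataclass
class Publication:
    """Hypothetical stand-in for the top-level DIDL Item."""
    identifier: str
    modified: str
    descriptive_metadata: List[DescriptiveMetadata] = field(default_factory=list)
    object_files: List[ObjectFile] = field(default_factory=list)
    human_start_page: Optional[str] = None  # 0..1 splash-page URL

    def validate(self) -> List[str]:
        """Return a list of cardinality problems; empty means valid."""
        problems = []
        if not self.identifier:
            problems.append("missing persistent identifier")
        if len(self.descriptive_metadata) < 1:
            problems.append("at least one descriptiveMetadata item is required")
        return problems
```

The 0..1 constraint on humanStartPage is carried by the `Optional` field, so only the 1..∞ constraint needs an explicit check.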

7 Descriptive metadata exchange format
- Central EO gateway
  - DIDL and MODS application profiles
  - Vocabularies in DIDL and MODS
  - Technical guidelines for project partners
  - All documentation is available in open access
- Partner solutions: home-made or with external support
  - ARNO: home-made
  - DSpace: home-made, AtMire
  - EPrints: home-made, ECS-University of Southampton
  - Fedora: METS/MODS -> DIDL/MODS
  - DigiTool: METS/MARC -> DIDL/MODS
- All original partners + 2 new partners

8 Decentralized registry service
- Aim: sustainable solution for big network with many partners
- Decentralized Admin file
  - Format: XML-RDF | FOAF + NEEO-specific vocabulary
  - Decentralized file sits on local web server of project partner
  - Content
    - information of institution: name, description, ...
    - OAI baseURL + OAI sets to harvest
    - EO authors: DAI, photograph, full name, affiliation
  - EO gateway fetches the file over HTTP and validates it at regular intervals
  - Used for
    - information in EO portal screens
    - publication lists (match on DAI)
    - automated harvesting process
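As a sketch of how the Admin file could drive automated harvesting, the function below turns an already-parsed Admin record into OAI-PMH `ListRecords` request URLs, one per set. The `verb`, `metadataPrefix` and `set` parameters are standard OAI-PMH; the dictionary keys, the `didl` prefix value and the example URL are assumptions, and the RDF/FOAF parsing step is left out.

```python
from urllib.parse import urlencode


def harvest_urls(admin, metadata_prefix="didl"):
    """Build one OAI-PMH ListRecords URL per set named in the Admin record.

    `admin` is a hypothetical dict produced by parsing a partner's Admin file:
    {"oai_base_url": str, "oai_sets": [str, ...]}.
    """
    urls = []
    for set_spec in admin["oai_sets"]:
        query = urlencode({
            "verb": "ListRecords",
            "metadataPrefix": metadata_prefix,  # assumed prefix name
            "set": set_spec,
        })
        urls.append(f"{admin['oai_base_url']}?{query}")
    return urls


# Example with made-up values
example_admin = {
    "oai_base_url": "https://repository.example.org/oai",
    "oai_sets": ["economics", "workingpapers"],
}
example_urls = harvest_urls(example_admin)
```

Because each partner lists its own base URL and sets, adding a partner to the harvest would only require registering one more Admin file.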

9 Usage statistics – EO use case
- EO use case: present download rates through EO portal per publication, scholar, institution
- Normalization of exchange format and communication protocol
  - OAI-PMH exchange of SWUP OpenURL ContextObjects (Scholarly Works Usage Community Profile)
- Special considerations:
  - Encryption of IP address of requester (MD5)
  - Filtering out robot requests (list of 50 regular expressions)
  - Filtering out double clicks
- Similar initiatives come together at Knowledge Exchange workshop, Berlin, 29-30 March 2010
  - JISC (Usage Statistics Review project), Pirus2, SurfSure, Counter, Mesur, OA-Statistik, Economists Online
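The three special considerations can be sketched as a single filter over download events. This is a minimal illustration, not the project's code: the three robot patterns stand in for the list of 50 regular expressions, and the 30-second double-click window and the event tuple layout are assumptions.

```python
import hashlib
import re
from datetime import timedelta

# Stand-ins for the project's list of 50 robot regular expressions
ROBOT_PATTERNS = [re.compile(p, re.IGNORECASE)
                  for p in (r"bot\b", r"crawler", r"spider")]
DOUBLE_CLICK_WINDOW = timedelta(seconds=30)  # assumed window length


def anonymize_ip(ip):
    """Replace the requester's IP address by its MD5 hex digest."""
    return hashlib.md5(ip.encode("utf-8")).hexdigest()


def is_robot(user_agent):
    return any(p.search(user_agent) for p in ROBOT_PATTERNS)


def filter_events(events):
    """Keep human, non-repeated downloads.

    `events` is an iterable of (timestamp, ip, user_agent, item_id) tuples,
    sorted by timestamp. Returns (timestamp, hashed_ip, item_id) tuples.
    """
    last_seen = {}
    kept = []
    for ts, ip, user_agent, item_id in events:
        if is_robot(user_agent):
            continue
        key = (anonymize_ip(ip), item_id)
        previous = last_seen.get(key)
        last_seen[key] = ts
        if previous is not None and ts - previous <= DOUBLE_CLICK_WINDOW:
            continue  # same requester, same item, within the window
        kept.append((ts, key[0], item_id))
    return kept
```

Hashing before the double-click check means the raw IP address never has to be stored alongside the event.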

10 Usage statistics – implementation status
- Central EO Gateway – DoDoCo (Document Download Counter)
  - PMH harvesting of SWUP ContextObjects into SQL database
  - Enrich: with information on item, scholar, institution
  - Web service: level (item, scholar, institution) + date range
  - Technical guidelines for project partners (OA available)
- Partners
  - Implementation
    - for all major IR platforms
    - solution for Combined Log Format web logs
  - Registration through Admin file
  - 7 original + 1 new partner
- Not enough data available
  - Not visible through EO portal yet, although DoDoCo software is ready
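A query by level and date range, as the web service bullet describes, might look like the following sketch. The table layout and function names are hypothetical and use an in-memory SQLite database for illustration; the slides do not describe the actual DoDoCo schema.

```python
import sqlite3

LEVELS = {"item", "scholar", "institution"}


def make_db():
    """Create an in-memory database with a hypothetical downloads table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE downloads (day TEXT, item TEXT, scholar TEXT, institution TEXT)"
    )
    return conn


def download_counts(conn, level, start, end):
    """Count downloads per item, scholar or institution between two ISO dates."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    # The column name is interpolated only after the whitelist check above.
    sql = (
        f"SELECT {level}, COUNT(*) FROM downloads "
        f"WHERE day BETWEEN ? AND ? GROUP BY {level} ORDER BY {level}"
    )
    return conn.execute(sql, (start, end)).fetchall()
```

Dates are stored as ISO 8601 strings, so the `BETWEEN` comparison on text orders them correctly.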


12 Added value services
Publication lists
- Per DAI of authors who are registered in Admin file
- Publications are extracted from EO gateway over SRU
- Format
  - APA+ in HTML
    - with links to full text in EO partner repository
    - with links to publisher sites (through OpenURL resolution)
  - APA in PDF
  - APA in RTF
  - RIS
  - BibTex
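Two of the export formats listed above, RIS and BibTeX, are simple enough to sketch. The record layout is hypothetical, every record is treated as a journal article for simplicity, and the example data is made up; the project's actual exporter is not shown in the slides.

```python
def to_ris(rec):
    """Serialize a record as a minimal RIS entry (journal article assumed)."""
    lines = ["TY  - JOUR"]
    lines += [f"AU  - {name}" for name in rec["authors"]]
    lines.append(f"TI  - {rec['title']}")
    lines.append(f"PY  - {rec['year']}")
    lines.append("ER  - ")
    return "\n".join(lines)


def to_bibtex(rec, key):
    """Serialize a record as a minimal BibTeX @article entry."""
    authors = " and ".join(rec["authors"])
    return (
        f"@article{{{key},\n"
        f"  author = {{{authors}}},\n"
        f"  title = {{{rec['title']}}},\n"
        f"  year = {{{rec['year']}}}\n"
        f"}}"
    )


# Made-up record for illustration
example_record = {
    "authors": ["Doe, J.", "Roe, R."],
    "title": "An example working paper",
    "year": 2009,
}
```

A publication list for one author would then be the result of applying one of these functions to every record whose MODS metadata carries that author's DAI.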

13 Added value services
Enriched descriptive metadata
- JEL classification
  - Enrichment service (ES) gets records to be enriched from EO, over SRU
  - ES creates enrichment record(s), using text mining technology
  - ES makes enrichment record(s) available to EO, over OAI-PMH
  - EO harvests enrichment records from ES and integrates into original record
  - EO reuses enrichment information in its services: index & present
- Bibliographic references
  - Through collaboration with RePEc/CitEc
- Visible through EO portal
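The integration step, where harvested enrichment records are folded into the original records, can be sketched as a merge keyed on the record identifier. The dictionary layout (`about`, `jel`) is an assumption made for illustration; the SRU and OAI-PMH transport is left out.

```python
def integrate(originals, enrichments):
    """Merge enrichment records into copies of the original records.

    `originals` maps a record identifier to a metadata dict.
    `enrichments` is a list of {"about": identifier, "jel": [codes]} dicts,
    a hypothetical shape for what the enrichment service exposes.
    """
    merged = {rid: dict(rec, jel=[]) for rid, rec in originals.items()}
    for enrichment in enrichments:
        target = merged.get(enrichment["about"])
        if target is None:
            continue  # enrichment for a record the gateway does not hold
        for code in enrichment["jel"]:
            if code not in target["jel"]:
                target["jel"].append(code)
    return merged
```

Keeping the enrichment in its own field leaves the partner-supplied metadata untouched, which matches the idea of storing additional items alongside the original ones in the compound object.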

14 Added value services
Full-text search service
- Process
  - Full-text indexer component in Meresco fetches relevant records from EO Gateway over SRU
  - Follow links to PDF object files
  - Text is extracted from PDF, and added to record through SRU Update
  - EO can now index & present
- Prototype exists
- Not yet fully deployed in EO portal
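The process above can be outlined as a loop with the network and PDF steps passed in as functions. This is only a structural sketch: `download`, `extract_text` and `update` are hypothetical callables standing in for the HTTP fetch, the PDF text extraction and the SRU Update call, none of which are specified in the slides.

```python
def index_full_text(records, download, extract_text, update):
    """Attach extracted PDF text to each record; return the number updated.

    records      -- iterable of {"id": str, "object_files": [url, ...]}
    download     -- callable: url -> bytes
    extract_text -- callable: bytes -> str
    update       -- callable: (record_id, text) -> None
    """
    updated = 0
    for record in records:
        texts = []
        for url in record.get("object_files", []):
            if not url.lower().endswith(".pdf"):
                continue  # only PDF object files are followed
            texts.append(extract_text(download(url)))
        if texts:
            update(record["id"], "\n".join(texts))
            updated += 1
    return updated
```

Injecting the three steps keeps the loop testable without a gateway, and lets each step be replaced independently.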

15 Added value services
Multilinguality (EN, FR, GE, ES)
- Complete EO portal interface
- JEL classification
- MLIA functionality in EO portal
  - Student thesis – Prof. Bouillon (Univ. of Geneva, multilingual information processing department)
    - (uncustomized) Systran and Google Translate show equivalent results
  - Contacts with CACAO (also through Europeana)
    - comes as a complete portal solution, not as an add-in for existing portals like EO
  - Considerations:
    - Lingua franca in economics = EN
    - NEEO = NOT research project in linguistics, aim: reuse best existing technology
  - -> Use "Google Translate" for translation of queries

16 Collaboration with RePEc
- Harvesting metadata from RePEc into EO
  - AMF to DIDL/MODS mapping
- Push metadata from EO to RePEc
  - "RePEc:ner" archive, with separate series for each EO institution
  - According to agreed-upon reviewed ReDIF format
- -> Admin file directives in order to limit overlap
- Contribute to LogEc
- Reuse CitEc data in EO portal
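Pushing a record to the "RePEc:ner" archive means serializing it as a ReDIF template. The sketch below emits a minimal ReDIF-Paper template; the field selection, the record layout and the series code are assumptions, and the agreed-upon NEEO profile mentioned on the slide may require more fields.

```python
def to_redif(rec, series):
    """Serialize a record as a minimal ReDIF-Paper template.

    `rec` is a hypothetical dict with "authors", "title", "year", "local_id".
    `series` is the series code for the EO institution (made up in the example).
    """
    lines = ["Template-Type: ReDIF-Paper 1.0"]
    lines += [f"Author-Name: {name}" for name in rec["authors"]]
    lines.append(f"Title: {rec['title']}")
    lines.append(f"Creation-Date: {rec['year']}")
    lines.append(f"Handle: RePEc:ner:{series}:{rec['local_id']}")
    return "\n".join(lines) + "\n"


# Made-up record and series code for illustration
example_template = to_redif(
    {"authors": ["Jane Doe"], "title": "An example working paper",
     "year": 2009, "local_id": "0001"},
    series="exampl",
)
```

Using one series per institution keeps each partner's records under a distinct handle prefix within the shared archive.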

17 EO gateway and portal
- Gateway – metadata store and search engine
  - Choice between Summa, SOLR/Lucene, Meresco
  - Open source solution, based on Lucene search engine
  - Support available from software developers (CQ2 company)
  - Has proven its qualities in the past (DARENet)
- Portal
  - First version: home-made
  - Final version: outsourced design to private company
    - HTML, CSS, JavaScript, all images

