aggregation as a tactic - to support discovery Peter Burnhill & Stuart Macdonald EDINA national data centre University of Edinburgh CERN workshop on Innovations in Scholarly Communication (OAI7) University of Geneva, 23 June 2011
Context: The joint JISC / RLUK Resource Discovery Task Force (RDTF) Vision: UK researchers and students will have easy, flexible, and ongoing access to content and services through a collaborative, aggregated and integrated resource discovery and delivery framework which is comprehensive, open and sustainable. Making content more discoverable, both by people and machine, via a mixed economy of technological solutions. The Discovery Initiative aims to: engage stakeholders across libraries, archives and museums; build critical mass of open content to inspire others to participate; encourage development of purposeful aggregations and compelling applications - mashing at the macro-level; exemplify what can be done across domains to free data and explore how to make that data work harder. No one-size-fits-all solution!
Key concept in the RDTF Vision is aggregation, directly or represented through metadata – to unlock the online & digital riches held in our organisations. Regard aggregation as intervention to exploit the telematic opportunity for things [that] are 'remote, digital & published' - a phrase derived from an IASSIST conference in 1990 exploring what it meant with the Internet if we regarded all [content] as remote and published. The Web in the mid-1990s simplified and thus improved. Unfortunately, even now, much which is online and on the Web is badly or inadequately published … We have to improve, re-interpreting what it means to be well-published. 'Aggregation as a tactic' - a phrase coined to end an impasse during a meeting to discuss technical aspects of the RDTF Vision statement to identify stakeholder groups.
The term aggregation is used a lot in computer science for: objects … assembled or configured together to create a more complex object [UML, IBM]; aggregating resources based on … properties. … they are owl:sameAs and their other properties can be intermixed. For purposes of RDTF, aggregation means: an assembly of data sources – more than a collection of objects (image banks, data services, catalogues, activity data) – related or otherwise; for machine-as-user – independent of presentation layer. However, aggregation is not a goal nor an end in itself. It is an intervention to be used for a twofold strategic purpose: improvement - merge & match, customisation and consumption, multiple output formats, reduce duplication of effort; discoverability – via promiscuous or well-dressed metadata through e.g. Google or tailored services.
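As a rough illustration of the owl:sameAs idea on this slide, the sketch below groups record identifiers that have been declared equivalent and intermixes their properties. It is a minimal sketch, not RDTF or EDINA code: the function name merge_same_as, the identifier scheme and the property names are all hypothetical, and a real aggregation would operate on RDF triples rather than Python dictionaries.

```python
from collections import defaultdict


def merge_same_as(records, same_as_pairs):
    """Merge the properties of records linked by owl:sameAs-style pairs.

    records: dict mapping a record id to {property: [values]}.
    same_as_pairs: iterable of (id_a, id_b) declared equivalent.
    Returns one merged property set per equivalence group, keyed by
    the smallest id in the group (an arbitrary, hypothetical choice).
    """
    parent = {rid: rid for rid in records}

    def find(rid):
        # Walk up to the representative of rid's group (path halving).
        while parent[rid] != rid:
            parent[rid] = parent[parent[rid]]
            rid = parent[rid]
        return rid

    for a, b in same_as_pairs:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller id as the group's representative.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    merged = defaultdict(lambda: defaultdict(set))
    for rid, props in records.items():
        root = find(rid)
        for prop, values in props.items():
            merged[root][prop].update(values)

    return {
        root: {prop: sorted(values) for prop, values in props.items()}
        for root, props in merged.items()
    }
```

Given two made-up catalogue records for the same journal, each holding a different subset of properties, the merged entry carries the union of both, which is the "merge & match" improvement the slide describes.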
Language & Perspectives: Digital Library has mixed parentage - a re-mix of the document tradition & the computation tradition: approaches based on a concern with documents, with signifying records: archives, bibliography, documentation, librarianship, records management, and the like … [Content Provider speak]; approaches based on uses of formal techniques, whether mechanical (such as punch cards and data-processing equipment) or mathematical/computational (as in algorithmic procedures). [Developer speak] Prof. Michael Buckland, Presidential Address, American Society for Information Science, JASIS's 50th (1998) http://people.ischool.berkeley.edu/~buckland/asis62.html
Perspectives … as provider: EDINA develops and delivers JISC-sponsored national online services – adding value to data and content: Digimap Collections (OS mapping; SeaZone; BGS); NewsfilmOnline (various; digitised with JISC £); UK Access Management Federation (institutions; authentication). Data Library – move from support to middle folk: research data support for Edinburgh researchers; research data management guidelines, training, OER materials; Edinburgh DataShare – open data repository; RADAR – Researching A Data Asset Registry. Maybe as middle folk - c.f. those who deal in middleware: sometimes having the role of creator and supplier of some service; sometimes being the user of what others supply; inter-operator.
Perspective … as aggregator: developing and delivering JISC-sponsored aggregation services. JISCMediahub - links to collections & hosted content (c. 1m resources): CultureGrid; First World War Poetry; Films of Scotland; Getty images (all content searchable and viewable within JISC Media Hub). GoGeo! - metadata registry for spatially-referenced data; Geodoc metadata creation tool; ShareGeo Open. SUNCAT – serials union catalogue: 80 libraries; metadata/links to full text; download MARC records (& XML & SUTRS - Simple Unstructured Text Record Syntax - data exchange format widely used in Z39.50). PEPRS - e-journal preservation registry, jointly led by EDINA with the ISSN International Centre: metadata registry of available back copy e-journals - aggregated from preservation agencies (incl. British Library, UK LOCKSS Alliance, CLOCKSS).
Some RDTF-related projects @ EDINA: GOgeo Linked Data (GOLD) – triplify INSPIRE-compliant metadata to improve discoverability of metadata records via search engines. SUNCAT: Exploring Open [bibliographic] Metadata (working with OKF to open up data sent by contributing libraries – convert to RDF). Sharing OpenURL Activity Data - monthly usage data: date & time; anonymised IP address/inst. ID; title; author; ISSN; DOI. Uses – article/journal recommendations, publishers reviewing what content is of interest to specific communities, innovative services to meet users' needs. CHALICE – use data mining to extract placenames from the English Place Name Survey to create a UK historic gazetteer published as Linked Data & link it to the Geonames ontology on the semantic web. AddressingHistory – geo-parsing of Scottish Post Office Directories, API onto digitised content, output in XML, CSV, JSON. 3 further case studies on other EDINA services illustrating how other collections can benefit from the same techniques.
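To make the "triplify" and "convert to RDF" steps on this slide more concrete, here is a minimal sketch that turns a flat metadata record into RDF triples serialised as N-Triples lines. It is not the GOLD or SUNCAT pipeline: the field-to-property mapping, the example.org subject URI and the function names are assumptions for illustration, and the only external vocabulary used is Dublin Core terms.

```python
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical mapping from local field names to Dublin Core properties.
FIELD_TO_PROPERTY = {
    "title": DCTERMS + "title",
    "abstract": DCTERMS + "abstract",
    "publisher": DCTERMS + "publisher",
}


def escape_literal(value):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def triplify(subject_uri, record):
    """Return one N-Triples line per mapped field of a flat record.

    Fields without an entry in FIELD_TO_PROPERTY are skipped rather
    than guessed at.
    """
    lines = []
    for field, value in record.items():
        prop = FIELD_TO_PROPERTY.get(field)
        if prop is None:
            continue
        lines.append(f'<{subject_uri}> <{prop}> "{escape_literal(value)}" .')
    return lines
```

Calling triplify("http://example.org/dataset/1", {"title": "Sample map layer"}) yields a single line that any N-Triples parser should accept; a production conversion would more likely use an RDF library and a richer vocabulary than this string-based sketch.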
The end is the start of a new beginning … In earlier web time we had the MODELS user-verbs: Discover -> Locate -> Request -> Access (Deliver). Dempsey, Russell & Murray (1999) http://www.ukoln.ac.uk/dlis/models/publications/utopia/ where Access was the end game for us middle folk, even if the beginning & part of a deeper process for researchers, students … Now there is call for more than bilateral & negotiated interoperability, where Access is the beginning for developers and for other services. RDF/Linked Data enables information to be shared in a more Web-friendly way. RDF/Linked Data enables structure and content of those data sources to be explicit - vocabularies, ontologies, relationships. Exposing the complexity and relationship in the underlying data, hanging the insides on the outside!
The treasures are on show inside, but … [image: Centre Pompidou]
… and so to summarise … Early web approaches focused on making content accessible for humans: hiding the complexity and relationship in the underlying data; paying attention to the user interface (HCI & GUI; usability and accessibility). However, to ensure content gets noticed it must be made easier for machines to understand by: exposing the complexity and relationship in the underlying data; having in mind the machine-as-user (API as well as HCI). Aggregation should be seen as intervention, with strategic purpose: 1. to engage in value-added improvement of content; 2. to enhance the discoverability of that which is aggregated, to be a focus of attention (through promiscuous metadata!). If it is with RDF, then that's good; don't make a fuss if not. Publish RDBMS schemas, catalogue records, codebooks, and ancillary or related content in multiple, machine-readable formats.
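The closing advice, to publish in multiple machine-readable formats, can be sketched with the Python standard library alone. The example below serialises the same list of flat records as JSON, CSV and XML. The function names, the records/record element names and the sample fields are hypothetical; it assumes every record key is a valid XML element name and does not represent any EDINA service's actual API.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def to_json(records):
    """Serialise records as pretty-printed JSON with stable key order."""
    return json.dumps(records, indent=2, sort_keys=True)


def to_csv(records, fieldnames):
    """Serialise records as CSV with a header row, one record per line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_xml(records, root_tag="records", item_tag="record"):
    """Serialise records as XML; assumes keys are valid element names."""
    root = ET.Element(root_tag)
    for record in records:
        item = ET.SubElement(root, item_tag)
        for key, value in record.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

Keeping one in-memory representation and several thin serialisers means a new output format can be added without touching the underlying data, which is the machine-as-user point of the slide: API as well as HCI.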
The Many Minds principle: the coolest thing to do with your data will be thought of by someone else. Using data as the building platform. Jo Walsh & Rufus Pollock (2007-05-17). Open Data and Componentization. XTech 2007 (slide 14). "Benefits of freeing data are many, arguably being the most relevant one the Many Minds principle: there'll always be someone that will find out a way to reuse data that you wouldn't have even figured." José Manuel Alonso, Notes from the 5th Internet, Law and Politics Conference: The Pros and Cons of Social Networking Sites, organized by the Open University of Catalonia, School of Law and Political Science, and held in Barcelona, Spain, on July 6th and 7th, 2009.
Stuart.Macdonald@ed.ac.uk Peter.Burnhill@ed.ac.uk http://edina.ac.uk/ Repository Fringe 2011 – call for participants: http://www.repositoryfringe.org/ THANK YOU CC BY-NC-ND 2.0 - image by enggul courtesy of Flickr – http://www.flickr.com/photos/enggul/2361808668/