GETTING METADATA TO WORK HARDER: re-use, standardisation and streamlining, a data archive perspective

1 GETTING METADATA TO WORK HARDER: re-use, standardisation and streamlining, a data archive perspective
LUCY BELL
MANAGEMENT INFORMATION MANAGER, UK DATA ARCHIVE, UNIVERSITY OF ESSEX
THE VALUE OF CATALOGUING, CIG 2012, UNIVERSITY OF SHEFFIELD, 10 – 11 SEPTEMBER 2012

2 Introduction
- recent changes to 45 years' worth of cataloguing and indexing practices
- changes are large, wide-ranging – and still underway!
- we hope they will both enhance the users' experience and create organisational efficiencies

3 Themes
- the UK Data Archive: what it is
- current practice: metadata schema and tools used at the Archive
- recent internal initiatives
- generally: the problems we encountered; the solutions we have employed
- next steps

4 The UK Data Archive
- based at the University of Essex since 1967
- curator of the largest collection of digital data in the social sciences and humanities in the UK
- holds several thousand datasets relating to society, both historical and contemporary, making these available via its services:
  - UK Data Service from October 2012
  - previously, the Economic and Social Data Service (ESDS)
- it is a place of national deposit for The National Archives
- www.data-archive.ac.uk / www.esds.ac.uk

5 The UK Data Archive: current cataloguing standards
- the Archive provides access to over 5000 digital data collections
- all of these items are catalogued at study level, and many at variable level, using the de facto standard data cataloguing schema, DDI (Data Documentation Initiative)
- currently, the Archive uses:
  - DDI 2.1 (now known as DDI-C, for codebook)
  - the Humanities and Social Science Electronic Thesaurus (HASSET), © University of Essex, based on UNESCO
  - internally-controlled authority lists and controlled vocabularies (CVs)

6 HASSET
- multidisciplinary thesaurus developed to support the UK Data Archive collection
- coverage in the core subject areas of social science disciplines
- uses standard hierarchical relationships: TT (top term); BT (broader term); NT (narrower term); RT (related term) etc.
- role of HASSET in the Archive is twofold:
  - used internally for indexing studies and series with HASSET terms
  - also a separate product licensed to others
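The TT/BT/NT/RT relationships listed on this slide can be pictured as a small graph. The following is a minimal sketch only: the class names and the three example terms are illustrative assumptions, not actual HASSET entries or the Archive's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    label: str
    broader: list = field(default_factory=list)   # BT links
    narrower: list = field(default_factory=list)  # NT links
    related: list = field(default_factory=list)   # RT links


class Thesaurus:
    def __init__(self):
        self.terms = {}

    def add(self, label):
        # Create the term on first use, otherwise return the existing one.
        return self.terms.setdefault(label, Term(label))

    def link_broader(self, narrow, broad):
        # BT and NT are reciprocal, so record both directions together.
        n, b = self.add(narrow), self.add(broad)
        if broad not in n.broader:
            n.broader.append(broad)
        if narrow not in b.narrower:
            b.narrower.append(narrow)

    def link_related(self, a, b):
        # RT is symmetric.
        ta, tb = self.add(a), self.add(b)
        if b not in ta.related:
            ta.related.append(b)
        if a not in tb.related:
            tb.related.append(a)

    def top_terms(self, label):
        # Walk BT links upward; a term with no broader term is a TT.
        seen, stack, tops = set(), [label], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            parents = self.terms[current].broader
            if not parents:
                tops.add(current)
            stack.extend(parents)
        return sorted(tops)


# Hypothetical example terms, chosen only to show the structure.
thesaurus = Thesaurus()
thesaurus.link_broader("PRIMARY SCHOOLS", "SCHOOLS")
thesaurus.link_broader("SCHOOLS", "EDUCATION")
thesaurus.link_related("SCHOOLS", "TEACHERS")
```

Calling `thesaurus.top_terms("PRIMARY SCHOOLS")` climbs the BT chain to the term that has no broader term.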

7 Significant recent metadata/indexing developments
1. May – October 2010: a review was carried out of the UK Data Archive's resource discovery tools. 2011: a project was started to apply the review's results to the Archive's resource discovery applications.
2. […] onwards: work was started to move from the DDI-C to the DDI-L (for lifecycle) metadata schema.
3. June 2012 – January 2013: SKOS-HASSET, a JISC-funded project, is being undertaken to apply SKOS to HASSET and to test its automated indexing capacity.

8 Shared requirements…
it became clear that most of these initiatives were pointing at one thing:
The need for more controlled - and harder-working - metadata

9 1. Resource discovery review
How do researchers find data?
- trends in information-seeking behaviour show that users prefer simple, Google-like interfaces…
- …but which still return acutely-focused and highly-relevant results.
- the look and feel of the interfaces should be simple but the results must achieve academic rigour.

10 Result of the review: the metadata conundrum
- for data services to produce simple interfaces - which still return highly-relevant results - metadata are required which are both:
  - extremely powerful
  - increasingly invisible
- a conceptual shift has taken place: the work to focus searches has moved behind the interface

11 The previous Archive search context
- HASSET and other CVs may be used in the majority of search and browse activities.
- 21 interfaces

12 The vision: use CVs to enhance the users' experience
We wanted:
- a single search interface
- the ability to move seamlessly from one type of resource to another:
  - via faceted browsing and
  - directly from within each resource type
This required:
- cross-referencing data collections with publications, with research outputs, with support guides, with case studies, using metadata
- Many controlled vocabularies!

13 The result: single faceted search/browse interface
We are moving from this: To this:

14 Facets needing controlled vocabularies
Some were already in a fit state:
- Depositor (existing authority list)
- Country (existing authority list)
Others needed mapping to high levels:
- Subject categories (116 categories mapped to 21 top terms)
Many were populated with freetext:
- Observation unit
- Spatial unit
- Kind of data
- Time method
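To illustrate why facets depend on controlled values, here is a small sketch of facet counting and filtering over catalogue records. The record structure, field names and study titles are hypothetical; only the facet values "Numeric" and "Textual" come from the slides.

```python
from collections import Counter

# Hypothetical catalogue records; real records would come from the metadata store.
records = [
    {"title": "Study A (hypothetical)", "kind_of_data": "Numeric", "country": "England"},
    {"title": "Study B (hypothetical)", "kind_of_data": "Textual", "country": "Scotland"},
    {"title": "Study C (hypothetical)", "kind_of_data": "Numeric", "country": "Scotland"},
]


def facet_counts(records, facet):
    # Count how many records carry each value of one facet.
    return Counter(r[facet] for r in records if r.get(facet))


def filter_by(records, **selected):
    # Keep only records matching every selected facet value exactly.
    return [r for r in records if all(r.get(k) == v for k, v in selected.items())]


numeric_scottish = filter_by(records, kind_of_data="Numeric", country="Scotland")
```

Exact-match filtering like this only gives sensible counts when every record uses the same spelling of a value, which is the reason freetext fields need mapping before they can drive a facet.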

15 Freetext to controlled vocabularies mapping
Mapping freetext values to controlled values (all metadata held in SQL tables)
Same principles for all:
- Obtain dump of metadata and manipulate in Excel
- Identify CV to be used
- Use Google Refine to identify existing, similar, freetext entries
- Re-export into Excel and apply mapping (at item level or, if possible, at value level)
- CVs to be used in the future
So far, has taken 2 staff members, working c.0.4 FTE, 4 months to clean 3 elements
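The step of identifying similar freetext entries can be approximated in code. The sketch below imitates the "fingerprint" key-collision clustering that Google Refine offers; the slides do not say which Refine feature was actually used, so this is an assumption, and the sample values are invented.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    # Normalise: trim, lowercase, fold accents to ASCII, drop punctuation,
    # then sort and de-duplicate the remaining tokens.
    value = unicodedata.normalize("NFKD", value.strip().lower())
    value = value.encode("ascii", "ignore").decode("ascii")
    value = re.sub(r"[^\w\s]", "", value)
    return " ".join(sorted(set(value.split())))


def cluster(values):
    # Group raw values that share a fingerprint; keep only groups
    # with more than one distinct spelling, since those need review.
    groups = defaultdict(set)
    for raw in values:
        groups[fingerprint(raw)].add(raw)
    return {key: sorted(members) for key, members in groups.items() if len(members) > 1}


# Invented freetext entries of the kind a 45-year-old field might contain.
sample = ["Individuals", "individuals ", "Individuals.", "Housing units", "Units, housing", "Events"]
clusters = cluster(sample)
```

Each cluster is a candidate for a single value-level mapping; values that fall outside any cluster still need to be checked one by one.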

16 The mappings
Spatial unit
- Previous Archive project, U.Geo, had created a spatial unit CV
- 653 unique values, now mapped to 194
- This has now been used for all items:

17 The mappings
Unit of observation
183 unique values, now mapped to 11, using DDI CVG recommended list:
- Individuals
- Organizations
- Families/households
- Housing Units
- Events/Processes
- Geographic Units
- Time Units
- Text units
- Groups
- Objects
- Other
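A value-level mapping onto this list could look like the sketch below. The 11 controlled values are taken from the slide; the freetext keys in `VALUE_MAP`, the field names and the sample items are hypothetical. The original freetext is kept alongside the controlled term, and anything unmapped is set aside for manual review rather than guessed.

```python
# The 11 controlled values listed on the slide.
UNIT_OF_OBSERVATION_CV = (
    "Individuals", "Organizations", "Families/households", "Housing Units",
    "Events/Processes", "Geographic Units", "Time Units", "Text units",
    "Groups", "Objects", "Other",
)

# Hypothetical freetext-to-CV mapping; the real one was built by hand.
VALUE_MAP = {
    "adults": "Individuals",
    "households": "Families/households",
    "firms": "Organizations",
}


def apply_mapping(items, value_map, cv):
    mapped, unresolved = [], []
    for item in items:
        term = value_map.get(item["observation_unit"].strip().lower())
        if term is None:
            unresolved.append(item)  # leave for a cataloguer to decide
            continue
        if term not in cv:
            raise ValueError(f"{term!r} is not in the controlled vocabulary")
        # Keep the original freetext and add the controlled term next to it.
        mapped.append({**item, "observation_unit_cv": term})
    return mapped, unresolved


items = [
    {"id": 1, "observation_unit": "Adults"},
    {"id": 2, "observation_unit": "households "},
    {"id": 3, "observation_unit": "parish registers"},
]
mapped, unresolved = apply_mapping(items, VALUE_MAP, UNIT_OF_OBSERVATION_CV)
```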

18 The mappings
Kind of data
294 unique values, now mapped to 7:
- Alpha-numeric
- Audio
- GIS
- Image
- Numeric
- Textual
- Video

19 The mappings
More to come….
- Method of data collection
- Access/restrictions (Secure data; standard access conditions etc.)
- Method of access (Explore online or download)
Faceted search/browse will be released as a beta in late 2012
More development will occur during its beta phase following user feedback

20 2. Metadata schema: DDI-C to DDI-L
- Simultaneously, the Archive has been preparing for the move from DDI-C to DDI-L
- DDI-C is similar to a traditional metadata schema
- DDI-L is more flexible – to the benefit of users:
  - permits data as well as metadata to be encoded
  - captures survey lifecycles
  - gives users a fully-rounded view of a survey from inception to results
  - broad and flexible, allowing groupings to be made – re-use is key
- to support all this, it requires CVs to be used in several elements (the DDI Alliance Controlled Vocabularies Group is working on these)

21 3. CVs for organisational efficiency: SKOS application
JISC project: SKOS-HASSET
- 8 months (June 2012 – January 2013)
- part of the JISC Research Tools Programme
- Multi-disciplinary project team: Information Scientists, Data/text Mining Programmer, Linguist, RDF specialist, Developers

22 SKOS-HASSET
three aims:
- apply SKOS to HASSET – making the thesaurus more flexible
- improve its online presence
- test its automated indexing capabilities; corpora:
  - questions
  - questionnaires
  - abstracts
  - publications
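As a rough picture of what applying SKOS to a thesaurus term involves, the sketch below writes one concept as Turtle using the standard SKOS properties `skos:prefLabel`, `skos:broader`, `skos:narrower` and `skos:related`. The `ex:` base URI, the slug scheme and the example terms are hypothetical, and labels are assumed to contain no double quotes; the project's actual URIs and tooling are not described in the slides.

```python
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/thesaurus/"  # hypothetical base URI

PREFIXES = f"@prefix skos: <{SKOS_NS}> .\n@prefix ex: <{BASE}> .\n\n"


def slug(label):
    # Hypothetical identifier scheme: lowercase, spaces to hyphens.
    return label.lower().replace(" ", "-")


def concept_to_turtle(label, broader=(), narrower=(), related=()):
    lines = [f"ex:{slug(label)} a skos:Concept", f'    skos:prefLabel "{label}"@en']
    # BT, NT and RT map onto skos:broader, skos:narrower and skos:related.
    for prop, targets in (("broader", broader), ("narrower", narrower), ("related", related)):
        for target in targets:
            lines.append(f"    skos:{prop} ex:{slug(target)}")
    return " ;\n".join(lines) + " .\n"


# Illustrative term only, not an actual HASSET entry.
turtle = PREFIXES + concept_to_turtle(
    "SCHOOLS", broader=["EDUCATION"], narrower=["PRIMARY SCHOOLS"], related=["TEACHERS"]
)
```

Printing `turtle` gives a small document that an RDF parser should be able to load, which is what makes a SKOS version of a thesaurus easier to share and reuse than a bespoke format.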

23 SKOS-HASSET
Progress so far:
- SKOS has been applied to HASSET
- Texts prepared for the automated indexing case study
- Gold standard of manual indexing of questions is taking place
- TF/IDF, KEA and WEKA all being used for term extraction – work underway
Next steps:
- SKOS product licensing
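For readers unfamiliar with TF/IDF, the sketch below shows the weighting on its own and uses it to suggest thesaurus terms for a text. It is a plain standard-library illustration, not the project's KEA or WEKA pipeline; the sample questions and the single-word vocabulary are invented, and real thesaurus terms are often multi-word.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs):
    # Term frequency within each document, weighted by how rare the
    # term is across the whole collection (inverse document frequency).
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def suggest_terms(doc_scores, vocabulary, top_n=3):
    # Only terms that exist in the controlled vocabulary can be suggested.
    candidates = {t: s for t, s in doc_scores.items() if t in vocabulary and s > 0}
    return sorted(candidates, key=lambda t: (-candidates[t], t))[:top_n]


# Invented survey questions and vocabulary, for illustration only.
questions = [
    "How many hours of paid employment did you do last week?",
    "What type of school does your child attend?",
    "Is your employment permanent or temporary?",
]
vocabulary = {"employment", "school", "housing"}
scores = tf_idf(questions)
suggestions = [suggest_terms(s, vocabulary) for s in scores]
```

Comparing suggestions like these against the manually indexed gold standard is one way to judge how well automated indexing performs.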

24 SKOS-HASSET
Communication:
- SKOS-HASSET blog
- Project web site: http://www.data-archive.ac.uk/find/our-projects/skos-hasset
- Webinar planned for the winter
- User guidance
Please contribute, give feedback!

25 Developments… to issues … to improvements
For users:
- the faceted search/browse interface exposed a lack of standardisation in the underlying metadata
  - …freetext terms have been used over 45 years; these are now being standardised
  - ...rich freetext metadata has not been lost
- the move from DDI-C (DDI 2.1) to DDI-L (DDI 3.1) brings in a conceptually different type of schema to the users' benefit…
  - …but which also requires more controlled vocabularies

26 Developments… to issues … to improvements
For us:
- Applying more CVs will provide efficiencies:
  - ...the Archive wants to introduce an online deposit form for its depositors which will include CV dropdowns
  - ...create more ways of suggesting terms for the cataloguers
- SKOS gives the opportunity to work more flexibly with the thesaurus
  - …automated indexing using CVs is being tested
  - ...SKOS will allow for easier future thesaurus development

27 The future: analysis and reporting
- enhanced Management Information
- Analysis and reporting and future acquisitions decisions supported

28 Conclusion
- we all NEED metadata so that we can find stuff
- there is too much stuff (or not enough bodies) to create all the metadata ourselves in time
- these days searchers/users often expect the applications to do the work for them
- use the tools at our disposal to make this happen by:
  - employing more CVs where appropriate
  - sharing and using RDF-enabled CVs
  - and, crucially, continuing the creation of quality-assured metadata using fewer resources

29 Conclusion
JISC Intrallect report; quotation from Vic Lyte:
"A new researcher wishing to approach scholarly inquiry to determine the impact of global warming on penguin populations in South Antarctica doesn't walk up to a Librarian and shout 'Penguins!'."
(Duncan, C. & Douglas, P. (2009). Automatic metadata generation: use cases and tools/priorities. Intrallect (for JISC): 2009)

30 CONTACT
UK DATA ARCHIVE
UNIVERSITY OF ESSEX
WIVENHOE PARK
COLCHESTER
ESSEX CO4 3SQ
T +44 (0) E

