Introduction to ESDS Qualidata: Creating and delivering re-usable qualitative data Libby Bishop and Louise Corti ESDS Qualidata RC33 Amsterdam August 2004
Qualitative data collections: data from Economic and Social Research Council (ESRC) individual and programme research grant awards; data from classic social science studies; other funders/sources; focus on DIGITAL collections, but also facilitate paper-based archiving
Types of qualitative data. Diverse data types: in-depth interviews; semi-structured interviews; focus groups; oral histories; mixed methods data; open-ended survey questions; case notes/records of meetings; diaries/research diaries. Multimedia: audio, video, photos and text (most common is interview transcriptions). Formats: digital, paper, analogue audio-visual
Classic datasets: Peter Townsend – Poverty, old age and Katherine Buildings; Paul Thompson – oral history and Edwardians, social mobility; Mildred Blaxter – Mothers and Daughters; National Social Policy and Social Change Archive
Diverse uses for existing data enrich context description how was it really done documentation of methods –team discussions about coding –what, exactly, is semi-structured? augment data you collect –historical comparative case –expand sample size datasets for teaching
Are data always re-usable? restrictions on secondary analysis accessible coherent format –medium –layout processing before delivery
Good archiving = good research: thorough documentation; well organised and labelled files; major stages of research recorded; consent, copyright and related issues clarified
Characteristics of a good archived research collection: intellectual content; extensive raw data created; supporting documentation; consent; transcription; identifiers removed; data listing
Intellectual content: builds on previous research; addresses new issues; innovative approach to discipline; innovative approach to qualitative methodology
Extensive raw data: types of research data assembled –in-depth interviews –focus groups –field notes/participant observation –case study notes; images and sound recordings; range of material – broad focus
Describing qualitative data: full catalogue record; data listing (ID, biog details, date of interviews, media, format, transcript details); online PDF User Guide; use/processing notes; archival listing for large collections
Supporting documentation: examples –funding application –description of methodology –communication with informants on confidentiality –coding schemes/themes –technical details of equipment –interview schedules –end of award report –documentation from CAQDAS software packages, e.g. analytical memos –bibliographies, resulting publications. Anything that adds insight or aids understanding and re-use
USER GUIDE HERE
User Read File
UK DATA ARCHIVE DOCUMENTATION
Policing, Cultural Change and 'Structures of Feeling' in Post-War England,

Access conditions
Until 1 May 2008, the depositor's permission must be sought for access - please contact Qualidata at UKDA for further details. Users should note that no access at all is permitted to the Metropolitan Police Commissioner's interview transcript (int54) until 31st January

Conversion of data and documentation formats
All 65 interview/focus group transcript files were converted to both MS WORD 97 and rich text formats. Both the MS WORD 97 and the rich text files are available to users. The hard copy documentation was scanned and is available as a one volume Acrobat PDF user guide.

Anonymisation
Some limited edits have been made to interview transcripts during processing to protect the identity of respondents. Care has been taken to ensure that this does not compromise the quality of information available.

Data and documentation problems
There are some spelling mistakes in the interview transcriptions (left in situ due to limited processing resources), and the format transfer to Word has produced odd characters within the files in a very few cases. These issues should not present problems for secondary users.

Notes from data delivery and post-order corrections
Transcribing research 1: integrated into the ongoing research; full transcriptions or summaries; avoid stockpiling; costs and benefits –self transcription –internal team transcription –external transcription
Transcribing research 2: budget –estimated number of interviews x 4 hours of transcription per 60-minute tape x hourly salary; examples of good and bad full transcriptions –consistent layout –speaker tags –line breaks –header with identifier and other details –checked for errors
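The budget rule of thumb on this slide can be worked through as a short calculation. This is only a sketch: it reads the rule as roughly four hours of transcription for each 60-minute tape, and the function name, the one-tape-per-interview default and the example figures are assumptions made for illustration, not values from the presentation.

```python
def transcription_budget(n_interviews, hourly_salary,
                         hours_per_tape=4.0, tapes_per_interview=1.0):
    """Estimate transcription cost.

    Assumes (hypothetically) one 60-minute tape per interview and
    about 4 hours of transcription work per tape.
    """
    total_hours = n_interviews * tapes_per_interview * hours_per_tape
    return total_hours * hourly_salary


# Illustrative figures only: 30 interviews at an hourly salary of 12.50
cost = transcription_budget(n_interviews=30, hourly_salary=12.50)
print(f"Estimated transcription cost: {cost:.2f}")
```

Longer interviews can be handled by raising tapes_per_interview, and poor-quality recordings by raising hours_per_tape.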
Example of good transcription
LP: And how long have you lived in this house?
4G: This house? Four years past April.
LP: And you said you were in, was it Ferrier?
4G: Ferrier Gardens.
LP: For twenty years?
4G: Twenty-four years. Twenty-two doon the stair, and two years up the stair.
Identifiers removed confidentiality respected anonymisation? problems of anonymisation –applied too weakly –applied too strongly –timing –potential for distortion –examples user undertakings appropriate and sympathetic
Listing research data contents: key elements –general –specific to project; template approach; point of entry
DATA LIST HERE – EDWARD?
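A template approach to the data list could be sketched as a fixed set of columns written out as CSV. The column names below follow the data listing elements named earlier in the talk (ID, biographical details, date of interview, media, format, transcript details); the class name, function name and the single example row are invented placeholders, not an ESDS Qualidata template.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class DataListEntry:
    # General elements that would appear in every project's list
    interview_id: str
    biographical_details: str
    interview_date: str
    media: str
    file_format: str
    transcript_details: str


def write_data_list(entries, out):
    """Write the entries as CSV with a header row taken from the template."""
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(DataListEntry)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))


# Placeholder row for illustration only
entries = [
    DataListEntry("int001", "female, retired", "1998-04-12",
                  "audio cassette", "RTF", "full transcript"),
]

buffer = io.StringIO()
write_data_list(entries, buffer)
listing = buffer.getvalue()
print(listing)
```

Project-specific elements could be added as extra fields on the dataclass while the general columns stay the same across collections.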
Value of data properly prepared for re-use: widely disseminated and accessible; suitable formats for use and preservation; coherent data and methodology; appropriate for CAQDAS packages
Preparing qualitative data for sharing: sharing requires standards –XML mark-up. Processing steps: scan; optical character recognition (OCR); proof; format; XML mark-up
XML mark-up enables: access to content and structure –speaker tags –coded textual/audio data –links to contextual documentation (audio files; fieldnotes; photos; analytical annotations etc.) –links to other sources via geo-referencing (micro data; aggregate statistics; maps; census data etc.); data providers to publish to online systems, such as ESDS Qualidata Online. It also meets the needs of researchers requesting a standard they can follow, and encourages more qualitative data analysis software (CAQDAS) companies to pursue XML outputs based on this standard
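To illustrate what access to content and structure can mean in practice, the sketch below pulls out every turn by one speaker from a small marked-up transcript. The element and attribute names (transcript, u, who, audio) are hypothetical and chosen only for this example; they are not the ESDS Qualidata schema. The sample text reuses the transcription excerpt shown earlier.

```python
import xml.etree.ElementTree as ET

# Hypothetical mark-up: one <u> element per speaker turn,
# with an optional link to the matching audio file.
SAMPLE = """<transcript id="int001">
  <u who="interviewer">And how long have you lived in this house?</u>
  <u who="respondent" audio="int001.wav">Four years past April.</u>
  <u who="interviewer">For twenty years?</u>
  <u who="respondent">Twenty-four years.</u>
</transcript>"""


def utterances_by(root, speaker):
    """Return the text of every turn spoken by the given speaker."""
    return [u.text for u in root.iter("u") if u.get("who") == speaker]


root = ET.fromstring(SAMPLE)
respondent_turns = utterances_by(root, "respondent")
print(respondent_turns)
```

Once speaker tags are explicit, the same kind of query can be run across a whole collection without relying on layout conventions in word-processed files.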
How we get from tifs to…
…XML mark-up ready for online … My father was, in the daytime he was a boilermaker on the old North Staffordshire Circular Railway and then every night he played in the theatre orchestra. … And he 'd to go to had got to be at work at six the next morning! Cornet player.
Word doc created from OCR
Issues in scanning and OCR: scanning done at 300 dpi, grey scale; OCR quality varies hugely with quality of original, and special challenges include (but are not limited to): –character recognition –stray marks on page –missing words –interviewer's notes –creative character interpretation: section breaks, font changes, footnotes, superscripts and subscripts, and so on. Partially automated with macros, but much judgement (clerical and research) still required
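The kind of partial automation mentioned on this slide might look like the following clean-up pass. Python stands in here for the macros, whose details the presentation does not give, and the rules chosen (rejoining words split across line breaks, dropping lines made up only of stray marks, collapsing repeated spaces) are assumptions about what such a pass could cover.

```python
import re


def clean_ocr_text(text):
    """Apply a few mechanical fixes to OCR output.

    The result still needs human proofing: rejoining, for example,
    will also merge words that were genuinely hyphenated at a line end.
    """
    # Rejoin words hyphenated across a line break: "boiler-\nmaker" -> "boilermaker"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Drop lines that consist only of stray marks (dots, ticks, bars, dashes)
    lines = [ln for ln in text.split("\n")
             if not re.fullmatch(r"[\s.,'`~|_\-]+", ln)]
    # Collapse runs of spaces or tabs left by column misreads
    lines = [re.sub(r"[ \t]+", " ", ln).strip() for ln in lines]
    return "\n".join(lines)


raw = "My father was a boiler-\nmaker on the   old railway.\n. ' |\nCornet player."
cleaned = clean_ocr_text(raw)
print(cleaned)
```

Missing words and misread characters cannot be fixed this way, which is where the clerical and research judgement comes in.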
Final Word file (human and Excel readable)
Using Excel macros to create XML transcript
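The conversion step can be sketched outside Excel as well. The example below uses Python rather than Excel macros and turns speaker-tagged lines of the form "LP: text" into XML. The element names (transcript, u, who) and the identifier int001 are hypothetical, not the mark-up used by ESDS Qualidata.

```python
import re
import xml.etree.ElementTree as ET

# A turn starts with a short speaker tag followed by a colon, e.g. "LP:" or "4G:"
TURN = re.compile(r"^(?P<who>[A-Za-z0-9]+):\s*(?P<text>.+)$")


def transcript_to_xml(lines, transcript_id):
    """Build an XML tree with one <u> element per speaker turn."""
    root = ET.Element("transcript", id=transcript_id)
    for line in lines:
        stripped = line.strip()
        match = TURN.match(stripped)
        if match:
            turn = ET.SubElement(root, "u", who=match.group("who"))
            turn.text = match.group("text")
        elif stripped and len(root):
            # No speaker tag: treat as a continuation of the previous turn
            root[-1].text += " " + stripped
    return root


lines = [
    "LP: And how long have you lived in this house?",
    "4G: This house? Four years past April.",
]
root = transcript_to_xml(lines, "int001")
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

This only works when the transcript follows the consistent layout and speaker tags recommended earlier, which is one reason those conventions matter for re-use.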
Current final product: basic XML mark-up I would rather nae ken if I had cancer. I told my man that, I says "If I have cancer, don't tell me". I mean you might hae an idea yourself, but I wouldnae like to be telt. I told him that. And how has your own health been over the years? Och, up an' doon, y'ken.
Need for publishing tools: once the XML schema is more developed, the next step is to develop publishing tools to automate as much of the mark-up as possible; currently using simple scripts to find and mark up some elements, with much work still done manually; looking into options for automatic mark-up of some components (e.g. natural language processing and information extraction); would like to work more closely with CAQDAS suppliers to ensure use of similar mark-up semantics