Using ESDS data in Linguistics and NLP Dr. Kakia Chatsiou ESDS/UK Data Archive Language and Computation Group Day 07 Oct 2011


1 Using ESDS data in Linguistics and NLP Dr. Kakia Chatsiou ESDS/UK Data Archive Language and Computation Group Day 07 Oct 2011 http://lac.essex.ac.uk/lacday2011

2 What is ESDS?
Economic and Social Data Service
national data archiving and dissemination service (since January 2003)
access and specialist support for key economic and social data resources to UK Higher and Further Education users
brings together centres of expertise in data creation, dissemination, preservation and use in Manchester and Essex
managed by the UK Data Archive (established in 1967); jointly supported by Economic and Social Research Council (ESRC) & Joint Information Systems Committee (JISC)

3 http://www.esds.ac.uk

4 ESDS in numbers
6,000 datasets in the collection
230 new datasets added each year
over 22,000 registered users
approximately 60,000 downloads worldwide p.a.
3,000+ user support queries

5 Data collections we hold
Through our dedicated services we provide access to:
surveys
government data
aggregate statistics
censuses
international data
longitudinal data
qualitative data - multimedia data sources
historical data

6 ESDS Linguistics data offers
Year          Data offers
2004          2
2005          6
2006          11
2007          15
2008          13
2009          10
2010          7
2011 (Jan)    3
Total         67
From ESRC grants
19 accepted
rest unable to accept (due to confidentiality or size reasons) or referred to more suitable archives (e.g. Oxford Text Archive, CHILDES/Talkbank database)
increase in depositing after researcher self-archive (UKDA-Store) launch

7 ESDS data holdings on linguistics & related fields
40 main catalogue data collections with language and linguistics subject category, accessible from the main ESDS Data Catalogue (14 qualitative, 18 quantitative, 8 historical)
all qualitative studies comprising in-depth interview transcripts or audio recordings can be used as corpus material or data sources for secondary analysis, e.g. Family Life And Work Experience Before 1918 (Edwardians) (SN 2000), Pioneers interview collections
13 UKDA-Store data collections with ‘linguistics’ as the primary discipline

8 Examples of ESDS data collections with subject term “Language and Linguistics”
6228 Discourse of the School Dinners Debate, 2004-2008
6402 Urban Classroom Culture and Interaction, 2005-2007
6790 Dynamic Variability in Speech: a Forensic Phonetic Study of British English, 2006-2007
6259 Identities in Neighbour Discourse: Community, Conflict and Exclusion, 2004-2006
5271 British Migrants in Spain: the Extent and Nature of Social Integration, 2003-2005
6127 Linguistic Innovators: the English of Adolescents in London, 2004-2005
5200 Devolution and Identity in Northern Ireland: a Longitudinal Discursive Study, 2003-2004
4457 Phonological Memory as a Predictor of Language Development in Down Syndrome, 1995 and 2001
4634 Transnational Seafarers, 1999-2001
4632 Dutch Map Task Corpus, 1999
3991 Profiling Elements of Prosodic Systems in Children (PEPS-C), 1997-1998
3556 Age of Acquisition, Frequency, Concreteness and Imageability Ratings for Welsh Words and Their English Equivalents, 1995-1996
5487 Literary Practices and the Mass-Observation Project, 1992-1993
3435 Welsh Social Survey, 1992; Including Welsh House Condition Survey, 1992
4896 English People, 1965-1990
4897 Language People, 1965-1986
2715 Northern Ireland Transcribed Corpus of Speech, 1973-1980
430 U.K. County Data, 1851-1966
5251 Study of the Abelam of Papua New Guinea and the Nso of Cameroon, 1939-1963
2947 Susanne Corpus, 1961
3821 Social History of the Welsh Language: Evidence of the 1891 Census; Project 2

9 Examples of linguistics data holdings in UKDA-Store

10 Linguist users of ESDS data
51 self-reported linguists (out of around 22,000)
about 30 of these downloaded ESDS data, the majority of them being survey data, then qualitative interviews and a few historical data downloads
the rest might well have accessed documentation, study methods and instruments about studies (but since these do not require registration, we cannot report usage)

11 How linguists have used ESDS data
a researcher and their team based at the University of Sheffield used 2 audio collections for analysis of speech patterns (SN 2000 - Edwardians; SN 5407 - Health And Social Consequences Of The Foot And Mouth Disease Epidemic In North Cumbria)
an ESRC joint project between the UK Data Archive and the Language Processing team at the University of Edinburgh used three classic social science collections to test natural language processing tools. They looked at named entity recognition on typical social science interview data. Person-based identification enabled the testing of an anonymisation tool.
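
To make the link between person identification and anonymisation concrete, here is a minimal Python sketch. It is not the tool the project tested: it assumes the person names have already been found (for instance by a named entity recogniser) and are passed in as a plain list, and it simply swaps each name for a placeholder. The function name anonymise, the placeholder format and the sample sentence are all invented for illustration.

```python
import re


def anonymise(text, person_names):
    """Replace each listed person name with a stable placeholder.

    person_names stands in for the output of a named entity recogniser;
    no recognition is performed here (illustrative assumption).
    """
    mapping = {}
    # Longest names first, so "Mary Smith" is matched before "Mary".
    for name in sorted(person_names, key=len, reverse=True):
        mapping.setdefault(name, f"[PERSON_{len(mapping) + 1}]")
    if not mapping:
        return text, mapping
    # One alternation over all names; \b keeps matches on word boundaries.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in mapping) + r")\b"
    )
    return pattern.sub(lambda m: mapping[m.group(1)], text), mapping


# Invented sample sentence, not taken from any ESDS collection.
clean, key = anonymise(
    "Mary Smith said that John helped Mary.",
    ["Mary", "John", "Mary Smith"],
)
print(clean)
```

The returned mapping lets the same placeholder be reused across a transcript; a real anonymisation workflow would also need to handle nicknames, places and indirect identifiers, which this sketch ignores.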

12 ESDS data uses by Linguists
a JISC project between EDINA and the UK Data Archive used the HISTPOP collection at the UK Data Archive to augment resource search and discovery methods.
–data and metadata were fed to GeoDigRef and LTG GeoParser
–the enriched data were embedded in an experimental geographical service by EDINA
–allows users to search resource collections via a map-based interface, which provides links back to the reference of the place-name in the original resource

13 That sounds interesting! Where to look for relevant data?
ESDS data catalogue (homepage)
Some of these options can be used to find data:
–search the ESDS Catalogue (simple or advanced search)
–search variables
–browse Major Studies list
–browse the latest releases

14 Finding data: Searching the Data Catalogue

15 Finding data: Sample data catalogue record

16 Finding Data: Sample Documentation

17 Where to find more data

18 Finding Data: our researcher self-archiving UKDA-Store

19 Accessing data
Documentation is freely available to anyone
Users must be registered with ESDS to download or access data
You can use your university username & password to register
Access to some data is limited to users at UK Higher or Further Education Institutions
We currently have approx. 22,000 registered users

20 How to access data
register with ESDS
agree to the terms & conditions of the End User Licence
select the dataset from the Data Catalogue and click ‘Download/Order’
specify a usage/project for which the data are to be used
then:
–download data selecting your preferred format (SPSS, Stata, TAB etc.) or
–place an online order for the data
for more see http://www.esds.ac.uk/support/e2.asp
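
Once a dataset has been downloaded in TAB (tab-delimited) format, it can be read with standard tools. The following is a minimal Python sketch using only the standard library; the column names and values are invented stand-ins, not the layout of any actual ESDS file, and read_tab is a hypothetical helper name.

```python
import csv
import io

# Hypothetical stand-in for a downloaded tab-delimited (TAB) file.
# The column names and values are invented for illustration only.
SAMPLE_TAB = "speaker_id\tage\tregion\nS01\t34\tEssex\nS02\t71\tCumbria\n"


def read_tab(handle):
    """Return the rows of a tab-delimited file as a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


rows = read_tab(io.StringIO(SAMPLE_TAB))
for row in rows:
    # Every value is read as a string; convert types as needed.
    print(row["speaker_id"], row["region"])
```

For a real download, the in-memory sample would be replaced by a file opened with open(path, newline=""), using whatever encoding the study documentation specifies.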

21 How to access data

22 Teaching resources ESDS can help provide support in many areas of teaching and research methods –teaching datasets –thematic guides, e.g. on health and crime –guides on: data collection and use data sharing and data management confidentiality, consent and ethics issues survey and research design and analysis software for analysing data –case studies of re-use –training events and workshops recently involved in creating formal assessments based on Qualitative data collections (TALIF grant with Dept of Sociology, Essex)

23 Workshops and training
Thematic data resources events
Help with using data
–specific datasets
–data handling skills
–methodological issues
–analytical skills - introductory and advanced level
We are pro-active and re-active, so ask us if you want to have a workshop!
Forthcoming events: http://www.esds.ac.uk/news/esdsforthevents.asp

24 Other UK Data Archive services

25 Thank you! Questions?

26 References Corti, Louise. (2011, 11 Jan). Report on Linguists’ use of ESDS. ESDS/UK Data Archive.

