UK Archives Discovery Forum, 7th March 2013: An open old↔modern surname index for digitized archives. Patrick Hanks, UWE; Sean Cunningham, TNA; Paul Cullen, UWE; Richard Coates, UWE


UK Archives Discovery Forum, 7th March 2013
An open old↔modern surname index for digitized archives
Patrick Hanks, UWE; Sean Cunningham, TNA; Paul Cullen, UWE; Richard Coates, UWE

Theme of the talk
Of all linguistic and historical data, surnames are among the most unstable. Digitization of archives presents an opportunity for new approaches to the statistical study of issues such as the relationship between surnames and localities. This implies new approaches to the transcription of data, rigorously respecting the distinction between transcription and interpretation. We have some suggestions to make regarding the UKAD objective of developing international agreement on standards for mark-up and database structure.

Overview of the FaNUK project
– Family Names of the United Kingdom (AHRC-funded)
– An XML database
– List of entries from the 1997 electoral roll
– Explanations of origins and history (in progress)
– 1881 census: geographical distribution of surnames before mass mobility and increased immigration
– Select a "main entry" for each cluster of spellings, on the basis of etymology and frequency
– 20,000 main entries; 25,000 modern variant spellings
– Entries are being linked to innumerable medieval spellings
– Medieval name forms could be linked to the main entries in the FaNUK database
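The "main entry" rule above can be sketched in Python. This is a minimal illustration under stated assumptions, not FaNUK's actual procedure. It assumes the etymological grouping of spellings has already been done by a specialist. It also assumes "frequency" simply means a count of bearers per spelling. The function names are hypothetical.

```python
from collections.abc import Mapping


def select_main_entry(cluster: list[str], frequencies: Mapping[str, int]) -> str:
    """Return the most frequent spelling in a cluster of variant spellings.

    The cluster is assumed to be grouped by etymology beforehand; this
    function applies only the frequency criterion. Ties are broken
    alphabetically so that the choice is deterministic.
    """
    if not cluster:
        raise ValueError("cluster must contain at least one spelling")
    # Highest count first; spellings missing from the counts score 0.
    return min(cluster, key=lambda s: (-frequencies.get(s, 0), s))


def build_variant_index(
    clusters: list[list[str]], frequencies: Mapping[str, int]
) -> dict[str, str]:
    """Map every variant spelling to the main entry of its cluster."""
    index: dict[str, str] = {}
    for cluster in clusters:
        head = select_main_entry(cluster, frequencies)
        for spelling in cluster:
            index[spelling] = head
    return index
```

The etymological judgement is deliberately kept outside the code. Frequency can only choose among spellings that an expert has already decided belong together.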

Interface between history, geography, and philology
– H. B. Guppy (1890) was the first to suspect a systematic association between surnames and localities
– Guppy's hypothesis has become increasingly statistically testable since c. …
– Vast amounts of primary historical evidence are becoming available in machine-readable form

Computational analysis of data
We are applying techniques developed in corpus linguistics to the study of this primary onomastic data, e.g. in the statistical association of surnames with localities on the basis of the primary evidence.
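Corpus linguistics often measures this kind of association with Dunning's log-likelihood statistic (G²) on a 2×2 contingency table. The talk does not say which measure the project uses. The sketch below is therefore one plausible choice, applied to a surname and a locality. The function and parameter names are hypothetical.

```python
import math


def log_likelihood_g2(a: int, b: int, c: int, d: int) -> float:
    """Dunning-style log-likelihood (G^2) for a 2x2 contingency table.

    a: bearers of the surname in the locality
    b: bearers of the surname elsewhere
    c: everyone else in the locality
    d: everyone else elsewhere
    """
    n = a + b + c + d
    observed = [a, b, c, d]
    # Row total (surname / not surname) and column total (locality / elsewhere)
    # for each cell, in the same order as `observed`.
    row_totals = [a + b, a + b, c + d, c + d]
    col_totals = [a + c, b + d, a + c, b + d]
    g2 = 0.0
    for o, r, k in zip(observed, row_totals, col_totals):
        if o > 0:  # a zero cell contributes 0 (limit of o * ln o as o -> 0)
            expected = r * k / n
            g2 += o * math.log(o / expected)
    return 2.0 * g2
```

With one degree of freedom, a G² above about 3.84 corresponds to p < 0.05. G² is always non-negative, so it does not by itself show direction. To see whether the surname is over- or under-represented in the locality, compare a with its expected value (a + b)(a + c)/n.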

Pegden and variants: 1881 distributions (Steve Archer's British Surname Atlas)
Locative name from Pegden Farm in Lindfield (Sussex)
[Maps: Pegden, Pagden, Pigden]

Rochester: 1881 distribution (Steve Archer's British Surname Atlas)

Some currently available digitized sources (from TNA and elsewhere)
– Late 14th-century Poll Taxes (ed. C. Fenwick)
– Parish Registers (digitized by members of the Federation of Family History Societies and by the LDS Church: the IGI)
– PROB11 (probate records of the Prerogative Court of Canterbury, …)
– Chancery Proceedings (…)
– Feet of Fines (C …): Chris Phillips
Wonderful resources! But, alas, there are slight differences in format, which make file comparison difficult. Do we need an agreement on a standard format?
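The format problem can be illustrated by mapping each source's fields onto one shared record type. This is a minimal sketch. The column names in FIELD_MAPS are invented for the example, because the actual layouts of the Poll Tax, IGI and other files are not described here. The surname is deliberately copied verbatim rather than normalized.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NameRecord:
    source: str
    surname: str          # copied verbatim: no case-folding or spelling changes
    forename: str
    place: Optional[str]
    year: Optional[int]


# Hypothetical column names for two of the sources; the real file layouts
# are not specified in the talk.
FIELD_MAPS: dict[str, dict[str, str]] = {
    "poll_tax": {"surname": "Surname", "forename": "Forename",
                 "place": "Vill", "year": "Year"},
    "igi": {"surname": "SURNAME", "forename": "GIVEN_NAME",
            "place": "PARISH", "year": "EVENT_YEAR"},
}


def normalize(source: str, raw: dict[str, str]) -> NameRecord:
    """Convert one raw record from a known source into the shared NameRecord shape."""
    fields = FIELD_MAPS[source]
    year_text = raw.get(fields["year"], "").strip()
    return NameRecord(
        source=source,
        surname=raw[fields["surname"]],
        forename=raw.get(fields["forename"], ""),
        place=raw.get(fields["place"]) or None,  # empty string becomes None
        year=int(year_text) if year_text.isdigit() else None,
    )
```

An agreed standard format would make the per-source mapping table unnecessary. Until then, a table like this keeps the conversion explicit and reviewable.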

IGI (193 million records)

14th-century Poll Taxes (200,000 records)

PROB11 (TNA)

An associated project: British Academy funding
Old↔modern surname index to the C14 Poll Tax
There are significant problems in relating medieval forms to modern forms of surnames.
– Many people assume that the relationships are obvious, but all too often they aren't. In many cases, linguistic expertise (supported by circumstantial evidence) is required to make the connections with confidence.
– E.g. Yelling (a Somerset surname) must surely be from Yelling (a place-name in Hunts), even though Somerset is far from Hunts.
– Need for studies of early migration

Example: Sapsford
– Clemens Sabrichworth', 1381 in the Poll Tax (Colchester, Essex)
– Indexed by FaNUK as Sapsford, because that is the usual modern form of the surname (756 bearers)
– The surname derives from Sawbridgeworth (Herts), which until recently was locally pronounced Sapsford or Saps(w)'th
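An old↔modern index of this kind needs lookups in both directions. It should also keep the evidence for each link alongside the link itself. The sketch below is a minimal in-memory version under that assumption. The class and field names are hypothetical, and the usage line records the Sapsford example from this slide.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    old_form: str   # medieval form exactly as transcribed
    headform: str   # selected modern form (main entry)
    source: str     # where the old form is attested
    evidence: str   # why the link is believed to hold


class OldModernIndex:
    """Two-way index between attested old forms and modern headforms."""

    def __init__(self) -> None:
        self._by_old: defaultdict[str, list[Link]] = defaultdict(list)
        self._by_head: defaultdict[str, list[Link]] = defaultdict(list)

    def add(self, link: Link) -> None:
        self._by_old[link.old_form].append(link)
        self._by_head[link.headform].append(link)

    def headforms_for(self, old_form: str) -> list[str]:
        # .get avoids creating empty entries for unknown forms
        return sorted({link.headform for link in self._by_old.get(old_form, [])})

    def old_forms_for(self, headform: str) -> list[str]:
        return sorted({link.old_form for link in self._by_head.get(headform, [])})


index = OldModernIndex()
index.add(Link(
    old_form="Sabrichworth'",
    headform="Sapsford",
    source="1381 Poll Tax, Colchester (Essex)",
    evidence="from Sawbridgeworth (Herts), locally pronounced Sapsford",
))
```

Here index.headforms_for("Sabrichworth'") gives ["Sapsford"], and index.old_forms_for("Sapsford") gives the attested old forms.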

Misidentifications
In the Kent Hundred Rolls Project (… 2006):
– Walwarecchare is modernized confidently but erroneously as Walmer. It should be Waldershare (near Deal).
– Middelton is modernized as Middleton. It should be Milton (Regis) (near Sittingbourne).
– Uppecham is modernized as Petham. It should be (East or Up) Peckham (near Maidstone).
– Stephani de Hokeregg is modernized as Stephen of Hucking. It should be Stephen of Hockeredge (i.e. Hockeredge near Cranbrook).
Better not to modernize at all than to modernize erroneously!

More misidentifications
In The Survey of Archbishop Pecham's Kentish Manors, ed. Kenneth Witney, Kent Records vol. 28 (Kent Arch. Soc., 2000), the medieval forms of names are rarely given, and the names are wrongly identified surprisingly often:
– Ferur is rendered Ferryman (but the modern surname Ferrer = 'smith')
– Ismonger is rendered Fishmonger (but modern Isemonger = 'ironmonger')
– Yue is rendered Yew, as if referring to the tree (but modern Ive is from a personal name Ive)
– Sewen is rendered Sweyn (but modern Sewin is from a personal name Sawin)
– Idoyn is rendered Jordan (but modern Iddon is from the female personal name Idonea)
– Cissor is sometimes rendered Sawyer, sometimes Tailor. Why?

Some other dubious interpretations
In The Northumberland Lay Subsidy Roll of 1296, ed. Constance Fraser (Soc. of Antiquaries of Newcastle upon Tyne, 1968), Thomas ad Fontem has been modernized as Thomas Spring. But Fons/Fontem in medieval Latin normally represents the English surname that later became Well or Wells. (There is indeed a Middle English surname atte Spring, but it refers to residence near a plantation of saplings, not to water, and it would not be represented by ad Fontem.)

Transcribe first; then interpret/translate!
Even the simplest-looking case of a common occupational surname must be treated with caution. For example, "John Mercator" appears in dozens of medieval deeds in Canterbury Cathedral Archives (Kent). The online catalogue used to modernize this as John Merchant. Then the archivist Elizabeth Finn noticed that a seal attached to one of these deeds gives the name as John Chapman.
Likewise, Molinarius may be the modern surname Miller, Milner, or Millward, or something else; Faber may be Smith, Ferrer, or something else.
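The Mercator case suggests treating a Latin by-name as a set of possible modern surnames. A single modern surname is chosen only when independent evidence selects one. The sketch below encodes that rule. Its candidate table contains only the examples from this slide and is not a real reference list. The function name is hypothetical.

```python
from typing import Optional

# Candidate modern surnames for some Latin by-names, taken only from the
# examples on the slide; a real table would be compiled and sourced by specialists.
CANDIDATES: dict[str, set[str]] = {
    "Mercator": {"Merchant", "Chapman"},
    "Molinarius": {"Miller", "Milner", "Millward"},
    "Faber": {"Smith", "Ferrer"},
}


def interpret(latin_form: str, attested_english: Optional[str] = None) -> Optional[str]:
    """Return a modern surname only when independent evidence selects one.

    `attested_english` is an English form recorded elsewhere for the same
    person (e.g. on a seal). Without such evidence, or if the evidence is
    not among the known candidates, return None: the name is left
    unmodernized rather than guessed.
    """
    candidates = CANDIDATES.get(latin_form, set())
    if attested_english in candidates:
        return attested_english
    return None
```

Returning None instead of a default guess follows the principle stated on the previous slides: better not to modernize at all than to modernize erroneously.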

Concluding the problem
Surnames are unstable: over 50% of all current surnames are variants of some other name.
– They need to be studied statistically.
– The (highly variable) medieval and Tudor forms need to be indexed to a selected modern form of each name.
Transcription vs. interpretation:
1. Transcribe verbatim: "diplomatic" transcriptions are preferable; digitize. If a transcription is not diplomatic, declare what level of interpretation is used.
2. Then, separately, translate/interpret the data; don't just assume that the correct modernizations are 'obvious'.
3. Place-name scholars take a similar view: see for example O. J. Padel, 'Place-Names and Calendaring Practices', in M. Hicks (ed.), The 15th-Century Inquisitions Post Mortem: A Companion (Boydell, Woodbridge, 2012).
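Steps 1 and 2 can be reflected directly in a data model. The transcription is stored once, with a declared level. Interpretations are added beside it and never overwrite it. This is a minimal sketch. The three transcription levels are hypothetical labels, since the talk asks only that the level be declared. The class names are likewise illustrative.

```python
from dataclasses import dataclass, field
from enum import Enum


class TranscriptionLevel(Enum):
    # Hypothetical labels; the talk only asks that the level be declared.
    DIPLOMATIC = "diplomatic"            # spelling and abbreviations as in the MS
    SEMI_DIPLOMATIC = "semi-diplomatic"  # abbreviations expanded, spelling kept
    NORMALIZED = "normalized"            # spelling regularized by the editor


@dataclass(frozen=True)
class Interpretation:
    modern_form: str
    interpreter: str
    rationale: str


@dataclass(frozen=True)
class NameEntry:
    # frozen=True stops the transcription and level from being reassigned;
    # the interpretations list itself can still grow.
    transcription: str
    level: TranscriptionLevel
    interpretations: list[Interpretation] = field(default_factory=list)

    def add_interpretation(self, modern_form: str, interpreter: str, rationale: str) -> None:
        # Interpretations sit alongside the transcription, not in place of it,
        # so competing readings can coexist and be revised later.
        self.interpretations.append(Interpretation(modern_form, interpreter, rationale))
```

Keeping the interpreter and rationale with each reading makes a later correction visible rather than silent. The John Mercator/John Chapman case is one example of such a correction.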

Next Steps
In summer 2012 UWE and TNA agreed a partnership for a three-year project to create a digital resource to:
– Identify names in selected name-rich historical records
– Link early name forms to reliable headforms
– Begin indexing and linguistic interpretation
The original plan was to analyse, index and compare three large datasets:
– Fenwick's Poll Tax returns;
– TNA's digitised catalogue descriptions of early Chancery bills and ancient petitions;
– the IGI data.

Intended Outcomes
– Link early forms of names to a reliable inventory of modern names in the FaNUK database
– Identify significant associations between surnames and localities over time
– Develop an Old↔Modern index of surnames
– Plot the continuity between medieval and modern name forms
– Create statistical analytic procedures for re-use with datasets
– Agree a publicly available 'gold standard' of old and modern spellings of every established UK surname (i.e. the nineteenth-century names still extant)

New Departures
For various reasons, further discussion between the partners has refined the scope of the project.
– The project now aims to combine the intended outcomes of the former plan in a funding bid to create an archival tool that delivers linguistically and historically reliable authority data on ancient name forms
– It is intended as a name-relational cataloguing toolkit, to be freely available via the TNA website
– It will cluster variant spellings of surnames according to their derivation and geographical distribution
– It will use the linguistic and onomastic expertise of FaNUK and draw data from a wider pool of earlier TNA record series (in addition to the Poll Tax, Chancery bills, petitions, and the IGI)
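The geographical half of that clustering can be sketched by comparing county-count profiles of spellings with cosine similarity. This is one simple technique chosen for illustration; the talk does not specify a method. The sketch assumes derivation has already been judged by specialists, because distribution alone cannot establish a shared origin. The threshold value and function names are hypothetical.

```python
import math
from collections.abc import Mapping


def cosine_similarity(p: Mapping[str, int], q: Mapping[str, int]) -> float:
    """Cosine similarity of two county-count profiles (1.0 = same shape)."""
    dot = sum(count * q.get(county, 0) for county, count in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def group_by_distribution(
    profiles: Mapping[str, Mapping[str, int]], threshold: float = 0.8
) -> list[list[str]]:
    """Greedy grouping: each spelling (taken in alphabetical order) joins the
    first existing group that has a member at least `threshold` similar to it;
    otherwise it starts a new group. The result depends on processing order.
    """
    groups: list[list[str]] = []
    for name in sorted(profiles):
        for group in groups:
            if any(cosine_similarity(profiles[name], profiles[m]) >= threshold for m in group):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups
```

Cosine similarity compares the shape of two distributions rather than their size. A rare spelling can therefore still be matched with a common one concentrated in the same counties, as the Pegden/Pagden/Pigden maps suggest.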

A toolkit for surname cataloguing
– An open-access toolkit that incorporates the authority of a structured name inventory/database will be of practical value to a wide archival user base
– It will facilitate the recall of name data without compromising (linguistic or onomastic) precision
– It will be maintained and refined as part of TNA's sustainable catalogue technology
– It will be an infrastructure, not an interface
– Name data created to the standards defined at the start of the project (stemming from FaNUK's existing expertise) will broaden the range of evidence included and allow contributions from diverse users over time

Feedback: Questions
– How might the archival community use such a tool?
– How can archivists contribute to its development, range and usability?
– Should there be a mechanism for archival users to contribute name data? (IP and copyright issues)
– What early records data should be created to provide a versatile foundation for comparison with modern name forms?
– How might it be delivered to users as a web tool?
– How might it adapt to the broadening role envisaged for TNA's catalogue?