The spatial patterns in historical texts: combining corpus linguistics and geographical information systems to explore places in Victorian newspapers Exploring Historical Sources with Language Technology: Results and Perspectives Huygens Institute The Hague, 8-9 December 2014 Amelia Joulain-Jay Ian Gregory Andrew Hardie Acknowledgements: Alistair Baron, David Cooper, Chris Donaldson, Daniel Hartmann, Tony McEnery, Patricia Murrieta-Flores, Paul Rayson, C.J. Rupp, Paul Atkinson, Catherine Porter (Lancaster) Sarah Hastings (Mt Holyoke College) Claire Grover (Edinburgh) – geo-parsing; Richard Deswarte – help with the HistPop data; James Baker – British Library
GIS = Geographical Information Systems (a set of tools for manipulating and analysing data with a spatial component)
Funding: European Research Council (ERC)
Base: Lancaster University History Department
Dates:
Goal: develop & apply methodologies for analysing unstructured texts within a GIS environment
Discipline: interdisciplinary team including historians, geographers, literary scholars, linguists and computer scientists
Project leader: Ian Gregory
spatialhum.wordpress/ Aims to bring corpus approaches to a range of social sciences Aims to develop ways of using GIS to facilitate qualitative analysis of text Intersection: Using corpus linguistics (with a focus on places) for historical research
How can we explore the geographies in historical texts?
How can we explore the geographies in historical texts?
Three approaches:
- place-centred reading
- geographical text analysis
- geographically enhanced reading
All of them involve both CLOSE and DISTANT forms of reading.
Examples drawn from…

The Lake District project
- 80 georeferenced texts about the Lake District
- Includes travelogues, poetry, novels and other text-types by more- and less-known authors such as Th. Pennant, S. T. Coleridge, W. Wordsworth, H. Martineau
- Over 1.5 million words
- Available through the geo-text explorer

The Health and Disease project
- Registrar General’s reports for England and Wales
- Reports contain information about births, marriages and deaths + occasional discussions of specific topics
- Over 10,000 pages
- Available on Histpop.org

The c19th newspapers project
- British Library’s c19th newspapers (part 1)
- Full runs of 48 regional & national newspapers published in Great Britain during the c19th
- Over 2 million pages (est. 30+ billion words)
- Available through the GALE CENGAGE portal
Place-centred reading EXAMPLES: LAKE DISTRICT PROJECT: the geo-text explorer C19th NEWSPAPERS: Infant mortality in rural Suffolk Close reading centred on places
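Place-centred reading starts by pulling out the passages that mention a given place so that they can be read closely. A minimal sketch of that first step follows, as a simple concordance in Python. The function name `concordance`, the `width` parameter and the sample sentence are illustrative assumptions, not part of the geo-text explorer.

```python
import re

def concordance(text, place, width=40):
    """Return (left context, match, right context) for each mention of `place`.

    Hypothetical helper: matches the place-name as a whole word, case-sensitively.
    """
    pattern = re.compile(r"\b" + re.escape(place) + r"\b")
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left, m.group(), right))
    return lines

# Made-up sample text, for illustration only
sample = ("The drains of Haverhill were foul. In Sudbury the town was cleansed. "
          "Haverhill had no Local Board.")
for left, hit, right in concordance(sample, "Haverhill", width=15):
    print(f"{left:>15} [{hit}] {right}")
```

In practice the place-names would come from a geo-parsed text rather than a plain string search, so that ambiguous or variant spellings resolve to the same location.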
The geo-text explorer
Infant Mortality in Rural Suffolk
With thanks to Sarah Hastings, Mt Holyoke College
Infant Mortality in Rural Suffolk

Subject: Search-terms
Sanitary conditions: Sewage, sewerage, nuisance(s), ditch(es), drain(s), foul, pit, cesspit(s), cesspool(s), stench, manure, pollut[ion/ing]
Sanitary authorities and inspections: Inspector(s), inspection, sanitary inspector(s), inspector(s) of nuisances, sanitation committee(s), authorit[y/ies], Board of Guardians, surveyor(s), landlord(s), responsibilit[y/ies], sanitation board, Rural Sanitary Authorit[y/ies]
Housing: cottage(s), farmhouse(s), yard, roof, floor, seep(ing), crowd(ed), structure, build(ing), construction, home(s), visit(ation), inspect(ion)
Local Government: Local Government Board, Chancellor, Board of Guardians, boundar[y/ies], divided parishes, boundary act, rural district, west Suffolk, legislation, Parliament, commissioners, committee, chair(s), parochial, deanery, church
Citizenship and labour: citizen(s), labourer(s), labour, employ(ment), employees, dut[y/ies], moral(s/ity), ethic(al/s), pride, improve(ment), civil, civilian, parishioner(s)
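The search-terms above use a shorthand in which round brackets seem to mark an optional ending, as in ditch(es), and square brackets a choice of endings, as in authorit[y/ies]. A minimal sketch of turning that shorthand into regular expressions follows. The reading of the brackets is an assumption, and `shorthand_to_regex` and `find_terms` are hypothetical helper names, not tools used in the project.

```python
import re

def shorthand_to_regex(term):
    """Compile a term such as 'ditch(es)' or 'authorit[y/ies]' to a regex.

    Assumed convention: (a/b) is an optional ending, [a/b] a required one.
    """
    parts = []
    i = 0
    while i < len(term):
        ch = term[i]
        if ch in "([":
            close = ")" if ch == "(" else "]"
            j = term.index(close, i)
            alternatives = "|".join(re.escape(a) for a in term[i + 1:j].split("/"))
            parts.append(f"(?:{alternatives})" + ("?" if ch == "(" else ""))
            i = j + 1
        else:
            parts.append(re.escape(ch))
            i += 1
    # Whole-word, case-insensitive matching
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)

def find_terms(text, terms):
    """Count the matches of each shorthand term in `text`."""
    return {t: len(shorthand_to_regex(t).findall(text)) for t in terms}

# Made-up sample sentence, for illustration only
sample = "The ditches and drains were foul; the sanitary authorities ignored the cesspool."
print(find_terms(sample, ["ditch(es)", "drain(s)", "authorit[y/ies]", "pollut[ion/ing]"]))
```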
Infant Mortality in Rural Suffolk

About Risbridge
- “It is an old maxim that the rulers’ sin and pestilence visits the people. The sanitary condition of this town unhappily affords another illustration of this common truth.” The Bury and Norwich Post and Suffolk Herald, September 12, 1865; pg. 7.
- “…showing the abominable state of things existing in Haverhill, he entirely agreed with him that no place required a Local Board more.” The Bury and Norwich Post and Suffolk Herald, August 28, 1877; pg. 6.

About Sudbury
- “The town should be thoroughly cleansed, the poorer residents instructed what to do if the disease broke out, and some fixed uniform plan resolved upon. A Committee (already appointed by the Corporation), who should, if necessary, take legal proceedings in any case where remonstrance and suggestion did not avail. The drains should be well seen to; disinfectants provided, to be given gratuitously to the poor…” The Bury and Norwich Post and Suffolk Herald, August 07, 1866; pg. 8.
Geographical text analysis EXAMPLES: C19TH NEWSPAPERS: European countries in The Era HEALTH AND DISEASE: Registrar general about water-borne diseases Asks: What places are mentioned? (and uses GIS to answer) What is said about these places? (and uses collocation to answer)
[Figure: Mentions of European countries in the Era. X = year, Y = number of occurrences. Series: Russia, France, Turkey. Annotated events: Crimean war, Russo-Turkish war, Franco-Prussian war, 2nd opium war.]
[Figure: Mentions of European countries in the Era. X = year, Y = number of occurrences.]
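The per-year counts behind charts like these can be sketched as follows. This is only an illustration: it assumes the newspaper is available as (year, text) pairs, and `mentions_per_year` is a hypothetical name. Matching is case-sensitive and whole-word, which is a crude way of separating the country "Turkey" from the bird.

```python
import re
from collections import Counter

def mentions_per_year(issues, places):
    """issues: iterable of (year, text). Returns {place: Counter({year: count})}."""
    patterns = {p: re.compile(r"\b" + re.escape(p) + r"\b") for p in places}
    counts = {p: Counter() for p in places}
    for year, text in issues:
        for place, pattern in patterns.items():
            counts[place][year] += len(pattern.findall(text))
    return counts

# Made-up sample issues, for illustration only
sample_issues = [
    (1854, "Russia and Turkey are at war. Russia advances."),
    (1855, "France sends troops."),
    (1855, "News from Russia."),
]
counts = mentions_per_year(sample_issues, ["Russia", "France", "Turkey"])
for place, by_year in counts.items():
    print(place, sorted(by_year.items()))
```

Raw counts depend on how much text survives for each year, so a comparison across years would normally divide by the number of words or issues per year.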
Mentions of European countries in the Era
[Table: collocates of ‘Russia’. Columns: Ranking; Word; Total no. in whole corpus; Times it occurs with ‘Russia’; Issues in which it occurs with ‘Russia’; Log Ratio value (strength of association).]
Words: 1 Bristle, Turkey, SHEETiNG, AUSTRIA, RIGA, Prussia, PETERSBURG, Emperor, Italy, Sweden, Moscow, Vladimir, Czar, Constantine, Poland, Germany, Emperors, Empress, intrigues, PERSIA, leather, Jews, invasion, Hungary, Handles
Mentions of European countries in the Era
[Table: collocates of ‘Russia’. Columns: Ranking; Word; Total no. in whole corpus; Times it occurs with ‘Russia’; Issues in which it occurs with ‘Russia’; Log Ratio value (strength of association).]
Words: 23 intrigues, invasion, alliance, war, relations, treaty, policy, designs, commerce, forces, interior, demands, ally, peace, influence, views, interests, power, powers, states
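The tables rank collocates by Log Ratio. The slides do not give the formula, so the sketch below assumes the usual reading of the measure: the binary logarithm of the ratio between a word's relative frequency near ‘Russia’ and its relative frequency in the rest of the corpus. The function name, the parameter names and the 0.5 adjustment for zero counts are assumptions.

```python
import math

def log_ratio(near, window_size, elsewhere, rest_size):
    """Binary log of (relative frequency near the node / relative frequency elsewhere).

    near        -- occurrences of the word within the window around the node
    window_size -- total tokens inside all such windows
    elsewhere   -- occurrences of the word in the rest of the corpus
    rest_size   -- total tokens in the rest of the corpus
    """
    if near <= 0:
        raise ValueError("the word must occur at least once near the node")
    if elsewhere == 0:
        elsewhere = 0.5  # assumed adjustment to avoid dividing by zero
    return math.log2((near / window_size) / (elsewhere / rest_size))

# Made-up figures: 8 hits in 1,000 window tokens against 10 in 10,000 other tokens
print(round(log_ratio(8, 1000, 10, 10000), 3))
```

Each additional point doubles the ratio, so a value of 3 means the word is about eight times more frequent near the node than elsewhere.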
RG’s discussion of water-borne diseases
Cholera, diarrhoea, dysentery
Kulldorff Scan Statistic
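The scan statistic looks for the zone whose observed deaths most exceed what would be expected. A minimal sketch of the scoring step follows, assuming the Poisson form of the statistic and a fixed set of candidate zones. The slide does not say which model or which windows were used, and the function names and figures are illustrative.

```python
import math

def poisson_llr(cases_in, expected_in, cases_total):
    """Log-likelihood ratio for one candidate zone (Poisson model).

    expected_in is scaled so that expected counts sum to cases_total over
    the whole study area. Zones without an excess of cases score 0.
    """
    c, e, total = cases_in, expected_in, cases_total
    if c <= e:
        return 0.0
    inside = c * math.log(c / e)
    outside = (total - c) * math.log((total - c) / (total - e)) if total > c else 0.0
    return inside + outside

def most_likely_zone(zones, cases_total):
    """zones: {name: (observed, expected)}. Returns the highest-scoring zone."""
    return max(zones, key=lambda z: poisson_llr(*zones[z], cases_total))

# Made-up counts, for illustration only
sample_zones = {"A": (20, 10), "B": (12, 10), "C": (5, 10)}
print(most_likely_zone(sample_zones, cases_total=100))
```

A full analysis would scan windows of varying size around each location and assess the top score by Monte Carlo simulation; neither step is shown here.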
Supply, supplied, company, companies, sewage, reservoirs and waterworks – 860 collocations with place-names RG’s discussion of water-borne diseases
RG’s discussion of water-borne diseases
Deaths, highest 10: Liverpool, West Derby, Birmingham, Sheffield, Chorlton, Salford, Leeds, Aston, West Ham, Leicester (highest London: Southwark, 14th)
Death rate, highest 10: Liverpool, Preston, Manchester, Salford, Leicester, Birmingham, Prestwich, St. Olave Bermondsey, Whitechapel, and St. George in the East
RG’s discussion of water-borne diseases
[Table. Columns: City; Number of deaths (per decade); Of which percentage of total deaths; In text: no. of co-occurrences with water-borne diseases; Of which percent of total mentions of water-borne diseases (in text).]
Rows: London 33,…; Liverpool 10,…; Manchester 10,…; Birmingham 6,…; Sheffield 3,…; Leeds 2,…; Bristol 1,…; Portsmouth 1,…; Southampton; England & Wales 206,…
Legend: South of London; North of London
RG’s discussion of water-borne diseases

Words that collocate with sentences that contain place-names and cholera/diarrhoea/dysentery:

London:
- Water related: water (n=154, z=32.755); water-fields (11, 31.34); supplied (59, 26.49); companies (34, 23.75); waterfields (7, 22.39); elevation (21, 22.31); company (46, 22.00); supplying (9, 14.51); elevations (6, 12.66); waters (17, 10.88); supply (32, 10.86); impure (11, 10.60); ditches (4, 8.90); pipes (6, 8.14); matter [organic or cholera] (15, 7.04); waterworks (5, 6.723); etc.
- Research related: map (n=8, z=14.98); extract [from report] (5, 12.16); circular [a circular] (4, 10.09); exhibiting (4, 9.44); Professor (6, 7.54); diagrams (3, 6.98); Dr. (17, 5.73); report (32, 5.67); return [a return] (10, 3.81)
- Descriptive: epidemic (n=56, z=18.47); outbreak (19, 15.70); infected (5, 6.56); epidemics (5, 3.58)

Liverpool, Manchester, Birmingham:
- Descriptive: prevailed, epidemic, occurred, deaths, prevalent, mortality
- E.g. “In Liverpool more children of this age died from lung diseases than from diarrhoea or diseases of the brain, and the high rate of mortality there was mainly due to those three causes…” 37th Annual Report, 1874
- E.g. “If English towns are selected for comparison it will be seen that the borough of Liverpool was the most unhealthy in 1866; for by a malignant fever in winter and cholera in summer, the mortality of the year was raised to 4.185, while that of Manchester was 3.195”
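The z-scores above measure how far a word's observed co-occurrence count lies above the count expected by chance. The slides do not state the formula, so this sketch assumes a commonly used window-based form of the collocation z-score; the function name, parameter names and figures are illustrative, and a sentence-based variant would define the expected count differently.

```python
import math

def collocation_z(observed, collocate_freq, node_freq, corpus_size, span):
    """Assumed window-based z-score for a node/collocate pair.

    observed       -- co-occurrences of collocate and node within the span
    collocate_freq -- corpus frequency of the collocate
    node_freq      -- corpus frequency of the node
    corpus_size    -- total tokens in the corpus
    span           -- number of token positions examined around each node
    """
    p = collocate_freq / (corpus_size - node_freq)  # chance of the collocate at any position
    expected = p * node_freq * span
    return (observed - expected) / math.sqrt(expected * (1 - p))

# Made-up figures, for illustration only
z = collocation_z(observed=30, collocate_freq=1000, node_freq=100,
                  corpus_size=1_000_100, span=10)
print(round(z, 2))
```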
Geographically enhanced reading EXAMPLES: LAKE DISTRICT PROJECT: Mapping routes using cost-surface analysis Involves extrapolating geographical patterns which are alluded to in the texts
Mapping routes using cost-surface analysis Focus on 4 tours: Young (1768), Pennant (1769), Gray (1769) and Pennant (1774)
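Cost-surface analysis treats the landscape as a grid in which each cell has a cost of crossing it, then finds the cheapest route between two places named in a tour. A minimal sketch using Dijkstra's algorithm on a small grid follows. The grid values, the four-neighbour movement rule and the function name are assumptions; a real cost surface would be derived in a GIS from terrain and other layers.

```python
import heapq

def least_cost_path(cost, start, goal):
    """Cheapest 4-neighbour route across a grid of positive cell costs.

    cost  -- list of rows; cost[r][c] is the cost of entering cell (r, c)
    Returns (total cost, list of cells from start to goal).
    """
    rows, cols = len(cost), len(cost[0])
    best = {start: 0.0}
    previous = {}
    heap = [(0.0, start)]
    while heap:
        dist, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if dist > best[cell]:
            continue  # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                new_dist = dist + cost[nr][nc]
                if new_dist < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_dist
                    previous[(nr, nc)] = cell
                    heapq.heappush(heap, (new_dist, (nr, nc)))
    if goal not in best:
        raise ValueError("goal is not reachable")
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return best[goal], path[::-1]

# Made-up surface: the middle column stands for a costly ridge
surface = [[1, 9, 1],
           [1, 9, 1],
           [1, 1, 1]]
total, route = least_cost_path(surface, (0, 0), (0, 2))
print(total, route)
```

The route skirts the high-cost column even though crossing it directly would be shorter in distance, the same logic by which a modelled route follows valleys rather than fells.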
Conclusion (findings)
These approaches have allowed us to learn about:
- The importance of governance in facilitating or hindering improvements in infant mortality in rural Suffolk, end c19th
- The registrar general’s biases when discussing water-borne diseases in the mid c19th
- The importance of local road infrastructure for the development of tourism in the Lake District, end c18th
Conclusion (methods)
Results are promising. Combining tools from NLP, corpus linguistics and GIS, we can:
- Focus reading on passages mentioning specific places
- Summarise the geographies in large volumes of texts
- Explore in detail what is being said about particular places
- Extrapolate from the basic geographies in text
But also:
- Interdisciplinary collaborations are fruitful
- Close and distant forms of reading complement each other well
- Quantitative and qualitative sources complement each other well
- There’s still a long way to go
Thank you for listening! NB: GIS summer school at Lancaster in July; info: contact Ian Gregory or Andrew Hardie