The spatial patterns in historical texts: combining corpus linguistics and geographical information systems to explore places in Victorian newspapers


1 The spatial patterns in historical texts: combining corpus linguistics and geographical information systems to explore places in Victorian newspapers
Exploring Historical Sources with Language Technology: Results and Perspectives
Huygens Institute, The Hague, 8-9 December 2014
Amelia Joulain-Jay, Ian Gregory, Andrew Hardie
Acknowledgements: Alistair Baron, David Cooper, Chris Donaldson, Daniel Hartmann, Tony McEnery, Patricia Murrieta-Flores, Paul Rayson, C.J. Rupp, Paul Atkinson, Catherine Porter (Lancaster); Sarah Hastings (Mt Holyoke College); Claire Grover (Edinburgh) – geo-parsing; Richard Deswarte – help with the HistPop data; James Baker – British Library

2 http://www.lancaster.ac.uk/fass/projects/spatialhum.wordpress/
GIS = Geographical Information Systems (a set of tools for manipulating and analysing data with a spatial component)
Funding: European Research Council (ERC)
Base: Lancaster University History Department
Dates: 2012-2016
Goal: develop & apply methodologies for analysing unstructured texts within a GIS environment
Discipline: interdisciplinary team including historians, geographers, literary scholars, linguists and computer scientists
Project leader: Ian Gregory

3 http://www.lancaster.ac.uk/fass/projects/spatialhum.wordpress/ : Aims to develop ways of using GIS to facilitate qualitative analysis of text
http://cass.lancs.ac.uk/ : Aims to bring corpus approaches to a range of social sciences
Intersection: Using corpus linguistics (with a focus on places) for historical research

4 How can we explore the geographies in historical texts?

5 Can we explore the geographies in historical texts?
Three approaches:
- place-centred reading
- geographical text analysis
- geographically enhanced reading
All of them involve both CLOSE and DISTANT forms of reading.

6 Examples drawn from…

| The Lake District project | The Health and Disease project | The c19th newspapers project |
|---|---|---|
| 80 georeferenced texts about the Lake District | Registrar General’s reports for England and Wales | British Library’s c19th newspapers (part 1) |
| Includes travelogues, poetry, novels and other text-types by more- and less-known authors such as Th. Pennant, S. T. Coleridge, W. Wordsworth, H. Martineau | Reports contain information about births, marriages and deaths + occasional discussions of specific topics | Full runs of 48 regional & national newspapers published in Great Britain during the c19th |
| Over 1.5 million words | Over 10,000 pages | Over 2 million pages (est. 30+ billion words) |
| 1622-1900 | 1840-1880 | 1800-1900 |
| Available through the geo-text explorer | Available on Histpop.org | Available through the GALE CENGAGE portal |

7 Place-centred reading
Close reading centred on places
EXAMPLES:
- LAKE DISTRICT PROJECT: the geo-text explorer
- C19th NEWSPAPERS: Infant mortality in rural Suffolk

8 http://www.lancaster.ac.uk/fass/projects/spatialhum/geotext/ The geo-text explorer

9 http://www.lancaster.ac.uk/fass/projects/spatialhum/geotext/ The geo-text explorer

10 Infant Mortality in Rural Suffolk With thanks to Sarah Hasting, Mt Holyoke College

11 Infant Mortality in Rural Suffolk With thanks to Sarah Hasting, Mt Holyoke College

12 Infant Mortality in Rural Suffolk

| Subject | Search-terms |
|---|---|
| Sanitary conditions | Sewage, sewerage, nuisance(s), ditch(es), drain(s), foul, pit, cesspit(s), cesspool(s), stench, manure, pollut[ion/ing] |
| Sanitary authorities and inspections | Inspector(s), inspection, sanitary inspector(s), inspector(s) of nuisances, sanitation committee(s), authorit[y/ies], Board of Guardians, surveyor(s), landlord(s), responsibilit[y/ies], sanitation board, Rural Sanitary Authorit[y/ies] |
| Housing | cottage(s), farmhouse(s), yard, roof, floor, seep(ing), crowd(ed), structure, build(ing), construction, home(s), visit(ation), inspect(ion) |
| Local Government | Local Government Board, Chancellor, Board of Guardians, boundar[y/ies], divided parishes, boundary act, rural district, west Suffolk, legislation, Parliament, commissioners, committee, chair(s), parochial, deanery, church |
| Citizenship and labour | citizen(s), labourer(s), labour, employ(ment), employees, dut[y/ies], moral(s/ity), ethic(al/s), pride, improve(ment), civil, civilian, parishioner(s) |
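The search-terms above use a shorthand for word variants. As a rough illustration only, the sketch below turns that shorthand into regular expressions. It assumes that round brackets mark an optional ending (`drain(s)`, `moral(s/ity)`) and square brackets a required choice of endings (`authorit[y/ies]`); the slide does not define the notation, and `term_to_regex` is a hypothetical helper, not a tool used by the project.

```python
import re

def term_to_regex(term: str) -> str:
    """Translate the slide's shorthand into a regular-expression string.

    Assumed reading of the notation (not stated on the slide):
      word(s)      -> optional ending: word | words
      word(a/b)    -> optional ending, one of several: word | worda | wordb
      stem[a/b]    -> required ending, one of several: stema | stemb
    """
    parts = []
    i = 0
    while i < len(term):
        ch = term[i]
        if ch in "([":
            closing = ")" if ch == "(" else "]"
            end = term.index(closing, i)
            options = [re.escape(opt) for opt in term[i + 1:end].split("/")]
            group = "(?:" + "|".join(options) + ")"
            # Round brackets are optional endings; square brackets are required.
            parts.append(group + ("?" if ch == "(" else ""))
            i = end + 1
        elif ch == " ":
            parts.append(r"\s+")  # tolerate line breaks and double spaces
            i += 1
        else:
            parts.append(re.escape(ch))
            i += 1
    # Word boundaries stop 'drain(s)' from matching inside 'drainage'.
    return r"\b" + "".join(parts) + r"\b"

drain = re.compile(term_to_regex("drain(s)"), re.IGNORECASE)
authority = re.compile(term_to_regex("authorit[y/ies]"), re.IGNORECASE)
print(bool(drain.search("The drains should be well seen to")))
print(bool(authority.search("the Rural Sanitary Authorities")))
```

Matching is case-insensitive here because newspaper OCR often varies in capitalisation; that choice is also an assumption.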

13 Infant Mortality in Rural Suffolk
About Risbridge
– “It is an old maxim that the rulers’ sin and pestilence visits the people. The sanitary condition of this town unhappily affords another illustration of this common truth.” The Bury and Norwich Post and Suffolk Herald, September 12, 1865; pg. 7.
– “…showing the abominable state of things existing in Haverhill, he entirely agreed with him that no place required a Local Board more.” The Bury and Norwich Post and Suffolk Herald, August 28, 1877; pg. 6
About Sudbury
– “The town should be thoroughly cleansed, the poorer residents instructed what to do if the disease broke out, and some fixed uniform plan resolved upon. A Committee (already appointed by the Corporation), who should, if necessary, take legal proceedings in any case where remonstrance and suggestion did not avail. The drains should be well seen to; disinfectants provided, to be given gratuitously to the poor…” The Bury and Norwich Post and Suffolk Herald, August 07, 1866; pg. 8.

14 Geographical text analysis
Asks:
- What places are mentioned? (and uses GIS to answer)
- What is said about these places? (and uses collocation to answer)
EXAMPLES:
- C19TH NEWSPAPERS: European countries in The Era
- HEALTH AND DISEASE: Registrar general about water-borne diseases

15 Mentions of European countries in the Era
[Chart: X = year, Y = number of occurrences; series: Russia, France, Turkey]
Annotated periods:
- 1854-1856: Crimean war
- 1856-1860: 2nd opium war
- 1870-1871: Franco-Prussian war
- 1877-1878: Russo-Turkish war

16 Mentions of European countries in the Era
[Chart: X = year, Y = number of occurrences]
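The charts on these two slides plot how often each country name occurs per year. A minimal sketch of that kind of count follows, assuming the corpus is available as (year, text) pairs; the sample texts are invented for illustration and `mentions_per_year` is a hypothetical helper, not the project's pipeline.

```python
import re
from collections import Counter

def mentions_per_year(issues, place):
    """Count whole-word, case-insensitive occurrences of `place` per year.

    `issues` is an iterable of (year, text) pairs.
    """
    pattern = re.compile(r"\b" + re.escape(place) + r"\b", re.IGNORECASE)
    counts = Counter()
    for year, text in issues:
        counts[year] += len(pattern.findall(text))
    return dict(sorted(counts.items()))

# Invented sample data, not taken from The Era.
sample = [
    (1854, "News from Russia. The fleet of Russia sailed."),
    (1854, "Turkey and RUSSIA at war."),
    (1855, "No foreign news this week."),
]
print(mentions_per_year(sample, "Russia"))
```

Raw counts like these depend on how much text survives for each year, so comparing years fairly would probably need normalising by corpus size per year; the slides do not say whether that was done.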

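17 Mentions of European countries in the Era

| Ranking | Word | Total no. in whole corpus | Times it occurs with ‘Russia’ | Issues in which it occurs with ‘Russia’ | Log Ratio value (strength of association) |
|---|---|---|---|---|---|
| 1 | Bristle | 110 | 30 | | 11.453 |
| 4 | Turkey | 2879 | 237 | 168 | 9.389 |
| 5 | SHEETiNG | 150 | 12 | | 9.344 |
| 6 | AUSTRIA | 4720 | 349 | 311 | 9.221 |
| 7 | RIGA | 179 | 13 | 11 | 9.193 |
| 8 | Prussia | 4463 | 302 | 285 | 9.083 |
| 9 | PETERSBURG | 1598 | 98 | 91 | 8.932 |
| 10 | Emperor | 12725 | 718 | 491 | 8.804 |
| 11 | Italy | 7197 | 392 | 383 | 8.75 |
| 12 | Sweden | 1069 | 46 | | 8.393 |
| 13 | Moscow | 1182 | 50 | 49 | 8.367 |
| 14 | Vladimir | 390 | 14 | 12 | 8.121 |
| 15 | Czar | 2443 | 85 | 80 | 8.074 |
| 17 | Constantine | 834 | 21 | 19 | 7.593 |
| 18 | Poland | 2815 | 65 | 62 | 7.465 |
| 19 | Germany | 7084 | 161 | 153 | 7.442 |
| 20 | Emperors | 462 | 10 | 8 | 7.37 |
| 22 | Empress | 5405 | 99 | 85 | 7.124 |
| 23 | intrigues | 604 | 11 | 8 | 7.115 |
| 24 | PERSIA | 1350 | 21 | 18 | 6.884 |
| 25 | leather | 4038 | 60 | 58 | 6.817 |
| 26 | Jews | 1301 | 19 | 16 | 6.792 |
| 27 | invasion | 1092 | 15 | 14 | 6.702 |
| 28 | Hungary | 834 | 10 | 9 | 6.503 |
| 29 | Handles | 1598 | 18 | | 6.412 |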

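18 Mentions of European countries in the Era

| Ranking | Word | Total no. in whole corpus | Times it occurs with ‘Russia’ | Issues in which it occurs with ‘Russia’ | Log Ratio value (strength of association) |
|---|---|---|---|---|---|
| 23 | intrigues | 604 | 11 | 8 | 7.115 |
| 27 | invasion | 1092 | 15 | 14 | 6.702 |
| 43 | alliance | 2812 | 17 | | 5.507 |
| 44 | war | 31701 | 184 | 150 | 5.447 |
| 47 | relations | 3987 | 21 | | 5.307 |
| 50 | treaty | 5181 | 21 | 20 | 4.927 |
| 53 | policy | 10285 | 38 | 37 | 4.793 |
| 55 | designs | 9259 | 30 | 27 | 4.603 |
| 56 | commerce | 3101 | 10 | | 4.596 |
| 57 | forces | 5038 | 16 | 14 | 4.574 |
| 62 | interior | 8277 | 19 | 18 | 4.104 |
| 64 | demands | 6211 | 14 | 12 | 4.078 |
| 65 | ally | 4935 | 11 | 9 | 4.062 |
| 66 | peace | 13026 | 28 | 26 | 4.009 |
| 73 | influence | 11741 | 19 | 14 | 3.599 |
| 92 | views | 11375 | 11 | | 2.855 |
| 97 | interests | 13299 | 12 | | 2.755 |
| 100 | power | 62381 | 49 | 43 | 2.555 |
| 102 | powers | 20473 | 16 | | 2.547 |
| 105 | states | 23306 | 16 | | 2.36 |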
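The Log Ratio column in the two tables above ranks collocates by strength of association. As it is usually described, Log Ratio is the binary logarithm of the ratio between a word's relative frequency near the node word and its relative frequency elsewhere in the corpus. The sketch below follows that description; the slides do not give the window size or corpus size, so the numbers in the example are made up, and the 0.5 substitution for zero counts is an assumed convention.

```python
import math

def log_ratio(freq_near, tokens_near, freq_elsewhere, tokens_elsewhere, floor=0.5):
    """Binary log of the ratio of two relative frequencies.

    freq_near / tokens_near           : collocate count and size of the text
                                        inside the windows around the node word
    freq_elsewhere / tokens_elsewhere : the same for the rest of the corpus
    floor                             : stand-in for a zero count (assumption),
                                        so the ratio stays defined
    """
    near = (freq_near if freq_near > 0 else floor) / tokens_near
    elsewhere = (freq_elsewhere if freq_elsewhere > 0 else floor) / tokens_elsewhere
    return math.log2(near / elsewhere)

# Made-up sizes: the word is 8 times as frequent near the node as elsewhere,
# which gives a Log Ratio of about 3 (each extra point is a doubling).
print(round(log_ratio(8, 1000, 1, 1000), 3))
```

Because the score is a ratio rather than a significance test, rare words such as "Bristle" can rank very high on few occurrences, which is one reason the tables also report raw counts.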

19 RG’s discussion of water-borne diseases
Cholera, diarrhoea, dysentery
Kulldorff Scan Statistic
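The scan statistic named on this slide looks for zones where cases cluster more than expected. The slide does not say which probability model was used; the sketch below assumes the common Poisson form, in which each candidate zone is scored by a log-likelihood ratio and only zones with an excess of cases count. `poisson_llr` is a hypothetical helper and the figures are invented.

```python
import math

def poisson_llr(cases_in, expected_in, total_cases):
    """Log-likelihood ratio for one candidate zone under a Poisson model.

    cases_in     : observed cases inside the zone
    expected_in  : cases expected inside the zone if risk were uniform
                   (assumed to satisfy 0 < expected_in < total_cases)
    total_cases  : observed cases in the whole study area
    """
    if cases_in <= expected_in:
        return 0.0  # only an excess of cases counts as a cluster
    cases_out = total_cases - cases_in
    expected_out = total_cases - expected_in
    llr = cases_in * math.log(cases_in / expected_in)
    if cases_out > 0:
        llr += cases_out * math.log(cases_out / expected_out)
    return llr

# Invented example: 20 deaths where 10 were expected, out of 100 overall.
print(round(poisson_llr(20, 10.0, 100), 2))
```

In a full analysis the zone with the highest score is the most likely cluster, and its significance is usually judged by recomputing the maximum over many randomised datasets.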

20 RG’s discussion of water-borne diseases
Supply, supplied, company, companies, sewage, reservoirs and waterworks – 860 collocations with place-names

21 RG’s discussion of water-borne diseases
Deaths
Highest 10: Liverpool, West Derby, Birmingham, Sheffield, Chorlton, Salford, Leeds, Aston, West Ham, Leicester (Highest London: Southwark, 14th)
Death rate
Highest 10: Liverpool, Preston, Manchester, Salford, Leicester, Birmingham, Prestwich, St. Olave Bermondsey, Whitechapel, and St. George in the East

22 RG’s discussion of water-borne diseases

| City | Number of deaths (per decade) | Of which percentage of total deaths | In text: no. of co-occurrences with water-borne diseases | Of which percentage of total mentions of water-borne diseases (in text) |
|---|---|---|---|---|
| London | 33,636 | 16.2 | 512 | 40.0 |
| Liverpool | 10,242 | 4.9 | 35 | 2.7 |
| Manchester | 10,567 | 5.1 | 25 | 2.0 |
| Birmingham | 6,387 | 3.1 | 8 | 0.6 |
| Sheffield | 3,162 | 1.5 | 6 | 0.5 |
| Leeds | 2,867 | 1.4 | 7 | 0.5 |
| Bristol | 1,563 | 0.8 | 13 | 1.0 |
| Portsmouth | 1,131 | 0.5 | 14 | 1.1 |
| Southampton | 621 | 0.3 | 13 | 1.0 |
| England & Wales | 206,552 | 100 | 1,278 | 100 |

Legend: South of London; North of London
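One way to read the table on this slide is to divide each city's share of mentions by its share of deaths: a value above 1 means the city is discussed more than its death toll alone would suggest. The sketch below does that arithmetic with the percentages as read from the slide; the ratio itself is an illustration added here, not a measure reported by the authors.

```python
# (city, % of total deaths, % of total mentions), as read from the slide's table
rows = [
    ("London", 16.2, 40.0),
    ("Liverpool", 4.9, 2.7),
    ("Manchester", 5.1, 2.0),
    ("Birmingham", 3.1, 0.6),
    ("Sheffield", 1.5, 0.5),
    ("Leeds", 1.4, 0.5),
    ("Bristol", 0.8, 1.0),
    ("Portsmouth", 0.5, 1.1),
    ("Southampton", 0.3, 1.0),
]

def representation_ratio(pct_deaths, pct_mentions):
    """Share of mentions divided by share of deaths (1.0 = proportional)."""
    return pct_mentions / pct_deaths

ratios = {city: representation_ratio(d, m) for city, d, m in rows}
for city, ratio in ratios.items():
    label = "over-represented" if ratio > 1 else "under-represented"
    print(f"{city:12s} {ratio:5.2f}  {label}")
```

On these figures London and the southern ports come out above 1 and the large northern and midland cities below 1, which is the pattern the slide's legend points to.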

23 RG’s discussion of water-borne diseases
Words that collocate with sentences that contain place-names and cholera/diarrhoea/dysentery:
London:
– Water related: Water (n=154, z=32.755); water-fields (11, 31.34); supplied (59, 26.49); companies (34, 23.75); waterfields (7, z=22.39); elevation (21, 22.31); company (46, 22.00); supplying (9, 14.51); elevations (6, 12.66); waters (17, 10.88); supply (32, 10.86); impure (11, 10.60); ditches (4, 8.90); pipes (6, 8.14); matter [organic or cholera] (15, 7.04); waterworks (5, 6.723); etc
– Research related: Map (n=8, z=14.98); extract [from report] (5, 12.16); circular [a circular] (4, 10.09); exhibiting (4, 9.44); Professor (6, 7.54); diagrams (3, 6.98); Dr. (17, 5.73); report (32, 5.67); return [a return] (10, 3.81)
– Descriptive: Epidemic (n=56; z=18.47); outbreak (19, 15.70); infected (5, 6.56); epidemics (5, 3.58)
Liverpool, Manchester, Birmingham:
– Descriptive: Prevailed, epidemic, occurred, deaths, prevalent, mortality
– E.g. “In Liverpool more children of this age died from lung diseases than from diarrhoea or diseases of the brain, and the high rate of mortality there was mainly due to those three causes…” 37th Annual Report, 1874
– E.g. “If English towns are selected for comparison it will be seen that the borough of Liverpool was the most unhealthy in 1866; for by a malignant fever in winter and cholera in summer, the mortality of the year was raised to 4.185, while that of Manchester was 3.195”
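The z values on this slide measure how far a word's observed count in the selected sentences departs from what chance would predict. The slide does not say which variant of the z-score was used; the sketch below shows one common binomial formulation, with hypothetical inputs.

```python
import math

def collocation_z(observed, collocate_freq, corpus_size, window_tokens):
    """z-score of a collocate under a binomial model (one common variant).

    observed       : times the word occurs inside the selected sentences
    collocate_freq : times the word occurs in the whole corpus
    corpus_size    : total tokens in the corpus
    window_tokens  : total tokens inside the selected sentences
    """
    p = collocate_freq / corpus_size   # chance of the word at any position
    expected = p * window_tokens       # count expected by chance
    return (observed - expected) / math.sqrt(expected * (1 - p))

# Hypothetical figures: 20 hits where 10 would be expected by chance.
print(round(collocation_z(20, 100, 10_000, 1_000), 2))
```

Unlike Log Ratio, this score grows with the amount of evidence, so frequent words such as "water" can reach high values even when the effect size is moderate.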

24 Geographically enhanced reading
Involves extrapolating geographical patterns which are alluded to in the texts
EXAMPLES:
- LAKE DISTRICT PROJECT: Mapping routes using cost-surface analysis

25 Mapping routes using cost-surface analysis Focus on 4 tours: Young (1768), Pennant (1769), Gray (1769) and Pennant (1774)
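Cost-surface analysis estimates a likely route between two places by finding the cheapest path across a grid in which every cell has a travel cost. The slides do not describe how the project built its cost surface or which GIS tool it used; the sketch below is a generic least-cost path over a small invented grid, using Dijkstra's algorithm with four-way movement and positive costs.

```python
import heapq

def least_cost_path(cost, start, goal):
    """Cheapest route across a grid; entering a cell costs that cell's value.

    cost  : list of rows of positive numbers (the cost surface)
    start : (row, col) of the origin; its own cost is not charged
    goal  : (row, col) of the destination
    Returns (total_cost, list_of_cells_from_start_to_goal).
    """
    rows, cols = len(cost), len(cost[0])
    best = {start: 0}
    previous = {}
    queue = [(0, start)]
    while queue:
        dist, cell = heapq.heappop(queue)
        if cell == goal:
            break
        if dist > best[cell]:
            continue  # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                new_dist = dist + cost[nr][nc]
                if new_dist < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_dist
                    previous[(nr, nc)] = cell
                    heapq.heappush(queue, (new_dist, (nr, nc)))
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return best[goal], path[::-1]

# Invented surface: the middle column stands for steep ground (cost 9).
surface = [
    [1, 9, 1],
    [1, 9, 1],
    [1, 1, 1],
]
total, route = least_cost_path(surface, (0, 0), (0, 2))
print(total, route)
```

In practice the cell costs would be derived from terrain or road data, and a route described only by its stopping places in a travelogue could then be filled in between consecutive stops.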

26 Conclusion (findings)
These approaches have allowed us to learn about:
– The importance of governance in facilitating or hindering improvements in infant mortality in rural Suffolk, end c19th
– The registrar general’s biases when discussing water-borne diseases in the mid c19th
– The importance of local road infrastructure for the development of tourism in the Lake District, end c18th

27 Conclusion (methods)
Results are promising: Combining tools from NLP, corpus linguistics and GIS, we can:
- Focus reading on passages mentioning specific places
- Summarise the geographies in large volumes of texts
- Explore in detail what is being said about particular places
- Extrapolate from the basic geographies in text
But also:
- Interdisciplinary collaborations are fruitful
- Close and distant forms of reading complement each other well
- Quantitative and qualitative sources complement each other well
- There’s still a long way to go

28 Thank you for listening!
NB: GIS summer school at Lancaster in July; info: contact Ian Gregory (i.gregory@lancaster.ac.uk) or Andrew Hardie (a.hardie@lancaster.ac.uk)

