Census data, CEDAR and the future of Digital Archiving: changing ideas, challenges & opportunities (Data Archiving and Networked Services; DANS is an institute of KNAW and NWO)

Presentation transcript:


2 Census data, CEDAR and the future of Digital Archiving: changing ideas, challenges & opportunities 1996-2014
Peter Doorn, Data Archiving and Networked Services (DANS is an institute of KNAW and NWO)
CEDAR Mini Symposium, Amsterdam, 31st March 2014

3 Contents
Two slides about DANS
Why digitize historical censuses?
History of the census digitization projects 1996-2006
Results: CD-ROMs, websites, publications
Digital preservation of the first “digitally born” census of 1960
Projects and activities since 2006
Challenges for the years to come

4 What is DANS?
Institute of the Dutch Academy and the Research Funding Organisation (KNAW & NWO) since 2005
First predecessor dates back to 1964 (Steinmetz Foundation); Historical Data Archive since 1989
Mission: promote and provide permanent access to digital research information

5 Our services
EASY: Electronic Archiving System for self-deposit
NARCIS: gateway to scholarly information in the Netherlands
Data Seal of Approval
Persistent Identifier URN:NBN resolver

6 Why digitize historical censuses?
Important source for statistics and research
Limited number of census books
Preservation of 19th- and 20th-century originals
Digital archiving
Target audience: researchers, students, local governments, amateur historians, education

7 Systematic digitization of Dutch census books
1995/96: possibility raised in talks between CBS and the Steinmetz Archive
1996: small pilot by CBS and the Netherlands Historical Data Archive
– Selection of material
– How to digitize?
– How to store?
– How to publish?
– Project plan for a follow-up project

8 Digitization in three projects
1997-1999:
– Microfilming and scanning 200 books, 42,500 pages
– Data entry of 10,000 pages of Census 1899
2002 – March 2004:
– Checking and correction of the censuses 1795-1859 and 1930
– Archiving the digitally born censuses of 1960 and 1971
March 2003 – July 2006: Life Courses in Context
– First project in the humanities funded by NWO “large investments”
– In collaboration with the Historical Sample of the Netherlands (Kees Mandemakers, IISG)
– Data entry of the censuses 1869-1956
– Scanning handwritten tables of 1947 and OCR tests
– Documentation, harmonisation, “linking”, access, research

9 Digitizing censuses: division of tasks
Collaboration project of CBS and NHDA/NIWI/DANS since early 1997
Subsidized by NWO and KNAW
CBS:
– data entry of the tables of Census 1899
– StatLine publication
NIWI:
– scanning of the censuses 1795-1971
– OCR of the Introduction to Census 1899
– first website on Census 1899

10 Results 1999
Set of 5 CD-ROMs
– images of the censuses 1795-1971 (200 books, c. 42,500 pages)
Set of 2 CD-ROMs
– database of Census 1899
– 27 books
– 10,000 pages, > 17,000,000 numbers/characters
Introduction to Census 1899 (also as a website)
StatLine publication of the 1899 tables
Images of 1899
Conference & book with analyses of Census 1899 (2001)

11 CD-ROM publications in September 1999

12 Book publications [related projects: Historical GIS, HASH, HDNG]

13 Website of the Introduction to Census 1899, launched in September 1999

14 Census 1899 also published in CBS StatLine


16 The 1960 census: the first born-digital census in the Netherlands
First computer at CBS: X1 Electrologica
1969: punch cards transferred to Steinmetz Archive
Kf. 100 needed for reconstructing files
Bitrot, data input errors and more…
121586004 3995813013110 3 52801322981010 1061
121586010 3855413012060 3 52701322981010 1060
12158W000 3755113010010 2 52801322981010 0061
121586001 3406713012050 00 0152701322981010 1860
121586003 4225113013110 2 52801322113010 0061
1115100421115 6302120001000995581111405057126086200 B’(‘N3=‘)’5ZD,10B 1760
1115100421110 1306363301000075-81111718035817732405 SC2+NSC3); 1770
1115100421116 1305352202000900521111205041728284204 ‘,/’)’); 1780
1115100421119 4303430001000930521111203038829276500 B’(‘N3=‘)’5ZD,10B 1790
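Separating bitrot and input errors from valid values can start with a simple syntactic check. The Python sketch below is illustrative only and assumes (hypothetically) that a clean record contains nothing but digits and blank columns; under that assumption it flags records such as the one beginning `12158W000`, or the lines with program text mixed in. It cannot catch a wrong digit, which needs arithmetic checks such as comparing calculated and printed totals. The function name `suspicious_records` is hypothetical.

```python
import re

# Assumption: a clean record holds only digits and blank columns. The actual
# 1960 card layout is not given in the slides, so this check can only flag
# characters that cannot belong to a purely numeric record.
CLEAN_RECORD = re.compile(r"[0-9 ]+")


def suspicious_records(records):
    """Return (line number, record) pairs that contain non-numeric characters."""
    flagged = []
    for lineno, record in enumerate(records, start=1):
        if not CLEAN_RECORD.fullmatch(record):
            flagged.append((lineno, record))
    return flagged


# Three of the records shown on the slide.
sample = [
    "121586004 3995813013110 3 52801322981010 1061",
    "12158W000 3755113010010 2 52801322981010 0061",
    "1115100421110 1306363301000075-81111718035817732405 SC2+NSC3); 1770",
]

for lineno, record in suspicious_records(sample):
    print(lineno, record)
```

Only the second and third sample records are reported; the first passes because it contains digits and spaces only.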

17 The size of the problem
Persons | Missing persons | Persons too many
Men | 183,970 | 254,100
Women | 182,755 | 7,661
Total | 366,725 | 261,761

18 Launch button for the completely renewed website, launched in November 2004

19 Web statistics 2004-2009
194,000 visitors (3,300 per month)
2 million page views
0.5 TB of data downloaded

20 Projects and activities since 2006
Digitization of “transparencies” and collotypes
NLGIS – historical GIS
Checking and correction
Harmonisation
Archiving in EASY
Scanning historical data at CBS & CBS website
HISTEL project
CEDAR project

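21 Digitization of “transparencies” and collotypes (early photocopies)
Overview of all collotypes/transparencies (the four character columns give characters per page)
Census | Volumes | Pages | Table content | Stub column (printed) | Blank cells | Total | Remark
BDT 1930 | 38 | 6,500 | 700 | 500 | 550 | 1,750 |
BDT 1950 | 48 | 15,000 | 700 | 500 | 400 | 1,600 |
BDT 1963 | 224 | 70,000 | 450 | 500 | 350 | 1,300 |
BDT 1978 | 1,314 microfiches | 72,270 | 350 | 500 | 300 | 1,150 | digitally available
BRT 1930 | 217 | 27,615 | 200 | 500 | 1,200 | 1,900 |
VT & BRT 1947 | 80 | 29,200 | 450 | 600 | 450 | 1,500 |
WT 1947 | 20 | 6,935 | 400 | 400 | 150 | 950 |
WT 1956 | 131 | 47,815 | 500 | 500 | 500 | 1,500 |
VT 1960 | ? | 75,000 | 500 | 500 | 500 | 1,500 | partly digitally available
VT 1971 | printed from files | 87,000 | 500 | 500 | 500 | 1,500 | digitally available
Total | | 437,335 | 475 | 500 | 490 | 1,465 |
Digitally available | | 226,770 | 450 | 500 | 433 | 1,383 |
Total excl. digitally available | | 210,565 | 488 | 500 | 513 | 1,500 |
BDT = Bedrijfstelling (business census), BRT = Beroepstelling (occupational census), VT = Volkstelling (population census), WT = Woningtelling (housing census)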
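The summary rows of this overview can be recomputed from the per-census rows. The Python sketch below assumes that the characters-per-page figures in the "Total" row are unweighted means over the census rows rather than means weighted by page count. The printed summary figures match under that reading. The sketch also checks that each row's table content, stub and blank cells add up to its total. The figures for WT 1947, WT 1956, VT 1960 and VT 1971 appear to come from merged cells on the slide, so they are an interpretation.

```python
from statistics import mean

# (census, pages, table content, printed stub, blank cells, total), with
# characters per page as read from the overview slide.
ROWS = [
    ("BDT 1930", 6500, 700, 500, 550, 1750),
    ("BDT 1950", 15000, 700, 500, 400, 1600),
    ("BDT 1963", 70000, 450, 500, 350, 1300),
    ("BDT 1978", 72270, 350, 500, 300, 1150),
    ("BRT 1930", 27615, 200, 500, 1200, 1900),
    ("VT & BRT 1947", 29200, 450, 600, 450, 1500),
    ("WT 1947", 6935, 400, 400, 150, 950),
    ("WT 1956", 47815, 500, 500, 500, 1500),
    ("VT 1960", 75000, 500, 500, 500, 1500),
    ("VT 1971", 87000, 500, 500, 500, 1500),
]

# Each row: table content + stub + blank cells should equal the total per page.
for census, _pages, content, stub, blank, total in ROWS:
    if content + stub + blank != total:
        raise ValueError(f"{census}: characters per page do not add up")

total_pages = sum(row[1] for row in ROWS)
# Assumption: the "Total" row is an unweighted mean over the census rows.
mean_chars = [round(mean(row[i] for row in ROWS)) for i in range(2, 6)]
print(total_pages, mean_chars)  # 437335 [475, 500, 490, 1465]
```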

22 2006: Scanning and OCR of transparencies
Scan record attempt, February 2005: Census 1947, c. 12,500 pages scanned in one day

23 Manual data entry of Census 1947
Templates prepared for each table type
Data entry carried out by Xerox (India)
Supervision by Jan Jonker
Archived in and available from DANS EASY

24 Project idea, June 2009: new portal for historical population data







31 Checking and correction
Most underestimated task of the project
Ongoing work since 1999…
Distinction between data-entry/conversion errors and source errors
Data-entry errors are corrected
Error detection method based on differences between calculated and given row and column totals
Source errors are indicated with notes…
Tom Vreugdenhil is the hero of error checking and correction
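The error detection method compares calculated and given row and column totals. A minimal Python sketch of that comparison follows, with hypothetical function names. The `suspect_cells` step goes beyond the slide: it is one plausible heuristic that treats the cell where a failing row crosses a failing column as the likely keying error. A mismatch may also be an error in the source itself, which per the slide gets a note rather than a correction, so the output is only a list of candidates for manual review.

```python
def total_discrepancies(cells, row_totals, col_totals):
    """Compare calculated row and column sums with the totals printed in the source.

    Returns ("row" | "column", index, calculated, given) for each mismatch.
    """
    problems = []
    for i, (row, given) in enumerate(zip(cells, row_totals)):
        calculated = sum(row)
        if calculated != given:
            problems.append(("row", i, calculated, given))
    for j, given in enumerate(col_totals):
        calculated = sum(row[j] for row in cells)
        if calculated != given:
            problems.append(("column", j, calculated, given))
    return problems


def suspect_cells(problems):
    """Cells where a failing row crosses a failing column (likely keying errors)."""
    rows = [i for kind, i, _, _ in problems if kind == "row"]
    cols = [j for kind, j, _, _ in problems if kind == "column"]
    return [(i, j) for i in rows for j in cols]


# Hypothetical 2 x 3 table in which 15 was keyed as 51.
cells = [[10, 20, 30], [5, 51, 25]]
problems = total_discrepancies(cells, row_totals=[60, 45], col_totals=[15, 35, 55])
print(problems)                 # [('row', 1, 81, 45), ('column', 1, 71, 35)]
print(suspect_cells(problems))  # [(1, 1)]
```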

32 Harmonisation
Three key variables:
– Occupations
– Municipalities
– Religious denomination

33 Harmonizing occupations
Occupations available for 1849, 1889, 1899, 1909, 1920, 1930 and 1947
Coded according to the Historical International Standard Classification of Occupations (HISCO)
Results:
– Coded occupations and the exact content and context of each table with unique occupational titles (Excel & Access)
– Total of all unique occupational titles in the censuses (Excel & Access)
– Excel workbook lookup tool to code occupations automatically
– Excel workbook HISCO toolbar to search for codes, occupational titles and descriptions of occupations in the HISCO database
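The slides do not describe how the Excel lookup tool works internally. One simple way to code titles automatically is an exact-match lookup after light normalisation, with unmatched titles set aside for manual coding. A Python sketch of that idea follows. The function names are hypothetical, and the codes are placeholders, not real HISCO codes.

```python
import re


def normalise(title):
    """Lower-case and collapse whitespace so trivial variants share one key."""
    return re.sub(r"\s+", " ", title.strip().lower())


def code_titles(titles, lookup):
    """Split occupational titles into coded and uncoded using a title -> code table."""
    index = {normalise(title): code for title, code in lookup.items()}
    coded, uncoded = {}, []
    for title in titles:
        code = index.get(normalise(title))
        if code is None:
            uncoded.append(title)  # left for manual coding
        else:
            coded[title] = code
    return coded, uncoded


# Placeholder codes for illustration only; these are not real HISCO codes.
lookup = {"Landbouwer": "code-A", "Timmerman": "code-B"}
coded, uncoded = code_titles(["landbouwer", "Timmerman ", "Smid"], lookup)
print(coded)    # {'landbouwer': 'code-A', 'Timmerman ': 'code-B'}
print(uncoded)  # ['Smid']
```

Exact matching keeps the automatic step conservative. Spelling variants beyond case and spacing end up in the uncoded list rather than receiving a wrong code.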

34 Harmonizing municipalities
Based on the work by Onno Boonstra and Ad van der Meer, “Repertorium van Nederlandse gemeenten 1812-2006” (Repertory of Dutch municipalities 1812-2006)
New standard code (“Amsterdam code”) for all Dutch municipalities that have ever existed
Database tool to code municipalities in the censuses
ID | amsterdamse_code | begindatum | einddatum | gemeente_prov | gemeente | provincie
1 | 10001 | 1-1-1812 | 1-10-1816 | | Almenum | Friesland
2 | 10002 | 1-1-1812 | 1-12-1999 | | Zuidlaren | Drenthe
3 | 10002 | 1-12-1999 | | | Tynaarlo | Drenthe
4 | 10003 | 1-1-1812 | 30-1-1820 | | Zeddam | Gelderland
5 | 10004 | 1-1-1812 | | | Zijpe | Noord-Holland
6 | 10005 | 1-10-1816 | | | Opsterland | Friesland
7 | 10005 | 1-1-1812 | 1-10-1816 | | Ureterp | Friesland
(begindatum = start date, einddatum = end date, gemeente = municipality, provincie = province)
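In the slide's example, successive municipalities can share one Amsterdam code: Ureterp and Opsterland both carry 10005, and Zuidlaren and Tynaarlo both carry 10002. Coding a census table therefore needs the census date as well as the name. The Python sketch below performs a date-aware lookup over those example rows. It assumes that begindatum is inclusive and einddatum exclusive. This is consistent with Ureterp ending on the day Opsterland begins, but the database's own convention is not stated. The function names are hypothetical.

```python
from datetime import date, datetime
from typing import Optional

# Rows from the slide: (amsterdamse_code, begindatum, einddatum, gemeente).
# An empty einddatum means no end date is recorded.
ROWS = [
    (10001, "1-1-1812", "1-10-1816", "Almenum"),
    (10002, "1-1-1812", "1-12-1999", "Zuidlaren"),
    (10002, "1-12-1999", "", "Tynaarlo"),
    (10003, "1-1-1812", "30-1-1820", "Zeddam"),
    (10004, "1-1-1812", "", "Zijpe"),
    (10005, "1-10-1816", "", "Opsterland"),
    (10005, "1-1-1812", "1-10-1816", "Ureterp"),
]


def parse(day_month_year: str) -> Optional[date]:
    """Parse a d-m-yyyy date; return None for an empty field."""
    if not day_month_year:
        return None
    return datetime.strptime(day_month_year, "%d-%m-%Y").date()


def municipality_on(code: int, on: date) -> Optional[str]:
    """Name of the municipality with this code on a given date, if any.

    Assumption: begindatum is inclusive and einddatum exclusive.
    """
    for row_code, begin, end, name in ROWS:
        if row_code != code:
            continue
        end_date = parse(end)
        if parse(begin) <= on and (end_date is None or on < end_date):
            return name
    return None


print(municipality_on(10005, date(1815, 1, 1)))  # Ureterp
print(municipality_on(10005, date(1899, 1, 1)))  # Opsterland
```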

35 CBS Historical Collection website: 19th- and 20th-century publications

36 HISTEL project
Umbrella project to oversee the various ongoing census activities, supervised by René van Horik:
Transfer of data, website
– new agreement between CBS and DANS
– publish as extended data guide / paper in new DANS data
“Anonymous open access” to the census data in EASY
Archiving of existing data and newly scanned tables in EASY
Version management, updating corrected tables
Liaison with CEDAR

37 Archiving everything in EASY

38 Why a CEDAR project?
Great examples of Linked Open Data (LOD) projects on new census data
– Are they applicable to historical tables?
The historical censuses are stored in numerous containers in an archival silo
– Can we open up the containers and silos to connect the data?
– Can we make the data comparable over time?
– Can we link it to outside sources?
Is it viable to publish the whole DANS archive as LOD?
– Provide insight into the possibilities for more data collections

39 Lots of challenges left…
CEDAR: publishing the historical censuses as LOD
– First priority for linking: linking the census data over time
– Further harmonization is a prerequisite for this
– LOD offers new insight into the extent of the harmonization problem and a systematic solution (we expect ;-)
Archiving LOD
– PRELIDA (PREserving Linked Data) project offers insight into the requirements and options
– Storing the RDF is only part of the answer
Lots of images of historical census tables left to turn into figures
Preserving the census services: no longer supported, NLGIS tool already
Wish for 2020: a user-friendly tool to link historical census data over time and to external sources
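The Python sketch below shows why harmonization comes before linking. It writes one census table cell as RDF in N-Triples, using the W3C RDF Data Cube vocabulary (`qb:Observation`, `qb:dataSet`). If cells from different census years point to the same municipality IRI built from the Amsterdam code, a query can compare them across years. The vocabulary choice and every `example.org` IRI are assumptions for illustration, not CEDAR's actual data model. The count is a placeholder, not a census figure.

```python
# Only the rdf:, qb: (W3C RDF Data Cube) and xsd: namespaces are real; every
# IRI under example.org is a hypothetical placeholder.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB = "http://purl.org/linked-data/cube#"
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/census/"


def observation_triples(obs_id, dataset, dimensions, count):
    """Return N-Triples lines describing one table cell as a qb:Observation."""
    subject = f"<{EX}observation/{obs_id}>"
    lines = [
        f"{subject} <{RDF_TYPE}> <{QB}Observation> .",
        f"{subject} <{QB}dataSet> <{EX}dataset/{dataset}> .",
    ]
    # One triple per dimension, e.g. the harmonised municipality code.
    for prop, value in sorted(dimensions.items()):
        lines.append(f"{subject} <{EX}dimension/{prop}> <{EX}code/{prop}/{value}> .")
    lines.append(f'{subject} <{EX}measure/persons> "{count}"^^<{XSD}integer> .')
    return lines


# Placeholder count; not an actual census figure.
for line in observation_triples("vt1899-10005-m", "vt1899",
                                {"municipality": "10005", "sex": "male"}, 1234):
    print(line)
```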

40 Data Archiving and Networked Services (DANS is an institute of KNAW and NWO)
Thank you for your attention
Twitter: @pkdoorn
