Géraldine Camile, Bibliothèque nationale de France. Tallinn, 2015-01-30

Presentation transcript:

1 Géraldine Camile, Bibliothèque nationale de France. Tallinn, 2015-01-30

2 Summary
  Context and objectives of the “subscription-based press project”
  Harvesting news websites with robots
  Results and lessons learnt
  The future of the project, and its alternatives

3

4 Collecting digital news at the BnF
  Harvesting of news websites since 2010
    Use of crawlers
    100 news websites harvested every day
    Only freely accessible content
  Using robots to collect digital equivalents of newspapers
    “Subscription-based” press project
    Obtain passwords from publishers and crawl protected content
    Focus on the PDF versions to ensure collection continuity, as microfilming budgets for local editions of regional newspapers are decreasing

5 The subscription-based press project
  Various actors within the Library
    Law, Economy and Politics department
    Legal deposit department: printed periodicals service
    Legal deposit department: digital legal deposit service
    IT department
    Different skills and approaches for printed and digital periodicals
  Calendar
    A one-year experiment
    Started end 2012; assessment end 2013
    Now in production mode

6

7 The harvesting workflow
  Steps: Selection, Contact with publisher, Technical instruction, Web harvest, Quality assurance, Cataloguing, Description on access UI, Preservation
  Staff involved: Curators, Library assistants, Cataloguers, Engineers

8 Cataloguing…
  Format
  Link with the printed edition record
  Link to the archives
  Type: digital document
  Local editions
  (Slide footer: August 20th 2014 – Harvesting digital newspapers at the BnF – Clément Oury – IFLA WLIC conference)

9 And access in the archives…

10 A guided tour of the news collection

11 Long-term preservation in SPAR, BnF’s digital repository

12

13 The collections
  22 titles, 192 local editions

  Title                             Local editions  Start of harvest
  Ouest-France                      53              July 19, 2012
  Le Républicain lorrain            8               December 12, 2012
  Le Progrès                        18              April 16, 2013
  Midi libre                        14              May 2, 2013
  L’Indépendant                     3               May 2, 2013
  Centre Presse                     1               May 2, 2013
  La Tribune                        1               May 22, 2013
  Mediapart                         1               July 16, 2013
  La Montagne                       14              October 10, 2013
  Le Populaire du Centre            3               October 10, 2013
  La République du Centre           2               October 10, 2013
  Le Berry Républicain              1               October 10, 2013
  L’Écho Républicain                1               October 10, 2013
  Le Journal du Centre              1               October 10, 2013
  Le Dauphiné libéré                20              April 7, 2014
  Les Dernières Nouvelles d'Alsace  18              April 7, 2014
  L'Est Républicain                 10              April 7, 2014
  L'Alsace                          8               April 7, 2014
  Le Journal de Saône-et-Loire      7               April 7, 2014
  Le Bien Public                    4               April 7, 2014
  Vosges Matin                      2               April 7, 2014

14 Harvested titles
  Map of the daily regional newspapers
  (n° 1, oct./nov. 2012, p )
  Map labels: Vosges Matin, La Liberté de l’Est

15 Main achievements
  The collections!
  Technical experimentation with harvesting protected content
  Creation of links between the General Catalogue and web archives
  Raising awareness among wider library staff about collecting digital publications
    Even library assistants are now managing digital documents

16 The dark side of the crawl
  News websites’ architecture may change very quickly
    Requires high reactivity and dedicated time from technical staff
  Difficulty recovering non-harvested collections
    Press collections disappear very rapidly from the publisher’s website
  Some websites are technically NOT possible to harvest with crawling robots

17

18 The next steps of the project
  Extend the harvest to new titles
  Improve access to collections
    A dedicated interface?
    Full-text index of the press corpus?
  Promote the service to:
    Librarians at reference desks
    Researchers and other users
  Open remote access
    From researchers' desktops
    From regional libraries entitled to receive access to web legal deposit collections

19 Success and alternatives
  Identify alternative ways of collection
    Deposit from publishers through FTP?
    Deposit from press aggregators?
    Build upon the experience of the ebook deposit workflow
  A successful project… which needs to be complemented

