
Aarhus. BnF main topics – 2013 – crawling side Keep crawling –Broad and focused crawls –Limit of 100 Tb Crawl of password protected content –“Press project”:


1 Aarhus

2 BnF main topics – 2013 – crawling side
Keep crawling
–Broad and focused crawls
–Limit of 100 Tb
Crawl of password protected content
–“Press project”: PDFs of daily newspapers
–Tests with other kinds of content
Work on direct deposit of e-books

3 BnF main topics – 2013 – access and preservation sides
Merging professional and public WB
–Various optimizations
–Clickable permalink…
Draw links between web archives and BnF indexing and promotion tools
–general catalogue, data.bnf.fr…
Open access to web archives in regional libraries
–Legal and technical aspects
Start ingesting our web archives in our digital repository

4 Direct deposit for e-books?
High-level discussions between National Publishers Union and BnF
–A better international framework: IFLA statement on legal deposit, FEP/CENL declaration…
Why not crawl?
–Better item-level indexing of each e-book
–No DRM problems
–Direct discussion with publishers

5 Direct deposit for e-books? / technical side
A technical layer is available: the extranet for publishers
–2011: digital legal deposit forms
–2012/3: direct transfer of metadata (ONIX)
–2013/4: ebooks?
What do we need to decide?
–Who will be the main interlocutor?
–How many and what kind of formats? What validation? Is it possible to refuse?
–What link between the paper and digital version in the catalogue?
–What access tool? Gallica or web archives?

6 RESAW project: some keywords
Networking (researchers and heritage institutions)
Standards and collection quality
Shared tools and services (storage infrastructure, analysis tools, portal)
Methods and training

7 RESAW project: interest for BnF
Promote the use of web archives among researchers
Help launch international and national research programs
Offer groundbreaking tools and services
Get feedback about our collection development policies
Promote the building and use of web archives to high-level decision makers

8 Current situation at BnF
No current research project
–But the Web legal deposit team is involved in research frameworks: “Labex” (“excellence laboratories”)
–Participation in the “Hypertext corpus initiative framework” (lead: Medialab)
Relationships with researchers
–Political sciences (Political science institute in Paris and Grenoble, universities of Nancy and Cergy)
–Social sciences (university of Paris 1, Grenoble)
–Netart (Avignon)
–Web metrics (AFNIC)?
Relationships with associations (literature, sustainable development…)

9 International initiatives to follow up
Collaborative web harvesting
–EU elections, “Olympics” project, Vaclav Havel collection
–Use of “nomination tool” provided by University of North Texas
Portal and shared access
–IIPC website, Memento
Research project
–BL/IA/JISC project on .uk analysis
–80 Tb of data provided by IA
–Common crawl project (?)
Training
–PhD sponsorship (UNT)

10 Questions and comments
The network we dream about!
Some objectives already (partially) covered by IIPC
–standards, interoperability, shared portal
Legal issues will be very difficult to solve
Be cautious with the term “quality” (prefer relevancy for specific goals?)
What will you ask for?
–Money, doctoral students, engineers…

