Institution update KB DK


1 Institution update KB DK
NAS workshop, Vienna, April 2017. Sabine and Tue

2 Netarchive 2015-2017 Highlights
10 Years anniversary
Full text search
Mapping special collections
Web Danica
Upgrading to Heritrix 3
Broad crawl analysis
New collection strategy
Social Media collection strategy
Studies on content collection via API
Image Search (pilot project)
Using Archive-It and testing Brozzler
ISO statistics
Access using Citrix
Secure Research Bitarchive Instance
OAI harvesting of research libraries
A new e- and audio-books workflow
Archive Compression on the way

3 10 Years anniversary
Preparation: gathering a large amount of information about Netarchive. Tables and information sheets.

4 Full text search
Solr search
Wayback
Possibly a demonstration?
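
The Solr-based full-text search is queried over HTTP. Below is a minimal sketch of building such a query in Python, assuming a hypothetical core named netarchive on localhost and a hypothetical full-text field called content (neither is taken from the Netarchive's actual setup). Only the q, fq, rows, start and wt request parameters are standard Solr.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical endpoint: host, port and core name are placeholders.
SOLR_SELECT = "http://localhost:8983/solr/netarchive/select"


def build_fulltext_query(terms: str, rows: int = 10, start: int = 0,
                         filters: Optional[list] = None) -> str:
    """Return a Solr /select URL for a free-text query.

    q, fq, rows, start and wt are standard Solr request parameters;
    the field name 'content' is an assumed schema field.
    """
    params = [
        ("q", f"content:({terms})"),
        ("rows", str(rows)),      # page size
        ("start", str(start)),    # offset for paging
        ("wt", "json"),           # ask Solr for a JSON response
    ]
    # Each filter query goes in its own fq parameter.
    for fq in filters or []:
        params.append(("fq", fq))
    return SOLR_SELECT + "?" + urlencode(params)


print(build_fulltext_query("kongehuset", rows=5, filters=["crawl_year:2016"]))
```

Any HTTP client can fetch the resulting URL. The crawl_year field in the example filter is also hypothetical.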

5 Mapping special collections

6 Web Danica

7 Upgrading to Heritrix 3

8 New collection strategy
Now vs. before. We will discuss the details later in this workshop.

9 Social Media collection strategy
Analysis
Choice in process
Training of the curators
External partners (e.g. journalists)

10 Studies on Facebook collection via API
Digital Footprints: tool for researchers. Needs user consent, even for open profiles. Further development needed.
Whale: can collect all active, open, Danish Facebook profiles and posts. License (€999/month). Cannot solve the problem of being blocked by Facebook.

11 Image Search (pilot project)
Shine
MimeTypeSearch (more information needed)
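
The slide gives no detail on how MimeTypeSearch works. To illustrate selecting images by MIME type, here is a minimal sketch that filters an index of archived records. It assumes the common 11-field CDX layout used by Wayback-style tools (header " CDX N b a m s k r M S V g"). NetarchiveSuite's own CDX format may differ, and this is not how Shine itself is implemented.

```python
# Field order of the assumed 11-field CDX layout: massaged URL, timestamp,
# original URL, MIME type, status, digest, redirect, meta tags,
# compressed length, offset, WARC/ARC filename.
FIELDS = ["urlkey", "timestamp", "original", "mimetype", "statuscode",
          "digest", "redirect", "metatags", "length", "offset", "filename"]


def parse_cdx_line(line: str) -> dict:
    """Split one whitespace-separated CDX line into named fields."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def records_with_mime_prefix(lines, prefix: str = "image/"):
    """Yield parsed CDX records whose MIME type starts with prefix."""
    for line in lines:
        # Skip the format header line and blank lines.
        if line.startswith(" CDX") or not line.strip():
            continue
        record = parse_cdx_line(line)
        if record["mimetype"].lower().startswith(prefix):
            yield record
```

Each yielded record carries the filename and offset needed to fetch the image itself from the archive.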

12 Using Archive-It and testing Brozzler
We are using Archive-It to: test sites we have problems with in NAS; harvest a limited number of Facebook profiles; download the harvested WARC files and preserve them locally. We have tested Brozzler successfully and want to do more integration.

13 ISO statistics
Based on ISO/TR 14873.

14 Access using Citrix
Part of the Royal Danish Library's common Citrix platform (about 40 concurrent users during a workday). Different access restriction setups (e.g. researcher from home / researcher only in the reading room) using Citrix GPOs (Group Policy Object templates) and Active Directory groups. Uses a fixed browser, IE 11, with a preconfigured proxy, plugins and workspace (we plan to change to Chrome). Plans for digital referencing, e.g. Zotero integration.

15 Secure Research Bitarchive Instance
Batch jobs are validated and executed by the operational manager in a secure environment, using a separate NAS instance. Extracted data is exported to a secure researcher processing server.

16 OAI harvesting of research libraries
Based on

17 A new workflow for e- and audio-books
Daily deduplicated extracts of 85 % of all e-books and audiobooks through one aggregator. Feeds into a new structured workflow. Testing GoAnywhere for secure exchange of data. Facilitating upload of the remaining 15 % from individual publishers. Enhanced with metadata from the National Bibliography aggregator. Integrated with the library system.
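
One way to deduplicate a daily extract is to compare content digests against everything received earlier. The following is a minimal sketch under two assumptions: each extract arrives as (filename, bytes) pairs, and SHA-1 content digests identify duplicates. The slide does not say whether deduplication happens at the aggregator or at the library.

```python
import hashlib


def dedupe_daily_extract(items, seen: set) -> list:
    """Return the names of items whose content has not been seen before.

    items: iterable of (filename, content_bytes) pairs (assumed input shape).
    seen:  set of hex SHA-1 digests from earlier extracts; updated in place.
    """
    fresh = []
    for name, data in items:
        digest = hashlib.sha1(data).hexdigest()
        if digest in seen:
            continue            # identical content already received
        seen.add(digest)
        fresh.append(name)
    return fresh
```

Persisting the seen set between runs lets the next day's extract be checked against all earlier deliveries, not just against itself.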

18 Archive Compression – on the way
All new harvests have been gzipped since February 2017 (without deduplication). Gzipping of the older part of the archive is expected to be finished in autumn 2017. We mainly use JWAT, recreating new CDXs and "middleware" data files for creating revisits, creating new metadata files, and facilitating the creation of new non-deduplicated CDX indexes (8 TB). The precompression job of 800 TB is expected to take between … days for the distributed archive in CPH (Copenhagen).
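
For Wayback-style access, compressing a WARC file usually means gzipping each record as a separate gzip member. A CDX line can then point at that member's byte offset and length. Below is a minimal Python sketch of the idea. It assumes the input has already been split into individual records. KB uses JWAT (Java) for the actual parsing, and this sketch does not reproduce that workflow.

```python
import gzip
import io


def gzip_records(records):
    """Compress each record as its own gzip member.

    Returns (compressed_bytes, index), where index lists the
    (offset, length) of every member in the output: the values a
    rebuilt CDX line would need to locate the record.
    """
    out = io.BytesIO()
    index = []
    for record in records:
        member = gzip.compress(record)          # one self-contained gzip member
        index.append((out.tell(), len(member)))
        out.write(member)
    return out.getvalue(), index
```

Each member decompresses independently, so a reader can seek to an offset from the index and decompress just one record. Running gzip.decompress on the whole output still yields all records concatenated.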

19

