Steps in archiving web publications in the National Library of Latvia. Tallinn, 24.11.2005. Ivars Indāns.


Chronology (2003)
First attempt, 2003: Nedlib.
– very small budget (~500 EUR);
– technical problems.
The project was cancelled.

Chronology (2005)
Second attempt, 2005: Heritrix.
– information gathered before starting;
– an agreement with the Royal Library of Sweden to share experience;
– financing from the State programme “Digital Library”.
The project started successfully in September 2005.

Technical infrastructure: server
No dedicated server for harvesting. Dell PowerEdge 1600C server:
– 2.4 GHz CPU;
– 512 MB RAM;
– 2 × 72 GB HDD.
Operating system: Fedora Linux.

Technical infrastructure: networking
Optical line connection to the Latvian “backbone”. The optical line serves the main workstation cluster of NLLa as well as all servers. Outgoing data traffic from NLLa increased dramatically in 2005 with the launch of the Digital Library service.

6 MB/s data speed
What does this mean in real life? Will 600 MB (~one CD-ROM, one small website) be harvested in 100 seconds? No!
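The 100-second figure is simply size divided by line speed. It assumes the line is the only bottleneck, and in practice other factors clearly dominate. The small Python sketch below makes the arithmetic explicit. The function names are illustrative, and the 600 MB size paired with the ~8-hour duration from the next slide is a HYPOTHETICAL pairing, because the slides do not give the size of a "medium" site.

```python
def naive_transfer_seconds(size_mb: float, line_speed_mb_s: float) -> float:
    """Time to move size_mb at full line speed, ignoring every other overhead."""
    return size_mb / line_speed_mb_s


def effective_rate_mb_s(size_mb: float, elapsed_hours: float) -> float:
    """Average rate actually achieved when size_mb took elapsed_hours."""
    return size_mb / (elapsed_hours * 3600)


if __name__ == "__main__":
    # Figures from the slide: 600 MB over a 6 MB/s line.
    print(naive_transfer_seconds(600, 6))  # 100.0
    # HYPOTHETICAL: assume a 600 MB site takes the ~8 hours quoted for a
    # medium-sized site; the slides give no site size.
    rate = effective_rate_mb_s(600, 8)
    print(f"{rate * 1000:.1f} kB/s, {6 / rate:.0f}x below line speed")
```

Under that assumption the second line prints `20.8 kB/s, 288x below line speed`. Line speed alone therefore says little about how long a harvest will take.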

Back to reality  Real speed of harvester is slow. Harvesting of medium size web site takes ~8 hours. The amount of archived information is very different and unpredictable. “Full scale” harvesting of sites may overfill the server.

How to improve the situation?
A commercial company uses 12 optical lines and a “cluster of servers”; no further details were available (business is business). What about NLLa? Possible solutions:
– improving the hardware;
– restricting the harvesting rules:
  – limiting the “depth” of harvesting;
  – restricting file types.
I don’t know the best solution. Your opinions?
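The two rule-based restrictions can be pictured as a single scope test applied to every discovered URL. The following is a minimal standalone Python sketch, not Heritrix's own scope configuration, whose syntax is not reproduced here. The hop limit, the excluded extensions and the `example.lv` URLs are HYPOTHETICAL. "Depth" is interpreted as link hops from the seed, although it could also mean directory depth within a site.

```python
import posixpath
from urllib.parse import urlparse

# HYPOTHETICAL limits: the slides name these two restrictions but give
# no values. "Depth" is taken here as link hops from the seed URL.
MAX_HOPS = 3
EXCLUDED_EXTENSIONS = frozenset({".avi", ".mpg", ".mp3", ".iso", ".zip", ".exe"})


def in_scope(url: str, hops: int,
             max_hops: int = MAX_HOPS,
             excluded: frozenset = EXCLUDED_EXTENSIONS) -> bool:
    """Decide whether a discovered URL should be fetched.

    `hops` is the number of links followed from the seed (seed = 0).
    """
    if hops > max_hops:
        return False  # depth restriction
    path = urlparse(url).path
    extension = posixpath.splitext(path)[1].lower()
    return extension not in excluded  # file-type restriction


if __name__ == "__main__":
    print(in_scope("http://www.example.lv/index.html", 1))      # True
    print(in_scope("http://www.example.lv/video/clip.AVI", 1))  # False
    print(in_scope("http://www.example.lv/a/b/page.html", 4))   # False
```

Judging file type by URL extension is cheap but misses files served from URLs without an extension, such as download scripts. Checking the HTTP `Content-Type` header is more reliable, but it is only known once the request has been made.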

Thank you for your patience.
Ivars Indāns