Long Tails and Archive Systems
Elliot Jaffe
FDIS 2005

Archive Metrics
What
– Distribution of file sizes
– Distribution of occupied storage
– How are files accessed
Why
– System architecture
– Scaling for access
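The first two metrics, the distribution of file sizes and the distribution of occupied storage, can both be read off a single pass over the file sizes. The sketch below is a minimal illustration, not the method used in the talk. It assumes the sizes (in bytes) have already been collected, for example with `os.lstat`, and it uses logarithmic power-of-two bins, which is a choice made here rather than one taken from the slides.

```python
from collections import defaultdict


def size_bucket(size: int) -> int:
    """Power-of-two bucket for a file size in bytes.

    Bucket 0 holds empty files; bucket k >= 1 holds sizes in [2**(k-1), 2**k).
    """
    return size.bit_length()


def size_histograms(sizes):
    """Return (files per bucket, bytes per bucket) for an iterable of sizes.

    The first histogram is the distribution of file sizes; the second is the
    distribution of occupied storage over the same buckets.
    """
    files = defaultdict(int)
    occupied = defaultdict(int)
    for size in sizes:
        bucket = size_bucket(size)
        files[bucket] += 1          # one more file in this size range
        occupied[bucket] += size    # bytes contributed by this size range
    return dict(files), dict(occupied)
```

For example, `size_histograms([0, 1, 3, 4, 1000])` places one file in each of buckets 0, 1, 2, 3, and 10, with 1000 of the 1008 bytes in bucket 10. Comparing the two histograms shows how few files account for most of the occupied storage.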

File Size Studies
UFS93 (1993)
– 12 million files
– UNIX only
– Average file size: 2 KB
– 90% of storage in 11% of files
HUJI (2005)
– 4 million files
– UNIX + Windows
– Average file size: 8 KB
– 90% of storage in 5.5% of files
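Both studies are summarized by the fraction of files that holds 90% of storage (11% in UFS93, 5.5% in HUJI). The sketch below computes that figure from a list of file sizes. It assumes the metric counts the largest files first, which is the natural reading but is not spelled out on the slide. The function name is hypothetical.

```python
def fraction_of_files_for_share(sizes, share=0.9):
    """Smallest fraction of files whose combined size reaches `share` of total
    storage, taking the largest files first."""
    ordered = sorted(sizes, reverse=True)  # largest files first
    total = sum(ordered)
    if total == 0:
        return 0.0
    target = share * total
    running = 0
    for count, size in enumerate(ordered, start=1):
        running += size
        if running >= target:
            return count / len(ordered)
    return 1.0  # only reachable through floating-point rounding at share == 1.0
```

For one 1000-byte file among 99 one-byte files, `fraction_of_files_for_share([1000] + [1] * 99)` returns 0.01: a single file out of 100 already holds more than 90% of the bytes.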

What’s Changed
Then
– JAWS, NOW
– Online storage was expensive
– Offline tape storage
Now
– Central file servers
– Digital libraries
– Online storage is cheap
– No offline storage
– XML
– Multimedia

Empirical Data

Questions
– What is the future of these distributions?
– Do the changes extend the tails as power laws, so that the 10/90 and 20/80 rules no longer hold and are the wrong way to think about these distributions?
– Are the changes driven by external factors that are unpredictable?
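One way to make the power-law question concrete is to assume, purely for illustration, that file sizes above some threshold `x_min` follow a Pareto tail, P(X > x) = (x_min / x)^a. Under that model, the share of bytes held by the largest fraction p of files is p^(1 − 1/a). A "20/80 rule" is therefore just one value of the tail index a (about 1.16), and a "10/90 rule" is a heavier tail (a closer to 1). The sketch below estimates a with the Hill (maximum-likelihood) estimator and converts between rules and tail indices. The Pareto model and the choice of `x_min` are assumptions, not results from the talk.

```python
import math


def hill_tail_index(sizes, x_min):
    """Maximum-likelihood (Hill) estimate of the Pareto tail index a,
    using only the samples at or above x_min."""
    tail = [x for x in sizes if x >= x_min]
    if not tail:
        raise ValueError("no samples at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("all tail samples equal x_min; the index is undefined")
    return len(tail) / log_sum


def top_share(p, a):
    """Share of total mass held by the largest fraction p of items
    for a Pareto tail with index a > 1 (the mean is infinite for a <= 1)."""
    if a <= 1:
        raise ValueError("tail index must exceed 1")
    return p ** (1 - 1 / a)


def a_for_rule(p, share):
    """Tail index at which the top fraction p holds `share` of the mass,
    e.g. a_for_rule(0.2, 0.8) for the 20/80 rule."""
    if not 0 < p < share < 1:
        raise ValueError("need 0 < p < share < 1")
    return 1 / (1 - math.log(share) / math.log(p))
```

In this model, moving from 20/80 toward 10/90 and beyond corresponds to the estimated tail index sliding toward 1. Tracking the index over time may be more informative than quoting a fixed rule.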

The Long Tail
– Chris Anderson (2004): the long tail of a distribution has tremendous mass and creates new market opportunities
– Examples: Amazon, Netflix, Wikipedia
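To see how much mass a long tail can carry, the sketch below assumes a Zipf popularity model in which the item of rank i has weight 1/i^s. The model and the function name are illustrative assumptions, not taken from the slides. It computes the fraction of total weight that lies outside the `head` most popular items.

```python
def zipf_tail_share(n_items, head, s=1.0):
    """Fraction of total weight outside the `head` top-ranked items when the
    item of rank i (1-based) has weight 1 / i**s."""
    if not 0 <= head <= n_items:
        raise ValueError("head must lie between 0 and n_items")
    weights = [1.0 / i ** s for i in range(1, n_items + 1)]
    return sum(weights[head:]) / sum(weights)
```

With a million items and s = 1, `zipf_tail_share(10**6, 1000)` is roughly 0.48. The top 0.1% of items draws only about half the total, and the other half is spread across the tail, which is the effect Anderson describes.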

Today’s Landscape
(Figure: NOW, file servers, Sarbanes-Oxley, and digital libraries placed by storage capacity and access frequency)

Next Steps
– Collect data from large storage systems: file sizes, creation, last-modified, and last-access times, and frequency of reads
– Goal: new architectures for digital libraries
  – Focus on operations
  – Store large and small files differently
  – Store very-low-access files in slow-access storage
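The proposed architecture separates files by size and by how rarely they are accessed. The sketch below gathers per-file metadata with `os.walk` and `os.lstat` and assigns each file to a tier. The thresholds, the tier names, and the rule that coldness takes precedence over size are all hypothetical choices made for illustration. Two limitations of `stat` data matter here. First, `st_atime` is unreliable on file systems mounted with `noatime` or `relatime`. Second, `st_ctime` on Unix is the inode-change time, not the creation time, so it is not collected here. Read frequency is not available from `stat` at all and would have to come from access logs.

```python
import os
import stat
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
LARGE_FILE_BYTES = 64 * 1024 * 1024    # 64 MiB
COLD_AFTER_SECONDS = 365 * 24 * 3600   # about one year without access


@dataclass
class FileRecord:
    path: str
    size: int
    mtime: float  # last modification (seconds since the epoch)
    atime: float  # last access; may be stale under noatime/relatime mounts


def collect(root):
    """Yield a FileRecord for each regular file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or cannot be stat'ed
            if stat.S_ISREG(st.st_mode):
                yield FileRecord(path, st.st_size, st.st_mtime, st.st_atime)


def choose_tier(rec, now):
    """Hypothetical policy: cold files go to slow storage regardless of size;
    otherwise large and small files get separate stores."""
    if now - rec.atime >= COLD_AFTER_SECONDS:
        return "slow"
    if rec.size >= LARGE_FILE_BYTES:
        return "large-object"
    return "small-file"
```

A survey run would call `choose_tier(rec, time.time())` for each record produced by `collect(root)` and total the bytes per tier. That estimate shows how much of an archive could move to slow-access storage under a given threshold.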