EndRE: An End-System Redundancy Elimination Service.

EndRE: An End-System Redundancy Elimination Service Bhavish Aggarwal, Aditya Akella, Ashok Anand, Athula Balachandran, Pushkar Chitnis, Chitra Muthukrishnan,

Identify and remove redundancy.
Implemented at either the IP layer or the socket layer.
Accomplished in two steps:
- Fingerprinting
- Matching and encoding

Fingerprinting
Select a few “representative regions” for the current block of data handed down by the application(s).

Matching and encoding
Approaches for identifying redundant content, once the representative regions have been identified:
- Chunk-Match
- Max-Match
The two approaches differ in the trade-off between the memory overhead imposed on the server and the effectiveness of RE.

Fingerprinting: Balancing Server Computation with Effectiveness

Notation and terminology
- Data block (S): a certain amount of data, S bytes, handed down by an application.
- w (S >> w): the size of the minimum redundant string (contiguous bytes) that is to be identified.
- Number of potential candidates? Every w-byte window of the block is a candidate, so there are S - w + 1.

Only a fraction 1/p of the candidates is chosen as representative. The sampling period p is varied based on load.
- Markers: the first byte of each chosen candidate string
- Chunks: the string of bytes between two markers
- Fingerprints: a pseudo-random hash of the fixed-size w-byte string beginning at each marker
- Chunk-hashes: hashes of the variable-sized chunks
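To make the four terms concrete, here is a minimal sketch that derives chunks, fingerprints, and chunk-hashes from a list of marker positions. The function name `describe_block` is hypothetical, and `zlib.crc32` is only a stand-in for whatever hash a real implementation would use.

```python
import zlib

def describe_block(block: bytes, markers: list[int], w: int):
    """Derive chunks, fingerprints and chunk-hashes from marker positions.

    zlib.crc32 is a stand-in hash chosen for illustration only.
    """
    # A chunk is the string of bytes between two consecutive markers.
    chunks = [block[a:b] for a, b in zip(markers, markers[1:])]
    # A fingerprint hashes the fixed-size w-byte string starting at a marker;
    # markers too close to the end of the block have no full window.
    fingerprints = [zlib.crc32(block[m:m + w])
                    for m in markers if m + w <= len(block)]
    # A chunk-hash covers the whole variable-sized chunk.
    chunk_hashes = [zlib.crc32(c) for c in chunks]
    return chunks, fingerprints, chunk_hashes
```

Note that fingerprints always cover exactly w bytes, while chunk-hashes cover however many bytes happen to lie between two markers.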

Fingerprinting algorithms
- MODP
- MAXP
- FIXED
- SAMPLEBYTE

MODP
Marker identification and fingerprinting are both handled by the same hash function.
The per-block computational cost is independent of the sampling period p.
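A rough sketch of why the cost does not depend on p: one hash is computed for every w-byte window, and that same value decides both whether the window is selected and what its fingerprint is. The slide does not name the hash or the selection rule; the Rabin-Karp style rolling hash and the "value mod p equals 0" test below are assumptions, as are the function name and the `base` and `mod` constants.

```python
def modp_fingerprints(block: bytes, w: int, p: int,
                      base: int = 257, mod: int = (1 << 61) - 1):
    """Return (offset, fingerprint) pairs whose hash is 0 modulo p.

    Assumed scheme: a polynomial rolling hash over every w-byte window.
    Every window is hashed regardless of p, so the work per block is fixed.
    """
    if len(block) < w:
        return []
    top = pow(base, w - 1, mod)      # weight of the byte leaving the window
    h = 0
    for b in block[:w]:              # hash of the first window
        h = (h * base + b) % mod
    selected = []
    for i in range(len(block) - w + 1):
        if i > 0:
            # Slide the window: drop block[i-1], append block[i+w-1].
            h = ((h - block[i - 1] * top) * base + block[i + w - 1]) % mod
        if h % p == 0:               # the same hash picks the marker...
            selected.append((i, h))  # ...and serves as the fingerprint
    return selected
```

With p = 1 every one of the S - w + 1 windows is selected; larger p only thins out the output, not the hashing work.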

MAXP
Markers are chosen as bytes that are local maxima over each region of p bytes of the data block.
Once a marker byte is chosen, an efficient hash function such as Jenkins Hash can be used to compute the fingerprint.
By increasing p, fewer maxima-based markers need to be identified.
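The marker-selection step can be sketched as follows. The slide does not pin down how the p-byte region is positioned or how ties are broken, so this sketch assumes a window centred on the candidate byte and requires a strict maximum; both choices, and the name `maxp_markers`, are illustrative.

```python
def maxp_markers(block: bytes, p: int) -> list[int]:
    """Return positions whose byte value is the unique maximum of the
    surrounding region (assumed here to be p // 2 bytes on either side)."""
    half = p // 2
    markers = []
    # Positions too close to either end lack a full region and are skipped.
    for i in range(half, len(block) - half):
        window = block[i - half:i + half + 1]
        if block[i] == max(window) and window.count(block[i]) == 1:
            markers.append(i)
    return markers
```

A larger p widens the region each byte has to dominate, so fewer positions qualify and fewer fingerprints have to be hashed afterwards.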

FIXED
A content-agnostic approach: select every p-th byte as a marker (marker selection incurs no computational cost).
Once markers are chosen, S/p fingerprints are computed using Jenkins Hash, as in MAXP.
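A minimal sketch of FIXED follows. The slides say only "Jenkins Hash"; the one-at-a-time variant used here is an assumption, and the function names are hypothetical.

```python
def jenkins_one_at_a_time(data: bytes) -> int:
    """Bob Jenkins' one-at-a-time hash, truncated to 32 bits."""
    h = 0
    for b in data:
        h = (h + b) & 0xFFFFFFFF
        h = (h + (h << 10)) & 0xFFFFFFFF
        h ^= h >> 6
    h = (h + (h << 3)) & 0xFFFFFFFF
    h ^= h >> 11
    h = (h + (h << 15)) & 0xFFFFFFFF
    return h

def fixed_fingerprints(block: bytes, w: int, p: int):
    """Pick every p-th byte as a marker and hash the w bytes that follow."""
    # Marker positions depend only on the offset, never on the content.
    return [(m, jenkins_one_at_a_time(block[m:m + w]))
            for m in range(0, len(block) - w + 1, p)]
```

Because the markers ignore content, inserting a single byte near the start of a block shifts every later marker, which is the usual weakness of a content-agnostic scheme.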

SAMPLEBYTE
- Uses a 256-entry lookup table with a few predefined positions set.
- A byte is chosen as a marker if the corresponding entry in the lookup table is set.
- The fingerprint is computed using Jenkins Hash, and p/2 bytes of content are skipped before the process repeats.

SAMPLEBYTE
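The SAMPLEBYTE steps can be sketched as a single scan over the block. Which table entries are set is not given in the slides, so the caller supplies the table; `zlib.crc32` stands in for Jenkins Hash to keep the sketch short, and the function name is hypothetical.

```python
import zlib

def samplebyte_fingerprints(block: bytes, w: int, p: int, table: list[bool]):
    """Scan the block; a byte is a marker if table[byte] is set.

    table must have 256 entries. zlib.crc32 is a stand-in for Jenkins Hash.
    """
    out = []
    i = 0
    while i <= len(block) - w:
        if table[block[i]]:
            out.append((i, zlib.crc32(block[i:i + w])))
            i += p // 2 + 1          # skip p/2 bytes after the marker
        else:
            i += 1
    return out
```

The table lookup is a single array access per byte, and the skip after each marker bounds how densely fingerprints can be sampled.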

Matching and Encoding: Optimizing Storage and Client Computation

Matching and encoding
Accomplished in two ways:
- Chunk-Match: data that repeat in full across data blocks
- Max-Match: maximal matches around fingerprints that are repeated across data blocks

Chunk-Match
Chunk-hashes from the payloads of future data blocks are looked up in the chunk-hash store to identify whether one or more chunks have been encountered earlier.
Once matching chunks are identified, they are replaced by meta-data.
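A minimal server-side sketch of this lookup follows. The class name, the token format, the choice of SHA-1 as the chunk-hash, and the use of a stream offset plus length as the meta-data are all assumptions made for illustration.

```python
import hashlib

class ChunkHashStore:
    """Maps chunk-hash -> (offset, length) of the first time a chunk was sent."""

    def __init__(self):
        self.seen = {}
        self.next_offset = 0  # position in the stream of original bytes

    def encode(self, chunks):
        """Replace chunks seen earlier with ("ref", offset, length) tokens."""
        out = []
        for chunk in chunks:
            key = hashlib.sha1(chunk).digest()
            hit = self.seen.get(key)
            if hit is not None:
                out.append(("ref",) + hit)        # meta-data only
            else:
                self.seen[key] = (self.next_offset, len(chunk))
                out.append(("raw", chunk))        # first sighting: send bytes
            self.next_offset += len(chunk)
        return out
```

For example, after `encode([b"GET /index", b"Host: a"])`, a later `encode([b"Host: a", b"new"])` yields `[("ref", 10, 7), ("raw", b"new")]`.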

EndRE optimization
- Offloads all storage management and computation to servers.
- The client simply maintains a fixed-size circular FIFO log of data blocks.
- For each matching chunk, the server encodes and sends a four-byte <offset, length> tuple of the chunk in the client’s cache.
- The client “de-references” the offsets sent by the server and reconstructs the compressed regions from its local cache.
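The client side of this scheme can be sketched as a circular byte log plus a decoder that de-references offsets. The class name and token format are hypothetical, offsets are treated as absolute positions in the byte stream, and the tuple is kept as plain Python integers because the slides do not say how the four bytes are divided.

```python
class ClientCache:
    """Fixed-size circular FIFO log; the oldest bytes are overwritten first."""

    def __init__(self, capacity: int):
        self.buf = bytearray(capacity)
        self.capacity = capacity
        self.written = 0                     # total bytes ever appended

    def _append(self, data: bytes) -> None:
        for b in data:
            self.buf[self.written % self.capacity] = b
            self.written += 1

    def _read(self, offset: int, length: int) -> bytes:
        # A reference is valid only while its bytes are still in the log.
        if offset < self.written - self.capacity or offset + length > self.written:
            raise ValueError("reference points outside the cached window")
        return bytes(self.buf[(offset + k) % self.capacity] for k in range(length))

    def decode(self, tokens) -> bytes:
        """Rebuild a block from ("raw", bytes) and ("ref", offset, length)."""
        out = bytearray()
        for tok in tokens:
            data = tok[1] if tok[0] == "raw" else self._read(tok[1], tok[2])
            self._append(data)               # reconstructed data is cached too
            out += data
        return bytes(out)
```

The client does no hashing and keeps no index: decoding is a bounds check and a copy, which is the point of pushing the bookkeeping to the server.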

Drawbacks of Chunk-Match
- It can only detect exact matches in the chunks computed for a data block.
- It could miss redundancies that span contiguous portions of neighboring chunks, or redundancies that span only portions of chunks.

Max-Match
For each matching fingerprint, the corresponding matching data block is retrieved from the cache, and the match region is expanded byte by byte in both directions to obtain the maximal region of redundant bytes.
Matched regions are then encoded with <offset, length> tuples.
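The byte-by-byte expansion can be sketched as below. The function name and argument layout are hypothetical; the caller is assumed to have already found that the w bytes at `new[i:]` and `old[j:]` share a fingerprint.

```python
def expand_match(new: bytes, i: int, old: bytes, j: int, w: int):
    """Grow a w-byte match at new[i:] / old[j:] in both directions.

    Returns (start in new, start in old, length) of the maximal match.
    """
    if new[i:i + w] != old[j:j + w]:
        raise ValueError("fingerprint collision: the windows differ")
    left = 0
    # Walk backwards while the preceding bytes agree.
    while i - left > 0 and j - left > 0 and new[i - left - 1] == old[j - left - 1]:
        left += 1
    right = w
    # Walk forwards while the following bytes agree.
    while (i + right < len(new) and j + right < len(old)
           and new[i + right] == old[j + right]):
        right += 1
    return i - left, j - left, left + right
```

Unlike Chunk-Match, the resulting region is not tied to chunk boundaries, so a match can cover parts of several neighboring chunks.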

EndRE optimization
- Significantly limits fingerprint store maintenance overhead for all four algorithms.
- Optimizes the representation of the fingerprint hash table to limit storage needs: the fingerprint table is a contiguous set of offsets, indexed by the fingerprint hash value.
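One way to picture such a table is a flat array with one offset per slot, where the slot index is taken from the low bits of the fingerprint. The class name, the table size, the use of 0 as an "empty" marker, and the overwrite-on-collision policy are assumptions made for this sketch.

```python
from array import array

class OffsetTable:
    """Contiguous array of cache offsets indexed by fingerprint hash value.

    Only offsets are stored, not the fingerprints themselves, so a caller
    should confirm a hit by comparing the cached bytes at that offset.
    """

    def __init__(self, bits: int = 16):
        self.mask = (1 << bits) - 1
        self.slots = array("I", [0]) * (1 << bits)   # 0 means "empty"

    def put(self, fingerprint: int, offset: int) -> None:
        # Store offset + 1 so that offset 0 is distinguishable from empty.
        self.slots[fingerprint & self.mask] = offset + 1

    def get(self, fingerprint: int):
        value = self.slots[fingerprint & self.mask]
        return None if value == 0 else value - 1
```

Storage is then a fixed number of small integers regardless of how many fingerprints have been seen, at the cost of occasional false hits that the byte comparison filters out.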