Web-Conscious Storage Management for Web Proxies Evangelos P. Markatos, Dionisios N. Pnevmatikatos, Member, IEEE, Michail D. Flouris, and Manolis G. H.


Web-Conscious Storage Management for Web Proxies Evangelos P. Markatos, Dionisios N. Pnevmatikatos, Member, IEEE, Michail D. Flouris, and Manolis G. H. Katevenis, Member, IEEE IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 10, NO. 6, DECEMBER 2002

Outline
1. Introduction
2. WebCoSM
3. Simulation-Based Evaluation
4. Implementation (Foxy)
5. Conclusion

Introduction
Fig. 1. Typical web proxy action sequence.

Introduction (Cont.)
• WWW proxies are increasingly used to provide Internet access to users behind a firewall and to reduce wide-area network traffic by caching frequently used URLs.
• However, many proxy servers are limited by file I/O: the file system often fails to deliver the available disk bandwidth to the proxy process.

Introduction (Cont.)
• The authors study the overheads associated with file I/O for web proxies and propose Web-Conscious Storage Management (WebCoSM), a set of techniques that overcome these file I/O limitations.

Overheads associated with disk I/O
• Storing each URL in a separate file. Remedy: aggregate several URLs per file.
• Disk head movements caused by file write requests to widely scattered disk sectors. Remedy: a file-space allocation algorithm.
• URL read operations. Remedy: cluster several read operations together and reorganize the layout of the URLs on the magnetic disk.

WebCoSM
• The file system of a web proxy cannot keep up with the proxy’s Internet requests because of a mismatch between the storage requirements of the web proxy and the storage guarantees provided by the file system.
• We address this performance mismatch in two ways: meta-data overhead reduction and data-locality exploitation.

Meta-data overhead reduction
• Most of the meta-data overhead that cripples web proxy performance can be traced to the storage of each URL in a separate file.
• To eliminate this performance bottleneck, we propose a novel URL-grouping method (called BUDDY), in which we store all the URLs in a small number of files.

BUDDY (URL-grouping Method)
• To simplify space management, we use the URL size as the grouping criterion.
• Each file is composed of fixed-size slots, where each slot is large enough to contain a URL.
• Each new URL is stored in the first available slot of the appropriate file.
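
The slot scheme on this slide can be sketched in a few lines of Python. This is only an illustrative in-memory model, not the authors' implementation: the class name `BuddyStore`, the 512-byte `BLOCK_SIZE`, and the rule "size class = number of blocks needed" are assumptions made for the example.

```python
import math

BLOCK_SIZE = 512  # assumed slot granularity in bytes (hypothetical value)


class BuddyStore:
    """Toy in-memory model of BUDDY-style grouping of objects by size."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.files = {}   # size class -> list of slots (None marks a free slot)
        self.index = {}   # url -> (size class, slot number)

    def size_class(self, nbytes):
        # Assumed grouping rule: number of blocks needed, at least one.
        return max(1, math.ceil(nbytes / self.block_size))

    def put(self, url, data):
        if url in self.index:
            self.delete(url)
        cls = self.size_class(len(data))
        slots = self.files.setdefault(cls, [])
        # Store the object in the first available slot of the appropriate file.
        for i, slot in enumerate(slots):
            if slot is None:
                slots[i] = data
                break
        else:
            slots.append(data)
            i = len(slots) - 1
        self.index[url] = (cls, i)
        return cls, i

    def get(self, url):
        cls, i = self.index[url]
        return self.files[cls][i]

    def delete(self, url):
        # Deleting frees the slot; no file is created or removed.
        cls, i = self.index.pop(url)
        self.files[cls][i] = None
```

Note that `put` and `delete` never create or remove a file per object, which is the property the slide attributes to BUDDY; only the per-class slot lists change.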

BUDDY (Cont.)
• The main advantage of BUDDY is that it practically eliminates the overhead of file creation/deletion operations by storing potentially thousands of URLs per file.
• Although BUDDY reduces the file management overhead by avoiding file creations and deletions, it makes no special effort to lay out data intelligently on disk so as to improve write or read performance.

Data-Locality Exploitation
• A significant amount of locality exists in the URL reference streams.
• Identifying and exploiting this locality can result in large performance improvements.

1. Optimizing Write Throughput
• Instead of writing new data into whatever free space is available on the disk, we continuously append data until we reach the end of the disk, at which point we continue from the beginning.
• STREAM: the file-space management algorithm that implements this policy.

STREAM
• URL-write operations keep appending data to the file until they reach the end of the file; from then on, new URL-write operations continue from the beginning of the file, writing into free slots.
• URL-delete operations mark the space currently occupied by the URL as free, so that it can be reused by future URL-write operations.
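
The append-and-wrap behaviour described on this slide can be modelled roughly as follows. This is a sketch under stated assumptions, not the paper's code: the class `StreamFile`, the block-granular bookkeeping, and raising `RuntimeError` when no free run exists are all choices made for the example (a real proxy would presumably evict cached objects instead).

```python
class StreamFile:
    """Toy model of a single STREAM file made of fixed-size blocks."""

    def __init__(self, num_blocks):
        self.owner = [None] * num_blocks  # block -> url, or None if free
        self.extent = {}                  # url -> (first block, length)
        self.head = 0                     # where the next write tries to append

    def _find_run(self, start, length):
        """Return the first position >= start of `length` free blocks, or None."""
        run = 0
        for b in range(start, len(self.owner)):
            run = run + 1 if self.owner[b] is None else 0
            if run == length:
                return b - length + 1
        return None

    def write(self, url, length):
        pos = self._find_run(self.head, length)
        if pos is None:
            # End of file reached: continue from the beginning on free slots.
            pos = self._find_run(0, length)
        if pos is None:
            raise RuntimeError("no free run of %d blocks" % length)
        for b in range(pos, pos + length):
            self.owner[b] = url
        self.extent[url] = (pos, length)
        self.head = pos + length
        return pos

    def delete(self, url):
        # Mark the blocks as free so later writes can reuse them.
        start, length = self.extent.pop(url)
        for b in range(start, start + length):
            self.owner[b] = None
```

Because the write position only moves forward between wrap-arounds, consecutive writes land on neighbouring blocks, which is what reduces disk head movement.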

STREAM (Cont.)
• Note that STREAM stores all data in a single file, while BUDDY stores data in more than one file; therefore STREAM and BUDDY are incompatible, but STREAM subsumes the functionality (and the benefits) of BUDDY.

2. Preserving the Locality of the URL Stream
• Web objects requested contiguously by a single client may be serviced and stored in the proxy’s disk subsystem interleaved with web objects requested by totally unrelated clients.
• To recover the lost locality, we augmented the STREAM technique with an extra level of buffers, called locality buffers, between the proxy server and the file system.

STREAM-PACK
• Small-write problem: writing a small amount of data to the file system usually results in both a disk-read and a disk-write operation.
• The reason for this peculiar behavior: if a process writes a small amount of data to a file, the OS reads the corresponding page from the disk (if it is not already in the main-memory file buffer cache), performs the write on the page in main memory, and then, at a later time, writes the entire updated page to the disk.

STREAM-PACK (Cont.)
• Add a one-page-long packetizer buffer.
• Once the packetizer fills up, or if the current request is not contiguous to the previous one, the packetizer is sent to the file system to be written to the disk.
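
A minimal sketch of such a packetizer is shown below. It is an illustration, not the authors' code: the class `Packetizer`, the `sink` callback standing in for the file system write, and the 4096-byte `PAGE_SIZE` are assumptions, and the sketch assumes the first write of each contiguous run starts on a page boundary.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class Packetizer:
    """One-page write buffer that coalesces small contiguous writes."""

    def __init__(self, sink, page_size=PAGE_SIZE):
        self.sink = sink            # callable(offset, data): stand-in for the file system
        self.page_size = page_size
        self.start = None           # file offset of the first buffered byte
        self.buf = bytearray()

    def write(self, offset, data):
        # A request that is not contiguous to the previous one forces a flush.
        if self.buf and offset != self.start + len(self.buf):
            self.flush()
        if not self.buf:
            self.start = offset
        self.buf += data
        # Hand every full page to the file system; keep the remainder buffered.
        while len(self.buf) >= self.page_size:
            self.sink(self.start, bytes(self.buf[:self.page_size]))
            del self.buf[:self.page_size]
            self.start += self.page_size

    def flush(self):
        if self.buf:
            self.sink(self.start, bytes(self.buf))
            self.buf.clear()
```

With full pages handed over in one piece, the OS has no partially written page to read back first, which is the read-before-write cost the small-write problem describes.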

STREAM-PACK-LOC
• Grouping requests according to their origin web server before storing them to the disk.
Fig. 2. Streaming into locality buffers.
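
The grouping step can be sketched as a small set of per-server buffers placed in front of the disk writer. The details here are assumptions for illustration only: the class `LocalityBuffers`, counting capacity in objects rather than bytes, and evicting the least recently used buffer when a new server appears are not taken from the slides.

```python
from collections import OrderedDict
from urllib.parse import urlsplit


class LocalityBuffers:
    """Toy locality buffers: group objects by origin server before writing."""

    def __init__(self, sink, num_buffers=2, capacity=3):
        self.sink = sink                # callable(host, urls): stand-in for one contiguous write
        self.num_buffers = num_buffers  # how many servers can be buffered at once
        self.capacity = capacity        # objects per buffer (a real one would count bytes)
        self.buffers = OrderedDict()    # host -> buffered urls, least recently used first

    def add(self, url):
        host = urlsplit(url).hostname
        if host not in self.buffers:
            if len(self.buffers) >= self.num_buffers:
                # Assumed policy: flush the least recently used buffer to make room.
                old_host, old_urls = self.buffers.popitem(last=False)
                self.sink(old_host, old_urls)
            self.buffers[host] = []
        self.buffers.move_to_end(host)
        self.buffers[host].append(url)
        if len(self.buffers[host]) >= self.capacity:
            # A full buffer is written out as one cluster of related objects.
            self.sink(host, self.buffers.pop(host))

    def flush_all(self):
        while self.buffers:
            host, urls = self.buffers.popitem(last=False)
            self.sink(host, urls)
```

Interleaved requests for two servers thus reach the disk as two clusters, one per server, rather than in arrival order.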

Simulation-Based Evaluation
Fig. 3. Evaluation methodology.

Meta-Data Overhead Reduction Evaluation
Fig. 4. File management overhead for web proxies.

Optimizing Write Throughput
1) Streaming Write Throughput
Fig. 5. Performance of BUDDY and STREAM.

Optimizing Write Throughput (Cont.)
2) Achieving Maximum Write Bandwidth
Fig. 6. Performance of STREAM and STREAM-PACK.

Optimizing Write Throughput (Cont.)
3) Latency Issues
Fig. 7. Cumulative distribution of (a) URL-write and (b) URL-read operation latency for STREAM-PACK and SQUID.

Preserving the Locality of the URL Stream
1) Performance Evaluation of Locality Buffers
Fig. 8. Average size of contiguous free disk blocks as a function of time.

Preserving the Locality of the URL Stream (Cont.)
• Locality buffers not only cluster the free space more effectively; they also populate the allocated space with clusters of related documents.
Fig. 9. Cumulative distribution of distances between read requests.

Preserving the Locality of the URL Stream (Cont.)
2) Latency Issues
Fig. 10. Cumulative distribution of latency of URL-read operations for SQUID, STREAM-PACK, and STREAM-PACK-LOC.

Performance Evaluation Summary
Table 1. Performance of traditional and WebCoSM techniques.

Implementation Result

Conclusion
• Methods like WebCoSM that reduce disk head movements and stream data to disk will result in increasingly large performance improvements.
• Furthermore, web-conscious storage management methods will not only result in better performance but also help to expose areas for further research in discovering and exploiting the locality in the Web.