Chapter 12.6 Consistency and Replication

Berkay Aydin & Zhuoli Lin November 11th, 2015

Consistency and Replication
- Web-Proxy Caching
- Replication for Web Hosting Systems
  - Metric Estimation
  - Adaptation Triggering
  - Adjustment Measures
- Replication of Web Applications

Consistency and Replication Part I
Perhaps one of the most important systems-oriented developments in Web-based distributed systems is ensuring that access to Web documents meets stringent performance and availability requirements. These requirements have led to numerous proposals for caching and replicating Web content, several of which are discussed in this section. Whereas the original schemes (which are still widely deployed) targeted static content, much effort is now also going into supporting dynamic content, that is, documents that are generated as the result of a request, as well as those containing scripts and the like. An excellent and complete picture of Web caching and replication is provided by Rabinovich and Spatscheck (2002).

Web Proxy Caching: Client-side caching
- Browsers: simple caching facility (local cache)
  - Documents are stored in the client's browser cache
- Web proxy on the client side (shared cache)
  - The client-side proxy can cache documents
  - It returns the cached document to the client when another request for it comes in
- Hierarchical caching (regions, countries, etc.)
  - Reduces network traffic
  - May incur higher latency, because several caches may have to be checked
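The shared-cache behavior can be illustrated with a minimal sketch. The class name `ProxyCache`, the `fetch_from_origin` callback, and the in-memory dictionary are all assumptions made for illustration; a real proxy would also handle headers, expiry, and eviction.

```python
from typing import Callable, Dict


class ProxyCache:
    """Hypothetical shared client-side proxy cache (in-memory, no expiry)."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin      # stand-in for contacting the Web server
        self._store: Dict[str, bytes] = {}   # url -> cached document
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        # Serve from the shared cache when any client has requested the URL before.
        if url in self._store:
            self.hits += 1
            return self._store[url]
        # Cache miss: fetch from the server, keep a copy, then answer the client.
        self.misses += 1
        body = self._fetch(url)
        self._store[url] = body
        return body
```

With this sketch, two clients asking the same proxy for the same URL cause only one request to reach the server.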

Web Proxy Caching: Cooperative (distributed) caching
- When a cache miss occurs, the proxy checks its neighboring proxies
  - If a neighbor has the document, it sends it
  - Otherwise the request is forwarded to the Web server
- Trade-offs between hierarchical and cooperative caching
  - Cooperative: lower transmission time, less strict storage requirements
  - Hierarchical: lower expected latency
Image taken from [1]
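The cooperative lookup order (local cache, then neighbors, then the Web server) might be sketched as follows. `CooperativeProxy`, `peek`, and the returned source label are hypothetical names chosen for this illustration, and the neighbor query is modeled as a direct method call rather than a network message.

```python
from typing import Callable, Dict, List, Optional, Tuple


class CooperativeProxy:
    """Hypothetical proxy that asks its neighbors before going to the server."""

    def __init__(self, name: str, fetch_from_origin: Callable[[str], bytes]) -> None:
        self.name = name
        self._fetch = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.neighbors: List["CooperativeProxy"] = []

    def peek(self, url: str) -> Optional[bytes]:
        # Answer a neighbor's query from the local cache only (no forwarding).
        return self._store.get(url)

    def get(self, url: str) -> Tuple[bytes, str]:
        # 1. Local cache.
        local = self._store.get(url)
        if local is not None:
            return local, "local"
        # 2. Neighboring proxies.
        for neighbor in self.neighbors:
            body = neighbor.peek(url)
            if body is not None:
                self._store[url] = body
                return body, "neighbor:" + neighbor.name
        # 3. Forward the request to the Web server.
        body = self._fetch(url)
        self._store[url] = body
        return body, "origin"
```

The second element of the returned tuple only exists to make the path taken visible; it is not part of any real protocol.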

Web Proxy Caching: Cache consistency
- Conditional HTTP request (If-Modified-Since header)
  - If the document has been modified since the time given in the header, the server returns the new version
  - Otherwise the Web proxy returns its cached copy
  - Drawback: the proxy contacts the server on every request
- Squid Web proxy
  - Assigns an expiration time T_expire that depends on how long ago the document was last modified at the time it was cached: T_expire = α(T_cached - T_last_modified) + T_cached
  - Until T_expire the document is considered valid (in practice α can be set to 0.2)
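Both consistency checks can be sketched in a few lines. The expiration rule assumed here is the one given for Squid in [1], T_expire = α(T_cached - T_last_modified) + T_cached; the function names and the use of plain numeric timestamps are assumptions for illustration, not Squid's actual code.

```python
from typing import Optional, Tuple


def squid_expire_time(t_cached: float, t_last_modified: float, alpha: float = 0.2) -> float:
    """Expiration time under the assumed rule: alpha * age-at-caching + t_cached."""
    return alpha * (t_cached - t_last_modified) + t_cached


def is_fresh(now: float, t_cached: float, t_last_modified: float, alpha: float = 0.2) -> bool:
    # The cached copy is served without contacting the server until T_expire.
    return now < squid_expire_time(t_cached, t_last_modified, alpha)


def conditional_get(if_modified_since: float, last_modified: float,
                    body: bytes) -> Tuple[int, Optional[bytes]]:
    """Server side of an If-Modified-Since request (timestamps as numbers)."""
    if last_modified > if_modified_since:
        return 200, body   # modified: send the new version
    return 304, None       # 304 Not Modified: the proxy serves its cached copy
```

For example, a document last modified at time 0 and cached at time 100 is treated as valid until time 120 when α = 0.2, so older documents are trusted for longer.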

Web Proxy Caching: Problems with Squid and Alternatives
- Problems with Squid
  - Weaker consistency: the proxy may return a stale document
  - The proxy has no way to detect this
- Alternatives
  - The server notifies proxies of a change by sending an invalidation
  - Downside: scalability, since the server must keep track of the proxies
  - Invalidation can still outperform other methods in terms of bandwidth and perceived latency
- Web proxy caching is meant for static content
- Cache replacement strategy: LRU
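LRU replacement combined with server-sent invalidation can be sketched with the standard library's `OrderedDict`. The class and method names are hypothetical, and the invalidation message is modeled as a plain method call.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Hypothetical proxy cache with LRU replacement and explicit invalidation."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, url: str) -> Optional[bytes]:
        if url not in self._items:
            return None
        self._items.move_to_end(url)         # mark as most recently used
        return self._items[url]

    def put(self, url: str, body: bytes) -> None:
        self._items[url] = body
        self._items.move_to_end(url)
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the least recently used entry

    def invalidate(self, url: str) -> None:
        # Called when the server announces that the document has changed.
        self._items.pop(url, None)
```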

Replication for Web Hosting Systems
- Content Delivery Network (CDN)
  - Maintaining the content of a Web site
  - Ensuring that the site is accessible
- CDNs act as a Web hosting service
  - They replicate the content at different sites
  - Self-managing system: automatic distribution and replication
- Three aspects of CDNs
  - Metric estimation
  - Adaptation triggering
  - Taking appropriate measures: replica placement, consistency enforcement, client-request routing

Replication - Metric Estimation
- Trade-offs (access time vs. cost)
- Latency metrics
  - Time spent fetching a document
  - Available bandwidth (bandwidth between two nodes), important for large document transfers
- Spatial metrics
  - Distance between nodes (number of network-level routing hops)
- Network usage metrics
  - Consumed bandwidth, number of bytes to transfer
- Consistency metrics
  - Extent to which a replica deviates from its master copy
- Financial metrics
  - Financial performance

Replication - Adaptation Triggering
- When and how adaptations are triggered
- Simple approach: periodically estimate metrics and take measures as needed
- Responding to a flash crowd
  - Flash-crowd prediction
    - Use a window of recent request counts
    - Apply linear regression to extrapolate the request rate
    - Warn when the predicted number of requests passes a predetermined threshold
  - It is hard to choose the threshold and the window size
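A flash-crowd predictor of this kind might look like the sketch below: fit a least-squares line through the request counts in the window, extrapolate a few intervals ahead, and compare against a threshold. The function names, the `horizon` parameter, and the example numbers are assumptions for illustration.

```python
from typing import Sequence


def predict_load(window: Sequence[float], horizon: int) -> float:
    """Extrapolate the request count `horizon` intervals past the end of the window."""
    n = len(window)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")
    mean_x = (n - 1) / 2
    mean_y = sum(window) / n
    # Ordinary least-squares slope and intercept over x = 0 .. n-1.
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(window))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + horizon)


def flash_crowd_warning(window: Sequence[float], horizon: int, threshold: float) -> bool:
    # Raise a warning when the extrapolated load passes the threshold.
    return predict_load(window, horizon) > threshold
```

With counts of 10, 20, 30, 40 requests per interval, the fitted line predicts 60 requests two intervals ahead, so a threshold of 50 would trigger a warning. The difficulty noted on the slide shows up directly in the two tunable inputs: the window length and the threshold.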

Replication - Adjustment Measures
- Deciding how and when to redirect client requests
- Embedded documents
  1. Get the base document
  2. DNS lookup from the regular DNS system
  3. DNS lookup from the CDN DNS system
  4. Get the embedded documents from the CDN server
     - If cached: use the cache
     - Otherwise: use the origin server
- Perceived performance
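The last two steps can be sketched roughly as follows, under the assumption that the CDN's DNS system picks the replica with the lowest estimated latency to the client. The function names, the latency table, and the dictionary used as a replica cache are hypothetical; real CDNs weigh several metrics when routing requests.

```python
from typing import Callable, Dict


def cdn_dns_lookup(latency_ms_by_replica: Dict[str, float]) -> str:
    """Return the replica address with the lowest estimated latency to the client."""
    if not latency_ms_by_replica:
        raise ValueError("no replicas available")
    return min(latency_ms_by_replica, key=latency_ms_by_replica.__getitem__)


def fetch_embedded(path: str, replica_cache: Dict[str, bytes],
                   fetch_from_origin: Callable[[str], bytes]) -> bytes:
    """Serve an embedded document from the CDN server, falling back to the origin."""
    if path in replica_cache:
        return replica_cache[path]       # cached at the CDN server
    body = fetch_from_origin(path)       # not cached: go to the origin server
    replica_cache[path] = body
    return body
```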

Replication of Web Applications
- Edge server stores replicated data
- Replication can be
  - Partial
  - Full
- Full replication: works well with a low update ratio and frequent joins
- Partial replication: which data should be stored?
  - Content-aware caching: works well with repeated queries; consistency problem
  - Content-blind caching: caches the query results

Consistency and Replication Part II
- Introduction
- Related Work
- Techniques to Scale Web Applications

Introduction
Developers often use replication and caching mechanisms to enhance Web application performance. This paper presents a qualitative and quantitative analysis of state-of-the-art replication and caching techniques used to host Web applications.

Related Work
Web sites can be slow for many reasons, but the most prevalent one is the dynamic generation of Web documents. Dynamic generation of a Web page typically requires issuing one or more queries to a database, so access time to the database can easily get out of hand when the request load is high. There are several techniques to overcome this problem. The most straightforward one is Web page caching. This technique works well if the same cached HTML page can answer many requests to a particular Web site. With the growing drive toward personalized Web content, however, generated pages tend to be unique for each user, which reduces the benefits of page-caching techniques.

Techniques to Scale Web Applications
Instead of caching the dynamic pages generated by a central Web server, various techniques aim to replicate the means of generating pages over multiple edge servers. They typically provide "read-your-writes" consistency, which guarantees that when an application at an edge server performs an update, any subsequent read from the same edge server returns that update's effect.
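A minimal sketch of the read-your-writes guarantee, assuming each edge server remembers the updates it has issued and consults them before its possibly stale local copy. `EdgeServer`, `sync`, and the shared dictionary standing in for the central database are hypothetical.

```python
from typing import Dict, Optional


class EdgeServer:
    """Hypothetical edge server offering read-your-writes on a stale replica."""

    def __init__(self, origin: Dict[str, str]) -> None:
        self._origin = origin                    # stand-in for the central database
        self._replica = dict(origin)             # local copy, refreshed only by sync()
        self._own_writes: Dict[str, str] = {}    # updates issued through this server

    def write(self, key: str, value: str) -> None:
        self._origin[key] = value                # the update goes to the origin
        self._own_writes[key] = value            # and is remembered locally

    def read(self, key: str) -> Optional[str]:
        # Reads at this edge server always reflect this server's own updates.
        if key in self._own_writes:
            return self._own_writes[key]
        return self._replica.get(key)

    def sync(self) -> None:
        # Refresh the local copy; the remembered writes are now part of it.
        self._replica = dict(self._origin)
        self._own_writes.clear()
```

Note that the guarantee is per edge server: a second edge server may keep returning the old value until it synchronizes.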

Techniques to Scale Web Applications: Edge Computing
The simplest way to generate user-specific pages is to replicate the application code at multiple edge servers and keep the data centralized.
Drawbacks
- If the edge servers are located worldwide, each data access incurs wide-area network latency.
- The central database quickly becomes a performance bottleneck because it needs to serve the entire system's database requests.

Techniques to Scale Web Applications: Data Replication
To solve the database bottleneck problem, data replication places the data at each edge server so that generating a page requires only local computation and data access. This technique maintains identical copies of the database at multiple locations.
Drawbacks
- If a Web application generates many database updates, each update must be propagated to all the other replicas to maintain consistency.
- This can introduce a large amount of network traffic and performance overhead.

Techniques to Scale Web Applications: Content-Aware Data Caching
Instead of maintaining full copies of the database at each edge server, content-aware caching systems cache database query results as the application code issues them. Each edge server maintains a partial copy of the database, and each time the application running at the edge issues a query, the edge-server database checks whether it contains enough data locally to answer the query correctly.
Drawbacks
- This method can reduce the cache hit rate.
- Update queries always execute at the origin server.
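The local check can be illustrated with a deliberately simplified case: range queries on a single column, where a query can be answered at the edge if its range lies inside a range whose rows were fetched earlier. The `Range` alias and `can_answer_locally` are hypothetical, and real systems perform a much more general query-containment test.

```python
from typing import List, Tuple

Range = Tuple[float, float]  # inclusive (low, high) bounds on one column


def can_answer_locally(query: Range, cached: List[Range]) -> bool:
    """True if some single cached range fully contains the query range.

    Simplification: ranges that only cover the query when combined are
    treated as a miss.
    """
    low, high = query
    return any(c_low <= low and high <= c_high for c_low, c_high in cached)
```

For instance, if the edge server has already cached all rows with a price between 0 and 100, a later query for prices between 10 and 50 can be answered locally, while a query for prices between 90 and 120 has to go to the origin server.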

Techniques to Scale Web Applications: Content-Blind Data Caching
Edge servers do not need to run a database at all. They store the results of remote database queries independently of one another, so cache replacement is simple and many popular replacement algorithms can be applied.
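A content-blind cache can be sketched as a lookup table keyed by the query text and its parameters, with no knowledge of what the results mean. `QueryResultCache`, the `run_remote_query` callback, and the coarse `invalidate_all` method are assumptions for illustration; they are not taken from GlobeCBC [3].

```python
from typing import Callable, Dict, Sequence, Tuple

Row = Tuple[object, ...]
Key = Tuple[str, Tuple[object, ...]]


class QueryResultCache:
    """Hypothetical content-blind cache: stores results per (query, parameters)."""

    def __init__(self, run_remote_query: Callable[[str, Tuple[object, ...]], Sequence[Row]]) -> None:
        self._run = run_remote_query             # stand-in for the origin database
        self._results: Dict[Key, Sequence[Row]] = {}
        self.remote_calls = 0

    def query(self, sql: str, params: Sequence[object] = ()) -> Sequence[Row]:
        key = (sql, tuple(params))
        if key not in self._results:
            # Miss: run the query remotely and store the result as an opaque value.
            self.remote_calls += 1
            self._results[key] = self._run(sql, tuple(params))
        return self._results[key]

    def invalidate_all(self) -> None:
        # Coarse invalidation after an update; finer schemes are possible.
        self._results.clear()
```

Because every entry is independent, the dictionary could be swapped for any standard replacement policy such as LRU without changing the lookup logic.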

Comparison
[Figures: RUBBoS benchmark, TPC-W browsing, TPC-W ordering]

Consistency and Replication Part III
- Future Work

Future Work
The authors plan to build and evaluate a prototype system that enables dynamic provisioning and reconfiguration of multitier Web applications. A combination of an end-to-end analytical model and virtual caches will determine the optimal resource configuration for a given application.

References
[1] Tanenbaum, A., & van Steen, M. (2007). Distributed systems: Principles and paradigms (2nd ed.). Upper Saddle River, NJ: Pearson Prentice Hall.
[2] Sivasubramanian, S., Pierre, G., van Steen, M., & Alonso, G. (2007). Analysis of caching and replication strategies for web applications. IEEE Internet Computing, 11(1).
[3] Sivasubramanian, S., Pierre, G., van Steen, M., & Alonso, G. (2006). GlobeCBC: Content-blind result caching for dynamic web applications. Technical Report IR-CS-022, Vrije Universiteit, Amsterdam, The Netherlands.
