Caching Willem Visser RW334. Overview AppEngine Datastore No Caching Naïve Caching Caching invalidation Cache updating Memcached Beyond your code.

1 Caching Willem Visser RW334

2 Overview
AppEngine Datastore
No Caching
Naïve Caching
Cache invalidation
Cache updating
Memcached
Beyond your code

3 AppEngine Python Datastore
Datastore
– db
    Old and will be going away at some point
– ndb (https://developers.google.com/appengine/docs/python/ndb/)
    New and supports some cool features

    from google.appengine.ext import ndb

    class Stuff(ndb.Model):
        title = ndb.StringProperty(required = True)
        content = ndb.StringProperty(required = True)
        date = ndb.DateTimeProperty(auto_now_add=True)

4 NDB
Python class defines the model
Each entity has a key, which in turn has a parent, up to the root that has no parent
– Entities in this chain are in the same group
– Entities in the same group have consistency guarantees

    stuff_title = self.request.get('stuff_name')
    # Note: Stuff.title is declared required, so a real handler must set it too.
    stuff = Stuff(parent=ndb.Key("Things", stuff_title or "*notitle*"),
                  content = self.request.get('content'))
    stuff.put()

5 NDB (2)
Queries and Indexes
There are very many ways to query
Complex queries might need complex indexes
– NDB creates simple indexes automatically
– Complex ones can be defined in index.yaml
GQL is similar to SQL
A query only gets executed when its results are accessed

    # In ndb the GQL entry point is ndb.gql(); GqlQuery belongs to the older db API.
    stuff = ndb.gql("SELECT * FROM Stuff ORDER BY created DESC LIMIT 10")
    stuff = list(stuff)  # iterating the query is what runs it

6 No Caching
Every db_read hits the database
Database reads tend not to be the fastest thing
This can therefore be very inefficient

7 Example No Caching

    from google.appengine.ext import db

    def top_stuff():
        # Runs the query against the datastore on every call.
        stuff = db.GqlQuery("SELECT * FROM Stuff ORDER BY created DESC LIMIT 10")
        return list(stuff)

    class MainPage(Handler):
        def render_front(self, …):
            stuff = top_stuff()
            self.render("front.html", stuff = stuff, …)

        def get(self):
            self.render_front()

        def post(self):
            title = …
            data = …
            a = Stuff(title = title, data = data)
            a.put()
            self.redirect("/")

8 Naïve Caching
This will do wonders for performance
If the cache is too large it might start to slow down a bit
The code below avoids the db_read on a cache hit; rendered HTML could also be cached if rendering takes a lot of time

    if request not in cache:
        cache[request] = db_read()
    return cache[request]
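The naïve pattern on this slide can be tried outside App Engine. The following is a minimal sketch, assuming a fake database: `db_read`, `cached_read`, and the `DB_READS` counter are illustrative names, not App Engine APIs.

```python
# Hypothetical stand-in for a slow database; nothing here is an App Engine API.
DB_READS = 0
cache = {}

def db_read(request):
    """Pretend to query the database and count how often that happens."""
    global DB_READS
    DB_READS += 1
    return "result for " + request

def cached_read(request):
    """Return the cached result, reading the database only on a miss."""
    if request not in cache:
        cache[request] = db_read(request)
    return cache[request]

first = cached_read("/top")    # miss: goes to the database
second = cached_read("/top")   # hit: served from the dictionary
```

Both calls return the same value, but only the first one touches the fake database.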

9 Example No Caching (same code as slide 7)

10 Example

    import logging
    from google.appengine.ext import db

    CACHE = {}

    def top_stuff():
        key = 'top'
        # .get() returns None on a miss; CACHE[key] would raise KeyError.
        stuff = CACHE.get(key)
        if stuff is None:
            logging.error("DB QUERY")
            stuff = db.GqlQuery("SELECT * FROM Stuff ORDER BY created DESC LIMIT 10")
            stuff = list(stuff)
            CACHE[key] = stuff
        return stuff

    class MainPage(Handler):
        def render_front(self, …):
            stuff = top_stuff()
            self.render("front.html", stuff = stuff, …)

        def get(self):
            self.render_front()

        def post(self):
            title = …
            data = …
            a = Stuff(title = title, data = data)
            a.put()
            self.redirect("/")

11 New data?
Will the previous solution work?
What happens if you add new data
– Added to the DB and then redirect to /
– render_front calls top_stuff
– However cache is hit and we get the old data
Cache must be invalidated when new data comes

12 Clear Cache

    CACHE = {}

    def top_stuff():
        …

    class MainPage(Handler):
        def render_front(self, …):
            stuff = top_stuff()
            self.render("front.html", stuff = stuff, …)

        def get(self):
            self.render_front()

        def post(self):
            title = …
            data = …
            a = Stuff(title = title, data = data)
            a.put()
            CACHE.clear()  # invalidate so the next page view re-reads the DB
            self.redirect("/")

13 Cache Stampede
If one user writes new data
– Cache gets cleared
Now lots of users all access the site at the same time
– All of them doing db_reads since the cache is empty
This hammers the DB and slows everybody down
– Depending on settings the DB might also block or even crash
Without any caching this could also happen
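A stampede can be reproduced in a small simulation. This is only a sketch in plain Python 3, assuming threads stand in for simultaneous users; `slow_db_read`, `handle_request`, and the barrier are illustrative devices, not part of App Engine. The barrier makes every request finish its cache check before any of them refills the cache, which is what "all at the same time" amounts to.

```python
import threading

USERS = 5
CACHE = {}                      # just cleared by a write
DB_READS = 0
counter_lock = threading.Lock()
all_checked = threading.Barrier(USERS)

def slow_db_read():
    """Hypothetical database query; only counts how often it is called."""
    global DB_READS
    with counter_lock:
        DB_READS += 1
    return ["newest", "older"]

def handle_request():
    stuff = CACHE.get("top")    # every request sees an empty cache
    all_checked.wait()          # all requests arrive before any refill
    if stuff is None:
        CACHE["top"] = slow_db_read()

threads = [threading.Thread(target=handle_request) for _ in range(USERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every one of the simulated users ends up querying the database, even though a single read would have been enough to refill the cache.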

14 Cache Refresh

    def top_stuff(update = False):
        key = 'top'
        # .get() returns None on a miss; CACHE[key] would raise KeyError.
        stuff = CACHE.get(key)
        if stuff is None or update:
            logging.error("DB QUERY")
            stuff = db.GqlQuery("SELECT * FROM Stuff ORDER BY created DESC LIMIT 10")
            stuff = list(stuff)
            CACHE[key] = stuff
        return stuff

    class MainPage(Handler):
        def render_front(self, …):
            stuff = top_stuff()
            self.render("front.html", stuff = stuff, …)

        def get(self):
            self.render_front()

        def post(self):
            title = …
            data = …
            a = Stuff(title = title, data = data)
            a.put()
            top_stuff(True)  # the writer refreshes the cache instead of clearing it
            self.redirect("/")
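The effect of refreshing instead of clearing can be checked with a fake database. This is a minimal sketch, assuming an in-memory list as the "database"; `DB`, `query_top`, and `DB_READS` are hypothetical names introduced only for the illustration.

```python
DB = ["a", "b"]     # hypothetical in-memory "database", newest item last
CACHE = {}
DB_READS = 0

def query_top():
    """Stand-in for the GQL query: newest ten items first."""
    global DB_READS
    DB_READS += 1
    return list(reversed(DB))[:10]

def top_stuff(update=False):
    stuff = CACHE.get("top")
    if stuff is None or update:
        stuff = query_top()
        CACHE["top"] = stuff
    return stuff

def post(title):
    DB.append(title)
    top_stuff(update=True)      # one DB read, paid by the writer

top_stuff()                                  # first page view: cache miss
post("c")                                    # submit: refreshes the cache
views = [top_stuff() for _ in range(100)]    # later page views: no DB reads
```

After the submit, the hundred page views are all served from the cache and already include the new item, so readers never see an empty cache.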

15 Cache Update
Most aggressive solution
– No DB reads!
On new data, store it in the DB and also directly in the cache, without reading from the DB
The DB is now just backup storage in case something goes wrong, such as a server going down
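One way to picture the update approach is the sketch below. It assumes the cache was already filled when the server started (a cold cache would still need one DB read), and `DB`, `post`, and `LIMIT` are illustrative names rather than anything from the slides.

```python
LIMIT = 10
DB = []                  # hypothetical durable store, only ever written to here
CACHE = {"top": []}      # assumed to be warm already

def post(title):
    """Write to the DB, then patch the cached list in place of a DB read."""
    DB.append(title)
    CACHE["top"] = ([title] + CACHE["top"])[:LIMIT]

def top_stuff():
    """Page views never touch the DB in this scheme."""
    return CACHE["top"]

for n in range(12):
    post("item %d" % n)

front_page = top_stuff()
```

The cached list keeps only the newest ten items, while the store holds all twelve. The price of this approach is that the cache-patching logic has to mirror exactly what the query would have returned.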

16 Cache Comparisons

Cache Approach | DB_Read on page view | DB_Read on submit | Wrong results
None           | Always               | None              |
Naïve          | Cache miss           | None              | Yes
Refresh        | Seldom               | Once              |
Update         | None                 |                   |

17 Sharing a Cache
If we have more than one server
Do we have a cache for each server, or share a cache amongst servers?
A cache for each server can have suboptimal behavior if they are not synchronized
– Data might be in the cache on server 1 and not server 2, for example
A good solution is to use a very fast shared cache
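The difference between per-server and shared caches can be illustrated with plain dictionaries. This is a sketch only: the `Server` class is hypothetical, and a dictionary passed to two servers stands in for a shared cache service.

```python
class Server:
    """Hypothetical web server that caches the front page in a dict."""

    def __init__(self, db, cache):
        self.db = db
        self.cache = cache

    def top(self):
        if "top" not in self.cache:
            self.cache["top"] = list(self.db)
        return self.cache["top"]

    def post(self, item):
        self.db.append(item)
        self.cache["top"] = list(self.db)   # refreshes only this server's cache

# One private cache per server: server 2 keeps serving stale data.
db = ["a"]
s1, s2 = Server(db, {}), Server(db, {})
s1.top(); s2.top()
s1.post("b")
private_view = (s1.top(), s2.top())

# One cache shared by both servers: the refresh is visible everywhere.
db2 = ["a"]
shared = {}
t1, t2 = Server(db2, shared), Server(db2, shared)
t1.top(); t2.top()
t1.post("b")
shared_view = (t1.top(), t2.top())
```

With private caches the second server still returns the old list after the write; with the shared dictionary both servers return the new one.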

18 Memcached
See http://memcached.org/
Very fast, in-memory, key-value store
Caching technology behind very many websites
Support for it within AppEngine

    from google.appengine.api import memcache
    …
    def top_stuff(update = False):
        key = 'top'
        stuff = memcache.get(key)  # None on a miss
        if update or stuff is None:
            stuff = db.GqlQuery("SELECT * FROM Stuff ORDER BY created DESC LIMIT 10")
            stuff = list(stuff)
            memcache.set(key, stuff)
        return stuff

19 NDB and Caching
Two caches controlled by policies
– In-context cache (microseconds)
    Only lives for the current HTTP request
    Writes go to the datastore and the cache; reads check the cache first
– Memcache (milliseconds)
    All non-transactional contexts cache here
    All contexts share the same memcache
    Within a transaction memcache is not used
Can be configured by policies
– Some standard ones available
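The read path described on this slide can be modelled in a few lines. The sketch below is a simplified simulation and not NDB's implementation: `TwoLevelReader` is a hypothetical class, dictionaries stand in for memcache and the datastore, and transactions and policies are ignored.

```python
class TwoLevelReader:
    """One instance per HTTP request; memcache and datastore are shared."""

    def __init__(self, memcache, datastore):
        self.context_cache = {}     # lives only as long as this request
        self.memcache = memcache
        self.datastore = datastore
        self.trace = []             # records which layer answered each read

    def get(self, key):
        if key in self.context_cache:
            self.trace.append("context")
            return self.context_cache[key]
        if key in self.memcache:
            self.trace.append("memcache")
            value = self.memcache[key]
        else:
            self.trace.append("datastore")
            value = self.datastore[key]
            self.memcache[key] = value
        self.context_cache[key] = value
        return value

memcache = {}
datastore = {"k": "v"}

request1 = TwoLevelReader(memcache, datastore)
request1.get("k")       # nothing cached yet: datastore
request1.get("k")       # same request: in-context cache

request2 = TwoLevelReader(memcache, datastore)
request2.get("k")       # new request, shared memcache already filled
```

The first request pays for the datastore read once; the second request starts with an empty in-context cache but is answered by the shared layer.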

20 More Caching
Some caches also live outside the developer's immediate control
Browser Cache
– Single user
Proxy Cache
– Multiple users
Gateway Cache
– Distributed by Content Delivery Networks
HTTP 1.1 supports the "Cache-Control" header
– Allows developers to control how things are cached
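As an illustration of the Cache-Control header, the sketch below builds header values from a few standard directives (`public`, `private`, `max-age`, `no-store`). The helper `cache_control` and its parameters are hypothetical; how the resulting header is attached to a response depends on the web framework in use.

```python
def cache_control(max_age=None, shared=True, no_store=False):
    """Build a Cache-Control value from a handful of standard directives."""
    if no_store:
        return "no-store"                   # nothing may keep a copy
    parts = ["public" if shared else "private"]
    if max_age is not None:
        parts.append("max-age=%d" % max_age)  # freshness lifetime in seconds
    return ", ".join(parts)

# Static assets: any browser, proxy, or gateway cache may keep them for an hour.
static_headers = {"Cache-Control": cache_control(max_age=3600)}

# Per-user pages: only the user's own browser cache may keep them, briefly.
personal_headers = {"Cache-Control": cache_control(max_age=60, shared=False)}
```

The `public` and `private` directives map onto the slide's distinction between caches that serve multiple users and the single-user browser cache.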

