CSE-291 Cloud Computing, Fall 2016 Kesden

Slides:



Advertisements
Similar presentations
Crawling, Ranking and Indexing. Organizing the Web The Web is big. Really big. –Over 3 billion pages, just in the indexable Web The Web is dynamic Problems:
Advertisements

Acknowledgments Byron Bush, Scott S. Hilpert and Lee, JeongKyu
ITIS 3110 Jason Watson. Replication methods o Primary/Backup o Master/Slave o Multi-master Load-balancing methods o DNS Round-Robin o Reverse Proxy.
Cache Craftiness for Fast Multicore Key-Value Storage
K ALI ROHAN KOKA AZEEM HIRANI. A Simple database Implementing a Dictionary where keys are associated with values. For e.g.: It can set the key “surname_1987”
Approaches to EJB Replication. Overview J2EE architecture –EJB, components, services Replication –Clustering, container, application Conclusions –Advantages.
Fall 2007cs4251 Distributed Computing Umar Kalim Dept. of Communication Systems Engineering 31/10/2007.
1 - Oracle Server Architecture Overview
Overview Distributed vs. decentralized Why distributed databases
Google Bigtable A Distributed Storage System for Structured Data Hadi Salimi, Distributed Systems Laboratory, School of Computer Engineering, Iran University.
EEC-681/781 Distributed Computing Systems Lecture 3 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
A Dynamic Caching Mechanism for Hadoop using Memcached Gurmeet Singh Puneet Chandra Rashid Tahir University of Illinois at Urbana Champaign Presenter:
31 January 2007Craig E. Ward1 Large-Scale Simulation Experimentation and Analysis Database Programming Using Java.
Distributed Data Stores – Facebook Presented by Ben Gooding University of Arkansas – April 21, 2015.
PMIT-6102 Advanced Database Systems
Distributed Systems Tutorial 11 – Yahoo! PNUTS written by Alex Libov Based on OSCON 2011 presentation winter semester,
CSC 456 Operating Systems Seminar Presentation (11/13/2012) Leon Weingard, Liang Xin The Google File System.
Performance Concepts Mark A. Magumba. Introduction Research done on 1058 correspondents in 2006 found that 75% OF them would not return to a website that.
By Lecturer / Aisha Dawood 1.  You can control the number of dispatcher processes in the instance. Unlike the number of shared servers, the number of.
RAPID-Cache – A Reliable and Inexpensive Write Cache for Disk I/O Systems Yiming Hu Qing Yang Tycho Nightingale.
Introduction to Hadoop and HDFS
How computer’s are linked together.
IMDGs An essential part of your architecture. About me
Data in the Cloud – I Parallel Databases The Google File System Parallel File Systems.
MapReduce and GFS. Introduction r To understand Google’s file system let us look at the sort of processing that needs to be done r We will look at MapReduce.
VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation MongoDB Architecture.
Eduardo Gutarra Velez. Outline Distributed Filesystems Motivation Google Filesystem Architecture The Metadata Consistency Model File Mutation.
Windows Azure Conference 2014 Caching Data in the Cloud with Windows Azure.
GFS. Google r Servers are a mix of commodity machines and machines specifically designed for Google m Not necessarily the fastest m Purchases are based.
Process Architecture Process Architecture - A portion of a program that can run independently of and concurrently with other portions of the program. Some.
HADOOP DISTRIBUTED FILE SYSTEM HDFS Reliability Based on “The Hadoop Distributed File System” K. Shvachko et al., MSST 2010 Michael Tsitrin 26/05/13.
NoSQL Or Peles. What is NoSQL A collection of various technologies meant to work around RDBMS limitations (mostly performance) Not much of a definition...
History & Motivations –RDBMS History & Motivations (cont’d) … … Concurrent Access Handling Failures Shared Data User.
 Distributed Database Concepts  Parallel Vs Distributed Technology  Advantages  Additional Functions  Distribution Database Design  Data Fragmentation.
1 Lecture 20: Big Data, Memristors Today: architectures for big data, memristors.
Prof. Amr Goneid, AUC1 CSCI 210 Data Structures and Algorithms Prof. Amr Goneid AUC Part 5. Dictionaries(2): Hash Tables.
DISTRIBUTED FILE SYSTEM- ENHANCEMENT AND FURTHER DEVELOPMENT BY:- PALLAWI(10BIT0033)
Cassandra as Memcache Edward Capriolo Media6Degrees.com.
CS222: Principles of Data Management Lecture #4 Catalogs, Buffer Manager, File Organizations Instructor: Chen Li.
Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung
REPLICATION & LOAD BALANCING
Managing Multi-User Databases
Hadoop.
Advanced Topics in Concurrency and Reactive Programming: Case Study – Google Cluster Majeed Kassis.
The Case for a Session State Storage Layer
Redis:~ Author Anil Sharma Data Structure server.
Integrating HA Legacy Products into OpenSAF based system
Database System Concepts and Architecture
Module Overview Installing and Configuring a Network Policy Server
INTRODUCTION TO PIG, HIVE, HBASE and ZOOKEEPER
Open Source distributed document DB for an enterprise
CSCI 210 Data Structures and Algorithms
Informatica PowerCenter Performance Tuning Tips
CSE-291 (Cloud Computing) Fall 2016
System Architecture & Hardware Configurations
Introduction to NewSQL
ASP.NET Caching.
Key Terms Windows 2008 Network Infrastructure Confiuguration Lesson 6
Google File System CSE 454 From paper by Ghemawat, Gobioff & Leung.
Gregory Kesden, CSE-291 (Storage Systems) Fall 2017
Gregory Kesden, CSE-291 (Cloud Computing) Fall 2016
What is the Azure SQL Datawarehouse?
Predictive Performance
Outline Midterm results summary Distributed file systems – continued
ColdFusion Performance Troubleshooting and Tuning
THE GOOGLE FILE SYSTEM.
Database System Architectures
Database administration
Client/Server Computing and Web Technologies
Presentation transcript:

CSE-291 Cloud Computing, Fall 2016 Kesden Memcached and Redis CSE-291 Cloud Computing, Fall 2016 Kesden

Classic Problems Web servers query from disk Main memory is wasted Underlying databases queries can be redundant and slow Distributing load among Web servers partitions main memory Redundant memory cache, not larger memory cache Dedicated caching can often maintain whole, or large portions of, the working set, limiting the need to go to disk

Common Use of Memcached (But, dedicated servers are also common) https://memcached.org/about

Memcached Overview Distributed hashtable (Key-Value Store) Except that “Forgetting is a feature” When full – LRU gets dumped Excellent for high-throughput servers Memory is much lower latency than disk Excellent for high-latency queries Caching results can prevent the need to repeat these big units of work

Memcached Architecture Servers maintain a “key-value” store Clients know about all servers Clients query server by key to get value Two hash functions Key--->Server Key-->Associative Array, within server All clients know everything.

Code, From Wikipedia function get_foo(int userid) { data = db_select("SELECT * FROM users WHERE userid = ?", userid); return data; } function get_foo(int userid) { /* first try the cache */ data = memcached_fetch("userrow:" + userid); if (!data) { /* not found : request database */ data = db_select("SELECT * FROM users WHERE userid = ?", userid); /* then store in cache until next get */ memcached_add("userrow:" + userid, data); } return data;

Memcached: No replication Designed for volatile data Failure: Just go to disk Recovery: Just turn back on and wait Need redundancy? Build above memcached Just build multiple instances

REDIS REmote DIctionary SErver Like Memcached in that it provides a key value store But, much richer Lists of strings Sets of strings (collections of non-repeating unsorted elements) Sorted sets of strings (collections of non-repeating elements ordered by a floating-point number called score) Hash tables where keys and values are strings HyperLogLogs used for approximated set cardinality size estimation. (List from Wikipedia)

REDIS REmote DIctionary SErver Each data type has associated operations, e.g. get the one in particular position in a list, intersect sets, etc. Transaction support Configurable cache policy LRU-ish, Random, keep everything, etc.

REDIS: Persistence Periodic snapshotting to disk Append-only log to disk as data is updated Compressed in background Updates to disk every 2 seconds to balance performance and risk Single process, single thread

REDIS Clustering: Old School Divide up yourself Clients hash Clients divide range Clients Interact with Proxy which does the same Hard to handle queries involving multiple keys

REDIS Clustering: New School Distribute REDIS across nodes Multiple key queries okay, so long as all keys in query in same slot Hash tags used to force keys into the same slot. this{foo}key and that{foo}key in the same slot Only {foo} is hashed