Scalable Data Srinivas Narayanan 11/13/09.

1 Scalable Data Srinivas Narayanan 11/13/09

2 Scale

3 #2 site on the Internet (time on site); >200 billion monthly page views; over 1 million developers in 180 countries; over 300 million active users; more than 2 32 photos …; 100 million search queries per day; >3.9 trillion feed actions processed per day; 2 billion pieces of content per week; 6 billion minutes per day

4 Growth Rate (chart: active users, in millions)

5 Social Networks

6 The social graph links everything

7 Scaling Social Networks: Much harder than typical websites, where typically 1-2% are online, the data is easy to cache, and partitioning and scaling are relatively easy. What do you do when everything is interconnected?

8 (diagram: many items, each labeled either "name, status, privacy, profile photo" or "name, status, privacy, video thumbnail")

9 System Architecture

10 Architecture: Load Balancer (assigns a web server); Web Server (PHP assembles data); Memcache (fast, simple); Database (slow, persistent)

11 Memcache: Simple in-memory hash table; supports get/set, delete, multiget, multiset; not a write-through cache. Pros and Cons: The Database Shield! Low latency, very high request rates. Can be easy to corrupt, inefficient for very small items.
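The slide says memcache is not a write-through cache and acts as a shield in front of the database. One common way to use such a cache is the look-aside pattern, sketched below under stated assumptions: the class name, the dict-backed "database", and the delete-on-update policy are all illustrative and not taken from the talk.

```python
class LookAsideCache:
    """Toy model: a dict stands in for the database and another for memcache."""

    def __init__(self, database):
        self.database = database  # slow, persistent tier (assumed: a plain dict)
        self.cache = {}           # fast, simple tier (assumed: a plain dict)
        self.db_reads = 0         # counts how often the "database" is hit

    def get(self, key):
        # Serve from the cache when possible; the database is only read on a miss.
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value  # populate the cache for later readers
        return value

    def update(self, key, value):
        # Not write-through: write the database, then drop the cached copy
        # so the next read repopulates it.
        self.database[key] = value
        self.cache.pop(key, None)


store = LookAsideCache({"user:1": "alice"})
store.get("user:1")          # miss: reads the database
store.get("user:1")          # hit: served from the cache
store.update("user:1", "bob")
```

Repeated reads of the same key touch the database once, which is the shielding effect the slide refers to.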

12 Memcache Optimization: Multithreading and efficient protocol code - 50k req/s; Polling network drivers - 150k req/s; Breaking up stats lock - 200k req/s; Batching packet handling - 250k req/s; Breaking up cache lock - future

13 Network Incast: Many small get requests (diagram: PHP Client, Switch, Memcache)

14 Network Incast: Many big data packets (diagram: PHP Client, Switch, Memcache)

15 Network Incast (diagram: PHP Client, Switch, Memcache)

16 Network Incast (diagram: PHP Client, Switch, Memcache)

17 Memcache Clustering: Many small objects per server; Many servers per large object

18 Memcache Clustering (diagram: PHP Client and one Memcache server holding 10 objects)

19 Memcache Clustering (diagram: PHP Client and two Memcache servers holding 5 objects each): 2 round trips total, 1 round trip per server

20 Memcache Clustering (diagram: PHP Client and three Memcache servers holding 3, 4, and 3 objects): 3 round trips total, 1 round trip per server
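The clustering slides count one round trip per server that holds any of the requested objects. A minimal sketch of that bookkeeping follows; the CRC32-modulo placement and the key names are assumptions made for illustration, not the placement scheme described in the talk.

```python
import zlib
from collections import defaultdict


def group_keys_by_server(keys, num_servers):
    """Batch keys by the server that owns them (assumed: CRC32 modulo placement)."""
    batches = defaultdict(list)
    for key in keys:
        server = zlib.crc32(key.encode("utf-8")) % num_servers
        batches[server].append(key)
    return dict(batches)


keys = [f"object:{i}" for i in range(10)]  # hypothetical keys
for num_servers in (1, 2, 3):
    batches = group_keys_by_server(keys, num_servers)
    # One multiget per server touched, so round trips == number of batches.
    print(num_servers, "server(s):", len(batches), "round trip(s)")
```

Spreading the same 10 objects over more servers never reduces the number of round trips, which is why the fanout of a request matters as much as the size of the pool.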

21 Memcache Pool Optimization: Currently a manual process; Replication for obvious hot data sets; Interesting problem: optimize the allocation based on access patterns

22 Vertical Partitioning of Object Types (diagram: General pool with wide fanout: Shard 1, Shard 2, Shard 3, ... Shard n; Specialized Replica 1: Shard 1, Shard 2; Specialized Replica 2: Shard 1, Shard 2)

23 MySQL has played a role from the beginning. Thousands of MySQL servers in two datacenters (diagram labels: Scribe, repeated)

24 MySQL Usage: Pretty solid transactional persistent store; Logical migration of data is difficult; Logical-Physical db mapping; Rarely use advanced query features. Performance: Database resources are precious; Web tier CPU is relatively cheap; Distributed data - no joins! Sound administrative model
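The slide lists a logical-physical db mapping next to the remark that logical migration of data is difficult. The sketch below shows one way such a mapping can work: each user belongs to a fixed logical database, and only the logical-to-host table changes when data moves between machines. The number of logical databases, the modulo rule, and the host names are all hypothetical.

```python
NUM_LOGICAL_DBS = 8  # hypothetical; fixed so a user's logical db never changes

# Logical database id -> physical host name (hypothetical hosts).
logical_to_physical = {db: f"mysql-host-{db % 2}" for db in range(NUM_LOGICAL_DBS)}


def logical_db_for(user_id):
    """A user always maps to the same logical database."""
    return user_id % NUM_LOGICAL_DBS


def host_for(user_id):
    """Resolve the physical host through the mapping table."""
    return logical_to_physical[logical_db_for(user_id)]


def migrate(logical_db, new_host):
    """Physical migration: repoint a whole logical database at a new host."""
    logical_to_physical[logical_db] = new_host


before = host_for(11)        # logical db 3
migrate(3, "mysql-host-9")
after = host_for(11)         # same logical db, new physical host
```

Moving a logical database between hosts touches one table entry, whereas reassigning users to different logical databases would mean rewriting the data itself.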

25 MySQL is better because it is Open Source: We can enhance or extend it as we see fit...when we see fit. Facebook extended MySQL to support distributed cache invalidation for memcache: INSERT table_foo (a,b,c) VALUES (1,2,3) MEMCACHE_DIRTY key1,key2,...
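The slide shows only the statement syntax, so the following is one plausible reading of how a MEMCACHE_DIRTY clause could drive invalidation: when a statement is applied, the listed keys are dropped from the local cache. The real extension lives inside MySQL; the parsing regex, function names, and dict-backed cache here are assumptions for illustration only.

```python
import re

# Matches a trailing "MEMCACHE_DIRTY key1,key2,..." clause (assumed syntax shape).
DIRTY_CLAUSE = re.compile(r"\s+MEMCACHE_DIRTY\s+(?P<keys>.+)$", re.IGNORECASE)


def split_dirty_keys(statement):
    """Return (plain SQL, list of cache keys named in the MEMCACHE_DIRTY clause)."""
    match = DIRTY_CLAUSE.search(statement)
    if match is None:
        return statement, []
    keys = [k.strip() for k in match.group("keys").split(",") if k.strip()]
    return statement[:match.start()], keys


def apply_replicated_statement(statement, local_cache, execute_sql):
    """Apply the SQL, then invalidate the named keys in this datacenter's cache."""
    sql, dirty_keys = split_dirty_keys(statement)
    execute_sql(sql)
    for key in dirty_keys:
        local_cache.pop(key, None)
    return dirty_keys


cache = {"key1": "stale", "key3": "fresh"}
executed = []
apply_replicated_statement(
    "INSERT table_foo (a,b,c) VALUES (1,2,3) MEMCACHE_DIRTY key1,key2",
    cache,
    executed.append,  # stand-in for a real database call
)
```

Tying invalidation to the statement itself means a remote cache is cleared only after the corresponding write has actually been applied there.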

26 Scaling across datacenters (diagram: West Coast: SF Web, SF Memcache, SC Web, SC Memcache, SC MySQL; East Coast: VA Web, VA Memcache, VA MySQL; MySQL replication; Memcache Proxy)

27 Other Interesting Issues: Application level batching and parallelization; Super hot data items; Cachekey versioning with continuous availability

28 Photos

29 Photos + Social Graph = Awesome!

30 Photos: Scale. 20 billion photos x4 = 80 billion (would wrap around the world more than 10 times!); over 40M new photos per day; 600K photos / second

31 Photos Scaling - The easy wins. Upload tier: handles uploads, scales images, stores on NFS. Serving tier: images served from NFS via HTTP. However: file systems are not good at supporting a large number of files; metadata is too large to fit in memory, causing too many IOs for each file read; limited by I/O, not storage density. Easy wins: CDN; Cachr (http server + caching); NFS file handle cache

32 Photos: Haystack. Overlay file system; index in memory; one IO per read
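The three Haystack bullets (overlay file system, index in memory, one IO per read) can be illustrated with a toy store that appends every photo to a single large file and keeps each photo's offset and size in a dict. This is a sketch of the idea only; the class name and on-disk layout are assumptions, not the actual Haystack format.

```python
import os
import tempfile


class TinyHaystack:
    """Toy overlay store: many photos in one file, located via an in-memory index."""

    def __init__(self, path):
        self.path = path
        self.index = {}            # photo_id -> (offset, size), never read from disk
        open(path, "ab").close()   # make sure the backing file exists

    def put(self, photo_id, data):
        offset = os.path.getsize(self.path)  # append position = current file size
        with open(self.path, "ab") as f:
            f.write(data)
        self.index[photo_id] = (offset, len(data))

    def get(self, photo_id):
        offset, size = self.index[photo_id]  # memory lookup, no metadata IO
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(size)              # the single data read


with tempfile.TemporaryDirectory() as tmp:
    store = TinyHaystack(os.path.join(tmp, "haystack.dat"))
    store.put("photo-1", b"first image bytes")
    store.put("photo-2", b"second image bytes")
    second = store.get("photo-2")
```

Because the index answers "where is it?" from memory, the per-file metadata lookups that the previous slide blames for extra IOs are avoided.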

33 Data Warehousing

34 Data: How much? 200GB per day in March …; … TB (compressed) raw data per day in April …; … TB (compressed) raw data per day today

35 The Data Age: Free or low cost of user services; Consumer behavior hard to predict; Data and analysis are critical; More data beats better algorithms

36 Deficiencies of existing technologies: Analysis/storage on proprietary systems too expensive; Closed systems are hard to extend

37 Hadoop & Hive

38 Hadoop: Superior availability/scalability/manageability despite lower single node performance; Open system; Scalable costs. Cons: Programmability and Metadata: Map-reduce hard to program (users know sql/bash/python/perl); Need to publish data in well known schemas

39 Hive: A system for managing and querying structured data built on top of Hadoop. Components: Map-Reduce for execution; HDFS for storage; Metadata in an RDBMS

40 Hive: New Technology, Familiar Interface
hive> select key, count(1) from kv1 where key > 100 group by key;
vs.
$ cat > /tmp/
uniq -c | awk '{print $2"\t"$1}'
$ cat > /tmp/
awk -F '\001' '{if($1 > 100) print $1}'
$ bin/hadoop jar contrib/hadoop dev-streaming.jar -input /user/hive/warehouse/kv1 -mapper -file /tmp/ -file /tmp/ -reducer -output /tmp/largekey -numReduceTasks 1
$ bin/hadoop dfs -cat /tmp/largekey/part*
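Both versions on this slide compute the same thing: keep keys greater than 100, then count the rows per key. The sketch below spells out that map and reduce split in plain Python. The sample rows and function names are made up for illustration, and nothing here runs on Hadoop.

```python
from collections import Counter


def mapper(rows):
    """Map step: emit the key of every row whose key is greater than 100."""
    for key, _value in rows:
        if key > 100:
            yield key


def reducer(keys):
    """Reduce step: count how many times each key was emitted."""
    return Counter(keys)


# Hypothetical rows standing in for the kv1 table (key, value).
kv1 = [(86, "val_86"), (238, "val_238"), (311, "val_311"), (238, "val_238")]
counts = reducer(mapper(kv1))
```

The Hive query states the filter and the grouping in one line, while the streaming version needs a separate script for each step plus the job submission.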

41 Hive: Sample Applications. Reporting, e.g., daily/weekly aggregations of impression/click counts; measures of user engagement. Ad hoc Analysis, e.g., how many group admins broken down by state/country. Machine Learning (assembling training data). Ad Optimization, e.g., user engagement as a function of user attributes. Lots more

42 Hive: Server Infrastructure. 4800 cores; storage capacity of 5.5 PetaBytes; 12 TB per node. Two level network topology: 1 Gbit/sec from node to rack switch; 4 Gbit/sec to top level rack switch

43 Hive & Hadoop: Usage Stats. 4 TB of compressed new data added per day; 135TB of compressed data scanned per day; … Hive jobs on … per day; 80K compute hours per day; 200 people run jobs on Hadoop/Hive; Analysts (non-engineers) use Hadoop through Hive; 95% of jobs are Hive Jobs

44 Hive: Technical Overview

45 Hive: Open and Extensible. Query your own formats and types with your own Serializer/Deserializers; Extend the SQL functionality through User Defined Functions; Do any non-SQL transformations through the TRANSFORM operator, which sends data from Hive to any user program/script
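The TRANSFORM operator hands rows to an external program, by default as tab-separated lines on standard input, and reads the program's standard output back as rows. A minimal sketch of such a program is below; the two-column layout and the uppercasing step are hypothetical choices for illustration.

```python
import sys


def transform(lines):
    """Turn each 'key<TAB>value' input line into 'key<TAB>VALUE'."""
    for line in lines:
        key, value = line.rstrip("\n").split("\t", 1)
        yield f"{key}\t{value.upper()}"


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Stream rows from stdin to stdout, one output line per input row."""
    for out in transform(stdin):
        stdout.write(out + "\n")
```

To use it as a standalone script, call `main()` at the bottom of the file. Any language that can read and write lines works the same way, which is the point of the operator.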

46 Hive: Smarter Execution Plans. Map-side Joins; Predicate Pushdown; Partition Pruning; Hash based Aggregations; Parallel execution of operator trees; Intelligent Scheduling

47 Hive: Possible Future Optimizations. Pipelining? Finer operator control (controlling sorts); Cost based optimizations? HBase
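Of the plan optimizations listed, the map-side join is easy to show in a few lines: when one table is small enough to hold in memory, each mapper can load it into a hash table and join while streaming the large table, so no shuffle or reduce phase is needed. The sketch below assumes unique keys in the small table, and its sample data is made up.

```python
def map_side_join(big_rows, small_rows):
    """Join a streamed large table against a small table held in memory."""
    lookup = dict(small_rows)  # assumes the small table has unique keys
    for key, big_value in big_rows:
        if key in lookup:      # inner join: drop rows with no match
            yield key, big_value, lookup[key]


clicks = [(1, "click"), (2, "view"), (3, "click")]   # hypothetical large table
countries = [(1, "US"), (3, "IN")]                   # hypothetical small table
joined = list(map_side_join(clicks, countries))
```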

48 Spikes: The Username Launch

49 System Design: Database tier cannot handle the load; Dedicated memcache tier for assigned usernames; Miss => Available; Avoid database hits altogether; Blacklists: bucketize, local tier cache timeout

50 Username Memcache Tier: Parallel pool in each data center; Writes replicated to all nodes; 8 nodes per pool; Reads can go to any node (hashed by uid) (diagram: PHP Client and Username Memcache nodes UN0, UN1, ... UN7)

51 Write Optimization: Hashout store; Distributed key-value store (MySQL backed); Lockless (optimistic) concurrency control
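Lockless (optimistic) concurrency control generally means a writer reads a version, does its work without holding a lock, and commits only if the version is unchanged. The sketch below shows that pattern for claiming a username. The class, method names, and in-memory dict are assumptions; they are not the Hashout store's API, and a real store would have to make the compare-and-set step atomic.

```python
class VersionedStore:
    """Toy key-value store that tracks a version number per key."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        # Unknown keys read as version 0 with no value.
        return self._data.get(key, (0, None))

    def compare_and_set(self, key, expected_version, value):
        """Write only if nobody else has written since expected_version was read."""
        current_version, _ = self.read(key)
        if current_version != expected_version:
            return False  # lost the race; the caller decides whether to retry
        self._data[key] = (current_version + 1, value)
        return True


def claim_username(store, username, user_id):
    """Claim a username without locking; fails if it is taken or a race is lost."""
    version, owner = store.read(username)
    if owner is not None:
        return False
    return store.compare_and_set(username, version, user_id)


store = VersionedStore()
first = claim_username(store, "alice", 1)
second = claim_username(store, "alice", 2)
```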

52 Fault Tolerance: Memcache nodes can go down: always check another node on miss; replay from a log file (scribe). Memcache sets are not guaranteed to succeed: self-correcting code: write again to mc if we detect it during db writes
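The username tier described in the slides above combines three rules: writes go to every node in the pool, reads go to a node chosen by hashing the uid, and a miss is treated as "available" only after another node has been checked. A minimal sketch, assuming dict-backed nodes, a modulo hash, and "next node" as the fallback choice:

```python
class UsernameTier:
    """Toy model of a replicated pool of username cache nodes."""

    def __init__(self, num_nodes=8):
        self.nodes = [dict() for _ in range(num_nodes)]

    def assign(self, username, uid):
        # Writes are replicated to all nodes in the pool.
        for node in self.nodes:
            node[username] = uid

    def is_available(self, username, uid):
        # Reads start at the node picked by hashing the requesting uid
        # (assumed: simple modulo) and check one more node on a miss.
        first = uid % len(self.nodes)
        second = (first + 1) % len(self.nodes)
        for index in (first, second):
            if username in self.nodes[index]:
                return False
        return True  # miss on both nodes => treated as available


tier = UsernameTier()
tier.assign("alice", 1)
tier.nodes[42 % 8].clear()  # simulate the primary node for uid 42 losing its data
taken = not tier.is_available("alice", 42)
free = tier.is_available("bob", 42)
```

The second lookup is what keeps a single failed or restarted node from making an assigned username look free.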

53 Nuclear Options. Newsfeed: reduce number of stories; turn off scrolling, highlights. Profile: reduce number of stories; make info tab the default. Chat: reduce buddy list refresh rate; turn it off!

54 How much load? 200k in 3 min; 1M in 1 hour; 50M in first month. Prepared for over 10x!

55 Some interesting problems

56 Graph models and languages: low latency fast access; slightly more expressive queries; consistency, staleness can be a bit loose; analysis over large data sets; privacy as part of the model. Fat data pipes: push enormous volumes of data to several third party applications (e.g., entire newsfeed to search partners); controllable QoS

57 Some interesting problems (contd.): Search relevance; Storage systems; Middle tier (cache) optimization; Application data access language

58 Questions?
