Scalable Web Architectures Common Patterns & Approaches Cal Henderson.

1 Scalable Web Architectures Common Patterns & Approaches Cal Henderson

2 SAM-SIG, 23rd August 2006: Hello

3 Scalable Web Architectures?
What does scalable mean?
What's an architecture?

4 Scalability – myths and lies
What is scalability?

5 Scalability – myths and lies
What is scalability not?

6 Scalability – myths and lies
What is scalability not?
–Raw Speed / Performance
–HA / BCP
–Technology X
–Protocol Y

7 Scalability – myths and lies
So what is scalability?

8 Scalability – myths and lies
So what is scalability?
–Traffic growth
–Dataset growth
–Maintainability

9 Scalability
Two kinds:
–Vertical (get bigger)
–Horizontal (get more)

10 Big Irons
Sun Fire E20K
–$450,000 - $2,500,000
–36x 1.8GHz processors
PowerEdge SC1425
–2.8 GHz processor
–Under $1,500

11 Cost vs Cost

12 Cost vs Cost
But sometimes vertical scaling is right
Buying a bigger box is quick(ish)
Redesigning software is not
Running out of MySQL performance?
–Spend months on data federation
–Or just buy a ton more RAM

13 Cost vs Cost
But let's talk horizontal
–Else this is going to be boring

14 Architectures then?
The way the bits fit together
What grows where
The trade-offs between good/fast/cheap

15 LAMP
We're talking about LAMP
–Linux
–Apache (or lighttpd)
–MySQL (or Postgres)
–PHP (or Perl, Python, Ruby)
All open source
All well supported
All used in large operations

16 Simple web apps
A Web Application
–Or Web Site in Web 1.0 terminology
(Diagram labels: Interwebnet, App server, Database)

17 App servers
App servers scale in two ways:

18 App servers
App servers scale in two ways:
–Really well

19 App servers
App servers scale in two ways:
–Really well
–Quite badly

20 App servers
Sessions!
–(State)
–Local sessions == bad (when they move == quite bad)
–Central sessions == good
–No sessions at all == awesome!

21 Local sessions
Stored on disk
–PHP sessions
Stored in memory
–Shared memory block
Bad!
–Can't move users
–Can't avoid hotspots

22 Mobile local sessions
Custom built
–Store last session location in cookie
–If we hit a different server, pull our session information across
If your load balancer has sticky sessions, you can still get hotspots
–Depends on volume – fewer, heavier users hurt more

23 Remote centralized sessions
Store in a central database
–Or an in-memory cache
No porting around of session data
No need for sticky sessions
No hot spots
Need to be able to scale the data store
–But we've pushed the issue down the stack

24 No sessions
Stash it all in a cookie!
Sign it for safety
–$data = $user_id . '-' . $user_name;
–$time = time();
–$sig = sha1($secret . $time . $data);
–$cookie = base64_encode("$sig-$time-$data");
–Timestamp means it's simple to expire it
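
The signed-cookie idea on this slide can be sketched in Python as follows. This is a minimal illustration, not the talk's implementation: it swaps the slide's plain sha1(secret + time + data) for HMAC-SHA1, and the names SECRET, make_cookie, read_cookie and the one-hour max_age are assumptions made for the example.

```python
import base64
import hashlib
import hmac
import time

SECRET = b"change-me"  # hypothetical server-side secret, never sent to the client


def _sign(ts, data):
    # HMAC-SHA1 over "timestamp-data"; a swapped-in choice, not the slide's sha1()
    return hmac.new(SECRET, f"{ts}-{data}".encode(), hashlib.sha1).hexdigest()


def make_cookie(user_id, user_name, now=None):
    """Pack signature, timestamp and user data into one cookie value."""
    data = f"{user_id}-{user_name}"
    ts = str(int(time.time() if now is None else now))
    raw = f"{_sign(ts, data)}-{ts}-{data}"
    return base64.urlsafe_b64encode(raw.encode()).decode()


def read_cookie(cookie, max_age=3600, now=None):
    """Return (user_id, user_name) if the cookie is authentic and fresh, else None."""
    try:
        raw = base64.urlsafe_b64decode(cookie.encode()).decode()
        # user_name may itself contain "-", so split at most three times
        sig, ts, user_id, user_name = raw.split("-", 3)
        issued = int(ts)
    except (ValueError, UnicodeDecodeError):
        return None
    expected = _sign(ts, f"{user_id}-{user_name}")
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None  # tampered or signed with a different secret
    current = time.time() if now is None else now
    if current - issued > max_age:
        return None  # the timestamp makes expiry a simple comparison
    return user_id, user_name
```

Because the app server only needs the secret to verify a request, no session store is consulted at all.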

25 Super slim sessions
If you need more than the cookie (login status, user id, username), then pull their account row from the DB
–Or from the account cache
None of the drawbacks of sessions
Avoids the overhead of a query per page
–Great for high-volume pages which need little personalization
–Turns out you can stick quite a lot in a cookie too
–Pack with base64 and it's easy to delimit fields

26 App servers
The Rasmus way
–App server has shared nothing
–Responsibility pushed down the stack
Ooh, the stack

27 Trifle

28 Trifle
Sponge / Database
Jelly / Business Logic
Custard / Page Logic
Cream / Markup
Fruit / Presentation

29 Trifle
Sponge / Database
Jelly / Business Logic
Custard / Page Logic
Cream / Markup
Fruit / Presentation

30 App servers

31 App servers

32 App servers

33 Well, that was easy
Scaling the web app server part is easy
The rest is the trickier part
–Database
–Serving static content
–Storing static content

34 The others
Other services scale similarly to web apps
–That is, horizontally
The canonical examples:
–Image conversion
–Audio transcoding
–Video transcoding
–Web crawling

35 Parallelizable == easy!
If we can transcode/crawl in parallel, it's easy
–But think about queuing
–And asynchronous systems
–The web ain't built for slow things
–But still, a simple problem
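
The queuing idea can be sketched with Python's standard queue and threading modules: the web request only enqueues a job and returns, while a pool of workers does the slow part in parallel. The function names and the upper-casing stand-in for "transcoding" are assumptions for illustration; a real deployment would more likely use a queue shared between machines.

```python
import queue
import threading

jobs = queue.Queue()   # pending work, shared by all workers
results = {}           # job_id -> finished output


def transcode(payload):
    """Stand-in for slow work such as image, audio or video conversion."""
    return payload.upper()


def worker():
    """Pull jobs until a None sentinel arrives."""
    while True:
        item = jobs.get()
        try:
            if item is None:
                return
            job_id, payload = item
            results[job_id] = transcode(payload)
        finally:
            jobs.task_done()


def handle_request(job_id, payload):
    """Web-facing handler: enqueue the job and answer immediately."""
    jobs.put((job_id, payload))
    return {"status": "queued", "job": job_id}


def run_workers(count):
    """Start `count` worker threads; adding workers adds capacity."""
    threads = [threading.Thread(target=worker, daemon=True) for _ in range(count)]
    for thread in threads:
        thread.start()
    return threads
```

The queue also absorbs bursts: during a peak, jobs wait in line instead of overloading the workers.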

36 Asynchronous systems

37 Asynchronous systems

38 Helps with peak periods

39 Asynchronous systems

40 Asynchronous systems

41 Asynchronous systems

42 The big three
Let's talk about the big three then…
–Databases
–Serving lots of static content
–Storing lots of static content

43 Databases
Unless we're doing a lot of file serving, the database is the toughest part to scale
If we can, best to avoid the issue altogether and just buy bigger hardware
Dual Opteron/Intel64 systems with 16GB of RAM can get you a long way

44 More read power
Web apps typically have a read/write ratio of somewhere between 80/20 and 90/10
If we can scale read capacity, we can solve a lot of situations
MySQL replication!
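
On the application side, using replication for read capacity comes down to routing: writes go to the master, reads are spread over the slaves. The sketch below assumes hypothetical server names and a deliberately naive check on the first SQL keyword; it is an illustration, not a production query router.

```python
import itertools


class ReplicatedPool:
    """Route writes to the master and spread reads across slaves."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE")

    def __init__(self, master, slaves):
        self.master = master
        # Round-robin over the slaves; with no slaves everything hits the master
        self._slaves = itertools.cycle(slaves) if slaves else None

    def pick(self, sql):
        """Return the server name that should run this statement."""
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in self.WRITE_VERBS or self._slaves is None:
            return self.master
        return next(self._slaves)
```

One caveat to keep in mind: MySQL replication is asynchronous, so a read sent to a slave straight after a write may not see that write yet.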

45 Master-Slave Replication

46 Master-Slave Replication
(Diagram labels: Reads and Writes; Reads)

47 Master-Slave Replication

48 Master-Slave Replication

49 Master-Slave Replication

50 Master-Slave Replication

51 Master-Slave Replication

52 Master-Slave Replication

53 Master-Slave Replication

54 Master-Slave Replication

55 Caching
Caching avoids needing to scale!
–Or makes it cheaper
Simple stuff
–mod_perl / shared memory – dumb
–MySQL query cache – dumbish

56 Caching
Getting more complicated…
–Write-through cache
–Write-back cache
–Sideline cache

57 Write-through cache

58 Write-back cache
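
These two slides are diagrams only, so here is a rough sketch of the usual meaning of the terms, with plain dicts standing in for the cache and the database. In a write-through cache every write reaches the backing store before the call returns; in a write-back cache writes stay in the cache and are flushed later. Class and method names are assumptions for the example.

```python
class WriteThroughCache:
    """Every write goes to the backing store synchronously, then to the cache."""

    def __init__(self, store):
        self.store = store   # dict standing in for the database
        self.cache = {}

    def write(self, key, value):
        self.store[key] = value
        self.cache[key] = value

    def read(self, key):
        if key not in self.cache:
            self.cache[key] = self.store[key]
        return self.cache[key]


class WriteBackCache(WriteThroughCache):
    """Writes land in the cache only; dirty keys reach the store on flush()."""

    def __init__(self, store):
        super().__init__(store)
        self.dirty = set()

    def write(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        for key in self.dirty:
            self.store[key] = self.cache[key]
        self.dirty.clear()
```

Write-back saves database writes, but anything still dirty is lost if the cache dies before a flush.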

59 Sideline cache

60 Sideline cache
Easy to implement
–Just add app logic
Need to manually invalidate cache
–Well designed code makes it easy
Memcached
–From Danga (LiveJournal)
–http://www.danga.com/memcached/
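
A sideline cache sits beside the database, and the application decides when to consult it and when to invalidate it. In the sketch below a dict stands in for memcached and another for the database; the class name, key format and db_reads counter are illustrative assumptions.

```python
class SidelineCache:
    """Cache beside the database: the app checks it first and invalidates on write."""

    def __init__(self, cache, db):
        self.cache = cache   # dict standing in for a memcached client
        self.db = db         # dict standing in for the users table
        self.db_reads = 0    # counts how often we fell through to the database

    def get_user(self, user_id):
        key = f"user:{user_id}"
        row = self.cache.get(key)
        if row is None:
            # Cache miss: read from the database and remember the result
            self.db_reads += 1
            row = self.db[user_id]
            self.cache[key] = row
        return row

    def update_user(self, user_id, row):
        self.db[user_id] = row
        # Manual invalidation: drop the stale entry so the next read refetches it
        self.cache.pop(f"user:{user_id}", None)
```

Keeping all reads and writes of an object behind one pair of functions like this is what makes the manual invalidation manageable.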

61 But what about HA?

62 But what about HA?

63 SPOF!
The key to HA is avoiding SPOFs
–Identify
–Eliminate
Some stuff is hard to solve
–Fix it further up the tree
Dual DCs solve the router/switch SPOF

64 Master-Master

65 Master-Master
Either hot/warm or hot/hot
Writes can go to either
–But avoid collisions
–No auto-inc columns for hot/hot (bad for hot/warm too)
–Design schema/access to avoid collisions (e.g. hashing users to servers)
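
One way to read "hashing users to servers" is that each user's writes always go to the same master of the pair, so the two masters never write the same rows concurrently. The sketch below assumes hypothetical server names and uses MD5 only as a stable hash, not for security.

```python
import hashlib

MASTERS = ["db-a", "db-b"]  # hypothetical names for the two masters


def master_for(user_id, alive=None):
    """Pick the master that takes this user's writes.

    The same user always maps to the same master while it is up, so the
    pair never receives conflicting writes for one user. If the preferred
    master is down, fall back to the first one that is still alive.
    """
    alive = MASTERS if alive is None else alive
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    preferred = MASTERS[int(digest, 16) % len(MASTERS)]
    if preferred in alive:
        return preferred
    return alive[0]
```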

66 Rings
Master-master is just a small ring
–With 2 members
Bigger rings are possible
–But not a mesh!
–Each slave may only have a single master
–Unless you build some kind of manual replication

67 Rings

68 Rings

69 Dual trees
Master-master is good for HA
–But we can't scale out the reads
We often need to combine the read scaling with HA
We can combine the two

70 Dual trees

71 Data federation
At some point, you need more writes
–This is tough
–Each cluster of servers has limited write capacity
Just add more clusters!

72 Data federation
Split up large tables, organized by some primary object
–Usually users
Put all of a user's data on one cluster
–Or shard, or cell
Have one central cluster for lookups

73 Data federation

74 Data federation
Need more capacity?
–Just add shards!
–Don't assign to shards based on user_id!
For resource leveling as time goes on, we want to be able to move objects between shards
–Lockable objects
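
A central lookup, rather than a function of user_id, is what makes it possible to move a user later. The sketch below keeps everything in memory: dicts stand in for the shards and for the central lookup cluster, and a set of locked ids stands in for "lockable objects". All names are assumptions for illustration.

```python
class ShardDirectory:
    """Central lookup of which shard holds each user, with a per-user move lock."""

    def __init__(self, shard_names):
        self.shards = {name: {} for name in shard_names}  # shard -> {user_id: data}
        self.location = {}                                 # user_id -> shard name
        self.locked = set()                                # users currently being moved

    def add_user(self, user_id, data):
        # Place new users on the emptiest shard instead of deriving it from user_id
        shard = min(self.shards, key=lambda name: len(self.shards[name]))
        self.shards[shard][user_id] = data
        self.location[user_id] = shard
        return shard

    def read(self, user_id):
        return self.shards[self.location[user_id]][user_id]

    def write(self, user_id, data):
        if user_id in self.locked:
            raise RuntimeError("user is being moved; retry shortly")
        self.shards[self.location[user_id]][user_id] = data

    def move(self, user_id, target):
        """Copy the user's data to `target`, repoint the lookup, drop the old copy."""
        source = self.location[user_id]
        if source == target:
            return
        self.locked.add(user_id)
        try:
            self.shards[target][user_id] = self.shards[source][user_id]
            self.location[user_id] = target
            del self.shards[source][user_id]
        finally:
            self.locked.discard(user_id)
```

Adding a shard is then just another entry in the directory, and existing users stay where they are until they are moved deliberately.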

75 Data federation
Heterogeneous hardware is fine
–Just give a larger/smaller proportion of objects depending on hardware
Bigger/faster hardware for paying users
–A common approach
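
Giving each shard a share of objects proportional to its hardware can be sketched as picking, for every new object, the shard with the lowest load relative to a capacity weight. The weights and shard names below are made up for the example.

```python
def pick_shard(counts, weights):
    """Return the shard whose load is lowest relative to its capacity weight."""
    return min(counts, key=lambda shard: counts[shard] / weights[shard])


def assign_users(n_users, weights):
    """Assign n_users one at a time and return the resulting per-shard counts."""
    counts = {shard: 0 for shard in weights}
    for _ in range(n_users):
        counts[pick_shard(counts, weights)] += 1
    return counts
```

With weights of 3 and 1, the larger shard ends up with roughly three times as many users as the smaller one.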

76 Downsides
Need to keep stuff in the right place
App logic gets more complicated
More clusters to manage
–Backups, etc
More database connections needed per page
The dual table issue
–Avoid walking the shards!

77 Bottom line
Data federation is how large applications are scaled

78 Bottom line
It's hard, but not impossible
Good software design makes it easier
–Abstraction!
Master-master pairs for shards give us HA
Master-master trees work for central cluster (many reads, few writes)

79 Multiple Datacenters
Having multiple datacenters is hard
–Not just with MySQL
Hot/warm with MySQL slaved setup
–But manual
Hot/hot with master-master
–But dangerous
Hot/hot with sync/async manual replication
–But tough

80 Multiple Datacenters

81 Serving lots of files
Serving lots of files is not too tough
–Just buy lots of machines and load balance!
We're IO bound – need more spindles!
–But keeping many copies of data in sync is hard
–And sometimes we have other per-request overhead (like auth)

82 Reverse proxy

83 Reverse proxy
Serving out of memory is fast!
–And our caching proxies can have disks too
–Fast or otherwise
More spindles is better
We stay in sync automatically
We can parallelize it!
–50 cache servers give us 50 times the serving rate of the origin server
–Assuming the working set is small enough to fit in memory in the cache cluster

84 Invalidation
Dealing with invalidation is tricky
We can prod the cache servers directly to clear stuff out
–Scales badly – need to clear asset from every server – doesn't work well for 100 caches

85 Invalidation
We can change the URLs of modified resources
–And let the old ones drop out of cache naturally
–Or prod them out, for sensitive data
Good approach!
–Avoids browser cache staleness
–Hello Akamai (and other CDNs)
–Read more: http://www.thinkvitamin.com/features/webapps/serving-javascript-fast
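
One common way to change the URL whenever a resource changes is to put a short hash of the content into the file name. The sketch below is an assumption about how that could look, not the scheme from the talk; it expects a simple path whose last dot introduces the file extension.

```python
import hashlib


def versioned_url(path, content):
    """Insert a short content hash before the extension of `path`.

    New content gives a new URL, so caches never serve a stale copy under
    the new name, and old URLs simply age out of the caches.
    """
    digest = hashlib.md5(content).hexdigest()[:8]
    stem, dot, ext = path.rpartition(".")
    if not dot:
        # No extension at all: append the hash to the whole path
        return f"{path}.{digest}"
    return f"{stem}.{digest}.{ext}"
```

The pages that reference the asset are generated with the versioned URL, so browsers pick up the new file as soon as the HTML changes.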

86 Reverse proxy
Choices
–L7 load balancer & Squid http://www.squid-cache.org/
–mod_proxy & mod_cache http://www.apache.org/
–Perlbal and Memcache? http://www.danga.com/

87 High overhead serving
What if you need to authenticate your asset serving?
–Private photos
–Private data
–Subscriber-only files
Two main approaches

88 Perlbal backhanding
Perlbal can do redirection magic
–Backend server sends header to Perlbal
–Perlbal goes to pick up the file from elsewhere
–Transparent to user

89 Perlbal backhanding

90 Perlbal backhanding
Doesn't keep database around while serving
Doesn't keep app server around while serving
User doesn't find out how to access asset directly

91 Permission URLs
But why bother!?
If we bake the auth into the URL then it saves the auth step
We can do the auth on the web app servers when creating HTML
Just need some magic to translate to paths
We don't want paths to be guessable
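
A permission URL can be sketched as a path plus an expiry time and a keyed hash of both: the web app server signs the URL when it builds the HTML, and the file server only has to recompute the hash. The parameter names e and t, the secret and the use of HMAC-SHA256 are assumptions for the example, not the scheme shown in the talk.

```python
import hashlib
import hmac

URL_SECRET = b"change-me"  # hypothetical secret shared by app and file servers


def _token(asset_path, expires_at):
    msg = f"{asset_path}:{expires_at}".encode()
    return hmac.new(URL_SECRET, msg, hashlib.sha256).hexdigest()[:32]


def permission_url(asset_path, expires_at):
    """Called on the app server, after it has checked the user may see the asset."""
    return f"{asset_path}?e={expires_at}&t={_token(asset_path, expires_at)}"


def check_permission_url(asset_path, expires_at, token, now):
    """Called on the file server: no database or session lookup is needed."""
    if now > expires_at:
        return False
    expected = _token(asset_path, expires_at)
    return hmac.compare_digest(token.encode(), expected.encode())
```

Without the secret a valid token cannot be guessed, so knowing one asset's URL does not help in reaching another.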

92 Permission URLs

93 Storing lots of files
Storing files is easy!
–Get a big disk
–Get a bigger disk
–Uh oh!
Horizontal scaling is the key
–Again

94 Connecting to storage
NFS
–Stateful == Sucks
–Hard mounts vs Soft mounts
SMB / CIFS / Samba
–Turn off MSRPC & WINS (NetBIOS NS)
–Stateful but degrades gracefully
HTTP
–Stateless == yay!
–Just use Apache

95 Multiple volumes
Volumes are limited in total size
–Except under ZFS & others
Sometimes we need multiple volumes for performance reasons
–When using RAID with single/dual parity
At some point, we need multiple volumes

96 Multiple volumes

97 Multiple hosts
Further down the road, a single host will be too small
Total throughput of machine becomes an issue
Even physical space can start to matter
So we need to be able to use multiple hosts

98 Multiple hosts

99 HA Storage
HA is important for assets too
–We can back stuff up
–But we want it hot redundant
RAID is good
–RAID 5 is cheap, RAID 10 is fast

100 HA Storage
But whole machines can fail
So we stick assets on multiple machines
In this case, we can ignore RAID
–In failure case, we serve from alternative source
–But need to weigh up the rebuild time and effort against the risk
–Store more than 2 copies?
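
Storing each asset on several machines and serving from an alternative copy on failure can be sketched as below. Dicts stand in for the storage hosts, and the hash-based placement, class name and `down` set are assumptions for illustration; `copies` is assumed to be no larger than the number of hosts.

```python
import hashlib


class ReplicatedStore:
    """Keep each file on `copies` hosts and read from the first one that is up."""

    def __init__(self, hosts, copies=2):
        self.hosts = {name: {} for name in hosts}  # host -> {key: blob}
        self.down = set()                          # hosts currently unreachable
        self.copies = copies

    def replicas(self, key):
        """Hosts that should hold `key`: a stable starting host plus its successors."""
        names = sorted(self.hosts)
        start = int(hashlib.md5(key.encode()).hexdigest(), 16) % len(names)
        return [names[(start + i) % len(names)] for i in range(self.copies)]

    def put(self, key, blob):
        for host in self.replicas(key):
            self.hosts[host][key] = blob

    def get(self, key):
        for host in self.replicas(key):
            if host not in self.down:
                return self.hosts[host][key]
        raise IOError("all replicas are down")
```

The sketch does nothing to rebuild a lost copy once a host is gone for good, which is the rebuild effort the slide says has to be weighed against the risk.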

101 HA Storage

102 Self repairing systems
When something fails, repairing can be a pain
–RAID rebuilds by itself, but machine replication doesn't
The big appliances self heal
–NetApp, StorEdge, etc
So does MogileFS

103 Real world examples
Flickr
–Because I know it
LiveJournal
–Because everyone copies it

104 Flickr Architecture

105 LiveJournal Architecture

106 Buy my book!

107 The end!

108 Awesome!
These slides are available online: iamcal.com/talks/

