Alan Smith, Active Solution: Handling Big Data in Windows Azure Storage

Presentation transcript:

1 Handling Big Data in Windows Azure Storage
Alan Smith, Active Solution
cloudcasts.net@gmail.com, @alansmith, www.cloudcasts.net

6 Diagram: On-Premise ↔ Replication ↔ On-Premise

8 MSDN Universal - $150
Time      Data
30 days   1.6 TB
10 days   4.8 TB
2 days    24.4 TB
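
Read as a budget (an interpretation, since the slide gives no formula), each row keeps days × terabytes roughly constant: the same $150 buys about 48 TB-days whether the data is kept for a month or for two days. A quick check:

```python
# Rows from the slide: (days retained, terabytes stored) for MSDN Universal - $150.
ROWS = [(30, 1.6), (10, 4.8), (2, 24.4)]

def tb_days(days: int, terabytes: float) -> float:
    """Storage-time consumed: terabytes held multiplied by days held."""
    return days * terabytes

for days, tb in ROWS:
    print(f"{days:>2} days x {tb:>4} TB = {tb_days(days, tb):.1f} TB-days")
```

The 2-day row comes out slightly higher (48.8 TB-days); the slide does not say why, so treat the constant as approximate.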

10 Implementation Challenges
Number of articles: 4,356,508
Number of indexed words: 27,765,188
Total number of index entries: 1,003,489,254
Total text content file size: 41.4 GB
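
The totals show why inserting the index dominates the work. Table storage accepts at most 100 entities per entity group transaction, and every entity in a batch must share the same PartitionKey, so the entry count alone gives a lower bound on the number of insert requests. A small calculation from the slide's figures:

```python
ARTICLES = 4_356_508
INDEXED_WORDS = 27_765_188
INDEX_ENTRIES = 1_003_489_254
MAX_BATCH = 100  # Table storage limit: entities per entity group transaction

avg_entries_per_article = INDEX_ENTRIES / ARTICLES
avg_entries_per_word = INDEX_ENTRIES / INDEXED_WORDS
# Integer ceiling division. Batches cannot mix words (PartitionKeys),
# so the real number of batches can only be higher than this.
min_batches = -(-INDEX_ENTRIES // MAX_BATCH)

print(f"{avg_entries_per_article:.1f} index entries per article")
print(f"{avg_entries_per_word:.1f} index entries per word")
print(f"at least {min_batches:,} batches instead of {INDEX_ENTRIES:,} single inserts")
```

Batching cuts the request count by up to a factor of 100, which matters for both elapsed time and per-transaction billing.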

11 Text Search Implementation
Windows Azure Storage
- Table Storage: text index
- Blob Storage: pages
Windows Azure Websites
- Azure Wiki website

12 Text Index Table Design
PartitionKey   Word
RowKey         (1,000,000 - word count on page)_PageId
PageId         Numeric page ID (Integer)
PageTitle      Title of page (String)
Query on PartitionKey (word). Results come back ordered by RowKey, so the pages with the highest word count come first.

13 Text Index Table Example
PartitionKey  RowKey           PageId    PageTitle
azure         999604_33300527  33300527  Capetian Armorial
azure         999635_23352685  23352685  Morphological classification of Czech verbs
azure         999790_25148196  25148196  Armorial of the Communes of Seine-Maritime
azure         999901_00864847  864847    Azure (color)
azure         999913_19961416  19961416  Windows Azure
azure         999913_31687088  31687088  Ministry of Defence (Spain)
azure         999920_14011854  14011854  Coats of arms of the Holy Roman Empire
azure         999926_25312186  25312186  Armorial of the Communes of Eure
azure         999930_01317679  1317679   Lancia Aurelia
azure         999935_00717434  717434    Ordinary (heraldry)
azure         999935_04644383  4644383   Characters of The Order of the Stick
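
The example rows pin down the key format more precisely than the design slide: the word count is subtracted from 1,000,000 (999604 = 1,000,000 - 396) and page IDs are zero-padded to eight digits. Both details are inferred from the sample, not stated. Because Table storage sorts a partition by RowKey as a string, fixed-width inverted counts make the pages with the most occurrences come back first. A sketch of building and decoding such keys:

```python
INVERT_BASE = 1_000_000  # inferred from the example RowKeys

def make_row_key(word_count: int, page_id: int) -> str:
    """Fixed-width inverted count, then padded page ID, so string order = descending count."""
    if not 0 < word_count < INVERT_BASE:
        raise ValueError("word_count out of range")
    return f"{INVERT_BASE - word_count:06d}_{page_id:08d}"

def word_count_from_row_key(row_key: str) -> int:
    """Recover the word count from the inverted prefix."""
    return INVERT_BASE - int(row_key.split("_", 1)[0])

# (word count, page ID) pairs matching four rows of the example table.
entries = [(65, 717434), (396, 33300527), (99, 864847), (87, 19961416)]
# Sorting the keys as strings mimics the order Table storage returns within a partition.
keys = sorted(make_row_key(count, page_id) for count, page_id in entries)
print(keys)
print([word_count_from_row_key(k) for k in keys])  # highest counts first
```

Equal counts (such as the two 999913 rows) fall back to page ID order, which is harmless for ranking.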

14 Uploading Page Data
Upload page content to Blob Storage:
27 XML content files (41.4 GB, 4,356,508 pages) → Windows Azure Storage, Blob Storage (4,356,508 blobs)
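
Turning 27 large XML files into one blob per page means streaming the files rather than loading them whole. The sketch below assumes a MediaWiki-style export layout (`<page>` with `<id>` and `<revision><text>`) and a hypothetical `<pageId>.txt` blob naming scheme; it yields (blob name, content) pairs that an upload step would then send to Blob Storage, here collected into a dict standing in for the container:

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple

def local(tag: str) -> str:
    """Strip an XML namespace, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]

def iter_pages(stream) -> Iterator[Tuple[str, str]]:
    """Yield (blob_name, page_text) for each <page>, streaming so memory stays flat."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) != "page":
            continue
        page_id, text = None, ""
        for child in elem:
            if local(child.tag) == "id":          # the page ID, not the revision ID
                page_id = child.text
            elif local(child.tag) == "revision":
                for rev_child in child:
                    if local(rev_child.tag) == "text":
                        text = rev_child.text or ""
        yield f"{page_id}.txt", text              # hypothetical naming: one blob per page
        elem.clear()                              # release the parsed page

SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <page><title>Azure (color)</title><id>864847</id>
    <revision><id>1</id><text>Azure is a colour...</text></revision></page>
  <page><title>Lancia Aurelia</title><id>1317679</id>
    <revision><id>2</id><text>The Lancia Aurelia...</text></revision></page>
</mediawiki>"""

fake_container = dict(iter_pages(io.BytesIO(SAMPLE)))  # stand-in for Blob Storage
print(sorted(fake_container))
```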

15 Creating Text Index Data
Parse page text: 27 XML content files (41.4 GB, 4,356,508 pages) → page IDs and titles (124 MB) + index entries (19,277 files, 9.83 GB)
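
Parsing produces, for every word, the pages it appears on and how often, which is exactly what the RowKey design needs. The slide does not say how words are tokenised or whether any are skipped; this sketch assumes a simple lowercase alphanumeric tokenizer:

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

WORD_RE = re.compile(r"[a-z0-9]+")  # assumed tokenizer: lowercase alphanumeric runs

def build_index(pages: Iterable[Tuple[int, str]]) -> Dict[str, List[Tuple[int, int]]]:
    """Map each word to (page_id, occurrences) pairs, one pair per page containing it."""
    index: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for page_id, text in pages:
        for word, count in Counter(WORD_RE.findall(text.lower())).items():
            index[word].append((page_id, count))
    return index

pages = [(864847, "Azure is a shade of blue. Azure skies."),
         (19961416, "Windows Azure is a cloud platform.")]
index = build_index(pages)
print(index["azure"])  # [(864847, 2), (19961416, 1)]
print(index["blue"])   # [(864847, 1)]
```

At full scale this would run per XML file and spill postings to disk; the in-memory dict only illustrates the mapping.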

16 Index Data Files
typical#2356523,1|2356987,1|2357098,1|2357186,1|2357237,1|2357704,1|2357705,1
history#2375229,1|2375230,1|2375232,1|2375279,1|2375293,3|2375300,1|2375314,2
renowned#2338682,1|2338841,2|2339194,1|2339509,1|2339791,1|2340298,1|2340408,1
line#2372733,1|2372749,2|2372774,2|2372784,2|2372790,1|2372796,1|2372813,1
varies#2316134,1|2317202,1|2318782,1|2319263,1|2319437,1|2319766,1|2319969,1
moore#2348931,2|2349076,2|2349268,1|2349746,8|2349903,1|2350368,2|2350437,1
journal#2371460,2|2371490,1|2371518,2|2371524,1|2371565,3|2371591,6|2371609,2
elderly#2300000,2|2300127,1|2301060,1|2301207,1|2301873,1|2302199,1|2302733,1
bearing#2331971,1|2332125,1|2332422,1|2332610,1|2333094,1|2333854,1|2334189,1
Each file contains 1,000 lines. Each line contains 100 entries for a word (1 transaction).
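
Each line has the form `word#pageId,count|pageId,count|...`, and since the slide maps one line to one transaction, a line must become entities that fit a single entity group transaction: at most 100 entities, all with the same PartitionKey. A sketch of writing and reading that format; the RowKey layout is the one inferred from the example table, and the title lookup dict is a hypothetical stand-in for the page ID/title file:

```python
from typing import Dict, List, Optional, Tuple

MAX_BATCH = 100  # Table storage: at most 100 entities per entity group transaction

def format_line(word: str, postings: List[Tuple[int, int]]) -> str:
    """Serialise postings as word#pageId,count|pageId,count|..."""
    return word + "#" + "|".join(f"{pid},{count}" for pid, count in postings)

def parse_line(line: str, titles: Optional[Dict[int, str]] = None) -> List[Dict[str, object]]:
    """Turn one index line into table entities that can go in a single batch."""
    titles = titles or {}
    word, _, body = line.strip().partition("#")
    entities: List[Dict[str, object]] = []
    for item in body.split("|"):
        page_id, count = (int(x) for x in item.split(","))
        entities.append({
            "PartitionKey": word,                                # same for the whole batch
            "RowKey": f"{1_000_000 - count:06d}_{page_id:08d}",  # inferred key format
            "PageId": page_id,
            "PageTitle": titles.get(page_id, ""),                # hypothetical title lookup
        })
    if len(entities) > MAX_BATCH:
        raise ValueError(f"{len(entities)} entities will not fit in one batch")
    return entities

line = format_line("history", [(2375229, 1), (2375230, 1), (2375293, 3)])
batch = parse_line(line, titles={2375293: "Example title"})  # hypothetical title
print(line)
print(len(batch), batch[2]["RowKey"], batch[2]["PageTitle"])
```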

17 Insert Index Entries
Windows Azure Storage: Blob Storage, Queue, Table Storage
Windows Azure Services: Worker Roles
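
The slide names the components but not the wiring. A common arrangement, assumed here, is that each queue message names an index file in Blob Storage, and every worker role instance loops: take a message, read the file, insert its lines as batches, then delete the message. The sketch simulates that in one process with stand-ins (`queue.Queue` for the storage queue, dicts for the blob container and the table); none of these are Azure APIs:

```python
import queue
import threading
from typing import Dict, List

# In-memory stand-ins (hypothetical): index files in a blob container, and the target table.
blobs: Dict[str, List[str]] = {
    "index-00001.txt": ["azure#864847,2|19961416,1", "blue#864847,1"],
    "index-00002.txt": ["cloud#19961416,1"],
}
table: Dict[tuple, int] = {}
table_lock = threading.Lock()
work: queue.Queue = queue.Queue()

def insert_batch(line: str) -> None:
    """Stand-in for one entity group transaction: every entity shares the word."""
    word, _, body = line.partition("#")
    with table_lock:
        for item in body.split("|"):
            page_id, count = item.split(",")
            table[(word, int(page_id))] = int(count)

def worker_role() -> None:
    while True:
        blob_name = work.get()
        if blob_name is None:         # sentinel: no more work for this worker
            work.task_done()
            return
        for line in blobs[blob_name]:
            insert_batch(line)
        work.task_done()              # analogous to deleting the queue message after success

for name in blobs:
    work.put(name)
workers = [threading.Thread(target=worker_role) for _ in range(3)]
for w in workers:
    w.start()
for _ in workers:
    work.put(None)
for w in workers:
    w.join()
print(len(table))
```

Scaling out is then a matter of running more worker instances against the same queue.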

18 Insert Index Entries

19 Windows Azure / On-Premise
Windows Azure Storage: Tables, Blobs, Queues
Speed test: http://azurespeedtest.azurewebsites.net/

20 Windows Azure / On-Premise
Windows Azure Storage: Tables, Blobs, Queues
Windows Azure Virtual Machines: VM
Speed test: http://azurespeedtest.azurewebsites.net/
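
Moving the processing into a VM in the same datacenter matters because every insert is an HTTP round trip. A rough, latency-only estimate; the 50 ms and 5 ms figures are illustrative assumptions rather than numbers from the talk, and server-side throughput limits are ignored:

```python
def hours_for(requests: int, latency_ms: float, concurrent: int = 1) -> float:
    """Wall-clock hours if requests are latency-bound and `concurrent` run at once."""
    return requests * latency_ms / 1000 / concurrent / 3600

BATCHES = 10_034_893  # ~1 billion index entries in batches of 100 (lower bound)

for where, latency in [("on-premise", 50.0), ("VM in same datacenter", 5.0)]:
    for conns in (1, 100):
        hours = hours_for(BATCHES, latency, conns)
        print(f"{where:>22}, {conns:>3} connections: {hours:8.1f} h")
```

Under these assumptions the sequential on-premise run takes days, while the same work from a nearby VM with many connections takes minutes.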

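23
    using System.Net;

    // Set at startup: the settings apply to connections created afterwards.
    ServicePointManager.DefaultConnectionLimit = 100; // allow 100 concurrent connections per host (.NET Framework client default is 2)
    ServicePointManager.UseNagleAlgorithm = false;    // send small requests such as table inserts immediately
    ServicePointManager.Expect100Continue = false;    // skip the "Expect: 100-continue" round trip before each request body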
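
`DefaultConnectionLimit` caps the HTTP connections the .NET client opens to one host, so it silently caps concurrency however many threads or tasks you start. The effect is language-independent; this Python toy (not the .NET class) models the limit as a semaphore around a simulated request and records the peak number of requests in flight:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

class ConnectionPool:
    """Toy model of a per-host connection limit."""

    def __init__(self, limit: int):
        self._slots = threading.Semaphore(limit)
        self._lock = threading.Lock()
        self.in_flight = 0
        self.peak = 0

    def request(self, latency_s: float = 0.01) -> None:
        with self._slots:                  # wait for a free connection
            with self._lock:
                self.in_flight += 1
                self.peak = max(self.peak, self.in_flight)
            time.sleep(latency_s)          # simulated round trip
            with self._lock:
                self.in_flight -= 1

def run(limit: int, requests: int = 40, threads: int = 20) -> tuple:
    """Issue `requests` calls from `threads` workers; return (peak in flight, seconds)."""
    pool = ConnectionPool(limit)
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as ex:
        for _ in range(requests):
            ex.submit(pool.request)
    return pool.peak, time.perf_counter() - start

for limit in (2, 20):
    peak, elapsed = run(limit)
    print(f"limit={limit:>2}: peak in flight={peak}, elapsed={elapsed:.2f}s")
```

With a limit of 2, twenty threads still only get two requests on the wire at a time, so the run takes at least 40 × 10 ms / 2 = 0.2 s.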

24 Block Blob Operations
- Single HTTP request for blob: Blob Upload
- Sequential HTTP requests for blocks: Block Upload (one block at a time), then Block Commit
- Parallel HTTP requests for blocks: Block Upload (several blocks at once), then Block Commit

25 Tuning Block Blob Operations
- Single HTTP request for blob (Blob Upload)
- Sequential HTTP requests for blocks (Block Upload, Block Commit)
- Parallel HTTP requests for blocks (Block Upload, Block Commit)
Tuning properties: SingleBlobUploadThresholdInBytes, ParallelOperationThreadCount, StreamWriteSizeInBytes

26 Tuning Blob Operations

CloudBlobClient
Property                          Default   Range             Description
SingleBlobUploadThresholdInBytes  32 MB     1-64 MB           Maximum size of a blob in bytes that may be uploaded as a single blob.
ParallelOperationThreadCount      1         1-64              Number of blocks that may be simultaneously uploaded.

CloudBlockBlob
Property                          Default   Range             Description
StreamWriteSizeInBytes (Block)    4 MB      1-4 MB            Block size for writing to a block blob.
StreamWriteSizeInBytes (Page)               512 bytes-4 MB    Number of bytes to buffer when writing to a page blob stream.
StreamMinimumReadSizeInBytes                1-4 MB            Minimum number of bytes to buffer when reading from a blob stream.
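
The three upload properties decide which request pattern is used: one request below the threshold, otherwise a block upload per chunk (run concurrently up to the thread count) followed by a commit. The sketch imitates that decision against a hypothetical in-memory `FakeBlockBlob` whose methods are named after the REST operations Put Blob, Put Block and Put Block List; it illustrates the logic and is not the SDK's implementation. Block IDs are Base64 strings of equal length, as the service requires for the blocks of one blob:

```python
import base64
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

MB = 1024 * 1024

class FakeBlockBlob:
    """In-memory stand-in for a block blob (hypothetical, not an SDK class)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.uncommitted: Dict[str, bytes] = {}
        self.content = b""
        self.requests = 0

    def put_blob(self, data: bytes) -> None:                   # like Put Blob
        with self._lock:
            self.requests += 1
            self.content = data

    def put_block(self, block_id: str, data: bytes) -> None:   # like Put Block
        with self._lock:
            self.requests += 1
            self.uncommitted[block_id] = data

    def put_block_list(self, block_ids: List[str]) -> None:    # like Put Block List
        with self._lock:
            self.requests += 1
            self.content = b"".join(self.uncommitted[i] for i in block_ids)

def upload(blob: FakeBlockBlob, data: bytes,
           single_threshold: int = 32 * MB,  # cf. SingleBlobUploadThresholdInBytes
           block_size: int = 4 * MB,         # cf. StreamWriteSizeInBytes
           threads: int = 1) -> None:        # cf. ParallelOperationThreadCount
    if len(data) <= single_threshold:
        blob.put_blob(data)                  # single HTTP request
        return
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    # Equal-length Base64 IDs built from a zero-padded counter.
    ids = [base64.b64encode(f"{n:06d}".encode()).decode() for n in range(len(chunks))]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(blob.put_block, ids, chunks))  # sequential when threads == 1
    blob.put_block_list(ids)                 # commit, fixing the block order

data = bytes(range(256)) * (40 * 1024)      # exactly 10 MB of non-uniform bytes
small, big = FakeBlockBlob(), FakeBlockBlob()
upload(small, data)                                    # 10 MB <= 32 MB: one request
upload(big, data, single_threshold=1 * MB, threads=4)  # 3 blocks + 1 commit
print(small.requests, big.requests, big.content == data)  # 1 4 True
```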

27 Parallel and Asynchronous Uploads
- Parallel blobs: files → blob container, several blobs at once
- Parallel blocks: files → blob container, several blocks of each blob at once
- Parallel blobs & blocks: files → blob container, both at once
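
The three diagrams differ only in where the concurrency sits: across blobs, across the blocks of one blob, or both. This asyncio sketch uses simulated uploads and hypothetical helper names, bounding each level with its own semaphore, so the number of requests in flight never exceeds parallel blobs × parallel blocks:

```python
import asyncio
from typing import Dict, List, Tuple

async def put_block(stats: Dict[str, int]) -> None:
    """Simulated Put Block request that tracks how many requests are in flight."""
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(0.01)  # stand-in for the network round trip
    stats["in_flight"] -= 1

async def upload_file(name: str, n_blocks: int, block_limit: int,
                      stats: Dict[str, int]) -> str:
    block_slots = asyncio.Semaphore(block_limit)  # parallel blocks within one blob

    async def one_block() -> None:
        async with block_slots:
            await put_block(stats)

    await asyncio.gather(*(one_block() for _ in range(n_blocks)))
    return name  # a real upload would commit the block list here

async def upload_all(files: Dict[str, int], blob_limit: int,
                     block_limit: int) -> Tuple[List[str], int]:
    stats = {"in_flight": 0, "peak": 0}
    blob_slots = asyncio.Semaphore(blob_limit)  # parallel blobs in the container

    async def one_blob(name: str, n_blocks: int) -> str:
        async with blob_slots:
            return await upload_file(name, n_blocks, block_limit, stats)

    done = await asyncio.gather(*(one_blob(n, b) for n, b in files.items()))
    return list(done), stats["peak"]

files = {f"page-{i}.txt": 5 for i in range(8)}  # hypothetical: 8 files, 5 blocks each
for blobs, blocks in [(1, 1), (8, 1), (1, 5), (4, 5)]:
    done, peak = asyncio.run(upload_all(files, blobs, blocks))
    print(f"blobs={blobs} blocks={blocks}: {len(done)} uploaded, peak in flight {peak}")
```

Parallel blocks help most for a few large blobs; parallel blobs help most for many small ones, such as the 4.3 million page blobs here.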

30 Storage Monitoring Tables
- $MetricsCapacityBlob
- $MetricsTransactionsBlob
- $MetricsTransactionsTable
- $MetricsTransactionsQueue

31 Handling Outages
- 29th February 2012: major outage due to a certificate error. MVP Summit 2012: February 28th – March 2nd.
- 22nd February 2013: storage outage due to a certificate error. MVP Summit 2013: February 18th–22nd.
- MVP Summit November 2013: November 18th–21st.
Correlation does not mean causation!

32
- Consider processing "in the cloud"
- Modify ServicePointManager settings
- Use parallel and asynchronous actions
- Tune CloudBlobClient and CloudBlockBlob properties
- Fiddler is your friend (especially the Timeline)
- Use the source (Windows Azure SDK on GitHub)
- Understand Storage Emulator limitations
- Understand transient faults
- Understand pricing implications
- Leverage Storage Analytics
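
Understanding transient faults usually means retrying requests that fail for temporary reasons (timeouts, or HTTP 500/503 when the service is busy) with a growing delay, while letting permanent errors surface at once. The .NET storage client library already applies a retry policy by default; this sketch only shows the idea, with a hypothetical `TransientError` and a flaky operation standing in for a storage call:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

class TransientError(Exception):
    """Hypothetical marker for retryable failures (e.g. timeout, HTTP 500 or 503)."""

def with_retries(op: Callable[[], T], attempts: int = 5, base_delay: float = 0.1,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call op, retrying transient failures with exponential backoff plus jitter.

    Any other exception propagates immediately: retrying a permanent error wastes time.
    """
    for attempt in range(attempts):
        try:
            return op()
        except TransientError:
            if attempt == attempts - 1:
                raise                          # out of attempts: surface the error
            delay = base_delay * 2 ** attempt  # 0.1, 0.2, 0.4, ... seconds
            sleep(delay * random.uniform(0.8, 1.2))
    raise ValueError("attempts must be at least 1")

calls = {"n": 0}

def flaky_insert() -> str:
    """Fails twice with a transient error, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("503 Server Busy")
    return "inserted"

delays: list = []  # record the waits instead of sleeping
print(with_retries(flaky_insert, sleep=delays.append), calls["n"], len(delays))  # inserted 3 2
```

The jitter spreads retries from many workers so they do not hit a busy partition in lockstep.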

33 Thanks! http://wikisearch.azurewebsites.net/

