Tamir Dresher Senior Software Architect July 2, 2014 Where is my Data? (In the Cloud)

1 Tamir Dresher Senior Software Architect July 2, 2014 Where is my Data? (In the Cloud)

2 About Me
- Software architect, consultant and instructor
- Software Engineering Lecturer @ Ruppin Academic Center
- Technology addict
- 10 years of experience
- .NET and Native Windows Programming
- @tamir_dresher
- tamirdr@codevalue.net
- http://www.TamirDresher.com

3 Agenda
- Storage
- Blob
- Relational DB
- NoSql DB
- MapReduce

4 Storage

9 Numbers – 1 Second is
- 1,132 Instagram photos uploaded
- 1,365 Tumblr posts
- 7,241 Tweets sent
- 44,512 Google searches
- 84,921 YouTube videos viewed
http://www.internetlivestats.com/one-second/
http://onesecond.designly.com/

10 Storage Prices

11 Types of information
- Product catalogs
- Employee data
- User profiles
- Images
- Session state
- Shopping cart
- Game scores and state
- Social feeds
- Query output results
- Airline seating charts
- Inventory management system
- Game leaderboards
- Performance counters
- Weather
- Stock quotes

12 Gartner Magic Quadrant
- IaaS
- PaaS

13 Windows Azure Growing Global Presence
- Data centers: North America, Europe, Asia Pacific
- Storage SLA: 99.99% availability, which allows at most 52.56 minutes of downtime per year
http://azure.microsoft.com/en-us/support/legal/sla
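The 52.56-minute figure follows directly from the SLA percentage. A minimal sketch of that arithmetic, assuming a 365-day year; the class and method names are hypothetical and not part of any Azure SDK:

```csharp
using System;

public static class SlaMath
{
    // Minutes in a non-leap year: 365 days * 24 hours * 60 minutes = 525,600.
    private const decimal MinutesPerYear = 365m * 24m * 60m;

    // Returns the maximum downtime, in minutes per year, that still meets
    // the given availability percentage (for example 99.99m).
    public static decimal MaxDowntimeMinutesPerYear(decimal availabilityPercent)
    {
        decimal allowedFraction = (100m - availabilityPercent) / 100m;
        return MinutesPerYear * allowedFraction;
    }
}
```

Calling `SlaMath.MaxDowntimeMinutesPerYear(99.99m)` yields 52.56, matching the slide.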

14 AZURE BLOBS

15 What is a BLOB
- BLOB: Binary Large OBject
- Storage for any type of entity, such as binary files and text documents
- Distributed File Service (DFS): scalability and high availability
- A BLOB file is distributed across multiple servers and replicated at least 3 times

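16 Azure Blob Storage Concepts
- Hierarchy: Account > Container > Blob > Blocks/Pages
- URL: http://<account>.blob.core.windows.net/<container>/<blob>
- Example:
  - Account: contoso
    - Container: images
      - Blobs: PIC01.JPG, PIC02.JPG (each stored as blocks or pages)
    - Container: videos
      - Blob: VID1.AVI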
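The account, container and blob levels map one-to-one onto parts of the blob's address. A minimal sketch, assuming the public endpoint layout `http://<account>.blob.core.windows.net/<container>/<blob>` and names that are already URL-safe; `BlobAddress` is a hypothetical helper, not an SDK type:

```csharp
using System;

public static class BlobAddress
{
    // Builds the address of a blob from the three levels of the hierarchy:
    // account -> container -> blob.
    // Assumption: all three names are already URL-safe (no escaping is done here).
    public static Uri Build(string account, string container, string blobName)
    {
        string url = string.Format(
            "http://{0}.blob.core.windows.net/{1}/{2}",
            account, container, blobName);
        return new Uri(url);
    }
}
```

With the slide's names, `BlobAddress.Build("contoso", "images", "PIC01.JPG")` gives `http://contoso.blob.core.windows.net/images/PIC01.JPG`.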

17 Amazon Simple Storage Service (S3) Concepts
- Hierarchy: Account > Bucket > Object
- URL: http://<bucket>.s3.amazonaws.com/<object>
- Example:
  - Account: contoso
    - Bucket: images
      - Objects: PIC01.JPG, PIC02.JPG
    - Bucket: videos
      - Object: VID1.AVI

18 Blob Operations
- REST

19 DEMO Creating a Blob

20 BLOBS - Azure
- Block blob: up to 200 GB in size
- Page blob: up to 1 TB in size
- Total account capacity: 500 TB

21 BLOBS - AWS
- Object size: up to 5 TB
- An AWS account can own up to 100 buckets at a time, with unlimited objects
- 99.999999999% durability, 99.99% availability
- Reduced Redundancy Storage (RRS): 99.99% durability and 99.99% availability
- Amazon Glacier: low-cost storage service, offered as a storage option for data archival

22 Pricing - AWS
- Pay for what you use
- Components:
  - Storage capacity used (per GB per month)
  - Data transfer out (per GB per month)
  - Requests (per n thousand requests per month)
http://aws.amazon.com/s3/pricing/

23 Pricing - Azure
- Pay for what you use, or a 6- or 12-month plan
- Components:
  - Storage capacity used (per GB per month)
  - Replication option (LRS, GRS, RA-GRS)
  - Number of requests (per n thousand requests per month)
  - Data egress (per GB per month)
http://azure.microsoft.com/en-us/pricing/details/storage/

24 RELATIONAL DB

25 Relational Database Service (RDS)
- MySQL, Oracle, or Microsoft SQL Server in the cloud
- No administrative overheads
- Dedicated hardware
- High availability
- Pay-as-you-grow pricing
- Familiar development model*
* Despite missing features and some limitations - http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_SQLServer.html

26 SQL Azure
- SQL Server in the cloud
- No administrative overheads
- Shared or reserved (dedicated) hardware
- High availability
- Pay-as-you-grow pricing
- Familiar development model*
* Despite missing features and some limitations - http://msdn.microsoft.com/en-us/library/ff394115.aspx

27 DEMO Creating and Using SQL Azure

28 Pricing - SQL Azure

29 Pricing - RDS
- Pay for what you use
- Components:
  - Storage capacity used (per GB-month and per million I/O requests)
  - Deployment type: Single-AZ or Multi-AZ (AZ = Availability Zone)
  - DB instance hours (per hour)
  - Additional backup storage (per GB-month)
  - Data transfer in / out (per GB per month)
http://aws.amazon.com/rds/pricing/

30 Case Study - https://haveibeenpwned.com/

31 Case Study - https://haveibeenpwned.com/
http://www.troyhunt.com/2013/12/working-with-154-million-records-on.html
- How do I make querying 154 million email addresses as fast as possible?
- If I want 100GB of SQL Server and I want to hit it 10 million times, it'll cost me $176 a month (now it's ~$20)

32 NoSql - Azure Tables, DynamoDB

33 NoSql
- Relational technology has long been the dominant approach for data
- Large amounts of data
  - Scaling across many servers is challenging
- Different kinds of data on a relational DB
  - JSON documents
  - Graphs
- ACID: Atomicity, Consistency, Isolation, Durability
- CAP: Consistency, Availability, Partition tolerance
- BASE: Basically Available, Soft state, Eventual consistency

35 Table Storage Concepts
- Hierarchy: Account > Table > Entity
- Example:
  - Account: contoso
    - Table: customers
      - Entity: Name = …, Email = …
      - Entity: Name = …, EMailAdd = …
    - Table: photos
      - Entity: Photo ID = …, Date = …
      - Entity: Photo ID = …, Date = …

36 Table Storage
- Not an RDBMS
  - No relationships between entities
  - NoSql
- An entity can have up to 255 properties, up to 1 MB per entity
- Mandatory properties for every entity
  - PartitionKey & RowKey (the only indexed properties)
    - Together they uniquely identify an entity
    - The same RowKey can be used in different PartitionKeys
    - They define the sort order
  - Timestamp: used for optimistic concurrency
- Strongly consistent

37 No Fixed Schema
FIRST  | LAST   | BIRTHDATE
Wade   | Wegner | 2/2/1981
Nathan | Totten | 3/15/1965
Nick   | Harris | May 1, 1976
Extra property on an entity: FAV SPORT = Canoeing

38 Table Object Model
- ITableEntity interface
  - PartitionKey, RowKey, Timestamp, and ETag properties
  - Implemented by TableEntity and DynamicTableEntity

    // This class defines one additional property of integer type.
    // Since it derives from TableEntity, it will be automatically
    // serialized and deserialized.
    public class SampleEntity : TableEntity
    {
        public int SampleProperty { get; set; }
    }

39 Sample – Inserting an Entity into a Table

    // You will need the following using statements
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Table;

    // Create the table client.
    CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
    CloudTable peopleTable = tableClient.GetTableReference("people");
    peopleTable.CreateIfNotExists();

    // Create a new customer entity.
    CustomerEntity customer1 = new CustomerEntity("Harp", "Walter");
    customer1.Email = "Walter@contoso.com";
    customer1.PhoneNumber = "425-555-0101";

    // Create an operation to add the new customer to the people table.
    TableOperation insertCustomer1 = TableOperation.Insert(customer1);

    // Submit the operation to the table service.
    peopleTable.Execute(insertCustomer1);

40 Retrieve

    // Create the table client.
    CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
    CloudTable peopleTable = tableClient.GetTableReference("people");

    // Retrieve the entity with partition key of "Smith" and row key of "Jeff"
    TableOperation retrieveJeffSmith = TableOperation.Retrieve<CustomerEntity>("Smith", "Jeff");

    // Retrieve entity
    CustomerEntity specificEntity =
        (CustomerEntity)peopleTable.Execute(retrieveJeffSmith).Result;

41 Table Storage – Important Points
- Azure Tables can store TBs of data
- Table operations are fast
- Tables are distributed
  - PartitionKey defines the partition
  - A table might be stored in different partitions on different storage devices

42 Pricing

43 Case Study - https://haveibeenpwned.com/

44 Case Study - https://haveibeenpwned.com/
- How do I make querying 154 million email addresses as fast as possible?
- For foo@bar.com, the domain (bar.com) is the partition key and the alias (foo) is the row key
- If I want 100GB of storage and I want to hit it 10 million times, it'll cost me $8 a month
- SQL Server will cost $176 a month, 22 times more expensive
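The key design in this case study turns every email lookup into a single PartitionKey + RowKey point query. A minimal sketch of the split, assuming addresses of the form alias@domain; `EmailKey` is a hypothetical helper, and lowercasing the keys is an added assumption (Table storage keys are case-sensitive) rather than something the case study states:

```csharp
using System;

public static class EmailKey
{
    // Splits "alias@domain" into (Item1 = PartitionKey = domain, Item2 = RowKey = alias).
    public static Tuple<string, string> ToKeys(string email)
    {
        if (email == null) throw new ArgumentNullException("email");

        // Split on the last '@' so the domain part is always complete.
        int at = email.LastIndexOf('@');
        if (at <= 0 || at == email.Length - 1)
            throw new ArgumentException("Expected alias@domain", "email");

        string alias = email.Substring(0, at);
        string domain = email.Substring(at + 1);

        // Assumption: normalize case so lookups do not depend on how the address was typed.
        return Tuple.Create(domain.ToLowerInvariant(), alias.ToLowerInvariant());
    }
}
```

For `foo@bar.com` this returns `("bar.com", "foo")`, which can then be passed as the partition key and row key of a retrieve operation like the one in the Retrieve slide.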

45 DynamoDB
- An item can be up to 64 KB in size
- Items are stored on SSDs and replicated across multiple Availability Zones in a Region
- Each item has a primary key, which can be either a single-attribute hash key or a composite hash-range key
- Supports secondary indexes

46 DynamoDB
- Eventually consistent reads (by default) and strongly consistent reads (optional)
- Provisioned Throughput: the request throughput you want your table to be able to achieve
  - 10 units of Write Capacity: enough for up to 36,000 writes per hour
  - 50 units of Read Capacity: enough for up to 180,000 strongly consistent reads, or 360,000 eventually consistent reads, per hour
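The hourly figures are the per-second capacity units multiplied out. A minimal sketch of that arithmetic, assuming one capacity unit equals one write or one strongly consistent read per second, and that eventually consistent reads cost half as much (this is what the slide's numbers imply); the class is hypothetical and not part of the AWS SDK:

```csharp
public static class ProvisionedThroughput
{
    private const long SecondsPerHour = 3600;

    // One write capacity unit = one write per second (assumption taken from the slide's numbers).
    public static long WritesPerHour(long writeUnits)
    {
        return writeUnits * SecondsPerHour;
    }

    // One read capacity unit = one strongly consistent read per second.
    public static long StronglyConsistentReadsPerHour(long readUnits)
    {
        return readUnits * SecondsPerHour;
    }

    // Eventually consistent reads get twice the throughput per unit.
    public static long EventuallyConsistentReadsPerHour(long readUnits)
    {
        return 2 * readUnits * SecondsPerHour;
    }
}
```

With 10 write units and 50 read units this reproduces the slide: 36,000 writes, 180,000 strongly consistent reads, or 360,000 eventually consistent reads per hour.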

47 Pricing
- Pay for what you use
- Components:
  - Provisioned throughput capacity (per hour)
  - Indexed data storage (per GB per month)
  - Data transfer out (per GB per month)
http://aws.amazon.com/dynamodb/pricing/

49 MapReduce on the Cloud

50 Hadoop in the cloud
- Hadoop on Azure Cloud
- Some facts:
  - In 2013, global mobile data traffic reached 1.5 exabytes per month
  - Cisco predicts 1.1 zettabytes (1 zettabyte = 1,000 exabytes) of internet traffic in 2016 (source: Cisco)

51 MapReduce – The BigData Power
- Map: takes input and outputs key-value pairs
    (Key1, Value1)
    (Key2, Value2)
    :
    (KeyN, ValueN)

52 MapReduce – The BigData Power
- Reduce: takes the group of values for each key and produces a new group of values
    Key1: [value1-1, value1-2, …] -> [new_value1-1, new_value1-2, …]
    Key2: [value2-1, value2-2, …] -> [new_value2-1, new_value2-2, …]
    :
    KeyN: [valueN-1, valueN-2, …] -> [new_valueN-1, new_valueN-2, …]
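The Map and Reduce phases can be imitated in memory with LINQ to see how the pieces fit together. This is only a single-machine sketch, not how Hadoop executes a job; `MiniMapReduce` and its word-count usage are hypothetical illustrations:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MiniMapReduce
{
    // Runs map over every input, groups the emitted values by key,
    // then runs reduce once per key.
    public static Dictionary<TKey, TResult> Run<TInput, TKey, TValue, TResult>(
        IEnumerable<TInput> inputs,
        Func<TInput, IEnumerable<KeyValuePair<TKey, TValue>>> map,
        Func<TKey, IEnumerable<TValue>, TResult> reduce)
    {
        return inputs
            .SelectMany(input => map(input))                 // Map: emit key-value pairs
            .GroupBy(pair => pair.Key, pair => pair.Value)   // Shuffle: collect values per key
            .ToDictionary(group => group.Key,
                          group => reduce(group.Key, group)); // Reduce: one result per key
    }

    // Usage example: count how often each word occurs across all lines.
    public static Dictionary<string, int> WordCount(IEnumerable<string> lines)
    {
        return Run<string, string, int, int>(
            lines,
            line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                        .Select(word => new KeyValuePair<string, int>(word, 1)),
            (word, ones) => ones.Sum());
    }
}
```

For the lines "a b" and "b c", `WordCount` maps each word to the pair (word, 1) and the reduce step sums them, giving a = 1, b = 2, c = 1.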

53 MapReduce - How Does It Work?
(diagram: Files, Server, Server)

54 So How Does It Work?
(diagram: Server, RUNTIME, Code)

55 Elastic Map Reduce (EMR)
- Amazon Hadoop on the Cloud
- Hortonworks and Microsoft Hadoop to Windows
- Cluster of EC2 instances
- Pricing:
  - Hourly rate for every instance hour (by instance type)
  - Additional EMR price per EC2 instance
  - http://aws.amazon.com/elasticmapreduce/pricing/

56 HDInsight
- MS Hadoop on (not only) Azure Cloud
- Hortonworks and Microsoft bring Hadoop to Windows
- Native integration with .NET

57 Finding common friends
- Facebook shows you how many common friends you have with someone
- There were 1,310,000,000 active users on Facebook, with 130 friends on average (01.01.2014)
- Calculating the mutual friends

58 Finding common friends
- We can represent the friend relationship as: Someone -> [list of his/her friends]
- Note that the friend relationship is symmetrical: if A is a friend of B, then B is a friend of A

59 Example of Friends file
    U1 -> U2 U3 U4
    U2 -> U1 U3 U4 U5
    U3 -> U1 U2 U4 U5
    U4 -> U1 U2 U3 U5
    U5 -> U2 U3 U4

61 Designing our MapReduce job - Mapper
- Each line of the file is an input line to the Mapper
- The Mapper outputs key-value pairs
  - Key: (user, friend), sorted, so the friend might come before the user
  - Value: list of friends
- Sorting the key helps the reducer: identical pairs are delivered together

64 Mapper Example – final result
Given the line: U1 -> U2 U3 U4
Mapper output:
    (U1 U2) -> U2 U3 U4
    (U1 U3) -> U2 U3 U4
    (U1 U4) -> U2 U3 U4
Given the line: U2 -> U1 U3 U4 U5
Mapper output:
    (U1 U2) -> U1 U3 U4 U5
    (U2 U3) -> U1 U3 U4 U5
    (U2 U4) -> U1 U3 U4 U5
    (U2 U5) -> U1 U3 U4 U5
Given the line: U3 -> U1 U2 U4 U5
Mapper output:
    (U1 U3) -> U1 U2 U4 U5
    (U2 U3) -> U1 U2 U4 U5
    (U3 U4) -> U1 U2 U4 U5
    (U3 U5) -> U1 U2 U4 U5
Given the line: U4 -> U1 U2 U3 U5
Mapper output:
    (U1 U4) -> U1 U2 U3 U5
    (U2 U4) -> U1 U2 U3 U5
    (U3 U4) -> U1 U2 U3 U5
    (U4 U5) -> U1 U2 U3 U5
Given the line: U5 -> U2 U3 U4
Mapper output:
    (U2 U5) -> U2 U3 U4
    (U3 U5) -> U2 U3 U4
    (U4 U5) -> U2 U3 U4

65 Designing our MapReduce job - Reducer
- The input for the reducer is structured as: (friend1, friend2) -> (friend1's friends) (friend2's friends)
- The reducer finds the intersection of the two lists
- Output: (friend1, friend2) -> (intersection of friend1's and friend2's friends)

66 Reducer Example
Given the line                             | Reducer output
(U1 U2) -> (U1 U3 U4 U5) (U2 U3 U4)        | (U1 U2) -> (U3 U4)
(U1 U3) -> (U1 U2 U4 U5) (U2 U3 U4)        | (U1 U3) -> (U2 U4)
(U1 U4) -> (U1 U2 U3 U5) (U2 U3 U4)        | (U1 U4) -> (U2 U3)
(U2 U3) -> (U1 U2 U4 U5) (U1 U3 U4 U5)     | (U2 U3) -> (U1 U4 U5)
(U2 U4) -> (U1 U2 U3 U5) (U1 U3 U4 U5)     | (U2 U4) -> (U1 U3 U5)
(U2 U5) -> (U1 U3 U4 U5) (U2 U3 U4)        | (U2 U5) -> (U3 U4)
(U3 U4) -> (U1 U2 U3 U5) (U1 U2 U4 U5)     | (U3 U4) -> (U1 U2 U5)
(U3 U5) -> (U1 U2 U4 U5) (U2 U3 U4)        | (U3 U5) -> (U2 U4)
(U4 U5) -> (U1 U2 U3 U5) (U2 U3 U4)        | (U4 U5) -> (U2 U3)
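The mapper and reducer tables can be reproduced on a single machine before writing any Hadoop code. The following is a minimal in-memory sketch using LINQ, assuming the "user -> friends" line format of the friends file; `CommonFriendsSketch` is a hypothetical class and does not use the Hadoop SDK:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CommonFriendsSketch
{
    // Mapper step: "U1 -> U2 U3 U4" emits ("U1 U2", friends), ("U1 U3", friends), ...
    public static IEnumerable<KeyValuePair<string, string[]>> Map(string line)
    {
        string[] tokens = line.Split(new[] { " ", "->" }, StringSplitOptions.RemoveEmptyEntries);
        if (tokens.Length == 0) yield break;

        string user = tokens[0];
        string[] friends = tokens.Skip(1).ToArray();
        foreach (string friend in friends)
        {
            // Sort the pair so both users of a friendship emit the same key.
            string[] pair = { user, friend };
            Array.Sort(pair, StringComparer.Ordinal);
            yield return new KeyValuePair<string, string[]>(string.Join(" ", pair), friends);
        }
    }

    // Reducer step: intersect the two friend lists that arrived under one pair key.
    public static string Reduce(IEnumerable<string[]> friendLists)
    {
        List<string[]> lists = friendLists.ToList();
        if (lists.Count < 2) return string.Empty; // only one side listed the friendship
        return string.Join(" ",
            lists[0].Intersect(lists[1]).OrderBy(f => f, StringComparer.Ordinal));
    }

    // Glue: map every line, group by pair key, reduce each group.
    public static Dictionary<string, string> Run(IEnumerable<string> lines)
    {
        return lines
            .SelectMany(line => Map(line))
            .GroupBy(kv => kv.Key, kv => kv.Value)
            .ToDictionary(g => g.Key, g => Reduce(g));
    }
}
```

Feeding the five lines of the example friends file into `CommonFriendsSketch.Run` produces nine pair keys, for example "U1 U2" with "U3 U4" and "U2 U3" with "U1 U4 U5", in line with the reducer table.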

67 Creating C# MapReduce

68 Creating C# MapReduce - Mapper

    using System;
    using System.Linq;
    using Microsoft.Hadoop.MapReduce;

    public class CommonFriendsMapper : MapperBase
    {
        public override void Map(string inputLine, MapperContext context)
        {
            // Split on spaces and on the "->" separator used in the friends file.
            var strings = inputLine.Split(new[] { " ", "->" }, StringSplitOptions.RemoveEmptyEntries);
            if (strings.Any())
            {
                var currentUser = strings[0];
                var friends = strings.Skip(1);
                foreach (var friend in friends)
                {
                    // Sort the pair so (U1, U2) and (U2, U1) produce the same key.
                    var keyArr = new[] { currentUser, friend };
                    Array.Sort(keyArr);
                    var key = String.Join(" ", keyArr);
                    context.EmitKeyValue(key, string.Join(" ", friends));
                }
            }
        }
    }

69 Creating C# MapReduce - Reduce

    using System.Collections.Generic;
    using System.Linq;
    using Microsoft.Hadoop.MapReduce;

    public class CommonFriendsReducer : ReducerCombinerBase
    {
        public override void Reduce(string key, IEnumerable<string> strings, ReducerCombinerContext context)
        {
            // Each value is one user's friend list; a symmetric friends file
            // delivers exactly two lists per pair key.
            var friendsLists = strings.Select(friendList => friendList.Split(' ')).ToList();
            var intersection = friendsLists[0].Intersect(friendsLists[1]);
            context.EmitKeyValue(key, string.Join(" ", intersection));
        }
    }

70 Creating C# MapReduce – Hadoop Job

    HadoopJobConfiguration myConfig = new HadoopJobConfiguration();
    myConfig.InputPath = "wasb:///example/data/friends/friends";
    myConfig.OutputFolder = "wasb:///example/data/friends/output";

    Environment.SetEnvironmentVariable("HADOOP_HOME", @"c:\hadoop");
    Environment.SetEnvironmentVariable("JAVA_HOME", @"c:\hadoop\jvm");

    var hadoop = Hadoop.Connect(clusterUri, clusterUserName, hadoopUserName,
        clusterPassword, azureStorageAccount, azureStorageKey,
        azureStorageContainer, createContinerIfNotExist);

    var jobResult = hadoop.MapReduceJob.Execute<CommonFriendsMapper, CommonFriendsReducer>(myConfig);
    int exitCode = jobResult.Info.ExitCode; // 0 = success, anything else = failure

71 Pricing
10 node cluster that will exist for 24 hours:
- Secure Gateway Node: free
- Head node: 15.36 USD per 24-hour day
- 1 data node: 7.68 USD per 24-hour day
- 10 data nodes: 76.80 USD per 24-hour day
- Total: 92.16 USD
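The total is the head node plus the data nodes, multiplied by the number of days the cluster exists. A minimal sketch of that calculation using the per-day rates quoted on the slide (2014 prices, which will have changed since); `HdInsightCost` is a hypothetical helper:

```csharp
public static class HdInsightCost
{
    // Per-day rates in USD as quoted on the slide; the Secure Gateway Node is free.
    public const decimal HeadNodePerDay = 15.36m;
    public const decimal DataNodePerDay = 7.68m;

    // Cost of one head node plus the given number of data nodes for the given number of days.
    public static decimal ClusterCost(int dataNodes, decimal days)
    {
        return (HeadNodePerDay + dataNodes * DataNodePerDay) * days;
    }
}
```

`HdInsightCost.ClusterCost(10, 1m)` returns 92.16, the slide's total for a 10-node cluster running for one day.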

72 WRAP UP

73 Comparing the alternatives
Storage Type | When Should You Use | Implications
BLOB | Unstructured data; files | Application logic responsibility; consider using HDInsight (Hadoop)
Relational DB | Structured relational data; ACID transactions | SQL DML+DDL; could affect scalability; BI abilities; reporting
Azure Tables, DynamoDB | Structured data; loose schema; geo replication (high DR); auto sharding | OData, REST; application logic responsibility (multiple schemas)

74 What have we seen
- Blobs
- Relational DB
- NoSql
- MapReduce in the Cloud

75 What's Next
- NoSql: MongoDB, Cassandra, CouchDB, RavenDB
- Hadoop ecosystem: Hive, Pig, SQOOP, Mahout
- Cache options: Amazon ElastiCache, Azure Cache, InRole Cache, Redis
http://blogs.msdn.com/b/windowsazure/
http://blogs.msdn.com/b/windowsazurestorage/
http://blogs.msdn.com/b/bigdatasupport/

76 Presenter contact details
c: +972-52-4772946
t: @tamir_dresher
e: tamirdr@codevalue.net
b: TamirDresher.com
w: www.codevalue.net

