Term Project #2 Data Management on a Cloud (Azure)


1 Term Project #2 Data Management on a Cloud (Azure)

2 Input Dataset
Social graph
– Format: USER \t FOLLOWER \n (both are numeric IDs, i.e. integers)
– Example:
    12 13
    12 14
    12 15
    16 17
  Users 13, 14 and 15 are followers of user 12. User 17 is a follower of user 16.
– Provided as text files
Restricted user profiles
– About users who have > 10,000 followers
– Schema:
    twitter.profiles (
      numeric_id int primary key,
      name varchar(20),
      screen_name varchar(16),
      friends_count int,
      followers_count int,
      following varchar(5),
      statuses_count int,
      favourites_count int,
      location varchar(40),
      description varchar(165),
      profile_image_url varchar(235),
      url varchar(100),
      created_at varchar(30),
      time_zone varchar(30),
      gender varchar(1),
      verified varchar(5),
      protected varchar(5)
      …
    )
– Stored in SQL Azure
    Server name: foqev3v3fp.database.windows.net
    Login: student
    Password: csed***$
http://an.kaist.ac.kr/traces/WWW2010.html
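The social graph format above can be read with a few lines of code. The following is a minimal sketch, not the course's reference implementation; the function name `parse_edges` is hypothetical, and it assumes every non-blank line holds exactly two integers separated by whitespace.

```python
from typing import Iterable, Iterator, Tuple


def parse_edges(lines: Iterable[str]) -> Iterator[Tuple[int, int]]:
    """Yield (user, follower) pairs from lines of the form USER<TAB>FOLLOWER."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, follower = line.split()  # split() accepts tabs as well as spaces
        yield int(user), int(follower)


# The example from the slide: 13, 14 and 15 follow 12; 17 follows 16.
sample = "12\t13\n12\t14\n12\t15\n16\t17\n"
edges = list(parse_edges(sample.splitlines()))
```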

3 Problem: Who has the largest number of mutual friends in Twitter?
1. Upload a local file (social graph) to Azure blob storage
2. Bulk-load Azure table
   1) Read a blob
   2) Parse following relationships
   3) Store the relationships into Azure table
3. Find mutual friends
   1) Read Azure table
   2) Self-join the table
4. Count mutual friends for each user
5. Get the name of the user who has the largest number of mutual friends from SQL Azure
Distribute and parallelize the workload !!!

4 Web Interface Screen shot

5

6 Upload to Azure blob storage
[Diagram: Web Role, Worker Role and Storage. The social graph file (12 13, 12 14, …) is uploaded to Storage.]

7 Upload to Azure blob storage
Web Role: _Default.UploadDataFileToBlobStorageButton_Click(…)

8 Bulk-load Azure table
[Diagram: Web Role, Worker Role and Storage. The uploaded file (12 13, 12 14, …) is bulk-loaded into a table with columns userid and followerid.]

9 Bulk-load Azure table
Web Role: _Default.LoadFollowerTableFromBlobButton_Click(…)
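One way to prepare the parsed relationships for bulk-loading is to group them into batches before writing. The sketch below makes no Azure calls and is not the project's actual loader. It assumes, for illustration only, that userid is used as the partition key and followerid as the row key, and that a batch holds at most 100 rows from a single partition. The names `to_batches` and `BATCH_LIMIT` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

BATCH_LIMIT = 100  # assumed maximum number of rows per batch


def to_batches(
    edges: Iterable[Tuple[int, int]], limit: int = BATCH_LIMIT
) -> Iterator[List[Dict[str, str]]]:
    """Group (userid, followerid) rows by userid and cut each group into batches."""
    by_partition: Dict[int, List[Dict[str, str]]] = defaultdict(list)
    for user, follower in edges:
        # Assumption: userid is the partition key, followerid the row key.
        by_partition[user].append(
            {"PartitionKey": str(user), "RowKey": str(follower)}
        )
    for rows in by_partition.values():
        for start in range(0, len(rows), limit):
            yield rows[start:start + limit]


batches = list(to_batches([(12, 13), (12, 14), (12, 15), (16, 17)], limit=2))
```

Each yielded batch contains rows for one user only, so a worker could write it in a single request if the storage layer supports batched writes.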

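10 Find mutual friends
[Diagram: Web Role, Worker Role and Storage. The follower table, shown in three groups (12 13, 12 14, 12 16, … / 510 18, 510 27, 510 320, … / 1076 573, 1076 589, 1077 101, …), is self-joined to produce three groups of result pairs (12 19, 17 30, … / 572 347, 607 419, … / 1087 2097, 1090 1573, …).]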
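The self-join can be illustrated in memory. This sketch assumes that "mutual friends" means two users who follow each other, which the slides do not state explicitly, and the function name `find_mutual` is hypothetical. It joins the follower table with itself on (user, follower) = (follower, user) and reports each pair once.

```python
from typing import Iterable, List, Tuple


def find_mutual(edges: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return pairs (a, b) with a < b where a follows b and b follows a."""
    follows = set(edges)  # each element is (user, follower)
    # Self-join: keep (u, f) only if the reversed row (f, u) also exists.
    # The u < f condition reports each mutual pair a single time.
    return sorted((u, f) for (u, f) in follows if u < f and (f, u) in follows)


sample = [(12, 19), (19, 12), (12, 13), (17, 30), (30, 17), (16, 17)]
mutual = find_mutual(sample)
```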

11 Parallel Hash Join
(Database Management Systems, 2nd Edition. Raghu Ramakrishnan and Johannes Gehrke. Textbook Chapter 22, p. 732-735)
– In first phase, partitions get distributed to different sites:
  – A good hash function automatically distributes work evenly!
– Do second phase at each site.
– Almost always the winner for equi-join.
[Diagram, Phase 1: the original relations (R then S) are read from disk through an input buffer, split by hash function h into B-1 partitions using B main memory buffers, and written back to disk.]

12 Dataflow Network for || Join
– Good use of split/merge makes it easier to build parallel versions of sequential join code.
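The two phases of a parallel hash join can be sketched as a split step, a per-partition join, and a merge step. The code below is a single-process illustration under stated assumptions, not the textbook's algorithm verbatim: the "sites" are plain Python lists, and the names `hash_partition`, `local_hash_join` and `parallel_hash_join` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence, Tuple

Row = Tuple[int, int]
KeyFn = Callable[[Row], Hashable]


def hash_partition(rows: Sequence[Row], key: KeyFn, n: int) -> List[List[Row]]:
    """Phase 1 (split): send each row to partition hash(key) mod n."""
    parts: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        parts[hash(key(row)) % n].append(row)
    return parts


def local_hash_join(r_part, s_part, r_key: KeyFn, s_key: KeyFn):
    """Phase 2: build a hash table on R's partition, probe it with S's."""
    table = defaultdict(list)
    for r in r_part:
        table[r_key(r)].append(r)
    return [(r, s) for s in s_part for r in table.get(s_key(s), [])]


def parallel_hash_join(r, s, r_key: KeyFn, s_key: KeyFn, n: int = 4):
    """Split both relations, join matching partitions, merge the outputs."""
    r_parts = hash_partition(r, r_key, n)
    s_parts = hash_partition(s, s_key, n)
    out = []
    for r_part, s_part in zip(r_parts, s_parts):  # one iteration per "site"
        out.extend(local_hash_join(r_part, s_part, r_key, s_key))
    return out


# Self-join of the follower table: R.(user, follower) = S.(follower, user).
edges = [(12, 19), (19, 12), (12, 13)]
joined = parallel_hash_join(
    edges, edges, r_key=lambda e: (e[0], e[1]), s_key=lambda e: (e[1], e[0])
)
```

Because rows with equal join keys always hash to the same partition, each partition pair can be joined independently, which is what allows the second phase to run on separate workers.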

13 Find mutual friends
Web Role: _Default.FindMutualFriendsButton_Click(…)
Web Role: ToDo.FindMutualFriends(requestQueue,responseQueue)
Worker Roles (1:n, three shown): WorkerRole.Run()
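The 1:n pattern with a request queue and a response queue can be imitated locally with threads. This is a sketch under the assumption that one web-side dispatcher posts tasks and several workers poll for them. It uses Python's standard `queue` and `threading` modules as stand-ins for Azure queues and worker roles, and the names `worker` and `dispatch` are hypothetical.

```python
import queue
import threading
from typing import Callable, Iterable, List


def worker(request_queue: queue.Queue, response_queue: queue.Queue,
           handle: Callable) -> None:
    """Stand-in for a worker's run loop: take a task, process it, reply."""
    while True:
        task = request_queue.get()
        if task is None:  # sentinel: no more work for this worker
            break
        response_queue.put(handle(task))


def dispatch(tasks: Iterable, handle: Callable, n_workers: int = 3) -> List:
    """Stand-in for the web side: post tasks, then collect all responses."""
    request_queue: queue.Queue = queue.Queue()
    response_queue: queue.Queue = queue.Queue()
    threads = [
        threading.Thread(target=worker,
                         args=(request_queue, response_queue, handle))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for task in tasks:
        request_queue.put(task)
    for _ in threads:
        request_queue.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    results = []
    while not response_queue.empty():
        results.append(response_queue.get())
    return results


squares = sorted(dispatch([1, 2, 3, 4], lambda x: x * x))
```

Responses arrive in whatever order the workers finish, so the caller should not rely on their order.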

14 Count mutual friends for each user
[Diagram: Web Role, Worker Role and Storage. The mutual-friend pairs in Storage (12 19, 17 30, … / 572 347, 607 419, … / 1087 2097, 1090 1573, …) are counted.]

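15 Count mutual friends for each user
[Diagram: Web Role, Worker Role and Storage. Each group of mutual-friend pairs (12 19, 17 30, … / 572 347, 607 419, … / 1087 2097, 1090 1573, …) yields partial counts in the form userid : #friends (12 : 3, 17 : 5, … / 17 : 2, 19 : 7, … / 12 : 6, 25 : 3, …). Aggregate summation combines them into totals (12 : 9, 17 : 7, 19 : 7, …).]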
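The count and aggregate steps can be sketched with `collections.Counter`. The partial counts below are the ones shown on the slide. The function names `count_partition` and `aggregate` are hypothetical, and `count_partition` assumes each mutual pair is stored once and counts toward both users.

```python
from collections import Counter
from typing import Iterable, Mapping, Tuple


def count_partition(pairs: Iterable[Tuple[int, int]]) -> Counter:
    """Per-worker step: count mutual friends within one group of pairs."""
    counts: Counter = Counter()
    for a, b in pairs:
        counts[a] += 1  # assumption: a pair (a, b) counts for both users
        counts[b] += 1
    return counts


def aggregate(partials: Iterable[Mapping[int, int]]) -> Counter:
    """Aggregate summation: add up the partial counts from all workers."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts, it does not replace
    return total


# Partial counts taken from the slide (userid : #friends).
partials = [{12: 3, 17: 5}, {17: 2, 19: 7}, {12: 6, 25: 3}]
total = aggregate(partials)
top_user, top_count = total.most_common(1)[0]
```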

16 Count mutual friends for each user
Web Role: _Default.CountMutualFriendsButton_Click(…)
Web Role: ToDo.CountMutualFriends(requestQueue,responseQueue);
Worker Roles (1:n, three shown): WorkerRole.Run()

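17 Get the name of the user
[Diagram: Web Role, Worker Role, Storage and SQL Azure. The aggregated counts in Storage (12 : 9, 17 : 7, 19 : 7, …) identify the user, whose name is then fetched from SQL Azure.]
SELECT name FROM profiles WHERE numeric_id = 247;
Result: Hyunsouk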
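The lookup query can be tried locally before connecting to SQL Azure. The sketch below uses an in-memory SQLite database as a stand-in, which is an assumption of this example and not part of the project setup. It creates only two columns of the profiles table and inserts the one row shown on the slide; `get_name` is a hypothetical helper.

```python
import sqlite3
from typing import Optional

# In-memory stand-in for the SQL Azure database (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profiles (numeric_id INTEGER PRIMARY KEY, name VARCHAR(20))"
)
conn.execute("INSERT INTO profiles VALUES (?, ?)", (247, "Hyunsouk"))


def get_name(connection: sqlite3.Connection, user_id: int) -> Optional[str]:
    """Return the profile name for user_id, or None if there is no such row."""
    row = connection.execute(
        "SELECT name FROM profiles WHERE numeric_id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


name = get_name(conn, 247)
```

Passing the ID as a query parameter instead of concatenating it into the SQL string avoids quoting mistakes and SQL injection.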

18 Get the name of the user
Web Role: _Default.GetNameOfPersonWhoHasTheLargestNumberOfFriendsButton_Click(…)

19 ServiceConfiguration.cscfg

20 References
Windows Azure Platform Training Course
– http://msdn.microsoft.com/en-us/windowsazure/wazplatformtrainingcourse.aspx
– Demos
    Hello Windows Azure
    Building and Deploying a Service
    Windows Azure using Blobs Demo
    Windows Azure Worker Role Demo - Using the Worker Role
    Windows Azure Using Queues Demo
    Windows Azure Using Table Storage Demo
    Preparing your SQL Azure Account
    Connecting to SQL Azure
Azure Academic Pilot
– http://www.azurepilot.com/
– FREE 30-day pass (promo code: KKUMAR) http://windowsazurepass.com/
Q&A
– http://ids.postech.ac.kr/xe/?mid=csed421_11_proj

21 Submission Instructions
Make your team of 3-4 people
Attachment
– Compressed Windows Azure project file
– Presentation file
    Implementation idea
    Experimental results
    – on Azure
      » Web page screen capture running
    – on your PC emulator
      » Performance with different number of worker roles
Bonus: Other interesting problems with twitter data on Azure
Due
– To be announced

22 Demo Hello Windows Azure

