Advanced Topics in Distributed Systems Fall 2011 Instructor: Costin Raiciu.


1 Advanced Topics in Distributed Systems Fall 2011 Instructor: Costin Raiciu

2 We’ve gotten used to great applications

3 Enabling Such Apps is Hard Apps – Process huge amounts of data – Are fast – Are reliable One machine is not enough – Limited reliability, speed Supercomputers are expensive

4 What Makes These Applications Tick?

5 Distributed Systems

6 This course… cares about technologies related to distributed systems: – Networks – Virtual machines – Distributed filesystems – Distributed computation We care about details, not about products – Why?

7 Traditional Data Center Network Topology … Racks of servers Top of Rack Switches Aggregation Switches Core Switch 1Gbps 10Gbps

8 Fat Tree Topology [Al-Fares et al., 2008; Clos, 1953] Aggregation Switches K Pods with K Switches each K=4 Racks of servers 1Gbps
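The sizing of the fat tree on this slide can be sketched with a few lines of arithmetic. This is a minimal sketch, assuming the usual k-ary fat tree construction (k pods, each with k/2 edge and k/2 aggregation switches, k/2 hosts per edge switch, and (k/2)^2 core switches); the function name `fat_tree_sizes` is hypothetical and not from the slides.

```python
def fat_tree_sizes(k):
    """Return element counts for a k-ary fat tree (assumed standard construction)."""
    if k <= 0 or k % 2 != 0:
        raise ValueError("k must be a positive even number of ports per switch")
    half = k // 2
    return {
        "pods": k,
        "edge_switches": k * half,         # k/2 edge switches in each of k pods
        "aggregation_switches": k * half,  # k/2 aggregation switches per pod
        "core_switches": half * half,      # (k/2)^2 core switches
        "hosts": k * half * half,          # k/2 hosts per edge switch -> k^3/4
    }


if __name__ == "__main__":
    # K=4 as on the slide: 4 pods with 4 switches each.
    print(fat_tree_sizes(4))
```

With K=4 each pod holds 2 edge and 2 aggregation switches, which matches "K Pods with K Switches each" on the slide.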

9 Inside a Machine: Virtualization Many operating systems running on a single box Provides: – Isolation – Flexibility – Better utilization of the machine

10 How do we store data? Distributed filesystem – NFS: UNIX-like semantics Single server Limited scalability – Google File System Optimized for large-batch writes and sequential reads Tolerates inconsistency

11 How do we get work done? MapReduce – Apply the same function in parallel on different data on many machines – Aggregate results Useful for: – Building big web-search indices – Processing large amounts of data (PB)
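The two steps on this slide (apply the same function to different data, then aggregate) can be illustrated with a word count. This is a minimal single-process sketch of the programming model only, not of Hadoop or any real framework; the names `map_phase`, `shuffle`, `reduce_phase` and `run_job` are hypothetical, and a real system would run the map and reduce calls on many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate all values seen for one key."""
    return key, sum(values)


def run_job(documents):
    """Run map over every document, then reduce each group of values."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(["a b a", "b c"]))
```

Each call to `map_phase` depends only on its own document, which is what allows the map step to be spread across machines.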

12 This is just a taster

13 Course outline Distributed Apps we care about – Distributed Computation (MapReduce, Dryad, Hadoop) – Distributed Filesystems (NFS and GFS) – Web search – Caching (Memcached) – Distributed Hash Tables (Chord, Dynamo) – NoSQL databases (BigTable, Cassandra) Infrastructure: networks – Topologies: FatTree, VL2, BCube – Using capacity: Hedera, MPTCP – Performance Optimizations: Incast, DCTCP

14 Course outline [2] Infrastructure: OS abstractions – Virtual Machines (Xen, VMM) – Distributed memory (Ivy) Security – Information Leakage – Good Isolation vs. High Utilization (Seawall, CloudPolice)

15 Course Admin Lectures: – 2 hours per week, Tuesday 8-10 EC102 Lab classes: – 2 hours per week, Tuesday 10-12 EG106 – Project discussions – Help with practical issues – Help with high level goals, theory Website: curs.cs.pub.ro – If you have problems, let me know

16 Grading Project: 5p – Groups of 3-4 students – 4 stages: to help you get the job done easily, without last minute work over Christmas Exam: 3p Presentation (1h): 1p Class participation: 1p

17 Presentation Select one topic before the end of October (list will be posted this week) – Presentation date is fixed – If you miss your presentation, you lose 2p Class participation – 2 papers presented per course by your colleagues – Read them before and take part in discussion

18 Exam Open book Need to understand and think – not memorize Studying 3 days before the exam won’t work – You need to take part in classes and read up

19 Projects Large scale data processing with MapReduce – We will use Apache Hadoop – We will run code on Amazon EC2 (and maybe on local clusters) – Several datasets you can choose from

20 Datasets available – Crawled set of HTML pages from .uk – Wikipedia Page Traffic Statistics – Apache Mail Archives – Million Song Dataset – M-Lab dataset: Network Path and Application Diagnosis tool – Human genome – US Census databases – Freebase data dump

21 Stage 1 Choose dataset to use Select one/many questions to answer using the dataset Write small Hadoop script to parse a subset of the data Come up with a few simple graphs (e.g. dataset size, histograms) Start writing: Introduction to your report, problem statement Start the implementation and evaluation – Size of dataset, time to do one pass, etc. Strict deadline [1p]: November 1st

22 Stage 2 How do we solve the problem? – Review related work – Select potential approaches Discuss pros/cons Implementation and evaluation – Implement the code – Run experiments – Refine code and reiterate Goal: 70% of functionality should be implemented Deadline [1p]: December 1st – Output in report: Implementation section Early evaluation section

23 Stage 3 Final implementation Evaluation What did we learn? Deadline [1p]: December 21st – In-class project presentation: 10 mins

24 Stage 4 Write-up – Polish report – Create a coherent story – Convince me that this is useful Deadline to hand in final report: last day of semester (January 14th) [1p]

