CS 432: (Net-Centric Computing) and (Distributed Systems)

1 CS 432: (Net-Centric Computing) and (Distributed Systems)
Lecture 1: Introduction

2 Course Information
Instructor: Ahmed M E Hassan, Assistant Professor, Alexandria University
Office hours: Monday 8 – 10 AM, Thursday 10 AM – 12 PM (or by appointment)
Teaching Assistants: Eng. Alaa Ebshihy ( Others TBD

3 Course Information
Textbooks
  Andrew S. Tanenbaum and Maarten van Steen. Distributed Systems: Principles and Paradigms (Second Edition). Pearson Education, 2007
  George F. Coulouris, Jean Dollimore, and Tim Kindberg. Distributed Systems: Concepts and Design (Fifth Edition). Pearson Education, 2011
Other references
  Tamer Ozsu. Principles of Distributed Database Systems
  P. A. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency Control and Recovery in Database Systems
  Nancy Lynch. Distributed Algorithms
  Christian Cachin, Rachid Guerraoui, and Luis Rodrigues. Introduction to Reliable and Secure Distributed Programming
  Research articles: MapReduce, Paxos, GFS, …

4 Course Information
Web page (enrollment code cs432S19): andria_university/spring2019/cs432
  Lecture notes, lab and homework assignments
  News and announcements related to the course
  Discussions: instructors and TAs will monitor these discussions and answer when needed

5 Course Information
Systems courses: a lot of fun!!
  Particular system area: computer organization, computer architecture, operating systems, networks, distributed systems, embedded systems
  General systems principles: low-level programming, performance measurement, security
Roadmap: Principles, Architectures, Algorithms, Design, Case Studies

6 Why Do We Study Distributed Systems?
“The past decade has brought explosive growth in multiprocessor computing, including multi-core processors and distributed data centers. As a result, parallel and distributed computing has moved from a largely elective topic to become more of a core component of undergraduate computing curricula.” Credit: ACM/IEEE Computer Science Curricula 2013
It is interesting to understand the design fundamentals of the platforms used by Google, Facebook, Twitter, Amazon, …
Did you like synchronization techniques in OS, data transfer and routing algorithms in Networks, and transaction management in Databases? Stay tuned for more.

7 Quiz
Which is closer to what we will study in the course?
  How distributed systems can be used to solve complex computational problems
  How distributed systems can be provided as services to end users
  How problems that require massive data can be solved using distributed systems
  How the internals of distributed systems are designed

8 Quiz
Which is closer to what we will study in the course?
  How distributed systems can be used to solve complex computational problems: HPC
  How distributed systems can be provided as services to end users: Cloud
  How problems that require massive data can be solved using distributed systems: Big Data Analytics
  How the internals of distributed systems are designed:
    Concepts (synchronization, coordination, replication, transactions)
    Case Studies (MPI, RPC, MapReduce, GFS, Chubby)

9 Course Outline
Week # | Lectures | Sessions and Labs
1 | Introduction | No sessions
2 | Case Study: MapReduce | Lab 0: Hadoop and MapReduce
3 | Models of Distributed Systems | Lab 1: K-means using Hadoop + Sheet 1
4 | Middleware |
5 | Synchronization (time and clocks, global state) | Lab 2: RPC + Sheet 2
6 | Coordination (mutual exclusion, election, multicast ordering, consensus) |
7 | Replication, Consistency Models, and Fault Tolerance | Sheet 3 + Project Assigned (TBD)
8 | Midterms |
9 | Transactions and Concurrency Control | Lab 3: K-means using Spark
10 | Distributed File Systems |
11 | Distributed Transactions | Sheet 4
12 | Consensus Revisited (Paxos, Blockchains) |
13 | Case Studies: (Google) GFS, Chubby, BigTable | Project Discussion
14 | Overview of Big Data Analytics and Cloud Computing |
15 | Finals |

10 Grading (Tentative)
Total: 150
  Final exam: 90 (enforced by department bylaws)
  Midterm and quizzes: 20-30
  Homework assignments: 5-10
  Lab assignments: 10-15
  Project: 10-15

11 Acknowledgment
Course contents, slides, assignments, and other material of the course are all adapted versions of the material prepared by Dr Iman Elghandour.
The course is designed by consulting similar courses offered at the following universities:
  University of Waterloo
  University of Illinois (Coursera Cloud Computing Specialization)
  Others: Stanford, MIT, CMU, UCSB

12 Reading for This Lecture
Chapter 1, Sec. 1.1. Distributed Systems: Principles and Paradigms (Second Edition). Andrew Tanenbaum.
Chapter 1, Sec. 1.1, 1.2, 1.5. Distributed Systems: Concepts and Design (Fifth Edition). George Coulouris.

13 Definition
“A collection of independent computers that appears to its users as a single coherent system.” (Tanenbaum)
“One in which hardware or software components located at networked computers communicate and coordinate their actions only by passing messages.” (Coulouris)
“You know you have one when the crash of a computer you've never heard of stops you from getting any work done.” (Leslie Lamport)

14 Consequences of That Definition
Concurrency
  Concurrent program execution is the norm
  Synchronization is required (e.g., ordering system events)
  Coordination is required (e.g., accessing shared resources)
  Challenges: correctness and performance
  The capacity of the system to handle shared resources can be increased by adding more resources (e.g., computers)
  Scaling out vs. scaling up
  This was the main motivation for building distributed systems at first!!

15 Consequences of That Definition
No global clock
  Communication is by exchanging messages
  There is no single global notion of the correct time (at which an action occurs relative to another).
  How to synchronize?!
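One well-known answer to "how to synchronize?" is a logical clock in the style of Lamport, which orders events using message timestamps instead of physical time. The slide does not name this technique; the class and method names below are illustrative assumptions, and this is only a minimal sketch.

```python
class LamportClock:
    """Logical clock: orders events without any shared physical clock."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Advance the clock for a local event."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Timestamp an outgoing message (sending is itself an event)."""
        return self.tick()

    def receive(self, msg_time: int) -> int:
        """Merge the sender's timestamp so the receive is ordered after the send."""
        self.time = max(self.time, msg_time) + 1
        return self.time


# Two processes with independent clocks (hypothetical scenario).
p, q = LamportClock(), LamportClock()
p.tick()                    # local event on p
stamp = p.send()            # p sends a message carrying its timestamp
q_time = q.receive(stamp)   # q jumps ahead of the timestamp it received
```

The only guarantee sketched here is that a receive always gets a larger timestamp than the matching send; the numbers carry no relation to wall-clock time.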

16 Consequences of That Definition
Independent failures
  Both machines and the network can fail (crash, misbehave, or become slow)
    Failure models
  Each component of the system can fail independently, leaving the others still running (if they can!!)
    Fault tolerance
  Some components may be unaware of the failure of others
  The system must distinguish between failed and slow resources
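Why "failed" and "slow" are hard to tell apart can be seen in a timeout-based heartbeat monitor. The sketch below is an assumption for illustration (the class name, the timeout value, and the node names are hypothetical); time is passed in explicitly so the behavior is easy to follow.

```python
class HeartbeatMonitor:
    """Suspects a node when no heartbeat arrived within `timeout` seconds."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, node: str, now: float) -> None:
        """Record that a heartbeat message from `node` arrived at time `now`."""
        self.last_seen[node] = now

    def suspected(self, now: float) -> set[str]:
        """Nodes that have been silent for longer than the timeout."""
        return {n for n, t in self.last_seen.items() if now - t > self.timeout}


monitor = HeartbeatMonitor(timeout=3.0)
monitor.heartbeat("a", now=0.0)
monitor.heartbeat("b", now=0.0)
monitor.heartbeat("a", now=2.0)

early = monitor.suspected(now=4.0)   # "b" has been silent for 4 s, so it is suspected
monitor.heartbeat("b", now=4.5)      # a late heartbeat: "b" was slow, not crashed
later = monitor.suspected(now=4.8)   # the suspicion is withdrawn
```

The monitor can only suspect a node; a short timeout wrongly accuses slow nodes, while a long one delays the detection of real crashes.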

17 Layers in a Distributed System
To support heterogeneous machines while offering a single system view, distributed systems are organized as a layer (middleware) between:
  Users and applications
  Operating systems and communication facilities

18 Quiz
How is this course connected to the following courses?
  Networks
  Operating Systems
  Database
  Multiprocessor programming
  Distributed Computing

19 Quiz
How is this course connected to the following courses?
  Networks: Layering, Message Passing
  Operating Systems: Layering, Middleware
  Database: Replication, Transaction
  Multiprocessor programming: Shared Memory vs. Message Passing
  Distributed Computing: General term

20 Examples: Web Search
Index the contents of the WWW
  46.6 billion pages (
Google as an example:
  Distributed infrastructure (networked data centers)
  Distributed file system
  Distributed storage system for large datasets
  Distributed locking/agreement service
  Programming model

21 Examples: Others
Finance and commerce: eCommerce (e.g., Amazon and eBay), PayPal, online banking and trading
The information society: web information and search engines, ebooks, Wikipedia; social networking: Facebook and MySpace
Creative industries and entertainment: online gaming, music and film in the home, user-generated content (e.g., YouTube, Flickr)
Healthcare: health informatics, online patient records, monitoring patients
Education: e-learning, virtual learning environments; distance learning
Transport and logistics: GPS in route-finding systems, map services: Google Maps, Google Earth
Science: the Grid as an enabling technology for collaboration between scientists
Environmental management: sensor technology to monitor earthquakes, floods, or tsunamis

22 Challenges
  Heterogeneity
  Openness
  Security
  Scalability
  Failure handling
  Concurrency
  Transparency
  Quality of Service

23 Heterogeneity
Many components:
  Networks: wireless/wired, reliable/unreliable
  Computer hardware: deal with messages sent from one machine to another
  Operating systems: different interfaces are provided by different OSs
  Programming languages: different representations of characters and data structures
  Implementations: standards are needed to be able to communicate
Solution: use middleware
  Programming abstraction and heterogeneity masking (e.g., CORBA, RMI, RPC, MapReduce)
  Usually implemented on top of Internet protocols
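One concrete piece of heterogeneity masking is agreeing on a wire format, so that machines with different byte orders and character encodings read the same values. The sketch below uses Python's standard struct module; the message layout (a sensor id, a reading, and a name) is a hypothetical example, not something taken from the course material.

```python
import struct

# Assumed wire format: unsigned 32-bit id and 64-bit float, both big-endian
# ("network byte order", selected by "!"), followed by a UTF-8 encoded name.
HEADER = struct.Struct("!Id")


def marshal(sensor_id: int, reading: float, name: str) -> bytes:
    """Convert native values into the agreed external representation."""
    return HEADER.pack(sensor_id, reading) + name.encode("utf-8")


def unmarshal(data: bytes) -> tuple[int, float, str]:
    """Rebuild native values from the external representation."""
    sensor_id, reading = HEADER.unpack_from(data)
    return sensor_id, reading, data[HEADER.size:].decode("utf-8")


wire = marshal(7, 21.5, "Alexandria")
decoded = unmarshal(wire)
```

Middleware such as RPC or RMI typically performs this kind of marshalling on the programmer's behalf, which is what lets application code ignore the differences listed above.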

24 Openness
Determines whether the system can be extended and reimplemented
Solution: published interfaces
  Specification and documentation of the key software interfaces of the components of a system are made available to software developers
  E.g., RFCs for Internet protocols, W3C for the Web
  Components of distributed systems are heterogeneous; however, they all conform to the published standards.

25 Security
Components
  Confidentiality: protection against disclosure to unauthorized individuals
  Integrity: protection against alteration or corruption
  Availability: protection against interference with the means to access the resources
Example challenges (Coulouris, Chapter 11)
  Protecting contents of messages sent over the network
  Identifying remote users
  Protection against denial of service attacks

26 Scalability
Remaining effective when there is a significant increase in the number of resources and the number of users.
Challenges of designing scalable distributed systems
  Possibility of extending the system at reasonable cost, e.g., O(n) resources to serve n users
  Reducing performance loss, e.g., hierarchical DNS for O(log n) loss
  Preventing software resources from running out, e.g., the move from IPv4 to IPv6 addresses
  Avoiding performance bottlenecks, e.g., decentralization

27 Failure Handling
Failures in a distributed system are partial: some components fail while others continue to function.
Dealing with failures requires:
  Detecting failures
  Masking failures
  Tolerating failures
  Recovery from failures

28 Failure Handling
Detecting failures:
  May be easy (checksum) or hard (failed vs. slow remote server)
Masking failures:
  E.g., dropping corrupted messages and retransmitting them
Tolerating failures:
  Simple (user-level): informing the user about the problem
  Complex (system-level): replication (Coulouris, Chapter 18)
Recovery from failures:
  Rolling back to a permanent state
  E.g., distributed transactions (Coulouris, Chapter 17)
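Detection by checksum and masking by retransmission can be sketched together. The example below is an assumption for illustration: the framing format, the function names, and the simulated flaky channel are all hypothetical, and CRC32 from Python's zlib module stands in for whatever checksum a real protocol would use.

```python
import zlib
from typing import Callable, Optional


def frame(payload: bytes) -> bytes:
    """Prefix the payload with a 4-byte CRC32 checksum."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def check(framed: bytes) -> Optional[bytes]:
    """Return the payload if its checksum matches, otherwise None (detection)."""
    crc, payload = framed[:4], framed[4:]
    return payload if zlib.crc32(payload).to_bytes(4, "big") == crc else None


def send_reliably(payload: bytes, channel: Callable[[bytes], bytes],
                  max_attempts: int = 3) -> tuple[bytes, int]:
    """Drop corrupted frames and retransmit until one arrives intact (masking)."""
    for attempt in range(1, max_attempts + 1):
        received = check(channel(frame(payload)))
        if received is not None:
            return received, attempt
    raise ConnectionError("no intact frame after retransmissions")


def make_flaky_channel() -> Callable[[bytes], bytes]:
    """Simulated channel that flips one bit in the first frame it carries."""
    calls = {"n": 0}

    def channel(framed: bytes) -> bytes:
        calls["n"] += 1
        if calls["n"] == 1:
            return framed[:-1] + bytes([framed[-1] ^ 0x01])
        return framed

    return channel


data, attempts = send_reliably(b"hello", make_flaky_channel())
```

The caller receives the correct payload and never sees the corrupted first transmission, which is what masking means here; a failure that persists past max_attempts is no longer masked and has to be tolerated or reported instead.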

29 Concurrency
Accessing shared resources at the same time
  Synchronization (Coulouris, Chapters 14, 15, 16, 17)
  Any object that represents a shared resource in a distributed system must be responsible for ensuring that it operates correctly in a concurrent environment.
  Remember: message passing, no global clock.
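The idea that a shared object is itself responsible for correct concurrent operation can be sketched on a single machine with threads. This is a simplification under stated assumptions: the class name is hypothetical, and in a distributed setting the callers would be remote processes sending messages rather than local threads.

```python
import threading


class SharedCounter:
    """A shared resource that protects its own state from concurrent callers."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # The read-modify-write below must not interleave between callers.
        with self._lock:
            self._value += 1

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def work(counter: SharedCounter, times: int) -> None:
    """Simulated client that updates the shared resource repeatedly."""
    for _ in range(times):
        counter.increment()


counter = SharedCounter()
threads = [threading.Thread(target=work, args=(counter, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = counter.value
```

Because the lock lives inside the object, callers need no coordination of their own; the distributed versions of this problem (mutual exclusion, transactions) appear later in the course outline.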

30 Transparency
Access transparency: enables local and remote resources to be accessed using identical operations.
Location transparency: enables resources to be accessed without knowledge of their physical or network location (for example, which building or IP address).
Concurrency transparency: enables several processes to operate concurrently using shared resources without interference between them.
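Access transparency can be illustrated with a client-side stub that exposes the same operations as a local object. Everything in this sketch is an assumption for illustration: the key-value interface, the class names, and the JSON "wire" messages are hypothetical, and the network is simulated by an in-process call.

```python
import json
from typing import Protocol


class KeyValueStore(Protocol):
    """The operations a caller sees, whether the store is local or remote."""

    def get(self, key: str) -> str: ...
    def put(self, key: str, value: str) -> None: ...


class LocalStore:
    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def get(self, key: str) -> str:
        return self._data[key]

    def put(self, key: str, value: str) -> None:
        self._data[key] = value


class RemoteStoreStub:
    """Client-side stub: turns each call into a message for a simulated server."""

    def __init__(self, server: LocalStore) -> None:
        self._server = server  # stands in for a network connection

    def _call(self, op: str, *args: str):
        request = json.dumps({"op": op, "args": args})   # what would cross the wire
        message = json.loads(request)                    # what the server would parse
        result = getattr(self._server, message["op"])(*message["args"])
        return json.loads(json.dumps(result))            # reply, also serialized

    def get(self, key: str) -> str:
        return self._call("get", key)

    def put(self, key: str, value: str) -> None:
        self._call("put", key, value)


def remember(store: KeyValueStore) -> str:
    """Caller code that cannot tell a local store from a remote one."""
    store.put("course", "CS 432")
    return store.get("course")


local_result = remember(LocalStore())
remote_result = remember(RemoteStoreStub(LocalStore()))
```

The function remember is written once and works against both stores, which is the "identical operations" property in the definition above; a real stub would also have to deal with failures and delays that a local call never shows.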

31 Transparency
Replication transparency: enables multiple instances of resources to be used to increase reliability and performance without knowledge of the replicas by users or application programmers.
Failure transparency: enables the concealment of faults, allowing users and application programs to complete their tasks despite the failure of hardware or software components.
Mobility transparency: allows the movement of resources and clients within a system without affecting the operation of users or programs.

32 Transparency
Performance transparency: allows the system to be reconfigured to improve performance as loads vary.
Scaling transparency: allows the system and applications to expand in scale without change to the system structure or the application algorithms.

33 Quality of Service
Extending “What is provided?” to “What is its quality?”
The main nonfunctional properties of systems that affect the quality of the service experienced by clients and users are:
  Reliability
  Security
  Performance
  Adaptability to meet changing system configurations and resource availability
Others can be domain-specific:
  Time-critical data handling (e.g., fixed-rate multimedia streaming)
  Meeting deadlines (real-time operating systems)

34 Case Study: WWW Web Servers and Web Browsers

35 Thanks!

