FIT5174 Distributed & Parallel Systems

FIT5174 Distributed & Parallel Systems Lecture 2 FIT5174 Parallel & Distributed Systems Dr. Ronald Pose Lecture 2 - 2013

Why Study Distributed Computing?
- Traditional “centralised” computing has largely been displaced by distributed computing schemes over the last 20 years; centralised mainframes are used much less.
- Software applications are increasingly being developed for distributed computing environments, and older applications are being ported to them.
- For a software application to both perform well and be reliable, it must be designed around the environments in which it is expected to operate.
- Programmers and software architects therefore need a good understanding of distributed computing environments to be effective and successful in current and future industry.
- Many badly behaved applications in distributed environments are a result of insufficient understanding by programmers.

What is Distributed Computing?
- Distributed computing involves any computing system where computational activity is divided across multiple host machines linked by a digital communications medium of some type.
- While a parallel computing system may be distributed, many distributed systems do not closely fit the model of parallel computing, as the components of the distributed system and its applications may be dissimilar in many ways.
- The advent of wide area networks, especially the Internet, has been the principal enabler for distributed computing systems.
- The first “modern” distributed computing systems emerged during the 1980s, exploiting high speed Local Area Networks (LAN) to provide connectivity.
- Growth of the Internet has stimulated distributed computing.

Parallel versus Distributed Computing

Definition of Distributed Systems
- “A distributed system is a collection of independent computers that appears to its users as a single coherent system.”
- The definition has several important aspects:
  - Autonomous components.
  - Users (whether people or programs) think they are dealing with a single system.
- A distributed system is a system in which components located at networked computers communicate and coordinate their actions only by passing messages.

Advantages of Distributed Systems
- Reliability: If 5% of the machines are down, the system as a whole can still survive with a 5% degradation of performance.
- Incremental growth: Computing power can be added in small increments.
- Sharing: Allows many users access to a common database and peripherals.
- Communication: Makes human-to-human communication easier.
- Effective resource utilization: Spreads the workload over the available machines in the most cost effective way.

Disadvantages of Distributed Systems
- Software: It is harder to develop distributed software than centralized software.
- Networking: The network can saturate and slow down dramatically, or cause other problems.
- Security: Ease of access also applies to secret data.

Challenges in Distributed Systems
- Heterogeneity: Within a distributed system there is variety in networks, computer hardware, operating systems, programming languages, etc.
- Openness: New services are added to distributed systems. To do that, specifications of components, or at least the interfaces to the components, must be published.
- Transparency: A distributed system is made to look like a single computer by concealing the distribution mechanism.
- Performance: One of the objectives of distributed systems is achieving high performance while using cheap computers.

Challenges in Distributed Systems
- Scalability: A distributed system may include thousands of computers. The question is whether the system still works at that scale.
- Failure handling: A distributed system is composed of many components, which results in a high probability of some component failing.
- Security: Because many stakeholders are involved in a distributed system, interactions must be authenticated, data must be concealed from unauthorized users, and so on.
- Concurrency: Many programs run simultaneously in a system and share resources. They should not interfere with each other.

Heterogeneity
- A distributed system may be composed of a heterogeneous collection of computers. Heterogeneity arises in the following areas:
  - Networks: Even if the same Internet protocol is used to communicate, the performance parameters may vary widely within the inter-network.
  - Computer hardware: The internal representation of data differs between processors.
  - Operating systems: The interface for exchanging messages differs from one operating system to another.
  - Programming languages: Characters and data structures are represented differently by different programming languages.
  - Implementations by different developers: Unless common standards are observed, different implementations cannot communicate.

Openness
- Openness means disclosing information, in particular how to use the services provided by remote computers.
- Open systems are easier to extend and reuse.
- By making services open, servers can be used by various clients.
- Clients that use services provided by other servers can extend those services and in turn provide services to other clients.
- The openness of distributed systems lets us add new services and increase the availability of services to different clients.

Transparency
- Transparency involves not being able to see something, or seeing through it.
- Transparency is important for realizing the single system image, which makes a system as easy to use as a single processor system.
- For example, on the WWW we can access any information by clicking links without knowing the whereabouts of the host.

Classification of Transparency
- Access transparency: Data and resources can be used in a consistent way.
- Location transparency: A user cannot tell where resources are located.
- Migration transparency: Resources can move at will without changing their names.

Classification of Transparency
- Replication transparency: A user cannot tell how many copies exist.
- Concurrency transparency: Multiple users can share resources automatically.
- Failure transparency: A user does not notice resource failure.
- Performance transparency: Systems are reconfigured to improve performance as loads vary.
- Scaling transparency: Systems can expand in size without changing the system structure and the application programs.

Performance
- Fine-grained parallelism: Small programs are executed in parallel, exchanging a large number of messages. Communication overhead reduces the performance gain from parallel processing.
- Coarse-grained parallelism: Long compute-bound programs are executed in parallel. Communication overhead is lower in this case.

Scalability
- Scalability is the question of whether a distributed system still works, and its performance increases, when more computers are added to the system.
- The following are potential bottlenecks in very large distributed systems:
  - Centralized components: A single mail server for all users.
  - Centralized tables: A single on-line telephone book.
  - Centralized algorithms: Routing based on complete information.
- Use decentralized algorithms for scalability:
  - No machine has complete information about the system state.
  - Machines make decisions based only on local information.
  - Failure of one machine does not completely invalidate the algorithm.

Reliability
- A distributed system is likely to contain faulty components because it includes a large number of components.
- On the other hand, it is theoretically possible to build a distributed system such that if one machine goes down, another machine takes over its job.
- Reliability has several aspects:
  - Availability: The fraction of time that the system is available. It can be expressed by an equation (the slide's equation is not reproduced in this transcript).
  - Fault tolerance: Distributed systems can hide failures from the users.
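
The availability equation itself did not survive in this transcript. One commonly used formulation, offered here as an assumption rather than as the slide's own equation, divides mean time to failure (MTTF) by the sum of MTTF and mean time to repair (MTTR). The sketch below applies it to a single machine and then to a pair in which either machine can take over the job. The function name and all figures are hypothetical, and the pair calculation assumes the two machines fail independently.

```python
def availability(mttf_hours, mttr_hours):
    """Fraction of time a system is up, assuming availability = MTTF / (MTTF + MTTR)."""
    if mttf_hours < 0 or mttr_hours < 0 or mttf_hours + mttr_hours == 0:
        raise ValueError("times must be non-negative and not both zero")
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical machine: fails on average every 990 hours, takes 10 hours to repair.
single = availability(990.0, 10.0)

# Two such machines where either can take over the job: the service is down
# only when both are down at once (assumes independent failures).
pair = 1.0 - (1.0 - single) ** 2

print(f"single machine: {single:.4f}")
print(f"redundant pair: {pair:.4f}")
```

Under these assumptions redundancy shrinks the unavailable fraction from 1% to roughly 0.01%, which is the kind of gain the "other machine takes over" argument relies on.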

Performance
- The maximum aggregate performance of the system can be measured in terms of maximum aggregate floating-point operations per second:
  P = N * C * F * R
- where P is the performance in FLOPS, N is the number of nodes, C is the number of CPUs per node, F is the number of floating point operations per clock period, and R is the clock rate.
- Similar measures can be defined in terms of MOPS or MIPS.
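
As a worked instance of P = N * C * F * R, the sketch below evaluates the formula for a made-up cluster. The node count, CPU count, operations per clock and clock rate are illustrative values, not figures from the lecture, and C is read here as CPUs per node.

```python
def peak_flops(nodes, cpus_per_node, flops_per_cycle, clock_hz):
    """Theoretical peak P = N * C * F * R in floating point operations per second."""
    return nodes * cpus_per_node * flops_per_cycle * clock_hz


# Hypothetical cluster: 64 nodes, 2 CPUs per node,
# 4 floating point operations per clock period, 2.5 GHz clock.
p = peak_flops(64, 2, 4, 2.5e9)
print(f"{p / 1e12:.2f} TFLOPS")  # prints "1.28 TFLOPS"
```

This is an upper bound only: it ignores memory and network limits, so measured application performance will normally be lower.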

Scalability
- It is computed as:
  S = T(1) / T(N)
- where T(1) is the wall clock time for a program to run on a single processor and T(N) is the runtime over N processors.
- A scalability figure close to N means the program scales well.
- The scalability metric helps estimate the optimal number of processors for an application.

Utilization
- It is calculated as:
  U = S(N) / N
- where S(N) is the scalability figure measured on N processors.
- Values close to unity, or 100%, are ideally sought.
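
The two metrics S = T(1) / T(N) and U = S(N) / N can be tabulated together to see where adding processors stops paying off. The timings below are invented for illustration; only the two formulas come from the slides.

```python
def scalability(t1, tn):
    """S = T(1) / T(N): single processor runtime over N processor runtime."""
    return t1 / tn


def utilization(s, n):
    """U = S(N) / N: fraction of the ideal N-fold gain actually achieved."""
    return s / n


# Hypothetical wall clock times in seconds for the same program.
timings = {1: 1200.0, 4: 330.0, 16: 120.0, 64: 75.0}

table = {}
for n, tn in timings.items():
    s = scalability(timings[1], tn)
    table[n] = (s, utilization(s, n))
    print(f"N={n:3d}  S={s:6.2f}  U={table[n][1]:.3f}")
```

With these made-up numbers utilization falls from about 0.91 on 4 processors to 0.25 on 64, suggesting the sweet spot for this program lies well below 64 processors.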

Distributed Computing Models
- Client-Server: well established protocols, e.g. HTTP, running over socket connections.
- Client-Server: Remote Procedure Calls (RPC), where a client can execute a program on a server, e.g. NFS over RPC.
- Client-Server: OO schemes such as CORBA (Common Object Request Broker Architecture), where a client can create, destroy and execute objects on a remote or local server.
- Parallel Distributed Schemes: Clusters, Grids, Clouds.
  - Clusters: large numbers of homogeneous or heterogeneous machines working on a typically parallel problem.
  - Grids: generalise and extend the cluster scheme through standard interfaces over wide area networks.
  - Clouds: provide an environment where the application environment can be decoupled from the native platform.

Client Server Computing
- Introduced during the 1980s with early Local Area Networks.
- An extension of existing protocols for host-to-host connectivity such as Telnet and FTP.
- Primary connections are simple streams using either the BSD Unix Socket programming interface or the SVR4 Unix STREAMS programming interface.
- Higher level protocols are typically application specific, and rigidly split functions between a client and a server.
- The client makes requests to the server, which in turn services them.
- The two most widely used examples of protocols are the Hypertext Transfer Protocol (HTTP) used for the WWW, and the X.org X11 X Window System protocol used for graphics displays on Unix, Linux and BSD systems.

Remote Procedure Calls (RPC)
- The very widely used ONC RPC (RFC 1831) protocol is a good example of a more advanced distributed computing scheme.
- The central idea in all RPC schemes is that a procedure (i.e. a function call) may be executed locally on a host, or on a remote server host.
- The client host makes a request to a server, consisting of a procedure identifier (name) and a list of arguments; the server then returns the results of the call.
- An important issue in all RPC systems is data representation: the client and server may have different “endianness”, so data sent as arguments or returned as procedure results must be formatted in a compatible fashion.
- Another issue is whether the protocol is “stateful” or “stateless”, i.e. whether the server remembers state between calls.
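
The endianness issue can be seen directly by packing the same integer in both byte orders. The sketch below then encodes a call as a procedure identifier plus arguments in one fixed byte order, so that either kind of host decodes it identically. The wire layout and the function names `encode_call` and `decode_call` are made up for illustration; this is not the ONC RPC or XDR format, although XDR likewise fixes a big-endian order.

```python
import struct

value = 0x12345678
little = struct.pack("<I", value)  # byte order used natively by little-endian hosts
big = struct.pack(">I", value)     # big-endian, often called network byte order


def encode_call(proc_id, args):
    """Hypothetical wire format: procedure id, argument count, then
    32-bit signed arguments, all big-endian."""
    return struct.pack(f">II{len(args)}i", proc_id, len(args), *args)


def decode_call(payload):
    """Inverse of encode_call; works the same on any host byte order."""
    proc_id, count = struct.unpack_from(">II", payload, 0)
    args = list(struct.unpack_from(f">{count}i", payload, 8))
    return proc_id, args


wire = encode_call(7, [1, -2, 300])
print(little.hex(), big.hex())
print(decode_call(wire))
```

Reading `little` as if it were big-endian would yield 0x78563412, which is exactly the class of error an agreed wire representation prevents.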

Object Oriented Distributed Schemes
- While the RPC scheme proved very effective and useful, it was syntactically modelled on procedural programming, and did not fit cleanly into increasingly popular Object Oriented programming languages such as C++ and later Java.
- The most widely used systems are CORBA and .NET, both of which extend the RPC model to permit a client to create, run or destroy an object on a remote or local server process.
- The intent of both schemes is to provide a programmer with a completely transparent OO programming environment, where a specific service to be called can be local to the user's host system or located on a distant server host.
- Both CORBA and .NET were initially architected around a simple client server model, where typically many clients were serviced by a single server.

Multiprocessing vs Cluster Host Systems
- When Client-Server systems were introduced during the 1980s, most client systems were desktops with a single CPU chip, and servers were single or multiple CPU workgroup servers, superminicomputers, or mainframe computers.
- By the 1990s the increasing performance and declining cost of 32-bit microprocessors resulted in the introduction of the first “clusters”.
- Rather than connecting a large number of CPU chips via a large shared fast parallel bus and memory in a single backplane host housed in a single chassis, a cluster used a high speed LAN to interconnect initially dozens, and later hundreds or thousands, of individual machines to form a parallel computing system.
- Early clusters were built from ordinary desktop chassis.

Multiprocessing vs Cluster Host Systems …
- Since the 1990s hardware manufacturers have shifted to racked CPU chassis specifically designed to permit the construction of very large clusters.
- Clusters resulted in two important changes in the programming environment for distributed computing:
  A. Inter-Process Communications (IPC) shifted to Socket / Stream techniques, as a networked cluster did not have a common shared memory or other IPC mechanisms which are specific to an operating system kernel or upper layers.
  B. Programming models had to be capable of handling possibly large numbers of CPUs, not just the 2, 4, 8 or 16 CPUs typical of standalone multiple CPU hosts.
- Clusters were initially used for large web servers and supercomputers, the latter mainly for scientific computing.

Post 2000 - Divergence in Distributed Computing
- The decade following the 1990s saw Moore’s Law effects driving the performance of CPUs upward, and costs downward, making clustering more affordable.
- User needs and expectations also changed as the WWW expanded strongly to cover commercial applications such as retailing and search engines.
- Different user communities with different needs influenced growth in distributed applications and techniques.
- This is important, insofar as commercial users, research users and social networking users all have different needs in a distributed processing environment.
- A multimedia server network like YouTube has very different requirements to a generic web server, or a retail server scheme like Amazon, and so uses different applications and protocols.

Large Scale Distributed Computing Schemes
- At this time “traditional” small and medium scale client-server schemes continue to be widely used, based on X11, NFS/RPC, CORBA, .NET and other established schemes.
- Large scale distributed computing schemes are filling other niches using protocols, software interfaces and applications specifically built for large numbers of CPUs; these are based on clusters, grids and web service or cloud schemes:
  - Cluster Schemes: Web servers and scientific supercomputers.
  - Grid Schemes: Scientific supercomputing arrangements.
  - Web Services / Cloud Schemes: Commercial search engines, social networking media, retail and wholesale systems, outsourced storage and computing services, and a range of other possible applications.
- What does this mean?

Impact of Diversity in Distributed Computing
- At this time there is greater diversity in available distributed computing technology than ever before.
- Diversity will continue to increase as the technology matures, and also as new applications are devised and built to address different needs.
- A programmer may have to develop software within more than one distributed computing environment, and sometimes interface across one or more.
- To best exploit such an environment, a programmer must have a good understanding of the fundamental concepts which are unique to each of these schemes, but also understand what problems and limitations they all share.
- The underlying networks and operating systems will always influence behaviour and performance in such systems.

Inter-Process Communications
- Inter-Process Communications (IPC) are all mechanisms which permit two or more processes to exchange messages.
- IPC mechanisms may be restricted to communications within a host machine’s operating system, or may permit communications between processes on different host machines connected via a network.
- IPC mechanisms restricted to a host operating system include:
  - Shared Memory
  - Message Passing
  - Unix Signals and analogues
- Shared Memory imposes no structure on messages, but Signals and Message Passing typically force discrete message structures.

Shared Memory IPC - Limited to Local Host
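
A minimal sketch of shared memory IPC, using Python's `multiprocessing.shared_memory` module as a stand-in for the operating system facility the slide has in mind. For brevity both handles live in one process; a second process on the same host would attach by name in the same way. The one-byte length prefix is a convention invented for this example, since shared memory itself imposes no structure on the data.

```python
from multiprocessing import shared_memory

# Create a named 64-byte block of shared memory.
owner = shared_memory.SharedMemory(create=True, size=64)
try:
    message = b"hello from process A"
    owner.buf[0] = len(message)                   # example-specific length prefix
    owner.buf[1:1 + len(message)] = message

    # A second handle attaches by name, as another local process would.
    peer = shared_memory.SharedMemory(name=owner.name)
    try:
        length = peer.buf[0]
        received = bytes(peer.buf[1:1 + length])
    finally:
        peer.close()
finally:
    owner.close()
    owner.unlink()  # release the block once every user has closed it

print(received)
```

The block name only has meaning on the local host, which is why this form of IPC does not extend to networked clusters.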

Stream Oriented Inter-Process Communications
- Stream Oriented IPC is especially important since it is the most commonly used scheme for IPC within operating systems and between networked hosts.
- A stream connection imposes no implicit structure on the data being sent: character mode “streams” of data are handled in a FIFO manner, typically arriving at the destination in the order they were sent.
- Application protocols treat the stream as a transparent pipe.
- The two most widely used stream oriented IPC schemes are the BSD Socket mechanism and the SVR4 STREAMS mechanism.
- Both are designed around standard software Application Programming Interfaces (API), although most recent STREAMS implementations also include an interface to emulate the BSD Socket API.
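
Because a stream carries bytes with no message boundaries, an application protocol has to add its own structure. The sketch below shows one common approach, a 4-byte length prefix before each message, over a locally connected socket pair. The helper names `send_message`, `recv_exact` and `recv_message` are made up for this example.

```python
import socket
import struct


def send_message(sock, payload):
    """Write a 4-byte big-endian length followed by the payload."""
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def recv_exact(sock, count):
    """Read exactly count bytes; a single recv() may return fewer."""
    chunks = []
    while count > 0:
        chunk = sock.recv(count)
        if not chunk:
            raise ConnectionError("stream closed in the middle of a message")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def recv_message(sock):
    """Read one length-prefixed message from the stream."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)


left, right = socket.socketpair()  # two connected stream sockets on this host
try:
    send_message(left, b"first")
    send_message(left, b"second message")
    messages = [recv_message(right), recv_message(right)]
finally:
    left.close()
    right.close()

print(messages)
```

Without the prefix, the receiver could see both payloads as one run of bytes, or see a payload split across several reads, since the stream preserves order but not boundaries.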

SVR4 STREAMS vs BSD Sockets

BSD Socket Application Programming Interface
- Opening a socket connection requires creating the socket with a socket() system call, binding a socket address to the socket with a bind() call, and initiating the connection with a connect() call. A client may omit bind() and let the operating system choose its local address; a server binds and then waits for connections with listen() and accept().
- The socket() call returns an index into the process file table, termed a file descriptor in Unix/Linux/BSD.
- Once the connection is open, the programmer may use either socket specific calls or the established Unix/Linux/BSD read() and write() system calls.
- The BSD Socket has become the de facto standard low level programming interface for networked IPC, although in most contemporary applications it is hidden below other protocols.
- Nearly all operating systems provide a BSD socket API, although some use the newer POSIX standard.

Socket Function Prototypes
- int socket(int domain, int type, int protocol);
  where the domain can be IPv4, IPv6 or UNIX (local to host).
- int bind(int sockfd, const struct sockaddr *my_addr, socklen_t addrlen);
  where the arguments define the local interface (address) the socket is bound to.
- int connect(int sockfd, const struct sockaddr *serv_addr, socklen_t addrlen);
  where *serv_addr is the address of the server being connected to.
- Once the stream is established between two processes on two hosts, traffic can be sent or received using the send(), recv(), write() and read() system calls.
- The socket is shut down using a close() call.
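
The call sequence above can be tried end to end with Python's `socket` module, which wraps the same BSD calls under the same names. This is a sketch on the loopback interface only: the helper `serve_once`, the upper-casing "service" and the use of port 0 to obtain a free port are choices made for this example, and a single recv() is assumed to be enough for such a tiny message.

```python
import socket
import threading


def serve_once(listener):
    """Accept one connection, echo the data back in upper case, then close."""
    conn, _peer = listener.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


# Server side: socket(), bind(), listen(), then accept() in a worker thread.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))      # port 0 lets the OS pick a free port
listener.listen(1)
host, port = listener.getsockname()
worker = threading.Thread(target=serve_once, args=(listener,))
worker.start()

# Client side: socket(), connect(), send, receive, close().
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.settimeout(5.0)               # avoid hanging forever if something fails
client.connect((host, port))
client.sendall(b"hello")
reply = client.recv(1024)
client.close()

worker.join()
listener.close()
print(reply)
```

Note that the client never calls bind(): the operating system assigns its local address when connect() is called.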

Client-Server - HTTP over Sockets
- HTTP (Hypertext Transfer Protocol) is the most widely used protocol on the WWW and is a good example of a protocol built on top of the BSD Socket API.
- When a browser (client) intends to make a request of a web server (server), it opens a socket connection over the Internet to the web server.
- The browser then sends an HTTP method message to the web server, for instance:
  GET /mypath/to/myfile/blogs.html HTTP/1.0
- The socket connection stays open while the server processes the method request.
- Once processing is complete, the web server responds over the same connection with a status message, headers and a MIME encoded body, for instance:
  HTTP/1.0 404 Not Found

Client-Server - HTTP over Sockets [200 OK]
HTTP Response:
  HTTP/1.0 200 OK
HTTP Headers:
  Last-Modified: Tue, 12 Jul 2011 21:59:59 GMT
  Content-Type: text/html
  Content-Length: 512
HTTP Body:
  <!DOCTYPE doctype PUBLIC "-//w3c//dtd html 4.0 transitional//en">
  <html>
  <head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  <meta name="Author" content="Carlo Kopp">
  <meta name="GENERATOR" content="Mozilla/4.55 [en] (X11; U; Linux 2.4.2 i386) [Netscape]">
  <title>Carlo Kopp's Homepage</title>
  </head>
  <body

Client-Server - HTTP over Sockets
  … More page content …
  <center></center>
  </span>
  </body>
  </html>
- Once the body is transferred, the socket connection is closed.
- HTTP is widely used to support other mechanisms used in distributed computing.
- HTTPS (HTTP Secure) employs a more complex connection mechanism due to the use of TLS or SSL encryption layers.
- As HTTP lacks mechanisms to handle multiple servers concurrently, it is a good example of a basic client server protocol.
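
The HTTP/1.0 exchange can be sketched without a real web server by running a toy responder on one end of a local socket pair. Everything here is illustrative: `toy_server`, `http_get`, the path `/index.html` and the page body are invented for the example. In this sketch the response travels back over the same connection that carried the request, and the server closing the connection marks the end of the body, which is how HTTP/1.0 behaves by default.

```python
import socket
import threading

BODY = b"<html><body>hello</body></html>"


def toy_server(conn):
    """Answer a single request and close; a stand-in for a web server."""
    with conn:
        request = b""
        while b"\r\n\r\n" not in request:       # read up to the blank line
            chunk = conn.recv(1024)
            if not chunk:
                return
            request += chunk
        method, path, _version = request.split(b"\r\n", 1)[0].decode("ascii").split(" ")
        if method == "GET" and path == "/index.html":
            status, body = "200 OK", BODY
        else:
            status, body = "404 Not Found", b""
        head = (f"HTTP/1.0 {status}\r\n"
                "Content-Type: text/html\r\n"
                f"Content-Length: {len(body)}\r\n"
                "\r\n").encode("ascii")
        conn.sendall(head + body)


def http_get(path):
    """Send one GET request and return (status line, headers, body)."""
    client, server = socket.socketpair()
    worker = threading.Thread(target=toy_server, args=(server,))
    worker.start()
    with client:
        client.sendall(f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii"))
        raw = b""
        while True:                              # read until the server closes
            chunk = client.recv(1024)
            if not chunk:
                break
            raw += chunk
    worker.join()
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")
    headers = dict(line.split(": ", 1) for line in lines[1:])
    return lines[0], headers, body


print(http_get("/index.html"))
print(http_get("/missing")[0])
```

The blank line (`\r\n\r\n`) is what separates the status line and headers from the body in both directions.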

ONC RPC Model

CORBA Model (OMG 2000)
- CORBA extends the RPC model from simple procedures to complete objects including data.
- The intent was to provide a client-server scheme which was suitable for OO languages.
- CORBA provides interfaces for the C, C++, Java, COBOL, Smalltalk, Ada, Lisp, and Python languages.
- CORBA is primarily built for client-server applications.

Client-Server Schemes vs Parallelism
- Client-server schemes, encompassing BSD Sockets, ONC RPC, and CORBA, all share common features, but reflect evolving expectations in functionality.
- All are based on the idea of many clients accessing typically a single server.
- All are designed around the favoured programming model and languages of the period when they were devised.
- All have limited instrumentation and facilities for managing or balancing loads on servers.
- Most importantly, they are not designed around the expectation that dozens or hundreds of servers may be accessible, and that applications might exploit parallelism across large numbers of host machines.
- Clusters, Grids and Clouds evolved to exploit parallelism.

Clusters
- The central idea underpinning all clusters was to increase available computing power for an application by aggregating a large number of cheap machines via a fast shared interface.
- Early clusters used proprietary or custom fast busses to provide high speed interfaces, or ordinary 10-Base-T and later 100-Base-T Ethernet to provide low cost interfaces.
- Aggregating a large number of machines provides raw computing power, and fast interconnection provides bandwidth between the CPUs in the machines, but neither addresses the problem of how to spread the computing load evenly across the CPUs, or how to construct applications to run on such a system.
- Software remains the critical design issue in all distributed computing environments!

Programming Clusters
- Clusters will remain widely used for the foreseeable future, in scientific computing and commercial computing.
- Some clusters will run “traditional” clustering software environments, but some will run grid middleware or cloud runtime software; in a sense many grids and cloud systems are effectively clusters operating under layers of middleware code to provide a grid or cloud programming interface.
- Traditional clustering software falls into three broad categories:
  - Parallel Computing APIs such as MPI, typically for Finite Element Modelling engineering/scientific applications.
  - Parametric Computing environments such as Nimrod/Enfuzion.
  - Load sharing environments like PVM and LVS.
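
The parametric computing idea, running the same computation independently for every combination of input parameters, can be sketched on a single machine. This is not the Nimrod or Enfuzion API: a local thread pool stands in for cluster nodes, and `simulate`, its scoring formula and the parameter values are all invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def simulate(params):
    """Hypothetical model run: one independent job per parameter combination."""
    angle, speed = params
    return {"angle": angle, "speed": speed, "score": speed * speed - 3 * angle}


angles = [0, 15, 30]
speeds = [10, 20]
jobs = list(product(angles, speeds))       # the full parameter grid, 6 jobs

# Each job is independent, so jobs can be handed out to workers in any order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(simulate, jobs))

best = max(results, key=lambda r: r["score"])
print(len(results), best)
```

Because the jobs never communicate with each other, this style of workload is coarse-grained and spreads across cluster nodes with very little communication overhead.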

Grids
- The intent behind the development of grids was to overcome many of the basic limitations seen in clusters by providing a highly standardised software API implemented as “middleware”:
  - Grids would permit the aggregation of much greater numbers of machines than clusters.
  - Grids would permit the aggregation of machines across geographically distant sites if required.
  - Grids would decouple the application from the specifics of the underlying hardware, and provide security mechanisms.
  - Grids would provide a much more flexible programmer interface to permit a wider range of applications to be run.
  - Grids would provide “on demand” computing power in a manner analogous to an electricity or other utility “grid”.

Programming Grids
Grid applications can be architected around the open source Globus grid middleware, or a range of application-specific proprietary middleware products;
Grid middleware is intended to provide a “clean” API for the programmer, which conceals the details of the grid hardware and operating systems as much as possible, and provides embedded job management, load management, instrumentation and security mechanisms;
Current grid middleware addresses most of the functional requirements well, but is immature in some areas, especially those involving network Quality of Service mechanisms;
While optimal application performance often requires new application designs for grid execution, many legacy applications have been successfully ported to grids.

Clouds
NIST Cloud Definition: Cloud computing is a “model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.”
The intent behind Cloud Computing is to extend the ideas used in Grid Computing to provide a “utility” computing services scheme, where compute power, storage, or other services can be used flexibly on demand, and their usage metered and charged for;
Cloud Computing is currently dominated by proprietary products rather than open standards.

Programming Clouds
Most Cloud products currently available involve providing access to extant pools of servers built for other applications, such as bulk storage, retailing services, web searches and the like;
The Cloud product is employed to exploit, and earn a return on investment from, surplus computing or storage capacity in an existing pool of servers;
Therefore, there is no suite of standard APIs for Clouds, with providers often offering proprietary APIs or applications;
Many cloud products are based on “virtual machines”, where an emulator such as VMware or Parallels is provided and the end user must provide the operating system, applications and other tools – this is often called a “bare metal” API;
Open Cloud Computing Interface (OCCI) is in development.

Parallel Processing…
Parallel applications led to parallel computing:
  Coarse-grained systems – SMP (Symmetrical Multi Processing);
  Medium-grained systems – Clusters;
  Fine-grained systems – MPP (Massively Parallel Processing);
The intention behind all of these schemes is to divide up large computational tasks across large numbers of processors;
Performance improvements are intended to be proportional to the number of processors, but are limited by Amdahl’s Law;
These schemes were well established by the 1990s but, with the exception of SMP, usage was mostly limited to specialised applications, mainly in scientific computing.
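
The limit imposed by Amdahl’s Law can be made concrete with a short sketch. If a fraction p of a task can be parallelised and the rest is serial, the speedup on n processors is 1 / ((1 - p) + p / n). The function name and the sample value p = 0.95 are illustrative choices, not taken from the lecture.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Speedup predicted by Amdahl's Law for a given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # With 5% serial work the speedup can never exceed 1 / 0.05 = 20,
    # no matter how many processors are added.
    for n in (1, 2, 8, 64, 1024):
        print(n, round(amdahl_speedup(0.95, n), 2))
```

The serial fraction, not the processor count, sets the ceiling: with p = 0.95 the speedup approaches but never reaches 20.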

Parallel Processing Limitations
Problems with MPP:
  Highly application specific;
  Limited to “embarrassingly” parallel methods;
  Expensive to administer, specialist skills required;
  Costly due to the use of custom hardware;
SMP is heavily dependent on the internal bus design used to interconnect the processors:
  Difficult to scale up due to bus limitations;
  Low parallelism yields resulting from bus size constraints;
  Provides transparent access to parallelism;
  Costly due to the use of custom hardware for buses and processor modules;
  Widely used in server hosts of the period (Sun, SGI, DEC, IBM).

Clustered Computing
Cluster computing provides a cheaper alternative with free software components:
  Parallel Virtual Machine (PVM) software;
  Message Passing Interface (MPI) software;
  Parametric Computing Toolsets (Nimrod/Enfuzion);
  Network Attached Storage (NAS):
    Network File System (NFS) – established client–server code;
    Andrew File System (AFS) with greater fault-tolerance and resource utilisation;
Clustering provided a viable solution, especially where the application was easy to parallelise, but is limited in growth by the performance of the network or bus design used.

Batch Processing…
Central computers used the batch jobs concept;
Batch jobs are scheduled according to their resource needs:
  Short queue for low-CPU, low-resource jobs;
  Long queue for massive jobs;
SMP brought the concept of job scheduling over groups of CPUs;
Batch processing is well suited to repeated runs using the same application;
Problems can arise in mixed workloads with determining the best scheduling strategies.
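
The short-queue/long-queue idea can be sketched as below. This is a toy illustration, not how any particular batch system works: the `Job` fields, the 10-minute threshold and the "short queue first" dispatch rule are all assumptions.

```python
from collections import deque
from dataclasses import dataclass

SHORT_LIMIT_MINUTES = 10  # hypothetical cut-off between the two queues


@dataclass
class Job:
    name: str
    cpu_minutes: int  # resource estimate supplied at submission time


class BatchScheduler:
    """Two FIFO queues; jobs are routed by their estimated CPU time."""

    def __init__(self):
        self.short_queue = deque()
        self.long_queue = deque()

    def submit(self, job):
        if job.cpu_minutes <= SHORT_LIMIT_MINUTES:
            self.short_queue.append(job)
        else:
            self.long_queue.append(job)

    def next_job(self):
        """Dispatch short jobs first; fall back to the long queue."""
        if self.short_queue:
            return self.short_queue.popleft()
        if self.long_queue:
            return self.long_queue.popleft()
        return None


if __name__ == "__main__":
    scheduler = BatchScheduler()
    for job in (Job("fem-run", 120), Job("report", 2), Job("plot", 5)):
        scheduler.submit(job)
    while (job := scheduler.next_job()) is not None:
        print(job.name)
```

Always preferring the short queue keeps turnaround low for small jobs but can starve long ones when short jobs keep arriving, which is one form of the mixed-workload scheduling problem noted above.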

Batch Processing…
Batch job processing then moved to distributed environments:
  Load Sharing Facility (LSF), a sophisticated batch queue manager with distributed systems support;
  Condor and Condor-G, which tolerate service disruptions;
  Unicore, which is much more than a queue manager: a vertically integrated grid infrastructure;
  Globus Resource Allocation Manager (Grids).

Data Storage
Network Attached Storage (NAS):
  Logically distributed storage;
  Protocol-based access to a central storage infrastructure (NFS, AFS);
  Easy to implement;
  Low cost but performance limited by network;
Storage Area Network (SAN):
  Physically distributed storage;
  Requires special networking infrastructure;
  Higher in initial cost;
  Highly scalable and flexible.

Storage Management
Hierarchical Storage Management (HSM):
  SAMfs, from Sun Microsystems, based on the open tar format;
  Data Management Facility (DMF), widely used;
  CASTOR, from CERN;
Metadata – data describing other data:
  Meta Data Service (MDS);
  Storage Resource Broker (SRB);
Metadata management is becoming a major area of interest in archiving and search applications.

HPPC Infrastructure
High Performance Parallel Computing (HPPC)
High-throughput networks:
  Ethernet (10 Megabit/s);
  Fast Ethernet (100 Megabit/s);
  Gigabit Ethernet (1000 Megabit/s);
  10 Gigabit Ethernet, etc.
Low-latency networks:
  Myrinet, 4-10 times lower latency than standard TCP/IP over Gigabit;
Network infrastructure performance can be critical in any HPPC or HPC system design – throughput and latency can have a large performance impact.
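
A simple first-order model shows why both throughput and latency matter: the time to deliver one message is roughly the fixed latency plus the message size divided by the bandwidth. The function below is a sketch of that model only; the latency figures in the example are assumed values chosen for illustration, not measurements of any real network.

```python
def transfer_time_seconds(message_bytes, bandwidth_bits_per_s, latency_s):
    """First-order estimate: fixed latency plus serialisation time."""
    return latency_s + (message_bytes * 8) / bandwidth_bits_per_s


if __name__ == "__main__":
    gigabit = 1_000_000_000  # bits per second
    # Assumed one-way latencies, for illustration only.
    standard_latency = 50e-6
    low_latency = 10e-6

    for size in (1_000, 1_000_000_000):  # a 1 kB message and a 1 GB transfer
        slow = transfer_time_seconds(size, gigabit, standard_latency)
        fast = transfer_time_seconds(size, gigabit, low_latency)
        print(f"{size} bytes: {slow:.6f} s vs {fast:.6f} s")
```

Under this model small messages are dominated by latency, so a low-latency interconnect helps tightly coupled parallel codes, while bulk transfers are dominated by bandwidth.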

Long Distance High Throughput Data Transfers
HPPC/HPC applications forced the development of new protocols capable of performing large parallel transfers quickly:
  GridFTP: provides software-level data streaming for higher throughput;
  Fast TCP: provides better utilisation of the network by TCP parameter tuning;
  SCTP: provides multi-stream data transfer at the transport level for better network bandwidth utilisation and quality of service;
Insufficient network performance can cripple HPPC/HPC applications.

Distributed Computing Evolution
BSD Sockets: early FTP, SMTP, and NNTP services;
Remote Procedure Call (RPC);
Windows COM/DCOM;
CORBA, for supporting OO concepts;
Java Remote Method Invocation (RMI);
Web Services, for a uniform higher-level communications standard.

Technology Trends (1)
Initially, distributed applications used sockets to connect two processes on different hosts; the programmer had to construct everything else.
RPC was intended to remove much of the burden from the programmer by providing a mechanism for calling procedures remotely, hiding complexity.
CORBA / OLE / DCOM extended the model from procedures to complete objects, including data and procedural components.
Java RMI is conceptually like CORBA / DCOM but specific to the Java platform.
Web Services are modelled on the RPC approach.
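
The way RPC hides the network from the programmer can be seen in a small sketch using XML-RPC from the Python standard library. The `add` procedure, the helper name `call_remote_add` and the choice to run the server in a background thread of the same process are assumptions made to keep the example self-contained; a real deployment would run client and server on different hosts.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


def add(a, b):
    """The procedure exposed to remote callers."""
    return a + b


def call_remote_add(a, b):
    """Start a throwaway RPC server, call `add` through it, then shut it down."""
    # Port 0 asks the operating system for any free port.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(add, "add")
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        with xmlrpc.client.ServerProxy(f"http://127.0.0.1:{port}") as proxy:
            # Looks like a local call; marshalling and transport are hidden.
            return proxy.add(a, b)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(call_remote_add(2, 3))
```

With raw sockets the programmer would have to define the message framing, argument encoding and error handling by hand; here the RPC layer supplies all of that behind an ordinary-looking function call.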

Technology Trends (2)
Language Dependencies: the “preferred” language of the period has influenced the design of distributed computing models.
  Sockets and RPC designed around the C language.
  CORBA / OLE / DCOM designed around the C++ language.
  Java RMI designed around Java, and W3 Web Services designed around XML, W3 and Java.
Level of Abstraction: details of the network have been progressively hidden away from programmers as technology has evolved.

Going Back: Simple Architectures (1)
Computer architectures consisting of multiple interconnected processors are basically of two types.
Tightly coupled systems:
  A single system-wide primary memory (address space) is shared by all the processors.

Simple Architectures (2)
Loosely coupled systems:
  Processors do not share memory;
  Each processor has its own local memory.

Evolution of Distributed Systems
Reasons for the spread of distributed systems:
1. Powerful microprocessors:
  • 8-bit, 16-bit, 32-bit, 64-bit
  • x86 family, 68k family, CRAY, SPARC, dual core, multi-core
  • Clock rate: up to 4 GHz
2. Computer networks:
  • Local Area Network (LAN), Wide Area Network (WAN), MAN, Wireless
  • Network type: Ethernet, Token-bus, Token-ring, Fiber Distributed Data Interface (FDDI), Asynchronous Transfer Mode (ATM), Fast Ethernet, Gigabit Ethernet, Fibre Channel, Wavelength-Division Multiplexing (WDM)
  • Transfer rate: 64 kbps up to 1 Tbps

Distributed computing system models
Various models are used for building distributed computing systems. These models can be broadly classified into five categories:
  Minicomputer model
  Workstation model
  Workstation-server model
  Processor-pool model
  Hybrid model

Distributed computing system models
1. Minicomputer model:
  A simple extension of centralized time-sharing systems;
  A few minicomputers interconnected by a communication network;
  Each minicomputer has multiple users simultaneously logged on to it;
  This model may be used when resource sharing with remote users is desired;
  Example: the early ARPAnet.

Distributed computing system models
2. Workstation model:
  Several workstations interconnected by a communication network;
  The basic idea is to share processor power;
  A user logs onto a home workstation and submits jobs for execution; the system might transfer one or more processes to other workstations;
  Issues that must be resolved:
    how to find an idle workstation;
    how to transfer a process;
    what happens to a remote process;
  Examples: Sprite system, Xerox PARC.

Distributed computing system models
3. Workstation-server model:
  It consists of a few minicomputers and several workstations (diskless or diskful);
  Minicomputers are used for providing services;
  For higher reliability and better scalability, multiple servers may be used for the same purpose;
  Compare this model with the workstation model;
  Example: the V-System.

Distributed computing system models
4. Processor-pool model:
  Base observation – most of the time a user needs no computing power, but once in a while needs a very large amount of computing power for a short period of time;
  A run server manages and allocates the processors to different users;
  There is no concept of a home machine, i.e., a user does not log onto a particular machine but onto the system as a whole;
  Offers better utilization of processing power compared to other models;
  Examples: Amoeba, Plan 9, Cambridge DCS.
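
The role of the run server can be sketched as a small allocator that hands out processors from a shared pool on demand and takes them back afterwards. This is a toy model only: the class and method names are hypothetical, and real systems such as Amoeba involve far more (process placement, failure handling, queuing of requests).

```python
class RunServer:
    """Toy run server: tracks which pool processors each user holds."""

    def __init__(self, processor_count):
        self.free = set(range(processor_count))
        self.allocated = {}  # user -> set of processor ids

    def allocate(self, user, count):
        """Grant `count` processors to `user`, or fail if the pool is too small."""
        if count > len(self.free):
            raise RuntimeError("not enough free processors in the pool")
        granted = {self.free.pop() for _ in range(count)}
        self.allocated.setdefault(user, set()).update(granted)
        return granted

    def release(self, user):
        """Return all of a user's processors to the pool."""
        self.free |= self.allocated.pop(user, set())


if __name__ == "__main__":
    pool = RunServer(processor_count=8)
    pool.allocate("alice", 6)   # a short burst of heavy computation
    print(len(pool.free))       # processors left for everyone else
    pool.release("alice")
    print(len(pool.free))       # the whole pool is available again
```

Because processors belong to the pool rather than to a user's home machine, idle capacity is available to whoever needs a burst of computing power, which is the utilization advantage claimed for this model.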

Distributed computing system models
5. Hybrid model:
  To combine the advantages of both the workstation-server model and the processor-pool model, a hybrid model may be used;
  It is based on the workstation-server model but with the addition of a pool of processors;
  Expensive!

Distribution Model
There are several distribution models for accessing distributed resources and executing distributed applications:
  File Model – Resources are modeled as files. Remote resources are accessible simply by accessing files.
  Remote Procedure Call Model – Resource accesses are modeled as function calls. Remote resources can be accessed by calling functions.
  Distributed Object Model – Resources are modeled as objects, each of which is a set of data and functions to be performed on the data. Remote resources are accessible simply by accessing an object.

Summary
What is a Distributed System?
  “A distributed system is a collection of independent computers that appears to its users as a single coherent system.”
Models of Distributed System?
  Minicomputer model
  Workstation model
  Workstation-server model
  Processor-pool model
  Hybrid model
What are the Strengths and Weaknesses of a Distributed System?
  Strengths: Reliability, Incremental growth, Resource sharing
  Weaknesses: Programming, Reliance on network, Security
Important characteristics of a Distributed System?
  Possible Heterogeneity, Openness, Transparency
Performance Metrics?