Presentation is loading. Please wait.

Presentation is loading. Please wait.

Distributed Systems Technologies

Similar presentations


Presentation on theme: "Distributed Systems Technologies"— Presentation transcript:

1 Distributed Systems Technologies
CM0356/CM0456 Andrew Harrison

2 Who am I? Studied Fine Art many moons ago.
Worked as an artist and theatre designer for 15 years. Then got into the Web… Did the MSc here in and a PhD after that Currently work as an RA My research is in the areas this course covers - P2P, the Web, Web Services and Grid computing currently working in Health Informatics Domain

3 Structure Link on Blackboard (learning central) to this site.
Course is 21 lectures. Omer gives 3, I give the rest. Omer will do next week The assessment is 100% exam for CM0356 The assessment is 80% exam and 20% coursework for CM0456 CM0456 will receive the coursework in week 2 (after second lecture). Hand-in is beginning of week 10. The exam is do-able from the content of the lectures A more detailed understanding can be gained from the recommended reading: “From P2P and Grids to Services on the Web” (see Web site) labs – weeks 5 – 11 on a Thursday and Friday Main course site is: Link on Blackboard (learning central) to this site.

4 Lecture Overview Week 1 Week 2 Week 3 Week 4 Lecture 1: Introduction
Lecture 2: Peer-to-Peer Systems Lecture 3: Gnutella Week 2 Lecture 1: Fault Tolerance I (Omer Rana) Lecture 2: Trust and Reputation (Omer Rana) Lecture 3: Trust and Reputation (Omer Rana) Week 3 Lecture 1: Scalability Lecture 2: Security in Distributed Systems Lecture 3: Fault Tolerance II Week 4 Lecture 1: RMI and Jini Lecture 2: BitTorrent Lecture 3: Freenet

5 Lecture Overview Week 5 Week 6 Week 7 Week 8 Lecture 1: The Web
Lecture 2: The Programmable Web I Lecture 3: The Programmable Web II Week 6 Lecture 1: Web Services I Lecture 2: Web Services II Lecture 3: Grid Computing Week 7 Lecture 1: Cloud Lecture 2: CAN Lecture 3: Jxta Week 8 Lecture 1: FREE Lecture 2: FREE Lecture 3: FREE

6 Lecture Overview Week 9 Lecture 1: FREE Lecture 2: FREE Lecture 3: FREE Week 10 Lecture 1: Revision Lecture 2: Revision Week 11: This may change slightly over time because of lecturers’ availability. The online version on the course website will be accurate.

7 Aims The aim of the course is to provide a comparative understanding of a wide variety of distributed systems Some of the systems are middleware generic tooling to support distributed comms E.g. XML and Web Service technologies, Jini Some of them are applications Provide particular functionalities for users E.g. Bittorrent and Freenet In general, the exam will test you on this comparative understanding, rather than being able to recite every last detail of a particular technology However, understanding the technologies is the first step to having a comparative understanding :-)

8 Why does Distributed Computing Matter?

9 Why does Distributed Computing Matter?
Clay Shirkey’s 4 rungs of group interaction Here comes Everybody (2008) Links in a fully connected mesh grow exponentially nodes * (nodes – 1) / 2 Internet technology enables ridiculously easy group forming e.g. Reply All button Describes an evolution from Simple/selfish sharing example: delicious conversation example: high dynamic range photography on Flickr collaboration anime – writing software to subtitle Japanese anime for Western consumption collective action flash mobs in Belarus – now Tunisia and Egypt Each rung requires greater co-ordination amongst members

10 A Distributed System A distributed system is a collection of independent computers that appears to its users as a single coherent system. Heterogeneous computers – vendors/OS should be able to interoperate Should mask the heterogeneity from users (applications) Should be easy to expand and scale Should be permanently available (even though parts of it are not) Communication is based on messages Machine A Machine B Machine C Distributed Applications Middleware Services OS, e.g. Windows Mac OS X Linux Network The role of Middleware in a distributed system: it hides the underlying infrastructure away from the application and user level. Point 2: from users Point 3: we deal with this in lecture 4 Point 4: must employ redundancy – duplicate functionality We are concentrating in this part of the course on the two upper levels: middleware and applications. Middleware hides the complexity of the underlying communications ect from the application/user In this lecture we build up a taxonomy for these systems based on some criteria. Buts let first talk about the idea of centralization/decentralization. messages

11 Layering The distributed applications and middleware we will consider occupy the Application Layer of the OSI model Some interact more or less closely with underlying layers Typically these systems have various layers within them as well.

12 Layering For example a Web Service: A Peer to Peer application may:
Serializes programming objects to XML, perhaps using W3C XML Schema specification Wraps this data in a SOAP envelope. Sticks the SOAP envelope into an HTTP POST message payload. Then HTTP uses what it uses: TCP/IP, DNS A Peer to Peer application may: Define custom messages for file sharing Use UPnP to negotiate NATs Tunnel custom messages through another protocol (e.g. HTTP) In the end, systems will do whatever they need to get their job done and don’t care much for nicely defined layers.

13 Messages These define the protocol of the distributed application or middleware. Provide the insulation from heterogeneous OS’s Are capable of traversing the network. This involves some form of serialization i.e. the message can be converted into a stream of bytes and sent over the wire Often an in-memory representation of an Object is serialized at the sending side, and de-serialized at the receiving side to create an in-memory representation. Strings, XML, Java serialization, bencoding…

14 Messages All messages are one-way
Protocols may build on top of this to create two way message exchange patterns E.g. HTTP is request/response based. RPC emulates a local computing paradigm with the concept of a remote procedure with return values Messages are typically independent of the transport protocol they use (TCP, UDP) But some rely/presume certain transport protocols E.g. bittorrent and HTTP keep the TCP connection open to reduce overhead. Would not work with UDP.

15 Distributed vs Local Systems
Distributed systems are inherently different from non-distributed systems. Latency - network speed Memory access - not shared Partial Failure remote failure does not mean local failure no global coordination (like an OS) Guaranteed Concurrency combined with latency, events are not necessarily received in the same order as they are generated Indeterminacy Your system is not in control of the whole system With partial failure, a system may just disappear with no indication of status. was it the remote machine, remote user or a network link?

16 Some Terminology Generic term - any computer on a distributed network
Device A Provider of data Node Resource A Consumer of data Resource: any hardware or software resource shared on a distributed network e.g. a file storage system, RAM, CPU, a file, a service or a communication channel Peer Client Server Both a Provider and Consumer of data Service A server provides a “Service”. Services are analogous to function calls. They receive a request (i.e. the arguments to a function call) and return a response (i.e. the returned function data or data updated within a pass by reference argument). A service therefore can be considered the network counterpart to a local function call. Examples of services are web servers, web services, Jini services, JXTA Peers, Enterprise Servers, OGSA services, Corba agents etc. “A network-enabled entity that provides some capability”

17 Taxonomy for Distributed Systems
Taxonomy is based on following factors and their relation to centralization: 1. Resource Discovery: Mechanism for discovering resources on a distributed system? Examples: DNS, JXTA Rendezvous, Jini LUS, UDDI etc 2. Resource Availability: Scalability – do resources scale with network? - does access to them scale with network? See example... Availability: One important consideration to make when we talk about centralization of resources is that of scalability. When we say a resource is centralized we do not imply that there is only one server serving the information, rather, we mean that there is either a single server or a fixed-number of servers that provide the information that do not scale directly with the size of the network. e.g. DNS is centralized because there are only normally a few points of access (i.e. 3) to local DNS servers. If these servers fails then DNS is unavailable. However, DNS implemented a massively parallel look-up table behind this central access. Further web sites e.g. google are centralized in that there is only one machine with Google’s IP address. However, google uses a parallel cluster of 10,000 machine to implement its search capability.

18 Mp3.com, Napster and Gnutella
User Mp3.com MP3.com Scenario User Napster.com Napster Scenario Gnutella Scenario

19 Taxonomy for Distributed Systems
Taxonomy is based on following factors and their relation to centralization: 1. Resource Discovery: Mechanism for discovering resources on a distributed system? Examples: DNS, JXTA Rendezvous, Jini LUS, UDDI etc 2. Resource Availability: Scalability – do resources scale with network? - does access to them scale with network? 3. Resource Communication: Two types: Brokered Communication (centralized): communication is passed through a central server - resources do not have direct references to each other. Point to point (decentralized -peer to peer): a direct connection between the sender and the receiver. Availability: One important consideration to make when we talk about centralization of resources is that of scalability. When we say a resource is centralized we do not imply that there is only one server serving the information, rather, we mean that there is either a single server or a fixed-number of servers that provide the information that do not scale directly with the size of the network. e.g. DNS is centralized because there are only normally a few points of access (i.e. 3) to local DNS servers. If these servers fails then DNS is unavailable. However, DNS implemented a massively parallel look-up table behind this central access. Further web sites e.g. google are centralized in that there is only one machine with Google’s IP address. However, google uses a parallel cluster of 10,000 machine to implement its search capability.

20 Centralization of Point-to-Point Connections
True Peer to Peer e.g. Gnutella Web Server Equal Peers, communication is supposed to be even i.e. each provider is also a consumer of information and each node has an equal number of connections This is not always the case … as we will learn in lecture 4 Many to one relationship between users and the web server and therefore this can be considered centralized communication

21 Taxonomy for Distributed Systems
Taxonomy is based on following factors: 1. Resource Discovery: Mechanism for discovering resources on a distributed system? Examples: DNS, JXTA Rendezvous, Jini LUS, UDDI etc 2. Resource Availability: Scalability – do resources scale with network? 3. Resource Communication: Two types: Brokered Communication (centralized): communication is passed through a central server - resources do not have direct references to each other. Point to point (decentralized -peer to peer): a direct connection (although connection maybe multi-hop) between the sender and the receiver. Availability: One important consideration to make when we talk about centralization of resources is that of scalability. When we say a resource is centralized we do not imply that there is only one server serving the information, rather, we mean that there is either a single server or a fixed-number of servers that provide the information that do not scale directly with the size of the network. e.g. DNS is centralized because there are only normally a few points of access (i.e. 3) to local DNS servers. If these servers fails then DNS is unavailable. However, DNS implemented a massively parallel look-up table behind this central access. Further web sites e.g. google are centralized in that there is only one machine with Google’s IP address. However, google uses a parallel cluster of 10,000 machine to implement its search capability. Centralized Decentralized Hybrid Centralized systems -typically, client/server based systems Decentralized systems - Peer to Peer (P2P) Hybrid – combinations of the 2 extremes e.g. brokered architecture

22 A Web Server: Centralized
Resource Availability Discovery Communication Centralized Decentralized Web Server - Clients (i.e. users) use their web browser to navigate web pages on one or more web sites. - Web site is static to particular domain Discovery: Centralized, DNS Availability: available or not Communication: centralized to the particular web server Communication is a many to one

23 The Web as a whole Discovery - ad hoc
Often highly centralized, e.g. Google, but is also highly decentralized - the Web of links, e.g. the blogosphere and out of bounds Availability - depends on the granularity of the request There is a level of replication on the Web and caching can be used to duplicate availability Communication - centralized Communication happens via a centralized entity, e.g. Facebook, MySpace, Flickr, blogs, etc

24 Napster: Brokered User Resource Discovery Resource Availability Resource Communication Napster Centralized Napster.com Decentralized Communication is a many to one Clients search through Napster web site (well, they used to….) Discovery: Centralized through web site Availability: Once discovered via web site, availability is decentralized. Communication: decentralized between peers (MP3 sharers)

25 Gnutella: Decentralized
Resource Availability Discovery Communication Centralized Decentralized Gnutella Discovery: Decentralized through Gnutella messages (ping/pong mechanisms) Availability: Often an alternate path to resource Communication: point to point: decentralized between peers Communication is a many to one

26 SETI@HOME (Client/Server)
P2P Launched In 1996 Scientific experiment - uses Internet-connected computers in the Search for Extraterrestrial Intelligence (SETI) Distributes a screen saver–based application to users Applies signal analysis algorithms to different data sets to process radio-telescope data. Has more than 3 million users - used over a million years of CPU time to date 1. Install Screen Saver 3. SETI client gets data from server and runs Main Server 4. Client sends results back to server Radio-telescope Data 2. SETI client (screen Saver) starts

27 SETI@home Resource Discovery Resource Availability Resource
Communication Centralized Decentralized Search for Extraterrestrial – volunteer computing system generalized to BOINC API

28 Web Services Discovery: Centralized through registry Resource
Availability Discovery Communication Centralized Decentralized Discovery: Centralized through registry Availability: Once discovered via registry, availability is decentralized. Communication: decentralized between provider and consumer Note: This is for the current Web Services, technology stack - in principal you can host web services in a number of ways

29 Concluding Remarks Taxonomy Relevance Criteria
Resource Discovery Resource Availability Resource Communication Centralized Hybrid Decentralized Relevance Course relates distributed systems to this taxonomy


Download ppt "Distributed Systems Technologies"

Similar presentations


Ads by Google