1
CS6703 GRID AND CLOUD COMPUTING Unit 1
2
UNIT I INTRODUCTION: Evolution of distributed computing – Scalable computing over the Internet – Technologies for network-based systems – Clusters of cooperative computers – Grid computing infrastructures – Cloud computing – Service-oriented architecture – Introduction to grid architecture and standards – Elements of grid – Overview of grid architecture.
3
Distributed Computing
Definition
“A distributed system consists of multiple autonomous computers that communicate through a computer network.”
“Distributed computing utilizes a network of many computers, each accomplishing a portion of an overall task, to achieve a computational result much more quickly than with a single computer.”
“Distributed computing is any computing that involves multiple computers, remote from each other, that each have a role in a computation problem or information processing.”
4
Introduction A distributed system is one in which hardware or software components located at networked computers communicate and coordinate their actions only by message passing. In the term distributed computing, the word distributed means spread out across space; thus, distributed computing is an activity performed on a spatially distributed system. These networked computers may be in the same room, on the same campus, in the same country, or on different continents.
5
Introduction
(Figure: a large-scale Internet application in which users subscribe and submit job requests, agents handle distribution, and resource management coordinates cooperation across the network.)
A distributed system consists of a collection of autonomous computers, connected through a network and distributed operating system software, which enables the computers to coordinate their activities and to share the resources of the system - hardware, software, and data - so that users perceive the system as a single, integrated computing facility. A distributed system is a collection of independent computers that appears to its users as a single coherent system. This definition has two important aspects. The first is that a distributed system consists of components (i.e., computers) that are autonomous. The second is that users (be they people or programs) think they are dealing with a single system, which means that one way or the other the autonomous components need to collaborate. How to establish this collaboration lies at the heart of developing distributed systems. Note that no assumptions are made concerning the type of computers: in principle, even within a single system, they could range from high-performance mainframe computers to small nodes in sensor networks. Likewise, no assumptions are made on the way the computers are interconnected.
6
Motivation
Inherently distributed applications. Distributed systems have come into existence in some very natural ways: in our society, people are distributed and information should also be distributed. In a distributed database system, information is generated at different branch offices (sub-databases), so that local access can be done quickly; the system also provides a global view to support various global operations.
Performance/cost. Network connectivity is increasing, and a combination of cheap processors is often more cost-effective than one expensive fast system. The parallelism of distributed systems reduces processing bottlenecks and provides improved all-around performance, i.e., distributed systems offer a better price/performance ratio.
Resource sharing. Distributed systems can efficiently support information and resource sharing (hardware and software) for users at different locations.
Flexibility and extensibility. Distributed systems are capable of incremental growth and have the added advantage of facilitating modification or extension of a system to adapt to a changing environment without disrupting its operations.
Availability and fault tolerance. With the multiplicity of storage units and processing elements, distributed systems have the potential to continue operation in the presence of failures, a potential increase in reliability.
Scalability. Distributed systems can easily be scaled to include additional resources (both hardware and software).
7
History 1975 – 1985 Parallel computing was favored in the early years:
primarily vector-based at first; gradually, more thread-based parallelism was introduced. The use of concurrent processes that communicate by message passing has its roots in operating system architectures studied in the 1960s.[19] The first widespread distributed systems were local-area networks such as Ethernet, invented in the 1970s.[20] ARPANET, the predecessor of the Internet, was introduced in the late 1960s, and ARPANET e-mail, invented in the early 1970s, became the most successful application of ARPANET[21] and is probably the earliest example of a large-scale distributed application. In addition to ARPANET and its successor the Internet, other early worldwide computer networks included Usenet and FidoNet from the 1980s, both of which were used to support distributed discussion systems. The study of distributed computing became its own branch of computer science in the late 1970s and early 1980s. The first conference in the field, the Symposium on Principles of Distributed Computing (PODC), dates back to 1982, and its European counterpart, the International Symposium on Distributed Computing (DISC), was first held in 1985. The first distributed computing programs were a pair of programs called Creeper and Reaper, which made their way through the nodes of ARPANET in the 1970s. Creeper came first: a worm program that used the idle CPU cycles of processors in ARPANET to copy itself onto the next system and then delete itself from the previous one. It was later modified to remain on all previous computers, and Reaper was created to travel through the same network and delete all copies of Creeper. In this way Creeper and Reaper were the first infectious computer programs and are often thought of as the first network viruses. They did no damage, however, to the computers they passed through, and were instrumental in exploring the possibility of making use of idle computational power.
8
History 1985 – 1995 Massively parallel architectures start rising, and message-passing interfaces and other libraries are developed. Bandwidth was a big problem. The first Internet-based distributed computing project was started in 1988 by the DEC System Research Center. The project sent tasks to volunteers through e-mail; they would run these programs during idle time, then send the results back to DEC and get a new task. The project worked to factor large numbers and by 1990 had about 100 users. The most prominent group, considered the first to actually use the Internet to distribute data for calculation and collect the results, was a project founded in 1997 called distributed.net. It used independently owned computers as DEC had, but allowed users to download the program that would utilize their idle CPU time instead of e-mailing it to them. Distributed.net completed several cryptology challenges by RSA Labs as well as other research facilities with the help of thousands of users.
9
History 1995 – Today Cluster/grid architecture increasingly dominant
Special node machines eschewed in favor of COTS technologies; web-wide cluster software. Google takes this to the extreme (thousands of nodes per cluster). Commercial off-the-shelf (COTS) is a term for software or hardware, generally technology or computer products, that are ready-made and available for sale, lease, or license to the general public. They are often used as alternatives to in-house developments or one-off government-funded developments. The use of COTS is being mandated across many government and business programs, as they may offer significant savings in procurement and maintenance. However, since COTS software specifications are written by external sources, government agencies are sometimes wary of these products because they fear that future changes to the product will not be under their control. The project that truly popularized distributed computing and showed that it could work was an effort by the Search for Extraterrestrial Intelligence (SETI) at the University of California at Berkeley. The project was started in May 1999 to analyze the radio signals being collected by the Arecibo Radio Telescope in Puerto Rico. It has gained over three million independent users who volunteer their idle computers to search for signals that may not have originated from Earth. This project has really brought the field to light, and other groups and companies are quickly following its lead.
10
Goal
Making resources accessible: data sharing and device sharing.
Distribution transparency: access, location, migration, relocation, replication, concurrency, failure.
Communication: make human-to-human communication easier, e.g., electronic mail.
Flexibility: spread the workload over the available machines in the most cost-effective way, coordinate the use of shared resources, and solve large computational problems. Alternatively, each computer may have its own user with individual needs, and the purpose of the distributed system is to coordinate the use of shared resources or provide communication services to the users.
From the Tanenbaum book:
Making resources accessible. The main goal of a distributed system is to make it easy for users (and applications) to access remote resources, and to share them in a controlled and efficient way. Resources can be just about anything; typical examples include printers, computers, storage facilities, data, files, Web pages, and networks, to name just a few. There are many reasons for wanting to share resources. One obvious reason is economics: for example, it is cheaper to let a printer be shared by several users in a small office than to buy and maintain a separate printer for each user. Likewise, it makes economic sense to share costly resources such as supercomputers, high-performance storage systems, imagesetters, and other expensive peripherals.
Distribution transparency. An important goal of a distributed system is to hide the fact that its processes and resources are physically distributed across multiple computers. A distributed system that is able to present itself to users and applications as if it were only a single computer system is said to be transparent.
Openness. Another important goal of distributed systems is openness. An open distributed system offers services according to standard rules that describe the syntax and semantics of those services. For example, in computer networks, standard rules govern the format, contents, and meaning of messages sent and received; such rules are formalized in protocols. In distributed systems, services are generally specified through interfaces, which are often described in an Interface Definition Language (IDL). Interface definitions written in an IDL nearly always capture only the syntax of services: they specify precisely the names of the functions that are available, together with the types of the parameters, return values, possible exceptions that can be raised, and so on. The hard part is specifying precisely what those services do, that is, the semantics of interfaces; in practice, such specifications are given informally by means of natural language.
Scalability. Worldwide connectivity through the Internet is rapidly becoming as common as being able to send a postcard to anyone anywhere around the world. With this in mind, scalability is one of the most important design goals for developers of distributed systems. Scalability of a system can be measured along at least three different dimensions (Neuman, 1994). First, a system can be scalable with respect to its size, meaning that we can easily add more users and resources to the system. Second, a geographically scalable system is one in which the users and resources may lie far apart. Third, a system can be administratively scalable, meaning that it can still be easy to manage even if it spans many independent administrative organizations.
Unfortunately, a system that is scalable in one or more of these dimensions often exhibits some loss of performance as the system scales up.
11
Characteristics Resource Sharing Openness Concurrency Scalability
Fault Tolerance Transparency
Resource sharing: the ability to use any hardware, software, or data anywhere in the system. Resources in a distributed system, unlike in a centralized one, are physically encapsulated within one of the computers and can only be accessed from the others by communication. It is the resource manager that offers a communication interface enabling the resource to be accessed, manipulated, and updated reliably and consistently. There are mainly two models of resource managers: the client/server model and the object-based model. The Object Management Group uses the latter in CORBA, in which any resource is treated as an object that encapsulates the resource by means of operations that users can invoke.
Openness: concerned with extensions and improvements of distributed systems. New components have to be integrated with existing components so that the added functionality becomes accessible from the distributed system as a whole. Hence, the static and dynamic properties of the services provided by components have to be published in detailed interfaces.
Concurrency: arises naturally in distributed systems from the separate activities of users, the independence of resources, and the location of server processes in separate computers. Components in distributed systems execute in concurrent processes, and these processes may access the same resource concurrently; thus server processes must coordinate their actions to ensure system integrity and data integrity.
Scalability: concerns the ease of increasing the scale of the system (e.g., the number of processors) so as to accommodate more users and/or to improve the corresponding responsiveness of the system. Ideally, components should not need to be changed when the scale of the system increases.
Fault tolerance: concerns the reliability of the system, so that in case of a failure of hardware, software, or network the system continues to operate properly without significantly degrading its performance. It may be achieved by recovery (software) and redundancy (both software and hardware).
Transparency: hides the complexity of the distributed system from users and application programmers, who can then perceive the system as a whole rather than as a collection of cooperating components, reducing the difficulties in design and in operation. This characteristic is orthogonal to the others. There are many aspects of transparency, including access, location, concurrency, replication, failure, migration, performance, and scaling transparency.
12
Architecture Client-server 3-tier architecture N-tier architecture
Loose coupling or tight coupling; peer-to-peer; space-based.
Client-server: smart client code contacts the server for data, then formats and displays it to the user. Input at the client is committed back to the server when it represents a permanent change.
3-tier architecture: three-tier systems move the client intelligence to a middle tier so that stateless clients can be used. This simplifies application deployment. Most web applications are 3-tier.
N-tier architecture: N-tier refers typically to web applications which further forward their requests to other enterprise services. This type of application is the one most responsible for the success of application servers.
Tightly coupled (clustered): refers typically to a cluster of machines that closely work together, running a shared process in parallel. The task is subdivided into parts that are worked individually by each machine and then put back together to produce the final result.
Peer-to-peer: an architecture with no special machine or machines that provide a service or manage the network resources; instead, all responsibilities are uniformly divided among all machines, known as peers. Peers can serve both as clients and as servers.
Space-based: refers to an infrastructure that creates the illusion (virtualization) of a single address space. Data are transparently replicated according to application needs; decoupling in time, space, and reference is achieved.
Another basic aspect of distributed computing architecture is the method of communicating and coordinating work among concurrent processes. Through various message-passing protocols, processes may communicate directly with one another, typically in a master/slave relationship, as in the socket sketch below. Alternatively, a “database-centric” architecture can enable distributed computing to be done without any form of direct inter-process communication, by utilizing a shared database.
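To make the client-server pattern concrete, here is a minimal sketch in Java using plain TCP sockets (the port number and message are arbitrary choices, not from the slides): the client sends a request, and the server formats and returns a reply.

import java.io.*;
import java.net.*;

public class EchoDemo {
    public static void main(String[] args) throws Exception {
        // Bind the server socket first so the client cannot connect too early.
        ServerSocket listener = new ServerSocket(5000);

        // Server: accept one connection and echo one line back.
        Thread server = new Thread(() -> {
            try (Socket conn = listener.accept();
                 BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
                 PrintWriter out = new PrintWriter(conn.getOutputStream(), true)) {
                out.println("server echo: " + in.readLine());
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        server.start();

        // Client: connect, send a request, print the reply.
        try (Socket sock = new Socket("localhost", 5000);
             PrintWriter out = new PrintWriter(sock.getOutputStream(), true);
             BufferedReader in = new BufferedReader(new InputStreamReader(sock.getInputStream()))) {
            out.println("hello");
            System.out.println(in.readLine());
        }
        server.join();
        listener.close();
    }
}

The same request/reply shape underlies the 3-tier and N-tier variants; they differ only in how many such hops a request makes before it reaches the data.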
13
Application Examples of commercial applications:
Database management systems; distributed computing using mobile agents; the local intranet; the Internet (World Wide Web); Java Remote Method Invocation (RMI).
14
Distributed Computing Using Mobile Agents
Mobile agents can wander around a network, using free resources for their own computations.
15
Local Intranet A portion of the Internet that is separately administered and supports internal sharing of resources (file/storage systems and printers) is called a local intranet.
16
Internet The Internet is a global system of interconnected computer networks that use the standardized Internet Protocol Suite (TCP/IP).
17
JAVA RMI Embedded in the Java language:
Object variant of remote procedure call; adds naming compared with RPC (Remote Procedure Call); restricted to Java environments.
Java Remote Method Invocation (RMI), a simple and powerful network object transport mechanism, provides a way for a Java program on one machine to communicate with objects residing in different address spaces. Some Java parallel computing environments, such as JavaParty, use RMI for communication; it is also the foundation of Jini technology, discussed in a later section. RMI is an implementation of the distributed object programming model, comparable with CORBA, but simpler and specialized to the Java language.
Goals: a primary goal for the RMI designers was to allow programmers to develop distributed Java programs with the same syntax and semantics used for non-distributed programs. To do this, they had to carefully map how Java classes and objects work in a single Java Virtual Machine (JVM) to a new model of how classes and objects would work in a distributed (multiple-JVM) computing environment.
Java RMI architecture: the design goal for the RMI architecture was to create a Java distributed object model that integrates naturally into the Java programming language and the local object model. The RMI architects succeeded, creating a system that extends the safety and robustness of the Java architecture to the distributed computing world. Important parts of the RMI architecture are the stub class, object serialization, and the server-side run-time system. The stub class implements the remote interface and is responsible for marshaling and unmarshaling the data and managing the network connection to a server. An instance of the stub class is needed on each client; a local method invocation on the stub class is made whenever a client invokes a method on a remote object. Java has a general mechanism for converting objects into streams of bytes that can later be read back into an arbitrary JVM. This mechanism, called object serialization, is essential to Java's RMI implementation: it provides a standardized way to encode all the information into a byte stream suitable for streaming to some type of network or to a file system. In order to provide this functionality, an object must implement the Serializable interface. The server-side run-time system is responsible for listening for invocation requests on a suitable IP port and dispatching them to the proper remote object on the server. Since RMI is designed for Web-based client-server applications over slow networks, it is not clear that it is suitable for high-performance distributed computing environments with low latency and high bandwidth. A better serialization would be needed, since Java's object serialization often takes at least 25% and up to 50% of the time [50] needed for a remote invocation.
Stubs: a stub is a piece of code emulating a called function - a simple routine that takes the place of the real routine and simulates the activity of a missing component, much as a sub-module does when called by the main module.
Skeletons: the server-side counterparts of stubs. Together, stubs and skeletons marshal and unmarshal the messages that are sent and received on the client and server sides.
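The following is a minimal sketch of RMI in practice (the EchoService name and method are invented for illustration); it uses only the standard java.rmi API described above:

import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// The remote interface: every remote method must declare RemoteException.
interface EchoService extends Remote {
    String echo(String msg) throws RemoteException;
}

public class EchoServer implements EchoService {
    public String echo(String msg) { return "echo: " + msg; }

    public static void main(String[] args) throws Exception {
        EchoServer impl = new EchoServer();
        // Exporting generates the stub that marshals calls over the network.
        EchoService stub = (EchoService) UnicastRemoteObject.exportObject(impl, 0);
        Registry registry = LocateRegistry.createRegistry(1099); // default RMI port
        registry.rebind("EchoService", stub); // the naming step that plain RPC lacks
        System.out.println("EchoService bound and waiting");
    }
}

A client then looks the stub up by name and calls it like a local object:

EchoService svc = (EchoService) LocateRegistry.getRegistry("localhost", 1099).lookup("EchoService");
System.out.println(svc.echo("hi"));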
18
Categories of Applications in distributed computing
Science, Life Sciences, Cryptography, Internet, Financial, Mathematics, Language, Art, Puzzles/Games, Miscellaneous, Distributed Human Projects, Collaborative Knowledge Bases, Charity.
19
Advantages
Economics: computers harnessed together give a better price/performance ratio than mainframes.
Speed: a distributed system may have more total computing power than a mainframe.
Inherent distribution of applications: some applications are inherently distributed, e.g., an ATM banking application.
Reliability: if one machine crashes, the system as a whole can still survive, given multiple server machines and multiple storage devices (redundancy).
Extensibility and incremental growth: it is possible to gradually scale up (in terms of processing power and functionality) by adding more resources, both hardware and software, without disruption to the rest of the system.
Distributed custodianship: the National Spatial Data Infrastructure (NSDI) calls for a system of partnerships to produce a future national framework for data as a patchwork quilt of information collected at different scales and produced and maintained by different governments and agencies. NSDI will require novel arrangements for framework management, area integration, and data distribution. This research will examine the basic feasibility and likely effects of such distributed custodianship in the context of distributed computing architectures, and will determine the institutional structures that must evolve to support such custodianship.
Data integration: this research will contribute to the integration of geographic information and GISs into the mainstream of future libraries, which are likely to have full digital capacity. The digital libraries of the future will offer services for manipulating and processing data as well as for simple searches and retrieval.
Missed opportunities: by anticipating the impact that a rapidly advancing technology will have on GISs, this research will allow the GIS community to take better advantage of the opportunities that the technology offers.
20
Disadvantages
Complexity: lack of experience in designing and implementing distributed systems, e.g., which platform (hardware and OS) to use, which language to use, etc.
Network problems: if the network underlying a distributed system saturates or goes down, the distributed system is effectively disabled, negating most of its advantages.
Security: security is a major hazard, since easy access to data means easy access to secret data as well.
21
Issues and Challenges
Heterogeneity of components: variety or differences in computer hardware, networks, operating systems, programming languages, and implementations by different developers. All differences in representation must be dealt with in order to exchange messages; for example, calls for exchanging messages in UNIX differ from those in Windows. Distributed systems must be constructed from a variety of different networks, operating systems, computer hardware, and programming languages; the Internet communication protocols mask the differences in networks, and middleware can deal with the other differences.
Openness: the system can be extended and re-implemented in various ways, which cannot be achieved unless the specification and documentation are made available to software developers. Distributed systems should be extensible; the first step is to publish the interfaces of the components, but the integration of components written by different programmers is a real challenge. The greatest challenge for designers is tackling the complexity of distributed systems designed by different people.
22
Issues and Challenges cont…
Transparency: the aim is to make certain aspects of distribution invisible to application programmers, letting them focus on the design of their particular application. They need not be concerned with where components are located or the details of how the system operates, whether replicated or migrated. Failures can be presented to application programmers in the form of exceptions, which must be handled.
23
Issues and Challenges cont…
Transparency: this concept can be summarized as follows.
Location transparency: in a true distributed system, users cannot tell where hardware and software resources such as CPUs, printers, files, and databases are located.
Migration transparency: resources must be free to move from one location to another without having their names change.
Replication transparency: the OS is free to make additional copies of files and other resources on its own without the users noticing; e.g., the servers can decide by themselves to replicate any file on any or all servers, without the users having to know about it.
Concurrency transparency: the users will not notice the existence of other users.
Parallelism transparency: can be regarded as the holy grail for distributed systems designers.
24
Issues and Challenges cont…
Security: security for information resources in a distributed system has three components:
a. Confidentiality: protection against disclosure to unauthorized individuals.
b. Integrity: protection against alteration or corruption.
c. Availability: protection against interference with the means of access to the resources.
The challenge is to send sensitive information over the Internet in a secure manner and to identify a remote user or other agent correctly. Encryption can be used to provide adequate protection of shared resources and to keep sensitive information secret when it is transmitted in messages over a network. Denial-of-service attacks are still a problem.
25
Issues and Challenges cont..
Scalability: distributed computing operates at many different scales, ranging from a small intranet to the Internet. A system is scalable if it remains effective when there is a significant increase in the number of resources and users. The challenges are:
a. controlling the cost of physical resources;
b. controlling the performance loss;
c. preventing software resources from running out;
d. avoiding performance bottlenecks.
Distributed computing is scalable if the cost of adding a user is a constant amount in terms of the resources that must be added. The algorithms used to access shared data should avoid performance bottlenecks, and data should be structured hierarchically to get the best access times.
26
Issues and Challenges cont…
Failure handling: failures in a distributed system are partial - some components fail while others can function - and that is why handling failures is difficult.
a. Detecting failures: some failures cannot be detected and can only be suspected.
b. Masking failures: hiding failures is not guaranteed in the worst case.
Any process, computer, or network may fail independently of the others; therefore each component needs to be aware of the possible ways in which the components it depends on may fail, and be designed to deal with each of those failures appropriately.
Concurrency: where applications and services process requests concurrently, their operations may conflict with one another and produce inconsistent results. Each resource must be designed to be safe in a concurrent environment.
27
Conclusion In this age of optimization, everybody is trying to get optimized output from limited resources, and distributed computing is one of the most efficient ways to achieve that optimization. Distributed computing is everywhere: intranets, the Internet, and mobile ubiquitous computing (laptops, PDAs, pagers, smart watches, hi-fi systems). It deals with hardware and software systems that contain more than one processing or storage element and run concurrently. The main motivating factor is resource sharing: files, printers, web pages, database records, and so on. In distributed computing the actual task is modularized and distributed among various computer systems, which not only increases the efficiency of the task but also reduces the total time required to complete it. Grid computing and cloud computing are forms of distributed computing. The advanced form of distributed computing through mobile agents is setting a new landmark in this technology: a mobile agent is a process that can transport its state from one environment to another, with its data intact, and perform appropriately in the new environment.
28
Grid Computing Grid computing is a form of distributed computing whereby a “super and virtual computer” is composed of a cluster of networked, loosely coupled computers acting in concert to perform very large tasks. Grid computing (Foster and Kesselman, 1999) is a growing technology that facilitates the execution of large-scale, resource-intensive applications on geographically distributed computing resources. It facilitates flexible, secure, coordinated large-scale resource sharing among dynamic collections of individuals, institutions, and resources, and enables communities (“virtual organizations”) to share geographically distributed resources as they pursue common goals.
29
Criteria for a Grid:
Coordinates resources that are not subject to centralized control.
Uses standard, open, general-purpose protocols and interfaces.
Delivers nontrivial qualities of service.
Benefits:
Exploit underutilized resources; resource load balancing; virtualize resources across an enterprise; data grids and compute grids; enable collaboration for virtual organizations.
30
Grid Applications Data and computationally intensive applications:
This technology has been applied to computationally intensive scientific, mathematical, and academic problems such as drug discovery, economic forecasting, and seismic analysis, as well as to back-office data processing in support of e-commerce. A chemist may utilize hundreds of processors to screen thousands of compounds per hour; teams of engineers worldwide pool resources to analyze terabytes of structural data; meteorologists seek to visualize and analyze petabytes of climate data with enormous computational demands.
Resource sharing: computers, storage, sensors, networks, … Sharing is always conditional: issues of trust, policy, negotiation, payment, …
Coordinated problem solving: distributed data analysis, computation, collaboration, …
31
Grid Topologies
Intragrid: a local grid within an organization; trust based on personal contracts.
Extragrid: resources of a consortium of organizations connected through a (virtual) private network; trust based on business-to-business contracts.
Intergrid: global sharing of resources through the Internet; trust based on certification.
32
Computational Grid “A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities.” - “The Grid: Blueprint for a New Computing Infrastructure”, Kesselman & Foster. Example: Science Grid (US Department of Energy).
33
Data Grid A data grid is a grid computing system that deals with data - the controlled sharing and management of large amounts of distributed data. The data grid is the storage component of a grid environment. Scientific and engineering applications require access to large amounts of data, and often this data is widely distributed. A data grid provides seamless access to the local or remote data required to complete compute-intensive calculations. Examples: the Biomedical Informatics Research Network (BIRN), the Southern California Earthquake Center (SCEC).
34
Methods of Grid Computing
Distributed Supercomputing
High-Throughput Computing
On-Demand Computing
Data-Intensive Computing
Collaborative Computing
Logistical Networking
35
Distributed Supercomputing
Combining multiple high-capacity resources on a computational grid into a single, virtual distributed supercomputer. Tackle problems that cannot be solved on a single system.
36
High-Throughput Computing
Uses the grid to schedule large numbers of loosely coupled or independent tasks, with the goal of putting unused processor cycles to work (see the sketch below).
On-Demand Computing
Uses grid capabilities to meet short-term requirements for resources that are not locally accessible. Models real-time computing demands.
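The essence of high-throughput computing - a bag of independent tasks farmed out to whatever cycles are free - can be sketched in plain Java, with a thread pool standing in for the grid scheduler (the task itself is an arbitrary placeholder computation):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BagOfTasks {
    public static void main(String[] args) throws Exception {
        // The pool plays the role of the grid: a set of otherwise idle workers.
        ExecutorService grid =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        List<Future<Long>> results = new ArrayList<>();
        for (int task = 0; task < 1000; task++) {
            final int seed = task;
            // Tasks are loosely coupled: no communication, any order, any worker.
            results.add(grid.submit(() -> expensiveComputation(seed)));
        }
        long total = 0;
        for (Future<Long> f : results) total += f.get(); // gather the results
        System.out.println("total = " + total);
        grid.shutdown();
    }

    // Placeholder for a real unit of work (parameter sweep, key search, ...).
    static long expensiveComputation(int seed) {
        long acc = seed;
        for (int i = 0; i < 1_000_000; i++) {
            acc = acc * 6364136223846793005L + 1442695040888963407L;
        }
        return acc;
    }
}

Because the tasks never talk to each other, throughput scales with however many workers the "grid" can offer - which is exactly why idle desktop cycles are usable.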
37
Collaborative Computing
Concerned primarily with enabling and enhancing human-to-human interactions; applications are often structured in terms of a virtual shared space.
Data-Intensive Computing
The focus is on synthesizing new information from data that is maintained in geographically distributed repositories, digital libraries, and databases. Particularly useful for distributed data mining.
38
Logistical Networking
Logistical networks focus on exposing storage resources inside networks by optimizing the global scheduling of data transport and data storage, providing high-level services for grid applications. This contrasts with traditional networking, which does not explicitly model storage resources in the network. It is called “logistical” because of the analogy it bears with systems of warehouses, depots, and distribution channels.
39
P2P Computing vs Grid Computing
They differ in their target communities: a grid system deals with a more complex, more powerful, more diverse, and more highly interconnected set of resources than P2P, organized into virtual organizations (VOs).
40
A typical view of Grid environment
User - Resource Broker - Grid Resources - Grid Information Service:
1. A user sends a computation- or data-intensive application to the global grid in order to speed up its execution.
2. A resource broker distributes the jobs in the application to the grid resources, based on the user's QoS requirements and the details of available grid resources, for further execution.
3. The grid information service collects the details of the available grid resources and passes them to the resource broker.
4. Grid resources (clusters, PCs, supercomputers, databases, instruments, etc.) in the global grid execute the user's jobs and return the computation results.
A broker's matchmaking step is sketched below.
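A resource broker's core matchmaking step might look like the following sketch (all class and field names here are hypothetical stand-ins, not part of any real grid middleware): rank the resources reported by the information service against the user's QoS requirements and dispatch the job accordingly.

import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical types for illustration only.
record Resource(String name, int freeCpus, double costPerHour) {}
record Job(String id, int cpusNeeded, double maxCost) {}

public class SimpleBroker {
    // Pick the cheapest resource that satisfies the job's QoS constraints.
    static Optional<Resource> match(Job job, List<Resource> catalog) {
        return catalog.stream()
                .filter(r -> r.freeCpus() >= job.cpusNeeded())
                .filter(r -> r.costPerHour() <= job.maxCost())
                .min(Comparator.comparingDouble(Resource::costPerHour));
    }

    public static void main(String[] args) {
        // In a real grid these details would come from the information service.
        List<Resource> catalog = List.of(
                new Resource("clusterA", 64, 1.2),
                new Resource("pcFarm", 8, 0.3));
        Job job = new Job("job-1", 4, 0.5);
        match(job, catalog).ifPresentOrElse(
                r -> System.out.println(job.id() + " -> " + r.name()),
                () -> System.out.println(job.id() + " rejected: no resource meets QoS"));
    }
}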
41
Grid Middleware Grids are typically managed by gridware -
a special type of middleware that enables sharing and managing grid components based on user requirements and resource attributes (e.g., capacity, performance). It is software that connects other software components or applications to provide the following functions:
Run applications on suitable available resources - brokering, scheduling.
Provide uniform, high-level access to resources - semantic interfaces, Web services, service-oriented architectures.
Address inter-domain issues of security, policy, etc. - federated identities.
Provide application-level status monitoring and control.
42
Middleware
Globus – Chicago Univ.
Condor – Wisconsin Univ. – high-throughput computing.
Legion – Virginia Univ. – virtual workspaces, collaborative computing.
IBP (Internet Backplane Protocol) – Tennessee Univ. – logistical networking.
NetSolve – solving scientific problems in heterogeneous environments – high-throughput and data-intensive.
43
Two Key Grid Computing Groups
The Globus Alliance (globus.org): composed of people from Argonne National Labs, the University of Chicago, the University of Southern California Information Sciences Institute, the University of Edinburgh, and others. The OGSA/OGSI standards were initially proposed by the Globus group.
The Global Grid Forum: heavy involvement of academic groups and industry (e.g., IBM Grid Computing, HP, United Devices, Oracle, the UK e-Science Programme, US DOE, US NSF, Indiana University, and many others). Process: meets three times annually; solicits involvement from industry, research groups, and academics.
44
Some of the Major Grid Projects
Name – URL/Sponsor – Focus:
EuroGrid, Grid Interoperability (GRIP) – eurogrid.org, European Union – Create technology for remote access to supercomputing resources and simulation codes; in GRIP, integrate with the Globus Toolkit™.
Fusion Collaboratory – fusiongrid.org, DOE Office of Science – Create a national computational collaboratory for fusion research.
Globus Project™ – globus.org, DARPA, DOE, NSF, NASA, Microsoft – Research on grid technologies; development and support of the Globus Toolkit™; application and deployment.
GridLab – gridlab.org – Grid technologies and applications.
GridPP – gridpp.ac.uk, U.K. eScience – Create and apply an operational grid within the U.K. for particle physics research.
Grid Research Integration Development & Support Center – grids-center.org, NSF – Integration, deployment, and support of the NSF Middleware Infrastructure for research and education.
45
Grid Architecture
46
The Hourglass Model Focus on architecture issues
Propose a set of core services as basic infrastructure, used to construct high-level, domain-specific solutions (diverse).
Design principles: keep participation cost low; enable local control; support adaptation.
(“IP hourglass” model: diverse applications and global services at the top, a narrow neck of core services, local OS at the base.)
47
Layered Grid Architecture (By Analogy to Internet Architecture)
Grid layers (with the Internet analogy of application, transport, internet, and link layers):
Collective - “coordinating multiple resources”: ubiquitous infrastructure services, application-specific distributed services.
Resource - “sharing single resources”: negotiating access, controlling use.
Connectivity - “talking to things”: communication (Internet protocols) and security.
Fabric - “controlling things locally”: access to, and control of, resources.
We define grid architecture in terms of a layered collection of protocols. The fabric layer includes the protocols and interfaces that provide access to the resources being shared, including computers, storage systems, datasets, programs, and networks. This layer is a logical view rather than a physical one: for example, the view of a cluster with a local resource manager is defined by the local resource manager, not the cluster hardware; likewise, the fabric provided by a storage system is defined by the file system available on that system, not the raw disks or tapes. The connectivity layer defines the core protocols required for grid-specific network transactions: the IP protocol stack (system-level application protocols such as DNS, RSVP, and routing, plus the transport and internet layers), as well as core grid security protocols for authentication and authorization. The resource layer defines protocols to initiate and control the sharing of (local) resources; services defined at this level include the gatekeeper and GRIS, along with some user-oriented application protocols from the Internet protocol suite, such as file transfer. The collective layer defines protocols that provide system-oriented capabilities expected to be wide-scale in deployment and generic in function, including GIIS, bandwidth brokers, and resource brokers. The application layer defines protocols and services that are parochial in nature, targeted toward a specific application domain or class of applications.
48
Example: Data Grid Architecture
App - discipline-specific data grid application.
Collective (application-specific) - coherency control, replica selection, task management, virtual data catalog, virtual data code catalog, …
Collective (generic) - replica catalog, replica management, co-allocation, certificate authorities, metadata catalogs, …
Resource - access to data, access to computers, access to network performance data, …
Connectivity - communication, service discovery (DNS), authentication, authorization, delegation.
Fabric - storage systems, clusters, networks, network caches, …
49
Simulation tools
GridSim – job scheduling
SimGrid – single-client multi-server scheduling
Bricks – scheduling
GangSim – Ganglia-based VO simulation
OptorSim – data grid simulations
G3S (Grid Security Services Simulator) – security services
50
Simulation tool GridSim is a Java-based toolkit for modeling and simulation of distributed resource management and scheduling for conventional grid environments. GridSim is based on SimJava, a general-purpose discrete-event simulation package implemented in Java. All components in GridSim communicate with each other through message-passing operations defined by SimJava.
51
Salient features of GridSim
It allows modeling of heterogeneous types of resources.
Resources can be modeled as operating under space-shared or time-shared mode.
Resource capability can be defined in the form of MIPS (Million Instructions Per Second) benchmark ratings.
Resources can be located in any time zone.
Weekends and holidays can be mapped, depending on a resource's local time, to model non-grid (local) workload.
Resources can be booked for advance reservation.
Applications with different parallel application models can be simulated.
52
Salient features of GridSim (continued)
Application tasks can be heterogeneous, and they can be CPU- or I/O-intensive.
There is no limit on the number of application jobs that can be submitted to a resource.
Multiple user entities can submit tasks for execution simultaneously on the same resource, which may be time-shared or space-shared. This feature helps in building schedulers that can use different market-driven economic models for selecting services competitively.
Network speed between resources can be specified.
It supports simulation of both static and dynamic schedulers.
Statistics of all or selected operations can be recorded, and they can be analyzed using GridSim's statistics analysis methods.
The space-shared/time-shared distinction is illustrated in the sketch below.
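To make the space-shared versus time-shared modes concrete, here is a small self-contained Java sketch - not GridSim code, just an illustration of the two policies GridSim can model. A space-shared resource runs jobs first-come-first-served, one per CPU, to completion; a time-shared resource slices the CPU among all resident jobs.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class AllocationPolicies {
    // Space-shared: jobs run one per CPU, FCFS; a job holds its CPU until done.
    static double spaceShared(double[] jobLengths, int cpus) {
        PriorityQueue<Double> freeAt = new PriorityQueue<>(); // CPU free times
        for (int i = 0; i < cpus; i++) freeAt.add(0.0);
        double makespan = 0;
        for (double len : jobLengths) {
            double start = freeAt.poll();      // earliest available CPU
            double end = start + len;
            freeAt.add(end);
            makespan = Math.max(makespan, end);
        }
        return makespan;
    }

    // Time-shared (ideal processor sharing on one CPU for simplicity):
    // when n jobs are resident, each progresses at rate 1/n.
    static double timeShared(double[] jobLengths) {
        List<Double> remaining = new ArrayList<>();
        for (double len : jobLengths) remaining.add(len);
        double clock = 0;
        while (!remaining.isEmpty()) {
            int n = remaining.size();
            double shortest = Collections.min(remaining);
            clock += shortest * n;             // time until the shortest job finishes
            List<Double> next = new ArrayList<>();
            for (double r : remaining) {
                double left = r - shortest;
                if (left > 1e-12) next.add(left);
            }
            remaining = next;
        }
        return clock;
    }

    public static void main(String[] args) {
        double[] jobs = {4, 2, 6};
        System.out.println("space-shared on 2 CPUs finishes at t=" + spaceShared(jobs, 2));
        System.out.println("time-shared on 1 CPU finishes at t=" + timeShared(jobs));
    }
}

Running it shows the trade-off directly: the same three jobs finish at t=8 under space sharing on two CPUs, but at t=12 under time sharing on one CPU, with every job's completion delayed by its neighbors.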
53
A Modular Architecture for GridSim Platform and Components (bottom-up):
Distributed resources (virtual machine layer): PCs, workstations, SMPs, clusters.
Basic discrete-event simulation infrastructure: SimJava, distributed SimJava.
GridSim toolkit: resource modeling and simulation (single CPU, SMPs, clusters, load, network, reservation); application modeling, resource entities, information services, job management, resource allocation, statistics.
Grid resource brokers or schedulers.
Application, user, and grid-scenario inputs (application configuration, resource configuration, user requirements, grid scenario) and results/output.
54
Web 2.0, Clouds, and Internet of Things
HPC: High-Performance Computing; HTC: High-Throughput Computing; P2P: Peer-to-Peer; MPP: Massively Parallel Processors.
55
What is a Service Oriented Architecture?
56
What is a Service Oriented Architecture (SOA)?
A method of design, deployment, and management of both applications and the software infrastructure where:
All software is organized into business services that are network-accessible and executable.
Service interfaces are based on public standards for interoperability.
57
Key Characteristics of SOA
Quality of service, security, and performance are specified; the software infrastructure is responsible for managing them.
Services are cataloged and discoverable.
Data are cataloged and discoverable.
Protocols use only industry standards.
58
What is a “Service”? A Service is a reusable component.
A Service changes business data from one state to another.
A Service is the only way data is accessed.
If you can describe a component in WSDL, it is a Service (see the sketch below).
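For instance, the following minimal component can be described in WSDL, and so counts as a service in the sense above. It is a sketch using the standard JAX-WS annotations (the service and method names are invented; on JDK 11+ the JAX-WS libraries must be added to the classpath); publishing the endpoint generates the WSDL automatically.

import javax.jws.WebService;
import javax.xml.ws.Endpoint;

// A reusable component whose interface is expressible in WSDL.
@WebService
public class OrderStatusService {
    // A published, network-accessible operation over business data.
    public String getStatus(String orderId) {
        return "Order " + orderId + ": SHIPPED"; // stand-in for a real lookup
    }

    public static void main(String[] args) {
        // The generated WSDL appears at http://localhost:8080/orders?wsdl
        Endpoint.publish("http://localhost:8080/orders", new OrderStatusService());
        System.out.println("Service published");
    }
}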
59
Information Technology is Not SOA
(Figure: a layered view - business mission, information management, information systems, systems design, information technology, computing & communications - with SOA placed at the systems-design level rather than the technology level.)
60
Why Getting SOA Will be Difficult
Managing for projects: software: years; hardware: years; communications: years; project managers: years; reliable funding: years; user turnover: 30%/year; security risks: 1 minute or less.
Managing for SOA: data: forever; infrastructure: 10+ years.
61
Why Managing Business Systems is Difficult?
The 40 million lines of code in Windows XP are unknowable.
Testing an application (3 million lines) requires more than 10^15 tests.
The probability of correct data entry for a supply item is <65%.
There are >100 formats that identify a person in DoD.
Output per office worker: >30 e-messages per day.
62
How to View Organizing for SOA
(Figure: organizing levels for SOA, from “variety here” at the top to “stability here” at the base, with security barriers between levels. Personal level: private applications and files; privacy and the individual. Local level: graphic InfoWindow, personal tools, inquiry languages; customized applications, prototyping tools, local applications and files. Application level: applications development and maintenance. Business level: Service A, Service B (OSD). Process level: functional processes A-D. Enterprise level: corporate policy, corporate standards, reference models, data management and tools, integrated systems configuration database, shared computing and telecommunications. Global level: industry standards, commercial off-the-shelf products and services.)
63
SOA Must Reflect Timing
(Figure: the same levels arranged from long-term stability and technology complexity to short-term adaptability and simplicity. Global: industry standards, commercial off-the-shelf products and services. Enterprise: corporate policy, corporate standards, reference models, data management and tools, integrated systems configuration database, shared computing and telecommunications, security and survivability. Process: functional processes A-D. Business: Business A, Business B, infrastructure support. Application: applications development and maintenance. Local: customized applications, prototyping tools, local applications and files. Personal: graphic InfoWindow, personal tools, inquiry languages; private applications and files.)
64
SOA Must Reflect Conflicting Interests
(Figure: conflicting interests at the personal, local, organization, mission, and enterprise levels.)
65
Organization of Infrastructure Services
(Enterprise information)
Data Services
Security Services
Computing Services
Communication Services
Application Services
66
Organization of Data Services
Discovery Services
Management Services
Collaboration Services
Interoperability Services
Semantic Services
67
Data Interoperability Policies
Data are an enterprise resource.
Single-point entry of unique data.
Enterprise certification of all data definitions.
Data stewardship defines data custodians.
Zero defects at the point of entry.
De-conflict data at the source, not at higher levels.
Data aggregations come from source data, not from reports.
68
Data Concepts Data Element Definition Data Element Registry
Data element definition: text associated with a unique data element within a data dictionary that describes the data element, gives it a specific meaning, and differentiates it from other data elements. A definition is precise, concise, non-circular, and unambiguous (ISO/IEC metadata registry specification).
Data element registry: a label kept by a registration authority that describes a unique meaning and representation of data elements, including registration identifiers, definitions, names, value domains, syntax, ontology, and metadata attributes (ISO). Both concepts are illustrated in the sketch below.
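As a toy illustration of these two concepts, the sketch below models a registry entry in Java. The field names loosely follow the attributes listed above; this is illustrative only, not an implementation of the ISO/IEC metadata registry specification.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// A data element: a unique meaning plus a representation, per the registry idea above.
record DataElement(
        String registrationId,   // registration identifier
        String name,
        String definition,       // precise, concise, non-circular, unambiguous
        String valueDomain,      // e.g. "ISO 3166-1 alpha-2 country codes"
        String syntax) {}        // e.g. "char(2)"

public class DataElementRegistry {
    private final Map<String, DataElement> entries = new ConcurrentHashMap<>();

    // Single-point entry: the registry is the one authority for definitions.
    public void register(DataElement e) {
        if (entries.putIfAbsent(e.registrationId(), e) != null)
            throw new IllegalArgumentException("duplicate id: " + e.registrationId());
    }

    public DataElement lookup(String registrationId) {
        return entries.get(registrationId);
    }

    public static void main(String[] args) {
        DataElementRegistry reg = new DataElementRegistry();
        reg.register(new DataElement("DE-0001", "countryCode",
                "Two-letter code identifying a country", "ISO 3166-1 alpha-2", "char(2)"));
        System.out.println(reg.lookup("DE-0001"));
    }
}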
69
Data and Services Deployment Principles
Data, services, and applications belong to the enterprise.
Information is a strategic asset.
Data and applications cannot be coupled to each other.
Interfaces must be independent of implementation.
Data must be visible outside of the applications.
Semantics and syntax are defined by a community of interest.
Data must be understandable and trusted.
70
Organization of Security Services
Transfer Services
Protection Services
Certification Services
Systems Assurance
Authentication Services
71
Security Services = Information Assurance
Conduct attack/event response: ensure timely detection of and appropriate response to attacks; manage the measures required to minimize the network's vulnerability.
Secure information exchanges: secure the information exchanges that occur on the network with a level of protection matched to the risk of compromise.
Provide authorization and non-repudiation services: identify and confirm a user's authorization to access the network.
72
Organization of Computing Services
Facilities
Resource Planning
Control & Quality
Configuration Services
Financial Management
73
Computing Services Provide Adaptable Hosting Environments
Global facilities for hosting to the “edge”; virtual environments for data centers.
Distributed computing infrastructure: data storage and shared spaces for information sharing.
Shared computing infrastructure resources: access shared resources regardless of access device.
74
Organization of Communication Services
Interoperability Services
Spectrum Management
Connectivity Arrangements
Continuity of Services
Resource Management
75
Network Services Implementation
From point-to-point communications (push communications) to network-centric processes (pull communications). Data are posted to a shared space for retrieval. Network controls assure data synchronization and access security.
76
Communication Services
Provide information transport: transport information, data, and services anywhere; ensure transport between end-user devices and servers; expand the infrastructure for on-demand capacity.
77
Organization of Application Services
Component Repository
Code Binding Services
Maintenance Management
Portals
Experimental Services
78
Application Services and Tools
Provide common end-user interface tools: application generators, test suites, error identification, application components, and standard utilities. Common end-user interface tools include e-mail, collaboration tools, information dashboards, intranet portals, etc.
79
Example of Development Tools
Business Process Execution Language (BPEL) is an executable modeling language; through XML it enables code generation.
Traditional approach vs. BPEL approach:
- Hard-coded decision logic vs. externalized decision logic
- Developed by IT vs. modeled by business analysts
- Maintained by IT vs. maintained by policy managers
- Managed by IT vs. managed by IT
- Dependent upon custom logs vs. automatic logs and process capture
- Hard to modify and reuse vs. easy to modify and reuse
80
A Few Key SOA Protocols
Universal Description, Discovery, and Integration (UDDI) defines the publication and discovery of web service implementations.
The Web Services Description Language (WSDL) is an XML-based language that defines Web services.
SOAP (originally the Simple Object Access Protocol) is a key SOA messaging protocol in which a network node (the client) sends a request to another node (the server).
The Lightweight Directory Access Protocol (LDAP) is a protocol for querying and modifying directory services; a query sketch follows below.
Extract, Transform, and Load (ETL) is a process of moving data from a legacy system and loading it into an SOA application.
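As one concrete example, an LDAP query can be issued from Java through the standard JNDI API. A minimal sketch - the host, port, search base, and filter are placeholders, not values from the slides:

import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

public class LdapQuery {
    public static void main(String[] args) throws Exception {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://ldap.example.com:389"); // placeholder host
        DirContext ctx = new InitialDirContext(env);

        SearchControls controls = new SearchControls();
        controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
        // Find entries whose common name starts with "smith".
        NamingEnumeration<SearchResult> results =
                ctx.search("dc=example,dc=com", "(cn=smith*)", controls);
        while (results.hasMore()) {
            System.out.println(results.next().getNameInNamespace());
        }
        ctx.close();
    }
}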