Fundamentals of Grid Computing IBM Redbooks paper Viktors Berstis Presented by: Saeed Ghanbari
What is Grid Computing? The term Grid computing originated in the early 1990s as a metaphor for making computer power as easy to access as an electric power grid. –The definitive definition of a Grid is provided by Ian Foster in his article "What is the Grid?": Computing resources are not administered centrally. Open standards are used. Non-trivial quality of service is achieved. –Plaszczak/Wellner define Grid technology as "the technology that enables resource virtualization, on-demand provisioning, and service (resource) sharing between organizations." –IBM: "A Grid is a type of parallel and distributed system that enables the sharing, selection, and aggregation of resources distributed across multiple administrative domains based on the resources availability, capacity, performance, cost and users' quality-of-service requirements"
Topics to be covered What grid computing can do Grid concepts and components Grid construction Using a grid –A user’s perspective –An administrator’s perspective –An application developer’s perspective
What grid computing can do(1) Exploiting underutilized resources –Computing: Desktops are busy less than 5% of the time; even servers in many organizations are often idle –Unused disk capacity –Implications: The application must run remotely without undue overhead. The remote machine must meet any special hardware, software, or resource requirements Parallel CPU capacity –Subjobs on different machines –Barriers often exist to perfect scalability.
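One common way to reason about the barriers to perfect scalability is Amdahl's law, which the slides do not name; it is brought in here only as an illustration. The sketch below assumes a job whose parallel part splits evenly into subjobs with no communication cost. The function name and the 95% figure are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, machines: int) -> float:
    """Upper bound on speedup when only part of a job can run as subjobs."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if machines < 1:
        raise ValueError("machines must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part never shrinks; only the parallel part is divided up.
    return 1.0 / (serial_fraction + parallel_fraction / machines)


if __name__ == "__main__":
    # Assumed example: 95% of the work can be split into subjobs.
    for machines in (1, 10, 100, 1000):
        print(machines, round(amdahl_speedup(0.95, machines), 2))
```

Under these assumptions the speedup stays below 20 no matter how many machines are added, because the remaining 5% of the work is serial.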
What grid computing can do(2) Applications –Grid-enabled applications –No practical tools exist for transforming arbitrary applications to exploit the parallel capabilities of a grid.
What grid computing can do(3) Virtual resources and virtual organizations for collaboration –More capable than distributed computing Wider audience Open standards, hence highly heterogeneous systems –Data, equipment, software, services, licenses,… –Several real and virtual organizations
What grid computing can do(4) Access to additional resources –Special equipment, software, licenses, and other services Resource balancing
What grid computing can do(5) Reliability –Now: redundancy in hardware –Future: software –Utilize “autonomic computing” Management –More dispersed IT infrastructure –Priority among projects
Grid concepts and components(1) Types of resources –Computation –Storage Primary/secondary storage Mountable networked file system –AFS, NFS, DFS, GPFS Capacity increase Uniform name space Data striping
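Data striping can be sketched as dealing fixed-size blocks round-robin across several storage devices, so that reads and writes can proceed on all of them at once. This is a minimal in-memory illustration, not how AFS, NFS, DFS, or GPFS implement it. The function names and block size are hypothetical.

```python
def stripe(data: bytes, devices: int, block_size: int) -> list[list[bytes]]:
    """Deal consecutive blocks of data round-robin across the devices."""
    if devices < 1 or block_size < 1:
        raise ValueError("devices and block_size must be positive")
    stripes: list[list[bytes]] = [[] for _ in range(devices)]
    for offset in range(0, len(data), block_size):
        block_number = offset // block_size
        stripes[block_number % devices].append(data[offset:offset + block_size])
    return stripes


def reassemble(stripes: list[list[bytes]]) -> bytes:
    """Read the blocks back in their original round-robin order."""
    blocks = []
    longest = max((len(s) for s in stripes), default=0)
    for row in range(longest):
        for device_blocks in stripes:
            if row < len(device_blocks):
                blocks.append(device_blocks[row])
    return b"".join(blocks)


if __name__ == "__main__":
    striped = stripe(b"abcdefghij", devices=3, block_size=2)
    print(striped)
    print(reassemble(striped))
```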
Grid concepts and components(2) Types of resources (cont) Communications –Redundant communication paths Software and licenses –License management software Special equipment, capacities, architectures, and policies –different architectures, operating systems, devices, capacities, and equipment. Jobs and applications –Application is a collection of jobs –Specific dependencies
Grid concepts and components(3) Types of resources (cont) Scheduling, reservation, and scavenging –Scheduler: automatically finds the most appropriate machine on which to run any given job –Scavenging: each machine reports its idle status to the grid management node –SETI@home: Search for Extraterrestrial Intelligence at Home –Reserved dedicated resources
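The scheduling and scavenging ideas above can be sketched as a matching step: machines report whether they are idle, and the scheduler picks an idle machine that satisfies the job's requirements. Here "most appropriate" is assumed to mean the lowest reported load. The `Machine` and `Job` fields and the `pick_machine` name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Machine:
    name: str
    architecture: str
    free_memory_mb: int
    idle: bool    # status the machine reports to the management node
    load: float   # 0.0 (unused) to 1.0 (fully busy)


@dataclass
class Job:
    name: str
    architecture: str
    memory_mb: int


def pick_machine(job: Job, machines: list[Machine]) -> Optional[Machine]:
    """Return the least-loaded idle machine that meets the job's needs."""
    candidates = [
        m for m in machines
        if m.idle
        and m.architecture == job.architecture
        and m.free_memory_mb >= job.memory_mb
    ]
    if not candidates:
        return None  # nothing suitable right now; the job would wait
    return min(candidates, key=lambda m: m.load)


if __name__ == "__main__":
    pool = [
        Machine("desk-1", "x86_64", 4096, idle=True, load=0.10),
        Machine("desk-2", "x86_64", 8192, idle=True, load=0.02),
        Machine("srv-1", "ppc64", 16384, idle=False, load=0.90),
    ]
    print(pick_machine(Job("render", "x86_64", 2048), pool))
```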
Grid concepts and components(4) Intragrid to Intergrid –Cluster: same hardware/software –Intragrid: heterogeneous machines/software, multiple departments/same organization –Intergrid: heterogeneous machines/software, multiple departments/multiple organizations
Grid construction(1) Grid software components Management components –resource accounting load sensors –resource evaluation overall usage patterns –autonomic computing Donor software –each machine needs to enroll as a member of the grid and install some software that manages the grid’s use of its resources –authentication –monitoring –checkpointing / resuming Submission software
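Checkpointing and resuming can be illustrated with a job that saves its progress to a file after each step, so that a later run continues where the earlier one stopped. This is a minimal sketch assuming the job state fits in a small JSON file. The function name, the `interrupt_at` parameter (used only to simulate a donor machine being reclaimed), and the file layout are hypothetical.

```python
import json
import os
import tempfile


def run_with_checkpoints(total_steps, checkpoint_path, interrupt_at=None):
    """Sum 0..total_steps-1, saving progress after every step."""
    state = {"next_step": 0, "accumulator": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as handle:
            state = json.load(handle)  # resume from the last saved step

    for step in range(state["next_step"], total_steps):
        if interrupt_at is not None and step == interrupt_at:
            return None  # simulated interruption; the checkpoint survives
        state["accumulator"] += step
        state["next_step"] = step + 1
        # Write to a temporary file first so a crash cannot leave a
        # half-written checkpoint behind.
        temp_path = checkpoint_path + ".tmp"
        with open(temp_path, "w") as handle:
            json.dump(state, handle)
        os.replace(temp_path, checkpoint_path)
    return state["accumulator"]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "job.ckpt")
    print(run_with_checkpoints(10, path, interrupt_at=4))  # interrupted
    print(run_with_checkpoints(10, path))                  # resumed
```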
Grid construction(2) Grid software components (cont.) Distributed grid management –hierarchy of clusters Schedulers –job priority system –react to immediate load –monitor the progress of scheduled jobs & resubmission –reservation system –meta-scheduler Communications –jobs communicate with each other. The open standard Message Passing Interface (MPI)
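A job priority system can be sketched with a priority queue. This example assumes that a larger number means a more urgent job and that jobs of equal priority run in submission order; neither convention comes from the slides. The `PriorityJobQueue` class is hypothetical and omits load monitoring and reservations.

```python
import heapq
import itertools


class PriorityJobQueue:
    """Hand out the highest-priority job first, oldest first on ties."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker: submission order

    def submit(self, job_name: str, priority: int) -> None:
        # heapq is a min-heap, so the priority is negated.
        heapq.heappush(self._heap, (-priority, next(self._order), job_name))

    def next_job(self):
        if not self._heap:
            return None
        _, _, job_name = heapq.heappop(self._heap)
        return job_name


if __name__ == "__main__":
    queue = PriorityJobQueue()
    queue.submit("nightly-report", priority=1)
    queue.submit("drug-screening", priority=5)
    queue.submit("backup", priority=1)
    while (job := queue.next_job()) is not None:
        print(job)
```

A failed job could be resubmitted by calling `submit` again with the same or a raised priority.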
Using a grid: A user’s perspective(1) Enrolling and installing grid software –authentication for security purposes –certificate authority –decide which resources to donate to the grid Logging onto the grid –grid login ID
Using a grid: A user’s perspective(2) Queries and submitting jobs –staging the input data –different architectures: multiple versions of the program –job execution sandbox –collect results Data configuration –data replication –networked file system caching feature enabled
Using a grid: A user’s perspective(3) Monitoring progress and recovery –Degree of recovery for subjobs that fail –Failures Programming error Hardware or power failure Communications interruption Excessive slowness –Recovery Scheduler User
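The split between scheduler recovery and user recovery can be sketched as a retry policy. It assumes that hardware, power, and communications failures are worth rerunning automatically, while a programming error goes straight back to the user. The `TransientFailure` exception and the `run_with_recovery` function are hypothetical names for this illustration.

```python
class TransientFailure(Exception):
    """Stands in for hardware, power, or communications failures."""


def run_with_recovery(subjob, max_attempts: int = 3):
    """Rerun a subjob after transient failures; let other errors surface."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _attempt in range(max_attempts):
        try:
            return subjob()
        except TransientFailure as error:
            last_error = error  # a scheduler could rerun on another machine
    # Any other exception type (a programming error) is not caught above,
    # so it propagates immediately for the user to fix.
    raise RuntimeError(
        f"subjob failed after {max_attempts} attempts"
    ) from last_error


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_subjob():
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise TransientFailure("communications interruption")
        return "result"

    print(run_with_recovery(flaky_subjob), "after", attempts["count"], "attempts")
```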
Using a grid: An administrator’s perspective(1) Planning Installation Managing enrollment of donors and users Certificate authority –It is critical to ensure the highest levels of security in a grid because the grid is designed to execute code and not just share data Positively identify entities requesting certificates Issuing, removing, and archiving certificates Protecting the certificate authority server Maintaining a namespace of unique names for certificate owners Serve signed certificates to those needing to authenticate entities Logging activity
Using a grid: An administrator’s perspective(2) Resource management –setting permissions –Tracking resource usage –Implementing a billing system –policies to achieve better utilization
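Tracking resource usage and billing can be sketched as summing the CPU time each user consumed and applying a rate. The record format, the `bill_usage` name, and the rate of 0.50 per CPU hour are assumptions made for this example, not part of any grid product.

```python
from collections import defaultdict


def bill_usage(records, rate_per_cpu_hour: float) -> dict:
    """Total CPU seconds per user and convert them to a charge."""
    cpu_seconds = defaultdict(float)
    for user, seconds in records:
        cpu_seconds[user] += seconds
    return {
        user: round(total / 3600 * rate_per_cpu_hour, 2)
        for user, total in cpu_seconds.items()
    }


if __name__ == "__main__":
    # Hypothetical usage log: (user, CPU seconds consumed by one job).
    usage_log = [("alice", 3600), ("bob", 1800), ("alice", 7200)]
    print(bill_usage(usage_log, rate_per_cpu_hour=0.50))
```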
Using a grid: An application developer’s perspective(1) Applications that are not enabled for using multiple processors but can be executed on different machines. Applications that are already designed to use the multiple processors of a grid setting. Applications that need to be modified or rewritten to better exploit a grid –Tools for debugging and measuring the behavior of grid applications
Using a grid: An application developer’s perspective(2) Globus –developer’s toolkit Manage grid operations Measurement Repair Debug grid applications Open Grid Services Architecture (OGSA)
Enabling Grids for E-sciencE (EGEE) CERN's new particle accelerator –15 petabytes (15 million gigabytes) a year: a stack of CDs more than 20 km high! –200 sites around the globe –Over 20 000 computers –Running up to 30 000 jobs per day Has already been used for: –Screening 300 000 chemical compounds in search of potential drugs for flu –Simulations of over 40 million potential drug molecules against malaria
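The CD comparison can be checked with a little arithmetic. The slide does not say which CD figures it used, so this sketch assumes a 700 MB disc that is 1.2 mm thick and decimal units (1 petabyte = 10^15 bytes).

```python
BYTES_PER_PETABYTE = 10**15
CD_CAPACITY_BYTES = 700 * 10**6   # assumed: standard 700 MB CD
CD_THICKNESS_MM = 1.2             # assumed: standard disc thickness


def cd_stack_height_km(petabytes: float) -> float:
    """Height of a stack of CDs needed to hold the given amount of data."""
    cds_needed = petabytes * BYTES_PER_PETABYTE / CD_CAPACITY_BYTES
    height_mm = cds_needed * CD_THICKNESS_MM
    return height_mm / 1_000_000  # millimetres to kilometres


if __name__ == "__main__":
    print(round(cd_stack_height_km(15), 1), "km")
```

Under these assumptions the stack comes to roughly 25.7 km, consistent with the slide's "more than 20 km".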