The OurGrid Project Walfredo Cirne Universidade Federal de Campina Grande.

Slides:



Advertisements
Similar presentations
1 Bogotá, EELA-2 1 st Conference, On the Co-existence of Service and Opportunistic Grids Francisco Brasileiro Universidade Federal.
Advertisements

Operating System.
MyGrid: A User-Centric Approach for Grid Computing Walfredo Cirne Universidade Federal da Paraíba.
CS 443 Advanced OS Fabián E. Bustamante, Spring 2005 Resource Containers: A new Facility for Resource Management in Server Systems G. Banga, P. Druschel,
High Performance Computing Course Notes Grid Computing.
Silberschatz, Galvin and Gagne ©2009Operating System Concepts – 8 th Edition Chapter 4: Threads.
The OurGrid Project Walfredo Cirne Universidade Federal de Campina Grande.
Running Thor over MyGrid Walfredo Cirne Universidade Federal de Campina Grande.
Chapter 1 Introduction 1.1A Brief Overview - Parallel Databases and Grid Databases 1.2Parallel Query Processing: Motivations 1.3Parallel Query Processing:
Labs of The World, Unite!!! Walfredo Cirne Universidade Federal de Campina Grande.
The SAM-Grid Fabric Services Gabriele Garzoglio (for the SAM-Grid team) Computing Division Fermilab.
DESIGN OF A PLATFORM OF VIRTUAL SERVICE CONTAINERS FOR SERVICE ORIENTED CLOUD COMPUTING Carlos de Alfonso Andrés García Vicente Hernández.
Gilbert Thomas Grid Computing & Sun Grid Engine “Basic Concepts”
Operating System. Architecture of Computer System Hardware Operating System (OS) Programming Language (e.g. PASCAL) Application Programs (e.g. WORD, EXCEL)
Connecting OurGrid & GridSAM A Short Overview. Content Goals OurGrid: architecture overview OurGrid: short overview GridSAM: short overview GridSAM: example.
Workload Management WP Status and next steps Massimo Sgaravatto INFN Padova.
Running Climate Models On The NERC Cluster Grid Using G-Rex Dan Bretherton, Jon Blower and Keith Haines Reading e-Science Centre Environmental.
J OINT I NSTITUTE FOR N UCLEAR R ESEARCH OFF-LINE DATA PROCESSING GRID-SYSTEM MODELLING FOR NICA 1 Nechaevskiy A. Dubna, 2012.
WP9 Resource Management Current status and plans for future Juliusz Pukacki Krzysztof Kurowski Poznan Supercomputing.
Nicholas LoulloudesMarch 3 rd, 2009 g-Eclipse Testing and Benchmarking Grid Infrastructures using the g-Eclipse Framework Nicholas Loulloudes On behalf.
E-science grid facility for Europe and Latin America OurGrid E2GRIS1 Rafael Silva Universidade Federal de Campina.
Architectural Support for Fine-Grained Parallelism on Multi-core Architectures Sanjeev Kumar, Corporate Technology Group, Intel Corporation Christopher.
Grid Resource Allocation and Management (GRAM) Execution management Execution management –Deployment, scheduling and monitoring Community Scheduler Framework.
Wenjing Wu Andrej Filipčič David Cameron Eric Lancon Claire Adam Bourdarios & others.
OurGrid: A Simple Solution for Running Bag-of-Tasks Applications on Grids Marcelo Meira, Walfredo Cirne (marcelo, Universidade.
1 Distributed Energy-Efficient Scheduling for Data-Intensive Applications with Deadline Constraints on Data Grids Cong Liu and Xiao Qin Auburn University.
Grid Workload Management Massimo Sgaravatto INFN Padova.
Alexandre Duarte Gustavo Wagner Francisco Brasileiro Walfredo Cirne Multi-Environment Software Testing on the Grid Universidade Federal de Campina Grande.
Resource Brokering in the PROGRESS Project Juliusz Pukacki Grid Resource Management Workshop, October 2003.
Cracow Grid Workshop October 2009 Dipl.-Ing. (M.Sc.) Marcus Hilbrich Center for Information Services and High Performance.
JEMMA: an open platform for a connected Smart Grid Gateway GRUPPO TELECOM ITALIA MAS2TERING Smart Grid Workshop Brussels, September Strategy &
1 Catania, 4 th EEGE User Forum/OGF 25, OurGrid integration with gLite based grids in EELA-2 Francisco Brasileiro Universidade.
E-science grid facility for Europe and Latin America E2GRIS1 Gustavo Miranda Teixeira Ricardo Silva Campos Laboratório de Fisiologia Computacional.
Issues Autonomic operation (fault tolerance) Minimize interference to applications Hardware support for new operating systems Resource management (global.
1 Bogotá, EELA-2 1 st Conference, The OurGrid Approach for Opportunistic Grid Computing Francisco Brasileiro Universidade Federal.
E-science grid facility for Europe and Latin America Bridging the High Performance Computing Gap with OurGrid Francisco Brasileiro Universidade.
Ames Research CenterDivision 1 Information Power Grid (IPG) Overview Anthony Lisotta Computer Sciences Corporation NASA Ames May 2,
Data Replication and Power Consumption in Data Grids Susan V. Vrbsky, Ming Lei, Karl Smith and Jeff Byrd Department of Computer Science The University.
DISTRIBUTED COMPUTING. Computing? Computing is usually defined as the activity of using and improving computer technology, computer hardware and software.
E-science grid facility for Europe and Latin America OurGrid and the co-existence with gLite Alexandre Duarte Universidade Federal de Campina.
E-infrastructure shared between Europe and Latin America Interoperability between EELA and OurGrid Alexandre Duarte CERN and UFCG 1 st.
Operating System Principles And Multitasking
A scalable and flexible platform to run various types of resource intensive applications on clouds ISWG June 2015 Budapest, Hungary Tamas Kiss,
Uppsala, April 12-16th 2010EGEE 5th User Forum1 A Business-Driven Cloudburst Scheduler for Bag-of-Task Applications Francisco Brasileiro, Ricardo Araújo,
Globus and PlanetLab Resource Management Solutions Compared M. Ripeanu, M. Bowman, J. Chase, I. Foster, M. Milenkovic Presented by Dionysis Logothetis.
E-infrastructure shared between Europe and Latin America Interoperability between EELA and OurGrid Alexandre Duarte CERN IT-GD EELA Project.
Miron Livny Computer Sciences Department University of Wisconsin-Madison Condor and (the) Grid (one of.
An Architectural Approach to Managing Data in Transit Micah Beck Director & Associate Professor Logistical Computing and Internetworking Lab Computer Science.
Background Computer System Architectures Computer System Software.
Grid Execution Management for Legacy Code Architecture Exposing legacy applications as Grid services: the GEMLCA approach Centre.
Mobile Analyzer A Distributed Computing Platform Juho Karppinen Helsinki Institute of Physics Technology Program May 23th, 2002 Mobile.
ETICS An Environment for Distributed Software Development in Aerospace Applications SpaceTransfer09 Hannover Messe, April 2009.
An operating system (OS) is a collection of system programs that together control the operation of a computer system.
Enabling Grids for E-sciencE LRMN ThIS on the Grid Sorina CAMARASU.
The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How) gLite Grid Introduction Salma Saber Electronic.
8 th International Desktop Grid Federation Workshop, Hannover, Germany, August 17 th, 2011 DEGISCO Desktop Grids for International Scientific Collaboration.
Deploying Research in the Real World: The OurGrid Experience
Operating System.
Example: Rapid Atmospheric Modeling System, ColoState U
Parallel Objects: Virtualization & In-Process Components
Grid Computing.
THE OPERATION SYSTEM The need for an operating system
Hierarchical Architecture
Interoperability & Standards
TYPES OFF OPERATING SYSTEM
1. 2 VIRTUAL MACHINES By: Satya Prasanna Mallick Reg.No
Chapter 2: System Structures
Lecture Topics: 11/1 General Operating System Concepts Processes
Language Processors Application Domain – ideas concerning the behavior of a software. Execution Domain – Ideas implemented in Computer System. Semantic.
ShareGrid: architettura e middleware
Presentation transcript:

The OurGrid Project Walfredo Cirne Universidade Federal de Campina Grande

What is a Grid? Computational Grid (source of computational resources and services) Computational Grid (source of computational resources and services)

Solving a real problem To finish my Ph.D., I had to run hundreds of thousands of independent simulations Since my simulations were independent, I had the perfect application for the grid I was in top grid research lab, but could not use the grid −Grid solutions are not in place yet

The motivation for MyGrid Users of loosely-coupled applications could benefit from the Grid now However, they don´t run on the Grid today because the Grid Infrastructure is not widely deployed What if we build a solution that does not depend upon installed Grid infrastructure?

MyGrid MyGrid allows a user to run Bag-of-Tasks parallel applications on whatever resources she has access to −Bag-of-Tasks applications are those parallel applications formed by independent tasks One’s grid is all resources one has access to −No grid infrastructure software is necessary −Grid infrastructure software can be used (whenever available)

Bag-of-Tasks Applications Data mining Massive search (as search for crypto keys) Parameter sweeps Monte Carlo simulations Fractals (such as Mandelbrot) Image manipulation (such as tomography) And many others…

What is MyGrid? A broker (or application scheduler) A set of abstractions hide the grid heterogeneity from the user

An Example: Factoring with MyGrid init mg-services put $PROC./Fat.class $PLAYPEN grid1 java Fat output-$TASK collect mg-services get $PROC $PLAYPEN output-$TASK grid2 java Fat output-$TASK

Defining my personal Grid proc: name = ostra.lsd.ufcg.edu.br attributes = lsd, linux type = user_agent proc: name = memba.ucsd.edu attributes = lsd, solaris type = grid_script rem_exec = ssh %machine%command copy_to = scp %localdir/%file %machine:%remotedir copy_from = scp %machine:%remotedir/%file %localdir [...]

Making MyGrid Encompassing Home Machine Scheduler GridMachine Interface Globus Proxy UA Proxy Grid Script... Grid Machine Globus GRAM Grid Machine User Agent Grid Machine...

Dealing with Firewalls, Private IPs, and Space-Shared Machines Scheduler (Home Mac.) User Agent Grid Script Globus Proxy Grid Machine Gateway Space-Shared Gateway

The Scheduling Challenge Grid scheduling typically depends on information about the grid (e.g. machine speed and load) and the application (e.g. task size) However, getting grid information makes it harder to build an encompassing system −The GridMachine Interface would have to be richer, and thus harder to implement Moreover, getting application information makes the system harder to use and more complex −The user would have to provide task size estimates

Scheduling with no information Work-queue with Replication −Tasks are sent to idle processors −When there are no more tasks, running tasks are replicated on idle processors −The first replica to finish is the official execution −Other replicas are cancelled −Replication may have a limit The key is to avoid having the job waiting for a task that runs in a slow/loaded machine

Work-queue with Replication 8000 experiments Experiments varied in −grid heterogeneity −application heterogeneity −application granularity Performance summary:

WQR Overhead Obviously, the drawback in WQR is cycles wasted by the cancelled replicas Wasted cycles:

Data Aware Scheduling WQR achieves good performance for CPU- intensive BoT applications However, many important BoT applications are data-intensive These applications frequently reuse data −During the same execution −Between two successive executions There are knowledge-dependent schedulers that explore data reutilization

Storage Affinity The “affinity” between a task and a site is the number of bytes within task input that is already stored at there −The heuristic is based on easy-to-get static information (size and location of data) The task with largest “affinity” is prioritized −The idea is avoid unnecessary data transfers Replication is used to cope with mistakes

Storage Affinity Results 3000 experiments Experiments varied in −grid heterogeneity −application heterogeneity −application granularity Performance summary: Storage AffinityXSufferageWQR Average (seconds) Standard Deviation

Proof of Concept During a 40-day period, we ran 600,000 simulations using 178 processors located in 6 different administrative domains widely spread in the USA We only had GridScript and WorkQueue MyGrid took 16.7 days to run the simulations My desktop machine would have taken 5.3 years to do so Speed-up is for 178 processors

HIV research with MyGrid B,c,FB,c,F HIV-2 HIV-1 M O ABCDFGHJKABCDFGHJK N ? prevalent in Europe and Americas prevalent in Africa majority in the world 18% in Brazil

HIV protease + Ritonavir Subtype B RMSD Subtype F

The HIV Research Grid 55 machines in 6 administrative domains in the US and Brazil −The machines were accessed via User Agent, UA + Grid Machine Gateway, UA + ssh tunnel, and Grid Scripts Task = 3.3 MB input, 1 MB output, 4 to 33 minutes of dedicated execution Ran 60 tasks in 38 minutes Speed-up is 29.2 for 55 machines −Considering an 18.5-minute average machine

MyGrid Status MyGrid is open source and is available at −About 150 downloads −2.0 version released two months ago −Base of the PAUÁ Grid, currently being deployed by HP Brazil Bag-of-tasks parallel applications can currently benefit from the Grid −However, firewalls, private IPs and the such make it much harder than we initially thought

More resources −People want to use more resources than they have access to Good “debugging” −Abstractions are wonderful when they work, but when they fail... :-( More security −Local resources −Use of grid machine as attack launchpad Richer programming model Demands of MyGrid Users  OurGrid  MyGridDoctor  SWAN

OurGrid: A Network of Favors Getting access to resources is out of scope of today’s grid solutions −Grid economy will solve this problem some day But BoT applications can use lots of resource now Let’s trade off generality for simplicity −There are at least 2 resource providers −Applications that shall use the resources need no QoS guarantees P2P resource sharing community −Network of Favors

Making people collaborate It’s important to encourage collaboration within OurGrid (i.e., resource sharing) −In file-sharing, most users free-ride OurGrid uses a P2P Reputation Scheme −All peers maintain a local balance for all known peers −Peers with greater balances have priority −The emergent behavior of the system is that by donating more, you get more resources −No additional infrastructure is needed

A B C D E OurGrid resource sharing [1] ConsumerQuery (broadcast) ProviderWorkRequest ConsumerFavor ProviderFavorReport * * * = no idle resources now broker B60 D45

OurGrid resource sharing [2] A B C D E B60 D45 E 0 ConsumerQuery ProviderWorkRequest * * = no idle resources now * broker

Free-rider consumption Epsilon is the fraction of resources consumed by free-riders

Equity among collaborators

Sandboxing for BoT applications In the OurGrid Community, a peer runs unknown code that comes from the Grid This an obvious security concern −Threat to local data and resources −Use of machine as drone to attack others We leverage from the fact BoT applications communicate only to receive input and send output to run the guest application in a very tight sandbox, with no network access

Adding a second line of defense We also reboot to add a second layer of protection to the user data and resources This has the extra advantage of enabling us to use an OS different from that chosen by the user −That is, even if the user prefers Windows, we can still have Linux Booting back to the user OS can be done fast by using hibernation

SWAN architecture

OurGrid overall architecture 1,..., n User Interface Site Manager SWAN Sandboxing MyGrid SWAN

Collaboration/Interest on OurGrid HP Brazil R&D HP Labs Bristol HP Partners −LNCC, UniSantos, UniFor, Instituto Atlântico −CESAR/UFPE, Instituto Eldorado, IPT, AMR −PUCRS, UniSinos, UFRGS, USP Others −UCSD, UnB, UFBA, UCS, UniCap, UFPB, UFAL...

Questions?

Thank you! Merci! Danke! Grazie! Gracias! Obrigado! More at