Labs of The World, Unite!!! Walfredo Cirne Universidade Federal de Campina Grande.

eScience Computers are changing scientific research −Enabling collaboration −As investigation tools (simulations, data mining, etc.) As a result, many research labs around the world are now computation-hungry Buying more computers is just part of the answer; making better use of existing resources is the other

Solution 1: Globus Grids promise to “plug on the wall and solve your problem” Globus is the closest realization of this vision −Deployed at dozens of sites But it requires highly specialized skills and complex off-line negotiation Good solution for large labs that work in collaboration with other large labs −CERN’s LCG is a good example of the state of the art

Solution 2: Voluntary Computing Voluntary computing projects have been a great success, harnessing the power of millions of computers However, to use this solution, you must −have a very high-visibility project −be in a well-known institution −invest a good deal of effort in “advertising” −have a very good computer science team to run “the server”

And what about the thousands of small and middle research labs throughout the world which also need lots of compute power?

Solution 3: OurGrid OurGrid is easy to install and automatically configures itself Labs can freely join the system without any human intervention OurGrid is a peer-to-peer grid −Each lab corresponds to a peer in the system To keep it doable, we focus on Bag-of-Tasks applications

Bag-of-Tasks Applications Data mining Massive search (such as searching for crypto keys) Parameter sweeps Monte Carlo simulations Fractals (such as Mandelbrot) Image manipulation (such as tomography) And many others…
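
A bag of tasks is just a set of independent work items that can run anywhere, in any order. A minimal Python sketch (the parameter sweep and model function are hypothetical, and a local thread pool stands in for the grid’s machines):

```python
from concurrent.futures import ThreadPoolExecutor

def simulate(params):
    """One independent task of the bag (hypothetical model)."""
    temperature, pressure = params
    return temperature * pressure

def run_bag(bag):
    # Tasks share no state and never talk to each other, so
    # placement and ordering do not matter; any pool of workers
    # (here, threads) can drain the bag.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(simulate, bag))

# A small parameter sweep: every (temperature, pressure) combination
bag = [(t, p) for t in (10, 20) for p in (1.0, 2.5)]
print(run_bag(bag))  # → [10.0, 25.0, 20.0, 50.0]
```

Because the tasks are independent, the same bag could be drained by 4 local threads or by 400 grid machines without changing the application.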

OurGrid Components OurGrid: A peer-to-peer network that performs fair resource sharing among unknown peers MyGrid: A broker that schedules BoT applications SWAN: A Xen-based sandbox that makes it safe to run a computation for an unknown peer

OurGrid Architecture [Diagram: MyGrid provides the User Interface and Application Scheduling; each site’s peer provides Site Management and Grid-wide Resource Sharing; SWAN provides Sandboxing on worker machines 1 … n]

An Example: Factoring with MyGrid
task:
    init: put ./Fat.class $PLAYPEN
    remote: java Fat output-$TASK
    final: get $PLAYPEN/output-$TASK results
task:
    init: put ./Fat.class $PLAYPEN
    remote: java Fat output-$TASK
    final: get $PLAYPEN/output-$TASK results
task:
    init: put ./Fat.class $PLAYPEN
    remote: java Fat output-$TASK
    final: get $PLAYPEN/output-$TASK results
...

MyGrid GUI

Network of Favors OurGrid forms a peer-to-peer community that peers are free to join It’s important to encourage collaboration within OurGrid (i.e., resource sharing) −In file-sharing communities, most users free-ride OurGrid uses the Network of Favors −All peers maintain a local balance for all known peers −Peers with greater balances have priority −The emergent behavior is that by donating more, you get more resources −No additional infrastructure is needed

Network of Favors NoF is a local reputation system based on the past history of service among peers. Peer A keeps track of the work it has done for others, v(A,B), and the work done for it, v(B,A). Reputation is calculated from past service provided: the users who have contributed the most processing power get higher-priority access to cycles.

Network of Favors T_A(B): reputation of peer B at node A, a function of v(A,B) and v(B,A) Example 1: T_A(B) = v(B,A) − v(A,B) Vulnerable to identity-changing attacks. Example 2: T_A(B) = max(0, v(B,A) − v(A,B)) Robust to identity changing, but cannot distinguish a malicious ID-changing node that never donates from a collaborating peer that has. Example 3: T_A(B) = max(0, v(B,A) − v(A,B) + log(v(B,A))) Adds a history term that reflects, at peer A, B’s past donations.
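
The three example functions can be written down directly (a minimal sketch; the function and argument names are mine, not OurGrid’s API):

```python
import math

def rep_v1(v_BA, v_AB):
    # Example 1: plain balance of favors; a peer with a negative
    # balance can wipe it simply by rejoining under a new identity
    return v_BA - v_AB

def rep_v2(v_BA, v_AB):
    # Example 2: clamping at zero removes the incentive to change
    # identity, but a newcomer and a clamped debtor look identical
    return max(0.0, v_BA - v_AB)

def rep_v3(v_BA, v_AB):
    # Example 3: a log history term still rewards peers that have
    # donated a lot in the past, even when their balance is spent
    history = math.log(v_BA) if v_BA > 0 else 0.0
    return max(0.0, v_BA - v_AB + history)

# A heavy past donor keeps some reputation under v3 but not v2
print(rep_v2(100.0, 100.0))             # → 0.0
print(round(rep_v3(100.0, 100.0), 2))   # → 4.61
```

Note how the log term grows slowly, so past donations break ties without letting an old contributor consume indefinitely.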

NoF at Work [1] [Diagram: peers A to E; consumer A’s broker requests favors; providers report favor balances (B: 60, D: 45); * = no idle resources now]

NoF at Work [2] [Diagram: consumer A’s broker queries the peers; providers answer with work requests according to their balances (B: 60, D: 45, E: 0); * = no idle resources now]

Free-rider Consumption Epsilon is the fraction of resources consumed by free-riders

Equity Among Collaborators

Scheduling with No Information Grid scheduling typically depends on information about the grid (e.g. machine speed and load) and the application (e.g. task size) However, getting good information is hard Can we schedule without information and deploy the system now? −This would make the system much easier to deploy and simple to use

Work-queue with Replication Tasks are sent to idle processors When there are no more tasks, running tasks are replicated on idle processors The first replica to finish is the official execution Other replicas are cancelled
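
The policy above can be sketched as a small simulation (illustrative only; the uniform task cost, host speeds, and replica limit are assumptions, not OurGrid code):

```python
import heapq

def wqr_makespan(num_tasks, host_speeds, task_cost=100.0, max_replicas=2):
    """Simulate Work-queue with Replication; returns the makespan.

    Tasks go to idle hosts; once the bag is empty, idle hosts
    replicate still-running tasks. The first replica to finish is
    the official execution and its siblings are cancelled."""
    pending = list(range(num_tasks))      # the bag of tasks
    running = {}                          # task -> hosts executing it
    done = set()
    idle = list(range(len(host_speeds)))
    events = []                           # (finish_time, task, host)
    now = 0.0

    def dispatch():
        while idle:
            if pending:
                task = pending.pop()
            else:
                # bag is empty: replicate the task with fewest replicas
                candidates = [t for t, hs in running.items()
                              if len(hs) < max_replicas]
                if not candidates:
                    return
                task = min(candidates, key=lambda t: len(running[t]))
            host = idle.pop()
            running.setdefault(task, set()).add(host)
            heapq.heappush(events,
                           (now + task_cost / host_speeds[host], task, host))

    dispatch()
    while len(done) < num_tasks:
        now, task, host = heapq.heappop(events)
        if task in done:
            continue                      # stale event of a cancelled replica
        done.add(task)
        for h in running.pop(task):       # free the winner, cancel siblings
            idle.append(h)
        dispatch()
    return now

print(wqr_makespan(4, [1.0, 1.0]))  # → 200.0 (no replication needed)
print(wqr_makespan(1, [1.0, 2.0]))  # → 50.0  (fast replica wins)
```

The second call shows why replication works without any information: the slow copy is simply cancelled when the fast one finishes.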

Evaluation 8000 experiments Experiments varied in −grid heterogeneity −application heterogeneity −application granularity Performance summary:

WQR Overhead Obviously, the drawback of WQR is the cycles wasted by the cancelled replicas Wasted cycles:

Data Aware Scheduling WQR achieves good performance for CPU-intensive BoT applications However, many important BoT applications are data-intensive These applications frequently reuse data −During the same execution −Between two successive executions

Storage Affinity Storage Affinity uses replication and just a bit of static information to achieve good scheduling for data intensive applications Storage Affinity uses information on which data servers have already stored a data item
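
The affinity computation can be sketched as follows (a simplification with hypothetical host and file names: score each host by how many bytes of the task’s input it already stores):

```python
def best_host(task_inputs, host_storage):
    """task_inputs: data item -> size in bytes needed by the task.
    host_storage: host -> set of data items already stored there.
    Storage affinity of a host = bytes of the input it already holds,
    i.e. bytes that need not be transferred again."""
    def affinity(host):
        return sum(size for item, size in task_inputs.items()
                   if item in host_storage[host])
    return max(host_storage, key=affinity)

hosts = {"site-a": {"genome.db"},
         "site-b": {"genome.db", "index.bin"}}
task = {"genome.db": 3_300_000, "index.bin": 900_000}
print(best_host(task, hosts))  # → site-b
```

This is the “just a bit of static information” the slide mentions: only which server stores which data item, never machine speeds, loads, or task sizes.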

Storage Affinity Results 3000 experiments Experiments varied in −grid heterogeneity −application heterogeneity −application granularity Performance summary: [Table: average and standard deviation of execution time, in seconds, for Storage Affinity, X-Suffrage, and WQR]

SWAN: OurGrid Security Running an unknown application that comes from an unknown peer is a clear security threat We leverage the fact that Bag-of-Tasks applications only communicate to receive input and return the output Input/output is done by OurGrid itself The remote task runs inside a Xen virtual machine, with no network access, and disk access only to a designated partition

SWAN Architecture [Diagram: three worker machines, each stacking a Grid Application on Grid Middleware on a Grid OS, running alongside the machine’s Guest OS]

Adding a second line of defense We can also reboot to add a second layer of protection for the user’s data and resources This has the extra advantage of enabling us to use an OS different from the one chosen by the user −That is, even if the interactive user prefers Windows, we can still have Linux Booting back to the user’s OS can be made fast by using hibernation

SWAN + Mode Switcher

Making it Work for Real...

OurGrid Status The OurGrid free-to-join grid has been in production since December 2004 −We currently have 300 machines spread across 20 sites −See status.ourgrid.org for the current status of the system OurGrid is open source (GPL) and is available at −We’ve had external contributions

HIV research with OurGrid [Diagram: HIV phylogeny. HIV-1 (groups M, N, O; group M subtypes A, B, C, D, F, G, H, J, K) and HIV-2; annotated: prevalent in Europe and the Americas, prevalent in Africa, majority in the world, 18% in Brazil, subtypes B, C, F]

HIV protease + Ritonavir [Plots: RMSD of HIV protease bound to Ritonavir, Subtype B vs. Subtype F]

Performance Results for the HIV Application 55 machines in 6 administrative domains in the US and Brazil Task = 3.3 MB input, 1 MB output, 4 to 33 minutes of dedicated execution Ran 60 tasks in 38 minutes Speed-up is 29.2 for 55 machines −Relative to a single machine averaging 18.5 minutes per task
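
The speed-up figure follows from the numbers on this slide: 60 tasks at an 18.5-minute average on one machine, versus the 38-minute grid run:

```python
sequential_minutes = 60 * 18.5   # all 60 tasks on one average machine
parallel_minutes = 38.0          # measured makespan on the grid
speedup = sequential_minutes / parallel_minutes
print(round(speedup, 1))  # → 29.2
```

A speed-up of 29.2 on 55 machines (about 53% efficiency) is reasonable given the wide spread of task durations and the WAN data transfers involved.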

Conclusions We have a free-to-join grid solution for Bag-of-Tasks applications working today Real users provide invaluable feedback for systems research Delivering results to real users is really cool! :-)

Next Steps Improve virtualization −to allow site-wide file sharing and communication among tasks within a site −to enable communication among tasks spread across the grid −to make it possible for jobs to provide services outside the grid Move beyond Bag-of-Tasks applications Present our components in a more diverse SOA setting Investigate how to use the information that is becoming available in the grid Investigate how to better manage failures in grids

Questions?

Thank you! Merci! Danke! Grazie! Gracias! Obrigado! More at