Grid Infrastructure

Presentation transcript:

1 Grid Infrastructure (Eddie.Aronovich@cs.tau.ac.il)

2 Acknowledgements
Presentation is based on slides from:
– Roberto Barbera, University of Catania and INFN (EGEE Tutorial, Roma, 02.11.2005)
– Mike Mineter, Concepts of grid computing
– Fabrizio Gagliardi, EGEE Project Director, CERN, Geneva, Switzerland (NAREGI Symposium 2005, Tokyo)
– Fabrizio Gagliardi, EGEE Project Director, CERN, Geneva, Switzerland (APAC, 27 September 2005)
– Guy Warner, NeSC Training Team (An Induction to EGEE for GOSC and the NGS, NeSC, 8 December 2004)

3 What is it? (diagram: servers and clients)

4 It's all about IT

5 Hardware utilization

6 SOA & Web services
Decompose processing into services
Each service works independently
Main components:
– Universal Description, Discovery and Integration (UDDI)
– Simple Object Access Protocol (SOAP; see the request sketch below)
– Web Services Description Language (WSDL), a W3C standard
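As an illustrative sketch (not from the original slides): a SOAP request is just an XML envelope POSTed over HTTP, which can be shown with the Python standard library alone. The endpoint, namespace and "Add" operation below are hypothetical placeholders; a real service advertises them in its WSDL.

    import urllib.request

    # Hypothetical service endpoint; a real one is published in the service's WSDL.
    ENDPOINT = "http://example.org/calc"
    SOAP_BODY = """<?xml version="1.0"?>
    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Body>
        <Add xmlns="http://example.org/calc"><a>2</a><b>3</b></Add>
      </soap:Body>
    </soap:Envelope>"""

    req = urllib.request.Request(
        ENDPOINT,
        data=SOAP_BODY.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8",
                 "SOAPAction": "http://example.org/calc#Add"},
    )
    with urllib.request.urlopen(req) as resp:  # the reply is again a SOAP envelope
        print(resp.read().decode("utf-8"))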


9 The world needs only five computers (after the remark attributed to Thomas J. Watson)
– Google grid
– Microsoft's live.com
– Yahoo!
– Amazon.com
– eBay
– Salesforce.com
Well, that's O(5) ;)
Source: Greg Papadopoulos's "Greg Matter" blog (http://blogs.sun.com/Gregp/entry/the_world_needs_only_five)

10 Scaling
Scale-up
– Add more resources within the system
– Does not require changes in the applications
– Limited extension
– Single point of failure
Scale-out
– Add more systems
– Architecture dependent (needs code changes)
– Economical
How to? (see the sketch below)
– Split the operation into groups
– Perform each group on a different machine
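To make the "split into groups, run each group elsewhere" recipe concrete, here is a toy sketch (added, not from the deck) using Python's standard multiprocessing pool; on a real scale-out system the groups would go to different machines via a queuing system rather than to local worker processes.

    from multiprocessing import Pool

    def process_group(group):
        # Stand-in for the real work one machine would perform on its share.
        return sum(x * x for x in group)

    if __name__ == "__main__":
        data = list(range(1_000_000))
        n_workers = 4
        # Split the operation into groups...
        groups = [data[i::n_workers] for i in range(n_workers)]
        # ...and perform each group on a different worker.
        with Pool(n_workers) as pool:
            partials = pool.map(process_group, groups)
        print(sum(partials))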

11 How fast can parallelization be?
Let:
– α be the proportion of the process that cannot be parallelized
– P be the number of processors
– S be the system speedup
Amdahl's law: S = 1 / (α + (1 − α) / P)
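A quick numeric check of the formula (added here for illustration): with a 5% serial fraction the speedup saturates at 1/α = 20, no matter how many processors are thrown at the problem.

    def amdahl_speedup(alpha, p):
        # alpha: fraction that cannot be parallelized; p: number of processors
        return 1.0 / (alpha + (1.0 - alpha) / p)

    for p in (1, 2, 4, 16, 256, 65536):
        print(p, round(amdahl_speedup(0.05, p), 2))
    # prints roughly: 1 -> 1.0, 2 -> 1.9, 4 -> 3.48, 16 -> 9.14,
    # 256 -> 18.62, 65536 -> 19.99 (approaching the 1/alpha = 20 ceiling)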

12 Cluster types
High availability
– Active-active
– Active-passive
– Heartbeat
Load-balancing cluster (see the round-robin sketch below)
– Round robin (weighted/non-weighted)
– System-status aware (sessions, CPU load, etc.)
Compute cluster
– Queuing system (Condor, Hadoop, OpenPBS, LSF, etc.)
– Single system image (ScaleMP, SSI, MOSIX, Nomad, etc.)
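For the round-robin policy above, a minimal weighted dispatcher sketch (illustrative only; the server names and weights are invented):

    import itertools

    # Hypothetical back ends: a heavier weight means more turns per cycle.
    SERVERS = {"web1": 3, "web2": 1}

    def weighted_round_robin(servers):
        # Expand each server by its weight, then cycle over the result forever.
        expanded = [name for name, w in servers.items() for _ in range(w)]
        return itertools.cycle(expanded)

    dispatcher = weighted_round_robin(SERVERS)
    print([next(dispatcher) for _ in range(8)])
    # ['web1', 'web1', 'web1', 'web2', 'web1', 'web1', 'web1', 'web2']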

13 Condor script

    #################
    # Sample script #
    #################
    Universe                = vanilla
    Executable              = /bin/hostname
    Arguments               = -u
    When_to_transfer_output = ON_EXIT_OR_EVICT
    Log                     = {file name}.log
    Error                   = err.$(Process)
    Output                  = out.$(Process)
    Requirements            = substr(Machine,0,4) == "dopp" && Arch == "X86_64"
    Notification            = Complete
    Queue 10
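Assuming the script above is saved as sample.sub, it would be submitted with condor_submit sample.sub; "Queue 10" enqueues ten instances of the job, and the $(Process) macro (0–9) keeps their error and output files apart.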

14 From a single PC to a Grid
– Farm of PCs
– Volunteer computing: CPU cycles made available by PC owners (examples: Seti@home, Africa@home)
– Enterprise grid: mutualization of resources in a company
– Grid infrastructure: Internet + disk and storage resources + services for information management (data collection, transfer and analysis); example: EGEE

15 Batch to on-line scale (diagram placing systems along the scale: dedicated resources, PBS, Torque, utility computing with Condor, Hadoop, gLite & Globus)

16 Key cloud services attributes
– Off-site, third-party provider
– Access via the Internet
– Minimal/no IT skills required to "implement"
– Provisioning: self-service requesting; near-real-time deployment; dynamic & fine-grained scaling
– Fine-grained, usage-based pricing model
– UI: browser and successors
– Web-services APIs as the system interface
– Shared resources/common versions
Source: IDC, Sep 2008

17 What is "Grid"?

18 What is Grid computing?
The definition is not widely agreed. Foster & Kesselman:
– Computing resources are not administered centrally.
– Open standards are used.
– Non-trivial quality of service is achieved.

19 Other definitions
– "the technology that enables resource virtualization, on-demand provisioning, and service (resource) sharing between organizations." (Plaszczak/Wellner)
– "a type of parallel and distributed system that enables the sharing, selection, and aggregation of geographically distributed autonomous resources dynamically at runtime depending on their availability, capability, performance, cost, and users' quality-of-service requirements" (Buyya)
– "a service for sharing computer power and data storage capacity over the Internet." (CERN)

20 Virtual Organization
What's a VO?
– People in different organisations seeking to cooperate and share resources across their organisational boundaries
Why establish a Grid?
– Share data
– Pool computers
– Collaborate
The initial vision: "The Grid". The present reality: many "grids". Each grid is an infrastructure enabling one or more "virtual organisations" to share computing resources.
(diagram: institutes A–F grouped into VO1 and VO2)

21 The Grid metaphor

22 Stand-alone computer (diagram, built up across slides 22–24)

25 Middleware components: the batch approach
(diagram: the "User interface" sends the input "sandbox" with a job-submit event to the Resource Broker and receives the output "sandbox" back; the Resource Broker consults Authorisation & Authentication, the Information Service (which publishes SE & CE info), and the Replica Catalogue (data-set info); job status events go to Logging & Book-keeping and can be queried; the job runs on a Computing Element against a Storage Element)

26 Components involved in job submission (diagram): the UI; the RB node hosting the Network Server, Job Controller and Workload Manager; the Replica Location Server; the Information Service (resource characteristics & status); the Computing Element; and the Storage Element.

27 Job submission. UI: allows users to access the functionalities of the WMS (via command line, GUI, C++ and Java APIs). Job status: submitted.

28 Job Description Language (JDL): used to specify job characteristics and requirements. The job is submitted from the UI with: edg-job-submit myjob.jdl

    myjob.jdl:
    JobType       = "Normal";
    Executable    = "$(CMS)/exe/sum.exe";
    InputSandbox  = {"/home/user/WP1testC", "/home/file*", "/home/user/DATA/*"};
    OutputSandbox = {"sim.err", "test.out", "sim.log"};
    Requirements  = other.GlueHostOperatingSystemName == "linux" &&
                    other.GlueHostOperatingSystemRelease == "Red Hat 7.3" &&
                    other.GlueCEPolicyMaxCPUTime > 10000;
    Rank          = other.GlueCEStateFreeCPUs;

Job status: submitted.
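In JDL, Requirements is a boolean expression that every candidate Computing Element must satisfy, while Rank orders the CEs that pass (here preferring the one with the most free CPUs); edg-job-submit then hands the file, together with the input sandbox, to the Network Server on the RB node.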

29 Job submission. NS: network daemon responsible for accepting incoming requests. The input sandbox files are copied to RB storage. Job status: submitted → waiting.

30 Job submission. WM: acts to satisfy the request. Job status: waiting.

31 Job submission. Match-Maker/Broker: where must this job be executed? Job status: waiting.

32 Job submission. Matchmaker: responsible for finding the "best" CE for a job. Job status: waiting.

33 Job submission. The matchmaker asks: where (on which SEs) are the needed data? What is the status of the Grid? Job status: waiting.

34 Job submission. The broker makes the CE choice. Job status: waiting.

35 Job submission. Job Adapter: responsible for the final "touches" to the job before submission (e.g. creation of the wrapper script, PFN resolution, etc.). Job status: waiting.

36 Job submission. Job Controller: responsible for the actual job management operations (done via CondorG). Job status: submitted → waiting → ready.

37 Job submission. Job status: submitted → waiting → ready → scheduled.

38 "Compute element" – reminder!
– Homogeneous set of worker nodes
– Grid gate node running the Globus gatekeeper (authorisation via the gridmap file)
– Local resource management system: Condor / PBS / LSF master
(diagram: job requests arrive at the gate node; the CE publishes to the Information System and logs events)

39 Job submission. The job and its input sandbox files reach the Computing Element; "Grid-enabled" data transfers/accesses go to the Storage Element. Job status: scheduled → running.

40 Job submission. The output sandbox files are transferred back to RB storage. Job status: running → done.

41 Job submission. The user retrieves the results from the UI with edg-job-get-output. Job status: done.

42 Job submission. The output sandbox files are delivered to the user. Job status: done → cleared.

43 Job monitoring. LM: parses the CondorG log file (where CondorG logs info about jobs) and notifies the LB. LB: receives and stores job events and computes the corresponding job status. Queried from the UI with edg-job-status and edg-job-get-logging-info.
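Summarising the user-visible cycle on the UI: edg-job-submit myjob.jdl submits the job; edg-job-status follows it through submitted → waiting → ready → scheduled → running → done; edg-job-get-output retrieves the output sandbox (after which the job is cleared); and edg-job-get-logging-info shows the detailed event log.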

44 Approaches to Security 1: The Poor Security House (illustration)

45 Approaches to Security 2: The Paranoid Security House (illustration)

46 Approaches to Security 3: The Realistic Security House (illustration)

47 Mapping a certificate to a local user
– The site uses its local accounting system
– A pool of users is dedicated to the Grid
– Each user is mapped using the gridmap file or VOMS
– The mapping can implement local policy on external users
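For illustration (the subject and account names here are invented): a gridmap-file entry is a quoted certificate subject followed by a local account, e.g. "/C=IL/O=TAU/OU=CS/CN=Some User" griduser, and writing the account with a leading dot (e.g. .gridpool) leases the user one account from the dedicated pool instead of a fixed one.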

48 Certificate request
– The user generates a public/private key pair; the private key is kept encrypted on the local disk.
– The user sends the public key to the CA in a certificate request, along with proof of identity.
– The CA confirms the identity, signs the certificate and sends it back to the user.
(slide based on a presentation given by Carl Kesselman at the GGF Summer School 2004)

49 Inside the certificate
– Standard (X.509) defined format
– User identification (e.g. full name)
– User's public key
– A "signature" from a CA, created by encoding a unique string (a hash) generated from the user's identification, the user's public key and the name of the CA; the signature is encoded using the CA's private key. This has the effect of:
  – Proving that the certificate came from the CA
  – Vouching for the user's identification
  – Vouching for the binding of the user's public key to their identification
(certificate fields: Name; Issuer: CA; Public Key; Signature)
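For illustration (not part of the original deck), these fields can be inspected with the third-party Python cryptography package, given any PEM-encoded certificate; the file name is a placeholder.

    from cryptography import x509

    with open("usercert.pem", "rb") as f:          # placeholder path
        cert = x509.load_pem_x509_certificate(f.read())

    print(cert.subject)                  # user identification (distinguished name)
    print(cert.issuer)                   # the CA that signed the certificate
    print(cert.public_key())             # the user's public key
    print(cert.signature_algorithm_oid)  # algorithm behind the CA's signature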

50 Mutual authentication
– A sends their certificate.
– B verifies the signature in A's certificate.
– B sends A a challenge string.
– A encrypts the challenge string with their private key and sends the encrypted challenge to B.
– B uses A's public key to decrypt the challenge.
– B compares the decrypted string with the original challenge.
– If they match, B has verified A's identity, and A cannot repudiate it.
(diagram: A's certificate → verify CA signature → random phrase → encrypt with A's private key → encrypted phrase → decrypt with A's public key → compare with original phrase)
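A minimal sketch of the challenge step (added here, not from the slides), using the third-party Python cryptography package. One caveat: with RSA, "encrypting with the private key" is realised in practice as a digital signature that the peer verifies with the public key.

    import os
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa, padding

    # A's key pair; in the real protocol B gets the public key from A's certificate.
    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    public_key = private_key.public_key()

    challenge = os.urandom(32)            # B's random challenge string

    pss = padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                      salt_length=padding.PSS.MAX_LENGTH)
    signature = private_key.sign(challenge, pss, hashes.SHA256())   # A's step

    # B's step: verify() raises InvalidSignature unless the challenge was
    # signed by the holder of A's private key.
    public_key.verify(signature, challenge, pss, hashes.SHA256())
    print("challenge verified: peer holds the key matching the certificate")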

51 Proxy certificate
– Avoids re-entering the passphrase by creating a proxy
– The proxy consists of a new certificate and a new private key
– The proxy certificate contains the owner's (modified) identity
– The remote party receives the proxy's certificate (signed by the owner) and the owner's certificate
– The proxy certificate has a limited lifetime
– Chain of trust from the CA to the proxy, through the owner
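In Globus-based middleware such as the gLite stack described above, the proxy is typically created with grid-proxy-init (or voms-proxy-init when VO attributes are needed): the user types the certificate passphrase once, and the resulting short-lived proxy (on the order of 12 hours by default) is what the job-submission commands then use.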

52 Grids in Europe (map; source: Prof. Dieter Kranzlmueller, EGEE'08, Istanbul, Turkey; www.eu-egi.eu)

53 To be continued

