NAREGI Middleware Beta 1 and Beyond (presentation transcript)

1 NAREGI Middleware Beta 1 and Beyond
Satoshi Matsuoka Professor, Global Scientific Information and Computing Center, Deputy Director, NAREGI Project Tokyo Institute of Technology / NII

2 The Titech TSUBAME Production Supercomputing Cluster, Spring 2006
Compute: Sun Galaxy 4 (dual-core Opteron, 8-way), 10,480 cores / 655 nodes, 50.4 TeraFlops; OS: Linux (SuSE 9, 10); NAREGI Grid MW.
Interconnect: Voltaire ISR9288 InfiniBand, 10 Gbps x2 (xDDR), ~700 ports; unified IB network, 10 Gbps + external network.
Ranking: 7th on the June 2006 Top500.
Storage: 1 Petabyte (Sun "Thumper"), 0.1 Petabyte (NEC iStore); 500 GB x 48-disk units; Lustre FS, NFS (v4?).
Accelerators: ClearSpeed CSX600 SIMD accelerator, 360 boards, 35 TeraFlops (current); NEC SX-8 small vector nodes (planned).

3 Titech TSUBAME ~80+ racks 350m2 floor area 1.2 MW (peak)

4 Titech Supercomputing Grid 2006
~13,000 CPUs, 90 TeraFlops, ~26 TeraBytes memory, ~1.1 Petabytes disk. CPU cores (x86): TSUBAME (~10,600), Campus Grid Cluster (~1,000), COE-LKR (knowledge) cluster (~260), WinCCS (~300), plus ClearSpeed CSX600 (720 chips). The Ookayama and Suzukakedai campuses are 35 km apart, linked at 10 Gbps (about 1.2 km span within Ookayama); a Computational Engineering Center and a Mathematics/Computation Center are planned.

5 ~60 SC Centers in Japan - 10 Petaflop center by 2011
University computer centers (excl. national labs), circa Spring 2006: ~60 SC centers in Japan, interconnected by 10 Gbps SuperSINET; a 10 Petaflop center is planned by 2011.
- Hokkaido University, Information Initiative Center: HITACHI SR11000, 5.6 Teraflops
- Tohoku University, Information Synergy Center: NEC SX-7, NEC TX7/AzusA
- University of Tsukuba: FUJITSU VPP5000, CP-PACS 2048 (SR8000 proto)
- University of Tokyo, Information Technology Center: HITACHI SR8000, HITACHI SR… Teraflops, plus others (in institutes)
- Tokyo Institute of Technology, Global Scientific Information and Computing Center: NEC/Sun TSUBAME (2006), 85 Teraflops
- National Institute of Informatics: SuperSINET/NAREGI testbed, 17 Teraflops
- Nagoya University, Information Technology Center: FUJITSU PrimePower… Teraflops
- Kyoto University, Academic Center for Computing and Media Studies: FUJITSU PrimePower2500, 10 Teraflops
- Osaka University, CyberMedia Center: NEC SX-5/128M8, HP Exemplar V2500/N, 1.2 Teraflops
- Kyushu University, Computing and Communications Center: FUJITSU VPP5000/64, IBM Power5 p595, 5 Teraflops

6 Scaling Towards Petaflops…
Timeline (2002-2012, ~1 TF to >10 PF):
- Earth Simulator, 40 TF (2002)
- Titech Campus Grid, 1.3 TF
- KEK, 59 TF (BG/L + SR11000)
- BlueGene/L, 360 TF (2005)
- Titech Supercomputing Campus Grid (incl. TSUBAME), ~90 TF (2006)
- Korean machine, >100 TF (2006~7)
- Chinese national machine, >100 TF (2007~8); US petascale systems (2007~8)
- TSUBAME upgrade, >200 TF (2008-2H)
- US HPCS (2010); next-generation Titech "PetaGrid", 1 PF (2010)
- "Keisoku" (Japanese next-generation), >10 PF (2011); US 10 PF (2011~12?)
The 2010 Titech "PetaGrid" (interim step in 2008, full system in 2010) becomes the norm for a typical Japanese center? HPC software is the key!

7 Nano-Science: coupled simulations on the Grid as the sole future for true scalability
Nanoscience sits between continuum and quanta. On the continuum side (around 10^-6 m) are materials-physics approaches for effectively infinite systems: fluid dynamics, statistical physics, condensed matter theory. On the molecular side (around 10^-9 m) is molecular science: quantum chemistry, molecular orbital methods, molecular dynamics. Each approach reaches its limit at this scale (the limit of computing capability on one side, the limit of idealization on the other), so multi-physics analysis is essential.
Grid middleware addresses this by coordinating decoupled resources: meta-computing, high-throughput computing, and multi-physics simulation with components and data from different groups within a VO, composed in real time. The old HPC environment suffered from decoupled resources, limited users, and special software. Standard middleware makes access easier for more users, integrates previously fragmented computing resources, and enables meta-computing (very large MPI runs that no single site can execute) and high-throughput computing such as FMO; on the IT side this yields higher-performance computing, and on the science side it brings integration and collaboration to more users across broader research fields, forming an infrastructure for new science and its industrial applications. The plan is to validate the developed middleware in the nanoscience field from next fiscal year, in cooperation with the Institute for Molecular Science and others. The running FMO job is an example of high-throughput computing; multi-physics analysis is shown in the grid application demo, and the demonstration will show how a large MPI job spanning sites is executed (meta-computing). This is the only way to achieve true scalability!

8 LifeCycle of Grid Apps and Infrastructure
VO application developers and managers author content through the Application Contents Service and high-level workflows (NAREGI WFML). Many VO users run workflows and coupled applications; the SuperScheduler and the Distributed Grid Information Service dispatch them, including meta-computing with GridRPC/GridMPI, onto user applications hosted by GridVM on distributed servers. Data is placed and registered on the Grid, metadata is assigned to each data set (Data 1 … Data n), and all of it is handled by the Grid-wide Data Management Service (GridFS, metadata, staging, etc.).

9 NAREGI Software Stack (beta): WS(RF)-based (OGSA) SW Stack
- Grid-Enabled Nano-Applications (WP6)
- Grid PSE, Grid Workflow (WFML, based on Unicore WF), Grid Visualization (WP3)
- Grid Programming (WP2): GridRPC, GridMPI
- Super Scheduler, Distributed Information Service (CIM), Grid VM (WP1); Data (WP4)
- Packaging: WSRF (GT4 + Fujitsu WSRF, WP1) plus GT4 and other services
- Grid Security and High-Performance Grid Networking (WP5)
- Runs over SuperSINET on computing resources and virtual organizations at NII, IMS, research organizations, and major university computing centers

10 List of NAREGI “Standards” (beta 1 and beyond)
GGF standards and pseudo-standard activities set or employed by NAREGI:
GGF "OGSA CIM profile", GGF AuthZ, GGF DAIS, GGF GFS (Grid Filesystems), GGF Grid CP (GGF CAOPs), GGF GridFTP, GGF GridRPC API (as Ninf-G2/G4), GGF JSDL, GGF OGSA-BES, GGF OGSA-ByteIO, GGF OGSA-DAI, GGF OGSA-EMS, GGF OGSA-RSS, GGF RUS, GGF SRM (planned for beta 2), GGF UR, GGF WS-I RUS, GGF ACS, GGF CDDLM.
Other industry standards employed by NAREGI:
ANSI/ISO SQL, DMTF CIM, IETF OCSP/XKMS, MPI 2.0, OASIS SAML 2.0, OASIS WS-Agreement, OASIS WS-BPEL, OASIS WSRF 2.0, OASIS XACML.
De facto standards / commonly used software platforms employed by NAREGI:
Ganglia, GFarm 1.1, Globus 4 GRAM, Globus 4 GSI, Globus 4 WSRF (also Fujitsu WSRF for C binding), IMPI (as GridMPI), Linux (RH8/9 etc.), Solaris (8/9/10), AIX, …, MyProxy, OpenMPI, Tomcat (and associated WS/XML standards), Unicore WF (as NAREGI WFML), VOMS.
Policy: implement specs early, even if nascent, when they seem viable; this is necessary for longevity and vendor buy-in, and is a metric of WP evaluation.

11 Highlights of NAREGI Beta (May 2006, GGF17/GridWorld)
- Professionally developed and tested "full" OGSA-EMS incarnation
- Full C-based WSRF engine (Java -> Globus 4); OGSA-EMS/RSS WSRF components
- GGF JSDL 1.0-extension job submission, authorization, etc.
- Support for more OSes (AIX, Solaris, etc.) and batch queues
- Sophisticated VO support for identity/security/monitoring/accounting (extensions of VOMS/MyProxy, WS-* adoption)
- WS-based application deployment support via GGF-ACS
- Comprehensive data management with a Grid-wide file system
- Complex workflows (NAREGI-WFML) for various coupled simulations
- Overall stability/speed/functionality improvements
- To be interoperable with EGEE, TeraGrid, etc. (beta 2)
Release next week at GGF17, with press conferences, etc.

12 Ninf-G: A Reference Implementation of the GGF GridRPC API
What is GridRPC? A programming model using RPCs on a Grid, providing an easy and simple programming interface; the GridRPC API is published as a proposed recommendation (GFD-R.P 52).
What is Ninf-G? A reference implementation of the standard GridRPC API, built on the Globus Toolkit; now in NMI Release 8 (the first non-US software in NMI).
Easy three steps to make your program Grid-aware:
1. Write an IDL file that specifies the interface of your library.
2. Compile it with the IDL compiler, ng_gen.
3. Modify your client program to use the GridRPC API (see the sketch below).
Usage model: the user calls remote procedures (remote libraries) on remote supercomputers over the Internet and is notified of the results, enabling large-scale computing across supercomputers on the Grid.
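As an illustration of these three steps, here is a minimal GridRPC client sketch in C. It is not NAREGI/Ninf-G sample code: the IDL, module name (sin_mod), function name (sin_func), and server host (example.org) are made-up placeholders, and only the grpc_* calls, which are the standard GridRPC API functions of GFD-R.P 52 implemented by Ninf-G, are assumed.

/* Hypothetical IDL, compiled with ng_gen (steps 1 and 2), shown as a comment:
 *   Module sin_mod;
 *   Define sin_func(IN double x, OUT double *result)
 *   { ... }
 */
#include <stdio.h>
#include "grpc.h"                /* GridRPC API header shipped with Ninf-G */

int main(int argc, char *argv[])
{
    grpc_function_handle_t handle;
    double x = 1.0, result = 0.0;

    /* Step 3: the client uses the GridRPC API.
     * Read the Ninf-G client configuration file given on the command line. */
    if (grpc_initialize(argv[1]) != GRPC_NO_ERROR) {
        fprintf(stderr, "grpc_initialize failed\n");
        return 1;
    }

    /* Bind a handle to the remote executable; host and entry name are
     * placeholders for a real server and IDL entry. */
    grpc_function_handle_init(&handle, "example.org", "sin_mod/sin_func");

    /* Synchronous remote procedure call; arguments follow the IDL signature. */
    if (grpc_call(&handle, x, &result) == GRPC_NO_ERROR)
        printf("sin(%f) computed remotely = %f\n", x, result);

    grpc_function_handle_destruct(&handle);
    grpc_finalize();
    return 0;
}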

13 GridMPI: MPI applications run on the Grid environment
Target: a metropolitan-area, high-bandwidth environment (10 Gbps, ~500 miles, i.e. under 10 ms one-way latency) for parallel computation; beyond the metropolitan area, MPI-IO. A single (monolithic) MPI application runs over the Grid environment, spanning computing resource site A and computing resource site B connected by a wide-area network.
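Because GridMPI presents the two sites as one MPI world, the application side needs nothing Grid-specific; a plain MPI program such as the following generic sketch (not NAREGI sample code) could be compiled with GridMPI's MPI compiler and run as a single monolithic job spanning site A and site B.

#include <stdio.h>
#include <mpi.h>

/* An ordinary MPI program: under GridMPI (IMPI-based), ranks of the same
 * MPI_COMM_WORLD may be placed at computing resource sites A and B. */
int main(int argc, char *argv[])
{
    int rank, size;
    double local, global = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank contributes a partial value, wherever it is running. */
    local = (double)rank;

    /* Collectives cross the wide-area link transparently; the < 10 ms
     * one-way latency is what keeps this practical. */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum over %d ranks across sites = %f\n", size, global);

    MPI_Finalize();
    return 0;
}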

14 Grid Application Environment(WP3)
NAREGI Portal with per-VO portal GUIs (Bio VO, Nano VO): Deployment UI, Register UI, Workflow GUI, Visualization GUI.
Gateway services: Grid PSE (Deployment Service; Application Contents Service with compile/deploy/un-deploy; Application Repository (GGF-ACS)); Grid Workflow (Workflow Service, File/Execution Manager, NAREGI-WFML, JM I/F module, BPEL+JSDL); Grid Visualization (CFD Visualization Service / CFD Visualizer, Molecular Visualization Service / Molecular Viewer, Parallel Visualization Service / Parallel Visualizer).
Core grid services: Workflow Engine & Super Scheduler, Distributed Information Service (WSRF).
Underlying grid services: File Transfer (RFT), Grid File System, VOMS, MyProxy, …

15 WP-3: User-Level Grid Tools & PSE
Grid PSE: deployment of applications on the Grid, and support for execution of the deployed applications.
Grid Workflow: a workflow language independent of specific Grid middleware, with a GUI in task-flow representation.
Grid Visualization: remote visualization of massive data distributed over the Grid, and general Grid services for visualization.
Each NAREGI middleware component provides its own user interface. Grid PSE deploys application programs and executables to the computing resources on the Grid; using information about the application program and the computing resources, PSE automatically selects candidate resources on the Grid and displays them. The workflow user interface is independent of specific Grid middleware: the user defines the computational procedure of an application on the Workflow tool's screen by connecting executables and input/output data, and the workflow automatically selects appropriate computing resources dynamically at execution time. Grid Visualization can display remote massive data distributed over the Grid.

16

17 The NAREGI SSS Architecture (2007/3)
The architecture is organized as a stack of service buses over the Grid middleware:
- PETABUS (Peta Application Services Bus): application-specific services
- WESBUS (Workflow Execution Services Bus): NAREGI-SSS JM, EPS, CSG, BPEL interpreter service
- CESBUS (Co-allocation Execution Services Bus, a.k.a. BES+ bus): FTS-SC, GRAM-SC, UniGridS-SC, AGG-SC with RS (aggregate SCs)
- BESBUS (Basic Execution Services Bus): Globus WS-GRAM I/F (with reservation), UniGridS atomic services (with reservation), and GridVM, over the Grid resources

18 NAREGI beta 1 SSS Architecture An extended OGSA-EMS Incarnation
Abbreviations: SS: Super Scheduler; JSDL: Job Submission Description Language; JM: Job Manager; EPS: Execution Planning Service; CSG: Candidate Set Generator; RS: Reservation Service; IS: Information Service; SC: Service Container; AGG-SC: Aggregate SC; GVM-SC: GridVM SC; FTS-SC: File Transfer Service SC; BES: Basic Execution Service I/F; CES: Co-allocation Execution Service I/F (BES+); CIM: Common Information Model; GNIS: Grid Network Information Service.
Flow: the NAREGI WP3 WorkFlowTool, PSE, and GVS submit NAREGI-WFML (submit/cancel/status/delete) to the JM client, which converts it (WFML2BPEL, BPEL2WFST) and drives the NAREGI JM(SS) Java I/F module (CreateActivity(FromBPEL), GetActivityStatus, RequestActivityStateChanges). The NAREGI JM (BPEL engine, BPEL including JSDL) invokes the EPS over the CES interface (SelectResourceFromJSDL); the EPS calls the CSG (GenerateCandidateSet, generating SQL queries from JSDL against the CIM-based IS via OGSA-DAI, is-query, PostgreSQL) and the AGG-SC/RS (MakeReservation, CancelReservation, GetGroupsOfNodes). The SS then invokes the SCs (CreateActivity(FromJSDL), GetActivityStatus, RequestActivityStateChanges): GridVM SCs fork/exec GRAM4-specific globusrun-ws against WS-GRAM on top of local schedulers (PBS, LoadLeveler) for co-allocation, while the FTS-SC handles file transfer (uber-ftp, globus-url-copy, GFarm server). The GNIS supplies grid network information.

19 3, 4: Co-allocation and Reservation
A meta-computing scheduler is required to allocate and execute jobs on multiple sites simultaneously. The super scheduler negotiates with the local reservation services (RSs) on job execution time and reserves resources that can execute the jobs simultaneously. In the figure, an abstract JSDL (10 nodes) is handled by the Execution Planning Services and split into concrete JSDLs (8 nodes on Local RS 1 and 2 nodes on Local RS 2) with a common reservation window (requested 14:00 + 3:00, reserved 15:00-18:00); agreement instances are created with each local RS / Service Container / GridVM on clusters (sites) 1 and 2, using the Candidate Set Generator and the Distributed Information Service. (Local RS #: Local Reservation Service #.)
How the super scheduler (the blue modules in the figure) works:
(1) The Execution Planning Service (EPS) manages overall job execution and passes the abstract JSDL submitted from the NAREGI workflow tool to (2) the Candidate Set Generator (CSG). (3) The CSG obtains resource candidates matching the job requirements from the Distributed Information Service. (4) The candidates are returned to the EPS as End Point References (EPRs); if no single resource can run the job, it can be split across multiple resources, as in this example. (5) The EPS creates an agreement instance from the returned EPRs. (6) The EPS creates concrete JSDLs from the EPRs and passes them to the Reservation Service. (7) The RS sends each local RS a concrete JSDL with the expected execution time to make a reservation. (8) When the reservation succeeds, the local RS creates a concrete agreement instance. (9)(10) Cluster 2 is reserved in the same way as cluster 1. (11) The concrete agreement instances are recorded in the abstract agreement instance.
Concretely: 3-1 decide the IMPI execution site (site ρ: 1 CPU); 3-2 search for a 20-CPU cluster (none found); 3-3 divide brokering: search clusters in descending order of node count (site α: 8 nodes); 3-4 search for the remaining 2 nodes (site μ: 2 nodes).
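The "divide brokering" step in the notes (take clusters in descending order of free nodes, then fill the remainder) can be sketched as a simple greedy split. This is only an illustration of the idea under assumed data structures; in the real system the EPS, CSG, and RS exchange JSDL documents and EPRs through WSRF services rather than in-memory arrays.

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical view of one candidate cluster returned by the CSG. */
typedef struct { const char *site; int free_nodes; } candidate_t;

static int by_free_nodes_desc(const void *a, const void *b)
{
    return ((const candidate_t *)b)->free_nodes -
           ((const candidate_t *)a)->free_nodes;
}

/* Greedy divide brokering: visit clusters in descending order of free
 * nodes until the requested node count is covered.  Returns the number
 * of clusters used, or -1 if the request cannot be satisfied. */
static int divide_brokering(candidate_t *c, int n, int requested, int *alloc)
{
    int used = 0;
    qsort(c, n, sizeof(*c), by_free_nodes_desc);
    for (int i = 0; i < n && requested > 0; i++) {
        alloc[i] = c[i].free_nodes < requested ? c[i].free_nodes : requested;
        requested -= alloc[i];
        used++;
    }
    return requested > 0 ? -1 : used;
}

int main(void)
{
    /* Mirrors the example above: a 10-node request that no single cluster
     * can satisfy is split into 8 nodes at site alpha and 2 at site mu. */
    candidate_t cand[] = { { "alpha", 8 }, { "mu", 2 }, { "rho", 1 } };
    int alloc[3] = { 0 };
    int used = divide_brokering(cand, 3, 10, alloc);
    for (int i = 0; i < used; i++)
        printf("reserve %d node(s) at site %s\n", alloc[i], cand[i].site);
    return 0;
}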

20 NAREGI Info Service (beta) Architecture
- The CIMOM service classifies information according to a CIM-based schema.
- The information is aggregated and accumulated in RDBs hierarchically (hierarchical filtered aggregation from cell domain information services).
- The client library utilizes the OGSA-DAI client toolkit.
- Accounting information is accessed through the Resource Usage Service (RUS).
Figure: on each resource node (A, B, C), a lightweight CIMOM service with CIM providers (OS, processor, file system, job queue, GridVM, performance, Ganglia, chargeable services such as GridVM) publishes into the cell domain information services; aggregator and data services (Java API, RDB, ACL, parallel query) form the information service node used by clients (resource broker etc.), the user/admin viewer, and RUS::insertURs for accounting.

21 NAREGI IS: Standards Employed in the Architecture
User Admin. Viewer Information Service Node Information Service Node Client (OGSA- RSS etc.) Distributed Information Service GT4.0.1 Tomcat Aggregator Service Java-API OGSA-DAI WSRF2.1  Client library CIM spec. CIM/XML CIM Providers APP OGSA-DAI Client toolkit OS APP WS-I RUS RDB Processor Light- weight CIMOM Service Grid VM File System CIM Schema 2.10 /w extension Job Queue ACL Performance GridVM (Chargeable Service) Ganglia  ● Node A Node B Node C RUS::insertURs ... Distributed Query … … Hierarchical filtered aggregation GGF/ UR Client (OGSA- BES etc.) GGF/ UR Cell Domain Information Service Cell Domain Information Service

22 GridVM Features
Platform independence as an OGSA-EMS SC: a WSRF OGSA-EMS Service Container interface for heterogeneous platforms and local schedulers that "extends" Globus 4 WS-GRAM, with job submission using JSDL, job accounting using UR/RUS, and a CIM provider for resource information.
Meta-computing and coupled applications: advance reservation for co-allocation.
Site autonomy: WS-Agreement-based job execution (beta 2) and XACML-based access control of resource usage.
Virtual Organization (VO) management: access control and job accounting based on VOs (VOMS & GGF-UR).
One of the required functionalities of GridVM is platform independence. To this end, GridVM abstracts heterogeneous platforms (operating systems and local schedulers) and provides the other middleware with uniform interfaces to each site's resources; for interoperability, these interfaces and functionalities are based on the relevant standards. One of NAREGI's targeted application classes is multi-physics meta-computing, which requires co-allocation; GridVM provides an advance reservation service as the infrastructure for that. A site is an autonomous domain: its resources should be used under its administrative policy and protected from illegal usage caused deliberately or inadvertently by grid jobs. GridVM is one of the enforcement engines for such site policies: it accepts grid jobs based on agreements reflecting local policies, and at run time the jobs are monitored and constrained under those policies. Another NAREGI requirement is VO hosting: GridVM controls the resource usage of grid jobs according to the VO to which each job belongs and generates job accounting records with a VO attribute.

23 NAREGI GridVM (beta) Architecture
A virtual execution environment on each site: virtualization of heterogeneous resources, with resource and job management services behind a unified I/F. The Super Scheduler performs advance reservation, monitoring, and control; the Information Service receives resource information. Each site (e.g., AIX/LoadLeveler, Linux/PBS Pro) runs a GRAM4 WSRF I/F, GridVM scheduler, local scheduler, and GridVM engine with GridMPI, accounting, sandboxed job execution, and a site policy.

24 NAREGI GridVM: Standards Employed in the Architecture
Between the Super Scheduler and the Information Service: GT4 GRAM integration and WSRF-based extension services; a CIM-based resource information provider; job submission based on JSDL with NAREGI extensions; UR/RUS-based job accounting; an XACML-like access control policy (site policy). Each site stacks a GRAM4 WSRF I/F, GridVM scheduler, local scheduler, and GridVM engine, with GridMPI.

25 GT4 GRAM-GridVM Integration
GridVM is integrated as an extension module to GT4 GRAM, aiming to make both sets of functionality available. In the architecture, blue boxes are GT4 modules, red ones are GridVM modules, and orange ones are adapters between the two. For authentication, authorization, and basic job management (submission, cancellation), GRAM's functionality is used; since NAREGI is based on JSDL, the NAREGI JSDL is carried in the RSL extension at job submission (SS to site: globusrun, RSL+JSDL'). Other functionality, such as advance reservation and sandboxing, is provided by GridVM. Components involved: the GridVMJobFactory and GridVMJob extension services, SUDO and delegation, the GRAM services and GRAM adapter, the GridVM scheduler over PBS Pro, LoadLeveler, etc., the Scheduler Event Generator for the local scheduler, RFT for file transfer requests, and the GridVM engine.

26 Next Steps for WP1 – Beta2 Stability, Robustness, Ease-of-install
Standards-setting core OGSA-EMS: OGSA-RSS, OGSA-BES/ESI, etc.
More supported platforms (VM): SX series, Solaris 8-10, etc.; more batch queues: NQS, N1GE, Condor, Torque.
"Orthogonalization" of the SS, VM, IS, and WSRF components: better, more orthogonal WSRF APIs that minimize shared state (e.g., reservation APIs, event-based notification), allowing mix-and-match of multiple SS/VM/IS/external components, with many benefits.
Robustness; better and more realistic center VO support; better interoperability with external grid MW stacks, e.g., Condor-C.

27 VO and Resources in Beta 2
Decoupling of WP1 components for pragmatic VO deployment RO3 Client Client Client VO-APL2 RO1 VO-APL1 IS SS SS IS RO2 "Peter Arzberger" VO-RO1 IS SS VO-RO2 IS SS IS IS GridVM IS GridVM IS GridVM IS GridVM IS GridVM IS GridVM IS Policy VO-R01 Policy VO-R01 VO-APL1 Policy VO-R01 VO-APL1 VO-APL2 Policy VO-R01 VO-APL1 VO-APL2 Policy VO-R02 VO-APL2 Policy VO-R02 A.RO  B.RO N.RO1 a.RO b.RO n.RO2

28 NAREGI Data Grid beta1 Architecture (WP4)
Grid Workflow Job 1 Job 2 Job n Data Grid Components Data 1 Data 2 Data n Import data into workflow Data Access Management Place & register data on the Grid Job 1 Job 2 Metadata Management Assign metadata to data Grid-wide Data Sharing Service Meta- data Meta- data Job n Meta- data Data Resource Management Data 1 Data 2 Data n Store data into distributed file nodes Currently GFarm v.1.x Grid-wide File System

29 NAREGI WP4: Standards Employed in the Architecture
Workflow (NAREGI WFML =>BPEL+JSDL) Data Access Management Job 1 Job n Import data into workflow Tomcat 5.0.28 Data 1 Data n Place data on the Grid Super Scheduler (SS) (OGSA-RSS) Globus Toolkit 4.0.1 Data Staging OGSA-RSS FTS SC Metadata Construction GGF-SRM (beta2) Computational Nodes GridFTP Data Resource Management OGSA-DAI WSRF2.0 Job 1 Job 2 Job n OGSA-DAI WSRF2.0 PostgreSQL 8.0 PostgreSQL 8.0 Data 1 Data 2 Data n Data Specific Metadata DB Data Resource Information DB Filesystem Nodes Gfarm 1.2 PL4 (Grid FS)

30 NAREGI-beta1 Security Architecture (WP5)
VOMS MyProxy MyProxy+ VOMS Proxy Certificate VOMS Proxy Certificate User Management Server(UMS) VOMS Proxy Certificate User Private Key Client Environment GridVM Super Scheduler Portal WFT SS client NAREGI CA GridVM PSE VOMS Proxy Certificate VOMS Proxy Certificate GVS GridVM Data Grid Grid File System (AIST Gfarm) disk node disk node disk node

31 NAREGI-beta1 Security Architecture WP5-the standards
Standards: a subset of the WebTrust Program for CAs and the Grid CP (GGF CAOPs) as CP/CPS audit criteria; VOMS and MyProxy for VO and certificate management; the Information Service holds resource information including VO info; the NAREGI CA issues certificates; proxy certificates with VO attributes are used throughout (voms-myproxy-init, the Super Scheduler, the GridVM services including GSI, globusrun-ws), together with VO info, execution info, and resource info; the client environment is the Portal (WFT, PSE, GVM, SS client).
Flow:
1. The user requests certificate issuance from the NAREGI CA and obtains a user certificate.
2. The user stores the obtained user certificate in the certificate management server.
3. The user accesses the certificate management server via ssh or the portal and stores a proxy certificate with VO information in the MyProxy server (the VO-attributed proxy certificate is generated using VOMS).
4. The user accesses the portal with a browser; when using a grid application, the VO-attributed proxy certificate is retrieved from the MyProxy server (the passphrase entered here is the one the user set in step 3).
5. The job is signed with the private key contained in the proxy certificate and sent from the grid application to the SS. To avoid delegating directly to the SS (so that the SS does not run with user privileges), the VO-attributed proxy certificate is stored in MyProxy+ (the passphrase is the hash value of the job).
6. The SS computes the hash value of the received job and retrieves the VO-attributed proxy certificate from MyProxy+.

32 VO and User Management Service
Adoption of VOMS for VO management: proxy certificates with VO attributes are used for interoperability with EGEE; GridVM is used instead of LCAS/LCMAPS.
Integration of the MyProxy and VOMS servers into NAREGI through the UMS (User Management Server) realizes one-stop service at the NAREGI Grid Portal; gLite code implemented in the UMS is used to connect to the VOMS server.
MyProxy+ for the Super Scheduler: a special-purpose certificate repository that realizes safe delegation between the NAREGI Grid Portal and the Super Scheduler. The Super Scheduler receives jobs carrying the user's signature, much as UNICORE does, and submits them through the GSI interface.

33 Computational Resource Allocation based on VO
Resource configuration example: hosts pbg1042, png2041, png2040, and pbg2039 with varying CPU counts (1, 2, 4, or 8 CPUs); the same workflow is given a different resource mapping for VO1 and VO2. Different resource mapping for different VOs.

34 Local-File Access Control (GridVM)
Provides VO-based access control functionality that does not use gridmap files. File access is controlled based on a policy specified by a tuple of Subject, Resource, and Action, where Subject is a grid user ID or a VO name. Example policy: Permit: Subject=X, Resource=R, Action=read,write; Deny: Subject=Y, Resource=R, Action=read. (Figure: GridVM maps grid users X and Y, via DN-based access control, onto a local account for resource R.)

35 Structure of Local-File Access Control Policy
The policy schema is organized as follows (attributes marked with +, multiplicities in parentheses):
<GridVMPolicyConfig>
  <AccessControl> (1)
    <AccessProtection> +Default +RuleCombiningAlgorithm
      <AccessRule> (1..*) +Effect: control = permit / deny
        <AppliedTo> (0..1) +TargetUnit
          <Subjects> <Subject> (0..*): who = user / VO
        <Resources> <Resource>: what resource = file / directory
        <Actions> <Action>: access type = read / write / execute

36 Policy Example (1): default and applied rules
<gvmcf:AccessProtection gvmac:Default="Permit"
                        gvmac:RuleCombiningAlgorithm="Permit-overrides">
  <!-- Access Rule 1: for all users -->
  <gvmcf:AccessRule gvmac:Effect="Deny">
    <gvmcf:AppliedTo>
      <gvmac:Subjects> … </gvmac:Subjects>
    </gvmcf:AppliedTo>
    <gvmac:Resources>
      <gvmac:Resource>/etc/passwd</gvmac:Resource>
    </gvmac:Resources>
    <gvmac:Actions> … </gvmac:Actions>
  </gvmcf:AccessRule>
  <!-- Access Rule 2: for a specific user -->
  <gvmcf:AccessRule gvmac:Effect="Permit">
    <gvmcf:AppliedTo gvmcf:TargetUnit="user">
      <gvmcf:Subjects>
        <gvmcf:Subject>User1</gvmcf:Subject>
      </gvmcf:Subjects>
    </gvmcf:AppliedTo>
    <gvmac:Actions>
      <gvmac:Action>read</gvmac:Action>
    </gvmac:Actions>
  </gvmcf:AccessRule>
</gvmcf:AccessProtection>

37 Policy Example (2): VO name and resource names
<gvmcf:AccessRule gvmac:Effect="Permit">
  <gvmcf:AppliedTo gvmcf:TargetUnit="vo">
    <gvmcf:Subjects>
      <gvmcf:Subject>bio</gvmcf:Subject>
    </gvmcf:Subjects>
  </gvmcf:AppliedTo>
  <gvmac:Resources>
    <gvmac:Resource>/opt/bio/bin</gvmac:Resource>
    <gvmac:Resource>./apps</gvmac:Resource>
  </gvmac:Resources>
  <gvmac:Actions>
    <gvmac:Action>read</gvmac:Action>
    <gvmac:Action>execute</gvmac:Action>
  </gvmac:Actions>
</gvmcf:AccessRule>
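For intuition, a permit-overrides evaluation over such rules can be sketched with a simplified in-memory rule representation. This is only an illustration of the combining algorithm named in the policy above (Default="Permit", RuleCombiningAlgorithm="Permit-overrides"), not GridVM's actual implementation; the structure names are invented.

#include <stdio.h>
#include <string.h>

typedef enum { EFFECT_PERMIT, EFFECT_DENY } effect_t;

/* Simplified rule: subject is a grid user ID or VO name ("" = anyone),
 * resource is a path prefix, action is "read", "write" or "execute". */
typedef struct {
    effect_t    effect;
    const char *subject;
    const char *resource;
    const char *action;
} rule_t;

static int rule_matches(const rule_t *r, const char *subject,
                        const char *resource, const char *action)
{
    return (r->subject[0] == '\0' || strcmp(r->subject, subject) == 0)
        && strncmp(r->resource, resource, strlen(r->resource)) == 0
        && strcmp(r->action, action) == 0;
}

/* Permit-overrides with Default="Permit": any matching Permit rule wins;
 * otherwise a matching Deny rule denies; otherwise the default applies. */
static effect_t evaluate(const rule_t *rules, int n, const char *subject,
                         const char *resource, const char *action)
{
    int denied = 0;
    for (int i = 0; i < n; i++) {
        if (!rule_matches(&rules[i], subject, resource, action))
            continue;
        if (rules[i].effect == EFFECT_PERMIT)
            return EFFECT_PERMIT;              /* permit overrides deny */
        denied = 1;
    }
    return denied ? EFFECT_DENY : EFFECT_PERMIT;   /* default: Permit */
}

int main(void)
{
    /* Roughly mirrors Policy Example (1): deny /etc/passwd to everyone,
     * but permit read access for User1. */
    rule_t rules[] = {
        { EFFECT_DENY,   "",      "/etc/passwd", "read" },
        { EFFECT_PERMIT, "User1", "/etc/passwd", "read" },
    };
    effect_t e = evaluate(rules, 2, "User1", "/etc/passwd", "read");
    printf("User1 read /etc/passwd: %s\n",
           e == EFFECT_PERMIT ? "permit" : "deny");
    return 0;
}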

38 VO-based Resource Mapping in the Global File System (beta 2)
Next release of Gfarm (version 2.0) will have access control functionality. We will extend Gfarm metadata server for the data-resource mapping based on VO. VO1 file server file server Gfarm Metadata Server Client VO2 file server file server

39 Current Issues and the Future Plan
Current issues on VO management:
- VOMS platform: gLite runs on GT2, while the NAREGI middleware runs on GT4.
- Authorization control on the resource side: new functions for resource control (Web services, reservation, etc.) need to be implemented in GridVM.
- Proxy certificate renewal: a new mechanism needs to be invented.
Future plans: cooperation with GGF security-area members to realize interoperability with other grid projects; proposal of a new VO management methodology and a trial reference implementation.

40 NAREGI Application Mediator (WP6) for Coupled Applications
Mediator components support data exchange between coupled simulations running as co-allocated jobs (Job 1 … Job n) in a NAREGI WFT workflow (e.g., Simulation A coupled with Simulation B through Mediators).
- Data transfer management: synchronized file transfer with multiple protocols (GridFTP / MPI), driven by SBC*-XML describing the global job ID, allocated nodes, transfer protocol, etc., in cooperation with the Information Service (SQL) and the Super Scheduler.
- Data transformation management: Mediator APIs on each side (Mediator A for Sim. A, Mediator B for Sim. B) with semantic transformation libraries for different simulations, a coupled accelerator, and JNI bindings; built on OGSA-DAI WSRF 2.0 and Globus Toolkit 4.0.1 GridFTP, running on GridVM.
*SBC: storage-based communication

41 NAREGI beta on “VM Grid”
Create a "Grid-on-Demand" environment using Xen and Globus Workspace: a vanilla personal virtual grid/cluster based on our Titech lab's research results, with dynamic deployment of the NAREGI beta image.

42 “VM Grid” – Prototype “S”
Request a number of virtual grid nodes, fill in the necessary info in the form, and follow the instructions on the confirmation page; then SSH-login to an NMI stack (GT2 + Condor) plus selected NAREGI beta middleware (Ninf-G, etc.). A full beta installation is in the works, as is other "Instant Grid" research in the lab.

43 From Interoperation to Interoperability GGF16 “Grid Interoperations Now”
Charlie Catlett, Director, NSF TeraGrid (on vacation); Satoshi Matsuoka, Sub-Project Director, NAREGI Project, Tokyo Institute of Technology / NII

44 Interoperation Activities
- The GGF GIN (Grid Interoperations Now) effort: real interoperation between major Grid projects; four interoperation areas identified: security, data management, information service, and job submission (not scheduling).
- EGEE/gLite – NAREGI interoperation: based on the four GIN areas; several discussions and exchanges, including a 3-day meeting at CERN in mid-March; updates at GGF17 Tokyo next week; some details in my talk tomorrow.

45 The Ideal World: Ubiquitous VO & User Management for International e-Science
Different software stacks, but interoperable. Regional grid infrastructural efforts: Europe (EGEE, UK e-Science, …), US (TeraGrid, OSG; collaborative talks on PMA, etc.), Japan (NII CyberScience with NAREGI, …), and other Asian efforts (GFK, China Grid, etc.). Application VOs span them: HEP Grid VO, Astro IVO, NEES-ED Grid VO. Standardization and commonality in software platforms will realize this.

46 The Reality: Convergence/Divergence of Project Forces (original slide by Stephen Pickles, edited by Satoshi Matsuoka)
A web of projects linked by interoperable-infrastructure talks, common staff and procedures, and common users: EGEE (EU, gLite/GT2), LCG (EU), GridPP (UK), CSI (JP), NAREGI (JP, WSRF & OGSA; beta on GT4/Fujitsu WSRF), DEISA (EU, IBM, Unicore), UniGrids (EU), NGS (UK), OMII (UK, own WSRF & OGSA), Globus (US), OSG (US), Condor (US), NMI (US), TeraGrid (US), APAC Grid (Australia), EU-China Grid (China), and AIST-GTRC; platform labels in the figure also include GT4 WSRF (OGSA?) and WS-I+ & OGSA?, with GGF at the center.

47 GGF Grid Interoperation Now
Started Nov. 17 by Catlett and Matsuoka; now has participation by all major grid projects. "Agreeing to agree on what needs to be agreed first." Identified 4 essential key common services:
- Authentication, authorization, identity management: individuals and communities (VOs)
- Jobs: submission, auditing, tracking; job submission interface, job description language, etc.
- Data management: data movement, remote access, filesystems, metadata management
- Resource discovery and information service: resource description schema, information services

48 “Interoperation” versus “Interoperability”
Interoperability: "the ability of software and hardware on multiple machines from multiple vendors to communicate", based on commonly agreed, documented specifications and procedures.
Interoperation: "just make it work together", whatever it takes; it may be ad hoc, undocumented, and fragile. It is the low-hanging fruit on the way to future interoperability.

49 Interoperation Status
GIN meetings at GGF16 and GGF17, plus a 3-day meeting at CERN at the end of March.
- Security: common VOMS/GSI infrastructure; NAREGI's use of GSI/MyProxy and proxy delegation is more complicated, but should be OK.
- Data: SRM commonality and data catalog integration; Gfarm and dCache consolidation.
- Information Service: CIM vs. GLUE schema differences; monitoring system differences; schema translation (see next slides).
- Job Submission: JDL vs. JSDL, Condor-C/CE vs. OGSA SS/SC-VM architectural differences, etc.; simple job submission only (see next slides).

50 Information Service Characteristics
Basic syntax: resource description schemas (e.g., GLUE, CIM); data representations (e.g., XML, LDIF); query languages (e.g., SQL, XPath); client query interfaces (e.g., WS Resource Properties queries, LDAP, OGSA-DAI).
Semantics: what pieces of data are needed by each Grid (various previous works and actual deployment experience already exist).
Implementation: information service software systems (e.g., MDS, BDII) and the ultimate sources of this information (e.g., PBS, Condor, Ganglia, WS-GRAM, GridVM, various grid monitoring systems, etc.).
At present, different grid projects use different types of information services in terms of resource description schemas, data models, and so on. Many grid projects adopt the GLUE schema to describe resources, but even among those information services the ways the information is represented in GLUE differ, and different data models come with different query languages and client interfaces for retrieving information.

51 NAREGI Information Service
In the NAREGI IS implementation, the resource description is a CIM-based schema, the data model is relational, the query language is SQL, and the client library uses the OGSA-DAI client toolkit with our extensions. We pick up only a small part of CIM Schema version 2.10 as specified by the DMTF and extend classes when needed attributes are not found in them. All collected information is accumulated in distributed RDBs, where a CIM class corresponds to a table and a CIM instance to a record in that table. OGSA-DAI is used to retrieve information; the publisher interface, on the other hand, is proprietary: publishers treat an attribute as a name/value pair of strings and a CIM instance as a set of such attributes. NAREGI IS is a very general IS, so the Resource Usage Service (WS-I RUS, GGF UR) is implemented in the same system, and a job monitoring service will be as well.
Figure (as in slide 21): the user/admin viewer and clients (OGSA-RSS, OGSA-BES, etc.) access the distributed information service (GT4.0.1, Tomcat, aggregator service, Java API, OGSA-DAI WSRF 2.1 client library); cell domain information services hierarchically aggregate filtered data from lightweight CIMOM services with CIM providers (CIM Schema 2.10 with extensions, CIM/XML) on nodes A, B, and C (OS, processor, file system, job queue, performance, GridVM as a chargeable service, Ganglia), with RUS::insertURs for accounting.

52 Relational Grid Monitoring Architecture
An implementation of the GGF Grid Monitoring Architecture (GMA). All data is modelled as tables: a single schema gives the impression of one virtual database for the VO. Components: a producer application publishes tuples through the Producer Service (SQL "INSERT"), which registers with the Registry Service; the Schema Service (SQL "CREATE TABLE") and Vocabulary Manager define the tables; a consumer application queries through the Consumer Service (SQL "SELECT"), which locates producers via the Mediator, sends the query, and receives tuples.

53 Syntax Interoperability Matrix
Grid | Schema | Data model | Query language | Client I/F | Software
TeraGrid | GLUE | XML | XPath | WSRF RP queries | MDS4
OSG | | LDIF | LDAP | | BDII
NAREGI | CIM 2.10 + ext | relational | SQL | OGSA-DAI, WS-I RUS | CIMOM + OGSA-DAI
EGEE/LCG | | | | R-GMA i/f | R-GMA
NorduGrid (ARC) | | | | | GIIS
This shows the interoperability matrix in terms of basic syntax and software implementation. Three of the five are interoperable, but their differences from the others are bigger than the differences among the three, so it is impossible to make them all equal on basic syntax in the short term.

54 Low Hanging Fruit “Just make it work by GLUEing”
- Identify the minimum common set of information required for interoperation in the respective information services.
- Employ GLUE and extended CIM as the base schemas for the respective grids.
- Each grid's information service acts as an information provider for the other.
- Embed a schema translator to perform the schema conversion.
- Present data in a common fashion on each grid: WebMDS, NAREGI CIM Viewer, SCMSWeb, …

55 Minimal Common Attributes
Define the minimal common set of attributes required; each system component in a grid then accesses only the translated information. (Figure: the common attributes sit between the two grids' stacks of ACS, CSG, provisioning, IS, EPS, reservation, SC, DC, JM, and accounting components.)

56 Multi-Grid Information Service Node: GLUE→CIM Translation
Development of information providers that translate from the GLUE data model to CIM for selected common attributes, such as the up/down status of grid services. On the NAREGI side, a cell domain information service for EGEE resources uses a GLUE→CIM translator feeding a CIM provider skeleton and the lightweight CIMOM (with a GLUE-CIM mapping of the selected minimal attributes, alongside providers such as System, OS, NRG_Unicore, and Account), aggregated via OGSA-DAI into the RDB of the CDIS for NAREGI. On the gLite / R-GMA side, a NAREGI CIM→GLUE producer publishes tuples (SQL "INSERT" into the Producer Service, registration with the Registry Service, schema via SQL "CREATE TABLE"); consumers locate data through the Mediator, send queries (SQL "SELECT"), and receive tuples via the Consumer Service.
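At its core the translator is a table-driven renaming of the selected minimal attributes from the GLUE data model into the (extended) CIM schema. The sketch below is hypothetical: the GLUE names are real GLUE 1.x CE attributes, but the CIM-side class and property names are invented stand-ins for NAREGI's extended schema, and the SQL it prints only hints at where values would land in the relational CIM store.

#include <stdio.h>
#include <string.h>

/* Hypothetical mapping table for a few minimal common attributes.
 * The CIM targets are placeholder names, not the real NAREGI schema. */
typedef struct {
    const char *glue_attr;     /* attribute name in the GLUE data model  */
    const char *cim_class;     /* target CIM class (placeholder)         */
    const char *cim_property;  /* target CIM property (placeholder)      */
} glue_cim_map_t;

static const glue_cim_map_t mapping[] = {
    { "GlueCEStateStatus",    "NRG_ComputeService", "OperationalStatus" },
    { "GlueCEStateFreeCPUs",  "NRG_ComputeService", "FreeCPUs"          },
    { "GlueCEStateTotalJobs", "NRG_JobQueue",       "TotalJobs"         },
    { "GlueCEInfoHostName",   "NRG_ComputeService", "HostName"          },
};

/* Translate one (name, value) pair, printing an SQL-ish INSERT to show
 * the destination table/column; unknown attributes are simply dropped. */
static int translate(const char *glue_attr, const char *value)
{
    for (size_t i = 0; i < sizeof(mapping) / sizeof(mapping[0]); i++) {
        if (strcmp(mapping[i].glue_attr, glue_attr) == 0) {
            printf("INSERT INTO %s (%s) VALUES ('%s');\n",
                   mapping[i].cim_class, mapping[i].cim_property, value);
            return 0;
        }
    }
    return -1;   /* outside the minimal common set */
}

int main(void)
{
    /* Example GLUE tuples as they might arrive from the EGEE side. */
    translate("GlueCEStateStatus", "Production");
    translate("GlueCEStateFreeCPUs", "128");
    translate("GlueCEInfoHostName", "ce01.example.org");
    return 0;
}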

57 Interoperability: NAREGI Short Term Policy
gLite: simple/single jobs (up to SPMD), bi-directional submission; NAREGI → gLite via GT2-GRAM, gLite → NAREGI via Condor-C; exchange of resource information.
GIN: NAREGI → GIN submission via WS-GRAM; BES/ESI TBD, since the status is somewhat confusing (is ESI middleware already developed? Globus 4.X and/or UnicoreGS?).

58 Job Submission Standards Comparison: Goals
The original table marks which of ESI, BES, NAREGI SS, and SC(VM) satisfy each goal. The goals compared are: use of JSDL (*1: extended), WSRF, OGSA Base Profile 1.0 platform, job management service, extensible support for resource models, reliability, use of WS-RF modeling conventions, use of WS-Agreement, advance reservation, bulk operations, generic management frameworks (WSDM), definition of alternative renderings, and server-side workflow.

59 Job Factory Operations
Job factory operations compared across ESI (0.6), BES (Draft v16), and NAREGI (beta):
- Original operations: CreateManagedJob (with a subscribe option); CreateActivityFromJSDL, GetActivityStatus, RequestActivityStateChanges, StopAcceptingNewActivities, StartAcceptingNewActivities, IsAcceptingNewActivities, GetActivityJSDLDocuments; MakeReservations, CommitReservations
- WS-ResourceProperties: GetResourceProperty, GetMultipleResourceProperties, QueryResourceProperties, SetResourceProperties
- WS-ResourceLifeTime: ImmediateResourceDestruction, ScheduledResourceDestruction, Destroy, SetTerminationTime
- WS-BaseNotification: NotificationProducer, Notify, Subscribe
Now working with Dave Snelling et al. to converge on the ESI API (which is similar to WS-GRAM), plus advance reservation and bulk submission.

60 Interoperation: gLite-NAREGI
Information: gLite-IS [GLUE] feeds the IS bridge (GLUE → CIM) into NAREGI-IS [CIM].
Job submission (gLite → NAREGI): a gLite user submits via gLite-WMS [JDL]; Condor-C hands the job to the gLite-to-NAREGI bridge (NAREGI client library, ClassAd → JSDL, NAREGI-WF generation, job submission/control, status propagation, certification?), which drives NAREGI-SS [JSDL] and the NAREGI GridVM.
Job submission (NAREGI → gLite): a NAREGI user submits via the NAREGI Portal to NAREGI-SS [JSDL]; alongside the regular NAREGI-SC, an Interop-SC (JSDL → RSL, job submission/control, status propagation) submits to the gLite-CE through GT2-GRAM.

61 Interoperation: GIN (Short Term)
Information: anotherGrid-IS [CIM or GLUE] feeds the IS bridge (GLUE → CIM) into NAREGI-IS [CIM].
Job submission: a NAREGI user submits via the NAREGI Portal to NAREGI-SS [JSDL]; the NAREGI-SC drives the NAREGI GridVM through WS-GRAM, while an Interop-SC (JSDL → RSL, job submission/control, status propagation) submits to anotherGrid-CE through WS-GRAM (the other grid [JSDL] being GT4-based). Sorry!! One way only for now: the other grid's users cannot yet submit into NAREGI.


Download ppt "NAREGI Middleware Beta 1 and Beyond"

Similar presentations


Ads by Google