TEXAS ADVANCED COMPUTING CENTER Grids: TACC Case Study Ashok Adiga, Ph.D. Distributed & Grid Computing Group Texas Advanced Computing Center The University of Texas at Austin

1 TEXAS ADVANCED COMPUTING CENTER Grids: TACC Case Study Ashok Adiga, Ph.D. Distributed & Grid Computing Group Texas Advanced Computing Center The University of Texas at Austin (512)

2 Outline
Overview of TACC Grid Computing Activities
Building a Campus Grid – UT Grid
Addressing common Use Cases
–Scheduling & Workflow
–Grid Portals
Conclusions

3 TACC Grid Program
Building Grids at TACC
–Campus Grid (UT Grid)
–State Grid (TIGRE)
–National Grid (ETF)
Grid Hardware Resources
–Wide range of hardware resources available to research community at UT and partners
Grid Software Resources
–NMI Components, NPACKage
–User Portals, GridPort
–Job schedulers: LSF MultiCluster, Community Scheduler Framework
–United Devices (desktop grids)
Significantly leveraging NMI Components and experience

4 TACC Resources: Providing Comprehensive, Balanced Capabilities
HPC
–Cray-Dell cluster: 600 CPUs, 3.67 Tflops, 0.6 TB memory, 25 TB disk
–IBM Power4 system: 224 CPUs, 1.16 Tflops, 0.5 TB memory, 7.1 TB disk
Data storage
–Sun SAN: 12 TB across research and main campuses
–STK PowderHorn silo: 2.8 PB capacity
Visualization
–SGI Onyx2: 24 CPUs, 25 GB memory, 6 IR2 graphics pipes
–Sun V880z: 4 CPUs, 2 Zulu graphics pipes
–Dell/Windows cluster: 18 CPUs, 9 NVIDIA NV30 cards (soon)
–Large immersive environment and 10 large, tiled displays
Networking
–Nortel 10GigE DWDM: between machine room and vislab bldg.
–Force10 switch-routers: 1.2 Tbps, in machine room and vislab bldg.
–TeraBurst V20s: OC48 video capability for remote, collaborative 3D visualization

5 TeraGrid (National)
NSF Extensible Terascale Facility (ETF) project
–Build and deploy the world's largest, fastest, distributed computational infrastructure for general scientific research
–Current Members:
 San Diego Supercomputer Center, NCSA, Argonne National Laboratory, Pittsburgh Supercomputing Center, California Institute of Technology
–Currently has 40 Gbps backbone with hubs in Los Angeles & Chicago
–3 New Members added in September 2003
 The University of Texas (led by TACC)
 Oak Ridge National Laboratory
 Indiana U/Purdue U

6 TeraGrid (National)
UT awarded $3.2M to join NSF ETF in September 2003
–Establish 10 Gbps network connection to ETF backbone
–Provide access to high-end computers capable of 6.2 teraflops, a new terascale visualization system, and a 2.8-petabyte mass storage system
–Provide access to geoscience data collections used in environmental, geological, climate, and biological research:
 high-resolution digital terrain data
 worldwide hydrological data
 global gravity data
 high-resolution X-ray computed tomography data
Current software stack includes: Globus (GSI, GRAM, GridFTP), MPICH-G2, Condor-G, GPT, MyProxy, SRB
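
To put the 10 Gbps connection and the 2.8-petabyte archive in proportion, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 TB = 10^12 bytes) and an idealised link with no protocol overhead; the `efficiency` parameter is a hypothetical knob for de-rating the line rate, not a figure from the slides.

```python
def transfer_seconds(size_bytes: float, link_bits_per_second: float,
                     efficiency: float = 1.0) -> float:
    """Idealised time to move size_bytes over a link.

    efficiency is an assumed fraction of the line rate actually achieved.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return size_bytes * 8 / (link_bits_per_second * efficiency)


TEN_GBPS = 10e9          # link to the ETF backbone, from the slide
ONE_TB = 1e12            # assumed decimal terabyte
ARCHIVE_BYTES = 2.8e15   # 2.8 PB mass storage capacity, from the slide

one_tb_minutes = transfer_seconds(ONE_TB, TEN_GBPS) / 60
archive_days = transfer_seconds(ARCHIVE_BYTES, TEN_GBPS) / 86400

print(f"1 TB at line rate: about {one_tb_minutes:.1f} minutes")
print(f"Full 2.8 PB at line rate: about {archive_days:.1f} days")
```

Under these assumptions a terabyte moves in minutes, while the full archive would take weeks, which is one reason to schedule work near the data rather than moving everything.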

7 TIGRE (State-wide Grid)
Texas Internet Grid for Research and Education
–Computational grid to integrate computing & storage systems, databases, visualization laboratories and displays, and instruments and sensors across Texas
–Current TIGRE participants:
 Rice
 Texas A&M
 Texas Tech University
 Univ of Houston
 Univ of Texas at Austin (TACC)
–Grid software for TIGRE Testbed:
 Globus, MPICH-G2, NWS, SRB
 Other local packages must be integrated
 Goal: track NMI GRIDS

8 UT Grid (Campus Grid)
Mission: integrate and simplify the usage of the diverse computational, storage, visualization, data, and instrument resources of UT to facilitate new, powerful paradigms for research and education.
UT Austin Participants:
–Texas Advanced Computing Center (TACC)
–Institute for Computational Engineering & Sciences (ICES)
–Information Technology Services (ITS)
–Center for Instructional Technologies (CIT)
–College of Engineering (COE)

9 What is a Campus Grid?
Important differences from enterprise grids
–Researchers generally more independent than in company with tight focus on mission, profits
–No central IT group governs researchers’ systems
 paid for out of grants, so distributed authority
 owners of PCs, clusters have total control: they reconfigure and participate only if willing
–Lots of heterogeneity; lots of low-cost, poorly-supported systems
–Accounting potentially less important
 Focus on increasing research effectiveness allows tackling problems early (scheduling, workflow, etc.)

10 UT Grid: Approach
Unique characteristics present opportunities
–Some campus researchers want to be on bleeding edge, unlike commercial enterprises
–TACC provides high-end systems that researchers require
–Campus users have trust relationships initially with TACC, but not each other
How to build a campus grid:
–Build a hub & spoke grid first
–Address both productivity and grid R&D

11 UT Grid: Logical View
1. Integrate distributed TACC resources first (Globus, LSF, NWS, SRB, United Devices, GridPort)
[Diagram: TACC HPC, Vis, Storage (actually spread across two campuses)]

12 UT Grid: Logical View
2. Next add other UT resources in one bldg. as spoke using same tools and procedures
[Diagram: TACC HPC, Vis, Storage; ICES Cluster]

13 UT Grid: Logical View
2. Next add other UT resources in one bldg. as spoke using same tools and procedures
[Diagram: TACC HPC, Vis, Storage; ICES Cluster; GEO Cluster]

14 UT Grid: Logical View
2. Next add other UT resources in one bldg. as spoke using same tools and procedures
[Diagram: TACC HPC, Vis, Storage; ICES Cluster; GEO Cluster; BIO Cluster; PGE Cluster]

15 UT Grid: Logical View
3. Finally negotiate connections between spokes for willing participants to develop a P2P grid.
[Diagram: TACC HPC, Vis, Storage; ICES Cluster; GEO Cluster; BIO Cluster; PGE Cluster]
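
The three-step build-out (TACC hub first, then spokes, then peer links between willing spokes) can be sketched as a small graph model. This is an illustration only: the cluster names come from the diagram labels, but which spokes are "willing" is a made-up assumption, and the function names are hypothetical.

```python
from itertools import combinations

HUB = "TACC"
SPOKES = ["ICES", "GEO", "BIO", "PGE"]  # labels taken from the slides


def hub_and_spoke(hub, spokes):
    """Steps 1-2: every spoke gets exactly one link, to the hub."""
    return {frozenset((hub, spoke)) for spoke in spokes}


def add_peer_links(links, willing):
    """Step 3: add a direct link between every pair of willing spokes."""
    return links | {frozenset(pair) for pair in combinations(willing, 2)}


step2 = hub_and_spoke(HUB, SPOKES)
# Assumption for illustration: three of the four spokes agree to peer.
step3 = add_peer_links(step2, ["ICES", "GEO", "PGE"])

print(len(step2), "links in the hub & spoke grid")
print(len(step3), "links after willing spokes peer with each other")
```

The hub & spoke stage needs only one trust relationship per spoke, which matches the earlier observation that campus users initially trust TACC but not each other; the number of peer links grows quadratically with the number of willing spokes.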

16 UT Grid: Physical View
[Diagram: network layout spanning the research campus and the main campus. Labels: TACC PWR4, TACC Vis, NOC, Ext nets, GAATN, ACES, CMS, TACC Storage, TACC Cluster (two), ICES Cluster, PGE Cluster, PGE, and several switches]

17 UT Grid: Focus
Address users interested only in increased productivity
–Some users just want to be more productive with TACC resources and their own (and others): scheduling throughput, data collections, workflow
–Install ‘lowest common denominator’ software only on TACC production resources, user spokes for productivity: Globus 2.x, GridPort 2.x, WebSphere, LSF MultiCluster, SRB, NWS, United Devices, etc.

18 UT Grid: Focus
Address users interested in grid R&D issues
–Some users want to conduct grid-related R&D: grid scheduling, performance modeling, meta-applications, P2P storage, etc.
–Also install bleeding-edge software to support grid R&D on TACC testbed and willing spoke systems: Globus 3.0 and other OGSA software, GridPort 3.x, Community Scheduler Framework, etc.

19 Scheduling & Workflow
Use Case: Researcher wants to run climate modeling job on a compute cluster and view results using a specified visualization resource
Grid middleware requirements:
–Schedule job to “best” compute cluster
–Forward results to specified visualization resource
–Support advanced reservations on vis. resource
Currently solved using LSF MultiCluster & Globus (GSI, GridFTP, GRAM)
Evaluating CSF meta-scheduler for future use
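
The three middleware requirements in this use case can be illustrated with a toy planner. This is a sketch under stated assumptions, not how LSF MultiCluster or CSF decide: the "best cluster" policy (shortest queue among clusters with enough free CPUs), the cluster names, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Cluster:
    name: str         # hypothetical cluster label
    free_cpus: int    # CPUs idle right now
    queued_jobs: int  # jobs already waiting


def pick_best_cluster(clusters: List[Cluster], cpus_needed: int) -> Optional[Cluster]:
    """Assumed policy: among clusters with enough free CPUs, prefer the
    shortest queue, breaking ties by the larger number of free CPUs."""
    eligible = [c for c in clusters if c.free_cpus >= cpus_needed]
    if not eligible:
        return None
    return min(eligible, key=lambda c: (c.queued_jobs, -c.free_cpus))


def window_is_free(booked: List[Tuple[int, int]], start: int, end: int) -> bool:
    """True if [start, end) overlaps none of the booked [s, e) windows (hours)."""
    return all(end <= s or start >= e for s, e in booked)


def plan(clusters, cpus_needed, vis_booked, vis_window):
    """Return the ordered workflow steps, or None if no plan is possible."""
    cluster = pick_best_cluster(clusters, cpus_needed)
    if cluster is None or not window_is_free(vis_booked, *vis_window):
        return None
    return [
        f"reserve vis resource for hours {vis_window[0]}-{vis_window[1]}",
        f"run climate job on {cluster.name}",
        "forward results to vis resource",
    ]


clusters = [
    Cluster("cluster-a", free_cpus=64, queued_jobs=12),
    Cluster("cluster-b", free_cpus=128, queued_jobs=3),
    Cluster("cluster-c", free_cpus=16, queued_jobs=0),
]
steps = plan(clusters, cpus_needed=32, vis_booked=[(9, 11)], vis_window=(13, 15))
print(steps)
```

Reserving the visualization window before running the job reflects the requirement that the results have somewhere to go once the compute step finishes.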

20 What is CSF?
CSF (Community Scheduler Framework):
 Open source meta-scheduler framework contributed by Platform Computing to Globus for possible inclusion in the Globus Toolkit
 Developed with the latest version of OGSI – grid guideline being developed with Global Grid Forum (OGSA)
 Extensible framework for implementing meta-schedulers
–Supports heterogeneous workload execution software (LSF, PBS, SGE)
 Negotiate advanced reservations (WS-Agreement)
 Select best resource for a given job based on specified policies
–Provides standard API to submit and manage jobs
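
The idea of one standard submission API in front of heterogeneous resource managers can be sketched with an adapter pattern. Everything below is hypothetical: the class and method names are invented for illustration and are not the CSF or Globus Toolkit API, and the adapter only records jobs in memory instead of talking to LSF, PBS, or SGE.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ResourceManagerAdapter(ABC):
    """Hypothetical adapter interface; one subclass per local scheduler."""

    @abstractmethod
    def submit(self, command: str) -> str:
        """Hand the command to the local scheduler and return a job id."""


class InMemoryAdapter(ResourceManagerAdapter):
    """Stand-in for a real LSF/PBS/SGE adapter: records jobs, runs nothing."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self.jobs: List[str] = []

    def submit(self, command: str) -> str:
        self.jobs.append(command)
        return f"{self.prefix}-{len(self.jobs)}"


class MetaScheduler:
    """Single submission API in front of heterogeneous resource managers."""

    def __init__(self) -> None:
        self.adapters: Dict[str, ResourceManagerAdapter] = {}

    def register(self, resource: str, adapter: ResourceManagerAdapter) -> None:
        self.adapters[resource] = adapter

    def submit(self, command: str, resource: str) -> str:
        if resource not in self.adapters:
            raise KeyError(f"no adapter registered for {resource!r}")
        return self.adapters[resource].submit(command)


scheduler = MetaScheduler()
scheduler.register("lsf-cluster", InMemoryAdapter("lsf"))
scheduler.register("pbs-cluster", InMemoryAdapter("pbs"))

job_ids = [
    scheduler.submit("climate_model", "lsf-cluster"),
    scheduler.submit("post_process", "pbs-cluster"),
    scheduler.submit("climate_model_rerun", "lsf-cluster"),
]
print(job_ids)
```

Callers see the same `submit` call regardless of the scheduler behind each resource, so supporting a new workload manager in this sketch means writing one more adapter subclass.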

21 Example CSF Configuration
[Diagram: two virtual organizations, VO A and VO B, each with a Queuing Service, a GT3.0 Job Service, a Reservation Service, and a CA. GT3.0 RM Adapters for PBS and for LSF connect the VOs to the underlying PBS and LSF resource managers]

22 Grid Portals
Use Case: Researcher logs on using a single grid portal account which enables her to
–Be authenticated across all resources on the grid
–Submit and manage job sequences on the entire grid
–View account allocations and usage
–View current status of all grid resources
–Transfer files between grid resources
GridPort provides base services used to create customized portals (e.g. HotPages). Technologies:
–Security: GSI, SSH, MyProxy
–Job Execution: GRAM Gatekeeper
–Information Services: MDS, NWS, Custom information scripts
–File Management: GridFTP
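
For readers who have not used the underlying tools, the sketch below lists command lines that roughly correspond to three of the portal functions: retrieving a credential from MyProxy, running a job through a GRAM gatekeeper, and copying a file with GridFTP. It only builds the argument lists and does not execute anything. The host names, user name, and paths are hypothetical placeholders, the slides do not show GridPort's own API, and `myproxy-logon` is the current name of the client (older releases called it `myproxy-get-delegation`).

```python
from typing import List


def portal_session_commands(user: str, myproxy_host: str, gatekeeper: str,
                            executable: str, remote_url: str,
                            local_url: str) -> List[List[str]]:
    """Build (but do not run) one command line per portal function."""
    return [
        # Security: fetch a delegated proxy credential from MyProxy.
        ["myproxy-logon", "-s", myproxy_host, "-l", user],
        # Job execution: run an executable via the GRAM gatekeeper.
        ["globus-job-run", gatekeeper, executable],
        # File management: copy the result with GridFTP.
        ["globus-url-copy", remote_url, local_url],
    ]


# All values below are placeholders, not real TACC endpoints.
commands = portal_session_commands(
    user="researcher",
    myproxy_host="myproxy.example.org",
    gatekeeper="cluster.example.org",
    executable="/bin/hostname",
    remote_url="gsiftp://cluster.example.org/tmp/result.dat",
    local_url="file:///tmp/result.dat",
)
for argv in commands:
    print(" ".join(argv))
```

A portal hides these steps behind a single login, which is the point of the use case: the researcher authenticates once and the portal drives the grid services on her behalf.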

24 GridPort Application Portals
UT/Texas Grids:
–http://gridport.tacc.utexas.edu
–http://tigre.hipcat.net
NPACI/PACI/TeraGrid HotPages
–https://hotpage.npaci.edu
–http://hotpage.teragrid.org
–https://hotpage.paci.org
Telescience/BIRN (Biomedical Informatics Research Network)
–https://gridport.npaci.edu/Telescience
DOE Fusion Grid Portal
Will use GridPort-based portal to run scheduling experiments using portals and CSF at upcoming Supercomputing 2003
Contributing and founding member of NMI Portals Project:
–Open Grid Computing Environments (OGCE)

25 Conclusions
Grid technologies progressing & improving but still ‘raw’
–Cautious outreach to campus community
–UT campus grid under construction, working with beta users now
Computational Science problems have not changed:
–Users want easier tools, familiar user environments (e.g. command line) or easy portals
Workflow appears to be desirable tool:
–GridFlow/GridSteer Project under way
–Working with advanced file mgmt and scheduling to automate distributed tasks

26 TACC Grid Computing Activities Participants
Participants include most of the TACC Distributed & Grid Computing Group:
–Ashok Adiga
–Jay Boisseau
–Maytal Dahan
–Eric Roberts
–Akhil Seth
–Mary Thomas
–Tomislav Urban
–David Walling
–As of Dec. 1, Edward Walker (formerly of Platform Computing)

