
UT Grid Project Jay Boisseau, Texas Advanced Computing Center SURA Grid Application Planning & Implementations Workshop December 7, 2005.


1 UT Grid Project Jay Boisseau, Texas Advanced Computing Center SURA Grid Application Planning & Implementations Workshop December 7, 2005

2 Outline
Overview
–Vision
–Strategy
–Approach
Current Project Status, Near-Term Goals
–UT Grid Production Compute Resources
 Roundup
 Rodeo
–Interfaces to production resources:
 Grid User Portal
 Grid User Node
–Tools to support resources:
 GridPort
 GridShell
 Metascheduling Prediction Services
Future Work and Plans

3 UT Grid Vision: A Powerful, Flexible, and Simple Virtual Environment for Research & Education
The UT Grid project vision is to create a cyberinfrastructure for research and education in which people can develop and test ideas, collaborate, teach, and learn through applications that seamlessly harness diverse campus compute, visualization, storage, data, and instrument resources as needed, from their personal systems (PCs) and familiar interfaces (web browsers, GUIs, etc.).

4 UT Grid: Develop and Provide a Unique, Comprehensive Cyberinfrastructure…
The strategy of the UT Grid project is to integrate…
–common security/authentication
–scheduling and provisioning
–aggregation and coordination
…diverse campus resources…
–computational (PCs, servers, clusters)
–storage (local HDs, NASes, SANs, archives)
–visualization (PCs, workstations, displays, projection rooms)
–data collections (sci/eng, social sciences, communications, etc.)
–instruments & sensors (CT scanners, telescopes, etc.)
…from ‘personal scale’ to terascale…
–personal laptops and desktops
–department servers and labs
–institutional (and national) high-end facilities

5 …That Provides Maximum Opportunity & Capability for Impact in Research, Education
…into a campus cyberinfrastructure…
–evaluate existing grid computing technologies
–develop new grid technologies
–deploy and support appropriate technologies for production use
–continue evaluation, R&D on new technologies
–share expertise, experiences, software & techniques
…that provides simple access to all resources…
–through web portals
–from personal desktop/laptop PCs, via custom CLIs and GUIs
…to the entire community for maximum impact on…
–computational research in applications domains
–educational programs
–grid computing R&D

6 Add Services Incrementally, Driven By User Requirements

7 Texas Two-Step: Hub & Spoke Approach
Deploying a P2P campus grid requires overcoming two trust issues:
–grid software: reliability, security, and performance
–each other: trusting other users not to abuse one’s resources
An advanced computing center presents an opportunity to build a centrally managed grid as a step toward a P2P grid:
–it already has trust relationships with users
–so, while both trust issues remain, install grid software centrally first
 create centrally managed services
 create spokes from the central hub
–then, once the grid software is trusted
 show usage and capability data to demonstrate the opportunity
 show policies and procedures to ensure fairness
 negotiate spokes among willing participants

8 UT Grid: Logical View
Integrate a set of resources (clusters, storage systems, etc.) within TACC first.
[Diagram: TACC Compute, Vis, Storage, Data (actually spread across two campuses)]

9 UT Grid: Logical View
Next add other UT resources using the same tools and procedures.
[Diagram: TACC Compute, Vis, Storage, Data; ACES Cluster; ACES PCs; ACES Data]

10 UT Grid: Logical View
Next add other UT resources using the same tools and procedures.
[Diagram: TACC Compute, Vis, Storage, Data; ACES Cluster; ACES PCs; ACES Data; GEO Cluster; GEO Data]

11 UT Grid: Logical View
Next add other UT resources using the same tools and procedures.
[Diagram: TACC Compute, Vis, Storage, Data; ACES Cluster; ACES PCs; ACES Data; GEO Cluster; GEO Data; BIO Data; BIO Instrument; PGE Cluster; PGE Data; PGE Instrument]

12 UT Grid: Logical View
Finally negotiate connections between spokes for willing participants to develop a P2P grid.
[Diagram: TACC Compute, Vis, Storage, Data; ACES Cluster; ACES PCs; ACES Data; GEO Cluster; GEO Data; BIO Data; BIO Instrument; PGE Cluster; PGE Data; PGE Instrument]

13 Enhancing Grid Computing R&D and Deployment Expertise for UT and for IBM
Benefits for IBM
–Increased knowledge of diverse grid user and application requirements in universities
–Access to new software technologies developed for UT Grid
–Early awareness of new distributed & grid computing R&D opportunities
–Exposure & expertise in a variety of grid technologies, open source & commercial, which can be shared internally
–Experience gained from maintaining a large distributed production grid
–Collaboration with UT in conducting new distributed & grid computing R&D activities, including publications, proposals
–Exposure among TACC’s collaborators and peers for expertise in grid deployment services, capabilities

14 Enhancing Grid Computing R&D and Deployment Expertise for UT and for IBM
Benefits for UT Austin
–greater access to all resources by the entire community
–more effective utilization of existing and future resources
–unique capabilities presented by access, aggregation, coordination for research, education
–enhanced collaborative capabilities among researchers, and among teachers & students
Additional Benefits for TACC
–increased expertise in grid deployment issues
–early awareness of new distributed & grid computing R&D opportunities
–platform for conducting new distributed & grid computing R&D activities

15 Enhancing Grid Computing R&D and Deployment Expertise for UT and for IBM
Benefits for TACC Partners
–UT Grid-supported technologies being integrated into TeraGrid: GridPort/user portal, GridShell/user node, etc.
–Expertise being developed in scheduling will be used in TeraGrid
–UT Grid developments will be used in TIGRE and SURA Grid
 by TACC partners in the UT System, HiPCAT, U.S., Latin America
 by TACC industrial partners
Benefits for Community
–UT Grid producing IBM DeveloperWorks articles
–UT Grid R&D will produce professional papers in Year 2 (and proposals)

16 TACC Grid Technology & Deployment Activities Provide Synergy Through Tech Transfer
UT Grid
–creating new tools for integrating compute, vis, storage and data across campus, from ‘personal scale’ to terascale
–will exchange tools, experiences with TeraGrid & TIGRE to advance both and be interoperable with each
TeraGrid
–will utilize & promote UT Grid user portal & user node technologies, and scheduling & workflow results
–will provide grid visualization and data collection services to UT Grid, benefiting TACC and IBM
TIGRE
–will utilize, promote UT Grid results and expertise to other state institutions, including industry
–will provide additional experiences with UT Grid technologies from users across the state, helping to refine the technologies

17 UT Grid Compute Resources
PCs and workstations
–Roughly 1/2 are Windows on Intel/AMD and 1/3 are Macs
–Most of the rest are Linux on Intel/AMD
Networks of PCs and workstations
–Roundup: United Devices-managed network of PCs
 Non-dedicated, heterogeneous compute resources across campus
 Some managed by TACC, ITS, or other departments; some individually managed
 Windows, Linux & Mac desktop PCs
–Rodeo: Condor-managed network of PCs
 Dedicated & non-dedicated, heterogeneous compute resources
 Some managed by TACC, ITS, or other departments; some individually managed
 Linux, Windows & Mac PCs, plus some workstations
Clusters
–Lonestar: 1024-processor Linux cluster at TACC
–Wrangler: 656-processor Linux cluster at TACC
–Longhorn: 128 processors in 4-way IBM p655 nodes at TACC
–Other smaller clusters at TACC
–Various department/lab clusters from 4 to 128+ processors will be included
–Resources have different resource managers (LSF, PBS, SGE)
High-end servers
–Longhorn: IBM system with 32 Power4 processors, 128 GB memory
–Maverick: Sun system with 64 dual-core UltraSPARC IV processors, 512 GB memory
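Because the clusters above run different resource managers (LSF, PBS, SGE), any tool that spans them must dispatch to the right native submit command. A minimal sketch of that mapping, with cluster-to-manager assignments that are illustrative rather than actual UT Grid configuration:

```python
# Illustrative sketch: dispatch to each cluster's native batch-submit command.
# The cluster names and manager assignments below are assumptions, not the
# real UT Grid inventory.

SUBMIT_COMMANDS = {
    "LSF": "bsub",  # Load Sharing Facility
    "PBS": "qsub",  # Portable Batch System
    "SGE": "qsub",  # Sun Grid Engine (also 'qsub', but with different options)
}

CLUSTERS = {
    "lonestar": "LSF",
    "wrangler": "PBS",
    "dept-lab": "SGE",
}

def submit_command(cluster: str) -> str:
    """Return the native batch-submit command for a named cluster."""
    return SUBMIT_COMMANDS[CLUSTERS[cluster]]
```

A metascheduler or shell façade would wrap this lookup so users never see the per-cluster differences.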

18 Interfaces and Tools for these Resources
For a broad, diverse campus community, access must be easy and from local resources.
Users access the Grid User Portal with a standard web browser
–Grid User Portal submits to Rodeo via SOAP
 UT Grid Condor Web Services layer developed to facilitate this
 Condor portlet is part of the GridPort 4 release
–Grid User Portal submits to Roundup via Hosted Applications
Users access the Grid User Node with SSH
–Grid User Node submits to Rodeo via GridShell
 GridShell provides a command line interface through a shell façade
 Abstracts the user from underlying grid technology and complexity
 Submits to a specific resource or determines the most appropriate resource using catalog services
–Grid User Node submits to Roundup
 Batch job submission supported via GridShell
 CLI for submitting hosted application jobs
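The portal-to-Rodeo path above goes through SOAP. A sketch of how a portal backend might construct such a request; the `submitJob` operation name and element layout are illustrative assumptions, not the actual UT Grid Condor Web Services WSDL:

```python
# Sketch: building a SOAP 1.1 envelope for a hypothetical "submitJob"
# operation on a Condor web-services layer. Only the envelope/body structure
# is standard SOAP; the operation and child elements are assumptions.
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

def build_submit_envelope(executable: str, arguments: str) -> bytes:
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    submit = ET.SubElement(body, "submitJob")  # hypothetical operation name
    ET.SubElement(submit, "executable").text = executable
    ET.SubElement(submit, "arguments").text = arguments
    return ET.tostring(envelope, encoding="utf-8")
```

The resulting bytes would be POSTed to the service endpoint with an appropriate `SOAPAction` header; the point here is only the envelope shape.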

19 Accessing UT Grid Compute Resources
Hosted User Nodes & Portals

20 Current Status & Near Term Goals

21 Roundup

22 Roundup: Current Status
Roundup is a production UT Grid resource
–Production system with over 1000 PCs distributed across campus
–Automated account request and creation
–Production-level consulting
–Comprehensive user guide
–Training classes offered at TACC
Client downloads available for Windows, Mac, Linux from the UT Grid web site
Hosted applications installed
–HMMer, BLAST, POV-Ray, Coorset, etc.

23 Roundup: Next Steps
Near-term goals (next few months):
–Support additional production users
–GSI integration
 United Devices Grid MP has the capability for multiple authentication schemes
 Need to add an extension to support GSI
–Evaluate the MP Insight data warehousing and report generation package
–Test and evaluate the screen saver feature and start development of a UT-specific screen saver
–Investigate possible solutions to enable sharing jobs across grids
 Multi-grid agents or job forwarding

24 Rodeo

25 Rodeo: Current Status
Rodeo is a production UT Grid resource
–Production system with over 500 PCs, made up of dedicated clusters and PCs distributed across campus
–Automated account request and creation
–Production-level consulting
–Comprehensive user guide
–Training classes offered at TACC
Client downloads available for Windows, Mac, Linux from the UT Grid web site
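Since Rodeo is a Condor-managed pool, jobs reach it through Condor submit descriptions. A minimal sketch of generating one for a serial job; the vanilla universe and file names are illustrative choices, not a documented Rodeo template:

```python
# Sketch: generate a Condor submit description for a serial job on a pool
# like Rodeo. The keywords (universe, executable, queue, ...) are standard
# Condor submit-file syntax; the specific values are assumptions.

def condor_submit_description(executable: str, args: str,
                              out: str = "job.out", err: str = "job.err") -> str:
    lines = [
        "universe = vanilla",           # plain serial job
        f"executable = {executable}",
        f"arguments = {args}",
        f"output = {out}",
        f"error = {err}",
        "log = job.log",
        "should_transfer_files = YES",  # stage files to the execute machine
        "when_to_transfer_output = ON_EXIT",
        "queue",
    ]
    return "\n".join(lines)
```

The resulting text would be written to a file and handed to `condor_submit`; the GUP's Condor portlet and GridShell effectively automate this step for the user.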

26 Rodeo: Current Status
Currently the largest production users are:
–UTCS (Department of Computer Sciences)
–Graeme Henkelman (Chemistry)
–Wolfgang Bangerth (Geosciences)

27 Rodeo: Next Steps
Near-term goals (next few months):
–Continue supporting production users
–Expand the number of CPUs available to users
–Explore ‘hosted’ application possibilities

28 UT Grid Interfaces
UT Grid will provide two types of interfaces:
–a web-based Grid User Portal (GUP) accessible via any web browser
–customized desktop environments for Linux, Windows and Macintosh PCs that act as Grid User Nodes (GUNs)
Users can access all UT Grid resources using either the GUP or GUNs managed by UT Grid. They will also be able to download the necessary software to build and host their own customized grid user portals or to convert their personal desktop systems into grid user nodes.

29 Motivation for a Grid User Portal
Lower the barrier to entry for novice users
Provide a centralized grid account management interface
–Easy access to multiple resources through a single interface
Simple GUI for complex grid computing capabilities
–Provide simple alternatives to the CLI for advanced users
Present a “Virtual Organization” view of the Grid as a whole
Increase productivity of UT researchers – do more science!

30 Grid User Portal: Current Status
Added Roundup and Rodeo as production resources on the TACC User Portal
Developed JSR-168-compliant portlets that can:
–View information on resources within UT Grid, including status, load, jobs, queues, etc.
–View network bandwidth and latency between systems, and aggregate capabilities for all systems
–Submit user jobs
–Manage files across systems, and move/copy multiple files between resources with transfer time estimates
These portlets contribute to the GridPort 4 release
TACC is leading the portal effort in TeraGrid
–This will impact the TACC User Portal and therefore UT Grid
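The file-management portlet's transfer time estimates could be produced from the bandwidth and latency data the portal already collects. A minimal sketch, assuming a simple latency-plus-throughput model (the portal's actual estimator is not described in the slides):

```python
# Sketch: the kind of transfer-time estimate a file-management portlet might
# show, assuming a simple startup-latency + size/bandwidth model.

def estimate_transfer_seconds(size_bytes: float,
                              bandwidth_bps: float,
                              latency_s: float = 0.0) -> float:
    """Estimate time to move a file: startup latency plus size over bandwidth."""
    return latency_s + size_bytes / bandwidth_bps
```

For a multi-file copy the portlet would sum per-file estimates, reusing the measured bandwidth between the chosen pair of systems.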

31 Grid User Portal: Next Steps
Near-term plans (next few months):
–Complete the new TACC User Portal (TUP) based on GridPort 4, including UT Grid resources
 UT Grid capabilities fully integrated into TUP
 Ability to customize the environment to expose only UT Grid resources
–Migrate portlets to WebSphere to ensure compatibility(?)
–Grid account management portlets

32 Grid User Node
The Linux GUN’s current capabilities:
–Information queries about grid resources
–Job submission
 Parallel computing jobs (dedicated cluster resources)
 Serial computing jobs (Roundup, Rodeo)
–Monitoring job status
–Reviewing job results
–Resource brokering based on ClassAd catalogs
–GridFTP-enabled file transfer (GSIFTP)
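ClassAd-based brokering means matching a job's requirements against advertised machine attributes. A toy sketch of that matchmaking step; the attribute names and values are illustrative, not an actual ClassAd catalog:

```python
# Sketch: ClassAd-style matchmaking as a GUN resource broker might perform
# it. Real Condor ClassAds are richer (arbitrary expressions, two-way
# matching, ranking); here a job simply filters machines by OS and memory.

def match(job: dict, machines: list) -> list:
    """Return the machine ads that satisfy the job's OS and memory needs."""
    return [m for m in machines
            if m["opsys"] == job["opsys"]
            and m["memory_mb"] >= job["min_memory_mb"]]
```

In real Condor matchmaking both sides also express `Rank` preferences; the broker would pick the best-ranked match rather than the whole list.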

33 Grid User Node: Current Status
Production Linux GUN; development Windows and Mac GUNs
–Need to decide whether to do GUI versions
Submission to Roundup and Rodeo
“On-demand” glide-in of UD resources into the Condor pool
Integrated “real-life” applications
–NAMD
–SNOOP3D
–HMMer
–POV-Ray

34 Grid User Node: Next Steps
Near-term goals:
–Investigate distribution of the GUN software stack using VDT
–Prepare and present a training class before the end of the year

35 GridPort: Current Status
GridPort 4 developed and released this month
Available to UT Grid and national users as a grid portal toolkit for download to create user and application portals
Based on JSR-168-compliant portlets
Leveraged technology and knowledge from UT Grid to create the Condor and comprehensive file-transfer portlets

36 GridPort: Next Steps
Near-term goals:
–GridPort 4 will be part of the TeraGrid User Portal, to be in production in 1Q06
–Preparing a demonstration and lab for the Grid Workshop in Venezuela in April 2006
–Continue evolution of GridPort to include:
 Advanced job submission functionality
 Advanced user customization, and more
–Investigating a demo portal based on WebSphere

37 GridShell: Current Status
GridShell developed and deployed on UT Grid (and TeraGrid)
Available to UT Grid users in the GUN software stack
Able to submit jobs first to a spoke (departmental cluster) and then to the hub (TACC) if not enough resources are available at the spoke
Collaborating with researchers at PSC and Caltech, we have extended GridShell to provide a single job submission interface (Condor) to the heterogeneous clusters on the TeraGrid
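The spoke-then-hub submission policy described above can be sketched in a few lines; the slot-counting model and the submit callables are stand-ins, since the slides do not describe GridShell's internal API:

```python
# Sketch: GridShell's spoke-then-hub policy — try the departmental cluster
# (spoke) first, fall back to the central TACC hub when the spoke lacks
# capacity. The free-slot check and submit callables are illustrative.
from typing import Callable

def submit_with_fallback(slots_needed: int,
                         spoke_free_slots: int,
                         spoke_submit: Callable[[], str],
                         hub_submit: Callable[[], str]) -> str:
    """Submit to the spoke if it can run the job now, otherwise to the hub."""
    if spoke_free_slots >= slots_needed:
        return spoke_submit()
    return hub_submit()
```

This mirrors the hub-and-spoke trust model: the local resource is preferred, and the centrally managed hub absorbs overflow.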

38 GridShell: Next Steps
Near-term goals:
–Create a public download site for GridShell 1.0 (current version available only to NSF TeraGrid and UT Grid users)
–Continue evolution of GridShell to include:
 Support for submitting jobs to clusters behind firewalls
–Need to hire an additional developer and develop partnerships with external developer teams (e.g., GridPort)

39 MPS: Current Status
Goal is to reduce job turnaround times by optimizing resource selection for data movement, queue wait times, and performance
Components
–Prediction services
 Execution times, queue wait times, file transfer times
–Resource brokering
 Immediately select resources based on job requirements, including predictions
–Metascheduling
 Schedule complex jobs such as workflows
 Workload management
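A queue-wait prediction service of the kind listed above could start from historical observations. A minimal sketch using the median of recent waits; this simple model is an assumption, since the slides only say predictions will draw on historical information and learning algorithms:

```python
# Sketch: history-based queue-wait prediction, as an MPS prediction service
# might compute it. The median-of-recent-waits model is illustrative only.
import statistics

def predict_queue_wait(recent_waits_s: list) -> float:
    """Predict the next queue wait as the median of recently observed waits
    (robust to the occasional very long outlier wait)."""
    return statistics.median(recent_waits_s)
```

A production service would condition on job size, queue, and time of day; the point is that even a simple historical summary gives the broker something to rank with.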

40 MPS: Next Steps
Near-term goals:
–Create prediction web services
 Based on existing R&D
 Predictions based on historical information, learning algorithms, and scheduling simulations
–Integrate with Condor-G
 Provide additional information about clusters: the clusters themselves (e.g. number of CPUs) and the jobs submitted to them
 Add call-outs so the matchmaker can request predictions
 User requests minimizing predicted response time as part of ranking
–Demonstration with Graham Carey (ICES / UT Austin)
 Selecting which cluster at TACC to use
 Matchmaking capability using MPS to rank systems based on the user request
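The ranking step above — minimizing predicted response time across candidate clusters — can be sketched as a sort over per-cluster predictions; the prediction tuples are illustrative placeholders for what the MPS web services would return:

```python
# Sketch: rank candidate clusters by predicted response time, defined here as
# predicted file-transfer time + queue wait + execution time. The numbers a
# real matchmaker would use come from the prediction services; these tuples
# are illustrative.

def rank_by_response_time(predictions: dict) -> list:
    """predictions maps cluster -> (transfer_s, queue_wait_s, run_s);
    returns cluster names ordered best (lowest total) first."""
    return sorted(predictions, key=lambda c: sum(predictions[c]))
```

The Condor-G matchmaker call-out would then submit to the first cluster in the returned list, or use the totals directly as a `Rank` expression.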

41 Future Plans and Work
Complete MPS work and integrate campus clusters with TACC clusters
–First, just ‘upload’ larger jobs
–Later, share jobs among spokes
Integrate Maverick as a remote visualization resource into UT Grid
–Overlapping software stack with PCs
–Remote vis software downloads (incl. file transfer)
–Vis portal
Integrate campus data collections into UT Grid
–Hosted collections in DBs
–WebSphere Information Integrator?
Prepare NSF proposal?

