1 TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI) Diane Baxter, Ph.D. San Diego Supercomputer Center University of California, San Diego

2 To clarify: “Cyberinfrastructure” is a coordinated set of hardware, software, and services, all integrated and working together. “CI” encompasses networks, computers, data, sensors, handheld devices, other technologies, and the services or human “glue” that holds them all together. [Diagram: networks, computers, data, storage, visualization, field instruments, sensors, and wireless field data. Caption: The “computer” as an integrated set of resources]

3 TeraGrid National Research Cyberinfrastructure includes: computing systems, data storage systems and data repositories, visualization environments, and people, all linked together by high-performance networks.

4 TeraGrid: Is an open scientific discovery infrastructure. Provides leadership-class resources at 11 partner sites. Is an integrated, persistent computational resource. Is the world's largest, most comprehensive distributed cyberinfrastructure for open scientific research.

5 The National TeraGrid. [Map of sites: SDSC, TACC, UC/ANL, NCSA, ORNL, PU, IU, PSC, NCAR, Caltech, USC/ISI, UNC/RENCI, UW, LSU, U Tenn. Legend: Resource Provider (RP); Software Integration Partner; Grid Infrastructure Group (UChicago)]

6 http://www.teragrid.org/ A complex collaboration of over a dozen organizations working together to provide cyberinfrastructure that goes beyond what can be provided by individual institutions, to improve research productivity and enable breakthroughs not otherwise possible.

7 TeraGrid: Uses high-performance network connections (10-30 Gb/sec). Integrates high-performance computers; data resources for analysis, visualization, and storage; data collection tools; high-end experimental facilities; and supporting expertise around the country. Provides more than a petaflop of computing capability. Consists of more than 30 petabytes of online and archival data storage, as well as systems to manage data acquisition and access. Provides researchers access to over 100 discipline-specific databases.

8 What’s in it (TeraGrid) for me? An instrument that delivers high-end IT resources (computation, storage, visualization, and data/service): –A computational facility: over a petaflop of parallel computing capability –A data storage and management facility: over 30 petabytes of storage (disk and tape), over 100 scientific data collections –A high-bandwidth national data network. Services: help desk and consulting, Advanced Support for TeraGrid Applications (ASTA), education and training events and resources. Access without financial cost: –Research accounts allocated via peer review –Startup and Education accounts automatic

9 TeraGrid Compute Power. Computational resources (size approximate, not to scale). Sites: SDSC, TACC, UC/ANL, NCSA, ORNL, PU, IU, PSC, NCAR, Tennessee, LONI/LSU. 2007 (504 TF); 2009 (~1 PF). Slide courtesy of Tommy Minyard, TACC
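To put the two capacity figures on this slide in proportion, here is a minimal arithmetic sketch. It assumes decimal units (1 PF = 1000 TF) and treats "~1 PF" as exactly 1000 TF; the function names are illustrative only and do not come from the presentation.

```python
import math

TF_2007 = 504.0    # aggregate teraflops quoted for 2007
TF_2009 = 1000.0   # assumption: "~1 PF" taken as exactly 1000 TF


def growth_factor(start_tf: float, end_tf: float) -> float:
    """Overall multiple between the two capacity figures."""
    return end_tf / start_tf


def annualized_growth(start_tf: float, end_tf: float, years: float) -> float:
    """Constant yearly multiple that would produce the same overall growth."""
    return math.pow(end_tf / start_tf, 1.0 / years)


overall = growth_factor(TF_2007, TF_2009)
yearly = annualized_growth(TF_2007, TF_2009, years=2)
print(f"overall: x{overall:.2f}, per year: x{yearly:.2f}")
```

Under these assumptions the aggregate capability roughly doubled in two years, which corresponds to about 40 percent growth per year.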

10 TG data storage and management, part 1 (tape). TeraGrid provides persistent storage on disk and tape. Backups of critical data stored remote from your home. Allocatable tape-based storage systems: IU (Indiana University), geographically distributed; NCAR (National Center for Atmospheric Research), also supports dual copy; NCSA (National Center for Supercomputing Applications); SDSC (San Diego Supercomputer Center). Note: In addition, most sites have massive data storage systems that provide storage in support of computation. Command-line usage is reasonably straightforward with GridFTP, and very easy with the File Manager tool in the TeraGrid User Portal. ©Trustees of Indiana University. May be reused so long as IU and TeraGrid logos remain, and any modifications to original are noted. Courtesy Craig A. Stewart, IU

11 Data storage and management, part 2 (disk). GPFS-WAN (General Parallel File System Wide Area Network): ~1 petabyte –Home at San Diego Supercomputer Center; may be accessed as if it were a local file system from NCAR, NCSA, IU, UC/ANL. IU Data Capacitor: –1 petabyte of spinning disk –Primarily for short-term storage of data. Long-term disk storage allocations: –Indiana University, National Center for Supercomputing Applications, San Diego Supercomputer Center. ©Trustees of Indiana University. May be reused so long as IU and TeraGrid logos remain, and any modifications to original are noted. Courtesy Craig A. Stewart, IU

12 TeraGrid Architecture. [Diagram labels: POPS, Science Gateways, User Portal, Command Line; RP 1, RP 2, RP 3; Compute Service, Viz Service, Data Service; TeraGrid Infrastructure (Network, Authorization, Accounting, …)]

13

14 ????? Translation please!

15 Enter: Science Gateways. A Science Gateway: –Enables scientific communities of users with a common scientific goal –Has a common interface –Leverages community investment. Three common forms: –Web-based portals –Application programs running on users' machines but accessing services in TeraGrid –Coordinated access points enabling users to move seamlessly between TeraGrid and other grids.

16 Today, there are approximately 29 gateways using the TeraGrid

17 How do Gateways help? They make science more productive –Researchers use same tools –Complex workflows –Common data formats –Data sharing. They bring TeraGrid capabilities to the broad science community –Lots of disk space –Lots of compute resources –Powerful analysis capabilities –A community-friendly interface to information and research tools

18 But it’s not just ease of use. What can scientists do that they couldn’t do previously? LEAD - access to radar data NVO – access to sky surveys OOI – access to sensor data PolarGrid – access to polar ice sheet data SIDGrid – analysis tools GridChem – developing multiscale coupling How would this have been done before gateways?

19 Gateways can further investments in other projects Increase access –To instruments Increase capabilities –To data analysis tools Improve workforce development –For underserved populations, through broad access to learning resources Increase outreach Increase public awareness –Public sees value in investments in large facilities Slice bread

20 Gateways Greatly Expand Access Almost anyone can investigate scientific questions using high end resources –Not just those in the research groups of those who request allocations –Gateways allow anyone with a web browser to explore Fosters new ideas, cross-disciplinary approaches Encourages students to experiment But used in production too –Significant number of papers resulting from gateways including GridChem, nanoHUB –Scientists can focus on challenging science problems rather than challenging infrastructure problems

21 Advanced support for Gateway Development Same peer review process used to request resources –30,000 CPUs –+ 6 months of help from a TG Gateway Team member –Reviews based on appropriate use of resources, science is not reviewed if already funded Petascale Multisite workflows Gateways Domain expertise

22 Support is Very Targeted. Start with well-defined objectives –Focus on efficient or novel use of national CI resources. Minimum 0.25 FTE for months to a year –Enough investment to really understand and help solve complex problems. Must have commitment from PIs –Want to make sure work is incorporated into production codes and gateways. Good candidates for targeted support include: –Large, high-impact projects –Ability to influence new communities –Suggestions from NSF directorates on important projects. Lessons learned move into training and documentation

23 When might a gateway be most appropriate? Researchers using defined sets of tools in different ways –Same executables, different input GridChem, CHARMM –Creating multi-scale or complex workflows –Shared datasets Common data formats –National Virtual Observatory –Earth System Grid –Some groups have invested significant efforts here caBIG, extensive discussions to develop common terminology and formats BIRN, extensive data sharing agreements Difficult to access data/advanced workflows –Sensor/radar input LEAD, GEON

24 TeraGrid Pathways Activities 2 Gateway components –Adapt gateways for educational use by underrepresented communities GEON – SDSC, Navajo Tech –Teach participants from underrepresented communities how to build gateways PolarGrid – IU, ECSU

25 Navajo Technical College and gateways Incorporating the use of gateways in their curricula GEON, GISolve areas of initial interest

26 What you can do with the TeraGrid: Simulation of cell membrane processes. Work by Emad Tajkhorshid and James Gumbart of the University of Illinois at Urbana-Champaign. –Mechanics of Force Propagation in TonB-Dependent Outer Membrane Transport. Biophysical Journal 93:496-504 (2007). –Results of the simulation may be seen at www.life.uiuc.edu/emad/TonB-BtuB/btub-2.5Ans.mpg. Modeled mechanisms for transport of molecules through the cell membrane. Used 400,000 CPU hours [45 processor-years] on systems at the National Center for Supercomputing Applications, IU, and Pittsburgh Supercomputing Center. Image courtesy of Emad Tajkhorshid, UIUC
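The bracketed "45 processor-years" can be checked by dividing the CPU hours by the number of hours in a year. The sketch below assumes a 365-day year and one processor running continuously; the helper name is illustrative and not part of the presentation.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: non-leap year, processor busy around the clock


def cpu_hours_to_processor_years(cpu_hours: float) -> float:
    """Convert a CPU-hour total into years of continuous use of one processor."""
    return cpu_hours / HOURS_PER_YEAR


membrane_run = cpu_hours_to_processor_years(400_000)
print(f"{membrane_run:.1f} processor-years")  # prints "45.7 processor-years"
```

The result is slightly above 45, so the slide's figure appears to be rounded down.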

27 Predicting storms. Hurricanes and tornadoes cause massive loss of life and damage to property. TeraGrid supported the spring 2007 NOAA and University of Oklahoma Hazardous Weather Testbed –Major goal: assess how well ensemble forecasting predicts thunderstorms, including supercells and tornadoes –Delivers “better than real time” prediction –Used 675,000 CPU hours for the season –Used 312 TB on HPSS storage at PSC. Slide courtesy of Dennis Gannon, IU, and LEAD Collaboration

28 PolarGrid. Cyberinfrastructure Center for Polar Science (CICPS) –Experts in polar science, remote sensing and cyberinfrastructure –Indiana, ECSU, CReSIS. Satellite observations show disintegration of ice shelves in West Antarctica and speed-up of several glaciers in southern Greenland –Most existing ice sheet models, including those used by IPCC, cannot explain the rapid changes. http://www.polargrid.org/polargrid/images/4/42/C0050-polargrid-big.m4v Source: Geoffrey Fox

29 Components of PolarGrid –Expedition grid consisting of ruggedized laptops in a field grid linked to a low power multi-core base camp cluster –Prototype and two production expedition grids feed into a 17 Teraflops "lower 48" system at Indiana University and Elizabeth City State (ECSU) split between research, education and training. –Gives ECSU a top-ranked 5 Teraflop MSI high performance computing system Access to expensive data High-end resources for analysis MSI student involvement Source: Geoffrey Fox

30 Recent Gateways using TeraGrid Significantly SCEC SIDGrid CIG

