TeraGrid-Wide Operations Von Welch Area Director for Networking, Operations and Security NCSA, University of Illinois April, 2009.


Highlights

TeraGrid surpassed 1 petaflops of aggregate computing power.
– Aggregate compute power available increased 3.5x from 2007 to 2008.
– Primarily a result of Track 2 systems at TACC and NICS coming online.
– NUs used and allocated increased ~4x from 2007 to 2008.
Significant improvement in instrumentation, including tracking of grid usage and data transfers.
Inca providing historical tracking of software and service reliability, along with a new interface for both users and administrators.
An international security incident touched TeraGrid, resulting in a very strong incident response as well as improved procedures for a new attack vector.
Improvements in authentication procedures and cross-resource single sign-on.

Big Picture Resource Changes

Sun Constellation Cluster (Ranger) at TACC, Feb '08
– Initially 504 Tflops; upgraded in July 2008 to 580 Tflops
Cray XT4 (Kraken) at NICS, Aug '08
– 166 Tflops and 18,000 computing cores
Additional resources that entered production in 2008:
– Two Dell PowerEdge 1950 clusters: a 668-node system at LONI (QueenBee) and an 893-node system at Purdue (Steele)
– PSC's SGI Altix 4700 shared-memory NUMA system (Pople)
– FPGA-based resource at Purdue (Brutus)
– Remote visualization system at TACC (Spur)
Other improvements:
– The Condor pool at Purdue grew from 7,700 to more than 22,800 processor cores.
– Indiana integrated its Condor resources with the Purdue flock, simplifying use.
Decommissioned systems:
– NCSA's Tungsten, PSC's Rachel, Purdue's Lear, SDSC's DataStar and Blue Gene, and TACC's Maverick.

TeraGrid HPC Usage

[Chart: quarterly NUs delivered, 2007-2008, annotated with Ranger entering production in Feb '08 and Kraken in Aug '08]

In 2008:
– Aggregate HPC power increased by 3.5x
– NUs requested and awarded quadrupled
– NUs delivered increased by 2.5x

TeraGrid Operations Center Statistics

Created 7,762 tickets
– Immediately resolved 2,652 (34%)
Took 675 phone calls
– Immediately resolved 454 (67%)
Manage the TG Ticket System and 24x7 toll-free call center
Respond to all users and provide front-line resolution where possible
Route remaining tickets to User Services, RP sites, etc. as appropriate
Maintain situational awareness across the TG project (upgrades, maintenance, etc.)

Instrumentation and Monitoring

Monitoring and statistics gathering for TG services
– E.g., backbone, grid services (GRAM, GridFTP)
Used for individual troubleshooting, e.g., LEAD
Moving to views of the big picture.

[Figures: daily peak bandwidth used (see the sketch below); Inca custom display for LEAD; GridFTP usage by day]
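The "daily peak bandwidth used" metric is straightforward to derive from periodic link samples. A minimal sketch in Python, assuming 5-minute polls arriving as (timestamp, Gb/s) pairs; this is illustrative only, not TeraGrid's actual tooling:

```python
from collections import defaultdict

def daily_peaks(samples):
    """Reduce periodic bandwidth samples to one peak value per day,
    the quantity a 'daily peak bandwidth used' chart would plot.

    samples: iterable of (iso_timestamp, gbps) pairs
    returns: dict mapping 'YYYY-MM-DD' to that day's peak Gb/s
    """
    peaks = defaultdict(float)
    for ts, gbps in samples:
        day = ts[:10]  # date prefix of an ISO-8601 timestamp
        peaks[day] = max(peaks[day], gbps)
    return dict(peaks)

# Example:
print(daily_peaks([("2008-11-03T14:05:00", 2.7),
                   ("2008-11-03T14:10:00", 3.9),
                   ("2008-11-04T09:00:00", 2.1)]))
# {'2008-11-03': 3.9, '2008-11-04': 2.1}
```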

Inca Grid Monitoring System

Automated, user-level testing to improve reliability by detecting Grid infrastructure problems (see the sketch after this list).
– Provides detailed information about tests and their execution to aid in debugging problems.
– 2,538 pieces of test data are being collected.
Originally designed for TeraGrid, and now successfully used in other large-scale projects including ARCS, DEISA, and NGS.
Improvements:
– Custom views: LEAD, User Portal
– Notifications of errors
– Historical views
– Recognizes scheduled downtime
– 20 new tests written
– 77 TeraGrid tests modified
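Inca reporters are small scripts that exercise the infrastructure the way a user would and publish pass/fail results. As a rough illustration of that user-level testing idea (not Inca's actual reporter API), a sketch in Python that probes a GridFTP control port (conventionally 2811) for its FTP-style "220" service banner:

```python
import socket
import time

def probe_gridftp(host, port=2811, timeout=10):
    """User-level probe: connect to a GridFTP control port and confirm
    the service greets us with an FTP-style '220' banner."""
    start = time.time()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            banner = sock.recv(128).decode("ascii", errors="replace")
            ok = banner.startswith("220")
    except OSError:
        ok = False
    return {
        "test": "gridftp_banner",
        "host": host,
        "result": "pass" if ok else "fail",
        "elapsed_sec": round(time.time() - start, 2),
    }

# Example (hypothetical hostname):
print(probe_gridftp("gridftp.example.teragrid.org"))
```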

TeraGrid Backbone Network

TeraGrid's 10 Gb/s backbone runs from Chicago to Denver to Los Angeles, contracted from NLR.
Dedicated 10 Gb/s link(s) run from each RP to one of the three core routers.
Usage: daily bandwidth peaks on the backbone are typically in the 2-4 Gb/s range, with ~3% increase/month.
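For rough capacity planning, that ~3%/month growth compounds quickly. A back-of-the-envelope calculation (my arithmetic, not from the slides):

```python
import math

monthly_growth = 0.03  # ~3% traffic increase per month (from the slide)

# Compounded over a year: (1.03)**12 ~= 1.43, i.e. roughly 43% annual growth.
print(f"annual growth factor: {(1 + monthly_growth) ** 12:.2f}")

# Months until a 4 Gb/s daily peak would fill a 10 Gb/s backbone link:
months = math.log(10 / 4) / math.log(1 + monthly_growth)
print(f"months of headroom at the top of the peak range: {months:.0f}")  # ~31
```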

Security

Gateway Summit (with Science Gateways team)
– Form best practices and common processes among sites
– Develop understanding between sites and Gateway developers
User Portal password reset procedure
Risk assessments for Science Gateways and User Portal
TAGPMA leadership for PKI interoperability
Uncovered large-scale attack in collaboration with EU Grid partners
– Established secure communications: secure wiki, SELS, Jabber (secure IM), including EU partners

Single Sign-on

GSI-SSHTERM (from NGS) added to User Portal
– Consistently among the top 5 most-used applications.
– Augments command-line functionality already in place.
Started deploying catastrophic failover for single sign-on (see the sketch below).
– Replicating NCSA MyProxy PKI to PSC.
– Implemented client changes on RPs and User Portal for failover.
Developed policies for coherent TeraGrid identities.
– Identities (X.509 distinguished names) come from allocations, RPs, and users – a complicated, error-prone process.
– Tests written for TGCDB; Inca tests for RPs will follow.
Started adding Shibboleth support for User Portal.
– TeraGrid is now a member of InCommon (as a service provider).
– Will migrate to new Internet Framework when ready.
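The failover behavior described above amounts to a client trying each MyProxy replica in turn. A minimal sketch wrapping the standard myproxy-logon command in Python; the hostnames are hypothetical placeholders, not the actual service names:

```python
import subprocess

# Hypothetical replica hostnames; the slide says the NCSA MyProxy PKI
# is replicated to PSC and clients were changed to fail over.
MYPROXY_SERVERS = ["myproxy.ncsa.example.org", "myproxy.psc.example.org"]

def get_proxy(username, lifetime_hours=12):
    """Ask each MyProxy replica in order for a short-lived X.509 proxy
    credential; return the server that answered, or raise if none did."""
    for server in MYPROXY_SERVERS:
        result = subprocess.run(
            ["myproxy-logon", "-s", server, "-l", username,
             "-t", str(lifetime_hours)],
            capture_output=True, text=True,
        )
        if result.returncode == 0:
            return server  # proxy credential written to the default location
    raise RuntimeError("all MyProxy replicas unreachable")

# Once a proxy is obtained, the same credential supports single sign-on
# across RPs (e.g., via GSI-SSH) without further password prompts.
```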

Questions?