Data Area Report
Chris Jordan, Data Working Group Lead, TACC
Kelly Gaither, Data and Visualization Area Director, TACC
April 2009


PY4 Data Area Characteristics
Relatively stable software and user tools
Relatively dynamic site/machine configuration
 – New sites and systems
 – Older systems being retired
TeraGrid emphasis on broadening participation
 – Campus Champions
 – Science Gateways
 – Underrepresented disciplines

PY4 Areas of Emphasis
Improve campus-level access mechanisms
Provide support for gateways and other "mobile" computing models
Improve clarity of documentation
Enhance user ability to manage complex datasets across multiple resources
Develop a comprehensive plan for future developments in the Data area
Production deployments of Lustre-WAN; path to global file systems

Data Working Group Coordination
Led by Chris Jordan
Meets bi-weekly to discuss current issues
Has membership from each RP
Attendees are a blend of system administrators and software developers

Wide-Area and Global File Systems
A TeraGrid-wide global file system is one of the most frequently requested user capabilities
A global file system implies a file system mounted on most TeraGrid resources
 – No single file system can currently be mounted across all TG resources
Deploying wide-area file systems, however, is possible with technologies such as GPFS-WAN
 – GPFS-WAN has licensing issues and is not available for all platforms
Lustre-WAN is promising for both licensing and compatibility reasons
Additional technologies such as pNFS will be necessary to make a file system truly global

Lustre-WAN Progress
Initial production deployment of Indiana's Data Capacitor Lustre-WAN on IU's BigRed and PSC's Pople
 – Declared production in PY4, following testing and implementation of security enhancements
In PY4, successful testing and commitment to production on LONI's QueenBee, TACC's Ranger/Lonestar, NCSA's Mercury/Abe, and SDSC's IA64 systems (expected to enter production before PY5)
 – Additional sites (NICS, Purdue) will begin testing in Q4 of PY4
Ongoing PY4 work to improve performance and the authentication infrastructure, in parallel with the production deployment (see the verification sketch below)
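From a user's point of view, the practical effect of a Lustre-WAN deployment is simply that the same directory tree appears on multiple resources. Below is a minimal sketch of how a user or test harness might check that the wide-area file system is visible on a given Linux login node. The mount point name is an assumption chosen for illustration, not the actual Data Capacitor path at any site.

```python
"""Check whether a wide-area Lustre file system appears to be mounted.

The mount point below is a placeholder; real deployment paths differ per site.
"""

WAN_FS_MOUNT = "/data-capacitor"  # assumed mount point for illustration


def wan_fs_status(mount_point=WAN_FS_MOUNT):
    """Return (mounted, fs_type) by scanning /proc/mounts on a Linux node."""
    try:
        with open("/proc/mounts") as mounts:
            for line in mounts:
                fields = line.split()
                if len(fields) >= 3 and fields[1] == mount_point:
                    return True, fields[2]  # e.g. "lustre"
    except OSError:
        pass
    return False, None


if __name__ == "__main__":
    mounted, fs_type = wan_fs_status()
    if mounted and fs_type == "lustre":
        print(WAN_FS_MOUNT + " is mounted as Lustre; wide-area FS available")
    elif mounted:
        print(WAN_FS_MOUNT + " is mounted, but its type is " + str(fs_type))
    else:
        print(WAN_FS_MOUNT + " is not mounted on this node")
```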

CTSS Efforts in the Data Area
In PY4, created data kits:
 – data movement kit
 – data management kit
 – wide area file systems kit
Currently reworking the data kits to include:
 – new client-level kits to express functionality and accessibility more clearly
 – new server-level kits to report more accurate information on server configurations
 – broadened use cases
 – requirements for more complex functionality (managing, not just moving, data)
 – improved information services to support science gateways and automated resource selection (illustrated below)
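The "automated resource selection" goal can be illustrated with a small, self-contained sketch: given capability records of the general kind the information services publish, a gateway could filter resources by the kits it needs. The record fields, kit names, and resource names below are invented placeholders, not the actual CTSS kit schema or real TeraGrid configurations.

```python
"""Illustrative resource selection from advertised data-kit capabilities.

All names below are placeholders, not the real CTSS kit schema.
"""

RESOURCE_KITS = [
    {"resource": "siteA.bigcluster", "kits": {"data-movement-client", "wide-area-fs"}},
    {"resource": "siteB.archive", "kits": {"data-movement-server", "data-management"}},
    {"resource": "siteC.viz", "kits": {"data-movement-client"}},
]


def select_resources(required_kits, catalog=RESOURCE_KITS):
    """Return the resources that advertise every required kit."""
    required = set(required_kits)
    return [entry["resource"] for entry in catalog if required <= entry["kits"]]


if __name__ == "__main__":
    # A gateway needing both a data-movement client and a wide-area file system:
    print(select_resources({"data-movement-client", "wide-area-fs"}))
    # -> ['siteA.bigcluster']
```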

Data/Collections Management
In PY4, tested new infrastructure for data replication and management across TeraGrid resources (iRODS)
In PY4, assessed archive replication and transition challenges
In PY4, gathered requirements for data management clients in CTSS
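As a concrete illustration of the replication workflow being tested, the sketch below drives standard iRODS icommands from Python. It assumes the icommands are installed and an iRODS session has already been initialized with iinit; the zone, collection, and resource names are placeholders rather than the actual TeraGrid deployment.

```python
"""Register and replicate a dataset with iRODS icommands (illustrative).

Assumes icommands are installed and `iinit` has been run; the zone,
collection, and resource names below are placeholders.
"""
import subprocess

SOURCE_FILE = "results.tar.gz"
IRODS_COLLECTION = "/tgZone/home/jdoe/experiments"  # placeholder zone/collection
PRIMARY_RESC = "siteA-disk"                          # placeholder storage resources
REPLICA_RESC = "siteB-archive"


def run(cmd):
    """Run an icommand, raising an error if it fails."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    logical_path = IRODS_COLLECTION + "/" + SOURCE_FILE
    # Upload the file into the catalog on the primary resource.
    run(["iput", "-R", PRIMARY_RESC, SOURCE_FILE, logical_path])
    # Replicate the registered object to a second resource at another site.
    run(["irepl", "-R", REPLICA_RESC, logical_path])
    # List replicas; -l prints one line per replica.
    run(["ils", "-l", logical_path])
```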

Data Collection Highlights
Large data collections:
 – MODIS satellite imagery of the Earth: remote sensing data from the Center for Space Research; grows by ~2.4 GB/day, is widely used by earth scientists, and has many derivative products (6 TB)
 – Purdue Terrestrial Observatory: remote sensing data (1.4 TB)
 – Alaska Herbarium collection: high-resolution scans of more than 223,000 plant specimens from Alaska and the Circumpolar North (1.5 TB)
Hosting of data collection services within VMs (provides efficient delivery of services related to modest-scale data sets):
 – FlyBase: a key resource for Drosophila genomics; front end hosted within a VM (2.3 GB)
 – MutDB: a web services data resource delivering information on the known effects of mutations in genes, across taxa

Data Architecture (1)
Two primary categories of use for data movement tools in the TeraGrid:
 – Users moving data to or from a location outside the TeraGrid
 – Users moving data between TeraGrid resources
 – (Frequently, users will need to do both within the span of a given workflow)
Moving data to/from a location outside the TeraGrid:
 – Tends to involve smaller numbers of files and less overall data
 – Problems encountered are primarily ones of usability: tool availability and ease of use

Data Architecture (2)
Moving data between TeraGrid resources:
 – Datasets tend to be larger
 – Users are more concerned with performance, high reliability, and ease of use
General trend we have observed: as the need for data movement has increased, both the complexity of the deployments and the frustration of users have increased (a transfer sketch follows below)
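For inter-site transfers, GridFTP is the workhorse tool on the TeraGrid. The sketch below wraps the globus-url-copy client from Python for a third-party transfer between two GridFTP servers. It assumes a valid grid proxy is already in place (for example via myproxy-logon); the host names, paths, and tuning values are placeholders chosen for illustration, not a recommended configuration.

```python
"""Third-party GridFTP transfer via globus-url-copy (illustrative).

Assumes a valid grid proxy already exists; host names, paths, and tuning
parameters are placeholders.
"""
import subprocess

SRC = "gsiftp://gridftp.siteA.example.org/scratch/jdoe/run42/output.h5"
DST = "gsiftp://gridftp.siteB.example.org/work/jdoe/run42/output.h5"


def gridftp_copy(src, dst, streams=4, tcp_buffer_bytes=4194304):
    """Invoke globus-url-copy with parallel streams and a larger TCP buffer."""
    cmd = [
        "globus-url-copy",
        "-vb",                          # report transfer performance
        "-p", str(streams),             # parallel data streams
        "-tcp-bs", str(tcp_buffer_bytes),  # TCP buffer size in bytes
        src, dst,
    ]
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    gridftp_copy(SRC, DST)
```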

Data Architecture (3)
This is an area in which we think we can have a significant impact
 – Users want reliability, ease of use, and in some cases high performance
 – How the technology is implemented should be transparent to the user
 – User-initiated data movement, particularly on large systems, has proven to create contention for disk resources

Data Architecture (4)
Data movement requirements:
 – R1: Users need reliable, easy-to-use file transfer tools for moving data from outside the TeraGrid to resources inside the TeraGrid.
 – R2: Users need reliable, high-performance, easy-to-use file transfer tools for moving data from one TeraGrid resource to another.
 – R3: Tools providing transparent data movement are needed on large systems with a low storage-to-flops ratio. (SSH/SCP with the high-performance networking patches (HPN-SCP), SCP-based transfers to GridFTP nodes - RSSH)

Data Architecture (5)
Users continue to request a single file system that is shared across all resources.
Wide-area file systems have proven to be a real possibility through the production operation of GPFS-WAN.
There are still significant technical and licensing issues that prevent GPFS-WAN from becoming a global WAN-FS solution.

Data Architecture (6)
Network architecture on the petascale systems is proving to be a challenge: only a few router nodes are connected directly to wide area networks, and the rest of the compute nodes are routed through them. Wide-area file systems often need direct access.
It has become clear that no single solution will provide a production global wide-area network file system.
 – R4: The "look and feel," or appearance, of a global wide-area file system with high availability and high reliability. (Lustre-WAN, pNFS)

Data Architecture (7)
Until recently, visualization and, in many cases, data analysis have been considered post-processing tasks requiring some form of data movement.
With the introduction of petascale systems, we are seeing data set sizes grow to the point where data movement is prohibitive, or must at least be minimized.
We anticipate that scheduled data movement is one way to guarantee that the data is present at the time it is needed (a batch-dependency sketch follows below).
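The scheduling idea can be sketched with ordinary batch dependencies: submit a staging job that moves the data, then submit the analysis job so that it only starts once staging succeeds. This generic pattern stands in for purpose-built scheduled data movement services such as DMOVER; the qsub dependency syntax below is standard PBS/Torque, and the script names are placeholders.

```python
"""Chain a data-staging job and a compute job with a PBS/Torque dependency.

Illustrative only: stage_data.pbs and analyze.pbs are placeholder scripts,
and this generic afterok dependency is not the DMOVER implementation.
"""
import subprocess


def submit(script, depends_on=None):
    """Submit a PBS script, optionally only after another job completes OK."""
    cmd = ["qsub"]
    if depends_on:
        cmd += ["-W", "depend=afterok:" + depends_on]
    cmd.append(script)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()  # qsub prints the new job id


if __name__ == "__main__":
    staging_job = submit("stage_data.pbs")             # moves input data into place
    analysis_job = submit("analyze.pbs", staging_job)  # runs only if staging succeeds
    print("staging:", staging_job, "analysis:", analysis_job)
```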

Data Architecture (8)
Visualization and data analysis tools have not been designed to be data aware; they have assumed that the data can be read into memory and that applications and tools need not be concerned with exotic file access mechanisms.
 – R5: Ability to schedule data availability for post-processing tasks. (DMOVER)
 – R6: Availability of data mining/data analysis tools that are more data aware. (Currently working with the VisIt developers to modify the open source software, leveraging work done on parallel Mesa; see the chunked-reading sketch below.)
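To illustrate the "data aware" point in R6 in the simplest possible terms: rather than assuming a dataset fits in memory, an analysis tool can process it in slices. The sketch below uses a NumPy memory map to compute a basic statistic over a large flat array in fixed-size chunks; the file name, dtype, and chunk size are placeholders, and this is a generic pattern, not the VisIt team's actual approach.

```python
"""Chunked analysis over a dataset too large to read into memory at once.

Illustrative: field.dat, its dtype, and the chunk size are placeholders.
"""
import numpy as np

DATA_FILE = "field.dat"              # placeholder raw binary file of float32 values
DTYPE = np.float32
CHUNK = 64 * 1024 * 1024 // 4        # ~64 MB of float32 values per chunk


def chunked_mean(path=DATA_FILE, dtype=DTYPE, chunk=CHUNK):
    """Compute the mean without loading the whole array into memory."""
    data = np.memmap(path, dtype=dtype, mode="r")   # lazily maps the file
    total, count = 0.0, 0
    for start in range(0, data.size, chunk):
        block = np.asarray(data[start:start + chunk], dtype=np.float64)
        total += block.sum()
        count += block.size
    return total / count if count else float("nan")


if __name__ == "__main__":
    print("mean =", chunked_mean())
```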

Data Architecture (9)
Many TeraGrid sites provide effectively unlimited archival storage to compute-allocated users.
Almost none of these sites have a firm policy requiring or allowing them to delete data after a triggering event.
The volume of data flowing into and out of particular archives is already increasing drastically, in some cases exponentially, beyond the capacity of the disk caches and tape drives currently allocated.
 – R7: The TeraGrid must provide better organized, more capable, and more logically unified access to archival storage for the user community. (Proposal to NSF for a unified approach to archival storage)

Plans for PY5
Implement Data Architecture recommendations:
 – User portal integration
 – Data collections infrastructure
 – Archival replication services
 – Continued investigation of new location-independent access mechanisms (PetaShare, REDDnet)
Complete production deployments of Lustre-WAN
Develop plans for next-generation Lustre-WAN and pNFS technologies
Work with the CTSS team on continued improvements to the Data kit implementations