Slide 1 Archive Computing: Scalable Computing Environments on Very Large Archives Andreas J. Wicenec 13-June-2002.

Slide 2 Processing? Yes, but where?

Slide 3 VO Ready Archives Are our archives VO ready? Resource and service descriptions are TBD, and data quality standards and descriptions are not yet defined. There is no standard for describing instrument modes and capabilities, nor for filters, grisms and other relevant optical elements. Once VO data standards have been established, the metadata has to be extracted from the archives or determined. This potentially means reducing all the data to a certain degree and manually adding a lot of observatory information.

Slide 4 Scalable Archive Computing: Why?

Slide 5 Scalable Archive Computing: Actors Two major customer groups: A) Archive-internal: health checking, archive QC, data migration, metadata extraction, preview and master calibration production. B) External users or systems: on-the-fly reduction, cross-correlation, archive retrieval and visualization, VO.

Slide 6 Scalable Archive Computing: How? Just 'add' a couple of the following buzzwords to NGAS: GRID, WebServices, UDDI, SOAP, dynamic process distribution MPI, GDFS, Gigabit Ethernet, Myrinet. What is NGAS? Next Generation Archive System: an archiving system that scales with the data volume it controls, i.e. archiving and retrieval time is independent of the total data volume.

Slide 7 ● NGAS messages are delivered through HTTP using XML ● All NGAS commands are implemented as standard URLs
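
As a rough illustration of what "commands as standard URLs" can look like, the sketch below builds such a URL with Python's standard library. The host name, port, command name and parameter name are assumptions made for this example; none of them is taken from the slides.

```python
from urllib.parse import urlencode


def build_command_url(host, port, command, **params):
    """Return an HTTP URL that encodes a command and its parameters."""
    url = f"http://{host}:{port}/{command}"
    # Sort the parameters so the resulting URL is deterministic.
    query = urlencode(sorted(params.items()))
    return f"{url}?{query}" if query else url


# Hypothetical host, port, command and parameter (illustration only).
example_url = build_command_url(
    "ngas.example.org", 7777, "STATUS", file_id="frame-0001"
)
```

Because every command is a plain URL, any HTTP client, including a web browser, could issue it without a dedicated client library.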

Slide 8

Slide 9 NGAS Processing A PROCESS request (HTTP PUT) passes XML in the body. PROCESS commands have to be registered in the NGAS configuration, but apart from that they are simply executed in threads as shell commands. The NGAS master forwards the PROCESS command to the node that holds the data. Tested with a small pipeline producing preview frames. Future: implement processing recipes to optimize resource usage. Far future: possibly implement use of MPI.
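
The "registered commands executed in threads" idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the NGAS implementation: the registry layout, the command name `make_preview` and the function `run_process_command` are hypothetical, and the command is passed as an argument list rather than a shell string to keep the example portable.

```python
import subprocess
import sys
import threading

# Hypothetical registry: command name -> argument list to execute.
# In the slides the registration lives in the NGAS config; this dict
# only stands in for it.
REGISTERED = {
    "make_preview": [sys.executable, "-c", "print('preview done')"],
}


def run_process_command(name, results):
    """Start a registered command in its own thread and return the thread."""
    if name not in REGISTERED:
        # Unregistered commands are rejected before any thread is started.
        raise KeyError(f"command not registered: {name}")

    def worker():
        done = subprocess.run(REGISTERED[name], capture_output=True, text=True)
        results[name] = (done.returncode, done.stdout.strip())

    thread = threading.Thread(target=worker)
    thread.start()
    return thread


results = {}
thread = run_process_command("make_preview", results)
thread.join()  # wait for the external process to finish
```

Restricting execution to a registry means a request can only trigger commands the archive operator has explicitly allowed.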

Slide 10 Scaling Primitive example: the NGAS units calculate checksums on all files every second day. This process took about 10 hours when we had the first complete unit (~ frames). It still takes about 10 hours now (86000 frames)! With careful hardware, software and process configuration, this kind of scaling is possible even for complicated processing requests. With smart data distribution and process data flow control it can be improved further.
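
One way to read this result is that each unit processes only its own files, so the units work in parallel and the slowest unit sets the total time. The toy model below expresses that reading. The file counts and the processing rate are invented for illustration and are not measurements from the slides.

```python
def wall_clock_hours(files_per_unit, files_per_hour):
    """Units run in parallel, so the fullest unit determines the total time."""
    return max(count / files_per_hour for count in files_per_unit)


# Invented numbers: every unit holds 8000 files and checks 2000 files per hour.
one_unit = wall_clock_hours([8_000], 2_000)
five_units = wall_clock_hours([8_000] * 5, 2_000)
```

In this model, adding units leaves the wall-clock time unchanged as long as no single unit holds more files than before.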

Slide 11 Connection to VO Initially, NGAS provides the lowest level of VO data processing, exactly where the bulk of the data is. Idle cycles can be offered to higher-level processing. NGAS will publish registered commands as web services through an authentication/authorization interface (GRID). Data can be reduced and the results archived directly. Results are immediately available in the VO context, i.e. fully asynchronous, very large scale reduction is possible.

Slide 12 Access --- Data Archive access is modulated through a low-level description of the data using known types and units. Example: access to a specific pixel of an image is usually done through sky coordinates, not in the native pixel space. Metadata provides the conversion between the coordinate systems. Problem: metadata might be incomplete, i.e. the conversion is inaccurate. VO access is modulated through a high-level description of services and resources using TBD types and units. Problem: this is another layer of metadata, which might be even more incomplete, i.e. the conversion is impossible or simply wrong! Metadata
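
The pixel example can be made concrete with a small sketch. It assumes the metadata are FITS-style header keywords (CRPIX, CRVAL, CDELT) and uses a purely linear relation per axis, ignoring the sky projection, so it is only an approximation near the reference pixel. The header values and function names are invented for illustration.

```python
REQUIRED_KEYS = ("CRPIX1", "CRVAL1", "CDELT1", "CRPIX2", "CRVAL2", "CDELT2")


def _check(header):
    """Refuse to convert when the metadata are incomplete."""
    missing = [key for key in REQUIRED_KEYS if key not in header]
    if missing:
        raise KeyError(f"incomplete metadata, missing: {missing}")


def sky_to_pixel(header, world1, world2):
    """Linear approximation: pixel = CRPIX + (world - CRVAL) / CDELT."""
    _check(header)
    px = header["CRPIX1"] + (world1 - header["CRVAL1"]) / header["CDELT1"]
    py = header["CRPIX2"] + (world2 - header["CRVAL2"]) / header["CDELT2"]
    return px, py


def pixel_to_sky(header, px, py):
    """Inverse of sky_to_pixel under the same linear approximation."""
    _check(header)
    world1 = header["CRVAL1"] + (px - header["CRPIX1"]) * header["CDELT1"]
    world2 = header["CRVAL2"] + (py - header["CRPIX2"]) * header["CDELT2"]
    return world1, world2


# Invented header values, in degrees per pixel.
header = {
    "CRPIX1": 512.0, "CRVAL1": 150.0, "CDELT1": -0.0005,
    "CRPIX2": 512.0, "CRVAL2": 2.0, "CDELT2": 0.0005,
}
center_pixel = sky_to_pixel(header, 150.0, 2.0)
```

If a keyword is missing, the sketch raises an error instead of guessing, which mirrors the slide's point that incomplete metadata makes the conversion unreliable.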

Slide 13 VO Computing Metadata Archive Computing Metadata for processing description??!! Feasible: single reduction steps. What about complete pipelines with parts running on machines around the world?? Sounds like a metadata and configuration nightmare!

Slide 14 Conclusion NGAS can provide a scalable archive and processing environment. Using this we have to clean our house first → make ESO/ECF archives VO compliant, i.e. process most of the data. NGAS does not impose any constraint on the kind of data it handles and the data is still in normal files and on a standard file system. Offering 'VO processing' capabilities seems to be very challenging.