Slide 1 Archive Computing: Scalable Computing Environments on Very Large Archives Andreas J. Wicenec 13-June-2002.


1 Slide 1 Archive Computing: Scalable Computing Environments on Very Large Archives Andreas J. Wicenec 13-June-2002

2 Slide 2 Processing? Yes, but where?

3 Slide 3 VO Ready Archives Are our archives VO ready? Resource and service descriptions are still to be defined (TBD), and data quality standards and descriptions do not exist yet. There is no standard for describing instrument modes and capabilities, nor for filters, grisms and other relevant optical elements. Once VO data standards have been established, the metadata has to be extracted from the archives or otherwise determined. This potentially means reducing all the data to a certain degree and manually adding a lot of observatory information.

4 Slide 4 Scalable Archive Computing: Why?

5 Slide 5 Scalable Archive Computing: Actors Two major customer groups: A) Archive internal: health checking, archive QC, data migration, metadata extraction, preview and master calibration production. B) External users or systems: on-the-fly reduction, cross-correlation, archive retrieval and visualization, VO.

6 Slide 6 Scalable Archive Computing: How? Just 'add' a couple of the following buzzwords to NGAS: GRID, WebServices, UDDI, SOAP, dynamic process distribution, MPI, GDFS, Gigabit Ethernet, Myrinet. What is NGAS? Next Generation Archive System: an archiving system that scales with the data volume it controls, i.e. archiving and retrieval time is independent of the total data volume.

7 Slide 7 ● NGAS messages are delivered through HTTP using XML ● All NGAS commands are implemented as standard URLs
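
As a minimal sketch of what "commands as standard URLs" could look like from the client side, the following builds a request URL from a command name and its parameters. The host name, the port, the command name `STATUS`, the `file_id` parameter and the `/COMMAND?key=value` layout are all assumptions made for illustration; the slides do not specify them.

```python
from urllib.parse import urlencode


def build_command_url(host: str, port: int, command: str, **params: str) -> str:
    """Build an HTTP URL for a command (assumed layout: /COMMAND?key=value)."""
    query = urlencode(params)
    url = f"http://{host}:{port}/{command}"
    return f"{url}?{query}" if query else url


# Hypothetical host, port, command and parameter names.
url = build_command_url("ngas.example.org", 7777, "STATUS", file_id="EXAMPLE.2002-06-13")
print(url)
```

Because every command is an ordinary URL, any HTTP client or browser can issue it without a dedicated client library.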

8 Slide 8

9 Slide 9 NGAS Processing A PROCESS command is an HTTP PUT request that passes XML in the body. PROCESS commands have to be registered in the NGAS config, but otherwise they are simply executed in threads as shell commands. The NGAS master forwards a PROCESS command to the node which holds the data. Tested with a small pipeline producing preview frames. Future: Implement processing recipes to optimize resource usage. Far future: Implement possible usage of MPI.
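
The "registered, then executed in threads" idea can be sketched as follows. This is not the NGAS implementation: the registry dictionary, the command name `echo_file` and both function names are hypothetical, and the stand-in command only prints the file name it is given.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical registry standing in for "registered in the NGAS config":
# command name -> argument list. Only registered names may run.
REGISTERED_COMMANDS = {
    "echo_file": [sys.executable, "-c", "import sys; print('processed', sys.argv[1])"],
}


def run_process_command(name: str, file_id: str) -> str:
    """Run one registered command on one file and return its standard output."""
    if name not in REGISTERED_COMMANDS:
        raise KeyError(f"command not registered: {name}")
    argv = REGISTERED_COMMANDS[name] + [file_id]
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def process_many(name: str, file_ids: list[str], workers: int = 4) -> list[str]:
    """Execute the same registered command for several files in worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda fid: run_process_command(name, fid), file_ids))


print(process_many("echo_file", ["frame_a.fits", "frame_b.fits"]))
```

The sketch passes an argument list to `subprocess.run` instead of a shell string, which avoids quoting problems with file names; a real deployment that needs shell features would have to handle that separately.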

10 Slide 10 Scaling A simple example: The NGAS units calculate checksums on all files every second day. This process took about 10 hours when we had the first complete unit (~10000 frames). It still takes about 10 hours now (86000 frames)! With careful hardware, software and process configuration, this kind of scaling is possible even for complicated processing requests. With smart data distribution and process data flow control it can be improved further.
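
The arithmetic behind this example can be sketched with a small model: if every unit checks only its own files, the wall-clock time is set by the fullest unit, not by the total number of frames. The rate of 1000 frames per hour follows from the slide's figures (about 10000 frames in about 10 hours); the split of 86000 frames into eight full units and one partly filled unit is an assumption, since the slide does not give the number of units.

```python
def wall_clock_hours(frames_per_unit: list[int], frames_per_hour: float = 1000.0) -> float:
    """Wall-clock time when every unit checks only its own files in parallel.

    The result is determined by the slowest (fullest) unit, not by the sum.
    """
    return max(frames / frames_per_hour for frames in frames_per_unit)


# Rate taken from the slide: ~10000 frames in ~10 hours on one unit.
one_unit = [10_000]
# Assumed layout for 86000 frames: eight full units plus one partly filled unit.
many_units = [10_000] * 8 + [6_000]

print(sum(one_unit), wall_clock_hours(one_unit))
print(sum(many_units), wall_clock_hours(many_units))
```

Under these assumptions both cases come out at 10 hours, which matches the behaviour reported on the slide.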

11 Slide 11 Connection to VO Initially, NGAS provides the lowest level of VO data processing exactly where the bulk of the data is. Idle cycles can be offered to higher-level processing. NGAS will publish registered commands as web services through an authentication/authorization interface (GRID). Data can be reduced and the results directly archived. Results are immediately available in the VO context, i.e. fully asynchronous, very large scale reduction is possible.

12 Slide 12 Access --- Data Archive access is mediated through a low-level description of the data using known types and units. Example: Access to a specific pixel of an image is usually done through sky coordinates, not in the native pixel space. Metadata provides the conversion between the coordinate systems. Problem: Metadata might be incomplete, i.e. the conversion is inaccurate. VO access is mediated through a high-level description of services and resources using TBD types and units. Problem: This is another layer of metadata, which might be even more incomplete, i.e. the conversion is impossible or simply wrong! Metadata
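
The pixel-access example can be illustrated with a deliberately simplified conversion. The sketch below uses the FITS-style keywords `CRPIX`, `CRVAL` and `CDELT` and a purely linear mapping; it ignores projection, rotation and the cos(declination) factor, so it is an approximation for illustration only. The function name and the header values are invented.

```python
REQUIRED_KEYWORDS = ("CRPIX1", "CRPIX2", "CRVAL1", "CRVAL2", "CDELT1", "CDELT2")


def sky_to_pixel(ra_deg: float, dec_deg: float, meta: dict) -> tuple[float, float]:
    """Linear sky-to-pixel conversion driven entirely by metadata keywords."""
    missing = [key for key in REQUIRED_KEYWORDS if key not in meta]
    if missing:
        # Incomplete metadata: the conversion cannot be done at all.
        raise ValueError(f"metadata incomplete, cannot convert: missing {missing}")
    x = meta["CRPIX1"] + (ra_deg - meta["CRVAL1"]) / meta["CDELT1"]
    y = meta["CRPIX2"] + (dec_deg - meta["CRVAL2"]) / meta["CDELT2"]
    return x, y


# Invented example values (degrees and pixels), not taken from any real frame.
header = {
    "CRPIX1": 512.0, "CRPIX2": 512.0,
    "CRVAL1": 150.0, "CRVAL2": 2.0,
    "CDELT1": -0.001, "CDELT2": 0.001,
}
print(sky_to_pixel(150.0, 2.0, header))
```

A missing keyword makes the conversion impossible, while a wrong value silently yields the wrong pixel, which is the harder failure to detect.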

13 Slide 13 VO Computing Metadata Archive Computing Metadata for processing description?! Feasible: Single reduction steps. What about complete pipelines with parts running on machines around the world? Sounds like a metadata and configuration nightmare!

14 Slide 14 Conclusion NGAS can provide a scalable archive and processing environment. Using this, we first have to clean our own house → make the ESO/ECF archives VO compliant, i.e. process most of the data. NGAS does not impose any constraints on the kind of data it handles, and the data is still in normal files on a standard file system. Offering 'VO processing' capabilities seems to be very challenging.

