11 Sep 2006 NVO Summer School
Managing data in the VO
Matthew J. Graham, CACR/Caltech
THE US NATIONAL VIRTUAL OBSERVATORY

The importance of data
Data is the raison d'être of the VO
LSST is the data source nonpareil
– data rates of 540 MB/s, ~16 TB in 8 hrs
– final archive > 3 PB of data
[Figure: VO Wheel]
Well-established ways of handling distributed data:
– SRB
– PVFS
– OGSA-DAI
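
As a quick check of the figures above, a sustained 540 MB/s over 8 hours does come to roughly 16 TB. The sketch assumes decimal units (1 TB = 1,000,000 MB), which the slide does not state.

```python
# Sanity check of the quoted LSST data volume.
# Assumption: decimal units (1 TB = 1,000,000 MB); the slide does not say.
RATE_MB_PER_S = 540
HOURS = 8

seconds = HOURS * 3600
volume_tb = RATE_MB_PER_S * seconds / 1_000_000

print(f"{volume_tb:.2f} TB")  # 15.55 TB, i.e. roughly 16 TB
```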

Requirements
A distributed storage mechanism that allows easy reference to data without concern for its physical location.
Primary use cases:
– User wants to easily publish and share their own data
– Data need to reside close to computation nodes
Data use cases:
– Client has data:
    stored locally: client transfers it to the service
    stored locally: service retrieves it
    stored elsewhere: service retrieves it
– Service generates data:
    stores it locally: notifies client of location
    transfers it to the client's local store
    transfers it to a client-designated store

Logical architecture
[Figure: three layers: user view, logical namespace, physical storage]

VOSpace
Provides a uniform interface to existing or new data storage locations (Facade pattern)
Structured and unstructured data are both first level
A peer network of VOSpace servers
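
The Facade idea mentioned above can be illustrated with a minimal sketch: one entry point hides which physical store holds each object. All class and method names here (`StorageBackend`, `InMemoryBackend`, `SpaceFacade`, `put`, `get`) are hypothetical and are not part of the VOSpace interface.

```python
from abc import ABC, abstractmethod
from typing import Dict


class StorageBackend(ABC):
    """Hypothetical interface that each physical store would implement."""

    @abstractmethod
    def write(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, key: str) -> bytes: ...


class InMemoryBackend(StorageBackend):
    """Stand-in for a real store (local disk, archive, ...)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def write(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def read(self, key: str) -> bytes:
        return self._blobs[key]


class SpaceFacade:
    """Single entry point; callers never name a backend when reading."""

    def __init__(self, backends: Dict[str, StorageBackend]) -> None:
        self._backends = backends
        self._location: Dict[str, str] = {}  # logical name -> backend label

    def put(self, name: str, data: bytes, backend: str) -> None:
        self._backends[backend].write(name, data)
        self._location[name] = backend

    def get(self, name: str) -> bytes:
        # Look up where the object lives, then delegate to that store.
        return self._backends[self._location[name]].read(name)


space = SpaceFacade({"disk": InMemoryBackend(), "archive": InMemoryBackend()})
space.put("mydata1", b"abc", backend="archive")
print(space.get("mydata1"))  # b'abc'
```

The caller of `get` refers only to the logical name, which mirrors the requirement that data be referenced without concern for physical location.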

Data structures - I
Each data object is represented as a node:
Nodes are identified by a vos://[service]/[name] identifier:
– Why not ivo://nvo.caltech/vospace/mydata1?
– RFC hierarchy
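
A minimal sketch of splitting an identifier of the form vos://[service]/[name] into its two parts, using Python's standard `urllib.parse`. The helper `parse_vos_id`, the `VosId` type and the example identifier `vos://example.service/mydata1` are illustrative assumptions, not taken from the slides.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class VosId(NamedTuple):
    service: str
    name: str


def parse_vos_id(identifier: str) -> VosId:
    """Split vos://[service]/[name]; reject anything that is not a vos URI."""
    parts = urlsplit(identifier)
    if parts.scheme != "vos":
        raise ValueError(f"not a vos:// identifier: {identifier!r}")
    name = parts.path.lstrip("/")
    if not parts.netloc or not name:
        raise ValueError(f"missing service or name: {identifier!r}")
    return VosId(service=parts.netloc, name=name)


# Hypothetical identifier, used only for illustration.
print(parse_vos_id("vos://example.service/mydata1"))
```

An ivo:// identifier such as the one on the slide is rejected by this helper, since its scheme is not `vos`.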

Data structures - II
Each node contains a map of key:value properties: … T13:35:51Z, readonly=true
There are currently four types of node:
– Node
– DataNode
– Structured DataNode
– Unstructured DataNode
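
One way to picture the four node types and their property maps is as a small class hierarchy. This is only a sketch: the inheritance shown (both structured and unstructured nodes deriving from DataNode, which derives from Node) is an assumption read off the type names, and the property key used in the example is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    uri: str
    # Map of key:value properties attached to the node.
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class DataNode(Node):
    """A node that holds data (assumed to extend Node)."""


@dataclass
class StructuredDataNode(DataNode):
    """Data whose format the service understands (assumed meaning)."""


@dataclass
class UnstructuredDataNode(DataNode):
    """Data treated as an opaque sequence of bytes (assumed meaning)."""


node = UnstructuredDataNode(
    uri="vos://example.service/mydata1",       # hypothetical identifier
    properties={"example:note": "raw frame"},  # hypothetical property key
)
print(isinstance(node, DataNode), node.properties)
```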

Data structures - III
Data nodes contain a list of data views (formats) that the node can accept and provide:
…

Data structures - IV
…
– Why not use MIME type? Easier to define new astronomy-specific data types

Data structures - V
Data transfers are represented by transfers:
The format of the data transfer is specified by a view:
The protocol of the data transfer is specified by a protocol:
… get
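
The relationship between a transfer, its view and its protocol can be sketched as a small record type, together with a check against the list of views a node provides. The field names, the helper `node_can_provide` and all identifier strings below are hypothetical; the slide's own XML examples are not preserved.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Transfer:
    target: str    # identifier of the node being read or written
    view: str      # format of the data being transferred
    protocol: str  # how the bytes are moved


def node_can_provide(provides: List[str], transfer: Transfer) -> bool:
    """True if the requested view is among those the node provides."""
    return transfer.view in provides


# Hypothetical view and protocol identifiers, for illustration only.
provides = ["example:view/votable", "example:view/fits"]
request = Transfer(
    target="vos://example.service/mydata1",
    view="example:view/votable",
    protocol="example:protocol/http-get",
)
print(node_can_provide(provides, request))  # True
```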

Data structures - VI
The space lists the protocols the service can use to fetch data and the protocol endpoints it provides:
– Why not use protocol schemes?

Operations - I
Service metadata:
– getProtocols():
– getViews():,
– getProperties():,,
Creating and manipulating nodes:
– createNode( ):
– deleteNode(uri): -
– listNodes(token, limit, detail, ): token, limit,
– moveNode(uri, ):
– copyNode(uri, ):
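
The node operations listed above can be mimicked with a toy in-memory space. This is a sketch of the behaviour only, not the web-service interface: the Python method names, argument lists and return values are assumptions (the original signatures are incomplete here), and treating the listNodes token as a simple offset is one possible paging scheme among many.

```python
from typing import Dict, List, Optional, Tuple


class InMemorySpace:
    """Toy stand-in for a VOSpace service; every signature is assumed."""

    def __init__(self) -> None:
        self._nodes: Dict[str, Dict[str, str]] = {}  # uri -> properties

    def create_node(self, uri: str, properties: Optional[Dict[str, str]] = None) -> None:
        if uri in self._nodes:
            raise KeyError(f"node already exists: {uri}")
        self._nodes[uri] = dict(properties or {})

    def delete_node(self, uri: str) -> None:
        del self._nodes[uri]  # raises KeyError if the node is unknown

    def get_node(self, uri: str) -> Dict[str, str]:
        return dict(self._nodes[uri])

    def list_nodes(self, token: int = 0, limit: Optional[int] = None) -> Tuple[List[str], Optional[int]]:
        """Return one page of URIs plus a token for the next page (or None)."""
        uris = sorted(self._nodes)
        end = len(uris) if limit is None else token + limit
        page = uris[token:end]
        next_token = token + len(page)
        return page, (next_token if next_token < len(uris) else None)

    def copy_node(self, uri: str, destination: str) -> None:
        if destination in self._nodes:
            raise KeyError(f"node already exists: {destination}")
        self._nodes[destination] = dict(self._nodes[uri])

    def move_node(self, uri: str, destination: str) -> None:
        self.copy_node(uri, destination)
        del self._nodes[uri]


space = InMemorySpace()
space.create_node("vos://example.service/a")  # hypothetical identifiers
space.create_node("vos://example.service/b")
space.move_node("vos://example.service/a", "vos://example.service/c")
print(space.list_nodes(limit=1))
```

Returning a continuation token alongside each page lets a client walk a large space in bounded chunks.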

Operations - II
Manipulating node metadata:
– getNode(uri):
– setNode( ):
Transferring data:
– pushToVoSpace(, ):,
– pullToVoSpace(, ):
– pushFromVoSpace(uri, ): -
– pullFromVoSpace(uri, ):

Authentication and authorization
WS-Security
Access policies:
– No access control
– No authorization but authentication
– Clients may not create or change nodes
– Nodes are considered to be owned by the user who created them
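
One possible reading of the four access policies above, expressed as a single check on whether a caller may modify a node. The enum names, the function `may_modify` and the exact semantics given to each policy are assumptions made for illustration; the slide does not define them in this detail.

```python
from enum import Enum, auto
from typing import Optional


class Policy(Enum):
    NO_ACCESS_CONTROL = auto()   # anyone may do anything
    AUTHENTICATED_ONLY = auto()  # any authenticated user may do anything
    READ_ONLY = auto()           # clients may not create or change nodes
    OWNER = auto()               # only the creating user may change a node


def may_modify(policy: Policy, user: Optional[str], owner: Optional[str]) -> bool:
    """user is None for an unauthenticated caller (assumed convention)."""
    if policy is Policy.NO_ACCESS_CONTROL:
        return True
    if policy is Policy.AUTHENTICATED_ONLY:
        return user is not None
    if policy is Policy.READ_ONLY:
        return False
    # Policy.OWNER: the caller must be authenticated and be the creator.
    return user is not None and user == owner


print(may_modify(Policy.OWNER, user="alice", owner="alice"))  # True
print(may_modify(Policy.OWNER, user="bob", owner="alice"))    # False
```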

Forthcoming attractions
– Containers
– Links
– Asynchronous transfers
– Querying
– Replicas

Federation by links