File Management Chris A. Mattmann OODT Component Working Group.


1 File Management Chris A. Mattmann OODT Component Working Group

2 13-Apr-15 FILE-MGMT CAM-2 What is File Management?
Managing the locations and ancillary information about files, and collections of files
– Ancillary information is metadata
What's a product?
– A collection of some set of files, and/or collections of files. So, you could have collections of other collections
– Along with metadata about the product
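A product, as defined above, is a recursive structure: files and collections of files, where a collection may itself hold collections, plus metadata about the whole. The minimal sketch below models that as a composite tree. All class and record names (`ProductSketch`, `Node`, `FileNode`, `CollectionNode`) are hypothetical illustrations, not the cas-filemgr API.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical model of a product: files and nested collections of files, plus metadata. */
public class ProductSketch {

    /** A node in a product tree: either a single file or a collection of nodes. */
    public interface Node {
        int fileCount();
    }

    /** A single file, identified by its URI. */
    public record FileNode(String uri) implements Node {
        public int fileCount() {
            return 1;
        }
    }

    /** A collection that may contain files and/or other collections. */
    public record CollectionNode(String name, List<Node> children) implements Node {
        public int fileCount() {
            int total = 0;
            for (Node child : children) {
                total += child.fileCount(); // recurse into nested collections
            }
            return total;
        }
    }

    /** A product: a root collection plus key/value metadata about the whole product. */
    public record Product(String name, CollectionNode root, Map<String, String> metadata) {}

    public static Product example() {
        CollectionNode inner = new CollectionNode("inner",
                List.of(new FileNode("file:///data/granule/inner/b.dat")));
        CollectionNode root = new CollectionNode("granule",
                List.of(new FileNode("file:///data/granule/a.dat"), inner));
        return new Product("granule", root, Map.of("ProductType", "GenericFile"));
    }

    public static void main(String[] args) {
        Product p = example();
        System.out.println(p.name() + " holds " + p.root().fileCount() + " files");
    }
}
```

The composite shape is what lets "collections of other collections" be handled with one recursive method instead of special-casing each nesting level.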

3 The state of things
The existing CAS system does file management
– For past missions and projects, it has done the job well
The CAS implementation
– Needs an update: an overall refactoring to allow for modularity and separation of concerns, plus general technology and architectural updates
In particular, a couple of new requirements and drivers from projects
– Suggested some ways to extend and improve the CAS to satisfy them
What are these new requirements and drivers?

4 New Requirements and Drivers
Persisting archived files using dynamic metadata and flexible, adaptable policies based on product types
– rather than the monolithic and inflexible existing method of using ProductTypeRepository/ProductName/ProductVersion/ as the filesystem location to store products for all product types.
Clearly separating the Workflow aspects of the File Manager from Product ingestion, and flexibly supporting association of Workflows and their subsequent Tasks with any event, not only ingestion.
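One way to read the second requirement: instead of hard-wiring workflows into ingestion, keep a registry that maps any event name to the workflows it should trigger. The sketch below assumes events and workflows are identified by plain strings; `WorkflowRegistry` and the event names (`ingest`, `delete`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical registry that decouples workflows from ingestion: any event can trigger workflows. */
public class WorkflowRegistry {
    private final Map<String, List<String>> workflowsByEvent = new HashMap<>();

    /** Associate a workflow (by name) with an event (by name). */
    public void register(String event, String workflow) {
        workflowsByEvent.computeIfAbsent(event, e -> new ArrayList<>()).add(workflow);
    }

    /** Workflows to run when the event fires, in registration order; empty if none. */
    public List<String> workflowsFor(String event) {
        return List.copyOf(workflowsByEvent.getOrDefault(event, List.of()));
    }

    public static void main(String[] args) {
        WorkflowRegistry registry = new WorkflowRegistry();
        registry.register("ingest", "ExtractBrowseImage");
        registry.register("ingest", "NotifySubscribers");
        registry.register("delete", "PurgeCaches");
        System.out.println(registry.workflowsFor("ingest"));
    }
}
```

Ingestion then becomes just one event among many: the File Manager fires `ingest`, and whatever is registered for it runs.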

5 New Requirements and Drivers
Leverage existing transactional models such as the Java Transaction API (JTA) to support transactional management, rather than building our own API.
If we do use any database communication, make sure that all DB communication uses standard, available, existing DB pooling APIs such as commons-dbcp, available from Apache.

6 New Requirements and Drivers
Clearly separating the administrative portions of policy management from the existing webapp, and distinguishing which pieces of the webapp are user-centric and which are administrative-centric.
Supporting hierarchical product structures, such as nested directories that contain many sub-directories, and sub-directories of those sub-directories, with files strewn about at all levels
– rather than only supporting the existing method of flat product structures, where all files in a product are at the same tree level.
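Supporting hierarchical products mostly means enumerating files at every depth while remembering each file's position relative to the product root, so the same tree can be recreated in the archive. A minimal sketch using the JDK's `Files.walk` (the class name `HierarchicalProduct` is hypothetical):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Sketch: list every file in a hierarchical product, at any depth, relative to the product root. */
public class HierarchicalProduct {

    public static List<String> relativeFiles(Path productRoot) throws IOException {
        try (Stream<Path> paths = Files.walk(productRoot)) { // walks all levels, depth-first
            return paths.filter(Files::isRegularFile)
                    // keep the path relative to the root; normalize separators for portability
                    .map(p -> productRoot.relativize(p).toString().replace('\\', '/'))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("product");
        Files.createDirectories(root.resolve("sub/subsub"));
        Files.writeString(root.resolve("top.txt"), "a");
        Files.writeString(root.resolve("sub/mid.txt"), "b");
        Files.writeString(root.resolve("sub/subsub/deep.txt"), "c");
        System.out.println(relativeFiles(root)); // [sub/mid.txt, sub/subsub/deep.txt, top.txt]
    }
}
```

A flat product is simply the special case where every relative path has no directory component.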

7 New Requirements and Drivers
Support metadata extraction based on product type or MIME type.
Support dynamic product types: the file management component should not need to know about every product type a priori.
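Both requirements can be met with a lookup table that is filled at runtime: extractors are registered per product type or per MIME type, and nothing is compiled in ahead of time. In this sketch `ExtractorRegistry` is hypothetical, extractors are modeled as plain `Function`s from a file name to metadata, and the rule "product type wins over MIME type" is an assumed policy.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical registry that picks a metadata extractor by product type, falling back to MIME type. */
public class ExtractorRegistry {
    private final Map<String, Function<String, Map<String, String>>> byProductType = new HashMap<>();
    private final Map<String, Function<String, Map<String, String>>> byMimeType = new HashMap<>();

    /** Product types are registered at runtime, so none has to be known a priori. */
    public void registerProductType(String productType, Function<String, Map<String, String>> extractor) {
        byProductType.put(productType, extractor);
    }

    public void registerMimeType(String mimeType, Function<String, Map<String, String>> extractor) {
        byMimeType.put(mimeType, extractor);
    }

    /** Prefer a product-type extractor, else a MIME-type one, else return no metadata. */
    public Map<String, String> extract(String productType, String mimeType, String fileName) {
        Function<String, Map<String, String>> extractor =
                byProductType.getOrDefault(productType, byMimeType.get(mimeType));
        return extractor == null ? Map.of() : extractor.apply(fileName);
    }

    public static void main(String[] args) {
        ExtractorRegistry registry = new ExtractorRegistry();
        // A real extractor would open the file; these just derive metadata from its name.
        registry.registerMimeType("audio/mpeg", f -> Map.of("Format", "mp3"));
        registry.registerProductType("Mp3", f -> Map.of("Format", "mp3", "Filename", f));
        System.out.println(registry.extract("Mp3", "audio/mpeg", "song.mp3"));
        System.out.println(registry.extract("Unregistered", "audio/mpeg", "song.mp3"));
    }
}
```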

8 New Requirements and Drivers
You can read and add to the list
– Available at: http://oodt.jpl.nasa.gov/wiki/display/oodt/File+Management
Please, speak your mind!

9 File Management: Architectural implications
Managing files
– Data Store: follow the typical repository pattern
– Manage information about Products, Product Types, and References to products
Managing metadata
– Metadata Store: follow the typical registry pattern
– Manage product Metadata key/value pairs
Separate out the data store and metadata store
– This allows data and metadata to be managed independently
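The split can be expressed as two independent interfaces, one repository-style (where a product's files live) and one registry-style (key/value metadata). The sketch below is illustrative only: the interface names echo the slides, but the method names and in-memory implementations are assumptions, not the cas-filemgr API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of the separated stores; method names are illustrative. */
public class StoreSketch {

    /** Repository pattern: where a product's files (references) live. */
    public interface DataStore {
        void addReferences(String productId, List<String> dataStoreUris);
        List<String> getReferences(String productId);
    }

    /** Registry pattern: key/value metadata describing a product. */
    public interface MetadataStore {
        void addMetadata(String productId, Map<String, String> metadata);
        Optional<String> getValue(String productId, String key);
    }

    public static class InMemoryDataStore implements DataStore {
        private final Map<String, List<String>> refs = new HashMap<>();
        public void addReferences(String id, List<String> uris) { refs.put(id, List.copyOf(uris)); }
        public List<String> getReferences(String id) { return refs.getOrDefault(id, List.of()); }
    }

    public static class InMemoryMetadataStore implements MetadataStore {
        private final Map<String, Map<String, String>> md = new HashMap<>();
        public void addMetadata(String id, Map<String, String> m) {
            md.computeIfAbsent(id, k -> new HashMap<>()).putAll(m);
        }
        public Optional<String> getValue(String id, String key) {
            return Optional.ofNullable(md.getOrDefault(id, Map.of()).get(key));
        }
    }

    public static void main(String[] args) {
        DataStore data = new InMemoryDataStore();
        MetadataStore meta = new InMemoryMetadataStore();
        // The two stores share only the product id; either can be swapped without touching the other.
        data.addReferences("p1", List.of("file:///repo/p1/a.dat"));
        meta.addMetadata("p1", Map.of("ProductType", "Generic"));
        System.out.println(data.getReferences("p1") + " " + meta.getValue("p1", "ProductType").orElse("?"));
    }
}
```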

10 Data Store

11 Metadata Store

12 How is this different from the existing CAS?
Separation of concerns
– Anything to do with data goes into the data store package
– Anything to do with metadata goes into the metadata store package
Modularity
– Can have different backend implementations of standard interfaces for data stores and metadata stores: Lucene as a backend for metadata or, if you prefer, a traditional DB backend
– Can have multiple data stores and metadata stores per CAS
The existing CAS lumped these two capabilities together
– It was difficult to reason about how to pull them apart
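Because every backend implements the same interface, "multiple metadata stores per CAS" can be as simple as a composite that fans writes out to each backend and answers reads from the first one that has a value. In this sketch `MapBackend` stands in for a Lucene- or DB-backed class; the names and the read policy are assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: one CAS using several interchangeable metadata-store backends. */
public class MultiStoreSketch {

    public interface MetadataStore {
        void put(String productId, String key, String value);
        Optional<String> get(String productId, String key);
    }

    /** Stand-in backend; a Lucene- or DB-backed class would implement the same interface. */
    public static class MapBackend implements MetadataStore {
        private final Map<List<String>, String> entries = new HashMap<>(); // key: [productId, key]
        public void put(String id, String key, String value) { entries.put(List.of(id, key), value); }
        public Optional<String> get(String id, String key) {
            return Optional.ofNullable(entries.get(List.of(id, key)));
        }
    }

    /** Writes go to every backend; reads return the first backend's value, if any. */
    public static class CompositeStore implements MetadataStore {
        private final List<MetadataStore> backends;
        public CompositeStore(List<? extends MetadataStore> backends) { this.backends = List.copyOf(backends); }
        public void put(String id, String key, String value) {
            for (MetadataStore b : backends) {
                b.put(id, key, value);
            }
        }
        public Optional<String> get(String id, String key) {
            for (MetadataStore b : backends) {
                Optional<String> v = b.get(id, key);
                if (v.isPresent()) {
                    return v;
                }
            }
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        CompositeStore cas = new CompositeStore(List.of(new MapBackend(), new MapBackend()));
        cas.put("p1", "Mp3Artist", "50cent");
        System.out.println(cas.get("p1", "Mp3Artist").orElse("?"));
    }
}
```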

13 What else do we need to do File Management?
Need a way to transfer a product from the client to the File Management service
– The client gives URIs of files, or collections of files, which identify the References belonging to a Product
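On the client side, the first step is turning the URIs a user supplies into References that know only their original location; the data store location is filled in later on the server. A minimal sketch, assuming each reference must be an absolute URI; `ClientReferences` and its `Reference` class are hypothetical.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical client-side sketch: turning client-supplied URIs into a Product's References. */
public class ClientReferences {

    /** A reference starts with only its original (client) URI; the data store URI comes later. */
    public static final class Reference {
        private final URI origReference;
        private URI dataStoreReference; // null until the server decides where the file goes

        public Reference(URI origReference) { this.origReference = origReference; }
        public URI getOrigReference() { return origReference; }
        public URI getDataStoreReference() { return dataStoreReference; }
        public void setDataStoreReference(URI uri) { this.dataStoreReference = uri; }
    }

    /** Parse each URI string and wrap it as a Reference; relative URIs are rejected. */
    public static List<Reference> fromUris(List<String> uris) {
        List<Reference> refs = new ArrayList<>();
        for (String s : uris) {
            URI uri = URI.create(s); // throws IllegalArgumentException on malformed input
            if (!uri.isAbsolute()) {
                throw new IllegalArgumentException("Reference URI must be absolute: " + s);
            }
            refs.add(new Reference(uri));
        }
        return refs;
    }

    public static void main(String[] args) {
        List<Reference> refs = fromUris(List.of("file:///home/chris/myfile.file"));
        System.out.println(refs.get(0).getOrigReference().getPath()); // /home/chris/myfile.file
    }
}
```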

14 Data Transfer Architecture

15 Transferring files
How does the transfer actually occur? You, as a developer, define how that happens
– Implement the transferProduct(Product p) method
– Can have many different types of data transfer
Local
– Use native system calls, or cp
Remote
– Use whatever protocol you want: XML-RPC, SOAP, WebDAV, etc.
– Don't use CORBA or RMI: they're sooooo last year!
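A local transfer implementation can be sketched as a `transferProduct(Product p)` method that copies each reference from its original URI to its data store URI. Two assumptions here: `Product`, `Reference`, and `DataTransfer` are minimal hypothetical stand-ins, and the copy uses the JDK's `Files.copy` rather than shelling out to `cp` (the implementation described later uses Apache commons-io).

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;

/** Hypothetical sketch of a pluggable transfer strategy with a local, filesystem-copy implementation. */
public class LocalTransferSketch {

    /** A file's client-side URI and the data store URI assigned to it before transfer. */
    public record Reference(URI origReference, URI dataStoreReference) {}

    public record Product(String name, List<Reference> references) {}

    /** The strategy interface; a remote implementation could use XML-RPC, SOAP, WebDAV, etc. */
    public interface DataTransfer {
        void transferProduct(Product p) throws IOException;
    }

    /** Local transfer: a plain filesystem copy, the JDK counterpart of cp. */
    public static class LocalDataTransfer implements DataTransfer {
        public void transferProduct(Product p) throws IOException {
            for (Reference r : p.references()) {
                Path src = Paths.get(r.origReference());       // requires file: URIs
                Path dest = Paths.get(r.dataStoreReference());
                Files.createDirectories(dest.getParent());      // create the archive subtree
                Files.copy(src, dest, StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path client = Files.createTempDirectory("client");
        Path repo = Files.createTempDirectory("repo");
        Path src = Files.writeString(client.resolve("myfile.file"), "hello");
        Path dest = repo.resolve("Generic/myfile/1/myfile.file");
        Product p = new Product("myfile", List.of(new Reference(src.toUri(), dest.toUri())));
        new LocalDataTransfer().transferProduct(p);
        System.out.println(Files.readString(dest)); // hello
    }
}
```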

16 Translating the URIs
Translating the URIs from the client to the File Manager presents an interesting challenge
– For example, where should file:///home/chris/myfile.file be transferred to on the File Manager's system?
Leverage and extend the existing CAS method
– The existing CAS would have answered the above question with ProductTypeRepositoryPath/ProductName/VersionId/
– Why should that be the only answer?
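The existing CAS answer is a fixed formula: the product type's repository path, then the product name, then the version id, then the file name. A minimal sketch of that formula; `DefaultCasLayout` and the example repository path are hypothetical, and URI-safe names (no spaces) are assumed.

```java
import java.net.URI;

/** Sketch of the fixed CAS layout: <ProductTypeRepositoryPath>/<ProductName>/<VersionId>/<file name>. */
public class DefaultCasLayout {

    public static URI dataStoreUri(URI productTypeRepo, String productName, String versionId, URI origReference) {
        String path = origReference.getPath();
        String fileName = path.substring(path.lastIndexOf('/') + 1); // last path segment
        String repo = productTypeRepo.toString();
        String base = repo.endsWith("/") ? repo : repo + "/";
        return URI.create(base + productName + "/" + versionId + "/" + fileName);
    }

    public static void main(String[] args) {
        URI dest = dataStoreUri(URI.create("file:///archive/Generic"), "myfile", "1",
                URI.create("file:///home/chris/myfile.file"));
        System.out.println(dest); // file:///archive/Generic/myfile/1/myfile.file
    }
}
```

Every product type gets this same shape, which is exactly the rigidity the Versioner concept is meant to remove.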

17 Versioners
Have the concept of a Versioner interface
The Versioner is called by the File Manager before the product is transferred from the client to the File Manager system
– The Versioner uses the Product metadata and the original product references to generate data store URIs that tell the DataTransfer implementation where to physically transfer the files for a particular Product
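The key point is ordering: the Versioner runs first and fills in a data store URI for every reference, and only then does the transfer happen. The sketch below makes that explicit. Its `Versioner` signature is a simplified, hypothetical stand-in (a list of references plus a metadata map) for the `createDataStoreReferences(Product, Metadata)` method shown on the next slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of "version first, then transfer". */
public class VersionerSketch {

    public static final class Reference {
        public final String origReference;
        public String dataStoreReference; // filled in by a Versioner

        public Reference(String origReference) { this.origReference = origReference; }
    }

    /** Uses metadata and original references to decide each file's data store URI. */
    public interface Versioner {
        void createDataStoreReferences(List<Reference> refs, Map<String, String> metadata);
    }

    /** Moves files; relies on the destinations the Versioner already set. */
    public interface DataTransfer {
        void transfer(List<Reference> refs);
    }

    public static void ingest(List<Reference> refs, Map<String, String> md, Versioner v, DataTransfer t) {
        v.createDataStoreReferences(refs, md); // step 1: decide where files go
        for (Reference r : refs) {
            if (r.dataStoreReference == null) {
                throw new IllegalStateException("No data store URI for " + r.origReference);
            }
        }
        t.transfer(refs); // step 2: physically move them
    }

    public static void main(String[] args) {
        List<Reference> refs = List.of(new Reference("file:///home/chris/myfile.file"));
        Versioner byType = (rs, md) -> rs.forEach(r -> r.dataStoreReference =
                "file:///archive/" + md.get("ProductType") + "/myfile.file");
        List<String> log = new ArrayList<>();
        ingest(refs, Map.of("ProductType", "Generic"), byType,
                rs -> rs.forEach(r -> log.add(r.origReference + " -> " + r.dataStoreReference)));
        System.out.println(log);
    }
}
```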

18 Versioner Architecture

19 Versioner Example
Given an mp3 Product, with Metadata:
– Mp3Artist: 50cent
– Mp3Genre: rap
And with references:
– file:///home/chris/mp3s/gangsta-rap.mp3

20 Versioner Example
Use a MusicVersioner:

    public class MusicVersioner implements Versioner {
        public void createDataStoreReferences(Product p, Metadata m)
                throws VersioningException {
            // original (client-side) URI of the mp3 file
            String origUri = ((Reference) p.getReferences().get(0)).getOrigReference();
            // getRepoPath and getFileName are helpers not shown on the slide
            String mp3RepoPath = getRepoPath("Mp3ProductTypeName");
            // <repo path><genre>/<artist>/<file name>
            String dataStoreUri = mp3RepoPath + m.getElementMap().get("Mp3Genre") + "/"
                    + m.getElementMap().get("Mp3Artist") + "/" + getFileName(origUri);
            ((Reference) p.getReferences().get(0)).setDataStoreRef(dataStoreUri);
        }
    }

21 Versioner Example
So
– file:///home/chris/mp3s/gangsta-rap.mp3
…yields
– file:///path/to/mp3/repo/rap/50cent/gangsta-rap.mp3
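The MusicVersioner rule can be checked in isolation as a pure function. This sketch reproduces the example above; `MusicPathSketch` is hypothetical, and, as in the slide's code, the repository path is assumed to end with "/" because the genre is appended directly.

```java
import java.util.Map;

/** Self-contained sketch of the MusicVersioner path rule. */
public class MusicPathSketch {

    /** Last path segment of a URI string, e.g. "gangsta-rap.mp3". */
    static String fileName(String uri) {
        return uri.substring(uri.lastIndexOf('/') + 1);
    }

    /** <mp3RepoPath><Mp3Genre>/<Mp3Artist>/<file name>; mp3RepoPath must end with "/". */
    public static String dataStoreUri(String mp3RepoPath, Map<String, String> metadata, String origUri) {
        return mp3RepoPath + metadata.get("Mp3Genre") + "/" + metadata.get("Mp3Artist") + "/" + fileName(origUri);
    }

    public static void main(String[] args) {
        Map<String, String> md = Map.of("Mp3Artist", "50cent", "Mp3Genre", "rap");
        System.out.println(dataStoreUri("file:///path/to/mp3/repo/", md,
                "file:///home/chris/mp3s/gangsta-rap.mp3"));
        // prints file:///path/to/mp3/repo/rap/50cent/gangsta-rap.mp3
    }
}
```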

22 The File Manager
So, how do we put all these different generic interfaces together? Well, something like the following:
A File Manager has…
– One or more data stores, to store data to
– One or more metadata stores, to store metadata to
– A set of Versioners that are associated with Product Types, in order to figure out how to generate the reference data store URIs for a particular product
– A Data Transferer that moves a Product's files from the client to the File Manager using the source URIs and the data store URIs
– An external interface to it (e.g., XML-RPC, WebDAV, etc.)
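Composing those parts gives an ingest path: look up the Versioner for the product type, compute data store URIs, hand both URI lists to the Data Transferer, then record references and metadata in every store. All interfaces below are minimal hypothetical stand-ins (single-method, so lambdas can implement them), and the external XML-RPC/WebDAV interface is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical composition of the File Manager's parts. */
public class FileManagerSketch {
    public interface DataStore { void addReferences(String productId, List<String> dataStoreUris); }
    public interface MetadataStore { void addMetadata(String productId, Map<String, String> metadata); }
    public interface Versioner { String dataStoreUri(String origUri, Map<String, String> metadata); }
    public interface DataTransferer { void transfer(List<String> sourceUris, List<String> dataStoreUris); }

    private final List<DataStore> dataStores;
    private final List<MetadataStore> metadataStores;
    private final Map<String, Versioner> versionersByProductType;
    private final DataTransferer transferer;

    public FileManagerSketch(List<DataStore> dataStores, List<MetadataStore> metadataStores,
                             Map<String, Versioner> versionersByProductType, DataTransferer transferer) {
        this.dataStores = List.copyOf(dataStores);
        this.metadataStores = List.copyOf(metadataStores);
        this.versionersByProductType = Map.copyOf(versionersByProductType);
        this.transferer = transferer;
    }

    /** Version, transfer, then record; returns the data store URIs. */
    public List<String> ingest(String productId, String productType,
                               List<String> origUris, Map<String, String> metadata) {
        Versioner v = versionersByProductType.get(productType);
        if (v == null) {
            throw new IllegalArgumentException("No versioner for product type " + productType);
        }
        List<String> dest = new ArrayList<>();
        for (String uri : origUris) {
            dest.add(v.dataStoreUri(uri, metadata));
        }
        transferer.transfer(origUris, dest);
        for (DataStore ds : dataStores) ds.addReferences(productId, dest);
        for (MetadataStore ms : metadataStores) ms.addMetadata(productId, metadata);
        return dest;
    }

    public static void main(String[] args) {
        Versioner byGenre = (uri, md) -> "file:///repo/" + md.get("Mp3Genre") + "/"
                + uri.substring(uri.lastIndexOf('/') + 1);
        DataStore printRefs = (id, uris) -> System.out.println(id + " stored at " + uris);
        MetadataStore printMeta = (id, md) -> System.out.println(id + " metadata " + md);
        DataTransferer noCopy = (src, dst) -> { }; // a real one would move the files
        FileManagerSketch fm = new FileManagerSketch(List.of(printRefs), List.of(printMeta),
                Map.of("Mp3", byGenre), noCopy);
        fm.ingest("p1", "Mp3", List.of("file:///home/chris/mp3s/song.mp3"), Map.of("Mp3Genre", "rap"));
    }
}
```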

23 What's implemented so far?
The basic components of the architecture
Several default implementations of the interfaces
– javax.sql.DataSource-based implementations of DataStore and MetadataStore, using Apache's DBCP for connection pooling
– Local Data Transfer using Apache's commons-io component, which can handle hierarchical product structures as well as flat product structures
– Several versioners, including one that versions Products using the existing CAS approach of ProductTypeRepositoryPath/ProductName/Version, and one that versions a product's references based on production date time
– An external interface based on Apache's XML-RPC
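A date-time versioner might place each file under directories derived from the product's production date. The slide does not give the actual layout, so this sketch assumes a hypothetical `<repo>/<yyyy>/<MM>/<dd>/<file name>` scheme and a hypothetical `ProductionDateTime` metadata key holding an ISO-8601 local date time.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Map;

/** Hypothetical date-time versioning rule: <repo>/<yyyy>/<MM>/<dd>/<file name>. */
public class DateTimeVersionerSketch {
    private static final DateTimeFormatter DIRS = DateTimeFormatter.ofPattern("yyyy/MM/dd");

    public static String dataStoreUri(String repo, Map<String, String> metadata, String origUri) {
        String raw = metadata.get("ProductionDateTime"); // assumed key, e.g. "2005-06-01T12:30:00"
        if (raw == null) {
            throw new IllegalArgumentException("Missing ProductionDateTime metadata");
        }
        LocalDateTime produced = LocalDateTime.parse(raw); // ISO-8601 local date time
        String base = repo.endsWith("/") ? repo : repo + "/";
        String fileName = origUri.substring(origUri.lastIndexOf('/') + 1);
        return base + DIRS.format(produced) + "/" + fileName;
    }

    public static void main(String[] args) {
        System.out.println(dataStoreUri("file:///archive/l2",
                Map.of("ProductionDateTime", "2005-06-01T12:30:00"),
                "file:///home/chris/granule.h5")); // file:///archive/l2/2005/06/01/granule.h5
    }
}
```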

24 What needs to be done?
A lot!
– Check out http://oodt.jpl.nasa.gov/vc/ and log in with your JPL username and password. Navigate to "SVN", and check out the cas-filemgr component.
– Modify the code
– Look for bugs
– Contribute! I find new bugs every day
– Feel free to talk to me about it
– Create issues in JIRA (http://oodt.jpl.nasa.gov/jira/): bug fixes, RFIs, new features, you name it!
Be sure to check out the apidocs
– You can build these yourself by checking out cas-filemgr from our SVN repository and then typing: maven site
– Or you can visit: http://terra.jpl.nasa.gov/~mattmann/oco/javadoc/cas-filemgr/

25 Questions?

