NEESgrid Data Technologies
Charles Severance
January 8, 2004
NSF Site Visit

NEESgrid Data - Value Proposition
• An RDF-like store
  – Referential integrity, long-term flexibility
• Seamless data and metadata transport
• Smooth integration of data with metadata
• Extensible tools
• Involved with sites through Experiment Based Deployment

NEESgrid Data – Current Elements
• Local Repository
• Central Repository
• JAVA APIs – Run locally on the same system as a repository or over OGSA Web Services
  – NEES File Management Services
  – NEES Meta Data Services
  – NEES Data Mapping Services
• Data Viewers
  – Streaming (numeric, X/Y graph)
  – Stored (X/Y graph, 2-D structure, video)

Current Elements
[Architecture diagram. Labels: NEESdata, NEESpop, Local Repository, Central Repository, Data Acquisition Workstation, API, Mapping, Data Teamlets, Data/MD Ingest Tools, Data tools, Data viewers, Data Servlets, Grid and Web Services]

A Simple Experimental Scenario
[Diagram. Labels: Test Specimen, DAQ System, Labview, Glue, API, Data Developer System (Data, MD), Researcher System (Data, MD)]

Repository Browser
Sample of the Video/Data Viewer
[Screenshot: Data Viewer]

Mappings and the Data Viewer
• NSDS (ISO 8601 Time channel)
• Column data with time recorded as a column
• Column – generate time
• Column – generate time – trigger filter

Channel units: g,g,in,kip
Time ATL1 ATT
T15:48: T15:48:

    public class NEESDataMap {
        public static boolean repoMap(File mainFile, File mappingFile, String mapping) {
            // Code here
        }
    }

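The "generate time" and "trigger filter" mappings can be illustrated with a minimal Java sketch. It is not the NEESgrid mapping code: the class and method names, the fixed sample interval, and the threshold-style trigger are all assumptions made for illustration. It shows how a time column might be synthesized for column data that has no recorded time, starting from the first row that passes a trigger.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the NEESgrid APIs.
class GeneratedTimeColumn {

    /** Returns one timestamp per data row: start, start + step, start + 2*step, ... */
    static List<Instant> generateTimes(Instant start, Duration step, int rowCount) {
        List<Instant> times = new ArrayList<>();
        for (int row = 0; row < rowCount; row++) {
            times.add(start.plus(step.multipliedBy(row)));
        }
        return times;
    }

    /** Index of the first row whose absolute value reaches the threshold, or -1 if none does. */
    static int firstTriggeredRow(double[] channel, double threshold) {
        for (int row = 0; row < channel.length; row++) {
            if (Math.abs(channel[row]) >= threshold) {
                return row;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        double[] accel = {0.00, 0.01, 0.25, 0.40};   // made-up samples in g
        int first = firstTriggeredRow(accel, 0.2);   // assumed trigger: |value| >= 0.2 g
        // Time is generated only for the rows kept by the trigger filter.
        List<Instant> times = generateTimes(
                Instant.parse("2004-01-08T15:48:00Z"), Duration.ofMillis(10), accel.length - first);
        for (int i = 0; i < times.size(); i++) {
            System.out.println(times.get(i) + " " + accel[first + i]);
        }
    }
}
```

The start time and the 10 ms interval would come from the mapping file in a real deployment; here they are hard-coded assumptions.
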
Data Ingestor
NEES Metadata Representation
• NEES Markup Language (NEESML)
  – Provides an RDF-like structure capable of representing semantic information
  – XML is the syntax used
  – The logic is more "object oriented"
    Can define objects
    Can create objects
    Can reference objects
• Metadata is many different things.
• Goal: if we ever want to build reusable data tools, we have to represent the semantics inside the metadata rather than just the information

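The define / create / reference pattern can be sketched in a few lines of Java. This is an in-memory toy under stated assumptions, not NEESML or the repository implementation: the class name, method names, and the idea of rejecting dangling references at write time are illustrative choices that show what referential integrity could mean for such a store.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of "define, create, reference"; not the NEESgrid repository.
class MetadataStoreSketch {
    private final Set<String> definedTypes = new HashSet<>();
    private final Map<String, String> typeOfObject = new HashMap<>();   // object id -> type name
    private final Map<String, String> references = new HashMap<>();     // "id.property" -> target id

    /** Define an object type. */
    void define(String typeName) {
        definedTypes.add(typeName);
    }

    /** Create an object of a previously defined type. */
    void create(String objectId, String typeName) {
        if (!definedTypes.contains(typeName)) {
            throw new IllegalArgumentException("undefined type: " + typeName);
        }
        typeOfObject.put(objectId, typeName);
    }

    /** Reference one object from another; both must exist, so no reference can dangle. */
    void reference(String fromId, String property, String toId) {
        if (!typeOfObject.containsKey(fromId) || !typeOfObject.containsKey(toId)) {
            throw new IllegalArgumentException("both objects must exist before they can be linked");
        }
        references.put(fromId + "." + property, toId);
    }

    /** Returns the id of the referenced object, or null if the property is not set. */
    String resolve(String fromId, String property) {
        return references.get(fromId + "." + property);
    }
}
```

A tool that reads such a store can follow `resolve("experiment1", "sensor")` to reach the sensor object and its type, which is the kind of semantic information a reusable viewer would need.
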
NEESML
Table 1: Primitive types in NEESML

Name    Description                                          Examples
string  Text                                                 “Hello, world.” “BN# x”
int     Integer
long    Long integer. Can exceed the size of an integer
double  Double precision floating point number
date    A moment in time, represented as a date and time     :40: :03:48.774
        stamp in UTC with 1 ms resolution

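For readers working in Java, the primitive types in Table 1 line up naturally with `String`, `int`, `long`, `double`, and `java.time.Instant`. The sketch below is an assumption-laden illustration, not a NEESML API: the class name is invented, and the exact text layout NEESML uses for dates is not given here, so an ISO 8601 style pattern is assumed. It shows a UTC timestamp cut to 1 ms resolution and a check for values that need `long` rather than `int`.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Hypothetical helper; the date layout below is an assumption, not the NEESML specification.
class NeesmlPrimitives {

    // Assumed text form: ISO 8601 style date and time in UTC with three fractional digits.
    private static final DateTimeFormatter UTC_MILLIS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS").withZone(ZoneOffset.UTC);

    /** Formats a moment in time in UTC, truncated to 1 ms resolution. */
    static String formatDate(Instant moment) {
        return UTC_MILLIS.format(moment.truncatedTo(ChronoUnit.MILLIS));
    }

    /** True when the value can be stored as an int; otherwise the long type is needed. */
    static boolean fitsInInt(long value) {
        return value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(formatDate(Instant.now()));
        System.out.println(fitsInInt(3_000_000_000L));
    }
}
```
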
JAVA APIs
Remaining Work
• Second Generation Repository API
• Project Browser
• Electronic Notebook
• Data Turbine
• Video as data
• Schema/XML Ingestion
• RDF Model/Data Ingestion
• Curation Tools ***

Repository API
• The NFMS (file management) and NMDS (metadata) APIs are being combined into a single Repository API
  – Rich support for access control
• Access control will use the Community Authorization Service (CAS) from Grid technology
• Aligning with JSR-170 Java Content Repository

Project Browser
• Joint effort between the NEESgrid SI team and Oregon State technical developers
• Based on a project browser prototype at Oregon State University
• Provides a user-friendly interface to metadata elements - complement to the project browser and electronic notebook

Electronic Notebook
• Collaborative effort with the DOE SciDAC
  – Electronic notebook - metadata entry
  – Data mapping
  – Data provenance
  – Data display
  – Slide data/metadata jakarta.apache.org/slide/
• Ultimate integration will be via JSR-170
• collaboratory.emsl.pnl.gov/docs/collab/sam/samtechoverview.html

Data Turbine
• Commercial, free data streaming toolkit
• Developed by NASA

Data Turbine (cont)
• Existing data viewers will be adapted to access and display data from Data Turbine
• Data acquisition software will be adapted to place information in Data Turbine channels
• Metadata elements will be developed to represent Data Turbine live, stored, and derived channels
• New efforts (video as data) will be developed from the ground up using Data Turbine
• outlet.creare.com/rbnb/

Video as Data
• Follow-on to the initial demonstration at ORST
• Experiment based development: Minnesota
• Design phase nearly complete
• Joint effort: NEESgrid SI, ORST, Minnesota, UC Davis, Texas, and others as the design solidifies

Video as data: User Views
[Diagram. Labels: DT Main System, Still Image / Camera Control, Still Image Viewer, Camera Control Gateway, Data Viewer, Thumbnail + Audio + Data]

Schema/XML Ingestion
• Several data efforts are expressing their data/models in Schema/XML (Cosmos, etc.)
• We are developing capabilities to parse XML and automatically extract relevant metadata for the repository
• The entire XML file will be stored as data
• This allows data developers to use tools like XMLSpy to develop their models.

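The parse-and-extract step can be sketched with the XML parser that ships with Java. What counts as "relevant metadata" is not defined on this slide, so the sketch assumes it means the simple leaf elements directly under the document root; the class name and the sample element names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical ingestion helper; the "leaf elements under the root" rule is an assumption.
class XmlMetadataExtractor {

    /** Returns name/value pairs for every root-level element that has no child elements. */
    static Map<String, String> extract(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        Map<String, String> metadata = new LinkedHashMap<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element && isLeaf((Element) child)) {
                metadata.put(child.getNodeName(), child.getTextContent().trim());
            }
        }
        return metadata;
    }

    /** A leaf element contains text only, with no nested elements. */
    private static boolean isLeaf(Element element) {
        NodeList kids = element.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i) instanceof Element) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String xml = "<record><station>ORST</station><units>g</units>"
                + "<samples><v>0.01</v></samples></record>";
        System.out.println(extract(xml));   // nested <samples> is skipped as bulk data
    }
}
```

The extracted pairs would go to the metadata services while the unmodified XML file is stored as data, matching the two bullets above.
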
RDF Integration
• Some of the data and metadata task force members are using Protégé-2000 to develop their models and are expressing them in RDF.
• RDF and NEESML are very similar but not identical, so ingesting arbitrary RDF may be challenging
• We expect to be able to map a subset of RDF to NEESML for ingestion, or to adapt an RDF parser (Jena or Raptor) to ingest that subset directly into the repository

Curated Data Tools ***
• Still evolving fine-grained requirements with the community
  – Sites
  – Consortium DSAC has this as its focus
• Some expected minimum requirements
  – Transfer between repositories
  – Workflow - implemented as ACLs (incoming, in-progress, published)
  – Will be extensions to the Repository browser as well as a simple workflow tool

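One way to read "workflow implemented as ACLs" is that each workflow state is simply a different access control list on the same data, so moving data through curation means swapping its ACL. The Java sketch below illustrates that reading only. The three states come from the slide; the roles, the permissions attached to each state, and all names are assumptions.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model: states are from the slide, roles and permissions are invented.
class CurationWorkflow {
    enum State { INCOMING, IN_PROGRESS, PUBLISHED }
    enum Role { SUBMITTER, CURATOR, PUBLIC }

    /** The read ACL that a data set carries while it is in the given state. */
    static Set<Role> readers(State state) {
        switch (state) {
            case INCOMING:
                return EnumSet.of(Role.SUBMITTER, Role.CURATOR);
            case IN_PROGRESS:
                return EnumSet.of(Role.CURATOR);
            case PUBLISHED:
                return EnumSet.allOf(Role.class);
            default:
                throw new IllegalArgumentException("unknown state: " + state);
        }
    }

    /** Moves to the next state; published data has no further state. */
    static State advance(State state) {
        switch (state) {
            case INCOMING:
                return State.IN_PROGRESS;
            case IN_PROGRESS:
                return State.PUBLISHED;
            default:
                throw new IllegalStateException("no state after " + state);
        }
    }

    public static void main(String[] args) {
        State state = State.INCOMING;
        System.out.println(state + " readable by " + readers(state));
        state = advance(state);
        System.out.println(state + " readable by " + readers(state));
    }
}
```

Because the workflow lives entirely in the ACLs, a simple workflow tool only needs permission to change them; the repository browser can display state by inspecting the same lists.
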
Conclusion
• We are focusing on both core elements and the application of those elements
• We are increasingly engaging the sites in the ongoing development process
• We have a lot of work ahead; some of these efforts will continue post-transition, with the sites taking an increasing role in the development