ACAT 2008 Erice, Sicily WebDat: Bridging the Gap between Unstructured and Structured Data Jerzy M. Nogiec, Kelley Trombly-Freytag, Ruben Carcagno Fermilab,

Slides:



Advertisements
Similar presentations
A Toolbox for Blackboard Tim Roberts
Advertisements

Atlas III Improvements Expands on Atlas II capabilities – Faceted Navigation – counts are displayed next to selectable attribute – Lunar Map interface.
Compliance on Demand. Introduction ComplianceKeeper is a web-based Licensing and Learning Management System (LLMS), that allows users to manage all Company,
Brian Alderman | MCT, CEO / Founder of MicroTechPoint Pete Harris | Microsoft Senior Content Publisher.
SOFTWARE PRESENTATION ODMS (OPEN SOURCE DOCUMENT MANAGEMENT SYSTEM)
Digital Video Archiving. ViArchive Overview ViArchive provides user friendly solutions for… – uploading video clips with metadata (searchable file info.
Flood Map Library MD. M. HAQUE DWR-HYDROLOGY. Building a Flood Map Library Indexing existing flood maps and geospatial data for search and retrieval Separate.
The Geant4 physics validation repository
An Architecture for Creating Collaborative Semantically Capable Scientific Data Sharing Infrastructures Anuj R. Jaiswal, C. Lee Giles, Prasenjit Mitra,
SE Document Document Control Software. SE Document SE Document is a Document Management Software System to help you meet all document control requirements.
SQL Reporting Services Overview SSRS includes all the development and management pieces necessary to publish end user reports in  HTML  PDF 
Information systems and databases Database information systems Read the textbook: Chapter 2: Information systems and databases FOR MORE INFO...
Product Offering Overview CONFIDENTIAL AND PROPRIETARY Copyright ©2004 Universal Business Matrix, LLC All Rights Reserved The duplication in printed or.
System Design/Implementation and Support for Build 2 PDS Management Council Face-to-Face Mountain View, CA Nov 30 - Dec 1, 2011 Sean Hardman.
Introduction to Databases and Database Languages
Developing Health Geographic Information Systems (HGIS) for Khorasan Province in Iran (Technical Report) S.H. Sanaei-Nejad, (MSc, PhD) Ferdowsi University.
Overview of Mini-Edit and other Tools Access DB Oracle DB You Need to Send Entries From Your Std To the Registry You Need to Get Back Updated Entries From.
High-Speed, High Volume Document Storage, Retrieval, and Manipulation with Documentum and Snowbound March 8, 2007.
Submitted by: Madeeha Khalid Sana Nisar Ambreen Tabassum.
Chapter 11 Databases.
A Scalable Application Architecture for composing News Portals on the Internet Serpil TOK, Zeki BAYRAM. Eastern MediterraneanUniversity Famagusta Famagusta.
Overview of SQL Server Alka Arora.
Crystal Hoyer Program Manager IIS Team Preview of features that will be announced at MIX09 Please do not blog, take pictures or video of session.
Databases C HAPTER Chapter 10: Databases2 Databases and Structured Fields  A database is a collection of information –Typically stored as computer.
Oracle Application Express 3.0 Joel R. Kallman Software Development Manager.
OCLC Online Computer Library Center CONTENTdm ® Digital Collection Management Software Ron Gardner, OCLC Digital Services Consultant ICOLC Meeting April.
CS621 : Seminar-2008 DEEP WEB Shubhangi Agrawal ( )‏ Jayalekshmy S. Nair ( )‏
AL-MAAREFA COLLEGE FOR SCIENCE AND TECHNOLOGY INFO 232: DATABASE SYSTEMS CHAPTER 1 DATABASE SYSTEMS (Cont’d) Instructor Ms. Arwa Binsaleh.
SITools Enhanced Use of Laboratory Services and Data Romain Conseil
Tutorial 121 Creating a New Web Forms Page You will find that creating Web Forms is similar to creating traditional Windows applications in Visual Basic.
Re-Implementing ERM MENA-IUG 5 th Annual Conference 1-2 November 2010.
© 2008 IBM Corporation ® IBM Cognos Business Viewpoint Miguel Garcia - Solutions Architect.
NMED 3850 A Advanced Online Design January 12, 2010 V. Mahadevan.
Data Management BIRN supports data intensive activities including: – Imaging, Microscopy, Genomics, Time Series, Analytics and more… BIRN utilities scale:
1 Schema Registries Steven Hughes, Lou Reich, Dan Crichton NASA 21 October 2015.
Event-Based Hybrid Consistency Framework (EBHCF) for Distributed Annotation Records Ahmet Fatih Mustacoglu Advisor: Prof. Geoffrey.
+ Plug-in Demo: Dropbox Judith Schwartz Fall, 2011.
Rescuing EDL Data Presented by Adrian Tinio Jet Propulsion Laboratory, Pasadena, CA.
IS 325 Notes for Wednesday August 28, Data is the Core of the Enterprise.
By N.Gopinath AP/CSE Cognos Impromptu. What is Impromptu? Impromptu is an interactive database reporting tool. It allows Power Users to query data without.
Omeka software exploration for LOR repository. Omeka  Omeka is opensource – that means there is no cost to use the software  Omeka can be hosted on.
Solutions using Microsoft Content Management Server 2002 Connector for SharePoint Technologies Sue Corke Mark Harrison Microsoft UK.
Database Concepts Track 3: Managing Information using Database.
CaIntegrator2 – Part 1: Create a Study with Clinical Data Fan Lin, Ph. D Molecular Analysis Tools Knowledge Center Columbia University and The Broad Institute.
Running Kuali: A Technical Perspective Ailish Byrne (Indiana University) Jonathan Keller (University of California, Davis)
CS779 Term Project Steve Shoyer Section 5 December 9, 2006 Week 6.
Monte-Carlo Event Database: current status Sergey Belov, JINR, Dubna.
IT Enablement Approaches Large Business may have hundreds of processes to be enabled by IT. Several Types of Application may be deployed –Departmental.
WebDat: A Web-based Test Data Management System J.M.Nogiec January 2007 Overview.
Collection Management Systems
Collections Management Museums What’s new in EMu ? Part II Bernard Marshall Chief Technology Officer KE Software.
Introduction to Core Database Concepts Getting started with Databases and Structure Query Language (SQL)
Building Preservation Environments with Data Grid Technology Reagan W. Moore Presenter: Praveen Namburi.
1 Management Information Systems M Agung Ali Fikri, SE. MM.
1 Copyright © 2008, Oracle. All rights reserved. Repository Basics.
ODP V2 Data Provider overview. 22 Scope Data Provider provides access to data and metadata of the local data systems. Data Provider is a wrapper, installed.
Data Resource Management Data Concepts Database Management Types of Databases Chapter 5 McGraw-Hill/Irwin Copyright © 2007 by The McGraw-Hill Companies,
Flood Map Library MD. M. HAQUE DWR-HYDROLOGY. Building a Flood Map Library Indexing existing flood maps and geospatial data for search and retrieval Separate.
BOF-1147, JavaTM Technology and WebDAV: Standardizing Content Management Java and WebDAV Juergen Pill Team Leader Software AG Remy Maucherat Software Engineer.
REMI Database Antall Fernandes. REMI ● A relational database to facilitate data - metadata organization of various research studies. ● Interface into.
Software Application Overview
Statistical Information Systems Introducing SIS tool .Stat
LCG Monte-Carlo Events Data Base: current status and plans
Databases.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 2 Database System Concepts and Architecture.
DAT381 Team Development with SQL Server 2005
Metadata The metadata contains
Tutorial 7 – Integrating Access With the Web and With Other Programs
Integrated Statistical Production System WITH GSBPM
Palestinian Central Bureau of Statistics
Presentation transcript:

ACAT 2008 Erice, Sicily WebDat: Bridging the Gap between Unstructured and Structured Data Jerzy M. Nogiec, Kelley Trombly-Freytag, Ruben Carcagno Fermilab, Batavia, Illinois

ACAT 2008 J.M. Nogiec et al.WebDat: Bridging the gap between unstructured and structured data2 Management of Unstructured and Structured Data Structured vs. unstructured data Data management challenge in R&D environment Concept of integrated data management system WebDat Design Design goals Required functionality Interaction with the system Lifecycle of information access WebDat Implementation Technologies Deployment User interface On-line generated reports Web service Conclusion Features Summary Outline Metadata Contents Automatic processing

ACAT 2008 J.M. Nogiec et al.WebDat: Bridging the gap between unstructured and structured data3 I. Management of Unstructured and Structured Data

ACAT 2008 J.M. Nogiec et al.WebDat: Bridging the gap between unstructured and structured data4 Structured & Unstructured Data Information systems have grown around structured data made up of fields, columns, tables, indices; very predictable, static and ordered environment. The unstructured data have unknown organization and consist of documents, spreadsheets, rich media information; rather disorderly and fairly dynamic environment. Two major classes of data management systems evolved: DBMS to manage homogenous, well-typed structured data. CMS to manage non-traditional, heterogeneous, unstructured contents. Information regardless of its format, source, location has to be easily managed, searched and accessed. Access to all data is needed. Systems that will provide an integrated, uniform access to heterogeneous information are needed.

ACAT 2008 J.M. Nogiec et al.WebDat: Bridging the gap between unstructured and structured data5 Levels of Structure in Data lower higher Flat files & batch processing XML & Web services Database & synchronization The level of structure of data provided by various data management and exchange models.

ACAT 2008 J.M. Nogiec et al.WebDat: Bridging the gap between unstructured and structured data6 R&D Data Management Challenges There exist production systems as well as rapidly developed systems. Information ranges from well-structured, homogeneous, and stable (well-suited for DBMS) to unstructured collections of data or documents (well-suited for CMS). Data is owned and kept by several groups and individuals in various formats and locations. Systems have different level of completeness in data handling and configuration management. Information about DAQ systems varies since they range from configurable to single-purpose systems (possibly tailorable via parameter files).

ACAT 2008 J.M. Nogiec et al.WebDat: Bridging the gap between unstructured and structured data7 Concept of Integrated System Create a set of common and necessary metadata that is configurable via UI. Use common metadata for unstructured data in files and structured data in databases. Define metadata first and create tests and upload data, contents and comments later. Allow for manual and programmatic access.

ACAT 2008 J.M. Nogiec et al.WebDat: Bridging the gap between unstructured and structured data8 II. WebDat: Design

ACAT 2008 J.M. Nogiec et al.WebDat: Bridging the gap between unstructured and structured data9 WebDat Design Goals Adapt to various levels of integration and maturity of DAQ systems. Allow registering and cataloging data and related documents for each test. Keep data and analysis results organized, searchable and accessible (allow searching and navigating to data). Authenticate and control access (passwords, groups). Integrate documents and data. Preserve information about the DAQ system, test procedure, data acquisition system, data reduction, etc. (as much as supplied) Develop a Web-based system using a database to organize data and documents pertaining to tests. The system will allow for sharing information and results. It will also:

ACAT 2008 J.M. Nogiec et al.WebDat: Bridging the gap between unstructured and structured data10 WebDat Functionality Maintain information about systems and subjects Keep information about measurement infrastructure (facilities, stands, hardware, software, versions). Register new test types and series. Register new subject types, subject series and subjects. Register new tests/upload data and documents Register tests (measurements) and relate them to infrastructure. Store data for a registered test (user file upload, programmatic file upload or data inserts). Store any documents pertaining to the test (test plans, reports, screenshots, configurations, etc.). Add comments to stored tests about documents, results, etc. Retrieve/search/download data and documents Retrieve (download) all submitted documents and data files. Search tests using subject names, dates, test types, etc. View reports generated on-line from data (limited to ‘installed reports’). View statistical info (e.g., contents for given test, tests in last week, contents for given subject, contents for given test type).

ACAT 2008 J.M. Nogiec et al.WebDat: Bridging the gap between unstructured and structured data11 Metadata Test/Run Test attributes: required test parameters entered by the user Test tags: keywords characterizing test Test types: a type of test defined by its keywords and tags Test series: a collection (series) of tests Subject Subject type: a type of subject (e.g., dipole, cavity) Subject series: a series of subjects Test/Run environment Location: facility-stand pair Software: system and version Hardware: systems and version Defined before data is generated and uploaded

ACAT 2008 J.M. Nogiec et al.WebDat: Bridging the gap between unstructured and structured data12 Metadata

ACAT 2008 J.M. Nogiec et al.WebDat: Bridging the gap between unstructured and structured data13 Unstructured and Structured Data A document An uploaded file (Word, Excel, jpg, gif, …) Data Files Data kept in relational model (loaded to tables) Comments (on-line entry) Test items – a collection of data and documents pertaining to a test (measurement).

ACAT 2008 J.M. Nogiec et al.WebDat: Bridging the gap between unstructured and structured data14 Interactions with WebDat

ACAT 2008 J.M. Nogiec et al.WebDat: Bridging the gap between unstructured and structured data15 Lifecycle of WebDat Information DAQ system definition Test subject definition Metadata Data Location definition Create test instance Upload data Tester comments Upload analysis results, images Download test data Analyst comments Test definition Comments View reports Search tests Add comments Test coordinator Analyst Tester/ program User Insert attributes

ACAT 2008 J.M. Nogiec et al.WebDat: Bridging the gap between unstructured and structured data16 Data Insertion & Retrieval Web-based interface (interactive access) Insert (upload) data (drag & drop or browsing) Search for tests based on metadata (location, DAQ, test subject, tags, attributes) Download data View comments, preview data View reports Web service (programmatic access) Upload of data Query for test information Data retrieval

ACAT 2008 J.M. Nogiec et al.WebDat: Bridging the gap between unstructured and structured data17 Automatic Processing Automatic processing of data upon upload (pre-processing). Examples include: Format change Format and contents verification Automatic analysis Insertion to a database Compress data Automatic processing of data upon download (post-processing). Examples include: Export/conversion to CSV or Excel format Uncompress data

ACAT 2008 J.M. Nogiec et al.WebDat: Bridging the gap between unstructured and structured data18 II. WebDat: Implementation

ACAT 2008 J.M. Nogiec et al.WebDat: Bridging the gap between unstructured and structured data19 Architecture

ACAT 2008 J.M. Nogiec et al.WebDat: Bridging the gap between unstructured and structured data20 WebDat Deployment

ACAT 2008 J.M. Nogiec et al.WebDat: Bridging the gap between unstructured and structured data21 WebDat UI

ACAT 2008 J.M. Nogiec et al.WebDat: Bridging the gap between unstructured and structured data22 Reports Interactive and static reports. Ability to produce reports from file and database sources.

ACAT 2008 J.M. Nogiec et al.WebDat: Bridging the gap between unstructured and structured data23 Other Features of WebDat Secure authentication and role-based authorization. Administration of users and groups. Test lifecycle management. On-the-fly compression of data.

ACAT 2008 J.M. Nogiec et al.WebDat: Bridging the gap between unstructured and structured data24 Technologies JavaServer Faces MySQL database JasperReports Apache/Tomcat Apache Axis (Web service) Eclipse & Sun Java Studio Creator

ACAT 2008 J.M. Nogiec et al.WebDat: Bridging the gap between unstructured and structured data25 Summary There is a need and interest in uniform management of structured and unstructured data. R&D environment is one of the areas that would benefit from such approach. WebDat addresses this need by providing a system based on a common set of metadata for structured and unstructured data. WebDat provides consistent access to file and database data sources. WebDat provides for interactive (JSF-based UI) and programmatic (Web service) insertion and retrieval of data as well as pre and post processing of data.