Visualization and Search Approaches for Time-Oriented Scientific Primary Data: A WGL-TIB Project. Jürgen Bernard, Technische Universität Darmstadt.


Visualization and Search Approaches for Time-Oriented Scientific Primary Data
A WGL-TIB Project
Jürgen Bernard
Technische Universität Darmstadt, Fachgebiet Graphisch-Interaktive Systeme (GRIS), Visual Analysis Group
Fraunhoferstraße, Darmstadt, Germany
Tel.: +49 (6151) 155 – 666
Fax: +49 (6151) 155 –

Outline
1. Motivation
2. Practical Approach
3. Outlook
Interactive-Graphics Systems | Visual Search and Analysis Group | Jürgen Bernard

1 Motivation: Trends
- Content-based search and presentation are available for many document types
  - Text documents
  - Digital image, video, audio, etc.
- Repositories for new kinds of non-textual documents
  - Such as scientific primary data
  - PANGAEA, PsychData, Dryad, ELIXIR, KoLaWiss
- Information overload: need for visual retrieval and data exploration

1 Motivation: Time-oriented scientific primary data
- Massive amounts of time-oriented scientific primary data that may be valuable for future research
- Heterogeneity of standards across research disciplines
- Exploration of scientific primary data repositories is currently restricted to meta data search
[Figure: PANGAEA PanPlot Tool]

1 Motivation: Development of a Visual Catalog for time series data
- Interactive-graphic access to huge amounts of scientific primary data
- Combination of content-based and meta data retrieval
- Explorative data analysis by searching, browsing and zooming techniques
- User-adaptive search methods
- Establish higher data transparency and deeper user comprehension

1 Motivation: Possible Use Case Scenario
- A natural scientist detects an interesting curve progression
- Hypothesis: the curve progression indicates a future event
- Search for similar curve progressions in related data sets
- Visual overview of the most similar data elements
- Filter the result set, adapt the reference example


2.1 Overview
[Figure: Back End and Front End]
Bernard, J., Brase, J., Fellner, D., Koepler, O., Kohlhammer, J., Ruppert, T., Schreck, T., Sens, I.: A Visual Digital Library Approach for Time-Oriented Scientific Primary Data. Accepted at the 14th European Conference on Research and Advanced Technology for Digital Libraries (ECDL), 2010.

2.2 Data Import
- Parse primary data
  - Initially: use PANGAEA data files
  - In principle: additional repositories can be included through customized data parsers
- Define a generic time series data structure
  - Feature-based approach
  - Store meta data as a bag of words
  - Special attributes: time stamp, parameter/unit
- Defined database schema
  - MySQL database
  - Efficient data management
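The generic time series structure described on this slide can be illustrated with a minimal sketch. The slides do not show the actual schema, so the class name `TimeSeries`, its field names, and the helpers `to_bag_of_words` and `parse_rows` are all hypothetical; the tokenization rule (lower-case alphanumeric runs) is likewise an assumption.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TimeSeries:
    """Hypothetical generic record; field names are illustrative only."""
    parameter: str        # measured quantity, e.g. "Temperature"
    unit: str             # physical unit, e.g. "deg C"
    timestamps: list      # numeric time stamps in ascending order
    values: list          # one measured value per time stamp
    bag_of_words: set = field(default_factory=set)  # tokens from free-text meta data


def to_bag_of_words(meta_text):
    """Lower-case the free-text meta data and split it into distinct tokens."""
    return set(re.findall(r"[a-z0-9]+", meta_text.lower()))


def parse_rows(rows, parameter, unit, meta_text):
    """Build a TimeSeries from (timestamp, value) rows, skipping empty values."""
    clean = [(float(t), float(v)) for t, v in rows if v not in ("", None)]
    clean.sort(key=lambda pair: pair[0])  # enforce ascending time order
    return TimeSeries(
        parameter=parameter,
        unit=unit,
        timestamps=[t for t, _ in clean],
        values=[v for _, v in clean],
        bag_of_words=to_bag_of_words(meta_text),
    )
```

A repository-specific parser would only need to produce the `(timestamp, value)` rows and the meta data text; everything downstream could then work on the same record type.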

2.3 Preprocessing
- Problem: time series primary data may not be directly usable for clustering. Consider:
  - Outliers, noise
  - Missing values
  - Format of time stamps in data files
  - Inhomogeneous time quantization, time stamp compatibility
- Possible approaches
  - Aggregation (binning)
  - Transformation
  - Interpolation
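Two of the approaches listed above, aggregation by binning and interpolation of missing values, can be sketched as follows. This is only an illustration under simple assumptions (numeric time stamps, fixed-width bins, linear interpolation); the function names `bin_series` and `fill_gaps` are hypothetical and do not come from the project.

```python
def bin_series(timestamps, values, start, width, n_bins):
    """Average all samples that fall into each fixed-width time bin.

    Bins that receive no sample are returned as None (missing).
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, v in zip(timestamps, values):
        i = int((t - start) // width)  # index of the bin containing t
        if 0 <= i < n_bins:            # samples outside the grid are ignored
            sums[i] += v
            counts[i] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


def fill_gaps(binned):
    """Linearly interpolate None entries; gaps at the edges copy the nearest value."""
    known = [i for i, v in enumerate(binned) if v is not None]
    if not known:
        raise ValueError("series contains no values")
    out = list(binned)
    for i, v in enumerate(binned):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = binned[right]
        elif right is None:
            out[i] = binned[left]
        else:
            w = (i - left) / (right - left)
            out[i] = binned[left] + w * (binned[right] - binned[left])
    return out
```

Binning turns irregular sampling into a regular grid, and the interpolation step then yields vectors of equal length, which is what most clustering methods expect.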

2.4 Feature Extraction
- Descriptors: representations of time series in a space of reduced dimensionality
- Basis for similarity measures (clustering, indexing, search)
- Problem: full-sequence vs. subsequence search
- A huge number of descriptors has been published; suitable descriptor approaches:
  - Binning (current approach)
  - Discrete Fourier Transformation (DFT)
  - Discrete Wavelet Transformation (DWT)
  - SAX descriptor (symbolic representation)
Lin, J., Keogh, E., Lonardi, S., Chiu, B.: A symbolic representation of time series, with implications for streaming algorithms. Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, 2003.
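As an illustration of the symbolic representation mentioned above, the following sketch follows the general SAX idea: z-normalize the series, reduce it with piecewise aggregate approximation (PAA), then map each segment mean to a letter using breakpoints that split the standard normal distribution into equally probable regions. The function names, the default four-letter alphabet, and the requirement that the series length be divisible by the number of segments are assumptions made for brevity, not details taken from the slides.

```python
from bisect import bisect_right
from statistics import NormalDist, fmean, pstdev


def znormalize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = fmean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)  # constant series: no shape information
    return [(v - mu) / sigma for v in values]


def paa(values, n_segments):
    """Piecewise aggregate approximation: mean of n_segments equal-length chunks."""
    if len(values) % n_segments:
        raise ValueError("length must be divisible by n_segments in this sketch")
    size = len(values) // n_segments
    return [fmean(values[i * size:(i + 1) * size]) for i in range(n_segments)]


def sax(values, n_segments, alphabet="abcd"):
    """Return a SAX-style word with one letter per PAA segment."""
    a = len(alphabet)
    # Breakpoints cut the standard normal curve into `a` equally probable regions.
    breakpoints = [NormalDist().inv_cdf(k / a) for k in range(1, a)]
    segments = paa(znormalize(values), n_segments)
    return "".join(alphabet[bisect_right(breakpoints, s)] for s in segments)
```

For example, `sax([1, 1, 2, 2, 3, 3, 4, 4], 4)` yields the word `"abcd"` for a steadily rising series, while the reversed series yields `"dcba"`.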

2.5 Visual Catalog
- Visualization of clustering results (global and local)
- Self-organizing map (SOM) as a basis for smart visualization of huge amounts of time series
- Provide a single grid view for details (zooming)
- We also explore other layout algorithms
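To make the role of the self-organizing map concrete, here is a minimal, unoptimized sketch of SOM training on descriptor vectors. It is not the project's implementation: the rectangular grid, the Gaussian neighborhood, the linearly decaying learning rate and radius, and all names and default parameters are assumptions chosen for readability.

```python
import math
import random


def best_matching_unit(grid, vector):
    """Return the grid cell whose prototype is closest to the vector."""
    return min(grid, key=lambda cell: math.dist(grid[cell], vector))


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols map; returns {(row, col): prototype vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One prototype per grid cell, initialised with randomly chosen input vectors.
    grid = {(r, c): list(rng.choice(vectors))
            for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.5)  # neighborhood shrinks over time
        for v in rng.sample(vectors, len(vectors)):  # shuffled pass over the data
            br, bc = best_matching_unit(grid, v)
            for (r, c), w in grid.items():
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2 * radius ** 2))  # Gaussian neighborhood weight
                for k in range(dim):
                    w[k] += lr * h * (v[k] - w[k])     # pull prototype toward v
    return grid
```

After training, each time series can be assigned to the cell returned by `best_matching_unit`, so that every grid cell shows a prototype curve and neighboring cells tend to hold similar curves.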

2.5 Visual Catalog: Visual-interactive time series search
- Query by example, query by sketch
- Problem: full-sequence vs. subsequence search
- Color maps to indicate similarity
- List-based visualizations for result sets
- Save session, export results
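Query by example and the full-sequence vs. subsequence distinction can be sketched as follows. The sketch assumes Euclidean distance on equal-length descriptor vectors and a simple green-to-red color scale; both are illustrative choices, and the function names are hypothetical.

```python
import math


def rank_by_similarity(query, descriptors):
    """Full-sequence search: (name, distance) pairs, most similar first."""
    scored = [(name, math.dist(query, d)) for name, d in descriptors.items()]
    return sorted(scored, key=lambda pair: pair[1])


def best_subsequence(query, series):
    """Subsequence search: slide the query over the series.

    Returns (distance, offset) of the best-matching window. Raises ValueError
    if the series is shorter than the query.
    """
    n = len(query)
    windows = range(len(series) - n + 1)
    return min((math.dist(query, series[i:i + n]), i) for i in windows)


def similarity_color(distance, max_distance):
    """Map a distance to an RGB tuple from green (similar) to red (dissimilar)."""
    t = 0.0 if max_distance == 0 else min(distance / max_distance, 1.0)
    return (round(255 * t), round(255 * (1 - t)), 0)
```

Full-sequence search compares whole descriptors, whereas subsequence search has to test every window position, which is why the slide lists the two as a design problem.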

2.5 Visual Catalog: Meta data search
- Additional search functionality, combinable with content-based search
- Search by bag of words
- Search by special attributes
  - Physical unit, for example temperature, humidity, etc.
  - Time stamp: define time interval and quantization
- Combine multiple search operations: filtering
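Combining several meta data searches into one filter chain might look like the sketch below. The `Record` type and the filter helpers are hypothetical stand-ins for whatever the catalog stores; in the described system such filtering would presumably run against the database rather than in memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical meta data entry for one time series."""
    name: str
    unit: str
    start: float      # first time stamp covered by the series
    end: float        # last time stamp covered by the series
    words: frozenset  # bag of words taken from the meta data


def by_words(*terms):
    """Keep records whose bag of words contains every search term."""
    wanted = {t.lower() for t in terms}
    return lambda r: wanted <= r.words


def by_unit(unit):
    """Keep records measured in the given physical unit."""
    return lambda r: r.unit == unit


def by_interval(start, end):
    """Keep records whose time span overlaps [start, end]."""
    return lambda r: r.start <= end and r.end >= start


def search(records, *filters):
    """Apply all filters conjunctively and return the matching records."""
    return [r for r in records if all(f(r) for f in filters)]
```

A call such as `search(records, by_unit("deg C"), by_words("arctic"), by_interval(2000, 2005))` narrows the result set step by step, and the remaining records could then be passed on to content-based search.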


3 Outlook: Summary
- WGL-TIB project: visual access to scientific primary data
- Project start: 01/2010, project duration: 3 years
- Focus on time series data
- Feature-based descriptor approach
- Interfaces for visualization, browsing and search

3 Outlook
- Requirements, use cases and evaluation model (TIB Hannover)
  - Test the visual catalog prototype with real-world problems
  - Collaboration with scientific users to capture the user view
  - User-in-the-loop approach
- Technical future work (GRIS, FhG)
  - Establish an application prototype
  - Similarity search operations, based on evaluation results
  - Special user interface

Thank you for your attention
- Comments very welcome
- Do you have any questions?
Acknowledgements

Related Work
- PANGAEA: Publishing Network for Geoscientific & Environmental Data.
- PsychData: National Repository for Psychological Research Data (in German).
- Dryad: Digital Repository for Data Underlying Published Works.
- ELIXIR: European Life Sciences Infrastructure for Biological Information.
- KoLaWiss: Society for Scientific Data Processing Goettingen: Cooperative long-term preservation for research centers (in German). Project Report (2009).
- PANGAEA PanPlot Tool.
- Bernard, J., Brase, J., Fellner, D., Koepler, O., Kohlhammer, J., Ruppert, T., Schreck, T., Sens, I.: A Visual Digital Library Approach for Time-Oriented Scientific Primary Data. Accepted at the 14th European Conference on Research and Advanced Technology for Digital Libraries (ECDL), 2010.
- Lin, J., Keogh, E., Lonardi, S., Chiu, B.: A symbolic representation of time series, with implications for streaming algorithms. Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, 2003.