VOMegaPlot: Efficient Plotting of Large VOTable Datasets

VOPlot
• VOPlot is a tool for visualizing astronomical data available in the VOTable format.
• VOPlot parses the XML file, loads the entire dataset into memory, and then processes it to draw various types of plots.
• This approach of loading the entire dataset into memory cannot be used for very large VOTable files.

Approach for VOMegaPlot
• VOMegaPlot preprocesses the XML file to create intermediate files, which are subsequently used for plotting.
• The data is divided into fixed-size blocks, and only individual blocks are loaded into memory at any time, which reduces the memory requirement.
• The number of intermediate files created equals the number of columns in the XML file (one file per column).
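A minimal Python sketch of this pre-processing step, assuming a VOTable whose data is serialized as TABLEDATA and whose columns are all numeric. The function names (`preprocess_votable`, `read_block`), the `.col` file naming, and the little-endian float64 storage are hypothetical illustrations, not VOMegaPlot's actual file format. Each column is streamed to its own file, so a block is simply `block_rows` consecutive values that can be located with one `seek`.

```python
import os
import struct
import xml.etree.ElementTree as ET
from contextlib import ExitStack


def _local(tag):
    """Drop any XML namespace: '{ns}TR' -> 'TR'."""
    return tag.rsplit("}", 1)[-1]


def preprocess_votable(xml_path, out_dir):
    """Stream a TABLEDATA VOTable into one float64 file per column (hypothetical layout)."""
    os.makedirs(out_dir, exist_ok=True)
    n_rows = 0
    with ExitStack() as stack:
        files = []  # one output file per FIELD, in column order
        for _event, elem in ET.iterparse(xml_path, events=("end",)):
            tag = _local(elem.tag)
            if tag == "FIELD":
                path = os.path.join(out_dir, elem.get("name") + ".col")
                files.append(stack.enter_context(open(path, "wb")))
            elif tag == "TR":
                cells = [td.text for td in elem if _local(td.tag) == "TD"]
                for f, text in zip(files, cells):
                    f.write(struct.pack("<d", float(text)))  # assumes numeric cells
                n_rows += 1
                elem.clear()  # release this row's parsed cell elements
    return n_rows


def read_block(col_path, block_index, block_rows):
    """Load block `block_index` (up to block_rows values) of one column into memory."""
    with open(col_path, "rb") as f:
        f.seek(block_index * block_rows * 8)  # 8 bytes per float64 value
        data = f.read(block_rows * 8)
    return list(struct.unpack("<%dd" % (len(data) // 8), data))
```

Because every value has the same width in this layout, block i of any column starts at byte `i * block_rows * 8`, so no separate block table is needed; the last block of a column may be shorter than the others.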

Pre-processing operation: Creation of array blocks
[Figure: the original XML file with m columns and n rows is split into m intermediate files on disk (File 1 for Col 1, File 2 for Col 2, …, File m for Col m), each divided into Block 1, Block 2, …, Block k.]

Algorithm for drawing a scatter plot
1) Input the columns to be plotted, say A vs. B.
2) Load a set of corresponding blocks for both columns A and B.
3) Take corresponding data elements from both blocks and plot them.
4) After plotting all the points, discard the blocks.
5) If more blocks of data remain, repeat from step 2; otherwise stop.
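The five steps can be sketched in Python as follows, assuming each column has already been written to its own file of little-endian float64 values (an assumed layout). `iter_blocks` and `scatter_plot` are hypothetical names, and "plotting" is simplified to setting pixels in an in-memory grid rather than drawing on screen.

```python
import struct

BLOCK_ROWS = 4  # hypothetical block size, in rows


def iter_blocks(col_path, block_rows=BLOCK_ROWS):
    """Yield successive fixed-size blocks of float64 values from one column file."""
    with open(col_path, "rb") as f:
        while True:
            data = f.read(block_rows * 8)
            if not data:
                break
            yield struct.unpack("<%dd" % (len(data) // 8), data)


def scatter_plot(path_a, path_b, x_range, y_range, width, height, block_rows=BLOCK_ROWS):
    """Rasterise column A (x) against column B (y), one pair of blocks at a time."""
    grid = [[0] * width for _ in range(height)]
    (x0, x1), (y0, y1) = x_range, y_range
    # Step 2: load corresponding blocks of A and B together.
    for block_a, block_b in zip(iter_blocks(path_a, block_rows),
                                iter_blocks(path_b, block_rows)):
        # Step 3: plot corresponding elements of the two blocks.
        for x, y in zip(block_a, block_b):
            if x0 <= x <= x1 and y0 <= y <= y1:
                col = min(int((x - x0) / (x1 - x0) * width), width - 1)
                row = min(int((y - y0) / (y1 - y0) * height), height - 1)
                grid[height - 1 - row][col] = 1  # grid row 0 is the top of the image
        # Steps 4-5: this pair of blocks is dropped when the loop fetches the next pair.
    return grid
```

At most one block per column is held in memory, so memory use depends on the block size and the image size, not on the number of rows.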

Advantages
• The plotting cost is O(2n), i.e. linear in n, where n is the number of rows: each of the two plotted columns is read once. This cost is independent of the number of columns in the XML file.
• If the user needs to plot only a subset of the data (as in a zoom operation), a second set of intermediate files can be used for this purpose.
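A worked count makes the complexity claim concrete. It assumes cost is measured simply as the number of data values read, and uses the Tycho catalogue size from the results (n = 10^6 rows, m = 56 columns) only as an illustration:

```latex
\begin{aligned}
\text{values read to plot } A \text{ vs. } B &= n + n = 2n \in O(n)\\
\text{values in the whole table (full in-memory load)} &= n\,m\\
\text{ratio for Tycho } (n = 10^{6},\ m = 56) &= \frac{n\,m}{2n} = \frac{m}{2} = 28
\end{aligned}
```

In this simple count the saving relative to loading the whole table is m/2, so it grows with the number of columns in the catalogue.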

Dealing with a subset of data
• The data for every column is stored in an indexed fashion.
• This allows a subset of the data to be accessed without scanning the entire dataset.
• As a result, operations such as zoom become much faster.
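One simple way to realize such an index, sketched in Python under the assumption that a column's index fits in a sorted list: store the values in ascending order together with their original row numbers, then answer a zoom window with two binary searches per column. `build_index`, `rows_in_range`, and `zoom` are hypothetical names; this is not VOMegaPlot's on-disk index format, which the slides do not detail.

```python
import bisect


def build_index(values):
    """Per-column index: values sorted ascending, paired with their original row ids."""
    order = sorted(range(len(values)), key=values.__getitem__)
    keys = [values[i] for i in order]
    return keys, order


def rows_in_range(index, lo, hi):
    """Row ids whose value lies in [lo, hi], found by binary search (no full scan)."""
    keys, rows = index
    start = bisect.bisect_left(keys, lo)
    stop = bisect.bisect_right(keys, hi)
    return rows[start:stop]


def zoom(index_a, index_b, x_range, y_range):
    """Rows visible in a zoom window: intersection of the two per-column range queries."""
    hits_a = set(rows_in_range(index_a, *x_range))
    return sorted(r for r in rows_in_range(index_b, *y_range) if r in hits_a)
```

A zoom on A vs. B then touches only the index entries that fall inside the window on each axis instead of all n rows.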

Pre-processing operation: Creation of tree blocks
[Figure: the original XML file with m columns and n rows is converted into intermediate files with indexed data, one indexed file per column (col 1, col 2, …, col m).]

Pre-processing operation: Creation of tree blocks (contd.)
[Figure: indexed file for a column.]
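The slides do not spell out the layout of the tree blocks. As one hypothetical interpretation, the Python sketch below recursively halves a column's value range until each leaf block holds at most `capacity` entries; a range query then skips every subtree whose value range lies outside the zoom window. `Node`, `build_tree`, and `query` are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    lo: float  # lower end of the value range this node covers
    hi: float  # upper end of the value range this node covers
    items: List[Tuple[float, int]] = field(default_factory=list)  # (value, row) in a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(items, lo, hi, capacity=2):
    """Halve [lo, hi] recursively until each leaf block holds at most `capacity` entries."""
    node = Node(lo, hi)
    mid = (lo + hi) / 2
    # Stop at small blocks, or when the range can no longer be split (e.g. duplicates).
    if len(items) <= capacity or not (lo < mid < hi):
        node.items = list(items)
        return node
    node.left = build_tree([it for it in items if it[0] <= mid], lo, mid, capacity)
    node.right = build_tree([it for it in items if it[0] > mid], mid, hi, capacity)
    return node


def query(node, a, b, out=None):
    """Collect row ids with a <= value <= b, pruning subtrees outside [a, b]."""
    out = [] if out is None else out
    if node is None or node.hi < a or node.lo > b:
        return out
    if node.left is None and node.right is None:
        out.extend(row for v, row in node.items if a <= v <= b)
    else:
        query(node.left, a, b, out)
        query(node.right, a, b, out)
    return out
```

Each leaf corresponds to one small block of the indexed file, so a narrow zoom window only has to read the few blocks whose ranges overlap it.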

Results
Catalogue   Data size                        Pre-processing time   Scatter-plot time
Tycho       1 million rows, 56 columns       18 minutes            9 seconds
Tycho-2     2.5 million rows, 32 columns     30 minutes            22 seconds
UCAC2       48.3 million rows, 9 columns     3 hours 26 minutes    5 minutes 46 seconds
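Dividing the data sizes by the reported times gives rough throughput figures. The Python below only re-derives numbers from the results table; `results` and `throughput` are illustrative names.

```python
# Figures copied from the results table: (rows, columns, pre-processing s, scatter-plot s).
results = {
    "Tycho": (1_000_000, 56, 18 * 60, 9),
    "Tycho-2": (2_500_000, 32, 30 * 60, 22),
    "UCAC2": (48_300_000, 9, 3 * 3600 + 26 * 60, 5 * 60 + 46),
}


def throughput(rows, cols, prep_s, plot_s):
    """Return (table cells pre-processed per second, rows plotted per second)."""
    return rows * cols / prep_s, rows / plot_s


for name, r in results.items():
    cells_per_s, rows_per_s = throughput(*r)
    print(f"{name:8s} {cells_per_s:10,.0f} cells/s  {rows_per_s:10,.0f} rows/s")
```

Plotting runs at roughly 111,000 to 140,000 rows per second for all three catalogues even though their column counts range from 9 to 56, which is in line with the claim that plotting cost depends on n but not on the number of columns. Pre-processing throughput falls from about 52,000 to about 35,000 table cells per second.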

Features of VOMegaPlot
• Scatter plot with zoom, reversed axes, and logarithmic axes
• Projection plot
• Density plot
• Histogram
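Histograms and density plots fit the same block-at-a-time pattern: counts are accumulated per block, so memory use depends on the number of bins plus one block, not on n. This Python sketch assumes blocks are supplied as iterables of floats; `histogram` and `density` are hypothetical names, and the `log` flag shows one possible way to handle a logarithmic axis.

```python
import math


def histogram(blocks, lo, hi, nbins, log=False):
    """1-D histogram accumulated one block at a time; optional log10 axis."""
    if log:
        lo, hi = math.log10(lo), math.log10(hi)
    counts = [0] * nbins
    for block in blocks:  # only the current block needs to be in memory
        for v in block:
            if log:
                if v <= 0:
                    continue  # non-positive values cannot be placed on a log axis
                v = math.log10(v)
            if lo <= v <= hi:
                counts[min(int((v - lo) / (hi - lo) * nbins), nbins - 1)] += 1
    return counts


def density(block_pairs, x_range, y_range, nx, ny):
    """2-D density plot: count points per (x, y) cell, one pair of blocks at a time."""
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0] * nx for _ in range(ny)]
    for block_x, block_y in block_pairs:
        for x, y in zip(block_x, block_y):
            if x0 <= x <= x1 and y0 <= y <= y1:
                i = min(int((x - x0) / (x1 - x0) * nx), nx - 1)
                j = min(int((y - y0) / (y1 - y0) * ny), ny - 1)
                grid[j][i] += 1
    return grid
```

Unlike the scatter plot, which only records whether a pixel is hit, the density plot keeps a count per cell, so dense regions of a large catalogue remain distinguishable.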

Scatter Plot: Tycho-1 catalogue (RA vs. Vmag)

Density Plot: Tycho-1 catalogue (RA vs. Vmag)

Density Plot: Tycho-2 catalogue (DEC vs. RA)

Scatter Plot: UCAC2 catalogue (2m_J vs. U2Rmag)

Density Plot: UCAC2 catalogue (2m_J vs. U2Rmag)

Future Enhancements
• Support for reading data stored in binary format
• Block-level compression when creating intermediate files
• Client-server version
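Block-level compression could, for example, compress each block independently and record its byte offset, so that a single block can still be read without decompressing the rest of the file. This Python sketch uses the standard-library `zlib` module; the function names and the in-memory offset list (which a real tool would have to persist alongside the data) are assumptions.

```python
import struct
import zlib


def write_compressed_column(path, values, block_rows=4):
    """Write a column as independently zlib-compressed blocks.

    Returns (offset, size) for each block so one block can later be located
    and decompressed without touching the others.
    """
    offsets = []
    with open(path, "wb") as f:
        for start in range(0, len(values), block_rows):
            block = values[start:start + block_rows]
            raw = struct.pack("<%dd" % len(block), *block)
            comp = zlib.compress(raw)
            offsets.append((f.tell(), len(comp)))
            f.write(comp)
    return offsets


def read_compressed_block(path, offsets, i):
    """Seek to block i, read only its compressed bytes, and decode them."""
    pos, size = offsets[i]
    with open(path, "rb") as f:
        f.seek(pos)
        raw = zlib.decompress(f.read(size))
    return list(struct.unpack("<%dd" % (len(raw) // 8), raw))
```

Compressing per block rather than per file keeps the random access that the block-wise plotting algorithm relies on, at the cost of a small offset table per column.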

References
• VOTable
• VOPlot
• VOMegaPlot
• IUCAA
• Persistent Systems Pvt. Ltd.

Sample VOTable