
Lecture 3

"With every passing hour our solar system comes forty-three thousand miles closer to globular cluster 13 in the constellation Hercules, and still there are some misfits who continue to insist that there is no such thing as progress." - Ransom K. Ferm

Agenda
- Homework 1 questions?
- SDSS lecture
- Study questions
- EOSDIS demo

Apache Point Observatory, Sunspot, New Mexico
Labels on the site photo:
- 2.5m main survey telescope
- 0.5m photometric telescope
- 3.5m telescope (not used by SDSS)
- not a telescope

Coarse Data Flow

Detailed Data Flow
- Data Acquisition
- Data Processing (Fermilab)
- Data Distribution

Data Acquisition

Data Acquisition
- Good focus area ~ 30 full moons
- Camera
- Spectrographs

Data Acquisition: 2D Images
- 30 charge-coupled devices (CCDs)
- Each has 4 million pixels
- Each night: 200 gigabytes of data on a dozen tapes

Data Acquisition

Data Acquisition: Spectra

Spectra
- Sun spectrum with absorption lines
- Source: National Optical Astronomy Observatory

Data Processing

- scanline
- strip = 6 scanlines
- stripe = 2 strips, offset
- frame (per CCD) = 2048 x 1489 pixels, 10% overlap
- field = frames in all 5 filters
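The imaging units above nest inside one another, so their sizes follow from simple multiplication. The sketch below only restates the slide's numbers; the variable names are illustrative and nothing here comes from the SDSS pipeline itself.

```python
# Sizes quoted on the slide.
FRAME_WIDTH_PX = 2048
FRAME_HEIGHT_PX = 1489
FILTERS_PER_FIELD = 5      # a field = the same frame in all 5 filters
SCANLINES_PER_STRIP = 6
STRIPS_PER_STRIPE = 2

# Derived quantities.
pixels_per_frame = FRAME_WIDTH_PX * FRAME_HEIGHT_PX
pixels_per_field = pixels_per_frame * FILTERS_PER_FIELD
scanlines_per_stripe = SCANLINES_PER_STRIP * STRIPS_PER_STRIPE

print("pixels per frame:", pixels_per_frame)
print("pixels per field:", pixels_per_field)
print("scanlines per stripe:", scanlines_per_stripe)
```

A frame is therefore about 3 million pixels and a five-filter field about 15 million.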

Data Processing: Images

Data Processing: Spectra
- 2D to 3D: redshift = distance
- Classification: galaxy or star?
- Wavelengths: what substances are involved?
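The "redshift = distance" step can be sketched numerically. This is a minimal illustration, not the SDSS spectroscopic pipeline: it assumes the low-redshift Hubble approximation d = c z / H0, an assumed Hubble constant of 70 km/s/Mpc, and a made-up observed wavelength for the H-alpha line.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458
HUBBLE_CONSTANT_KM_S_MPC = 70.0   # assumed value, for illustration only
H_ALPHA_REST_ANGSTROM = 6562.8    # rest wavelength of the H-alpha line


def redshift(observed_wavelength, rest_wavelength):
    """Fractional shift of a spectral line toward longer wavelengths."""
    return (observed_wavelength - rest_wavelength) / rest_wavelength


def hubble_distance_mpc(z, h0=HUBBLE_CONSTANT_KM_S_MPC):
    """Approximate distance in megaparsecs; only reasonable for small z."""
    return SPEED_OF_LIGHT_KM_S * z / h0


# Hypothetical measurement: H-alpha observed at 6890.94 angstroms.
z = redshift(6890.94, H_ALPHA_REST_ANGSTROM)
distance = hubble_distance_mpc(z)
print(f"z = {z:.4f}, distance = {distance:.1f} Mpc")
```

The redshift supplies the third coordinate, which is how a 2D sky position becomes a 3D position.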

Data Processing: Spectra

Data Distribution

Data Distribution: Science Database SpecObj Telescope Configuration Admin PhotoObj

Data Distribution: Science Database
- 200 million objects (photos, spectra, etc.)
- Numerical attributes in a 100+ dimensional space
- Challenge: how can a relational database scale to a large volume of data?

Improving Scalability
- SDSS data is too large for one disk or one server
- Base-data objects are spatially partitioned across servers
- High-traffic data is replicated
- Parallel and distributed query system
  - Scan machine: continuously scans the dataset and evaluates user-defined predicates (partitioned across multiple nodes)
  - Hash machine: performs comparisons within data clusters
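Spatial partitioning and the scan machine can be sketched in a few lines. This is a toy model under stated assumptions: four hypothetical servers, partitioning by equal-width slices of right ascension (the slides do not say how SDSS actually divides the sky), and objects represented as plain dictionaries.

```python
from collections import defaultdict

NUM_SERVERS = 4  # hypothetical cluster size


def server_for(ra_deg, num_servers=NUM_SERVERS):
    """Map a right ascension in degrees to a server by equal-width RA slices."""
    slot = int((ra_deg % 360.0) / 360.0 * num_servers)
    return min(slot, num_servers - 1)  # guard against floating-point round-up


def partition(objects, num_servers=NUM_SERVERS):
    """Group objects into per-server shards by sky position."""
    shards = defaultdict(list)
    for obj in objects:
        shards[server_for(obj["ra"], num_servers)].append(obj)
    return shards


def scan(shards, predicate):
    """Evaluate a user-defined predicate on each shard and merge the hits."""
    hits = []
    for server_id in sorted(shards):
        hits.extend(obj for obj in shards[server_id] if predicate(obj))
    return hits


objects = [
    {"id": 1, "ra": 10.0, "r": 17.5},
    {"id": 2, "ra": 100.0, "r": 19.2},
    {"id": 3, "ra": 200.0, "r": 16.1},
    {"id": 4, "ra": 300.0, "r": 21.0},
]
shards = partition(objects)
bright = scan(shards, lambda obj: obj["r"] < 18.0)
print([obj["id"] for obj in bright])
```

Each shard is scanned independently, so in a real deployment the per-shard loops would run on separate nodes in parallel.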

Overview of SDSS Schema
- SDSS schema browser: wser.asp
- PhotoObjAll: record describing all attributes of each photometric object
  - Hundreds of columns
  - Millions of photos
  - Need good indexing/materialized views

SDSS Schema (continued)
The PhotoObjAll table has many views:
- PhotoObj: all primary and secondary objects
- PhotoPrimary: all primary photo objects (best)
  - Star
  - Galaxy
  - Sky
  - Unknown
- PhotoSecondary
- PhotoFamily (neither primary nor secondary)
Each view is a horizontal partition (a subset of rows).
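A horizontal partition expressed as a view can be sketched with Python's built-in sqlite3 module. SQLite is only a convenient stand-in here, the table is cut down to five columns, and the numeric mode and type codes are assumptions made for the example rather than values taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PhotoObjAll (
        objID INTEGER PRIMARY KEY,
        mode  INTEGER,   -- assumed codes: 1 primary, 2 secondary, 3 family
        type  INTEGER,   -- assumed codes: 3 galaxy, 6 star
        ra    REAL,
        dec   REAL
    );
    -- Each view keeps every column but only a subset of the rows.
    CREATE VIEW PhotoPrimary   AS SELECT * FROM PhotoObjAll WHERE mode = 1;
    CREATE VIEW PhotoSecondary AS SELECT * FROM PhotoObjAll WHERE mode = 2;
    CREATE VIEW Galaxy AS SELECT * FROM PhotoPrimary WHERE type = 3;
    CREATE VIEW Star   AS SELECT * FROM PhotoPrimary WHERE type = 6;
""")
conn.executemany(
    "INSERT INTO PhotoObjAll VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 3, 10.0, 1.0),
        (2, 1, 6, 11.0, 2.0),
        (3, 2, 3, 12.0, 3.0),
        (4, 3, 6, 13.0, 4.0),
        (5, 1, 3, 14.0, 5.0),
    ],
)


def row_count(name):
    """Number of rows visible through a table or view."""
    return conn.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0]


for name in ("PhotoObjAll", "PhotoPrimary", "PhotoSecondary", "Galaxy", "Star"):
    print(name, row_count(name))
```

Queries against Galaxy or Star never need to repeat the mode and type filters, which is the convenience the views provide.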

Other views
- PhotoTag: vertical partition of the PhotoObjAll table (a subset of the columns)
- Contains only the columns that are most often requested (60 columns, 10% of PhotoObjAll)
- Since rows are smaller (fewer columns), more rows can be loaded into memory and performance improves
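A vertical partition can be sketched the same way: copy only the frequently requested columns into a narrower table. Again SQLite via Python's sqlite3 module is a stand-in, and the ten-column PhotoObjAll and the four columns chosen for PhotoTag are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE PhotoObjAll (
        objID INTEGER PRIMARY KEY,
        ra REAL, dec REAL, r REAL,
        flags INTEGER, status INTEGER,
        rowc REAL, colc REAL, sky_u REAL, sky_g REAL
    )
""")
rows = [(i, 10.0 + i, -1.0 * i, 17.0 + i, 0, 0, 100.0, 200.0, 1.0, 1.0)
        for i in range(1, 4)]
conn.executemany("INSERT INTO PhotoObjAll VALUES (?,?,?,?,?,?,?,?,?,?)", rows)

# Vertical partition: every row, but only the popular columns.
conn.execute("CREATE TABLE PhotoTag AS SELECT objID, ra, dec, r FROM PhotoObjAll")


def shape(table):
    """Return (row count, column count) for a table."""
    n_rows = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    n_cols = len(conn.execute(f"SELECT * FROM {table}").description)
    return n_rows, n_cols


print("PhotoObjAll:", shape("PhotoObjAll"))
print("PhotoTag:", shape("PhotoTag"))
```

Both tables hold the same objects, but each PhotoTag row is narrower, so more of them fit in a page of memory.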

Indexes
- Hierarchical Triangular Mesh (HTM)
  - Spatially decomposes the region of sky covered by SDSS data
  - Enables faster spatial searches
- Database indexes
  - Primary key index: on the primary key of the table
  - Foreign key index: on columns that reference the primary key of another table
  - Covering index: an index covering one or more columns of a table; speeds up searches if any of the indexed fields appear in the WHERE clause
- Index column lists shown on the slide:
  - mode, cy, cx, cz, htmID, type, flags, status, ra, dec, u, g, r, i, z, rho
  - htmID, cx, cy, cz, type, mode, flags, status, ra, dec, u, g, r, i, z, rho
  - run, camcol, type, mode, cx, cy, cz
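The idea behind the Hierarchical Triangular Mesh can be sketched as follows: project the sky onto a unit sphere, start from the eight triangular faces of an octahedron, and repeatedly split the triangle containing a point into four smaller ones, appending two bits to the ID at each level. This is a simplified illustration; the ordering of the root triangles and children is an arbitrary choice made here, so the resulting numbers should not be expected to match the htmID values stored in SDSS.

```python
import math
from itertools import product

EPS = 1e-12  # tolerance so points on a shared edge still land in some triangle


def radec_to_unit(ra_deg, dec_deg):
    """Convert sky coordinates in degrees to a unit vector (cx, cy, cz)."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def midpoint(a, b):
    """Midpoint of the great-circle arc between two unit vectors."""
    m = (a[0] + b[0], a[1] + b[1], a[2] + b[2])
    norm = math.sqrt(dot(m, m))
    return (m[0] / norm, m[1] / norm, m[2] / norm)


def contains(tri, p):
    """True if p lies on the inner side of all three edge planes of tri."""
    v0, v1, v2 = tri
    sign = 1.0 if dot(cross(v0, v1), v2) >= 0 else -1.0  # handles either winding
    edges = ((v0, v1), (v1, v2), (v2, v0))
    return all(sign * dot(cross(a, b), p) >= -EPS for a, b in edges)


def children(tri):
    """Split a spherical triangle into four using its edge midpoints."""
    v0, v1, v2 = tri
    w0, w1, w2 = midpoint(v1, v2), midpoint(v0, v2), midpoint(v0, v1)
    return [(v0, w2, w1), (v1, w0, w2), (v2, w1, w0), (w0, w1, w2)]


# Eight root triangles: one per octant of the sphere.
ROOTS = [((sx, 0.0, 0.0), (0.0, sy, 0.0), (0.0, 0.0, sz))
         for sx, sy, sz in product((1.0, -1.0), repeat=3)]


def trixel_id(ra_deg, dec_deg, depth):
    """Integer ID of the depth-level triangle containing the given position."""
    p = radec_to_unit(ra_deg, dec_deg)
    index, tri = next((i, t) for i, t in enumerate(ROOTS) if contains(t, p))
    tid = 8 + index  # roots take IDs 8..15, so every ID has a leading 1 bit
    for _ in range(depth):
        for k, child in enumerate(children(tri)):
            if contains(child, p):
                tid, tri = tid * 4 + k, child
                break
        else:
            raise ValueError("point not found in any child triangle")
    return tid


coarse = trixel_id(10.0, 20.0, 3)
fine = trixel_id(10.0, 20.0, 6)
print(coarse, fine)
```

Because each level appends two bits, every triangle's descendants occupy a contiguous range of IDs. A search over a sky region can therefore be turned into a handful of ID range scans that an ordinary database index can serve.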

SDSS Database Indexes
- PhotoObj and PhotoTag are both indexed
- Indexes: a 2% subset of PhotoObj
  - 50x faster than reading the whole PhotoObj table
  - 5x faster than reading the whole PhotoTag table
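The speedup comes from answering a query out of the small index alone, without touching the wide table. A minimal sketch with Python's sqlite3 module shows the effect in a query plan; SQLite is a stand-in chosen for convenience, and the table layout, index name, and type code are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE PhotoObj (
        objID INTEGER PRIMARY KEY,
        type INTEGER, ra REAL, dec REAL, r REAL,
        notes TEXT   -- stands in for the many wide columns of the real table
    )
""")
conn.executemany(
    "INSERT INTO PhotoObj VALUES (?, ?, ?, ?, ?, ?)",
    [(i, 3 if i % 2 else 6, 10.0 + i, 1.0 * i, 17.0, "x" * 100) for i in range(1, 21)],
)

# The index holds every column the query below needs, so it "covers" the query.
conn.execute("CREATE INDEX idx_type_ra_dec ON PhotoObj (type, ra, dec)")

query = "SELECT ra, dec FROM PhotoObj WHERE type = 3"
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)  # last column is the detail text
matches = conn.execute(query).fetchall()

print(plan_text)
print(len(matches), "matching rows")
```

When the plan mentions a covering index, the engine reads only index entries, which are a small fraction of the full row width.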

Database Size for DR1 (GB)

Data Distribution
- CASJobs: for long-running queries
- Personal Sky Server: 1% of the total data, packaged for one-click install (education, testing, demonstrations)
- Web services for specific functions

Data Distribution: Releases

Study Questions