Designing and Mining Multi-Terabyte Astronomy Archives: The Sloan Digital Sky Survey Alexander S. Szalay, Peter Z. Kunszt, Ani Thakar Dept. of Physics.


1 Designing and Mining Multi-Terabyte Astronomy Archives: The Sloan Digital Sky Survey
Alexander S. Szalay, Peter Z. Kunszt, Ani Thakar, Dept. of Physics and Astronomy, The Johns Hopkins University
Jim Gray, Don Slutz, Microsoft Research
Robert J. Brunner, California Institute of Technology

2 Towards the Digital Sky
Goal: interactive exploration of astronomical data
  efforts underway to capture digital images of the sky
  multiple wavelengths: x-rays, ultraviolet, visible, infrared
  diverse data types: images, text, numerical attributes
  data is big: a set of multi-TB archives
  no need to wait for access to a telescope
NGC 5033, from “Image of the week”
1/5000 of first light image, May 27-28, 1998

3 Astronomy 101
Celestial Sphere (©Sky Publishing Corp)
  Declination (degrees)
  Right ascension (time: h, m, s)
Surface area: “square” degrees
  Unit of solid angle
  sphere = 41252.96 deg²
  Arcminute = 1/60 degree
  Arcsecond = 1/60 arcminute
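
The figure of 41252.96 deg² for the whole sphere follows from converting 4π steradians to square degrees, since one radian is 180/π degrees. A minimal check, with function names chosen here purely for illustration:

```python
import math


def sphere_area_sq_deg() -> float:
    """Solid angle of the full sphere in square degrees: 4*pi sr * (180/pi)^2."""
    return 4 * math.pi * (180 / math.pi) ** 2


def fraction_of_sky(area_sq_deg: float) -> float:
    """Fraction of the celestial sphere covered by a region of the given area."""
    return area_sq_deg / sphere_area_sq_deg()


print(round(sphere_area_sq_deg(), 2))
print(round(fraction_of_sky(10000), 3))
```

The second call shows that a 10000 deg² footprint is a little under one quarter of the sphere.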

4 Sloan Digital Sky Survey
Goals (1999)
➲ Map ~10000 deg² of northern sky (~1/4 celestial sphere)
➲ Determine position and brightness of 100M celestial objects
➲ Measure distance to 1M galaxies, create 3D model
➲ Measure distance to 100K quasars
➲ Make data available to the public
As of data release 6 (data through June 2006)
➲ Images, attributes of ~287M objects over 9583 deg²
➲ 1.27 million spectra of stars, galaxies, quasars and blank sky (for sky subtraction) over 7425 deg²
➲ Additional estimates of stellar temperatures, gravities, metallicities
➲ Data, search tools available on web (http://skyserver.sdss.org)

5 Where is the data acquired?
➲ Apache Point Observatory (APO), Sunspot, NM
  far away from large cities – dark night sky
  altitude: 9200 feet
  little water vapor
  few pollutants
  many cloudless, moonless nights!
Photo: Fermilab Visual Media Services

6 Telescopes
➲ 2.5 meter reflecting light telescope
  wide angle: 3° field of view (area of ~30 full moons)
  camera: 120 Mpixel, 30 CCDs, each 2 inches square, 5 color filters
  2 spectrographs measure spectra of ~600 objects at once
  generates up to 200 GB/night
Photos: Fermilab Visual Media Services

7 Telescopes
➲ 0.5 meter photometric telescope
  used to monitor atmosphere during survey (temperature, pressure)
  calibrate brightness of objects captured by main telescope
Photos: Fermilab Visual Media Services

8 Drift scan imaging
➲ Telescope is positioned once
➲ Images taken as sky moves past
  Reading of CCD lines synchronized with sky movement
  Exposure time: 55 sec
  Two scans (runs) form a stripe
  5-color columns split into fields: 2048x1489 pixels, 2 B/pixel
  5-color images (+ ~60 attributes)
➲ Output: photometric catalog
  Atlas images, 500+ attributes for each of 100M galaxies, 100M stars, 1M quasars
  Attributes: position, magnitude, size, color, ...
Image: Christoph Flohr, www.driftscan.com (M45, the Pleiades)
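
The field dimensions above fix the raw size of one field. A small sketch of the arithmetic, assuming "2048x1489" is the pixel grid of a single-band frame and "2B/pixel" means two bytes per pixel (constant and function names are illustrative):

```python
FIELD_WIDTH_PX = 2048   # pixels across one field, from the slide
FIELD_HEIGHT_PX = 1489  # pixels along the scan direction, from the slide
BYTES_PER_PIXEL = 2     # "2B/pixel" read as two bytes per pixel
COLOR_BANDS = 5         # one frame per color filter


def field_bytes(bands: int = COLOR_BANDS) -> int:
    """Uncompressed pixel data for one field across the given number of bands."""
    return FIELD_WIDTH_PX * FIELD_HEIGHT_PX * BYTES_PER_PIXEL * bands


print(field_bytes(bands=1))  # bytes in a single-band frame
print(field_bytes())         # bytes in a full 5-color field
```

Under these assumptions a single-band frame is about 6.1 MB and a 5-color field about 30.5 MB of pixel data, before any attributes are added.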

9 Spectroscopic survey
➲ Target specific objects
  automatically chosen from photometric survey
  1M galaxies, 100K stars, 100K quasars
  Up to 5000 spectra collected per night
➲ Classify objects (stars, galaxies, quasars, ...)
  template matching against standard spectra for each object class
  examine spectra for object properties (e.g., chemical composition)
➲ Create 3D map of galaxy distribution
  Measure distance using Doppler shift
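
Measuring distance from Doppler shift can be illustrated with the textbook low-redshift relations z = (λ_obs − λ_rest)/λ_rest, v ≈ cz and d ≈ v/H0. This is a rough sketch and not the survey's pipeline: the Hubble constant of 70 km/s/Mpc is an assumed value, and the approximation only holds for small z.

```python
C_KM_S = 299_792.458  # speed of light in km/s
H0_KM_S_MPC = 70.0    # ASSUMED Hubble constant; not a value from the slides


def redshift(observed_nm: float, rest_nm: float) -> float:
    """Redshift z from the observed and rest wavelength of one spectral line."""
    return (observed_nm - rest_nm) / rest_nm


def hubble_distance_mpc(z: float, h0: float = H0_KM_S_MPC) -> float:
    """Approximate distance in megaparsecs via v = c*z and d = v/H0 (small z only)."""
    return C_KM_S * z / h0


# Example: the H-alpha line (rest wavelength 656.28 nm) observed at 689.094 nm.
z = redshift(689.094, 656.28)
print(round(z, 4), round(hubble_distance_mpc(z), 1))
```

For quasars and other high-redshift objects a cosmological model is needed instead of the linear relation.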

10 Data archives
raw data
  FedEx tapes to Fermilab for processing, reduction
operational archive
  processed data in instrumental form
  perform calibration
  information for target selection
science archive
  object catalog: positions, magnitudes, colors, sizes, radial profiles, classifications, etc. for over 100 million objects
  housekeeping data: calibrations and logs
  atlas images in 5 colors for all identified objects
  one-dimensional spectra of all spectroscopic targets
local archive
  replica of science archive
public archive
  scientifically verified
  recalibrated (if necessary)

11 Typical queries
Q1: Find all galaxies without saturated pixels within 1 arcsecond of a given point in the sky (right ascension and declination).
  spatial lookup
Q2: Find all galaxies with blue surface brightness between 30 and 40, and -10 < super galactic latitude (sgb) < 10, and declination less than zero.
  search for galaxies with a specified blue brightness in a given region of sky
  coordinate system needs translation
Q3: Find all galaxies brighter than magnitude 22, where the local extinction is > 0.75.
  local extinction indicates amount of dust in a given direction (dust masks light)
Q15: Provide a list of moving objects consistent with an asteroid.
  Objects are classified as moving: 5 successive observations from the 5 color bands.
  SQL: select moving object where sqrt((deltax5 - deltax1)^2 + (deltay5 - deltay1)^2) < 2 arcseconds.
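
The Q15 predicate can be sketched outside the database as a plain filter. This applies only the condition as written on the slide, and assumes the input rows are objects already flagged as moving. The `Detection` record and its field names are hypothetical, modeled on the deltax1..deltax5 names in the query.

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:
    """Position offsets (arcseconds) of one object in the first and fifth band.

    Hypothetical record; the real catalog columns may be named differently.
    """
    obj_id: int
    deltax1: float
    deltay1: float
    deltax5: float
    deltay5: float


def displacement_arcsec(d: Detection) -> float:
    """Distance between the band-1 and band-5 offsets, in arcseconds."""
    return math.hypot(d.deltax5 - d.deltax1, d.deltay5 - d.deltay1)


def select_moving(detections, limit_arcsec: float = 2.0):
    """IDs of detections whose displacement is below the 2 arcsecond limit."""
    return [d.obj_id for d in detections if displacement_arcsec(d) < limit_arcsec]


sample = [
    Detection(1, 0.0, 0.0, 0.6, 0.8),  # displacement 1.0 arcsec: kept
    Detection(2, 0.0, 0.0, 3.0, 4.0),  # displacement 5.0 arcsec: rejected
    Detection(3, 0.0, 0.0, 0.0, 1.5),  # displacement 1.5 arcsec: kept
]
print(select_moving(sample))
```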

12 Database design
Original design based on OODB (ObjectivityDB), changed to relational DB (reported in SIGMOD 2002)
Alexander S. Szalay, Jim Gray, Ani R. Thakar, Peter Z. Kunszt, Tanu Malik, Jordan Raddick, Christopher Stoughton, Jan vandenBerg. “The SDSS SkyServer – Public Access to the Sloan Digital Sky Server Data”, SIGMOD 2002
  80 million objects
  5 color images
  target selection
  follow-up on selected targets

13 Schema: photographic objects
PhotoObj: star & galaxy attributes
  records for 80 million objects
  each ~470 attributes (~2KB)
  heavily indexed (“tens of indices”)
  30% of storage space devoted to indices
Field: processing used for objects in field, all frames
Neighbors: computed after the data is loaded
  For every object, list of objects within 1/2 arcminute (~10 objects)
Views
  PhotoPrimary: PhotoObj with mode=1 (best instance of deblended object)
  Stars: PhotoPrimary with type='star'
  Galaxies: PhotoPrimary with type='galaxy'
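
The layering of views over PhotoObj can be tried out with Python's built-in sqlite3 module. This is a toy sketch rather than the SkyServer schema: it assumes both Stars and Galaxies are defined over PhotoPrimary, keeps only three of the ~470 attributes, stores `type` as text for readability, and runs on SQLite rather than SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Toy stand-in for PhotoObj: three columns instead of ~470 attributes.
    CREATE TABLE PhotoObj (objID INTEGER PRIMARY KEY, mode INTEGER, type TEXT);
    -- mode = 1 marks the best instance of a deblended object.
    CREATE VIEW PhotoPrimary AS SELECT * FROM PhotoObj WHERE mode = 1;
    CREATE VIEW Stars AS SELECT * FROM PhotoPrimary WHERE type = 'star';
    CREATE VIEW Galaxies AS SELECT * FROM PhotoPrimary WHERE type = 'galaxy';
    """
)
conn.executemany(
    "INSERT INTO PhotoObj VALUES (?, ?, ?)",
    [(1, 1, "star"), (2, 1, "galaxy"), (3, 2, "galaxy"), (4, 1, "galaxy")],
)

star_ids = [row[0] for row in conn.execute("SELECT objID FROM Stars ORDER BY objID")]
galaxy_ids = [row[0] for row in conn.execute("SELECT objID FROM Galaxies ORDER BY objID")]
print(star_ids, galaxy_ids)
```

Object 3 is a galaxy with mode 2, so it appears in PhotoObj but in neither view.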

14 Spatial Data Access
Coordinate systems
  right ascension and declination
  hierarchical triangular mesh (HTM): recursive partitioning of celestial sphere
HTM recursively assigns a number to each point on the sphere
Recursion 20 levels deep: smallest triangles < 0.1 arcsecond on a side
HTM index is built as an extension of SQL Server’s B-trees
Spatial queries use the HTM index to limit searches to a small set of triangles
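
The recursive numbering can be sketched as follows: start from the eight faces of an octahedron, split the containing triangle into four at its edge midpoints, and append two bits per level. The vertex order and numbering below follow the commonly described HTM scheme but have not been checked against the SDSS library, so treat the resulting IDs as illustrative only.

```python
import math


def unit_vector(ra_deg, dec_deg):
    """Convert right ascension and declination (degrees) to a unit vector."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _midpoint(a, b):
    """Midpoint of a great-circle edge, pushed back onto the unit sphere."""
    s = (a[0] + b[0], a[1] + b[1], a[2] + b[2])
    n = math.sqrt(_dot(s, s))
    return (s[0] / n, s[1] / n, s[2] / n)


def _contains(tri, p, eps=-1e-12):
    """True if p lies inside the counterclockwise spherical triangle tri."""
    a, b, c = tri
    return (_dot(_cross(a, b), p) >= eps and _dot(_cross(b, c), p) >= eps
            and _dot(_cross(c, a), p) >= eps)


V = [(0, 0, 1), (1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0), (0, 0, -1)]
# Root triangles S0..S3 then N0..N3, numbered 8..15 in this sketch.
ROOTS = [(V[1], V[5], V[2]), (V[2], V[5], V[3]), (V[3], V[5], V[4]), (V[4], V[5], V[1]),
         (V[1], V[0], V[4]), (V[4], V[0], V[3]), (V[3], V[0], V[2]), (V[2], V[0], V[1])]


def htm_id(ra_deg, dec_deg, depth=20):
    """Integer ID of the depth-level triangle containing the given sky position."""
    p = unit_vector(ra_deg, dec_deg)
    for index, tri in enumerate(ROOTS):
        if _contains(tri, p):
            break
    else:
        raise ValueError("point not found in any root triangle")
    hid = 8 + index
    for _ in range(depth):
        v0, v1, v2 = tri
        w0, w1, w2 = _midpoint(v1, v2), _midpoint(v0, v2), _midpoint(v0, v1)
        corners = [(v0, w2, w1), (v1, w0, w2), (v2, w1, w0)]
        for child_no, child in enumerate(corners):
            if _contains(child, p):
                tri = child
                break
        else:
            # Not in any corner triangle, so it is in the central one.
            child_no, tri = 3, (w0, w1, w2)
        hid = hid * 4 + child_no
    return hid


print(htm_id(185.0, -0.5, depth=5), htm_id(185.0, -0.5))
```

Because each level appends two bits, the ID of a coarse triangle is a prefix of the IDs of every triangle inside it. That prefix property is what lets a range scan over an ordinary B-tree index return all objects in a triangle.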

15 Thoughts on server architecture
➲ Use commodity servers and storage
  Processor, memory costs 10x lower than high end
  Storage cost 3x lower
  Deploy as much processing as one can afford
➲ Partition data spatially
  Repartition as servers added, removed
➲ Replicate high traffic data
➲ Exploit parallelism
➲ Deploy as network service initially

16 SkyServer
SDSS DR1 is about 900GB (3.4B rows)
SkyServer cluster
➲ Web front ends (3)
  Hardware: Dell PowerEdge 1750 servers, 2GB memory, dual Gbit Ethernet, 2 36GB Ultra320 SCSI disks, RAID1
  Software: Windows Server 2003, IIS 6.0, Microsoft Network Load Balancing
➲ Database servers (3)
  1 DB server: short queries on the public website
  2 DB servers: longer queries for registered users, failover
  Hardware: Dell 4600 database servers, 4GB memory, 1.2 TB of 10k rpm Ultra SCSI drives, 4 drives/SCSI channel, RAID0
  Software: Windows Server 2003 and SQL Server 2000
  Data rates: 400MBps (simple query), 160-200 Mbps (typical multi user load)
➲ Log server (1)
  same configuration as DB server?
all back-ends on private network
http://skyserver.sdss.org
Major tables, records and sizes. Indices double the storage. (SIGMOD 2002)
  Table          Records  Bytes
  Field          14k      60MB
  Frame          73k      6GB
  PhotoObj       14m      31GB
  Profile        14m      9GB
  Neighbors      111m     5GB
  Plate          98       80KB
  SpecObj        63k      1GB
  SpecLine       1.7m     225MB
  SpecLineIndex  1.8m     142MB
  xcRedShift     1.9m     157MB
  elRedShift     51k      3MB
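
The table sizes can be cross-checked with a few lines of arithmetic. This sketch assumes decimal units (1 MB = 10^6 bytes) and treats the rounded record counts as exact, so the results are approximate.

```python
KB, MB, GB = 1_000, 1_000_000, 1_000_000_000  # decimal units assumed

# (records, bytes) per table, transcribed from the SIGMOD 2002 table above.
TABLES = {
    "Field": (14_000, 60 * MB),
    "Frame": (73_000, 6 * GB),
    "PhotoObj": (14_000_000, 31 * GB),
    "Profile": (14_000_000, 9 * GB),
    "Neighbors": (111_000_000, 5 * GB),
    "Plate": (98, 80 * KB),
    "SpecObj": (63_000, 1 * GB),
    "SpecLine": (1_700_000, 225 * MB),
    "SpecLineIndex": (1_800_000, 142 * MB),
    "xcRedShift": (1_900_000, 157 * MB),
    "elRedShift": (51_000, 3 * MB),
}


def total_bytes(tables=TABLES) -> int:
    """Sum of the table sizes, excluding indices."""
    return sum(size for _, size in tables.values())


def bytes_per_row(name: str) -> float:
    """Average row size for one table."""
    rows, size = TABLES[name]
    return size / rows


print(total_bytes() / GB)                # table data in GB
print(2 * total_bytes() / GB)            # with indices doubling the storage
print(round(bytes_per_row("PhotoObj")))  # average PhotoObj row in bytes
```

The tables add up to roughly 52.6 GB, or about 105 GB once indices double the storage. PhotoObj works out to a little over 2.2 KB per row, in line with the ~2KB per object quoted on the schema slide.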

