2 The Key Players
Maria Nieto-Santisteban (JHU)
Ani Thakar (JHU)
Alex Szalay (JHU)
Jim Gray (Microsoft)
Catherine van Ingen (Microsoft)
3 What is Pan-STARRS?
Pan-STARRS is a new telescope facility:
Four smallish (1.8 m) telescopes, but with an extremely wide field of view
Can scan the sky rapidly and repeatedly, and can detect very faint objects
Unique time-resolution capability
The project was started by the IfA with help from the Air Force, the Maui High Performance Computing Center, MIT's Lincoln Laboratory, and Science Applications International Corporation (SAIC). SAIC has since dropped out, and the JHU database team has joined.
5 The PS1 Prototype – Walk before you run!
Pan-STARRS pushes four areas of technology: a wide-field imaging telescope, a large-format CCD mosaic camera, a high-throughput image processing pipeline, and a data-intensive database server.
We were advised to build a functional prototype, PS1, to test and integrate these new approaches.
The prototype, PS1, is now nearing operational readiness on Haleakala, Maui.
6 The PS1 Science Consortium
University of Hawaii, Institute for Astronomy
Max Planck Society, Institutes in Garching & Heidelberg
Harvard-Smithsonian Center for Astrophysics
Las Cumbres Observatory Global Telescope Network
Johns Hopkins University, Department of Physics and Astronomy
University of Edinburgh, Institute for Astronomy
Durham University, Extragalactic Astronomy & Cosmology Research Group
Queen's University Belfast, Astrophysics Research Centre
National Central University, Taiwan
7 PS1 Key Science Projects
Population of objects in the inner solar system
Population of objects in the outer solar system (beyond Jupiter)
Low-mass stars, brown dwarfs, & young stellar objects
Search for exoplanets by stellar transits
Structure of the Milky Way and the Local Group
Dedicated deep survey of M31
Massive stars and SN progenitors
Cosmology investigations with variables and explosive transients
Galaxy properties
Active galactic nuclei and high-redshift quasars
Cosmological lensing
Large-scale structure
8 PS1 Observatory on Haleakala
Telescope and camera are operational under either interactive or queue control.
9 1.4 Gigapixel Camera Assembly with L3 Corrector Lens as Dewar Window
13 Astronomy Is Happening Now!
The project has not yet reached its Operational Readiness Review (November 2008), but data taken with PS1 and processed through the system have already been used to:
Discover brown dwarf candidates
Discover new asteroids
Monitor one of the medium-deep target fields for supernovae
14 What is the PSPS?
The Published Science Products Subsystem of Pan-STARRS will:
Provide access to the data products generated by the Pan-STARRS telescopes and data reduction pipelines
Provide a data archive for the Pan-STARRS data products
Provide adequate security to protect the integrity of the Pan-STARRS data products and to protect the operational systems from malicious attacks
15 PSPS Design Driving Requirements
Hold over 1.5×10^11 detections and their supporting metadata for ~5.5×10^9 objects
Support ~100 TB of disk storage on hardware that is >99% reliable
Serve as an archive for the Pan-STARRS data products
Provide security for the data stored within the system, against both accidental and intentional actions
Provide users with access to the data stored in the system and with the ability to search it
Hold sufficient metadata to allow users to determine the observational legacy and processing history of the Pan-STARRS data products
The PSPS baseline configuration should accommodate future additions of databases (i.e., be expandable)
16 What is the PSPS? From the PS1 System View
The PS1 PSPS is responsible for managing the catalogs of digital data; it will not receive image files, which are retained by the IPP.
Three significant PS1 I/O threads:
Ingest of detections and initial celestial-object data from the IPP
Ingest of moving-object data from MOPS
User queries of detection/object data records
17 What is the PSPS? From the PS1 PSPS View
Web Based Interface – the "link" with the human
Data Retrieval Layer – the "gate-keeper" of the data collections
PS1 data collection managers:
Object Data Manager
Solar System Data Manager
Other (future/PS4) data collection managers, e.g.:
"Postage stamp" cutouts
Metadata database (vice attributes managed in the PS1 ODM)
Cumulative sky image server
Filtered transient database (or other special clients)
[Block diagram: the WBI and other software clients connect through the DRL to the ODM, SSDM, and other DMs; the IPP and MOPS feed the data managers.]
19 PSPS Components Overview / Terminology
DRL: Data Retrieval Layer
Software clients, not humans, are PDCs
Connects to DMs
PDC: Published Data Client
WBI: Web Based Interface
External PDCs (non-PSPS)
DM: Data Manager (generic)
ODM: Object Data Manager
SSDM: Solar System Data Manager
25 Detailed Design
Reuse SDSS software as much as possible:
Data Transformation Layer (DX) – interface to the IPP
Data Loading Pipeline (DLP)
Data Storage (DS):
Schema and test queries
Database management system
Scalable data architecture
Hardware
Query Manager (QM: CasJobs for the prototype)
26 Data Storage – DBMS
Microsoft SQL Server 2005
Relational DBMS with an excellent query optimizer
Plus:
Spherical/HTM (C# library + SQL glue)
Spatial index (Hierarchical Triangular Mesh)
Zones (SQL library)
Alternate spatial decomposition with declination zones (sketched below)
Many stored procedures and functions
From coordinate conversions to neighbor-search functions
Self-extracting documentation (metadata) and diagnostics
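To make the Zones idea concrete, here is a minimal T-SQL sketch of a declination-zone neighbor (cone) search in the style of the SDSS zones library; the HTM library offers an alternative spatial filter. The table and column names (ZoneIndex, objID, ra, dec, zoneID), the 30-arcsecond zone height, and the search parameters are illustrative assumptions, not the actual PS1 ODM schema.

```sql
-- Sketch: declination-zone neighbor (cone) search, SDSS "Zones" style.
-- Table/column names and the 30-arcsec zone height are assumed, not PS1's.
DECLARE @ra float, @dec float, @r float, @zoneHeight float;
SET @ra  = 185.0;               -- search center RA (deg)
SET @dec = 12.5;                -- search center Dec (deg)
SET @r   = 30.0 / 3600.0;       -- search radius: 30 arcsec in degrees
SET @zoneHeight = 30.0 / 3600.0;

SELECT z.objID, z.ra, z.dec
FROM   ZoneIndex AS z
-- coarse filter: only the few zones (dec stripes) the circle can touch
WHERE  z.zoneID BETWEEN FLOOR((@dec - @r + 90.0) / @zoneHeight)
                    AND FLOOR((@dec + @r + 90.0) / @zoneHeight)
  AND  z.dec BETWEEN @dec - @r AND @dec + @r
  AND  z.ra  BETWEEN @ra - @r / COS(RADIANS(@dec))
                 AND @ra + @r / COS(RADIANS(@dec))
  -- exact angular-distance (haversine) cut after the cheap box filter
  AND  2.0 * DEGREES(ASIN(SQRT(
         POWER(SIN(RADIANS(z.dec - @dec) / 2.0), 2)
       + COS(RADIANS(z.dec)) * COS(RADIANS(@dec))
       * POWER(SIN(RADIANS(z.ra - @ra) / 2.0), 2)))) < @r;
```

The zoneID predicate turns an all-sky search into a scan of a handful of narrow declination stripes, which is what makes neighbor and cross-match queries index-friendly.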
27 Data Storage – Scalable Architecture
A monolithic database design (à la SDSS) will not do it
SQL Server does not have a cluster implementation, so we do it by hand
Partitions vs. slices:
Partitions are file-groups on the same server; they parallelize disk accesses on the same machine (see the sketch below)
Slices are data partitions on separate servers
We use both!
Additional slices can be added for scale-out
For PS1, we use SQL Server Distributed Partitioned Views (DPVs)
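A minimal sketch of the "partitions are file-groups on the same server" idea, assuming SQL Server 2005 table partitioning; the database, file, path, and table names and the single boundary value are assumptions for illustration only.

```sql
-- Sketch: file-group partitioning on one server (SQL Server 2005).
-- Database, file, path, and table names are illustrative assumptions.
ALTER DATABASE ODM_Slice01 ADD FILEGROUP fgDetections1;
ALTER DATABASE ODM_Slice01 ADD FILEGROUP fgDetections2;

ALTER DATABASE ODM_Slice01 ADD FILE
  (NAME = det1, FILENAME = 'D:\data\det1.ndf', SIZE = 100GB)
  TO FILEGROUP fgDetections1;
ALTER DATABASE ODM_Slice01 ADD FILE
  (NAME = det2, FILENAME = 'E:\data\det2.ndf', SIZE = 100GB)
  TO FILEGROUP fgDetections2;

-- Range-partition detections by objID so the two file-groups
-- (on different disk volumes) can be scanned in parallel.
CREATE PARTITION FUNCTION pfObjID (bigint)
  AS RANGE RIGHT FOR VALUES (50000000000);
CREATE PARTITION SCHEME psObjID
  AS PARTITION pfObjID TO (fgDetections1, fgDetections2);

CREATE TABLE Detection (
  detectID bigint NOT NULL,
  objID    bigint NOT NULL,
  ra float, dec float,
  CONSTRAINT pk_Detection PRIMARY KEY (objID, detectID)
) ON psObjID (objID);
```

Because the two file-groups sit on different disk volumes, a scan of the Detection table can drive both sets of spindles at once.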
28 Distributed Architecture
The bigger tables will be spatially partitioned across servers called slices
Using slices improves system scalability
Tables are sliced into ranges of ObjectID, which correspond to broad declination ranges
ObjectID boundaries are selected so that each slice has a similar number of objects (see the boundary-selection sketch below)
Distributed Partitioned Views "glue" the data together
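One way to choose ObjectID boundaries with roughly equal object counts is sketched below, assuming an Object table keyed by objID and a target of 16 slices (both assumptions). Because ObjectID encodes position, equal-count buckets in objID order come out as broad declination bands; in practice one would compute the boundaries from zone counts or a sample rather than ranking every object.

```sql
-- Sketch: pick ObjectID slice boundaries so each of 16 slices holds about
-- the same number of objects. "Object"/"objID" and the slice count are assumed.
SELECT bucket,
       MAX(objID) AS upperBoundObjID,   -- boundary handed to the slice CHECKs
       COUNT(*)   AS objectsInSlice
FROM (
    SELECT objID, NTILE(16) OVER (ORDER BY objID) AS bucket
    FROM Object
) AS t
GROUP BY bucket
ORDER BY bucket;
```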
29 Distributed Partitioned Views
Tables participating in a Distributed Partitioned View (DPV) reside in different databases, which in turn reside on different instances or different (linked) servers.
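A minimal DPV sketch, assuming two slice servers already registered as linked servers; all server, database, table, and range values are illustrative, not the PS1 deployment.

```sql
-- Sketch of a Distributed Partitioned View over two linked slice servers.
-- Server/database/table names and ObjectID ranges are assumptions.

-- On Slice01, the member table carries a CHECK constraint on its range:
--   CREATE TABLE Detection_01 (
--     objID bigint NOT NULL CHECK (objID BETWEEN 0 AND 49999999999),
--     detectID bigint NOT NULL, ra float, dec float,
--     PRIMARY KEY (objID, detectID));
-- On Slice02, the same table covers the next range:
--   CREATE TABLE Detection_02 (
--     objID bigint NOT NULL CHECK (objID BETWEEN 50000000000 AND 99999999999),
--     detectID bigint NOT NULL, ra float, dec float,
--     PRIMARY KEY (objID, detectID));

-- On the head node, the DPV glues the member tables into one logical table:
CREATE VIEW Detection AS
  SELECT * FROM Slice01.PS1.dbo.Detection_01
  UNION ALL
  SELECT * FROM Slice02.PS1.dbo.Detection_02;
```

Because the CHECK constraints partition the objID space, a query such as SELECT COUNT(*) FROM Detection WHERE objID BETWEEN 10000000000 AND 20000000000 is routed only to the slice whose range overlaps the predicate; the other slices are never touched.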
30 Adding New Types of Data in the ODM
Because of the interaction between our logical and physical schema, we do not consider it prudent to add new types of data to the ODM arbitrarily.
One area where expansion does fit naturally into our design is the addition of new filters. These can accommodate new detections (perhaps not even coming from Pan-STARRS) that cover all or part (e.g., the Medium Deep Survey fields) of the sky. This would allow observations from other sources (e.g., the GALEX Extended Mission, the Spitzer Warm Mission, UKIRT, CFHT), ranging from the far ultraviolet to the far infrared, to be included in the data tables, provided the data are formatted consistently with the ODM logical schema.
31 Client Databases
Client databases can be either:
Standalone databases attached to the DRL (as shown in the earlier slide), or
MyDB instances attached to the ODM internal network. These are SQL Server databases with:
Ownership by individuals, groups, or key projects/science clients
Unidirectional (ODM-to-MyDB) write privilege
Bidirectional read privilege
Table access that can be defined at the user, group, or world level, allowing selected export of results
The ability to load data into the MyDB from outside the ODM
(A typical MyDB usage pattern is sketched below.)
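A sketch of the CasJobs-style MyDB pattern that the Query Manager prototype reuses from SDSS: a long-running query writes its result set into a table in the user's MyDB instead of streaming rows back to the client. The source table and column names (Object, gMag, rMag) and the result-table name are assumptions, not the PS1 schema.

```sql
-- Sketch: write query results into the user's MyDB (CasJobs idiom).
-- Source table/columns and the result-table name are assumptions.
SELECT objID, ra, dec, gMag, rMag
INTO   mydb.BrightRedCandidates
FROM   Object
WHERE  gMag - rMag > 1.4
  AND  rMag < 19.0;

-- Later, the user refines or exports the saved table from MyDB:
SELECT TOP 100 * FROM mydb.BrightRedCandidates ORDER BY rMag;
```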
32 Some Lessons Learned
"GrayWulf: Scalable Cluster Architecture for Data Intensive Computing," submitted to the HICSS-09 conference.
Big databases are not created equal: user query patterns will dictate the data storage model and architecture.
"When" matters: PS1 has to do things with today's technology and cannot count on Moore's law. This also affects how much data you will have to deal with.
33 Some Lessons Learned
Resources are accessed by:
End users, who perform analyses on the shared database
Data valets, who maintain the shared databases
Operators, who maintain compute & storage
The Approach:
"20 queries" capture the science interests
But which set of 20 queries? Not all users will want to access the tables in the same way. However, there are clear patterns of queries that are common to all users, and we have designed the system to implement them.
34 Some Lessons Learned (continued)
The Approach:
"20 queries" capture the science interests
Divide & Conquer determines the partitioning
This is an area where our team has spent a great deal of effort. There are many possibilities available, and it is unclear which is best. We have decided on a model with objects held in the main database, and detections and copies of some smaller tables in the slices. OK, then how do you choose to partition? What RAID model?
35 Some Lessons Learned (continued)
The Approach:
"20 queries" capture the science interests
Divide & Conquer determines the partitioning
Faults happen: handling must be designed into all data-valet processes
This is a second area that has involved a great deal of design effort. In SDSS, much of the workflow monitoring and error handling occurred in the loading phase, but the PS1 ODM will be loading all the time. We expect the most potential problems in the load/merge process!
We are taking a "Sunny, Sticky, and Cloudy day" approach to the testing and error-handling implementation. Ultimately, real data will define the "Rainy day" case; hopefully it won't be a Cat 5 hurricane!