
1

2 The Key Players
- Maria Nieto-Santisteban (JHU)
- Ani Thakar (JHU)
- Alex Szalay (JHU)
- Jim Gray (Microsoft)
- Catherine van Ingen (Microsoft)

3 What is Pan-STARRS?
- Pan-STARRS - a new telescope facility
- 4 smallish (1.8 m) telescopes, but with extremely wide field of view
- Can scan the sky rapidly and repeatedly, and can detect very faint objects
- Unique time-resolution capability
- Project was started by IfA with help from Air Force, Maui High Performance Computer Center, MIT’s Lincoln Lab and Science Applications International Corp. SAIC has dropped out & the JHU database team has joined.

4 The PS-4 Telescope Array Concept

5 The PS1 Prototype – Walk before you run!
- Pan-STARRS pushes 4 areas of technology: wide-field imaging telescope, large format CCD mosaic camera, high throughput image processing pipeline, & data-intensive database server.
- We were advised to build a functional prototype, PS1, to test and integrate these new approaches.
- The prototype, PS1, is now nearing operational readiness on Haleakala, Maui.

6 The PS1 Science Consortium
- University of Hawaii, Institute for Astronomy
- Max Planck Society, Institutes in Garching & Heidelberg
- Harvard-Smithsonian Center for Astrophysics
- Las Cumbres Observatory Global Telescope Network
- Johns Hopkins University, Department of Physics and Astronomy
- University of Edinburgh, Institute of Astronomy
- Durham University, Extragalactic Astronomy & Cosmology Research Group
- Queen’s University Belfast, Astrophysics Research Center
- National Central University, Taiwan

7 PS1 Key Science Projects
- Population of objects in the inner solar system
- Population of objects in the outer solar system (beyond Jupiter)
- Low mass stars, brown dwarfs, & young stellar objects
- Search for exo-planets by stellar transits
- Structure of the Milky Way and Local Group
- Dedicated deep survey of M31
- Massive stars and SN progenitors
- Cosmology investigations with variables and explosive transients
- Galaxy properties
- Active galactic nuclei and high redshift quasars
- Cosmological lensing
- Large scale structure

8 PS1 Observatory on Haleakala Telescope and Camera operational by interactive or queue control

9 1.4 Gigapixel Camera Assembly with L3 Corrector Lens as Dewar Window

10 Gibbous Moon, 1 millisec exposure

11 M31 Poster at the January 2008 AAS Meeting

12 M51

13 Astronomy Is Happening Now!
The project has not yet reached the Operational Readiness Review (November 2008), but data taken with PS1 and processed through the system have been used to:
- Discover brown dwarf candidates
- Discover new asteroids
- Monitor one of the medium deep target fields for supernovae.

14 What is the PSPS?
The Published Science Products Subsystem of Pan-STARRS will:
- Provide access to the data products generated by the Pan-STARRS telescopes and data reduction pipelines
- Provide a data archive for the Pan-STARRS data products
- Provide adequate security to protect the integrity of the Pan-STARRS data products & protect the operational systems from malicious attacks.

15 PSPS Design Driving Requirements
- Hold over 1.5x10^11 detections and their supporting metadata for ~5.5x10^9 objects.
- Support ~100 TBytes of disk storage on hardware that is > 99% reliable
- Serve as an archive for the Pan-STARRS data products
- Provide security for the data stored within the system, both against accidental and intentional actions.
- Provide users access to the data stored in the system, and the ability to search it.
- Hold sufficient metadata to allow users to determine the observational legacy and processing history of the Pan-STARRS data products.
- The PSPS baseline configuration should accommodate future additions of databases (i.e., be expandable).

16 What is PSPS? From the PS1 System View
- PS1 PSPS will not receive image files, which are retained by IPP
- Responsible for managing the catalogs of digital data
- Three significant PS1 I/O threads:
  - Ingest of detections and initial celestial object data from IPP
  - Ingest of moving object data from MOPS
  - User queries of detection/object data records

17 What is PSPS? From the PS1 PSPS View
- Web Based Interface – the “link” with the human
- Data Retrieval Layer – the “gate-keeper” of the data collections
- PS1 data collection managers
  - Object Data Manager
  - Solar System Data Manager
- Other (future/PS4) data collection managers; e.g.,
  - “Postage stamp” cutouts
  - Metadata database (vice attributes managed in PS1 ODM)
  - Cumulative sky image server
  - Filtered transient database (or other special clients)
[Diagram labels: Human, WBI, Other S/W Client, DRL, ODM, SSDM, Other DM, IPP, MOPS]

18

19 PSPS Components Overview/Terminology
- DRL: Data Retrieval Layer (connects to DMs)
- PDC: Published Data Client (software clients, not humans, are PDCs)
- WBI: Web Based Interface
- External PDCs (non-PSPS)
- DM: Data Manager (generic)
- ODM: Object Data Manager
- SSDM: Solar System Data Manager

20 Prototype ODM Structure

21 ODM Components
- Query Manager (QM)
- Workflow Manager (WFM)
- Cluster Manager (CLM)
- PS1 ODM Database
- Performance Monitor

22

23

24 PS1 Schema Relationships

25 Detailed Design
- Reuse SDSS software as much as possible
- Data Transformation Layer (DX) – Interface to IPP
- Data Loading Pipeline (DLP)
- Data Storage (DS)
  - Schema and Test Queries
  - Database Management System
  - Scalable Data Architecture
  - Hardware
- Query Manager (QM: CasJobs for prototype)

26 Data Storage – DBMS
- Microsoft SQL Server 2005
  - Relational DBMS with excellent query optimizer
- Plus
  - Spherical/HTM (C# library + SQL glue)
    - Spatial index (Hierarchical Triangular Mesh)
  - Zones (SQL library)
    - Alternate spatial decomposition with dec zones
  - Many stored procedures and functions
    - From coordinate conversions to neighbor search functions
  - Self-extracting documentation (metadata) and diagnostics
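The "dec zones" idea on this slide can be illustrated with a small sketch. It assumes the usual formulation in which the sky is cut into horizontal declination stripes of fixed height and each object is tagged with the integer number of its stripe. The zone height of 0.5 degrees and the function names are illustrative assumptions, not values from the slides, and the real library is SQL, not Python.

```python
import math

ZONE_HEIGHT_DEG = 0.5  # assumed stripe height; not taken from the slides


def zone_id(dec_deg, zone_height=ZONE_HEIGHT_DEG):
    """Map a declination in [-90, +90] degrees to an integer zone number."""
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("declination must be within [-90, 90] degrees")
    return int(math.floor((dec_deg + 90.0) / zone_height))


def candidate_zones(dec_deg, radius_deg, zone_height=ZONE_HEIGHT_DEG):
    """Zones that a neighbor search of the given radius has to inspect."""
    lowest = zone_id(max(dec_deg - radius_deg, -90.0), zone_height)
    highest = zone_id(min(dec_deg + radius_deg, 90.0), zone_height)
    return range(lowest, highest + 1)
```

A neighbor search then only needs to look at rows whose zone number falls in `candidate_zones(...)`, which an ordinary index on the zone column can serve before any exact distance test is applied.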

27 Data Storage – Scalable Architecture
- Monolithic database design (a la SDSS) will not do it
- SQL Server does not have a cluster implementation
  - Do it by hand
- Partitions vs Slices
  - Partitions are file-groups on the same server
    - Parallelize disk accesses on the same machine
  - Slices are data partitions on separate servers
  - We use both!
- Additional slices can be added for scale-out
- For PS1, use SQL Server Distributed Partitioned Views (DPVs)

28 Distributed Architecture
- The bigger tables will be spatially partitioned across servers called Slices
- Using slices improves system scalability
- Tables are sliced into ranges of ObjectID, which correspond to broad declination ranges
- ObjectID boundaries are selected so that each slice has a similar number of objects
- Distributed Partitioned Views “glue” the data together
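One way to picture the boundary selection described on this slide is as equal-count cut points over the sorted ObjectIDs. The sketch below is an assumption about how such boundaries could be chosen, not the ODM's actual procedure; the function names are hypothetical and ObjectIDs are assumed to be unique.

```python
from bisect import bisect_right


def choose_boundaries(object_ids, n_slices):
    """Pick n_slices - 1 ObjectID cut points so slices hold similar counts.

    Each returned boundary is the first ObjectID of the next slice.
    """
    ids = sorted(object_ids)
    if n_slices < 1 or len(ids) < n_slices:
        raise ValueError("need at least one object per slice")
    return [ids[(k * len(ids)) // n_slices] for k in range(1, n_slices)]


def slice_for(object_id, boundaries):
    """Return the 0-based slice index that holds the given ObjectID."""
    return bisect_right(boundaries, object_id)
```

With twelve IDs from 100 to 111 and three slices, the cut points come out as 104 and 108, so each slice receives four objects.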

29 Distributed Partitioned Views
- Tables participating in the Distributed Partitioned View (DPV) reside in different databases, which reside on different instances or different (linked) servers
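The shape of a partitioned view can be shown with a minimal, single-machine analogue. The sketch uses Python's built-in `sqlite3` purely for portability: two tables in one in-memory database stand in for two slices, each with a CHECK constraint on its ObjectID range, and a UNION ALL view presents them as one table. In SQL Server the member tables would sit on linked servers; the table names, column names, and the boundary value 1000 are hypothetical.

```python
import sqlite3


def build_sliced_db():
    """Create two range-constrained 'slice' tables and a view over both."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE detections_slice1 (
            objID INTEGER NOT NULL CHECK (objID < 1000),
            mag   REAL
        );
        CREATE TABLE detections_slice2 (
            objID INTEGER NOT NULL CHECK (objID >= 1000),
            mag   REAL
        );
        -- The view "glues" the member tables together for queries.
        CREATE VIEW detections AS
            SELECT objID, mag FROM detections_slice1
            UNION ALL
            SELECT objID, mag FROM detections_slice2;
        """
    )
    conn.executemany(
        "INSERT INTO detections_slice1 VALUES (?, ?)",
        [(10, 18.2), (999, 21.0)],
    )
    conn.executemany(
        "INSERT INTO detections_slice2 VALUES (?, ?)",
        [(1000, 19.5)],
    )
    return conn


conn = build_sliced_db()
rows = conn.execute("SELECT objID FROM detections ORDER BY objID").fetchall()
```

The CHECK constraints keep the ranges disjoint, which is what lets a database engine decide which member table can contain a given ObjectID.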

30 Adding New Types of Data in the ODM
- Because of the interaction between our logical and physical schema, we do not consider it prudent to arbitrarily add new types of data to the ODM.
- One area where expansion does fit naturally into our design is the addition of new filters. These can accommodate new detections (perhaps not even coming from Pan-STARRS) that cover all or part (e.g., Medium Deep Survey fields) of the sky. This would allow the data tables to include observations from other sources (e.g., GALEX Extended Mission, Spitzer Warm Mission, UKIRT, CFHT) that range from the far ultraviolet to the far infrared, provided the data are formatted consistently with the ODM logical schema.

31 Client Databases
- Client databases can be either
  - Standalone databases attached to the DRL (as shown in the earlier slide)
  - MyDB instances attached to the ODM internal network. These are SQL Server databases with
    - Ownership by individuals, groups, or key projects/science clients
    - Unidirectional (ODM to MyDB) write privilege
    - Bidirectional read privilege
    - Table access which can be defined at the user, group, or world level, allowing selected export of results
    - The ability to load data into the MyDB from outside the ODM
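The user, group, or world table access listed for MyDB can be sketched as a small read-permission check. This is only an illustration of the three access levels named on the slide; the class, field, and function names are hypothetical, and the slides do not describe how the permission model is actually implemented.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    USER = "user"    # readable by the owner only
    GROUP = "group"  # readable by members of the listed groups
    WORLD = "world"  # readable by everyone


@dataclass(frozen=True)
class MyDBTable:
    name: str
    owner: str
    access: Access = Access.USER
    groups: frozenset = frozenset()


def can_read(table, user, user_groups=frozenset()):
    """Return True if the user may read the table at its access level."""
    if user == table.owner or table.access is Access.WORLD:
        return True
    if table.access is Access.GROUP:
        return bool(table.groups & frozenset(user_groups))
    return False
```

For example, a table shared at the group level with a hypothetical "supernova" group is readable by its owner and by any user in that group, but not by other users.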

32 Some Lessons Learned
- “GrayWulf: Scalable Cluster Architecture for Data Intensive Computing” submitted to HICSS-09 conference.
- Big databases are not created equal -- user query patterns will dictate the data storage model/architecture.
- “When” matters -- PS1 has to do things with today’s technology & can’t count on Moore’s law. This also will affect how much data you’ll have to deal with.

33 Some Lessons Learned
- Resources are accessed by
  - End users who perform analyses on shared database
  - Data valets who maintain shared databases
  - Operators who maintain compute & storage
- The Approach
  - “20 queries” capture science interests
But which set of 20 queries? Not all users will want to access the tables in the same way. However, there are clear patterns of queries that are common to all users and we have designed to implement them.

34 Some Lessons Learned
- Resources are accessed by
  - End users who perform analyses on shared database
  - Data valets who maintain shared databases
  - Operators who maintain compute & storage
- The Approach
  - “20 queries” capture science interests
  - Divide & Conquer determines partitioning
This is an area where our team has spent a great deal of effort. There are many possibilities available and it’s unclear which is the best. We’ve decided on a model with objects held in the main database and detections and copies of some smaller tables in the slices. OK, then how do you choose to partition? What RAID model?

35 Some Lessons Learned
- Resources are accessed by
  - End users who perform analyses on shared database
  - Data valets who maintain shared databases
  - Operators who maintain compute & storage
- The Approach
  - “20 queries” capture science interests
  - Divide & Conquer determines partitioning
  - Faults Happen – handling must be designed into all data valet processes
This is a second area that has involved a great deal of design effort. In SDSS much of the workflow monitoring and error handling occurred in the loading phase, but the PS1 ODM will be loading all the time. We expect the most potential problems in the load/merge process! We’re taking a Sunny, Sticky, and Cloudy day approach to the testing and error handling implementation. Ultimately real data will define the Rainy day case. Hopefully it won’t be a Cat 5 hurricane!

36 And Finally

