
1 December 11, 2006 DES DM - Mohr
The DES DM Team
Tanweer Alam (1), Dora Cai (1), Joe Mohr (1,2), Jim Annis (3), Greg Daues (1), Choong Ngeow (2), Wayne Barkhouse (2), Patrick Duda (1), Ray Plante (1), Cristina Beldica (1), Huan Lin (3), Douglas Tucker (3)
Affiliations: (1) NCSA, (2) UIUC Astronomy, (3) Fermilab
Role labels on the slide: Astronomers; Grid Computing, Middleware, Portals; Database development, maintenance, Archive web portal; NVO lead at NCSA; Senior Developer
Oversight Group: Randy Butler, Mike Freemon, and Jay Alameda (NCSA)

2 Architecture Overview
Components: Pipelines, Archive, Portals
Development: 30 FTE-yrs total
Current status: 13 FTE-yrs to date

3 Where are we today?
Iterative/Spiral Development
- Oct '04-Sep '05: initial design and development
  - basic image reduction, cataloguing, catalog and image archive, etc
- Oct '05-Jan '06: DC1 = deployed DES DM system v1
  - Used Teragrid to reduce 700GB of simulated raw data [Fermilab] into 5TB of images, weight maps, bad pixel maps, catalogs
  - Catalogued, ingested and calibrated 50M objects
- Feb '06-Sep '06: refine & develop
  - full science processing through coaddition, greater automation, ingestion from HPC platforms, quality assurance, etc
- Oct '06-Jan '07: DC2 = deploy DES DM system v2
  - Use NCSA and SDSC Teragrid platforms to process 500 deg^2 in griz with 4 layers of imaging in each (equiv to 20% of SDSS imaging dataset, 350M objects)
  - Use DES DM system on workstation to reduce Blanco Cosmology Survey data (http://cosmology.uiuc.edu/BCS) from MOSAIC2 camera
  - Evaluate ability to meet DES data quality requirements
[Figures: DC1 Astrometry; DC1 Photometry]

4 DES Archive
Components of the DES Archive
- Archive nodes: filesystems that can host DES data files
  - Large number: no meaningful limit
  - Distributed: assumed to be non-local
- Database: tracks data using metadata describing the files and file locations
- Archive web portal: allows external (NVO) users to select and retrieve data from the DES archive
  - Try it at https://des.cosmology.uiuc.edu:9093/des/

5 Archive Filesystem Structure
host:/${root}/Archive
  raw/${nite}/    (des20061005, des20061006, etc)
    src/    original data from telescope
    raw/    split and cross-talk corrected data
    log/    logs from observing and processing
  red/${runid}/
    xml/    location of main OGRE workflows
    etc/    location of SExtractor config files, etc
    bin/    all binaries required for job
    data/${nite}/
      cal/    biases, flats, illumination correction, etc
      raw/    simply a link to appropriate raw data
      log/    processing logs
      ${band1}/    reduced images and catalogs for ${band1}
      ${band2}/    and so on for each band
      …
  cal/    calibration data (bad pixel masks, pupil ghosts)
  coadd/    holds co-added data within ${project}, ${tilename}, ${runid}
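Because the layout is fixed, a directory can be derived from a few tags. The following is a minimal Python sketch under that assumption; the helper name `archive_path` and the example root `/data` are hypothetical and not part of the DES DM tools. The coadd branch is left out because the slide does not spell out the nesting order of ${project}, ${tilename} and ${runid}.

```python
from pathlib import PurePosixPath


def archive_path(root, imageclass, nite=None, runid=None, band=None):
    """Build a directory path following the Archive layout on the slide.

    Hypothetical helper: only the raw, red and cal branches are covered.
    """
    base = PurePosixPath(root) / "Archive"
    if imageclass == "raw":
        # raw/${nite}/raw holds split, cross-talk corrected data
        return base / "raw" / nite / "raw"
    if imageclass == "red":
        # red/${runid}/data/${nite}/${band} holds reduced images and catalogs
        return base / "red" / runid / "data" / nite / band
    if imageclass == "cal":
        # cal/ holds calibration data such as bad pixel masks
        return base / "cal"
    raise ValueError(f"unsupported imageclass: {imageclass}")


raw_dir = archive_path("/data", "raw", nite="des20061006")
red_dir = archive_path(
    "/data", "red",
    nite="des20061010", runid="DES20061120_des20061010_01", band="g",
)
print(raw_dir)
print(red_dir)
```

A pure path type is used here so the sketch never touches a real filesystem.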

6 DES Database
- Image metadata:
  - Many header parameters (including WCS params)
  - All image tags that uniquely identify the DES archive location
    - ${archive_site} (fnal, mercury, gpfs-wan, bcs, etc)
    - ${imageclass} = (raw, red, coadd, cal)
    - ${nite}, ${runid}, ${band}, ${imagename}
    - ${ccd_number}, ${tilename}, ${imagetype}
  - As long as we adopt a fixed archive structure we can very efficiently track extremely large datasets
- Simulation metadata:
  - We could easily extend the DES archive to track simulation data
  - Need to adopt some logical structure and we could be up and running very rapidly
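The tag-based tracking described above can be illustrated with a small table keyed on the image tags. This is only a sketch: it uses an in-memory SQLite database as a stand-in for the project database, and the table name `image_files`, the function `select_images`, and the sample rows (image names, the `reduced` image type) are hypothetical.

```python
import sqlite3

# Tag columns named on the slide; everything else here is illustrative.
TAG_COLUMNS = (
    "archive_site", "imageclass", "nite", "runid", "band",
    "imagename", "ccd_number", "tilename", "imagetype",
)


def make_db():
    """Create an in-memory table with one column per image tag."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE image_files ("
        "archive_site TEXT, imageclass TEXT, nite TEXT, runid TEXT, band TEXT, "
        "imagename TEXT, ccd_number INTEGER, tilename TEXT, imagetype TEXT)"
    )
    return conn


def select_images(conn, **tags):
    """Return (archive_site, imagename) rows matching every given tag."""
    unknown = set(tags) - set(TAG_COLUMNS)
    if unknown:
        raise KeyError(f"unknown tags: {sorted(unknown)}")
    sql = "SELECT archive_site, imagename FROM image_files"
    if tags:
        # Column names are validated above; values are bound as parameters.
        sql += " WHERE " + " AND ".join(f"{name} = ?" for name in tags)
    sql += " ORDER BY archive_site, imagename"
    return conn.execute(sql, tuple(tags.values())).fetchall()


conn = make_db()
conn.executemany(
    "INSERT INTO image_files VALUES (?,?,?,?,?,?,?,?,?)",
    [
        ("mercury", "raw", "des20061010", None, "g", "exp001", 1, None, "src"),
        ("mercury", "red", "des20061010", "DES20061120_des20061010_01",
         "g", "exp001_01", 1, None, "reduced"),
        ("mss", "red", "des20061010", "DES20061120_des20061010_01",
         "g", "exp001_01", 1, None, "reduced"),
    ],
)
print(select_images(conn, imageclass="red", band="g"))
```

The same image appearing under two archive sites is how a copy on a second node would show up in such a table.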

7 Data Access Framework
- With DC2 we are fielding grid data movement tools that are integrated with the DES archive
  - ar_copy: copies dataset from one archive node to another
  - ar_verify: file by file comparison of datasets on two archive nodes
  - ar_remove: deletes dataset from archive node
- These tools update file locations within the DES database
- Data selected using file tags:
  - ar_copy -imclass=raw -nite=des20051005 -imagetype=src mercury gpfs-wan
  - ar_copy -imclass=red -runid=DES20061120_des20061010_01 mercury mss
- Underlying grid-ftp tools can vary with archive node
  - Most sites use Trebuchet, data movement tools integrated with the Elf/OGRE middleware development project at NCSA
  - FNAL uses globus-url-copy, because there's an incompatibility with Trebuchet listing
  - Metadata in the DES db encode the grid-ftp technology as well as combinations of buffer sizes, number of parallel streams, etc for moving "large" and "small" files
- Recent test by Greg Daues achieved 100MB/s for single copy… Typically we've combined 5 or 6 copies in parallel to achieve total data movement off Mercury of about 50MB/s
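The per-node transfer metadata mentioned on this slide could look roughly like the lookup below. This is a sketch only: the tool assigned to each site follows the slide (Trebuchet for most sites, globus-url-copy for FNAL), but the buffer sizes, stream counts, the 100 MiB large-file threshold, and the names `TransferProfile` and `pick_profile` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferProfile:
    tool: str              # grid-ftp client used for this archive node
    buffer_bytes: int      # hypothetical buffer size
    parallel_streams: int  # hypothetical number of parallel streams


# Hypothetical threshold separating "small" from "large" files.
LARGE_FILE_BYTES = 100 * 1024**2

# Hypothetical settings; only the tool per site comes from the slide.
SITE_PROFILES = {
    "mercury": {
        "small": TransferProfile("trebuchet", 1 << 20, 1),
        "large": TransferProfile("trebuchet", 16 << 20, 4),
    },
    "fnal": {
        "small": TransferProfile("globus-url-copy", 1 << 20, 1),
        "large": TransferProfile("globus-url-copy", 16 << 20, 4),
    },
}


def pick_profile(site, size_bytes):
    """Choose the transfer settings for a file of the given size at a site."""
    kind = "large" if size_bytes >= LARGE_FILE_BYTES else "small"
    return SITE_PROFILES[site][kind]


print(pick_profile("fnal", 10 * 1024))
print(pick_profile("mercury", 500 * 1024**2))
```

Keeping these settings as data rather than code matches the slide's point that the choice of tool and tuning lives in the database, not in the copy tools themselves.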

8 Archive Portal: https://des.cosmology.uiuc.edu:9093/des/
You will be redirected to NVO Login

9 Archive Portal: Image Query

10 DC2 Overview
- Transferred 10 nights of simulated data from FNAL Enstore
  - Roughly 3000 DECam exposures (500 deg^2 in griz 4 layers deep plus 50 flats/biases each night)
- Currently: Processed 8 of 10 nights
  - Use Convert_Ingest pipeline to split data (crosstalk corr in this stage)
    - Typically 20 jobs, each running a couple of hours
    - Raw data are 600GB for each night
  - Submit 62 processing jobs for each of these nights
    - Each night produces 3.4TB, ~35 million catalogued objects for ingestion
    - Each job takes around 11hrs… 1 CPU-month to reduce a night of data
    - Stages: zerocombine, flatcombine, imcorrect, astrometry, remapping, cataloguing, fitscombine, ingestion
    - Currently some jobs fail because of failures in astrometric refinement…
  - Ingest objects into the db
  - Move data from processing platforms to storage cluster and mass storage
  - Then determine photometric solution for each band/night
    - Update zeropoints for all objects/images for that night
- Total data production: 4.8TB raw, 27TB reduced, ~240 million objects
- Still to do: complete processing, co-add all data, extract summary statistics
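The "1 CPU-month" and total-volume figures follow from the per-night numbers. A short arithmetic check in Python, assuming one CPU per job, 1 TB = 1000 GB, and that the totals refer to the 8 nights processed so far:

```python
JOBS_PER_NIGHT = 62          # processing jobs submitted per night
HOURS_PER_JOB = 11           # approximate run time of each job
NIGHTS_PROCESSED = 8         # nights processed so far
RAW_GB_PER_NIGHT = 600       # raw data volume per night
REDUCED_TB_PER_NIGHT = 3.4   # reduced data volume per night

# CPU time per night, assuming each job occupies a single CPU.
cpu_hours = JOBS_PER_NIGHT * HOURS_PER_JOB
cpu_days = cpu_hours / 24

# Data volumes summed over the processed nights.
raw_total_tb = NIGHTS_PROCESSED * RAW_GB_PER_NIGHT / 1000
reduced_total_tb = NIGHTS_PROCESSED * REDUCED_TB_PER_NIGHT

print(f"{cpu_hours} CPU-hours per night = {cpu_days:.1f} CPU-days")
print(f"raw: {raw_total_tb:.1f} TB, reduced: {reduced_total_tb:.1f} TB")
```

This gives 682 CPU-hours (about 28 CPU-days) per night, 4.8 TB raw and about 27 TB reduced, in line with the slide.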

11 DC2 Challenges
- Scale of data: almost overwhelming
  - 330GB arrive… 3.4TB produced by next day
- Ingesting 35 million objects is a challenge: takes 10 hours if ingest rate is 1000 objects/s
  - Exploring sqlldr alternatives; most come with a price
- Moving processed data off compute nodes is a challenge: takes about 10 hours if transfer rate is 100MB/s
  - New data movement tools making this more reliable and automatic
- Astrometry problems persist
  - With BCS data we find that astrometry errors are bad enough to produce double sources in a few percent of the images; this translates to at least one failure per co-added image
  - Taking advice of Emmanuel Bertin to run SCAMP on a per exposure basis rather than a per image basis; new astrometric refinement framework currently being tested
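Both 10-hour estimates come from dividing a nightly volume by a sustained rate. A quick check in Python, assuming 1 TB = 10^6 MB and that the 3.4TB nightly output is what has to be moved:

```python
def hours_needed(amount, rate_per_second):
    """Time in hours to process `amount` at a constant per-second rate."""
    return amount / rate_per_second / 3600


# Ingest: 35 million objects at 1000 objects/s.
ingest_hours = hours_needed(35_000_000, 1000)

# Transfer: 3.4 TB expressed in MB, moved at 100 MB/s.
transfer_hours = hours_needed(3.4 * 1_000_000, 100)

print(f"ingest: {ingest_hours:.1f} h, transfer: {transfer_hours:.1f} h")
```

The results are about 9.7 hours for ingestion and about 9.4 hours for the transfer, so each step alone takes up a large fraction of a day.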

12 DC2 Photometry and Astrometry
- Nightly spot checks: no exhaustive testing so far
  - Astrometry scatter plots look much like DC1
  - Photometry scatter plots don't look as good, but we think we have figured out why
    - Diffraction spikes/halos added to stars in ImSim2
    - Done in such a way as to augment total stellar flux
    - This leads to an offset in our photometry at the few percent level
- Detailed statistics await further testing
  - What is full distribution of astrometric and photometric errors?
  - How do both depend on seeing, location on the chip, intrinsic galaxy parameters, etc…

13 Coaddition Framework
- Three steps to coaddition
  - Remapping images to std reference frame
  - Determining relative flux scale for overlapping remapped images
  - Combining remapped images (with filtering)
- DES DM enables a simple automated coadd
  - Coadd tiling stored as metadata in the db
  - db tools:
    - find all tiles associated with image
    - find all images associated with tile
- Execution
  - Reduced images immediately remapped (SWarp) to each tile they overlap (and catalogued)
  - Flux scales determined through (1) db object matching in overlapping images, (2) photometric calibration and (3) relative throughput of chips 1-62
  - Image combine (SWarp) happens en masse using archive to find correct image combinations
[Figures: Co-add Tiling; DECam Imaging Layers]
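The two db lookups (tiles for an image, images for a tile) amount to an overlap test between sky footprints. The sketch below illustrates the idea in plain Python with in-memory dictionaries instead of the database. It assumes flat-sky bounding boxes in degrees and ignores RA wrap-around and spherical geometry; the function names, tile names and box values are hypothetical.

```python
def overlaps(a, b):
    """True if two (ra_min, ra_max, dec_min, dec_max) boxes share area."""
    return a[0] < b[1] and b[0] < a[1] and a[2] < b[3] and b[2] < a[3]


def tiles_for_image(image_box, tiles):
    """Names of all tiles whose footprint overlaps the image."""
    return sorted(name for name, box in tiles.items() if overlaps(image_box, box))


def images_for_tile(tile_box, images):
    """Names of all images whose footprint overlaps the tile."""
    return sorted(name for name, box in images.items() if overlaps(tile_box, box))


# Hypothetical footprints: two adjacent tiles and two reduced images.
tiles = {
    "tile00": (0.0, 0.6, 0.0, 0.6),
    "tile01": (0.6, 1.2, 0.0, 0.6),
}
images = {
    "imgA": (0.5, 0.7, 0.1, 0.3),  # straddles the tile boundary
    "imgB": (0.9, 1.1, 0.1, 0.3),  # lies entirely inside tile01
}

print(tiles_for_image(images["imgA"], tiles))
print(images_for_tile(tiles["tile01"], images))
```

An image that straddles a tile boundary is returned for both tiles, which corresponds to remapping a reduced image to each tile it overlaps.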

14 BCS Coadd Tests
- Test framework by creating 46 coadd tiles that draw images from 10 different nights
  - griz, 36' x 36' with 0.26" pixels
  - <1hr job on server with 14 drive RAID5 disk array
- Issues:
  - Flux scaling ignored
  - Combine algorithm = sum
  - Science quality?
  - Some astrometry failures (double sources)
[Figure panels: z (3 deep), i (3 deep), r (2 deep), g (2 deep); scale bar 4']

15 Weak Lensing Framework [Mike Jarvis, Bhuv Jain, Gary Bernstein, Erin Sheldon]
- Science Strategy:
  - start from complete object lists and measure shear for each object jointly using all available reduced data
- Draft DES DM strategy:
  - Measure shapes of all objects on reduced images as part of standard reduction and cataloguing
  - Use isolated stars to model PSF distortions across the survey
  - Catalog on coadded images to create complete object lists
  - Use archive tools to select all reduced objects (and images) for joint shear measurements that include PSF corrections
- Implementation just in infancy
  - Shape measurements: one more module for pipeline, db schema change
  - Modeling PSF distortions: computational (not data) challenge
  - Complete object lists: Coadd catalogs already available in db
  - Final shear measurements: a data challenge
    - Apply data parallel approach grouping by sky coordinates (coadd tiling)
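The data-parallel idea in the last point, grouping single-exposure measurements by coadd tile so that each tile can be handled independently, can be sketched as follows. The record fields (`tile`, `obj`, `e1`, `e2`) and both function names are hypothetical, and the per-object average is only a placeholder for a real joint shear measurement with PSF corrections.

```python
from collections import defaultdict


def group_by_tile(measurements):
    """Split measurement records into independent groups, one per coadd tile."""
    groups = defaultdict(list)
    for m in measurements:
        groups[m["tile"]].append(m)
    return dict(groups)


def combine_tile(rows):
    """Placeholder joint measurement: average e1, e2 per object over exposures."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for r in rows:
        s = sums[r["obj"]]
        s[0] += r["e1"]
        s[1] += r["e2"]
        s[2] += 1
    return {obj: (s[0] / s[2], s[1] / s[2]) for obj, s in sums.items()}


# Hypothetical shape measurements from individual reduced images.
measurements = [
    {"tile": "tile00", "obj": 1, "e1": 0.25, "e2": 0.0},
    {"tile": "tile00", "obj": 1, "e1": 0.75, "e2": 0.5},
    {"tile": "tile01", "obj": 2, "e1": 0.5, "e2": -0.25},
]

# Each tile's group depends on no other group, so this loop could be
# distributed over separate workers without changing the result.
results = {tile: combine_tile(rows) for tile, rows in group_by_tile(measurements).items()}
print(results)
```

Grouping by tile keeps all measurements of one object in the same work unit, so the per-object combination needs no communication between workers.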

