1 Bob Mann, Chicago Provenance Workshop
Non-bio (necro-?) sciences (Jim Frew, Bob Mann)
- Examples of current practice and issues
  - Astronomy: Bob Mann, Alex Szalay
  - Earth Sciences: Jim Frew, Dave Maier
  - Others…
- Draw up list of issues
- Discussion

2 Some provenance & data derivation issues in astronomy
Bob Mann
Institute for Astronomy, Edinburgh Univ. & National e-Science Centre

3 Outline
- Trends in astronomy & implications for provenance
- Two provenance issues
  - Recording provenance in the FITS data format
  - Provenance in database federation
- Alex Szalay:
  - Provenance in pipelines and databases
  - Annotations in astronomy databases

4 Evolution in astronomical practice
- “Collectivisation & the empowerment of the individual”
  - Fewer individual observational programmes and more sky surveys
  - More people access the data via archives
- “The specialist is dead, long live the generalist”
  - Use multi-wavelength data
  - Expertise in classes of astronomical object, not observational techniques

5 Implications for provenance
- More science being done with data that the individual scientist didn’t take
  - …& about which the scientist knows less
- More reliance on pipeline processing
- More science with catalogues of source attributes derived from primary data
- More science being done through combining data from multiple sources (more later)

6 FITS: Flexible Image Transport System
- Format of a FITS file (http://fits.gsfc.nasa.gov)
  - Primary Header: metadata describing instrument, observation & file contents
  - Primary Data Array: N-dimensional array, usually a 2D image
  - Zero or more Extensions: Array, ASCII Table or Binary Table, each with its own Header
- (New FITS-inspired XML format: VOTable)
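The layout above implies a size rule that can be checked against any header: a FITS file is a sequence of 2880-byte blocks, and a primary data array holds |BITPIX|/8 bytes per pixel times the product of the NAXISn lengths, padded up to a whole block. The Python sketch below applies that rule to the values in the first example header later in the talk (BITPIX = 32, NAXIS1 = NAXIS2 = 648). The function names are illustrative, and the sketch ignores the extra PCOUNT/GCOUNT terms that extensions such as binary tables can carry.

```python
import math

FITS_BLOCK = 2880  # every FITS header and data unit is padded to 2880-byte blocks


def data_array_bytes(bitpix: int, naxes: list[int]) -> int:
    """Size in bytes of a primary data array, before block padding."""
    if not naxes:            # NAXIS = 0: no data array at all
        return 0
    # BITPIX is negative for floating-point data, so take its magnitude
    return abs(bitpix) // 8 * math.prod(naxes)


def padded_bytes(nbytes: int) -> int:
    """Round a byte count up to a whole number of 2880-byte FITS blocks."""
    return -(-nbytes // FITS_BLOCK) * FITS_BLOCK


raw = data_array_bytes(32, [648, 648])
print(raw, padded_bytes(raw))  # 1679616 1681920
```

So the 648 × 648 32-bit image occupies 584 blocks on disk, with the last block partly filled by padding.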

7 FITS header entries
- Keyword-value pairs + optional comment
  e.g. PLTSCALE= '67.14 ' / [arcsec/mm] plate scale
- Three types of header keyword
  - Mandatory, e.g. NAXIS
  - Optional, e.g. DATAMAX
  - Additional, i.e. user-defined, but not taken from the reserved list (mandatory + optional)
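Because every card follows the same fixed layout (keyword in columns 1-8, the value indicator `= ` in columns 9-10, then the value and an optional `/` comment), a reader for simple cards fits in a few lines. This is a minimal Python sketch, not a full FITS parser: it handles quoted strings (with `''` as an escaped quote), logicals, integers and reals, and leaves complex values and continued long strings aside. `parse_card` and `_convert` are illustrative names.

```python
def _convert(text: str):
    """Interpret a non-string fixed value: logical, integer or real."""
    if text == "T":
        return True
    if text == "F":
        return False
    try:
        return int(text)
    except ValueError:
        pass
    try:
        # FITS allows a 'D' exponent for double precision, e.g. 1.5D+03
        return float(text.replace("D", "E"))
    except ValueError:
        return text  # complex or undefined values are left as raw text here


def parse_card(card: str):
    """Split one FITS header card into (keyword, value, comment)."""
    card = card.ljust(80)[:80]             # cards are fixed-width 80-char records
    keyword = card[:8].rstrip()            # columns 1-8
    if card[8:10] != "= ":                 # no value indicator in columns 9-10
        return keyword, None, card[8:].rstrip()  # commentary card, e.g. HISTORY
    field = card[10:].lstrip()
    if field.startswith("'"):
        # Character string: a doubled quote '' stands for one literal quote
        i, chars = 1, []
        while i < len(field):
            if field[i] == "'":
                if field[i + 1:i + 2] == "'":
                    chars.append("'")
                    i += 2
                    continue
                break                      # closing quote
            chars.append(field[i])
            i += 1
        value = "".join(chars).rstrip()    # trailing blanks are not significant
        rest = field[i + 1:]
    else:
        raw, slash, comment = field.partition("/")
        value = _convert(raw.strip())
        rest = slash + comment
    comment = rest.split("/", 1)[1].strip() if "/" in rest else ""
    return keyword, value, comment


print(parse_card("PLTSCALE= '67.14 ' / [arcsec/mm] plate scale"))
# ('PLTSCALE', '67.14', '[arcsec/mm] plate scale')
```

In the PLTSCALE example the plate scale is written as a quoted string, so a generic reader returns the text '67.14' rather than a number, and the unit appears only in the free-text comment.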

8 Provenance in FITS headers
- Many optional keywords related to provenance:
  - ORIGIN, DATE-OBS, TELESCOP, INSTRUME, OBSERVER, REFERENC
  - plus HISTORY: “The text should contain a history of steps and procedures associated with the processing of the associated data. Any number of HISTORY card images may appear in a header.” (FITS Standard)
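Pulling the provenance-related entries out of a header is mostly a matter of filtering cards by keyword. The Python sketch below collects the six optional keywords listed above plus every HISTORY card. The example card values are hypothetical, except the HISTORY line, which is copied from the second example header later in the talk. String handling is deliberately simplified: escaped quotes ('') are not undone.

```python
PROVENANCE_KEYWORDS = {"ORIGIN", "DATE-OBS", "TELESCOP", "INSTRUME", "OBSERVER", "REFERENC"}


def provenance_summary(cards):
    """Collect provenance keywords and HISTORY text from 80-char FITS cards."""
    found, history = {}, []
    for card in cards:
        card = card.ljust(80)[:80]
        keyword = card[:8].rstrip()
        if keyword == "HISTORY":
            # Commentary card: free text in columns 9-80, no value indicator
            history.append(card[8:].rstrip())
        elif keyword in PROVENANCE_KEYWORDS and card[8:10] == "= ":
            field = card[10:].strip()
            if field.startswith("'"):
                # Simplification: take the text up to the next quote
                value = field[1:].split("'", 1)[0].rstrip()
            else:
                value = field.split("/", 1)[0].strip()
            found[keyword] = value
    return found, history


cards = [
    "TELESCOP= 'EXAMPLE-TEL'        / hypothetical value",
    "DATE-OBS= '1998-09-29'         / hypothetical value",
    "HISTORY This is the end of the header written by the ING observing-system.",
]
found, history = provenance_summary(cards)
print(found)    # {'TELESCOP': 'EXAMPLE-TEL', 'DATE-OBS': '1998-09-29'}
print(history)  # ['This is the end of the header written by the ING observing-system.']
```

The HISTORY entries come back as a list of free-text strings: the sketch can collect them but cannot interpret them.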

9 Example FITS header extracts (1)
SIMPLE = T / file does conform to FITS standard
BITPIX = 32 / number of bits per data pixel
NAXIS = 2 / number of data axes
NAXIS1 = 648 / length of data axis 1
NAXIS2 = 648 / length of data axis 2
EXTEND = T / FITS dataset may contain extensions
BUNIT = 'Primary Array' / Units of the image
XPROC0 = 'evselect table=''product/P PNU002PIEVLI0000.FIT:EVENTS'' w&'
CONTINUE 'ithfilteredset=no filteredset=''filtered.fits'' keepfilteroutput=no&'
CONTINUE ' destruct=yes flagcolumn=''EVFLAG'' flagbit=-1 filtertype=''expres&'
CONTINUE 'sion'' expression=''GTI(intermediate/GlobalHK-all-1-Attitude_GTI-X0&'
CONTINUE ' fits, TIME) && GTI(intermediate/pnEvents-epn-1-EPIC_flare&'
CONTINUE '_GTI-U fits:STDGTI, TIME) && (RAWY>12) && (PATTERN 12) && (PATTERN<=4) &'
CONTINUE ' (PI in (200:12000]) && (PI>=500 || (PI =500 || (PI<500 && FLAG & 0x8 == 0 && P&'
CONTINUE 'ATTERN==0)) && (FLAG & 0x2fa0024) == 0'' dssblock='''' writedss=yes&'
CONTINUE ' cleandss=no updateexposure=yes filterexposure=yes blockstocopy=''&'
CONTINUE ''' attributestocopy='''' energycolumn=''PHA'' withzcolumn=no zcolu&'
…
[Slide callouts: “New Keyword”; “Multi-line entry”]
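The XPROC0 entry above uses the FITS long-string convention: a string value ending in `&` is continued in the string of the following CONTINUE card. The Python sketch below reassembles such a value. The cards in the example are a shortened, hypothetical version of the XPROC0 entry, and the function names are illustrative. The transcript shows a single space after CONTINUE, probably because repeated spaces were collapsed, so the sketch reads each fragment from column 9 onward and accepts either spacing.

```python
def _string_value(field: str) -> str:
    """Return the quoted string at the start of `field`, undoing '' escapes."""
    s = field.lstrip()
    if not s.startswith("'"):
        raise ValueError("expected a quoted string value")
    i, chars = 1, []
    while i < len(s):
        if s[i] == "'":
            if s[i + 1:i + 2] == "'":   # '' encodes one literal quote
                chars.append("'")
                i += 2
                continue
            break                       # closing quote
        chars.append(s[i])
        i += 1
    return "".join(chars).rstrip()      # trailing blanks are not significant


def read_long_string(cards: list[str], start: int) -> tuple[str, int]:
    """Join a string value continued over CONTINUE cards.

    Returns the full value and the index of the first card after it.
    """
    value = _string_value(cards[start][10:80])   # value field after '= '
    i = start + 1
    while value.endswith("&") and i < len(cards) and cards[i][:8] == "CONTINUE":
        # Drop the trailing '&' and append the next fragment
        value = value[:-1] + _string_value(cards[i][8:80])
        i += 1
    return value, i


cards = [
    "XPROC0  = 'evselect table=''events.fits'' w&'",
    "CONTINUE  'ithfilteredset=no&'",
    "CONTINUE  ' destruct=yes'",
    "END",
]
print(read_long_string(cards, 0))
# ("evselect table='events.fits' withfilteredset=no destruct=yes", 3)
```

Only a trailing `&` acts as a continuation marker; the `&&` operators inside the real XPROC0 expression pass through unchanged.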

10 Example FITS header extracts (2)
XTENSION= 'IMAGE ' / Image extension
BITPIX = 16 / Bits per pixel
NAXIS = 2 / Number of axes
…
HISTORY This is the end of the header written by the ING observing-system.
WAT0_001= 'system=image'
WAT1_001= 'wtype=zpx axtype=ra projp1=1.0 projp3=220.0'
WAT2_001= 'wtype=zpx axtype=dec projp1=1.0 projp3=220.0'
…
TRIM = 'Sep 2 16:14 Trim data section is [51:2098,1:4100]'
BP-FLAG = 'Sep 2 16:14 Bad pixel file is /home/jrl/wfcred/stds/A bad'
BT-FLAG = 'Sep 2 16:14 Overscan section is [1:50,1:4128] with mean= '
BI-FLAG = 'Sep 2 16:14 Zero level correction image is /data/cass03a/was/mframe'
FF-FLAG = 'Sep 2 16:14 Flat field image is /data/cass03d/was/mframes/r_ '
ILLUMCOR= 'Sep 2 16:14 Illumination image is tmpill.pl with scale= '
…
[Slide callouts: “End of header entries generated at telescope”; “Keywords describing data reduction process”]
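The reduction keywords above share one informal pattern: a `Mon D HH:MM` time stamp followed by a free-text description of the step. Assuming that pattern (it is inferred from these few cards, not taken from any standard), the Python sketch below turns such values into structured step records; `ReductionStep` and `reduction_steps` are hypothetical names. It also makes a gap visible: no year is recorded, so the stamp alone cannot place a step in time.

```python
import re
from dataclasses import dataclass

# Assumed pattern, inferred from the slide: "Mon D HH:MM free-text description"
STAMP = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s+(\d{2}:\d{2})\s+(.*)$")


@dataclass
class ReductionStep:
    keyword: str      # e.g. TRIM, FF-FLAG
    month: str
    day: int
    time: str         # no year is recorded in the stamp
    description: str


def reduction_steps(entries: dict[str, str]) -> list[ReductionStep]:
    """Turn keyword -> 'Mon D HH:MM text' values into step records."""
    steps = []
    for keyword, value in entries.items():
        m = STAMP.match(value.strip())
        if m:  # values without a stamp are skipped rather than guessed at
            month, day, time, text = m.groups()
            steps.append(ReductionStep(keyword, month, int(day), time, text))
    return steps


entries = {
    "WAT0_001": "system=image",
    "TRIM": "Sep 2 16:14 Trim data section is [51:2098,1:4100]",
    "ILLUMCOR": "Sep 2 16:14 Illumination image is tmpill.pl with scale= ",
}
for step in reduction_steps(entries):
    print(step.keyword, step.time, step.description)
```

The descriptions themselves stay free text; the sketch only separates out the part of each value that has a recognisable structure.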

11 Example FITS header extracts (3)
SIMPLE = T / file does conform to FITS standard
BITPIX = 16 / number of bits per data pixel
…
NHKLINES= 146 / Number of lines from house-keeping file
HKLIN001= 'JOB.JOBNO UKJ349' /
HKLIN002= 'JOB.DATE-MES 1998:09:29' /
…
HISTORY = 'SuperCOSMOS image analysis and mapping mode (IAM and MM)' /
HISTORY = 'data written by xydcomp_ss.' /
HISTORY = 'Any questions/comments/suggestions/bug reports should be sent' /
HISTORY = 'to
[Slide callout: “House-keeping = provenance metadata”]
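The NHKLINES/HKLINnnn pair stores a whole house-keeping file inside the header, one line per card. A Python sketch of reading it back, assuming (from the two lines shown) that each value has the form `GROUP.NAME value`; the function name and the two-line example header are illustrative, and the header is taken as already parsed into a dictionary.

```python
def housekeeping(header: dict) -> dict:
    """Rebuild house-keeping records from NHKLINES and HKLIN001..HKLINnnn.

    Assumes each HKLINnnn value looks like 'GROUP.NAME value'.
    """
    records = {}
    for i in range(1, int(header["NHKLINES"]) + 1):
        line = header.get(f"HKLIN{i:03d}")
        if line is None:
            continue                    # missing card: leave the gap, don't guess
        name, _, value = line.partition(" ")
        records[name] = value.strip()
    return records


header = {
    "NHKLINES": 2,
    "HKLIN001": "JOB.JOBNO UKJ349",
    "HKLIN002": "JOB.DATE-MES 1998:09:29",
}
print(housekeeping(header))
# {'JOB.JOBNO': 'UKJ349', 'JOB.DATE-MES': '1998:09:29'}
```

Because FITS keywords are at most eight characters, the `HKLINnnn` naming leaves room for at most 999 lines, and a reader outside the project has to know the scheme exists before it can use it.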

12 FITS provenance: summary
- Header keywords designed for recording provenance information, esp. HISTORY
- HISTORY cards written in free text, so not readily machine-interpretable
- Project-specific provenance keywords not readily interpretable at all outside the project

13 Provenance in database federation
- Sky survey databases in many wavebands
- New science from federating them
- Need to associate entries in different DBs
- Unified Column Descriptors (UCDs):
  - Taxonomy based on collation of column names from hundreds of databases
- Location on sky provides natural indexing
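Positional association, the natural starting point given sky-position indexing, reduces to computing great-circle separations and keeping the closest candidate within a search radius. The Python sketch below does this by brute force with the haversine formula; the function names, catalogue tuples and radius are illustrative. Real survey databases use a spatial index (HTM and HEALPix are common choices) rather than a linear scan, and the next slide explains why proximity alone is not always enough.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula; inputs in degrees)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # min() guards against rounding pushing h slightly above 1
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def nearest_match(source, catalogue, radius_arcsec):
    """Return (id, separation in arcsec) of the closest entry within the radius, or None."""
    best = None
    for obj_id, ra, dec in catalogue:
        sep = angular_separation_deg(source[0], source[1], ra, dec) * 3600.0
        if sep <= radius_arcsec and (best is None or sep < best[1]):
            best = (obj_id, sep)
    return best


catalogue = [("a", 10.0, 20.0), ("b", 10.0005, 20.0)]
print(nearest_match((10.0, 20.0001), catalogue, radius_arcsec=2.0))
# closest candidate is 'a', about 0.36 arcsec away
```

The match returned here says nothing about the astrophysical properties of the candidates, which is exactly the information a position-only rule throws away.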

14 Matching by proximity not always adequate
Need to know more about the astrophysical properties of the two populations to decide which of the red objects is the most likely counterpart to the cyan source

15 Recording association provenance
- Might want to record associations in DBs
- Users want to know whether to trust them
- Complex probabilistic association algorithms
  - Difficult to describe easily
- Associations may change in light of new data
  - Can users challenge them via annotation?
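One way to make recorded associations both trustworthy and challengeable is to store, with each match, what produced it and what has happened to it since. The Python dataclass below is a hypothetical schema, not an existing database design: it keeps the algorithm name, version and parameters next to the probability, lets users attach annotations, and marks an association as superseded (rather than deleting it) when new data change the match. All identifiers and values in the example are invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Annotation:
    author: str
    text: str
    created: datetime


@dataclass
class Association:
    """A cross-match between two survey entries plus how it was made (hypothetical schema)."""
    entry_a: str              # identifier in the first survey database
    entry_b: str              # identifier in the second survey database
    probability: float        # association probability reported by the algorithm
    algorithm: str            # which matching algorithm produced it
    algorithm_version: str
    parameters: dict          # e.g. search radius, priors
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    superseded_by: Optional["Association"] = None
    annotations: list = field(default_factory=list)

    def annotate(self, author: str, text: str) -> None:
        """Record a user's comment or challenge without altering the match."""
        self.annotations.append(Annotation(author, text, datetime.now(timezone.utc)))

    def supersede(self, newer: "Association") -> None:
        """Mark this association as replaced, keeping it for the provenance trail."""
        self.superseded_by = newer

    def current(self) -> "Association":
        """Follow the chain of replacements to the association now in force."""
        a = self
        while a.superseded_by is not None:
            a = a.superseded_by
        return a


old = Association("SURVEY1:123", "SURVEY2:456", 0.82, "example-matcher", "1.0", {"radius_arcsec": 2.0})
new = Association("SURVEY1:123", "SURVEY2:789", 0.91, "example-matcher", "1.1", {"radius_arcsec": 2.0})
old.supersede(new)
old.annotate("a.user", "Colours suggest SURVEY2:456 is a foreground star")
print(old.current().entry_b)  # SURVEY2:789
```

Keeping superseded records lets a user see which association a past result relied on, at the cost of storing every version.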

16 Summary
- Astronomers record lots of provenance info
  - Want machine-interpretability
- Some astronomical provenance is complex
  - Want means of describing algorithms
- Starting to get links between databases and online copies of scientific papers
- No culture of annotation by users (yet)


