Thoughts on Data Management Nicholas Schwarz Software Services Group Advanced Engineering Support (AES) Division Advanced Photon Source (APS) 25 June 2013.


1 Thoughts on Data Management Nicholas Schwarz Software Services Group Advanced Engineering Support (AES) Division Advanced Photon Source (APS) 25 June 2013

2 What is Data Management?
Data Management is the development and execution of architectures, practices and procedures, and policies that properly manage our data lifecycle needs.
Thoughts on Data Management - SSG - 14 June 2013 2

3 Architecture
The architecture is the unambiguous definition of the data and of the data storage and distribution infrastructure, i.e., hardware and software.
Data examples:
• Data are files on disk
• Data are a list of names and telephone numbers
• Data are a tuple of real numbers
• Data are …
Hardware and software examples:
• Each sector has a dserv with storage
• There is central storage
• There is one internal and one external GlobusOnline endpoint
• A web-based system is used to set ownership permissions

4 Practices and Procedures
Standard practices and procedures are required so that data can be handled properly. These practices and procedures must be embedded in regular operations processes.
Examples:
• All measurement data must be saved to the local sector’s dserv every 24 hours
• Selected measurement data must be transferred to central storage
• Data on central storage must be saved in /data/managed/esaf123456
• Data to be archived indefinitely must be flagged for archival within 7 days of the end of the experiment period
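The two time-based rules in these examples can be written as simple checks. This is an illustrative sketch only: the function names are hypothetical, and it assumes all timestamps are naive `datetime` values in one local time zone.

```python
from datetime import datetime, timedelta

# Thresholds taken from the example practices above.
DSERV_SAVE_INTERVAL = timedelta(hours=24)  # save to the sector dserv every 24 hours
ARCHIVAL_FLAG_WINDOW = timedelta(days=7)   # flag for archival within 7 days of the end


def dserv_save_overdue(last_save: datetime, now: datetime) -> bool:
    """Hypothetical check: True if more than 24 hours have passed since the last dserv save."""
    return now - last_save > DSERV_SAVE_INTERVAL


def archival_flag_on_time(experiment_end: datetime, flagged_at: datetime) -> bool:
    """Hypothetical check: True if data was flagged no later than 7 days after the
    experiment period ended. Flagging before the end also counts as on time."""
    return flagged_at - experiment_end <= ARCHIVAL_FLAG_WINDOW


if __name__ == "__main__":
    end = datetime(2013, 6, 14, 8, 0)
    print(archival_flag_on_time(end, end + timedelta(days=6)))  # True
    print(archival_flag_on_time(end, end + timedelta(days=8)))  # False
    print(dserv_save_overdue(end, end + timedelta(hours=25)))   # True
```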

5 Policies
Data policies dictate what is done with data so that data management helps meet the organization’s goals and operates within its requirements.
Examples:
• All systems must comply with requirements in ANL-593
• Only members of an ESAF can access data collected with that ESAF
• APS firewalls must not change
• APS must not lose data when the outside network connection is lost
• Data management at one sector must not interfere with data collection at another sector
• All measurement data must be kept for 90 days
• All metadata should be kept indefinitely
• Old metadata must be accessible within 48 hours of a request
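The two retention examples can be combined into a single deletion-eligibility rule. The sketch below assumes "kept for 90 days" is a minimum (deletion is allowed, not required, afterward) and that metadata is never deleted; `DataKind` and `eligible_for_deletion` are hypothetical names.

```python
from datetime import datetime, timedelta
from enum import Enum


class DataKind(Enum):
    MEASUREMENT = "measurement"
    METADATA = "metadata"


# "All measurement data must be kept for 90 days", read here as a minimum retention period.
MEASUREMENT_MIN_RETENTION = timedelta(days=90)


def eligible_for_deletion(kind: DataKind, collected_at: datetime, now: datetime) -> bool:
    """Hypothetical rule: may this item be deleted under the example retention policies?"""
    if kind is DataKind.METADATA:
        # "All metadata should be kept indefinitely."
        return False
    # Measurement data becomes eligible once the 90-day minimum has elapsed.
    return now - collected_at >= MEASUREMENT_MIN_RETENTION
```

For example, measurement data collected on 1 March 2013 is eligible on 14 June 2013 (105 days later), while metadata from any date is not.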

6 Interdependency
Data policies, practices and procedures, and architecture drive each other.
Examples:
• Policy: data management at one sector must not interfere with data collection at another sector
• Architecture: distributed server (dserv) for each sector
• Architecture: the only commonality of APS data is that it is stored in files
• Architecture: the data ownership enforcement mechanism is based on file system permissions
• Policy: APS must not lose data when the outside network connection is lost
• Practices and procedures: data is stored within the APS

7 Thoughts / Questions / Tasks
• Define what data management is to the APS.

8 Perspectives
Data management depends on your perspective…
• User / Scientist
  – Do science
  – Output measured primarily by publications (patents)
• Facility
  – Produce x-rays (maximize uptime)
  – Maximize data collection

9 User / Scientist Perspective
[Figure: a publication combining laboratory microscope data and synchrotron-derived data]
• Multiple figures
• Different types of data

10 User / Scientist Perspective
[Figure: synchrotron-derived data]
Even a single figure with synchrotron data may have data from multiple facilities.

11 User / Scientist Perspective
[Figure: synchrotron-derived data with analysis steps labeled Normalize Intensity, Cell Finding Algorithm, and Data Fusion]
The process of analyzing data generates new knowledge and data (and metadata).

12 Facility Perspective

| Sources | Type | Example |
|---|---|---|
| N | Administrative Data | PI, User, Dates, Description, ESAF, BTR, GUP … |
| N | Experiment / Measurement Data | Sample and sample conditions, Area Detector images, Point detector scalars, Motor positions, Energy (Undulator, Monochromator) … |
| N | Beamline / Sector Data | BL 1-XX, BL 2-XX, …, BL 35-XX; Sector 1, Sector 2, …, Sector 35; Energy (Undulator, Monochromator) … |
| 1 | Accelerator Data | Machine Data, Status, Orbit, Power Supply … |

13 [Diagram: a Publication built from Data Source 1 … N and Synchrotron 1 … N Data; labeled elements include Administrative Data, Sample / Experiment / Measurement Metadata, Accelerator Data, Measured Data, and Analysis, divided between Facility and User / Scientist scopes]

14 Thoughts / Questions / Tasks
What’s the perspective of the APS?
The APS is one of many scientific instruments.
As a facility, what can the APS do to enable science without knowing what goes on outside the facility, and with little control over what goes on outside the facility?
• Should every facility agree to do exactly the same thing?
  – Data formats, equipment, passwords, etc.
• Help facilitate the transition of data from the facility to the user?

15 Data Management at the APS
1. What are our architecture (data, hardware, software), practices and procedures, and policies for data management?
2. As a facility, what can the APS do to enable science without knowing what goes on outside the facility, and with little control over what goes on outside the facility?
3. What are our limitations?
4. What do we hope to be?
   – A streamlined facility so the user can realize their perspective

16 APS Architecture - Data
Many types of data at the APS:
• Administrative Data – well defined
• Accelerator Data – well defined
• Beamline Data – varies
• Measurement/Experiment Data – defined based on technique/beamline/user
  – Great variability: the only commonality is files on disk
  – Database entries for protein crystallography
One experiment has data from all of these categories.

17 APS Policies
Goal: a streamlined facility so users can realize their science perspective
Policies:
• Maximize data collection
• ANL-593
• Operate without outside network
• Firewalls cannot change
• Data ownership (only data owners can see their data)
• Data should be deleted after some set amount of time
• Many, many more to follow…
Implications:
• No cloud-only solution
• Critical services work internally
• User access is tied to APS computer access

18 Data Management Roles

| | Data Administrator | Group Manager | User |
|---|---|---|---|
| Experiment (or Project) Directory | rw: data administrator owns all group directories (enforced at creation time) | r: group manager is in experiment group; experiment directory is rx for group | r: user is in experiment group; experiment directory is rx for group |
| Data in Experiment (or Project) Directory | rw: data administrator owns all files and subdirectories (enforced with inotify script) | rw: group manager is in experiment group; experiment directory is rwx for group | rw: user is in experiment group; experiment directory is rwx for group |
| Experiment (or Project) Group | create group; modify group members | modify group members: group manager uid has additional group owner attribute in schema | none: user cannot modify group |
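One possible reading of this table in POSIX mode bits: the data administrator owns everything, the experiment group gets read/execute on the top-level experiment directory and read/write on the data inside it, and other accounts get nothing (consistent with the ESAF access policy). The sketch below applies those assumed modes with Python's standard `os` and `stat` modules. The octal values and the helper names are assumptions, not the APS configuration; setting the owning user and group (`os.chown`) normally requires administrator privileges and is omitted.

```python
import os
import stat
import tempfile

# Assumed mode bits for one reading of the roles table.
TOP_DIR_MODE = 0o750  # experiment directory: owner (data administrator) rwx, group r-x, others none
SUBDIR_MODE = 0o770   # subdirectories holding data: owner and group rwx, others none
FILE_MODE = 0o660     # data files: owner and group rw, others none


def apply_experiment_modes(top: str) -> None:
    """Hypothetical helper: apply the assumed modes to an experiment directory tree."""
    os.chmod(top, TOP_DIR_MODE)
    for root, dirs, files in os.walk(top):
        for name in dirs:
            os.chmod(os.path.join(root, name), SUBDIR_MODE)
        for name in files:
            os.chmod(os.path.join(root, name), FILE_MODE)


def mode_of(path: str) -> int:
    """Return only the permission bits of a path (e.g. 0o750)."""
    return stat.S_IMODE(os.stat(path).st_mode)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        top = os.path.join(base, "e123456_Smith")
        os.makedirs(os.path.join(top, "scan001"))
        open(os.path.join(top, "scan001", "image_0001.tif"), "w").close()
        apply_experiment_modes(top)
        print(oct(mode_of(top)))  # 0o750
```

An inotify-based watcher, as mentioned in the table, could call a helper like this whenever a new file or directory appears; that watcher is not shown.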

19 APS Architecture - Hardware
[Diagram: several Beamline Acquisition Computers, each paired with a dserv, connected to lustre storage; an internal gridFTP server and an external gridFTP server (GO endpoint); Globus; APS Firewall]

20 APS Architecture – Software
Internal Transfer & Tracking
• Storage Resource Broker (SRB) (SDSC)
• SPADE (ALS-LBL)
• Modify our internal workflow pipeline (APS-ANL)
• SLAC has an internal system
• XRootD
SSG is investigating which to adopt.
User Accounts
• Integrate user badges with APS LDAP
Management
• Develop web site for modifying ownership and access permissions

21 APS Architecture – Software
External Transfer & Access
• GlobusOnline provides access to APS data from the outside
• Users authenticate using their APS badge number and password
• Users can only see their data
• Users can integrate with other Globus tools
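The rule "users can only see their data" amounts to filtering experiments by ESAF membership. In the setup described here the restriction is enforced by file system permissions behind the GlobusOnline endpoint, not by application code; the sketch below only models the rule. The `Experiment` record, its fields, and the badge numbers are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    esaf: int                # ESAF number
    pi_last_name: str
    members: frozenset[str]  # badge numbers of the people listed on the ESAF (hypothetical)


def visible_experiments(badge: str, experiments: list[Experiment]) -> list[Experiment]:
    """Hypothetical filter: return only the experiments whose ESAF lists this badge."""
    return [e for e in experiments if badge in e.members]


if __name__ == "__main__":
    experiments = [
        Experiment(123456, "Smith", frozenset({"1001", "1002"})),
        Experiment(123457, "Jones", frozenset({"2001"})),
    ]
    print([e.esaf for e in visible_experiments("1002", experiments)])  # [123456]
```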

22 APS Practices and Procedures
Data Storage Workflow
• Data should be transferred from the acquisition computer to the local dserv
• Data on the dserv is transferred to lustre storage at one of the following intervals:
  – Immediately
  – Daily (at a designated time)
  – Every Tuesday @ 8AM
  – At the end of an experiment
  – At the end of a run
• Data on lustre is automatically deleted at a time determined by APS policy
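The five transfer intervals can be turned into a "when is the next dserv-to-lustre transfer?" calculation. This is a sketch under stated assumptions: the enum and function names are hypothetical, the daily "designated time" defaults to a made-up 02:00, and end-of-experiment and end-of-run times must be supplied by the caller.

```python
from datetime import datetime, time, timedelta
from enum import Enum
from typing import Optional


class Interval(Enum):
    IMMEDIATELY = "immediately"
    DAILY = "daily"
    WEEKLY_TUESDAY_8AM = "every Tuesday @ 8AM"
    END_OF_EXPERIMENT = "end of experiment"
    END_OF_RUN = "end of run"


def next_transfer(
    interval: Interval,
    now: datetime,
    daily_at: time = time(2, 0),  # hypothetical "designated time" for daily transfers
    experiment_end: Optional[datetime] = None,
    run_end: Optional[datetime] = None,
) -> datetime:
    """Return when the next dserv-to-lustre transfer should start (hypothetical scheduler)."""
    if interval is Interval.IMMEDIATELY:
        return now
    if interval is Interval.DAILY:
        candidate = datetime.combine(now.date(), daily_at)
        return candidate if candidate > now else candidate + timedelta(days=1)
    if interval is Interval.WEEKLY_TUESDAY_8AM:
        days_ahead = (1 - now.weekday()) % 7  # weekday(): Monday == 0, Tuesday == 1
        candidate = datetime.combine(now.date() + timedelta(days=days_ahead), time(8, 0))
        return candidate if candidate > now else candidate + timedelta(days=7)
    # Remaining cases: end of experiment or end of run.
    end = experiment_end if interval is Interval.END_OF_EXPERIMENT else run_end
    if end is None:
        raise ValueError(f"'{interval.value}' transfer needs an end time")
    return end


if __name__ == "__main__":
    friday_noon = datetime(2013, 6, 14, 12, 0)
    print(next_transfer(Interval.WEEKLY_TUESDAY_8AM, friday_noon))  # 2013-06-18 08:00:00
    print(next_transfer(Interval.DAILY, friday_noon))               # 2013-06-15 02:00:00
```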

23 APS Practices and Procedures
Data Storage Organization
Experiment Data
• Experiment data must be stored in a directory named e[ESAFNumber]_[PILastName], e.g. e123456_Smith
• Experiment data directories must be located in /data/managed/experiments/r[RunNumber], e.g. /data/managed/experiments/r2013-2
• Full example: /data/managed/experiments/r2013-2/e123456_Smith
Project Data
• Project data must be stored in a directory named p[ProjectID]_[ProjectName], e.g. p000001_MyProject
• Project data directories must be located in /data/managed/projects
• Full example: /data/managed/projects/p000001_MyProject
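The naming convention above can be generated and checked mechanically. The sketch below builds experiment and project paths and parses experiment directory names. Two points are assumptions inferred from the examples rather than stated rules: project IDs are zero-padded to six digits (from "p000001"), and PI last names contain no "/" or whitespace. All function names are hypothetical.

```python
import re
from pathlib import PurePosixPath
from typing import Optional, Tuple

EXPERIMENTS_ROOT = PurePosixPath("/data/managed/experiments")
PROJECTS_ROOT = PurePosixPath("/data/managed/projects")

# Assumption: ESAF numbers are all digits; PI last names contain no "/" or whitespace.
EXPERIMENT_DIR_RE = re.compile(r"e(?P<esaf>\d+)_(?P<pi>[^/\s]+)")


def experiment_path(run: str, esaf: int, pi_last_name: str) -> PurePosixPath:
    """Build /data/managed/experiments/r[RunNumber]/e[ESAFNumber]_[PILastName]."""
    return EXPERIMENTS_ROOT / f"r{run}" / f"e{esaf}_{pi_last_name}"


def project_path(project_id: int, project_name: str) -> PurePosixPath:
    """Build /data/managed/projects/p[ProjectID]_[ProjectName], zero-padding the ID to six digits."""
    return PROJECTS_ROOT / f"p{project_id:06d}_{project_name}"


def parse_experiment_dir(name: str) -> Optional[Tuple[int, str]]:
    """Return (ESAF number, PI last name) if name follows the convention, else None."""
    m = EXPERIMENT_DIR_RE.fullmatch(name)
    return (int(m["esaf"]), m["pi"]) if m else None


if __name__ == "__main__":
    print(experiment_path("2013-2", 123456, "Smith"))  # /data/managed/experiments/r2013-2/e123456_Smith
    print(project_path(1, "MyProject"))                # /data/managed/projects/p000001_MyProject
    print(parse_experiment_dir("e123456_Smith"))       # (123456, 'Smith')
```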


