BEER* workshop
1300 – Raymond Pollard – Being a Data Scientist is FUN!
~1345 – Robert Groman – Has data management gone mainstream?
~1430 – Gwen Moncoiffé

2 IMBER BEER Imbizo
Welcome
Times and discussion - ask (or write down) pertinent questions - this is a workshop
Tea/coffee/BEER
Who are we
CROZEX and Crozet (Possession Island)

3 IMBER Data Management
Data Management Committee
Arrange data by Project
First task is to engage researchers and educate them about how good organization of data will benefit them
Before, during and after the project field phase

4 The bottom line
DM cannot be an afterthought
If you give DM some thought when you first plan a project, it will be
–relatively straightforward
–not too much effort
–remarkably useful to all participants
–valuable to those who come after

5 DM topics (data management)
Cookbook
Recognition for DM
Data Scientist
Best Practice (e.g. BCO-DMO*)
Data and Metadata (e.g. CSR, the cruise summary report)
Data Centers – national (e.g. BODC)
Data Centers – specialist (e.g. OBIS, CCHDO, COPEPOD)
IMBER Data Portal
* BCO-DMO: Biological and Chemical Oceanography Data Management Office

6 Writing papers
Writing papers is an essential part of a researcher's job
Writing papers is time-consuming
Writing papers is tedious/boring
Writing papers needs attention to detail
Publications are a legacy of your research

7 Data management
Data management is an essential part of a researcher's job
Data management is time-consuming
Data management is tedious/boring
Data management needs attention to detail
Data sets are a legacy of your research

8 So why do we accept that we must write papers, but treat DM as the poor relation?
Because everybody else does!
Because we get recognition for publishing
But not for DM - seek to change this
But in fact:
Our published interpretation may be wrong
A good data set can be reinterpreted (..Fe)
So the data set is a more objective legacy of a cruise (say), which cost a huge amount and cannot easily be repeated

9 Recognition for DM
Carrots and sticks
SCOR is considering how to allocate DOIs (Digital Object Identifiers) to data sets
–At what level?
–Quality control?
Put it on your CV
–Act as Data Scientist to a project/cruise
–Breadth of interest
–Management experience
–Contribute to promotion/pay rise

10 Being a Data Scientist is FUN! Raymond Pollard

11 So, what is a Data Scientist?
The Data Scientist (DS) is someone who helps and advises the project/cruise Principal Scientist (PS) and researchers to document their data sets so that they are properly described
The DS also interacts with PIs and Data Specialists to calibrate, validate, save and archive data
Why is it FUN? - because you learn so much yourself by having to talk to people
Can be full- or part-time; paid or unpaid; hire, cajole or volunteer

12 Key role 1 - talking to people
Find out what they do and how they document it - methods, accuracy, …
What do they need from others - positions, water temperature, …
How do they store and back up their data? Do they back it up??!
What do they do with the data - calibrate, compare, sort, …

13 Range of data
Be aware of the huge range of data types and quantities. People are blinkered by their own experience
E.g. volumes:
–Nutrients - 24 values per CTD cast
–T&S - 5,000 to 100,000 values per cast
–Turbulence - millions
Storage
–Nutrients - PC spreadsheet
–T&S, navigation - central workstation
–Turbulence - dedicated workstation

14 Key role 2 - helping PIs
back up their data
–paper copies
–copy to central server
document their data, e.g.
–help with metadata
–create forms for them
obtain data from others for them by masterminding an Event Log
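
The "copy to central server" step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `sha256_of` and `back_up`, the file name `nutrients.csv` and the `central_server` directory are all hypothetical, and a local folder stands in for whatever shared storage a cruise actually uses. The idea is to copy the file and then confirm, by checksum, that the copy matches the original.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, backup_dir: Path) -> Path:
    """Copy `source` into `backup_dir` and verify the copy by checksum.

    `backup_dir` is a hypothetical stand-in for the central server.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"backup of {source} does not match the original")
    return target


if __name__ == "__main__":
    # Demo with made-up data in a temporary directory.
    with tempfile.TemporaryDirectory() as workdir:
        spreadsheet = Path(workdir) / "nutrients.csv"
        spreadsheet.write_text("station,nitrate\nSTN01,12.3\n")
        copy = back_up(spreadsheet, Path(workdir) / "central_server")
        print("verified backup written to", copy.name)
```

Verifying the checksum is a design choice rather than a requirement: it costs one extra read of each file, but it catches a truncated or corrupted copy at the time of backup, not months later.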

15 Key role 3 - documentation
Document as much as possible yourself
Take copies of PIs' handwritten records
Use a USB stick to copy their spreadsheets
–be diplomatic
–assure them you will NOT pass copies on to others
–emphasize the value of duplication
Create your own summary spreadsheets

16 Key role 4 - assist Principal Scientist
Help PS enforce unique referencing
Maintain and post an Event Log
–of stations occupied
–accurate station times and positions, etc
Quietly advise PS if a PI is not coping
–with data rate
–documentation
Prepare or help PS prepare CSR
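
One way to picture an Event Log with unique referencing is the minimal Python sketch below. It is an assumption-laden illustration, not a description of any particular cruise's system: the class names `Event` and `EventLog`, the field names, the ID format `EV001`, the station label `STN01` and the example time and position are all hypothetical. Each event carries a unique ID, a station, a UTC time and a position, and the log refuses duplicate IDs.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    """One entry in the log (all field names are illustrative)."""
    event_id: str      # unique reference, e.g. "EV001"
    station: str       # station label
    activity: str      # e.g. "CTD cast"
    time_utc: datetime
    lat: float         # decimal degrees, north positive
    lon: float         # decimal degrees, east positive


class EventLog:
    """Keeps events keyed by ID and rejects duplicate references."""

    def __init__(self) -> None:
        self._events: Dict[str, Event] = {}

    def add(self, event: Event) -> None:
        if event.event_id in self._events:
            raise ValueError(f"duplicate event id: {event.event_id}")
        if event.time_utc.tzinfo is None:
            raise ValueError("event time must be timezone-aware (UTC)")
        if not (-90 <= event.lat <= 90 and -180 <= event.lon <= 180):
            raise ValueError("position out of range")
        self._events[event.event_id] = event

    def at_station(self, station: str) -> List[Event]:
        """Return the events at one station in time order."""
        matches = [e for e in self._events.values() if e.station == station]
        return sorted(matches, key=lambda e: e.time_utc)


if __name__ == "__main__":
    log = EventLog()
    # Made-up example values, not real cruise data.
    log.add(Event("EV001", "STN01", "CTD cast",
                  datetime(2020, 1, 1, 8, 30, tzinfo=timezone.utc),
                  -46.0, 51.8))
    print([e.event_id for e in log.at_station("STN01")])
```

With every sample tied to an event ID, a PI who needs positions or water temperature from someone else can look them up by that one reference.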

17 Why can't the PS do most DS tasks?
Not his priority (optimize cruise program)
Maybe not his forte
Too much work

18 Possible role 5 - primary data
Scientists often seem to assume that universally required data (time, navigation, CTD depth, temp, surface and met data) appear out of thin air
In fact, those data need careful calibration
DS may need to do this, if no other person is responsible – at least check it
e.g. WHPO => CCHDO GEOTRACES (Chris Measures)

19 What does the DS gain?
Broadening your experience, learning from other PIs
Advancing your own DM skills
Great management training! (listening to others, looking for problems)
Looks great on your CV
You might even get paid

