OTN Workshop 2015: OTN SandBox. Presented by Marta Mihoff, OTN Database/Data Process Manager.

1 OTN Workshop 2015: OTN SandBox. Presented by Marta Mihoff, OTN Database/Data Process Manager

2 Start OTN Sandbox: On your laptop, open a command prompt, navigate to Desktop/OTNSandbox, and execute the command ‘vagrant up’.

3 Outline: Background; Platform Overview; Quick RStudio review; Changes, with exercises (data folder management, filter function, distance matrix, compressed data); New functions, with exercises (Compress function, Add Column Unique ID function, Cohort Data function); Wrap-up

4 OTN Sandbox Background: researcher requests at the 2013 Symposium; first incarnation in 2014; evolution and improvements; new functions for 2015

5 OTN SandBox Platform: free, open software packaged as a black box: Oracle VirtualBox, HashiCorp Vagrant, RStudio, IPython Notebook, PostgreSQL

6 OTN SandBox Tools (last year): White-Mihoff False Filtering Tool: builds a file of suspect detections, creates a file of filtered detections, and creates a distance matrix. Distance Matrix Merge: outputs a matrix, overriding distances with researcher input. Mihoff Interval Data Tool: creates a file of compressed detections and a file of interval data. Miscellaneous: file conversion (UTF-8) and cleanup.

7 OTN SandBox Tools – Changes: Filter Function: added a new parameter, detection radius; change in file structure for the distance matrix output. Distance Matrix Merge Function: change in file structure for the distance matrix inputs and output. Interval Data Function: change in file structure for the distance matrix input; new additional column 'Average time between Detections' on the compressed data output file.

8 OTN SandBox Tools – New Functions: Compress Function: the first step of the Interval Data Function, split out on its own, with exactly the same output. Add Column Unique ID Function: takes any file and adds a column unqdetecid with sequential integer values; no validation is done on the input file, so it can be used for any type of file. Cohort Data Function: takes a compressed detection file and a time parameter as input, and identifies groups of animals that visit stations within that time period.
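The idea behind the Compress function can be sketched in Python. This is a minimal illustration under stated assumptions, not the OTN R implementation: it assumes a compressed record is a run of consecutive detections of one animal (catalognumber) at one station, with timestamps in datecollected (the column names used in the workshop exercises). The output field names startdate, enddate and total_count are hypothetical.

```python
from datetime import datetime
from itertools import groupby


def compress(detections):
    """Collapse consecutive detections of the same animal at the same station.

    detections: list of dicts with 'catalognumber', 'station' and
    'datecollected' ('YYYY-MM-DD HH:MM:SS'). The grouping rule and the
    output field names are assumptions for illustration only.
    """
    def ts(d):
        return datetime.strptime(d["datecollected"], "%Y-%m-%d %H:%M:%S")

    # Order each animal's detections in time before grouping.
    ordered = sorted(detections, key=lambda d: (d["catalognumber"], ts(d)))
    records = []
    # groupby merges only *adjacent* rows, so a return visit to a station
    # after visiting another station starts a new compressed record.
    for (animal, station), run in groupby(
        ordered, key=lambda d: (d["catalognumber"], d["station"])
    ):
        times = [ts(d) for d in run]
        n = len(times)
        span_min = (times[-1] - times[0]).total_seconds() / 60
        records.append({
            "catalognumber": animal,
            "station": station,
            "startdate": times[0],
            "enddate": times[-1],
            "total_count": n,
            # Mean of the n-1 gaps, which telescopes to (last - first)/(n - 1).
            "avg_min_between_det": span_min / (n - 1) if n > 1 else 0.0,
        })
    return records
```

Grouping on adjacency rather than on (animal, station) alone keeps separate visits apart, which the interval step needs to measure movement between stations.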

9 Sign In: Open Chrome or Firefox, paste the sandbox URL, and sign in (Username: sandbox, Password: otn123). This will not work with a VPN turned on.

10 RStudio Navigation: Look at the bottom-right pane and click on the folder RStudio.

11 RStudio Navigation: Creating a new folder. Click New Folder on the Files tab and give your folder a name.

12 RStudio Navigation: Renaming a folder. Click Rename on the Files tab and give your folder a new name.

13 RStudio Navigation: More. Click More on the Files tab to see other options.

14 Data Folder Management: A big change from last year: you are no longer required to import and export your data folder. The data folder now stays on your laptop and is always visible to the Sandbox. NEVER delete or rename the folder data in OTNSandbox; instead, copy your data folder.

15 Data Folder Management: Save the data folder by making a copy. Navigate to OTNSandbox/data, right-click on the data folder, and choose Copy. Go back to OTNSandbox/, right-click, and choose Paste.

16 Data Folder Management: Empty the data folder after you have copied it. Open the folder data, press Ctrl+A, right-click on the highlighted area, and choose Delete.
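The copy-then-empty routine above can also be scripted. The following is a minimal Python sketch, not an OTN tool: it copies OTNSandbox/data to a dated sibling folder (the data_backup_ naming is hypothetical) and then deletes only the contents of data, so the folder data itself is never deleted or renamed.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_and_empty(sandbox_dir):
    """Copy <sandbox>/data to a dated sibling folder, then empty data/.

    The backup folder name is a hypothetical convention. The 'data' folder
    itself is kept; only its contents are removed, after the copy succeeds.
    """
    data = Path(sandbox_dir) / "data"
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    backup = Path(sandbox_dir) / f"data_backup_{stamp}"
    shutil.copytree(data, backup)  # raises if the backup folder already exists
    for item in data.iterdir():
        if item.is_dir() and not item.is_symlink():
            shutil.rmtree(item)
        else:
            item.unlink()
    return backup
```

Because copytree runs first and raises on any failure, nothing is deleted unless the backup was written.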

17 Data Folder Management: Go to the sample data link http://members.oceantrack.org/toolbox/workshop, click 2015.zip, and save it to Desktop/OTNSandbox.

18 Data Folder Management: Retrieve the sample data. Navigate to OTNSandbox, unzip 2015.zip (right-click, Extract All), drill down to the folder containing the files, open it, press Ctrl+A, right-click, and choose Copy.

19 Data Folder Management: Paste the sample data into the data folder. Navigate to the OTNSandbox/data folder, open the folder data, right-click, and choose Paste. Open the file CutPaste_file.txt in a text editor.
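The unzip, drill-down and paste steps above can be combined in one short Python sketch. It is an illustration, not part of the Sandbox: it assumes every file in the archive belongs directly in the data folder, so any sub-folder structure inside 2015.zip is flattened.

```python
import shutil
import zipfile
from pathlib import Path


def copy_sample_data(zip_path, data_dir):
    """Extract every file in zip_path straight into data_dir.

    Sub-folders inside the archive are flattened (an assumption matching the
    "drill down to the folder with files" step); a later file with the same
    name overwrites an earlier one. Returns the names of the copied files.
    """
    data_dir = Path(data_dir)
    copied = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Keep only the base name; this also prevents writing outside data_dir.
            target = data_dir / Path(info.filename).name
            with zf.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            copied.append(target.name)
    return copied
```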

20 Data Folder Management: the data folder in RStudio and the data folder on the laptop

21 Changes to existing functions: Filter Driver: new optional parameter, detection radius. Distance Matrix Merge Driver: two new columns added to the input and output files. Compressed Data File: new column on the output file, Avg_min_between_det.

22 Create a workshop folder for test scripts: In RStudio, click the New Folder button on the Files tab, type in a folder name (for example, WorkShop Scripts), and click OK.

23 Exercise: Interval Data. We are going to do three exercises with the interval data tool, each with a different distance matrix: 1. a matrix with no values for detection radius; 2. a matrix with detection radius; 3. a matrix with detection radius and some real distances. We will look at the output and see what changes.

24 Exercise: Interval Data. Open the sandbox folder and click the file interval_data_driver.r; it will open in the upper-left pane. Save it to the WorkShop Scripts folder.

25 Exercise: Interval Data. Using the distance matrix without detection radius: in the top-left pane, edit the script by typing the file names shown in yellow, then save the script.

26 Exercise: Interval Data. Compressed data output file: an example of how to use the new column avg_min_between_detections. Open the file matched_detections_2013_wo_radius_compressed_detections_v00.csv in a spreadsheet program (XLS or ODT). Record 1650 shows 7 detections with an average of 277.9 minutes between them. This indicates a problem: one or more of these detections may be suspect.
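If avg_min_between_detections is the mean gap between consecutive detections in a compressed record (an assumption), then the mean of the n - 1 gaps equals (last - first)/(n - 1). Record 1650's 7 detections at an average of 277.9 minutes apart therefore span about 6 × 277.9 ≈ 1667 minutes, close to 28 hours. A screen for such records can be sketched in Python; the threshold max_avg_min and the field names total_count and avg_min_between_det are hypothetical, not OTN defaults.

```python
def flag_sparse_records(records, max_avg_min=60.0, min_count=2):
    """Return compressed records whose detections are unusually spread out.

    records: dicts with 'total_count' and 'avg_min_between_det'.
    max_avg_min is a hypothetical screening threshold in minutes; records
    with a single detection have no gaps and are never flagged.
    """
    return [
        r for r in records
        if r["total_count"] >= min_count
        and r["avg_min_between_det"] > max_avg_min
    ]
```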

27 Exercise: Interval Data. Interval data output file: open the file matched_detections_2013_wo_radius_interval_data_v00.csv in a spreadsheet program (XLS or ODT) and look at records 9 through 15. The last column is velocity.
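The velocity column can be read as distance over elapsed time. The Python sketch below assumes (it is not taken from the OTN tool) that an interval runs from the last detection at one station to the first detection at the next, and that distances are in metres, giving metres per second. A non-zero distance covered in zero time is returned as infinity, an obviously impossible speed worth investigating.

```python
from datetime import datetime


def interval_velocity(distance_m, left_at, arrived_at):
    """Velocity in m/s for one interval between two stations.

    left_at: last detection at the first station; arrived_at: first detection
    at the next station. Units and interval definition are assumptions.
    """
    seconds = (arrived_at - left_at).total_seconds()
    if seconds <= 0:
        # Same instant (or out of order): only zero distance is plausible.
        return float("inf") if distance_m > 0 else 0.0
    return distance_m / seconds
```

For example, 3600 m covered in one hour gives 1.0 m/s.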

28 Exercise: Interval Data. Using the distance matrix with detection radius: copy the highlighted lines of code and paste them just below. Edit the input file names by changing _wo_ to _w_.

29 Exercise: Interval Data. Execute the three lines of code: highlight them and click Run.

30 Exercise: Interval Data. Interval data output file: open the file matched_detections_2013_w_radius_interval_data_v00.csv in a spreadsheet program (XLS or ODT) and look at records 9 through 15. The last column now shows zero velocity.
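One plausible way a detection radius could turn these velocities into zero is sketched below in Python. The rule is an assumption for illustration, not the documented OTN calculation: subtract both receivers' detection radii from the station-to-station distance and floor the result at zero, so receivers whose ranges overlap are treated as 0 m apart.

```python
def effective_distance(center_distance_m, radius_a_m, radius_b_m):
    """Distance an animal must travel between two receivers' detection ranges.

    Hypothetical rule: station-to-station distance minus both detection
    radii, floored at zero. Overlapping ranges give 0 m, so the interval
    velocity computed from this distance is also 0.
    """
    return max(0.0, center_distance_m - radius_a_m - radius_b_m)
```

With 500 m radii, stations 800 m apart become 0 m apart, while stations 2000 m apart become 1000 m apart.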

31 Exercise: Interval Data. Interval data output file: the same file, now looking at records from station HFX036 (caught/lost/found). Caught/lost/found means the receiver was recovered at a different place from where it was deployed. It is loaded with the recovery latitude and longitude, and we don't know when it went off station.

32 Distance Matrix: Real Distances. Provide real distances for lost/found receivers.
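Overriding computed distances with researcher-supplied ones, as the Distance Matrix Merge function does, amounts to overlaying one table on another. The Python sketch below is a minimal illustration using dictionaries keyed by station pairs; the dictionary layout and the assumption that the matrix is symmetric are mine, not the OTN file format.

```python
def merge_distances(computed, overrides):
    """Overlay researcher-supplied distances on a computed distance matrix.

    Both arguments map (station_a, station_b) -> distance in metres. The
    matrix is assumed symmetric, so an override for (A, B) also replaces
    (B, A). The input dictionaries are not modified.
    """
    merged = dict(computed)
    for (a, b), dist in overrides.items():
        merged[(a, b)] = dist
        merged[(b, a)] = dist
    return merged
```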

33 Exercise: Interval Data. Using the distance matrix with detection radius and real distances: copy and paste the same three lines of code, change the file names as above, then highlight and execute them.

34 Exercise: Interval Data. Now, if you look at those records, the velocity is reasonable.

35 Exercise: Compress data (New). Open the file compress_driver in the upper-left pane by clicking on it in the folder Home > RStudio > sandbox. Change the input file name as shown in yellow.

36 Exercise: Compress data (New). Highlight the code and execute it, then look at the messages. Go to the data folder on your laptop and open the file vue_export_reformatted.csv in a text editor.

37 Exercise: Compress data (New). Edit the file vue_export_reformatted.csv: rename the column names in the header record as follows: date_and_time_utc to datecollected, Transmitter to catalognumber, Receiver to station. Save the file.
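The header edit above can be done without a text editor. This Python sketch uses the standard csv module and the three renames listed on the slide; unlike the manual edit, it writes to a new file so the original is kept. Reading with utf-8-sig (which tolerates a byte-order mark) is a defensive choice, not something the slides specify.

```python
import csv

# Renames taken from the slide; any other column passes through unchanged.
RENAMES = {
    "date_and_time_utc": "datecollected",
    "Transmitter": "catalognumber",
    "Receiver": "station",
}


def rename_header(src, dst, renames=RENAMES):
    """Copy CSV file src to dst, renaming header columns; data rows are unchanged."""
    with open(src, newline="", encoding="utf-8-sig") as fin, \
         open(dst, "w", newline="", encoding="utf-8") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        header = next(reader)
        writer.writerow([renames.get(col.strip(), col) for col in header])
        writer.writerows(reader)
```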

38 Exercise: Compress data (New). Edit the script by changing the file name, then highlight the code and execute it. Look at the messages: one column is still missing, unqdetecid.

39 Exercise: Add column unqdetecid. Open the file add_column_unqdetecid.r in the sandbox folder. Change the input file name to the one you just edited (you can copy it from the message pane and paste it). Highlight the code and execute it.
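What the Add Column Unique ID function does to a file can be sketched in Python with the csv module. This is an illustration, not the R script itself: it appends unqdetecid as the last column and numbers rows from 1, both assumptions. Like the real function, it performs no validation of the other columns, so it works on any CSV.

```python
import csv


def add_unqdetecid(src, dst, start=1):
    """Append a 'unqdetecid' column of sequential integers to a CSV file.

    Column position (last) and starting value are assumptions. No
    validation is done on the other columns.
    """
    with open(src, newline="", encoding="utf-8-sig") as fin, \
         open(dst, "w", newline="", encoding="utf-8") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        writer.writerow(next(reader) + ["unqdetecid"])
        for i, row in enumerate(reader, start=start):
            writer.writerow(row + [i])
```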

40 Back to Exercise: Compress data (New). Go back to the compress driver script. We are going to use a different input to get ready for the next function: change the input file name as shown in yellow, then highlight the code and execute it.

41 Exercise: Cohort data (New). Open the file cohort_driver.r in the sandbox folder and change the input file name to the value highlighted in yellow (you can copy it from the bottom of the message pane and paste it). Highlight the code and execute it.

42 Exercise: Cohort data (New). Looking at the messages, 242 incidents of animals appearing at stations close together were identified. Open the output file from your laptop.
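The cohort idea (animals visiting the same station within a time period) can be sketched in Python over compressed records. This is one hypothetical reading of the Cohort Data function's time parameter, not its actual algorithm: for each compressed record, it lists the other animals whose visit to the same station started within window_min minutes. The field names catalognumber, station and startdate are assumptions.

```python
from datetime import datetime, timedelta


def find_cohorts(records, window_min):
    """For each compressed record, list other animals that arrived at the same
    station within window_min minutes of it.

    records: dicts with 'catalognumber', 'station', 'startdate' (datetime).
    Returns (station, catalognumber, startdate, sorted list of cohort animals).
    """
    window = timedelta(minutes=window_min)
    by_station = {}
    for r in records:
        by_station.setdefault(r["station"], []).append(r)
    incidents = []
    for station, recs in by_station.items():
        recs.sort(key=lambda r: r["startdate"])  # deterministic output order
        for r in recs:
            mates = sorted({
                o["catalognumber"] for o in recs
                if o["catalognumber"] != r["catalognumber"]
                and abs(o["startdate"] - r["startdate"]) <= window
            })
            if mates:
                incidents.append((station, r["catalognumber"], r["startdate"], mates))
    return incidents
```

The pairwise comparison within each station is quadratic, which is acceptable for a sketch; a sliding window over the sorted records would scale better on large files.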

43 Cohort data file

44 Documentation and Software Location: introduction page with links: http://members.oceantrack.org/data/otn-tool-box

45 Folder Structure: Documentation. Direct link to documentation: http://members.oceantrack.org/toolbox/

46 Teach yourself to program: free, open software; extremely powerful; standardized. IPython: a rival to MATLAB and RStudio; can embed R code and JavaScript. PostgreSQL.

47 How? Coursera. Rice University: An Introduction to Interactive Programming in Python (TBA), https://www.coursera.org/specialization/fundamentalscomputing/9?utm_medium=catalogSpec. Johns Hopkins: GitHub and R Programming, both part of the "Data Science" Specialization, https://www.coursera.org/course/datascitoolbox and https://www.coursera.org/course/rprog. University of Michigan: Programming for Everybody, https://www.coursera.org/course/pythonlearn.

48 PostgreSQL: Online Tutorials: http://www.postgresqltutorial.com/

49 Questions?

