Download presentation
Presentation is loading. Please wait.
Published byMartha Sayres Modified over 9 years ago
1
Analysis and Workflows Lesson 10: Analysis and Workflows CC image by wlef70 on Flickr
2
Analysis and Workflows Review of typical data analyses Reproducibility & provenance Workflows in general Informal workflows Formal workflows CC image by jwalsh on Flickr
3
Analysis and Workflows After completing this lesson, the participant will be able to: o Understand a subset of typical analyses used o Define a workflow o Understand the concepts informal and formal workflows o Discuss the benefits of workflows CC image by cybrarian77 on Flickr
4
Analysis and Workflows Plan Collect AssureDescribePreserveDiscoverIntegrateAnalyze
5
Analysis and Workflows Conducted via personal computer, grid, cloud computing Statistics, model runs, parameter estimations, graphs/plots etc. CC image by hegemonx on Flickr CC image by tai viinikka on Flickr
6
Analysis and Workflows Processing: subsetting, merging, manipulating o Reduction: important for high-resolution datasets o Transformation: unit conversions, linear and nonlinear algorithms 0711070500276000 0711070600276000 0711070700277003 0711070800282017 0711070900285000 0711071000293000 0711071100301000 0711071200304000 Datetimeair tempprecip Cmm 11-Jul-075:0027.6000 11-Jul-076:0027.6000 11-Jul-077:0027.7003 11-Jul-078:0028.2017 11-Jul-079:0028.5000 11-Jul-0710:0029.3000 11-Jul-0711:0030.1000 11-Jul-0712:0030.4000 Recreated from Michener & Brunt (2000)
7
Analysis and Workflows Graphical analyses o Visual exploration of data: search for patterns o Quality assurance: outlier detection Box and whisker plot of temperature by month Scatter plot of August Temperatures Strasser, unpub. data
8
Analysis and Workflows Statistical analyses Conventional statistics o Experimental data o Examples: ANOVA, MANOVA, linear and nonlinear regression o Rely on assumptions: random sampling, random & normally distributed error, independent error terms, homogeneous variance Descriptive statistics o Observational or descriptive data o Examples: diversity indices, cluster analysis, quadrant variance, distance methods, principal component analysis, correspondence analysis From Oksanen (2011) Multivariate Analysis of Ecological Communities in R: vegan tutorial Example of Principle Component Analysis
9
Analysis and Workflows Statistical analyses (continued) o Temporal analyses: time series o Spatial analyses: for spatial autocorrelation o Nonparametric approaches useful when conventional assumptions violated or underlying distribution unknown o Other misc. analyses: risk assessment, generalized linear models, mixed models, etc. Analyses of very large datasets o Data mining & discovery o Online data processing
10
Analysis and Workflows Re-analysis of outputs Final visualizations: charts, graphs, simulations etc. Science is iterative: The process that results in the final product can be complex
11
Analysis and Workflows Reproducibility at core of scientific method Complex process = more difficult to reproduce Good documentation required for reproducibility o Metadata: data about data o Process metadata: data about process used to create, manipulate, and analyze data CC image by Richard Carter on Flickr
12
Analysis and Workflows Process metadata: Information about process (analysis, data organization, graphing) used to get to data outputs Related concept: data provenance o Origins of data o Good provenance = able to follow data throughout entire life cycle o Allows for Replication & reproducibility Analysis for potential defects, errors in logic, statistical errors Evaluation of hypotheses
13
Analysis and Workflows Formalization of process metadata Precise description of scientific procedure Conceptualized series of data ingestion, transformation, and analytical steps Three components o Inputs: information or material required o Outputs: information or material produced & potentially used as input in other steps o Transformation rules/algorithms (e.g. analyses) Two types: o Informal o Formal/Executable
14
Analysis and Workflows Workflow diagrams: Some basic building blocks Analytical process Data (input or output) Decision Predefined process (subroutine) Predefined process (subroutine) o Inputs or outputs include data, metadata, or visualizations o Analytical processes include operations that change or manipulate data in some way o Decisions specify conditions that determine the next step in the process o Predefined processes or subroutines specify a fixed multi- step process
15
Analysis and Workflows Output data, visualizations, associated metadata Data ingestion Raw data and associated metadata Data cleaning Analysis Step 1 Analysis Step 2 Output generatio n Workflow diagrams: Simple linear flow chart Conceptualizing analysis as a sequence of steps o arrows indicate flow
16
Analysis and Workflows Flow charts: simplest form of workflow Graph production Analysis: mean, SD Quality control & data cleaning Data import into R
17
Analysis and Workflows Flow charts: simplest form of workflow Transformation Rules Graph production Analysis: mean, SD Quality control & data cleaning Data import into R
18
Analysis and Workflows Flow charts: simplest form of workflow Inputs & Outputs Graph production Analysis: mean, SD Quality control & data cleaning Data import into R Salinity data Temperature data “Clean” T & S data Data in R format Summary statistics
19
Analysis and Workflows Analysis 2 Analysis 1 CONDITION false true CONDITIONAL LOOP IF Workflow diagrams: Adding decision points
20
Analysis and Workflows species name lat/long Data ingestion Data and metadata QA/QC (e.g., check for appropriate data types, outliers) Determine range (calculate convex hull of points) Compile list of unique taxa Generate output data and visualizations for taxa frequencies in stated range Data, visualizations, and metadata FOR counts Calculate taxa frequencies Workflow diagrams: a simple example
21
Analysis and Workflows false true A B Data ingestion Data and metadata Data cleaning Analysis 1B Analysis 1A Output generation Data, visualizations, and metadata Data and metadata FOR Predefined process (subroutine) Predefined process (subroutine) Data ingestion Data cleaning Data integration Data and metadata Data ingestion Data cleaning IF Analysis 2 Data integration Workflow diagrams: a complex example
22
Analysis and Workflows # % $ & Commented scripts: best practices Well-documented code is easier to review, share, enables repeated analysis Add high-level information at the top o Project description, author, date o Script dependencies, inputs, and outputs o Describes parameters and their origins Notice and organize sections o What happens in the section and why o Describe dependencies, inputs, and outputs Construct “end-to-end” script if possible o A complete narrative o Runs without intervention from start to finish
23
Analysis and Workflows Analytical pipeline Each step can be implemented in different software systems Each step & its parameters/requirements formally recorded Allows reuse of both individual steps and overall workflow CC image by AJ Cann on Flickr
24
Analysis and Workflows Benefits: Single access point for multiple analyses across software packages Keeps track of analysis and provenance: enables reproducibility o Each step & its parameters/requirements formally recorded Workflow can be stored Allows sharing and reuse of individual steps or overall workflow o Automate repetitive tasks o Use across different disciplines and groups o Can run analyses more quickly since not starting from scratch
25
Analysis and Workflows Example: Kepler Software Open-source, free, cross-platform Drag-and-drop interface for workflow construction Steps (analyses, manipulations etc) in workflow represented by “actor” Actors connect from a workflow Possible applications o Theoretical models or observational analyses o Hierarchical modeling o Can have nested workflows o Can access data from web-based sources (e.g. databases) Downloads and more information at kepler-project.org
26
Analysis and Workflows Drag & drop components from this list Actors in workflow Example: Kepler Software
27
Analysis and Workflows This model shows the solution to the classic Lotka- Volterra predator prey dynamics model. It uses the Continuous Time domain to solve two coupled differential equations, one that models the predator population and one that models the prey population. The results are plotted as they are calculated showing both population change and a phase diagram of the dynamics. Example: Kepler Software
28
Analysis and Workflows Output Example: Kepler Software
29
Analysis and Workflows Example: VisTrails Open-source Workflow & provenance management support Geared toward exploratory computational tasks o Can manage evolving SWF o Maintains detailed history about steps & data www.vistrails.org Screenshot example
30
Analysis and Workflows Science is becoming more computationally intensive Sharing workflows benefits science o Scientific workflow systems make documenting workflows easier Minimally: document your analysis via informal workflows Emerging workflow applications (formal/executable workflows) will o Link software for executable end-to-end analysis o Provide detailed info about data & analysis o Facilitate re-use & refinement of complex, multi-step analyses o Enable efficient swapping of alternative models & algorithms o Help automate tedious tasks
31
Analysis and Workflows Scientists should document workflows used to create results o Data provenance o Analyses and parameters used o Connections between analyses via inputs and outputs Documentation can be informal (e.g. flowcharts, commented scripts) or formal (e.g. Kepler, VisTrails) CC image geek calendar on Flickr
32
Analysis and Workflows Modern science is computer-intensive o Heterogeneous data, analyses, software Reproducibility is important Workflows = process metadata Use of informal or formal workflows for documenting process metadata ensures reproducibility, repeatability, validation
33
Analysis and Workflows 1. W. Michener and J. Brunt, Eds. Ecological Data: Design, Management and Processing. (Blackwell, New York, 2000).
34
Analysis and Workflows The full slide deck may be downloaded from: http://www.dataone.org/education-modules Suggested citation: DataONE Education Module: Analysis and Workflows. DataONE. Retrieved Nov12, 2012. From http://www.dataone.org/sites/all/documents/L10_Analysis Workflows.pptx Copyright license information: No rights reserved; you may enhance and reuse for your own purposes. We do ask that you provide appropriate citation and attribution to DataONE.
Similar presentations
© 2024 SlidePlayer.com Inc.
All rights reserved.