
Interacting with the REDCap API using the REDCapR Package Thomas Wilson, Will Beasley, David Bard University of Oklahoma Health Sciences Center Pediatrics.


1 Interacting with the REDCap API using the REDCapR Package Thomas Wilson, Will Beasley, David Bard University of Oklahoma Health Sciences Center Pediatrics Dept, Biomedical & Behavioral Methodology Core (BBMC) REDCap Con Sept 23, 2014

2 Accessing REDCap Data
1. Manual Import & Export (e.g., through CSV files)
   – Requires human interaction every time.
2. Dynamic Data Pull (pulls data from an external system)
3. REDCap’s API (application programming interface)
   – The API lets programs (e.g., R, SAS, Python) interact with REDCap directly, without a human in the loop.
4. REDCapR (an R library) calls REDCap’s API
   – Provides functions that wrap around calls to the API.
   – Write 1 line of R code instead of ~40 lines.

3 Python Interaction Packages
1. PyCap: "PyCap is an interface to the REDCap Application Programming Interface (API). PyCap is designed to be a minimal interface exposing all required and optional API parameters." (sburns.org/PyCap)
2. django-redcap: "Utilities for porting REDCap projects to and from Django models." (github.com/cbmi/django-redcap)

4 R Interaction Packages
1. redcapAPI: the most active fork of Jeffrey Horner’s ‘redcap’ package, now developed by Benjamin Nutter. A complete list of forks can be found through GitHub. (github.com/nutterb/redcapAPI)
2. REDCapR: a similar package that also streamlines API calls from R to REDCap. (github.com/OuhscBbmc/REDCapR)

5 Required pieces of information for API
1. The URL of the REDCap server.
2. A “token”, which is a hash that combines:
   – The specific REDCap project (within the REDCap server).
   – The specific user.
   – The user’s password.

6 Security
Could spend 4 hours discussing security details.
– Consult REDCap IT staff and/or our team.
Use a private GitHub repository (free for academics).
Be careful with REDCap tokens (i.e., passwords).
Get PHI into REDCap & SQL as early as possible.
– We regularly receive CSVs & XLSXs from partners.
– DB files aren’t accidentally copied or emailed.
– And try to store derivative datasets in REDCap & SQL instead of on the file server.
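
One common way to keep a token out of a script (and out of a repository) is to read it from an environment variable. The following is a minimal sketch in base R, not a REDCapR feature; the helper name `get_token_from_env` and the variable name `REDCAP_API_TOKEN` are assumptions made for illustration.

```r
# Hypothetical helper: read a REDCap token from an environment variable
# instead of hard-coding it in a script. The variable name is an assumption.
get_token_from_env <- function(var_name = "REDCAP_API_TOKEN") {
  token <- Sys.getenv(var_name, unset = "")
  if (!nzchar(token)) {
    stop("The environment variable `", var_name, "` is not set.")
  }
  token
}

# Demonstration with a fake value (set here only so the example runs).
Sys.setenv(REDCAP_API_TOKEN = "FAKE-TOKEN-FOR-DEMO")
token <- get_token_from_env()
```

In real use the variable would be set outside the script (for example in the user's environment), so the token never appears in code that is shared.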

7 R is a free software environment for statistical computing and graphics. R compiles and runs on a wide variety of UNIX platforms, Windows and MacOS. For more information: About R

8 REDCapR
What is this REDCapR you speak of?
REDCapR is an R package developed to streamline API calls from R to REDCap by wrapping the API's operations in R functions.

9 REDCapR History
“Necessity is the mother of invention” (English proverb)
REDCapR was born out of the necessity of breaking one large data call from REDCap, which is prone to timing out, into multiple small calls to REDCap.
– From the user perspective, the data call has the look of one call.
– From REDCap’s perspective, the data call is multiple smaller calls that are later assembled.
Created to help with the Maternal Infant and Early Childhood Home Visiting (MIECHV) evaluation.

10 Motivation for REDCapR
Our current MIECHV investigation uses two REDCap projects:
– Recruiting: 84,000 records and 204 fields (17 million EAV rows)
– Community Survey: 1,500 records and 2,330 fields (3.5 million EAV rows)
We were constantly timing out the operations and tying up the server. Timeouts make things unpredictable and amount to a repeated, unnecessary denial of service (DoS) against our own people.

11 REDCapR Installation
### Read the short intro at
#   https://github.com/OuhscBbmc/REDCapR

### Choice 1: Install the stable version from CRAN
install.packages("REDCapR")

### Choice 2: Or install the development version from GitHub
install.packages("devtools")
devtools::install_github(repo="OuhscBbmc/REDCapR")

### Load the 'REDCapR' package into R's memory
#   so the functions are more easily accessible.
library(REDCapR)

12 REDCapR Functions
– create_batch_glossary
– redcap_column_sanitize
– redcap_download_file_oneshot
– redcap_project
– redcap_read
– redcap_read_oneshot
– redcap_upload_file_oneshot
– redcap_write
– redcap_write_oneshot
– retrieve_token
– validate_for_write

13 REDCapR Data Extraction
Data extraction:
– redcap_read_oneshot: reads/exports records from a REDCap project in a single call.
– redcap_read: reads/exports records from a REDCap project in subsets, and stacks them together before returning a data.frame.

14 redcap_read: Usage
### Sample Code
redcap_read(batch_size = 100L, interbatch_delay = 0.5, redcap_uri, token,
  records = NULL, records_collapsed = NULL, fields = NULL, fields_collapsed = NULL,
  export_data_access_groups = FALSE, raw_or_label = "raw", verbose = TRUE,
  cert_location = NULL)
Several arguments of the redcap_read function will be discussed; note that not all arguments are required. Because redcap_uri and token are not the first two arguments, pass them by name. The function can be used with a statement as simple as:
redcap_read(redcap_uri = uri, token = token)

15 redcap_read: batch_size
The maximum number of subject records a single batch should contain. The default is 100.

16 redcap_read: interbatch_delay
The number of seconds the function will wait before requesting a new subset from REDCap. The default is 0.5 seconds.

17 redcap_read: redcap_uri
The URI of the REDCap project. Required.
Note: In computing, a uniform resource identifier (URI) is a string of characters used to identify a name of a web resource. Such identification enables interaction with representations of the web resource over a network (typically the World Wide Web) using specific protocols.

18 redcap_read: token
The user-specific string that serves as the password for a project. Required.

19 redcap_read: records
A vector, where each element corresponds to the ID of a desired record. Optional.

20 redcap_read: fields
A vector, where each element corresponds to a desired project field. Optional.

21 redcap_read: export_data_access_groups
A boolean value that specifies whether or not to export the “redcap_data_access_group” field when data access groups are utilized in the project. Default is FALSE.

22 redcap_read: raw_or_label
A string (either ‘raw’ or ‘label’) that specifies whether to export the raw coded values or the labels for the options of multiple choice fields. Default is ‘raw’.

23 redcap_read: cert_location
If present, this string should point to the location of cert files required for SSL verification. If the value is missing or NULL, the server’s identity will be verified using a recent CA bundle from the cURL website. Optional.
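
To show how several of these arguments combine, here is a hedged sketch of a call that requests only a few records and fields. The wrapper name `read_subset` and the chosen record IDs and field names are illustrative assumptions; the argument names follow the signature shown on the slides and may differ in other releases of the package.

```r
# Hypothetical wrapper: request only selected records and fields, in batches.
# Argument names follow the redcap_read signature shown on the slides.
read_subset <- function(uri, token) {
  REDCapR::redcap_read(
    batch_size       = 50L,     # At most 50 records per request.
    interbatch_delay = 1,       # Wait 1 second between requests.
    redcap_uri       = uri,
    token            = token,
    records          = c(1, 3, 5),                           # Illustrative IDs.
    fields           = c("record_id", "first_name", "age"),  # Illustrative fields.
    raw_or_label     = "label", # Export labels instead of raw codes.
    verbose          = FALSE
  )
}

# Not run here, because it needs a live server and a valid token:
# result <- read_subset(uri = "https://bbmc.ouhsc.edu/redcap/api/", token = token)
# ds     <- result$data
```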

24 redcap_read: Details
Specifically, redcap_read internally uses multiple calls to redcap_read_oneshot to select and return data. Initially, only the primary key is queried through the REDCap API. The long list of IDs is then split into partitions, whose sizes are determined by the batch_size parameter. REDCap is then queried for all variables of the subset’s subjects. This is repeated for each subset, before returning a unified data.frame. The function allows a delay between calls, which allows the server to attend to other users’ requests.
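
The batching steps described above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not REDCapR's actual implementation: `split_into_batches`, `read_in_batches`, and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real API call so the example runs without a server.

```r
# Split a vector of record IDs into consecutive batches of at most `batch_size`.
split_into_batches <- function(ids, batch_size = 100L) {
  stopifnot(batch_size >= 1L)
  if (length(ids) == 0L) return(list())
  batch_index <- ceiling(seq_along(ids) / batch_size)
  unname(split(ids, batch_index))
}

# Read every batch with a caller-supplied function and stack the results.
# `fetch_batch` is a hypothetical stand-in for one call to the server.
read_in_batches <- function(ids, fetch_batch, batch_size = 100L, interbatch_delay = 0.5) {
  batches <- split_into_batches(ids, batch_size)
  pieces  <- vector("list", length(batches))
  for (i in seq_along(batches)) {
    if (i > 1L) Sys.sleep(interbatch_delay)  # Pause so the server can attend to other users.
    pieces[[i]] <- fetch_batch(batches[[i]])
  }
  do.call(rbind, pieces)                     # Return one unified data.frame.
}

# Stand-in for the server: returns one row per requested ID.
fake_fetch <- function(batch_ids) {
  data.frame(record_id = batch_ids, age = batch_ids * 2, stringsAsFactors = FALSE)
}

ds <- read_in_batches(ids = 1:7, fetch_batch = fake_fetch, batch_size = 3L, interbatch_delay = 0)
```

With seven IDs and a batch size of 3, the stand-in is called three times (for 3, 3, and 1 records), and the caller still receives a single data.frame.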

25 REDCapR Data Import
Data import:
– redcap_write_oneshot: writes data to REDCap all at once.
– redcap_write: writes data to REDCap in subsets.

26 redcap_write: Usage
### Sample Code
redcap_write(ds_to_write, batch_size = 10L, interbatch_delay = 0.5, redcap_uri, token, verbose = TRUE)
This function shares many arguments with redcap_read. The new argument, ds_to_write, is the R data.frame that will be imported into a REDCap project.

27 Exporting records (less secure)
### Declare the address of the server and
#   your token (ie, hash of project_id, username, password)
uri   <- "https://bbmc.ouhsc.edu/redcap/api/"
token <- "9A C4E5F03428B8AC3AA7B"

### Call the server
result_read <- redcap_read(redcap_uri=uri, token=token)

### Extract the dataset from the results
ds <- result_read$data
ds
record_id first_name age 1 1 Nutmeg Tumtum Marcus Trudy John Lee 58

28 Comparison against Minimal
Minimal version, calling the API directly with RCurl:
### Call the server
rawCsvText <- RCurl::postForm(
  uri     = uri,
  token   = token,
  content = 'record',
  format  = 'csv',
  type    = 'flat',
  .opts   = RCurl::curlOptions(ssl.verifypeer=FALSE)
)
### Convert raw text into a data.frame
ds <- read.csv(text=rawCsvText, stringsAsFactors=FALSE)

REDCapR version:
### Call the server
result <- redcap_read(redcap_uri=uri, token=token)
### Pull out the dataset from the results
ds <- result$data

29 Comparison without batching
redcap_read_oneshot <- function(
  redcap_uri, token,
  records=NULL, records_collapsed="",
  fields=NULL, fields_collapsed="",
  export_data_access_groups=FALSE, raw_or_label='raw',
  verbose=TRUE, cert_location=NULL
) {
  start_time <- Sys.time()

  if( missing(redcap_uri) )
    stop("The required parameter `redcap_uri` was missing from the call to `redcap_read_oneshot()`.")
  if( missing(token) )
    stop("The required parameter `token` was missing from the call to `redcap_read_oneshot()`.")

  if( nchar(records_collapsed)==0 )
    records_collapsed <- ifelse(is.null(records), "", paste0(records, collapse=",")) #This is an empty string if `records` is NULL.
  if( nchar(fields_collapsed)==0 )
    fields_collapsed <- ifelse(is.null(fields), "", paste0(fields, collapse=",")) #This is an empty string if `fields` is NULL.

  export_data_access_groups_string <- ifelse(export_data_access_groups, "true", "false")

  if( missing( cert_location ) || is.null(cert_location) || (length(cert_location)==0) )
    cert_location <- system.file("cacert.pem", package="httr")
  if( !base::file.exists(cert_location) )
    stop(paste0("The file specified by `cert_location`, (", cert_location, ") could not be found."))

  config_options <- list(cainfo=cert_location, sslversion=3)

  post_body <- list(
    token                  = token,
    content                = 'record',
    format                 = 'csv',
    type                   = 'flat',
    rawOrLabel             = raw_or_label,
    exportDataAccessGroups = export_data_access_groups_string,
    records                = records_collapsed,
    fields                 = fields_collapsed
  )

  result <- httr::POST(
    url    = redcap_uri,
    body   = post_body,
    config = config_options
  )

  status_code     <- result$status
  success         <- (status_code==200L)
  raw_text        <- httr::content(result, "text")
  elapsed_seconds <- as.numeric(difftime( Sys.time(), start_time, units="secs"))

  if( success ) {
    try (
      ds <- read.csv(text=raw_text, stringsAsFactors=FALSE), #Convert the raw text to a dataset.
      silent = TRUE #Don't print the warning in the try block. Print it below, where it's under the control of the caller.
    )
    outcome_message <- paste0(
      format(nrow(ds), big.mark=",", scientific=FALSE, trim=TRUE), " records and ",
      format(length(ds), big.mark=",", scientific=FALSE, trim=TRUE), " columns were read from REDCap in ",
      round(elapsed_seconds, 2), " seconds. The http status code was ", status_code, "."
    )
    raw_text <- ""
  } else {
    ds <- data.frame() #Return an empty data.frame
    #outcome_message <- paste0("Reading the REDCap data was not successful. The error message was:\n", geterrmessage())
    outcome_message <- paste0("Reading the REDCap data was not successful. The error message was:\n", raw_text)
  }

  if( verbose )
    message(outcome_message)

  return( list(
    data              = ds,
    success           = success,
    status_code       = status_code,
    # status_message  = status_message,
    outcome_message   = outcome_message,
    records_collapsed = records_collapsed,
    fields_collapsed  = fields_collapsed,
    elapsed_seconds   = elapsed_seconds,
    raw_text          = raw_text
  ) )
}

### Call the server
result <- redcap_read_oneshot(redcap_uri=uri, token=token)
### Pull out the dataset from the results
ds <- result$data

That’s a lot of code to copy for every project. Double this amount of code to batch.

30 Perks of REDCapR (part 1)
1. Batching: makes smaller calls to the server, and combines the results so they appear as if only one call was made.
   – Avoids server timeouts.
   – Can pause between calls, to avoid tying up the server.
2. Translates: resolves differences between the API and R.
   – e.g., R stores IDs as a vector c(10, 20, 30), while the API needs a string "10,20,30".
3. Validates: proactively looks for common mistakes.
   – Helps catch errors sooner.
   – Better error messages, because the check is closer to the error’s source.
4. Subsets: makes it easier to avoid retrieving an entire dataset.
   – Fewer rows.
   – Fewer columns.
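
The translation step in item 2 can be illustrated with a few lines of base R. This is a minimal sketch; `collapse_for_api` is a hypothetical helper name, and the field names are illustrative.

```r
# Collapse an R vector of IDs (or field names) into the comma-separated
# string the API expects. NULL (meaning "no subset requested") becomes "".
collapse_for_api <- function(x) {
  if (is.null(x)) "" else paste0(x, collapse = ",")
}

records_collapsed <- collapse_for_api(c(10, 20, 30))
fields_collapsed  <- collapse_for_api(c("record_id", "first_name", "age"))
no_subset         <- collapse_for_api(NULL)
```

Here `records_collapsed` is the string "10,20,30", the same form used in the vector-versus-string example above.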

31 Perks of REDCapR (part 2)
1. SSL: provides extra transport security, by default.
   – Assumes responsibility for updating certificates.
2. Unit & Integration Tested: 100+ checks before release.
   – Corner cases are being added every month.
3. Wider Adoption: the library is used across multiple projects.
   – More assurances than evolving code that’s copied & pasted.
   – Builds on experience within and between libraries (e.g., the PyCap Python package and the redcap R package).

32 Future Directions
– Attaching data labels to the variable names and values
– Extracting Calendar Events
– Cloning Projects

33 To contribute
https://github.com/OuhscBbmc/REDCapR
Contributors: William H. Beasley, David E. Bard, Thomas N. Wilson, John J. Aponte, Rollie Parrish, Benjamin Nutter, Andrew R. Peters

34 Thanks to Funders HRSA/ACF D89MC23154 OUHSC CCAN Independent Evaluation of the State of Oklahoma Competitive Maternal, Infant, and Early Childhood Home Visiting (MIECHV) Project. Evaluates MIECHV expansion and enhancement of Evidence-based Home Visitation programs in four Oklahoma counties.

