Interacting with the REDCap API using the REDCapR Package

Thomas Wilson, Will Beasley, David Bard
University of Oklahoma Health Sciences Center, Pediatrics Dept, Biomedical & Behavioral Methodology Core (BBMC)
REDCap Con, Sept 23, 2014
Good morning. My name is Thomas Wilson. Today I am going to discuss using an R package, REDCapR, to interact with REDCap’s API. In the previous presentation, Will discussed an architecture design for literate programming patterns and practices. At OUHSC, we build this architecture from multiple components, among them REDCap, the R programming language, and the REDCapR package. Although this structure is what we use, the components can also be employed as standalone pieces. However, I think you will see the advantage of combining these pieces in your research practices.

Accessing REDCap Data
Manual Import & Export (eg, through CSV files): requires human interaction every time.
Dynamic Data Pull (pull data from an external system)
REDCap’s API (application programming interface): allows programs (e.g., R, SAS, Python) to interact with REDCap directly, without a human in the loop.
REDCapR (an R library) calls REDCap’s API: provides functions that wrap around calls to the API. Write 1 line of R code instead of ~40 lines.
This crowd is very familiar with the different methods of accessing your data in REDCap. In the early stages of our REDCap use, we used the data import and export tools for our data needs. We then graduated to utilizing REDCap’s API. As a result of the data needs of our inaugural REDCap project, we developed the REDCapR package for R.

Python Interaction Packages
PyCap: "PyCap is an interface to the REDCap Application Programming Interface (API). PyCap is designed to be a minimal interface exposing all required and optional API parameters." (sburns.org/PyCap)
django-redcap: "Utilities for porting REDCap projects to and from Django models." (github.com/cbmi/django-redcap)
While our obvious preference among REDCap interaction packages is REDCapR, it is not the only REDCap API interaction package. PyCap and django-redcap are a couple of packages that use Python to interact with REDCap.

R Interaction Packages
redcapAPI: This is the most active fork of Jeffrey Horner’s ‘redcap’ package, now developed by Benjamin Nutter. A complete list of forks can be found through GitHub. (github.com/nutterb/redcapAPI)
REDCapR: a similar package that also streamlines API calls from R to REDCap. (github.com/OuhscBbmc/REDCapR)
Two packages are available for interacting with REDCap from R: the redcapAPI package, which expands upon Jeffrey Horner’s redcap R package, and our REDCapR package.

Required pieces of information for API
The URL of the REDCap server.
A “token”, which is a hash that combines:
  The specific REDCap project (within the REDCap server).
  The specific user.
  The user’s password.
For those not familiar with REDCap’s API, there are 2 required pieces of information when making an API call: the URL of your institution’s REDCap server and a token. Tokens are project- and user-specific, and they are created within REDCap. Multiple users on the same project will require multiple tokens.

Security
Could spend 4 hours discussing security details.
Consult REDCap IT staff and/or our team.
Use a private GitHub repository. (free for academics)
Be careful with REDCap tokens. (ie, passwords)
Get PHI into REDCap & SQL as early as possible. We regularly receive CSVs & XLSXs from partners. DB files aren’t accidentally copied or emailed.
And try to store derivative datasets in REDCap & SQL instead of on the file server.
Our security practices take a layered approach. We store our URI and tokens remotely in a SQL database. When making a call to REDCap’s API from R, we make a remote ODBC call to this SQL database to retrieve our URI and token. This keeps the URI and token from being a visible part of our R API code.
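As a lighter-weight illustration of the same idea (keeping the URI and token out of the script itself), here is a minimal sketch that reads them from environment variables. This is not the SQL/ODBC approach described above; it swaps in environment variables purely for illustration. The helper name and the variable names REDCAP_URI and REDCAP_TOKEN are assumptions, not part of REDCapR.

```r
# Hypothetical helper (not part of REDCapR): read the REDCap URI and token
# from environment variables so they never appear in the analysis script.
# The variable names REDCAP_URI and REDCAP_TOKEN are illustrative assumptions.
read_redcap_credentials <- function(uri_var = "REDCAP_URI", token_var = "REDCAP_TOKEN") {
  uri   <- Sys.getenv(uri_var,   unset = NA_character_)
  token <- Sys.getenv(token_var, unset = NA_character_)

  # Fail early, with a message that names the missing variable.
  if (is.na(uri) || !nzchar(uri))
    stop("Environment variable `", uri_var, "` is not set.")
  if (is.na(token) || !nzchar(token))
    stop("Environment variable `", token_var, "` is not set.")

  list(redcap_uri = uri, token = token)
}
```

The returned list could then be passed along as redcap_read(redcap_uri = cred$redcap_uri, token = cred$token), where cred is the value returned by the helper.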

About R
R is a free software environment for statistical computing and graphics. R compiles and runs on a wide variety of UNIX platforms, Windows and MacOS. For more information:
As you may have surmised by now, we use R to interact with REDCap via the API. If you’re not familiar with R, a multitude of online tutorials and resources are available to get you started enjoying the wonderful world of R computing. We use RStudio and Eclipse as interfaces for R; both make R much more enjoyable to use.

REDCapR What is this REDCapR you speak of?
REDCapR is an R package developed to streamline API calls from R to REDCap by encapsulating various functions. So, what exactly is REDCapR? Simply put, it is an R package that was developed to make life easier for us when making API calls from R to REDCap.

REDCapR History
“Necessity is the mother of invention” (English proverb)
REDCapR was born out of the necessity of breaking one large data call to REDCap, which is prone to timing out, into multiple small calls. From the user’s perspective, it looks like one call. From REDCap’s perspective, it is multiple smaller calls whose results are later assembled.
Created to help with the Maternal Infant and Early Childhood Home Visiting (MIECHV) evaluation.
On our campus, we were the first to begin using REDCap. It was specifically chosen to be the data collection tool for our Maternal Infant and Early Childhood Home Visiting (MIECHV) evaluation. The MIECHV evaluation includes 6 benchmarks with multiple constructs for each benchmark, 5 aims with multiple subaims, and a continuous quality improvement component. As we progressed in this project, the amount of replication and the volume of R code needed for our reporting requirements became cumbersome. As a result, developing REDCapR became a priority, to encapsulate much of what we were doing.

Motivation for REDCapR
Our current MIECHV investigation uses two REDCap projects:
  Recruiting: 84,000 records and 204 fields (17 million EAV rows)
  Community Survey: 1,500 records and 2,330 fields (3.5 million EAV rows)
We were constantly timing out the operations and tying up the server. Timeouts make things unpredictable and unnecessarily DoS our own people repeatedly.
To give a little more perspective about our REDCap use on the MIECHV project, we employ two separate REDCap projects. The first is a recruiting project. In this project, we receive quarterly data files of recruiting records from two state agencies. These files are imported into REDCap for use by our data collectors. Currently, we have more than 84,000 records and 204 fields in this project. Individuals who agree to participate complete a community survey, which is our second REDCap project. To date, we have more than 1500 records with 2300 fields in our community survey. With that much data, whenever we performed an API operation we encountered multiple difficulties: our import/export operations often timed out and, more importantly, we were often causing our data collectors a “denial of service” while they were conducting interviews. We have 7 data collectors operating in 4 counties across the state. They conduct remote interviews 6 and sometimes 7 days a week, with interview schedules ranging from 8:00 am to 8:00 pm. Our data collectors conduct interviews in participants’ homes with the interview times ranging from minutes. Because data collection is vital to our research, we needed to ensure that our reporting did NOT disrupt the data collection process.
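The EAV (entity-attribute-value) row counts quoted above can be reproduced with simple arithmetic. The sketch below assumes one EAV row per populated field per record and that every field is populated for every record, so it gives an upper bound rather than an exact count; the function name is illustrative.

```r
# Rough upper bound on EAV rows: one row per (record, field) pair.
# Assumes every field is populated for every record.
eav_rows <- function(records, fields) {
  records * fields
}

recruiting <- eav_rows(records = 84000, fields = 204)   # about 17 million
community  <- eav_rows(records = 1500,  fields = 2330)  # about 3.5 million

format(c(recruiting = recruiting, community = community), big.mark = ",")
```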

REDCapR Installation
### Read short intro at #
### Choice 1: Either install the stable version from CRAN
install.packages("REDCapR")

### Choice 2: Or install the development version from GitHub
install.packages("devtools")
devtools::install_github(repo="OuhscBbmc/REDCapR")

### Load the 'REDCapR' package into R's memory
# so the functions are more easily accessible.
library(REDCapR)

To state the obvious, before you can use REDCapR you first need to install it. A short intro is available on our GitHub site. Installing the development version from GitHub involves three steps. First, install the devtools package. In addition to being a useful package, devtools is necessary for installing REDCapR from GitHub. Second, install the REDCapR package using devtools. Last, but not least, load REDCapR into R’s memory. Just a note of clarification: the code above is R code only. It is NOT something that is embedded into another language or environment.

REDCapR Functions
  create_batch_glossary
  redcap_column_sanitize
  redcap_download_file_oneshot
  redcap_project
  redcap_read
  redcap_read_oneshot
  redcap_upload_file_oneshot
  redcap_write
  redcap_write_oneshot
  retrieve_token
  validate_for_write
This is the exhaustive list of the functions that REDCapR makes available to you. While I know everybody would love to go into the intricate details of each function, we don’t have time for that. For this presentation, I will discuss redcap_read (data export) in detail and then briefly parallel redcap_write (data import).

REDCapR Data Extraction
Data extraction:
  redcap_read_oneshot: Read/export records from a REDCap project.
  redcap_read: Read/export records from a REDCap project in subsets, and stack them together before returning a data.frame.
REDCapR offers two functions for data extraction. The first is redcap_read_oneshot. This function reads records from a project. The second is redcap_read, which pulls records from a REDCap project in subsets. It then stacks them together and returns a data.frame. In R, a data.frame is analogous to a spreadsheet.

redcap_read Usage
### Sample Code
redcap_read(batch_size = 100L, interbatch_delay = 0.5,
  redcap_uri, token,
  records = NULL, records_collapsed = NULL,
  fields = NULL, fields_collapsed = NULL,
  export_data_access_groups = FALSE,
  raw_or_label = "raw", verbose = TRUE,
  cert_location = NULL)

Several arguments of the redcap_read function will be discussed; however, not all arguments are required. This function can be used with a statement as simple as:
redcap_read(redcap_uri, token)

redcap_read has several arguments. I won’t be discussing every argument, but rather I will hit some of what we believe are the highlights. Even though redcap_read has multiple arguments, only two are required: redcap_uri and token.

redcap_read Usage
batch_size: The maximum number of subject records a single batch should contain. The default is 100.
The batch_size argument tells REDCapR how many records at a time to extract from a project. These batches are then assembled into one data.frame after all the records have been extracted.

redcap_read Usage
interbatch_delay: The number of seconds the function will wait before requesting a new subset from REDCap. The default is 0.5 seconds.
In other words, interbatch_delay is how long REDCapR pauses between one batch request and the next.
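To get a feel for how batch_size and interbatch_delay interact, the following sketch estimates how many batches a read would need and how much waiting time the delays add. It is a back-of-the-envelope upper bound (it counts one delay per batch, which may be slightly more than the package actually waits), and the function name is an illustrative assumption rather than part of REDCapR.

```r
# Estimate the number of batches and the waiting time added by the delays.
# Counts one delay per batch, so the waiting time is an upper bound.
batch_overhead <- function(n_records, batch_size = 100L, interbatch_delay = 0.5) {
  n_batches <- ceiling(n_records / batch_size)
  list(
    n_batches           = n_batches,
    total_delay_seconds = n_batches * interbatch_delay
  )
}

# The recruiting project described earlier: 84,000 records, default settings.
overhead <- batch_overhead(n_records = 84000)
overhead$n_batches            # 840 batches
overhead$total_delay_seconds  # 420 seconds of deliberate waiting
```

A larger batch_size means fewer requests but a heavier load per request; a longer interbatch_delay is gentler on other users at the cost of a slower export.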

redcap_read Usage
redcap_uri: The URI of the REDCap project. Required.
Note: In computing, a uniform resource identifier (URI) is a string of characters used to identify a name of a web resource. Such identification enables interaction with representations of the web resource over a network (typically the World Wide Web) using specific protocols.
The redcap_uri argument refers to your institution’s specific REDCap URL. This is a required argument.

redcap_read Usage
token: The user-specific string that serves as the password for a project. Required.
The token is the REDCap-generated, project-specific, user-specific string. This is a required argument.

redcap_read Usage
records: An array, where each element corresponds to the ID of a desired record. Optional.
You can pull specific records from a project using the records argument. REDCapR can pull one specific study id or a list of study ids.

redcap_read Usage
fields: An array, where each element corresponds to a desired project field. Optional.
Similar to the records argument, REDCapR can also pull a specific field or fields.
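The records and fields arguments take ordinary R vectors, while the API itself expects comma-separated strings (the role of the records_collapsed and fields_collapsed arguments). A minimal sketch of that translation follows; the helper name and the example ids and field names are illustrative, and the final call is commented out because it needs a live REDCap server.

```r
# Hypothetical helper mirroring the vector-to-string translation:
# NULL becomes an empty string, otherwise elements are joined by commas.
collapse_for_api <- function(x) {
  if (is.null(x)) "" else paste0(x, collapse = ",")
}

desired_records <- c(10, 20, 30)
desired_fields  <- c("record_id", "first_name", "age")

records_collapsed <- collapse_for_api(desired_records)  # "10,20,30"
fields_collapsed  <- collapse_for_api(desired_fields)   # "record_id,first_name,age"

# With a real server and token, the vectors can be passed directly:
# result <- REDCapR::redcap_read(
#   redcap_uri = uri, token = token,
#   records = desired_records, fields = desired_fields
# )
```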

redcap_read Usage
export_data_access_groups: A boolean value that specifies whether or not to export the “redcap_data_access_group” field when data access groups are utilized in the project. Default is FALSE.
Self-explanatory.

redcap_read Usage
raw_or_label: A string (either ‘raw’ or ‘label’) that specifies whether to export the raw coded values or the labels for the options of multiple choice fields. Default is ‘raw’.
Again, self-explanatory.

redcap_read Usage
cert_location: If present, this string should point to the location of cert files required for SSL verification. If the value is missing or NULL, the server’s identity will be verified using a recent CA bundle from the cURL website. Optional.
The cert_location argument, if present, points to the location of the cert files that are required for SSL verification.

redcap_read Details
Specifically, redcap_read internally uses multiple calls to redcap_read_oneshot to select and return data. Initially, only the primary key is queried through the REDCap API. The long list is then subset into partitions, whose sizes are determined by the batch_size parameter. REDCap is then queried for all variables of the subset’s subjects. This is repeated for each subset, before returning a unified data.frame. The function allows a delay between calls, which allows the server to attend to other users’ requests.
Behind the scenes, redcap_read makes multiple calls to redcap_read_oneshot. It initially queries only the primary key (study id) and creates a list that is subset into partitions. The partition size is determined by the batch_size argument. Once this is done, REDCap is queried for all variables of the subjects included in each subset. REDCapR then does a technological version of “lather, rinse, repeat” for each subset until the entire dataset has been exported.
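The batching steps described above can be sketched in a few lines of base R. This is an illustration of the general pattern, not REDCapR’s actual implementation: the function read_in_batches and its fetch_batch argument are hypothetical, and a small in-memory data.frame stands in for the REDCap server so the example runs without a network connection.

```r
# Sketch of the batching pattern: partition the ids, fetch each partition,
# pause between requests, then stack the pieces into one data.frame.
# `fetch_batch` is a stand-in for a per-batch call such as redcap_read_oneshot.
read_in_batches <- function(ids, fetch_batch, batch_size = 100L, interbatch_delay = 0.5) {
  batch_index <- ceiling(seq_along(ids) / batch_size)  # 1,1,...,2,2,...
  id_batches  <- split(ids, batch_index)

  pieces <- vector("list", length(id_batches))
  for (i in seq_along(id_batches)) {
    pieces[[i]] <- fetch_batch(id_batches[[i]])
    if (i < length(id_batches)) Sys.sleep(interbatch_delay)  # let the server breathe
  }
  do.call(rbind, pieces)
}

# Stand-in for a REDCap project (made-up data, for illustration only).
fake_project <- data.frame(record_id = 1:250, age = rep(c(10, 11, 79, 61, 58), 50))
fake_fetch   <- function(batch_ids) {
  fake_project[fake_project$record_id %in% batch_ids, ]
}

ds <- read_in_batches(fake_project$record_id, fake_fetch,
                      batch_size = 100L, interbatch_delay = 0)
nrow(ds)  # 250 rows, assembled from batches of 100, 100, and 50
```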

REDCapR Data Import
Data import:
  redcap_write_oneshot: writes data to REDCap all at once
  redcap_write: writes data to REDCap in subsets
Similar to the data extraction functions, REDCapR offers two functions for data import: redcap_write_oneshot and redcap_write.

redcap_write Usage
### Sample Code
redcap_write(ds_to_write,
  batch_size = 10L, interbatch_delay = 0.5,
  redcap_uri, token, verbose = TRUE)

This function shares many arguments with redcap_read. The new argument, ds_to_write, is the R data.frame that is going to be imported into a REDCap project.
redcap_write is very similar to redcap_read. It does include a new argument, ds_to_write, which is the R data.frame that you want to import into a REDCap project.

Exporting records (less secure)
### Declare the address of the server and
# your token (ie, hash of project_id, username, password)
uri <- "
token <- "9A C4E5F03428B8AC3AA7B"

### Call the server
result_read <- redcap_read(redcap_uri=uri, token=token)

### Extract the dataset from the results
ds <- result_read$data
ds

This is a very basic example of a REDCap API data export using REDCapR. Please note: this example does not use the layers of security that we utilize in our daily practice, where the uri and token variables are pulled from a SQL database and are not a visible part of our R code. None of the examples presented today utilize our standard security procedures. We wanted the focus to be on REDCapR rather than security procedures. Also, nobody in the audience has received our official double-secret security clearance. The first steps in this API data export are creating your URI and token variables. These variables are passed into the redcap_read function below them. redcap_read calls your server. After that, you extract your dataset and can then view what is in your data.frame.

record_id  first_name  age
           Nutmeg      10
           Tumtum      11
           Marcus      79
           Trudy       61
           John Lee    58

Comparison against Minimal
With REDCapR:
### Call the server
result <- redcap_read(redcap_uri=uri, token=token)

### Pull out the dataset from the results
ds <- result$data

Without REDCapR:
### Call the server
rawCsvText <- RCurl::postForm(
  uri = uri,
  token = token,
  content = 'record',
  format = 'csv',
  type = 'flat',
  .opts = RCurl::curlOptions(ssl.verifypeer=FALSE)
)

### Convert raw text into a data.frame
ds <- read.csv(text=rawCsvText, stringsAsFactors=FALSE)

Here, we compare two methods for a basic API data pull. On the left, we are using REDCapR. On the right is the standard R API extraction code.

Comparison without batching
### Call the server
result <- redcap_read_oneshot(redcap_uri=uri, token=token)

### Pull out the dataset from the results
ds <- result$data

redcap_read_oneshot <- function(
  redcap_uri, token,
  records=NULL, records_collapsed="",
  fields=NULL, fields_collapsed="",
  export_data_access_groups=FALSE, raw_or_label='raw',
  verbose=TRUE, cert_location=NULL
) {
  start_time <- Sys.time()

  if( missing(redcap_uri) )
    stop("The required parameter `redcap_uri` was missing from the call to `redcap_read_oneshot()`.")
  if( missing(token) )
    stop("The required parameter `token` was missing from the call to `redcap_read_oneshot()`.")

  if( nchar(records_collapsed)==0 )
    records_collapsed <- ifelse(is.null(records), "", paste0(records, collapse=",")) #This is an empty string if `records` is NULL.
  if( nchar(fields_collapsed)==0 )
    fields_collapsed <- ifelse(is.null(fields), "", paste0(fields, collapse=",")) #This is an empty string if `fields` is NULL.

  export_data_access_groups_string <- ifelse(export_data_access_groups, "true", "false")

  if( missing( cert_location ) | is.null(cert_location) | (length(cert_location)==0))
    cert_location <- system.file("cacert.pem", package="httr")
  if( !base::file.exists(cert_location) )
    stop(paste0("The file specified by `cert_location`, (", cert_location, ") could not be found."))

  config_options <- list(cainfo=cert_location, sslversion=3)

  post_body <- list(
    token = token,
    content = 'record',
    format = 'csv',
    type = 'flat',
    rawOrLabel = raw_or_label,
    exportDataAccessGroups = export_data_access_groups_string,
    records = records_collapsed,
    fields = fields_collapsed
  )

  result <- httr::POST(
    url = redcap_uri,
    body = post_body,
    config = config_options
  )

  status_code <- result$status
  success <- (status_code==200L)
  raw_text <- httr::content(result, "text")
  elapsed_seconds <- as.numeric(difftime( Sys.time(), start_time, units="secs"))

  if( success ) {
    try (
      ds <- read.csv(text=raw_text, stringsAsFactors=FALSE), #Convert the raw text to a dataset.
      silent = TRUE #Don't print the warning in the try block. Print it below, where it's under the control of the caller.
    )
    outcome_message <- paste0(format(nrow(ds), big.mark=",", scientific=FALSE, trim=TRUE),
                              " records and ",
                              format(length(ds), big.mark=",", scientific=FALSE, trim=TRUE),
                              " columns were read from REDCap in ",
                              round(elapsed_seconds, 2), " seconds. The http status code was ",
                              status_code, ".")
    raw_text <- ""
  } else {
    ds <- data.frame() #Return an empty data.frame
    #outcome_message <- paste0("Reading the REDCap data was not successful. The error message was:\n", geterrmessage())
    outcome_message <- paste0("Reading the REDCap data was not successful. The error message was:\n", raw_text)
  }

  if( verbose )
    message(outcome_message)

  return( list(
    data = ds,
    success = success,
    status_code = status_code,
    # status_message = status_message,
    outcome_message = outcome_message,
    records_collapsed = records_collapsed,
    fields_collapsed = fields_collapsed,
    elapsed_seconds = elapsed_seconds,
    raw_text = raw_text
  ) )
}

Here we present a not-so-basic API data pull. The code on the right shows what is going on behind the curtain for the code on the left. Take a moment and let that sink in. Moment over. This code does NOT include a batching component. Batching would roughly double the amount of code. That’s a lot of code to copy for every project.
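Because the function above returns a list rather than a bare data.frame, calling code can check the success element before touching the data. The following is a minimal sketch of that check. The helper extract_data_or_stop is hypothetical (not part of REDCapR), and the two mock result lists are made up so the example runs without a server; their element names follow the list returned above.

```r
# Hypothetical helper: return the dataset if the read succeeded,
# otherwise stop with the status code and the outcome message.
extract_data_or_stop <- function(result) {
  if (!isTRUE(result$success)) {
    stop("REDCap read failed (status ", result$status_code, "): ",
         result$outcome_message)
  }
  result$data
}

# Mock results shaped like the returned list (made-up values).
mock_success <- list(
  data = data.frame(record_id = 1:2, age = c(10, 11)),
  success = TRUE, status_code = 200L,
  outcome_message = "2 records and 2 columns were read from REDCap."
)
mock_failure <- list(
  data = data.frame(),
  success = FALSE, status_code = 403L,
  outcome_message = "Reading the REDCap data was not successful."
)

ds_ok   <- extract_data_or_stop(mock_success)
attempt <- try(extract_data_or_stop(mock_failure), silent = TRUE)
inherits(attempt, "try-error")  # TRUE: the failure is surfaced, not ignored
```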

Perks of REDCapR (part 1)
Batching: makes smaller calls to the server and combines the results so they appear as if only one call was made. Avoids server timeouts. Can pause between calls to avoid tying up the server.
Translates: resolves differences between the API and R. eg, R stores IDs as a vector c(10, 20, 30), while the API needs a string "10,20,30".
Validates: proactively looks for common mistakes. Helps catch errors sooner. Better error messages b/c it’s closer to the error’s source.
Subsets: makes it easier to avoid retrieving an entire dataset. Fewer rows. Fewer columns.
Last week, I did a quick search of the Google Groups for export timeout issues. Since July, approximately 15 issues have been reported. Similarly, I looked for import timeout issues; in the same time frame, approximately 8 issues were reported. Batching avoids these server timeouts. REDCapR helps with validation by being proactive, and it returns better error messages because it is closer to the error’s source. Examples include catching an upload of a nonexistent file or a download that would overwrite an existing file; when uploading, it checks that the file exists on your computer rather than uploading an empty file. The subsetting component allows you to filter on rows and columns, which makes your data pulls more efficient. For example, I pulled data from a REDCap project with 7100 records. The first comparison pulled all records using a batch_size of It took R 839 seconds to complete this task. I then pulled all records, but only 4 of the variables, with the same batch size. This took 94 seconds to complete. The 745-second time savings could be spent updating your Facebook status or looking at the latest gluten-free recipes on Pinterest.

Perks of REDCapR (part 2)
SSL: provides extra transport security, by default. Assumes responsibility for updating certificates.
Unit & Integration Tested: 100+ checks before release. Corner cases are being added every month.
Wider Adoption: Library is used across multiple projects. More assurances than evolving code that’s copy & pasted. Builds on experience within and between libraries (eg, PyCap Python package and redcap R package).
Some additional perks of REDCapR include the SSL security and testing. REDCapR undergoes more than 100 checks before release. These checks include using our test environment to remove all data from a REDCap project and then writing that data back into the project. Afterwards, the data is exported from the project and compared against what we expected to see in that project. You can also use the library across multiple projects. This is an improvement over copy, paste, adapt when you have a new project.

Future Directions
  Attaching data labels to the variable names and values
  Extracting Calendar Events
  Cloning Projects
These are a few of the future directions we have in mind for REDCapR.

https://github.com/OuhscBbmc/REDCapR
To contribute
Contributors: William H. Beasley, David E. Bard, Thomas N. Wilson, John J. Aponte, Rollie Parrish, Benjamin Nutter, Andrew R. Peters
We welcome your input and collaboration. If you have ideas or suggestions for the future of REDCapR and would like to contribute to the project, check us out on GitHub!

Thanks to Funders HRSA/ACF D89MC23154 OUHSC CCAN Independent Evaluation of the State of Oklahoma Competitive Maternal, Infant, and Early Childhood Home Visiting (MIECHV) Project. Evaluates MIECHV expansion and enhancement of Evidence-based Home Visitation programs in four Oklahoma counties.
