Entering Experimental Data – File Parsers


1 Entering Experimental Data – File Parsers
Core LIMS Training: Entering Experimental Data – File Parsers
Welcome to Core LIMS Training. This chapter will discuss File Parsers: what they are and how they can be used within Platform for Science. This training assumes that you already have a basic understanding of how simple experiments are configured and experimental data is entered. If not, please review earlier lessons on entering experimental data before starting this topic. Please click on a slide whenever you are ready to move to the next page.

2 Topics
Setting up File Parsers
Creating File Entity Types
File attributes
Setting up File Drops
Creating File Jobs
Scheduling Jobs
Job Logs
Associating File Parsers, Experiments, and Jobs
Loading Data into an Experiment
More specifically, the topics within this chapter will begin with a discussion of how to set up file parsers, including how to create File entity types and configure the appropriate file attributes. The lesson will continue with a look at how to set up file drops, create and schedule jobs, and then demonstrate how to view the job log files. We will look at how file parsers, experiments, and jobs are linked together. Finally, we will show how the end user can load raw data into an experiment after all the file parsers are configured.

3 File Parser Concepts
If you have a new raw data instrument file that outputs data in a basic table, you can create a custom file parser for it
The parser defines how to map the data from the file into the database
The parser will be stored as a new file entity type under the File super type
Each file parser will have a physical folder where files will be dropped (can be on server or in cloud)
File jobs will be scheduled to periodically check the folders for new files and process them
The file job will associate any files in the folder to the correct experiment and import the data
Let's begin with a few file parser concepts that are important to note. First, what does a file parser do within the Platform for Science? A file parser enables you to convert data from an external file into a format that can be loaded into the LIMS. When you put the data from a simple experiment into an Excel file in one of the formats that the system recognizes and upload the file, you are using the generic file parser that the system provides out of the box. However, you may have instrument files that output data in a slightly different format and want to create a custom file parser so that the raw instrument file can be read directly. Using raw data files minimizes potential transcription errors by users. If your data instrument file is in a basic table format, you can usually create your own custom file parsers through simple configurations. More complex types of data files will need assistance from Core Informatics. A custom file parser is simply a file entity type that allows you to define where in the database various parts of a raw data instrument file should be put. Custom file parsers are often used with automated data capture experiments that process dose-response curves, but you can also build your own custom file parsers to read raw instrument data for simple experiments. The parser will be stored as a new file entity type under the File super type. Every individual file that needs to be processed in a different way will require its own separate file entity type. File entities are automatically created whenever files are uploaded to the LIMS; they store metadata about the file as well as hyperlinks to stored copies of the files. Each file parser will have a physical folder where files will be saved. This folder can reside either on your server (if you have a private enterprise system hosted on site) or in the cloud. File jobs can then be scheduled to routinely check these folders for new files, and any new files found will be processed. The file job will associate any files in the folder to the correct experiment and import the data.

4 Example Custom Instrument File
Let's say we had a new instrument file that outputs data in this format (as a tab-delimited txt file)
See the Automated Data Capture slide deck if you have files in plate-based format and are calculating dose-response curves
Here is an example of a custom instrument file that we will use for demonstration. In this example, we are not using any of the special file parsers for automated data capture that process dose-response curves. We are just importing a simple, flat instrument file where the key data is in a table that starts on the 3rd line. The actual instrument file is output in tab-delimited txt format; it was imported into Excel here to make it easier to display on the slide.
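To make the layout concrete, here is a minimal sketch of what such a file might contain. The columns are shown aligned with spaces for readability, but the actual separator is a tab character, and the instrument metadata, column names, and values are all hypothetical:

    HPLC-01 Export                          (line 1: instrument metadata)
    Run Date: 2018-06-14                    (line 2: run metadata)
    Sample ID    Area     Injection Volume  (line 3: column headers)
    S-0001       15234    10                (line 4: first data row)
    S-0002       9871     10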

5 Create a New File Entity Type
You will need a new File Entity Type for every file drop folder you want
Tip: If the file attributes are the same as another file type, use the Copy button
Before you try to create a custom file parser for an instrument file of this type, you should first check whether this custom file parser, or a similar one, already exists. Navigate to the Main Admin Panel, locate the FILE Super Type, and click on the List All icon to return a list of all existing File Types. As a general rule of thumb, you will need a new file entity type for every different drop folder that you want users to drop files into. For example, you may be using the same instrument with the same file format for multiple experiment types, but you would probably want separate drop folders for each experiment type. If that is the case, you can save configuration time by using the COPY button on a similar file parser that uses the same attributes. If you don't see an existing file parser that you can use, or a similar file parser that you can copy, go to the buttons at the top of the page and select the Create New File Type button to start a new file parser configuration from scratch.

6 Create File Entity Type Page
File types are usually named by instrument type (if you reuse for many experiment types) OR by experiment type
You will NOT be able to edit the name later (only create a new one) – check for typos before submitting!
Enter a barcode prefix and starting sequence
Leave other settings as default
You will now be brought to the Create New File Entity Type page. The file type name field is first. Since this will be an entity type, the name field will not be editable later, so be sure to check for any typos before you submit the page. It is recommended to use all capital letters for consistency. Choose a name that users will easily recognize in a pull-down menu list of all file entity types. File types are usually named for the instrument type if you will use the parser for multiple experiment types, or for the experiment type otherwise. You will also need to enter a barcode prefix and a starting sequence number. You can probably leave most other settings at their defaults. When the form is complete, click on the Create New Entity Type button to continue. This will bring you to the security details page. Set the appropriate Read/Write/Edit access for the access groups shown, and click on Update to save.

7 Example File Attributes
The file type attributes and their default values define how the file will be parsed
This example maps 3 columns of data. Column headers are found in the 3rd line of the file and the data starts on the 4th line of the file. Any current data will be replaced by data from the file.
The file type has been created and can now be further configured. Next, we need to define how the file will be parsed. This is done within the Attributes section of the page: from the Function drop-down menu, select Attributes. The file type attributes and their default values define how the file will be parsed. The main attribute that will need to be added to the newly created file type is the Header Tokens attribute. This will likely have already been created for other file types, so you should be able to simply reuse this attribute by selecting it. Remember to assign a Display and Call Sequence number before clicking on the Update Attribute button to save. The Header Tokens attribute is where you map the column headers in your file to the fields in the database; usually the attributes that you are mapping into are stored on the assay entity type. Before we add the rest of the attributes, let's pause for a minute and look at some of the most commonly used file parser attributes in greater depth. For our instrument file example, here are the final attributes that were configured for this custom file parser. The Header Tokens attribute mapped the 3 columns in the table using the appropriate format. The Force New Experiment Samples Boolean was set to replace any existing samples with the samples in the file if a file is uploaded to an experiment multiple times. The Header Line Number attribute indicates that the column headers start on the 3rd line of the file, and the Data Start Line Number indicates that the data values to be parsed start on the 4th line. These two attributes allow the LIMS to ignore the 2 header rows in the file.
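To summarize, the final parser configuration for this example might look like the following. The LIMS attribute names on the right side of the Header Tokens mappings (SAMPLE_ID, AREA, INJECTION_VOLUME) are hypothetical stand-ins for whatever assay attributes exist in your system:

    Header Tokens                = Sample ID = SAMPLE_ID, Area = AREA, Injection Volume = INJECTION_VOLUME
    Force New Experiment Samples = true
    Header Line Number           = 3
    Data Start Line Number       = 4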

8 Common File Parser Attributes
Header Tokens: map the column headers in your file to fields in Core LIMS (primarily assay attributes). See the Header Tokens slide for details.
FilePath: legacy way to define the location of the drop folder. For most efficient performance, use the FilePath attribute on the File Job entity instead.
Parser: legacy way to define the java class. For most efficient performance, use the Parser attribute on the File Job entity instead.
More Header Tokens: if you need more than 2,000 characters to define your Header Tokens attribute, you can use this attribute for extra space.
File Stop Token: defines a character in the file that tells the parser to stop looking for data after that character is read (for example $). This can be useful to improve performance on long files with unnecessary data.
For your reference, here are some common attributes that are added to file parsers. You are not required to add all of these to your file parser if you do not need them. FilePath should only be used if you are not setting up a file system job to automatically process the files; it is a legacy way to define the name of the folder underneath the file drop folder where the files will be input. It is more efficient, and best practice, to define this on the File Job entity instead, as we will show a little later in the lesson. Parser, just like FilePath, should be used only if you are not setting up a file system job; it is the name of the class file where the code to parse the file is located. More Header Tokens, as mentioned earlier, is used only if you need more than 2,000 characters to define your Header Tokens attribute; it provides space for additional characters. File Stop Token allows you to define a character in the file that tells the parser to stop looking for data AFTER that character is read; for example, a $ can be used. This is helpful as it can improve performance on long files that contain additional unnecessary data.

9 Common File Parser Attributes
File Name Token: adds the file name (without a file extension) to any column name in the file that you want. This can be a useful way to add a container name to file contents.
Force New Experiment Samples: a true/false Boolean used to indicate whether new experiment samples and intermediate data from a file should replace or add to existing samples and data already in an experiment. If the attribute is missing, the default is false, which adds to existing data.
Header Line Number: the populated row line in the file which contains the column headers.
Data Start Line Number: the first populated row line in the file which contains the data.
See the online documentation for additional attributes.
File Name Token will add the file name (without the file extension) to any column name in the file that you want. This is helpful if you want to add container names to file contents. Force New Experiment Samples is a true/false Boolean attribute used to indicate whether new experiment samples and intermediate data from a file should replace or add to the existing samples and data already saved to an experiment. The default value is 'false', which means that if it is not specified, the parser will add to the existing data. Header Line Number is the populated row in the file which contains the column headers, and Data Start Line Number is the first populated row which contains the actual data to import. Other attributes are available for configuration and can be found, with further details, in the online documentation.

10 Header Tokens
This is the primary attribute used for custom parsers to map file data into LIMS fields
Format is: <File Header Name> = <LIMS attribute name>
Use commas to separate multiple mappings
This attribute is limited to 2000 characters – use More Header Tokens if you need additional space
If the file header name is exactly the same as the LIMS attribute name it should map automatically; if the file header name is exactly the same as the LIMS display name and the display name is unique it may also map automatically
You can NOT use special characters (like '/') in the file header names that you are mapping to the LIMS
The Header Tokens attribute is the primary attribute used for custom parsers to map file data into LIMS fields. It uses the format <File Header Name> = <LIMS attribute name>, where File Header Name is the column name in the table in your file. If there are multiple mappings to define, separate each with a comma. In our instrument file example, we needed to map database destinations for the Sample ID, Area, and Injection Volume columns. This attribute is limited to 2,000 characters; if you have a lot of mappings and need more characters, you can use the More Header Tokens attribute. If you keep the file header name exactly the same as the LIMS attribute name, it should map automatically, which can be very helpful and save configuration time. If the file header name is exactly the same as the LIMS display name and the LIMS display name is unique, it should also map automatically. Note that special characters, like '/', cannot be used in the file header names that you are mapping into the LIMS.
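For instance, the Header Tokens value for our three example columns could be written as a single comma-separated string (again with hypothetical LIMS attribute names on the right):

    Sample ID = SAMPLE_ID, Area = AREA, Injection Volume = INJECTION_VOLUME

If the file columns had been named SAMPLE_ID, AREA, and INJECTION_VOLUME in the first place, matching the LIMS attribute names exactly, no explicit mappings would be needed.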

11 File Job
Need to set up a physical folder for where files will be uploaded
Create a File Job record to control the thread that looks into the folder and looks for new files to process
Jobs are slightly different depending on whether on Core server or in the cloud (AWS S3)
Next, you will need to configure a File Job to process the file. You should create a File Job record for every custom file parser that you make. File jobs check a specific folder for the existence of a raw data file. If you have an on-premise private enterprise system, the drop folder may exist on your Core server; if not, you can configure a secure folder in the cloud using Amazon S3. If a raw data file is found in the folder, the job will parse the contents, process the data, and attach the file to the appropriate experiment. To set up the job, you will first need to set up a physical folder where the files will be uploaded. Then you will create a File Job record to control the thread that checks the folder for new files to process. Jobs are slightly different depending on whether they are on the Core server or in the cloud; let's look at examples of each.

12 File Jobs Using Core Server
Admin needs to manually create the main directory on the server that is listed in this attribute on the LIMS record
File type must be SDMS File System Job
Implementation class must be com.coreinformatics.core.job.impl.sdms.FileDaemonJob
Cron String * * * * * = every minute
LIMS will fill out Task ID/Instance ID
LIMS will automatically create the folder you put in the file path
Choose the File Entity Type that you just made
Here is an example of the less common job type, using the Core server. First, navigate to the System Configuration page to check what the Drop Directory Base attribute is set to, and verify that the root folder for file drops shown in that file path actually exists on your server. If it does not, you must manually create the root file drop folder. Jobs on the Core server must be SDMS File System Jobs. Let's navigate to an existing job and look at an example of these job records. Locate the Application Menu in the upper left corner of the page, expand it, and click on the LIMS Administration link. Only administrators have access to configure jobs, so if you are not an administrator, you will not be able to access these. From the LIMS Administration page, locate and expand the Jobs menu item, then locate and expand the SDMS File System Job menu option. Click on LIST ALL to get a list of all of the jobs currently in the LIMS, and open an existing job to see how it is defined. For jobs on the Core server, the job type must be SDMS File System Job, which we selected in our navigation here. The Implementation Class must be set to com.coreinformatics.core.job.impl.sdms.FileDaemonJob. You can define how frequently the job should run with the cron string; we will look at that in more detail in a minute. The most important attribute is the file path. Start the value with the $(dropDirectoryBase) token so the LIMS will look at the Drop Directory Base value you defined on the System Configuration page to find the full file path for the root file drop folder. After that, add a backslash and the name of the specific subfolder that you want users to drop files into. The LIMS should automatically create that folder for you, if it does not already exist, once the job is running. You will also need to enter a file parser implementation class. It is best practice to fill out the file path and file parser attributes here on the job entity and NOT on the file entity type, as defining them at the file entity type will create extra duplicate processes. The File Entity Type attribute is where you enter the custom file entity type you made earlier to be used as the file parser; it should be spelled exactly as it is spelled in the Main Admin Panel. Other attributes such as Task ID and Instance ID will be automatically populated by the system. If you return to this record later, after the job has run, you can also see the job log displayed.
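Putting these settings together, a server-based job record might contain values like the following. The subfolder name and file entity type are examples only, and the parser class shown is the table-format parser listed on the Common Implementation Classes slide:

    Job Type:              SDMS File System Job
    Implementation Class:  com.coreinformatics.core.job.impl.sdms.FileDaemonJob
    Cron String:           * * * * *
    File Path:             $(dropDirectoryBase)\hplc_results
    Parser:                com.coreinformatics.core.manager.file.ExperimentFileParserImpl
    File Entity Type:      HPLC_RESULT_FILE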

13 File Jobs Using Amazon S3
Core Informatics will provide you with AWS S3 Bucket, AWS S3 Folder, AWS S3 Access Key and AWS S3 Secret Key to access your bucket
Admin to create folder listed in File Path – note the slash at the end! (avoid capitals and spaces)
Users will need access details to connect ftp tools to upload to folder
File Type must be SDMS AWS S3 Job
Implementation class must be com.coreinformatics.core.job.impl.aws.S3FileDaemonJob
Now let's compare that job on a server to a file job in the cloud using Amazon S3. The job type here must be SDMS AWS S3 Job and the Implementation Class must be set to com.coreinformatics.core.job.impl.aws.S3FileDaemonJob. Note that the implementation classes are different for each job type, so be sure that you are using the correct one. If you are in a shared multi-tenant environment, Core will provide you with the information you need to access your AWS S3 bucket, along with an AWS S3 folder, an access key, and a secret key. An administrator should log into Amazon S3 externally (using an ftp tool) and create the drop folder. This drop folder should be given a name that users will easily recognize if they have multiple drop folders; avoid using capital letters and spaces. Once the drop folder is created, enter that folder name in the File Path attribute. The path must end with a forward slash ('/'). Like the other job type, you will also need to enter an implementation class for the file parser, the file entity type that you created earlier, and the frequency in the Cron String. The system will automatically populate the Instance ID, Task ID, and job log. For AWS S3 jobs, you do not need to worry about the value of the Drop Directory Base attribute on the System Configuration page. Once the job is configured and working, you will need to provide users with the access details to connect any ftp tools that may be used to upload files.
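An equivalent cloud-based job record might look like this; again the folder and entity type names are placeholders, and note the trailing forward slash on the file path:

    Job Type:              SDMS AWS S3 Job
    Implementation Class:  com.coreinformatics.core.job.impl.aws.S3FileDaemonJob
    Cron String:           * * * * *
    File Path:             hplc_results/
    Parser:                com.coreinformatics.core.manager.file.ExperimentFileParserImpl
    File Entity Type:      HPLC_RESULT_FILE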

14 Common Implementation Classes
Job:
com.coreinformatics.core.job.impl.sdms.FileDaemonJob (use for jobs on Core Server)
com.coreinformatics.core.job.impl.aws.S3FileDaemonJob (use for jobs on AWS S3)
File Parser:
com.coreinformatics.core.manager.file.ExperimentFileParserImpl (use for data in table format)
com.coreinformatics.analysis.manager.file.GenericPlateReaderParserImpl (use for data in spatial array (plate))
For your reference, this slide shows the most commonly used implementation classes. The first implementation class is used for jobs on the Core server. The second implementation class is used for jobs on AWS S3. The third implementation class is for a file parser which parses data in table format, and the last implementation class is used for a file parser which parses data in a spatial array such as a plate.

15 Cron Strings
Modeled after Unix cron string syntax
Typically a 5 part string:
Minutes (0-59)
Hours (0-23)
Days of month (1-31)
Month (1-12)
Days of week (0-6)
Star wildcard = 'every'
Examples
* * * * * = job starts every minute
5 * * * * = job starts every hour on the 5th minute (0:05, 1:05, 2:05, etc.)
* 12 * * 1 = job starts on the 12th hour every Monday
Let's take a minute to see how the job frequency is defined. The Cron String attribute on either job type defines the schedule on which the job checks for new files. It is modeled after the Unix cron string syntax and typically consists of a 5-part string defining minutes (0-59), hours (0-23), days of the month (1-31), months (1-12), and days of the week (0-6). The wildcard character '*' can be used to represent 'every'. Let's look at some examples: * * * * * means the job starts every minute; this is the most commonly used value for many jobs. 5 * * * * means the job starts every hour on the 5th minute (0:05, 1:05, 2:05, etc.). * 12 * * 1 means the job starts on the 12th hour every Monday. The cron string allows you to stagger jobs to run at different times, or less frequently when they are not used often, in order to optimize overall system performance.
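Reading the five fields positionally makes the last example easier to follow (this is the standard Unix cron field order that these strings are modeled after):

    * 12 * * 1
    | |  | | +-- day of week (1 = Monday)
    | |  | +---- month (every month)
    | |  +------ day of month (every day)
    | +--------- hour (12)
    +----------- minute (every minute of that hour)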

16 Job Scheduler
Once the job is configured, check the Job Scheduler to make sure it is turned on and running
Once the job has been created, the next step is to navigate to the Job Scheduler and make sure the job is turned on and running. All jobs and their logs, whether they are file jobs or project jobs used for project management, can be found on the same Job Scheduler page. Expand the Jobs menu and, from any job entity type, click on the Job Scheduler link. Jobs that have been scheduled will be located towards the top of the page, and those that have not yet been scheduled are shown in red at the bottom of the page. If your job is not enabled, simply click on the Schedule Job button to turn it on. Once the job is running, you will see the most recent times it has checked for raw data files; refresh the browser page to see the updated log.

17 File Parser to File Job Link
The association between file parsers and the jobs that sweep for files is defined on the file job
Note: it is best practice to fill out the file path and file parser attributes on the job entity and NOT the file entity type (otherwise you will create extra duplicate processes)
So now that we know how to create file parsers and jobs, how do they work together? When we learned about assays, experiments, and protocols, we saw that associations to assays and protocols were defined on the experiment type. File parsers, jobs, and experiments are not linked by these types of associations, so how are they linked together? The file job and file parser are linked through the File Entity Type attribute on the job record. An example is shown on this slide.

18 File Parser to Experiment Link
This is not configured in the main admin panel
If a user is uploading a single file to an experiment, they can select the file parser during file upload
If a user is dropping table-based files into the parser folder, the files must be named by the experiment barcode so the LIMS can automatically link the file to the correct experiment
If a user is dropping plate-based files into the parser folder, the files must be named by the container barcode so the LIMS can automatically link the file to the correct experiment (a container may only be associated to 1 active experiment)
Let's continue on and look at how the file parser links with experiments. The link between the file parser and the experiment is also NOT something that is configured in the main admin panel; it is established by the user when they submit the files. If a user is uploading a single raw data file to an experiment, they can select the file parser during the upload, which allows that single file to be parsed as defined in the file parser. If a user is dropping table-based files directly into the parser folder to be processed, the files must be named by the experiment barcode so the LIMS can automatically link each file to the correct experiment; an example follows below. Dropping files into the cloud folder will require an ftp tool to assist with the upload. If a user is dropping plate-based files into the parser folder, the files must be named by the container barcode so the LIMS can automatically link the file to the correct experiment. Keep in mind that a container may only be associated to one active experiment.
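As a concrete example, suppose an experiment has the barcode EXP123 (the actual barcode format will depend on your prefix configuration). A table-based instrument file destined for that experiment would be renamed to the barcode before it is dropped into the parser folder:

    results_2018-06-14.txt  ->  EXP123.txt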

19 Dropping Files into Cloud Folders
Copy your file from your local machine to the drop folder with an ftp tool
After the file is in the drop folder on the right, refresh the view; the file will disappear from the drop folder after it is processed
Now that everything is configured, how does this look to a user? First, create the experiment. Once you know the barcode, rename the raw data file to the experiment barcode. Then log into the ftp tool, which is pointing at the root file drop folder. Your ftp tool might look different, but it should work similarly. Navigate to the appropriate subfolder that was configured for this file parser, drag and drop the file into the folder, and refresh the view. Eventually you will see the file disappear after it is processed. Navigate to the experiment record in the LIMS and you should see the instrument file attached. Click on the Validate page to see the data parsed and loaded, make any changes you need, and publish the experiment.
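If you prefer to script the upload rather than use a graphical ftp tool, a minimal Python sketch using the boto3 library might look like the following. The bucket name, folder path, file name, and credential values are all placeholders for the details Core Informatics provides:

    import boto3

    # Connect using the AWS S3 access key and secret key supplied by
    # Core Informatics (placeholder values shown here).
    s3 = boto3.client(
        "s3",
        aws_access_key_id="YOUR_AWS_S3_ACCESS_KEY",
        aws_secret_access_key="YOUR_AWS_S3_SECRET_KEY",
    )

    # The local file has already been renamed to the experiment barcode
    # (EXP123.txt) so the LIMS can attach it to the correct experiment.
    s3.upload_file(
        "EXP123.txt",                              # local file to upload
        "your-s3-bucket",                          # AWS S3 Bucket
        "your-s3-folder/hplc_results/EXP123.txt",  # S3 folder + drop folder + file name
    )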

20 Uploading Via Files Icon
Click on the Files icon
Select File Type and Submit
Browse to file and Upload
File name still needs to match experiment barcode!
If you do not have access to the drop folder, you can alternatively upload files one at a time through the Files icon in the main tool bar. Do NOT use the File Upload button; that is for simple spreadsheets. You will need to select the file type that is configured to parse the data correctly and then browse to the file, just as you would for uploading any other type of file. Click on the Upload button when you have found the correct file.

21 After File is Uploaded
The raw data file that was uploaded will automatically be attached
Click on Validate to view the imported data
You can tell that the LIMS is done parsing the file when you see a colored number bubble next to the Files icon on your experiment record. If your file job is configured to run every minute, then you should see this number bubble within a minute or two; you may have to refresh the page in your browser to view the updated page. The number indicates the number of attached files, and you can click on the Files icon to view the raw data file directly. You should also see a new Intermediate Data button displayed as soon as the intermediate values are calculated. Your data is now ready for review. If the file does not attach, or you are missing your Intermediate Data button, have your administrator check the logs to see what configuration might not be working correctly.

22 Review
Setting up File Parsers
Creating File Entity Types
File attributes
Setting up File Drops
Creating File Jobs
Scheduling Jobs
Job Logs
Associating File Parsers, Experiments, and Jobs
Loading Data into an Experiment
You should now have a good understanding of how to set up custom file parsers, how to create file entity types, how to configure the file attributes on the file parsers, how to set up file drops and create file jobs, and how to schedule jobs and view the job logs. You should also understand the associations between file parsers, experiments, and jobs, and be able to load data into an experiment. If there are any concepts that you wish to review at this time, please go back through the lesson pertaining to that topic. For additional information, please reference the online documentation or other tutorials. Thank you for investing your time in Core LIMS Training.

