Experience and Process for Collaborating with an Outsourcing Company to Create the Define File
Ganesh Sankaran, TAKE Solutions
Agenda
Typical workflow when sponsors create the SDTM/ADaM datasets in-house and collaborate with vendors for the Define files
Define.xml sections
Define.xml process: how do we extract the information from the data and documents provided?
Validating Define.xml and the typical checks
Common issues
Conclusion: how soon should the sponsor start?
Typical Workflow for Collaborating with a Vendor to Create Define Files
1. Sponsor provides the documents and draft data.
2. Vendor runs the compliance/structure checks on the data.
3. Vendor generates a draft Define.xml and runs the compliance checks on it.
4. Vendor summarizes the issues/findings and delivers the draft Define for review.
5. Sponsor reviews the findings and updates the specification, datasets, and annotations.
6. Sponsor sends the updated annotations, specification, and XPTs back to the vendor for final delivery (Pass II).
7. Vendor reruns the compliance checks and regenerates the final version of the Define (Pass II).
Inputs Provided
Annotated Case Report Form (aCRF)
Mapping specification documents
SAS datasets / XPTs
Sponsor controlled terminology documents, if applicable
Protocol, if Trial Design domains are to be produced
Data Guide / supplemental documents
Define.xml Sections
TOC: metadata of the datasets
blankcrf (annotated CRF)
Variable-level metadata
Value-level metadata
Controlled terminology
Computational algorithms
Supplemental data definition document
Define.xml Sections (Not Visible Through the Style Sheet)
xmlns: identifies the default namespace for this document.
ODMVersion: identifies the ODM version that underlies the schema for the Define-XML.
FileOID: unique identifier for this file.
CreationDateTime: when this specific version of the define.xml file was created.
StudyName, StudyDescription, ProtocolName: study-level information.
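A minimal sketch of how these header attributes could be written with Python's standard `xml.etree.ElementTree`, assuming ODM 1.3.2 with Define-XML 2.0 as the target versions. The function name `build_define_root` and the `ST.`/`MDV.` OID prefixes are hypothetical conventions for illustration, not part of the vendor's tooling.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Namespaces for ODM 1.3 and Define-XML 2.0 (assumed target versions).
ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"
DEF_NS = "http://www.cdisc.org/ns/def/v2.0"
ET.register_namespace("", ODM_NS)    # serialize ODM as the default namespace (xmlns)
ET.register_namespace("def", DEF_NS)


def build_define_root(file_oid, study_name, study_description, protocol_name,
                      created=None):
    """Return the ODM root element with the study-level header of a Define-XML file."""
    created = created or datetime.now(timezone.utc).replace(microsecond=0)
    root = ET.Element(f"{{{ODM_NS}}}ODM", {
        "ODMVersion": "1.3.2",
        "FileType": "Snapshot",
        "FileOID": file_oid,                      # unique identifier for this file
        "CreationDateTime": created.isoformat(),  # when this version was created
    })
    study = ET.SubElement(root, f"{{{ODM_NS}}}Study", {"OID": f"ST.{study_name}"})
    gv = ET.SubElement(study, f"{{{ODM_NS}}}GlobalVariables")
    for tag, text in (("StudyName", study_name),
                      ("StudyDescription", study_description),
                      ("ProtocolName", protocol_name)):
        ET.SubElement(gv, f"{{{ODM_NS}}}{tag}").text = text
    # MetaDataVersion carries the def:-namespaced Define-XML version attribute.
    ET.SubElement(study, f"{{{ODM_NS}}}MetaDataVersion", {
        "OID": f"MDV.{study_name}",
        "Name": f"Define-XML for {study_name}",
        f"{{{DEF_NS}}}DefineVersion": "2.0.0",
    })
    return root
```

Calling `ET.tostring(build_define_root("DEF.ABC-123", "ABC-123", "Phase II study", "ABC-123"), encoding="unicode")` yields the header that the style sheet does not display.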
Define.xml Components and How We Generate Them
Metadata generation: domain level, variable level, value level
ORIGIN, CODELIST, comments, and computational algorithms
blankcrf, Data Guide / supplemental documents
Generate Define.xml
Validate Define files
Input Sheet for Define.xml Generation
Domain-level input: a SAS-based macro utility creates the inputs for this sheet from the datasets provided.
Variable metadata: the variable-level input sheet is populated by reading the metadata of the SAS datasets provided.
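The slide describes a SAS macro utility; as a language-neutral illustration, here is a minimal Python sketch of populating variable-level rows from dataset metadata. The `(name, label, sas_type, length)` column tuples, the integer/float inference rule, and the `IT.<DOMAIN>.<VAR>` OID convention are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class VariableMeta:
    domain: str
    name: str
    label: str
    data_type: str  # Define-XML DataType: "text", "integer" or "float"
    length: int
    order: int
    oid: str        # ItemDef OID, e.g. IT.DM.AGE (hypothetical naming convention)


def infer_data_type(sas_type, values):
    """Map a SAS column type plus its observed values to a Define-XML DataType."""
    if sas_type == "char":
        return "text"
    numbers = [v for v in values if v is not None]
    # Numeric columns whose non-missing values are all whole numbers become "integer".
    return "integer" if all(float(v).is_integer() for v in numbers) else "float"


def variable_level_metadata(domain, columns, records):
    """Build variable-level rows.

    columns: (name, label, sas_type, length) tuples in dataset order, as a
             metadata reader might return them (hypothetical shape).
    records: list of dicts holding the dataset rows.
    """
    rows = []
    for order, (name, label, sas_type, length) in enumerate(columns, start=1):
        values = [rec.get(name) for rec in records]
        rows.append(VariableMeta(domain, name, label,
                                 infer_data_type(sas_type, values),
                                 length, order, f"IT.{domain}.{name}"))
    return rows
```

Inferring the type from the data, rather than from the SAS type alone, separates integer from float columns, which SAS stores identically.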
Input Sheet for Define.xml Generation
ORIGIN information is extracted from the annotations and mapping specification provided.
OIDs are assigned here for the variables that need a CODELIST, COMPUTATIONAL ALGORITHM, or VALUELIST.
Based on the OIDs assigned in the variable-level sheet, the value-level and codelist input sheets are generated by reading the data and the associated codelist files.
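A minimal sketch of generating value-level rows by reading the data, assuming a result variable such as LBORRES is described separately for each LBTESTCD. The `VL.`/`IT.`/`WC.` OID patterns, the condition text, and the numeric-detection rule are hypothetical simplifications.

```python
from collections import defaultdict


def _data_type(values):
    """Classify observed result values as integer, float or text (a simplification)."""
    kinds = set()
    for v in values:
        try:
            f = float(v)
        except (TypeError, ValueError):
            return "text"  # any non-numeric value makes the whole slice text
        kinds.add("integer" if f.is_integer() and "." not in str(v) else "float")
    if not kinds:
        return "text"
    return "float" if "float" in kinds else "integer"


def value_level_metadata(domain, result_var, test_var, records):
    """One value-level row per test code that has non-missing results."""
    groups = defaultdict(list)
    for rec in records:
        value = rec.get(result_var)
        if value not in (None, ""):
            groups[rec[test_var]].append(value)
    vl_oid = f"VL.{domain}.{result_var}"  # OID assigned in the variable-level sheet
    entries = []
    for testcd in sorted(groups):
        entries.append({
            "ValueListOID": vl_oid,
            "ItemOID": f"IT.{domain}.{result_var}.{testcd}",
            "WhereClauseOID": f"WC.{domain}.{test_var}.{testcd}",
            "Condition": f"{test_var} EQ {testcd}",
            "DataType": _data_type(groups[testcd]),
            "Length": max(len(str(v)) for v in groups[testcd]),
        })
    return entries
```

Because the rows come from the data, a test code with no non-missing results produces no value-level entry; that gap is worth reviewing against the specification.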
Input Sheet for Define.xml Generation
Value-level input
Codelist / computation methods input
External Documents: blankcrf and Data Guide
The annotated case report form and supplemental documents such as the Data Guide are linked to the define.xml.
The ORIGIN page numbers presented in the variable-level metadata must be hyperlinked to the corresponding pages of the CRF attached to the Define file.
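A minimal sketch of the XML behind these hyperlinks, assuming Define-XML 2.0 element names (`def:Origin`, `def:DocumentRef`, `def:PDFPageRef`, `def:leaf`). The helper names and the `LF.blankcrf` leaf ID are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

DEF = "http://www.cdisc.org/ns/def/v2.0"
XLINK = "http://www.w3.org/1999/xlink"


def crf_origin(pages, leaf_id="LF.blankcrf"):
    """def:Origin of type CRF pointing at physical pages of the linked aCRF."""
    origin = ET.Element(f"{{{DEF}}}Origin", {"Type": "CRF"})
    doc_ref = ET.SubElement(origin, f"{{{DEF}}}DocumentRef", {"leafID": leaf_id})
    ET.SubElement(doc_ref, f"{{{DEF}}}PDFPageRef", {
        # Space-separated, de-duplicated, ascending page numbers.
        "PageRefs": " ".join(str(p) for p in sorted(set(pages))),
        "Type": "PhysicalRef",
    })
    return origin


def leaf(leaf_id, href, title):
    """def:leaf entry that links an external document (aCRF, Data Guide) by file name."""
    el = ET.Element(f"{{{DEF}}}leaf", {"ID": leaf_id, f"{{{XLINK}}}href": href})
    ET.SubElement(el, f"{{{DEF}}}title").text = title
    return el
```

The `leafID` on each origin must match the `ID` of a `def:leaf`, so a mistyped ID breaks every page link at once.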
Input Sheet for Define.xml Generation
Define.xml can be generated once the domain-level, variable-level, value-level, and codelist sheets are created; the external documents are linked; the ORIGIN, COMPUTATIONAL ALGORITHM, and external dictionary information is updated; and the inputs are reviewed.
Validation Checks
Structural checks: types of checks on the metadata
1. Domain label mismatch
2. Variable label mismatch
3. Data type mismatch
4. Missing Expected and Required variables
5. Required/Expected variables with NULL values for all records
6. Non-standard SDTM variables
7. Variable names in lower case
8. Variable order mismatch
9. Variables with formats
10. Permissible variables present with NULL values for all records
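A minimal sketch of several of these checks (label mismatch, missing Required/Expected variables, all-NULL variables, non-standard variables, lower-case names, order mismatch), assuming the implementation guide metadata is available as `(name, label, core)` tuples. That input shape and the finding messages are hypothetical.

```python
def structural_checks(columns, records, spec):
    """Run a subset of metadata-level structural checks on one dataset.

    columns: (name, label) tuples in dataset order.
    records: list of dicts holding the dataset rows.
    spec:    (name, label, core) tuples in standard order, core being
             "Req", "Exp" or "Perm" (hypothetical shape of the IG metadata).
    """
    findings = []
    spec_by_name = {name: (label, core) for name, label, core in spec}
    actual = {name.upper(): name for name, _ in columns}  # upper-cased -> as stored

    def all_null(var):
        return all(rec.get(var) in (None, "") for rec in records)

    for name, label in columns:
        if name != name.upper():
            findings.append(f"{name}: variable name not in upper case")
        if name.upper() not in spec_by_name:
            findings.append(f"{name}: non-standard variable")
        elif label != spec_by_name[name.upper()][0]:
            findings.append(f"{name}: label mismatch")

    for name, _, core in spec:
        if name not in actual:
            if core in ("Req", "Exp"):
                findings.append(f"{name}: missing {core} variable")
        elif all_null(actual[name]):
            findings.append(f"{name}: {core} variable null for all records")

    present = [name.upper() for name, _ in columns if name.upper() in spec_by_name]
    if present != [name for name, _, _ in spec if name in actual]:
        findings.append("variable order mismatch")
    return findings
```

Names are compared case-insensitively, so a lower-case variable is reported once as a naming issue instead of also being reported as missing.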
Validate Define.xml
A valid Define.xml should be well formed and conform to the XML schemas. It should also reference the correct versions of the CDISC standards.
Sample validation checks:
1. XML is well formed.
2. All required elements are included and not empty.
3. OID attributes must be unique within a single MetaDataVersion (no duplicate def:leaf, def:ComputationMethod, def:ValueListDef).
4. No duplicates in ItemGroupDef, ItemDef, ItemRef, Study, CodeList elements, etc.
5. Invalid DataType value for CodeList elements.
6. CodedValue must be unique within a single CodeList.
7. Invalid codelist value for a variable with non-extensible CT.
8. Invalid DataType value for ItemDef elements.
9. Invalid ‘FileType’, ‘MedDRA’ values.
10. Invalid ‘Repeating’, ‘Mandatory’ values.
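A minimal sketch of a few of these checks (well-formedness, duplicate OIDs, duplicate ItemRefs, Mandatory values, DataType values, unique CodedValues) using only Python's standard library. The allowed DataType sets are assumed from the Define-XML 2.0 specification, and this does not replace schema validation.

```python
import xml.etree.ElementTree as ET
from collections import Counter

ODM = "{http://www.cdisc.org/ns/odm/v1.3}"
# ItemDef DataType values assumed from the Define-XML 2.0 specification.
ITEM_TYPES = {"text", "integer", "float", "date", "time", "datetime",
              "partialDate", "partialTime", "partialDatetime",
              "incompleteDatetime", "durationDatetime"}
CODELIST_TYPES = {"text", "integer", "float"}


def check_define(xml_text):
    """Return findings for a few of the sample checks on a define.xml string."""
    try:
        root = ET.fromstring(xml_text)           # check 1: well formed
    except ET.ParseError as exc:
        return [f"not well formed: {exc}"]
    findings = []
    for mdv in root.iter(f"{ODM}MetaDataVersion"):
        # OIDs must be unique within one MetaDataVersion.
        oids = Counter(el.get("OID") for el in mdv.iter() if el.get("OID"))
        findings += [f"duplicate OID {o}" for o, n in oids.items() if n > 1]
        for ig in mdv.iter(f"{ODM}ItemGroupDef"):
            refs = ig.findall(f"{ODM}ItemRef")
            counts = Counter(r.get("ItemOID") for r in refs)
            findings += [f"{ig.get('OID')}: duplicate ItemRef {o}"
                         for o, n in counts.items() if n > 1]
            findings += [f"{ig.get('OID')}: invalid Mandatory {r.get('Mandatory')}"
                         for r in refs if r.get("Mandatory") not in ("Yes", "No")]
        for item in mdv.iter(f"{ODM}ItemDef"):
            if item.get("DataType") not in ITEM_TYPES:
                findings.append(f"{item.get('OID')}: invalid DataType {item.get('DataType')}")
        for cl in mdv.iter(f"{ODM}CodeList"):
            if cl.get("DataType") not in CODELIST_TYPES:
                findings.append(f"{cl.get('OID')}: invalid DataType {cl.get('DataType')}")
            coded = Counter(i.get("CodedValue") for i in cl
                            if i.tag in (f"{ODM}CodeListItem", f"{ODM}EnumeratedItem"))
            findings += [f"{cl.get('OID')}: duplicate CodedValue {v}"
                         for v, n in coded.items() if n > 1]
    return findings
```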
Common Issues
Origin is ‘CRF’, but the variable is not annotated.
Origin is ‘Derived’, but the variable is annotated in the CRF.
Key variables are not properly defined.
When presenting custom domains, the domain assumptions should be followed; sometimes custom domains are derived without a TOPIC variable.
Subjects are collected as part of external data (LB/EG) but are not populated in the DM domain. All subjects must be present in the DM domain.
The one-to-one relationship is missing across some of the paired variables, such as TEST/TESTCD, PARAM/PARAMCD, VISIT/VISITNUM, AVISIT/AVISITN, TPT/TPTNUM, and TPT/TPTREF.
Common variables across different domains have different ORIGIN derivations. If the derivation is the same across domains, it can be given as “Copied from ADSL.XX”.
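Two of these issues lend themselves to simple automated checks. A minimal sketch, assuming datasets are available as lists of dicts; the function names are hypothetical. The first function flags pairs such as LBTESTCD/LBTEST that are not one-to-one, and the second lists subjects present in another domain but missing from DM.

```python
from collections import defaultdict


def one_to_one_violations(records, var_a, var_b):
    """Report values of var_a (or var_b) that map to more than one value of the other."""
    a_to_b, b_to_a = defaultdict(set), defaultdict(set)
    for rec in records:
        a, b = rec.get(var_a), rec.get(var_b)
        if a in (None, "") or b in (None, ""):
            continue  # only populated pairs are compared
        a_to_b[a].add(b)
        b_to_a[b].add(a)
    bad = [f"{var_a}={a} maps to {var_b} values {sorted(map(str, bs))}"
           for a, bs in a_to_b.items() if len(bs) > 1]
    bad += [f"{var_b}={b} maps to {var_a} values {sorted(map(str, as_))}"
            for b, as_ in b_to_a.items() if len(as_) > 1]
    return bad


def subjects_missing_from_dm(dm_records, other_records):
    """USUBJIDs that appear in another domain (e.g. LB, EG) but not in DM."""
    dm_subjects = {r["USUBJID"] for r in dm_records}
    return sorted({r["USUBJID"] for r in other_records} - dm_subjects)
```

The same pair check applies to VISIT/VISITNUM or PARAM/PARAMCD by changing the variable names.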
Common Issues (contd)
Generally, XPTs up to 1 GB in size are fine. If an XPT file exceeds 1 GB, it must be split into smaller datasets, each not exceeding 1 GB (see the Study Data Specifications).
Split files should have the same metadata structure so that concatenating/merging the split datasets is feasible.
Both the smaller split files and the larger (non-split) file should be included.
The split datasets and the method applied should be documented in the Data Guide.
If the linear approach (SDTM first, then ADaM derived from SDTM) is not followed, make sure the ADaM and SDTM sources are consistent.
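A minimal sketch of planning a split and confirming the parts can be recombined. It assumes 1 GB means 1024**3 bytes and that rows are of roughly uniform size, so splitting by row count keeps each part near the limit; a real split may need to be by a logical grouping and should be re-checked against the actual XPT sizes. All function names are hypothetical.

```python
import math

ONE_GB = 1024 ** 3  # assumption: 1 GB taken as 1024**3 bytes


def parts_needed(xpt_size_bytes, limit=ONE_GB):
    """Smallest number of parts so that each is at most `limit` on average."""
    return max(1, math.ceil(xpt_size_bytes / limit))


def split_records(columns, records, n_parts):
    """Split rows into up to n_parts consecutive chunks sharing the same column list."""
    size = max(1, math.ceil(len(records) / n_parts))
    return [(list(columns), records[i:i + size]) for i in range(0, len(records), size)]


def can_recombine(original, parts):
    """True if every part has the original structure and concatenation restores all rows."""
    cols, rows = original
    same_structure = all(p_cols == list(cols) for p_cols, _ in parts)
    return same_structure and [r for _, p_rows in parts for r in p_rows] == rows
```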
Common Issues (contd)
ADaM datasets derived in a parallel stream may require extra effort to ensure traceability and data lineage.
Conclusion
Finalize the scope of the work to be outsourced / performed by the vendor.
Explain the process being followed and agree on a common format for exchanging documents; this can expedite generation of the Define files.
When working across a family of similar studies within the same indication, look for better efficiency after a couple of iterations/studies.
Identify the vendor(s) at least three months before you expect the first Define.xml to be published.
If possible, do a pilot or demo Define.