1 Metadata-driven systems in Biomedicine Prakash M. Nadkarni
2 What is Metadata? Data that defines and describes other data. Metadata plays two roles. Descriptive: primarily human-readable, for documentation purposes. Active/Functional: primarily machine-interpretable; it can control the behavior of a system. The difference between the two lies in the consequences of incorrect specification: an error in descriptive metadata misleads human readers, whereas an error in active metadata makes the system itself behave incorrectly.
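A minimal Python sketch of the distinction, assuming a hypothetical metadata record for a heart-rate field. The free-text description is descriptive because only people read it. The range limits are active because code reads them. Corrupting each one shows the different consequences of incorrect specification.

```python
# Hypothetical metadata for one data element. "description" is descriptive
# (read by humans); "min_value" and "max_value" are active (read by code).
heart_rate_meta = {
    "name": "heart_rate",
    "description": "Resting heart rate, beats per minute",
    "min_value": 20,
    "max_value": 250,
}


def accept(value, meta):
    """Accept a value only if it lies within the metadata-specified range."""
    return meta["min_value"] <= value <= meta["max_value"]


# An error in the description misleads a reader but leaves behavior unchanged.
typo_meta = dict(heart_rate_meta, description="Resting heart rate, breaths per minute")
print(accept(72, typo_meta))       # True

# An error in an active field changes what the system does: valid data is rejected.
bad_range_meta = dict(heart_rate_meta, max_value=25)
print(accept(72, bad_range_meta))  # False
```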
3 Active vs. Descriptive Metadata: The distinction between the two is not hard-and-fast. Active metadata must be specifiable by human curators, and must therefore be human-understandable to some degree. Conversely, though one thinks of descriptive metadata in its most basic form as narrative text, it must often be structured to facilitate machine processing, e.g., the description that accompanies an electronic record of a gene expression or proteomics experiment.
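The following sketch illustrates structured descriptive metadata using hypothetical experiment descriptions. The field names are illustrative and are not drawn from any standard. The narrative summary serves human readers, while the structured fields let software select experiments.

```python
# Hypothetical descriptions of gene expression experiments. "summary" is
# narrative text; the other fields are structured so that software can use them.
experiments = [
    {"id": "EXP-1", "summary": "Liver tissue, fasting vs fed",
     "organism": "Mus musculus", "platform": "microarray", "samples": 12},
    {"id": "EXP-2", "summary": "Tumor vs adjacent normal tissue",
     "organism": "Homo sapiens", "platform": "RNA-seq", "samples": 40},
    {"id": "EXP-3", "summary": "Knockout vs wild type, cortex",
     "organism": "Mus musculus", "platform": "RNA-seq", "samples": 8},
]


def find(experiments, **criteria):
    """Return IDs of experiments whose structured fields match all criteria."""
    return [e["id"] for e in experiments
            if all(e.get(k) == v for k, v in criteria.items())]


print(find(experiments, organism="Mus musculus"))                      # ['EXP-1', 'EXP-3']
print(find(experiments, organism="Mus musculus", platform="RNA-seq"))  # ['EXP-3']
```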
4 What is Metadata-driven software? Most software relies on configuration information to customize its behavior. In its simplest form, this information consists of property-value pairs, e.g., registry settings or resource files. As configuration needs grow more complex, the configuration information must acquire a specific structure, i.e., its own data model.
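The progression from property-value pairs to a data model can be sketched in Python. The first part uses the standard-library `configparser` for simple settings. The second part assumes a hypothetical dotted-key convention for forms and questions. It shows that such keys only hide a structure, which code must then rebuild as explicit classes.

```python
import configparser
from dataclasses import dataclass, field

# Simple case: flat property-value pairs, as in an INI-style resource file.
simple = configparser.ConfigParser()
simple.read_string("""
[display]
font_size = 12
language = en
""")
print(simple.getint("display", "font_size"))  # 12

# Complex case: forms that contain ordered questions. Squeezed into flat pairs,
# the structure hides in the key names (a hypothetical dotted-key convention).
flat_pairs = {
    "form.demographics.q1.name": "age",
    "form.demographics.q1.datatype": "integer",
    "form.demographics.q2.name": "sex",
    "form.demographics.q2.datatype": "choice",
}


@dataclass
class Question:
    name: str
    datatype: str


@dataclass
class Form:
    name: str
    questions: list = field(default_factory=list)


def build_forms(pairs):
    """Reconstruct the structure implied by the keys as an explicit data model."""
    collected = {}
    for key, value in pairs.items():
        _, form_name, q_id, attr = key.split(".")
        collected.setdefault(form_name, {}).setdefault(q_id, {})[attr] = value
    forms = []
    for form_name, questions in collected.items():
        # Order questions by the number after "q" (q1, q2, ..., q10).
        ordered = sorted(questions.items(), key=lambda item: int(item[0][1:]))
        forms.append(Form(form_name, [Question(**attrs) for _, attrs in ordered]))
    return forms


forms = build_forms(flat_pairs)
print([(q.name, q.datatype) for q in forms[0].questions])
# [('age', 'integer'), ('sex', 'choice')]
```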
5 Implementing Metadata Models: In simpler situations, a hierarchical data model suffices, and XML may be used. In more complex scenarios a hierarchy is not enough, and one must resort to a relational model. The relational model makes particular sense where the application itself is database-oriented: a subset of the database schema can be reserved to represent the metadata model. In many applications, the metadata model is actually more elaborate (in terms of number of tables) than the model for the rest of the data.
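Here is a minimal relational metadata model, sketched with Python's built-in `sqlite3` module. The table and column names are hypothetical. Storing responses against question IDs is only one possible design, assumed here for brevity. Even in this toy, four tables describe the data and one table holds it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Metadata tables (hypothetical): choice sets, their items, forms, questions.
CREATE TABLE choice_sets  (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE choice_items (set_id INTEGER REFERENCES choice_sets(id),
                           code TEXT, label TEXT);
CREATE TABLE forms        (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE questions    (id INTEGER PRIMARY KEY,
                           form_id INTEGER REFERENCES forms(id),
                           name TEXT, datatype TEXT,
                           choice_set_id INTEGER REFERENCES choice_sets(id));
-- Data table: one row per recorded answer, keyed by question ID.
CREATE TABLE responses    (patient_id INTEGER,
                           question_id INTEGER REFERENCES questions(id),
                           value TEXT);
""")

# Describe a form with one free-entry question and one choice-set question.
sex_set = conn.execute("INSERT INTO choice_sets (name) VALUES ('sex')").lastrowid
conn.executemany("INSERT INTO choice_items VALUES (?, ?, ?)",
                 [(sex_set, "M", "Male"), (sex_set, "F", "Female")])
form_id = conn.execute("INSERT INTO forms (name) VALUES ('demographics')").lastrowid
conn.executemany(
    "INSERT INTO questions (form_id, name, datatype, choice_set_id) VALUES (?, ?, ?, ?)",
    [(form_id, "age", "integer", None), (form_id, "sex", "choice", sex_set)])

# The application asks the metadata, not hard-coded logic, what a form contains.
rows = conn.execute("""
    SELECT q.name, q.datatype, ci.code
    FROM questions q
    JOIN forms f ON f.id = q.form_id
    LEFT JOIN choice_items ci ON ci.set_id = q.choice_set_id
    WHERE f.name = 'demographics'
    ORDER BY q.id, ci.code""").fetchall()
print(rows)  # [('age', 'integer', None), ('sex', 'choice', 'F'), ('sex', 'choice', 'M')]
```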
6 Instances of Systems Using Metadata: Electronic Medical Records: TMR, HELP, Columbia/Presbyterian CDR. Clinical Study Data Management Systems: TrialDB. Systems for Maintenance of Laboratory Data. Browsing Systems for Presentation of Scientific Data: Tom Slezak’s Chromosome 19 Database at Lawrence Livermore. Heterogeneous Objects: SenseLab.
7 Strengths of metadata-driven systems: Avoidance of repetitive tasks: instead of writing similar code ten times, create a framework that solves the generic problem (“creative laziness”). Administrators who are non-programmers can specify system behavior by using the framework. (This does not mean there is no learning curve; however, they do not have to wrestle with syntax issues.) Relatively rapid evolution of the system (once you get over the initial hump).
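The “creative laziness” point can be shown with one generic validation routine that reads metadata, in place of a hand-written validator for each field. The metadata layout and datatype names are assumptions made for this sketch. In a real system the rows would be stored in the database and edited through an administrative interface.

```python
# Hypothetical field metadata; in practice these rows would be stored in the
# database and edited by an administrator rather than hard-coded.
FIELD_METADATA = {
    "age":    {"datatype": "integer", "min": 0,   "max": 120},
    "weight": {"datatype": "real",    "min": 0.5, "max": 400.0},
    "sex":    {"datatype": "text",    "choices": {"M", "F", "U"}},
}

# Map datatype names used in the metadata to Python conversion functions.
CONVERTERS = {"integer": int, "real": float, "text": str}


def validate(field_name, raw_value):
    """Check one value against its field's metadata; return an error or None."""
    meta = FIELD_METADATA[field_name]
    try:
        value = CONVERTERS[meta["datatype"]](raw_value)
    except ValueError:
        return f"{field_name}: not a valid {meta['datatype']}"
    if "min" in meta and not meta["min"] <= value <= meta["max"]:
        return f"{field_name}: {value} outside {meta['min']}..{meta['max']}"
    if "choices" in meta and value not in meta["choices"]:
        return f"{field_name}: {value!r} not in allowed list"
    return None


print(validate("age", "42"))   # None
print(validate("age", "abc"))  # age: not a valid integer
print(validate("age", "150"))  # age: 150 outside 0..120
print(validate("sex", "X"))    # sex: 'X' not in allowed list
```

Adding a new field then means adding a metadata row, with no new code.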
8 Example: User Interface Generation. Validation metadata: datatype, range checks, membership in a list (“discrete value group” or “choice set”). Presentation metadata: captions in one or more languages. Grouping metadata: an element belongs to a higher-order group; e.g., in clinical data management systems, a question belongs to a form. An object can know how to display itself if there is sufficient metadata to enable it to do so.
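The following sketch shows an object displaying itself from its metadata. The hypothetical form carries three kinds of metadata: grouping (questions within a form), presentation (captions in English and Spanish), and validation (datatype and a choice set). A single renderer emits HTML for any language present in the captions. The metadata structure and HTML output are illustrative only.

```python
from html import escape

# Hypothetical form metadata: grouping (questions belong to a form),
# presentation (captions per language), validation (datatype, choice set).
FORM = {
    "name": "demographics",
    "questions": [
        {"name": "age", "datatype": "integer",
         "caption": {"en": "Age (years)", "es": "Edad (años)"}},
        {"name": "sex", "datatype": "choice",
         "caption": {"en": "Sex", "es": "Sexo"},
         "choices": [("M", {"en": "Male", "es": "Masculino"}),
                     ("F", {"en": "Female", "es": "Femenino"})]},
    ],
}


def render_question(q, lang):
    """Emit a label and an input control for one question from its metadata."""
    name = escape(q["name"])
    html = f'<label for="{name}">{escape(q["caption"][lang])}</label>'
    if q["datatype"] == "choice":
        options = "".join(
            f'<option value="{escape(code)}">{escape(text[lang])}</option>'
            for code, text in q["choices"])
        html += f'<select id="{name}" name="{name}">{options}</select>'
    else:
        input_type = "number" if q["datatype"] in ("integer", "real") else "text"
        html += f'<input id="{name}" name="{name}" type="{input_type}">'
    return html


def render_form(form, lang):
    """Render every question in the form, in metadata order."""
    body = "".join(render_question(q, lang) for q in form["questions"])
    return f'<form id="{escape(form["name"])}">{body}</form>'


print(render_form(FORM, "es"))
```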
9 Standards for Metadata: There is no universal standard. Some standards developed for one purpose have been misinterpreted and misapplied for others: ISO/IEC 11179 turns out to be adequate for document repositories and provenance, but has been misapplied to controlled vocabularies. Standards have evolved for specific purposes: MAGE-OM for gene expression data, CDISC for clinical study data. These standards specify minimum information; a given application generally needs more.
10 Challenges (1): Metadata-driven systems are harder to develop up-front: there is typically a long latency period of “tool-building” during which nothing seems to happen. Because they are framework-driven, they cannot do everything: some parts of the system may not fit the framework’s metaphor but are still necessary to provide desired functionality to users (e.g., a billing module for clinical research). At that point it becomes necessary to document very carefully which parts were generated and which were not.
11 Challenges (2): The metadata model cannot be ad hoc; once it gets complex enough, you need a solid foundation. Solutions involving XML (or other) configuration files do not scale, because they become progressively harder to understand and maintain. Providing metadata-editing functionality becomes vital for administrator-level personnel who are not developers but are required to maintain the metadata. Metadata are your system’s crown jewels: you cannot entrust their editing to insufficiently compulsive or motivated individuals.
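One way to support careful editing is to have the metadata editor check consistency rules before saving. The three rules below are hypothetical examples, not an exhaustive or authoritative list: question names must be unique within a form, a minimum must not exceed its maximum, and any referenced choice set must exist.

```python
def check_metadata(questions, choice_sets):
    """Return a list of problems; an editor would refuse to save unless it is empty."""
    problems = []
    seen = set()
    for q in questions:
        # Rule 1: question names must be unique within a form.
        key = (q["form"], q["name"])
        if key in seen:
            problems.append(f"duplicate question {q['name']!r} on form {q['form']!r}")
        seen.add(key)
        # Rule 2: a range check must have min <= max.
        if "min" in q and "max" in q and q["min"] > q["max"]:
            problems.append(f"{q['name']}: min {q['min']} exceeds max {q['max']}")
        # Rule 3: a referenced choice set must exist.
        if "choice_set" in q and q["choice_set"] not in choice_sets:
            problems.append(f"{q['name']}: unknown choice set {q['choice_set']!r}")
    return problems


choice_sets = {"sex": ["M", "F"]}
questions = [
    {"form": "demographics", "name": "age", "min": 0, "max": 120},
    {"form": "demographics", "name": "sex", "choice_set": "gender"},
    {"form": "vitals", "name": "pulse", "min": 250, "max": 20},
    {"form": "vitals", "name": "pulse"},
]
for problem in check_metadata(questions, choice_sets):
    print(problem)
# sex: unknown choice set 'gender'
# pulse: min 250 exceeds max 20
# duplicate question 'pulse' on form 'vitals'
```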
12 Challenges (3): Moving metadata between systems is desirable when new installations of a system must be bootstrapped with standard functionality, but this can be tricky. In relational schemas, many metadata tables use auto-number primary keys that are referenced elsewhere in the metadata. The resulting sets of IDs are meaningful only to a particular installation. Exchange therefore involves identifying some other property of a row that is unique, e.g., a name. As a last resort, one must fall back on pseudo-hierarchical formats.
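Name-based exchange can be sketched with two in-memory SQLite databases that stand in for two installations. The schema is hypothetical. The export replaces each auto-number foreign key with the unique name of the referenced choice set. The import looks that name up on the target to obtain the local ID.

```python
import sqlite3

# Hypothetical minimal schema shared by both installations.
SCHEMA = """
CREATE TABLE choice_sets (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT UNIQUE);
CREATE TABLE questions   (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT UNIQUE,
                          choice_set_id INTEGER REFERENCES choice_sets(id));
"""


def make_db(choice_set_names):
    """Create an installation whose choice sets get IDs in the given order."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO choice_sets (name) VALUES (?)",
                   [(n,) for n in choice_set_names])
    return db


def export_questions(db):
    """Replace installation-specific IDs with the unique choice-set name."""
    return db.execute("""
        SELECT q.name, cs.name FROM questions q
        LEFT JOIN choice_sets cs ON cs.id = q.choice_set_id
        ORDER BY q.name""").fetchall()


def import_questions(db, rows):
    """Look up each choice set by name to obtain the local auto-number ID."""
    for q_name, cs_name in rows:
        cs_id = None
        if cs_name is not None:
            row = db.execute("SELECT id FROM choice_sets WHERE name = ?",
                             (cs_name,)).fetchone()
            if row is None:
                raise KeyError(f"choice set {cs_name!r} missing on target")
            cs_id = row[0]
        db.execute("INSERT INTO questions (name, choice_set_id) VALUES (?, ?)",
                   (q_name, cs_id))


# The installations created their choice sets in different orders,
# so the same set ("sex") has a different ID in each.
source = make_db(["sex", "yes_no"])
target = make_db(["yes_no", "sex"])
source.executemany("INSERT INTO questions (name, choice_set_id) VALUES (?, ?)",
                   [("sex", 1), ("smoker", 2), ("age", None)])

import_questions(target, export_questions(source))
result = target.execute("""
    SELECT q.name, q.choice_set_id, cs.name FROM questions q
    LEFT JOIN choice_sets cs ON cs.id = q.choice_set_id
    ORDER BY q.name""").fetchall()
print(result)  # [('age', None, None), ('sex', 2, 'sex'), ('smoker', 1, 'yes_no')]
```

The numeric IDs differ between the two installations, but each question still points at the same named choice set.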