In laboratories, there is a pressing need for data provenance: tracing the origin of critical data, and how it changes over time, for records such as electronic health records, analytical results, and workflow records.
"Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability, or trustworthiness." – World Wide Web Consortium (W3C), PROV-Overview
Data in laboratories is often produced by separate information systems, which can make it difficult to trace. When the systems aren’t connected, it can be even more challenging to maintain the data’s chain of custody and the metadata records.
The result? Stakeholders can’t generate a report that describes exactly what happened to a sample, such as which agents¹ and activities² were involved, and at what times. This metadata (information about the data) provides granular details that previously might have been collected in a notebook or file.
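To make this concrete, here is a minimal Python sketch of the kind of record such a report depends on. The `ProvenanceEvent` class, its field names, and the sample data are hypothetical and not taken from any particular LIMS. The sketch only illustrates that once every event carries a sample ID, an agent, an activity, and a timestamp, the report reduces to a filter and a sort.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ProvenanceEvent:
    """One thing that happened to a sample (hypothetical structure)."""
    sample_id: str
    agent: str             # person, instrument, or software responsible
    activity: str          # what was done
    occurred_at: datetime  # when it was done


def sample_report(events, sample_id):
    """Return the events recorded for one sample, oldest first."""
    relevant = [e for e in events if e.sample_id == sample_id]
    return sorted(relevant, key=lambda e: e.occurred_at)


# Illustrative data only.
events = [
    ProvenanceEvent("S-001", "analyzer-7", "measured volume", datetime(2024, 3, 1, 10, 30)),
    ProvenanceEvent("S-002", "j.doe", "received sample", datetime(2024, 3, 1, 9, 15)),
    ProvenanceEvent("S-001", "j.doe", "received sample", datetime(2024, 3, 1, 9, 0)),
]

for event in sample_report(events, "S-001"):
    print(event.occurred_at.isoformat(), event.agent, event.activity)
```

When the events live in separate, unconnected systems, no single place holds the full list, which is why the report cannot be produced.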
For example, in order for a lab to produce a full provenance record on a measured volume, they need to prove how they know that a certain volume was measured. They also need to be able to answer questions such as:
Answering these questions efficiently requires that metadata for each specimen be recorded and stored in a standardized way so that it can be easily reviewed. While no regulator or accreditation program (such as the FDA, CAP, or CLIA) currently dictates how this metadata is captured for clinical laboratories, we predict that detailed traceability will become a requirement for clinical software once data provenance is better supported in modern laboratory software. We recommend preparing for this sooner rather than later.
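As an illustration of what "a standardized way" could look like, the Python sketch below stores a measured volume using the three core concepts of the W3C PROV data model: entities, activities, and agents, plus the relations between them. The layout is loosely modeled on PROV-JSON but simplified and not validated against that specification. The `lab:` namespace, the identifiers, the attribute names, and the values are all made up for the example.

```python
import json

# Hypothetical identifiers under a made-up "lab" namespace.
record = {
    "prefix": {"lab": "https://example.org/lab/"},
    "entity": {"lab:volume-S-001": {"lab:value": "2.5", "lab:unit": "mL"}},
    "activity": {
        "lab:measure-volume-1": {
            "prov:startTime": "2024-03-01T10:30:00",
            "prov:endTime": "2024-03-01T10:31:00",
        }
    },
    "agent": {"lab:analyzer-7": {"lab:kind": "instrument"}},
    # Which activity generated which entity.
    "wasGeneratedBy": {
        "_:g1": {"prov:entity": "lab:volume-S-001", "prov:activity": "lab:measure-volume-1"}
    },
    # Which agent was responsible for which activity.
    "wasAssociatedWith": {
        "_:a1": {"prov:activity": "lab:measure-volume-1", "prov:agent": "lab:analyzer-7"}
    },
}


def who_produced(record, entity_id):
    """Return the agents associated with the activities that generated an entity."""
    activities = {
        g["prov:activity"]
        for g in record["wasGeneratedBy"].values()
        if g["prov:entity"] == entity_id
    }
    return sorted(
        a["prov:agent"]
        for a in record["wasAssociatedWith"].values()
        if a["prov:activity"] in activities
    )


print(json.dumps(record, indent=2))
print(who_produced(record, "lab:volume-S-001"))
```

Because the relations are explicit, a question like "how do we know this volume was measured, and by what?" becomes a lookup rather than a search through notebooks.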
There are a number of reasons why labs should place a high priority on addressing data provenance. For instance:
In an ideal world, laboratory informatics systems would be able to generate and interact with data that adheres to provenance standards, such as W3C PROV.³ What we’d like to see, eventually, is the ability for labs to immediately access all metadata records linked to a sample directly from within the laboratory information management system (LIMS). Unfortunately, that’s not yet possible with an off-the-shelf LIMS. Many obstacles remain before a universal provenance standard is adopted and all healthcare data formatting is harmonized.
However, in the meantime, labs can work with a software consultant to integrate the various components of their informatics systems to provide more robust data provenance. When you’re selecting a new vendor or consultant, be sure to confirm that they understand the importance of data provenance as a functional requirement in software. Custom clinical software should always be built with data provenance in mind.
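One common shape for this kind of integration work is a small adapter per system that maps each export into a shared record format. The Python sketch below assumes two hypothetical exports, one from a LIMS and one from an instrument, with invented field names and timestamp formats. It is not based on any specific product.

```python
from datetime import datetime


def from_lims(row):
    """Map a hypothetical LIMS export row to the common shape."""
    return {
        "sample_id": row["specimen"],
        "agent": row["user"],
        "activity": row["action"],
        "occurred_at": datetime.fromisoformat(row["ts"]),
        "source": "lims",
    }


def from_instrument(row):
    """Map a hypothetical instrument log row to the common shape."""
    return {
        "sample_id": row["sample_id"],
        "agent": row["device"],
        "activity": row["step"],
        "occurred_at": datetime.strptime(row["time"], "%Y-%m-%d %H:%M:%S"),
        "source": "instrument",
    }


def merged_timeline(lims_rows, instrument_rows, sample_id):
    """Combine both systems' events for one sample, oldest first."""
    events = [from_lims(r) for r in lims_rows]
    events += [from_instrument(r) for r in instrument_rows]
    matching = (e for e in events if e["sample_id"] == sample_id)
    return sorted(matching, key=lambda e: e["occurred_at"])


# Illustrative data only.
lims_rows = [
    {"specimen": "S-001", "user": "j.doe", "action": "received sample", "ts": "2024-03-01T09:00:00"},
    {"specimen": "S-002", "user": "j.doe", "action": "received sample", "ts": "2024-03-01T09:15:00"},
]
instrument_rows = [
    {"sample_id": "S-001", "device": "analyzer-7", "step": "measured volume", "time": "2024-03-01 10:30:00"},
]

for event in merged_timeline(lims_rows, instrument_rows, "S-001"):
    print(event["occurred_at"].isoformat(), event["source"], event["agent"], event["activity"])
```

Keeping the `source` field on each merged event preserves where the record came from, which is itself part of the provenance.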
² An activity is something that occurs over a period of time and acts upon or with entities; it may include consuming, processing, transforming, modifying, relocating, using, or generating entities.
³ The World Wide Web Consortium (W3C) has created PROV, a set of recommended standards, to support the interchange of provenance information on the Web.