Jul 1, 2023 12:36:00 PM | Blog

Why a Graph Database is a Better Choice for Lab Software

Every lab that uses software such as a laboratory information management system (LIMS) is using a database to store and access lab data. What lab managers might not realize is that the underlying type of database, relational or non-relational, can affect how effectively a lab can innovate and scale.

Labs that aim to rapidly put new assays into production and iterate on them need a database that supports two primary functions: storage of data with its context (metadata) and easy reuse and searchability. However, most, if not all, LIMS on the market today use a relational database, which doesn’t support these critical functions. Studies show that graph databases, a type of non-relational database, are much more suitable for biomedical applications.

Relational & non-relational databases: what’s the difference?

Databases are categorized as relational or non-relational, depending on their underlying data structures.

Type: Relational ("SQL") database
Examples:
  • PostgreSQL
  • Microsoft SQL Server
  • Oracle Database
  • MySQL
Description: Databases that consist of multiple related tables, with data stored in rows and columns. They use Structured Query Language (SQL), developed in the 1970s, to read and modify data in the database.

Type: Non-relational ("NoSQL") database
Examples:
  • Document datastore
  • Column-oriented database
  • Key-value store
  • Graph database
      • RDF (e.g., Ontotext GraphDB)
      • Property (e.g., Neo4j)
Description: Databases that do not use tables, allowing a more flexible, suitable data format.

Note that there are two types of graph databases. RDF graphs support RDF (Resource Description Framework) and conform to certain W3C standards. Property graphs do not support RDF and are less precise.

Check out this Oracle article if you're interested in learning more about the differences between RDF and property graphs.

Table 1. Comparison of relational and non-relational databases.

Why are relational databases insufficient for labs?

In a relational database, you must model your data up front in order to set up tables for the specific types of data to be stored. That means knowing in advance what types of data you want to store and what types of queries you’ll want to run. If you need to restructure the data at any point, you have to perform a database migration. This process becomes increasingly high-risk as the volume of data grows, because every new addition to a table requires a corresponding change in the software code. The more changes you make, the more chances you have of introducing an error.
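To make the migration point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (sample, collected_on, storage_temp_c) are hypothetical and chosen only for illustration; a production LIMS schema would be far larger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Initial schema: every column is decided before any data is stored.
conn.execute("CREATE TABLE sample (id TEXT PRIMARY KEY, collected_on TEXT)")
conn.execute("INSERT INTO sample VALUES ('S-001', '2023-07-01')")

# A new assay needs a storage temperature, so the table has to be migrated.
conn.execute("ALTER TABLE sample ADD COLUMN storage_temp_c REAL")
conn.execute("UPDATE sample SET storage_temp_c = -80.0 WHERE id = 'S-001'")

# Application code that names columns has to change in step with the schema.
columns = [info[1] for info in conn.execute("PRAGMA table_info(sample)")]
row = conn.execute(
    "SELECT id, collected_on, storage_temp_c FROM sample"
).fetchone()

print(columns)
print(row)
```

Even in this tiny case, one new attribute touches both the schema and the query text. In a real system the same change would ripple through every report and integration that reads the table.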

As your business evolves, what you store is likely to change based both on what you want to query and new products your lab wants to create. In our experience, labs tend to use a lot of unstructured and user-defined data, which is not easy to deal with in a structured database table.

Non-relational databases are less likely to need major restructuring because of the inherent flexibility of their structures and of the software applications that use them. For example, each piece of data is stored with its own data type, unlike in a relational database, where every value in a table column must share the single type declared for that column.
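The idea of each value carrying its own data type can be sketched in plain Python. This is an illustration only, not how any particular database stores data: the Literal class and the sample results are hypothetical, while the XSD namespace is the standard XML Schema datatype namespace that RDF typed literals refer to.

```python
from dataclasses import dataclass

XSD = "http://www.w3.org/2001/XMLSchema#"  # standard XML Schema namespace


@dataclass(frozen=True)
class Literal:
    """A value that carries its own datatype, in the spirit of an RDF typed literal."""

    lexical: str
    datatype: str

    def to_python(self):
        # Convert using the datatype attached to this one value.
        # Mapping xsd:decimal to float is a simplification for this sketch.
        converters = {
            XSD + "integer": int,
            XSD + "decimal": float,
            XSD + "string": str,
        }
        return converters[self.datatype](self.lexical)


# The same hypothetical "result" property holds differently typed values.
results = [
    Literal("42", XSD + "integer"),
    Literal("7.35", XSD + "decimal"),
    Literal("hemolyzed", XSD + "string"),
]
values = [r.to_python() for r in results]
print(values)
```

Because the type travels with the value, a numeric result and a free-text observation can sit under the same property without a column type having to be widened first.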

What is an RDF graph database?

RDF graph databases use a data model that consists of:

  • Nodes representing entities, such as a person, sample, or reagent.
  • Edges representing relationships between entities. For example, a sample is processed by a person.
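The node-and-edge model above can be sketched as a set of (subject, edge, object) triples in plain Python. This is a toy illustration, not a real RDF store; the identifiers (sample:S-001, processedBy, and so on) and the match helper are hypothetical.

```python
# Each fact is one (subject, edge, object) triple; all names are hypothetical.
triples = {
    ("sample:S-001", "processedBy", "person:alice"),
    ("sample:S-001", "usedReagent", "reagent:R-17"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# A new kind of entity and relationship is just another triple.
triples.add(("reagent:R-17", "suppliedBy", "vendor:acme"))

about_sample = match(triples, s="sample:S-001")
suppliers = match(triples, p="suppliedBy")
print(about_sample)
print(suppliers)
```

Adding the vendor required no new table or column, only one more triple.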

If we were to represent this as an image, it would look something like the figure below.

[Image: Labbit graph database graphic]

Figure 1. Example of RDF graph database data model showing nodes & edges.

This data model is much more flexible than a table, as it does not constrain the type of data that can be added to the graph. It also records the relationships between entities, which is a form of human- and machine-readable metadata not explicitly stored in relational databases. We chose an RDF graph database for our new Labbit Intelligent Lab System due to three primary benefits.

Three advantages of an RDF graph database for clinical software

RDF graph databases offer labs a number of benefits compared to other types of databases. We’ve distilled these down to three main ones:

  1. More flexibility and agility. Graphs can evolve as your business and data requirements change. They allow you to include primary entities of any kind without adding technical debt to the data model, which complicates reporting and thus necessitates extract, transform, load (ETL) processes.
  2. Easier querying and more precise data capture are possible because graph-stored data more accurately represents reality than the normalized forms required by relational databases. When you need information from the database, the graph can tell you exactly what you need to know, quickly and with context.
  3. The ability to create and update the data model within your LIMS to match the evolving ontology your laboratorians hold in their minds about the lab, rather than forcing them to adapt to the software’s narrowly defined view of it.

This ability supports two very useful applications for labs:

  1. Much faster knowledge inference for machine learning applications.
  2. Improved data shareability by enabling data interoperability. Paired with the right software and planning, an RDF graph enables a lab’s data to be stored, maintained, and shared in accordance with the FAIR (findable, accessible, interoperable, reusable) data principles. Each piece of data includes rich metadata and has a unique identifier (in an RDF graph database, this is an Internationalized Resource Identifier, or IRI). These identifiers let a lab make connections to other sources of data and share the data with external users.
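As a rough sketch of how unique identifiers make data linkable, the snippet below mints IRIs and writes one statement in N-Triples, a standard line-based RDF syntax. The base namespace, the external biobank address, and the helper names are assumptions made up for this example; only the owl:sameAs property IRI is a real, published identifier.

```python
from urllib.parse import quote

BASE = "https://example.org/lab/"  # hypothetical namespace for this lab
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"  # real OWL property IRI


def mint_iri(kind: str, local_id: str) -> str:
    """Build a globally unique identifier from a local ID (hypothetical helper)."""
    return f"{BASE}{quote(kind, safe='')}/{quote(local_id, safe='')}"


def to_ntriples(subject: str, predicate: str, obj: str) -> str:
    """Serialize one IRI-to-IRI statement as an N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


sample_iri = mint_iri("sample", "S-001")

# Link the lab's record to a record held by a hypothetical external source.
external_iri = "https://example.com/biobank/specimen/12345"
line = to_ntriples(sample_iri, OWL_SAME_AS, external_iri)
print(line)
```

Because both ends of the statement are full IRIs, an external user can tell exactly which record is meant without knowing anything about the lab's internal numbering.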

Here’s a quick comparison between relational and graph databases.

Relationships
  • Relational database: Inferred using foreign keys between tables
  • Graph database: Stored explicitly between nodes as data

Data structure
  • Relational database: Rigid - Must be pre-determined to create the correct tables
  • Graph database: Flexible - New types of data can be added without the need to change a schema or perform a migration.

Complex querying
  • Relational database: Slower and difficult to construct - Requires complex joins on data tables and deep knowledge of both your schema and best practices
  • Graph database: Faster - Does not require joins; follows connections between nodes, allowing for ad-hoc querying on any topic without prior data modeling or knowledge of the data model.

Table 2. Comparison between relational and graph databases.
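The querying row in the comparison above can be illustrated with a small sketch that answers the same question both ways: which lab handled a given sample? The tables, edge names, and follow helper are hypothetical, and the graph side is a plain dictionary rather than a real graph database, so this shows the shape of the two queries and says nothing about performance.

```python
import sqlite3

# Relational version: the sample -> person -> lab path needs two joins.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lab (id TEXT PRIMARY KEY);
    CREATE TABLE person (id TEXT PRIMARY KEY, lab_id TEXT REFERENCES lab(id));
    CREATE TABLE sample (id TEXT PRIMARY KEY,
                         processed_by TEXT REFERENCES person(id));
    INSERT INTO lab VALUES ('lab-1');
    INSERT INTO person VALUES ('alice', 'lab-1');
    INSERT INTO sample VALUES ('S-001', 'alice');
""")
sql_lab = conn.execute("""
    SELECT lab.id FROM sample
    JOIN person ON sample.processed_by = person.id
    JOIN lab ON person.lab_id = lab.id
    WHERE sample.id = 'S-001'
""").fetchone()[0]

# Graph version: the same path is a walk along explicitly stored edges.
# Simplification: each (node, edge name) pair points to a single target.
edges = {
    ("S-001", "processedBy"): "alice",
    ("alice", "memberOf"): "lab-1",
}


def follow(start, *edge_names):
    """Walk from a starting node along the named edges, one hop per name."""
    node = start
    for name in edge_names:
        node = edges[(node, name)]
    return node


graph_lab = follow("S-001", "processedBy", "memberOf")
print(sql_lab, graph_lab)
```

Each extra hop adds another join to the SQL query, while the graph walk only needs one more edge name in the path.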

A more flexible option

Although relational databases have been the default for lab software for many years, advances in technology and data modeling mean that labs can now choose a more flexible, proven option—the non-relational RDF graph database. Modern RDF graph databases are at the forefront of data storage. Because they can help future-proof software, they have been widely adopted by cutting-edge organizations with a heavy research focus.

If your lab is facing yet another replatforming or database migration, we recommend considering an RDF graph database instead as part of your overall laboratory data management solution. Alternatively, you might choose a LIMS like Labbit, which natively employs an RDF graph database.

Ready to learn more?   

Labbit removes laboratory configuration bottlenecks, enabling you to simplify workflows, collaborate seamlessly, and empower new discoveries on a scalable and future-proof platform. Contact us for a free consultation.

Written By: Labbit