Are the best data management systems the ones we do not notice?


Importance of data and its early lifecycle. This is a series of blogs looking at the importance of data and the current efforts to make data more FAIR (Findable, Accessible, Interoperable & Reusable) and of higher quality, to help tackle reproducibility and Life Science & health challenges. There are many excellent activities ongoing around data reusability and FAIR, including FAIRplus, GoFAIR and the Pistoia Alliance FAIR Toolkit, where you can find more information.

But are the best data management systems the ones you do not notice?

Have you ever wondered how many times a day scientists in Life Science R&D and Health enter data? For some, this is a key part of their job, especially for those who create data through experiments and research in Life Science.

It feels as though the effort and design (usability/UX, etc.) put into the data systems that support these data entry processes, and the wider data lifecycle, is much less than the experimental effort, time and cost spent creating the data in the first place.

Much is said about needing better-quality data to support decision making, and this is especially true in Life Science R&D.

Scientists have been recording and analysing data for centuries. Over the past 40 years, this storage and analysis has been driven by the rise of computing: growth in compute power (Moore’s Law, GPUs, and quantum computing to come), wider access to that compute, and more flexible storage to go with it (cloud, hybrid, etc.).

Think for a minute about the data lifecycle: from the initial creation of data assets, through storage, processing and analysis (possibly repeated), and on to longer-term management and archival or deletion. The first part of the lifecycle should be the most critical in determining the value of the data, yet in many situations the initial data capture has been lacking in metadata, quality and overall user experience.

With the rise of big data, much effort was spent on cleansing data, but this fixes the data well after its initial creation and can only be a stopgap when data volumes, already at extreme levels, are set to increase further across R&D and health. This is especially true now with the rise of AI and ML, which require high-quality data, often in large quantities, for their predictions to be valid.

So, in addressing our science challenges and how data can support them, we need to:

  • Consider the usability (UX) of the data entry systems
  • Put appropriate effort into those data capture systems to ensure they are FAIR and that data is born FAIR
  • Think about how this can enrich the quality of the data (FAIR + Quality)

In the next part of this series, we will build on the role of good data management and what can be done to address these gaps.
