MES-COBRAD: TOOLS AND TECHNIQUES FOR MEDICAL DATA ANALYSIS


The importance of data analysis tools is well established in medical research, especially when dealing with Real World Data (RWD), where heterogeneous, large-scale (big data) sources need to be harmonised and combined. In MES-CoBraD, such tools are required to extract and analyse various types of RWD (raw imaging data, electroencephalograms, hypnograms, etc.) and to use the resulting information to perform diagnosis and suggest treatment, to test scientific hypotheses and, ultimately, to advance medical research. In parallel, an array of visualisation tools is being developed to display and manipulate the data and to present the results of the analyses in an intuitive, easily comprehensible form, drawing both directly on the data and on the outputs of the analytics tools. When selecting the appropriate tools for data analysis, the following categories of methods/tools were identified:

 

  • Descriptive, using past data to extract knowledge about trends and answer the question “what happened?”
  • Diagnostic, using data discovery and correlation techniques to answer the question “why did it happen?”
  • Predictive, using forecasting models to answer the question “what is likely to happen?”
  • Prescriptive, analysing data to answer the question “what needs to be done?”

 

The data analysis process is not always straightforward. The challenge often lies in pre-processing the data after ingestion, namely in preparing them through cleansing, formatting and combining/harmonising. Beyond data preparation, there are numerous industry-specific issues in the data analysis process that MES-CoBraD aims to address:

 

  • Analysis of structured and unstructured data, meaning to select the most appropriate model to represent biomedical data
  • Feature selection, meaning to identify and select relevant biomarkers
  • Diagnosis classification, meaning to classify diagnoses and improve the sensitivity of diagnostic processes
  • Visualisation, meaning to select appropriate visualisation techniques and tools
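As a rough illustration of the feature selection and diagnosis classification points, the sketch below ranks candidate features by a univariate Welch's t-test between patients and controls (`scipy.stats.ttest_ind`) and then computes the sensitivity of a naive single-feature threshold rule. The data are synthetic, the "biomarker" in column 3 is hypothetical, and the helper names `rank_features` and `sensitivity` are illustrative only; this is not the project's pipeline, and real biomarker selection would also need cross-validation and correction for multiple comparisons.

```python
import numpy as np
from scipy import stats


def rank_features(X, y, k):
    """Indices of the k features that best separate patients (y == 1) from
    controls (y == 0), ranked by Welch's t-test p-value (smallest first)."""
    _, p_values = stats.ttest_ind(X[y == 1], X[y == 0], axis=0, equal_var=False)
    return np.argsort(p_values)[:k]


def sensitivity(y_true, y_pred):
    """Sensitivity (recall): true positives divided by all actual positives."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    return np.sum(y_true & y_pred) / np.sum(y_true)


# Synthetic cohort: 200 subjects, 10 candidate features
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)          # 1 = patient, 0 = control
X = rng.normal(size=(200, 10))
X[:, 3] += 1.5 * y                        # hypothetical biomarker, shifted in patients

selected = rank_features(X, y, k=2)
best = selected[0]

# Naive rule for illustration: threshold halfway between the two group means
threshold = (X[y == 1, best].mean() + X[y == 0, best].mean()) / 2
y_pred = X[:, best] > threshold
sens = sensitivity(y, y_pred)

print("selected features:", selected)
print(f"sensitivity of the threshold rule: {sens:.2f}")
```

A univariate test is used here only because it is short and transparent; multivariate selection methods would be needed when biomarkers interact.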

 

It is important to note that the tools used should guarantee the interpretability and explainability of the models, especially where Artificial Intelligence is involved, and should comply with clinical guidelines. At the same time, medical researchers need a clear understanding of the limitations of both the data they analyse and the tools they use, and must appropriately specify the requirements those tools need to meet. For neuroimaging data specifically, multiple transformations are often required before analysis. The most important ones being considered in MES-CoBraD are:

 

  • Spatial normalisation and smoothing, meaning, respectively, to warp each subject's images into a common template space so that samples are comparable, and to suppress high-frequency noise and artefacts
  • Coregistration, namely the mapping of functional information into anatomical space
  • Slice timing correction, which shifts the time series of each slice to temporally align all slices to a reference time point
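Of the transformations above, smoothing is the easiest to show in a few lines. The sketch below applies isotropic Gaussian smoothing to a 3D volume with `scipy.ndimage.gaussian_filter`, converting a kernel width given as full width at half maximum (FWHM) in millimetres into a standard deviation in voxels via σ = FWHM / (2√(2 ln 2)). The 8 mm kernel, the 2 mm voxel size and the random-noise volume are assumptions for illustration; in practice these transformations are usually performed with dedicated neuroimaging toolkits rather than written by hand.

```python
import numpy as np
from scipy import ndimage


def fwhm_to_sigma(fwhm_mm, voxel_size_mm):
    """Convert a Gaussian kernel FWHM (mm) into a standard deviation in voxels."""
    return fwhm_mm / (2.0 * np.sqrt(2.0 * np.log(2.0))) / voxel_size_mm


def smooth_volume(volume, fwhm_mm=8.0, voxel_size_mm=2.0):
    """Isotropic Gaussian smoothing of a 3D volume (attenuates high-frequency noise)."""
    sigma = fwhm_to_sigma(fwhm_mm, voxel_size_mm)
    return ndimage.gaussian_filter(volume.astype(float), sigma=sigma)


# Stand-in for a (hypothetical) noisy brain volume: 32 x 32 x 32 voxels of white noise
rng = np.random.default_rng(42)
noisy = rng.normal(size=(32, 32, 32))
smoothed = smooth_volume(noisy)

print(f"sigma in voxels: {fwhm_to_sigma(8.0, 2.0):.3f}")
print(f"std before: {noisy.std():.3f}, after: {smoothed.std():.3f}")
```

Because smoothing averages neighbouring voxels, the voxel-to-voxel variability of pure noise drops sharply, while structures larger than the kernel are largely preserved.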

 

Some of the more prevalent statistical methods and metrics for data analysis and knowledge extraction that are being considered and implemented are:

 

  • Correlation analysis, meaning the assessment of the relationship between two continuous variables
  • Regression analysis, meaning the investigation of the relationship between a dependent variable and one or more independent variables
  • Mean, variance and standard deviation, which are basic statistical measures of central tendency (mean) and dispersion (variance and standard deviation)
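These methods map directly onto standard NumPy and SciPy calls. The sketch below computes the mean, sample variance and sample standard deviation, Pearson's correlation coefficient (`scipy.stats.pearsonr`) and a simple linear regression (`scipy.stats.linregress`) on a synthetic data set; the variables `age` and `sleep_eff` and the relationship between them are invented for the example. For regression with more than one independent variable, Statsmodels, one of the tools selected for the project, provides ordinary least squares models.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
age = rng.uniform(20, 80, size=100)                      # hypothetical predictor (years)
sleep_eff = 95 - 0.2 * age + rng.normal(0, 3, size=100)  # hypothetical outcome (%)

# Central tendency and dispersion (ddof=1 gives the sample estimators)
mean = np.mean(sleep_eff)
var = np.var(sleep_eff, ddof=1)
sd = np.std(sleep_eff, ddof=1)

# Correlation analysis: Pearson's r between two continuous variables
r, p_corr = stats.pearsonr(age, sleep_eff)

# Simple linear regression: one dependent and one independent variable
fit = stats.linregress(age, sleep_eff)

print(f"mean={mean:.1f}  var={var:.1f}  sd={sd:.1f}")
print(f"Pearson r={r:.2f} (p={p_corr:.1e})")
print(f"slope={fit.slope:.3f} per year, intercept={fit.intercept:.1f}")
```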

 

Regarding specific requirements for data analysis in MES-CoBraD’s domains, electroencephalograms (EEG) are processed, and analysis tools are being developed using the following methods:

 

  • Time series analysis, meaning the application of methods to describe observed time series
  • Time series forecasting, meaning the application of ARIMA models to predict future values of the EEG
  • Time-frequency analysis, meaning the analysis of the EEG in both the time and the frequency domains
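Time-frequency analysis can be sketched with a short-time Fourier transform via `scipy.signal.spectrogram`. The example below builds a synthetic single-channel signal whose dominant rhythm switches from 10 Hz (alpha band) to 2 Hz (delta band) halfway through, then reports the peak frequency in each 4-second window. The 256 Hz sampling rate, the window length and the signal itself are assumptions for illustration; ARIMA forecasting is not shown here.

```python
import numpy as np
from scipy import signal

fs = 256                          # assumed sampling rate (Hz)
t = np.arange(0, 20, 1 / fs)      # 20 s of synthetic single-channel "EEG"
rng = np.random.default_rng(0)

# First 10 s: 10 Hz (alpha-band) rhythm; last 10 s: 2 Hz (delta-band) rhythm
eeg = np.where(t < 10, np.sin(2 * np.pi * 10 * t), np.sin(2 * np.pi * 2 * t))
eeg = eeg + 0.5 * rng.normal(size=t.size)

# Short-time Fourier transform: 4 s windows, 50% overlap -> 0.25 Hz resolution
freqs, times, Sxx = signal.spectrogram(eeg, fs=fs, nperseg=4 * fs, noverlap=2 * fs)

# Frequency with the most power in each time window
dominant = freqs[np.argmax(Sxx, axis=0)]
for centre, f_peak in zip(times, dominant):
    print(f"window centred at {centre:4.1f} s: peak at {f_peak:4.1f} Hz")
```

The window length trades time resolution for frequency resolution: longer windows separate nearby frequencies better but blur when in the recording a change occurs.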

 

Lastly, as mentioned above, specific types of visualisations need to be implemented in MES-CoBraD to enable or ease the work of researchers and clinicians, including spectrograms, hypnograms, EEG topography, etc.
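As an example of one of these visualisations, the sketch below draws a hypnogram as a step plot with Matplotlib. The stage sequence, its integer coding (0 = Wake, 1 = N1, 2 = N2, 3 = N3, 4 = REM), the 30-second epoch length and the output file name are assumptions for illustration; the vertical ordering (Wake at the top, then REM, N1, N2 and N3) follows one common plotting convention.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
import numpy as np

EPOCH_S = 30  # assumed scoring epoch length in seconds

# Hypothetical hypnogram, one integer per epoch: 0=W, 1=N1, 2=N2, 3=N3, 4=REM
hypno = np.array([0, 0, 1, 2, 2, 3, 3, 3, 2, 2, 4, 4, 2, 1, 0])

# Map stage codes to plotting rows: W, REM, N1, N2, N3 from top to bottom
row_of_stage = {0: 0, 4: 1, 1: 2, 2: 3, 3: 4}
rows = np.array([row_of_stage[int(s)] for s in hypno])

# Epoch boundaries in hours; repeat the last value so the final epoch is drawn
edges_h = np.arange(len(hypno) + 1) * EPOCH_S / 3600
rows_plot = np.append(rows, rows[-1])

fig, ax = plt.subplots(figsize=(8, 2.5))
ax.step(edges_h, rows_plot, where="post", color="black")
ax.set_yticks(range(5))
ax.set_yticklabels(["W", "REM", "N1", "N2", "N3"])
ax.invert_yaxis()  # row 0 (Wake) at the top
ax.set_xlabel("Time (h)")
ax.set_title("Hypnogram")
fig.tight_layout()
fig.savefig("hypnogram.png", dpi=100)
```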

 

Once the necessary functions, methods and tools had been selected, a state-of-the-art analysis was performed to match these requirements with available open-source libraries that can implement them. A series of questionnaires, interviews, workshops and brainstorming sessions resulted in the selection of the tools below, which are listed together with their features most relevant to the project.

 

 

  • NumPy: NumPy arrays and their associated numerical methods
  • SciPy: signal processing, including filtering and Fourier transforms; statistical tests via the stats package; image processing
  • Statsmodels: time-series analysis (ARMA, ARIMA, etc.)
  • MNE: reads data from many formats, e.g., edf, vhdr, eeg, vmrk, etc.; preprocessing functionality, e.g., filtering and artefact repair
  • YASA: algorithms for event detection, e.g., sleep spindles, slow waves and rapid eye movements; spectral analysis
  • FreeSurfer: neuroimaging toolkit for processing, analysing and visualising human brain MR images
  • Matplotlib: plotting library for the Python programming language and its numerical mathematics extension NumPy
  • amCharts: JavaScript visualisation package offering interactive charts
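To give a flavour of how these libraries are used, the sketch below performs one of the preprocessing steps listed for MNE, band-pass filtering, directly with SciPy: a zero-phase Butterworth filter designed with `scipy.signal.butter` (as second-order sections) and applied with `scipy.signal.sosfiltfilt`. The 1 to 40 Hz pass band, the filter order, the 256 Hz sampling rate and the synthetic signal are assumptions; this is a generic SciPy sketch, not a reproduction of MNE's own filtering defaults.

```python
import numpy as np
from scipy import signal


def bandpass(x, fs, low_hz=1.0, high_hz=40.0, order=4):
    """Zero-phase Butterworth band-pass filter (applied forwards and backwards)."""
    sos = signal.butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return signal.sosfiltfilt(sos, x)


def amplitude_at(x, f, fs):
    """Single-sided amplitude of the FFT bin closest to frequency f (Hz)."""
    spectrum = np.abs(np.fft.rfft(x)) * 2 / len(x)
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    return spectrum[np.argmin(np.abs(freqs - f))]


fs = 256                                          # assumed sampling rate (Hz)
t = np.arange(0, 10, 1 / fs)
alpha = np.sin(2 * np.pi * 10 * t)                # 10 Hz component to keep
line_noise = 0.8 * np.sin(2 * np.pi * 50 * t)     # 50 Hz mains interference to suppress
drift = 2.0 * t / t[-1]                           # slow baseline drift
raw = alpha + line_noise + drift
clean = bandpass(raw, fs)

print(f"50 Hz amplitude: {amplitude_at(raw, 50, fs):.3f} -> {amplitude_at(clean, 50, fs):.3f}")
print(f"10 Hz amplitude: {amplitude_at(raw, 10, fs):.3f} -> {amplitude_at(clean, 10, fs):.3f}")
```

Filtering forwards and backwards cancels the phase shift of the Butterworth filter, which matters when the timing of EEG events is analysed afterwards.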

 

 

 

Currently (June 2022), the MES-CoBraD consortium is hard at work integrating these tools, implementing the analysis and visualisation requirements and developing intuitive, feature-rich graphical interfaces that expose their functionality.

 

This article was originally posted on NTUA's website.