CDMS | Scientific Article: The contribution of MinatoLoader to the efficiency of AI model training
- anaelias39
- Jan 13
- 2 min read
Updated: 3 days ago
The CDMS project – Claim Denials Management Solution – is based on a clear premise: innovation only creates real impact when it is applied to concrete problems, with practical value for organizations. In the healthcare context, this means fundamentally rethinking how invoice returns, disputes with insurers, and avoidable revenue losses are managed, as these directly affect the financial sustainability of institutions.
Within this framework, scientific research plays a central role. To develop robust, efficient, and reliable artificial intelligence solutions, it is necessary to ensure not only strong models, but also training infrastructures capable of dealing with complex, heterogeneous, and large-scale data, as is typical in the healthcare sector.
Academic research within the scope of the CDMS project
As part of this project, developed by the co-promoters EFFY, Randy Labs, and INESC TEC, in partnership with CUF, the research conducted by INESC TEC resulted in relevant scientific advances, now published in the article “MinatoLoader: Accelerating Machine Learning Training Through Efficient Data Preprocessing”.
The article was co-authored by researcher Ricardo Macedo, from the High Assurance Software Laboratory (HASLab) at INESC TEC, and by Rahma Nouaji, Stella Bitchebe, and Oana Balmau, from McGill University, and will be presented at the 21st ACM European Conference on Computer Systems (EuroSys’26), which will take place from April 27 to 30, 2026, in Edinburgh.

The problem addressed
During the training of machine learning models, especially in real-world scenarios such as those involving clinical and administrative data in healthcare, a significant portion of time can be wasted even before GPU computation begins. The reason lies in data loading and preprocessing, whose duration can vary considerably from sample to sample.
Traditional data loaders assume relatively homogeneous processing times. When this assumption does not hold, the GPU is often left waiting for the slowest sample, becoming underutilized and compromising overall training efficiency. This phenomenon becomes particularly critical in projects such as CDMS, where scale, complexity, and computational cost are determining factors.
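The cost of a single straggler can be made concrete with a toy calculation. The sketch below is purely illustrative (it is not MinatoLoader code, and the 7-to-1 fast/slow split is an assumed example): when parallel preprocessing workers must all finish before a batch is handed to the GPU, one sample that takes 10× longer gates the entire batch.

```python
import statistics

# Toy illustration: a batch of 8 samples, 7 fast and 1 straggler that is
# 10x slower to preprocess. Times are in arbitrary units.
fast, slow = 1.0, 10.0
batch = [fast] * 7 + [slow]

# Preprocessing workers run in parallel, so the batch is only ready once
# the slowest sample finishes -- even though the average sample needed far
# less time.
ready_at = max(batch)
mean_work = statistics.mean(batch)
idle_fraction = 1 - mean_work / ready_at

print(f"batch ready at t={ready_at}, mean sample time={mean_work:.3f}")
print(f"fraction of worker capacity wasted waiting: {idle_fraction:.0%}")
```

In this example the batch is ready only at t=10 while the mean sample needed about 2.1 units, so roughly 79% of the parallel preprocessing capacity is spent waiting on the straggler — and the GPU waits with it.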
MinatoLoader’s contribution
MinatoLoader emerges as a direct response to this problem. It is a generic, transparent replacement for PyTorch's standard DataLoader that can be integrated without structural changes to existing code.
Its approach is based on a mechanism that dynamically prioritizes samples: at runtime, it identifies differences in preprocessing times, favors faster samples when building batches, and processes slower ones in parallel, preventing bottlenecks in the training pipeline.
This strategy significantly reduces GPU idle time and improves training efficiency without compromising the model logic or data quality.
Results and conclusions
The experimental results presented in the article show that MinatoLoader can speed up model training by up to 7.5×, while substantially increasing the average GPU utilization rate. Importantly, these gains are achieved without any negative impact on the accuracy or quality of the trained models.
The conclusions reinforce a central idea for the CDMS project: optimizing the training infrastructure is an essential step to make artificial intelligence solutions more efficient, scalable, and economically viable. By addressing a frequently overlooked bottleneck, MinatoLoader directly contributes to building faster, more reliable decision-support systems that are better prepared to respond to real challenges in the healthcare sector.
The full article is publicly available, in English, on arXiv.




