About

Ricardo Macedo is currently a Researcher at INESC TEC. He obtained his PhD degree in 2023 under the MAP-i Doctoral Programme in Computer Science from the Universities of Minho, Aveiro and Porto with the thesis “User-level Software-Defined Storage Data Planes”.

His research focuses mainly on storage and operating systems, with an emphasis on designing new building blocks fitted for the performance, reliability, and energy consumption requirements of modern, large-scale I/O infrastructures, including key-value stores, kernel-bypass storage stacks, and disaggregated I/O resources. For more information, see his personal web page at https://rgmacedo.github.io/.

Details

  • Name: Ricardo Gonçalves Macedo
  • Role: Assistant Researcher
  • Since: 1st December 2016
Publications

2026

MinatoLoader: Accelerating Machine Learning Training Through Efficient Data Preprocessing

Authors
Nouaji, R; Bitchebe, S; Macedo, R; Balmau, O;

Publication
EuroSys

Abstract
Machine learning (ML) frameworks, such as PyTorch and TensorFlow, rely on data loaders to preprocess data before feeding it to accelerators. When preprocessing is inefficiently pipelined, GPUs can remain idle for long periods of time, leading to substantial training delays. For example, PyTorch’s default data loaders can cause up to 76% GPU idleness. A key bottleneck is the variability in preprocessing time across samples within the same dataset. Existing data loaders are oblivious to this variability, treating all samples uniformly. In this case, a single slow sample can stall the entire batch, causing head-of-line blocking. We present MinatoLoader, a general-purpose data loader for PyTorch that accelerates training and improves GPU utilization under single-server, multi-GPU settings. It continuously prepares data in the background and constructs batches by prioritizing fast-to-process samples, while slower samples are processed in parallel. Experiments conducted over NVIDIA V100 and A100 GPUs show that MinatoLoader accelerates training by up to 7.5× (3.6× on average) over PyTorch DataLoader and Pecan, and up to 3× (2.2× on average) over DALI. It also increases average GPU utilization from 46% with PyTorch to 90%, while preserving model accuracy and enabling faster convergence.

2026

Holpaca: Holistic and Adaptable Cache Management for Shared Environments

Authors
Peixoto, JP; González, A; Bhimani, J; Rangaswami, R; Brito, C; Paulo, J; Macedo, R;

Publication
ICPE

Abstract

2026

Idiosyncrasies of Programmable Caching Engines

Authors
Peixoto, JP; González, A; Bhimani, J; Rangaswami, R; Brito, C; Paulo, J; Macedo, R;

Publication
CoRR

Abstract

2025

Keigo: Co-designing Log-Structured Merge Key-Value Stores with a Non-Volatile, Concurrency-aware Storage Hierarchy (Extended Version)

Authors
Adão, R; Wu, Z; Zhou, C; Balmau, O; Paulo, J; Macedo, R;

Publication
CoRR

Abstract

2025

KEIGO: Co-designing Log-Structured Merge Key-Value Stores with a Non-Volatile, Concurrency-aware Storage Hierarchy

Authors
Adão, R; Wu, ZJ; Zhou, CJ; Balmau, O; Paulo, J; Macedo, R;

Publication
Proceedings of the VLDB Endowment

Abstract
We present Keigo, a concurrency- and workload-aware storage middleware that enhances the performance of log-structured merge key-value stores (LSM KVS) when they are deployed on a hierarchy of storage devices. The key observation behind Keigo is that there is no one-size-fits-all placement of data across the storage hierarchy that optimizes for all workloads. Hence, to leverage the benefits of combining different storage devices, Keigo places files across different devices based on their parallelism, I/O bandwidth, and capacity. We introduce three techniques: concurrency-aware data placement, persistent read-only caching, and context-based I/O differentiation. Keigo is portable across different LSMs, is adaptable to dynamic workloads, and does not require extensive profiling. Our system enables established production KVS such as RocksDB, LevelDB, and Speedb to benefit from heterogeneous storage setups. We evaluate Keigo using synthetic and realistic workloads, showing that it improves the throughput of production-grade LSMs up to 4x for write- and 18x for read-heavy workloads when compared to general-purpose storage systems and specialized LSM KVS.