
Publications by João Tiago Paulo

2021

Pods-as-Volumes: Effortlessly Integrating Storage Systems and Middleware into Kubernetes

Authors
Faria, A; Macedo, R; Paulo, J;

Publication
WOC '21: Proceedings of the Seventh International Workshop on Container Technologies and Container Clouds, Virtual Event, Canada, 6 December 2021

Abstract

2022

Accelerating Deep Learning Training Through Transparent Storage Tiering

Authors
Dantas, M; Leitao, D; Cui, P; Macedo, R; Liu, XL; Xu, WJ; Paulo, J;

Publication
2022 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid 2022)

Abstract
We present MONARCH, a framework-agnostic storage middleware that transparently employs storage tiering to accelerate Deep Learning (DL) training. It leverages the existing storage tiers of modern supercomputers (i.e., the compute node's local storage and the shared parallel file system (PFS)), while considering the I/O patterns of DL frameworks to improve data placement across tiers. MONARCH aims at accelerating DL training and decreasing the I/O pressure imposed on the PFS. We apply MONARCH to TensorFlow and PyTorch, validating its performance and applicability under different models and dataset sizes. Results show that, even when the training dataset can only be partially stored on local storage, MONARCH reduces TensorFlow's and PyTorch's training time by up to 28% and 37% for I/O-intensive models, respectively. Furthermore, MONARCH decreases the number of I/O operations submitted to the PFS by up to 56%.
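
The tiering mechanism the abstract describes can be pictured with a small sketch. The Python below is an illustration only, assuming a fast node-local directory and a slower PFS mount; the class name, fields, and promotion policy are hypothetical, not MONARCH's actual design.

```python
# Hypothetical sketch of transparent storage tiering for training reads.
import os
import shutil

class TieringCache:
    """Serve training reads from a fast local tier, falling back to the PFS."""

    def __init__(self, local_dir, pfs_dir, capacity_bytes):
        self.local_dir = local_dir        # fast tier (e.g., node-local NVMe)
        self.pfs_dir = pfs_dir            # slow tier (shared parallel FS)
        self.capacity = capacity_bytes    # space budget on the fast tier
        self.used = 0

    def read(self, name):
        local_path = os.path.join(self.local_dir, name)
        if os.path.exists(local_path):    # hit: serve from the fast tier
            with open(local_path, "rb") as f:
                return f.read()
        pfs_path = os.path.join(self.pfs_dir, name)
        size = os.path.getsize(pfs_path)
        if self.used + size <= self.capacity:   # promote while budget lasts
            shutil.copyfile(pfs_path, local_path)
            self.used += size
            pfs_path = local_path
        with open(pfs_path, "rb") as f:   # read promoted copy, or the PFS
            return f.read()
```

Even this toy version shows why partial caching helps: every promoted file turns repeated epoch reads into local-tier hits, which is consistent with the paper's observation that gains appear even when the dataset only partially fits locally.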

2022

PAIO: General, Portable I/O Optimizations With Minor Application Modifications

Authors
Macedo, R; Tanimura, Y; Haga, J; Chidambaram, V; Pereira, J; Paulo, J;

Publication
Proceedings of the 20th USENIX Conference on File and Storage Technologies, FAST 2022

Abstract
We present PAIO, a framework that allows developers to implement portable I/O policies and optimizations for different applications with minor modifications to their original code base. The chief insight behind PAIO is that if we are able to intercept and differentiate requests as they flow through different layers of the I/O stack, we can enforce complex storage policies without significantly changing the layers themselves. PAIO adopts ideas from the Software-Defined Storage community, building data plane stages that mediate and optimize I/O requests across layers and a control plane that coordinates and fine-tunes stages according to different storage policies. We demonstrate the performance and applicability of PAIO with two use cases. The first improves 99th percentile latency by 4x in industry-standard LSM-based key-value stores. The second ensures dynamic per-application bandwidth guarantees under shared storage environments.
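
To illustrate the data plane / control plane split described above, here is a minimal, hypothetical sketch of a "stage" that classifies requests and enforces a per-class token-bucket rate limit. Every name and number in it is invented for this example; it is not PAIO's API.

```python
# Hypothetical data plane stage: classify requests, apply per-class policies.
import time

class TokenBucket:
    """Allow up to `rate` bytes/s with a burst of `capacity` bytes."""

    def __init__(self, rate, capacity):
        self.rate, self.capacity = float(rate), float(capacity)
        self.tokens, self.last = float(capacity), time.monotonic()

    def throttle(self, nbytes):
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= nbytes:
                self.tokens -= nbytes
                return
            time.sleep((nbytes - self.tokens) / self.rate)

class Stage:
    """Mediates requests between two I/O layers, one policy per class."""

    def __init__(self, policies):
        self.policies = policies            # request class -> TokenBucket

    def enforce(self, request_class, nbytes):
        bucket = self.policies.get(request_class)
        if bucket is not None:              # unclassified traffic passes through
            bucket.throttle(nbytes)

# Usage: cap background compaction traffic at 32 MiB/s while foreground
# (client-facing) requests are never delayed.
stage = Stage({"compaction": TokenBucket(32 * 2**20, 4 * 2**20)})
stage.enforce("compaction", 2**20)          # throttled when over budget
stage.enforce("foreground", 2**20)          # passes through immediately
```

The design point the sketch captures is that the enforcement logic lives between layers, so neither the application above nor the storage system below needs to change; a separate control plane would tune the bucket parameters at runtime.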

2022

Protecting Metadata Servers From Harm Through Application-level I/O Control

Authors
Macedo, R; Miranda, M; Tanimura, Y; Haga, J; Ruhela, A; Harrell, SL; Evans, RT; Paulo, J;

Publication
2022 IEEE International Conference on Cluster Computing (CLUSTER 2022)

Abstract
Modern large-scale I/O applications that run on HPC infrastructures are increasingly becoming metadata-intensive. Unfortunately, having multiple concurrent applications submitting massive amounts of metadata operations can easily saturate the shared parallel file system's metadata resources, leading to unresponsiveness of the storage backend and overall performance degradation. To address these challenges, we present PADLL, a storage middleware that enables system administrators to proactively control and ensure QoS over metadata workflows in HPC storage systems. We demonstrate its performance and feasibility by controlling the rate of both synthetic and realistic I/O workloads. Results show that PADLL can dynamically control metadata-aggressive workloads, prevent I/O burstiness, and ensure I/O fairness and prioritization.
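
As a rough illustration of application-level rate control over metadata operations: PADLL itself intercepts POSIX calls transparently (no application changes), whereas the hypothetical sketch below wraps a call explicitly. The decorator name and the 500 ops/s budget are assumptions made for this example.

```python
# Hypothetical sketch: throttle an application's metadata operation rate.
import functools
import os
import time

def rate_limited(max_ops_per_s):
    """Delay wrapped calls so they never exceed max_ops_per_s."""
    min_interval = 1.0 / max_ops_per_s
    last_call = [0.0]                       # mutable closure state

    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            wait = last_call[0] + min_interval - time.monotonic()
            if wait > 0:
                time.sleep(wait)            # smooth out metadata bursts
            last_call[0] = time.monotonic()
            return fn(*args, **kwargs)
        return inner
    return wrap

# Cap this application's stat() rate toward the shared file system at
# 500 metadata operations per second.
stat = rate_limited(500)(os.stat)
info = stat(".")
```

Spacing calls out like this is one simple way to prevent a single metadata-aggressive job from saturating the PFS metadata servers; a system administrator would set different budgets per application to obtain the fairness and prioritization the abstract describes.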

2025

Keigo: Co-designing Log-Structured Merge Key-Value Stores with a Non-Volatile, Concurrency-aware Storage Hierarchy (Extended Version)

Authors
Adão, R; Wu, Z; Zhou, C; Balmau, O; Paulo, J; Macedo, R;

Publication
CoRR

Abstract

2025

Keigo: Co-designing Log-Structured Merge Key-Value Stores with a Non-Volatile, Concurrency-aware Storage Hierarchy

Authors
Adão, R; Wu, ZJ; Zhou, CJ; Balmau, O; Paulo, J; Macedo, R;

Publication
Proceedings of the VLDB Endowment

Abstract
We present Keigo, a concurrency- and workload-aware storage middleware that enhances the performance of log-structured merge key-value stores (LSM KVS) when they are deployed on a hierarchy of storage devices. The key observation behind Keigo is that there is no one-size-fits-all placement of data across the storage hierarchy that optimizes for all workloads. Hence, to leverage the benefits of combining different storage devices, Keigo places files across different devices based on their parallelism, I/O bandwidth, and capacity. We introduce three techniques: concurrency-aware data placement, persistent read-only caching, and context-based I/O differentiation. Keigo is portable across different LSMs, is adaptable to dynamic workloads, and does not require extensive profiling. Our system enables established production KVS such as RocksDB, LevelDB, and Speedb to benefit from heterogeneous storage setups. We evaluate Keigo using synthetic and realistic workloads, showing that it improves the throughput of production-grade LSMs by up to 4x for write-heavy and 18x for read-heavy workloads when compared to general-purpose storage systems and specialized LSM KVS.
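
A small, hypothetical sketch of level-aware file placement across a storage hierarchy, in the spirit of (but not identical to) the approach described above: files from hot, shallow LSM levels go to the device with the most parallelism and bandwidth, while cold, deep levels go to whatever has free capacity. Device names and parameters are invented.

```python
# Hypothetical sketch: route SSTable files across heterogeneous devices.
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    bandwidth_mb_s: int     # sequential bandwidth
    parallelism: int        # concurrent I/Os the device sustains well
    free_bytes: int

def place_sst(level, size, devices):
    """Pick a device for an SSTable produced at the given LSM level."""
    if level <= 2:          # hot levels: prefer parallel, fast devices
        ranked = sorted(devices, reverse=True,
                        key=lambda d: (d.parallelism, d.bandwidth_mb_s))
    else:                   # cold levels: prefer remaining capacity
        ranked = sorted(devices, key=lambda d: d.free_bytes, reverse=True)
    for dev in ranked:
        if dev.free_bytes >= size:
            dev.free_bytes -= size
            return dev
    raise RuntimeError("storage hierarchy is full")

devices = [Device("optane", 2500, 7, 100 * 2**30),
           Device("flash-nvme", 3500, 16, 2**40),
           Device("hdd", 200, 1, 10 * 2**40)]
print(place_sst(level=1, size=64 * 2**20, devices=devices).name)  # flash-nvme
```

The point of ranking by parallelism and bandwidth rather than by a fixed device order is the abstract's "no one-size-fits-all" observation: which device is best depends on how concurrently a file will be accessed, which in an LSM correlates with its level.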
