Publications

Publications by José Orlando Pereira

2025

Rethinking BFT: Leveraging Diverse Software Components with LLMs

Authors
Imperadeiro, J; Alonso, AN; Pereira, J;

Publication
2025 55TH ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS-SUPPLEMENTAL VOLUME, DSN-S

Abstract
Diversity is crucial in systems that tolerate Byzantine faults. Traditionally, system builders have relied on standardized interfaces (e.g., POSIX for operating systems) to obtain off-the-shelf components or on n-version programming for custom functionality. Unfortunately, standardized alternatives are rare, and the independent development of multiple versions of the same software is costly and justified only for the most critical applications. In this paper, we show that a limited and focused use of LLMs for translation opens up the possibility of leveraging the existing diversity in functionally equivalent but non-standardized components. Specifically, we show that LLMs can produce functionally correct database query translations with minimal guidance and adapt to diverse data models and query contexts, enabling the use of radically different database models, both SQL and NoSQL, together in a Byzantine fault-tolerant replicated system. We outline an approach to achieve this in practice and discuss future research directions.

2025

Uma extensão de Raft com propagação epidémica (An Extension of Raft with Epidemic Propagation)

Authors
Gonçalves, A; Alonso, AN; Pereira, J; Oliveira, R;

Publication
CoRR

Abstract

2025

Towards Adaptive Transactional Consistency for Georeplicated Datastores

Authors
Braga, R; Pereira, J; Coelho, F;

Publication
40TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING

Abstract
Developers of data-intensive georeplicated applications face a difficult decision when selecting a database system. As captured by the CAP theorem, CP systems such as Spanner provide strong consistency that greatly simplifies application development. AP systems such as AntidoteDB, which provide Transactional Causal Consistency (TCC), ensure availability in the face of network partitions and isolate performance from wide-area round-trip times, but avoid lost-update anomalies only when values can be merged. Ideally, an application should be able to adapt to current data and network conditions by selecting which transactional consistency to use for each transaction. In this paper, we test the hypothesis that a georeplicated database system can be built to provide only TCC at its core, and hence be AP, yet allow an application to execute some transactions under Snapshot Isolation (SI), and hence CP. Our main result is showing that this can be achieved even when all interaction happens through the TCC database system, without additional communication channels between the participants. A preliminary experimental evaluation with a proof-of-concept implementation using AntidoteDB shows that this approach is feasible.

2025

CRDV: Conflict-free Replicated Data Views

Authors
Faria, N; Pereira, J;

Publication
Proc. ACM Manag. Data

Abstract
There are now multiple proposals for Conflict-free Replicated Data Types (CRDTs) in SQL databases aimed at distributed systems. Some, such as ElectricSQL, provide only relational tables as convergent replicated maps, but this omits semantics that would be useful for merging updates. Others, such as Pg_crdt, provide access to a rich library of encapsulated column types. However, this puts merge and query processing outside the scope of the query optimizer and restricts the ability of an administrator to influence access paths with materialization and indexes. Our proposal, CRDV, overcomes this challenge by using two layers implemented as SQL views: the first provides a replicated relational table from an update history, while the second implements varied and rich types on top of the replicated table. This allows the definition of merge semantics, or even entire new data types, in SQL itself, and enables global optimization of user queries together with merge operations. Therefore, it naturally extends the scope of query optimization and local transactions to operations on replicated data, can be used to reproduce the functionality of common CRDTs with simple SQL idioms, and results in better performance than alternatives.

2024

Can Current SDS Controllers Scale To Modern HPC Infrastructures?

Authors
Miranda, M; Tanimura, Y; Haga, J; Ruhela, A; Harrell, SL; Cazes, J; Macedo, R; Pereira, J; Paulo, J;

Publication
PROCEEDINGS OF SC24-W: WORKSHOPS OF THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS

Abstract
Modern supercomputers host numerous jobs that compete for shared storage resources, causing I/O interference and performance degradation. Solutions based on software-defined storage (SDS) emerged to address this issue by coordinating the storage environment through the enforcement of QoS policies. However, these often fail to consider the scale of modern HPC infrastructures. In this work, we explore the advantages and shortcomings of state-of-the-art SDS solutions and highlight the scale of current production clusters and their rising trends. Furthermore, we conduct the first experimental study that offers new insights into the performance and scalability of flat and hierarchical SDS control plane designs. Our results, using the Frontera supercomputer, show that a flat design with a single controller can scale up to 2,500 nodes with an average control cycle latency of 41 ms, while hierarchical designs can handle up to 10,000 nodes with an average latency ranging between 69 and 103 ms.

2024

TADA: A Toolkit for Approximate Distributed Agreement

Authors
da Conceição, EL; Alonso, AN; Oliveira, RC; Pereira, J;

Publication
SCIENCE OF COMPUTER PROGRAMMING

Abstract
TADA is a unique toolkit designed to foster the use and implementation of approximate distributed agreement primitives. Developed in Java, TADA provides ready-to-use implementations of several approximate agreement algorithms, as well as tools that enable programmers and researchers to easily implement further protocols: a template that enables new protocol implementations to be created by simply changing specific functions, and high-level abstractions for communication and concurrency control. As an example, the toolkit includes a ready-to-use implementation for clock synchronisation between distributed processes. Further use cases can include sensor input stabilisation and distributed machine learning, or other instances of distributed agreement where network synchrony cannot be assumed, Byzantine fault tolerance may be required, and a bounded divergence in decision values can be tolerated.
