2025
Authors
Imperadeiro, J; Alonso, AN; Pereira, J;
Publication
2025 55TH ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS-SUPPLEMENTAL VOLUME, DSN-S
Abstract
Diversity is crucial in systems that tolerate Byzantine faults. Traditionally, system builders have relied on standardized interfaces (e.g., POSIX for operating systems) to obtain off-the-shelf components, or on n-version programming for custom functionality. Unfortunately, standardized alternatives are rare, and independently developing multiple versions of the same software is costly and justified only for the most critical applications. In this paper, we show that a limited and focused use of LLMs for translation opens up the possibility of leveraging the existing diversity in functionally equivalent but non-standardized components. Specifically, we show that LLMs can produce functionally correct database query translations with minimal guidance and adapt to diverse data models and query contexts, enabling radically different database models, both SQL and NoSQL, to be used together in a Byzantine fault-tolerant replicated system. We outline an approach to achieve this in practice and discuss future research directions.
2025
Authors
Gonçalves, A; Alonso, AN; Pereira, J; Oliveira, R;
Publication
CoRR
Abstract
2025
Authors
Braga, R; Pereira, J; Coelho, F;
Publication
40TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING
Abstract
Developers of data-intensive georeplicated applications face a difficult decision when selecting a database system. As captured by the CAP theorem, CP systems such as Spanner provide strong consistency, which greatly simplifies application development. AP systems such as AntidoteDB, which provide Transactional Causal Consistency (TCC), ensure availability in the face of network partitions and isolate performance from wide-area round-trip times, but avoid lost-update anomalies only when values can be merged. Ideally, an application should be able to adapt to current data and network conditions by selecting which transactional consistency level to use for each transaction. In this paper, we test the hypothesis that a georeplicated database system can be built to provide only TCC at its core, hence being AP, while still allowing an application to execute some transactions under Snapshot Isolation (SI), hence CP. Our main result shows that this can be achieved even when all interaction happens through the TCC database system, without additional communication channels between the participants. A preliminary experimental evaluation with a proof-of-concept implementation using AntidoteDB shows that this approach is feasible.
2025
Authors
Faria, N; Pereira, J;
Publication
Proc. ACM Manag. Data
Abstract
2024
Authors
Miranda, M; Tanimura, Y; Haga, J; Ruhela, A; Harrell, SL; Cazes, J; Macedo, R; Pereira, J; Paulo, J;
Publication
PROCEEDINGS OF SC24-W: WORKSHOPS OF THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS
Abstract
Modern supercomputers host numerous jobs that compete for shared storage resources, causing I/O interference and performance degradation. Solutions based on software-defined storage (SDS) emerged to address this issue by coordinating the storage environment through the enforcement of QoS policies. However, these solutions often fail to consider the scale of modern HPC infrastructures. In this work, we explore the advantages and shortcomings of state-of-the-art SDS solutions and highlight the scale of current production clusters and their growth trends. Furthermore, we conduct the first experimental study that sheds new light on the performance and scalability of flat and hierarchical SDS control plane designs. Our results, using the Frontera supercomputer, show that a flat design with a single controller can scale up to 2,500 nodes with an average control cycle latency of 41 ms, while hierarchical designs can handle up to 10,000 nodes with an average latency ranging between 69 and 103 ms.
2024
Authors
da Conceição, EL; Alonso, AN; Oliveira, RC; Pereira, J;
Publication
SCIENCE OF COMPUTER PROGRAMMING
Abstract
TADA is a unique toolkit designed to foster the use and implementation of approximate distributed agreement primitives. Developed in Java, TADA provides ready-to-use implementations of several approximate agreement algorithms, as well as tools that let programmers and researchers easily implement further protocols: a template that enables new protocol implementations to be created by changing only specific functions, and high-level abstractions for communication and concurrency control. As an example, the toolkit includes a ready-to-use implementation of clock synchronisation between distributed processes. Further use cases include sensor input stabilisation and distributed machine learning, as well as other instances of distributed agreement where network synchrony cannot be assumed, Byzantine fault tolerance may be required, and a bounded divergence in decision values can be tolerated.
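To make "bounded divergence in decision values" concrete, the classic trim-the-extremes step used in approximate agreement can be sketched in a few lines. This is a minimal, hypothetical Python illustration, not TADA's (Java) API or any specific protocol it ships; the function name `approximate_agreement_round` is an assumption. It assumes a process has already collected more than 2f values for the current round, of which at most f come from Byzantine processes.

```python
from typing import Sequence


def approximate_agreement_round(values: Sequence[float], f: int) -> float:
    """One hypothetical round of trimmed-midpoint approximate agreement.

    Discards the f smallest and f largest received values. Because at most
    f of the values are Byzantine, every remaining value lies within the
    range of the correct processes' values, so the returned midpoint does too.
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    if len(values) <= 2 * f:
        raise ValueError("need more than 2f values to trim f from each end")
    # Sort and drop f values from each end to neutralise outliers.
    trimmed = sorted(values)[f:len(values) - f]
    # Midpoint of the surviving range becomes this process's next estimate.
    return (trimmed[0] + trimmed[-1]) / 2


# Example: clock offsets (ms) from four processes, one of them Byzantine.
print(approximate_agreement_round([10.0, 11.0, 10.5, 1000.0], f=1))
```

With n = 4 and f = 1, the Byzantine value 1000.0 and the smallest value are trimmed, leaving [10.5, 11.0], so the process moves to 10.75, inside the range of correct inputs. Repeating such rounds shrinks the spread of correct estimates; real protocols differ in how many values they wait for and how they choose the averaging function.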