About

I am a researcher at HASLab and a professor at U. Minho. My research focuses on dependable distributed systems. I am mainly interested in data management, including database replication and SQL processing over NoSQL systems, and in group communication, including consensus and epidemic broadcast protocols for large-scale systems. I am also interested in techniques and tools for testing, evaluating, and observing dependable distributed systems. More information is available on my personal page.

Topics of interest
Details

  • Name

    José Orlando Pereira
  • Role

    Coordinating Researcher
  • Since

    1 November 2011
Publications

2025

CRDV: Conflict-free Replicated Data Views

Authors
Faria, N; Pereira, J;

Publication
Proc. ACM Manag. Data

Abstract
There are now multiple proposals for Conflict-free Replicated Data Types (CRDTs) in SQL databases aimed at distributed systems. Some, such as ElectricSQL, provide only relational tables as convergent replicated maps, but this omits semantics that would be useful for merging updates. Others, such as Pg_crdt, provide access to a rich library of encapsulated column types. However, this puts merge and query processing outside the scope of the query optimizer and restricts the ability of an administrator to influence access paths with materialization and indexes. Our proposal, CRDV, overcomes this challenge by using two layers implemented as SQL views: The first provides a replicated relational table from an update history, while the second implements varied and rich types on top of the replicated table. This allows the definition of merge semantics, or even entire new data types, in SQL itself, and enables global optimization of user queries together with merge operations. Therefore, it naturally extends the scope of query optimization and local transactions to operations on replicated data, can be used to reproduce the functionality of common CRDTs with simple SQL idioms, and results in better performance than alternatives.

2024

Databases in Edge and Fog Environments: A Survey

Authors
Ferreira, LMM; Coelho, F; Pereira, J;

Publication
ACM COMPUTING SURVEYS

Abstract
While a significant number of databases are deployed in cloud environments, pushing part or all data storage and querying planes closer to their sources (i.e., to the edge) can provide advantages in latency, connectivity, privacy, energy, and scalability. This article dissects the advantages provided by databases in edge and fog environments by surveying application domains and discussing the key drivers for pushing database systems to the edge. At the same time, it also identifies the main challenges faced by developers in this new environment and analyzes the mechanisms employed to deal with them. By providing an overview of the current state of edge and fog databases, this survey provides valuable insights into future research directions.

2024

When Amnesia Strikes: Understanding and Reproducing Data Loss Bugs with Fault Injection

Authors
Ramos, M; Azevedo, J; Kingsbury, K; Pereira, J; Esteves, T; Macedo, R; Paulo, J;

Publication
PROCEEDINGS OF THE VLDB ENDOWMENT

Abstract
We present LAZYFS, a new fault injection tool that simplifies the debugging and reproduction of complex data durability bugs experienced by databases, key-value stores, and other data-centric systems in crashes. Our tool simulates persistence properties of POSIX file systems (e.g., operations ordering and atomicity) and enables users to inject lost and torn write faults with a precise and controlled approach. Further, it provides profiling information about the system's operations flow and persisted data, enabling users to better understand the root cause of errors. We use LAZYFS to study seven important systems: PostgreSQL, etcd, Zookeeper, Redis, LevelDB, PebblesDB, and Lightning Network. Our fault injection campaign shows that LAZYFS automates and facilitates the reproduction of five known bug reports containing manual and complex reproducibility steps. Further, it aids in understanding and reproducing seven ambiguous bugs reported by users. Finally, LAZYFS is used to find eight new bugs, which lead to data loss, corruption, and unavailability.

2024

TADA: A Toolkit for Approximate Distributed Agreement

Authors
da Conceiçao, EL; Alonso, AN; Oliveira, RC; Pereira, J;

Publication
SCIENCE OF COMPUTER PROGRAMMING

Abstract
TADA is a unique toolkit designed to foster the use and implementation of approximate distributed agreement primitives. Developed in Java, TADA provides ready-to-use implementations of several approximate agreement algorithms, as well as the tools to enable programmers/researchers to easily implement further protocols: A template that enables new protocol implementations to be created by simply changing specific functions; and high-level abstractions for communication and concurrency control. As an example, the toolkit includes a ready-to-use implementation for clock synchronisation between distributed processes. Further use cases can include sensor input stabilisation and distributed machine learning, or other instances of distributed agreement where network synchrony cannot be assumed, byzantine fault tolerance may be required and a bounded divergence in decision values can be tolerated.

2023

MRVs: Enforcing Numeric Invariants in Parallel Updates to Hotspots with Randomized Splitting

Authors
Faria, N; Pereira, J;

Publication
Proc. ACM Manag. Data

Abstract
Performance of transactional systems is degraded by update hotspots as conflicts lead to waiting and wasted work. This is particularly challenging in emerging large-scale database systems, as latency increases the probability of conflicts, state-of-the-art lock-based mitigations are not available, and most alternatives provide only weak consistency and cannot enforce lower bound invariants. We address this challenge with Multi-Record Values (MRVs), a technique that can be layered on existing database systems and that uses randomization to split and access numeric values in multiple records such that the probability of conflict can be made arbitrarily small. The only coordination needed is the underlying transactional system, meaning it retains existing isolation guarantees. The proposal is tested on five different systems ranging from DBx1000 (scale-up) to MySQL GR and a cloud-native NewSQL system (scale-out). The experiments explore design and configuration trade-offs and, with the TPC-C and STAMP Vacation benchmarks, demonstrate improved throughput and reduced abort rates when compared to alternatives.

Supervised theses

2023

User-level software-defined storage data planes

Author
Ricardo Gonçalves Macedo

Institution
UM

2023

Distributed and Dependable SDS Control Plane for HPC

Author
Mariana Martins de Sá Miranda

Institution
UM

2022

Scientific data management

Author
Rui Nuno Borges Cruz Oliveira

Institution
UM

2022

Acordo Distribuído para Arquiteturas de Microsserviços

Author
João Pedro Oliveira da Silva

Institution
UM

2022

Epidemic broadcast algorithms in a Byzantine environment

Author
Tomás Francisco Cruz Costa

Institution
UM