Publicacoes - INESC TEC

Publicações

Publicações por Carlos Baquero

2011

Conflict-Free Replicated Data Types

Autores
Shapiro, M; Preguica, N; Baquero, C; Zawirski, M;

Publicação
STABILIZATION, SAFETY, AND SECURITY OF DISTRIBUTED SYSTEMS

Abstract
Replicating data under Eventual Consistency (EC) allows any replica to accept updates without remote synchronisation. This ensures performance and scalability in large-scale distributed systems (e.g., clouds). However, published EC approaches are ad-hoc and error-prone. Under a formal Strong Eventual Consistency (SEC) model, we study sufficient conditions for convergence. A data type that satisfies these conditions is called a Conflict-free Replicated Data Type (CRDT). Replicas of any CRDT are guaranteed to converge in a self-stabilising manner, despite any number of failures. This paper formalises two popular approaches (state- and operation-based) and their relevant sufficient conditions. We study a number of useful CRDTs, such as sets with clean semantics, supporting both add and remove operations, and consider in depth the more complex Graph data type. CRDT types can be composed to develop large-scale distributed applications, and have interesting theoretical properties.

FecharLer Abstract

2007

Improving on version stamps

Autores
Almeida, PS; Baquero, C; Fonte, V;

Publicação
ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS 2007: OTM 2007 WORKSHOPS, PT 2, PROCEEDINGS

Abstract
Optimistic distributed systems often rely on version vectors or their variants in order to track updates on replicated objects. Some of these mechanisms rely on some form of global configuration or distributed naming protocol in order to assign unique identifiers to each replica. These approaches are incompatible with replica creation under arbitrary partitions, a typical operation mode in mobile or poorly connected environments. Other mechanisms assign unique identifiers relying on statistical correctness. In previous work we have introduced an update tracking mechanism that overcomes these limitations. This paper presents results from recent experimentation, that brought to surface a particular pattern of operation that results in an unforeseen, unlimited growth in space consumption. We also describe informally a new update tracking mechanism that does not exhibit this pathological growth while providing guaranteed unique identifiers for a dynamic number of replicas under arbitrary partitions and the same functionality of version vectors.

FecharLer Abstract

2007

Implementing range queries with a decentralized balanced tree over distributed hash tables

Autores
Lopes, N; Baquero, C;

Publicação
NETWORK-BASED INFORMATION SYSTEMS, PROCEEDINGS

Abstract
Range queries, retrieving all keys within a given range, is an important add-on for Distributed Hash Tables (DHTs), as they rely only on exact key matching lookup. In this paper we support range queries through a balanced tree algorithm, Decentralized Balanced Tree, that runs over any DHT system. Our algorithm is based on the B(+)-tree design that efficiently stores clustered data while maintaining a balanced load on hosts. The internal structure of the balanced tree is suited for range queries operations over many data distributions since it easily handles clustered data without losing performance. We analyzed, and evaluated our algorithm under a simulated environment, to show it's operation scalability for both insertions and queries. We will show that the system design. imposes a fixed penalty over the DHT access cost, and thus inherits the scalability properties of the chosen underlying DHT.

FecharLer Abstract

2007

Scalable Bloom Filters

Autores
Almeida, PS; Baquero, C; Preguica, N; Hutchison, D;

Publicação
INFORMATION PROCESSING LETTERS

Abstract
Bloom filters provide space-efficient storage of sets at the cost of a probability of false positives on membership queries. The size of the filter must be defined a priori based on the number of elements to store and the desired false positive probability, being impossible to store extra elements without increasing the false positive probability. This leads typically to a conservative assumption regarding maximum set size, possibly by orders of magnitude, and a consequent space waste. This paper proposes Scalable Bloom Filters, a variant of Bloom filters that can adapt dynamically to the number of elements stored, while assuring a maximum false positive probability.

FecharLer Abstract

2008

Interval Tree Clocks A Logical Clock for Dynamic Systems

Autores
Almeida, PS; Baquero, C; Fonte, V;

Publicação
PRINCIPLES OF DISTRIBUTED SYSTEMS, 12TH INTERNATIONAL CONFERENCE, OPODIS 2008

Abstract
Causality tracking mechanisms, such as vector clocks and version vectors, rely on mappings from globally unique identifiers to integer counters. In a system with a well known set of entities these ids can be preconfigured and given distinct positions in a vector or distinct names in a mapping. Id management is more problematic in dynamic systems, with large and highly variable number of entities, being worsened when network partitions occur. Present solutions for causality tracking are not appropriate to these increasingly common scenarios. In this paper we introduce Interval Tree Clocks, a novel causality tracking mechanism that can be used in scenarios with a dynamic number of entities, allowing a completely decentralized creation of processes/replicas without need for global identifiers or global coordination. The mechanism has a variable size representation that adapts automatically to the number of existing entities, growing or shrinking appropriately. The representation is so compact that the mechanism can even be considered for scenarios with a fixed number of entities, which makes it a general substitute for vector clocks and version vectors.

FecharLer Abstract

2009

Forby: Providing Groupware Features Relying on Distributed File System Event Dissemination

Autores
Sousa, P; Preguica, N; Baquero, C;

Publicação
GROUPWARE-DESIGN: IMPLEMENTATION, AND USE, PROCEEDINGS

Abstract
Intensive research and development has been conducted in the design and creation of groupware systems for distributed users. While for some activities, these groupware tools are widely used, for other activities the impact in the groupware community has been smaller and can be improved. One reason for this fact is that the mostly common used applications do not support collaborative features and users are reluctant to change to a different application. In this paper we discuss how available file system mechanisms can help to address this problem. In this context, we present Forby, a system that allows to provide groupware features to distributed users by combining filesystem monitoring and distributed event dissemination. To demonstrate our solution, we present three systems that rely on Forby for providing groupware features to users running unmodified applications.

FecharLer Abstract