Publications

Publications by José Orlando Pereira

2011

A Correlation-Aware Data Placement Strategy for Key-Value Stores

Authors
Vilaca, R; Oliveira, R; Pereira, J;

Publication
DISTRIBUTED APPLICATIONS AND INTEROPERABLE SYSTEMS

Abstract
Key-value stores hold the unprecedented bulk of the data produced by applications such as social networks. Their scalability and availability requirements often outweigh sacrificing richer data and processing models, and even elementary data consistency. Moreover, existing key-value stores have only random or order-based placement strategies. In this paper we exploit arbitrary data relations easily expressed by the application to foster data locality and improve the performance of complex queries common in social network read-intensive workloads. We present a novel data placement strategy, supporting dynamic tags, based on multidimensional locality-preserving mappings. We compare our data placement strategy with the ones used in existing key-value stores under the workload of a typical social network application and show that the proposed correlation-aware data placement strategy offers a major improvement on the system's overall response time and network requirements.

2007

GORDA: An open architecture for database replication

Authors
Correia, A; Pereira, J; Rodrigues, L; Carvalho, N; Vilaca, R; Oliveira, R; Guedes, S;

Publication
Sixth IEEE International Symposium on Network Computing and Applications, Proceedings

2006

Revisiting 1-copy equivalence in clustered databases

Authors
Oliveira, R; Pereira, J; Correia, A; Archibald, E;

Publication
Proceedings of the ACM Symposium on Applied Computing

Abstract
Recently renewed interest in scalable database systems for shared-nothing clusters has been supported by replication protocols based on group communication that are aimed at seamlessly extending the native consistency criteria of centralized database management systems. By using a read-one/write-all-available approach and avoiding the fine-grained synchronization associated with traditional distributed locking, one needs just a single distributed interaction step for each update transaction. Therefore the system can easily be scaled to a large number of replicas, especially with read-intensive loads typical of Web server support environments. In this paper we point out that 1-copy equivalence for causal consistency, which is subsumed by both serializability and snapshot isolation criteria, depends on basic session guarantees that are costly to ensure in clusters, especially in a multi-tier environment. We then point out a simple solution that guarantees causal consistency in the Database State Machine protocol and evaluate its performance, thus highlighting the cost of seamlessly providing common consistency criteria of centralized databases in a clustered environment. Copyright 2006 ACM.

2006

Efficient epidemic multicast in heterogeneous networks

Authors
Pereira, J; Oliveira, R; Rodrigues, L;

Publication
On the Move to Meaningful Internet Systems 2006: OTM 2006 Workshops, Pt 2, Proceedings

Abstract
The scalability and resilience of epidemic multicast, also called probabilistic or gossip-based multicast, rest on its symmetry: Each participant node contributes the same share of bandwidth, thus spreading the load and allowing for redundancy. On the other hand, the symmetry of gossiping means that it does not avoid nodes or links with less capacity. Unfortunately, one cannot naively avoid such symmetry without also endangering scalability and resilience. In this paper we point out how to break out of this dilemma, by lazily deferring message transmission according to a configurable policy. An experimental proof-of-concept illustrates the approach.

2003

Semantically reliable multicast: Definition, implementation, and performance evaluation

Authors
Pereira, J; Rodrigues, L; Oliveira, R;

Publication
IEEE TRANSACTIONS ON COMPUTERS

Abstract
Semantic Reliability is a novel correctness criterion for multicast protocols based on the concept of message obsolescence: A message becomes obsolete when its content or purpose is superseded by a subsequent message. By exploiting obsolescence, a reliable multicast protocol may drop irrelevant messages to find additional buffer space for new messages. This makes the multicast protocol more resilient to transient performance perturbations of group members, thus improving throughput stability. This paper describes our experience in developing a suite of semantically reliable protocols. It summarizes the motivation, definition, and algorithmic issues and presents performance figures obtained with a running implementation. The data obtained experimentally is compared with analytic and simulation models. This comparison allows us to confirm the validity of these models and the usefulness of the approach. Finally, the paper reports the application of our prototype to distributed multiplayer games.

2009

On the Cost of Database Clusters Reconfiguration

Authors
Vilaca, R; Pereira, J; Oliveira, R; Armendariz Inigo, JE; Gonzalez de Mendivi, JRG;

Publication
2009 28TH IEEE INTERNATIONAL SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS, PROCEEDINGS

Abstract
Database clusters based on shared-nothing replication techniques are currently widely accepted as a practical solution to scalability and availability of the data tier. A key issue when planning such systems is the ability to meet service level agreements when load spikes occur or cluster nodes fail. This translates into the ability to provision and deploy additional nodes. Many current research efforts focus on designing autonomic controllers to perform such reconfiguration, tuned to quickly react to system changes and spawn new replicas based on resource usage and performance measurements. In contrast, we are concerned about the inherent impact of deploying an additional node to an online cluster, considering both the time required to finish such an action as well as the impact on resource usage and performance of the cluster as a whole. If noticeable, such impact hinders the practicability of self-management techniques, since it adds an additional dimension that has to be accounted for. Our approach is to systematically benchmark a number of different reconfiguration scenarios to assess the cost of bringing a new replica online. We consider factors such as workload characteristics, incremental and parallel recovery, flow control, and outdatedness of the recovering replica. As a result, we show that research should be refocused from optimizing the capture and transmission of changes to applying them, which in a realistic setting dominates the cost of the recovery operation.
