Publications

Publications by HASLab

2007

Coupled schema transformation and data conversion for XML and SQL

Authors
Berdaguer, P; Cunha, A; Pacheco, H; Visser, J;

Publication
Practical Aspects of Declarative Languages

Abstract
A two-level data transformation consists of a type-level transformation of a data format coupled with value-level transformations of data instances corresponding to that format. We have implemented a system for performing two-level transformations on XML schemas and their corresponding documents, and on SQL schemas and the databases that they describe. The core of the system consists of a combinator library for composing type-changing rewrite rules that preserve structural information and referential constraints. We discuss the implementation of the system's core library, and of its SQL and XML front-ends, in the functional language Haskell. We show how the system can be used to tackle various two-level transformation scenarios, such as XML schema evolution coupled with document migration, and hierarchical-relational data mappings that convert between XML documents and SQL databases.


2007

Parallel graphics and visualization

Authors
Santo, LP; Raffin, B; Heirich, A;

Publication
Parallel Computing

2006

A pragmatic protocol for database replication in interconnected clusters

Authors
Grov, J; Soares, L; Jr., AC; Pereira, J; Oliveira, RC; Pedone, F;

Publication
12th Pacific Rim International Symposium on Dependable Computing, Proceedings

Abstract
Multi-master update-everywhere database replication, as achieved by protocols based on group communication such as DBSM and Postgres-R, addresses both performance and availability. By scaling it to wide area networks, one could save costly bandwidth and avoid large round-trips to a distant master server. Also, by ensuring that updates are safely stored at a remote site within transaction boundaries, disaster recovery is guaranteed. Unfortunately, scaling existing cluster-based replication protocols is troublesome. In this paper we present a database replication protocol based on group communication that targets interconnected clusters. In contrast with previous proposals, it uses a separate multicast group for each cluster and thus does not impose any additional requirements on group communication, easing implementation and deployment in a real setting. Nonetheless, the protocol ensures one-copy equivalence while allowing all sites to execute update transactions. Experimental evaluation using the workload of the industry-standard TPC-C benchmark confirms the advantages of the approach.


2006

Revisiting 1-copy equivalence in clustered databases

Authors
Oliveira, R; Pereira, J; Correia, A; Archibald, E;

Publication
Proceedings of the ACM Symposium on Applied Computing

Abstract
Recently renewed interest in scalable database systems for shared-nothing clusters has been supported by replication protocols based on group communication that are aimed at seamlessly extending the native consistency criteria of centralized database management systems. By using a read-one/write-all-available approach and avoiding the fine-grained synchronization associated with traditional distributed locking, one needs just a single distributed interaction step for each update transaction. Therefore the system can easily be scaled to a large number of replicas, especially with the read-intensive loads typical of Web server support environments. In this paper we point out that 1-copy equivalence for causal consistency, which is subsumed by both serializability and snapshot isolation criteria, depends on basic session guarantees that are costly to ensure in clusters, especially in a multi-tier environment. We then point out a simple solution that guarantees causal consistency in the Database State Machine protocol and evaluate its performance, thus highlighting the cost of seamlessly providing common consistency criteria of centralized databases in a clustered environment. Copyright 2006 ACM.


2006

Efficient epidemic multicast in heterogeneous networks

Authors
Pereira, J; Oliveira, R; Rodrigues, L;

Publication
On the Move to Meaningful Internet Systems 2006: OTM 2006 Workshops, Pt 2, Proceedings

Abstract
The scalability and resilience of epidemic multicast, also called probabilistic or gossip-based multicast, rest on its symmetry: each participant node contributes the same share of bandwidth, thus spreading the load and allowing for redundancy. On the other hand, the symmetry of gossiping means that it does not avoid nodes or links with less capacity. Unfortunately, one cannot naively avoid such symmetry without also endangering scalability and resilience. In this paper we point out how to break out of this dilemma by lazily deferring message transmission according to a configurable policy. An experimental proof-of-concept illustrates the approach.


2006

Evaluating certification protocols in the partial database state machine

Authors
Sousa, A; Correia, A; Moura, F; Pereira, J; Oliveira, R;

Publication
First International Conference on Availability, Reliability and Security, Proceedings

Abstract
Partial replication is an alluring technique to ensure the reliability of very large and geographically distributed databases while, at the same time, offering good performance. By correctly exploiting access locality, most transactions become confined to a small subset of the database replicas, thus reducing the processing, storage access and communication overhead associated with replication. The advantages of partial replication must, however, be weighed against the added complexity that is required to manage it. In fact, if the chosen replica configuration prevents the local execution of transactions or if the overhead of consistency protocols offsets the savings of locality, potential gains cannot be realized. These issues are heavily dependent on the application used for evaluation and render simplistic benchmarks useless. In this paper, we present a detailed analysis of Partial Database State Machine (PDBSM) replication by comparing alternative partial replication protocols with full replication. This is done using a realistic scenario based on a detailed network simulator and access patterns from an industry-standard database benchmark. The results obtained allow us to identify the best configuration for typical on-line transaction processing applications.
