Publications

Publications by HASLab

2013

Evaluating Cassandra as a manager of large file sets

Authors
Beernaert, L; Gomes, P; Matos, M; Vilaça, R; Oliveira, R;

Publication
Proceedings of the 3rd International Workshop on Cloud Data and Platforms, CloudDP@EuroSys 2013, Prague, Czech Republic, April 14-17, 2013

Abstract
All companies developing their business on the Web, not only giants like Google or Facebook but also small companies focused on niche markets, face scalability issues in data management. The case study of this paper is the content management systems for classified or commercial advertisements on the Web. The data involved has a very significant growth rate and a read-intensive access pattern with a reduced update rate. Typically, data is stored in traditional file systems hosted on dedicated servers or Storage Area Network devices due to the generalization and ease of use of file systems. However, this ease in implementation and usage has a disadvantage: the centralized nature of these systems leads to availability, elasticity and scalability problems. The scenario under study, undemanding in terms of the system's consistency and with a simple interaction model, is suitable to a distributed database, such as Cassandra, conceived precisely to dynamically handle large volumes of data. In this paper, we analyze the suitability of Cassandra as a substitute for file systems in content management systems. The evaluation, conducted using real data from a production system, shows that when using Cassandra, one can easily get horizontal scalability of storage, redundancy across multiple independent nodes and load distribution imposed by the periodic activities of safeguarding data, while ensuring a comparable performance to that of a file system. Copyright © 2013 ACM.

CloseRead Abstract

2013

Lightweight, efficient, robust epidemic dissemination

Authors
Matos, M; Schiavoni, V; Felber, P; Oliveira, R; Riviere, E;

Publication
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING

Abstract
Today's intensive demand for data such as live broadcast or news feeds requires efficient and robust dissemination systems. Traditionally, designs focus on extremes of the efficiency/robustness spectrum by either using structures, such as trees for efficiency or by using loosely-coupled epidemic protocols for robustness. We present BRISA, a hybrid approach combining the robustness of epidemics with the efficiency of structured approaches. BRISA implicitly emerges embedded dissemination structures from an underlying epidemic substrate. The structures' links are chosen with local knowledge only, but still ensuring connectivity. Failures can be promptly compensated and repaired thanks to the epidemic substrate, and their impact on dissemination delays masked by the use of multiple independent structures. Besides presenting the protocol design, we conduct an extensive evaluation in real environments, analyzing the effectiveness of the structure creation mechanism and its robustness under dynamic conditions. Results confirm BRISA as an efficient and robust approach to data dissemination in large dynamic environments.

CloseRead Abstract

2013

MeT: workload aware elasticity for NoSQL

Authors
Cruz, F; Maia, F; Matos, M; Oliveira, R; Paulo, J; Pereira, J; Vilaça, R;

Publication
Eighth Eurosys Conference 2013, EuroSys '13, Prague, Czech Republic, April 14-17, 2013

Abstract
NoSQL databases manage the bulk of data produced by modern Web applications such as social networks. This stems from their ability to partition and spread data to all available nodes, allowing NoSQL systems to scale. Unfortunately, current solutions' scale out is oblivious to the underlying data access patterns, resulting in both highly skewed load across nodes and suboptimal node configurations. In this paper, we first show that judicious placement of HBase partitions taking into account data access patterns can improve overall throughput by 35%. Next, we go beyond current state of the art elastic systems limited to uninformed replica addition and removal by: i) reconfiguring existing replicas according to access patterns and ii) adding replicas specifically configured to the expected access pattern. MeT is a prototype for a Cloud-enabled framework that can be used alone or in conjunction with OpenStack for the automatic and heterogeneous reconfiguration of a HBase deployment. Our evaluation, conducted using the YCSB workload generator and a TPC-C workload, shows that MeT is able to i) autonomously achieve the performance of a manual configured cluster and ii) quickly reconfigure the cluster according to unpredicted workload changes. © 2013 ACM.

CloseRead Abstract

2013

Scaling Up Publish/Subscribe Overlays Using Interest Correlation for Link Sharing

Authors
Matos, M; Felber, P; Oliveira, R; Pereira, JO; Riviere, E;

Publication
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS

Abstract
Topic-based publish/subscribe is at the core of many distributed systems, ranging from application integration middleware to news dissemination. Therefore, much research was dedicated to publish/subscribe architectures and protocols, and in particular to the design of overlay networks for decentralized topic-based routing and efficient message dissemination. Nonetheless, existing systems fail to take full advantage of shared interests when disseminating information, hence suffering from high maintenance and traffic costs, or construct overlays that cope poorly with the scale and dynamism of large networks. In this paper, we present StaN, a decentralized protocol that optimizes the properties of gossip-based overlay networks for topic-based publish/subscribe by sharing a large number of physical connections without disrupting its logical properties. StaN relies only on local knowledge and operates by leveraging common interests among participants to improve global resource usage and promote topic and event scalability. The experimental evaluation under two real workloads, both via a real deployment and through simulation, shows that StaN provides an attractive infrastructure for scalable topic-based publish/subscribe.

CloseRead Abstract

2013

AJITTS: Adaptive just-in-time transaction scheduling

Authors
Nunes, A; Oliveira, R; Pereira, J;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
Distributed transaction processing has benefited greatly from optimistic concurrency control protocols thus avoiding costly fine-grained synchronization. However, the performance of these protocols degrades significantly when the workload increases, namely, by leading to a substantial amount of aborted transactions due to concurrency conflicts. Our approach stems from the observation that when the abort rate increases with the load as already executed transactions queue for longer periods of time waiting for their turn to be certified and committed. We thus propose an adaptive algorithm for judiciously scheduling transactions to minimize the time during which these are vulnerable to being aborted by concurrent transactions, thereby reducing the overall abort rate. We do so by throttling transaction execution using an adaptive mechanism based on the locally known state of globally executing transactions, that includes out-of-order execution. Our evaluation using traces from the industry standard TPC-E workload shows that the amount of aborted transactions can be kept bounded as system load increases, while at the same time fully utilizing system resources and thus scaling transaction processing throughput. © 2013 IFIP International Federation for Information Processing.

CloseRead Abstract

2013

An Effective Scalable SQL Engine for NoSQL Databases

Authors
Vilaça, R; Cruz, F; Pereira, J; Oliveira, R;

Publication
Distributed Applications and Interoperable Systems - 13th IFIP WG 6.1 International Conference, DAIS 2013, Held as Part of the 8th International Federated Conference on Distributed Computing Techniques, DisCoTec 2013, Florence, Italy, June 3-5, 2013. Proceedings

Abstract
NoSQL databases were initially devised to support a few concrete extreme scale applications. Since the specificity and scale of the target systems justified the investment of manually crafting application code their limited query and indexing capabilities were not a major impediment. However, with a considerable number of mature alternatives now available there is an increasing willingness to use NoSQL databases in a wider and more diverse spectrum of applications and, to most of them, hand-crafted query code is not an enticing trade-off. In this paper we address this shortcoming of current NoSQL databases with an effective approach for executing SQL queries while preserving their scalability and schema flexibility. We show how a full-fledged SQL engine can be integrated atop of HBase leading to an ANSI SQL compliant database. Under a standard TPC-C workload our prototype scales linearly with the number of nodes in the system and outperforms a NoSQL TPC-C implementation optimized for HBase. © 2013 IFIP International Federation for Information Processing.

CloseRead Abstract