Publications

Publications by Francisco Miguel Cruz

2015

CumuloNimbo: A Cloud Scalable Multi-tier SQL Database

Authors
Peris, RJ; Martínez, MP; Kemme, B; Brondino, I; Pereira, JO; Vilaça, R; Cruz, F; Oliveira, R; Ahmad, MY;

Publication
IEEE Data Eng. Bull.

Abstract

2016

Resource Usage Prediction in Distributed Key-Value Datastores

Authors
Cruz, F; Maia, F; Matos, M; Oliveira, R; Paulo, J; Pereira, J; Vilaca, R;

Publication
DISTRIBUTED APPLICATIONS AND INTEROPERABLE SYSTEMS, DAIS 2016

Abstract
In order to attain the promises of the Cloud Computing paradigm, systems need to be able to transparently adapt to environment changes. Such behavior benefits from the ability to predict those changes in order to handle them seamlessly. In this paper, we present a mechanism to accurately predict the resource usage of distributed key-value datastores. Our mechanism requires offline training but, in contrast with other approaches, it is sufficient to run it only once per hardware configuration and subsequently use it for online prediction of database performance under any circumstance. The mechanism accurately estimates the database resource usage for any request distribution with an average accuracy of 94 %, only by knowing two parameters: (i) cache hit ratio; and (ii) incoming throughput. Both input values can be observed in real time or synthesized for request allocation decisions. This novel approach is sufficiently simple and generic, while simultaneously being suitable for other practical applications.

CloseRead Abstract

2013

MeT: workload aware elasticity for NoSQL

Authors
Cruz, F; Maia, F; Matos, M; Oliveira, R; Paulo, J; Pereira, J; Vilaça, R;

Publication
Eighth Eurosys Conference 2013, EuroSys '13, Prague, Czech Republic, April 14-17, 2013

Abstract
NoSQL databases manage the bulk of data produced by modern Web applications such as social networks. This stems from their ability to partition and spread data to all available nodes, allowing NoSQL systems to scale. Unfortunately, current solutions' scale out is oblivious to the underlying data access patterns, resulting in both highly skewed load across nodes and suboptimal node configurations. In this paper, we first show that judicious placement of HBase partitions taking into account data access patterns can improve overall throughput by 35%. Next, we go beyond current state of the art elastic systems limited to uninformed replica addition and removal by: i) reconfiguring existing replicas according to access patterns and ii) adding replicas specifically configured to the expected access pattern. MeT is a prototype for a Cloud-enabled framework that can be used alone or in conjunction with OpenStack for the automatic and heterogeneous reconfiguration of a HBase deployment. Our evaluation, conducted using the YCSB workload generator and a TPC-C workload, shows that MeT is able to i) autonomously achieve the performance of a manual configured cluster and ii) quickly reconfigure the cluster according to unpredicted workload changes. © 2013 ACM.

CloseRead Abstract

2013

An Effective Scalable SQL Engine for NoSQL Databases

Authors
Vilaça, R; Cruz, F; Pereira, J; Oliveira, R;

Publication
Distributed Applications and Interoperable Systems - 13th IFIP WG 6.1 International Conference, DAIS 2013, Held as Part of the 8th International Federated Conference on Distributed Computing Techniques, DisCoTec 2013, Florence, Italy, June 3-5, 2013. Proceedings

Abstract
NoSQL databases were initially devised to support a few concrete extreme scale applications. Since the specificity and scale of the target systems justified the investment of manually crafting application code their limited query and indexing capabilities were not a major impediment. However, with a considerable number of mature alternatives now available there is an increasing willingness to use NoSQL databases in a wider and more diverse spectrum of applications and, to most of them, hand-crafted query code is not an enticing trade-off. In this paper we address this shortcoming of current NoSQL databases with an effective approach for executing SQL queries while preserving their scalability and schema flexibility. We show how a full-fledged SQL engine can be integrated atop of HBase leading to an ANSI SQL compliant database. Under a standard TPC-C workload our prototype scales linearly with the number of nodes in the system and outperforms a NoSQL TPC-C implementation optimized for HBase. © 2013 IFIP International Federation for Information Processing.

CloseRead Abstract

2016

Towards Performance Prediction in Massive Scale Datastores

Authors
Cruz, F; Coelho, F; Oliveira, R;

Publication
PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND SERVICES SCIENCE, VOL 1 (CLOSER)

Abstract
Buffer caching mechanisms are paramount to improve the performance of today's massive scale NoSQL databases. In this work, we show that in fact there is a direct and univocal relationship between the resource usage and the cache hit ratio in NoSQL databases. In addition, this relationship can be leveraged to build a mechanism that is able to estimate resource usage of the nodes composing the NoSQL cluster.

CloseRead Abstract

2014

pH1: A Transactional Middleware for NoSQL

Authors
Coelho, F; Cruz, F; Vilaca, R; Pereira, J; Oliveira, R;

Publication
2014 IEEE 33RD INTERNATIONAL SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS (SRDS)

Abstract
NoSQL databases opt not to offer important abstractions traditionally found in relational databases in order to achieve high levels of scalability and availability: transactional guarantees and strong data consistency. In this work we propose pH1, a generic middleware layer over NoSQL databases that offers transactional guarantees with Snapshot Isolation. This is achieved in a non-intrusive manner, requiring no modifications to servers and no native support for multiple versions. Instead, the transactional context is achieved by means of a multiversion distributed cache and an external transaction certifier, exposed by extending the client's interface with transaction bracketing primitives. We validate and evaluate pH1 with Apache Cassandra and Hyperdex. First, using the YCSB benchmark, we show that the cost of providing ACID guarantees to these NoSQL databases amounts to 11% decrease in throughput. Moreover, using the transaction intensive TPC-C workload, pH1 presented an impact of 22% decrease in throughput. This contrasts with OMID, a previous proposal that takes advantage of HBase's support for multiple versions, with a throughput penalty of 76% in the same conditions

CloseRead Abstract