Publications

Publications by Francisco Almeida Maia

2014

DATAFLASKS: epidemic store for massive scale systems

Authors
Maia, F; Matos, M; Vilaca, R; Pereira, J; Oliveira, R; Riviere, E;

Publication
2014 IEEE 33RD INTERNATIONAL SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS (SRDS)

Abstract
Very large scale distributed systems provide some of the most interesting research challenges while at the same time being increasingly required by today's applications. The escalation in the amount of connected devices and data being produced and exchanged demands new data management systems. Although new data stores are continuously being proposed, they are not suitable for very large scale environments. The high levels of churn and constant dynamics found in very large scale systems demand robust, proactive and unstructured approaches to data management. In this paper we propose a novel data store solely based on epidemic (or gossip-based) protocols. It leverages the capacity of these protocols to provide data persistence guarantees even in highly dynamic, massive scale systems. We provide an open source prototype of the data store and a corresponding evaluation.


2013

MeT: Workload aware elasticity for NoSQL

Authors
Cruz, F; Maia, F; Matos, M; Oliveira, R; Paulo, J; Pereira, J; Vilaca, R;

Publication
Proceedings of the 8th ACM European Conference on Computer Systems, EuroSys 2013

Abstract
NoSQL databases manage the bulk of data produced by modern Web applications such as social networks. This stems from their ability to partition and spread data to all available nodes, allowing NoSQL systems to scale. Unfortunately, current solutions' scale out is oblivious to the underlying data access patterns, resulting in both highly skewed load across nodes and suboptimal node configurations. In this paper, we first show that judicious placement of HBase partitions taking into account data access patterns can improve overall throughput by 35%. Next, we go beyond current state of the art elastic systems limited to uninformed replica addition and removal by: i) reconfiguring existing replicas according to access patterns and ii) adding replicas specifically configured to the expected access pattern. MeT is a prototype for a Cloud-enabled framework that can be used alone or in conjunction with OpenStack for the automatic and heterogeneous reconfiguration of an HBase deployment. Our evaluation, conducted using the YCSB workload generator and a TPC-C workload, shows that MeT is able to i) autonomously achieve the performance of a manually configured cluster and ii) quickly reconfigure the cluster according to unpredicted workload changes. © 2013 ACM.


2017

SafeFS: a modular architecture for secure user-space file systems: one FUSE to rule them all

Authors
Pontes, Rogerio; Burihabwa, Dorian; Maia, Francisco; Paulo, Joao; Schiavoni, Valerio; Felber, Pascal; Mercier, Hugues; Oliveira, Rui;

Publication
Proceedings of the 10th ACM International Systems and Storage Conference, SYSTOR 2017, Haifa, Israel, May 22-24, 2017

Abstract
The exponential growth of data produced, the ever faster and ubiquitous connectivity, and the collaborative processing tools lead to a clear shift of data stores from local servers to the cloud. This migration occurring across different application domains and types of users (individual or corporate) raises two immediate challenges. First, outsourcing data introduces security risks, hence protection mechanisms must be put in place to provide guarantees such as privacy, confidentiality and integrity. Second, there is no "one-size-fits-all" solution that would provide the right level of safety or performance for all applications and users, and it is therefore necessary to provide mechanisms that can be tailored to the various deployment scenarios. In this paper, we address both challenges by introducing SafeFS, a modular architecture based on software-defined storage principles featuring stackable building blocks that can be combined to construct a secure distributed file system. SafeFS allows users to specialize their data store to their specific needs by choosing the combination of blocks that provide the best safety and performance tradeoffs. The file system is implemented in user space using FUSE and can access remote data stores. The provided building blocks notably include mechanisms based on encryption, replication, and coding. We implemented SafeFS and performed in-depth evaluation across a range of workloads. Results reveal that while each layer has a cost, one can build safe yet efficient storage architectures. Furthermore, the different combinations of blocks sometimes yield surprising tradeoffs. © 2017 ACM.


2017

DDFlasks: Deduplicated Very Large Scale Data Store

Authors
Maia, F; Paulo, J; Coelho, F; Neves, F; Pereira, J; Oliveira, R;

Publication
Distributed Applications and Interoperable Systems - 17th IFIP WG 6.1 International Conference, DAIS 2017, Held as Part of the 12th International Federated Conference on Distributed Computing Techniques, DisCoTec 2017, Neuchâtel, Switzerland, June 19-22, 2017, Proceedings

Abstract
With the increasing number of connected devices, it becomes essential to find novel data management solutions that can leverage their computational and storage capabilities. However, developing very large scale data management systems requires tackling a number of interesting distributed systems challenges, namely continuous failures and high levels of node churn. In this context, epidemic-based protocols proved suitable and effective and have been successfully used to build DataFlasks, an epidemic data store for massive scale systems. Ensuring resiliency in this data store comes with a significant cost in storage resources and network bandwidth consumption. Deduplication has proven to be an efficient technique to reduce both costs, but applying it to a large-scale distributed storage system is not a trivial task. In fact, achieving significant space-savings without compromising the resiliency and decentralized design of these storage systems is a relevant research challenge. In this paper, we extend DataFlasks with deduplication to design DDFlasks. This system is evaluated in a real world scenario using Wikipedia snapshots, and the results are twofold. We show that deduplication is able to decrease storage consumption up to 63% and decrease network bandwidth consumption by up to 20%, while maintaining a fully decentralized and resilient design. © IFIP International Federation for Information Processing 2017.


2016

Towards Quantifiable Eventual Consistency

Authors
Maia, F; Matos, M; Coelho, F;

Publication
PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND SERVICES SCIENCE, VOL 1 (CLOSER)

Abstract
In the pursuit of highly available systems, storage systems began offering eventually consistent data models. These models are suitable for a number of applications but not applicable for all. In this paper we discuss a system that can offer an eventually consistent data model but can also, when needed, offer a strongly consistent one.


2017

SAFETHINGS: Data Security by Design in the IoT

Authors
Barbosa, M; Ben Mokhtar, S; Felber, P; Maia, F; Matos, M; Oliveira, R; Riviere, E; Schiavoni, V; Voulgaris, S;

Publication
2017 13TH EUROPEAN DEPENDABLE COMPUTING CONFERENCE (EDCC 2017)

Abstract
Despite years of research and the long-lasting promise of pervasiveness of an "Internet of Things", it is only recently that a truly convincing number of connected things have been deployed in the wild. New services are now being built on top of these things and allow to realize the IoT vision. However, integration of things in complex and interconnected systems is still only in the hands of their manufacturers and of Cloud providers supporting IoT integration platforms. Several issues associated with data privacy arise from this situation. Not only do users need to trust manufacturers and IoT platforms for handling their data, but integration between heterogeneous platforms is still only incipient. In this position paper, we chart a new IoT architecture, SAFETHINGS, that aims at enabling data privacy by design, and that we believe can serve as the foundation for a more comprehensive IoT integration. The SAFETHINGS architecture is based on two simple but powerful conceptual component families, the cleansers and blenders, that allow data owners to get back the control of IoT data and its processing.
