Publications - INESC TEC

Publications by HASLab

2016

Holistic Shuffler for the Parallel Processing of SQL Window Functions

Authors
Coelho, F; Pereira, J; Vilaca, R; Oliveira, R;

Publication
DISTRIBUTED APPLICATIONS AND INTEROPERABLE SYSTEMS, DAIS 2016

Abstract
Window functions are a sub-class of analytical operators that allow data to be handled in a derived view of a given relation while taking into account each tuple's neighboring tuples. Current systems bypass parallelization opportunities that become especially relevant for Big Data, where data is naturally partitioned. We present a shuffling technique that improves the parallel execution of window functions over naturally partitioned data when the query holds a partitioning clause that does not match the natural partitioning of the relation. We evaluated this technique with a non-cumulative ranking function and were able to reduce data transfer among parallel workers by 85% compared to a naive approach.
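
To make the partitioning clause concrete, the following minimal sketch emulates a ranking window function, SQL's RANK() OVER (PARTITION BY ... ORDER BY ...), on a single node. It is an illustration under stated assumptions, not the paper's implementation: rows are assumed to be Python dicts, and the function name rank_over is hypothetical. It shows why every tuple of a window partition must end up on the same worker before the function can be evaluated, which is what forces a shuffle when the PARTITION BY key differs from the relation's natural partitioning.

```python
from collections import defaultdict

def rank_over(rows, partition_key, order_key):
    """Emulate RANK() OVER (PARTITION BY partition_key ORDER BY order_key).

    rows: list of dicts. Returns (row, rank) pairs grouped by partition,
    in the order partitions are first seen (not the original row order).
    """
    # Every tuple of a partition must be gathered in one place first.
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)

    result = []
    for part in partitions.values():
        part.sort(key=lambda r: r[order_key])
        rank = 0
        prev = object()  # sentinel that never equals a real value
        for position, row in enumerate(part, start=1):
            if row[order_key] != prev:
                rank = position  # ties share a rank; a gap follows ties
                prev = row[order_key]
            result.append((row, rank))
    return result
```

For example, with rows from departments "a" (salaries 10, 20, 10) and "b" (salary 5), the ranks come out as 1, 1, 3 for department "a" and 1 for department "b".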

2016

On the Cost of Safe Storage for Public Clouds: an Experimental Evaluation

Authors
Burihabwa, D; Pontes, R; Felber, P; Maia, F; Mercier, H; Oliveira, R; Paulo, J; Schiavoni, V;

Publication
PROCEEDINGS OF 2016 IEEE 35TH SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS (SRDS)

Abstract
Cloud-based storage services such as Dropbox, Google Drive and OneDrive are increasingly popular for storing enterprise data, and they have already become the de facto choice for cloud-based backup of hundreds of millions of regular users. Drawn by the wide range of services offered, the absence of upfront costs and 24/7 availability across all personal devices, customers are well aware of the benefits that these solutions can bring. However, most users tend to forget, or worse, ignore, some of the main drawbacks of such cloud-based services, namely in terms of privacy. Data entrusted to these providers can be leaked by hackers, disclosed upon request from a governmental agency's subpoena, or even accessed directly by the storage providers (e.g., for commercial benefit). While solutions exist to prevent or alleviate these problems, they typically require direct intervention from the clients, such as encrypting their data before storing it, and they reduce some of the benefits these services provide, such as easy data sharing between users. This practical experience report studies a wide range of security mechanisms that can be used atop standard cloud-based storage services. We present the details of our evaluation testbed and discuss the design choices that have driven its implementation. We evaluate several state-of-the-art techniques with varying security guarantees that respond to user-assigned security and privacy criteria. Using representative workloads on top of industry-grade storage services, our results reveal the trade-offs among the different techniques.

2016

Reducing Data Transfer in Parallel Processing of SQL Window Functions

Authors
Coelho, F; Pereira, J; Vilaca, R; Oliveira, R;

Publication
PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND SERVICES SCIENCE, VOL 1 (CLOSER)

Abstract
Window functions are a sub-class of analytical operators that allow data to be handled in a derived view of a given relation while taking into account each tuple's neighboring tuples. We propose a technique that can be used in the parallel execution of this operator when data is naturally partitioned. The proposed method targets cases where the partitioning required by the query is not the natural partitioning of the data. Preliminary evaluation shows that we are able to limit data transfer among parallel workers to 14% of the transfer registered with a naive approach.
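
The data-transfer metric can be made concrete with a small sketch. It assumes each tuple is described by the worker that already stores it and by its window partition key, and counts how many tuples must cross the network under two placement rules: a naive hash shuffle on the PARTITION BY key, and a hypothetical "majority owner" rule that sends each window partition to the worker already holding most of its tuples. The majority rule is only an illustration of why placement that accounts for existing data can move less; the abstract does not describe the authors' technique, and all function names here are hypothetical.

```python
from collections import Counter, defaultdict
from zlib import crc32

def naive_destination(key, n_workers):
    """Naive shuffle: hash-partition on the window's PARTITION BY key."""
    return crc32(str(key).encode()) % n_workers

def transferred(placement, destination):
    """Count tuples that must cross the network.

    placement: (worker, key) pairs, one per tuple, where `worker` is the node
    that already stores the tuple under the relation's natural partitioning.
    destination: function mapping a window partition key to a worker.
    """
    return sum(1 for worker, key in placement if destination(key) != worker)

def majority_owner(placement):
    """Hypothetical heuristic: pick, for each key, the worker holding most of
    its tuples, so only the remaining tuples have to move."""
    counts = defaultdict(Counter)
    for worker, key in placement:
        counts[key][worker] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}
```

Usage: with placement = [(0, "x"), (0, "x"), (0, "x"), (1, "x"), (1, "y"), (1, "y"), (0, "y")], transferred(placement, majority_owner(placement).get) counts 2 moved tuples, which can never exceed transferred(placement, lambda k: naive_destination(k, 2)). Because the majority rule maximizes, per key, the tuples left in place, it is optimal for transfer alone, but it ignores load balance and could overload a single worker.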

2016

Resource Usage Prediction in Distributed Key-Value Datastores

Authors
Cruz, F; Maia, F; Matos, M; Oliveira, R; Paulo, J; Pereira, J; Vilaca, R;

Publication
DISTRIBUTED APPLICATIONS AND INTEROPERABLE SYSTEMS, DAIS 2016

Abstract
In order to fulfill the promises of the Cloud Computing paradigm, systems need to be able to transparently adapt to environment changes. Such behavior benefits from the ability to predict those changes in order to handle them seamlessly. In this paper, we present a mechanism to accurately predict the resource usage of distributed key-value datastores. Our mechanism requires offline training but, in contrast with other approaches, it is sufficient to run it only once per hardware configuration and subsequently use it for online prediction of database performance under any circumstance. The mechanism estimates the database resource usage for any request distribution with an average accuracy of 94%, using only two parameters: (i) cache hit ratio; and (ii) incoming throughput. Both input values can be observed in real time or synthesized for request allocation decisions. This novel approach is simple and generic, and is also suitable for other practical applications.
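
The offline/online split described above can be sketched as follows, under an explicit assumption that is not taken from the paper: that a storage-bound resource (for instance, disk reads per second) grows linearly with the cache miss rate, i.e. (1 - hit ratio) × throughput. The model is fitted once by ordinary least squares on measurements from one hardware configuration and then queried with observed or synthesized inputs. The function names and the linear form are hypothetical; the paper's actual model may differ.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def miss_rate(hit_ratio, throughput):
    # Requests per second that miss the cache and must reach storage.
    return (1.0 - hit_ratio) * throughput

def train(samples):
    """Offline step, assumed to run once per hardware configuration.

    samples: (hit_ratio, throughput, observed_resource_usage) triples.
    """
    xs = [miss_rate(h, t) for h, t, _ in samples]
    ys = [u for _, _, u in samples]
    return fit_line(xs, ys)

def predict(model, hit_ratio, throughput):
    """Online step: inputs may be measured live or synthesized."""
    slope, intercept = model
    return slope * miss_rate(hit_ratio, throughput) + intercept
```

Collapsing the two inputs into a single miss-rate feature keeps the fit trivial; a resource that also scales with cache hits (such as CPU) would need throughput as a second regressor.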

2016

Towards Performance Prediction in Massive Scale Datastores

Authors
Cruz, F; Coelho, F; Oliveira, R;

Publication
PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND SERVICES SCIENCE, VOL 1 (CLOSER)

Abstract
Buffer caching mechanisms are paramount to improving the performance of today's massive scale NoSQL databases. In this work, we show that there is a direct and univocal relationship between resource usage and cache hit ratio in NoSQL databases. Moreover, this relationship can be leveraged to build a mechanism that estimates the resource usage of the nodes composing a NoSQL cluster.
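
Since the cache hit ratio is the quantity this relationship hinges on, one way to obtain it for a request pattern that has not yet been deployed is to replay a key trace through a simulated cache. The sketch below assumes a least-recently-used (LRU) policy with capacity measured in keys; real NoSQL buffer caches may use different policies and size units, and the function name is hypothetical.

```python
from collections import OrderedDict

def lru_hit_ratio(requests, capacity):
    """Replay a key trace through an LRU cache holding `capacity` keys."""
    cache = OrderedDict()
    hits = 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(requests) if requests else 0.0
```

For the trace ["a", "b", "a", "c", "a", "b"], a capacity of 2 yields 2 hits out of 6 requests, while a capacity of 3 yields 3 out of 6, showing how the same request distribution produces different hit ratios, and therefore different resource usage, depending on cache size.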

2016

Formalizing Single-Assignment Program Verification: An Adaptation-Complete Approach

Authors
Lourenco, CB; Frade, MJ; Pinto, JS;

Publication
PROGRAMMING LANGUAGES AND SYSTEMS (ESOP 2016)

Abstract
Deductive verification tools typically rely on the conversion of code to a single-assignment (SA) form. In this paper we formalize program verification based on the translation of While programs annotated with loop invariants into a dynamic single-assignment language with a dedicated iterating construct, and the subsequent generation of compact, indeed linear-size, verification conditions. Soundness and completeness proofs are given for the entire workflow, including the translation of annotated programs to SA form. The formalization is based on a program logic that we show to be adaptation-complete. Although this important property has not, as far as we know, been established for any existing program verification tool, we believe that adaptation-completeness is one of the major motivations for the use of SA form as an intermediate language. Our results show that this indeed allows tools to achieve the maximum degree of adaptation when handling subprograms.
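
The idea behind SA form and linear-size verification conditions can be illustrated on the simplest case. The sketch below handles only straight-line sequences of assignments written as (variable, expression) pairs; loops, loop invariants and the dedicated iterating construct formalized in the paper are not modeled, and all function names are hypothetical. Each assignment gets a fresh version of its target variable, so it can be stated once as an equation, and the resulting condition grows linearly with the program. The "==>" symbol is used as implication notation for readability; the output is not a Python expression.

```python
import keyword
import re

IDENT = re.compile(r"\b[A-Za-z_]\w*\b")

def subscript(expr, version):
    """Tag every variable in `expr` with its current version number."""
    def tag(m):
        name = m.group(0)
        if keyword.iskeyword(name):
            return name  # leave operators such as `and`/`or` untouched
        return f"{name}{version.get(name, 0)}"
    return IDENT.sub(tag, expr)

def to_single_assignment(program):
    """program: list of (variable, expression) assignments, in order."""
    version = {}
    sa = []
    for var, expr in program:
        rhs = subscript(expr, version)          # read current versions first
        version[var] = version.get(var, 0) + 1  # then create a fresh one
        sa.append((f"{var}{version[var]}", rhs))
    return sa, version

def verification_condition(pre, program, post):
    """Precondition and one equation per assignment imply the postcondition."""
    sa, final = to_single_assignment(program)
    facts = [subscript(pre, {})] + [f"{lhs} == {rhs}" for lhs, rhs in sa]
    return " and ".join(facts) + " ==> " + subscript(post, final)
```

For instance, verification_condition("x >= 0", [("y", "x + 1"), ("x", "y * 2")], "x >= 2") produces "x0 >= 0 and y1 == x0 + 1 and x1 == y1 * 2 ==> x1 >= 2", with exactly one conjunct per statement.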
