Publications

Publications by José Orlando Pereira

2014

A peer-to-peer service architecture for the Smart Grid

Authors
Campos, F; Matos, M; Pereira, J; Rua, D;

Publication
14-TH IEEE INTERNATIONAL CONFERENCE ON PEER-TO-PEER COMPUTING (P2P)

Abstract
Important challenges in interoperability, reliability, and scalability need to be addressed before the Smart Grid vision can be fulfilled. The sheer scale of the electric grid and the criticality of the communication among its subsystems for proper management, demands a scalable and reliable communication framework able to work in an heterogeneous and dynamic environment. Moreover, the need to provide full interoperability between diverse current and future energy and non-energy systems, along with seamless discovery and configuration of a large variety of networked devices, ranging from the resource constrained sensing devices to servers in data centers, requires an implementation-agnostic Service Oriented Architecture. In this position paper we propose that this challenge can be addressed with a generic framework that reconciles the reliability and scalability of Peer-to-Peer systems, with the industrial standard interoperability of Web Services. We illustrate the flexibility of the proposed framework by showing how it can be used in two specific scenarios.

CloseRead Abstract

2013

Scaling Up Publish/Subscribe Overlays Using Interest Correlation for Link Sharing

Authors
Matos, M; Felber, P; Oliveira, R; Pereira, JO; Riviere, E;

Publication
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS

Abstract
Topic-based publish/subscribe is at the core of many distributed systems, ranging from application integration middleware to news dissemination. Therefore, much research was dedicated to publish/subscribe architectures and protocols, and in particular to the design of overlay networks for decentralized topic-based routing and efficient message dissemination. Nonetheless, existing systems fail to take full advantage of shared interests when disseminating information, hence suffering from high maintenance and traffic costs, or construct overlays that cope poorly with the scale and dynamism of large networks. In this paper, we present StaN, a decentralized protocol that optimizes the properties of gossip-based overlay networks for topic-based publish/subscribe by sharing a large number of physical connections without disrupting its logical properties. StaN relies only on local knowledge and operates by leveraging common interests among participants to improve global resource usage and promote topic and event scalability. The experimental evaluation under two real workloads, both via a real deployment and through simulation, shows that StaN provides an attractive infrastructure for scalable topic-based publish/subscribe.

CloseRead Abstract

2013

AJITTS: Adaptive just-in-time transaction scheduling

Authors
Nunes, A; Oliveira, R; Pereira, J;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
Distributed transaction processing has benefited greatly from optimistic concurrency control protocols thus avoiding costly fine-grained synchronization. However, the performance of these protocols degrades significantly when the workload increases, namely, by leading to a substantial amount of aborted transactions due to concurrency conflicts. Our approach stems from the observation that when the abort rate increases with the load as already executed transactions queue for longer periods of time waiting for their turn to be certified and committed. We thus propose an adaptive algorithm for judiciously scheduling transactions to minimize the time during which these are vulnerable to being aborted by concurrent transactions, thereby reducing the overall abort rate. We do so by throttling transaction execution using an adaptive mechanism based on the locally known state of globally executing transactions, that includes out-of-order execution. Our evaluation using traces from the industry standard TPC-E workload shows that the amount of aborted transactions can be kept bounded as system load increases, while at the same time fully utilizing system resources and thus scaling transaction processing throughput. © 2013 IFIP International Federation for Information Processing.

CloseRead Abstract

2013

An effective scalable SQL engine for NoSQL databases

Authors
Vilaca, R; Cruz, F; Pereira, J; Oliveira, R;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
NoSQL databases were initially devised to support a few concrete extreme scale applications. Since the specificity and scale of the target systems justified the investment of manually crafting application code their limited query and indexing capabilities were not a major impediment. However, with a considerable number of mature alternatives now available there is an increasing willingness to use NoSQL databases in a wider and more diverse spectrum of applications and, to most of them, hand-crafted query code is not an enticing trade-off. In this paper we address this shortcoming of current NoSQL databases with an effective approach for executing SQL queries while preserving their scalability and schema flexibility. We show how a full-fledged SQL engine can be integrated atop of HBase leading to an ANSI SQL compliant database. Under a standard TPC-C workload our prototype scales linearly with the number of nodes in the system and outperforms a NoSQL TPC-C implementation optimized for HBase. © 2013 IFIP International Federation for Information Processing.

CloseRead Abstract

2017

DDFlasks: Deduplicated Very Large Scale Data Store

Authors
Maia, F; Paulo, J; Coelho, F; Neves, F; Pereira, J; Oliveira, R;

Publication
Distributed Applications and Interoperable Systems - 17th IFIP WG 6.1 International Conference, DAIS 2017, Held as Part of the 12th International Federated Conference on Distributed Computing Techniques, DisCoTec 2017, Neuchâtel, Switzerland, June 19-22, 2017, Proceedings

Abstract
With the increasing number of connected devices, it becomes essential to find novel data management solutions that can leverage their computational and storage capabilities. However, developing very large scale data management systems requires tackling a number of interesting distributed systems challenges, namely continuous failures and high levels of node churn. In this context, epidemic-based protocols proved suitable and effective and have been successfully used to build DataFlasks, an epidemic data store for massive scale systems. Ensuring resiliency in this data store comes with a significant cost in storage resources and network bandwidth consumption. Deduplication has proven to be an efficient technique to reduce both costs but, applying it to a large-scale distributed storage system is not a trivial task. In fact, achieving significant space-savings without compromising the resiliency and decentralized design of these storage systems is a relevant research challenge. In this paper, we extend DataFlasks with deduplication to design DDFlasks. This system is evaluated in a real world scenario using Wikipedia snapshots, and the results are twofold. We show that deduplication is able to decrease storage consumption up to 63% and decrease network bandwidth consumption by up to 20%, while maintaining a fullydecentralized and resilient design. © IFIP International Federation for Information Processing 2017.

CloseRead Abstract

2017

Similarity Aware Shuffling for the Distributed Execution of SQL Window Functions

Authors
Coelho, F; Matos, M; Pereira, J; Oliveira, R;

Abstract
Window functions are extremely useful and have become increasingly popular, allowing ranking, cumulative sums and other analytic aggregations to be computed over a highly flexible and configurable sliding window. This powerful expressiveness comes naturally at the expense of heavy computational requirements which, so far, have been addressed through optimizations around centralized approaches by works both from the industry and academia. Distribution and parallelization has the potential to improve performance, but introduces several challenges associated with data distribution that may harm data locality. In this paper, we show how data similarity can be employed across partitions during the distributed execution of these operators to improve data co-locality between instances of a Distributed Query Engine and the associated data storage nodes. Our contribution can attain network gains in the average of 3 times and it is expected to scale as the number of instances increase. In the scenario with 8 nodes, we were to able attain bandwidth and time savings of 7.3 times and 2.61 times respectively. © IFIP International Federation for Information Processing 2017.

CloseRead Abstract