Publications

Publications by HASLab

2016

A calculational approach to path-based properties of the Eisenstein-Stern and Stern-Brocot trees via matrix algebra

Authors
Ferreira, JF; Mendes, A;

Publication
JOURNAL OF LOGICAL AND ALGEBRAIC METHODS IN PROGRAMMING

Abstract
This paper proposes a calculational approach to prove properties of two well-known binary trees used to enumerate the rational numbers: the Stern-Brocot tree and the Eisenstein-Stern tree (also known as Calkin-Wilf tree). The calculational style of reasoning is enabled by a matrix formulation that is well-suited to naturally formulate path-based properties, since it provides a natural way to refer to paths in the trees. Three new properties are presented. First, we show that nodes with palindromic paths contain the same rational in both the Stern-Brocot and Eisenstein-Stern trees. Second, we show how certain numerators and denominators in these trees can be written as the sum of two squares x(2) and y(2), with the rational x/y appearing in specific paths. Finally, we show how we can construct Sierpifiski's triangle from these trees of rationals. (C) 2015 Published by Elsevier Inc.

CloseRead Abstract

2016

Tuning Pipelined Scientific Data Analyses for Efficient Multicore Execution

Authors
Pereira, A; Onofre, A; Proenca, A;

Publication
2016 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING & SIMULATION (HPCS 2016)

Abstract
Scientific data analyses often apply a pipelined sequence of computational tasks to independent datasets. Each task in the pipeline captures and processes a dataset element, may be dependent on other tasks in the pipeline, may have a different computational complexity and may be filtered out from progressing in the pipeline. The goal of this work is to develop an efficient scheduler that automatically (i) manages a parallel data reading and an adequate data structure creation, (ii) adaptively defines the most efficient order of pipeline execution of the tasks, considering their inter-dependence and both the filtering out rate and the computational weight, and (iii) manages the parallel execution of the computational tasks in a multicore system, applied to the same or to different dataset elements. A real case study data analysis application from High Energy Physics (HEP) was used to validate the efficiency of this scheduler. Preliminary results show an impressive performance improvement of the pipeline tuning when compared to the original sequential HEP code (up to a 35x speedup in a dual 12-core system), and also show significant performance speedups over conventional parallelization approaches of this case study application (up to 10x faster in the same system).

CloseRead Abstract

2015

CumuloNimbo: A Cloud Scalable Multi-tier SQL Database

Authors
Peris, RJ; Martínez, MP; Kemme, B; Brondino, I; Pereira, J; Vilaça, R; Cruz, F; Oliveira, R; Ahmad, MY;

Publication
IEEE Data Eng. Bull.

Abstract

2015

Practical Evaluation of Large Scale Applications

Authors
Jorge, T; Maia, F; Matos, M; Pereira, J; Oliveira, R;

Publication
DAIS

Abstract
Designing and implementing distributed systems is a hard endeavor, both at an abstract level when designing the system, and at a concrete level when implementing, debugging and evaluating it. This stems not only from the inherent complexity of writing and reasoning about distributed software, but also from the lack of tools for testing and evaluating it under realistic conditions. Moreover, the gap between the protocols’ specifications found on research papers and their implementations on real code is huge, leading to inconsistencies that often result in the implementation no longer following the specification. As an example, the specification of the popular Chord DHT comprises a few dozens of lines, while its Java implementation, OpenChord, is close to twenty thousand lines, excluding libraries. This makes it hard and error prone to change the implementation to reflect changes in the specification, regardless of programmers’ skill. Besides, critical behavior due to the unpredictable interleaving of operations and network uncertainty, can only be observed on a realistic setting, limiting the usefulness of simulation tools. We believe that being able to write an algorithm implementation very close to its specification, and evaluating it in a real environment is a big step in the direction of building better distributed systems. Our approach leverages the MINHA platform to offer a set of built in primitives that allows one to program very close to pseudo-code. This high level implementation can interact with off-the-shelf existing middleware and can be gradually replaced by a production-ready Java implementation. In this paper, we present the system design and showcase it using a well-known algorithm from the literature.

CloseRead Abstract

2015

TOPiCo: detecting most frequent items from multiple high-rate event streams

Authors
Schiavoni, V; Rivière, E; Sutra, P; Felber, P; Matos, M; Oliveira, R;

Publication
DEBS

Abstract
Systems such as social networks, search engines or trading platforms operate geographically distant sites that continuously generate streams of events at high-rate. Such events can be access logs to web servers, feeds of messages from participants of a social network, or financial data, among others. The ability to timely detect trends and popularity variations is of paramount importance in such systems. In particular, determining what are the most popular events across all sites allows to capture the most relevant information in near real-time and quickly adapt the system to the load. This paper presents TOPiCo, a protocol that computes the most popular events across geo-distributed sites in a low cost, bandwidth-efficient and timely manner. TOPiCo starts by building the set of most popular events locally at each site. Then, it disseminates only events that have a chance to be among the most popular ones across all sites, significantly reducing the required bandwidth. We give a correctness proof of our algorithm and evaluate TOPiCo using a real-world trace of more than 240 million events spread across 32 sites. Our empirical results shows that (i) TOPiCo is timely and cost-efficient for detecting popular events in a large-scale setting, (ii) it adapts dynamically to the distribution of the events, and (iii) our protocol is particularly efficient for skewed distributions.

CloseRead Abstract

2015

EpTO: An Epidemic Total Order Algorithm for Large-Scale Distributed Systems

Authors
Matos, M; Mercier, H; Felber, P; Oliveira, R; Pereira, J;

Publication
PROCEEDINGS OF THE 16TH ANNUAL MIDDLEWARE CONFERENCE

Abstract
The ordering of events is a fundamental problem of distributed computing and has been extensively studied over several decades. From all the available orderings, total ordering is of particular interest as it provides a powerful abstraction for building reliable distributed applications. Unfortunately, deterministic total order algorithms scale poorly and are therefore unfit for modern large-scale applications. The main contribution of this paper is EPTO, a total order algorithm with probabilistic agreement that scales both in the number of processes and events. EPTO provides deterministic safety and probabilistic liveness: integrity, total order and validity are always preserved, while agreement is achieved with arbitrarily high probability. We show that EPTO is well-suited for large-scale dynamic distributed systems: it does not require a global clock nor synchronized processes, and it is highly robust even when the network suffers from large delays and significant churn and message loss.

CloseRead Abstract