
Publications by Miguel Marques Matos

2016

BUZZPSS: A Dependable and Adaptive Peer Sampling Service

Authors
Machado, N; Maia, F; Matos, M; Oliveira, R;

Publication
2016 SEVENTH LATIN-AMERICAN SYMPOSIUM ON DEPENDABLE COMPUTING (LADC)

Abstract
A distributed system is often built on top of an overlay network. Overlay networks enable network topology transparency while, at the same time, they can be designed to provide efficient data dissemination, load balancing, and even fault tolerance. They are constructed by defining logical links between nodes, creating a node graph. In practice, this is materialized by a Peer Sampling Service (PSS) that provides each node with references to other nodes to communicate with. Depending on the configuration of the PSS, the characteristics of the overlay can be adjusted to cope with application requirements and performance concerns. Unfortunately, overlay efficiency comes at the expense of dependability. To overcome this, one often deploys an application overlay focused on efficiency, along with a safety-net overlay to ensure dependability. However, this approach results in significant resource waste, since safety-net overlays are seldom used. In this paper, we focus on safety-net overlay networks and propose an adaptable mechanism to minimize resource usage while maintaining dependability guarantees. In detail, we consider a random overlay network, known to be highly dependable, and propose BUZZPSS, a new Peer Sampling Service that is able to autonomously fine-tune its resource consumption according to the observed system stability. When the system is stable and connectivity is not at risk, BUZZPSS autonomously changes its behavior to save resources. At the same time, it is able to detect system instability and act accordingly to guarantee that the overlay remains operational. Through an experimental evaluation, we show that BUZZPSS is able to autonomously adapt to the system stability levels, consuming up to 6x fewer resources than a static approach.
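The core idea of adapting a peer sampling service's effort to observed stability can be pictured with a small sketch. The code below is not the BUZZPSS algorithm itself; it is a minimal, hypothetical adaptive gossip loop in which a node lengthens its view-exchange period while its neighbourhood looks stable and shortens it again when churn is detected. All class names, thresholds and the stub peer exchange are illustrative assumptions.

```java
import java.util.*;
import java.util.concurrent.TimeUnit;

/** Minimal sketch of an adaptive peer sampling loop (illustrative, not BUZZPSS itself). */
public class AdaptivePss {
    private final Set<String> view = new HashSet<>();   // partial view: identifiers of known peers
    private long gossipPeriodMs = 1_000;                 // current exchange period (assumed value)
    private static final long MIN_PERIOD_MS = 1_000;
    private static final long MAX_PERIOD_MS = 30_000;

    /** One adaptation step, driven by how much the view changed since the last exchange. */
    void adapt(int peersChangedSinceLastRound) {
        if (peersChangedSinceLastRound == 0) {
            // Stable neighbourhood: back off to save bandwidth and CPU.
            gossipPeriodMs = Math.min(gossipPeriodMs * 2, MAX_PERIOD_MS);
        } else {
            // Instability (churn) observed: gossip aggressively to repair connectivity.
            gossipPeriodMs = MIN_PERIOD_MS;
        }
    }

    /** Main loop: exchange views with a random peer, then adapt the period. */
    void run() throws InterruptedException {
        Random rnd = new Random();
        while (true) {
            int changed = exchangeViewWithRandomPeer(rnd);  // returns #entries added or removed
            adapt(changed);
            TimeUnit.MILLISECONDS.sleep(gossipPeriodMs);
        }
    }

    /** Stub: a real PSS would shuffle entries of {@code view} with the chosen peer here. */
    int exchangeViewWithRandomPeer(Random rnd) {
        return rnd.nextInt(2); // pretend the view occasionally changes
    }
}
```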

2015

Practical evaluation of large scale applications

Authors
Jorge, T; Maia, F; Matos, M; Pereira, J; Oliveira, R;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
Designing and implementing distributed systems is a hard endeavor, both at an abstract level when designing the system, and at a concrete level when implementing, debugging and evaluating it. This stems not only from the inherent complexity of writing and reasoning about distributed software, but also from the lack of tools for testing and evaluating it under realistic conditions. Moreover, the gap between the protocols’ specifications found in research papers and their implementations in real code is huge, leading to inconsistencies that often result in the implementation no longer following the specification. As an example, the specification of the popular Chord DHT comprises a few dozen lines, while its Java implementation, OpenChord, is close to twenty thousand lines, excluding libraries. This makes it hard and error-prone to change the implementation to reflect changes in the specification, regardless of programmers’ skill. In addition, critical behavior due to the unpredictable interleaving of operations and network uncertainty can only be observed in a realistic setting, limiting the usefulness of simulation tools. We believe that being able to write an algorithm implementation very close to its specification, and to evaluate it in a real environment, is a big step towards building better distributed systems. Our approach leverages the MINHA platform to offer a set of built-in primitives that allow one to program very close to pseudo-code. This high-level implementation can interact with off-the-shelf existing middleware and can be gradually replaced by a production-ready Java implementation. In this paper, we present the system design and showcase it using a well-known algorithm from the literature. © IFIP International Federation for Information Processing 2015.
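To make the spec-versus-code gap concrete: Chord's stabilization routine is only a handful of lines in the original paper, yet production implementations bury it under networking, threading and error-handling concerns. The fragment below is a pseudo-code-style rendering of that routine in plain Java; it does not use MINHA's actual primitives and is not taken from OpenChord, it only illustrates the kind of high-level, specification-like implementation the paper argues for.

```java
/** Pseudo-code-style Chord stabilization, following the protocol's published specification
 *  (illustrative only; not MINHA's API and not OpenChord code). */
class ChordNode {
    long id;
    ChordNode successor;
    ChordNode predecessor;

    /** Periodically verify the successor and notify it about ourselves. */
    void stabilize() {
        ChordNode x = successor.predecessor;
        if (x != null && inOpenInterval(x.id, this.id, successor.id)) {
            successor = x;               // a closer successor has appeared on the ring
        }
        successor.notifyOf(this);
    }

    /** Called by a node that believes it might be our predecessor. */
    void notifyOf(ChordNode candidate) {
        if (predecessor == null || inOpenInterval(candidate.id, predecessor.id, this.id)) {
            predecessor = candidate;
        }
    }

    /** True if id lies strictly between lo and hi on the identifier ring (with wrap-around). */
    static boolean inOpenInterval(long id, long lo, long hi) {
        return lo < hi ? (id > lo && id < hi) : (id > lo || id < hi);
    }
}
```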

2016

Resource Usage Prediction in Distributed Key-Value Datastores

Authors
Cruz, F; Maia, F; Matos, M; Oliveira, R; Paulo, J; Pereira, J; Vilaça, R;

Publication
DISTRIBUTED APPLICATIONS AND INTEROPERABLE SYSTEMS, DAIS 2016

Abstract
In order to attain the promises of the Cloud Computing paradigm, systems need to be able to transparently adapt to environment changes. Such behavior benefits from the ability to predict those changes in order to handle them seamlessly. In this paper, we present a mechanism to accurately predict the resource usage of distributed key-value datastores. Our mechanism requires offline training but, in contrast with other approaches, it is sufficient to run it only once per hardware configuration and subsequently use it for online prediction of database performance under any circumstance. The mechanism accurately estimates the database resource usage for any request distribution, with an average accuracy of 94%, knowing only two parameters: (i) the cache hit ratio; and (ii) the incoming throughput. Both input values can be observed in real time or synthesized for request allocation decisions. This novel approach is simple and generic enough to be suitable for other practical applications as well.
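The shape of such a mechanism can be sketched as a table built offline once per hardware configuration and queried online with the two observed parameters. The sketch below uses a naive nearest-neighbour lookup; the paper's actual model, parameter names and accuracy figures are not reproduced here, and every identifier is an assumption for illustration.

```java
import java.util.*;

/** Sketch of table-driven resource prediction (illustrative; the paper's actual model may differ). */
public class ResourcePredictor {
    /** One offline measurement: observed CPU usage at a given cache hit ratio and throughput. */
    record Sample(double hitRatio, double throughputOpsPerSec, double cpuUtilisation) {}

    private final List<Sample> offlineTraining = new ArrayList<>();

    /** Offline phase: run once per hardware configuration, recording observed usage. */
    void addTrainingSample(double hitRatio, double throughput, double cpu) {
        offlineTraining.add(new Sample(hitRatio, throughput, cpu));
    }

    /** Online phase: predict CPU usage for the current (hit ratio, throughput) pair by taking
     *  the nearest offline sample. The crude scaling factor just makes both dimensions comparable. */
    double predictCpu(double hitRatio, double throughput) {
        return offlineTraining.stream()
                .min(Comparator.comparingDouble((Sample s) ->
                        Math.hypot(s.hitRatio() - hitRatio,
                                   (s.throughputOpsPerSec() - throughput) / 10_000.0)))
                .map(Sample::cpuUtilisation)
                .orElseThrow(() -> new IllegalStateException("no offline training data"));
    }
}
```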

2015

TOPiCo: Detecting most frequent items from multiple high-rate event streams

Authors
Schiavoni, V; Rivière, E; Sutra, P; Felber, P; Matos, M; Oliveira, R;

Publication
DEBS 2015 - Proceedings of the 9th ACM International Conference on Distributed Event-Based Systems

Abstract
Systems such as social networks, search engines or trading platforms operate geographically distant sites that continuously generate streams of events at a high rate. Such events can be access logs to web servers, feeds of messages from participants of a social network, or financial data, among others. The ability to detect trends and popularity variations in a timely fashion is of paramount importance in such systems. In particular, determining the most popular events across all sites makes it possible to capture the most relevant information in near real time and quickly adapt the system to the load. This paper presents TOPiCo, a protocol that computes the most popular events across geo-distributed sites in a low-cost, bandwidth-efficient and timely manner. TOPiCo starts by building the set of most popular events locally at each site. Then, it disseminates only events that have a chance of being among the most popular ones across all sites, significantly reducing the required bandwidth. We give a correctness proof of our algorithm and evaluate TOPiCo using a real-world trace of more than 240 million events spread across 32 sites. Our empirical results show that (i) TOPiCo is timely and cost-efficient for detecting popular events in a large-scale setting, (ii) it adapts dynamically to the distribution of the events, and (iii) our protocol is particularly efficient for skewed distributions. Copyright 2015 ACM.
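The per-site building block, counting events and exposing the local top-k, can be sketched in a few lines. This is only the local step described in the abstract; TOPiCo's cross-site filtering rule and its correctness proof are in the paper and are not reproduced here, and the class and method names are invented for illustration.

```java
import java.util.*;
import java.util.stream.Collectors;

/** Sketch of the per-site step: maintain event counts and expose the local top-k
 *  (illustrative only; the protocol's dissemination filter is deliberately omitted). */
public class LocalTopK {
    private final int k;
    private final Map<String, Long> counts = new HashMap<>();

    LocalTopK(int k) { this.k = k; }

    /** Count one incoming event observed at this site. */
    void observe(String eventId) {
        counts.merge(eventId, 1L, Long::sum);
    }

    /** The k locally most frequent events; the protocol would then decide which of these
     *  (and with which associated counts) are worth shipping to the other sites. */
    List<Map.Entry<String, Long>> localTopK() {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
                .limit(k)
                .collect(Collectors.toList());
    }
}
```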

2013

DATAFLASKS: an epidemic dependable key-value substrate

Authors
Maia, F; Matos, M; Vilaça, R; Pereira, J; Oliveira, R; Rivière, E;

Publication
2013 43RD ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS (DSN)

Abstract
Recently, tuple-stores have become pivotal structures in many information systems. Their ability to handle large datasets makes them important in an era with unprecedented amounts of data being produced and exchanged. However, these tuple-stores typically rely on structured peer-to-peer protocols, which assume moderately stable environments. This assumption does not always hold for very large scale systems, on the order of thousands of machines. In this paper we present a novel approach to the design of a tuple-store. Our approach follows a stratified design based on an unstructured substrate. We focus on this substrate and on how the use of epidemic protocols allows us to reach high dependability and scalability.
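The flavour of an epidemic substrate for a tuple-store can be illustrated with a simple anti-entropy exchange: each node periodically pushes its versioned tuples to a random peer, and newer versions win. This is a generic gossip sketch under assumed names, not DATAFLASKS' actual partitioning or replication scheme.

```java
import java.util.*;

/** Sketch of epidemic (anti-entropy) replication of tuples between nodes;
 *  illustrative only, not DATAFLASKS' actual design. */
public class EpidemicStore {
    /** Tuple value together with a version number so that newer writes win. */
    record Versioned(String value, long version) {}

    private final Map<String, Versioned> tuples = new HashMap<>();

    void put(String key, String value, long version) {
        tuples.merge(key, new Versioned(value, version),
                (old, incoming) -> incoming.version() > old.version() ? incoming : old);
    }

    /** One anti-entropy round: push our tuples to a randomly chosen peer, which keeps
     *  whichever version is newer; repeated rounds spread every tuple epidemically. */
    void antiEntropyRound(List<EpidemicStore> peers, Random rnd) {
        if (peers.isEmpty()) return;
        EpidemicStore peer = peers.get(rnd.nextInt(peers.size()));
        tuples.forEach((k, v) -> peer.put(k, v.value(), v.version()));
    }
}
```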

2014

DATAFLASKS: epidemic store for massive scale systems

Authors
Maia, F; Matos, M; Vilaça, R; Pereira, J; Oliveira, R; Rivière, E;

Publication
2014 IEEE 33RD INTERNATIONAL SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS (SRDS)

Abstract
Very large scale distributed systems provide some of the most interesting research challenges while, at the same time, being increasingly required by today's applications. The escalation in the number of connected devices, and in the data being produced and exchanged, demands new data management systems. Although new data stores are continuously being proposed, they are not suitable for very large scale environments. The high levels of churn and constant dynamics found in very large scale systems demand robust, proactive and unstructured approaches to data management. In this paper we propose a novel data store based solely on epidemic (or gossip-based) protocols. It leverages the capacity of these protocols to provide data persistence guarantees even in highly dynamic, massive scale systems. We provide an open source prototype of the data store and a corresponding evaluation.
