Publications

Publications by Miguel Marques Matos

2013

Evaluating Cassandra as a manager of large file sets

Authors
Beernaert, L; Gomes, P; Matos, M; Vilaça, R; Oliveira, R;

Publication
Proceedings of the 3rd International Workshop on Cloud Data and Platforms, CloudDP@EuroSys 2013, Prague, Czech Republic, April 14-17, 2013

Abstract
All companies developing their business on the Web, not only giants like Google or Facebook but also small companies focused on niche markets, face scalability issues in data management. The case study of this paper is the content management systems for classified or commercial advertisements on the Web. The data involved has a very significant growth rate and a read-intensive access pattern with a reduced update rate. Typically, data is stored in traditional file systems hosted on dedicated servers or Storage Area Network devices due to the generalization and ease of use of file systems. However, this ease in implementation and usage has a disadvantage: the centralized nature of these systems leads to availability, elasticity and scalability problems. The scenario under study, undemanding in terms of the system's consistency and with a simple interaction model, is suitable to a distributed database, such as Cassandra, conceived precisely to dynamically handle large volumes of data. In this paper, we analyze the suitability of Cassandra as a substitute for file systems in content management systems. The evaluation, conducted using real data from a production system, shows that when using Cassandra, one can easily get horizontal scalability of storage, redundancy across multiple independent nodes and load distribution imposed by the periodic activities of safeguarding data, while ensuring a comparable performance to that of a file system. Copyright © 2013 ACM.

CloseRead Abstract

2014

LAYSTREAM: composing standard gossip protocols for live video streaming

Authors
Matos, M; Schiavoni, V; Riviere, E; Felber, P; Oliveira, R;

Publication
14-TH IEEE INTERNATIONAL CONFERENCE ON PEER-TO-PEER COMPUTING (P2P)

Abstract
Gossip-based live streaming is a popular topic, as attested by the vast literature on the subject. Despite the particular merits of each proposal, all need to implement and deal with common challenges such as membership management, topology construction and video packets dissemination. Well-principled gossip-based protocols have been proposed in the literature for each of these aspects. Our goal is to assess the feasibility of building a live streaming system, LAYSTREAM, as a composition of these existing protocols, to deploy the resulting system on real testbeds, and report on lessons learned in the process. Unlike previous evaluations conducted by simulations and considering each protocol independently, we use real deployments. We evaluate protocols both independently and as a layered composition, and unearth specific problems and challenges associated with deployment and composition. We discuss and present solutions for these, such as a novel topology construction mechanism able to cope with the specificities of a large-scale and delay-sensitive environment, but also with requirements from the upper layer. Our implementation and data are openly available to support experimental reproducibility.

CloseRead Abstract

2013

Lightweight, efficient, robust epidemic dissemination

Authors
Matos, M; Schiavoni, V; Felber, P; Oliveira, R; Riviere, E;

Publication
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING

Abstract
Today's intensive demand for data such as live broadcast or news feeds requires efficient and robust dissemination systems. Traditionally, designs focus on extremes of the efficiency/robustness spectrum by either using structures, such as trees for efficiency or by using loosely-coupled epidemic protocols for robustness. We present BRISA, a hybrid approach combining the robustness of epidemics with the efficiency of structured approaches. BRISA implicitly emerges embedded dissemination structures from an underlying epidemic substrate. The structures' links are chosen with local knowledge only, but still ensuring connectivity. Failures can be promptly compensated and repaired thanks to the epidemic substrate, and their impact on dissemination delays masked by the use of multiple independent structures. Besides presenting the protocol design, we conduct an extensive evaluation in real environments, analyzing the effectiveness of the structure creation mechanism and its robustness under dynamic conditions. Results confirm BRISA as an efficient and robust approach to data dissemination in large dynamic environments.

CloseRead Abstract

2013

MeT: Workload aware elasticity for NoSQL

Authors
Cruz, F; Maia, F; Matos, M; Oliveira, R; Paulo, J; Pereira, J; Vilaca, R;

Publication
Proceedings of the 8th ACM European Conference on Computer Systems, EuroSys 2013

Abstract
NoSQL databases manage the bulk of data produced by modern Web applications such as social networks. This stems from their ability to partition and spread data to all available nodes, allowing NoSQL systems to scale. Unfortunately, current solutions' scale out is oblivious to the underlying data access patterns, resulting in both highly skewed load across nodes and suboptimal node configurations. In this paper, we first show that judicious placement of HBase partitions taking into account data access patterns can improve overall throughput by 35%. Next, we go beyond current state of the art elastic systems limited to uninformed replica addition and removal by: i) reconfiguring existing replicas according to access patterns and ii) adding replicas specifically configured to the expected access pattern. MeT is a prototype for a Cloud-enabled framework that can be used alone or in conjunction with OpenStack for the automatic and heterogeneous reconfiguration of a HBase deployment. Our evaluation, conducted using the YCSB workload generator and a TPC-C workload, shows that MeT is able to i) autonomously achieve the performance of a manual configured cluster and ii) quickly reconfigure the cluster according to unpredicted workload changes. © 2013 ACM.

CloseRead Abstract

2014

On the Support of Versioning in Distributed Key-Value Stores

Authors
Felber, P; Pasin, M; Riviere, E; Schiavoni, V; Sutra, P; Coelho, F; Oliveira, R; Matos, M; Vilaca, R;

Publication
2014 IEEE 33RD INTERNATIONAL SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS (SRDS)

Abstract
The ability to access and query data stored in multiple versions is an important asset for many applications, such as Web graph analysis, collaborative editing platforms, data forensics, or correlation mining. The storage and retrieval of versioned data requires a specific API and support from the storage layer. The choice of the data structures used to maintain versioned data has a fundamental impact on the performance of insertions and queries. The appropriate data structure also depends on the nature of the versioned data and the nature of the access patterns. In this paper we study the design and implementation space for providing versioning support on top of a distributed key-value store (KVS). We define an API for versioned data access supporting multiple writers and show that a plain KVS does not offer the necessary synchronization power for implementing this API. We leverage the support for listeners at the KVS level and propose a general construction for implementing arbitrary types of data structures for storing and querying versioned data. We explore the design space of versioned data storage ranging from a flat data structure to a distributed sharded index. The resulting system, ALEPH, is implemented on top of an industrial-grade open-source KVS, Infinispan. Our evaluation, based on real-world Wikipedia access logs, studies the performance of each versioning mechanisms in terms of load balancing, latency and storage overhead in the context of different access scenarios.

CloseRead Abstract

2014

A peer-to-peer service architecture for the Smart Grid

Authors
Campos, F; Matos, M; Pereira, J; Rua, D;

Publication
14-TH IEEE INTERNATIONAL CONFERENCE ON PEER-TO-PEER COMPUTING (P2P)

Abstract
Important challenges in interoperability, reliability, and scalability need to be addressed before the Smart Grid vision can be fulfilled. The sheer scale of the electric grid and the criticality of the communication among its subsystems for proper management, demands a scalable and reliable communication framework able to work in an heterogeneous and dynamic environment. Moreover, the need to provide full interoperability between diverse current and future energy and non-energy systems, along with seamless discovery and configuration of a large variety of networked devices, ranging from the resource constrained sensing devices to servers in data centers, requires an implementation-agnostic Service Oriented Architecture. In this position paper we propose that this challenge can be addressed with a generic framework that reconciles the reliability and scalability of Peer-to-Peer systems, with the industrial standard interoperability of Web Services. We illustrate the flexibility of the proposed framework by showing how it can be used in two specific scenarios.

CloseRead Abstract