Publications

Publications by Miguel Marques Matos

2013

Scaling Up Publish/Subscribe Overlays Using Interest Correlation for Link Sharing

Authors
Matos, M; Felber, P; Oliveira, R; Pereira, JO; Riviere, E;

Publication
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS

Abstract
Topic-based publish/subscribe is at the core of many distributed systems, ranging from application integration middleware to news dissemination. Therefore, much research was dedicated to publish/subscribe architectures and protocols, and in particular to the design of overlay networks for decentralized topic-based routing and efficient message dissemination. Nonetheless, existing systems fail to take full advantage of shared interests when disseminating information, hence suffering from high maintenance and traffic costs, or construct overlays that cope poorly with the scale and dynamism of large networks. In this paper, we present StaN, a decentralized protocol that optimizes the properties of gossip-based overlay networks for topic-based publish/subscribe by sharing a large number of physical connections without disrupting its logical properties. StaN relies only on local knowledge and operates by leveraging common interests among participants to improve global resource usage and promote topic and event scalability. The experimental evaluation under two real workloads, both via a real deployment and through simulation, shows that StaN provides an attractive infrastructure for scalable topic-based publish/subscribe.

2017

Similarity Aware Shuffling for the Distributed Execution of SQL Window Functions

Authors
Coelho, F; Matos, M; Pereira, J; Oliveira, R;

Publication
Distributed Applications and Interoperable Systems - 17th IFIP WG 6.1 International Conference, DAIS 2017, Held as Part of the 12th International Federated Conference on Distributed Computing Techniques, DisCoTec 2017, Neuchâtel, Switzerland, June 19-22, 2017, Proceedings

Abstract
Window functions are extremely useful and have become increasingly popular, allowing ranking, cumulative sums and other analytic aggregations to be computed over a highly flexible and configurable sliding window. This powerful expressiveness comes naturally at the expense of heavy computational requirements which, so far, have been addressed through optimizations around centralized approaches by works from both industry and academia. Distribution and parallelization have the potential to improve performance, but introduce several challenges associated with data distribution that may harm data locality. In this paper, we show how data similarity can be employed across partitions during the distributed execution of these operators to improve data co-locality between instances of a Distributed Query Engine and the associated data storage nodes. Our contribution can attain network gains of 3 times on average and is expected to scale as the number of instances increases. In the scenario with 8 nodes, we were able to attain bandwidth and time savings of 7.3 times and 2.61 times, respectively. © IFIP International Federation for Information Processing 2017.

2015

EpTO: An Epidemic Total Order Algorithm for Large-Scale Distributed Systems

Authors
Matos, M; Mercier, H; Felber, P; Oliveira, R; Pereira, J;

Publication
Proceedings of the 16th Annual Middleware Conference

Abstract
The ordering of events is a fundamental problem of distributed computing and has been extensively studied over several decades. Of all the available orderings, total ordering is of particular interest as it provides a powerful abstraction for building reliable distributed applications. Unfortunately, deterministic total order algorithms scale poorly and are therefore unfit for modern large-scale applications. The main contribution of this paper is EpTO, a total order algorithm with probabilistic agreement that scales both in the number of processes and events. EpTO provides deterministic safety and probabilistic liveness: integrity, total order and validity are always preserved, while agreement is achieved with arbitrarily high probability. We show that EpTO is well-suited for large-scale dynamic distributed systems: it requires neither a global clock nor synchronized processes, and it is highly robust even when the network suffers from large delays and significant churn and message loss.

2017

Performance trade-offs on a secure multi-party relational database

Authors
Pontes, R; Pinto, M; Barbosa, M; Vilaça, R; Matos, M; Oliveira, R;

Publication
Proceedings of the Symposium on Applied Computing, SAC 2017, Marrakech, Morocco, April 3-7, 2017

Abstract
The privacy of information is an increasing concern of software application users. This concern was caused by attacks on cloud services over the last few years that have leaked confidential information such as passwords, emails and even private pictures. Once the information is leaked, the users and software applications are powerless to contain the spread of information and its misuse. With databases as a central component of applications that store almost all of their data, they are one of the most common targets of attacks. However, typical deployments of databases do not leverage security mechanisms to stop attacks and do not apply cryptographic schemes to protect data. This issue has been tackled by multiple secure databases that provide trade-offs between security, query capabilities and performance. Despite providing stronger security guarantees, the proposed solutions still entrust their data to a single entity that can be corrupted or hacked. Secret sharing can solve this problem by dividing data into multiple secrets and storing each secret at a different location. The division is done in such a way that if one location is hacked, no information can be leaked. Depending on the protocols used to divide data, functions can be computed over this data through secure protocols without disclosing information or even knowing which values are being calculated. We propose a SQL database prototype capable of offering a trade-off between security and query latency by using a different secure protocol. An evaluation of the protocols is also performed, showing that our most relaxed protocol has an improvement of 5% on the query latency time over the original protocol. © 2017 ACM.
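As an illustration of the secret-sharing idea mentioned in this abstract, the following is a minimal sketch of additive secret sharing. It is not the protocol used in the paper, which the abstract does not specify. The modulus, the function names and the three-party setup are all assumptions made for this example.

```python
import secrets

# Hypothetical modulus for this sketch; all arithmetic is done modulo this value.
MODULUS = 2**61 - 1


def share(value, parties=3):
    """Split `value` into `parties` additive shares that sum to it mod MODULUS."""
    # The first parties-1 shares are uniformly random; the last one fixes the sum.
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Recombine all shares; any proper subset alone looks uniformly random."""
    return sum(shares) % MODULUS


def add_shared(a_shares, b_shares):
    """Each location adds its own two shares locally, never seeing a or b."""
    return [(x + y) % MODULUS for x, y in zip(a_shares, b_shares)]


# Two values stored as shares across three (hypothetical) locations.
a_shares = share(1200)
b_shares = share(800)

# The sum is computed share by share and only revealed on reconstruction.
total = reconstruct(add_shared(a_shares, b_shares))
print(total)
```

In this sketch a single compromised location holds only a random-looking number per value, while addition can still be evaluated over the shares. Richer operations such as multiplication or comparison need interactive protocols, which is where trade-offs between security and latency of the kind the abstract describes typically arise.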

2017

A Practical Framework for Privacy-Preserving NoSQL Databases

Authors
Macedo, R; Paulo, J; Pontes, R; Portela, B; Oliveira, T; Matos, M; Oliveira, R;

Publication
2017 IEEE 36TH INTERNATIONAL SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS (SRDS)

Abstract
Cloud infrastructures provide database services as cost-efficient and scalable solutions for storing and processing large amounts of data. To maximize performance, these services require users to trust sensitive information to the cloud provider, which raises privacy and legal concerns. This represents a major obstacle to the adoption of the cloud computing paradigm. Recent work addressed this issue by extending databases to compute over encrypted data. However, these approaches usually support a single and strict combination of cryptographic techniques invariably making them application specific. To assess and broaden the applicability of cryptographic techniques in secure cloud storage and processing, these techniques need to be thoroughly evaluated in a modular and configurable database environment. This is even more noticeable for NoSQL data stores where data privacy is still mostly overlooked. In this paper, we present a generic NoSQL framework and a set of libraries supporting data processing cryptographic techniques that can be used with existing NoSQL engines and composed to meet the privacy and performance requirements of different applications. This is achieved through a modular and extensible design that enables data processing over multiple cryptographic techniques applied on the same database. For each technique, we provide an overview of its security model, along with an extensive set of experiments. The framework is evaluated with the YCSB benchmark, where we assess the practicality and performance tradeoffs for different combinations of cryptographic techniques. The results for a set of macro experiments show that the average overhead in NoSQL operations performance is below 15%, when comparing our system with a baseline database without privacy guarantees.

2016

Towards Quantifiable Eventual Consistency

Authors
Maia, F; Matos, M; Coelho, F;

Publication
PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND SERVICES SCIENCE, VOL 1 (CLOSER)

Abstract
In the pursuit of highly available systems, storage systems began offering eventually consistent data models. These models are suitable for a number of applications, but not for all. In this paper we discuss a system that can offer an eventually consistent data model but can also, when needed, offer a strongly consistent one.
