About

I am a researcher at HASLab and a professor at U. Minho. My research focuses on dependable distributed systems. I am interested mainly in data management, including database replication and SQL processing over NoSQL systems, and in group communication, including consensus and gossip-based protocols for large-scale systems. I am also interested in tools for testing, evaluating, and monitoring dependable distributed systems.

Details

  • Name

    José Orlando Pereira
  • Cluster

    Computer Science
  • Role

    Senior Researcher
  • Since

    1st November 2011
Publications

2018

Falcon: A Practical Log-Based Analysis Tool for Distributed Systems

Authors
Neves, F; Machado, N; Pereira, J;

Publication
48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2018, Luxembourg City, Luxembourg, June 25-28, 2018

Abstract
Programmers and support engineers typically rely on log data to narrow down the root cause of unexpected behaviors in dependable distributed systems. Unfortunately, the inherently distributed nature and complexity of such distributed executions often lead to multiple independent logs, scattered across different physical machines, with thousands or millions of entries poorly correlated in terms of event causality. This renders log-based debugging a tedious, time-consuming, and potentially inconclusive task. We present Falcon, a tool aimed at making log-based analysis of distributed systems practical and effective. Falcon's modular architecture, designed as an extensible pipeline, allows it to seamlessly combine several distinct logging sources and generate a coherent space-time diagram of distributed executions. To preserve event causality, even in the presence of logs collected from independent unsynchronized machines, Falcon introduces a novel happens-before symbolic formulation and relies on an off-the-shelf constraint solver to obtain a coherent event schedule. Our case study with the popular distributed coordination service Apache Zookeeper shows that Falcon eases the log-based analysis of complex distributed protocols and is helpful in bridging the gap between protocol design and implementation. © 2018 IEEE.

2017

DDFlasks: Deduplicated Very Large Scale Data Store

Authors
Maia, F; Paulo, J; Coelho, F; Neves, F; Pereira, J; Oliveira, R;

Publication
Distributed Applications and Interoperable Systems - 17th IFIP WG 6.1 International Conference, DAIS 2017, Held as Part of the 12th International Federated Conference on Distributed Computing Techniques, DisCoTec 2017, Neuchâtel, Switzerland, June 19-22, 2017, Proceedings

Abstract
With the increasing number of connected devices, it becomes essential to find novel data management solutions that can leverage their computational and storage capabilities. However, developing very large scale data management systems requires tackling a number of interesting distributed systems challenges, namely continuous failures and high levels of node churn. In this context, epidemic-based protocols proved suitable and effective and have been successfully used to build DataFlasks, an epidemic data store for massive scale systems. Ensuring resiliency in this data store comes with a significant cost in storage resources and network bandwidth consumption. Deduplication has proven to be an efficient technique to reduce both costs, but applying it to a large-scale distributed storage system is not a trivial task. In fact, achieving significant space savings without compromising the resiliency and decentralized design of these storage systems is a relevant research challenge. In this paper, we extend DataFlasks with deduplication to design DDFlasks. This system is evaluated in a real-world scenario using Wikipedia snapshots, and the results are twofold. We show that deduplication is able to decrease storage consumption by up to 63% and decrease network bandwidth consumption by up to 20%, while maintaining a fully decentralized and resilient design. © IFIP International Federation for Information Processing 2017.

2017

Similarity Aware Shuffling for the Distributed Execution of SQL Window Functions

Authors
Coelho, Fabio; Matos, Miguel; Pereira, Jose; Oliveira, Rui;

Publication
Distributed Applications and Interoperable Systems - 17th IFIP WG 6.1 International Conference, DAIS 2017, Held as Part of the 12th International Federated Conference on Distributed Computing Techniques, DisCoTec 2017, Neuchâtel, Switzerland, June 19-22, 2017, Proceedings

Abstract

2017

Prepared scan: efficient retrieval of structured data from HBase

Authors
Neves, F; Vilaça, R; Pereira, JO; Oliveira, R;

Publication
Proceedings of the Symposium on Applied Computing, SAC 2017, Marrakech, Morocco, April 3-7, 2017

Abstract
The ability of NoSQL systems to scale better than traditional relational databases motivates a large set of applications to migrate their data to NoSQL systems, even without aiming to exploit the provided schema flexibility. However, accessing structured data is costly due to such flexibility, incurring a lot of bandwidth and processing unit usage. In this paper, we analyse this cost in Apache HBase and propose a new scan operation, named Prepared Scan, that optimizes the access to data structured in a regular manner by taking advantage of a well-known schema by application. Using an industry standard benchmark, we show that Prepared Scan improves throughput by up to 29% and decreases network bandwidth consumption by up to 20%. © 2017 ACM.

2017

Implementing a Linear Algebra Approach to Data Processing

Authors
Pontes, R; Matos, M; Oliveira, JN; Pereira, JO;

Publication
Lecture Notes in Computer Science - Grand Timely Topics in Software Engineering

Abstract

Supervised theses

2017

Automatic Adaptation to Heterogeneity in Large Scale Distributed Storage Systems

Author

Institution
UM

2017

Database Replication for Enterprise Applications

Author
Ana Luísa Parreira Nunes Alonso

Institution
UM

2017

Middleware de acesso coerente a serviços de base de dados na nuvem

Author

Institution
UM

2016

Holistic performance and scalability analysis for large scale distributed systems

Author
Francisco Nuno Teixeira Neves

Institution
UM

2016

AloTA - An IoT Platform on MonetDB

Author
Pedro Emanuel Silva Ferreira

Institution
UM