About

I am currently a post-doctoral researcher at INESC TEC and the University of Minho. I obtained a PhD in Computer Science (2015) from the MAP-i Doctoral Program in Computer Science, a joint program of the Universities of Minho, Aveiro, and Porto in collaboration with CMU and UT Austin. I also hold an M.Sc. in Informatics Engineering (2009) and a B.Sc. in Informatics Engineering (2007), both from the University of Minho.

Currently, my research focuses on large-scale distributed systems, with an emphasis on storage systems and data management. I have published in journals and at international conferences, and I have participated in the research and development of EU-funded (CoherentPaaS, SafeCloud) and national (Pastramy, RED) projects.


Details

  • Name: João Tiago Paulo
  • Cluster: Computer Science
  • Role: Senior Researcher
  • Since: 1 November 2011
Publications

2017

SafeFS: a modular architecture for secure user-space file systems: one FUSE to rule them all

Authors
Pontes, Rogerio; Burihabwa, Dorian; Maia, Francisco; Paulo, Joao; Schiavoni, Valerio; Felber, Pascal; Mercier, Hugues; Oliveira, Rui

Publication
Proceedings of the 10th ACM International Systems and Storage Conference, SYSTOR 2017, Haifa, Israel, May 22-24, 2017

Abstract
The exponential growth of data produced, the ever faster and ubiquitous connectivity, and the collaborative processing tools lead to a clear shift of data stores from local servers to the cloud. This migration, occurring across different application domains and types of users, individual or corporate, raises two immediate challenges. First, outsourcing data introduces security risks, hence protection mechanisms must be put in place to provide guarantees such as privacy, confidentiality and integrity. Second, there is no "one-size-fits-all" solution that would provide the right level of safety or performance for all applications and users, and it is therefore necessary to provide mechanisms that can be tailored to the various deployment scenarios. In this paper, we address both challenges by introducing SafeFS, a modular architecture based on software-defined storage principles featuring stackable building blocks that can be combined to construct a secure distributed file system. SafeFS allows users to specialize their data store to their specific needs by choosing the combination of blocks that provide the best safety and performance tradeoffs. The file system is implemented in user space using FUSE and can access remote data stores. The provided building blocks notably include mechanisms based on encryption, replication, and coding. We implemented SafeFS and performed an in-depth evaluation across a range of workloads. Results reveal that while each layer has a cost, one can build safe yet efficient storage architectures. Furthermore, the different combinations of blocks sometimes yield surprising tradeoffs. © 2017 ACM.
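To illustrate the stackable-layer idea the abstract describes, below is a minimal Python sketch in which each storage layer implements the same read/write interface and delegates to the layer beneath it. The layer names, the XOR "obfuscation" stand-in for real encryption, and the interface itself are illustrative assumptions, not SafeFS's actual FUSE API.

# Illustrative sketch of stackable storage layers (not SafeFS's real API).
# Each layer exposes read/write and delegates to the layer beneath it,
# so blocks can be combined to trade off safety and performance.
import os
import tempfile


class PassthroughLayer:
    """Bottom layer: stores data as files in a local directory."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def write(self, name, data):
        with open(os.path.join(self.root, name), "wb") as f:
            f.write(data)

    def read(self, name):
        with open(os.path.join(self.root, name), "rb") as f:
            return f.read()


class XorObfuscationLayer:
    """Toy 'encryption' layer: XORs bytes with a key before delegating.
    A real stack would use an authenticated cipher instead."""

    def __init__(self, lower, key=0x5C):
        self.lower = lower
        self.key = key

    def _xor(self, data):
        return bytes(b ^ self.key for b in data)

    def write(self, name, data):
        self.lower.write(name, self._xor(data))

    def read(self, name):
        return self._xor(self.lower.read(name))


class ReplicationLayer:
    """Writes every file to all lower layers; reads from the first."""

    def __init__(self, *lowers):
        self.lowers = lowers

    def write(self, name, data):
        for lower in self.lowers:
            lower.write(name, data)

    def read(self, name):
        return self.lowers[0].read(name)


# Users pick the combination of blocks they need, e.g. obfuscate then replicate:
fs = XorObfuscationLayer(ReplicationLayer(PassthroughLayer(tempfile.mkdtemp()),
                                          PassthroughLayer(tempfile.mkdtemp())))
fs.write("note.txt", b"hello")
assert fs.read("note.txt") == b"hello"

Reordering the same blocks (replicate then obfuscate) yields a different safety/performance tradeoff, which is the kind of specialization the paper evaluates.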

2017

DDFlasks: Deduplicated Very Large Scale Data Store

Authors
Maia, F; Paulo, J; Coelho, F; Neves, F; Pereira, J; Oliveira, R

Publication
Distributed Applications and Interoperable Systems - 17th IFIP WG 6.1 International Conference, DAIS 2017, Held as Part of the 12th International Federated Conference on Distributed Computing Techniques, DisCoTec 2017, Neuchâtel, Switzerland, June 19-22, 2017, Proceedings

Abstract
With the increasing number of connected devices, it becomes essential to find novel data management solutions that can leverage their computational and storage capabilities. However, developing very large scale data management systems requires tackling a number of interesting distributed systems challenges, namely continuous failures and high levels of node churn. In this context, epidemic-based protocols proved suitable and effective and have been successfully used to build DataFlasks, an epidemic data store for massive scale systems. Ensuring resiliency in this data store comes with a significant cost in storage resources and network bandwidth consumption. Deduplication has proven to be an efficient technique to reduce both costs but, applying it to a large-scale distributed storage system is not a trivial task. In fact, achieving significant space savings without compromising the resiliency and decentralized design of these storage systems is a relevant research challenge. In this paper, we extend DataFlasks with deduplication to design DDFlasks. This system is evaluated in a real world scenario using Wikipedia snapshots, and the results are twofold. We show that deduplication is able to decrease storage consumption up to 63% and decrease network bandwidth consumption by up to 20%, while maintaining a fully decentralized and resilient design. © IFIP International Federation for Information Processing 2017.
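The space savings reported above come from storing each unique piece of content only once. A minimal content-addressed deduplication sketch follows, assuming fixed-size chunking and SHA-256 fingerprints; DDFlasks' actual chunking and epidemic replication are considerably more involved, so this only shows the core mechanism.

# Minimal content-addressed deduplication sketch (illustrative, not DDFlasks).
# Fixed-size chunks are fingerprinted with SHA-256; a chunk is stored only
# once, and files become lists of fingerprints.
import hashlib

CHUNK_SIZE = 4096


class DedupStore:
    def __init__(self):
        self.chunks = {}   # fingerprint -> chunk bytes (stored once)
        self.files = {}    # file name -> ordered list of fingerprints

    def put(self, name, data):
        fps = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            fp = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(fp, chunk)  # no-op if already stored
            fps.append(fp)
        self.files[name] = fps

    def get(self, name):
        return b"".join(self.chunks[fp] for fp in self.files[name])

    def space_savings(self):
        logical = sum(len(self.chunks[fp])
                      for fps in self.files.values() for fp in fps)
        physical = sum(len(c) for c in self.chunks.values())
        return 1 - physical / logical if logical else 0.0


store = DedupStore()
store.put("v1", b"A" * 8192)
store.put("v2", b"A" * 8192)              # fully duplicate content
assert store.get("v2") == b"A" * 8192
print(f"savings: {store.space_savings():.0%}")  # 75% in this toy example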

2017

A Practical Framework for Privacy-Preserving NoSQL Databases

Authors
Macedo, R; Paulo, J; Pontes, R; Portela, B; Oliveira, T; Matos, M; Oliveira, R

Publication
36th IEEE Symposium on Reliable Distributed Systems, SRDS 2017, Hong Kong, Hong Kong, September 26-29, 2017

Abstract

2017

HTAPBench: Hybrid Transactional and Analytical Processing Benchmark

Authors
Coelho, F; Paulo, J; Vilaça, R; Pereira, JO; Oliveira, R

Publication
Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering, ICPE 2017, L'Aquila, Italy, April 22-26, 2017

Abstract
The increasing demand for real-time analytics requires the fusion of Transactional (OLTP) and Analytical (OLAP) systems, eschewing ETL processes and introducing a plethora of proposals for the so-called Hybrid Transactional and Analytical Processing (HTAP) systems. Unfortunately, current benchmarking approaches are not able to comprehensively produce a unified metric from the assessment of an HTAP system. The evaluation of both engine types is done separately, leading to the use of disjoint sets of benchmarks such as TPC-C or TPC-H. In this paper we propose a new benchmark, HTAPBench, providing a unified metric for HTAP systems geared toward the execution of constantly increasing OLAP requests limited by an admissible impact on OLTP performance. To achieve this, a load balancer within HTAPBench regulates the coexistence of OLTP and OLAP workloads, proposing a method for the generation of both new data and requests, so that OLAP requests over freshly modified data are comparable across runs. We demonstrate the merit of our approach by validating it with different types of systems: OLTP, OLAP and HTAP; showing that the benchmark is able to highlight the differences between them, while producing queries with comparable complexity across experiments with negligible variability. © 2017 ACM.
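The key mechanism in the abstract is a load balancer that admits OLAP work only while OLTP throughput stays within an admissible penalty. The toy admission-control loop below sketches that idea under stated assumptions; the names, thresholds, and the synthetic throughput model are illustrative, not HTAPBench's actual implementation.

# Toy sketch of HTAPBench's admission idea (illustrative, not the real tool):
# keep adding OLAP workers only while measured OLTP throughput stays above
# an admissible fraction of its baseline.
BASELINE_TPS = 10_000          # OLTP throughput with no OLAP load (measured)
ADMISSIBLE_IMPACT = 0.20       # tolerate at most a 20% OLTP slowdown


def measure_oltp_tps(olap_workers):
    """Stand-in for a real measurement: pretend each OLAP worker
    costs 3% of OLTP throughput."""
    return BASELINE_TPS * (1 - 0.03 * olap_workers)


def max_olap_workers():
    workers = 0
    while measure_oltp_tps(workers + 1) >= BASELINE_TPS * (1 - ADMISSIBLE_IMPACT):
        workers += 1
    return workers


# A unified metric can then be reported as, e.g., OLAP queries per hour
# sustained at this admissible worker count.
print(f"admissible OLAP workers: {max_olap_workers()}")  # 6 under this model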

2016

Efficient Deduplication in a Distributed Primary Storage Infrastructure

Authors
Paulo, J; Pereira, J

Publication
ACM Transactions on Storage

Abstract
A large amount of duplicate data typically exists across volumes of virtual machines in cloud computing infrastructures. Deduplication allows reclaiming these duplicates while improving the cost-effectiveness of large-scale multitenant infrastructures. However, traditional archival and backup deduplication systems impose prohibitive storage overhead for virtual machines hosting latency-sensitive applications. Primary deduplication systems reduce such penalty but rely on special cluster filesystems, centralized components, or restrictive workload assumptions. Also, some of these systems reduce storage overhead by confining deduplication to off-peak periods that may be scarce in a cloud environment. We present DEDIS, a dependable and fully decentralized system that performs cluster-wide off-line deduplication of virtual machines' primary volumes. DEDIS works on top of any unsophisticated storage backend, centralized or distributed, as long as it exports a basic shared block device interface. Also, DEDIS does not rely on data locality assumptions and incorporates novel optimizations for reducing deduplication overhead and increasing its reliability. The evaluation of an open-source prototype shows that minimal I/O overhead is achievable even when deduplication and intensive storage I/O are executed simultaneously. Also, our design scales out and allows collocating DEDIS components and virtual machines in the same servers, thus sparing the need for additional hardware.
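Off-line deduplication, as described above, keeps foreground writes fast by deferring duplicate detection to a background pass. A minimal sketch of that split follows, under simplifying assumptions: writes go straight to private physical blocks, and a later scan aliases identical blocks in the logical-to-physical map. The class and method names are hypothetical and much simpler than DEDIS's decentralized design.

# Illustrative sketch of off-line block deduplication (not DEDIS itself).
# Writes complete immediately against private physical blocks; a background
# pass later remaps identical blocks to a single shared copy.
import hashlib


class OfflineDedupVolume:
    def __init__(self):
        self.physical = {}   # physical block id -> bytes
        self.l2p = {}        # logical address -> physical block id
        self.next_id = 0

    def write(self, laddr, block):
        """Foreground path: no dedup work, minimal latency."""
        pid = self.next_id
        self.next_id += 1
        self.physical[pid] = block
        self.l2p[laddr] = pid

    def read(self, laddr):
        return self.physical[self.l2p[laddr]]

    def dedup_pass(self):
        """Background path: alias duplicate blocks, free the extras."""
        seen = {}  # content fingerprint -> canonical physical id
        for laddr, pid in list(self.l2p.items()):
            fp = hashlib.sha256(self.physical[pid]).hexdigest()
            canon = seen.setdefault(fp, pid)
            if canon != pid:
                self.l2p[laddr] = canon
                del self.physical[pid]


vol = OfflineDedupVolume()
vol.write(0, b"same" * 1024)
vol.write(1, b"same" * 1024)
vol.dedup_pass()
assert vol.read(0) == vol.read(1)
assert len(vol.physical) == 1   # one shared copy remains

A real system must also handle writes that arrive during or after the background pass (typically via copy-on-write on shared blocks), which is where much of DEDIS's reliability and overhead engineering lies.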

Supervised Thesis

2015

Gestão de Bases de Dados Relacionais em Cloud Computing (Management of Relational Databases in Cloud Computing)

Author
André Dias Costa

Institution
UM (University of Minho)