About

He is currently a senior researcher at HASLab and MACC (University of Minho & INESC TEC), working on high-performance computing in both parallel and distributed systems, and was formerly query engine technical director at LeanXcale. He obtained his Ph.D. in Computer Science from the MAP-i Doctoral Programme in 2012. He has a strong background in distributed systems and large-scale data management, and around 15 years of experience in national and international research projects in distributed systems, covering secure and large-scale query processing, cloud computing, NoSQL and SQL databases, and database replication. He has worked on several European research projects: CloudDBAppliance, CrowdHealth, BigDataStack, VineYard, CoherentPaaS, LeanBigData, CumuloNimbo, and Gorda. He has co-supervised several research grant holders and master's theses, and currently co-supervises two PhD students. He has published research papers on large-scale and dependable distributed systems and has served as a reviewer for several highly reputed conferences such as EuroSys, SRDS, Middleware, DSN, OPODIS, LADC, and DAIS. He also created and chaired the WPSDS workshop.

Details

  • Name

    Ricardo Pereira Vilaça
  • Cluster

    Computer Science
  • Role

    Assistant Researcher
  • Since

    1 November 2011

Publications

2022

AIDA-DB: A Data Management Architecture for the Edge and Cloud Continuum

Authors
Faria, N; Costa, D; Pereira, J; Vilaça, R; Ferreira, L; Coelho, F

Publication
19th IEEE Annual Consumer Communications & Networking Conference, CCNC 2022, Las Vegas, NV, USA, January 8-11, 2022

2022

Adaptive database synchronization for an online analytical cloud-to-edge continuum

Authors
Costa, D; Pereira, J; Vilaça, R; Faria, N

Publication
37th Annual ACM Symposium on Applied Computing

2022

Scalable transcriptomics analysis with Dask: applications in data science and machine learning

Authors
Moreno, M; Vilaça, R; Ferreira, PG

Publication
BMC Bioinformatics

Abstract
Background: Gene expression studies are an important tool in biological and biomedical research. The signal carried in expression profiles helps derive signatures for the prediction, diagnosis and prognosis of different diseases. Data science and specifically machine learning have many applications in gene expression analysis. However, as the dimensionality of genomics datasets grows, scalable solutions become necessary. Methods: In this paper we review the main steps and bottlenecks in machine learning pipelines, as well as the main concepts behind scalable data science including those of concurrent and parallel programming. We discuss the benefits of the Dask framework and how it can be integrated with the Python scientific environment to perform data analysis in computational biology and bioinformatics. Results: This review illustrates the role of Dask for boosting data science applications in different case studies. Detailed documentation and code for these procedures are available at https://github.com/martaccmoreno/gexp-ml-dask. Conclusion: By showing when and how Dask can be used in transcriptomics analysis, this review will serve as an entry point to help genomic data scientists develop more scalable data analysis procedures.

2021

Detailed Black-Box Monitoring of Distributed Systems

Authors
Neves, F; Vilaça, R; Pereira, J

Publication
Applied Computing Review

Abstract
Modern containerized distributed systems, such as big data storage and processing stacks or micro-service based applications, are inherently hard to monitor and optimize, as resource usage does not directly match hardware resources due to multiple virtualization layers. For instance, although inter-application traffic is an important factor, as it directly indicates how components interact, it has not been possible to accurately monitor it in an application-independent way and without severe overhead, which puts it out of reach of cloud platforms. In this paper we present an efficient black-box monitoring approach for gathering detailed structural information about collaborating processes in a distributed system that can be queried for various purposes, as it includes information about processes, containers, and hosts, as well as resource usage and the amount of data exchanged. The key to achieving high detail and low overhead without custom application instrumentation is to use a kernel-aided, event-driven strategy. We validate a prototype implementation by applying it to multi-platform microservice deployments, evaluate its performance with micro-benchmarks, and demonstrate its usefulness for container placement in a distributed data storage and processing stack (i.e., Cassandra and Spark).

2021

Horus: Non-Intrusive Causal Analysis of Distributed Systems Logs

Authors
Neves, F; Machado, N; Vilaça, R; Pereira, J

Publication
51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2021)

Abstract
Logs are still the primary resource for debugging distributed systems executions. Complexity and heterogeneity of modern distributed systems, however, make log analysis extremely challenging. First, due to the sheer amount of messages, in which the execution paths of distinct system components appear interleaved. Second, due to unsynchronized physical clocks, simply ordering the log messages by timestamp does not suffice to obtain a causal trace of the execution. To address these issues, we present Horus, a system that enables the refinement of distributed system logs in a causally-consistent and scalable fashion. Horus leverages kernel-level probing to capture events for tracking causality between application-level logs from multiple sources. The events are then encoded as a directed acyclic graph and stored in a graph database, thus allowing the use of rich query languages to reason about runtime behavior. Our case study with TrainTicket, a ticket booking application with 40+ microservices, shows that Horus surpasses current widely-adopted log analysis systems in pinpointing the root cause of anomalies in distributed executions. Also, we show that Horus builds a causally-consistent log of a distributed execution with much higher performance (up to 3 orders of magnitude) and scalability than prior state-of-the-art solutions. Finally, we show that Horus' approach to query causality is up to 30 times faster than graph database built-in traversal algorithms.

Supervised theses

2022

Orchestration and Distribution of Services in Hybrid Cloud/Edge Environments

Author
João Pedro Machado Vilaça

Institution
UM

2022

Data Lakes in Hybrid Cloud/Edge Environments

Author
Daniel Vilar da Costa

Institution
UM

2021

Holistic performance and scalability analysis for large-scale distributed systems

Author
Francisco Nuno Teixeira Neves

Institution
UM

2021

Query Optimizers Based on Machine Learning Techniques

Author
Rui Pedro Sousa Rodrigues do Souto

Institution
UM

2021

Trade-offs between privacy and efficiency on databases

Author
Rogério António da Costa Pontes

Institution
UP-FCUP