About

Nuno Machado is a postdoctoral researcher at the High-Assurance Software Laboratory (HASLab) of the University of Minho and INESC-TEC. His research focuses on the design of resilient and scalable distributed systems for the analysis of massive amounts of data. He also works on, and is interested in, privacy-preserving solutions for cloud computing and Internet-of-Things scenarios.

Nuno received his PhD in Information Systems and Computer Engineering from Instituto Superior Técnico (University of Lisbon), under the supervision of Luís Rodrigues. During his PhD, he worked on automated debugging techniques for concurrent applications that deterministically reproduce concurrency bugs and isolate their root causes.

In the summer of 2014, Nuno interned at Microsoft Research (Redmond), where he worked with Brandon Lucia on concurrency debugging.

Details

  • Name

    Nuno Almeida Machado
  • Cluster

    Informatics
  • Position

    External Collaborating Researcher
  • Since

    13 July 2016
Publications

2018

CoopREP: Cooperative record and replay of concurrency bugs

Authors
Machado, N; Romano, P; Rodrigues, L;

Publication
Software Testing, Verification and Reliability

2018

Falcon: A Practical Log-Based Analysis Tool for Distributed Systems

Authors
Neves, F; Machado, N; Pereira, J;

Publication
48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2018, Luxembourg City, Luxembourg, June 25-28, 2018

Abstract
Programmers and support engineers typically rely on log data to narrow down the root cause of unexpected behaviors in dependable distributed systems. Unfortunately, the inherently distributed nature and complexity of such executions often lead to multiple independent logs, scattered across different physical machines, with thousands or millions of entries poorly correlated in terms of event causality. This renders log-based debugging a tedious, time-consuming, and potentially inconclusive task. We present Falcon, a tool aimed at making log-based analysis of distributed systems practical and effective. Falcon's modular architecture, designed as an extensible pipeline, allows it to seamlessly combine several distinct logging sources and generate a coherent space-time diagram of distributed executions. To preserve event causality, even in the presence of logs collected from independent unsynchronized machines, Falcon introduces a novel happens-before symbolic formulation and relies on an off-the-shelf constraint solver to obtain a coherent event schedule. Our case study with the popular distributed coordination service Apache ZooKeeper shows that Falcon eases the log-based analysis of complex distributed protocols and is helpful in bridging the gap between protocol design and implementation.
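
As a minimal sketch of the happens-before symbolic formulation described above (assuming z3 as the off-the-shelf solver, which the abstract does not name, and a toy two-node log): each event gets an integer logical-time variable, program order and send-before-receive causality become inequality constraints, and any satisfying assignment yields a causally coherent global schedule.

```python
# Toy happens-before formulation: not Falcon's code, just the idea it names.
from z3 import Int, Solver, Distinct, sat

events = ["n1.send_m", "n1.log_a", "n2.recv_m", "n2.log_b"]
t = {e: Int(f"t_{e}") for e in events}   # one logical-time variable per event

s = Solver()
s.add(Distinct(*t.values()))             # every event gets a distinct slot
s.add(t["n1.send_m"] < t["n1.log_a"])    # program order on node n1
s.add(t["n2.recv_m"] < t["n2.log_b"])    # program order on node n2
s.add(t["n1.send_m"] < t["n2.recv_m"])   # message causality: send before receive

if s.check() == sat:
    m = s.model()
    # Sort events by their solved logical times to get one coherent schedule.
    print(sorted(events, key=lambda e: m[t[e]].as_long()))
```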

2018

Totally Ordered Replication for Massive Scale Key-Value Stores

Authors
Ribeiro, J; Machado, N; Maia, F; Matos, M;

Publication
Distributed Applications and Interoperable Systems - 18th IFIP WG 6.1 International Conference, DAIS 2018, Held as Part of the 13th International Federated Conference on Distributed Computing Techniques, DisCoTec 2018, Madrid, Spain, June 18-21, 2018, Proceedings

2016

BUZZPSS: A Dependable and Adaptive Peer Sampling Service

Authors
Machado, N; Maia, F; Matos, M; Oliveira, R;

Publication
2016 Seventh Latin-American Symposium on Dependable Computing (LADC)

Abstract
A distributed system is often built on top of an overlay network. Overlay networks enable network topology transparency while, at the same time, they can be designed to provide efficient data dissemination, load balancing, and even fault tolerance. They are constructed by defining logical links between nodes, creating a node graph. In practice, this is materialized by a Peer Sampling Service (PSS) that provides references to other nodes to communicate with. Depending on the configuration of the PSS, the characteristics of the overlay can be adjusted to cope with application requirements and performance concerns. Unfortunately, overlay efficiency comes at the expense of dependability. To overcome this, one often deploys an application overlay focused on efficiency, along with a safety-net overlay to ensure dependability. However, this approach results in significant resource waste, since safety-net overlays are seldom used. In this paper, we focus on safety-net overlay networks and propose an adaptable mechanism to minimize resource usage while maintaining dependability guarantees. In detail, we consider a random overlay network, known to be highly dependable, and propose BUZZPSS, a new Peer Sampling Service that is able to autonomously fine-tune its resource consumption according to the observed system stability. When the system is stable and connectivity is not at risk, BUZZPSS autonomously changes its behavior to save resources. At the same time, it is able to detect system instability and act accordingly to guarantee that the overlay remains operational. Through an experimental evaluation, we show that BUZZPSS is able to autonomously adapt to the system stability levels, consuming up to 6x fewer resources than a static approach.
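
As an illustrative sketch of the adaptation loop the abstract describes (the class name, churn metric, and thresholds below are assumptions for illustration, not BUZZPSS's actual protocol): the service gossips aggressively while the view is churning and backs off when the overlay looks stable.

```python
# Hypothetical adaptive peer sampling loop; names and thresholds are made up.
class AdaptivePSS:
    """Gossips often under churn, backs off when the overlay is stable."""

    def __init__(self, view, min_period=1.0, max_period=30.0):
        self.view = set(view)         # current partial view of the overlay
        self.prev_view = set(view)    # view at the previous adaptation step
        self.min_period = min_period  # fastest allowed gossip period (seconds)
        self.max_period = max_period  # slowest allowed gossip period (seconds)
        self.period = max_period      # start frugal: assume stability

    def churn(self):
        # Fraction of peers that entered or left the view since last round.
        changed = self.view ^ self.prev_view
        return len(changed) / max(len(self.view | self.prev_view), 1)

    def adapt(self, threshold=0.2):
        if self.churn() > threshold:
            # Instability detected: gossip more to keep the overlay connected.
            self.period = max(self.period / 2, self.min_period)
        else:
            # Stable system: slow down to save bandwidth and CPU.
            self.period = min(self.period * 1.5, self.max_period)
        self.prev_view = set(self.view)
```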

2016

Production-guided Concurrency Debugging

Authors
Machado, N; Lucia, B; Rodrigues, L;

Publication
ACM SIGPLAN Notices

Abstract
Concurrency bugs that stem from schedule-dependent branches are hard to understand and debug, because their root causes imply not only different event orderings, but also changes in the control flow between failing and non-failing executions. We present Cortex: a system that helps expose and understand concurrency bugs that result from schedule-dependent branches, without relying on information from failing executions. Cortex preemptively exposes failing executions by perturbing the order of events and the control-flow behavior in non-failing schedules from production runs of a program. By leveraging this information from production runs, Cortex synthesizes executions to guide the search for failing schedules. Production-guided search helps cope with the large execution search space by targeting failing executions that are similar to observed non-failing executions. Evaluation on popular benchmarks shows that Cortex is able to expose failing schedules with only a few perturbations to non-failing executions, and takes a practical amount of time.
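
A rough illustration of production-guided search, under simplified assumptions (the `fails` oracle below stands in for Cortex's execution synthesis and replay machinery, and is hypothetical): starting from a non-failing schedule observed in production, flip adjacent events from different threads and test each nearby schedule for the failure.

```python
# Simplified neighbor search over schedules; not Cortex's actual algorithm.
def perturbations(schedule):
    """Yield schedules one cross-thread reordering away from the observed one."""
    # schedule: list of (thread_id, event) pairs from a non-failing run
    for i in range(len(schedule) - 1):
        a, b = schedule[i], schedule[i + 1]
        if a[0] != b[0]:  # only reorder adjacent events from different threads
            yield schedule[:i] + [b, a] + schedule[i + 2:]

def find_failing_schedule(observed, fails):
    # fails(s) -> True if replaying schedule s triggers the bug (hypothetical oracle)
    for candidate in perturbations(observed):
        if fails(candidate):
            return candidate  # a failing schedule close to a production run
    return None
```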