Publications

Publications by João Gama

2023

Fault Forecasting Using Data-Driven Modeling: A Case Study for Metro do Porto Data Set

Authors
Davari, N; Veloso, B; Ribeiro, RP; Gama, J;

Publication
MACHINE LEARNING AND PRINCIPLES AND PRACTICE OF KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT II

Abstract
Demand for high-performance solutions for anomaly detection and fault forecasting is increasing in industry. Detecting and forecasting faults from time-series data is a critical task in Internet of Things (IoT) data mining. Classical fault detection approaches based on physical modelling are limited to a few measurable output variables, and accurate physical modelling of vehicle dynamics requires substantial prior information about the system. Data-driven modelling techniques, on the other hand, accurately represent the system's dynamics from collected data. Experimental results on large-scale data sets from Metro do Porto subsystems show that our method delivers high-quality fault detection and forecasting. In addition, a health indicator obtained from the principal component analysis of the forecasting solution is used to predict the remaining useful life.

2026

A two-stage framework for early failure detection in predictive maintenance: A case study on metro trains

Authors
Toribio, L; Veloso, B; Gama, J; Zafra, A;

Publication
NEUROCOMPUTING

Abstract
Early fault detection remains a critical challenge in predictive maintenance (PdM), particularly within critical infrastructure, where undetected failures or delayed interventions can compromise safety and disrupt operations. Traditional anomaly detection methods are typically reactive, relying on real-time sensor data to identify deviations as they occur. This reactive nature often provides insufficient lead time for effective maintenance planning. To address this limitation, we propose a novel two-stage early detection framework that integrates time series forecasting with anomaly detection to anticipate equipment failures several hours in advance. In the first stage, future sensor signal values are predicted using forecasting models; in the second, conventional anomaly detection algorithms are applied directly to the forecasted data. By shifting from real-time to anticipatory detection, the framework aims to deliver actionable early warnings, enabling timely and preventive maintenance. We validate this approach through a case study focused on metro train systems, an environment where early fault detection is crucial for minimizing service disruptions, optimizing maintenance schedules, and ensuring passenger safety. The framework is evaluated across three forecast horizons (1, 3, and 6 hours ahead) using twelve state-of-the-art anomaly detection algorithms from diverse methodological families. Detection performance is assessed using five performance metrics. Results show that anomaly detection remains highly effective at short to medium horizons, with performance at 1-hour and 3-hour forecasts comparable to that of real-time data. Ensemble and deep learning models exhibit strong robustness to forecast uncertainty, maintaining consistent results with real-time data even at 6-hour forecasts. 
In contrast, distance- and density-based models suffer substantial degradation at the longest horizon (6 hours), reflecting their sensitivity to distributional shifts in predicted signals. Overall, the proposed framework offers a practical and extensible solution for enhancing traditional PdM systems with proactive capabilities. By enabling early anomaly detection on forecasted data, it supports improved decision-making, operational resilience, and maintenance planning in industrial environments.
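The two-stage idea described in this abstract (forecast the sensor signal first, then run anomaly detection on the forecast) can be illustrated with a minimal sketch. This is not the authors' implementation: the drift-based forecaster, the z-score detector, the function names, and the sensor readings are all hypothetical stand-ins chosen to keep the example self-contained.

```python
from statistics import mean, stdev


def forecast_with_drift(history, horizon):
    """Stage 1 (stand-in forecaster): extend the series by its average recent step."""
    steps = [b - a for a, b in zip(history[:-1], history[1:])]
    drift = mean(steps[-5:])  # average of the last five increments
    return [history[-1] + drift * (h + 1) for h in range(horizon)]


def zscore_anomalies(reference, values, threshold=3.0):
    """Stage 2 (stand-in detector): flag values far from the reference distribution."""
    mu, sigma = mean(reference), stdev(reference)
    return [abs(v - mu) / sigma > threshold for v in values]


# Hypothetical sensor readings: stable at first, then drifting upwards.
normal = [10.0, 10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 10.0]
recent = normal + [10.5, 11.0, 11.6, 12.1, 12.7]

# Forecast three steps ahead, then detect anomalies on the forecast itself.
forecast = forecast_with_drift(recent, horizon=3)
flags = zscore_anomalies(normal, forecast)
print(forecast, flags)
```

Because the detector only sees forecasted values, any warning it raises comes before the corresponding readings are observed, which is the source of the lead time the abstract refers to.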

2025

Unveiling Fairness and Performance of Causal Discovery

Authors
Teixeira, S; Nogueira, AR; Gama, J;

Publication
DSAA

Abstract
Data-driven decision models based on Artificial Intelligence (AI) are increasingly adopted across domains. However, these models are susceptible to bias that can result in unfair or discriminatory outcomes. Recent research has explored causal discovery methods as a promising way to understand and improve fairness in decision-making systems. In this work, we investigate how different conditional independence tests used in constraint-based causal discovery algorithms, specifically the PC algorithm, affect fairness and performance. We perform an empirical evaluation on several datasets, including Portuguese public contracts, COMPAS, and the German Credit dataset. Using seven conditional independence tests, we assess model behavior under fairness (demographic parity, accuracy parity, equalized odds and predictive rate parity) and performance (accuracy, F1-score, AUC) metrics. Our findings reveal that some tests, due to their statistical properties, fail to expose unfairness detectable via causal structures, even when performance metrics appear acceptable. Furthermore, we highlight significant differences in computational efficiency among the tests, with x2-Adf, sp-mi, and sp-x2 being the least efficient. This study underscores the need for careful selection of conditional independence tests in causal discovery to ensure both fairness and reliability in data-driven decision systems. © 2025 IEEE.
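Two of the fairness metrics named in this abstract, demographic parity and equalized odds, can be computed as in the sketch below. It is an illustration only: the function names and the toy labels are hypothetical, the protected attribute is assumed to be binary (0/1), and equalized odds is summarised here as the larger of the true-positive-rate and false-positive-rate gaps, which is one common convention rather than necessarily the one used in the paper.

```python
def rate(flags):
    """Fraction of positive entries; 0.0 for an empty list."""
    return sum(flags) / len(flags) if flags else 0.0


def demographic_parity_diff(y_pred, group):
    """Absolute gap in positive-prediction rate between the two groups."""
    g0 = [p for p, g in zip(y_pred, group) if g == 0]
    g1 = [p for p, g in zip(y_pred, group) if g == 1]
    return abs(rate(g0) - rate(g1))


def equalized_odds_diff(y_true, y_pred, group):
    """Larger of the TPR gap and the FPR gap between the two groups."""
    def tpr_fpr(g_val):
        rows = [(t, p) for t, p, g in zip(y_true, y_pred, group) if g == g_val]
        tpr = rate([p for t, p in rows if t == 1])
        fpr = rate([p for t, p in rows if t == 0])
        return tpr, fpr

    tpr0, fpr0 = tpr_fpr(0)
    tpr1, fpr1 = tpr_fpr(1)
    return max(abs(tpr0 - tpr1), abs(fpr0 - fpr1))


# Hypothetical labels, predictions, and binary protected attribute.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]

dp = demographic_parity_diff(y_pred, group)
eo = equalized_odds_diff(y_true, y_pred, group)
print(dp, eo)
```

A value of 0 for either metric means the two groups are treated identically under that criterion; larger values indicate a larger disparity.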

2021

Progress in Artificial Intelligence

Authors
Eugénio Oliveira; João Gama; Zita Vale; Henrique Lopes Cardoso;

Publication

Abstract

2025

Network-based Anomaly Detection in Waste Transportation Data with Limited Supervision

Authors
Shaji, N; Tabassum, S; Ribeiro, RP; Gama, J; Gorgulho, J; Garcia, A; Santana, P;

Publication
APPLIED NETWORK SCIENCE

Abstract
Detecting anomalies in waste transportation networks is vital for uncovering illegal or unsafe activities that can have serious environmental and regulatory consequences. Identifying anomalies in such networks presents a significant challenge due to the limited availability of labeled data and the subtle nature of illicit activities. Moreover, traditional anomaly detection methods relying solely on individual transaction data may overlook deeper, network-level irregularities that arise from complex interactions between entities, especially in the absence of labeled data. This study explores anomaly detection in a waste transport network using unsupervised learning, enhanced by limited supervision and enriched with network structure information. Initially, unsupervised models such as Isolation Forest, K-Means, LOF, and Autoencoders were applied using statistical and graph-based features. These models detected outliers without prior labels. Later, information on a few confirmed anomalous users enabled weak supervision, guiding feature selection through statistical tests such as Kolmogorov-Smirnov and Anderson-Darling. Results show that models trained on a reduced, graph-focused feature set improved anomaly detection, particularly under extreme class imbalance. Isolation Forest notably ranked known anomalies highly. Ego network visualizations supported these findings, demonstrating the value of integrating structural features and limited labels for identifying subtle, relational anomalies.
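The weakly supervised feature selection mentioned in this abstract can be sketched by ranking features by how differently they are distributed among the few confirmed anomalous users and everyone else. The example below computes only the two-sample Kolmogorov-Smirnov statistic (no p-value, and no Anderson-Darling test); the feature names, the data, and the ranking rule are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)

    def ecdf(sorted_vals, x):
        # Fraction of observations less than or equal to x.
        return bisect_right(sorted_vals, x) / len(sorted_vals)

    points = sorted(set(a) | set(b))
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)


def rank_features(normal_rows, anomalous_rows, names):
    """Order features by KS statistic, most discriminative first."""
    scores = {
        name: ks_statistic([r[j] for r in normal_rows],
                           [r[j] for r in anomalous_rows])
        for j, name in enumerate(names)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical per-user features: one statistical, one graph-based.
names = ["volume", "degree"]
normal_rows = [(5.0, 2), (6.0, 3), (5.5, 2), (6.5, 3), (5.2, 2)]
anomalous_rows = [(5.8, 9), (6.1, 11), (5.4, 10)]

ranking = rank_features(normal_rows, anomalous_rows, names)
print(ranking)
```

In this toy data the graph-based feature separates the two groups completely, so it ranks first; a reduced feature set would keep the top-ranked features before retraining the unsupervised detectors.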

2025

Fish swarm parameter self-tuning for data streams

Authors
Veloso, B; Neto, HA; Buarque, F; Gama, J;

Publication
DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
Hyper-parameter optimization in machine learning models is critical for achieving peak performance. Over the past few years, numerous researchers have worked on this optimization challenge, focusing primarily on batch learning tasks where data distributions remain relatively unchanged. However, addressing the properties of data streams poses a substantial challenge. With the rapid evolution of technology, the demand for sophisticated techniques to handle dynamic data streams is becoming increasingly urgent. This paper introduces FSS-SPT, a novel adaptation of the Fish School Search (FSS) algorithm for online hyper-parameter optimization, designed explicitly for the dynamic context of data streams. One fundamental property of FSS-SPT is that it can switch between exploration and exploitation modes to cope with concept drift and converge to reasonable solutions. Our experiments on different datasets provide compelling evidence of the superior performance of the proposed methodology: FSS-SPT outperformed existing algorithms in two machine learning tasks, demonstrating its potential for practical application.
