Publications - INESC TEC

Publications

Publications by LIAAD

2025

GASTeNv2: Generative Adversarial Stress Testing Networks with Gaussian Loss

Authors
Teixeira, C; Gomes, I; Cunha, L; Soares, C; van Rijn, JN

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2024, PT II

Abstract
As machine learning technologies are increasingly adopted, the demand for responsible AI practices to ensure transparency and accountability grows. To better understand the decision-making processes of machine learning models, GASTeN was developed to generate realistic yet ambiguous synthetic data near a classifier's decision boundary. However, the results were inconsistent, with few images in the low-confidence region and noise. Therefore, we propose a new GASTeN version with a modified architecture and a novel loss function. This new loss function incorporates a multi-objective measure with a Gaussian loss centered on the classifier probability, targeting the decision boundary. Our study found that while the original GASTeN architecture yields the highest Frechet Inception Distance (FID) scores, the updated version achieves lower Average Confusion Distance (ACD) values and consistent performance across low-confidence regions. Both architectures produce realistic and ambiguous images, but the updated one is more reliable, with no instances of GAN mode collapse. Additionally, the introduction of the Gaussian loss enhanced this architecture by allowing for adjustable tolerance in image generation around the decision boundary.
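The abstract does not give the formula for the Gaussian loss, so the following is only an illustrative sketch of one plausible form: a penalty that is zero when the classifier probability equals a target value and grows as the probability moves away from it. The function name `gaussian_confusion_loss` and the parameters `mu` and `sigma` are assumptions for illustration, not the authors' definitions.

```python
import math


def gaussian_confusion_loss(p, mu=0.5, sigma=0.1):
    """Hypothetical Gaussian-shaped penalty on a classifier probability p.

    mu is the targeted probability (0.5 is the decision boundary of a
    binary classifier); sigma sets how much deviation is tolerated.
    Returns 0 when p == mu and approaches 1 as p moves away from mu.
    """
    return 1.0 - math.exp(-((p - mu) ** 2) / (2.0 * sigma ** 2))


if __name__ == "__main__":
    # A larger sigma penalizes the same deviation less (looser tolerance).
    for sigma in (0.05, 0.1, 0.3):
        print(sigma, round(gaussian_confusion_loss(0.7, sigma=sigma), 3))
```

In this sketch, `sigma` plays the role of the adjustable tolerance around the decision boundary that the abstract mentions.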

2025

Time Series Data Augmentation as an Imbalanced Learning Problem

Authors
Cerqueira, V; Moniz, N; Inácio, R; Soares, C

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2024, PT II

Abstract
Recent state-of-the-art forecasting methods are trained on collections of time series. These methods, often referred to as global models, can capture common patterns in different time series to improve their generalization performance. However, they require large amounts of data that might not be available. Moreover, global models may fail to capture relevant patterns unique to a particular time series. In these cases, data augmentation can be useful to increase the sample size of time series datasets. The main contribution of this work is a novel method for generating univariate time series synthetic samples. Our approach stems from the insight that the observations concerning a particular time series of interest represent only a small fraction of all observations. In this context, we frame the problem of training a forecasting model as an imbalanced learning task. Oversampling strategies are popular approaches used to handle the imbalance problem in machine learning. We use these techniques to create synthetic time series observations and improve the accuracy of forecasting models. We carried out experiments using 7 different databases that contain a total of 5502 univariate time series. We found that the proposed solution outperforms both a global and a local model, thus providing a better trade-off between these two approaches.
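As a rough illustration of the idea of oversampling time series observations, the sketch below turns a series into lagged rows and creates synthetic rows by interpolating between random pairs of them. This is a simplified, SMOTE-like stand-in (it picks a random partner rather than a nearest neighbour) and is not the method proposed in the paper; `embed`, `oversample`, and their parameters are hypothetical names.

```python
import random


def embed(series, n_lags):
    """Turn a series into rows of n_lags inputs followed by one target."""
    return [series[i:i + n_lags + 1] for i in range(len(series) - n_lags)]


def oversample(rows, n_new, seed=0):
    """Create n_new synthetic rows by interpolating between random row pairs.

    Simplification: the partner row is chosen at random, whereas SMOTE
    would choose among the nearest neighbours of the first row.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(rows, 2)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic


if __name__ == "__main__":
    target_rows = embed([1.0, 2.0, 3.0, 4.0, 5.0], n_lags=2)
    print(oversample(target_rows, n_new=4))
```

The synthetic rows would then be appended to the pooled training set so that the series of interest is less of a minority.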

2025

Mast: interpretable stress testing via meta-learning for forecasting model robustness evaluation

Authors
Inácio, R; Cerqueira, V; Barandas, M; Soares, C

Publication
MACHINE LEARNING

Abstract
Evaluating and documenting the robustness of forecasting models to different input conditions is important for their responsible deployment in real-world applications. Time series forecasting models often exhibit degraded performance in the form of unusually large errors, high uncertainty, or hubris (high errors coupled with low uncertainty). Traditional stress testing approaches rely on manually designed adverse scenarios that fail to systematically identify unknown stress factors, in which data characteristics indicate potential issues. To overcome this limitation, this paper introduces MAST (Meta-learning and data Augmentation for Stress Testing), a novel method for stress testing forecasting models. MAST leverages model outputs (error scores and prediction intervals) to automatically identify and characterize input conditions that induce stress. Specifically, MAST is a binary probabilistic classifier that predicts the likelihood of forecasting model stress based on time series features. An additional contribution is a novel time series data augmentation approach based on oversampling or synthetic time series generation that improves the information about stress factors in the input space, resulting in increased stress classification performance. Experiments were conducted using 6 benchmark datasets containing a total of 97,829 time series. We demonstrate how MAST is able to identify and explain input conditions that lead to manifestations of stress, namely large errors, high uncertainty, or hubris.
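The three manifestations of stress named in the abstract can be illustrated with a small labelling sketch. The thresholds, the function name `stress_label`, and the choice to report "large_error" when both error and uncertainty are high are assumptions made for this example; the abstract does not specify how the paper derives its labels.

```python
def stress_label(error, interval_width, error_thr, width_thr):
    """Hypothetical labelling of one forecast from its error and interval width.

    error          : forecast error score for the series
    interval_width : width of the prediction interval (uncertainty proxy)
    error_thr      : assumed cut-off above which an error counts as large
    width_thr      : assumed cut-off above which uncertainty counts as high
    """
    large_error = error > error_thr
    high_uncertainty = interval_width > width_thr
    if large_error and not high_uncertainty:
        return "hubris"  # high error coupled with low uncertainty
    if large_error:
        return "large_error"  # large error, and the model was also uncertain
    if high_uncertainty:
        return "high_uncertainty"
    return "no_stress"


if __name__ == "__main__":
    print(stress_label(10.0, 1.0, error_thr=5.0, width_thr=3.0))
```

Labels of this kind could serve as the binary target (stress versus no stress) for a classifier trained on time series features.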

2025

Read-write LSTM: A Novel Approach Integrating Backpropagation to Data in LSTM

Authors
Baghoussi, Y; Soares, C; Moreira, JM

Publication
2025 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, ICDM

Abstract
Traditional recurrent neural networks operate as passive observers of data, unable to modify the information they learn from despite errors that may arise from suboptimal input representations. We introduce Read & Write LSTM (read-write LSTM), a new variant within the family of read & write machine learning (RW-ML) architectures that address this fundamental limitation by integrating input modification directly into the backpropagation process. Read-write LSTM establishes a dynamic feedback loop where input representations evolve alongside model weights through gradient transformation mechanisms. Our approach introduces a principled gradient scaling framework with an adaptive correction rate that carefully controls the extent of data modification, preserving data integrity while enhancing representational power. We comprehensively evaluate read-write LSTM against traditional LSTMs and state-of-the-art transformer models on the M4 competition and Numenta Anomaly Benchmark datasets, demonstrating significant improvements in forecasting accuracy. Notably, read-write LSTM consistently outperforms standard LSTM models in over 70% of time series with complex patterns and achieves superior performance on 55% of anomaly-rich datasets. Through extensive experimentation and analysis, we establish both the theoretical foundations and practical benefits of integrating data modification with neural computation, paving the way for a new generation of adaptive learning systems that actively reshape their inputs rather than merely adapting to them.

2025

Unveiling Fairness and Performance of Causal Discovery

Authors
Teixeira, S; Nogueira, AR; Gama, J

Publication
2025 IEEE 12TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS, DSAA

Abstract
Data-driven decision models based on Artificial Intelligence (AI) are increasingly adopted across domains. However, these models are susceptible to bias that can result in unfair or discriminatory outcomes. Recent research has explored causal discovery methods as a promising way to understand and improve fairness in decision-making systems. In this work, we investigate how different conditional independence tests used in constraint-based causal discovery algorithms, specifically the PC algorithm, affect fairness and performance. We perform an empirical evaluation on several datasets, including Portuguese public contracts, COMPAS, and the German Credit dataset. Using seven conditional independence tests, we assess model behavior under fairness (demographic parity, accuracy parity, equalized odds and predictive rate parity) and performance (accuracy, F1 score, AUC) metrics. Our findings reveal that some tests, due to their statistical properties, fail to expose unfairness detectable via causal structures, even when performance metrics appear acceptable. Furthermore, we highlight significant differences in computational efficiency among the tests, with x2-adf, sp-mi, and sp-x2 being the least efficient. This study underscores the need for careful selection of conditional independence tests in causal discovery to ensure both fairness and reliability in data-driven decision systems.
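Of the fairness metrics listed in the abstract, demographic parity is the simplest to illustrate: it compares the rate of positive predictions across groups. The sketch below computes the gap between the highest and lowest group rates; the function name and the toy data are assumptions for illustration and do not reproduce the paper's evaluation code.

```python
def demographic_parity_difference(y_pred, group):
    """Gap between the highest and lowest positive-prediction rates.

    y_pred : list of 0/1 predictions
    group  : list of group labels (same length as y_pred)
    A value of 0 means every group receives positive predictions at the
    same rate.
    """
    rates = {}
    for g in set(group):
        preds = [p for p, gi in zip(y_pred, group) if gi == g]
        rates[g] = sum(preds) / len(preds)
    return max(rates.values()) - min(rates.values())


if __name__ == "__main__":
    # Toy data: group "a" gets 2 of 4 positives, group "b" gets 1 of 4.
    y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
    group = ["a", "a", "a", "a", "b", "b", "b", "b"]
    print(demographic_parity_difference(y_pred, group))
```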

2025

Fish swarm parameter self-tuning for data streams

Authors
Veloso, B; Neto, HA; Buarque, F; Gama, J

Publication
DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
Hyper-parameter optimization in machine learning models is critical for achieving peak performance. Over the past few years, numerous researchers have worked on this optimization challenge. They primarily focused on batch learning tasks where data distributions remain relatively unchanged. However, addressing the properties of data streams poses a substantial challenge. With the rapid evolution of technology, the demand for sophisticated techniques to handle dynamic data streams is becoming increasingly urgent. This paper introduces a novel adaptation of the Fish School Search (FSS) Algorithm for online hyper-parameter optimization, the FSS-SPT. The FSS-SPT is a solution designed explicitly for the dynamic context of data streams. One fundamental property of the FSS-SPT is that it can change between exploration and exploitation modes to cope with the concept drift and converge to reasonable solutions. Our experiments on different datasets provide compelling evidence of the superior performance of our proposed methodology, the FSS-SPT. It outperformed existing algorithms in two machine learning tasks, demonstrating its potential for practical application.
