
Details

  • Name

    Carlos Manuel Soares
  • Role

    External Research Collaborator
  • Since

    1 January 2008
Publications

2026

Subgroup Discovery Using Model Uncertainty: A Feasibility Study

Authors
Cravidão Pereira, A; Folgado, D; Barandas, M; Soares, C; Carreiro, V;

Publication
Lecture Notes in Computer Science

Abstract
Subgroup discovery aims to identify interpretable segments of a dataset where model behavior deviates from global trends. Traditionally, this involves uncovering patterns among data instances with respect to a target property, such as class labels or performance metrics. For example, classification accuracy can highlight subpopulations where models perform unusually well or poorly. While effective for model auditing and failure analysis, accuracy alone provides a limited view, as it does not reflect model confidence or sources of uncertainty. This work proposes a complementary approach: subgroup discovery using model uncertainty. Rather than identifying where the model fails, we focus on where it is systematically uncertain, even when predictions are correct. Such uncertainty may arise from intrinsic data ambiguity (aleatoric) or poor data representation in training (epistemic). It can highlight areas of the input space where the model’s predictions are less robust or reliable. We evaluate the feasibility of this approach through controlled experiments on the classification of synthetic data and the Iris dataset. While our findings are exploratory and qualitative, they suggest that uncertainty-based subgroup discovery may uncover interpretable regions of interest, providing a promising direction for model auditing and analysis. © 2025 Elsevier B.V. All rights reserved.
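As a rough illustration of the idea — using predictive entropy as the subgroup-discovery target and simple single-feature threshold rules — a minimal sketch might look like this (the rule language, entropy target, and function names are assumptions for illustration, not the paper's implementation):

```python
import numpy as np

def predictive_entropy(probs):
    """Entropy of each row of predicted class probabilities (rows sum to 1)."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def best_uncertain_subgroup(X, probs, min_size=10):
    """Scan single-feature threshold rules and return the rule whose
    subgroup has the highest mean predictive entropy, i.e. the region
    where the model is most systematically uncertain."""
    H = predictive_entropy(probs)
    best_rule, best_score = None, -np.inf
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], [0.25, 0.5, 0.75]):
            for op, mask in (("<=", X[:, j] <= t), (">", X[:, j] > t)):
                if mask.sum() >= min_size and H[mask].mean() > best_score:
                    best_rule, best_score = (j, op, t), H[mask].mean()
    return best_rule, best_score
```

The rule is deliberately interpretable (one feature, one threshold), mirroring the subgroup-description languages common in subgroup discovery; richer conjunctive rules would follow the same scoring scheme.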

2026

A New Proposal of Layer Insertion in Stacked Autoencoder Neural Networks

Authors
Dos Santos Viana, F; Lopes Pereira, BV; Santos, R; Soares, C; de Almeida Neto, A;

Publication
Lecture Notes in Computer Science

Abstract
One strategy for constructing an artificial neural network with multiple hidden layers is to insert layers incrementally in stages. However, for this approach to be effective, each newly added layer must be properly aligned with the previous layers to avoid degradation of the network output and preserve the already learned knowledge. Ideally, inserting new layers should expand the network’s search space, enabling it to explore more complex representations and ultimately improve overall performance. In this work, we present a novel method for layer insertion in stacked autoencoder networks. The method developed maintains the learning obtained before the layer insertion and allows the acquisition of new knowledge; therefore, it is denoted collaborative. This approach allows this kind of neural network to evolve and learn effectively, while significantly reducing the design time. Unlike traditional methods, it addresses the common challenges associated with manually defining the number of layers and the number of neurons in each layer. By automating this aspect of network design, the proposed method promotes scalability and adaptability across tasks. The effectiveness of the approach was validated on multiple binary classification datasets using neural networks initialized with various architectures. The experimental results demonstrate that the method maintains performance while streamlining the architectural design process. © 2025 Elsevier B.V. All rights reserved.
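The key requirement the abstract states — that a newly inserted layer must not degrade the network output or the knowledge already learned — can be illustrated with an identity-initialized insertion. This is a minimal sketch assuming ReLU activations (whose non-negative outputs make `relu(I @ x) == x`), not the paper's collaborative method:

```python
import numpy as np

def forward(weights, biases, x):
    """Feed-forward pass through ReLU layers; weights[i] has shape (out, in)."""
    for W, b in zip(weights, biases):
        x = np.maximum(0.0, W @ x + b)
    return x

def insert_identity_layer(weights, biases, after):
    """Insert a hidden layer right after layer `after`, initialized to the
    identity with zero bias. Because the preceding ReLU output is
    non-negative, relu(I @ y + 0) == y, so the network's function is
    unchanged at the moment of insertion; training can then move the new
    layer away from the identity to acquire new knowledge."""
    dim = weights[after].shape[0]  # output width of the preceding layer
    new_w = weights[:after + 1] + [np.eye(dim)] + weights[after + 1:]
    new_b = biases[:after + 1] + [np.zeros(dim)] + biases[after + 1:]
    return new_w, new_b
```

Identity initialization is only one way to satisfy the "no degradation" constraint; the paper's method presumably aligns the new layer with the surrounding autoencoder weights rather than starting from the identity.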

2025

Time Series Data Augmentation as an Imbalanced Learning Problem

Authors
Cerqueira, V; Moniz, N; Inácio, R; Soares, C;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2024, PT II

Abstract
Recent state-of-the-art forecasting methods are trained on collections of time series. These methods, often referred to as global models, can capture common patterns in different time series to improve their generalization performance. However, they require large amounts of data that might not be available. Moreover, global models may fail to capture relevant patterns unique to a particular time series. In these cases, data augmentation can be useful to increase the sample size of time series datasets. The main contribution of this work is a novel method for generating univariate time series synthetic samples. Our approach stems from the insight that the observations concerning a particular time series of interest represent only a small fraction of all observations. In this context, we frame the problem of training a forecasting model as an imbalanced learning task. Oversampling strategies are popular approaches used to handle the imbalance problem in machine learning. We use these techniques to create synthetic time series observations and improve the accuracy of forecasting models. We carried out experiments using 7 different databases that contain a total of 5502 univariate time series. We found that the proposed solution outperforms both a global and a local model, thus providing a better trade-off between these two approaches.
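The framing of augmentation as imbalanced learning can be sketched by treating lagged windows of the target series as the minority class and interpolating between neighbouring windows, SMOTE-style. The windowing scheme and interpolation below are a simplified sketch under those assumptions, not the paper's exact method:

```python
import numpy as np

def make_windows(series, width):
    """Slide a window over one series: each row is `width` lags plus the
    next value, the standard auto-regressive formulation."""
    return np.array([series[i:i + width + 1]
                     for i in range(len(series) - width)])

def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create synthetic minority windows by interpolating between each
    sampled window and one of its k nearest neighbours (SMOTE-style)."""
    rng = np.random.default_rng(seed)
    d = ((minority[:, None, :] - minority[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)           # exclude self-matches
    nn = np.argsort(d, axis=1)[:, :k]     # k nearest neighbours per window
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        j = nn[i, rng.integers(k)]
        lam = rng.random()                # interpolation factor in [0, 1)
        synth.append(minority[i] + lam * (minority[j] - minority[i]))
    return np.array(synth)
```

The synthetic windows can then be appended to the training set of a global forecasting model, increasing the weight of the under-represented target series.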

2025

GASTeNv2: Generative Adversarial Stress Testing Networks with Gaussian Loss

Authors
Teixeira, C; Gomes, I; Cunha, L; Soares, C; van Rijn, JN;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2024, PT II

Abstract
As machine learning technologies are increasingly adopted, the demand for responsible AI practices to ensure transparency and accountability grows. To better understand the decision-making processes of machine learning models, GASTeN was developed to generate realistic yet ambiguous synthetic data near a classifier's decision boundary. However, the results were inconsistent, with few images in the low-confidence region and noise. Therefore, we propose a new GASTeN version with a modified architecture and a novel loss function. This new loss function incorporates a multi-objective measure with a Gaussian loss centered on the classifier probability, targeting the decision boundary. Our study found that while the original GASTeN architecture yields the highest Fréchet Inception Distance (FID) scores, the updated version achieves lower Average Confusion Distance (ACD) values and consistent performance across low-confidence regions. Both architectures produce realistic and ambiguous images, but the updated one is more reliable, with no instances of GAN mode collapse. Additionally, the introduction of the Gaussian loss enhanced this architecture by allowing for adjustable tolerance in image generation around the decision boundary.
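A Gaussian loss centered on the classifier's probability, as described, would be zero exactly at the decision boundary and grow as predictions become confident. This is a minimal sketch of that idea; the flipped-Gaussian form and the `sigma` tolerance parameter are assumptions for illustration, not GASTeN's code:

```python
import numpy as np

def gaussian_boundary_loss(probs, center=0.5, sigma=0.1):
    """Per-sample generator loss term that rewards samples whose classifier
    probability lies near `center` (the decision boundary for a binary
    classifier) and penalizes confident predictions. `sigma` controls the
    tolerance band around the boundary: larger sigma accepts a wider range
    of probabilities before the penalty saturates toward 1."""
    return 1.0 - np.exp(-((probs - center) ** 2) / (2.0 * sigma ** 2))
```

In a GAN training loop this term would be combined with the usual adversarial realism objective, matching the multi-objective measure the abstract mentions.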

2025

Meta-learning and Data Augmentation for Stress Testing Forecasting Models

Authors
Inácio, R; Cerqueira, V; Barandas, M; Soares, C;

Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS XXIII, IDA 2025

Abstract
The effectiveness of time series forecasting models can be hampered by conditions in the input space that lead them to underperform. When those are met, negative behaviours, such as higher-than-usual errors or increased uncertainty, are shown. Traditionally, stress testing is applied to assess how models respond to adverse, but plausible scenarios, providing insights on how to improve their robustness and reliability. This paper builds upon this technique by contributing a novel framework called MAST (Meta-learning and data Augmentation for Stress Testing). In particular, MAST is a meta-learning approach that predicts the probability that a given model will perform poorly on a given time series based on a set of statistical features. This way, instead of designing new stress scenarios, this method uses the information provided by instances that led to decreases in forecasting performance. An additional contribution is a novel time series data augmentation technique based on oversampling, which improves the information about stress factors in the input space and elevates the classification capabilities of the method. We conducted experiments using 6 benchmark datasets containing a total of 97,829 time series. The results suggest that MAST is able to identify conditions that lead to large errors effectively.
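The meta-learning step — predicting from statistical features whether a model will perform poorly on a series — can be sketched with an illustrative feature set and a nearest-centroid stand-in for the meta-learner. Both the feature list and the classifier are assumptions for illustration, not MAST's implementation:

```python
import numpy as np

def series_features(s):
    """A small illustrative set of statistical meta-features of one series:
    level, dispersion, roughness of first differences, lag-1 autocorrelation."""
    diffs = np.diff(s)
    return np.array([s.mean(), s.std(), diffs.std(),
                     np.corrcoef(s[:-1], s[1:])[0, 1]])

def fit_meta_classifier(feature_rows, stressed):
    """Nearest-centroid stand-in for the meta-learner: one centroid per class,
    where class 1 means the forecasting model performed poorly ('stress')."""
    F, y = np.asarray(feature_rows), np.asarray(stressed)
    return {c: F[y == c].mean(axis=0) for c in (0, 1)}

def predict_stress(centroids, features):
    """Predict 1 (likely stress) if the series' features are closer to the
    stressed-class centroid than to the unstressed one."""
    return int(np.linalg.norm(features - centroids[1])
               < np.linalg.norm(features - centroids[0]))
```

In practice any probabilistic classifier could replace the centroid rule, and the training labels would come from observed forecasting errors on held-out series, as the abstract describes.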

Supervised Theses

2024

A Framework to Interpret Multiple Related Rule-based Models

Author
Pedro Rodrigo Caetano Strecht Ribeiro

Institution
UP-FEUP

2024

Enhancing Forecasting using Read & Write Recurrent Neural Networks

Author
Yassine Baghoussi

Institution
UP-FEUP

2019

A Support System for Selecting Algorithms for Optimisation Problems

Author
Pedro Manuel Correia de Abreu

Institution
UP-FEUP

2019

Automated Feature Engineering for Classification Problems

Author
Guilherme Felipe do Nascimento Reis

Institution
UP-FEUP