
Publications by Carlos Manuel Soares

2025

MAST: Interpretable Stress Testing via Meta-learning for Forecasting Model Robustness Evaluation

Authors
Inácio, R; Cerqueira, V; Barandas, M; Soares, C;

Publication
Machine Learning

Abstract

2025

SPATA: Systematic Pattern Analysis for Detailed and Transparent Data Cards

Authors
Vitorino, J; Maia, E; Praça, I; Soares, C;

Publication
CoRR

Abstract

2025

Evaluating Transfer Learning Methods on Real-World Data Streams: A Case Study in Financial Fraud Detection

Authors
Pereira, RR; Bono, J; Ferreira, HM; Ribeiro, P; Soares, C; Bizarro, P;

Publication
Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track - European Conference, ECML PKDD 2025, Porto, Portugal, September 15-19, 2025, Proceedings, Part IX

Abstract
When the available data for a target domain is limited, transfer learning (TL) methods leverage related data-rich source domains to train and evaluate models before deploying them on the target domain. However, most TL methods assume fixed levels of labeled and unlabeled target data, which contrasts with real-world scenarios where both data and labels arrive progressively over time. As a result, evaluations based on these static assumptions may not reflect how methods perform in practice. To support a more realistic assessment of TL methods in dynamic settings, we propose an evaluation framework that (1) simulates varying data availability over time, (2) creates multiple domains by resampling a given dataset, and (3) introduces inter-domain variability through controlled transformations, e.g., time-dependent covariate and concept shifts. These capabilities enable the systematic simulation of a large number of experimental variants, providing deeper insights into how algorithms may behave when deployed. We demonstrate the usefulness of the proposed framework with a case study on a proprietary real-world suite of card payment datasets. To support reproducibility, we also apply the framework to the publicly available Bank Account Fraud (BAF) dataset. By providing a methodology for evaluating TL methods over time and under different data availability conditions, our framework supports a better understanding of model behavior in real-world environments, which enables more informed decisions when deploying models in new domains.
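
The evaluation loop the abstract describes can be illustrated concretely. The following is a minimal sketch under simplifying assumptions, not the authors' implementation: it resamples one dataset into domains, applies a controlled covariate shift, and retrains as labeled target data grows over time. The names `make_domains` and `covariate_shift` are hypothetical, and synthetic data stands in for the card-payment and BAF datasets.

```python
# Sketch of the framework's three capabilities: progressive data
# availability, domains via resampling, and controlled inter-domain shift.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def make_domains(X, y, n_domains=3):
    """Create multiple 'domains' by resampling rows of one dataset."""
    idx = [rng.choice(len(X), size=len(X) // n_domains, replace=False)
           for _ in range(n_domains)]
    return [(X[i], y[i]) for i in idx]

def covariate_shift(X, scale=1.5):
    """Inter-domain variability via a controlled feature rescaling."""
    return X * scale

# Synthetic stand-in data (the paper uses card-payment and BAF datasets).
X = rng.normal(size=(3000, 10))
y = (X[:, 0] + rng.normal(scale=0.5, size=3000) > 0).astype(int)

(Xs, ys), (Xt, yt), _ = make_domains(X, y)
Xt = covariate_shift(Xt)  # target domain drifts away from the source
half = len(Xt) // 2       # first half: labeling pool; second half: test set

# Simulate target labels arriving progressively over time:
for frac in (0.01, 0.05, 0.25, 1.0):
    n = max(int(frac * half), 10)          # labeled target budget so far
    X_train = np.vstack([Xs, Xt[:n]])      # source + labeled target data
    y_train = np.concatenate([ys, yt[:n]])
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(X_train, y_train)
    auc = roc_auc_score(yt[half:], model.predict_proba(Xt[half:])[:, 1])
    print(f"labeled target fraction {frac:.2f}: AUC {auc:.3f}")
```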

2025

tsMIST: Model Sensitivity Analysis with Time Series Morphing

Authors
Brito, A; Santos, M; Folgado, D; Soares, C;

Publication
Discovery Science - 28th International Conference, DS 2025, Ljubljana, Slovenia, September 23-25, 2025, Proceedings

Abstract
Ensuring robustness in time series classification remains a critical challenge for safety-sensitive domains like clinical decision systems. While current evaluation practices focus on accuracy measures, they fail to address model stability under semantically meaningful input deformations. We propose tsMIST (Time Series Model Sensitivity Test), a novel morphing-based framework that systematically evaluates classifier resilience through controlled interpolation between adversarial class prototypes. By calculating the switchThreshold, defined as the minimal morphing distance required to flip predictions, our method reveals critical stability patterns across synthetic benchmarks with tunable class separation and 17 medical time series datasets. Key findings show that convolutional architectures (ROCKET) maintain optimal thresholds near 50% morphing (48.2±3.1%), while feature-based models (Catch22) exhibit premature decision flips at 22.7% deformation (±15.4%). In clinical scenarios, tsMIST detected critical ECG misclassifications triggered by ≤12% signal variation, vulnerabilities that accuracy measures leave undetected. Our results establish that robustness measures must complement accuracy for responsible AI in high-stakes applications. This work advances ML evaluation practices by enabling systematic sensitivity analysis, with implications for model auditing and deployment in safety-critical domains.
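
A minimal sketch of the switchThreshold idea follows, under simplifying assumptions: a linear classifier on synthetic series, class mean series as prototypes, and linear interpolation as the morphing operation. The paper's actual morphing procedure and classifiers may differ.

```python
# Morph a series from one class prototype toward the other and record the
# smallest morphing fraction at which the model's prediction flips.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Two synthetic classes of series: flat noise vs. a sine wave plus noise.
t = np.linspace(0, 2 * np.pi, 100)
X0 = rng.normal(scale=0.3, size=(50, 100))
X1 = np.sin(t) + rng.normal(scale=0.3, size=(50, 100))
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)

clf = LogisticRegression(max_iter=1000).fit(X, y)

# Class prototypes: the mean series of each class.
proto0, proto1 = X0.mean(axis=0), X1.mean(axis=0)

def switch_threshold(model, source, target, steps=200):
    """Smallest alpha in [0, 1] at which the prediction on the morphed
    series (1 - alpha) * source + alpha * target flips."""
    start = model.predict(source[None, :])[0]
    for alpha in np.linspace(0, 1, steps):
        morphed = (1 - alpha) * source + alpha * target
        if model.predict(morphed[None, :])[0] != start:
            return alpha
    return 1.0  # never flipped along the morphing path

print(f"switchThreshold: {switch_threshold(clf, proto0, proto1):.2f}")
```

A robust model should hold its prediction until roughly the midpoint of the morph; a threshold far below 50% signals the kind of premature decision flip the abstract reports for feature-based models.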

2025

Hubris Benchmarking with AmbiGANs: Assessing Model Overconfidence with Synthetic Ambiguous Data

Authors
Teixeira, C; Gomes, I; Soares, C; van Rijn, JN;

Publication
Discovery Science - 28th International Conference, DS 2025, Ljubljana, Slovenia, September 23-25, 2025, Proceedings

Abstract
The growing deployment of artificial intelligence in critical domains exposes a pressing challenge: how reliably models make predictions for ambiguous data without exhibiting overconfidence. We introduce hubris benchmarking, a methodology to evaluate overconfidence in machine learning models. The benchmark is based on a novel architecture, ambiguous generative adversarial networks (AmbiGANs), which are trained to synthesize realistic yet ambiguous datasets. We also propose the hubris metric to quantitatively measure the extent of model overconfidence when faced with these ambiguous images. We illustrate the usage of the methodology by estimating the hubris of state-of-the-art pre-trained models (ConvNeXt and ViT) on binarized versions of public datasets, including MNIST, Fashion-MNIST, and Pneumonia Chest X-ray. We found that, while ConvNeXt is on average 3% more accurate than ViT, it often makes excessively confident predictions, on average 10 percentage points higher than ViT. These results illustrate the usefulness of hubris benchmarking in high-stakes decision processes.
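
The abstract does not give the formula for the hubris metric, so the sketch below is only one plausible reading of an overconfidence score in its spirit: on genuinely ambiguous binary inputs, a well-calibrated model should output probabilities near 0.5, and the score measures average excess confidence beyond that.

```python
# Hypothetical overconfidence score on ambiguous inputs; the paper's
# actual hubris metric may be defined differently.
import numpy as np

def hubris_score(probs):
    """probs: (n, 2) predicted class probabilities on ambiguous inputs.
    Returns the mean excess confidence over the 0.5 an ideal model
    would assign on truly ambiguous data, rescaled to [0, 1]."""
    probs = np.asarray(probs)
    top = probs.max(axis=1)  # confidence in the predicted class
    return float(np.mean((top - 0.5) / 0.5))

# A model that is very sure about inputs that are ambiguous by
# construction scores high; a hesitant one scores near zero.
confident = np.array([[0.97, 0.03], [0.05, 0.95], [0.99, 0.01]])
hesitant = np.array([[0.55, 0.45], [0.48, 0.52], [0.60, 0.40]])
print(hubris_score(confident))  # ~0.94
print(hubris_score(hesitant))   # ~0.11
```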

2025

Meta Subspace Analysis: Understanding Model (Mis)behavior in the Metafeature Space

Authors
Soares, C; Azevedo, PJ; Cerqueira, V; Torgo, L;

Publication
Discovery Science - 28th International Conference, DS 2025, Ljubljana, Slovenia, September 23-25, 2025, Proceedings

Abstract
A subgroup discovery-based method has recently been proposed to understand the behavior of models in the (original) feature space. The subgroups identified represent areas of the feature space where the model obtains better or worse predictive performance than on average on the test data. For instance, in the marketing domain, the approach extracts subgroups such as: in groups of customers with higher income and who are younger, the random forest achieves higher accuracy than on average. Here, we propose a complementary method, Meta Subspace Analysis (MSA), which uses metalearning to analyze these subgroups in the metafeature space. We use association rules to relate metafeatures of the feature space regions represented by the subgroups to the improvement or degradation of model performance. For instance, in the same domain, the approach extracts rules such as: when the class entropy decreases and the mutual information increases in the subgroup data, the random forest achieves lower accuracy. While the subgroups in the original feature space are useful for the end user and the data scientist developing the corresponding model, the meta-level rules provide a domain-independent perspective on model behavior that is useful not only to the same data scientist but also to ML researchers seeking to understand the behavior of algorithms. We illustrate the approach with the results of two well-known algorithms, naive Bayes and random forest, on the Adult dataset. The results confirm some expected behaviors of the algorithms. Most interestingly, however, some unexpected behaviors are also observed, requiring additional investigation. Overall, the empirical study demonstrates the usefulness of the approach for obtaining additional knowledge about the behavior of models.
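
A minimal sketch of the raw material MSA reasons over: metafeatures (class entropy and mean feature-class mutual information) computed on a subgroup and on the full data, alongside the model's accuracy gap. Association rules mined over many such records would then relate metafeature changes to performance changes. The subgroup rule and dataset below are hypothetical stand-ins for the subgroup-discovery output and the Adult data.

```python
# Compare metafeatures and accuracy between a subgroup and the full test
# data; pairs like (metafeature delta, accuracy delta) feed rule mining.
import numpy as np
from scipy.stats import entropy
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X[:1000], y[:1000])
X_test, y_test = X[1000:], y[1000:]

def metafeatures(X, y):
    counts = np.bincount(y, minlength=2)
    return {
        "class_entropy": entropy(counts / counts.sum(), base=2),
        "mean_mutual_info": mutual_info_classif(X, y, random_state=0).mean(),
    }

# A hypothetical subgroup, e.g. "feature 0 above its median".
mask = X_test[:, 0] > np.median(X_test[:, 0])

overall = metafeatures(X_test, y_test)
subgroup = metafeatures(X_test[mask], y_test[mask])
acc_all = model.score(X_test, y_test)
acc_sub = model.score(X_test[mask], y_test[mask])

for key in overall:
    print(f"{key}: overall {overall[key]:.3f} -> subgroup {subgroup[key]:.3f}")
print(f"accuracy: overall {acc_all:.3f} -> subgroup {acc_sub:.3f}")
```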
