2025
Authors
Brito, A; Santos, M; Folgado, D; Soares, C;
Publication
Discovery Science - 28th International Conference, DS 2025, Ljubljana, Slovenia, September 23-25, 2025, Proceedings
Abstract
Ensuring robustness in time series classification remains a critical challenge for safety-sensitive domains like clinical decision systems. While current evaluation practices focus on accuracy measures, they fail to address model stability under semantically meaningful input deformations. We propose tsMIST (Time Series Model Sensitivity Test), a novel morphing-based framework that systematically evaluates classifier resilience through controlled interpolation between adversarial class prototypes. By calculating the switchThreshold – defined as the minimal morphing distance required to flip predictions – our method reveals critical stability patterns across synthetic benchmarks with tunable class separation and 17 medical time series datasets. Key findings show convolutional architectures (ROCKET) maintain optimal thresholds near 50% morphing (48.2±3.1%), while feature-based models (Catch22) exhibit premature decision flips at 22.7% deformation (±15.4%). In clinical scenarios, tsMIST detected critical ECG misclassifications triggered by ≈12% signal variation – vulnerabilities undetected by accuracy measures. Our results establish that robustness measures must complement accuracy for responsible AI in high-stakes applications. This work advances ML evaluation practices by enabling systematic sensitivity analysis, with implications for model auditing and deployment in safety-critical domains. © 2025 Elsevier B.V., All rights reserved.
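The switchThreshold described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `switch_threshold`, the linear interpolation schedule, and the toy classifier in the usage note are all assumptions made for illustration.

```python
import numpy as np

def switch_threshold(model, x, prototype, steps=100):
    """Minimal morphing fraction at which the prediction flips.

    Linearly interpolates from x toward a prototype of another class
    and returns the smallest alpha where the predicted label changes;
    returns 1.0 if the prediction never flips along the path.
    """
    base = model.predict(x.reshape(1, -1))[0]
    for alpha in np.linspace(0.0, 1.0, steps + 1):
        morphed = (1 - alpha) * x + alpha * prototype
        if model.predict(morphed.reshape(1, -1))[0] != base:
            return float(alpha)
    return 1.0
```

With a sign-of-the-mean toy classifier, morphing from an all-negative series toward an all-positive prototype flips the prediction just past alpha = 0.5, matching the intuition that a robust decision boundary sits near 50% morphing.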
2025
Authors
Teixeira, C; Gomes, I; Soares, C; van Rijn, JN;
Publication
Discovery Science - 28th International Conference, DS 2025, Ljubljana, Slovenia, September 23-25, 2025, Proceedings
Abstract
The growing deployment of artificial intelligence in critical domains exposes a pressing challenge: how reliably models make predictions for ambiguous data without exhibiting overconfidence. We introduce hubris benchmarking, a methodology to evaluate overconfidence in machine learning models. The benchmark is based on a novel architecture, ambiguous generative adversarial networks (AmbiGANs), which are trained to synthesize realistic yet ambiguous datasets. We also propose the hubris metric to quantitatively measure the extent of model overconfidence when faced with these ambiguous images. We illustrate the usage of the methodology by estimating the hubris of state-of-the-art pre-trained models (ConvNext and ViT) on binarized versions of public datasets, including MNIST, Fashion-MNIST, and Pneumonia Chest X-ray. We found that, while ConvNext is on average 3% more accurate than ViT, it often makes excessively confident predictions, on average 10 percentage points higher than ViT. These results illustrate the usefulness of hubris benchmarking in high-stakes decision processes. © 2025 Elsevier B.V., All rights reserved.
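One plausible way to score overconfidence on ambiguous inputs is to measure how far the model's top predicted probability exceeds the uniform baseline; a calibrated model should be near-uniform on data that is ambiguous by construction. The paper's exact hubris definition may differ, so the function below (`hubris`) is only an assumed sketch of the idea.

```python
import numpy as np

def hubris(probs, n_classes=2):
    """Illustrative overconfidence score on ambiguous inputs.

    probs: (n_samples, n_classes) predicted class probabilities for
    inputs that are ambiguous by construction. Returns the mean excess
    of the top predicted probability over the uniform baseline 1/K,
    so 0.0 is perfectly hedged and higher values mean more hubris.
    """
    top = np.asarray(probs).max(axis=1)
    return float((top - 1.0 / n_classes).mean())
```

For a binary problem, a model answering [0.5, 0.5] on every ambiguous image scores 0, while one answering [0.9, 0.1] scores 0.4.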
2025
Authors
Soares, C; Azevedo, PJ; Cerqueira, V; Torgo, L;
Publication
Discovery Science - 28th International Conference, DS 2025, Ljubljana, Slovenia, September 23-25, 2025, Proceedings
Abstract
A subgroup discovery-based method has recently been proposed to understand the behavior of models in the (original) feature space. The subgroups identified represent areas of feature space where the model obtains better or worse predictive performance when compared to the average test performance. For instance, in the marketing domain, the approach extracts subgroups such as: in groups of customers with higher income and who are younger, the random forest achieves higher accuracy than on average. Here, we propose a complementary method, Meta Subspace Analysis (MSA). MSA uses metalearning to analyze these subgroups in the metafeature space. We use association rules to relate metafeatures of the feature space represented by the subgroups to the improvement or degradation of the performance of models. For instance, in the same domain, the approach extracts rules such as: when the class entropy decreases and the mutual information increases in the subgroup data, the random forest achieves lower accuracy. While the subgroups in the original feature space are useful for the end user and the data scientist developing the corresponding model, the meta-level rules provide a domain-independent perspective on the behavior of the model that is suitable not only for the same data scientist but also for ML researchers, to understand the behavior of algorithms. We illustrate the approach with the results of two well-known algorithms, naive Bayes and random forest, on the Adult dataset. The results confirm some expected behavior of algorithms. However, and most interestingly, some unexpected behaviors are also obtained, requiring additional investigation. In general, the empirical study demonstrates the usefulness of the approach to obtain additional knowledge about the behavior of models. © 2025 Elsevier B.V., All rights reserved.
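The meta-level rules above compare metafeatures of a subgroup against the full data, for example whether the class entropy decreases inside the subgroup. A minimal sketch of that comparison is shown below; the helper names `class_entropy` and `entropy_shift` are assumptions, not the paper's API.

```python
import numpy as np

def class_entropy(y):
    """Shannon entropy (bits) of the class distribution of labels y."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def entropy_shift(y_subgroup, y_all):
    """Signed change in class entropy from the full data to the subgroup.

    A negative value means the subgroup is purer than the overall data,
    the kind of condition the meta-level association rules refer to.
    """
    return class_entropy(y_subgroup) - class_entropy(y_all)
```

On a balanced binary sample, a subgroup dominated by one class yields a negative shift, i.e. "class entropy decreases in the subgroup data".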
2025
Authors
dos Santos, MR; Cerqueira, V; Soares, C;
Publication
Progress in Artificial Intelligence - 24th EPIA Conference on Artificial Intelligence, EPIA 2025, Faro, Portugal, October 1-3, 2025, Proceedings, Part I
Abstract
Effective selection of forecasting algorithms for time series data is a challenge in machine learning, impacting both predictive accuracy and efficiency. Metalearning, using features extracted from time series, offers a strategic approach to optimize algorithm selection. The utility of this approach depends on the amount of information the features contain about the behavior of the algorithms. Although there are several methods for systematic time series feature extraction, they have never been compared. This paper empirically analyzes the performance of each feature extraction method for algorithm selection and its impact on forecasting accuracy. Our study reveals that TSFRESH, TSFEATURES, and TSFEL exhibit comparable algorithm selection accuracy, adeptly capturing time series characteristics essential for accurate algorithm selection. In contrast, Catch22 is found to be less effective for this purpose. In particular, TSFEL is identified as the most efficient method, balancing dimensionality and predictive performance. These findings provide insights for enhancing forecasting accuracy and efficiency through judicious selection of meta-feature extractors. © 2025 Elsevier B.V., All rights reserved.
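The metalearning pipeline described here maps extracted time series features to the best-performing forecaster. The sketch below uses three hand-rolled features as a stand-in for TSFEL/TSFRESH output and a 1-nearest-neighbour metalearner; the names `meta_features` and `select_algorithm`, the feature set, and the learner choice are all illustrative assumptions.

```python
import numpy as np

def meta_features(series):
    """Tiny hand-rolled meta-feature vector (stand-in for TSFEL/TSFRESH):
    mean, standard deviation, and lag-1 autocorrelation of the series."""
    s = np.asarray(series, dtype=float)
    lag1 = np.corrcoef(s[:-1], s[1:])[0, 1]
    return np.array([s.mean(), s.std(), lag1])

def select_algorithm(train_feats, train_best, new_series):
    """1-NN metalearner: recommend the algorithm that performed best on
    the training series whose meta-features are closest to the new one."""
    f = meta_features(new_series)
    d = np.linalg.norm(train_feats - f, axis=1)
    return train_best[int(d.argmin())]
```

A new trending series is matched to the trending training series rather than the noisy one, so it inherits that series' best forecaster.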
2025
Authors
Inácio, R; Cerqueira, V; Barandas, M; Soares, C;
Publication
Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track and Demo Track - European Conference, ECML PKDD 2025, Porto, Portugal, September 15-19, 2025, Proceedings, Part X
Abstract
2025
Authors
Pereira, AC; Folgado, D; Barandas, M; Soares, C; Carreiro, AV;
Publication
Progress in Artificial Intelligence - 24th EPIA Conference on Artificial Intelligence, EPIA 2025, Faro, Portugal, October 1-3, 2025, Proceedings, Part I
Abstract
Subgroup discovery aims to identify interpretable segments of a dataset where model behavior deviates from global trends. Traditionally, this involves uncovering patterns among data instances with respect to a target property, such as class labels or performance metrics. For example, classification accuracy can highlight subpopulations where models perform unusually well or poorly. While effective for model auditing and failure analysis, accuracy alone provides a limited view, as it does not reflect model confidence or sources of uncertainty. This work proposes a complementary approach: subgroup discovery using model uncertainty. Rather than identifying where the model fails, we focus on where it is systematically uncertain, even when predictions are correct. Such uncertainty may arise from intrinsic data ambiguity (aleatoric) or poor data representation in training (epistemic). It can highlight areas of the input space where the model’s predictions are less robust or reliable. We evaluate the feasibility of this approach through controlled experiments on the classification of synthetic data and the Iris dataset. While our findings are exploratory and qualitative, they suggest that uncertainty-based subgroup discovery may uncover interpretable regions of interest, providing a promising direction for model auditing and analysis. © 2025 Elsevier B.V., All rights reserved.
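Uncertainty-based subgroup discovery, as described above, searches for interpretable regions where the model is systematically uncertain rather than simply wrong. A minimal sketch on a single feature is given below; the predictive-entropy uncertainty measure, the exhaustive threshold scan, and the function names are assumptions for illustration, not the paper's method.

```python
import numpy as np

def predictive_entropy(probs):
    """Per-instance entropy (bits) of predicted class probabilities,
    used here as a simple proxy for model uncertainty."""
    p = np.clip(np.asarray(probs), 1e-12, 1.0)
    return -(p * np.log2(p)).sum(axis=1)

def most_uncertain_split(feature, probs):
    """Scan threshold splits on one feature and return (score, threshold,
    side) for the subgroup with the highest mean predictive entropy."""
    h = predictive_entropy(probs)
    best = None
    for t in np.unique(feature)[:-1]:
        for side, mask in (("<=", feature <= t), (">", feature > t)):
            score = h[mask].mean()
            if best is None or score > best[0]:
                best = (score, t, side)
    return best
```

When the model is confident on low feature values and maximally uncertain on high ones, the scan recovers the interpretable description "feature > 2" as the most uncertain subgroup.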