Publications

Publications by LIAAD

2025

tsMIST: Model Sensitivity Analysis with Time Series Morphing

Authors
Brito, A; Santos, M; Folgado, D; Soares, C;

Publication
Discovery Science - 28th International Conference, DS 2025, Ljubljana, Slovenia, September 23-25, 2025, Proceedings

Abstract
Ensuring robustness in time series classification remains a critical challenge for safety-sensitive domains like clinical decision systems. While current evaluation practices focus on accuracy measures, they fail to address model stability under semantically meaningful input deformations. We propose tsMIST (Time Series Model Sensitivity Test), a novel morphing-based framework that systematically evaluates classifier resilience through controlled interpolation between adversarial class prototypes. By calculating the switchThreshold – defined as the minimal morphing distance required to flip predictions – our method reveals critical stability patterns across synthetic benchmarks with tunable class separation and 17 medical time series datasets. Key findings show convolutional architectures (ROCKET) maintain optimal thresholds near 50% morphing (48.2±3.1%), while feature-based models (Catch22) exhibit premature decision flips at 22.7% deformation (±15.4%). In clinical scenarios, tsMIST detected critical ECG misclassifications triggered by =12% signal variation – vulnerabilities undetected by accuracy measures. Our results establish that robustness measures must complement accuracy for responsible AI in high-stakes applications. This work advances ML evaluation practices by enabling systematic sensitivity analysis, with implications for model auditing and deployment in safety-critical domains. © 2025 Elsevier B.V., All rights reserved.
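
As an illustration of the switchThreshold idea described in this abstract, the following minimal sketch morphs one class prototype into another and reports the smallest morphing fraction at which a classifier changes its prediction. It is not the authors' implementation: the linear interpolation, the grid of 101 steps, the function names `switch_threshold` and `make_nearest_prototype`, and the toy nearest-prototype classifier are all assumptions made for this example.

```python
import numpy as np


def switch_threshold(predict, source, target, steps=101):
    """Return the smallest morphing fraction alpha in [0, 1] at which
    predict() stops returning the label it assigns to `source`,
    or None if the label never changes along the path."""
    base_label = predict(source)
    for alpha in np.linspace(0.0, 1.0, steps):
        # Linear morph between the two series (an assumption of this sketch).
        morphed = (1.0 - alpha) * source + alpha * target
        if predict(morphed) != base_label:
            return float(alpha)
    return None


def make_nearest_prototype(prototypes):
    """Hypothetical stand-in classifier: label = index of the closest prototype."""
    def predict(series):
        distances = [np.linalg.norm(series - p) for p in prototypes]
        return int(np.argmin(distances))
    return predict


# Two toy prototypes: a flat series at 0 and a flat series at 1.
proto_a = np.zeros(50)
proto_b = np.ones(50)
predict = make_nearest_prototype([proto_a, proto_b])

threshold = switch_threshold(predict, proto_a, proto_b)
print(f"prediction flips at about {threshold:.0%} morphing")
```

For this symmetric toy classifier the flip happens just past the midpoint of the morph. A threshold far from the midpoint would be the kind of early or late decision flip that the abstract reports for some models.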


2025

Hubris Benchmarking with AmbiGANs: Assessing Model Overconfidence with Synthetic Ambiguous Data

Authors
Teixeira, C; Gomes, I; Soares, C; van Rijn, JN;

Publication
Discovery Science - 28th International Conference, DS 2025, Ljubljana, Slovenia, September 23-25, 2025, Proceedings

Abstract
The growing deployment of artificial intelligence in critical domains exposes a pressing challenge: how reliably models make predictions for ambiguous data without exhibiting overconfidence. We introduce hubris benchmarking, a methodology to evaluate overconfidence in machine learning models. The benchmark is based on a novel architecture, ambiguous generative adversarial networks (AmbiGANs), which are trained to synthesize realistic yet ambiguous datasets. We also propose the hubris metric to quantitatively measure the extent of model overconfidence when faced with these ambiguous images. We illustrate the usage of the methodology by estimating the hubris of state-of-the-art pre-trained models (ConvNext and ViT) on binarized versions of public datasets, including MNIST, Fashion-MNIST, and Pneumonia Chest X-ray. We found that, while ConvNext is on average 3% more accurate than ViT, it often makes excessively confident predictions, on average 10 percentage points higher than ViT. These results illustrate the usefulness of hubris benchmarking in high-stakes decision processes. © 2025 Elsevier B.V., All rights reserved.


2025

Meta Subspace Analysis: Understanding Model (Mis)behavior in the Metafeature Space

Authors
Soares, C; Azevedo, PJ; Cerqueira, V; Torgo, L;

Publication
Discovery Science - 28th International Conference, DS 2025, Ljubljana, Slovenia, September 23-25, 2025, Proceedings

Abstract
A subgroup discovery-based method has recently been proposed to understand the behavior of models in the (original) feature space. The subgroups identified represent areas of feature space where the model obtains better or worse predictive performance when compared to the average test performance. For instance, in the marketing domain, the approach extracts subgroups such as: in groups of customers with higher income and who are younger, the random forest achieves higher accuracy than on average. Here, we propose a complementary method, Meta Subspace Analysis (MSA), which uses metalearning to analyze these subgroups in the metafeature space. We use association rules to relate metafeatures of the feature space represented by the subgroups to the improvement or degradation of the performance of models. For instance, in the same domain, the approach extracts rules such as: when the class entropy decreases and the mutual information increases in the subgroup data, the random forest achieves lower accuracy. While the subgroups in the original feature space are useful for the end user and the data scientist developing the corresponding model, the meta-level rules provide a domain-independent perspective on the behavior of the model that is suitable for the same data scientist but also for ML researchers, to understand the behavior of algorithms. We illustrate the approach with the results of two well-known algorithms, naive Bayes and random forest, on the Adult dataset. The results confirm some expected behavior of algorithms. However, and most interestingly, some unexpected behaviors are also observed, requiring additional investigation. In general, the empirical study demonstrates the usefulness of the approach to obtain additional knowledge about the behavior of models. © 2025 Elsevier B.V., All rights reserved.


2025

Benchmarking Time Series Feature Extraction for Algorithm Selection

Authors
dos Santos, MR; Cerqueira, V; Soares, C;

Publication
Progress in Artificial Intelligence - 24th EPIA Conference on Artificial Intelligence, EPIA 2025, Faro, Portugal, October 1-3, 2025, Proceedings, Part I

Abstract
Effective selection of forecasting algorithms for time series data is a challenge in machine learning, impacting both predictive accuracy and efficiency. Metalearning, using features extracted from time series, offers a strategic approach to optimize algorithm selection. The utility of this approach depends on the amount of information the features contain about the behavior of the algorithms. Although there are several methods for systematic time series feature extraction, they have never been compared. This paper empirically analyzes the performance of each feature extraction method for algorithm selection and its impact on forecasting accuracy. Our study reveals that TSFRESH, TSFEATURES, and TSFEL exhibit comparable performance at algorithm selection accuracy, adeptly capturing time series characteristics essential for accurate algorithm selection. In contrast, Catch22 is found to be less effective for this purpose. In particular, TSFEL is identified as the most efficient method, balancing dimensionality and predictive performance. These findings provide insights for enhancing forecasting accuracy and efficiency through judicious selection of meta-feature extractors. © 2025 Elsevier B.V., All rights reserved.


2025

Modeling events and interactions through temporal processes: A survey

Authors
Liguori, A; Caroprese, L; Minici, M; Veloso, B; Spinnato, F; Nanni, M; Manco, G; Gama, J;

Publication
NEUROCOMPUTING

Abstract
In real-world scenarios, numerous phenomena generate a series of events that occur in continuous time. Point processes provide a natural mathematical framework for modeling these event sequences. In this comprehensive survey, we aim to explore probabilistic models that capture the dynamics of event sequences through temporal processes. We revise the notion of event modeling and provide the mathematical foundations that underpin the existing literature on this topic. To structure our survey effectively, we introduce an ontology that categorizes the existing approaches considering three horizontal axes: modeling, inference and estimation, and application. We conduct a systematic review of the existing approaches, with a particular focus on those leveraging deep learning techniques. Finally, we delve into the practical applications where these proposed techniques can be harnessed to address real-world problems related to event modeling. Additionally, we provide a selection of benchmark datasets that can be employed to validate the approaches for point processes.


2025

Unveiling Group-Specific Distributed Concept Drift: A Fairness Imperative in Federated Learning

Authors
Salazar, T; Gama, J; Araújo, H; Abreu, PH;

Publication
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS

Abstract
In the evolving field of machine learning, ensuring group fairness has become a critical concern, prompting the development of algorithms designed to mitigate bias in decision-making processes. Group fairness refers to the principle that a model's decisions should be equitable across different groups defined by sensitive attributes such as gender or race, ensuring that individuals from privileged groups and unprivileged groups are treated fairly and receive similar outcomes. However, achieving fairness in the presence of group-specific concept drift remains an unexplored frontier, and our research represents pioneering efforts in this regard. Group-specific concept drift refers to situations where one group experiences concept drift over time, while another does not, leading to a decrease in fairness even if accuracy (ACC) remains fairly stable. Within the framework of federated learning (FL), where clients collaboratively train models, its distributed nature further amplifies these challenges since each client can experience group-specific concept drift independently while still sharing the same underlying concept, creating a complex and dynamic environment for maintaining fairness. The most significant contribution of our research is the formalization and introduction of the problem of group-specific concept drift and its distributed counterpart, shedding light on its critical importance in the field of fairness. In addition, leveraging insights from prior research, we adapt an existing distributed concept drift adaptation algorithm to tackle group-specific distributed concept drift, which uses a multimodel approach, a local group-specific drift detection mechanism, and continuous clustering of models over time. The findings from our experiments highlight the importance of addressing group-specific concept drift and its distributed counterpart to advance fairness in machine learning.
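
The claim that fairness can degrade while accuracy stays fairly stable can be made concrete with a small worked example. The sketch below is only an illustration under stated assumptions: the fairness measure (the absolute gap in per-group accuracy), the group sizes, the error counts, and the function names are all hypothetical and are not taken from the paper.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(y_true == y_pred))


def group_accuracy_gap(y_true, y_pred, group):
    """Absolute difference in accuracy between group 0 and group 1
    (one possible fairness measure, assumed for this sketch)."""
    acc_0 = accuracy(y_true[group == 0], y_pred[group == 0])
    acc_1 = accuracy(y_true[group == 1], y_pred[group == 1])
    return abs(acc_0 - acc_1)


# 90 samples from group 0 and 10 samples from group 1; all true labels are 1.
group = np.array([0] * 90 + [1] * 10)
y_true = np.ones(100, dtype=int)

# Before drift: 10% errors in each group.
pred_before = y_true.copy()
pred_before[:9] = 0   # 9 errors among the 90 samples of group 0
pred_before[90] = 0   # 1 error among the 10 samples of group 1

# After drift affecting only group 1: half of its predictions are now wrong.
pred_after = pred_before.copy()
pred_after[90:95] = 0  # 5 errors among the 10 samples of group 1

acc_before = accuracy(y_true, pred_before)
acc_after = accuracy(y_true, pred_after)
gap_before = group_accuracy_gap(y_true, pred_before, group)
gap_after = group_accuracy_gap(y_true, pred_after, group)

print(f"accuracy: {acc_before:.2f} -> {acc_after:.2f}")
print(f"group accuracy gap: {gap_before:.2f} -> {gap_after:.2f}")
```

Because the drifting group is small, overall accuracy moves only slightly while the gap between the groups widens substantially, which is why a drift detector that monitors accuracy alone could miss this kind of change.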
