Publications

Publications by LIAAD

2026

Benchmarking Time Series Feature Extraction for Algorithm Selection

Authors
Santos, M; Cerqueira, V; Soares, C;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2025, PT I

Abstract
Effective selection of forecasting algorithms for time series data is a challenge in machine learning, impacting both predictive accuracy and efficiency. Metalearning, using features extracted from time series, offers a strategic approach to optimize algorithm selection. The utility of this approach depends on the amount of information the features contain about the behavior of the algorithms. Although there are several methods for systematic time series feature extraction, they have never been compared. This paper empirically analyzes the performance of each feature extraction method for algorithm selection and its impact on forecasting accuracy. Our study reveals that TSFRESH, TSFEATURES, and TSFEL achieve comparable algorithm selection accuracy, capturing the time series characteristics essential for accurate algorithm selection. In contrast, Catch22 is found to be less effective for this purpose. In particular, TSFEL is identified as the most efficient method, balancing dimensionality and predictive performance. These findings provide insights for enhancing forecasting accuracy and efficiency through judicious selection of meta-feature extractors.

2026

MASTFM: Meta-learning and Data Augmentation to Stress Test Forecasting Models

Authors
Inácio, R; Cerqueira, V; Barandas, M; Soares, C;

Publication
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES. APPLIED DATA SCIENCE TRACK AND DEMO TRACK, ECML PKDD 2025, PT X

Abstract
Time series forecasting is pivotal across industries, as it fosters data-driven decision-making, increasing the chances of successful outcomes. Yet certain instances that feature adverse characteristics may lead models to manifest stress through decreases in performance (e.g., large errors). The ability to preemptively identify such cases, while establishing their root causes, would therefore be advantageous to elevate the understanding of forecasting processes, informing users about the trustworthiness of predictions. Hence, we propose MASTFM, a method based on meta-learning that leverages statistical characteristics of input time series, and estimations of forecasting performance from model outputs, to build a metamodel that learns conditions for stress. Given that such occurrences are naturally rare, data augmentation is employed to ensure balance during training. Moreover, SHapley Additive exPlanations (SHAP) are used to explain how features impact forecasting behaviour.

2026

Subgroup Discovery Using Model Uncertainty: A Feasibility Study

Authors
Pereira, AC; Folgado, D; Barandas, M; Soares, C; Carreiro, A;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2025, PT I

Abstract
Subgroup discovery aims to identify interpretable segments of a dataset where model behavior deviates from global trends. Traditionally, this involves uncovering patterns among data instances with respect to a target property, such as class labels or performance metrics. For example, classification accuracy can highlight subpopulations where models perform unusually well or poorly. While effective for model auditing and failure analysis, accuracy alone provides a limited view, as it does not reflect model confidence or sources of uncertainty. This work proposes a complementary approach: subgroup discovery using model uncertainty. Rather than identifying where the model fails, we focus on where it is systematically uncertain, even when predictions are correct. Such uncertainty may arise from intrinsic data ambiguity (aleatoric) or poor data representation in training (epistemic). It can highlight areas of the input space where the model's predictions are less robust or reliable. We evaluate the feasibility of this approach through controlled experiments on the classification of synthetic data and the Iris dataset. While our findings are exploratory and qualitative, they suggest that uncertainty-based subgroup discovery may uncover interpretable regions of interest, providing a promising direction for model auditing and analysis.
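The aleatoric/epistemic distinction mentioned in the abstract above can be illustrated with a small sketch. It assumes an ensemble of classifiers and uses the common entropy-based decomposition (total entropy, expected entropy, and their difference); the abstract does not say which uncertainty estimator the authors use, so the function names and the decomposition here are illustrative assumptions only.

```python
import numpy as np

def entropy(p, axis=-1):
    """Shannon entropy (in nats) of probability vectors along `axis`."""
    p = np.clip(p, 1e-12, 1.0)  # avoid log(0)
    return -(p * np.log(p)).sum(axis=axis)

def decompose_uncertainty(member_probs):
    """Split predictive uncertainty for an ensemble (assumed setup).

    member_probs has shape (n_members, n_samples, n_classes).
    Returns per-sample (total, aleatoric, epistemic) uncertainty.
    """
    total = entropy(member_probs.mean(axis=0))       # entropy of the averaged prediction
    aleatoric = entropy(member_probs).mean(axis=0)   # average entropy of each member
    epistemic = total - aleatoric                    # disagreement between members
    return total, aleatoric, epistemic

# Toy ensemble of two members scoring two samples over two classes.
probs = np.array([
    [[0.5, 0.5], [1.0, 0.0]],  # member 1
    [[0.5, 0.5], [0.0, 1.0]],  # member 2
])
total, aleatoric, epistemic = decompose_uncertainty(probs)
```

In this toy case the first sample is ambiguous for every member (mostly aleatoric), while the second is one the members disagree on (mostly epistemic). Per-sample scores like these could then serve as the target property for a subgroup discovery algorithm, in place of accuracy.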

2026

A New Proposal of Layer Insertion in Stacked Autoencoder Neural Networks

Authors
Viana, FD; Pereira, BVL; Santos, M; Soares, C; Neto, AD;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2025, PT I

Abstract
One strategy for constructing an artificial neural network with multiple hidden layers is to insert layers incrementally in stages. However, for this approach to be effective, each newly added layer must be properly aligned with the previous layers to avoid degradation of the network output and preserve the already learned knowledge. Ideally, inserting new layers should expand the network's search space, enabling it to explore more complex representations and ultimately improve overall performance. In this work, we present a novel method for layer insertion in stacked autoencoder networks. The method developed maintains the learning obtained before the layer insertion and allows the acquisition of new knowledge; therefore, it is denoted collaborative. This approach allows this kind of neural network to evolve and learn effectively, while significantly reducing the design time. Unlike traditional methods, it addresses the common challenges associated with manually defining the number of layers and the number of neurons in each layer. By automating this aspect of network design, the proposed method promotes scalability and adaptability between tasks. The effectiveness of the approach was validated on multiple binary classification datasets using neural networks initialized with various architectures. The experimental results demonstrate that the method maintains performance while streamlining the architectural design process.

2026

Online Data Augmentation for Forecasting with Deep Learning

Authors
Cerqueira, V; Santos, M; Roque, L; Baghoussi, Y; Soares, C;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2025, PT I

Abstract
Deep learning approaches are increasingly used to tackle forecasting tasks but require substantial training data. When samples are limited, synthetic data generation techniques can effectively augment datasets to improve model performance. Data augmentation is typically applied offline before training a model. However, when training with mini-batches, some batches may contain a disproportionate number of synthetic samples that do not align well with the original data characteristics. This work introduces an online data augmentation framework that generates synthetic samples during the training of neural networks. By creating synthetic samples for each batch alongside their original counterparts, we maintain a balanced representation between real and synthetic data throughout the training process. This approach fits naturally with the iterative nature of neural network training and eliminates the need to store large augmented datasets. We validated the proposed framework using 3797 time series from 6 benchmark datasets, three neural architectures, and seven synthetic data generation techniques. The experiments suggest that online data augmentation leads to better forecasting performance compared to offline data augmentation or no augmentation approaches. The framework and experiments are publicly available.
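The per-batch idea described in the abstract above can be sketched as a batch generator. This is a minimal illustration, not the authors' framework: the `jitter` augmentation, the function names, and the one-synthetic-copy-per-original ratio are assumptions chosen for brevity.

```python
import numpy as np

def jitter(batch, rng, scale=0.05):
    """Hypothetical augmentation: add Gaussian noise scaled by each window's std."""
    std = batch.std(axis=1, keepdims=True)
    return batch + rng.normal(0.0, 1.0, size=batch.shape) * scale * std

def online_augmented_batches(data, batch_size, rng, augment=jitter):
    """Yield mini-batches that pair every original window with one synthetic copy.

    Synthetic samples are created on the fly, so the augmented dataset is
    never stored and each batch stays half real, half synthetic.
    """
    order = rng.permutation(len(data))
    for start in range(0, len(order), batch_size):
        original = data[order[start:start + batch_size]]
        synthetic = augment(original, rng)
        yield np.concatenate([original, synthetic], axis=0)

rng = np.random.default_rng(0)
windows = rng.normal(size=(10, 24))  # 10 toy training windows of length 24
batches = list(online_augmented_batches(windows, batch_size=4, rng=rng))
batch_shapes = [b.shape for b in batches]
```

Each yielded batch would be passed to a single training step of the network. Because the synthetic half is regenerated every epoch, the model sees fresh synthetic samples each time, which offline augmentation cannot provide without storing many copies.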

2026

A two-stage framework for early failure detection in predictive maintenance: A case study on metro trains

Authors
Toribio, L; Veloso, B; Gama, J; Zafra, A;

Publication
NEUROCOMPUTING

Abstract
Early fault detection remains a critical challenge in predictive maintenance (PdM), particularly within critical infrastructure, where undetected failures or delayed interventions can compromise safety and disrupt operations. Traditional anomaly detection methods are typically reactive, relying on real-time sensor data to identify deviations as they occur. This reactive nature often provides insufficient lead time for effective maintenance planning. To address this limitation, we propose a novel two-stage early detection framework that integrates time series forecasting with anomaly detection to anticipate equipment failures several hours in advance. In the first stage, future sensor signal values are predicted using forecasting models; in the second, conventional anomaly detection algorithms are applied directly to the forecasted data. By shifting from real-time to anticipatory detection, the framework aims to deliver actionable early warnings, enabling timely and preventive maintenance. We validate this approach through a case study focused on metro train systems, an environment where early fault detection is crucial for minimizing service disruptions, optimizing maintenance schedules, and ensuring passenger safety. The framework is evaluated across three forecast horizons (1, 3, and 6 hours ahead) using twelve state-of-the-art anomaly detection algorithms from diverse methodological families. Detection performance is assessed using five performance metrics. Results show that anomaly detection remains highly effective at short to medium horizons, with performance at 1-hour and 3-hour forecasts comparable to that of real-time data. Ensemble and deep learning models exhibit strong robustness to forecast uncertainty, maintaining consistent results with real-time data even at 6-hour forecasts. In contrast, distance- and density-based models suffer substantial degradation at the longest horizon (6 hours), reflecting their sensitivity to distributional shifts in predicted signals. Overall, the proposed framework offers a practical and extensible solution for enhancing traditional PdM systems with proactive capabilities. By enabling early anomaly detection on forecasted data, it supports improved decision-making, operational resilience, and maintenance planning in industrial environments.
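The two stages described in the abstract above can be sketched with simple stand-ins. The linear-trend forecaster and the z-score detector below are assumptions chosen for brevity; they are not among the forecasting models or the twelve detectors evaluated in the paper, and all names and values are hypothetical.

```python
import numpy as np

def forecast_linear(history, horizon):
    """Stage 1 (stand-in): extrapolate a least-squares line fitted to the history."""
    t = np.arange(len(history))
    slope, intercept = np.polyfit(t, history, deg=1)
    future_t = np.arange(len(history), len(history) + horizon)
    return slope * future_t + intercept

def zscore_anomalies(values, mean, std, threshold=3.0):
    """Stage 2 (stand-in): flag values far from the normal operating range."""
    return np.abs(values - mean) / std > threshold

# Hypothetical sensor with a normal operating level of 50 +/- 2 units.
normal_mean, normal_std = 50.0, 2.0
history = 50.0 + 0.45 * np.arange(12)           # 12 hourly readings, drifting slowly upward
predicted = forecast_linear(history, horizon=6)  # next 6 hours
observed_alerts = zscore_anomalies(history, normal_mean, normal_std)
alerts = zscore_anomalies(predicted, normal_mean, normal_std)
```

In this toy run none of the observed readings is flagged, but the detector applied to the forecast raises alerts from the third predicted hour onward. That gap between the last observation and the first predicted alert is the lead time the anticipatory setup is meant to provide.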
