Publications - INESC TEC

Publications

Publications by LIAAD

2024

Kernel Corrector LSTM

Authors
Tuna, R; Baghoussi, Y; Soares, C; Mendes-Moreira, J;

Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS XXII, PT II, IDA 2024

Abstract
Data quality issues affect forecasting methods in two ways: (1) they are hard to predict, and (2) they may degrade the model when it is updated with new data. The latter is usually addressed by pre-processing the data to remove such issues. An alternative approach has recently been proposed, Corrector LSTM (cLSTM), which is a Read & Write Machine Learning (RW-ML) algorithm that changes the data while learning to improve its predictions. Despite the promising results reported, cLSTM is computationally expensive, as it uses a meta-learner to monitor the hidden states of the LSTM. We propose a new RW-ML algorithm, Kernel Corrector LSTM (KcLSTM), that replaces the meta-learner of cLSTM with a simpler method: Kernel Smoothing. We empirically evaluate the forecasting accuracy and the training time of the new algorithm and compare it with cLSTM and LSTM. Results indicate that it is able to decrease the training time while maintaining a competitive forecasting accuracy.
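For readers unfamiliar with kernel smoothing, the following is a minimal sketch of a Nadaraya-Watson smoother with a Gaussian kernel applied to a short series. It is not the KcLSTM implementation, and the abstract does not say which kernel or bandwidth the authors use; the function name `gaussian_kernel_smooth`, the bandwidth value, and the sample series are assumptions chosen for illustration.

```python
import math


def gaussian_kernel_smooth(values, bandwidth=1.0):
    """Nadaraya-Watson smoother over integer time indices (illustrative only)."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    n = len(values)
    smoothed = []
    for t in range(n):
        # Weight every observation by its distance in time from index t.
        weights = [math.exp(-0.5 * ((t - s) / bandwidth) ** 2) for s in range(n)]
        total = sum(weights)
        # The smoothed value is the kernel-weighted average of all observations.
        smoothed.append(sum(w * v for w, v in zip(weights, values)) / total)
    return smoothed


# Hypothetical series with a single spike at index 3.
series = [1.0, 1.1, 0.9, 8.0, 1.0, 1.2, 0.8]
smoothed = gaussian_kernel_smooth(series, bandwidth=1.0)
```

The spike is pulled toward its neighbours while the remaining points change only moderately; a larger bandwidth would flatten the series further.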


2024

Corrector LSTM: built-in training data correction for improved time-series forecasting

Authors
Baghoussi, Y; Soares, C; Moreira, JM;

Publication
Neural Comput. Appl.

Abstract
Traditional recurrent neural networks (RNNs) are essential for processing time-series data. However, they function as read-only models, lacking the ability to directly modify the data they learn from. In this study, we introduce the corrector long short-term memory (cLSTM), a Read & Write LSTM architecture that not only learns from the data but also dynamically adjusts it when necessary. The cLSTM model leverages two key components: (a) predicting LSTM’s cell states using Seasonal Autoregressive Integrated Moving Average (SARIMA) and (b) refining the training data based on discrepancies between actual and forecasted cell states. Our empirical validation demonstrates that cLSTM surpasses read-only LSTM models in forecasting accuracy across the Numenta Anomaly Benchmark (NAB) and M4 Competition datasets. Additionally, cLSTM exhibits superior performance in anomaly detection compared to hierarchical temporal memory (HTM) models. © The Author(s) 2024.


2024

RHiOTS: A Framework for Evaluating Hierarchical Time Series Forecasting Algorithms

Authors
Roque, L; Soares, C; Torgo, L;

Publication
PROCEEDINGS OF THE 30TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2024

Abstract
We introduce the Robustness of Hierarchically Organized Time Series (RHiOTS) framework, designed to assess the robustness of hierarchical time series forecasting models and algorithms on real-world datasets. Hierarchical time series, where lower-level forecasts must sum to upper-level ones, are prevalent in various contexts, such as retail sales across countries. Current empirical evaluations of forecasting methods are often limited to a small set of benchmark datasets, offering a narrow view of algorithm behavior. RHiOTS addresses this gap by systematically altering existing datasets and modifying the characteristics of individual series and their interrelations. It uses a set of parameterizable transformations to simulate those changes in the data distribution. Additionally, RHiOTS incorporates an innovative visualization component, turning complex, multidimensional robustness evaluation results into intuitive, easily interpretable visuals. This approach allows an in-depth analysis of algorithm and model behavior under diverse conditions. We illustrate the use of RHiOTS by analyzing the predictive performance of several algorithms. Our findings show that traditional statistical methods are more robust than state-of-the-art deep learning algorithms, except when the transformation effect is highly disruptive. Furthermore, we found no significant differences in the robustness of the algorithms when applying specific reconciliation methods, such as MinT. RHiOTS provides researchers with a comprehensive tool for understanding the nuanced behavior of forecasting algorithms, offering a more reliable basis for selecting the most appropriate method for a given problem.
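The constraint that lower-level forecasts must sum to upper-level ones can be illustrated with a small bottom-up aggregation sketch. This is not part of RHiOTS and it is not the MinT reconciliation method mentioned in the abstract; the country codes, the hierarchy, the forecast values, and the function names are hypothetical.

```python
def aggregate_bottom_up(bottom_forecasts, hierarchy):
    """Build upper-level forecasts by summing the bottom-level series under each parent."""
    horizon = len(next(iter(bottom_forecasts.values())))
    return {
        parent: [sum(bottom_forecasts[child][h] for child in children)
                 for h in range(horizon)]
        for parent, children in hierarchy.items()
    }


def is_coherent(upper_forecasts, bottom_forecasts, hierarchy, tol=1e-9):
    """Check that every upper-level forecast equals the sum of its children."""
    expected = aggregate_bottom_up(bottom_forecasts, hierarchy)
    return all(
        abs(u - e) <= tol
        for parent in hierarchy
        for u, e in zip(upper_forecasts[parent], expected[parent])
    )


# Hypothetical two-step-ahead retail sales forecasts per country.
bottom = {"PT": [10.0, 12.0], "ES": [20.0, 18.0], "FR": [30.0, 33.0]}
hierarchy = {"Iberia": ["PT", "ES"], "Total": ["PT", "ES", "FR"]}

upper = aggregate_bottom_up(bottom, hierarchy)
coherent = is_coherent(upper, bottom, hierarchy)

# Forecasts produced independently per level are generally not coherent.
independent_upper = {"Iberia": [31.0, 30.0], "Total": [60.0, 63.0]}
incoherent = not is_coherent(independent_upper, bottom, hierarchy)
```

Bottom-up aggregation is coherent by construction; reconciliation methods exist because forecasts made separately at each level usually violate the sum constraint.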


2024

Machine Learning Data Market Based on Multiagent Systems

Authors
Baghcheband, H; Soares, C; Reis, LP;

Publication
IEEE INTERNET COMPUTING

Abstract
Today, autonomous agents, the Internet of Things, and smart devices produce more and more distributed data and use them to learn models for different purposes. One challenge is that learning from local data only may lead to suboptimal models. Thus, better models are expected if agents can exchange data, leading to approaches such as federated learning. However, these approaches assume that data have no value and, thus, are exchanged for free. A machine learning data market (MLDM), a framework based on multiagent systems with a market-based perspective on data exchange, was recently proposed. In an MLDM, each agent trains its model based on both local data and data bought from other agents. Although the empirical results are interesting, several challenges are still open, including data acquisition and data valuation. The MLDM is an illustrative example of how the value of data can and should be integrated into the design of distributed ML systems.


2024

RIFF: Inducing Rules for Fraud Detection from Decision Trees

Authors
Martins, L; Bravo, J; Gomes, AS; Soares, C; Bizarro, P;

Publication
RULES AND REASONING, RULEML+RR 2024

Abstract
Financial fraud is the cause of multi-billion dollar losses annually. Traditionally, fraud detection systems rely on rules due to their transparency and interpretability, key features in domains where decisions need to be explained. However, rule systems require significant input from domain experts to create and tune, an issue that rule induction algorithms attempt to mitigate by inferring rules directly from data. We explore the application of these algorithms to fraud detection, where rule systems are constrained to have a low false positive rate (FPR) or alert rate, by proposing RIFF, a rule induction algorithm that distills a low FPR rule set directly from decision trees. Our experiments show that the induced rules are often able to maintain or improve performance of the original models for low FPR tasks, while substantially reducing their complexity and outperforming rules hand-tuned by experts.
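The low-FPR constraint on rules can be made concrete with a short sketch that measures the false positive rate of candidate rules and keeps only those under a threshold. This is not the RIFF algorithm, which the abstract describes only at a high level; the rule names, the record fields, the data, and the threshold are hypothetical.

```python
def false_positive_rate(rule, records, labels):
    """FPR = legitimate records flagged by the rule / all legitimate records."""
    negatives = [r for r, y in zip(records, labels) if y == 0]
    if not negatives:
        raise ValueError("FPR is undefined without legitimate (label 0) records")
    flagged = sum(1 for r in negatives if rule(r))
    return flagged / len(negatives)


def keep_low_fpr_rules(rules, records, labels, max_fpr):
    """Keep only the rules whose FPR does not exceed max_fpr."""
    return {
        name: rule
        for name, rule in rules.items()
        if false_positive_rate(rule, records, labels) <= max_fpr
    }


# Hypothetical transactions; label 1 marks fraud, label 0 marks legitimate.
records = [
    {"amount": 50, "new_device": False},
    {"amount": 900, "new_device": True},
    {"amount": 700, "new_device": False},
    {"amount": 1200, "new_device": True},
    {"amount": 30, "new_device": True},
    {"amount": 80, "new_device": False},
]
labels = [0, 1, 0, 1, 0, 0]

# Hypothetical candidate rules, e.g. paths read off a decision tree.
rules = {
    "high_amount": lambda r: r["amount"] > 500,
    "new_device": lambda r: r["new_device"],
    "high_amount_and_new_device": lambda r: r["amount"] > 500 and r["new_device"],
}

selected = keep_low_fpr_rules(rules, records, labels, max_fpr=0.1)
```

On this toy data each single-condition rule flags one of the four legitimate transactions, so only the conjunction passes the threshold.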


2024

An Empirical Evaluation of DeepAR for Univariate Time Series Forecasting

Authors
Gomes, RU; Soares, C; Reis, LP;

Publication
Progress in Artificial Intelligence - 23rd EPIA Conference on Artificial Intelligence, EPIA 2024, Viana do Castelo, Portugal, September 3-6, 2024, Proceedings, Part III

Abstract
DeepAR is a popular probabilistic time series forecasting algorithm. According to the authors, DeepAR is particularly suitable to build global models using hundreds of related time series. For this reason, it is a common expectation that DeepAR obtains poor results in univariate forecasting [10]. However, there are no empirical studies that clearly support this. Here, we compare DeepAR with standard forecasting models to assess its performance on one-step-ahead forecasts. We use 100 time series from the M4 competition to compare univariate DeepAR with univariate LSTM and SARIMAX models, both for point and quantile forecasts. Results show that DeepAR obtains good results, which contradicts common perception. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
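The abstract does not state which metric is used for the quantile forecasts. One common choice is the pinball (quantile) loss, sketched below as an assumption rather than as the paper's evaluation protocol; the function name and the sample values are hypothetical.

```python
def pinball_loss(y_true, y_pred, quantile):
    """Mean pinball loss of quantile forecasts y_pred against actuals y_true."""
    if not 0.0 < quantile < 1.0:
        raise ValueError("quantile must lie strictly between 0 and 1")
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, q_hat in zip(y_true, y_pred):
        diff = y - q_hat
        # Under-forecasts are weighted by q, over-forecasts by (1 - q).
        total += quantile * diff if diff >= 0 else (quantile - 1.0) * diff
    return total / len(y_true)


# Hypothetical one-step-ahead actuals and 0.9-quantile forecasts.
actuals = [10.0, 12.0, 9.0]
q90_forecasts = [11.0, 11.0, 11.0]
loss_q90 = pinball_loss(actuals, q90_forecasts, quantile=0.9)
```

At the 0.5 quantile this loss reduces to half the mean absolute error, which links quantile evaluation to the usual point-forecast evaluation.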
