Publications - INESC TEC

Publications by João Gama

2025

A robust methodology for long-term sustainability evaluation of Machine Learning models

Authors
Ruza, JP; Gama, J; Betanzos, AA; Berdiñas, BG

Publication
CoRR

Abstract

2025

Histogram approaches for imbalanced data streams regression

Authors
Aminian, E; Ribeiro, RP; Gama, J

Publication
MACHINE LEARNING

Abstract
Imbalanced domains pose a significant challenge in real-world predictive analytics, particularly in the context of regression. While existing research has primarily focused on batch learning from static datasets, limited attention has been given to imbalanced regression in online learning scenarios. To address this gap, our prior work proposed sampling strategies based on Chebyshev's inequality as the first methodologies designed explicitly for data streams. However, these approaches operated under the restrictive assumption that rare instances exclusively reside at distribution extremes. This study introduces histogram-based sampling strategies to overcome this constraint, proposing flexible solutions for imbalanced regression in evolving data streams. The proposed techniques, Histogram-based Undersampling (HistUS) and Histogram-based Oversampling (HistOS), employ incremental online histograms to dynamically detect and prioritize rare instances across arbitrary regions of the target distribution, improving predictions for rare cases. Comprehensive experiments on synthetic and real-world benchmarks demonstrate that HistUS and HistOS substantially improve rare-case prediction accuracy, outperforming baseline models while maintaining competitiveness with Chebyshev-based approaches.
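The idea of using an incremental histogram to favour rare target values can be illustrated with a minimal sketch. This is not the authors' HistUS or HistOS procedure: the fixed-width bins, the known target range, the `OnlineHistogram` class, and the `keep_probability` rule are all assumptions made for illustration only.

```python
import random


class OnlineHistogram:
    """Fixed-width incremental histogram over an assumed known target range."""

    def __init__(self, low, high, n_bins):
        self.low, self.high, self.n_bins = low, high, n_bins
        self.counts = [0] * n_bins
        self.total = 0

    def bin_index(self, y):
        width = (self.high - self.low) / self.n_bins
        idx = int((y - self.low) / width)
        # Clip out-of-range targets into the first or last bin.
        return min(max(idx, 0), self.n_bins - 1)

    def update(self, y):
        self.counts[self.bin_index(y)] += 1
        self.total += 1

    def relative_frequency(self, y):
        if self.total == 0:
            return 0.0
        return self.counts[self.bin_index(y)] / self.total


def keep_probability(hist, y):
    """Hypothetical rule: always keep targets from bins at or below the
    uniform share, and thin out targets from over-represented bins."""
    uniform = 1.0 / hist.n_bins
    freq = hist.relative_frequency(y)
    if freq <= uniform:
        return 1.0
    return uniform / freq


def should_train(hist, y, rng):
    """Update the histogram with y, then decide whether to train on it."""
    hist.update(y)
    return rng.random() < keep_probability(hist, y)


if __name__ == "__main__":
    rng = random.Random(0)
    hist = OnlineHistogram(0.0, 1.0, 10)
    stream = [0.55] * 95 + [0.95] * 5
    rng.shuffle(stream)
    kept = [y for y in stream if should_train(hist, y, rng)]
    print(len(kept), "of", len(stream), "examples kept")
```

Because the rule depends on bin frequency rather than distance from the mean, a rare value is favoured wherever it falls in the target range, which is the limitation of extreme-only approaches that the abstract describes.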


2019

Novelty Detection for Multi-Label Stream Classification

Authors
Costa Júnior, JD; de Faria, ER; Andrade Silva, Jd; Gama, J; Cerri, R

Publication
BRACIS

Abstract
In Multi-Label Stream Classification (MLSC), examples arriving in a stream can be simultaneously classified into multiple classes. This is a very challenging task, especially considering that new classes can emerge during the stream (Concept Evolution), and known classes can change over time (Concept Drift). In real situations, these characteristics come together with a scenario with Infinitely Delayed Labels, where we can never access the true class labels of the examples to update classifiers. In order to overcome these issues, this paper proposes a new method called MultI-label learNing Algorithm for Data Streams with Binary Relevance transformation (MINAS-BR). Our proposal uses a new Novelty Detection (ND) procedure to detect concept evolution and concept drift, and is updated in an unsupervised fashion. We also propose a new methodology to evaluate MLSC methods in scenarios with Infinitely Delayed Labels. Experiments over synthetic data sets attested to the potential of MINAS-BR, which was able to adapt to different concept drift and concept evolution scenarios, obtaining superior or competitive performances in comparison to literature baselines.
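The binary relevance transformation named in the abstract is a standard way to turn one multi-label problem into one binary problem per label. The sketch below shows only that generic transformation, not MINAS-BR or its novelty detection step; the function name `binary_relevance` and the toy labels are hypothetical.

```python
def binary_relevance(label_sets, labels=None):
    """Turn a list of label sets into one 0/1 target vector per label.

    label_sets: one set of labels per example.
    labels: optional explicit label vocabulary; inferred if omitted.
    """
    if labels is None:
        labels = sorted(set().union(*label_sets))
    return {
        label: [1 if label in example else 0 for example in label_sets]
        for label in labels
    }


if __name__ == "__main__":
    # Three toy examples; the third carries no known label.
    targets = binary_relevance([{"sports", "news"}, {"news"}, set()])
    for label, column in targets.items():
        print(label, column)
```

Each resulting column can then be handled by an independent binary model, which is what makes the transformation convenient when the set of classes changes during the stream.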


2026

A Deep Learning Framework for Forecasting Medium-Term Covariance in Multiasset Portfolios

Authors
Reis, P; Paula Serra, A; Gama, J

Publication
JOURNAL OF FORECASTING

Abstract
Forecasting the covariance matrix of asset returns is central to portfolio construction, risk management, and asset pricing. However, most existing models struggle at medium-term horizons, several weeks to months, where shifting market regimes and slower dynamics prevail. We propose a novel deep learning framework that integrates three-dimensional convolutional neural networks, bidirectional long short-term memory, and multihead attention to capture complex spatiotemporal patterns in asset return dynamics. Using daily data on 14 exchange-traded funds from 2017 to 2023, we demonstrate that our model improves out-of-sample covariance forecasts by reducing Euclidean and Frobenius distance metrics by up to 20% compared with classical benchmarks such as shrinkage estimators and GARCH-type models. These gains persist across distinct market regimes, including bull and bear periods, and remain robust across various forecast horizons and under both raw and excess return specifications. Portfolio simulations based on global minimum variance strategies reveal that the proposed model consistently delivers lower volatility and moderate turnover, even under no-short-selling constraints. This balance between risk reduction and trading efficiency underscores the economic relevance of the forecasts, particularly for institutional investors managing portfolios at medium-term horizons.
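Two quantities mentioned in the abstract have simple textbook definitions: the Frobenius distance between a forecast and a realized covariance matrix, and the unconstrained global minimum variance weights, w = Σ⁻¹1 / (1ᵀΣ⁻¹1). The sketch below computes both in plain Python. It is an illustration of those definitions only, not the paper's evaluation code; the function names and the toy matrices are assumptions, and the no-short-selling variant mentioned in the abstract is not covered.

```python
import math


def frobenius_distance(a, b):
    """Square root of the sum of squared element-wise differences."""
    return math.sqrt(
        sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    )


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def gmv_weights(cov):
    """Unconstrained global minimum variance weights (short positions allowed)."""
    raw = solve(cov, [1.0] * len(cov))
    total = sum(raw)
    return [w / total for w in raw]


if __name__ == "__main__":
    forecast = [[0.04, 0.00], [0.00, 0.01]]  # toy forecast covariance
    realized = [[0.05, 0.01], [0.01, 0.01]]  # toy realized covariance
    print("Frobenius distance:", frobenius_distance(forecast, realized))
    print("GMV weights:", gmv_weights(forecast))
```

For a diagonal covariance matrix the weights are proportional to the inverse variances, so the lower-variance asset receives the larger share.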


2026

AI effect on Innovation Capacity in the context of Industry 5.0: a Bayesian Network Analysis

Authors
Bécue, A; Gama, J; Brito, PQ

Publication
Strategic Business Research

Abstract

2026

Conditional Motif-Based Graph Convolutional Network for Anomaly Detection in the Waste Management Network

Authors
Oliveira, S; Tabassum, S; Gama, J; Garcia, A; Santana, P

Publication
IDA

Abstract
Illicit activities in the waste management network, such as waste laundering, misreporting, or trade of stolen waste, pose serious environmental and regulatory challenges. Detecting these behaviours is challenging because they often emerge from higher-order interactions among multiple entities, and are not continuous over time. Furthermore, these activities often manifest as triangles in the network, and the participation of individuals in these waste transfer structures is additionally suspicious. Traditional anomaly detection methods, which rely on first-order relationships or static analyses, struggle to capture these complex, temporally dynamic patterns. To address this challenge, we propose a Conditional Motif-Based Graph Convolutional Network (CM-GCN) that integrates condition-driven triangular motifs directly into the GCN message-passing mechanism. The CM-GCN learns structural embeddings that encode both local graph topology and node attributes-based connectivity to triangular motifs. To detect sudden or sporadic changes, these weekly embeddings are processed by a Long Short-Term Memory Variational Autoencoder (LSTM-VAE), which models temporal behaviour and identifies anomalies through spikes in reconstruction error. Experiments on one year of Portuguese waste transport data demonstrate that the proposed approach effectively highlights companies with known illicit behaviour. The CM-GCN-LSTM-VAE outperformed a standard GCN-LSTM-VAE that ignores motif structure. Results are comparable to, and slightly improve upon, an LSTM-VAE trained on a manually engineered triangle-based feature. This demonstrates that higher-order structural representations learned by the model provide a more informative signal, while simple pairwise relationships contribute little to the detection of complex behaviours. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
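The notion of a node's participation in triangles can be made concrete with a small counting routine. This is a plain, unconditional triangle count on an undirected graph, shown only to illustrate the kind of structure the abstract refers to; it is not the CM-GCN, its condition-driven motifs, or the authors' hand-engineered feature. The function name `triangle_counts` and the toy edge list are hypothetical.

```python
from itertools import combinations


def triangle_counts(edges):
    """Count, for each node, the triangles it takes part in.

    edges: iterable of (u, v) pairs, treated as undirected; self-loops
    are ignored and repeated edges are counted once.
    """
    adjacency = {}
    for u, v in edges:
        if u == v:
            continue
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    counts = {}
    for node, neighbours in adjacency.items():
        # A triangle through `node` is a pair of its neighbours that are
        # themselves connected.
        counts[node] = sum(
            1 for a, b in combinations(neighbours, 2) if b in adjacency[a]
        )
    return counts


if __name__ == "__main__":
    # Toy transfer network: A, B, C form a triangle; D only trades with C.
    transfers = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
    for company, n_triangles in sorted(triangle_counts(transfers).items()):
        print(company, n_triangles)
```

Recomputing such counts per time window would give a simple per-node series whose sudden changes could be inspected, in the spirit of the weekly analysis described above.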
