Publications

Publications by João Gama

2025

Fed-VFDT: Federated Very Fast Decision Trees with Coordinated Splitting Over Data Streams

Authors
Silva, PR; Vinagre, J; Gama, J;

Publication
ICTAI

Abstract
We introduce Fed-VFDT, a federated adaptation of the Very Fast Decision Tree (VFDT) algorithm for classification over streaming data. While VFDT is a widely adopted online learning algorithm, its sequential and order-sensitive nature poses challenges in federated settings, marked by statistical heterogeneity and communication constraints. Fed-VFDT addresses these issues by having each client incrementally train a local VFDT and report split statistics to a central server when a leaf satisfies the Hoeffding criterion. The server selects a global splitting feature by aggregating clients' proposals according to a configurable strategy: quorum, merit-based selection, or majority voting. Once a feature is selected, it is broadcast to all clients, which apply the split at the corresponding tree path using their locally computed thresholds. We evaluate Fed-VFDT against its centralized counterpart using predictive and structural metrics, demonstrating that it maintains comparable performance while reducing communication and preserving synchronized tree growth. © 2025 IEEE.
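The split trigger and the majority-voting strategy mentioned in the abstract can be illustrated with a minimal sketch. The Hoeffding bound formula is the standard one used by VFDT. Everything else is an assumption made for illustration and is not taken from the paper: the function names, the shape of the `proposals` dictionary, and the alphabetical tie-break.

```python
import math
from collections import Counter


def hoeffding_bound(value_range, delta, n):
    """Standard Hoeffding bound: after n observations of a variable with the
    given range, the observed mean is within this epsilon of the true mean
    with probability 1 - delta."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))


def leaf_ready_to_split(best_merit, second_merit, value_range, delta, n):
    """A leaf satisfies the Hoeffding criterion when the gap between the two
    best candidate features exceeds the bound."""
    return (best_merit - second_merit) > hoeffding_bound(value_range, delta, n)


def majority_vote(proposals):
    """Hypothetical server-side aggregation: `proposals` maps a client id to
    the feature that client proposes. Returns the most frequently proposed
    feature; ties are broken alphabetically (an assumption, for determinism)."""
    counts = Counter(proposals.values())
    top = max(counts.values())
    return min(feature for feature, c in counts.items() if c == top)


# Example: three clients report a proposal for the same leaf.
chosen = majority_vote({"c1": "age", "c2": "income", "c3": "age"})
```

In this sketch only the chosen feature name would be broadcast; each client would still compute its own threshold locally, as the abstract describes.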

2025

Bridging Streaming Continual Learning via In-Context Large Tabular Models

Authors
Lourenço, A; Gama, J; Xing, EP; Marreiros, G;

Publication
CoRR

2025

A robust methodology for long-term sustainability evaluation of Machine Learning models

Authors
Ruza, JP; Gama, J; Betanzos, AA; Berdiñas, BG;

Publication
CoRR

2025

Histogram approaches for imbalanced data streams regression

Authors
Aminian, E; Ribeiro, RP; Gama, J;

Publication
MACHINE LEARNING

Abstract
Imbalanced domains pose a significant challenge in real-world predictive analytics, particularly in the context of regression. While existing research has primarily focused on batch learning from static datasets, limited attention has been given to imbalanced regression in online learning scenarios. To address this gap, our prior work proposed sampling strategies based on Chebyshev's inequality, the first methodologies designed explicitly for data streams. However, these approaches operated under the restrictive assumption that rare instances reside exclusively at the extremes of the distribution. This study introduces histogram-based sampling strategies to overcome this constraint, proposing flexible solutions for imbalanced regression in evolving data streams. The proposed techniques, Histogram-based Undersampling (HistUS) and Histogram-based Oversampling (HistOS), employ incremental online histograms to dynamically detect and prioritize rare instances across arbitrary regions of the target distribution, improving predictions for rare cases. Comprehensive experiments on synthetic and real-world benchmarks demonstrate that HistUS and HistOS substantially improve rare-case prediction accuracy, outperforming baseline models while maintaining competitiveness with Chebyshev-based approaches.
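The idea of using an incremental histogram to flag rare target values anywhere in the distribution can be sketched as follows. This is not the HistUS or HistOS algorithm from the paper. The fixed-width bins, the rarity score (one minus the bin count relative to the fullest bin), the `floor` parameter, and all class and function names are assumptions chosen for illustration.

```python
import random


class OnlineHistogram:
    """Fixed-width histogram over target values, updated one example at a time.
    The bin layout (low, high, n_bins) is a hypothetical choice for this sketch."""

    def __init__(self, low, high, n_bins):
        self.low, self.high, self.n_bins = low, high, n_bins
        self.counts = [0] * n_bins

    def _bin(self, y):
        width = (self.high - self.low) / self.n_bins
        idx = int((y - self.low) / width)
        # Clamp out-of-range targets into the first or last bin.
        return min(max(idx, 0), self.n_bins - 1)

    def update(self, y):
        self.counts[self._bin(y)] += 1

    def rarity(self, y):
        """Score in [0, 1]: 0 for the fullest bin, 1 for an empty bin."""
        peak = max(self.counts)
        if peak == 0:
            return 1.0
        return 1.0 - self.counts[self._bin(y)] / peak


def keep_for_training(hist, y, rng, floor=0.1):
    """Undersampling-style rule (an assumption): always record the target,
    then keep the example with probability equal to its rarity score,
    but never below `floor`."""
    hist.update(y)
    return rng.random() < max(hist.rarity(y), floor)


# Example: frequent targets near 1.0, a few rare targets near 5.0.
hist = OnlineHistogram(low=0.0, high=10.0, n_bins=5)
for y in [1.0] * 8 + [5.0] * 2:
    hist.update(y)
```

Because rarity is computed per bin, a sparse region in the middle of the target range scores as rare just like a sparse tail, which is the limitation of the extremes-only assumption that the abstract highlights.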

2026

Unveiling Group-Specific Distributed Concept Drift: A Fairness Imperative in Federated Learning

Authors
Salazar, T; Gama, J; Araújo, H; Abreu, PH;

Publication
IEEE Trans. Neural Networks Learn. Syst.

2019

Novelty Detection for Multi-Label Stream Classification

Authors
Costa Júnior, JD; de Faria, ER; Andrade Silva, Jd; Gama, J; Cerri, R;

Publication
BRACIS

Abstract
In Multi-Label Stream Classification (MLSC), examples arriving in a stream can be simultaneously classified into multiple classes. This is a very challenging task, especially considering that new classes can emerge during the stream (Concept Evolution), and known classes can change over time (Concept Drift). In real situations, these characteristics come together with a scenario with Infinitely Delayed Labels, where we can never access the true class labels of the examples to update classifiers. In order to overcome these issues, this paper proposes a new method called MultI-label learNing Algorithm for Data Streams with Binary Relevance transformation (MINAS-BR). Our proposal uses a new Novelty Detection (ND) procedure to detect concept evolution and concept drift, and is updated in an unsupervised fashion. We also propose a new methodology to evaluate MLSC methods in scenarios with Infinitely Delayed Labels. Experiments over synthetic data sets attested to the potential of MINAS-BR, which was able to adapt to different concept drift and concept evolution scenarios, obtaining superior or competitive performance in comparison to literature baselines. © 2019 IEEE.
