2023
Authors
Koprinska, I; Mignone, P; Guidotti, R; Jaroszewicz, S; Fröning, H; Gullo, F; Ferreira, PM; Roqueiro, D; Ceddia, G; Nowaczyk, S; Gama, J; Ribeiro, RP; Gavaldà, R; Masciari, E; Ras, ZW; Ritacco, E; Naretto, F; Theissler, A; Biecek, P; Verbeke, W; Schiele, G; Pernkopf, F; Blott, M; Bordino, I; Danesi, IL; Ponti, G; Severini, L; Appice, A; Andresini, G; Medeiros, I; Graça, G; Cooper, L; Ghazaleh, N; Richiardi, J; Miranda, DS; Sechidis, K; Canakoglu, A; Pidò, S; Pinoli, P; Bifet, A; Pashami, S;
Publication
PKDD/ECML Workshops (2)
Abstract
2023
Authors
Koprinska, I; Mignone, P; Guidotti, R; Jaroszewicz, S; Fröning, H; Gullo, F; Ferreira, PM; Roqueiro, D; Ceddia, G; Nowaczyk, S; Gama, J; Ribeiro, RP; Gavaldà, R; Masciari, E; Ras, ZW; Ritacco, E; Naretto, F; Theissler, A; Biecek, P; Verbeke, W; Schiele, G; Pernkopf, F; Blott, M; Bordino, I; Danesi, IL; Ponti, G; Severini, L; Appice, A; Andresini, G; Medeiros, I; Graça, G; Cooper, L; Ghazaleh, N; Richiardi, J; Miranda, DS; Sechidis, K; Canakoglu, A; Pidò, S; Pinoli, P; Bifet, A; Pashami, S;
Publication
PKDD/ECML Workshops (1)
Abstract
2022
Authors
Jesus, S; Pombal, J; Alves, D; Cruz, AF; Saleiro, P; Ribeiro, RP; Gama, J; Bizarro, P;
Publication
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022
Abstract
2022
Authors
Veloso, B; Gama, J; Ribeiro, RP; Pereira, PM;
Publication
SCIENTIFIC DATA
Abstract
The paper describes the MetroPT data set, an outcome of a predictive maintenance project with an urban metro public transportation service in Porto, Portugal. The data was collected in 2022 to develop machine learning methods for online anomaly detection and failure prediction. Several analog sensor signals (pressure, temperature, current consumption), digital signals (control signals, discrete signals), and GPS information (latitude, longitude, and speed) together provide a framework that is easy to use and can support the development of new machine learning methods. The data set has several interesting characteristics and can serve as a good benchmark for predictive maintenance models.
2025
Authors
Silva, PR; Vinagre, J; Gama, J;
Publication
ICTAI
Abstract
We introduce Fed-VFDT, a federated adaptation of the Very Fast Decision Tree (VFDT) algorithm for classification over streaming data. While VFDT is a widely adopted online learning algorithm, its sequential and order-sensitive nature poses challenges in federated settings, which are marked by statistical heterogeneity and communication constraints. Fed-VFDT addresses these issues by having each client incrementally train a local VFDT and report split statistics to a central server when a leaf satisfies the Hoeffding criterion. The server selects a global splitting feature by aggregating the clients' proposals according to a configurable strategy: quorum, merit-based selection, or majority voting. Once a feature is selected, it is broadcast to all clients, which apply the split at the corresponding tree path using their locally computed thresholds. We evaluate Fed-VFDT against its centralized counterpart using predictive and structural metrics, demonstrating that it maintains comparable performance while reducing communication and preserving synchronized tree growth.
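The client-side split test and server-side aggregation described in this abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Proposal` record, the tie threshold `tau`, and the exact semantics given to the three strategies (majority = most frequently proposed feature, merit-based = highest reported merit, quorum = most-proposed feature only if its share of all clients reaches a threshold) are hypothetical choices made for the example.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: eps = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def leaf_ready_to_split(best_merit: float, second_merit: float,
                        value_range: float, delta: float, n: int,
                        tau: float = 0.05) -> bool:
    """Classic VFDT test, run locally by each client: the best feature beats
    the runner-up by more than eps, or eps is small enough to call a tie."""
    eps = hoeffding_bound(value_range, delta, n)
    return (best_merit - second_merit) > eps or eps < tau


@dataclass
class Proposal:
    """Hypothetical report a client sends when its leaf passes the test."""
    client_id: int
    feature: str
    merit: float  # e.g. information gain of the client's best split


def select_feature(proposals: List[Proposal], strategy: str,
                   n_clients: int, quorum: float = 0.5) -> Optional[str]:
    """Server-side aggregation; returns None when no split is agreed."""
    if not proposals:
        return None
    votes = Counter(p.feature for p in proposals)
    top_feature, top_count = votes.most_common(1)[0]
    if strategy == "majority":
        # Most frequently proposed feature (ties: first proposed wins).
        return top_feature
    if strategy == "merit":
        # Feature carrying the single highest reported merit.
        return max(proposals, key=lambda p: p.merit).feature
    if strategy == "quorum":
        # Split only if a large enough share of ALL clients agrees.
        return top_feature if top_count / n_clients >= quorum else None
    raise ValueError(f"unknown strategy: {strategy!r}")
```

For instance, if three of four clients propose `pressure` (merit 0.30), `temperature` (0.45), and `pressure` (0.25), majority voting and a 0.5 quorum both pick `pressure`, merit-based selection picks `temperature`, and a 0.75 quorum yields no split. Only the feature name leaves the server, so, as the abstract states, each client keeps its own locally computed threshold.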
2025
Authors
Lourenço, A; Gama, J; Xing, EP; Marreiros, G;
Publication
CoRR
Abstract
In streaming scenarios, models must learn continuously, adapting to concept drifts without erasing previously acquired knowledge. However, existing research communities address these challenges in isolation. Continual Learning (CL) focuses on long-term retention and mitigating catastrophic forgetting, often without strict real-time constraints. Stream Learning (SL) emphasizes rapid, efficient adaptation to high-frequency data streams, but typically neglects forgetting. Recent efforts have tried to combine these paradigms, yet no clear algorithmic overlap exists. We argue that large in-context tabular models (LTMs) provide a natural bridge for Streaming Continual Learning (SCL). In our view, unbounded streams should be summarized on the fly into compact sketches that can be consumed by LTMs. This recovers the classical SL motivation of compressing massive streams with fixed-size guarantees, while simultaneously aligning with the experience-replay desiderata of CL. To clarify this bridge, we show how the SL and CL communities implicitly adopt a divide-and-conquer strategy to manage the tension between plasticity (performing well on the current distribution) and stability (retaining past knowledge), while also imposing a minimal complexity constraint that motivates diversification (avoiding redundancy in what is stored) and retrieval (re-prioritizing past information when needed). Within this perspective, we propose structuring SCL with LTMs around two core principles of data selection for in-context learning: (1) distribution matching, which balances plasticity and stability, and (2) distribution compression, which controls memory size through diversification and retrieval mechanisms. © 2026 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
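The idea of summarizing an unbounded stream on the fly into a fixed-size memory that can later be queried can be illustrated with a classical stream-learning tool. The sketch below assumes reservoir sampling (Vitter's Algorithm R) as the fixed-size summary and plain nearest-neighbour search as the retrieval step; this is a swapped-in technique for illustration only, since the abstract does not prescribe a specific sketch, and it does not implement the proposed distribution-matching or diversification principles. The class `ReservoirSketch` and its methods are hypothetical names.

```python
import math
import random
from typing import List, Sequence, Tuple

# One stream item: (feature vector, class label).
Example = Tuple[Tuple[float, ...], int]


class ReservoirSketch:
    """Fixed-size uniform sample of an unbounded stream (Algorithm R)."""

    def __init__(self, capacity: int, seed: int = 0):
        self.capacity = capacity
        self.items: List[Example] = []
        self.n_seen = 0
        self.rng = random.Random(seed)

    def update(self, example: Example) -> None:
        """Process one stream item in O(1) time and bounded memory."""
        self.n_seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            # Keep the new item with probability capacity / n_seen,
            # overwriting a uniformly chosen slot.
            j = self.rng.randrange(self.n_seen)
            if j < self.capacity:
                self.items[j] = example

    def retrieve(self, query: Sequence[float], k: int) -> List[Example]:
        """Return up to k stored examples closest to the query (Euclidean)."""
        return sorted(self.items, key=lambda ex: math.dist(ex[0], query))[:k]
```

The rows returned by `retrieve` could then be passed as the in-context training set of a tabular model; how a specific LTM consumes them is outside this sketch. Note that a uniform reservoir favours stability (old and new items are equally likely to be kept), so a recency-weighted variant would be one way to shift the balance toward plasticity.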