Publications

2025

Bridging Streaming Continual Learning via In-Context Large Tabular Models

Authors
Lourenço, A; Gama, J; Xing, EP; Marreiros, G;

Publication
CoRR

Abstract
In streaming scenarios, models must learn continuously, adapting to concept drifts without erasing previously acquired knowledge. However, existing research communities address these challenges in isolation. Continual Learning (CL) focuses on long-term retention and mitigating catastrophic forgetting, often without strict real-time constraints. Stream Learning (SL) emphasizes rapid, efficient adaptation to high-frequency data streams, but typically neglects forgetting. Recent efforts have tried to combine these paradigms, yet no clear algorithmic overlap exists. We argue that large in-context tabular models (LTMs) provide a natural bridge for Streaming Continual Learning (SCL). In our view, unbounded streams should be summarized on-the-fly into compact sketches that can be consumed by LTMs. This recovers the classical SL motivation of compressing massive streams with fixed-size guarantees, while simultaneously aligning with the experience-replay desiderata of CL. To clarify this bridge, we show how the SL and CL communities implicitly adopt a divide-to-conquer strategy to manage the tension between plasticity (performing well on the current distribution) and stability (retaining past knowledge), while also imposing a minimal complexity constraint that motivates diversification (avoiding redundancy in what is stored) and retrieval (re-prioritizing past information when needed). Within this perspective, we propose structuring SCL with LTMs around two core principles of data selection for in-context learning: (1) distribution matching, which balances plasticity and stability, and (2) distribution compression, which controls memory size through diversification and retrieval mechanisms. © 2026 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2025

A robust methodology for long-term sustainability evaluation of Machine Learning models

Authors
Ruza, JP; Gama, J; Betanzos, AA; Berdiñas, BG;

Publication
CoRR

2025

LP-GRU Model: A Graph Analytics Approach to Detect Misinformation Infiltrators in Online Communities

Authors
Karmakar, D; Malta, MC; Maji, G; Dutta, A;

Publication
International Conference on Communication Systems and Networks, COMSNETS

Abstract
Fighting the propagation of misinformation within a social media group or community by focusing on identifying dishonest members who deliberately try to quash any constructive social movement is very challenging because such people use advanced tactics to create division and doubt by manipulating information. The present research aims to develop a hybrid heuristic model to identify those who intentionally spread misleading information on social media to jeopardize a social movement. We frame this issue under the heading of Graph Semi-supervised Learning (GSSL), and we propose a hybrid model that falls under the heuristic approach, called Label Propagation-Gated Recurrent Unit (LP-GRU). LP-GRU can effectively identify perpetrators of disinformation within social communities by fusing community structure from the Label Propagation algorithm with behavioral patterns identified by GRU. Compared to previous heuristic approaches, we achieve up to 76% accuracy when using the LP-GRU model on augmented semi-synthetic social network data. © 2025 IEEE.

2025

KEIGO: Co-designing Log-Structured Merge Key-Value Stores with a Non-Volatile, Concurrency-aware Storage Hierarchy

Authors
Adao, R; Wu, ZJ; Zhou, CJ; Balmau, O; Paulo, J; Macedo, R;

Publication
PROCEEDINGS OF THE VLDB ENDOWMENT

Abstract
We present Keigo, a concurrency- and workload-aware storage middleware that enhances the performance of log-structured merge key-value stores (LSM KVS) when they are deployed on a hierarchy of storage devices. The key observation behind Keigo is that there is no one-size-fits-all placement of data across the storage hierarchy that optimizes for all workloads. Hence, to leverage the benefits of combining different storage devices, Keigo places files across different devices based on their parallelism, I/O bandwidth, and capacity. We introduce three techniques: concurrency-aware data placement, persistent read-only caching, and context-based I/O differentiation. Keigo is portable across different LSMs, is adaptable to dynamic workloads, and does not require extensive profiling. Our system enables established production KVS such as RocksDB, LevelDB, and Speedb to benefit from heterogeneous storage setups. We evaluate Keigo using synthetic and realistic workloads, showing that it improves the throughput of production-grade LSMs up to 4x for write- and 18x for read-heavy workloads when compared to general-purpose storage systems and specialized LSM KVS.

2025

Histogram approaches for imbalanced data streams regression

Authors
Aminian, E; Ribeiro, RP; Gama, J;

Publication
MACHINE LEARNING

Abstract
Imbalanced domains pose a significant challenge in real-world predictive analytics, particularly in the context of regression. While existing research has primarily focused on batch learning from static datasets, limited attention has been given to imbalanced regression in online learning scenarios. Intending to address this gap, in prior work, we proposed sampling strategies based on Chebyshev's inequality as the first methodologies designed explicitly for data streams. However, these approaches operated under the restrictive assumption that rare instances exclusively reside at distribution extremes. This study introduces histogram-based sampling strategies to overcome this constraint, proposing flexible solutions for imbalanced regression in evolving data streams. The proposed techniques - Histogram-based Undersampling (HistUS) and Histogram-based Oversampling (HistOS) - employ incremental online histograms to dynamically detect and prioritize rare instances across arbitrary regions of the target distribution to improve predictions in the rare cases. Comprehensive experiments on synthetic and real-world benchmarks demonstrate that HistUS and HistOS substantially improve rare-case prediction accuracy, outperforming baseline models while maintaining competitiveness with Chebyshev-based approaches.

2025

Large Language Models and Intelligent Agents in Education

Authors
Brito, WAT; Paulino, A; Mendes, M; Reis, A;

Publication
TECHNOLOGY AND INNOVATION IN LEARNING, TEACHING AND EDUCATION, TECH-EDU 2024, PT I

Abstract
This study examines the potential applications of large language models (LLMs) and intelligent agents in educational environments, with a particular focus on their role in enhancing the quality of teaching and learning processes. It provides a comprehensive overview of LLMs, emphasizing their capabilities in natural language analysis and generation. Furthermore, the study examines the potential for collaboration between LLMs and intelligent agents. While LLMs offer a foundation for AI capabilities, intelligent agents utilize these technologies to perform autonomous and context-aware actions within educational systems. A comparative analysis of various intelligent agent platforms, including Autogen, Langra, Crew AI, LM Studio, and Olama, constitutes a central component of this research. This study addresses the criteria that informed the selection of Crew AI for a case study, with a particular focus on its adaptability, ease of integration, and task execution capabilities in comparison to the other platforms. The research includes an analysis of the platform's performance in a controlled educational environment, highlighting the advantages of Crew AI in system functionality. These results demonstrate the necessity for a strategic and well-structured approach to integrating LLMs and intelligent agents, as their successful implementation can foster new competencies, enhance stakeholder engagement, and offer innovative teaching and learning experiences.
