Publications

Publications by João Gama

2025

On-device edge learning for IoT data streams: a survey

Authors
Lourenço, A; Rodrigo, J; Gama, J; Marreiros, G;

Publication
CoRR

Abstract

2025

In-context learning of evolving data streams with tabular foundational models

Authors
Lourenço, A; Gama, J; Xing, EP; Marreiros, G;

Publication
CoRR

Abstract

2025

DFDT: Dynamic Fast Decision Tree for IoT Data Stream Mining on Edge Devices

Authors
Lourenço, A; Rodrigo, J; Gama, J; Marreiros, G;

Publication
CoRR

Abstract

2026

Interpretable rules for online failure prediction: a case study on Metro do Porto datasets

Authors
Jakobs, M; Veloso, B; Gama, J;

Publication
INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS

Abstract
Predictive maintenance applications have increasingly been approached with deep learning techniques in recent years due to their high predictive performance. However, as in other real-world application scenarios, the need for explainability is often stated but not sufficiently addressed, which can limit adoption in practice. In this study, we focus on predicting failures of trains operating in Porto, Portugal. While recent works have developed high-performing deep neural network architectures that feature a parallel explainability pipeline, we find that the generated explanations can be hard to comprehend in practice due to their low support over the failure range. We propose a novel online rule-learning approach that generates simple rules covering the entirety of the detected failures. We evaluate our method against AMRules, a state-of-the-art online rule-learning approach, on two datasets gathered from trains operated by Metro do Porto. Our experiments show that our approach consistently generates rules with very high support that are simultaneously short and interpretable.
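To make "support over the failure range" concrete, here is a minimal Python sketch that scores fixed conjunctive rules, assuming support means the fraction of failure instances a rule covers and rule length means its number of conditions. The `Rule` class, the `support_over_failures` function, and the sensor names `oil_temp` and `pressure` are hypothetical; the sketch does not implement the proposed online rule learner or AMRules, it only evaluates rules once they exist.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A condition is (feature_name, operator, threshold); this sketch supports "<=" and ">".
Condition = Tuple[str, str, float]


def _holds(value: float, op: str, threshold: float) -> bool:
    """Evaluate a single threshold condition."""
    if op == "<=":
        return value <= threshold
    if op == ">":
        return value > threshold
    raise ValueError(f"unsupported operator: {op!r}")


@dataclass
class Rule:
    """Hypothetical conjunctive rule: IF all conditions hold THEN flag a failure."""
    conditions: List[Condition]

    def covers(self, x: Dict[str, float]) -> bool:
        # A sample is covered only if every condition holds.
        return all(_holds(x[f], op, t) for f, op, t in self.conditions)

    def length(self) -> int:
        # Rule length = number of conditions; shorter rules are easier to read.
        return len(self.conditions)


def support_over_failures(rule: Rule, samples: List[Dict[str, float]],
                          is_failure: List[bool]) -> float:
    """Fraction of failure samples covered by the rule (0.0 if there are no failures)."""
    failures = [x for x, failed in zip(samples, is_failure) if failed]
    if not failures:
        return 0.0
    return sum(rule.covers(x) for x in failures) / len(failures)


# Toy readings with invented feature names (not from the Metro do Porto data).
samples = [
    {"oil_temp": 95.0, "pressure": 6.1},  # failure
    {"oil_temp": 97.5, "pressure": 5.8},  # failure
    {"oil_temp": 70.0, "pressure": 8.0},  # normal operation
    {"oil_temp": 88.0, "pressure": 5.5},  # failure
]
is_failure = [True, True, False, True]

broad = Rule([("pressure", "<=", 6.5)])
narrow = Rule([("oil_temp", ">", 90.0), ("pressure", "<=", 6.0)])

print(broad.length(), support_over_failures(broad, samples, is_failure))  # 1 1.0
print(narrow.length(), round(support_over_failures(narrow, samples, is_failure), 3))  # 2 0.333
```

In this toy case the one-condition rule covers every failure while the two-condition rule covers only one of three, illustrating the trade-off between short rules and high support that the abstract targets.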

2025

One-Class Learning for Data Stream Through Graph Neural Networks

Authors
Gôlo, MPS; Gama, J; Marcacini, RM;

Publication
INTELLIGENT SYSTEMS, BRACIS 2024, PT IV

Abstract
In many data stream applications, there is a normal concept, and the objective is to distinguish normal from abnormal concepts while training only on instances of the normal concept. This scenario is known in the literature as one-class learning (OCL) for data streams. In this scenario, we highlight two main gaps: (i) a lack of methods based on graph neural networks (GNNs) and (ii) a lack of interpretable methods. We introduce OPENCAST (One-class graPh autoENCoder for dAta STream), a new method for data streams based on OCL and GNNs. Our method learns representations while enclosing the instances of interest in a hypersphere. OPENCAST achieved state-of-the-art results for data streams in the OCL scenario, outperforming seven other methods. Furthermore, because OPENCAST learns low-dimensional representations, it provides interpretability in both the representation learning process and its results.
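The hypersphere idea can be illustrated with a minimal Python sketch of the decision rule used by hypersphere-based one-class methods such as Deep SVDD, assuming the embeddings are already available. In OPENCAST they would come from a graph autoencoder, which is not reproduced here. The class `HypersphereOneClass`, the mean-based center, and the quantile-based radius are illustrative assumptions rather than the method's published details; in a stream setting one might refit it on a sliding window of recent normal instances.

```python
import math
from typing import List, Sequence


class HypersphereOneClass:
    """Hypothetical one-class detector: normal embeddings should fall inside a
    hypersphere with center `center` and radius `radius`."""

    def __init__(self, quantile: float = 0.95) -> None:
        if not 0.0 < quantile <= 1.0:
            raise ValueError("quantile must be in (0, 1]")
        self.quantile = quantile
        self.center: List[float] = []
        self.radius = 0.0

    def fit(self, normal: Sequence[Sequence[float]]) -> "HypersphereOneClass":
        """Fit on normal-class embeddings only (the one-class setting)."""
        if not normal:
            raise ValueError("need at least one normal embedding")
        n, dim = len(normal), len(normal[0])
        # Center: coordinate-wise mean of the normal embeddings.
        self.center = [sum(e[d] for e in normal) / n for d in range(dim)]
        # Radius: empirical quantile of distances to the center, so that
        # roughly `quantile` of the training points lie inside the sphere.
        dists = sorted(math.dist(e, self.center) for e in normal)
        self.radius = dists[min(n - 1, math.ceil(self.quantile * n) - 1)]
        return self

    def score(self, x: Sequence[float]) -> float:
        """Signed distance to the sphere surface: > 0 means outside (abnormal)."""
        return math.dist(x, self.center) - self.radius

    def predict(self, x: Sequence[float]) -> bool:
        """True if `x` is classified as belonging to the normal concept."""
        return self.score(x) <= 0.0


# Toy 2-D embeddings of normal instances.
normal = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
detector = HypersphereOneClass(quantile=1.0).fit(normal)
print(detector.center)               # [0.5, 0.5]
print(detector.predict([0.5, 0.5]))  # True
print(detector.predict([3.0, 3.0]))  # False
```

Keeping the embedding low-dimensional (two dimensions here) is what lets the sphere and the points be inspected directly, which loosely mirrors the interpretability argument in the abstract.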

2025

Evaluating Short Text Stream Clustering on Large E-commerce Datasets

Authors
Andrade, C; Ribeiro, RP; Gama, J;

Publication
INTELLIGENT SYSTEMS, BRACIS 2024, PT III

Abstract
Latent Dirichlet Allocation (LDA) is a fundamental method for clustering short text streams. However, when applied to large datasets, it often faces significant challenges, and its performance is typically evaluated on domain-specific datasets such as news and tweets. This study aims to fill this gap by evaluating the effectiveness of short text clustering methods on a large and diverse e-commerce dataset. We specifically investigate how well these clustering algorithms adapt to the complex dynamics and larger scale of e-commerce text streams, which differ from their usual application domains. Our analysis focuses on the impact of high homogeneity scores on the reported Normalized Mutual Information (NMI) values. In particular, we examine whether these scores are inflated by the prevalence of single-element clusters. To address potential biases in clustering evaluation, we propose using the Akaike Information Criterion (AIC) as an alternative metric to reduce the formation of single-element clusters and provide a more balanced measure of clustering performance. We present new insights for applying short text clustering methodologies in real-world situations, especially in sectors like e-commerce, where text data volumes and dynamics present unique challenges.
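The inflation effect can be reproduced with a small pure-Python sketch of homogeneity (as defined for the V-measure) and NMI with arithmetic-mean normalization, which is the default in scikit-learn's `normalized_mutual_info_score`. Because single-element clusters leave no uncertainty about the class inside any cluster, their homogeneity is 1.0 and their NMI stays well above zero even though nothing is grouped. The function names and toy labels are illustrative, and the AIC-based criterion proposed in the study is not implemented here.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(classes: Sequence[Hashable], clusters: Sequence[Hashable]) -> float:
    """I(C; K) computed from the contingency counts."""
    n = len(classes)
    joint = Counter(zip(classes, clusters))
    n_c, n_k = Counter(classes), Counter(clusters)
    return sum((n_ck / n) * math.log(n * n_ck / (n_c[c] * n_k[k]))
               for (c, k), n_ck in joint.items())


def homogeneity(classes, clusters) -> float:
    """h = 1 - H(C|K) / H(C) = I(C; K) / H(C); equals 1.0 when every cluster holds one class."""
    h_c = entropy(classes)
    return 1.0 if h_c == 0.0 else mutual_information(classes, clusters) / h_c


def nmi(classes, clusters) -> float:
    """NMI with arithmetic-mean normalization: 2 I(C; K) / (H(C) + H(K))."""
    denom = entropy(classes) + entropy(clusters)
    return 1.0 if denom == 0.0 else 2.0 * mutual_information(classes, clusters) / denom


classes = ["a", "a", "a", "b", "b", "b"]
perfect = [0, 0, 0, 1, 1, 1]
singletons = [0, 1, 2, 3, 4, 5]  # every item in its own cluster
one_blob = [0, 0, 0, 0, 0, 0]    # everything in a single cluster

for name, clusters in [("perfect", perfect), ("singletons", singletons), ("one_blob", one_blob)]:
    print(name, round(homogeneity(classes, clusters), 3), round(nmi(classes, clusters), 3))
```

Both the perfect and the singleton clustering reach homogeneity 1.0, and the singleton clustering still scores an NMI of about 0.558, while a single all-in-one cluster scores 0.0 on both. A criterion that penalizes the number of clusters, as AIC does through its parameter term, is one way to counter this.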