Publications - INESC TEC

Publications by Bruno Miguel Veloso

2025

Interpretable Rules for Online Failure Prediction: A Case Study on the Metro do Porto dataset

Authors
Jakobs, M; Veloso, B; Gama, J;

Publication
CoRR

Abstract

2025

Prioritisation of Studies In Sustainable Urban Mobility Via Fuzzy-Topsis: A Methodological Approach For Systematic Reviews

Authors
Arianna Teixeira Pereira; Janielle Da Silva Lago; Yvelyne Bianca Iunes Santos; Bruno Miguel Delindro Veloso; Norma Ely Santos Beltrão;

Publication
Revista de Gestão Social e Ambiental

Abstract
Objective: This study investigates the applicability of systematic methods to the identification and evaluation of studies on sustainable urban mobility, providing input to guide managers and policymakers in developing efficient and environmentally responsible public policies. Method: The methodology comprises a Systematic Literature Review (SLR) combined with the Fuzzy-TOPSIS method, a multi-criteria model capable of evaluating and prioritizing studies while accounting for the imprecision inherent in decision-making processes. The PICO technique was used to define the analysis criteria, and the PRISMA protocol ensured the transparency and replicability of the results. Six criteria were established in the qualitative analyses for treatment in the Fuzzy-TOPSIS method. Results and Discussion: The proposed approach proved effective in selecting the most relevant studies. The discussion points to the need to integrate Fuzzy-TOPSIS with complementary methods, such as DEMATEL and Social Network Analysis (SNA), in order to improve the modeling of causal relationships and strengthen the reliability of prioritization. Research Implications: The results offer important insights for urban planning and the formulation of public policies, contributing to energy efficiency, reducing GHG emissions and improving the quality of public transport. Originality/Value: The innovation of this study lies in the combination of quantitative and qualitative approaches to the analysis of sustainable mobility, providing a robust benchmark that can positively influence practices and strategies in urban management.
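
The abstract does not spell out the Fuzzy-TOPSIS computation, so the following is only a minimal sketch of one common textbook formulation (triangular fuzzy numbers, benefit-type criteria, vertex distance to the ideal points (1, 1, 1) and (0, 0, 0)). The function names, ratings, and weights are hypothetical and are not taken from the paper.

```python
from math import sqrt
from typing import List, Tuple

# A triangular fuzzy number (lower, modal, upper); assumed representation.
TFN = Tuple[float, float, float]


def vertex_distance(a: TFN, b: TFN) -> float:
    """Vertex-method distance between two triangular fuzzy numbers."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / 3.0)


def fuzzy_topsis(ratings: List[List[TFN]], weights: List[TFN]) -> List[float]:
    """Return one closeness coefficient per alternative (higher ranks first).

    Assumes every criterion is a benefit criterion and that every
    column contains at least one strictly positive upper bound.
    """
    n_criteria = len(weights)
    # Normalise each column by the largest upper bound found in it.
    col_max = [max(row[j][2] for row in ratings) for j in range(n_criteria)]
    scores = []
    for row in ratings:
        d_pos = 0.0  # distance to the fuzzy positive ideal (1, 1, 1)
        d_neg = 0.0  # distance to the fuzzy negative ideal (0, 0, 0)
        for j, (low, mid, up) in enumerate(row):
            w = weights[j]
            weighted = (
                low / col_max[j] * w[0],
                mid / col_max[j] * w[1],
                up / col_max[j] * w[2],
            )
            d_pos += vertex_distance(weighted, (1.0, 1.0, 1.0))
            d_neg += vertex_distance(weighted, (0.0, 0.0, 0.0))
        scores.append(d_neg / (d_pos + d_neg))
    return scores


# Hypothetical example: two studies rated on two criteria.
studies = [
    [(7.0, 9.0, 10.0), (7.0, 9.0, 10.0)],  # study A, rated high
    [(1.0, 3.0, 5.0), (1.0, 3.0, 5.0)],    # study B, rated low
]
criteria_weights = [(0.7, 0.9, 1.0), (0.5, 0.7, 0.9)]
closeness = fuzzy_topsis(studies, criteria_weights)
```

Sorting the studies by descending closeness coefficient gives the prioritisation order under these assumptions.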

2025

Efficient Instance Selection in Tree-Based Models for Data Streams Classification

Authors
Paim, AM; Gama, J; Veloso, B; Enembreck, F; Ribeiro, RP;

Publication
40TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING

Abstract
Learning from continuous data streams is a relevant area within machine learning, focusing on the creation and updating of predictive models in real time as new data becomes available for training and prediction. Among the most widely used methods for this type of task, Hoeffding Trees are highly valued for their simplicity and robustness across a variety of applications and are considered the primary choice for generating decision trees in data stream contexts. However, Hoeffding Trees tend to expand continuously as new data is incorporated, resulting in increased processing time and memory consumption, often without providing significant gains in accuracy. In this study, we propose an instance selection scheme that combines different strategies to regularize Hoeffding Trees and their variants, mitigating excessive growth without compromising model accuracy. The method selects misclassified instances and a fraction of correctly classified instances during the training phase. After extensive experimental evaluation, the instance selection scheme demonstrates superior predictive performance compared to the original models (without selection) on both real and synthetic data stream datasets, using a reduced subset of examples. Additionally, the method achieves relevant improvements in processing time, model complexity, and memory consumption, highlighting the effectiveness of the proposed instance selection scheme.
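
The selection rule described above (train on every misclassified instance plus a fraction of the correctly classified ones) can be sketched as a test-then-train loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `MajorityClassLearner` is a toy stand-in for a Hoeffding Tree, and `selective_train`, `keep_fraction`, and `seed` are hypothetical names. Any incremental model exposing `predict_one` and `learn_one` methods could be substituted.

```python
import random
from collections import Counter
from typing import Any, Dict, Iterable, Tuple


class MajorityClassLearner:
    """Toy incremental model (NOT a Hoeffding Tree): predicts the most frequent label seen."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()
        self.n_trained = 0  # number of instances actually used for training

    def predict_one(self, x: Dict[str, Any]) -> Any:
        return self.counts.most_common(1)[0][0] if self.counts else None

    def learn_one(self, x: Dict[str, Any], y: Any) -> None:
        self.counts[y] += 1
        self.n_trained += 1


def selective_train(
    model: MajorityClassLearner,
    stream: Iterable[Tuple[Dict[str, Any], Any]],
    keep_fraction: float = 0.2,
    seed: int = 0,
) -> float:
    """Test-then-train with instance selection; returns prequential accuracy."""
    rng = random.Random(seed)
    correct = 0
    seen = 0
    for x, y in stream:
        hit = model.predict_one(x) == y  # predict before the label is used
        correct += int(hit)
        seen += 1
        # Always learn from mistakes; learn from hits only with probability keep_fraction.
        if not hit or rng.random() < keep_fraction:
            model.learn_one(x, y)
    return correct / seen if seen else 0.0


# Hypothetical usage: a stream of 100 instances sharing one label.
demo_model = MajorityClassLearner()
demo_stream = [({"f": i}, "a") for i in range(100)]
demo_accuracy = selective_train(demo_model, demo_stream, keep_fraction=0.0)
```

With `keep_fraction=0.0` only misclassified instances are learned, and with `keep_fraction=1.0` the loop reduces to ordinary training on every instance.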

2024

Anonymised Phone Call Dataset for Anomaly Detection

Authors
Veloso, B; Martins, C; Espanha, R; Silva, PR; Azevedo, R; Gama, J;

Publication

Abstract

2025

An explainable machine learning framework for railway predictive maintenance using data streams from the metro operator of Portugal

Authors
García-Méndez, S; de Arriba-Pérez, F; Leal, F; Veloso, B; Malheiro, B; Burguillo-Rial, JC;

Publication
SCIENTIFIC REPORTS

Abstract
The public transportation sector generates large volumes of sensor data that, if analyzed adequately, can help anticipate failures and initiate maintenance actions, thereby enhancing quality and productivity. This work contributes to a real-time data-driven predictive maintenance solution for Intelligent Transportation Systems. The proposed method implements a processing pipeline composed of sample pre-processing, incremental classification with Machine Learning models, and outcome explanation. This novel online processing pipeline has two main highlights: (i) a dedicated sample pre-processing module, which builds statistical and frequency-related features on the fly, and (ii) an explainability module. This work is the first to perform online fault prediction with natural language and visual explainability. The experiments were performed with the Metropt data set from the metro operator of Porto, Portugal. The results are above 98% for f-measure and 99% for accuracy. In the context of railway predictive maintenance, achieving these high values is crucial due to the practical and operational implications of accurate failure prediction. In the specific case of a high f-measure, this ensures that the system maintains an optimal balance between detecting the highest possible number of real faults and minimizing false alarms, which is crucial for maximizing service availability. Furthermore, the accuracy obtained enables reliability, directly impacting cost reduction and increased safety. The analysis demonstrates that the pipeline maintains high performance even in the presence of class imbalance and noise, and its explanations effectively reflect the decision-making process. These findings validate the methodological soundness of the approach and confirm its practical applicability for supporting proactive maintenance decisions in real-world railway operations. Therefore, by identifying the early signs of failure, this pipeline enables decision-makers to understand the underlying problems and act swiftly.
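
As a rough illustration of building statistical features on the fly, the sketch below keeps a running mean and sample standard deviation for one sensor channel using Welford's online algorithm. It is an assumption about what such a pre-processing step might look like, not the pipeline from the paper: the names `RunningStats` and `featurise` are hypothetical, and the frequency-related features mentioned in the abstract are not covered.

```python
from math import sqrt
from typing import Dict, Iterable, Iterator


class RunningStats:
    """Running mean and sample standard deviation (Welford's online algorithm)."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Sample standard deviation; defined as 0.0 until two readings exist.
        return sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


def featurise(readings: Iterable[float]) -> Iterator[Dict[str, float]]:
    """Yield one feature dictionary per reading without storing the stream."""
    stats = RunningStats()
    for x in readings:
        stats.update(x)
        yield {"value": x, "mean": stats.mean, "std": stats.std}


# Hypothetical sensor readings.
demo_features = list(featurise([2.0, 4.0, 6.0]))
```

Each yielded dictionary could be passed directly to an incremental classifier, so memory use stays constant regardless of stream length.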

2025

Informed Data Selection Strategies for Few-Shot Learning on Imbalanced Data

Authors
Alcoforado, A; Ferraz, TP; Okamura, LHT; Veloso, BM; Costa, AHR; Fama, IC; Bueno, BD;

Publication
LINGUAMATICA

Abstract
Acquiring high-quality annotated data remains one of the most significant challenges in Natural Language Processing (NLP), especially for supervised learning approaches. In scenarios where pre-existing labeled data is unavailable, common solutions like crowdsourcing and zero-shot approaches often fall short, suffering from limitations such as the need for large datasets and a lack of guarantees regarding annotation quality. Traditionally, data for human annotation has been selected randomly, a practice that is not only costly and inefficient but also prone to bias, particularly in imbalanced datasets where minority classes are underrepresented. To address these challenges, this work introduces an automatic and informed data selection architecture designed to minimize the volume of required annotations while maximizing the diversity and representativeness of the selected data. Among the evaluated methods, Reverse Semantic Search (RSS) demonstrated superior performance, consistently outperforming random sampling in imbalanced scenarios and enhancing the effectiveness of trained classifiers. Furthermore, we compared RSS with other clustering-based approaches, providing insights into their respective strengths and weaknesses.
