Publications

Publications by LIAAD

2025

An explainable machine learning framework for railway predictive maintenance using data streams from the metro operator of Portugal

Authors
García-Méndez, S; de Arriba-Pérez, F; Leal, F; Veloso, B; Malheiro, B; Burguillo-Rial, JC;

Publication
SCIENTIFIC REPORTS

Abstract
The public transportation sector generates large volumes of sensor data that, if analyzed adequately, can help anticipate failures and initiate maintenance actions, thereby enhancing quality and productivity. This work contributes to a real-time data-driven predictive maintenance solution for Intelligent Transportation Systems. The proposed method implements a processing pipeline comprising sample pre-processing, incremental classification with Machine Learning models, and outcome explanation. This novel online processing pipeline has two main highlights: (i) a dedicated sample pre-processing module, which builds statistical and frequency-related features on the fly, and (ii) an explainability module. This work is the first to perform online fault prediction with natural language and visual explainability. The experiments were performed with the MetroPT data set from the metro operator of Porto, Portugal. The results are above 98% for f-measure and 99% for accuracy. In the context of railway predictive maintenance, achieving these high values is crucial due to the practical and operational implications of accurate failure prediction. In the specific case of a high f-measure, this ensures that the system maintains an optimal balance between detecting the highest possible number of real faults and minimizing false alarms, which is crucial for maximizing service availability. Furthermore, the accuracy obtained enables reliability, directly impacting cost reduction and increased safety. The analysis demonstrates that the pipeline maintains high performance even in the presence of class imbalance and noise, and its explanations effectively reflect the decision-making process. These findings validate the methodological soundness of the approach and confirm its practical applicability for supporting proactive maintenance decisions in real-world railway operations. Therefore, by identifying the early signs of failure, this pipeline enables decision-makers to understand the underlying problems and act swiftly.

2025

Informed Data Selection Strategies for Few-Shot Learning on Imbalanced Data

Authors
Alcoforado, A; Ferraz, TP; Okamura, LHT; Veloso, BM; Costa, AHR; Fama, IC; Bueno, BD;

Publication
LINGUAMATICA

Abstract
Acquiring high-quality annotated data remains one of the most significant challenges in Natural Language Processing (NLP), especially for supervised learning approaches. In scenarios where pre-existing labeled data is unavailable, common solutions like crowdsourcing and zero-shot approaches often fall short, suffering from limitations such as the need for large datasets and a lack of guarantees regarding annotation quality. Traditionally, data for human annotation has been selected randomly, a practice that is not only costly and inefficient but also prone to bias, particularly in imbalanced datasets where minority classes are underrepresented. To address these challenges, this work introduces an automatic and informed data selection architecture designed to minimize the volume of required annotations while maximizing the diversity and representativeness of the selected data. Among the evaluated methods, Reverse Semantic Search (RSS) demonstrated superior performance, consistently outperforming random sampling in imbalanced scenarios and enhancing the effectiveness of trained classifiers. Furthermore, we compared RSS with other clustering-based approaches, providing insights into their respective strengths and weaknesses.

2025

Transversal Digital Marketing Curriculum Design

Authors
Pires, PB; Santos, JD; de Brito, PQ; Delgado, C;

Publication
Smart Innovation, Systems and Technologies

Abstract
The advent of new technologies has led to significant changes in the field of marketing, demanding a rethinking of existing knowledge and skills. This research proposes a set of transversal curricula in digital marketing. The methodology employed included an exploratory analysis of digital marketing courses offered at universities and major online platforms, focus groups, and interviews, conducted in four countries: Finland, Poland, the Netherlands, and Portugal. The findings indicated that an introductory course and specialization blocks would be beneficial. Social media, analytics, digital advertising, search engine optimization (SEO), digital marketing strategies, web content, e-mail marketing, customer experience, landing pages, user experience, leads, conversion rate optimization, and e-commerce were identified as the key subjects of study for the introductory course in digital marketing.

2025

Resilient Agent-Based Networks in the Automotive Industry

Authors
Ana Nogueira; Conceição Rocha; Pedro Campos;

Publication
Machine Learning Perspectives of Agent-Based Models

Abstract

2025

Stress-Testing of Multimodal Models in Medical Image-Based Report Generation

Authors
Carvalhido, F; Cardoso, HL; Cerqueira, V;

Publication
THIRTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, AAAI-25, VOL 39 NO 28

Abstract
Multimodal models, namely vision-language models, present unique possibilities through the seamless integration of different information mediums for data generation. These models mostly act as black boxes, lacking transparency and explicability. Reliable results require accountable and trustworthy Artificial Intelligence (AI), particularly when used for critical tasks such as the automatic generation of medical imaging reports for healthcare diagnosis. By exploring stress-testing techniques, multimodal generative models can become more transparent by disclosing their shortcomings, further supporting their responsible usage in the medical field.

2025

CapyMOA: Efficient Machine Learning for Data Streams in Python

Authors
Gomes, HM; Lee, A; Gunasekara, N; Sun, Y; Cassales, GW; Liu, J; Heyden, M; Cerqueira, V; Bahri, M; Koh, YS; Pfahringer, B; Bifet, A;

Publication
CoRR

Abstract
