Publications by LIAAD

2024

Community-Based Topic Modeling with Contextual Outlier Handling

Authors
Andrade, C; Ribeiro, RP; Gama, J;

Publication
ADVANCES IN ARTIFICIAL INTELLIGENCE, CAEPIA 2024

Abstract
E-commerce has become an essential aspect of modern life, providing consumers globally with convenience and accessibility. However, the high volume of short and noisy product descriptions in text streams of massive e-commerce platforms translates into an increased number of clusters, presenting challenges for standard model-based stream clustering algorithms. Standard LDA-based methods often lead to clusters dominated by single elements, effectively failing to manage datasets with varied cluster sizes. Our proposed Community-Based Topic Modeling with Contextual Outlier Handling (CB-TMCOH) algorithm introduces an approach to outlier detection in text data using transformer models for similarity calculations and graph-based clustering. This method efficiently separates outliers and improves clustering in large text datasets, demonstrating its utility not only in e-commerce applications but also proving effective for news and tweets datasets.

2024

From fault detection to anomaly explanation: A case study on predictive maintenance

Authors
Gama, J; Ribeiro, RP; Mastelini, S; Davari, N; Veloso, B;

Publication
JOURNAL OF WEB SEMANTICS

Abstract
Predictive Maintenance applications are increasingly complex, with interactions between many components. Black-box models are popular approaches based on deep-learning techniques due to their predictive accuracy. This paper proposes a neural-symbolic architecture that uses an online rule-learning algorithm to explain when the black-box model predicts failures. The proposed system solves two problems in parallel: (i) anomaly detection and (ii) explanation of the anomaly. For the first problem, we use an unsupervised state-of-the-art autoencoder. For the second problem, we train a rule learning system that learns a mapping from the input features to the autoencoder's reconstruction error. Both systems run online and in parallel. The autoencoder signals an alarm for the examples with a reconstruction error that exceeds a threshold. The causes of the signal alarm are hard for humans to understand because they result from a non-linear combination of sensor data. The rule that triggers that example describes the relationship between the input features and the autoencoder's reconstruction error. The rule explains the failure signal by indicating which sensors contribute to the alarm and allowing the identification of the component involved in the failure. The system can present global explanations for the black-box model and local explanations for why the black-box model predicts a failure. We evaluate the proposed system in a real-world case study of Metro do Porto and provide explanations that illustrate its benefits.
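
The alarm step described in this abstract (flag an example when its reconstruction error exceeds a threshold, then indicate which sensor contributes most) can be illustrated with a minimal sketch. This is not the paper's system: the per-sensor mean below is a trivial stand-in for the autoencoder, the largest-error lookup is a stand-in for the rule learner, and the sensor names, readings, and `margin` parameter are all hypothetical.

```python
from statistics import mean

def fit_baseline(normal_rows):
    """Per-sensor mean of normal data: a trivial stand-in for an autoencoder."""
    n_features = len(normal_rows[0])
    return [mean(row[j] for row in normal_rows) for j in range(n_features)]

def squared_errors(row, baseline):
    """Per-sensor squared difference between a reading and its 'reconstruction'."""
    return [(x - b) ** 2 for x, b in zip(row, baseline)]

def reconstruction_error(row, baseline):
    """Mean squared error across sensors."""
    return mean(squared_errors(row, baseline))

def fit_threshold(normal_rows, baseline, margin=1.5):
    """Hypothetical rule: a margin above the largest error seen on normal data."""
    return margin * max(reconstruction_error(r, baseline) for r in normal_rows)

def check(row, baseline, threshold, sensor_names):
    """Return (alarm, sensor with the largest share of the error)."""
    errors = squared_errors(row, baseline)
    alarm = mean(errors) > threshold
    top_sensor = sensor_names[errors.index(max(errors))]
    return alarm, top_sensor

# Hypothetical sensors and readings, for illustration only.
sensor_names = ["pressure", "temperature", "current"]
normal = [[8.0, 60.0, 4.0], [8.2, 61.0, 4.1], [7.9, 59.0, 3.9], [8.1, 60.0, 4.0]]

baseline = fit_baseline(normal)
threshold = fit_threshold(normal, baseline)

print(check([5.0, 60.5, 4.0], baseline, threshold, sensor_names))
print(check([8.0, 60.2, 4.0], baseline, threshold, sensor_names))
```

In this toy setting the first reading has a large pressure deviation and raises the alarm, while the second stays below the threshold. An online system as described in the abstract would update both components as data arrives, which this batch sketch does not attempt.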

2024

A Neuro-Symbolic Explainer for Rare Events: A Case Study on Predictive Maintenance

Authors
Gama, J; Ribeiro, RP; Mastelini, SM; Davari, N; Veloso, B;

Publication
CoRR

Abstract

2024

Detecting and Explaining Anomalies in the Air Production Unit of a Train

Authors
Davari, N; Veloso, B; Ribeiro, RP; Gama, J;

Publication
39TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2024

Abstract
Predictive maintenance methods play a crucial role in the early detection of failures and errors in machinery, preventing them from reaching critical stages. This paper presents a comprehensive study on a real-world dataset called MetroPT3, with data from a Metro do Porto train's air production unit (APU) system. The dataset comprises data collected from various analogue and digital sensors installed on the APU system, enabling the analysis of behavioural changes and deviations from normal patterns. We propose a data-driven predictive maintenance framework based on a Long Short-Term Memory Autoencoder (LSTM-AE) network. The LSTM-AE efficiently identifies abnormal data instances, leading to a reduction in false alarm rates. We also implement a Sparse Autoencoder (SAE) approach for comparative analysis. The experimental results demonstrate that the LSTM-AE outperforms the SAE regarding F1 Score, Recall, and Precision. Furthermore, to gain insights into the reasons for anomaly detection, we apply the SHAP method to determine the importance of features in the predictive maintenance model. This approach enhances the interpretability of the model to better support the decision-making process.

2024

Super-Resolution Analysis for Landfill Waste Classification

Authors
Molina, M; Ribeiro, RP; Veloso, B; Carna, J;

Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS XXII, PT I, IDA 2024

Abstract
Illegal landfills are a critical issue due to their environmental, economic, and public health impacts. This study leverages aerial imagery for environmental crime monitoring. While advances in artificial intelligence and computer vision hold promise, the challenge lies in training models with high-resolution literature datasets and adapting them to open-access low-resolution images. Considering the substantial quality differences and limited annotation, this research explores the adaptability of models across these domains. Motivated by the necessity for a comprehensive evaluation of waste detection algorithms, it advocates cross-domain classification and super-resolution enhancement to analyze the impact of different image resolutions on waste classification as an evaluation to combat the proliferation of illegal landfills. We observed performance improvements by enhancing image quality but noted an influence on model sensitivity, necessitating careful threshold fine-tuning.

2024

Immigrant groups in Luxembourg's labour market: A symbolic data analysis approach

Authors
Silva, CC; Brito, P; Campos, P;

Publication
STATISTICAL JOURNAL OF THE IAOS

Abstract
Luxembourg, known for its immigration history, attracts immigrants to work. This study analyses different immigrant groups in the labour market from 2014 to 2022 by using Labor Force Survey (LFS) data, Symbolic Data Analysis (SDA), and the Monitoring the Evolution of Clusters (MEC) framework. Based on the birthplace and length of residence in Luxembourg, in each year, microdata were aggregated into 21 symbolic objects. They were primarily described by 16 modal variables, which are multi-valued variables with a frequency attached to each category. Moreover, clustering using complete linkage and Chernoff's distance was applied. The Heuristic Identification of Noisy Variables (HINoV) suggested that with just six variables, objects may be grouped homogeneously. The MEC framework traced temporal relations and transitions between the clusters, revealing some movements across the different years. Results indicate that people from the European Union (EU) and Neighbouring countries have similar profiles while the Portuguese have opposite characteristics. The Luxembourgers are somewhere in between. Profiling people from non-EU countries was challenging. The data and methodology used make it easy to replicate the work in other nations, enabling comparison of results and monitoring to continue in the future.
