Publications - INESC TEC

Publications

Publications by LIAAD

2025

Network-Based Anomaly Detection in Waste Transportation Data

Authors
Shaji, N; Tabassum, S; Ribeiro, RP; Gama, J; Santana, P; Garcia, A;

Publication
COMPLEX NETWORKS & THEIR APPLICATIONS XIII, COMPLEX NETWORKS 2024, VOL 1

Abstract
Waste transport management is a critical sector where maintaining accurate records and preventing fraudulent or illegal activities are essential for regulatory compliance, environmental protection, and public safety. However, monitoring and analyzing large-scale waste transport records to identify suspicious patterns or anomalies is a complex task. These records often involve multiple entities and exhibit variability in waste flows between them. Traditional anomaly detection methods that rely solely on individual transaction data may struggle to capture the deeper, network-level anomalies that emerge from the interactions between entities. To address this complexity, we propose a hybrid approach that integrates network-based measures with machine learning techniques for anomaly detection in waste transport data. Our method leverages advanced graph analysis techniques, such as sub-graph detection, community structure analysis, and centrality measures, to extract meaningful features that describe the network's topology. We also introduce novel metrics for edge weight disparities. Further, advanced machine learning techniques, including clustering, neural network, density-based, and ensemble methods, are applied to these structural features to enhance and refine the identification of anomalous behaviors.
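As a rough illustration of the idea of turning network structure into anomaly features, the following minimal sketch computes two per-node features from weighted shipment edges and flags outliers with a median/MAD rule. It is not the authors' method: the disparity used here is the standard concentration measure (sum of squared weight shares), not the novel metrics mentioned in the abstract, and the entity names, weights, and cutoff are hypothetical.

```python
from collections import defaultdict
from statistics import median


def node_features(edges):
    """Return {node: (strength, disparity)} over outgoing weighted edges.

    strength  = total outgoing weight of the node
    disparity = sum of squared weight shares; 1.0 means a single edge
                carries all of the node's outgoing weight.
    """
    outgoing = defaultdict(list)
    for src, _dst, weight in edges:
        outgoing[src].append(weight)
    feats = {}
    for node, weights in outgoing.items():
        total = sum(weights)
        feats[node] = (total, sum((w / total) ** 2 for w in weights))
    return feats


def mad_outliers(values, cutoff=3.5):
    """Flag keys whose modified z-score (median/MAD based) exceeds cutoff."""
    med = median(values.values())
    mad = median(abs(v - med) for v in values.values())
    if mad == 0:
        return set()
    return {k for k, v in values.items() if 0.6745 * abs(v - med) / mad > cutoff}


# Hypothetical shipments: (sender, receiver, tonnes). All names are made up.
edges = [
    ("A", "X", 10), ("A", "Y", 10),
    ("B", "X", 12), ("B", "Y", 9),
    ("C", "X", 11), ("C", "Z", 10),
    ("D", "Y", 9), ("D", "Z", 11),
    ("E", "X", 500), ("E", "Y", 8),
]

feats = node_features(edges)
strengths = {node: f[0] for node, f in feats.items()}
flagged = mad_outliers(strengths)
```

In this toy data, sender E ships far more than its peers and routes almost everything through a single edge, so both its strength and its disparity stand out. In practice such features would feed a learned detector rather than a fixed cutoff.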


2025

Emotion-Enhanced Pain Assessment Protocol

Authors
Alves, B; Almeida, A; Silva, C; Pais, D; Ribeiro, RP; Gama, J; Fernandes, JM; Brás, S; Sebastiao, R;

Publication
HUMAN AND ARTIFICIAL RATIONALITIES. ADVANCES IN COGNITION, COMPUTATION, AND CONSCIOUSNESS, HAR 2024

Abstract
Pain is a highly subjective phenomenon that depends on multiple factors. The common methods used to evaluate pain require the person to be awake and cooperative, which may not always be possible. Moreover, such methods are subject to non-quantifiable influences, namely the impact of an individual's emotional state on how pain is perceived: negative emotions may exacerbate pain perception, while positive emotions may attenuate it. The goal of this study was to conduct a novel protocol for pain induction with emotional elicitation and assess its feasibility. In this protocol, physiological responses were monitored and collected through Electrocardiogram, Electrodermal Activity, and surface Electromyogram signals. Throughout the protocol, pain perception was evaluated using a 0-10 numerical rating scale and by recording the time from the onset of the pain stimulus to the Pain and Tolerance Thresholds. The study comprised three emotional sessions (negative, positive, and neutral), elicited through video excerpts of horror, comedy, and documentary films, respectively, each followed by pain induction using the Cold Pressor Task (CPT). A total of 56 participants completed the study, with a mean CPT time of 91.70 +/- 39.64 s across all sessions. The protocol was considered feasible and safe, as it allowed the collection of physiological data, pain reports, and questionnaire reports from 56 participants without any harm to them. Moreover, the collected data can be further used to assess how emotional conditions influence pain perception and to provide better emotion-calibrated pain recognition systems based on physiological signals.


2025

Evaluating Short Text Stream Clustering on Large E-commerce Datasets

Authors
Andrade, C; Ribeiro, RP; Gama, J;

Publication
INTELLIGENT SYSTEMS, BRACIS 2024, PT III

Abstract
Latent Dirichlet Allocation (LDA) is a fundamental method for clustering short text streams. However, when applied to large datasets, it often faces significant challenges, and its performance is typically evaluated only on domain-specific datasets such as news and tweets. This study aims to fill this gap by evaluating the effectiveness of short text clustering methods on a large and diverse e-commerce dataset. We specifically investigate how well these clustering algorithms adapt to the complex dynamics and larger scale of e-commerce text streams, which differ from their usual application domains. Our analysis focuses on the impact of high homogeneity scores on the reported Normalized Mutual Information (NMI) values. We particularly examine whether these scores are inflated due to the prevalence of single-element clusters. To address potential biases in clustering evaluation, we propose using the Akaike Information Criterion (AIC) as an alternative metric to reduce the formation of single-element clusters and provide a more balanced measure of clustering performance. We present new insights for applying short text clustering methodologies in real-world situations, especially in sectors like e-commerce, where text data volumes and dynamics present unique challenges.
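The concern about single-element clusters can be illustrated with a small worked example. The sketch below computes homogeneity, completeness, and NMI from their entropy definitions for a degenerate clustering that puts every item in its own cluster. The arithmetic-mean normalization of NMI is one common convention and is an assumption here, as are the toy labels; the paper's exact evaluation setup is not given in the abstract.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a label sequence."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())


def mutual_information(truth, pred):
    """Mutual information (in nats) between two labelings of the same items."""
    n = len(truth)
    joint = Counter(zip(truth, pred))
    n_truth, n_pred = Counter(truth), Counter(pred)
    return sum(
        c / n * log(c * n / (n_truth[t] * n_pred[p]))
        for (t, p), c in joint.items()
    )


def scores(truth, pred):
    """Return (homogeneity, completeness, NMI with arithmetic-mean normalization)."""
    h_truth, h_pred = entropy(truth), entropy(pred)
    mi = mutual_information(truth, pred)
    homogeneity = mi / h_truth if h_truth > 0 else 1.0
    completeness = mi / h_pred if h_pred > 0 else 1.0
    nmi = 2 * mi / (h_truth + h_pred) if h_truth + h_pred > 0 else 1.0
    return homogeneity, completeness, nmi


# Toy data: four true classes of two items each.
truth = [0, 0, 1, 1, 2, 2, 3, 3]
# Degenerate clustering: every item is its own cluster.
singletons = list(range(len(truth)))

homogeneity, completeness, nmi = scores(truth, singletons)
```

For this toy case homogeneity is 1.0, completeness is 2/3, and NMI is 0.8, even though the clustering groups nothing together. This is the kind of inflation the abstract refers to: perfect homogeneity from singletons pulls NMI upward.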


2025

Anomaly Detection in Pet Behavioural Data

Authors
Silva, I; Ribeiro, RP; Gama, J;

Publication
MACHINE LEARNING AND PRINCIPLES AND PRACTICE OF KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2023, PT II

Abstract
Pet owners are becoming increasingly conscious of their pets' needs and are paying more attention to their overall wellness. The well-being of pets is intricately linked to their owners' emotional and physical well-being. Some veterinary system solutions are emerging to provide proactive healthcare options for pets. One such solution offers the continuous monitoring of a pet's activity through accelerometer tracking devices. Based on data collected by this application, in this paper, we study different time aggregations and three unsupervised machine learning techniques to identify anomalies in pet behaviour data. Specifically, we apply three algorithms, Isolation Forest, Local Outlier Factor, and K-Nearest Neighbour, with various thresholds to differentiate between normal and abnormal events. Experiments conducted on ten pets (five cats and five dogs) show that the most effective approach is to use daily data divided into periods. Moreover, the Local Outlier Factor is the best algorithm for detecting anomalies when prioritizing the identification of true positives. However, it also produces a high false positive ratio.
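To make the thresholding step concrete, here is a minimal sketch of a K-Nearest Neighbour distance score applied to daily activity divided into periods. It is only an illustration under stated assumptions: the four periods per day, the activity values, the choice of k, and the threshold are all hypothetical, and the paper's actual features and thresholds are not given in the abstract.

```python
from math import dist


def knn_scores(points, k=2):
    """Score each point by its mean distance to its k nearest neighbours."""
    result = []
    for i, p in enumerate(points):
        distances = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(sum(distances[:k]) / k)
    return result


def flag_anomalies(scores, threshold):
    """Mark a day as abnormal when its score exceeds the threshold."""
    return [s > threshold for s in scores]


# Hypothetical activity minutes per day, split into four periods
# (morning, afternoon, evening, night). The last day is unusually
# inactive during the day and active at night.
days = [
    (30, 50, 40, 5),
    (32, 48, 42, 6),
    (29, 52, 39, 4),
    (31, 49, 41, 5),
    (5, 10, 8, 40),
]

day_scores = knn_scores(days, k=2)
flags = flag_anomalies(day_scores, threshold=10.0)
```

Raising the threshold trades missed anomalies for fewer false alarms, which is the trade-off the abstract reports when comparing algorithms.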


2025

Data Science for Fighting Environmental Crime

Authors
Barbosa, M; Ribeiro, C; Gomes, F; Ribeiro, RP; Gama, J;

Publication
MACHINE LEARNING AND PRINCIPLES AND PRACTICE OF KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2023, PT II

Abstract
The rise of environmental crimes has become a major concern globally, as they cause significant damage to ecosystems and public health and result in economic losses. The availability of vast sensor data provides an opportunity to analyze environmental data proactively, helping to detect irregularities and uncover potential criminal activities. This paper highlights the critical role played by machine learning (ML) and remote sensing technologies in the continuously evolving scenarios of environmental crime. By examining some case studies on detecting illegal fishing, illegal oil spills, illegal landfills, and illegal logging, we delve into the practical implementation of data-driven approaches for environmental crime detection. Our goal with this study is to provide an overview of the existing research in this area and foster the use of ML and data science techniques to enhance environmental crime detection.


2025

Histogram approaches for imbalanced data streams regression

Authors
Aminian, E; Ribeiro, RP; Gama, J;

Publication
MACHINE LEARNING

Abstract
Imbalanced domains pose a significant challenge in real-world predictive analytics, particularly in the context of regression. While existing research has primarily focused on batch learning from static datasets, limited attention has been given to imbalanced regression in online learning scenarios. To address this gap, in prior work we proposed sampling strategies based on Chebyshev's inequality as the first methodologies designed explicitly for data streams. However, these approaches operated under the restrictive assumption that rare instances exclusively reside at distribution extremes. This study introduces histogram-based sampling strategies to overcome this constraint, proposing flexible solutions for imbalanced regression in evolving data streams. The proposed techniques - Histogram-based Undersampling (HistUS) and Histogram-based Oversampling (HistOS) - employ incremental online histograms to dynamically detect and prioritize rare instances across arbitrary regions of the target distribution, improving predictions for rare cases. Comprehensive experiments on synthetic and real-world benchmarks demonstrate that HistUS and HistOS substantially improve rare-case prediction accuracy, outperforming baseline models while maintaining competitiveness with Chebyshev-based approaches.
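The general idea of histogram-driven undersampling on a stream can be sketched as follows. This is a simplified illustration, not the authors' HistUS or HistOS: the fixed bin range, the class name `OnlineHistogram`, and the keep-probability rule (rarest non-empty bin count divided by the current bin count) are all assumptions made for the example.

```python
import random


class OnlineHistogram:
    """Fixed-width incremental histogram over an assumed known target range."""

    def __init__(self, low, high, bins):
        self.low = low
        self.width = (high - low) / bins
        self.counts = [0] * bins

    def bin_of(self, y):
        """Index of the bin containing y, clamped to the histogram range."""
        index = int((y - self.low) / self.width)
        return min(max(index, 0), len(self.counts) - 1)

    def update(self, y):
        self.counts[self.bin_of(y)] += 1

    def keep_probability(self, y):
        """Probability of keeping y; call after update(y) so its bin is non-empty.

        Values in the rarest non-empty bin get 1.0, frequent bins get less.
        """
        count = self.counts[self.bin_of(y)]
        rarest = min(c for c in self.counts if c > 0)
        return rarest / count


def undersample(stream, hist, rng):
    """Yield (x, y) pairs, dropping examples from crowded target regions."""
    for x, y in stream:
        hist.update(y)
        if rng.random() < hist.keep_probability(y):
            yield x, y


# Hypothetical stream: one rare target value followed by many common ones.
stream = [("rare", 9.0)] + [(f"common{i}", 1.0) for i in range(200)]
hist = OnlineHistogram(low=0.0, high=10.0, bins=5)
kept = list(undersample(stream, hist, random.Random(0)))
```

Because the histogram is updated one example at a time, the notion of "rare" can sit anywhere in the target range and can shift as the stream evolves. An oversampling variant would replicate examples from sparse bins instead of dropping examples from dense ones.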
