Publications

Publications by LIAAD

2026

Machine Learning and Knowledge Discovery in Databases. Research Track - European Conference, ECML PKDD 2025, Porto, Portugal, September 15-19, 2025, Proceedings, Part II

Authors
Ribeiro, RP; Pfahringer, B; Japkowicz, N; Larrañaga, P; Jorge, AM; Soares, C; Abreu, PH; Gama, J;

Publication
ECML/PKDD (2)

Abstract

2026

Machine Learning and Knowledge Discovery in Databases. Research Track - European Conference, ECML PKDD 2025, Porto, Portugal, September 15-19, 2025, Proceedings, Part I

Authors
Ribeiro, RP; Pfahringer, B; Japkowicz, N; Larrañaga, P; Jorge, AM; Soares, C; Abreu, PH; Gama, J;

Publication
ECML/PKDD (1)

Abstract

2026

Subgroup Discovery Using Model Uncertainty: A Feasibility Study

Authors
Cravidão Pereira, A; Folgado, D; Barandas, M; Soares, C; Carreiro, V;

Publication
Lecture Notes in Computer Science

Abstract
Subgroup discovery aims to identify interpretable segments of a dataset where model behavior deviates from global trends. Traditionally, this involves uncovering patterns among data instances with respect to a target property, such as class labels or performance metrics. For example, classification accuracy can highlight subpopulations where models perform unusually well or poorly. While effective for model auditing and failure analysis, accuracy alone provides a limited view, as it does not reflect model confidence or sources of uncertainty. This work proposes a complementary approach: subgroup discovery using model uncertainty. Rather than identifying where the model fails, we focus on where it is systematically uncertain, even when predictions are correct. Such uncertainty may arise from intrinsic data ambiguity (aleatoric) or poor data representation in training (epistemic). It can highlight areas of the input space where the model’s predictions are less robust or reliable. We evaluate the feasibility of this approach through controlled experiments on the classification of synthetic data and the Iris dataset. While our findings are exploratory and qualitative, they suggest that uncertainty-based subgroup discovery may uncover interpretable regions of interest, providing a promising direction for model auditing and analysis. © 2025 Elsevier B.V., All rights reserved.

2026

A New Proposal of Layer Insertion in Stacked Autoencoder Neural Networks

Authors
Dos Santos Viana, F; Lopes Pereira, BV; Santos, R; Soares, C; de Almeida Neto, A;

Publication
Lecture Notes in Computer Science

Abstract
One strategy for constructing an artificial neural network with multiple hidden layers is to insert layers incrementally in stages. However, for this approach to be effective, each newly added layer must be properly aligned with the previous layers to avoid degradation of the network output and preserve the already learned knowledge. Ideally, inserting new layers should expand the network’s search space, enabling it to explore more complex representations and ultimately improve overall performance. In this work, we present a novel method for layer insertion in stacked autoencoder networks. The method developed maintains the learning obtained before the layer insertion and allows the acquisition of new knowledge; therefore, it is denoted collaborative. This approach allows this kind of neural network to evolve and learn effectively, while significantly reducing the design time. Unlike traditional methods, it addresses the common challenges associated with manually defining the number of layers and the number of neurons in each layer. By automating this aspect of network design, the proposed method promotes scalability and adaptability between tasks. The effectiveness of the approach was validated on multiple binary classification datasets using neural networks initialized with various architectures. The experimental results demonstrate that the method maintains performance while streamlining the architectural design process. © 2025 Elsevier B.V., All rights reserved.

2026

A Scalable Approach for Unified Large Events Models in Soccer

Authors
Mendes Neves, T; Meireles, L; Mendes Moreira, JC;

Publication
Lecture Notes in Computer Science

Abstract
Large Events Models (LEMs) are a class of models designed to predict and analyze the sequence of events in soccer matches, capturing the complex dynamics of the game. The original LEM framework, based on a chain of classifiers, faced challenges such as synchronization, scalability issues, and limited context utilization. This paper proposes a unified and scalable approach to model soccer events using a tabular autoregressive model. Our models demonstrate significant improvements over the original LEM, achieving higher accuracy in event prediction and better simulation quality, while also offering greater flexibility and scalability. The unified LEM framework enables a wide range of applications in soccer analytics that we display in this paper, including real-time match outcome prediction, player performance analysis, and game simulation, serving as a general solution for many problems in the field. © 2025 Elsevier B.V., All rights reserved.

2025

Online boxplot derived outlier detection

Authors
Mazarei, A; Sousa, R; Mendes Moreira, J; Molchanov, S; Ferreira, HM;

Publication
INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS

Abstract
Outlier detection is a widely used technique for identifying anomalous or exceptional events across various contexts. It has proven to be valuable in applications like fault detection, fraud detection, and real-time monitoring systems. Detecting outliers in real time is crucial in several industries, such as financial fraud detection and quality control in manufacturing processes. In the context of big data, the amount of data generated is enormous, and traditional batch mode methods are not practical since the entire dataset is not available. The limited computational resources further compound this issue. Boxplot is a widely used batch mode algorithm for outlier detection that involves several derivations. However, the lack of an incremental closed form for statistical calculations during boxplot construction poses considerable challenges for its application within the realm of big data. We propose an incremental/online version of the boxplot algorithm to address these challenges. Our proposed algorithm is based on an approximation approach that involves numerical integration of the histogram and calculation of the cumulative distribution function. This approach is independent of the dataset's distribution, making it effective for all types of distributions, whether skewed or not. To assess the efficacy of the proposed algorithm, we conducted tests using simulated datasets featuring varying degrees of skewness. Additionally, we applied the algorithm to a real-world dataset concerning software fault detection, which posed a considerable challenge. The experimental results underscored the robust performance of our proposed algorithm, highlighting its efficacy comparable to batch mode methods that access the entire dataset. Our online boxplot method, leveraging dataset distribution to define whiskers, consistently achieved exceptional outlier detection results. 
Notably, our algorithm demonstrated computational efficiency, maintaining constant memory usage with minimal hyperparameter tuning.
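
The histogram-and-CDF idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' algorithm: the class name `OnlineBoxplot`, the fixed value range `low`/`high`, the bin count, and the classic Tukey 1.5 × IQR whiskers are all assumptions made for illustration. The sketch keeps a fixed-size histogram (so memory stays constant), accumulates it into an approximate cumulative distribution function, and reads the quartiles off that CDF by linear interpolation.

```python
import numpy as np


class OnlineBoxplot:
    """Hypothetical constant-memory boxplot approximation (illustrative only)."""

    def __init__(self, low, high, n_bins=100):
        # Assumption: the value range is known in advance and split into equal bins.
        self.edges = np.linspace(low, high, n_bins + 1)
        self.counts = np.zeros(n_bins, dtype=np.int64)

    def update(self, x):
        # Locate the bin for x; values outside [low, high] fall into the edge bins.
        idx = int(np.searchsorted(self.edges, x, side="right")) - 1
        idx = min(max(idx, 0), len(self.counts) - 1)
        self.counts[idx] += 1

    def quantile(self, q):
        total = self.counts.sum()
        if total == 0:
            raise ValueError("no observations yet")
        # Cumulative sum of the histogram gives an approximate CDF at the bin edges.
        cdf = np.concatenate(([0.0], np.cumsum(self.counts) / total))
        # Invert the CDF by linear interpolation between bin edges.
        return float(np.interp(q, cdf, self.edges))

    def fences(self, k=1.5):
        # Assumption: classic Tukey whiskers; the paper defines its own whiskers.
        q1, q3 = self.quantile(0.25), self.quantile(0.75)
        iqr = q3 - q1
        return q1 - k * iqr, q3 + k * iqr

    def is_outlier(self, x, k=1.5):
        lower, upper = self.fences(k)
        return x < lower or x > upper


# Usage: stream values one at a time, then query the current fences.
box = OnlineBoxplot(low=0.0, high=100.0, n_bins=100)
for value in np.linspace(0.0, 100.0, 1001):
    box.update(value)
print(box.fences(), box.is_outlier(200.0))
```

Each update touches a single counter, so the cost per observation does not grow with the length of the stream. The accuracy of the quartiles is bounded by the bin width, which is the main parameter to tune in this sketch.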
