Publications - INESC TEC

Publications

Publications by Carlos Manuel Soares

2024

Tabular data generation with tensor contraction layers and transformers

Authors
Silva, A; Restivo, A; Santos, M; Soares, C;

Publication
CoRR

Abstract

2024

Meta-TadGAN: Time Series Anomaly Detection Using TadGAN with Meta-features

Authors
Silva, IOe; Soares, C; Cerqueira, V; Rodrigues, A; Bastardo, P;

Publication
Progress in Artificial Intelligence - 23rd EPIA Conference on Artificial Intelligence, EPIA 2024, Viana do Castelo, Portugal, September 3-6, 2024, Proceedings, Part III

Abstract
TadGAN is a recent algorithm with competitive performance on time series anomaly detection. TadGAN detects anomalies by comparing observed data with generated data. A challenge in anomaly detection is that some anomalies are hard to detect by analyzing the original time series but have a clear effect on its higher-order characteristics. We propose Meta-TadGAN, an adaptation of TadGAN that analyzes meta-level representations of time series. That is, it analyzes a time series that represents the characteristics of the original series, rather than the original series itself. Results on benchmark datasets, as well as on real-world data from fire detectors, show that the new method is competitive with TadGAN. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
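As a rough illustration of what a "meta-level representation" of a time series can look like, the sketch below maps a raw series to a new series of rolling-window statistics. The function name `meta_series`, the window size, and the feature set (window mean, standard deviation, lag-1 autocorrelation) are hypothetical choices made for this example; the meta-features actually used by Meta-TadGAN may differ, and the TadGAN model itself is not shown.

```python
import numpy as np


def meta_series(x, window=24):
    """Map a 1-D series to an (n_windows, 3) array of rolling meta-features.

    Hypothetical feature set chosen for illustration: window mean,
    window standard deviation and lag-1 autocorrelation.
    """
    x = np.asarray(x, dtype=float)
    rows = []
    for start in range(len(x) - window + 1):
        w = x[start:start + window]
        mean = w.mean()
        centered = w - mean
        denom = np.dot(centered, centered)
        # Lag-1 autocorrelation; set to 0 for a constant window to avoid 0/0.
        acf1 = np.dot(centered[:-1], centered[1:]) / denom if denom > 0 else 0.0
        rows.append((mean, w.std(), acf1))
    # reshape keeps the (n, 3) shape even when the series is shorter than the window
    return np.array(rows).reshape(-1, 3)


# A variance shift with no change in the mean level: the raw values keep
# the same average, but the standard-deviation column of the meta-series rises.
rng = np.random.default_rng(0)
signal = np.concatenate([rng.normal(0, 1, 200), rng.normal(0, 3, 100)])
meta = meta_series(signal, window=50)
print(meta.shape)  # (251, 3)
```

An anomaly detector would then be trained on `meta` instead of `signal`, which is the kind of substitution the abstract describes.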


2025

Estimating Completeness of Consensus Models: Geometrical and Distributional Approaches

Authors
Strecht, P; Mendes-Moreira, J; Soares, C;

Publication
MACHINE LEARNING, OPTIMIZATION, AND DATA SCIENCE, LOD 2024, PT I

Abstract
In many organizations with a distributed operation, not only is data collection distributed, but models are also developed and deployed separately. Understanding the combined knowledge of all the local models may be important and challenging, especially in the case of a large number of models. The automated development of consensus models, which aggregate multiple models into a single one, involves several challenges, including fidelity (ensuring that aggregation does not penalize the predictive performance severely) and completeness (ensuring that the consensus model covers the same space as the local models). In this paper, we address the latter, proposing two measures for geometrical and distributional completeness. The first quantifies the proportion of the decision space that is covered by a model, while the second takes into account the concentration of the data that is covered by the model. The use of these measures is illustrated in a real-world example of academic management, as well as four publicly available datasets. The results indicate that distributional completeness in the deployed models is consistently higher than geometrical completeness. Although consensus models tend to be geometrically incomplete, distributional completeness reveals that they cover the regions of the decision space with a higher concentration of data.
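To make the difference between the two notions concrete, here is a minimal sketch assuming that a model's covered region can be described as a union of axis-aligned boxes (as the leaves of a decision tree or a rule set would produce) inside a bounded decision space. Geometrical completeness is estimated here as the share of uniformly sampled points that fall in the covered region, and distributional completeness as the share of observed data points that do. The function names and the Monte Carlo estimator are assumptions for illustration, not the measures as defined in the paper.

```python
import numpy as np


def covered(points, boxes):
    """Boolean mask: which points fall inside at least one box.

    Each box is a (low, high) pair of arrays; bounds are inclusive.
    """
    points = np.asarray(points, dtype=float)
    mask = np.zeros(len(points), dtype=bool)
    for low, high in boxes:
        mask |= np.all((points >= low) & (points <= high), axis=1)
    return mask


def geometric_completeness(boxes, space_low, space_high, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the share of the decision space covered by the boxes."""
    space_low = np.asarray(space_low, dtype=float)
    space_high = np.asarray(space_high, dtype=float)
    rng = np.random.default_rng(seed)
    samples = rng.uniform(space_low, space_high, size=(n_samples, len(space_low)))
    return covered(samples, boxes).mean()


def distributional_completeness(boxes, data):
    """Share of observed data points that fall inside the covered region."""
    return covered(data, boxes).mean()


# One box covering a quarter of the unit square; most data lies inside it.
boxes = [(np.array([0.0, 0.0]), np.array([0.5, 0.5]))]
data = np.array([[0.1, 0.1], [0.2, 0.3], [0.9, 0.9]])
print(geometric_completeness(boxes, [0, 0], [1, 1]))  # close to 0.25
print(distributional_completeness(boxes, data))       # 2 of 3 points covered
```

In this toy case the model looks incomplete geometrically (about a quarter of the space) while covering most of the data, which mirrors the pattern the abstract reports for consensus models.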


2025

Reducing algorithm configuration spaces for efficient search

Authors
Freitas, F; Brazdil, P; Soares, C;

Publication
INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS

Abstract
Many current AutoML platforms include a very large space of alternatives (the configuration space). This increases the probability of including the best one for any dataset but makes the task of identifying it for a new dataset more difficult. In this paper, we explore a method that can reduce a large configuration space to a significantly smaller one, thereby reducing the search time for the potentially best algorithm configuration, with limited risk of significant loss of predictive performance. We empirically validate the method with a large set of alternatives based on five ML algorithms with different sets of hyperparameters and one preprocessing method (feature selection). Our results show that it is possible to reduce the given search space by more than one order of magnitude, from a few thousand to a few hundred items. After reduction, the search for the best algorithm configuration is about one order of magnitude faster than on the original space, without significant loss in predictive performance.


2024

CNP-MLDM: Contract Net Protocol for Negotiation in Machine Learning Data Market

Authors
Baghcheband, H; Soares, C; Reis, LP;

Publication
Proceedings of the Discovery Science Late Breaking Contributions 2024 (DS-LB 2024) co-located with 27th International Conference Discovery Science 2024 (DS 2024), Pisa, Italy, 14-16 October 2024.

Abstract
The Machine Learning Data Market (MLDM), which relies on multi-agent systems, requires robust negotiation strategies to ensure efficient and fair transactions. The Contract Net Protocol (CNP), a well-established negotiation strategy in Multi-Agent Systems (MAS), offers a promising solution. This paper explores the integration of CNP into MLDM, proposing the CNP-MLDM model to facilitate data exchanges. Characterized by its task announcement and bidding process, CNP enhances negotiation efficiency in MLDM. This paper describes CNP tailored for MLDM, detailing the proposed protocol, followed by experimental results. © 2022 Copyright for this paper by its authors.
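The announce, bid, award cycle of a generic Contract Net Protocol round can be sketched as follows, with a buyer agent acting as manager and data sellers acting as contractors. All class and field names (`Announcement`, `Seller`, `min_rows`, `max_price`) and the lowest-price award rule are hypothetical simplifications for illustration; they are not the CNP-MLDM model described in the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Announcement:
    task_id: str
    min_rows: int      # hypothetical requirement: minimum number of rows wanted
    max_price: float   # buyer's reservation price


@dataclass
class Bid:
    seller: str
    rows: int
    price: float


class Seller:
    def __init__(self, name: str, rows: int, unit_price: float):
        self.name = name
        self.rows = rows
        self.unit_price = unit_price

    def bid(self, ann: Announcement) -> Optional[Bid]:
        # Decline if the seller lacks enough data or its price exceeds the budget.
        price = self.unit_price * ann.min_rows
        if self.rows < ann.min_rows or price > ann.max_price:
            return None
        return Bid(self.name, self.rows, price)


def contract_net_round(ann: Announcement, sellers: List[Seller]) -> Optional[Bid]:
    """One CNP round: announce the task, collect bids, award the cheapest eligible bid."""
    bids = [b for b in (s.bid(ann) for s in sellers) if b is not None]
    if not bids:
        return None  # no contractor accepted the announced task
    return min(bids, key=lambda b: b.price)


sellers = [Seller("A", 1000, 0.02), Seller("B", 5000, 0.01), Seller("C", 200, 0.005)]
ann = Announcement("t1", min_rows=500, max_price=20.0)
winner = contract_net_round(ann, sellers)
print(winner)  # seller B wins: C has too few rows, A is more expensive
```

A real market would typically add richer award criteria (data quality, expected model gain) and a final acceptance or rejection message from the contractor; the single-criterion award here only keeps the protocol structure visible.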


2025

Exploring percolation features with polynomial algorithms for classifying Covid-19 in chest X-ray images

Authors
Roberto, GF; Pereira, DC; Martins, AS; Tosta, TAA; Soares, C; Lumini, A; Rozendo, GB; Neves, LA; Nascimento, MZ;

Publication
PATTERN RECOGNITION LETTERS

Abstract
Covid-19 is a severe illness caused by the Sars-CoV-2 virus, which was first identified in China in late 2019 and spread swiftly worldwide. Since the virus primarily affects the lungs, chest X-ray analysis is a reliable and widely accessible means of diagnosing the infection. In computer vision, deep learning models such as CNNs have been the main approach adopted for detecting Covid-19 in chest X-ray images. However, we believe that handcrafted features can also provide relevant results, as shown previously in similar image classification challenges. In this study, we propose a method for identifying Covid-19 in chest X-ray images by extracting and classifying local and global percolation-based features. This technique was tested on three datasets: one comprising 2,002 segmented samples categorized into two groups (Covid-19 and Healthy); another with 1,125 non-segmented samples categorized into three groups (Covid-19, Healthy, and Pneumonia); and a third composed of 4,809 non-segmented images representing three classes (Covid-19, Healthy, and Pneumonia). Then, 48 percolation features were extracted and given as input to six distinct classifiers. Subsequently, the AUC and accuracy metrics were assessed. We used 10-fold cross-validation and evaluated lesion sub-types via binary and multiclass classification using the Hermite polynomial classifier, a novel approach in this domain. The Hermite polynomial classifier exhibited the most promising outcomes compared to five other machine learning algorithms, with best values of 98.72% for accuracy and 0.9917 for AUC. We also evaluated the influence of noise on the features and on the classification accuracy. These results, based on the integration of percolation features with the Hermite polynomial classifier, hold the potential to enhance lesion detection and support clinicians in their diagnostic endeavors.
