Details

  • Name

    Carlos Manuel Soares
  • Cluster

    Computer Science
  • Role

    External Research Collaborator
  • Since

    1st January 2008
Publications

2023

GASTeN: Generative Adversarial Stress Test Networks

Authors
Cunha, L; Soares, C; Restivo, A; Teixeira, LF;

Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS XXI, IDA 2023

Abstract
Concerns with the interpretability of ML models are growing as the technology is used in increasingly sensitive domains (e.g., health and public administration). Synthetic data can be used to understand models better, for instance, if the examples are generated close to the frontier between classes. However, data augmentation techniques, such as Generative Adversarial Networks (GAN), have been mostly used to generate training data that leads to better models. We propose a variation of GANs that, given a model, generates realistic data that is classified with low confidence by a given classifier. The generated examples can be used in order to gain insights on the frontier between classes. We empirically evaluate our approach on two well-known image classification benchmark datasets, MNIST and Fashion MNIST. Results show that the approach is able to generate images that are closer to the frontier when compared to the original ones, but still realistic. Manual inspection confirms that some of those images are confusing even for humans.
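As an illustration of what "classified with low confidence" can mean in practice, the sketch below scores examples by how far a binary classifier's predicted probability lies from 0.5 and ranks them from least to most confident. The scoring rule and the function names (`confusion_distance`, `rank_by_uncertainty`) are assumptions made for this example; the abstract does not specify how confidence is measured or how it enters the GAN training objective.

```python
from typing import List, Sequence


def confusion_distance(prob_positive: float) -> float:
    """Distance of a binary classifier's output from maximal uncertainty (0.5).

    Assumption for illustration: lower values mean the example lies closer
    to the frontier between the two classes.
    """
    return abs(prob_positive - 0.5)


def rank_by_uncertainty(probs: Sequence[float]) -> List[int]:
    """Return example indices sorted from least to most confidently classified."""
    return sorted(range(len(probs)), key=lambda i: confusion_distance(probs[i]))


if __name__ == "__main__":
    # Hypothetical classifier outputs for four generated images.
    probs = [0.9, 0.55, 0.2, 0.48]
    print(rank_by_uncertainty(probs))
```

A score of this kind could, for instance, be used to pick which generated images to inspect manually first.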

2023

Model Selection for Time Series Forecasting: An Empirical Analysis of Multiple Estimators

Authors
Cerqueira, V; Torgo, L; Soares, C;

Publication
NEURAL PROCESSING LETTERS

Abstract
Evaluating predictive models is a crucial task in predictive analytics. This process is especially challenging with time series data because observations are not independent. Several studies have analyzed how different performance estimation methods compare with each other for approximating the true loss incurred by a given forecasting model. However, these studies do not address how the estimators behave for model selection: the ability to select the best solution among a set of alternatives. This paper addresses this issue. The goal of this work is to compare a set of estimation methods for model selection in time series forecasting tasks. This objective is split into two main questions: (i) analyze how often a given estimation method selects the best possible model; and (ii) analyze what is the performance loss when the best model is not selected. Experiments were carried out using a case study that contains 3111 time series. The accuracy of the estimators for selecting the best solution is low, despite being significantly better than random selection. Moreover, the overall forecasting performance loss associated with the model selection process ranges from 0.28 to 0.58%. Yet, no considerable differences between different approaches were found. Besides, the sample size of the time series is an important factor in the relative performance of the estimators.
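The two questions in the abstract can be made concrete with a small sketch: given, for each time series, the loss each candidate model is estimated to have and the loss it actually incurs, count how often the estimator picks the truly best model and measure the average loss from its choices. The function name `selection_report` and the definition of performance loss as a relative percentage difference are assumptions made for this example; the paper's exact definitions are not given in the abstract.

```python
from typing import List, Sequence, Tuple


def selection_report(
    estimated: Sequence[Sequence[float]],
    true: Sequence[Sequence[float]],
) -> Tuple[float, float]:
    """Summarise model selection quality over a collection of time series.

    estimated[i][m] is the loss an estimation method predicts for model m on
    series i; true[i][m] is the loss that model actually incurs (must be > 0).
    Returns (selection accuracy, mean percentage loss versus the best model).
    """
    hits = 0
    pct_losses: List[float] = []
    for est, tru in zip(estimated, true):
        chosen = min(range(len(est)), key=est.__getitem__)  # model the estimator selects
        best = min(range(len(tru)), key=tru.__getitem__)    # model that is truly best
        hits += int(chosen == best)
        pct_losses.append(100.0 * (tru[chosen] - tru[best]) / tru[best])
    n = len(pct_losses)
    return hits / n, sum(pct_losses) / n


if __name__ == "__main__":
    # Two hypothetical series, two candidate models each.
    estimated = [[1.0, 2.0], [3.0, 2.5]]
    true = [[1.0, 1.5], [2.0, 4.0]]
    print(selection_report(estimated, true))
```

In this toy input the estimator picks the best model on the first series only, so the selection accuracy is 0.5 and the mean percentage loss is driven entirely by the second series.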

2023

Early anomaly detection in time series: a hierarchical approach for predicting critical health episodes

Authors
Cerqueira, V; Torgo, L; Soares, C;

Publication
MACHINE LEARNING

Abstract
The early detection of anomalous events in time series data is essential in many domains of application. In this paper we deal with critical health events, which represent a significant cause of mortality in intensive care units of hospitals. The timely prediction of these events is crucial for mitigating their consequences and improving healthcare. One of the most common approaches to tackle early anomaly detection problems is through standard classification methods. In this paper we propose a novel method that uses a layered learning architecture to address these tasks. One key contribution of our work is the idea of pre-conditional events, which denote arbitrary but computable relaxed versions of the event of interest. We leverage this idea to break the original problem into two hierarchical layers, which we hypothesize are easier to solve. The results suggest that the proposed approach leads to better performance relative to state-of-the-art approaches for critical health episode prediction.

2023

Exploring the Reduction of Configuration Spaces of Workflows

Authors
Freitas, F; Brazdil, P; Soares, C;

Publication
Discovery Science - 26th International Conference, DS 2023, Porto, Portugal, October 9-11, 2023, Proceedings

Abstract
Many current AutoML platforms include a very large space of alternatives (the configuration space), which makes it difficult to identify the best alternative for a given dataset. In this paper we explore a method that can reduce a large configuration space to a significantly smaller one and so help to reduce the search time for the potentially best workflow. We empirically validate the method on a set of workflows that include four ML algorithms (SVM, RF, LogR and LD) with different sets of hyperparameters. Our results show that it is possible to reduce the given space by more than one order of magnitude, from a few thousand to tens of workflows, while the risk that the best workflow is eliminated is nearly zero. The system after reduction is about one order of magnitude faster than the original one, but still maintains the same predictive accuracy and loss.

2023

Federated Learning for Computer-Aided Diagnosis of Glaucoma Using Retinal Fundus Images

Authors
Baptista, T; Soares, C; Oliveira, T; Soares, F;

Publication
Applied Sciences

Abstract
Deep learning approaches require a large amount of data to be transferred to centralized entities. However, this is often not a feasible option in healthcare, as it raises privacy concerns over sharing sensitive information. Federated Learning (FL) aims to address this issue by allowing machine learning without transferring the data to a centralized entity. FL has shown great potential to ensure privacy in digital healthcare while maintaining performance. Despite this, there is a lack of research on the impact of different types of data heterogeneity on the results. In this study, we research the robustness of various FL strategies on different data distributions and data quality for glaucoma diagnosis using retinal fundus images. We use RetinaQualEvaluator to generate quality labels for the datasets and then a data distributor to achieve our desired distributions. Finally, we evaluate the performance of the different strategies on local data and an independent test dataset. We observe that federated learning shows the potential to enable high-performance models without compromising sensitive data. Furthermore, we infer that FedProx is more suitable for scenarios where the distributions and quality of the data of the participating clients are diverse, and that it incurs a lower communication cost.
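For readers unfamiliar with the strategies mentioned, the sketch below shows the two ingredients that distinguish them in their commonly described forms: server-side weighted averaging of client parameters (as in FedAvg) and the proximal penalty that FedProx adds to each client's local objective to keep local parameters close to the global ones. Parameters are represented as plain lists of floats, and the function names and the value of `mu` are assumptions made for this example; this is not the study's implementation.

```python
from typing import List, Sequence


def weighted_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Aggregate client parameters, weighting each client by its sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


def fedprox_penalty(
    local_w: Sequence[float],
    global_w: Sequence[float],
    mu: float,
) -> float:
    """Proximal term (mu / 2) * ||w_local - w_global||^2 added to a client's loss."""
    return 0.5 * mu * sum((l - g) ** 2 for l, g in zip(local_w, global_w))


if __name__ == "__main__":
    # Two hypothetical clients holding 1 and 3 samples respectively.
    print(weighted_average([[1.0, 1.0], [3.0, 5.0]], [1, 3]))
    print(fedprox_penalty([1.0, 2.0], [0.0, 0.0], mu=0.1))
```

A larger `mu` pulls heterogeneous clients more strongly towards the shared model, which is the usual motivation for using a proximal term when client data differ in distribution or quality.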

Supervised theses

2019

An optimization-based wrapper approach for utility-based data mining

Author
José Francisco Cagigal da Silva Gomes

Institution
UP-FEUP

2019

Hyperband for clustering

Author
Diogo Miguel da Rocha Alves

Institution
UP-FEP

2019

Classificação Automática de Episódios Clínicos (Automatic Classification of Clinical Episodes)

Author
Ricardo Manuel da Rocha Melo e Castro

Institution
UP-FEUP

2019

A Supervised Approach to Detect Bias in News Sources

Author
Alexandre Marques de Castro Ribeiro

Institution
UP-FEUP

2019

Ordinal Regression for Stress Levels Classification in Real-World Scenarios

Author
Tiago Bernardes Almeida

Institution
UP-FEUP