Details

  • Name

    Carlos Manuel Soares
  • Cluster

    Computer Science
  • Position

    External Collaborating Researcher
  • Since

    01 January 2008
Publications

2023

GASTeN: Generative Adversarial Stress Test Networks

Authors
Cunha, L; Soares, C; Restivo, A; Teixeira, LF

Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS XXI, IDA 2023

Abstract
Concerns with the interpretability of ML models are growing as the technology is used in increasingly sensitive domains (e.g., health and public administration). Synthetic data can be used to understand models better, for instance, if the examples are generated close to the frontier between classes. However, data augmentation techniques, such as Generative Adversarial Networks (GANs), have mostly been used to generate training data that leads to better models. We propose a variation of GANs that, given a classifier, generates realistic data that the classifier labels with low confidence. The generated examples can be used to gain insights into the frontier between classes. We empirically evaluate our approach on two well-known image classification benchmark datasets, MNIST and Fashion MNIST. Results show that the approach generates images that are closer to the frontier than the original ones while remaining realistic. Manual inspection confirms that some of those images are confusing even for humans.
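One way to make "classified with low confidence" concrete is sketched below. It assumes a binary classifier that outputs the probability p of the positive class. An example lying exactly on the decision frontier has p = 0.5, so |p - 0.5| can serve as an illustrative distance to the frontier. The function names and this distance measure are assumptions chosen for illustration, not necessarily the objective used by GASTeN.

```python
from statistics import mean


def frontier_distance(p: float) -> float:
    """Distance of a predicted probability from the binary decision frontier (p = 0.5).

    Illustrative measure (assumption): 0.0 means maximally confused,
    0.5 means fully confident.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    return abs(p - 0.5)


def mean_frontier_distance(probs: list[float]) -> float:
    """Average distance to the frontier over a set of classifier outputs."""
    return mean(frontier_distance(p) for p in probs)


if __name__ == "__main__":
    # Hypothetical classifier outputs, not results from the paper.
    original = [0.97, 0.03, 0.91, 0.12]
    generated = [0.55, 0.42, 0.61, 0.48]
    print(mean_frontier_distance(original))
    print(mean_frontier_distance(generated))
```

Under this measure, a lower mean distance for generated examples than for original ones indicates that the generator places samples nearer the frontier.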

2023

Model Selection for Time Series Forecasting: An Empirical Analysis of Multiple Estimators

Authors
Cerqueira, V; Torgo, L; Soares, C

Publication
NEURAL PROCESSING LETTERS

Abstract
Evaluating predictive models is a crucial task in predictive analytics. This process is especially challenging with time series data because observations are not independent. Several studies have analyzed how different performance estimation methods compare with each other in approximating the true loss incurred by a given forecasting model. However, these studies do not address how the estimators behave for model selection: the ability to select the best solution among a set of alternatives. This paper addresses this issue. The goal of this work is to compare a set of estimation methods for model selection in time series forecasting tasks. This objective is split into two main questions: (i) how often a given estimation method selects the best possible model; and (ii) what the performance loss is when the best model is not selected. Experiments were carried out using a case study that contains 3111 time series. The accuracy of the estimators in selecting the best solution is low, despite being significantly better than random selection. Moreover, the overall forecasting performance loss associated with the model selection process ranges from 0.28 to 0.58%. Yet, no considerable differences between the approaches were found. Finally, the sample size of the time series is an important factor in the relative performance of the estimators.
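The two questions in this abstract can be expressed as small computations. The sketch below makes several assumptions. An estimation method (here, a simple out-of-sample holdout that keeps temporal order) provides an estimated error for each candidate model. The true out-of-sample error of each model is known afterwards. The relative loss is defined as a percentage of the best true error. The function names and this loss definition are illustrative, and the paper's exact definitions may differ.

```python
def holdout_split(series, test_fraction=0.2):
    """Split a series into contiguous train/test parts without shuffling.

    The most recent observations form the test set, which respects temporal
    order because time series observations are not independent.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be in (0, 1)")
    if len(series) < 2:
        raise ValueError("need at least two observations")
    # Keep at least one observation on each side of the cut.
    n_test = min(len(series) - 1, max(1, round(len(series) * test_fraction)))
    cut = len(series) - n_test
    return series[:cut], series[cut:]


def select_model(estimated_errors: dict[str, float]) -> str:
    """Return the name of the model with the lowest estimated error."""
    return min(estimated_errors, key=lambda name: estimated_errors[name])


def selection_outcome(estimated_errors: dict[str, float],
                      true_errors: dict[str, float]) -> tuple[bool, float]:
    """Return (picked_best, relative_loss_percent) for one selection decision.

    relative_loss_percent = 100 * (true error of chosen model - best true error)
                            / best true error  (illustrative; errors must be positive)
    """
    chosen = select_model(estimated_errors)
    best_error = min(true_errors.values())
    if best_error <= 0:
        raise ValueError("true errors must be positive")
    loss = 100.0 * (true_errors[chosen] - best_error) / best_error
    return true_errors[chosen] == best_error, loss


if __name__ == "__main__":
    # Hypothetical errors for three candidate models on one series.
    estimated = {"arima": 1.10, "ets": 1.05, "rf": 1.20}
    true = {"arima": 2.00, "ets": 2.10, "rf": 2.30}
    print(selection_outcome(estimated, true))
```

Averaging `picked_best` and `relative_loss_percent` over many series gives aggregate figures that are similar in spirit to the selection accuracy and the performance loss discussed in the abstract.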

2023

Early anomaly detection in time series: a hierarchical approach for predicting critical health episodes

Authors
Cerqueira, V; Torgo, L; Soares, C

Publication
MACHINE LEARNING

Abstract
The early detection of anomalous events in time series data is essential in many application domains. In this paper, we deal with critical health events, which are a significant cause of mortality in hospital intensive care units. The timely prediction of these events is crucial for mitigating their consequences and improving healthcare. One of the most common approaches to early anomaly detection problems is to use standard classification methods. In this paper, we propose a novel method that uses a layered learning architecture to address these tasks. One key contribution of our work is the idea of pre-conditional events, which denote arbitrary but computable relaxed versions of the event of interest. We leverage this idea to break the original problem into two hierarchical layers, which we hypothesize are easier to solve. The results suggest that the proposed approach leads to better performance than state-of-the-art approaches for critical health episode prediction.
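The following is a minimal sketch of the pre-conditional event idea, built on assumptions that are stated here. The event of interest is a reading falling below a critical level (a hypothetical 60) within a short horizon. The pre-conditional event is a relaxed version of it, a reading falling below a looser level (a hypothetical 70). Every event is therefore also a pre-conditional event. A first layer can then predict the pre-conditional event, and a second layer, trained only where it occurs, can estimate P(event | pre). One simple way to combine them is P(event) = P(pre) · P(event | pre). The thresholds, function names and combination rule are illustrative and are not taken from the paper.

```python
# Hypothetical thresholds for illustration only (not taken from the paper).
CRITICAL = 60.0  # the event of interest: a value below this level
RELAXED = 70.0   # the pre-conditional event: a relaxed, looser version of it


def future_labels(values, horizon, threshold):
    """For each time t, return 1 if any of the next `horizon` values falls
    below `threshold`, else 0."""
    labels = []
    for t in range(len(values)):
        future = values[t + 1 : t + 1 + horizon]
        labels.append(int(any(v < threshold for v in future)))
    return labels


def second_layer_training_set(samples, pre_labels, event_labels):
    """Keep only the samples where the pre-conditional event occurs.

    Layer 1 learns to predict the pre-conditional event on all samples;
    layer 2 learns P(event | pre) on this subset (illustrative assumption).
    """
    return [(x, e) for x, p, e in zip(samples, pre_labels, event_labels) if p == 1]


def event_probability(p_pre, p_event_given_pre):
    """Combine the layers as P(event) = P(pre) * P(event | pre).

    This factorization holds because CRITICAL < RELAXED with the same horizon,
    so every event is also a pre-conditional event.
    """
    return p_pre * p_event_given_pre


if __name__ == "__main__":
    series = [80, 75, 68, 65, 58, 62, 85]  # hypothetical readings
    pre = future_labels(series, horizon=2, threshold=RELAXED)
    event = future_labels(series, horizon=2, threshold=CRITICAL)
    print(pre, event)
    print(second_layer_training_set(list(range(len(series))), pre, event))
```

The second layer sees a smaller and more balanced subset than a single flat classifier would. This is one plausible reading of why the hierarchical split is hypothesized to be easier to learn.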

2022

Meta-features for meta-learning

Authors
Rivolli, A; Garcia, LPF; Soares, C; Vanschoren, J; de Carvalho, ACPLF

Publication
KNOWLEDGE-BASED SYSTEMS

Abstract
Meta-learning is increasingly used to support the recommendation of machine learning algorithms and their configurations. These recommendations are made based on meta-data, consisting of performance evaluations of algorithms and characterizations of prior datasets. These characterizations, also called meta-features, describe properties of the data that are predictive of the performance of machine learning algorithms trained on them. Unfortunately, despite being used in many studies, meta-features are not uniformly described, organized and computed, making many empirical studies irreproducible and hard to compare. This paper aims to address this by systematizing and standardizing data characterization measures for classification datasets used in meta-learning. Moreover, it presents an extensive list of meta-features and characterization tools, which can be used as a guide for new practitioners. By identifying particularities and subtle issues related to the characterization measures, this survey points out possible future directions for the development of meta-features for meta-learning.
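The sketch below illustrates what a meta-feature is. It computes a few simple and information-theoretic characterizations of a small classification dataset: the number of instances, attributes and classes, the attribute-to-instance ratio, and the Shannon entropy of the class distribution. The selection of measures and the key names are assumptions made for illustration. The survey catalogs many more measures, and established characterization tools are preferable in practice.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (in bits) of the class distribution."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def simple_meta_features(X, y):
    """Compute a handful of simple and information-theoretic meta-features.

    X: list of rows (each row a list of attribute values); y: list of class labels.
    Key names are hypothetical, chosen for this example.
    """
    if not X or len(X) != len(y):
        raise ValueError("X and y must be non-empty and of equal length")
    n_instances = len(X)
    n_attributes = len(X[0])
    return {
        "n_instances": n_instances,
        "n_attributes": n_attributes,
        "n_classes": len(set(y)),
        "attr_to_inst": n_attributes / n_instances,
        "class_entropy": class_entropy(y),
    }


if __name__ == "__main__":
    X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]  # toy dataset
    y = ["a", "a", "b", "b"]
    print(simple_meta_features(X, y))
```

Computed over many datasets, vectors like this one become the inputs of a meta-learner that predicts algorithm performance.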

2022

Multidimensional Subgroup Discovery on Event Logs

Authors
Ribeiro, J; Fontes, T; Soares, C; Borges, J

Publication
SSRN Electronic Journal


Supervised Theses

2019

Automatic Classification of Clinical Episodes

Author
Ricardo Manuel da Rocha Melo e Castro

Institution
UP-FEUP

2019

A Supervised Approach to Detect Bias in News Sources

Author
Alexandre Marques de Castro Ribeiro

Institution
UP-FEUP

2019

Ordinal Regression for Stress Levels Classification in Real-World Scenarios

Author
Tiago Bernardes Almeida

Institution
UP-FEUP

2019

Dataset morphing to analyze the performance of recommender systems

Author
André Gomes Ferreira Araújo Correia

Institution
UP-FEUP

2019

Automatic Interpretation of Promotional Leaflets in Retail for Pricing Strategy

Author
António Maria Aires Pereira Teixeira de Melo

Institution
UP-FEUP