Publications by Vítor Santos Costa

2024

The Impact of Feature Selection on Balancing, Based on Diabetes Data

Authors
Machado, D; Costa, VS; Brandao, P;

Publication
BIOMEDICAL ENGINEERING SYSTEMS AND TECHNOLOGIES, BIOSTEC 2023

Abstract
Diabetes management data is composed of diverse factors and glycaemia indicators. Glycaemia predictive models tend to focus solely on glycaemia values. A comprehensive understanding of diabetes management requires considering several of its aspects beyond glycaemia. However, including every aspect of diabetes management can create an overly high-dimensional data set. Excessive feature spaces increase computational complexity and may introduce over-fitting. Additionally, the inclusion of inconsequential features introduces noise that hinders a model's performance. Feature importance is a process that evaluates a feature's value, and can be used to identify optimal feature sub-sets. Depending on the context, multiple methods can be used. The drop feature method is considered in the literature to be the best approach to evaluate individual feature importance. To reach an optimal set, the best approach is branch and bound, despite its heavy computational cost. This overhead can be addressed through a trade-off between the feature set's optimisation level and the process's computational feasibility. Improving the feature space has implications for the effectiveness of data balancing approaches. Whilst the impact observed in this study was not substantial, it warrants reconsidering the balancing approach given a superior feature space.
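
The drop feature method mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the model, the data, or the scoring metric used in the paper, so everything below is an assumption made for illustration: a nearest-centroid classifier, a tiny synthetic data set with one informative column and one noise column, and accuracy as the score. The general idea is to retrain the model once per feature with that feature removed, and to take the drop in score relative to the full feature set as the feature's importance.

```python
# Illustrative sketch only: the classifier, data, and metric are assumptions,
# not the setup used in the paper.

def fit_centroids(X, y):
    """Return {label: mean feature vector} for a nearest-centroid classifier."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def predict(centroids, row):
    """Predict the label whose centroid is closest (squared Euclidean distance)."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda lab: dist2(centroids[lab]))

def accuracy(X_train, y_train, X_test, y_test):
    """Train on the training split and return accuracy on the test split."""
    centroids = fit_centroids(X_train, y_train)
    hits = sum(predict(centroids, r) == t for r, t in zip(X_test, y_test))
    return hits / len(y_test)

def drop_column(X, j):
    """Return a copy of X without column j."""
    return [row[:j] + row[j + 1:] for row in X]

def drop_feature_importance(X_train, y_train, X_test, y_test):
    """Importance of feature j = baseline score - score after retraining without j."""
    baseline = accuracy(X_train, y_train, X_test, y_test)
    importances = []
    for j in range(len(X_train[0])):
        score = accuracy(drop_column(X_train, j), y_train,
                         drop_column(X_test, j), y_test)
        importances.append(baseline - score)
    return baseline, importances

# Hypothetical data: column 0 separates the two classes, column 1 is
# unrelated to the label (it takes the same values in both classes).
X, y = [], []
for i in range(40):
    label = i % 2
    informative = 4.0 * label + 0.5 * (i % 3)
    noise = float((i // 2) % 4)
    X.append([informative, noise])
    y.append(label)

X_train, y_train = X[:20], y[:20]
X_test, y_test = X[20:], y[20:]

baseline, importances = drop_feature_importance(X_train, y_train, X_test, y_test)
print("baseline accuracy:", baseline)
print("importance per feature:", importances)
```

In this toy setting, removing the informative column costs accuracy while removing the noise column does not, so the noise column would be the candidate for elimination. Because the model is retrained once per feature, the cost grows with the number of features, which is one reason an exhaustive search for the optimal sub-set becomes expensive.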
