Publications by LIAAD

2018

Evaluation of Oversampling Data Balancing Techniques in the Context of Ordinal Classification

Authors
Domingues, I; Amorim, JP; Abreu, PH; Duarte, H; Santos, JAM;

Publication
2018 International Joint Conference on Neural Networks, IJCNN 2018, Rio de Janeiro, Brazil, July 8-13, 2018

Abstract
Data imbalance is characterized by a discrepancy in the number of examples per class of a dataset. This phenomenon is known to deteriorate the performance of classifiers, since they are less able to learn the characteristics of the less represented classes. For most imbalanced datasets, the application of sampling techniques improves the classifier's performance. For small datasets, oversampling has been shown to be the most appropriate strategy, since it augments the original set of samples. Although several oversampling strategies have been proposed and tested over the years, the work has mostly focused on binary or multi-class tasks. Motivated by medical applications, where there is often an order associated with the classes (increasing likelihood of malignancy, for instance), the present work tests some existing oversampling techniques in ordinal contexts. Moreover, four new oversampling techniques are proposed. Experiments were made on both private and public datasets. The private datasets concern the assessment of response to treatment of oncologic diseases. The 15 public datasets were chosen because they are widely used in the literature. Results show that data balancing techniques improve classification results on ordinal imbalanced datasets, even when these techniques are not specifically designed for ordinal problems. With our pipeline, results better than or equal to published ones were obtained for 10 of the 15 public datasets, with decreases in MMAE of up to 0.43.
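
The simplest baseline behind such techniques, plain random oversampling, can be sketched as follows. This is an illustrative example only, not one of the four techniques proposed in the paper; the data and class counts are made up.

```python
import random

def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class examples until every class matches
    the size of the largest class (the simplest oversampling scheme)."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in sorted(by_class.items()):
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y

# Ordinal labels 0 < 1 < 2 with a severe imbalance in class 2.
X = [[i] for i in range(10)]
y = [0] * 6 + [1] * 3 + [2] * 1
Xb, yb = random_oversample(X, y)
print([yb.count(c) for c in (0, 1, 2)])  # [6, 6, 6]
```

Ordinal-aware methods differ from this baseline mainly in *how* synthetic samples are created, exploiting the ordering between classes rather than duplicating examples blindly.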

2018

Improving the Classifier Performance in Motor Imagery Task Classification: What are the steps in the classification process that we should worry about?

Authors
Santos, MS; Abreu, PH; Rodriguez Bermudez, G; Garcia Laencina, PJ;

Publication
INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS

Abstract
Brain-Computer Interface systems based on motor imagery are able to identify an individual's intent to initiate control through the classification of encephalography patterns. Correctly classifying such patterns is instrumental and strongly depends on a robust machine learning block that is able to properly process the features extracted from a subject's encephalograms. The main objective of this work is to provide an overall view of the machine learning stages, aiming to answer the following question: "What are the steps in the classification process that we should worry about?". The obtained results suggest that future research in the field should focus on two main aspects: exploring techniques for dimensionality reduction, in particular supervised linear approaches, and evaluating adequate validation schemes to allow a more precise interpretation of results.
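
As an illustration of one of the stages the paper highlights (validation schemes), a minimal k-fold cross-validation split can be sketched in pure Python. This is a generic example, not the specific protocol evaluated in the paper.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation:
    every sample lands in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

splits = list(k_fold_indices(10, 5))
print(len(splits))  # 5
# The union of the test folds covers every sample exactly once.
all_test = sorted(i for _, test in splits for i in test)
print(all_test == list(range(10)))  # True
```

The choice of validation scheme matters in BCI work because per-subject data is scarce; an overly optimistic split can easily inflate reported accuracy.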

2018

Missing data imputation via denoising autoencoders: The untold story

Authors
Costa, AF; Santos, MS; Soares, JP; Abreu, PH;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
Missing data consists of the lack of information in a dataset, and since it directly influences classification performance, neglecting it is not a valid option. Over the years, several studies have presented alternative imputation strategies to deal with the three missing data mechanisms: Missing Completely At Random, Missing At Random and Missing Not At Random. However, there are no studies regarding the influence of all three mechanisms on the latest high-performance Artificial Intelligence techniques, such as Deep Learning. The goal of this work is to perform a comparison study between state-of-the-art imputation techniques and a Stacked Denoising Autoencoders approach. To that end, the missing data mechanisms were synthetically generated in 6 different ways, 8 different imputation techniques were implemented, and 33 complete datasets from different open-source repositories were selected. The obtained results show that Support Vector Machines imputation ensures the best classification performance, while Multiple Imputation by Chained Equations performs better in terms of imputation quality.
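
For illustration, the missing-data setup can be sketched with an MCAR mask and a mean-imputation baseline. This is a deliberately simple stand-in; the paper compares much stronger methods (SVM imputation, MICE, Stacked Denoising Autoencoders), none of which are shown here.

```python
import random

def make_mcar(data, rate, seed=0):
    """Mask entries Completely At Random (MCAR): each cell is
    replaced by None independently of any data values."""
    rng = random.Random(seed)
    return [[None if rng.random() < rate else v for v in row]
            for row in data]

def mean_impute(data):
    """Fill each missing cell with its column mean over observed values."""
    cols = list(zip(*data))
    means = [sum(v for v in c if v is not None) /
             sum(v is not None for v in c) for c in cols]
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in data]

X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
Xm = make_mcar(X, rate=0.3, seed=1)
Xi = mean_impute(Xm)
print(all(v is not None for row in Xi for v in row))  # True
```

MAR and MNAR masks are harder to simulate, since the probability of a cell going missing must then depend on observed (MAR) or unobserved (MNAR) values.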

2018

Exploring the Effects of Data Distribution in Missing Data Imputation

Authors
Soares, JP; Santos, MS; Abreu, PH; Araújo, H; Santos, JAM;

Publication
Advances in Intelligent Data Analysis XVII - 17th International Symposium, IDA 2018, 's-Hertogenbosch, The Netherlands, October 24-26, 2018, Proceedings

Abstract

2018

Missing Data Imputation via Denoising Autoencoders: The Untold Story

Authors
Costa, AF; Santos, MS; Soares, JP; Abreu, PH;

Publication
Advances in Intelligent Data Analysis XVII - 17th International Symposium, IDA 2018, 's-Hertogenbosch, The Netherlands, October 24-26, 2018, Proceedings

Abstract

2018

Interpreting deep learning models for ordinal problems

Authors
Amorim, JP; Domingues, I; Abreu, PH; Santos, JAM;

Publication
26th European Symposium on Artificial Neural Networks, ESANN 2018, Bruges, Belgium, April 25-27, 2018

Abstract
Machine learning algorithms have evolved by exchanging simplicity and interpretability for accuracy, which prevents their adoption in critical tasks such as healthcare. Progress can be made by improving the interpretability of complex models while preserving performance. This work introduces an extension of interpretable mimic learning which teaches interpretable models to mimic the predictions of complex deep neural networks, not only on binary problems but also in ordinal settings. The results show that the mimic models have performance comparable to Deep Neural Network models, with the advantage of being interpretable.
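
The core idea of mimic learning, fitting a simple model to a complex model's soft predictions rather than to the ground-truth labels, can be sketched with toy stand-ins. Both the sigmoid "teacher" and the one-threshold "student" below are hypothetical and far simpler than the deep networks and mimic models used in the paper.

```python
import math

def teacher(x):
    """Stand-in for a complex model: a smooth score in [0, 1].
    (Hypothetical; the paper's teachers are deep neural networks.)"""
    return 1.0 / (1.0 + math.exp(-(x - 5.0)))

def fit_threshold_student(xs, soft_labels):
    """Mimic learning in miniature: fit a one-threshold rule to the
    teacher's *soft* predictions instead of the hard training labels."""
    best_t, best_err = None, float("inf")
    for t in xs:
        err = sum(abs((x >= t) - s) for x, s in zip(xs, soft_labels))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

xs = [float(i) for i in range(11)]
soft = [teacher(x) for x in xs]
t = fit_threshold_student(xs, soft)
print(t)  # a threshold near the teacher's decision point, x = 5
```

The student stays interpretable (a single readable rule) while inheriting the teacher's decision boundary; the ordinal extension repeats this idea across the ordered class thresholds.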
