2020
Authors
Gelatti, GJ; Rodrigues, PP; Cruz Correia, RJC;
Publication
PROCEEDINGS OF THE 13TH INTERNATIONAL JOINT CONFERENCE ON BIOMEDICAL ENGINEERING SYSTEMS AND TECHNOLOGIES, VOL 5: HEALTHINF
Abstract
Introduction: In 2015 the Directorate-General for Health of Portugal published new standards (DGS 001/2015) for the registration of cesarean section indicators. At the time, data were lacking, which undermined the quality of the indicators and of the analyses based on them. The use of a single software tool was encouraged to register and compare indicators between hospitals, with special attention to the Robson Classification, as it uses basic pregnancy information to classify all deliveries into 10 groups. The selected tool was the Obscare software. Aim: To describe the data quality scenario by analyzing the completeness, from 2016 to 2018, of the obstetric-record variables used in the Robson Classification and collected by the Obscare tool. Methods: Completeness is evaluated using the number of missing values: the lower the completeness, the higher the number of missing values. We also imputed data based on basic concepts and analyzed the contribution of these data to the indication of the type of delivery to be performed, according to the classification suggested by DGS 001/2015. Results: From 2016 to 2018, 5922 pregnancies resulted in 5922 Robson Classifications. The variables with the lowest completeness were related to previous cesarean section (77%) and previous pregnancies (43%). After imputation, these figures fell to 3.9% and 0.56%, respectively, causing 4.6% of the total data to be discarded. Discussion: There is a significant amount of missing data in the basic variables used to study the classification of delivery type. We believe that encouraging data completion, with the possibility of comparing data between hospitals, should be a priority in the health area.
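The completeness measure described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' code: the field names (`parity`, `previous_cesarean`) are hypothetical and do not reflect the Obscare schema, and the single imputation rule (a woman with no previous births cannot have had a previous cesarean section) is only an assumed example of what imputation "based on basic concepts" might look like.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def is_missing(value: Optional[str]) -> bool:
    """Treat None and empty strings as missing values."""
    return value is None or value == ""


def completeness(records: List[Record], variables: List[str]) -> Dict[str, float]:
    """Return, per variable, the fraction of records holding a non-missing value."""
    total = len(records)
    if total == 0:
        return {v: 0.0 for v in variables}
    return {
        v: sum(1 for r in records if not is_missing(r.get(v))) / total
        for v in variables
    }


def impute_previous_cesarean(records: List[Record]) -> List[Record]:
    """Assumed rule: parity 0 implies no previous cesarean section."""
    imputed = []
    for r in records:
        r = dict(r)  # copy so the input records are left unchanged
        if is_missing(r.get("previous_cesarean")) and r.get("parity") == "0":
            r["previous_cesarean"] = "no"
        imputed.append(r)
    return imputed


# Hypothetical toy records, for illustration only.
records = [
    {"parity": "0", "previous_cesarean": None},
    {"parity": "2", "previous_cesarean": "yes"},
    {"parity": None, "previous_cesarean": None},
    {"parity": "1", "previous_cesarean": "no"},
]
variables = ["parity", "previous_cesarean"]

before = completeness(records, variables)
after = completeness(impute_previous_cesarean(records), variables)
print(before, after)
```

Records that remain incomplete after such rules are applied would be the ones discarded from the classification.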
2020
Authors
Bischoff, F; Rodrigues, PP;
Publication
R JOURNAL
Abstract
This article describes tsmp, an R package that implements the Matrix Profile (MP) concept for time series (TS). The tsmp package is a toolkit that supports all-pairs similarity joins, discovery of motifs, discords and chains, semantic segmentation, etc. Here we describe how the tsmp package may be used by showing some of the use cases from the original articles, and we evaluate the speed of the algorithms in the R environment. The package can be downloaded at https://CRAN.R-project.org/package=tsmp.
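To make the idea behind the package concrete, here is a brute-force sketch of a self-join matrix profile in plain Python. It is not the tsmp API and does not use the fast algorithms the package implements. For every subsequence of length `m` it records the z-normalized Euclidean distance to its nearest non-trivial neighbor. The function names and the exclusion zone of `m // 2` are assumptions made for this illustration.

```python
import math
from typing import List, Tuple


def znorm(window: List[float]) -> List[float]:
    """Z-normalize a window; constant windows map to all zeros."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    if std == 0:
        return [0.0] * len(window)
    return [(x - mean) / std for x in window]


def matrix_profile(series: List[float], m: int) -> Tuple[List[float], List[int]]:
    """Naive O(n^2 * m) self-join: nearest-neighbor distance and index per window."""
    n = len(series) - m + 1
    if m < 2 or n < 2:
        raise ValueError("need window length >= 2 and at least two windows")
    windows = [znorm(series[i:i + m]) for i in range(n)]
    exclusion = max(1, m // 2)  # assumed zone that skips trivial self-matches
    profile = [math.inf] * n
    index = [-1] * n
    for i in range(n):
        for j in range(n):
            if abs(i - j) <= exclusion:
                continue
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(windows[i], windows[j])))
            if d < profile[i]:
                profile[i], index[i] = d, j
    return profile, index


# Toy series: the shape [0, 1, 3, 1, 0] occurs at positions 0 and 8.
series = [0, 1, 3, 1, 0, 5, 5, 5, 0, 1, 3, 1, 0]
profile, index = matrix_profile(series, m=5)
motif_start = min(range(len(profile)), key=profile.__getitem__)
print(motif_start, index[motif_start], profile[motif_start])
```

Low values in the profile point to motifs (repeated shapes), while high values point to discords (unusual subsequences).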
2020
Authors
Pereira, RC; Abreu, PH; Rodrigues, PP;
Publication
2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN)
Abstract
The missing data issue is often found in real-world datasets and is usually handled with imputation strategies that replace the missing values with new data. Recently, generative models such as Variational Autoencoders have been applied to this imputation task. However, they have always been used to perform the entire imputation, which has yielded limited results when compared with other state-of-the-art methods. In this work, a new approach called Variational Autoencoder Filter for Bayesian Ridge Imputation is introduced. It uses a Variational Autoencoder at the beginning of the imputation pipeline to filter the instances that are later fitted to a Bayesian ridge regression used to predict the new values. The approach was compared to four state-of-the-art imputation methods using 10 datasets from the healthcare context covering clinical trials, all injected with missing values at different rates. The proposed approach significantly outperformed the remaining methods in all settings, achieving an overall improvement between 26% and 67%.
2020
Authors
Pereira, RC; Santos, JC; Amorim, JP; Rodrigues, PP; Abreu, PH;
Publication
28th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2020, Bruges, Belgium, October 2-4, 2020
Abstract
Missing data is an issue often addressed with imputation strategies that replace the missing values with plausible ones. A trend in these strategies is the use of generative models, one being Variational Autoencoders. However, the default loss function of this method gives the same importance to all data, while a more suitable solution should focus on the missing values. In this work an extension of this method with a custom loss function is introduced (Variational Autoencoder with Weighted Loss). The method was compared with state-of-the-art generative models and the results showed improvements higher than 40% in several settings.
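The difference between a uniform loss and one that focuses on the missing values can be illustrated with a small sketch. This is not the loss defined in the paper: the weighting scheme, the `missing_weight` parameter and the function name are assumptions, and only the reconstruction term is shown (a full Variational Autoencoder loss would also include a KL-divergence term).

```python
from typing import List


def weighted_reconstruction_loss(
    target: List[float],
    reconstruction: List[float],
    missing_mask: List[bool],
    missing_weight: float = 5.0,
) -> float:
    """Weighted mean squared error; entries flagged as missing count more.

    missing_mask[i] is True where the value was missing in the model input
    (for example, artificially removed during training so the target is known).
    """
    if not target or not (len(target) == len(reconstruction) == len(missing_mask)):
        raise ValueError("inputs must be non-empty and of equal length")
    weights = [missing_weight if m else 1.0 for m in missing_mask]
    weighted_sq = sum(
        w * (t - r) ** 2 for w, t, r in zip(weights, target, reconstruction)
    )
    return weighted_sq / sum(weights)


# Toy example: only the last feature was missing and is reconstructed badly.
target = [1.0, 2.0, 3.0, 4.0]
reconstruction = [1.0, 2.0, 3.0, 2.0]
mask = [False, False, False, True]

uniform = weighted_reconstruction_loss(target, reconstruction, mask, missing_weight=1.0)
weighted = weighted_reconstruction_loss(target, reconstruction, mask, missing_weight=5.0)
print(uniform, weighted)
```

With equal weights the error on the missing entry is diluted by the well-reconstructed observed entries, whereas the weighted version penalizes it more heavily.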
2020
Authors
Ferreira-Santos, D; Rodrigues, PP;
Publication
Journal of Medical Internet Research
Abstract
2020
Authors
Pereira, RC; Santos, MS; Rodrigues, PP; Abreu, PH;
Publication
JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH
Abstract
Missing data is a problem often found in real-world datasets and it can degrade the performance of most machine learning models. Several deep learning techniques have been used to address this issue, and one of them is the Autoencoder and its Denoising and Variational variants. These models are able to learn a representation of the data with missing values and generate plausible new ones to replace them. This study surveys the use of Autoencoders for the imputation of tabular data and considers 26 works published between 2014 and 2020. The analysis is mainly focused on discussing patterns and recommendations for the architecture, hyperparameters and training settings of the network, while providing a detailed discussion of the results obtained by Autoencoders when compared to other state-of-the-art methods, and of the data contexts where they have been applied. The conclusions include a set of recommendations for the technical settings of the network, and show that Denoising Autoencoders outperform their competitors, particularly the often used statistical methods.