Publications - INESC TEC

Publications

Publications by Pedro Pereira Rodrigues

2021

COVID-19 surveillance data quality issues: a national consecutive case series

Authors
Costa Santos, C; Neves, AL; Correia, R; Santos, P; Monteiro Soares, M; Freitas, A; Ribeiro Vaz, I; Henriques, TS; Rodrigues, PP; Costa Pereira, A; Pereira, AM; Fonseca, JA

Publication
BMJ OPEN

Abstract
Objectives: High-quality data are crucial for guiding decision-making and practising evidence-based healthcare, especially if previous knowledge is lacking. Nevertheless, data quality frailties have been exposed worldwide during the current COVID-19 pandemic. Focusing on a major Portuguese epidemiological surveillance dataset, our study aims to assess COVID-19 data quality issues and suggest possible solutions. Settings: On 27 April 2020, the Portuguese Directorate-General of Health (DGS) made available a dataset (DGSApril) for researchers, upon request. On 4 August, an updated dataset (DGSAugust) was also obtained. Participants: All COVID-19-confirmed cases notified through the medical component of the National System for Epidemiological Surveillance until the end of June. Primary and secondary outcome measures: Data completeness and consistency. Results: DGSAugust did not follow the same data format and variables as DGSApril, and a significant amount of missing data and inconsistencies were found (eg, 4075 cases from DGSApril were apparently not included in DGSAugust). Several variables also showed a low degree of completeness and/or changed their values from one dataset to another (eg, the variable 'underlying conditions' had more than half of cases showing different information between datasets). There were also significant inconsistencies between the number of cases and deaths due to COVID-19 shown in DGSAugust and in the DGS reports publicly provided daily. Conclusions: Important quality issues of the Portuguese COVID-19 surveillance datasets were described. These issues can limit surveillance data usability to inform good decisions and perform useful research. Major improvements in surveillance datasets are therefore urgently needed (for example, simplification of data entry processes, constant monitoring of data, and increased training and awareness of healthcare providers), as low data quality may lead to deficient pandemic control.
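The two outcome measures named in this abstract, completeness and consistency, can be illustrated with a minimal Python sketch. It is not the authors' analysis: the record layout, the case identifiers, the variable name `underlying_conditions`, and the toy values are all hypothetical and exist only to show the kind of check described.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record layout: variable name -> value (None or "" means missing).
Record = Dict[str, Optional[str]]


def completeness(records: List[Record], variable: str) -> float:
    """Fraction of records with a non-missing value for `variable`."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(variable) not in (None, ""))
    return filled / len(records)


def compare_snapshots(
    old: Dict[str, Record], new: Dict[str, Record], variable: str
) -> Tuple[List[str], List[str]]:
    """Return (ids present in `old` but absent from `new`,
    ids present in both whose value for `variable` differs)."""
    dropped = sorted(set(old) - set(new))
    changed = sorted(
        cid
        for cid in set(old) & set(new)
        if old[cid].get(variable) != new[cid].get(variable)
    )
    return dropped, changed


# Toy snapshots keyed by a made-up case id (not real surveillance data).
april = {
    "c1": {"underlying_conditions": "yes"},
    "c2": {"underlying_conditions": None},
    "c3": {"underlying_conditions": "no"},
}
august = {
    "c1": {"underlying_conditions": "no"},
    "c2": {"underlying_conditions": None},
}

april_completeness = completeness(list(april.values()), "underlying_conditions")
dropped_ids, changed_ids = compare_snapshots(april, august, "underlying_conditions")
print(april_completeness, dropped_ids, changed_ids)
```

In this toy case one record disappears between snapshots and one changes its value, which mirrors in miniature the two kinds of inconsistency the abstract reports.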

2022

Biomarkers for Alzheimer's Disease in the Current State: A Narrative Review

Authors
Gunes, S; Aizawa, Y; Sugashi, T; Sugimoto, M; Rodrigues, PP

Publication
INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES

Abstract
Alzheimer's disease (AD) has become a problem, owing to its high prevalence in an aging society and the lack of treatment available after onset. Because the disease progresses slowly, early diagnosis is essential for preventive intervention to delay its onset. The current AD diagnostic methods are typically invasive and expensive, limiting their potential for widespread use. Thus, the development of biomarkers in available biofluids, such as blood, urine, and saliva, which would enable minimally invasive or non-invasive, reasonable, and objective evaluation of AD status, is an urgent task. Here, we reviewed studies that examined biomarker candidates for the early detection of AD. Some of the candidates showed potential as biomarkers, but further validation studies are needed. We also reviewed studies of non-invasive biomarkers of AD. Given the complexity of the AD continuum, multiple biomarkers with machine-learning-classification methods have been recently used to enhance diagnostic accuracy and characterize individual AD phenotypes. Artificial intelligence and new body fluid-based biomarkers, in combination with other risk factors, will provide a novel solution that may revolutionize the early diagnosis of AD.

2021

COVID-19 and Its Symptoms' Panoply: A Case-Control Study of 919 Suspected Cases in Locked-Down Ovar, Portugal

Authors
Sá, R; Pinho Bandeira, T; Queiroz, G; Matos, J; Ferreira, JD; Rodrigues, PP

Publication
Portuguese Journal of Public Health

Abstract
Background: Ovar was the first Portuguese municipality to declare active community transmission of SARS-CoV-2, with total lockdown decreed on March 17, 2020. This context provided conditions for a large-scale testing strategy, allowing a referral system considering other symptoms besides the ones that were part of the case definition (fever, cough, and dyspnea). This study aims to identify other symptoms associated with COVID-19 since it may clarify the pre-test probability of the occurrence of the disease. Methods: This case-control study uses primary care registers between March 29 and May 10, 2020 in Ovar municipality. Pre-test clinical and exposure-risk characteristics, reported by physicians, were collected through a form, and linked with their laboratory result. Results: The study population included a total of 919 patients, of whom 226 (24.6%) were COVID-19 cases and 693 were negative for SARS-CoV-2. Only 27.1% of the patients reporting contact with a confirmed or suspected case tested positive. In the multivariate analysis, statistical significance was obtained for headaches (OR 0.558), odynophagia (OR 0.273), anosmia (OR 2.360), and other symptoms (OR 2.157). The interaction of anosmia and odynophagia appeared as possibly relevant with a borderline statistically significant OR of 3.375. Conclusion: COVID-19 has a wide range of symptoms. Of the myriad described, the present study highlights anosmia itself and calls for additional studies on the interaction between anosmia and odynophagia. Headaches and odynophagia by themselves are not associated with an increased risk for the disease. These findings may help clinicians in deciding when to test, especially when other diseases with similar symptoms are more prevalent, namely in winter.

2022

Partial Multiple Imputation With Variational Autoencoders: Tackling Not at Randomness in Healthcare Data

Authors
Pereira, RC; Abreu, PH; Rodrigues, PP

Publication
IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS

Abstract
Missing data can have severe consequences in critical contexts, such as clinical research based on routinely collected healthcare data. This issue is usually handled with imputation strategies, but these tend to produce poor and biased results under the Missing Not At Random (MNAR) mechanism. A recent trend that has been showing promising results for MNAR is the use of generative models, particularly Variational Autoencoders. However, they have a limitation: the imputed values are the result of a single sample, which can be biased. To address this, an extension to the Variational Autoencoder that uses a partial multiple imputation procedure is introduced in this work. The proposed method was compared to 8 state-of-the-art imputation strategies, in an experimental setup with 34 datasets from the medical context, injected with the MNAR mechanism (10% to 80% rates). The results were evaluated through the Mean Absolute Error, with the new method being the overall best in 71% of the datasets, significantly outperforming the remaining ones, particularly for high missing rates. Finally, a case study of a classification task with heart failure data was also conducted, where this method induced improvements in 50% of the classifiers.
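The evaluation protocol described here (inject MNAR missingness, impute, score with Mean Absolute Error) can be sketched in a few lines of Python. This is not the proposed Variational Autoencoder method: plain mean imputation is swapped in as a stand-in baseline, the MNAR rule (hide the largest values) is only one assumed way of making missingness depend on the value itself, and the function names and toy numbers are hypothetical.

```python
from statistics import mean
from typing import List, Optional


def inject_mnar(values: List[float], rate: float) -> List[Optional[float]]:
    """Hide the largest `rate` fraction of values.

    Missingness depends on the (now unobserved) value itself,
    which is what makes the mechanism Missing Not At Random.
    """
    n_missing = round(len(values) * rate)
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    hidden = set(order[:n_missing])
    return [None if i in hidden else v for i, v in enumerate(values)]


def mean_impute(values: List[Optional[float]]) -> List[float]:
    """Baseline imputation: replace each missing entry with the observed mean.

    Assumes at least one value is observed.
    """
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def mae_on_missing(
    truth: List[float], masked: List[Optional[float]], imputed: List[float]
) -> float:
    """Mean Absolute Error computed only over the entries that were hidden."""
    errors = [abs(t - p) for t, m, p in zip(truth, masked, imputed) if m is None]
    return mean(errors)


# Toy feature column (made-up numbers, not from the paper's datasets).
truth = [1.0, 2.0, 3.0, 4.0, 10.0]
masked = inject_mnar(truth, rate=0.2)
imputed = mean_impute(masked)
score = mae_on_missing(truth, masked, imputed)
print(masked, imputed, score)
```

Because the hidden value is the largest one, the observed mean underestimates it badly, which is the kind of bias under MNAR that the abstract says standard imputation strategies suffer from.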

2022

Enabling Early Obstructive Sleep Apnea Diagnosis With Machine Learning: Systematic Review

Authors
Ferreira Santos, D; Amorim, P; Martins, TS; Monteiro Soares, M; Rodrigues, PP

Publication
JOURNAL OF MEDICAL INTERNET RESEARCH

Abstract
Background: American Academy of Sleep Medicine guidelines suggest that clinical prediction algorithms can be used to screen patients with obstructive sleep apnea (OSA) without replacing polysomnography, the gold standard. Objective: We aimed to identify, gather, and analyze existing machine learning approaches that are being used for disease screening in adult patients with suspected OSA. Methods: We searched the MEDLINE, Scopus, and ISI Web of Knowledge databases to evaluate the validity of different machine learning techniques, with polysomnography as the gold standard outcome measure and used the Prediction Model Risk of Bias Assessment Tool (Kleijnen Systematic Reviews Ltd) to assess risk of bias and applicability of each included study. Results: Our search retrieved 5479 articles, of which 63 (1.15%) articles were included. We found 23 studies performing diagnostic model development alone, 26 with added internal validation, and 14 applying the clinical prediction algorithm to an independent sample (although not all reporting the most common discrimination metrics, sensitivity or specificity). Logistic regression was applied in 35 studies, linear regression in 16, support vector machine in 9, neural networks in 8, decision trees in 6, and Bayesian networks in 4. Random forest, discriminant analysis, classification and regression tree, and nomogram were each performed in 2 studies, whereas Pearson correlation, adaptive neuro-fuzzy inference system, artificial immune recognition system, genetic algorithm, supersparse linear integer models, and k-nearest neighbors algorithm were each performed in 1 study. The best area under the receiver operating curve was 0.98 (0.96-0.99) for age, waist circumference, Epworth Somnolence Scale score, and oxygen saturation as predictors in a logistic regression. Conclusions: Although high values were obtained, they still lacked external validation results in large cohorts and a standard OSA criteria definition.
Trial Registration: PROSPERO CRD42021221339; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=221339
(J Med Internet Res 2022;24(9):e39452) doi: 10.2196/39452
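The discrimination metrics mentioned in this abstract, sensitivity and specificity against a gold standard, can be computed as in the short Python sketch below. The labels are made up for illustration (1 = OSA according to the gold standard or the screening model, 0 = no OSA); they are not taken from any of the reviewed studies, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    gold: Sequence[int], predicted: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of `predicted` against `gold`.

    Assumes binary labels and at least one positive and one negative
    case in `gold`.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)  # true positives
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)  # false negatives
    tn = sum(1 for g, p in pairs if g == 0 and p == 0)  # true negatives
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)  # false positives
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical gold-standard results and screening predictions.
gold_labels = [1, 1, 1, 0, 0, 0, 0, 1]
model_labels = [1, 1, 0, 0, 0, 1, 0, 1]

sens, spec = sensitivity_specificity(gold_labels, model_labels)
print(sens, spec)
```

Sensitivity is the share of true OSA cases the model flags, and specificity is the share of non-cases it correctly clears; a screening tool that does not replace the gold standard is usually tuned to favour the former.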

2022

Dataset Comparison Tool: Utility and Privacy

Authors
Almeida, JC; Cruz Correia, RJ; Rodrigues, PP

Publication
MIE

Abstract
Synthetic data has been used increasingly in the last few years. While its applications are various, measuring its utility and privacy is seldom an easy task. Since there are different methods of evaluating these issues, which are dependent on data types, use cases and purpose, a generic method for evaluating utility and privacy does not exist at the moment. So, we introduced a compilation of the most recent methods for evaluating privacy and utility into a single executable in order to create a report of the similarities and potential privacy breaches between two datasets, whether or not synthetic data is involved. We catalogued 24 different methods, from qualitative to quantitative, column-wise or table-wise evaluations. We hope this resource can help scientists and industries more easily get a better grasp of the synthetic data they have and produce, and provide a better basis for creating a new, broader method for evaluating dataset similarities.
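As a rough illustration of what a column-wise utility check and a table-wise privacy check between two datasets might look like, consider the Python sketch below. It does not reproduce the tool or any of its 24 catalogued methods: the two metrics (gap between column means, share of synthetic rows that copy a real row verbatim), the function names, and the toy rows are assumptions chosen for brevity.

```python
from statistics import mean
from typing import List, Tuple

Row = Tuple[float, ...]


def column_mean_gap(real: List[Row], synthetic: List[Row], col: int) -> float:
    """Column-wise utility proxy: absolute difference between column means.

    A smaller gap suggests the synthetic column preserves the real average.
    """
    return abs(mean(r[col] for r in real) - mean(s[col] for s in synthetic))


def exact_match_rate(real: List[Row], synthetic: List[Row]) -> float:
    """Table-wise privacy proxy: share of synthetic rows identical to a real row.

    A high rate hints that real records may have leaked into the synthetic set.
    """
    real_rows = set(real)
    return sum(1 for s in synthetic if s in real_rows) / len(synthetic)


# Toy two-column datasets (made-up values).
real_data = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]
synthetic_data = [(1.0, 10.0), (2.5, 22.0), (2.5, 31.0)]

gap_col0 = column_mean_gap(real_data, synthetic_data, col=0)
gap_col1 = column_mean_gap(real_data, synthetic_data, col=1)
leak_rate = exact_match_rate(real_data, synthetic_data)
print(gap_col0, gap_col1, leak_rate)
```

The two numbers pull in opposite directions: a synthetic set that copies the real one scores perfectly on utility and worst on privacy, which is one reason a single generic measure is hard to define.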
