Details

  • Name

    Rui Camacho
  • Cluster

    Computer Science
  • Role

    Senior Researcher
  • Since

    1 January 2011
Publications

2022

Machine learning methods to predict attrition in a population-based cohort of very preterm infants

Authors
Teixeira, R; Rodrigues, C; Moreira, C; Barros, H; Camacho, R;

Publication
SCIENTIFIC REPORTS

Abstract
The timely identification of cohort participants at higher risk for attrition is important for earlier interventions and efficient use of research resources. Machine learning may have advantages over conventional approaches in improving discrimination by analysing complex interactions among predictors. We developed predictive models of attrition applying a conventional regression model and different machine learning methods. A total of 542 very preterm (< 32 gestational weeks) infants born in Portugal as part of the European Effective Perinatal Intensive Care in Europe (EPICE) cohort were included. We tested a model with a fixed number of predictors (Baseline) and a second with a dynamic number of variables added from each follow-up (Incremental). Eight classification methods were applied: AdaBoost, Artificial Neural Networks, Functional Trees, J48, J48Consolidated, K-Nearest Neighbours, Random Forest and Logistic Regression. Performance was compared using AUC-PR (Area Under the Precision-Recall Curve), Accuracy, Sensitivity and F-measure. Attrition rates at the four follow-ups were, respectively, 16%, 25%, 13% and 17%. Both models demonstrated good predictive performance, with AUC-PR ranging from 69 to 94.1 in the Baseline model and from 72.5 to 97.1 in the Incremental model. Of the whole set of methods, Random Forest presented the best performance at all follow-ups [AUC-PR1: 94.1 (2.0); AUC-PR2: 91.2 (1.2); AUC-PR3: 97.1 (1.0); AUC-PR4: 96.5 (1.7)]. Logistic Regression performed well below Random Forest. The top-ranked predictors were common to both models in all follow-ups: birthweight, gestational age, maternal age, and length of hospital stay. Random Forest presented the highest capacity for prediction and provided interpretable predictors. Researchers involved in cohorts can benefit from our robust models to prepare for and prevent loss to follow-up by directing efforts toward individuals at higher risk.

2022

A Novel Multi-View Ensemble Learning Architecture to Improve the Structured Text Classification

Authors
Gonçalves, CA; Vieira, AS; Gonçalves, CT; Camacho, R; Iglesias, EL; Diz, LB;

Publication
Information (Switzerland)

Abstract
Multi-view ensemble learning exploits the information of data views. To test its efficiency for full text classification, a technique has been implemented where the views correspond to the document sections. For classification and prediction, we use a stacking generalization based on the idea that different learning algorithms provide complementary explanations of the data. The present study implements the stacking approach using support vector machine algorithms as the baseline and a C4.5 implementation as the meta-learner. Views are created with OHSUMED biomedical full text documents. Experimental results lead to the sustained conclusion that the application of multi-view techniques to full texts significantly improves the task of text classification, providing a significant contribution for the biomedical text mining research. We also have evidence to conclude that enriched datasets with text from certain sections are better than using only titles and abstracts.

2021

CMIID: A comprehensive medical information identifier for clinical search harmonization in Data Safe Havens

Authors
Domingues, MAP; Camacho, R; Rodrigues, PP;

Publication
JOURNAL OF BIOMEDICAL INFORMATICS

Abstract
Over the last decades clinical research has been driven by informatics changes nourished by distinct research endeavors. Inherent to this evolution, several issues have been the focus of a variety of studies: multi-location patient data access, interoperability between terminological and classification systems, and harmonization of clinical practice and records. Having these problems in mind, the Data Safe Haven paradigm emerged to promote a new architecture, better reasoning, and safe and easy access to distinct Clinical Data Repositories. This study aims to present a novel solution for clinical search harmonization within a safe environment, making use of a hybrid coding taxonomy that enables researchers to collect information from multiple repositories based on a clinical domain query definition. Results show that it is possible to query multiple repositories using a single query definition based on clinical domains and the capabilities of the Unified Medical Language System, although it leads to deterioration of the framework response times. Participants of a Focus Group and a System Usability Scale questionnaire rated the framework with a median value of 72.5, indicating the hybrid coding taxonomy could be enriched with additional metadata to further improve the refinement of the results and enable the possibility of using this system as a data quality tagging mechanism. © 2020 Elsevier Inc.

Supervised theses

2021

A real-time decision support system for guiding logistics vehicle operations

Author
Sara Cláudio

Institution
UP-FEP

2021

Ensembles de OCRs para aplicações médicas

Author
João Adriano Portela de Matos Silva

Institution
UP-FEUP

2021

Trustability in data-driven decision models for Public Policy

Author
Sónia Alexandra Carvalho Teixeira

Institution
UP-FEUP

2021

Gestão de operações logísticas em meio urbano

Author
Maria Inês Malafaia Baptista Sabino Marques

Institution
UP-FEUP

2021

Graph-Based Entity-Oriented Search

Author
José Luís da Silva Devezas

Institution
UP-FEUP