Publications

Publications by Francisco Carvalho Silva

2020

Pre-Training Autoencoder for Lung Nodule Malignancy Assessment Using CT Images

Authors
Silva, F; Pereira, T; Frade, J; Mendes, J; Freitas, C; Hespanhol, V; Luis Costa, JL; Cunha, A; Oliveira, HP;

Publication
APPLIED SCIENCES-BASEL

Abstract
Lung cancer late diagnosis has a large impact on the mortality rate numbers, leading to a very low five-year survival rate of 5%. This issue emphasises the importance of developing systems to support a diagnostic at earlier stages. Clinicians use Computed Tomography (CT) scans to assess the nodules and the likelihood of malignancy. Automatic solutions can help to make a faster and more accurate diagnosis, which is crucial for the early detection of lung cancer. Convolutional neural networks (CNN) based approaches have shown to provide a reliable feature extraction ability to detect the malignancy risk associated with pulmonary nodules. This type of approach requires a massive amount of data to model training, which usually represents a limitation in the biomedical field due to medical data privacy and security issues. Transfer learning (TL) methods have been widely explored in medical imaging applications, offering a solution to overcome problems related to the lack of training data publicly available. For the clinical annotations experts with a deep understanding of the complex physiological phenomena represented in the data are required, which represents a huge investment. In this direction, this work explored a TL method based on unsupervised learning achieved when training a Convolutional Autoencoder (CAE) using images in the same domain. For this, lung nodules from the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) were extracted and used to train a CAE. Then, the encoder part was transferred, and the malignancy risk was assessed in a binary classification-benign and malignant lung nodules, achieving an Area Under the Curve (AUC) value of 0.936. To evaluate the reliability of this TL approach, the same architecture was trained from scratch and achieved an AUC value of 0.928. 
The results reported in this comparison suggested that the feature learning achieved when reconstructing the input with an encoder-decoder based architecture can be considered useful knowledge that might allow overcoming labelling constraints.
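
For readers unfamiliar with the metric, the Area Under the Curve values quoted in the abstract above can be read as the probability that a randomly chosen malignant nodule receives a higher score than a randomly chosen benign one. The sketch below is only an illustration of that definition and is not the authors' evaluation code; the function name `roc_auc` and the toy labels and scores are assumptions made for the example.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: iterable of 0 (benign) / 1 (malignant); hypothetical encoding.
    scores: iterable of classifier scores, higher meaning more likely malignant.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy data, invented for illustration only.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The quadratic pairwise loop is chosen for clarity; on large test sets a rank-based computation would usually be preferred.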

2021

Comprehensive Perspective for Lung Cancer Characterisation Based on AI Solutions Using CT Images

Authors
Pereira, T; Freitas, C; Costa, JL; Morgado, J; Silva, F; Negrao, E; de Lima, BF; da Silva, MC; Madureira, AJ; Ramos, I; Hespanhol, V; Cunha, A; Oliveira, HP;

Publication
JOURNAL OF CLINICAL MEDICINE

Abstract
Lung cancer is still the leading cause of cancer death in the world. For this reason, novel approaches for early and more accurate diagnosis are needed. Computer-aided decision (CAD) can be an interesting option for a noninvasive tumour characterisation based on thoracic computed tomography (CT) image analysis. Until now, radiomics have been focused on tumour features analysis, and have not considered the information on other lung structures that can have relevant features for tumour genotype classification, especially for epidermal growth factor receptor (EGFR), which is the mutation with the most successful targeted therapies. With this perspective paper, we aim to explore a comprehensive analysis of the need to combine the information from tumours with other lung structures for the next generation of CADs, which could create a high impact on targeted therapies and personalised medicine. The forthcoming artificial intelligence (AI)-based approaches for lung cancer assessment should be able to make a holistic analysis, capturing information from pathological processes involved in cancer development. The powerful and interpretable AI models allow us to identify novel biomarkers of cancer development, contributing to new insights about the pathological processes, and making a more accurate diagnosis to help in the treatment plan selection.

2021

Machine Learning and Feature Selection Methods for EGFR Mutation Status Prediction in Lung Cancer

Authors
Morgado, J; Pereira, T; Silva, F; Freitas, C; Negrao, E; de Lima, BF; da Silva, MC; Madureira, AJ; Ramos, I; Hespanhol, V; Costa, JL; Cunha, A; Oliveira, HP;

Publication
APPLIED SCIENCES-BASEL

Abstract
The evolution of personalized medicine has changed the therapeutic strategy from classical chemotherapy and radiotherapy to a genetic modification targeted therapy, and although biopsy is the traditional method to genetically characterize lung cancer tumor, it is an invasive and painful procedure for the patient. Nodule image features extracted from computed tomography (CT) scans have been used to create machine learning models that predict gene mutation status in a noninvasive, fast, and easy-to-use manner. However, recent studies have shown that radiomic features extracted from an extended region of interest (ROI) beyond the tumor, might be more relevant to predict the mutation status in lung cancer, and consequently may be used to significantly decrease the mortality rate of patients battling this condition. In this work, we investigated the relation between image phenotypes and the mutation status of Epidermal Growth Factor Receptor (EGFR), the most frequently mutated gene in lung cancer with several approved targeted-therapies, using radiomic features extracted from the lung containing the nodule. A variety of linear, nonlinear, and ensemble predictive classification models, along with several feature selection methods, were used to classify the binary outcome of wild-type or mutant EGFR mutation status. The results show that a comprehensive approach using a ROI that included the lung with nodule can capture relevant information and successfully predict the EGFR mutation status with increased performance compared to local nodule analyses. Linear Support Vector Machine, Elastic Net, and Logistic Regression, combined with the Principal Component Analysis feature selection method implemented with 70% of variance in the feature set, were the best-performing classifiers, reaching Area Under the Curve (AUC) values ranging from 0.725 to 0.737. 
This approach that exploits a holistic analysis indicates that information from more extensive regions of the lung containing the nodule allows a more complete lung cancer characterization and should be considered in future radiogenomic studies.

2021

EGFR Assessment in Lung Cancer CT Images: Analysis of Local and Holistic Regions of Interest Using Deep Unsupervised Transfer Learning

Authors
Silva, F; Pereira, T; Morgado, J; Frade, J; Mendes, J; Freitas, C; Negrao, E; De Lima, BF; Da Silva, MC; Madureira, AJ; Ramos, I; Hespanhol, V; Costa, JL; Cunha, A; Oliveira, HP;

Publication
IEEE ACCESS

Abstract
Statistics have demonstrated that one of the main factors responsible for the high mortality rate related to lung cancer is the late diagnosis. Precision medicine practices have shown advances in the individualized treatment according to the genetic profile of each patient, providing better control on cancer response. Medical imaging offers valuable information with an extensive perspective of the cancer, opening opportunities to explore the imaging manifestations associated with the tumor genotype in a non-invasive way. This work aims to study the relevance of physiological features captured from Computed Tomography images, using three different 2D regions of interest to assess the Epidermal growth factor receptor (EGFR) mutation status: nodule, lung containing the main nodule, and both lungs. A Convolutional Autoencoder was developed for the reconstruction of the input image. Thereafter, the encoder block was used as a feature extractor, stacking a classifier on top to assess the EGFR mutation status. Results showed that extending the analysis beyond the local nodule allowed the capture of more relevant information, suggesting the presence of useful biomarkers using the lung with nodule region of interest, which allowed to obtain the best prediction ability. This comparative study represents an innovative approach for gene mutations status assessment, contributing to the discussion on the extent of pathological phenomena associated with cancer development, and its contribution to more accurate Artificial Intelligence-based solutions, and constituting, to the best of our knowledge, the first deep learning approach that explores a comprehensive analysis for the EGFR mutation status classification.

2021

Sharing Biomedical Data: Strengthening AI Development in Healthcare

Authors
Pereira, T; Morgado, J; Silva, F; Pelter, MM; Dias, VR; Barros, R; Freitas, C; Negrao, E; de Lima, BF; da Silva, MC; Madureira, AJ; Ramos, I; Hespanhol, V; Costa, JL; Cunha, A; Oliveira, HP;

Publication
HEALTHCARE

Abstract
Artificial intelligence (AI)-based solutions have revolutionized our world, using extensive datasets and computational resources to create automatic tools for complex tasks that, until now, have been performed by humans. Massive data is a fundamental aspect of the most powerful AI-based algorithms. However, for AI-based healthcare solutions, there are several socioeconomic, technical/infrastructural, and most importantly, legal restrictions, which limit the large collection and access of biomedical data, especially medical imaging. To overcome this important limitation, several alternative solutions have been suggested, including transfer learning approaches, generation of artificial data, adoption of blockchain technology, and creation of an infrastructure composed of anonymous and abstract data. However, none of these strategies is currently able to completely solve this challenge. The need to build large datasets that can be used to develop healthcare solutions deserves special attention from the scientific community, clinicians, all the healthcare players, engineers, ethicists, legislators, and society in general. This paper offers an overview of the data limitation in medical predictive models; its impact on the development of healthcare solutions; benefits and barriers of sharing data; and finally, suggests future directions to overcome data limitations in the medical field and enable AI to enhance healthcare. This perspective is dedicated to the technical requirements of the learning models, and it explains the limitation that comes from poor and small datasets in the medical domain and the technical options that try or can solve the problem related to the lack of massive healthcare data.

2021

The Impact of Interstitial Diseases Patterns on Lung CT Segmentation

Authors
Silva, F; Pereira, T; Morgado, J; Cunha, A; Oliveira, HP;

Publication
2021 43RD ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE & BIOLOGY SOCIETY (EMBC)

Abstract
Lung segmentation represents a fundamental step in the development of computer-aided decision systems for the investigation of interstitial lung diseases. In a holistic lung analysis, eliminating background areas from Computed Tomography (CT) images is essential to avoid including noisy information and spending unnecessary computational resources on non-relevant data. However, the major challenge in this segmentation task lies in the ability of the models to deal with imaging manifestations associated with severe disease. Based on U-net, a general biomedical image segmentation architecture, we proposed a lighter and faster architecture. In this 2D approach, experiments were conducted with a combination of two publicly available databases to improve the heterogeneity of the training data. Results showed that, when compared to the original U-net, the proposed architecture maintained performance levels, achieving 0.894 +/- 0.060, 4.493 +/- 0.633 and 4.457 +/- 0.628 for DSC, HD and HD-95 metrics, respectively, when using all patients from the ILD database for testing only, while allowing more efficient computational usage. Quantitative and qualitative evaluations of the ability to cope with high-density lung patterns associated with severe disease were conducted, supporting the idea that more representative and diverse data is necessary to build robust and reliable segmentation tools.
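
As a reading aid for the DSC figure reported above, the following minimal sketch shows how an overlap score of this kind is commonly computed, assuming DSC denotes the Dice similarity coefficient between a predicted and a reference binary mask. It is not taken from the paper; the function name `dice_coefficient` and the toy masks are hypothetical.

```python
def dice_coefficient(pred_mask, true_mask):
    """Dice similarity coefficient for two binary masks (nested lists of 0/1).

    Returns 2*|A ∩ B| / (|A| + |B|); two empty masks are treated as a perfect match.
    """
    pred_flat = [v for row in pred_mask for v in row]
    true_flat = [v for row in true_mask for v in row]
    if len(pred_flat) != len(true_flat):
        raise ValueError("masks must have the same number of pixels")

    intersection = sum(1 for p, t in zip(pred_flat, true_flat) if p and t)
    total = sum(1 for p in pred_flat if p) + sum(1 for t in true_flat if t)
    if total == 0:
        return 1.0  # convention assumed here for two empty masks
    return 2.0 * intersection / total


# Toy 2x2 masks, invented for illustration only.
toy_pred = [[1, 1], [0, 0]]
toy_true = [[1, 0], [0, 0]]
toy_dice = dice_coefficient(toy_pred, toy_true)
```

A value of 1 means the two masks coincide exactly and 0 means they share no foreground pixels.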
