Publications

Publications by Joana Vale Sousa

2022

Lung Segmentation in CT Images: A Residual U-Net Approach on a Cross-Cohort Dataset

Authors
Sousa, J; Pereira, T; Silva, F; Silva, MC; Vilares, AT; Cunha, A; Oliveira, HP;

Publication
APPLIED SCIENCES-BASEL

Abstract
Lung cancer is one of the most common causes of cancer-related mortality, and since most cases are diagnosed when the tumor is already at an advanced stage, the 5-year survival rate is dismally low. Nevertheless, the chances of survival can increase if the tumor is identified early, which can be achieved through screening with computed tomography (CT). The clinical evaluation of CT images is a very time-consuming task, and computer-aided diagnosis systems can help reduce this burden. Lung segmentation is usually the first step in automatic image analysis models of the thorax. However, this task is very challenging, since the lungs present high variability in shape and size. Moreover, other respiratory comorbidities frequently co-occur with lung cancer, and each pathology can present its own range of CT imaging appearances. This work investigated the development of a deep learning model whose architecture combines two structures, a U-Net and a ResNet34. The proposed model was designed on a cross-cohort dataset and achieved a mean Dice similarity coefficient (DSC) higher than 0.93 for the 4 different cohorts tested. Despite the good overall performance obtained, the segmentation masks were qualitatively evaluated by two experienced radiologists to identify the main limitations of the developed model. The performance per pathology was assessed, and the results confirmed a small degradation for consolidation and pneumocystis pneumonia cases, with a DSC of 0.9015 ± 0.2140 and 0.8750 ± 0.1290, respectively. This work represents a relevant assessment of the lung segmentation model, taking into consideration the pathological cases that can be found in clinical routine, since a global assessment could not detail the fragilities of the model.
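The DSC reported above measures the overlap between a predicted mask A and a reference mask B as 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical masks). A minimal sketch in Python, assuming both masks have been flattened to equal-length sequences of 0/1 values; the function name and the empty-mask convention are illustrative choices, not taken from the paper:

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice similarity coefficient between two flattened binary masks.

    Hypothetical helper for illustration; inputs are sequences of 0/1 values.
    """
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of voxels")
    # Voxels labelled as lung in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Lung voxels in the prediction plus lung voxels in the reference.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Both masks are empty: treat as perfect agreement (a common convention).
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [1, 1, 1, 0, 0, 0]
    reference = [1, 1, 0, 0, 0, 1]
    print(dice_coefficient(predicted, reference))
```

In the toy example two of the three predicted lung voxels match the reference, giving a DSC of 2 × 2 / (3 + 3) = 2/3. A per-cohort mean DSC would be obtained by averaging this score over the scans of each cohort.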

2022

Towards Machine Learning-Aided Lung Cancer Clinical Routines: Approaches and Open Challenges

Authors
Silva, F; Pereira, T; Neves, I; Morgado, J; Freitas, C; Malafaia, M; Sousa, J; Fonseca, J; Negrao, E; de Lima, BF; da Silva, MC; Madureira, AJ; Ramos, I; Costa, JL; Hespanhol, V; Cunha, A; Oliveira, HP;

Publication
JOURNAL OF PERSONALIZED MEDICINE

Abstract
Advancements in the development of computer-aided decision (CAD) systems for clinical routines provide unquestionable benefits in connecting human medical expertise with machine intelligence to achieve better-quality healthcare. Considering the high incidence and mortality associated with lung cancer, there is a need for the most accurate clinical procedures; thus, the possibility of using artificial intelligence (AI) tools for decision support is becoming a closer reality. At every stage of the lung cancer clinical pathway, specific obstacles are identified that motivate the application of innovative AI solutions. This work provides a comprehensive review of the most recent research dedicated to the development of CAD tools using computed tomography images for lung cancer-related tasks. We discuss the major challenges and provide critical perspectives on future directions. Although we focus on lung cancer in this review, we also provide a clearer definition of the path used to integrate AI in healthcare, emphasizing fundamental research points that are crucial for overcoming current barriers.

2022

The Influence of a Coherent Annotation and Synthetic Addition of Lung Nodules for Lung Segmentation in CT Scans

Authors
Sousa, J; Pereira, T; Neves, I; Silva, F; Oliveira, HP;

Publication
SENSORS

Abstract
Lung cancer is a highly prevalent pathology and a leading cause of cancer-related deaths. Most patients are diagnosed only when the disease has already manifested itself, which is usually a sign of advanced-stage lung cancer; as a consequence, 5-year survival rates are low. To increase the chances of survival, improving early cancer detection is crucial, and computed tomography (CT) scans play a key role in it. The manual evaluation of CT scans is a time-consuming task, and computer-aided diagnosis (CAD) systems can help relieve that burden. Lung segmentation is one of the first steps in these systems, yet it is very challenging given the heterogeneity of the lung diseases that are usually present and associated with cancer development. In our previous work, a segmentation model based on a combination of ResNet34 and U-Net was developed on a cross-cohort dataset; it yielded good segmentation masks for multiple pathological conditions but misclassified some of the lung nodules. The multiple datasets used for model development originated from different annotation protocols, which generated inconsistencies in the learning process, and the annotations are usually not adequate for lung cancer studies since they do not include lung nodules. In addition, the initial training datasets contained few nodules, which proved insufficient for the segmentation model to learn to include them as part of the lung. In this work, an objective protocol for lung mask segmentation was defined, and the previous annotations were carefully reviewed and corrected to create consistent and adequate ground-truth masks for the development of the segmentation model. Data augmentation with domain knowledge was used to create lung nodules in the cases used to train the model. The developed model achieved a Dice similarity coefficient (DSC) above 0.9350 for all test datasets and showed an ability to cope not only with a variety of lung patterns but also with the presence of lung nodules. This study shows the importance of using consistent annotations for the supervised learning process, which is a very time-consuming task but one of great importance to healthcare applications. Given the lack of massive datasets in the medical field, and the resulting lack of broad representativeness, data augmentation with domain knowledge could help overcome this limitation in the development of learning models.
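The abstract does not specify how the domain-knowledge augmentation creates nodules. Purely to illustrate the point that an inserted nodule must also be labelled as lung in the ground truth, the sketch below paints a disk of soft-tissue-like intensity into a 2D CT slice and adds the same pixels to the lung mask. The disk shape, the 40 HU default intensity, and the function name `add_synthetic_nodule` are all assumptions, not the authors' procedure:

```python
from typing import List, Tuple

Image = List[List[float]]  # 2D CT slice in Hounsfield units (HU)
Mask = List[List[int]]     # 2D binary lung mask (1 = lung)


def add_synthetic_nodule(
    image: Image,
    lung_mask: Mask,
    center: Tuple[int, int],
    radius: int,
    nodule_hu: float = 40.0,  # assumed soft-tissue-like intensity
) -> Tuple[Image, Mask]:
    """Paint a disk-shaped 'nodule' into a slice and include it in the lung mask.

    Hypothetical illustration only; returns new copies and leaves inputs untouched.
    """
    cy, cx = center
    if not lung_mask[cy][cx]:
        raise ValueError("nodule centre must lie inside the lung mask")
    new_image = [row[:] for row in image]
    new_mask = [row[:] for row in lung_mask]
    for y in range(len(image)):
        for x in range(len(image[y])):
            # Pixels within `radius` of the centre belong to the nodule.
            if (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2:
                new_image[y][x] = nodule_hu
                # Label the nodule as lung so the ground truth includes it.
                new_mask[y][x] = 1
    return new_image, new_mask
```

A realistic augmentation would typically work in 3D and vary nodule size, shape, texture, and location; the sketch only shows the image and the mask being updated together.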

2023

Learning Models for Bone Marrow Edema Detection in Magnetic Resonance Imaging

Authors
Ribeiro, G; Pereira, T; Silva, F; Sousa, J; Carvalho, DC; Dias, SC; Oliveira, HP;

Publication
APPLIED SCIENCES-BASEL

Abstract
Bone marrow edema (BME) is the term given to the abnormal fluid signal seen within the bone marrow on magnetic resonance imaging (MRI). It usually indicates the presence of an underlying pathology and is associated with a wide range of conditions and causes. However, it can be misleading, since in some cases it may be associated with normal changes in the bone, especially during the growth period of childhood, and objective methods for its assessment are lacking. In this work, learning models for BME detection were developed. Transfer learning was used to overcome the limited size of the dataset, and two different regions of interest (ROI) were defined and compared to evaluate their impact on model performance: one based on bone segmentation and one based on an intensity mask. The best model was obtained with the high-intensity masking technique, which achieved a balanced accuracy of 0.792 ± 0.034. This study compares different models and data regularization techniques for BME detection and shows promising results, even in the most difficult age range: children and adolescents. The application of machine learning methods will help to decrease dependence on clinicians, providing an initial stratification of patients based on the probability of edema presence and supporting their diagnostic decisions.
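Balanced accuracy, the metric reported above, is the mean of sensitivity and specificity in a binary task, so a model cannot score well simply by predicting the majority class. A minimal pure-Python sketch, assuming binary labels with 1 = edema present (an illustrative encoding); scikit-learn offers the same metric as `sklearn.metrics.balanced_accuracy_score`:

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


if __name__ == "__main__":
    # Imbalanced toy example: 4 edema cases, 6 non-edema cases.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    print(balanced_accuracy(y_true, y_pred))
```

In the toy example plain accuracy is 8/10 = 0.8, but the model finds only half of the edema cases, so balanced accuracy drops to (0.5 + 1.0) / 2 = 0.75.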

2023

Machine learning-based approaches for cancer prediction using microbiome data

Authors
Freitas, P; Silva, F; Sousa, JV; Ferreira, RM; Figueiredo, C; Pereira, T; Oliveira, HP;

Publication
SCIENTIFIC REPORTS

Abstract
Emerging evidence of the relationship between microbiome composition and the development of numerous diseases, including cancer, has led to increasing interest in the study of the human microbiome. Technological breakthroughs in DNA sequencing methods propelled microbiome studies with large numbers of samples, which created the need for more sophisticated data-analytical tools to analyze this complex relationship. The aim of this work was to develop a machine learning-based approach to distinguish the type of cancer based on the analysis of tissue-specific microbial information, assessing the human microbiome as valuable predictive information for cancer identification. For this purpose, Random Forest algorithms were trained to classify five types of cancer (head and neck, esophageal, stomach, colon, and rectum) using samples provided by The Cancer Microbiome Atlas database. One-versus-all and multi-class classification studies were conducted to evaluate the discriminative capability of the microbial data across increasing levels of cancer site specificity, with results showing a progressive rise in the difficulty of accurate sample classification. Random Forest models achieved promising performance when predicting head and neck, stomach, and colon cancer cases, with the latter returning accuracy scores above 90% across the different studies conducted. However, discriminating esophageal and rectum cancers proved more difficult: the models failed to adequately differentiate rectum from colon cancer cases, and esophageal from head and neck and stomach cancer cases. These results suggest that anatomically adjacent cancers can be more difficult to identify due to microbial similarities. Despite these limitations, microbiome data analysis using machine learning may advance novel strategies to improve cancer detection and prevention and decrease disease burden.
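In a one-versus-all study, each cancer type becomes its own binary problem: samples of that type are positives and samples of every other type are negatives, and a separate classifier (here, a Random Forest) is trained for each problem. A minimal sketch of the label construction, assuming one string label per sample; the label names are hypothetical and the classifier itself is left out:

```python
from typing import Dict, List, Sequence

# Hypothetical label strings for the five cancer types studied.
CANCER_TYPES = ["head_and_neck", "esophageal", "stomach", "colon", "rectum"]


def one_vs_all_labels(labels: Sequence[str], classes: Sequence[str]) -> Dict[str, List[int]]:
    """Return one binary target vector per class (1 = that class, 0 = any other)."""
    unknown = set(labels) - set(classes)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


if __name__ == "__main__":
    samples = ["colon", "rectum", "colon", "stomach", "head_and_neck"]
    for cancer, target in one_vs_all_labels(samples, CANCER_TYPES).items():
        # Each target vector would train its own binary classifier,
        # e.g. scikit-learn's RandomForestClassifier.
        print(cancer, target, "positives:", sum(target))
```

The multi-class study, by contrast, would train a single classifier directly on the original five-way labels.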

2023

Single Modality vs. Multimodality: What Works Best for Lung Cancer Screening?

Authors
Sousa, JV; Matos, P; Silva, F; Freitas, P; Oliveira, HP; Pereira, T;

Publication
SENSORS

Abstract
In a clinical context, physicians usually take into account information from more than one data modality when making decisions regarding cancer diagnosis and treatment planning. Artificial intelligence-based methods should mimic this clinical practice and take into consideration different sources of data that allow a more comprehensive analysis of the patient and, as a consequence, a more accurate diagnosis. Lung cancer evaluation, in particular, can benefit from this approach, since this pathology presents high mortality rates due to its late diagnosis. However, many related works use a single data source, namely imaging data. Therefore, this work aims to study the prediction of lung cancer when using more than one data modality. The National Lung Screening Trial dataset, which contains data from different sources, specifically computed tomography (CT) scans and clinical data, was used to develop and compare single-modality and multimodality models that may exploit the predictive capability of these two types of data to their full potential. A ResNet18 network was trained to classify 3D CT nodule regions of interest (ROI), whereas a random forest algorithm was used to classify the clinical data, with the former achieving an area under the ROC curve (AUC) of 0.7897 and the latter 0.5241. For the multimodality approaches, three strategies based on intermediate and late fusion were implemented to combine the information from the 3D CT nodule ROIs and the clinical data. Of these, the best model (a fully connected layer that receives as input a combination of clinical data and deep imaging features given by a ResNet18 inference model) presented an AUC of 0.8021. Lung cancer is a complex disease, characterized by a multitude of biological and physiological phenomena and influenced by multiple factors. It is thus imperative that the models are capable of responding to that need. The results obtained showed that combining different types of data may have the potential to produce more comprehensive analyses of the disease by the models.
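The two fusion families mentioned above differ in where the modalities are combined. Late fusion merges the outputs of separately trained models, for example by a weighted average of their predicted probabilities; intermediate fusion concatenates features (here, deep imaging features and clinical variables) and feeds them to a further layer, as in the best model described. The sketch below illustrates both in plain Python; the averaging weight, feature values, and layer parameters are placeholders rather than values from the study, and a real fully connected layer would be trained, not hand-set:

```python
import math
from typing import Sequence


def late_fusion(p_imaging: float, p_clinical: float, w_imaging: float = 0.5) -> float:
    """Weighted average of two per-modality malignancy probabilities."""
    if not 0.0 <= w_imaging <= 1.0:
        raise ValueError("w_imaging must be in [0, 1]")
    return w_imaging * p_imaging + (1.0 - w_imaging) * p_clinical


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def intermediate_fusion(
    imaging_features: Sequence[float],
    clinical_features: Sequence[float],
    weights: Sequence[float],
    bias: float,
) -> float:
    """Single fully connected unit with a sigmoid over concatenated features."""
    fused = list(imaging_features) + list(clinical_features)  # feature concatenation
    if len(weights) != len(fused):
        raise ValueError("one weight per fused feature is required")
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    return _sigmoid(logit)


if __name__ == "__main__":
    print(late_fusion(0.8, 0.4))
    print(intermediate_fusion([0.2, 1.5], [63.0], weights=[0.7, -0.3, 0.01], bias=-0.5))
```

The design difference is that intermediate fusion lets the layer learn interactions between imaging and clinical features, whereas late fusion only combines each model's final decision.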