2025
Authors
Fontes, M; Bakon, M; Cunha, A; Sousa, JJ;
Publication
SENSORS
Abstract
Monitoring civil infrastructure is increasingly critical due to aging assets, urban expansion, and the need for early detection of structural instabilities. Interferometric Synthetic Aperture Radar (InSAR) offers high-resolution, all-weather surface deformation monitoring capabilities, which are being enhanced by recent advances in Deep Learning (DL). Despite growing interest, the existing literature lacks a comprehensive synthesis of how DL models are applied specifically to infrastructure monitoring using InSAR data. This review addresses this gap by systematically analyzing 67 peer-reviewed articles published between 2020 and February 2025. We examine the DL architectures employed, ranging from LSTMs and CNNs to Transformer-based and hybrid models, and assess their integration within various stages of the InSAR monitoring pipeline, including pre-processing, temporal analysis, segmentation, prediction, and risk classification. Our findings reveal a predominance of LSTM and CNN-based approaches, limited exploration of pre-processing tasks, and a focus on urban and linear infrastructures. We identify methodological challenges such as data sparsity, low coherence, and lack of standard benchmarks, and we highlight emerging trends including hybrid architectures, attention mechanisms, end-to-end pipelines, and data fusion with exogenous sources. The review concludes by outlining key research opportunities, such as enhancing model explainability, expanding applications to underexplored infrastructure types, and integrating DL-InSAR workflows into operational structural health monitoring systems.
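As a concrete illustration of the LSTM-based approaches the review finds predominant, below is a minimal sketch of an LSTM forecaster for InSAR displacement time series. The window length, layer sizes, and synthetic input are illustrative assumptions, not a model from any reviewed paper.

```python
# Hedged sketch: LSTM forecasting of per-point InSAR displacement.
import torch
import torch.nn as nn

class DisplacementLSTM(nn.Module):
    def __init__(self, hidden_size=64, num_layers=2):
        super().__init__()
        # Input: one displacement value (e.g., mm) per acquisition epoch.
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size,
                            num_layers=num_layers, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)  # next-epoch displacement

    def forward(self, x):                      # x: (batch, window, 1)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])        # predict from last hidden state

model = DisplacementLSTM()
window = torch.randn(8, 30, 1)                 # 8 points, 30 past epochs each
next_step = model(window)                      # (8, 1) predicted displacement
print(next_step.shape)
```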
2025
Authors
Costa, ROC; França, PAF; Pessoa, ACP; Júnior, GB; de Almeida, JDS; Cunha, A;
Publication
VISION
Abstract
Deep learning for glaucoma screening often relies on high-resolution clinical images and convolutional neural networks (CNNs). However, these methods face significant performance drops when applied to noisy, low-resolution images from portable devices. To address this, our work investigates ensemble methods using multiple Transformer architectures for automated glaucoma detection in challenging scenarios. We use the Brazil Glaucoma (BrG) and private D-Eye datasets to assess model robustness. These datasets include images typical of smartphone-coupled ophthalmoscopes, which are often noisy and variable in quality. Four Transformer models (Swin-Tiny, ViT-Base, MobileViT-Small, and DeiT-Base) were trained and evaluated both individually and in ensembles. We evaluated the results at both image and patient levels to reflect clinical practice. The results show that, although performance drops on lower-quality images, ensemble combinations and patient-level aggregation significantly improve accuracy and sensitivity. We achieved up to 85% accuracy and an 84.2% F1-score on the D-Eye dataset, with a notable reduction in false negatives. Grad-CAM attention maps confirmed that Transformers identify anatomical regions relevant to diagnosis. These findings reinforce the potential of Transformer ensembles as an accessible solution for early glaucoma detection in populations with limited access to specialized equipment.
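A minimal sketch of the ensemble idea, assuming the timm library for the four named backbones and mean-probability (soft-voting) fusion plus patient-level averaging; the paper's exact fusion rule is not stated, so both are assumptions.

```python
import torch
import timm

MODELS = ["swin_tiny_patch4_window7_224", "vit_base_patch16_224",
          "mobilevit_s", "deit_base_patch16_224"]

# pretrained=False keeps the sketch offline; set True to load ImageNet weights.
nets = [timm.create_model(m, pretrained=False, num_classes=2).eval()
        for m in MODELS]

@torch.no_grad()
def ensemble_probs(images):                    # images: (N, 3, 224, 224)
    # Soft voting: average softmax outputs across the ensemble.
    probs = [net(images).softmax(dim=1) for net in nets]
    return torch.stack(probs).mean(dim=0)      # (N, 2)

def patient_level(images_per_patient):
    # Aggregate image-level glaucoma probabilities into one patient decision.
    p = ensemble_probs(images_per_patient)[:, 1].mean()
    return "glaucoma" if p >= 0.5 else "normal"

print(patient_level(torch.randn(3, 3, 224, 224)))  # 3 images, one patient
```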
2025
Authors
Gonzalez, DG; Leite, MI; Magalhaes, L; Cunha, A;
Publication
APPLIED SCIENCES-BASEL
Abstract
The collection and annotation of data for supervised machine learning remain challenging and costly tasks, particularly in domains that demand expert knowledge. Depending on the application, labelling may require highly specialised professionals, significantly increasing the overall effort and expense. Active learning techniques offer a promising solution by reducing the number of annotations needed, thereby lowering costs without compromising model performance. This work proposes an active learning strategy based on a decreasing budget to reduce the effort required to annotate medical images. The strategy concentrates annotation effort in the initial iterations, optimises budget allocation, and ensures that the trained model reaches maximum performance with reduced effort in subsequent iterations. It also allows deep learning models to achieve strong performance with fewer annotated images, reducing the specialists' workload. The work also presents three experiments that help characterise the impact of the strategy on the annotation process.
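A hedged sketch of a decreasing-budget active-learning loop: each iteration queries fewer labels than the last, front-loading annotation effort. The halving schedule, the random uncertainty scores, and the loop structure are illustrative placeholders, not the paper's exact setup.

```python
import numpy as np

def active_learning(pool_size, initial_budget=256, iterations=5):
    unlabeled = set(range(pool_size))
    labeled = set()
    budget = initial_budget
    for it in range(iterations):
        # Placeholder uncertainty scores for the remaining pool
        # (in practice: e.g., 1 - max softmax probability of the current model).
        scores = {i: np.random.rand() for i in unlabeled}
        # Query the `budget` most uncertain samples for annotation.
        queried = sorted(unlabeled, key=lambda i: -scores[i])[:budget]
        labeled.update(queried)
        unlabeled.difference_update(queried)
        # ...retrain the model on `labeled` here...
        print(f"iter {it}: asked for {len(queried)} labels, "
              f"{len(labeled)} labeled total")
        budget = max(1, budget // 2)           # decreasing budget per iteration

active_learning(pool_size=2000)
```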
2025
Authors
Leite, D; Marques, P; Pádua, L; Sousa, JJ; Morais, R; Cunha, A;
Publication
PFG-JOURNAL OF PHOTOGRAMMETRY REMOTE SENSING AND GEOINFORMATION SCIENCE
Abstract
Accurate segmentation of grapevines in imagery acquired from unmanned aerial vehicles (UAVs) is important for precision viticulture, as it supports vineyard management by monitoring grapevine health, growth, and environmental stress. However, the structural diversity of vineyards, including differences in training systems, row curvatures, and foliage density, presents challenges for grapevine segmentation methods. This study evaluates the performance of deep learning (DL) models, namely the Feature Pyramid Network (FPN), the Pyramid Scene Parsing Network (PSPNet), and U-Net, each combined with different backbones, for grapevine segmentation in UAV-based RGB orthophoto mosaics. Data were collected under a range of vineyard conditions and scenarios from Portugal's Douro and Vinhos Verdes regions, providing a representative dataset across multiple vineyard configurations. The DL models were trained, tested, and evaluated using orthorectified RGB imagery, and their segmentation accuracy was compared to thresholding techniques. The results show that DL models, particularly U-Net, achieved accurate grapevine segmentation and reduced the over-segmentation and false detections that are common in thresholding methods. FPN models with Inception-v4 and Xception backbones performed well in vineyards with inter-row vegetation, while PSPNet models showed segmentation limitations. Overall, DL-based segmentation models showed clear advantages over thresholding approaches, confirming their suitability for UAV-based grapevine segmentation in diverse and challenging vineyard environments. These results support the scalability of DL-based segmentation for vineyard monitoring applications and indicate that improved segmentation accuracy can contribute to decision support in precision viticulture.
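One way to reproduce the architecture/backbone grid evaluated here is with the segmentation_models_pytorch library; the paper does not name its framework, so this pairing, the backbone subset, and the tile size are assumptions.

```python
import torch
import segmentation_models_pytorch as smp

ARCHS = {"unet": smp.Unet, "fpn": smp.FPN, "pspnet": smp.PSPNet}
BACKBONES = ["resnet34", "inceptionv4", "xception"]   # illustrative subset

rgb_tile = torch.randn(1, 3, 512, 512)        # one RGB orthomosaic tile
for name, arch in ARCHS.items():
    for enc in BACKBONES:
        # encoder_weights=None keeps the sketch offline; use "imagenet" to
        # start from pretrained encoders as is common in practice.
        model = arch(encoder_name=enc, encoder_weights=None,
                     in_channels=3, classes=1)        # binary grapevine mask
        mask_logits = model(rgb_tile)                 # (1, 1, 512, 512)
        print(name, enc, tuple(mask_logits.shape))
```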
2025
Authors
Moreira, V; Machado, E; Barbosa, D; Salgado, M; Braz, G; Cunha, A;
Publication
Procedia Computer Science
Abstract
This article presents an investigation into the classification of endoscopic capsule pathologies using Multiple Instance Learning (MIL) methods in conjunction with deep neural network architectures. The primary problem addressed in this study is the accurate and efficient detection of gastrointestinal pathologies, a significant challenge in medical diagnostics that can have a profound impact on patient outcomes. The use of endoscopic capsules is particularly important as they provide a minimally invasive method to capture comprehensive images of the gastrointestinal tract, facilitating early detection of conditions such as ulcers, polyps, bleeding, and Crohn's disease. Specifically, we explore three variants of MIL (Max, Mean, and Attention) for analysing sets of images captured by the endoscopic capsule. MIL was employed because it effectively handles scenarios where individual image instances are not explicitly labelled but are grouped in bags with known labels, making it suitable for the complex nature of endoscopic data. Furthermore, MIL has not yet been extensively applied in this modality, highlighting the innovative aspect of our approach. In addition, we evaluated the performance of three convolutional neural network architectures (VGG16, ResNet50, and DenseNet121) in the classification task. The results indicate that the combination of MIL methods and deep neural network architectures offers a promising approach to the detection and classification of gastrointestinal pathologies, with significant improvements in diagnostic accuracy and efficiency.
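A minimal sketch of the three MIL pooling variants named above (Max, Mean, Attention), applied to per-frame embeddings from a CNN backbone; the embedding size and attention width are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    # Learned attention pooling: a small network scores each instance,
    # and the bag embedding is the attention-weighted sum of instances.
    def __init__(self, dim=512, hidden=128):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                   nn.Linear(hidden, 1))

    def forward(self, h):                      # h: (n_instances, dim)
        a = torch.softmax(self.score(h), dim=0)   # attention per frame
        return (a * h).sum(dim=0)              # weighted bag embedding

def pool(h, mode, attn=None):
    if mode == "max":                          # MIL-Max: strongest frame wins
        return h.max(dim=0).values
    if mode == "mean":                         # MIL-Mean: average all frames
        return h.mean(dim=0)
    return attn(h)                             # MIL-Attention: learned weights

frames = torch.randn(40, 512)                  # one bag: 40 capsule-frame embeddings
bag_embedding = pool(frames, "attention", AttentionPool())
print(bag_embedding.shape)                     # feed this to a bag-level classifier
```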
2025
Authors
Fernandes, I; Fernandes, R; Pessoa, A; Salgado, M; Paiva, A; Paçal, I; Cunha, A;
Publication
Procedia Computer Science
Abstract
Capsule endoscopy is a medical technique for gastrointestinal examinations that offers significant advantages over traditional endoscopy, notably its minimally invasive nature. Medical specialists use RapidReader™ to annotate video capsule endoscopy (VCE) images. This process is time-consuming, error-prone, and expensive. The videos do not retain temporal markers, making it challenging to locate the annotated frames directly. Moreover, the annotated images often undergo enhancement and artifact creation, which changes their resolution and visual properties compared to the original frames. This study proposes an approach to aid annotation using Deep Learning and Content-Based Image Retrieval (CBIR) techniques to address this issue. A Siamese network with a ResNet-18 architecture was trained to compare two medical images through their features and, with a classifier, assess whether they are a match or a mismatch. This methodology was evaluated on a dataset totalling 5792 image pairs, using several performance metrics: loss, accuracy, AUC (Area Under the Curve), precision, and recall. Various learning rates and optimizers (Adam, SGD, and Adadelta) were tested, with the Adam optimizer yielding the best results: an accuracy of 97.6% and an AUC of 0.9764, highlighting the model's potential to significantly reduce manual annotation time.
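A minimal sketch of the described Siamese matcher: a shared ResNet-18 encodes both images, and a small classifier decides match vs. mismatch. Using |f1 - f2| as the pair representation and the classifier widths are assumptions; the paper only states that features are compared.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class SiameseMatcher(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = resnet18(weights=None)      # weights=None keeps it offline
        backbone.fc = nn.Identity()            # keep the 512-d feature vector
        self.encoder = backbone                # shared weights for both inputs
        self.classifier = nn.Sequential(nn.Linear(512, 128), nn.ReLU(),
                                        nn.Linear(128, 1))

    def forward(self, a, b):                   # a, b: (N, 3, 224, 224)
        fa, fb = self.encoder(a), self.encoder(b)
        logit = self.classifier((fa - fb).abs())   # match/mismatch logit
        return torch.sigmoid(logit)

model = SiameseMatcher()
annotated = torch.randn(2, 3, 224, 224)        # enhanced annotated frames
candidates = torch.randn(2, 3, 224, 224)       # original video frames
print(model(annotated, candidates))            # match probability per pair
```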