2026
Authors
Tabosa, C; Salgado, M; Leite, D; Cunha, A;
Publication
Procedia Computer Science
Abstract
Video capsule endoscopy (VCE) enables high-resolution visualisation of the small bowel but remains constrained by manual review of thousands of frames, which is time-consuming and error-prone under class imbalance. This study investigates deep learning for automatic multiclass lesion classification in VCE, comparing two convolutional networks (ResNet-50, EfficientNet-B3) with two Vision Transformers (Swin, DeiT) on the public Kvasir-Capsule dataset (47,161 images; 11 classes). The pipeline comprises standard preprocessing, class-aware augmentation and adaptive data augmentation, stratified data partitioning, hyperparameter optimisation with Optuna, and evaluation using accuracy, precision, recall, and F1-score. DeiT achieved the best overall performance (accuracy = 0.98; F1 = 0.96), with strong class-wise results in clinically salient categories (e.g., ulcer, fresh blood, angiectasia), indicating effective modelling of long-range dependencies and subtle patterns. We further assess computational feasibility by reporting training configuration and indicative inference time per image, supporting potential integration into assisted reading workflows. Limitations include reliance on a single public dataset, pronounced class imbalance, and the absence of prospective clinical validation, which may affect generalisability. These findings position Transformer-based models as promising candidates for VCE decision support, while underscoring the need for future work on (i) multicentric datasets and external validation, (ii) comprehensive statistical analysis with confidence intervals and robust baselines under imbalance, and (iii) prospective studies quantifying end-to-end impact on reading time and diagnostic safety. © 2025 The Authors. Published by Elsevier B.V.
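As an illustration of the evaluation metrics named in the abstract above, the following minimal sketch computes accuracy and per-class precision, recall, and F1 from label lists. The function names and the macro-averaging step are assumptions made for illustration; the abstract does not state how the reported F1 was averaged.

```python
from collections import Counter


def per_class_prf(y_true, y_pred):
    """Return {class: (precision, recall, f1)} from two label sequences."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = {}
    for c in classes:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[c] = (precision, recall, f1)
    return scores


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_prf(y_true, y_pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Hypothetical imbalanced toy data: a model that always predicts "normal".
y_true = ["normal"] * 8 + ["ulcer"] * 2
y_pred = ["normal"] * 10
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

On this toy data accuracy stays high while macro F1 drops sharply, which is why per-class metrics matter under class imbalance.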
2026
Authors
Costa, T; Castro, J; Salgado, M; Cunha, A;
Publication
Procedia Computer Science
Abstract
Video Capsule Endoscopy (VCE) is a pivotal technology in modern gastroenterology, offering a non-invasive method to visualize the entire small bowel. However, the clinical application of VCE is hampered by the extensive review time required, as specialists must manually analyze thousands of images from each procedure. This process is not only laborious and costly but also prone to diagnostic errors due to fatigue, subtle abnormalities, and variability in interpretation across clinicians. To address this challenge, deep learning methods have been explored to automate VCE image analysis. However, most existing approaches rely on a single model architecture, which often fails to generalize across the broad visual diversity found in gastrointestinal imagery. This limitation becomes especially pronounced in multiclass classification tasks, where the ability to distinguish between visually similar tissues and lesions is essential. Ensemble-based methods such as Mixture of Experts (MoE) have shown promising results in general computer vision by leveraging multiple specialized models for improved robustness. However, no prior work has investigated MoE or Hierarchical MoE (HMoE) architectures for multiclass classification of VCE or endoscopic images more broadly. To explore this opportunity, we present a comparative framework evaluating three deep learning strategies for VCE image classification: individual models, flat MoE systems, and Hierarchical MoE architectures. Using a subset of the Kvasir-Capsule dataset, which contains 12 gastrointestinal tissue and lesion classes, we first train and evaluate four backbone models (InceptionNeXt, EfficientViT, ConvNeXtV2, and DeiT3) to establish a performance baseline. The two best-performing architectures, ConvNeXtV2 and DeiT3, are then used as expert backbones within both MoE and HMoE systems. In the MoE configuration, a gating network assigns dynamic per-image weights to multiple expert instances. 
In contrast, the HMoE configuration constructs a learned binary tree that routes samples based on class similarity through increasingly specialized branches. In the HMoE models, ConvNeXtV2 outperformed DeiT3 in accuracy, whereas DeiT3 showed superior routing accuracy. These results indicate that expert-driven ensemble methods not only outperform standalone models but also offer complementary advantages depending on architecture and routing strategy. This study provides new evidence for the clinical potential of MoE and HMoE frameworks in scalable, accurate VCE image analysis. © 2025 The Authors. Published by Elsevier B.V.
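The flat MoE configuration described above, in which a gating network assigns per-image weights to several experts, can be sketched as a softmax-weighted mixture of expert class probabilities. This is a generic illustration, not the authors' implementation: `moe_predict`, the gate logits, and the expert outputs are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_predict(gate_logits, expert_probs):
    """Mix expert class probabilities using per-image gate weights.

    gate_logits:  one raw gate score per expert (hypothetical gate output).
    expert_probs: one class-probability list per expert.
    """
    weights = softmax(gate_logits)
    n_classes = len(expert_probs[0])
    mixed = [
        sum(w * probs[c] for w, probs in zip(weights, expert_probs))
        for c in range(n_classes)
    ]
    return weights, mixed


# Two hypothetical experts scoring one image over two classes.
weights, mixed = moe_predict([0.0, 0.0], [[0.9, 0.1], [0.5, 0.5]])
print(weights, mixed)
```

Because the gate weights sum to one and each expert outputs a probability distribution, the mixed output is itself a valid distribution over classes.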
2026
Authors
Machado, C; Pereira, P; Ferreira, M; Braz, G; Correia, N; Cunha, A;
Publication
Procedia Computer Science
Abstract
Glaucoma is one of the leading causes of irreversible blindness worldwide, affecting millions of people, often silently and progressively. Early diagnosis is crucial to slow its progression, but it remains challenging due to the need for manual analysis of large volumes of retinal images by trained specialists. In this context, automatic detection systems based on deep learning offer a promising opportunity to facilitate and accelerate the diagnostic process, providing scalability and high accuracy. This work presents the development of an automatic method for optic disc and optic cup segmentation in retinal fundus photographs, aiming to support early glaucoma detection. The proposed methodology is based on convolutional neural networks (CNNs), specifically an enhanced U-Net architecture with a ResNet50 backbone, incorporating attention mechanisms and data augmentation strategies to improve segmentation accuracy. The model was trained and validated using the REFUGE dataset, which contains high-quality fundus images with manual annotations of the disc and cup regions. Experimental results demonstrate that the developed model achieved an average Dice coefficient of 0.937 for optic disc segmentation and 0.828 for optic cup segmentation. Analysis of the cup-to-disc ratio (CDR) yielded mean values of VCDR = 0.497 ± 0.059, ACDR = 0.252 ± 0.060, and mean CDR = 0.375 ± 0.058, with 55.0% of cases classified as low risk, 43.3% as moderate risk, and 1.7% as high risk for glaucoma. These results highlight the potential of the proposed method as an assistive tool for automated glaucoma screening. © 2025 The Authors. Published by Elsevier B.V.
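To make the reported quantities concrete, the sketch below computes a Dice coefficient and two cup-to-disc ratios from binary masks. It assumes the common definitions (vertical CDR as the ratio of vertical extents, area CDR as the ratio of pixel areas); the abstract does not give the authors' exact formulas, and the helper names are hypothetical.

```python
def dice(mask_a, mask_b):
    """Dice coefficient between two binary masks (lists of 0/1 rows)."""
    inter = sum(a and b for ra, rb in zip(mask_a, mask_b) for a, b in zip(ra, rb))
    size = sum(map(sum, mask_a)) + sum(map(sum, mask_b))
    return 2 * inter / size if size else 1.0


def vertical_extent(mask):
    """Number of rows spanned by the foreground region."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    return rows[-1] - rows[0] + 1 if rows else 0


def cup_to_disc_ratios(cup, disc):
    """Return (vertical CDR, area CDR) under the assumed definitions."""
    disc_height = vertical_extent(disc)
    disc_area = sum(map(sum, disc))
    vcdr = vertical_extent(cup) / disc_height if disc_height else 0.0
    acdr = sum(map(sum, cup)) / disc_area if disc_area else 0.0
    return vcdr, acdr


# Hypothetical 4x4 example: the disc fills the grid, the cup is the central 2x2.
disc = [[1, 1, 1, 1] for _ in range(4)]
cup = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
print(cup_to_disc_ratios(cup, disc), dice(cup, disc))
```

In practice the masks would come from the segmentation model's thresholded outputs, and the Dice score would compare each predicted mask with its manual annotation.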
2026
Authors
Penedo, P; Machado, J; Anjos, R; Marta, A; Silva, AC; Cunha, A;
Publication
APPLIED SCIENCES-BASEL
Abstract
Eye diseases, such as glaucoma, diabetic retinopathy, and age-related macular degeneration, drive the growing need for reliable and scalable analyses of fundus and optical coherence tomography (OCT) images. Deep learning performs strongly in ocular structure segmentation. However, it typically relies on dense pixel-wise annotations, which are costly and difficult to obtain at scale. Weakly supervised learning (WSL) can reduce this burden by leveraging coarse labels, limited strong annotations, and unlabeled data. This systematic umbrella review synthesizes survey and review articles on weakly supervised deep learning for image segmentation, with a focus on ocular imaging (fundus and OCT/OCTA). The analysis of twenty-one secondary studies reveals an empty intersection: WSL-focused segmentation surveys are often modality-agnostic, whereas ocular reviews are predominantly fully supervised and seldom offer quantitative evidence on annotation-effort savings or direct comparisons between weak and fully supervised methods on identical datasets. Across the included reviews, label-efficient strategies cluster around CAM/MIL formulations, sparse supervision (points/scribbles/boxes), pseudo-labelling/self-training, and semi-/self-supervised learning, implemented mainly with U-Net/DeepLab families and increasingly Transformer or hybrid backbones. These results provide a structured map of available WSL mechanisms and, critically, identify reproducible reporting gaps that currently prevent fair benchmarking in ocular segmentation. Therefore, this review supports the development of ocular-specific benchmarks and minimum reporting practices that link segmentation performance to annotation effort.
2023
Authors
Cunha, A; Garcia, NM; Gómez, JM; Pereira, S;
Publication
MobiHealth