
Details

  • Name

    Hugo Manuel Oliveira
  • Role

    Research Assistant
  • Since

    1st March 2018
Publications

2026

Optimizing Medical Image Captioning with Conditional Prompt Encoding

Authors
Fernandes, RF; Oliveira, HS; Ribeiro, PP; Oliveira, HP;

Publication
PATTERN RECOGNITION AND IMAGE ANALYSIS, IBPRIA 2025, PT II

Abstract
Medical image captioning is an essential tool for producing descriptive text reports of medical images. A central problem in medical image captioning is poor in-domain description generation: large pre-trained language models are trained primarily on non-medical text, whose semantics differ from those of medical text. To overcome this limitation, we explore improvements in contrastive learning for X-ray images, complemented with soft-prompt engineering for medical image captioning and conditional text decoding for caption generation. The main objective is to develop a soft-prompt model that improves the accuracy and clinical relevance of the automatically generated captions while guaranteeing their linguistic accuracy and without degrading the models' performance. Experiments on the MIMIC-CXR and ROCO datasets showed that the inclusion of tailored soft prompts improved accuracy and efficiency, while ensuring a more cohesive medical context for captions, aiding medical diagnosis and encouraging more accurate reporting.
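As a minimal sketch of the soft-prompt idea described above (not the paper's actual model), a small set of learnable prompt vectors can be prepended to the token embeddings fed to the caption decoder. The sizes `n_prompt`, `d_model`, and the random initialisation here are hypothetical placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, for illustration only; the paper's configuration may differ.
n_prompt, seq_len, d_model, batch = 4, 10, 16, 2

# Soft-prompt vectors: in a real model these would be parameters trained
# jointly with the captioner; here they are just randomly initialised.
prompt = rng.normal(scale=0.02, size=(n_prompt, d_model))

def prepend_soft_prompt(token_embeddings):
    """Prepend the shared learnable prefix to every sequence in the batch."""
    prefix = np.broadcast_to(prompt, (token_embeddings.shape[0], n_prompt, d_model))
    return np.concatenate([prefix, token_embeddings], axis=1)

tokens = np.zeros((batch, seq_len, d_model))
out = prepend_soft_prompt(tokens)
print(out.shape)  # (2, 14, 16)
```

Because only the prompt vectors are trained, this kind of conditioning adapts a frozen language model to the medical domain at a small fraction of the cost of full fine-tuning.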

2025

A Unified Approach to Video Anomaly Detection: Advancements in Feature Extraction, Weak Supervision, and Strategies for Class Imbalance

Authors
Barbosa, RZ; Oliveira, HS;

Publication
IEEE ACCESS

Abstract
This paper explores advancements in Video Anomaly Detection (VAD), combining theoretical insights with practical solutions to address model limitations. Through comprehensive experimental analysis, the study examines the role of feature representations, sampling strategies, and curriculum learning in enhancing VAD performance. Key findings include the impact of class imbalance on the Cross-Modal Awareness-Local Arousal (CMALA) architecture and the effectiveness of techniques like pseudo-curriculum learning in mitigating noisy classes, such as "Car Accident". Novel strategies like Sample-Batch Selection (SBS) dynamic segment selection and pre-trained image-text models, including Contrastive Language-Image Pre-training (CLIP) and the ViTamin encoder, significantly improve anomaly detection. The research underscores the potential of multimodal VAD, highlighting the integration of audio and visual modalities and the development of multimodal fusion techniques. To support this evolution, the study proposes a Unified WorkStation 4 VAD (UWS4VAD) to streamline research workflows and introduces a new VAD benchmark incorporating multimodal data and textual information. The work envisions enhanced anomaly interpretation and performance by leveraging joint representation learning and Large Language Models (LLMs). The findings set the stage for future advancements, advocating for large-scale pre-training on audio-visual datasets and a shift toward a more integrated, multimodal approach to VAD. The source code of the project is available at https://github.com/zuble/uws4vad
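The weak supervision discussed above is commonly realised with multiple-instance learning, where a video-level anomaly score is aggregated from per-segment scores. The sketch below shows a generic top-k aggregation; `k` and the example scores are illustrative, not the paper's SBS method:

```python
import numpy as np

def video_score(segment_scores, k=3):
    """Video-level anomaly score as the mean of the top-k segment scores,
    a common multiple-instance heuristic in weakly supervised VAD."""
    top_k = np.sort(segment_scores)[-k:]
    return float(top_k.mean())

normal = np.array([0.10, 0.05, 0.20, 0.10, 0.15])     # mostly low scores
anomalous = np.array([0.10, 0.90, 0.85, 0.10, 0.80])  # a few strong peaks
print(video_score(normal) < video_score(anomalous))   # True
```

Averaging only the highest-scoring segments lets a video-level label supervise training even though the anomalous frames within the video are unannotated.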

2025

Optimizing crowd evacuation: evaluation of strategies for safety and efficiency

Authors
Oliveira, HS;

Publication
Journal of Reliable Intelligent Environments

Abstract
Predicting and controlling crowd dynamics in emergencies is one of the main objectives of simulated emergency exercises. However, during emergency exercises there is often a lack of a sense of danger among the actors involved, as well as concerns about exposing real people to potentially dangerous environments. These problems impose limitations on running an emergency drill, harming the collection of valuable information for posterior analysis and decision-making. This work aims to mitigate these problems by using an Agent-Based Modelling (ABM) simulator to deepen the comprehension of human actions when people are exposed to a sudden variation in the conditions of large, crowded environments, and of how evacuation strategies affect evacuation performance. To assess the impact of the evacuation strategy employed, we propose a modified informed leader-following approach and compare it with common evacuation strategies in a simulated environment replicating stadium benches with narrow corridors leading to different exit points. The objective is to determine the impact of each set of configurations and evacuation strategies and to compare them against other established ones. Our experiments determined that agents following the crowd generally lead to a higher number of victims due to the rise of herding phenomena near the exits; this was significantly reduced when agents were guided towards the exit, either by knowing the exit beforehand or by following a leader agent with real-time information on the exit location and its current state, showing that relevant, controlled information combined with Follow Leader strategies can be crucial in an emergency evacuation scenario with limited evacuation exit capabilities and distribution. © The Author(s) 2024.
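The leader-following effect described above can be illustrated with a toy one-dimensional sketch (not the paper's ABM simulator): leaders know the exit position, while followers only step toward the nearest leader. All positions, step sizes, and the corridor itself are illustrative assumptions:

```python
# Toy 1-D corridor: leaders move toward the known exit; followers chase
# the nearest leader, so exit knowledge propagates through the crowd.
EXIT = 0.0

def step(agents, leaders, speed=1.0):
    """Advance leaders toward the exit, then followers toward the nearest leader."""
    new_leaders = [max(EXIT, p - speed) for p in leaders]
    new_agents = []
    for p in agents:
        target = min(new_leaders, key=lambda q: abs(q - p))
        direction = -1.0 if target < p else (1.0 if target > p else 0.0)
        new_agents.append(p + direction * speed)
    return new_agents, new_leaders

agents, leaders = [10.0, 7.0, 12.0], [9.0]
for _ in range(15):
    agents, leaders = step(agents, leaders)
print(all(abs(p - EXIT) <= 1.0 for p in agents))  # True: followers converge near the exit
```

Even this minimal model shows the mechanism: followers with no exit knowledge still evacuate, because the informed leader's trajectory carries the information for them.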

2025

Robust Visual Transformers for Medical Image Classification

Authors
Montrezol, J; Oliveira, HS; Araujo, J; Oliveira, HP;

Publication
Annual International Conference of the IEEE Engineering in Medicine and Biology Society

Abstract
The Vision Transformer (ViT) architecture has emerged as a potential game-changer in computer vision, offering scalability and global attention that have generated considerable interest in recent years, and its adaptability has fueled enthusiasm for its application. This work investigates the boundaries of the architecture, focusing on developing new techniques that explicitly target complex tasks such as medical imaging datasets, which often exhibit high variability, class imbalance, and limited sample sizes. We propose a set of mixed regularisation and augmentation techniques to enhance model performance, including a novel loss function and a smoothly differentiable activation function, leading to more stable training and better model performance. The results show that incorporating these techniques improves model performance and training convergence.
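As an illustration of the kind of regularisation involved (generic label smoothing, not the paper's novel loss), a smoothed cross-entropy spreads a small amount of probability mass over non-target classes, which discourages overconfident predictions on imbalanced, small datasets:

```python
import numpy as np

def smoothed_cross_entropy(logits, target, eps=0.1):
    """Cross-entropy with label smoothing: spread eps of the probability
    mass uniformly over all classes. A standard regulariser; illustrative
    only, this is not the paper's novel loss."""
    logits = logits - logits.max()                     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())  # log-softmax
    n = logits.size
    smooth = np.full(n, eps / n)
    smooth[target] += 1.0 - eps
    return float(-(smooth * log_probs).sum())

loss_hard = smoothed_cross_entropy(np.array([2.0, 0.5, -1.0]), target=0, eps=0.0)
loss_soft = smoothed_cross_entropy(np.array([2.0, 0.5, -1.0]), target=0, eps=0.1)
print(loss_hard < loss_soft)  # True: smoothing penalises overconfident targets
```

With `eps=0` this reduces to ordinary cross-entropy; increasing `eps` raises the loss of a confident correct prediction, nudging the model toward softer output distributions.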