2025
Authors
Leite, PN; Pinto, AM;
Publication
INFORMATION FUSION
Abstract
Underwater environments pose unique challenges to optical systems due to physical phenomena that induce severe data degradation. Current imaging sensors rarely address these effects comprehensively, creating the need to integrate complementary information sources. This article presents a multimodal data fusion approach that combines information from diverse sensing modalities into a single dense and accurate three-dimensional representation. The proposed fusiNg tExture with apparent motion information for underwater Scene recOnstruction (NESO) encoder-decoder network leverages motion-perception principles to extract relative depth cues, fusing them with texture information through an early fusion strategy. Evaluated on the FLSea-Stereo dataset, NESO outperforms state-of-the-art methods by 58.7%. Dense depth maps are obtained using multi-stage skip connections with attention mechanisms that ensure the propagation of key features across network levels. This representation is further enhanced by incorporating sparse but millimeter-precise depth measurements from active imaging techniques. A regression-based algorithm maps depth displacements between these heterogeneous point clouds, using the estimated curves to refine the dense NESO prediction. This approach achieves relative errors as low as 0.41% when reconstructing submerged anode structures, yielding metric improvements of up to 0.1124 m over the initial measurements. Validation at the ATLANTIS Coastal Testbed demonstrates the effectiveness of this multimodal fusion approach for obtaining robust three-dimensional representations in real underwater conditions.
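The refinement step described above can be illustrated with a minimal sketch. This is not the authors' actual algorithm; it assumes a simple polynomial regression between the dense prediction and the sparse, accurate depth measurements at their shared pixel locations, and the function and parameter names (`refine_dense_depth`, `degree`) are hypothetical:

```python
import numpy as np

def refine_dense_depth(dense_pred, sparse_depth, sparse_idx, degree=2):
    """Refine a dense depth map using sparse accurate measurements.

    dense_pred  : (H, W) dense depth estimate (e.g. a network prediction)
    sparse_depth: (N,) millimeter-precise depths from active imaging
    sparse_idx  : (N, 2) integer pixel coordinates of the sparse points
    degree      : order of the polynomial regression (assumed here)
    """
    rows, cols = sparse_idx[:, 0], sparse_idx[:, 1]
    # Dense predictions at the pixels where accurate depth is available
    pred_at_sparse = dense_pred[rows, cols]
    # Regress the displacement between the two heterogeneous sources
    coeffs = np.polyfit(pred_at_sparse, sparse_depth, degree)
    # Apply the estimated curve to correct the whole dense map
    return np.polyval(coeffs, dense_pred)
```

Under this assumption, any systematic bias of the dense estimate that is well modeled by a low-order polynomial of the predicted depth is removed, while the dense spatial coverage of the prediction is preserved.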