Publications

Publications by CTM

2023

Synthesizing Human Activity for Data Generation

Authors
Romero, A; Carvalho, P; Corte-Real, L; Pereira, A;

Publication
JOURNAL OF IMAGING

Abstract
Gathering sufficiently representative data, such as data on human actions, shapes, and facial expressions, is costly and time-consuming, yet such data are required to train robust models. This has led to techniques such as transfer learning and data augmentation, which are nevertheless often insufficient. To address this, we propose a semi-automated mechanism for generating and editing visual scenes with synthetic humans performing various actions; features such as background modification and manual adjustment of the 3D avatars allow users to create data with greater variability. We also propose a two-fold methodology for evaluating the results of our method: (i) applying an action classifier to the output data produced by the mechanism and (ii) generating masks of the avatars and of the actors and comparing them through segmentation. The avatars were robust to occlusion, and their actions were recognizable and faithful to those of the corresponding input actors. The results also showed that, although the action classifier focuses on the pose and movement of the synthetic humans, it depends strongly on contextual information to recognize the actions precisely. Generating avatars for complex activities also proved problematic, both for action recognition and for producing clean, precise masks.
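The abstract does not say which measure is used to compare the avatar masks with the actor masks. A common choice for this kind of comparison is Intersection-over-Union (IoU); the sketch below assumes that metric and represents masks as plain nested lists of 0/1 pixels. `mask_iou` is a hypothetical helper, not the authors' code.

```python
from typing import List

Mask = List[List[int]]  # binary mask: nonzero = foreground pixel, 0 = background


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection-over-Union between two binary masks of the same shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("masks must have the same shape")
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            fa, fb = bool(pa), bool(pb)
            inter += fa and fb  # pixel is foreground in both masks
            union += fa or fb   # pixel is foreground in at least one mask
    # Two empty masks are treated as a perfect match.
    return 1.0 if union == 0 else inter / union
```

For example, with `avatar = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]` and `actor = [[0, 1, 0], [0, 1, 1], [0, 1, 0]]`, three pixels overlap out of five foreground pixels in total, so `mask_iou(avatar, actor)` returns 0.6.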

2023

Misalignment-Resilient Propagation Model for Underwater Optical Wireless Links

Authors
Araujo, JH; Tavares, JS; Marques, VM; Salgado, HM; Pessoa, LM;

Publication
SENSORS

Abstract
This paper proposes a multiple-lens receiver scheme to increase the misalignment tolerance of an underwater optical wireless communication link between an autonomous underwater vehicle (AUV) and a sensor plane. An accurate photon-propagation model based on Monte Carlo simulation is presented, which accounts for photon refraction at the lens(es) of the sensor interface and for angular misalignment between the emitter and receiver. The results show that the ideal emitter beam divergence is around 15 degrees for a 1 m transmission length, increasing to 22 degrees for a shorter distance of 0.5 m, and is independent of water turbidity. In addition, a seven-lens scheme is found to be approximately three times more tolerant to offset than a single lens. A random forest machine learning algorithm is also assessed for its suitability to estimate, in real time, the offset and angle of the AUV relative to the fixed sensor from the power received by each lens. The algorithm estimates the offset and angular misalignment with a mean square error of 5 mm (6 mm) and 0.157 rad (0.174 rad) for transmitter-receiver distances of 1 m and 0.5 m, respectively.
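The abstract gives no implementation details of the Monte Carlo model. As a rough illustration of the general technique only, the sketch below assumes a point emitter radiating uniformly within a cone of half-angle `half_angle`, an unbounded homogeneous water volume with absorption coefficient `a` and scattering coefficient `b` (both in 1/m), Henyey-Greenstein scattering with asymmetry `g`, and a single circular aperture (radius `rx_radius`, lateral offset `rx_offset`) on the plane `z = distance`. It deliberately omits the lens refraction and the multi-lens geometry that the paper models; all names and parameter values are hypothetical.

```python
import math
import random
from typing import Tuple

W_MIN = 1e-4    # photon weight below which Russian roulette is applied
ROULETTE = 0.1  # survival probability in Russian roulette


def hg_cos_theta(g: float, rng: random.Random) -> float:
    """Sample cos(theta) from the Henyey-Greenstein phase function."""
    xi = rng.random()
    if abs(g) < 1e-6:
        return 2.0 * xi - 1.0  # isotropic limit
    frac = (1.0 - g * g) / (1.0 - g + 2.0 * g * xi)
    return max(-1.0, min(1.0, (1.0 + g * g - frac * frac) / (2.0 * g)))


def scatter(ux: float, uy: float, uz: float, g: float,
            rng: random.Random) -> Tuple[float, float, float]:
    """Rotate the unit direction (ux, uy, uz) by a sampled scattering angle."""
    cos_t = hg_cos_theta(g, rng)
    sin_t = math.sqrt(max(0.0, 1.0 - cos_t * cos_t))
    phi = 2.0 * math.pi * rng.random()
    cos_p, sin_p = math.cos(phi), math.sin(phi)
    if abs(uz) > 0.99999:  # nearly parallel to z: simplified update
        return sin_t * cos_p, sin_t * sin_p, math.copysign(cos_t, uz)
    tmp = math.sqrt(1.0 - uz * uz)
    nx = sin_t * (ux * uz * cos_p - uy * sin_p) / tmp + ux * cos_t
    ny = sin_t * (uy * uz * cos_p + ux * sin_p) / tmp + uy * cos_t
    nz = -sin_t * cos_p * tmp + uz * cos_t
    return nx, ny, nz


def received_fraction(n_photons: int, distance: float, a: float, b: float,
                      g: float, half_angle: float, rx_radius: float,
                      rx_offset: float = 0.0, seed: int = 0) -> float:
    """Fraction of launched photon weight hitting a circular aperture on the
    plane z = distance, centred at (rx_offset, 0)."""
    if a <= 0 or b < 0:
        raise ValueError("need a > 0 and b >= 0")
    rng = random.Random(seed)
    c = a + b          # beam attenuation coefficient (1/m)
    albedo = b / c     # share of weight that survives each interaction
    received = 0.0
    for _ in range(n_photons):
        # Launch from the origin, uniformly in solid angle inside the cone.
        uz = 1.0 - rng.random() * (1.0 - math.cos(half_angle))
        sin_t = math.sqrt(max(0.0, 1.0 - uz * uz))
        phi = 2.0 * math.pi * rng.random()
        ux, uy = sin_t * math.cos(phi), sin_t * math.sin(phi)
        x = y = z = 0.0
        w = 1.0
        while True:
            step = -math.log(1.0 - rng.random()) / c  # free path ~ Exp(c)
            if uz > 0 and z + uz * step >= distance:
                # The photon reaches the receiver plane before interacting.
                t = (distance - z) / uz
                dx, yr = x + ux * t - rx_offset, y + uy * t
                if dx * dx + yr * yr <= rx_radius * rx_radius:
                    received += w
                break
            x, y, z = x + ux * step, y + uy * step, z + uz * step
            w *= albedo  # remove the absorbed share of the weight
            if w < W_MIN:  # Russian roulette keeps the estimate unbiased
                if rng.random() >= ROULETTE:
                    break
                w /= ROULETTE
            ux, uy, uz = scatter(ux, uy, uz, g, rng)
    return received / n_photons
```

Sweeping `half_angle` or `rx_offset` with coefficients for a given water type is one way to explore the divergence and misalignment trade-offs the abstract describes; the random-forest estimator is not sketched here.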

2023

Sigma-Delta Modulation for Enhanced Underwater Optical Wireless Communication Systems

Authors
Araújo, JH; Rocha, HJ; Tavares, JS; Salgado, HM;

Publication
International Conference on Transparent Optical Networks

Abstract
This paper presents an experimental investigation of sigma-delta modulation (SDM) as a means of improving the performance of underwater optical communication systems. The study considers the impact of the key SDM parameters, including the oversampling ratio, the system's signal-to-noise ratio, bandwidth, and optical link distance. The results provide insights into the design and optimization of SDM-based underwater optical communication systems, paving the way for future research in this field. Although it operates at a lower bit rate than previously published OFDM counterparts, the fully digital solution provides immunity to system nonlinearities and robustness to noise, which is relevant in harsh environments. Moreover, the proposed solution, based on a first-order bandpass SDM architecture, avoids the need for a DAC at the receiver, simplifying its operation and reducing costs. An experimental investigation of 16-QAM transmission over SDM achieves a transmission distance of 4.8 m over the underwater channel at a maximum rate of 400 Mbit/s with a modulation error ratio (MER) of 28 dB.
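The abstract names a first-order bandpass SDM but does not give its loop structure. One common way to build such a modulator is to start from the first-order lowpass prototype (noise transfer function 1 - z^-1) and substitute z^-1 -> -z^-2, which moves the noise-transfer zeros from DC to fs/4 (NTF = 1 + z^-2). The sketch below assumes that construction, a 1-bit quantizer, and a carrier at exactly fs/4; `bandpass_sdm` and `demodulate_fs4` are illustrative names, not the paper's implementation.

```python
from typing import List, Sequence

# Carrier at fs/4: cos(pi * n / 2) takes the values 1, 0, -1, 0, ...
FS4_CARRIER = (1.0, 0.0, -1.0, 0.0)


def bandpass_sdm(x: Sequence[float]) -> List[float]:
    """1-bit bandpass sigma-delta modulator with noise-transfer zeros at fs/4.

    Loop filter H(z) = -z^-2 / (1 + z^-2), obtained from the first-order
    lowpass integrator z^-1 / (1 - z^-1) via z^-1 -> -z^-2, giving
    STF(z) = -z^-2 and NTF(z) = 1 + z^-2. Intended for |x[n]| < 1.
    """
    u1 = u2 = 0.0  # loop-filter outputs u[n-1], u[n-2]
    e1 = e2 = 0.0  # loop errors e[n-1], e[n-2], where e = x - y
    bits: List[float] = []
    for xn in x:
        u = -u2 - e2                   # u[n] = -u[n-2] - e[n-2]
        y = 1.0 if u >= 0.0 else -1.0  # 1-bit quantizer
        bits.append(y)
        u2, u1 = u1, u                 # shift the filter state
        e2, e1 = e1, xn - y
    return bits


def demodulate_fs4(bits: Sequence[float]) -> float:
    """Mix the bitstream with the fs/4 carrier and average (a crude lowpass)."""
    return sum(b * FS4_CARRIER[n % 4] for n, b in enumerate(bits)) / len(bits)
```

For a tone at fs/4 with amplitude A, the demodulated average approaches A/2 as the bitstream grows, because the quantization noise is shaped away from fs/4. The two-sample delay in the signal path is why only a 1-bit stream, and no multi-bit converter, is needed on the modulator side.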

2023

A Dataset for User Visual Behaviour with Multi-View Video Content

Authors
da Costa, TS; Andrade, MT; Viana, P; Silva, NC;

Publication
PROCEEDINGS OF THE 14TH ACM MULTIMEDIA SYSTEMS CONFERENCE, MMSYS 2023

Abstract
Immersive video applications impose impractical bandwidth requirements on best-effort networks. With Multi-View (MV) streaming, these requirements can be reduced by resorting to view prediction techniques. SmoothMV is a multi-view system that uses a non-intrusive head-tracking mechanism to detect the viewer's interest and select appropriate views. Coupling it with Neural Networks (NNs) that anticipate the viewer's interest is likely to reduce view-switching latency. The objective of this paper is twofold: 1) to present a solution for acquiring gaze data from users viewing MV content; 2) to describe a dataset, collected with a large-scale testbed, that can be used to train NNs to predict the user's viewing interest. Head-movement tracking data were obtained from 45 participants using an Intel RealSense F200 camera, with 7 video playlists, each viewed at least 17 times. The dataset is publicly available to the research community and constitutes an important contribution to reducing the current scarcity of such data. Tools to obtain saliency/heat maps and generate complementary plots are also provided as an open-source software package.
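The abstract mentions open-source tools for saliency/heat maps but not how they compute them. A minimal sketch, assuming each head-tracking sample has already been mapped to normalised frame coordinates (x, y) in [0, 1): samples are binned into a grid, which is then scaled so that the busiest cell equals 1. `heat_map` and its parameters are hypothetical, not the released package's API.

```python
from typing import Iterable, List, Tuple


def heat_map(points: Iterable[Tuple[float, float]],
             width: int, height: int) -> List[List[float]]:
    """Bin normalised (x, y) gaze samples into a height x width grid scaled to [0, 1]."""
    if width <= 0 or height <= 0:
        raise ValueError("grid dimensions must be positive")
    grid = [[0.0] * width for _ in range(height)]
    for x, y in points:
        if 0.0 <= x < 1.0 and 0.0 <= y < 1.0:  # ignore samples outside the frame
            grid[int(y * height)][int(x * width)] += 1.0
    peak = max(max(row) for row in grid)
    if peak > 0:
        grid = [[v / peak for v in row] for row in grid]
    return grid
```

For instance, `heat_map([(0.1, 0.1), (0.1, 0.1), (0.9, 0.9)], 2, 2)` puts two samples in the top-left cell and one in the bottom-right cell, giving `[[1.0, 0.0], [0.0, 0.5]]`. Saliency maps are usually also smoothed with a Gaussian kernel; that step is omitted here.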

2023

Deep Learning Approach for Seamless Navigation in Multi-View Streaming Applications

Authors
Costa, TS; Viana, P; Andrade, MT;

Publication
IEEE ACCESS

Abstract
Quality of Experience (QoE) in multi-view streaming systems is known to be severely affected by the latency of view-switching procedures. Anticipating the viewer's navigation intentions within the multi-view scene could greatly reduce this latency. The research presented in this article builds on this premise by proposing a new predictive view-selection mechanism. A VGG16-inspired Convolutional Neural Network (CNN) identifies the viewer's focus of attention and determines which views are best suited to be presented in the short term, i.e., according to the viewer's near-term viewing intentions. Those views can then be buffered locally before they are actually needed. To this end, two datasets were used to evaluate the prediction performance and the impact on latency, in particular in comparison with the solution implemented in the previous version of our multi-view streaming system. The results show a general improvement in perceived QoE: latency during view-switching procedures was significantly reduced, and the user's visual interest was predicted with a high level of accuracy. An experimental platform was also established on which future predictive models can be integrated and compared with previously implemented models.
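The abstract does not describe how the CNN output is turned into a buffering decision. Assuming the network emits one score (logit) per camera view, one simple policy is to prefetch the k highest-ranked views other than the one currently on screen. The sketch below illustrates that assumed policy; `views_to_prefetch`, the view names, and the value of `k` are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over per-view scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def views_to_prefetch(view_ids: Sequence[str], logits: Sequence[float],
                      current: str, k: int = 2) -> List[str]:
    """Return up to k views, other than the one being displayed, ranked by
    predicted probability (ties broken by view id for determinism)."""
    if len(view_ids) != len(logits):
        raise ValueError("one logit per view is required")
    probs = dict(zip(view_ids, softmax(logits)))
    candidates = [v for v in view_ids if v != current]
    candidates.sort(key=lambda v: (-probs[v], v))
    return candidates[:k]
```

For example, `views_to_prefetch(["left", "centre", "right"], [0.2, 2.0, 1.0], current="centre", k=1)` returns `["right"]`. The softmax does not change the ranking; it only exposes probabilities, which would allow a threshold-based variant that buffers a variable number of views.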

2023

Discriminative segmental cues to vowel height and consonantal place and voicing in whispered speech

Authors
Jesus, LMT; Castilho, S; Ferreira, A; Costa, MC;

Publication
JOURNAL OF PHONETICS

Abstract
Purpose: The acoustic signal attributes of whispered speech potentially carry sufficiently distinct information to define vowel spaces and to disambiguate consonant place and voicing, but what these attributes are, and the underlying production mechanisms, are not fully known. The purpose of this study was to define segmental cues to the place and voicing of vowels and sibilant fricatives and to develop an articulatory interpretation of the acoustic data.
Method: Seventeen speakers produced sustained sibilants and oral vowels, disyllabic words, and sentences, and read a phonetically balanced text. All tasks were repeated in voiced and whispered speech, and the sound source and filter were analysed using the following parameters: fundamental frequency, spectral peak frequencies and levels, spectral slopes, sound pressure level, and durations. Logistic linear mixed-effects models were developed to determine which acoustic signal attributes carry sufficiently distinct information to disambiguate /i, a/ and /s, ʃ/.
Results: Vowels were produced with significantly different spectral slope, sound pressure level, and first and second formant frequencies in voiced and whispered speech. The low-frequency spectral slope of voiced sibilants differed significantly between whispered and voiced speech. The odds of choosing /a/ instead of /i/ were estimated to be lower for whispered speech than for voiced speech. The broad peak frequency of fricatives was statistically significant in discriminating between /s/ and /ʃ/.
Conclusions: First formant frequency and relative duration of vowels are consistently used as height cues, and spectral slope and broad peak frequency are attributes associated with consonantal place of articulation. The relative duration of same-place voiceless fricatives was higher than that of voiced fricatives in both voiced and whispered speech. The evidence presented in this paper can be used to restore voiced speech signals and to inform rehabilitation strategies that can safely explore the production mechanisms of whispering.
© 2023 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
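The abstract reports the whispered-speech effect in terms of odds. As a generic illustration only (the exact fixed and random effects of the authors' models are not given here), a logistic mixed-effects model with an acoustic predictor $x$, a speech-mode indicator $w$ (1 = whispered, 0 = voiced), and a by-speaker random intercept $u_s$ links the coefficient of $w$ to the reported odds as follows:

```latex
\operatorname{logit} P(\text{/a/}) = \ln\frac{P(\text{/a/})}{1 - P(\text{/a/})}
  = \beta_0 + \beta_1 x + \beta_2 w + u_s,
  \qquad u_s \sim \mathcal{N}(0, \sigma_u^2)

\mathrm{OR}_w = \frac{\operatorname{odds}(\text{/a/} \mid w = 1)}
                     {\operatorname{odds}(\text{/a/} \mid w = 0)}
             = e^{\beta_2}
```

Under this assumed form, lower odds of choosing /a/ for whispered speech correspond to $\beta_2 < 0$, i.e. $\mathrm{OR}_w < 1$, holding $x$ and $u_s$ fixed.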
