Publications

Publications by CTM

2023

A Dataset for User Visual Behaviour with Multi-View Video Content

Authors
da Costa, TS; Andrade, MT; Viana, P; Silva, NC;

Publication
PROCEEDINGS OF THE 2023 PROCEEDINGS OF THE 14TH ACM MULTIMEDIA SYSTEMS CONFERENCE, MMSYS 2023

Abstract
Immersive video applications impose unpractical bandwidth requirements for best-effort networks. With Multi-View(MV) streaming, these can be minimized by resorting to view prediction techniques. SmoothMV is a multi-view system that uses a non-intrusive head tracking mechanism to detect the viewer's interest and select appropriate views. By coupling Neural Networks (NNs) to anticipate the viewer's interest, a reduction of view-switching latency is likely to be obtained. The objective of this paper is twofold: 1) Present a solution for acquisition of gaze data from users when viewing MV content; 2) Describe a dataset, collected with a large-scale testbed, capable of being used to train NNs to predict the user's viewing interest. Tracking data from head movements was obtained from 45 participants using an Intel Realsense F200 camera, with 7 video playlists, each being viewed a minimum of 17 times. This dataset is publicly available to the research community and constitutes an important contribution to reducing the current scarcity of such data. Tools to obtain saliency/heat maps and generate complementary plots are also provided as an open-source software package.

CloseRead Abstract

2023

Analysis and Re-Synthesis of Natural Cricket Sounds Assessing the Perceptual Relevance of Idiosyncratic Parameters

Authors
Oliveira, M; Almeida, V; Silva, J; Ferreira, A;

Publication
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Abstract
Cricket sounds are usually regarded as pleasant and, thus, can be used as suitable test signals in psychoacoustic experiments assessing the human listening acuity to specific temporal and spectral features. In addition, the simple structure of cricket sounds makes them prone to reverse engineering such that they can be analyzed and re-synthesized with desired alterations in their defining parameters. This paper describes cricket sounds from a parametric point of view, characterizes their main temporal and spectral features, namely jitter, shimmer and frequency sweeps, and explains a re-synthesis process generating modified natural cricket sounds. These are subsequently used in listening tests helping to shed light on the sound identification and discrimination capabilities of humans that are important, for example, in voice recognition. © 2023 IEEE.

CloseRead Abstract

2023

One-Step Discrete Fourier Transform-Based Sinusoid Frequency Estimation under Full-Bandwidth Quasi-Harmonic Interference

Authors
Silva, JM; Oliveira, MA; Saraiva, AF; Ferreira, AJS;

Publication
ACOUSTICS

Abstract
The estimation of the frequency of sinusoids has been the object of intense research for more than 40 years. Its importance in classical fields such as telecommunications, instrumentation, and medicine has been extended to numerous specific signal processing applications involving, for example, speech, audio, and music processing. In many cases, these applications run in real-time and, thus, require accurate, fast, and low-complexity algorithms. Taking the normalized Cramer-Rao lower bound as a reference, this paper evaluates the relative performance of nine non-iterative discrete Fourier transform-based individual sinusoid frequency estimators when the target sinusoid is affected by full-bandwidth quasi-harmonic interference, in addition to stationary noise. Three levels of the quasi-harmonic interference severity are considered: no harmonic interference, mild harmonic interference, and strong harmonic interference. Moreover, the harmonic interference is amplitude-modulated and frequency-modulated reflecting real-world conditions, e.g., in singing and musical chords. Results are presented for when the Signal-to-Noise Ratio varies between -10 dB and 70 dB, and they reveal that the relative performance of different frequency estimators depends on the SNR and on the selectivity and leakage of the window that is used, but also changes drastically as a function of the severity of the quasi-harmonic interference. In particular, when this interference is strong, the performance curves of the majority of the tested frequency estimators collapse to a few trends around and above 0.4% of the DFT bin width.

CloseRead Abstract

2023

Identification of words in whispered speech: The role of cues to fricatives' place and voicing

Authors
Jesus, LMT; Ferreira, JFS; Ferreira, AJS;

Publication
JASA EXPRESS LETTERS

Abstract
The temporal distribution of acoustic cues in whispered speech was analyzed using the gating paradigm. Fifteen Portuguese participants listened to real disyllabic words produced by four Portuguese speakers. Lexical choices, confidence scores, isolation points (IPs), and recognition points (RPs) were analyzed. Mixed effects models predicted that the first syllable and 70% of the total duration of the second syllable were needed for lexical choices to be above chance level. Fricatives' place, not voicing, had a significant effect on the percentage of correctly identified words. IP and RP values of words with postalveolar voiced and voiceless fricatives were significantly different.

CloseRead Abstract

2023

Discriminative segmental cues to vowel height and consonantal place and voicing in whispered speech

Authors
Jesus, LMT; Castilho, S; Ferreira, A; Costa, MC;

Publication
JOURNAL OF PHONETICS

Abstract
Purpose: The acoustic signal attributes of whispered speech potentially carry sufficiently distinct information to define vowel spaces and to disambiguate consonant place and voicing, but what these attributes are and the underlying production mechanisms are not fully known. The purpose of this study was to define segmental cues to place and voicing of vowels and sibilant fricatives and to develop an articulatory interpretation of acoustic data.Method: Seventeen speakers produced sustained sibilants and oral vowels, disyllabic words, sentences and read a phonetically balanced text. All the tasks were repeated in voiced and whispered speech, and the sound source and filter analysed using the following parameters: Fundamental frequency, spectral peak frequencies and levels, spectral slopes, sound pressure level and durations. Logistic linear mixed-effects models were developed to understand what acoustic signal attributes carry sufficiently distinct information to disambiguate /i, a/ and /s, ?/.Results: Vowels were produced with significantly different spectral slope, sound pressure level, first and second formant frequencies in voiced and whispered speech. The low frequencies spectral slope of voiced sibilants was significantly different between whispered and voiced speech. The odds of choosing /a/ instead of /i/ were esti-mated to be lower for whispered speech when compared to voiced speech. Fricatives' broad peak frequency was statistically significant when discriminating between /s/ and /?/.Conclusions: First formant frequency and relative duration of vowels are consistently used as height cues, and spectral slope and broad peak frequency are attributes associated with consonantal place of articulation. The rel-ative duration of same-place voiceless fricatives was higher than voiced fricatives both in voiced and whispered speech. The evidence presented in this paper can be used to restore voiced speech signals, and to inform reha-bilitation strategies that can safely explore the production mechanisms of whispering.CO 2023 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http:// creativecommons.org/licenses/by/4.0/).

CloseRead Abstract

2023

Joint Traffic and Obstacle-aware UAV Positioning Algorithm for Aerial Networks

Authors
Shafafi, K; Coelho, A; Campos, R; Ricardo, M;

Publication
2023 IEEE 9TH WORLD FORUM ON INTERNET OF THINGS, WF-IOT

Abstract
Unmanned Aerial Vehicles (UAVs) are increasingly used as cost-effective and flexible Wi-Fi Access Points (APs) and cellular Base Stations (BSs) to enhance Quality of Service (QoS). In disaster management scenarios, UAV-based networks provide on-demand wireless connectivity when traditional infrastructures fail. In obstacle-rich environments like urban areas, reliable high-capacity communications links depend on Line-of-Sight (LoS) availability, especially at higher frequencies. Positioning UAVs to consider obstacles and enable LoS communications represents a promising solution that requires further exploration and development. The main contribution of this paper is the Traffic- and Obstacle-aware UAV Positioning Algorithm (TOPA). TOPA takes into account the users' traffic demand and the need for LoS between the UAV and the ground users in the presence of obstacles. The network performance achieved when using TOPA was evaluated through ns-3 simulations. The results show up to 100% improvement in the aggregate throughput without compromising fairness.

CloseRead Abstract