2009
Authors
Cardoso, JS; Carvalho, P; Teixeira, LF; Corte-Real, L;
Publication
COMPUTER VISION AND IMAGE UNDERSTANDING
Abstract
The primary goal of research on image segmentation is to produce better segmentation algorithms. In spite of almost 50 years of research and development in this field, the general problem of splitting an image into meaningful regions remains unsolved. New and emerging techniques are constantly being applied, with reduced success. The design of each of these new segmentation algorithms requires careful attention to judging the effectiveness of the technique. This paper demonstrates how the proposed methodology is well suited to perform a quantitative comparison between image segmentation algorithms using a ground-truth segmentation. It consists of a general framework already partially proposed in the literature, but dispersed over several works. The framework is based on the principle of eliminating the minimum number of elements such that a specified condition is met. This rule translates directly into a global optimization procedure, and the intersection graph between two partitions emerges as the natural tool to solve it. The objective of this paper is to summarize, aggregate and extend the dispersed work. The principle is clarified, presented stripped of unnecessary supports and extended to sequences of images. Our study shows that the proposed framework for segmentation performance evaluation is simple, general and mathematically sound.
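To make the central construction concrete, the following is a minimal sketch of how the intersection graph between two partitions of the same image can be built, assuming each partition is given as a 2-D integer label map and edges are weighted by pixel overlap. All names are illustrative, and the minimum-removal optimization that the framework performs on this graph is not shown.

```python
# Minimal sketch: intersection graph between two partitions of one image.
# Assumes each partition is a 2-D integer label map of the same shape.
# Names are illustrative; the minimum-removal optimization is not shown.
import numpy as np

def intersection_graph(part_a: np.ndarray, part_b: np.ndarray) -> dict:
    """Return {(region_a, region_b): overlap_in_pixels} for every pair of
    overlapping regions, i.e. the weighted edges of the intersection graph."""
    assert part_a.shape == part_b.shape
    pairs, counts = np.unique(
        np.stack([part_a.ravel(), part_b.ravel()]), axis=1, return_counts=True
    )
    return {(int(a), int(b)): int(c) for (a, b), c in zip(pairs.T, counts)}

# Toy example: a 4x4 image, ground truth vs. a candidate segmentation.
gt  = np.array([[0, 0, 1, 1]] * 4)
seg = np.array([[0, 0, 0, 1]] * 4)
print(intersection_graph(gt, seg))
# {(0, 0): 8, (1, 0): 4, (1, 1): 4} -> gt region 1 is split by the candidate
```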
2023
Authors
Pereira, A; Carvalho, P; Pereira, N; Viana, P; Corte-Real, L;
Publication
IEEE ACCESS
Abstract
The widespread use of smartphones and other low-cost equipment as recording devices, the massive growth in bandwidth, and the ever-growing demand for new applications with enhanced capabilities have made visual data a must in several scenarios, including surveillance, sports, retail, entertainment, and intelligent vehicles. Despite significant advances in analyzing and extracting data from images and video, there is a lack of solutions able to analyze and semantically describe the information in the visual scene so that it can be efficiently used and repurposed. Scientific contributions have focused on individual aspects or on specific problems and application areas, and no cross-domain solution is available to implement a complete system that enables information passing between cross-cutting algorithms. This paper analyses the problem from an end-to-end perspective, i.e., from the visual scene analysis to the representation of information in a virtual environment, including how the extracted data can be described and stored. A simple processing pipeline is introduced to set up a structure for discussing challenges and opportunities in different steps of the entire process, making it possible to identify current gaps in the literature. The work reviews various technologies specifically from the perspective of their applicability to an end-to-end pipeline for scene analysis and synthesis, along with an extensive analysis of datasets for relevant tasks.
2023
Authors
Caetano, F; Carvalho, P; Cardoso, JS;
Publication
Intell. Syst. Appl.
Abstract
Deep learning has recently gained popularity in the field of video anomaly detection, with the development of various methods for identifying abnormal events in visual data. The growing need for automated systems to monitor video streams for anomalies, such as security breaches and violent behaviours in public areas, requires the development of robust and reliable methods. As a result, there is a need for tools to objectively evaluate and compare the real-world performance of different deep learning methods and to identify the most effective approach for video anomaly detection. Current state-of-the-art metrics favour weakly-supervised strategies, presenting these as the best-performing approaches for the task. However, the area under the ROC curve, used to justify this claim, has been shown to be an unreliable metric for highly unbalanced data distributions, as is the case with anomaly detection datasets. This paper provides a new perspective and insights on the performance of video anomaly detection methods. It reports the results of a benchmark study of state-of-the-art methods using a novel proposed framework for evaluating and comparing the different models. The results of this benchmark demonstrate that the currently employed set of reference metrics has led to the misconception that weakly-supervised methods consistently outperform semi-supervised ones.
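The unreliability of ROC-AUC under heavy class imbalance can be seen with a small synthetic experiment. The sketch below is not the paper's benchmark framework; it only illustrates, for a hypothetical detector whose anomaly scores partially overlap the normal scores, how ROC-AUC can look strong while a precision-based metric exposes the false alarms.

```python
# Synthetic illustration (not the paper's framework): with ~1% positives,
# many absolute false alarms barely move the false-positive rate, so ROC-AUC
# stays high while average precision (area under the PR curve) drops.
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
n_normal, n_anomalous = 10_000, 100          # heavily unbalanced, as in VAD data
y_true = np.concatenate([np.zeros(n_normal), np.ones(n_anomalous)])
# Hypothetical detector: anomalies score higher on average, with overlap.
scores = np.concatenate([rng.normal(0.0, 1.0, n_normal),
                         rng.normal(2.0, 1.0, n_anomalous)])

print(f"ROC-AUC:     {roc_auc_score(y_true, scores):.3f}")            # ~0.92
print(f"AP (PR-AUC): {average_precision_score(y_true, scores):.3f}")  # far lower
```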
2023
Authors
Magalhaes, SC; dos Santos, FN; Machado, P; Moreira, AP; Dias, J;
Publication
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE
Abstract
Purpose: Visual perception enables robots to perceive the environment. Visual data is processed using computer vision algorithms that are usually computationally expensive and require powerful devices to process the visual data in real time, which is unfeasible for open-field robots with limited energy. This work benchmarks the performance of different heterogeneous platforms for object detection in real time. The research covers three architectures: embedded GPUs, Graphical Processing Units (the NVIDIA Jetson Nano 2 GB and 4 GB, and the NVIDIA Jetson TX2); a TPU, Tensor Processing Unit (the Coral Dev Board TPU); and DPUs, Deep Learning Processor Units (as in the AMD-Xilinx ZCU104 Development Board and the AMD-Xilinx Kria KV260 Starter Kit). Methods: The authors used a RetinaNet ResNet-50 fine-tuned on the natural VineSet dataset. Afterwards, the trained model was converted and compiled into target-specific hardware formats to improve execution efficiency. Conclusions and Results: The platforms were assessed in terms of performance on the evaluation metrics and efficiency (inference time). The Graphical Processing Units (GPUs) were the slowest devices, running at 3 FPS to 5 FPS, and the Field Programmable Gate Arrays (FPGAs) were the fastest devices, running at 14 FPS to 25 FPS. The efficiency of the Tensor Processing Unit (TPU) was unremarkable, similar to that of the NVIDIA Jetson TX2. The TPU and GPU were the most power-efficient, consuming about 5 W. The differences in the evaluation metrics across devices were negligible, with an F1 of about 70% and a mean Average Precision (mAP) of about 60%.
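For context, the sketch below shows the generic shape of a latency/FPS measurement loop of the kind such benchmarks rely on. The `infer` callable is a hypothetical placeholder: each platform (Jetson GPU, Coral TPU, Xilinx DPU) loads its converted model through its own runtime API, and none of the names here are taken from the paper.

```python
# Minimal sketch of an FPS benchmark harness; `infer` is a placeholder for a
# platform-specific runtime call (TensorRT, Edge TPU, Vitis AI, ...).
import time
import numpy as np

def benchmark_fps(infer, inputs, warmup=10, runs=100):
    """Average frames per second of `infer` over `runs` single-image calls."""
    for _ in range(warmup):                    # warm-up: caches, clocks, JIT
        infer(inputs[0])
    start = time.perf_counter()
    for i in range(runs):
        infer(inputs[i % len(inputs)])
    elapsed = time.perf_counter() - start
    return runs / elapsed

# Usage with a dummy workload standing in for RetinaNet ResNet-50 inference:
frames = [np.random.rand(1, 3, 608, 608).astype(np.float32) for _ in range(8)]
dummy_infer = lambda x: x.sum()                # placeholder for the real call
print(f"{benchmark_fps(dummy_infer, frames):.1f} FPS")
```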
2023
Authors
Guimaraes, V; Nascimento, J; Viana, P; Carvalho, P;
Publication
APPLIED SCIENCES-BASEL
Abstract
Compared with traditional local shops, where the customer receives a personalised service, in large retail departments the client has to make purchase decisions independently, mostly supported by the information available on the package. Additionally, people are becoming more aware of the importance of food ingredients and more demanding about the type of products they buy and the information provided on the package, despite it often being hard to interpret. Big shops such as supermarkets have also introduced important challenges for the retailer due to the large number of different products in the store, heterogeneous customer flows and the daily need for item repositioning. In this scenario, the automatic detection and recognition of products on or off the shelves has gained increased interest, as the application of these technologies may improve the shopping experience through self-assisted shopping apps and autonomous shopping, or even benefit stock management with real-time inventory, automatic shelf monitoring and product tracking. These solutions can also have an important impact on customers with visual impairments. Despite recent developments in computer vision, automatic grocery product recognition is still very challenging, with most works focusing on the detection or recognition of a small number of products, often under controlled conditions. This paper discusses the challenges related to this problem and presents a review of proposed methods for retail product label processing, with a special focus on assisted analysis for customer support, including for the visually impaired. Moreover, it details the public datasets used in this topic, identifies their limitations, and discusses future research directions in related fields.