2024
Authors
Sulun, S; Viana, P; Davies, MEP;
Publication
EXPERT SYSTEMS WITH APPLICATIONS
Abstract
We introduce a novel method for movie genre classification, capitalizing on a diverse set of readily accessible pretrained models. These models extract high-level features related to visual scenery, objects, characters, text, speech, music, and audio effects. To intelligently fuse these pretrained features, we train small classifier models with low time and memory requirements. Employing the transformer model, our approach utilizes all video and audio frames of movie trailers without performing any temporal pooling, efficiently exploiting the correspondence between all elements, as opposed to the fixed and low number of frames typically used by traditional methods. Unlike current approaches, our method fuses features originating from different tasks and modalities, with different dimensionalities, different temporal lengths, and complex dependencies. Our method outperforms state-of-the-art movie genre classification models in terms of precision, recall, and mean average precision (mAP). To foster future research, we make the pretrained features for the entire MovieNet dataset, along with our genre classification code and the trained models, publicly available.
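For illustration, a minimal sketch of the kind of fusion the abstract describes: per-modality pretrained feature sequences with different dimensionalities and temporal lengths are projected into a shared space, concatenated along the time axis without temporal pooling, and classified by a small transformer. The architecture, dimensions, and genre count below are assumptions, not the paper's exact model.

# Minimal sketch (not the authors' exact architecture): fusing pretrained
# feature sequences of different dimensionalities and temporal lengths with a
# transformer encoder, without temporal pooling, for multi-label genre output.
import torch
import torch.nn as nn

class LateFusionTransformer(nn.Module):
    def __init__(self, feature_dims, d_model=256, n_heads=4, n_layers=2, n_genres=21):
        super().__init__()
        # One linear projection per pretrained feature stream (e.g. visual, audio, text).
        self.projections = nn.ModuleList([nn.Linear(d, d_model) for d in feature_dims])
        # Learned embedding marking which modality each token came from.
        self.modality_emb = nn.Embedding(len(feature_dims), d_model)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_genres)  # multi-label logits

    def forward(self, streams):
        # streams: list of tensors, one per modality, each (batch, T_i, feature_dims[i]);
        # the T_i may differ, so tokens are concatenated along time instead of pooled.
        tokens = []
        for i, (proj, x) in enumerate(zip(self.projections, streams)):
            idx = torch.full(x.shape[:2], i, dtype=torch.long, device=x.device)
            tokens.append(proj(x) + self.modality_emb(idx))
        cls = self.cls_token.expand(streams[0].size(0), -1, -1)
        encoded = self.encoder(torch.cat([cls] + tokens, dim=1))
        return self.head(encoded[:, 0])  # use the CLS position for genre logits

# Hypothetical usage: two streams, 512-d visual frames and 128-d audio frames.
model = LateFusionTransformer(feature_dims=[512, 128])
logits = model([torch.randn(2, 300, 512), torch.randn(2, 900, 128)])
probs = torch.sigmoid(logits)  # independent per-genre probabilities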
2024
Authors
Pereira, B; Cunha, B; Viana, P; Lopes, M; Melo, ASC; Sousa, ASP;
Publication
SENSORS
Abstract
Shoulder rehabilitation is a process that requires physical therapy sessions to recover the mobility of the affected limbs. However, these sessions are often limited by the availability and cost of specialized technicians, as well as the patient's travel to the session locations. This paper presents a novel smartphone-based approach using a pose estimation algorithm to evaluate the quality of the movements and provide feedback, allowing patients to perform autonomous recovery sessions. This paper reviews the state of the art in wearable devices and camera-based systems for human body detection and rehabilitation support and describes the system developed, which uses MediaPipe to extract the coordinates of 33 key points on the patient's body and compares them with reference videos made by professional physiotherapists using cosine similarity and dynamic time warping. This paper also presents a clinical study that uses QTM, an optoelectronic system for motion capture, to validate the methods used by the smartphone application. The results show that there are statistically significant differences between the three methods for different exercises, highlighting the importance of selecting an appropriate method for specific exercises. This paper discusses the implications and limitations of the findings and suggests directions for future research.
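As an illustration of the comparison step described above, a minimal sketch assuming per-frame pose keypoints have already been extracted (e.g. with MediaPipe Pose, 33 landmarks per frame) and flattened into vectors: a patient sequence is scored against a physiotherapist reference using frame-wise cosine similarity and a simple dynamic time warping distance. The feature layout and array shapes are assumptions, not the application's actual implementation.

# Minimal sketch: comparing a patient keypoint sequence with a reference one
# using average frame-wise cosine similarity and dynamic time warping (DTW).
import numpy as np

def cosine_similarity(a, b):
    # a, b: (n_frames, n_features) keypoint sequences of equal length.
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-8
    return float(np.mean(num / den))  # average frame-wise similarity

def dtw_distance(a, b):
    # Classic O(n*m) DTW over Euclidean frame distances; tolerates sequences
    # of different lengths and execution speeds.
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

# Hypothetical usage: sequences of (frames, 33 landmarks * 2 coordinates).
patient = np.random.rand(120, 66)
reference = np.random.rand(150, 66)
print(dtw_distance(patient, reference))
print(cosine_similarity(patient, reference[:120]))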
2024
Authors
Gouveia, F; Silva, V; Lopes, J; Moreira, RS; Torres, JM; Guerreiro, MS;
Publication
DISCOVER APPLIED SCIENCES
Abstract
Accurate and up-to-date information about the building stock is fundamental to better understand and mitigate the impact caused by catastrophic earthquakes, as seen recently in Turkey, Syria, Morocco and Afghanistan. Planning for such events is necessary to increase the resilience of the building stock and to minimize casualties and economic losses. Although in several parts of the world new constructions comply more strictly with modern seismic codes, a large proportion of the existing building stock still demands a more detailed and automated vulnerability analysis. Hence, this paper proposes the use of computer vision deep learning models to automatically classify buildings and create large-scale (city or region) exposure models. Such an approach promotes the use of open databases covering most cities in the world (cf. OpenStreetMap, Google Street View, Bing Maps and satellite imagery), thereby providing valuable geographical, topological and image data that can cheaply be used to extract information to feed exposure models. Our previous work using deep learning models achieved, in line with the results from other projects, high classification accuracy concerning building materials and number of storeys. This paper extends the approach by: (i) implementing four CNN-based models to classify three sets of different/extended building characteristics; (ii) training and comparing the performance of the four models for each of the sets; (iii) comparing the risk assessment results based on data extracted from the best CNN-based model against the results obtained with traditional ground data. In brief, the best accuracy obtained with the three tested sets of building characteristics is higher than 80%. Moreover, it is shown that the error resulting from using exposure models fed by automatic classification is acceptable and is far outweighed by the time and cost savings over a manual and specialised classification of the building stock. Finally, we recognize that the automatic assessment of certain complex building characteristics suffers from limitations similar to those of traditional assessments performed by specialized civil engineers, typically related to identifying the number of storeys and the construction material. However, these limitations do not lead to worse results than manual building assessment. Highlights: implement an AI/ML framework for automating the collection of pictures of buildings' façades annotated with several characteristics required by exposure models; collect, process and filter a dataset of 4239 pictures of buildings' façades, which was made publicly available; train, validate and test several deep learning models using three sets of building characteristics to produce exposure models with accuracies above 80%; use heatmaps to show which image areas are most activated for a given prediction, thus helping to explain classification results; compare simulation results using the predicted exposure model and a manually created exposure model for the same set of buildings.
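As a sketch of the classification step only (not the paper's exact models or data), the following fine-tunes a pretrained CNN to predict one building characteristic from façade images; the class set, directory layout, and hyperparameters are illustrative assumptions.

# Minimal sketch: fine-tuning a pretrained ResNet to classify one building
# characteristic (e.g. number of storeys) from façade pictures.
import torch
import torch.nn as nn
from torchvision import models, transforms, datasets

NUM_CLASSES = 4  # hypothetical: e.g. 1, 2, 3, 4+ storeys

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
# Assumes an ImageFolder-style directory with one sub-folder per class.
train_set = datasets.ImageFolder("facades/train", transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)  # replace the ImageNet head

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()

Explanation heatmaps such as those mentioned in the highlights could be produced on top of this classifier with a Grad-CAM-style method, which is not shown here.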
2024
Authors
Moita, S; Moreira, RS; Gouveia, F; Torres, JM; Guerreiro, MS; Ferreira, D; Sucena, S; Dinis, MA;
Publication
2024 INTERNATIONAL CONFERENCE ON SMART APPLICATIONS, COMMUNICATIONS AND NETWORKING, SMARTNETS-2024
Abstract
There is widespread social awareness of the need for adequate accessibility (e.g. missing ramps at crosswalks, obstacles and potholes on sidewalks) when planning safe and inclusive city spaces for all citizens. Therefore, municipal authorities responsible for planning urban spaces could benefit from tools that automate the identification of areas in need of accessibility-improving interventions. This paper builds on the assumption that Machine Learning (ML) pipelines can be used to automate the detection of accessibility constraints in public spaces, particularly on sidewalks. Those pipelines rely mostly on Deep Learning algorithms to automate the detection of common accessibility issues. Current literature approaches rely on traditional classifiers trained on image datasets containing single-labeled accessibility classes. We propose an alternative approach using object-detection models that operate in a more generic and human-like way, looking at wider city pictures to spot multiple accessibility problems at once. Hence, we evaluate and compare the results of a more generic YOLO model against previous results obtained by more traditional ResNet classification models. The ResNet models used in Project Sidewalk were trained and tested on per-city datasets of images crowd-labeled with accessibility attributes. By combining the Project Sidewalk and Google Street View (GSV) service APIs, we re-assembled a world-cities-mix dataset used to train, validate and test the YOLO object-detection model, which exhibited precision and recall values above 84%. Our team of architects and civil engineers also collected a labeled image dataset from two central areas of the city of Porto, which was used to jointly train and test the YOLO model. The results show that further training the cities-mix-trained YOLO model, even with a small Porto dataset, provides precision values comparable to those obtained by the per-city ResNet classifiers. Furthermore, the YOLO approach offers a more generic, human-like and efficient pipeline, thus justifying its future use for automating the cataloguing of accessibility mappings in cities.
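For illustration, a minimal sketch of an object-detection pipeline of this kind using the Ultralytics YOLO API; the exact YOLO variant, dataset configuration file, and class names used in the paper are not reproduced here and are assumptions.

# Minimal sketch: fine-tuning and running a YOLO detector on street-level
# images labeled with accessibility-related classes. File names, the dataset
# YAML, and class names are hypothetical.
from ultralytics import YOLO

# Fine-tune a pretrained detector on the assembled accessibility dataset.
model = YOLO("yolov8s.pt")
model.train(data="accessibility.yaml", epochs=50, imgsz=640)

# Run inference on a wide street-view picture and list every issue detected at once.
results = model.predict("street_view_sample.jpg", conf=0.25)
for box in results[0].boxes:
    cls_name = results[0].names[int(box.cls)]
    print(cls_name, float(box.conf), box.xyxy.tolist())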
2024
Authors
Lopes, JM; Mota, LP; Mota, SM; Torres, JM; Moreira, RS; Soares, C; Pereira, I; Gouveia, FR; Sobral, P;
Publication
FUTURE INTERNET
Abstract
All types of sports are potential application scenarios for automatic, real-time visual object and event detection. In rink hockey, the popular roller-skate variant of team hockey, it is of great interest to automatically track player movements, positions, and sticks, and also to make other judgments, such as locating the ball. In this work, we present a real-time pipeline consisting of an object detection model specifically designed for rink hockey games, followed by a knowledge-based event detection module. Even in the presence of occlusions and fast movements, our deep learning object detection model effectively identifies and tracks important visual elements in real time, such as the ball, players, sticks, referees, crowd, goalkeeper, and goal. Using a curated dataset of rink hockey videos containing 2525 annotated frames, we trained and evaluated the model and compared it to state-of-the-art object detection techniques. Our object detection model, based on YOLOv7, achieves a global accuracy of 80% and, according to our results, a good balance between accuracy and speed, making it a good choice for rink hockey applications. In our initial tests, the event detection module successfully detected an important event type in rink hockey games, namely the occurrence of penalties.
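As an illustration of how a knowledge-based event-detection module can sit on top of per-frame detections, a minimal sketch follows; the rule shown (ball overlapping the goal box for several consecutive frames) is purely hypothetical and is not the paper's penalty-detection logic.

# Minimal sketch: rule-based event detection over per-frame detector output
# (class label + bounding box). The specific rule is illustrative only.
from dataclasses import dataclass

@dataclass
class Detection:
    label: str   # e.g. "ball", "player", "goal"
    box: tuple   # (x1, y1, x2, y2) in pixels

def boxes_overlap(a, b):
    # Axis-aligned overlap test between two (x1, y1, x2, y2) boxes.
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def detect_events(frames, min_consecutive=5):
    # frames: list of per-frame detection lists, in temporal order.
    # Hypothetical rule: raise an event when the ball overlaps the goal box
    # for at least `min_consecutive` consecutive frames.
    events, streak = [], 0
    for idx, dets in enumerate(frames):
        balls = [d.box for d in dets if d.label == "ball"]
        goals = [d.box for d in dets if d.label == "goal"]
        hit = any(boxes_overlap(b, g) for b in balls for g in goals)
        streak = streak + 1 if hit else 0
        if streak == min_consecutive:
            events.append(("ball_in_goal_area", idx))
    return events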