Publicacoes - INESC TEC

Publicações

Publicações por HumanISE

2024

Indexing Portuguese NLP Resources with PT-Pump-Up

Autores
Almeida, R; Campos, R; Jorge, A; Nunes, S;

Publicação
PROPOR (2)

Abstract
The recent advances in natural language processing (NLP) are linked to training processes that require vast amounts of corpora. Access to this data is commonly not a trivial process due to resource dispersion and the need to maintain these infrastructures online and up-to-date. New developments in NLP are often compromised due to the scarcity of data or lack of a shared repository that works as an entry point to the community. This is especially true in low and mid-resource languages, such as Portuguese, which lack data and proper resource management infrastructures. In this work, we propose PT-Pump-Up, a set of tools that aim to reduce resource dispersion and improve the accessibility to Portuguese NLP resources. Our proposal is divided into four software components: a) a web platform to list the available resources; b) a client-side Python package to simplify the loading of Portuguese NLP resources; c) an administrative Python package to manage the platform and d) a public GitHub repository to foster future collaboration and contributions. © 2024 PROPOR. All Rights Reserved.

FecharLer Abstract

2024

Network-based Approach for Stopwords Detection

Autores
António Ali, FDM; Jesus, Gd; Cardoso, HL; Nunes, S; Silva, RS;

Publicação
PROPOR (2)

Abstract
Stopword lists, an essential resource for natural language processing and information retrieval, are often unavailable for low-resource languages. Creating these lists is time-consuming and expensive, making automated stopword detection a viable alternative. This paper introduces a novel stopword detection approach that exploits the topological properties of co-occurrence networks to identify function words. By leveraging the connectivity patterns of function words in these networks, the proposed approach aims to achieve higher precision compared to traditional frequency-based methods. To assess the effectiveness of the network-based approach, we constructed co-occurrence networks for Tetun and Emakhuwa (low-resourced languages), as well as English and Portuguese. We then compared the performance of this approach with traditional frequency-based methods. The results indicate that the network-based approach consistently outperforms traditional methods, with in-degree emerging as the most reliable indicator of function words. This finding suggests promising prospects for automatically generating stopword lists in other low-resource languages, paving the way for developing natural language processing tools for these linguistic contexts. © 2024 PROPOR. All Rights Reserved.

FecharLer Abstract

2024

A Fast and Energy-Efficient Method for Online and Incremental Pareto-Front Update

Autores
Ferreira, PJS; Moreira, JM; Cardoso, JMP;

Publicação
WF-IoT

Abstract
Self-adaptive Systems (SaS) are becoming increasingly important for adapting to dynamic environments and for optimizing performance on resource-constrained devices. A practical approach to achieving self-adaptability involves using a Pareto-Front (PF) to store the system's hyper-parameters and the outcomes of hyperparameter combinations. This paper proposes a novel method to approximate a PF, offering a configurable number of solutions that can be adapted to the device's limitations. We conducted extensive experiments across various scenarios, where all PF solutions were replaced, and real world scenarios were performed using actual measurements from a Human Activity Recognition (HAR) system. Our results show that our method consistently outperforms previous methods, mainly when the maximum number of PF solutions is in the order of hundreds. The effectiveness of our method is most apparent in real-case scenarios where it achieves, when executed in a Raspberry Pi 5, up to 87% energy consumption reduction and lower execution times than the second-best algorithm. Additionally, our method ensures a more evenly distributed solution across the PF, preventing the high concentration of solutions.

FecharLer Abstract

2024

Proceedings of the 14th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies, HEART 2024, Porto, Portugal, June 19-21, 2024

Autores
Josipovic, L; Zhou, P; Shanker, S; Cardoso, JMP; Anderson, J; Yuichiro, S;

Publicação
HEART

Abstract

2024

A Flexible-Granularity Task Graph Representation and Its Generation from C Applications (WIP)

Autores
Santos, T; Bispo, J; Cardoso, JMP;

Publicação
PROCEEDINGS OF THE 25TH ACM SIGPLAN/SIGBED INTERNATIONAL CONFERENCE ON LANGUAGES, COMPILERS, AND TOOLS FOR EMBEDDED SYSTEMS, LCTES 2024

Abstract
Modern hardware accelerators, such as FPGAs, allow offloading large regions of C/C++ code in order to improve the execution time and/or the energy consumption of software applications. An outstanding challenge with this approach, however, is solving the Hardware/Software (Hw/Sw) partitioning problem. Given the increasing complexity of both the accelerators and the potential code regions, one needs to adopt a holistic approach when selecting an offloading region by exploring the interplay between communication costs, data usage patterns, and target-specific optimizations. To this end, we propose representing a C application as an extended task graph (ETG) with flexible granularity, which can be manipulated through the merging and splitting of tasks. This approach involves generating a task graph overlay on the program's Abstract Syntax Tree (AST) that maps tasks to functions and the flexible granularity operations onto inlining/outlining operations. This maintains the integrity and readability of the original source code, which is paramount for targeting different accelerators and enabling code optimizations, while allowing the offloading of code regions of arbitrary complexity based on the data patterns of their tasks. To evaluate the ETG representation and its compiler, we use the latter to generate ETGs for the programs in Rosetta and MachSuite benchmark suites, and extract several metrics regarding data communication, task-level parallelism, and dataflow patterns between pairs of tasks. These metrics provide important information that can be used by Hw/Sw partitioning methods.

FecharLer Abstract

2024

Cues to fast-forward collaboration: A Survey of Workspace Awareness and Visual Cues in XR Collaborative Systems

Autores
Assaf, R; Mendes, D; Rodrigues, R;

Publicação
COMPUTER GRAPHICS FORUM

Abstract
Collaboration in extended reality (XR) environments presents complex challenges that revolve around how users perceive the presence, intentions, and actions of their collaborators. This paper delves into the intricate realm of group awareness, focusing specifically on workspace awareness and the innovative visual cues designed to enhance user comprehension. The research begins by identifying a spectrum of collaborative situations drawn from an analysis of XR prototypes in the existing literature. Then, we describe and introduce a novel classification for workspace awareness, along with an exploration of visual cues recently employed in research endeavors. Lastly, we present the key findings and shine a spotlight on promising yet unexplored topics. This work not only serves as a reference for experienced researchers seeking to inform the design of their own collaborative XR applications but also extends a welcoming hand to newcomers in this dynamic field.

FecharLer Abstract