Details

  • Name

    Carlos Ferreira
  • Role

    Senior Researcher
  • Since

    01 January 2010
Publications

2025

Fine-Tuning Transformer-Based LLMs in Hierarchical Text Classification

Authors
Santos, J; Silva, N; Ferreira, C; Gama, J

Publication
Discovery Science - 28th International Conference, DS 2025, Ljubljana, Slovenia, September 23-25, 2025, Proceedings

Abstract
Hierarchical document classification is essential for structuring large-scale textual corpora in domains such as digital libraries and academic repositories. While recent advances in large language models (LLMs) have opened new possibilities for text classification, their applicability to hierarchical settings under real-world constraints remains underexplored. This study investigates both generative and discriminative transformer-based models, evaluating their effectiveness across multiple inference strategies: zero-shot baseline, local fine-tuning, and a global approach using category-specific models. Experiments on two real-world hierarchical datasets provide a comprehensive comparison of classification accuracy, F1-macro scores, and inference times. The results highlight that, although generative LLMs can deliver competitive (yet variable) performance at higher levels of the hierarchy, their high inference costs hinder their use in time-sensitive applications. In contrast, fine-tuned discriminative models—particularly BERT-based architectures—consistently offer a more favorable trade-off between performance and efficiency. © 2025 Elsevier B.V., All rights reserved.

2024

Online News Classification Using Large Language Models with Semantic Enrichment

Authors
Santos, J; Silva, N; Ferreira, C; Gama, J

Publication
Joint Proceedings of Posters, Demos, Workshops, and Tutorials of the 24th International Conference on Knowledge Engineering and Knowledge Management (EKAW-PDWT 2024) co-located with 24th International Conference on Knowledge Engineering and Knowledge Management (EKAW 2024), Amsterdam, Netherlands, November 26-28, 2024.

Abstract
This paper addresses a critical gap in applying semantic enrichment for online news text classification using large language models (LLMs) in fast-paced newsroom environments. While LLMs excel in static text classification tasks, they struggle in real-time scenarios where news topics and narratives evolve rapidly. The dynamic nature of news, with frequent introductions of new concepts and events, challenges pre-trained models, which often fail to adapt quickly to changes. Additionally, the potential of ontology-based semantic enrichment to enhance model adaptability in these contexts has been underexplored. To address these challenges, we propose a novel supervised news classification system that incorporates semantic enrichment to enhance real-time adaptability. This approach bridges the gap between static language models and the dynamic nature of modern newsrooms. The system operates on an adaptive prequential learning framework, continuously assessing model performance on incoming data streams to simulate real-time newsroom decision-making. It supports diverse content formats - text, images, audio, and video - and multiple languages, aligning with the demands of digital journalism. We explore three strategies for deploying LLMs in this dynamic environment: using pre-trained models directly, fine-tuning classifier layers while freezing the initial layers to accommodate new data, and continuously fine-tuning the entire model using real-time feedback combined with data selected based on specified criteria to enhance adaptability and learning over time. These approaches are evaluated incrementally as new data is introduced, reflecting real-time news cycles. Our findings demonstrate that ontology-based semantic enrichment consistently improves classification performance, enabling models to adapt effectively to emerging topics and evolving contexts. 
This study highlights the critical role of semantic enrichment, prequential evaluation, and continuous learning in building robust and adaptive news classification systems capable of thriving in the rapidly evolving digital news landscape. By augmenting news content with third-party ontology-based knowledge, our system provides deeper contextual understanding, enabling LLMs to navigate emerging topics and shifting narratives more effectively. Copyright © 2024 for this paper by its authors.
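
The prequential (test-then-train) evaluation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' system: the `MajorityClassClassifier` toy model, the `prequential_accuracy` helper, and the four-item stream are hypothetical stand-ins, chosen only to show that each incoming item is first used to test the model and only afterwards to update it.

```python
from collections import Counter


class MajorityClassClassifier:
    """Hypothetical toy model: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, text):
        # Before any training data has arrived there is nothing to predict.
        if not self.counts:
            return None
        return self.counts.most_common(1)[0][0]

    def learn(self, text, label):
        self.counts[label] += 1


def prequential_accuracy(stream, model):
    """Test on each item first, then train on it; return running accuracy."""
    correct = 0
    history = []
    for seen, (text, label) in enumerate(stream, start=1):
        if model.predict(text) == label:  # 1) evaluate on the unseen item
            correct += 1
        model.learn(text, label)          # 2) only then update the model
        history.append(correct / seen)
    return history


# Illustrative stream of (text, label) pairs; the data is made up.
stream = [("a", "sports"), ("b", "sports"), ("c", "politics"), ("d", "sports")]
history = prequential_accuracy(stream, MajorityClassClassifier())
print(history)
```

Because every prediction is made before the corresponding label is learned, the running accuracy reflects performance on data the model has not yet seen, which is what makes the scheme suitable for evolving streams.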

2024

More (Enough) Is Better: Towards Few-Shot Illegal Landfill Waste Segmentation

Authors
Molina, M; Veloso, B; Ferreira, CA; Ribeiro, RP; Gama, J

Publication
ECAI 2024

Abstract
Image segmentation for detecting illegal landfill waste in aerial images is essential for environmental crime monitoring. Despite advancements in segmentation models, the primary challenge in this domain is the lack of annotated data due to the unknown locations of illegal waste disposals. This work mainly focuses on evaluating segmentation models for identifying individual illegal landfill waste segments using limited annotations. This research seeks to lay the groundwork for a comprehensive model evaluation to contribute to environmental crime monitoring and sustainability efforts by proposing to harness the combination of agnostic segmentation and supervised classification approaches. We mainly explore different metrics and combinations to better understand how to measure the quality of this applied segmentation problem.

2024

Map-matching methods in agriculture

Authors
Silva, A; Mendes Moreira, J; Ferreira, C; Costa, N; Dias, D

Publication
COMPUTERS AND ELECTRONICS IN AGRICULTURE

Abstract
This paper presents a solution for monitoring the location of workers during agricultural activities, with the aim of boosting productivity and efficiency. The solution is based on map-matching methods, which are used to track the path followed by a worker during a specific activity in a given crop. Two crops are considered in this study: olives and vines. We exploit the geometric symmetry of these crops and divide the problem into three steps: first, we estimate the path of a worker through the fields; then, we apply map-matching to that path; finally, a post-processing method is applied to ensure local continuity of the sequence obtained from map-matching. The proposed methods are evaluated experimentally using synthetic and real data in the region of Mirandela, Portugal. Evaluation metrics show that results for synthetic data are robust across several sampling periods, while for real-world data, results for the vine crop are on par with the synthetic results, and performance for the olive crop is reduced.
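
The map-matching and post-processing steps described in this abstract can be sketched in a heavily simplified form. This is not the authors' method: it assumes, purely for illustration, that crop rows are parallel horizontal lines at `y = k * row_spacing`, that map-matching means snapping each point to its nearest row, and that local continuity is enforced with a sliding-window majority vote. The names `match_to_rows` and `smooth_rows` and the sample points are hypothetical.

```python
from collections import Counter


def match_to_rows(points, row_spacing):
    """Snap each (x, y) point to the index of the nearest row.

    Hypothetical geometry: row k is the horizontal line y = k * row_spacing.
    """
    return [round(y / row_spacing) for _, y in points]


def smooth_rows(rows, window=3):
    """Replace each row index by the majority index in a sliding window."""
    half = window // 2
    smoothed = []
    for i in range(len(rows)):
        neighbourhood = rows[max(0, i - half): i + half + 1]
        smoothed.append(Counter(neighbourhood).most_common(1)[0][0])
    return smoothed


# Made-up estimated path: the third point is a noisy reading near row 1.
points = [(0.0, 0.1), (1.0, 0.2), (2.0, 1.6), (3.0, 0.3),
          (4.0, 2.1), (5.0, 1.9), (6.0, 2.2)]
matched = match_to_rows(points, row_spacing=2.0)
cleaned = smooth_rows(matched)
print(matched)
print(cleaned)
```

In this toy run the raw matching jumps back and forth between rows 0 and 1, while the smoothed sequence contains a single transition from row 0 to row 1, which is the kind of local continuity a post-processing step aims for.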

2023

Modeling the Ink Tuning Process Using Machine Learning

Authors
Costa, C; Ferreira, CA

Publication
Intelligent Data Engineering and Automated Learning - IDEAL 2023 - 24th International Conference, Évora, Portugal, November 22-24, 2023, Proceedings

Abstract
Paint bases are the essence of the color palette, allowing for the creation of a wide range of tones by combining them in different proportions. In this paper, an Artificial Neural Network is developed incorporating a pre-trained Decoder to predict the proportion of each paint base in an ink mixture in order to achieve the desired color. Color coordinates in the CIELAB space and the final finish are considered as input parameters. The proposed model is compared with commonly used models such as Linear Regression, Random Forest and Artificial Neural Network. It is important to note that the Artificial Neural Network was implemented with the same architecture as the proposed model but without incorporating the pre-trained Decoder. Experimental results demonstrate that the Artificial Neural Network with a pre-trained Decoder consistently outperforms the other models in predicting the proportions of paint bases for color tuning. This model exhibits lower Mean Absolute Error and Root Mean Square Error values across multiple objectives, indicating its superior accuracy in capturing the complexities of color relationships. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023.