Publications

Publications by LIAAD

2024

Symbolic Data Analysis to Improve Completeness of Model Combination Methods

Authors
Strecht, P; Mendes Moreira, J; Soares, C;

Publication
ADVANCES IN ARTIFICIAL INTELLIGENCE, AI 2023, PT II

Abstract
A growing number of organizations are adopting a strategy of breaking down large data analysis problems into specific sub-problems, tailoring models for each. However, handling a large number of individual models can pose challenges in understanding organization-wide phenomena. Recent studies focus on using decision trees to create a consensus model by aggregating local decision trees into sets of rules. Despite efforts, the resulting models may still be incomplete, i.e., not able to cover the entire decision space. This paper explores methodologies to tackle this issue by generating complete consensus models from incomplete rule sets, relying on rough estimates of the distribution of independent variables. Two approaches are introduced: synthetic dataset creation followed by decision tree training and a specialized algorithm for creating a decision tree from symbolic data. The feasibility of generating complete decision trees is demonstrated, along with an empirical evaluation on a number of datasets.

2024

Systematic Analysis of the Impact of Label Noise Correction on ML Fairness

Authors
Silva, IOE; Soares, C; Sousa, I; Ghani, R;

Publication
ADVANCES IN ARTIFICIAL INTELLIGENCE, AI 2023, PT II

Abstract
Arbitrary, inconsistent, or faulty decision-making raises serious concerns, and preventing unfair models is an increasingly important challenge in Machine Learning. Data often reflect past discriminatory behavior, and models trained on such data may reflect bias on sensitive attributes, such as gender, race, or age. One approach to developing fair models is to preprocess the training data to remove the underlying biases while preserving the relevant information, for example, by correcting biased labels. While multiple label noise correction methods are available, the information about their behavior in identifying discrimination is very limited. In this work, we develop an empirical methodology to systematically evaluate the effectiveness of label noise correction techniques in ensuring the fairness of models trained on biased datasets. Our methodology involves manipulating the amount of label noise and can be used with fairness benchmarks but also with standard ML datasets. We apply the methodology to analyze six label noise correction methods according to several fairness metrics on standard OpenML datasets. Our results suggest that the Hybrid Label Noise Correction [20] method achieves the best trade-off between predictive performance and fairness. Clustering-Based Correction [14] reduces discrimination the most, but at the cost of lower predictive performance.

2024

Anonymised Phone Call Dataset for Anomaly Detection

Authors
Veloso, B; Martins, C; Espanha, R; Silva, PR; Azevedo, R; Gama, J;

Publication

Abstract

2024

Online News Classification Using Large Language Models with Semantic Enrichment

Authors
Santos, J; Silva, N; Ferreira, C; Gama, J;

Publication
Joint Proceedings of Posters, Demos, Workshops, and Tutorials of the 24th International Conference on Knowledge Engineering and Knowledge Management (EKAW-PDWT 2024) co-located with 24th International Conference on Knowledge Engineering and Knowledge Management (EKAW 2024), Amsterdam, Netherlands, November 26-28, 2024.

Abstract
This paper addresses a critical gap in applying semantic enrichment for online news text classification using large language models (LLMs) in fast-paced newsroom environments. While LLMs excel in static text classification tasks, they struggle in real-time scenarios where news topics and narratives evolve rapidly. The dynamic nature of news, with frequent introductions of new concepts and events, challenges pre-trained models, which often fail to adapt quickly to changes. Additionally, the potential of ontology-based semantic enrichment to enhance model adaptability in these contexts has been underexplored. To address these challenges, we propose a novel supervised news classification system that incorporates semantic enrichment to enhance real-time adaptability. This approach bridges the gap between static language models and the dynamic nature of modern newsrooms. The system operates on an adaptive prequential learning framework, continuously assessing model performance on incoming data streams to simulate real-time newsroom decision-making. It supports diverse content formats - text, images, audio, and video - and multiple languages, aligning with the demands of digital journalism. We explore three strategies for deploying LLMs in this dynamic environment: using pre-trained models directly, fine-tuning classifier layers while freezing the initial layers to accommodate new data, and continuously fine-tuning the entire model using real-time feedback combined with data selected based on specified criteria to enhance adaptability and learning over time. These approaches are evaluated incrementally as new data is introduced, reflecting real-time news cycles. Our findings demonstrate that ontology-based semantic enrichment consistently improves classification performance, enabling models to adapt effectively to emerging topics and evolving contexts. 
This study highlights the critical role of semantic enrichment, prequential evaluation, and continuous learning in building robust and adaptive news classification systems capable of thriving in the rapidly evolving digital news landscape. By augmenting news content with third-party ontology-based knowledge, our system provides deeper contextual understanding, enabling LLMs to navigate emerging topics and shifting narratives more effectively. Copyright © 2024 for this paper by its authors.
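
The prequential (test-then-train) evaluation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' system: the `KeywordVoteClassifier` class, the `prequential_accuracy` function, and the four-item stream are hypothetical stand-ins chosen only to show the evaluation loop, and the ontology-based enrichment step is left out.

```python
from collections import Counter, defaultdict


class KeywordVoteClassifier:
    """Toy incremental classifier (hypothetical): tokens vote for labels seen with them."""

    def __init__(self):
        self.token_label_counts = defaultdict(Counter)
        self.label_counts = Counter()

    def predict(self, text):
        votes = Counter()
        for token in text.lower().split():
            # .get avoids creating empty entries for unseen tokens
            votes.update(self.token_label_counts.get(token, {}))
        if votes:
            return votes.most_common(1)[0][0]
        if self.label_counts:
            # Fall back to the most frequent label seen so far
            return self.label_counts.most_common(1)[0][0]
        return None  # nothing learned yet

    def learn(self, text, label):
        self.label_counts[label] += 1
        for token in text.lower().split():
            self.token_label_counts[token][label] += 1


def prequential_accuracy(stream, model):
    """Score each item before the model is allowed to train on it."""
    correct = 0
    total = 0
    for text, label in stream:
        if model.predict(text) == label:  # test first
            correct += 1
        model.learn(text, label)          # then train
        total += 1
    return correct / total if total else 0.0


# Illustrative stream of (headline, topic) pairs; the data is made up.
stream = [
    ("election results announced", "politics"),
    ("team wins final match", "sports"),
    ("election campaign begins", "politics"),
    ("team signs new striker", "sports"),
]
accuracy = prequential_accuracy(stream, KeywordVoteClassifier())
print(accuracy)
```

Because every item is scored before it is learned, the resulting accuracy reflects how the model would have performed on unseen incoming data at each point in the stream.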

2024

Recent Advances in Learning from Data Streams

Authors
Gama, J;

Publication
Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K 2024, Volume 1: KDIR, Porto, Portugal, November 17-19, 2024.

Abstract

2024

Next Location Prediction with Time-Evolving Markov Models over Data Streams

Authors
Andrade, T; Gama, J;

Publication
Progress in Artificial Intelligence - 23rd EPIA Conference on Artificial Intelligence, EPIA 2024, Viana do Castelo, Portugal, September 3-6, 2024, Proceedings, Part III

Abstract
Various relevant aspects of our lives relate to the places we visit and our daily activities. The movement of individuals between regular places, such as work, school, or other important personal locations is getting increasing attention due to the pervasiveness of geolocation devices and the amount of data they generate. This paper presents an approach for personal location prediction using a probabilistic model and data mining techniques over mobility data streams. We extract the individuals’ locations from relevant events in a data stream to build and maintain a Markov Chain over the important places. We evaluate the method over 3 real-world datasets. The results show the usefulness of the proposal in comparison with other well-known approaches. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
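
As a rough illustration of maintaining a Markov chain over visited places from a stream, the sketch below updates transition counts one visit at a time. It is an assumption-laden toy, not the method from the paper: the `MarkovNextPlace` class, the `decay` parameter, and the place names are all hypothetical, and the extraction of important places from raw mobility events is not shown.

```python
from collections import defaultdict


class MarkovNextPlace:
    """First-order Markov chain over places, updated incrementally (hypothetical sketch)."""

    def __init__(self, decay=1.0):
        # decay < 1.0 down-weights older transitions; this knob is an
        # assumption for illustration, not a parameter taken from the paper
        self.decay = decay
        self.counts = defaultdict(lambda: defaultdict(float))
        self.last_place = None

    def update(self, place):
        """Consume one visit from the stream and update transition weights."""
        if self.last_place is not None:
            row = self.counts[self.last_place]
            for nxt in row:
                row[nxt] *= self.decay  # age existing transitions from this place
            row[place] += 1.0           # reinforce the observed transition
        self.last_place = place

    def predict(self, place):
        """Return the most heavily weighted next place, or None if unseen."""
        row = self.counts.get(place)
        if not row:
            return None
        return max(row, key=row.get)


model = MarkovNextPlace(decay=0.9)
for place in ["home", "work", "gym", "home", "work", "home", "work", "gym"]:
    model.update(place)

print(model.predict("home"))
print(model.predict("work"))
```

Setting `decay` below 1.0 lets recent movement patterns outweigh older ones, which is one simple way to make the chain evolve over time.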
