2025
Authors
Wu, X; Spiliopoulou, M; Wang, C; Kumar, V; Cao, L; Zhou, X; Pang, G; Gama, J;
Publication
PAKDD (7)
Abstract
2025
Authors
Wu, X; Spiliopoulou, M; Wang, C; Kumar, V; Cao, L; Zhou, X; Pang, G; Gama, J;
Publication
PAKDD (6)
Abstract
2024
Authors
Veloso, B; Martins, C; Espanha, R; Silva, PR; Azevedo, R; Gama, J;
Publication
Abstract
2024
Authors
Santos, J; Silva, N; Ferreira, C; Gama, J;
Publication
EKAW (Companion)
Abstract
This paper addresses a critical gap in applying semantic enrichment for online news text classification using large language models (LLMs) in fast-paced newsroom environments. While LLMs excel in static text classification tasks, they struggle in real-time scenarios where news topics and narratives evolve rapidly. The dynamic nature of news, with frequent introductions of new concepts and events, challenges pre-trained models, which often fail to adapt quickly to changes. Additionally, the potential of ontology-based semantic enrichment to enhance model adaptability in these contexts has been underexplored. To address these challenges, we propose a novel supervised news classification system that incorporates semantic enrichment to enhance real-time adaptability. This approach bridges the gap between static language models and the dynamic nature of modern newsrooms. The system operates on an adaptive prequential learning framework, continuously assessing model performance on incoming data streams to simulate real-time newsroom decision-making. It supports diverse content formats (text, images, audio, and video) and multiple languages, aligning with the demands of digital journalism. We explore three strategies for deploying LLMs in this dynamic environment: using pre-trained models directly, fine-tuning classifier layers while freezing the initial layers to accommodate new data, and continuously fine-tuning the entire model using real-time feedback combined with data selected based on specified criteria to enhance adaptability and learning over time. These approaches are evaluated incrementally as new data is introduced, reflecting real-time news cycles. Our findings demonstrate that ontology-based semantic enrichment consistently improves classification performance, enabling models to adapt effectively to emerging topics and evolving contexts.
This study highlights the critical role of semantic enrichment, prequential evaluation, and continuous learning in building robust and adaptive news classification systems capable of thriving in the rapidly evolving digital news landscape. By augmenting news content with third-party ontology-based knowledge, our system provides deeper contextual understanding, enabling LLMs to navigate emerging topics and shifting narratives more effectively.
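The prequential ("test-then-train") evaluation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' system: the `MajorityClassLearner` class, the `prequential_accuracy` function, and the toy labels are hypothetical stand-ins chosen only to show the evaluation loop, in which each incoming item is first used to test the model and only then used to update it.

```python
from collections import Counter


class MajorityClassLearner:
    """Toy incremental learner (hypothetical): predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        # No prediction is possible before any label has been observed.
        if not self.counts:
            return None
        return max(self.counts, key=self.counts.get)

    def learn(self, x, y):
        self.counts[y] += 1


def prequential_accuracy(stream, learner):
    """Test on each example first, then train on it; return the running accuracy."""
    correct = 0
    total = 0
    for x, y in stream:
        if learner.predict(x) == y:  # test step uses the model as it was before seeing y
            correct += 1
        total += 1
        learner.learn(x, y)          # train step happens only after the prediction
    return correct / total if total else 0.0


# Toy stream of (document, label) pairs; the labels are illustrative only.
stream = [("a", "sports"), ("b", "sports"), ("c", "politics"), ("d", "sports")]
accuracy = prequential_accuracy(stream, MajorityClassLearner())
```

Because every prediction is made before the corresponding label is revealed, the running accuracy reflects how the model would have performed on data it had not yet seen, which is the property that makes this scheme suitable for streams.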
2025
Authors
Zhang, C; Wu, S; Chen, Y; Aßenmacher, M; Heumann, C; Men, Y; Fan, G; Gama, J;
Publication
CoRR
Abstract
2025
Authors
Zhao, RR; You, YQ; Sun, JB; Gama, J; Jiang, J;
Publication
INFORMATION PROCESSING & MANAGEMENT
Abstract
Capricious data streams, marked by the random emergence and disappearance of features, are common in practical scenarios such as sensor networks. Existing research handles them mainly with linear classifiers, feature correlation, or ensembles of trees, approaches that suffer from deficiencies such as limited learning capacity and high time cost. More importantly, the concept drift problem in such streams has received little attention. This paper therefore focuses on drifting capricious data streams and proposes a new algorithm, DCFHT (online learning from Drifting Capricious data streams with Flexible Hoeffding Tree), based on a single Hoeffding tree. DCFHT achieves non-linear modeling and adapts to drifts. First, DCFHT dynamically reuses and restructures the tree. The reusable information includes the tree structure and the information stored in each node. The restructuring process ensures that the Hoeffding tree dynamically aligns with the latest universal feature space. Second, DCFHT adapts to drifts in an informed way. When a drift is detected, DCFHT starts training a backup learner until it is able to replace the primary learner. Experiments on 22 public and 15 synthetic datasets show that DCFHT is not only more accurate but also maintains relatively low runtime on capricious data streams.
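The backup-learner idea described in this abstract can be sketched roughly as follows. This is not DCFHT itself: the toy `MajorityLearner`, the `BackupSwapWrapper` class, the externally supplied `signal_drift` call, and the replacement rule (the backup wins once it has more correct predictions than the primary over a full sliding window) are all assumptions made for illustration, since the abstract does not specify the detector or the replacement criterion.

```python
from collections import Counter, deque


class MajorityLearner:
    """Toy stand-in for a stream learner (hypothetical; not a Hoeffding tree)."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else None

    def learn(self, x, y):
        self.counts[y] += 1


class BackupSwapWrapper:
    """Trains a backup learner after a drift signal and swaps it in once it does better."""

    def __init__(self, make_learner, window=20):
        self.make_learner = make_learner
        self.window = window
        self.primary = make_learner()
        self.backup = None
        self.primary_hits = deque(maxlen=window)  # recent correctness of the primary
        self.backup_hits = deque(maxlen=window)   # recent correctness of the backup
        self.swaps = 0

    def signal_drift(self):
        # Assumed to be called by some external drift detector.
        if self.backup is None:
            self.backup = self.make_learner()
            self.backup_hits.clear()

    def predict(self, x):
        return self.primary.predict(x)

    def learn(self, x, y):
        # Score each learner on the example before training on it.
        self.primary_hits.append(self.primary.predict(x) == y)
        self.primary.learn(x, y)
        if self.backup is None:
            return
        self.backup_hits.append(self.backup.predict(x) == y)
        self.backup.learn(x, y)
        window_full = len(self.backup_hits) == self.window
        if window_full and sum(self.backup_hits) > sum(self.primary_hits):
            # Assumed criterion: the backup replaces the primary once it is more accurate.
            self.primary = self.backup
            self.primary_hits = deque(self.backup_hits, maxlen=self.window)
            self.backup = None
            self.backup_hits.clear()
            self.swaps += 1


model = BackupSwapWrapper(MajorityLearner, window=5)
for _ in range(30):
    model.learn(None, "a")      # stable concept: label "a"
before_drift = model.predict(None)
model.signal_drift()            # pretend a detector fired here
for _ in range(10):
    model.learn(None, "b")      # new concept: label "b"
after_drift = model.predict(None)
```

Keeping the primary learner in service while the backup trains means predictions never come from an untrained model, at the cost of updating two learners during the transition.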