Publications

2025

Histogram approaches for imbalanced data streams regression

Authors
Aminian, E; Ribeiro, RP; Gama, J;

Publication
MACHINE LEARNING

Abstract
Imbalanced domains pose a significant challenge in real-world predictive analytics, particularly in the context of regression. While existing research has primarily focused on batch learning from static datasets, limited attention has been given to imbalanced regression in online learning scenarios. To address this gap, in prior work we proposed sampling strategies based on Chebyshev's inequality as the first methodologies designed explicitly for data streams. However, these approaches operated under the restrictive assumption that rare instances reside exclusively at the extremes of the distribution. This study introduces histogram-based sampling strategies to overcome this constraint, proposing flexible solutions for imbalanced regression in evolving data streams. The proposed techniques, Histogram-based Undersampling (HistUS) and Histogram-based Oversampling (HistOS), employ incremental online histograms to dynamically detect and prioritize rare instances across arbitrary regions of the target distribution, with the aim of improving predictions for rare cases. Comprehensive experiments on synthetic and real-world benchmarks demonstrate that HistUS and HistOS substantially improve rare-case prediction accuracy, outperforming baseline models while remaining competitive with Chebyshev-based approaches.
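The abstract does not spell out how HistUS works internally, so the following is only an illustrative sketch of the general idea of histogram-driven undersampling on a stream, not the authors' algorithm. It assumes fixed-width bins and a keep probability equal to the rarest non-empty bin's count divided by the count of the instance's own bin; the names `OnlineHistogram`, `keep_probability`, `undersample`, and the `bin_width` parameter are all hypothetical.

```python
import random
from collections import Counter


class OnlineHistogram:
    """Fixed-width histogram over target values, updated one instance at a time."""

    def __init__(self, bin_width):
        self.bin_width = bin_width  # assumption: constant bin width
        self.counts = Counter()

    def bin_index(self, y):
        # Floor division maps a target value to the index of its bin.
        return int(y // self.bin_width)

    def update(self, y):
        self.counts[self.bin_index(y)] += 1

    def count(self, y):
        # .get avoids creating an empty bin as a side effect of a lookup.
        return self.counts.get(self.bin_index(y), 0)


def keep_probability(hist, y):
    """Assumed rule: the rarest bin is always kept, frequent bins less often."""
    count = hist.count(y)
    if count == 0:
        return 1.0  # a target in an unseen bin is treated as rare
    rarest = min(hist.counts.values())
    return rarest / count


def undersample(stream, bin_width, seed=0):
    """Yield the (x, y) pairs that would be passed on to the learner."""
    rng = random.Random(seed)
    hist = OnlineHistogram(bin_width)
    for x, y in stream:
        hist.update(y)
        if rng.random() < keep_probability(hist, y):
            yield x, y
```

Under these assumptions, an instance whose target falls in a sparsely populated bin is always forwarded to the learner, wherever that bin sits in the target range, while instances from crowded bins are dropped with increasing probability as their bin fills up.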


2025

Specifying Distributed Hash Tables with Allen Temporal Logic

Authors
Policarpo, N; Santos, JF; Cunha, A; Leitao, J; Costa, PA;

Publication
2025 IEEE/ACM 13TH INTERNATIONAL CONFERENCE ON FORMAL METHODS IN SOFTWARE ENGINEERING, FORMALISE

Abstract
Distributed Hash Tables (DHTs) remain to this day a central component of modern peer-to-peer (P2P) systems, which rely on complex DHT protocols to scale to millions of nodes. The correct operation of DHTs is therefore essential for the proper functioning of these systems. For this reason, formal methods have been employed to model and verify a range of correctness properties of various DHT protocols. However, these verification efforts have focused on protocol-specific properties, such as topological invariants, instead of functional properties. This focus limits our understanding of the precise guarantees offered by each protocol. We propose a protocol-independent axiomatization of DHT properties using Allen Temporal Logic (ATL). To validate our axiomatization, we have implemented it in the Alloy analyser and used our implementation both to establish a number of DHT-derived properties and to generate a set of DHT execution traces that cover an exhaustive list of DHT corner case behaviours.


2025

PolyNarrative: A Multilingual, Multilabel, Multi-domain Dataset for Narrative Extraction from News Articles

Authors
Nikolaidis, N; Stefanovitch, N; Silvano, P; Dimitrov, D; Yangarber, R; Guimaraes, N; Sartori, E; Androutsopoulos, I; Nakov, P; Da San Martino, G; Piskorski, J;

Publication
PROCEEDINGS OF THE 63RD ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS

Abstract
We present PolyNarrative, a new multilingual dataset of news articles, annotated for narratives. Narratives are overt or implicit claims, recurring across articles and languages, promoting a specific interpretation or viewpoint on an ongoing topic, often propagating mis/disinformation. We developed two-level taxonomies with coarse- and fine-grained narrative labels for two domains: (i) climate change and (ii) the military conflict between Ukraine and Russia. We collected news articles in four languages (Bulgarian, English, Portuguese, and Russian) related to the two domains and manually annotated them at the paragraph level. We make the dataset publicly available, along with experimental results of several strong baselines that assign narrative labels to news articles at the paragraph or the document level. We believe that this dataset will foster research in narrative detection and enable new research directions towards more multi-domain and highly granular narrative-related tasks.


2025

A Reinforcement Learning Based Recommender System Framework for Web Apps: Radio and Game Aggregators Scenarios

Authors
Batista, A; Torres, JM; Sobral, P; Moreira, RS; Soares, C; Pereira, I;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2024, PT I

Abstract
Recommendation systems can play an important role in today's digital content platforms by supporting the suggestion of relevant content in a personalised manner for each customer. Such content customisation has not been consistent across most media domains, particularly in radio streaming and gaming aggregators, the two real-world application domains on which this work focuses. The challenges faced in these application areas are the dynamic nature of user preferences and the difficulty of generating recommendations for less popular content, due to the overwhelming choice and polarisation of available top content. We present the design and implementation of a Reinforcement Learning-based Recommendation System (RLRS) for web applications, using a Deep Deterministic Policy Gradient (DDPG) agent and, as a reward function, a weighted sum of the user Click Distribution (CD) across the recommended items and the Dwell Time (DT), a measure of the time users spend interacting with those items. Our system has been deployed in real production scenarios with preliminary but promising results. Several metrics are used to track the effectiveness of our approach, such as content coverage, category diversity, and intra-list similarity. In both scenarios tested, the system shows consistent improvement and adaptability over time, reinforcing its applicability.
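The reward described above, a weighted sum of Click Distribution and Dwell Time, can be illustrated with a minimal sketch. The abstract gives neither the weights nor how CD and DT are computed or scaled, so everything here is an assumption: both signals are taken to be scores in [0, 1], the equal weights and the 300-second dwell-time cap are placeholders, and the function names are hypothetical.

```python
def normalised_dwell_time(seconds, cap_seconds=300.0):
    """Clip raw dwell time to [0, cap_seconds] and scale it to [0, 1].

    The cap is an assumed placeholder, not a value taken from the paper.
    """
    return min(max(seconds, 0.0), cap_seconds) / cap_seconds


def reward(click_distribution, dwell_time, w_cd=0.5, w_dt=0.5):
    """Weighted sum of two signals assumed to lie in [0, 1].

    w_cd and w_dt are hypothetical weights chosen only for illustration.
    """
    return w_cd * click_distribution + w_dt * dwell_time
```

With these placeholder settings, a recommendation list with a click score of 0.5 and a dwell time at or above the cap would receive a reward of 0.75; shifting weight from `w_cd` to `w_dt` would favour items users stay on over items they merely click.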


2025

A blockchain architecture with smart contracts for an additive symbiotic network - a case study

Authors
Ferreira, IA; Palazzo, G; Pinto, A; Pinto, P; Sousa, P; Godina, R; Carvalho, H;

Publication
OPERATIONS MANAGEMENT RESEARCH

Abstract
Adopting innovative technologies such as blockchain and additive manufacturing can help organisations promote the development of additive symbiotic networks, thus pursuing more ambitious sustainability goals and implementing circular economy strategies. These symbiotic networks correspond to industrial symbiosis networks in which wastes and by-products from other industries are incorporated into additive manufacturing processes. The adoption of blockchain technology in such a context is still at a nascent stage. Using the case study method, this research demonstrates the adoption of blockchain technology in a real-life additive symbiotic network. The requirements to use a blockchain network are identified, and an architecture based on smart contracts is proposed as an enabler of the additive symbiotic network under study. The proposed solution uses the Hyperledger Fabric Attribute-Based Access Control as the distributed ledger technology. Even though this solution is still in the proof-of-concept stage, the results show that adopting it would allow the elimination of intermediary entities, keep tracking records of the exchanged resources available, and improve trust among the symbiotic stakeholders (who had no trust or cooperation mechanisms established before the symbiotic relationship). This study highlights that the complexity associated with introducing a novel technology and the technology's immaturity compared to other data storage technologies are some of the main challenges related to using blockchain technology in additive symbiotic networks.


2025

Verifying Multiple TLA+ Configurations with Blast

Authors
Somson, P; Cunha, A;

Publication
2025 IEEE/ACM 13TH INTERNATIONAL CONFERENCE ON FORMAL METHODS IN SOFTWARE ENGINEERING, FORMALISE

Abstract
TLA+ is one of the most popular formal methods for designing concurrent and distributed systems. TLA+ specifications can be verified with the TLC model checker, but unfortunately only one user-specified configuration of the system is verified at a time. If configurations are simple (e.g. the number of processes in a concurrent algorithm), it is viable to run TLC for several configurations to gain confidence that the system indeed works correctly for all of them. However, for complex configurations it is difficult to do so, and critical configurations can easily be missed. This paper introduces Blast, a tool that simplifies this task by enabling the user to easily verify a TLA+ specification for all configurations inside a given scope. Our evaluation, using a large benchmark of TLA+ examples, shows that Blast can be used effectively on a wide range of specifications and helped us uncover issues in several of them.

