Publications

Publications by HASLab

2025

KEIGO: Co-designing Log-Structured Merge Key-Value Stores with a Non-Volatile, Concurrency-aware Storage Hierarchy

Authors
Adao, R; Wu, ZJ; Zhou, CJ; Balmau, O; Paulo, J; Macedo, R;

Publication
PROCEEDINGS OF THE VLDB ENDOWMENT

Abstract
We present Keigo, a concurrency- and workload-aware storage middleware that enhances the performance of log-structured merge key-value stores (LSM KVS) when they are deployed on a hierarchy of storage devices. The key observation behind Keigo is that there is no one-size-fits-all placement of data across the storage hierarchy that optimizes for all workloads. Hence, to leverage the benefits of combining different storage devices, Keigo places files across different devices based on their parallelism, I/O bandwidth, and capacity. We introduce three techniques: concurrency-aware data placement, persistent read-only caching, and context-based I/O differentiation. Keigo is portable across different LSMs, is adaptable to dynamic workloads, and does not require extensive profiling. Our system enables established production KVS such as RocksDB, LevelDB, and Speedb to benefit from heterogeneous storage setups. We evaluate Keigo using synthetic and realistic workloads, showing that it improves the throughput of production-grade LSMs by up to 4x for write- and 18x for read-heavy workloads when compared to general-purpose storage systems and specialized LSM KVS.
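The placement idea the abstract describes, choosing a device per file based on its parallelism and remaining capacity, can be sketched as follows. This is a minimal illustration, not Keigo's actual policy; the device names, thresholds, and `place_file` helper are invented for the example.

```python
# Hypothetical sketch of concurrency-aware file placement: assign each LSM
# file to a device that can absorb the file's expected I/O concurrency and
# still has capacity. All names and numbers are illustrative.
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    parallelism: int      # concurrent I/Os the device sustains well
    capacity_gb: float    # total capacity
    used_gb: float = 0.0  # space already allocated

def place_file(devices, file_size_gb, concurrent_writers):
    """Pick a device whose parallelism matches the file's expected
    concurrency, falling back to the least-loaded device with space."""
    candidates = [d for d in devices
                  if d.capacity_gb - d.used_gb >= file_size_gb]
    if not candidates:
        raise RuntimeError("storage hierarchy is full")
    # Prefer devices that can handle the expected concurrency; among
    # those, balance load by picking the least-used one.
    suitable = [d for d in candidates if d.parallelism >= concurrent_writers]
    chosen = min(suitable or candidates, key=lambda d: d.used_gb)
    chosen.used_gb += file_size_gb
    return chosen.name

devices = [Device("pmem", parallelism=8, capacity_gb=100),
           Device("nvme", parallelism=32, capacity_gb=1000),
           Device("hdd", parallelism=2, capacity_gb=8000)]

# A hot file written by many concurrent threads lands on the
# high-parallelism device.
print(place_file(devices, 0.064, concurrent_writers=16))  # prints "nvme"
```

A real middleware would also factor in measured I/O bandwidth and react to workload shifts; the sketch only captures the capacity-and-parallelism dimension of the decision.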

2025

Promoting sustainable and personalized travel behaviors while preserving data privacy

Authors
Brito C.; Pina N.; Esteves T.; Vitorino R.; Cunha I.; Paulo J.;

Publication
Transportation Engineering

Abstract
Cities worldwide have agreed on ambitious goals regarding carbon neutrality. To meet them, policymakers seek ways to foster smarter and cleaner transportation solutions. However, citizens lack awareness of their carbon footprint and of greener mobility alternatives such as public transport. From this, three main challenges emerge: (i) increasing users’ awareness of their carbon footprint, (ii) providing personalized recommendations and incentives for using sustainable transportation alternatives, and (iii) guaranteeing that any personal data collected from the user is kept private. This paper addresses these challenges by proposing a new methodology. Created under the FranchetAI project, the methodology combines federated Artificial Intelligence (AI) and Greenhouse Gas (GHG) estimation models to calculate the carbon footprint of users when choosing different transportation modes (e.g., foot, car, bus). Through a mobile application that preserves the privacy of users’ personal information, the project aims to provide detailed reports informing citizens about their impact on the environment, and an incentive program promoting the use of more sustainable mobility alternatives.
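The per-trip footprint comparison at the core of such a methodology reduces to multiplying distance by a per-mode emission factor. The sketch below is illustrative only: the emission factors are made-up placeholders, not the project's actual estimation models.

```python
# Illustrative per-mode CO2 footprint calculation. The factors below are
# hypothetical placeholder values, not FranchetAI's actual GHG models.
EMISSION_FACTORS_KG_PER_KM = {
    "foot": 0.0,
    "bus": 0.05,
    "car": 0.20,
}

def trip_footprint_kg(mode: str, distance_km: float) -> float:
    """Estimate the CO2-equivalent emissions of a single trip."""
    return EMISSION_FACTORS_KG_PER_KM[mode] * distance_km

def greener_alternatives(mode: str, distance_km: float) -> list:
    """List modes that would emit less than the chosen one for this trip."""
    baseline = trip_footprint_kg(mode, distance_km)
    return sorted(m for m, f in EMISSION_FACTORS_KG_PER_KM.items()
                  if f * distance_km < baseline)

print(trip_footprint_kg("car", 10))     # 2.0
print(greener_alternatives("car", 10))  # ['bus', 'foot']
```

In the federated setting described in the abstract, a model refining such estimates would be trained on-device, with only model updates (never raw trip data) leaving the user's phone.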

2025

Addressing the Agony of Recruitment for Human-centric Computing Studies

Authors
Madampe, K; Grundy, J; Good, J; Hidellaarachchi, D; Cunha, J; Brown, C; Kuang, P; Tamime, RA; Anik, AI; Sarkar, A; Zhou, W; Khalid, S; Turchi, T; Wickramathilaka, S; Jiang, Y;

Publication
ACM SIGSOFT Softw. Eng. Notes

Abstract
We conducted a workshop on "Addressing Challenges in Recruiting Participants for Human-Centric Computing Research Studies" at the IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC) '24 conference. The workshop included a brainstorming session on developing a roadmap for making participant recruitment easier for human-centric computing studies in both industry and academia. This article presents the seven stages of participant recruitment and the key strategies identified by the authors (workshop participants) during the brainstorming session.

2025

Let's Talk About It: Making Scientific Computational Reproducibility Easier

Authors
Costa, L; Barbosa, S; Cunha, J;

Publication
VL/HCC

Abstract
Computational reproducibility, the ability to re-execute a scientific experiment using the same code, data, and configuration, should be straightforward. However, researchers often struggle with inconsistent documentation, missing dependencies, and environment setup, which undermines the credibility of scientific results. To address this, we propose SciConv, a conversational, text-based tool that aids researchers in reproducing and packaging computational experiments into a single file. This file can be re-executed with a double-click on any machine, requiring only a single tool. SciConv is designed to support two key scenarios: (i) enabling researchers to prepare their own experiments in a reproducible, shareable format, and (ii) helping other researchers reproduce existing experiments from shared code repositories. In both cases, the tool reduces technical overhead and simplifies environment configuration through conversational interaction. We evaluated the tool through two studies. In the first, we reproduced 15 of 18 published experiments, most requiring little or no user interaction. In the second, we conducted a user study comparing our tool with a professional platform, using the System Usability Scale (SUS) and NASA Task Load Index (TLX). The results show a statistically significant advantage for our tool in both usability and workload, demonstrating its effectiveness in supporting reproducibility.

2025

CompRep: A Dataset For Computational Reproducibility

Authors
Costa, L; Barbosa, S; Cunha, J;

Publication
PROCEEDINGS OF THE 3RD ACM CONFERENCE ON REPRODUCIBILITY AND REPLICABILITY, ACM REP 2025

Abstract
Reproducibility in computational science is increasingly dependent on the ability to faithfully re-execute experiments involving code, data, and software environments. However, assessing the effectiveness of reproducibility tools is difficult due to the lack of standardized benchmarks. To address this, we collected 38 computational experiments from diverse scientific domains and attempted to reproduce each using 8 different reproducibility tools. From this initial pool, we identified 18 experiments that could be successfully reproduced using at least one tool. These experiments form our curated benchmark dataset, which we release along with reproducibility packages to support ongoing evaluation efforts. This article introduces the curated dataset, incorporating details about software dependencies, execution steps, and configurations necessary for accurate reproduction. The dataset is structured to reflect diverse computational requirements and methodologies, ranging from simple scripts to complex, multi-language workflows, ensuring it represents the wide range of challenges researchers face in reproducing computational studies. It provides a universal benchmark by establishing a standardized dataset for objectively evaluating and comparing the effectiveness of reproducibility tools. Each experiment in the dataset is carefully documented to ensure ease of use: instructions follow a common standard, so every experiment ships with the same kind of instructions, making it easier for researchers to run each of them with their own reproducibility tool. The utility of the dataset is demonstrated through extensive evaluations using multiple reproducibility tools.
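The "same kind of instructions for every experiment" idea amounts to a small, checkable metadata schema per experiment. The sketch below invents such a schema and a validator for illustration; the field names are hypothetical, not CompRep's actual format.

```python
# Hypothetical illustration of standardized per-experiment metadata
# (dependencies, execution steps, configuration) with a simple validator.
# The schema is invented for this sketch, not CompRep's actual standard.
REQUIRED_FIELDS = {"name", "language", "dependencies", "steps"}

def validate_experiment(meta: dict) -> list:
    """Return a list of problems; an empty list means the entry
    follows the (hypothetical) standard."""
    problems = [f"missing field: {f}"
                for f in sorted(REQUIRED_FIELDS - meta.keys())]
    if not meta.get("steps"):
        problems.append("no execution steps given")
    return problems

experiment = {
    "name": "example-analysis",
    "language": "python",
    "dependencies": ["numpy==1.26"],
    "steps": ["pip install -r requirements.txt", "python run.py"],
}
print(validate_experiment(experiment))  # []
```

Keeping the schema machine-checkable is what lets a benchmark run every experiment through every reproducibility tool uniformly.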

2025

Mind the gap: The missing features of the tools to support user studies in software engineering

Authors
Costa, L; Barbosa, S; Cunha, J;

Publication
JOURNAL OF COMPUTER LANGUAGES

Abstract
User studies are paramount for advancing research in software engineering, particularly when evaluating tools and techniques involving programmers. However, researchers face several barriers when performing them despite the existence of supporting tools. We base our study on a set of tools and researcher-reported barriers identified in prior work on user studies in software engineering. In this work, we study how existing tools and their features cope with previously identified barriers. Moreover, we propose new features for the barriers that lack support. We validated our proposal with 102 researchers, achieving statistically significant positive support for all but one feature. We study the current gap between tools and barriers, using features as the bridge. We show there is a significant lack of support for several barriers, as some have no single tool to support them.
