Publications - INESC TEC

Publications

Publications by LIAAD

2026

Synthetic Time Series Generation via Complex Networks

Authors
Vale, J; Silva, VF; Silva, ME; Silva, F;

Publication
CoRR

Abstract
Time series data are essential for a wide range of applications, particularly in developing robust machine learning models. However, access to high-quality datasets is often limited due to privacy concerns, acquisition costs, and labeling challenges. Synthetic time series generation has emerged as a promising solution to address these constraints. In this work, we present a framework for generating synthetic time series by leveraging complex network mappings. Specifically, we investigate whether time series transformed into Quantile Graphs (QG), and then reconstructed via inverse mapping, can produce synthetic data that preserve the statistical and structural properties of the original. We evaluate the fidelity and utility of the generated data using both simulated and real-world datasets, and compare our approach against state-of-the-art Generative Adversarial Network (GAN) methods. Results indicate that our quantile graph-based methodology offers a competitive and interpretable alternative for synthetic time series generation.
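
The round trip described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, and the inverse step (a random walk over the weighted graph, with each value drawn uniformly inside the visited quantile bin) is an assumption, since the abstract does not specify how the inverse mapping is performed.

```python
import random
from bisect import bisect_right


def to_quantile_graph(series, q):
    """Map a series to a q-node graph: nodes are quantile bins and the
    weight of edge (a, b) counts how often bin a is followed by bin b."""
    ordered = sorted(series)
    n = len(ordered)
    # q - 1 interior cut points taken from the empirical distribution.
    cuts = [ordered[(k * n) // q] for k in range(1, q)]
    bins = [bisect_right(cuts, x) for x in series]
    weights = [[0] * q for _ in range(q)]
    for a, b in zip(bins, bins[1:]):
        weights[a][b] += 1
    return cuts, weights


def from_quantile_graph(cuts, weights, low, high, length, rng):
    """Assumed inverse mapping: random walk over the weighted graph,
    drawing each value uniformly inside the visited bin."""
    q = len(weights)
    bounds = [low] + cuts + [high]
    node = rng.randrange(q)
    synthetic = []
    for _ in range(length):
        synthetic.append(rng.uniform(bounds[node], bounds[node + 1]))
        row = weights[node]
        # Fall back to a uniform jump if the node has no outgoing edges.
        node = rng.choices(range(q), weights=row)[0] if sum(row) else rng.randrange(q)
    return synthetic
```

Calling `to_quantile_graph(series, 4)` and passing its result to `from_quantile_graph` together with the series minimum and maximum yields a synthetic series whose bin-to-bin transition frequencies follow those of the original. Finer structure inside each bin is lost under the uniform-sampling assumption.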

2026

Handling missing time series count data: A comparative study of two imputation approaches via GDA

Authors
Pereira I.; Silva I.; Silva M.E.;

Publication
AIP Conference Proceedings

Abstract
Analyses of count time series are often hampered by missing data, which can significantly reduce the accuracy and reliability of statistical models. This study addresses the issue by employing Poisson first-order integer-valued autoregressive (PoINAR) models in conjunction with the Gibbs sampler with data augmentation. This method is particularly effective because it accounts for both the mechanisms behind missing data and the intrinsic serial correlation within the time series. Two distinct approaches to data augmentation are explored, compared, and illustrated using both simulated and real data.
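
For readers unfamiliar with the model class, the sketch below simulates a Poisson INAR(1) series using the standard binomial-thinning recursion X_t = alpha o X_{t-1} + e_t, with e_t ~ Poisson(lam), and then hides some observations. The parameter values, the function names, and the missing-completely-at-random masking are illustrative assumptions. The abstract does not state them, and the Gibbs sampler with data augmentation itself is not reproduced here.

```python
import math
import random


def poisson(lam, rng):
    """Draw a Poisson(lam) variate with Knuth's method (suited to small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def thin(count, alpha, rng):
    """Binomial thinning: each of `count` units survives with probability alpha."""
    return sum(rng.random() < alpha for _ in range(count))


def simulate_poinar1(n, alpha, lam, rng):
    """Simulate n observations of a Poisson INAR(1) process."""
    # Start from the stationary marginal, Poisson(lam / (1 - alpha)).
    x = poisson(lam / (1 - alpha), rng)
    series = []
    for _ in range(n):
        series.append(x)
        x = thin(x, alpha, rng) + poisson(lam, rng)
    return series


def mask_at_random(series, rate, rng):
    """Replace each value by None with probability `rate` (assumed mechanism)."""
    return [None if rng.random() < rate else x for x in series]
```

A series produced by `simulate_poinar1` and passed through `mask_at_random` gives the kind of incomplete count data on which an imputation scheme could be tried out.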

2026

Time Series Analysis of Atlantic Salmon Catches in the Minho River over a Century

Authors
Dias, E; Antunes, C; Ilarri, M; Cunha, J; Silva, ME;

Publication
FISHES

Abstract
Atlantic salmon populations have declined in many regions and are affected by several natural and anthropogenic factors throughout their lives. We investigated the role of environmental drivers and the effect of dam construction on the trend in catches of spawning adults of a migratory population currently at risk. For this purpose, we examined the salmon catches from 1914 to 2020 in the Minho River (NW Portugal, SW Europe), located at the southern limit of this species' distribution. Catches declined over time, with a significant inverse relationship between the trend in catches and lagged temperature. Delayed effects of this type may indicate temperature influences on survival during early life history stages. Similarly, the trend in catches decreased with the increasing number of dams. A forecast model built for the period before the construction of the first major dam in this river (before 1955), including lagged temperature, resulted in a decreasing trend in the number of catches. This demonstrates that catches would have declined due to temperature effects even without dam construction. This does not diminish the role of dams in the observed decline; rather, it reveals that temperature-driven declines would have occurred independently. Nonetheless, efficient management and conservation of this imperiled population require further detailed biological information on the number of returning spawning adults and on salmon survival throughout the life cycle.

2026

Outlier Analysis in Personnel Attendance Timesheet Records

Authors
Gonçalo Duarte Nunes; João Pinto da Silva; Leandro Magalhães; Ricardo Sousa;

Publication
SSRN Electronic Journal

Abstract
Accurate recording of employee working hours is fundamental for workforce management, operational planning, and regulatory compliance. Despite the widespread adoption of digital time-tracking systems, timesheet records remain susceptible to irregularities that can distort labor metrics, productivity indicators, and cost estimations. This study proposes a domain-informed analytical framework for detecting, classifying, and interpreting anomalous entries in employee attendance data. The methodology integrates outlier detection with operational context in a structured workflow. First, six relative deviation features are engineered to capture directional differences between planned and recorded work and lunch periods, including start times, end times, and durations. These features are normalized to ensure comparability across heterogeneous shifts. Second, univariate Tukey’s fences are applied to identify mild and extreme outliers for each deviation feature. Extreme outliers are interpreted as potential measurement errors, whereas mild outliers are classified according to domain-defined directional rules as either operationally acceptable or operationally detrimental deviations. Third, unauthorized deviations are analyzed using Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) to reveal recurring behavioral patterns within the multidimensional deviation space. Finally, employee-level behavioral risk is quantified through a normalized Severity Index based on the frequency of unauthorized deviations relative to attendance frequency, enabling both global ranking and temporal monitoring. Applied to 4,726 anonymized timesheet records, the proposed approach effectively distinguishes measurement errors, acceptable deviations, and operationally detrimental behaviors while revealing structured patterns of noncompliance. By integrating robust statistics with domain knowledge, it enables scalable attendance analytics and workforce governance.
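
The second step of this workflow, univariate Tukey's fences, can be sketched as follows. The multipliers 1.5 (mild) and 3.0 (extreme) are the conventional Tukey values and are assumed here, since the abstract does not state them. The function name and the sample deviations are hypothetical, and the domain-specific directional rules are not modelled.

```python
from statistics import quantiles


def tukey_labels(values, inner=1.5, outer=3.0):
    """Label each value as 'normal', 'mild', or 'extreme' using Tukey's fences.

    inner and outer are the assumed conventional IQR multipliers.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    labels = []
    for v in values:
        # Distance outside the interquartile box (0 if inside it).
        dist = max(q1 - v, v - q3, 0.0)
        if dist > outer * iqr:
            labels.append("extreme")
        elif dist > inner * iqr:
            labels.append("mild")
        else:
            labels.append("normal")
    return labels


# Hypothetical start-time deviations (recorded minus planned).
deviations = [-2, -1, -1, 0, 0, 0, 1, 1, 2, 9, 30]
print(tukey_labels(deviations))
```

With these sample values the quartiles are -1 and 2, so the value 9 falls between the inner and outer fences (mild) and 30 lies beyond the outer fence (extreme).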

2026

A survey on group fairness in federated learning: challenges, taxonomy of solutions and directions for future research

Authors
Salazar, T; Araujo, H; Cano, A; Abreu, PH;

Publication
ARTIFICIAL INTELLIGENCE REVIEW

Abstract
Group fairness in machine learning is an important area of research focused on achieving equitable outcomes across different groups defined by sensitive attributes such as race or gender. Federated learning (FL), a decentralized approach to training machine learning models across multiple clients, amplifies the need for fairness methodologies because its inherently heterogeneous data distributions can exacerbate biases. The intersection of federated learning and group fairness has attracted significant interest, with 48 research works specifically dedicated to addressing this issue. However, no comprehensive survey has specifically focused on group fairness in FL. In this work, we analyze the key challenges of this topic, propose practices for its identification and benchmarking, and create a novel taxonomy based on criteria such as data partitioning, location, and strategy. Furthermore, we analyze broader concerns, review how different approaches handle the complexities of various sensitive attributes, examine common datasets and applications, and discuss the ethical, legal, and policy implications of group fairness in FL. We conclude by highlighting key areas for future research, emphasizing the need for more methods to address the complexities of achieving group fairness in federated systems.
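
To make the heterogeneity issue concrete, the sketch below computes one common group fairness measure, statistical parity difference, per client and on the pooled data. The choice of metric, the function name, and the toy client data are assumptions for illustration. The abstract does not name specific metrics.

```python
def statistical_parity_difference(preds, groups):
    """P(pred = 1 | group = 0) - P(pred = 1 | group = 1) for binary inputs."""
    g0 = [p for p, g in zip(preds, groups) if g == 0]
    g1 = [p for p, g in zip(preds, groups) if g == 1]
    return sum(g0) / len(g0) - sum(g1) / len(g1)


# Hypothetical clients: (binary predictions, binary sensitive attribute).
clients = {
    "A": ([1, 1, 0, 1], [0, 0, 1, 1]),
    "B": ([0, 0, 1, 1], [0, 0, 1, 1]),
}

local = {name: statistical_parity_difference(p, g) for name, (p, g) in clients.items()}
pooled_preds = [p for preds, _ in clients.values() for p in preds]
pooled_groups = [g for _, groups in clients.values() for g in groups]
pooled = statistical_parity_difference(pooled_preds, pooled_groups)

print(local)   # {'A': 0.5, 'B': -1.0}
print(pooled)  # -0.25
```

The two clients disagree in both sign and magnitude, and neither local value matches the pooled one, which is the kind of discrepancy that makes fairness harder to assess when data cannot be centralized.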

2026

LogicMix: Sample mixing data augmentation for multi-label image classification with partial labels

Authors
Chong, CF; Guo, JL; Yang, X; Ke, W; Abreu, PH; Wang, YP; Im, SK;

Publication
PATTERN RECOGNITION

Abstract
Multi-label image classification datasets are often partially labeled, with many labels missing, which poses a significant challenge to training accurate deep classifiers. Most existing approaches treat the missing labels as negatives and/or exploit image and category relationships to regularize training. Orthogonally, this paper studies blending samples in such incomplete datasets into new samples, enlarging the training data to improve generalization. First, the proposed LogicMix mixes multiple partially labeled samples to produce new samples, where their unknown labels are naturally mixed by OR's logical equivalences, without replacement with constants. Subsequently, a Decouple Partial-Asymmetric Loss is proposed to assign separate label-focusing policies to original and new samples, addressing the learning imbalance that arises from the different positive-negative label imbalances between original and augmented samples. Finally, we propose a complete learning framework called 2WayAug-PL. LogicMix and conventional data augmentation collaborate to extend the diversity of new samples in both the sample-sample relation and human prior knowledge, while pseudo-labeling compensates for the lack of labels to provide more supervision signals. Comprehensive experiments use 27 partially labeled dataset scenarios with various learning difficulties, derived from three benchmark datasets. LogicMix shows remarkable effectiveness and generality in improving mAP over the compared sample-mixing data augmentation methods. In particular, 2WayAug-PL achieves state-of-the-art average mAP of 84.3%, 50.1%, and 93.8% on MS-COCO, VG-200, and Pascal VOC 2007, respectively. It improves on the previous best performance achieved by different frameworks by 0.6% (CFT), 0.6% (CFT), and 0.1% (SR). Moreover, 2WayAug-PL significantly outperforms all compared frameworks, as shown by statistical tests. Code is available at: https://github.com/maxium0526/logic_mix.
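
One reading of the label-mixing rule in this abstract is a three-valued logical OR, in which an unknown label stays unknown unless another sample makes the result certain. The sketch below follows that reading. It is an interpretation, not code from the linked repository: the function name and the encoding (1 positive, 0 negative, `None` unknown) are assumptions, and the image-blending step is omitted.

```python
def or_mix_labels(label_sets):
    """Combine partial label vectors with a three-valued OR.

    Per category: any positive gives 1, all negatives give 0,
    and anything else stays unknown (None).
    """
    mixed = []
    for column in zip(*label_sets):
        if any(v == 1 for v in column):
            mixed.append(1)
        elif all(v == 0 for v in column):
            mixed.append(0)
        else:
            mixed.append(None)
    return mixed


a = [1, 0, None, None, 0]
b = [0, 0, 1, None, None]
print(or_mix_labels([a, b]))  # [1, 0, 1, None, None]
```

Under this rule an unknown label is never replaced by a constant: the third category becomes positive because one sample is known to contain it, while the last two remain unknown because no sample settles them.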
