2025
Authors
da Silva, JMPP; Duarte Nunes, G; Ferreira, A;
Publication
Abstract
2025
Authors
Silva, VF; Silva, ME; Ribeiro, P; Silva, F;
Publication
DATA MINING AND KNOWLEDGE DISCOVERY
Abstract
Multivariate time series analysis is a vital but challenging task, with multidisciplinary applicability, tackling the characterization of multiple interconnected variables over time and their dependencies. Traditional methodologies often adapt univariate approaches or rely on assumptions specific to certain domains or problems, presenting limitations. A recent promising alternative is to map multivariate time series into high-level network structures such as multiplex networks, with past work relying on connecting successive time series components with interconnections between contemporary timestamps. In this work, we first define a novel cross-horizontal visibility mapping between lagged timestamps of different time series and then introduce the concept of multilayer horizontal visibility graphs. This allows describing cross-dimension dependencies via inter-layer edges, leveraging the entire structure of multilayer networks. To this end, a novel parameter-free topological measure is proposed and common measures are extended for the multilayer setting. Our approach is general and applicable to any kind of multivariate time series data. We provide an extensive experimental evaluation with both synthetic and real-world datasets. We first explore the proposed methodology and the data properties highlighted by each measure, showing that inter-layer edges based on cross-horizontal visibility preserve more information than previous mappings, while also complementing the information captured by commonly used intra-layer edges. We then illustrate the applicability and validity of our approach in multivariate time series mining tasks, showcasing its potential for enhanced data analysis and insights.
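As a point of reference for the abstract above, the following is a minimal sketch of the standard univariate horizontal visibility graph, in which two timestamps are linked when every value between them is lower than both. It is not the cross-horizontal visibility mapping or the multilayer construction proposed in the paper, whose definitions are not given here; the function name `horizontal_visibility_edges` is hypothetical.

```python
def horizontal_visibility_edges(series):
    """Return the edge set of a standard horizontal visibility graph (HVG).

    Nodes are time indices. Indices i < j are linked when every value
    strictly between them is lower than min(series[i], series[j]).
    Illustrative only: this is the classic single-series mapping.
    """
    edges = set()
    n = len(series)
    for i in range(n - 1):
        highest_between = float("-inf")  # max of values strictly between i and j
        for j in range(i + 1, n):
            if highest_between < min(series[i], series[j]):
                edges.add((i, j))
            highest_between = max(highest_between, series[j])
            if highest_between >= series[i]:
                break  # a value at least as high as series[i] blocks everything further right
    return edges


example_series = [3, 1, 2, 4]
example_edges = horizontal_visibility_edges(example_series)
```

For the series `[3, 1, 2, 4]`, timestamp 0 sees all later timestamps, while timestamp 1 cannot see timestamp 3 because the value 2 at timestamp 2 is in the way.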
2025
Authors
Silva, I; Silva, ME; Pereira, I;
Publication
Springer Proceedings in Mathematics and Statistics
Abstract
Missing data is a common challenge in time series analysis, since most methods require observations that are equally spaced in time and therefore depend on imputation. For time series of counts, the usual imputation methods are not adequate because they typically produce real-valued observations. This work employs Bayesian principles for handling missing data within time series of counts, based on first-order integer-valued autoregressive (INAR) models, namely Approximate Bayesian Computation (ABC) and Gibbs sampler with Data Augmentation (GDA) algorithms. The methodologies are illustrated with synthetic and real data, and the results indicate that the estimates are consistent and present less bias when the percentage of missing observations decreases, as expected.
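To make the model class concrete, here is a minimal sketch of simulating a first-order INAR process, assuming binomial thinning and Poisson innovations (the abstract does not state the innovation distribution, so that choice is an assumption). It does not implement the ABC or GDA algorithms. All function names are hypothetical.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's multiplication method (suited to small lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def binomial_thinning(rng, alpha, count):
    """Each of the `count` units survives independently with probability alpha."""
    return sum(1 for _ in range(count) if rng.random() < alpha)


def simulate_inar1(n, alpha, lam, seed=0):
    """Simulate X_t = alpha o X_{t-1} + e_t with e_t ~ Poisson(lam) (assumed innovations)."""
    rng = random.Random(seed)
    x = poisson(rng, lam / (1 - alpha))  # start from the stationary Poisson marginal
    path = []
    for _ in range(n):
        x = binomial_thinning(rng, alpha, x) + poisson(rng, lam)
        path.append(x)
    return path


counts = simulate_inar1(n=5000, alpha=0.5, lam=2.0, seed=1)
sample_mean = sum(counts) / len(counts)  # should be close to lam / (1 - alpha) = 4
```

Every simulated value is a non-negative integer, which is the property that real-valued imputation methods fail to preserve.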
2025
Authors
Cruz, A; Salazar, T; Carvalho, M; Maças, C; Machado, P; Abreu, PH;
Publication
ARTIFICIAL INTELLIGENCE REVIEW
Abstract
The use of machine learning in decision-making has become increasingly pervasive across various fields, from healthcare to finance, enabling systems to learn from data and improve their performance over time. The transformative impact of these new technologies warrants several considerations that demand the development of modern solutions through responsible artificial intelligence, that is, the incorporation of ethical principles into the creation and deployment of AI systems. Fairness is one such principle, ensuring that machine learning algorithms do not produce biased outcomes or discriminate against any group of the population with respect to sensitive attributes, such as race or gender. In this context, visualization techniques can help identify data imbalances and disparities in model performance across different demographic groups. However, there is a lack of guidance on clear and effective representations that support entry-level users in fairness analysis, particularly given that approaches to fairness visualization can vary significantly. The goal of this work is therefore to present a comprehensive analysis of current tools for visualizing and examining group fairness in machine learning, with a focus on both data and binary classification model outcomes. These visualization tools are reviewed and discussed, concluding with a focused set of visualization guidelines aimed at improving the comprehensibility of fairness visualizations.
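As an illustration of the kind of quantity a group fairness visualization might plot, the sketch below computes per-group selection rates for a binary classifier and their largest gap, commonly called the demographic parity difference. The metric choice, the toy data, and the function name are assumptions for illustration and are not taken from the reviewed tools.

```python
def demographic_parity_difference(y_pred, groups):
    """Return (gap, rates): per-group positive-prediction rates and their max-min gap."""
    by_group = {}
    for pred, group in zip(y_pred, groups):
        by_group.setdefault(group, []).append(pred)
    rates = {g: sum(p) / len(p) for g, p in by_group.items()}
    gap = max(rates.values()) - min(rates.values())
    return gap, rates


# Hypothetical binary predictions for two demographic groups "a" and "b".
toy_predictions = [1, 1, 1, 0, 1, 0, 0, 0]
toy_groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap, rates = demographic_parity_difference(toy_predictions, toy_groups)
```

In this toy data, group "a" receives positive predictions at a rate of 0.75 and group "b" at 0.25, so the gap is 0.5.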
2025
Authors
Guo, J; Chong, CF; Abreu, PH; Mao, C; Li, J; Lam, CT; Ng, BK;
Publication
Eng. Appl. Artif. Intell.
Abstract
Solar photovoltaic technology has grown significantly as a renewable energy source, and unmanned aerial vehicles equipped with thermal infrared cameras can inspect solar panels effectively. However, long-distance capture and low-resolution infrared cameras make the targets small, complicating feature extraction. Additionally, the large number of normal photovoltaic modules results in a significant imbalance in the dataset. Furthermore, the limited computing resources on unmanned aerial vehicles make real-time fault classification challenging. These factors limit the performance of current fault classification systems for solar panels. Multi-scale, multi-branch reparameterization of convolutional neural networks can improve model performance while reducing computational demands at the deployment stage, making such networks suitable for practical applications. This study proposes an efficient framework based on reparameterization for infrared solar panel fault classification. We propose a Proportional Balanced Weight asymmetric loss function to address the class imbalance and employ multi-branch, multi-scale convolutional kernels for extracting tiny features from low-resolution images. The designed models were trained with Exponential Moving Average for better performance and reparameterized for efficient deployment. We evaluated the designed models using the Infrared Solar Module dataset. The proposed framework achieved an accuracy of 83.8% for the 12-class classification task and 74.0% for the 11-class task, both without data augmentation to enhance generalization. The framework delivers accuracy improvements of up to 16.4% and F1-score gains of up to 18.7%. Additionally, we achieved an inference speed that is 3.4 times faster than the training speed, while maintaining high fault classification performance.
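The general idea behind structural reparameterization can be sketched in one dimension: because convolution is linear, parallel branches with different kernel sizes can be folded into a single kernel for deployment. The example below assumes two bias-free branches (3-tap and 1-tap) without normalization layers; it is not the paper's architecture, and all names are hypothetical.

```python
def conv1d_same(signal, kernel):
    """Cross-correlate with zero padding so the output length equals the input length.

    Assumes an odd kernel length.
    """
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(kernel[k] * padded[i + k] for k in range(len(kernel)))
        for i in range(len(signal))
    ]


def merge_branches(kernel3, kernel1):
    """Fold a 1-tap branch into the center tap of a 3-tap branch."""
    merged = list(kernel3)
    merged[1] += kernel1[0]
    return merged


signal = [1.0, 2.0, 3.0, 4.0]
k3 = [0.5, -1.0, 0.25]  # hypothetical trained 3-tap branch
k1 = [2.0]              # hypothetical trained 1-tap branch

# Training-time form: two branches evaluated separately and summed.
two_branch = [a + b for a, b in zip(conv1d_same(signal, k3), conv1d_same(signal, k1))]
# Deployment-time form: one convolution with the merged kernel.
single_branch = conv1d_same(signal, merge_branches(k3, k1))
```

Both forms produce the same output, while the merged form needs only one convolution pass at inference time.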
2025
Authors
Mangussi, AD; Pereira, RC; Abreu, PH; Lorena, AC;
Publication
INTELLIGENT SYSTEMS, BRACIS 2024, PT I
Abstract
In real-world scenarios, a wide variety of datasets contain inconsistencies. One example of such inconsistency is missing data (MD), which refers to the absence of information in one or more variables. Missing value imputation (MVI) strategies emerged as a possible solution to this problem, replacing the missing values with estimates based on the mean, the median, or Machine Learning (ML) techniques. The performance of such strategies depends on multiple factors. One factor that influences MVI methods is the presence of noisy instances, described as anything that obscures the relationship between the features of an instance and its class, having an adversarial effect. However, the interaction between MD and noisy instances has received little attention in the literature. This work fills this gap by investigating the interplay between missing and noisy data. Our experimental setup begins with generating missingness under the Missing Not at Random (MNAR) mechanism in a multivariate scenario and performing imputation using seven state-of-the-art MVI methods. Our methodology involves applying a noise filter before performing the imputation task and evaluating the quality of the imputation directly. Additionally, we measure the classification performance with the new estimates. This approach is applied to both synthetic data and 11 real-world datasets. The results show that noise preprocessing before the imputation task improves both the imputation quality and the classification performance for imputed datasets.
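To illustrate the terms used above, the following sketch generates MNAR missingness in a single variable by hiding values above a threshold, imputes with the observed mean, and scores the imputation directly on the hidden entries. The threshold rule, the mean imputer, and the RMSE score are simplifying assumptions; the study itself uses a multivariate setup, seven MVI methods, and a noise filter that are not reproduced here.

```python
import math


def make_mnar(values, threshold):
    """Hide every value above the threshold, so missingness depends on the hidden value itself."""
    return [None if v > threshold else v for v in values]


def mean_impute(values):
    """Replace each missing entry with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]


def rmse_on_missing(truth, masked, imputed):
    """Root mean squared error computed only over the entries that were hidden."""
    errors = [(t - i) ** 2 for t, m, i in zip(truth, masked, imputed) if m is None]
    return math.sqrt(sum(errors) / len(errors))


truth = [1.0, 2.0, 3.0, 4.0, 9.0, 10.0]  # hypothetical complete data
masked = make_mnar(truth, threshold=5.0)
imputed = mean_impute(masked)
score = rmse_on_missing(truth, masked, imputed)
```

Because only the large values are hidden, the observed mean (2.5) underestimates them badly, which is why MNAR is a demanding mechanism for imputation methods.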