2025
Authors
Ribeiro, RP; Pfahringer, B; Japkowicz, N; Larrañaga, P; Jorge, AM; Soares, C; Abreu, PH; Gama, J;
Publication
Lecture Notes in Computer Science
Abstract
2025
Authors
Loureiro, P; Oliveira, M; Brito, P; Oliveira, L;
Publication
Springer Proceedings in Mathematics and Statistics
Abstract
Air pollution is a global challenge with deep implications for public health and the environment. We examine air quality data from a monitoring station in Entrecampos, Lisbon, Portugal, using Symbolic Data Analysis. The dataset consists of hourly concentrations of nine pollutants over three years, which are logarithmically transformed and aggregated into intervals by taking the daily minimum and maximum values. The symbolic mean and variance are estimated for each variable through the method of moments, and the pairwise dependencies are captured using a bivariate copula. Symbolic principal component scores are obtained from the estimated covariance matrix and used to fit generalized extreme value distributions. Outlier maps, based on these distributions’ quantiles, are used to identify outlying observations. A comparative analysis with daily average-based outlier detection methods is conducted. The results show the relevance of Symbolic Data Analysis in revealing new insights into air quality. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
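The aggregation step described in this abstract (log-transform the hourly readings, then keep the daily minimum and maximum as an interval) can be sketched roughly as follows. This is only an illustration, not the authors' code: the function name `daily_log_intervals`, the input layout, the sample values, and the choice of the natural logarithm are all assumptions.

```python
import math
from collections import defaultdict


def daily_log_intervals(readings):
    """Turn (day, concentration) pairs into one [min, max] interval per day.

    Hypothetical helper: concentrations must be strictly positive, and the
    natural logarithm is an assumption (the abstract does not state the base).
    """
    by_day = defaultdict(list)
    for day, value in readings:
        by_day[day].append(math.log(value))
    # One interval-valued observation per day: (lower bound, upper bound).
    return {day: (min(vals), max(vals)) for day, vals in sorted(by_day.items())}


# Made-up hourly readings for a single pollutant, for illustration only.
hourly = [
    ("2021-01-01", 10.0),
    ("2021-01-01", 40.0),
    ("2021-01-01", 20.0),
    ("2021-01-02", 5.0),
    ("2021-01-02", 8.0),
]
intervals = daily_log_intervals(hourly)
```

Each day then becomes a single interval-valued observation per pollutant, which is the kind of input the symbolic mean, variance, and principal component steps in the abstract operate on.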
2025
Authors
Brito, P; Silva, APD;
Publication
Advances in Data Analysis and Classification
Abstract
We present parametric probabilistic models for numerical distributional variables. The proposed models are based on the representation of each distribution by a location measure and inter-quantile ranges, for given quantiles, thereby characterizing the underlying empirical distributions in a flexible way. Multivariate Normal distributions are assumed for the whole set of indicators, considering alternative structures of the variance-covariance matrix. For all cases, maximum likelihood estimators of the corresponding parameters are derived. This modelling allows for hypothesis testing and multivariate parametric analysis. The proposed framework is applied to Analysis of Variance and parametric Discriminant Analysis of distributional data. A simulation study examines the performance of the proposed models in classification problems under different data conditions. Applications to Internet traffic data and Portuguese official data illustrate the relevance of the proposed approach.
2025
Authors
Vitorino, J; Maia, E; Praça, I; Soares, C;
Publication
CoRR
Abstract
2025
Authors
Pereira, RR; Bono, J; Ferreira, HM; Ribeiro, P; Soares, C; Bizarro, P;
Publication
ECML/PKDD (9)
Abstract
When the available data for a target domain is limited, transfer learning (TL) methods leverage related data-rich source domains to train and evaluate models, before deploying them on the target domain. However, most TL methods assume fixed levels of labeled and unlabeled target data, which contrasts with real-world scenarios where both data and labels arrive progressively over time. As a result, evaluations based on these static assumptions may not reflect how methods perform in practice. To support a more realistic assessment of TL methods in dynamic settings, we propose an evaluation framework that (1) simulates varying data availability over time, (2) creates multiple domains via resampling of a given dataset and (3) introduces inter-domain variability through controlled transformations, e.g., including time-dependent covariate and concept shifts. These capabilities enable the systematic simulation of a large number of variants of the experiments, providing deeper insights into how algorithms may behave when deployed. We demonstrate the usefulness of the proposed framework by performing a case study on a proprietary real-world suite of card payment datasets. To support reproducibility, we also apply the framework on the publicly available Bank Account Fraud (BAF) dataset. By providing a methodology for evaluating TL methods over time and in different data availability conditions, our framework supports a better understanding of model behavior in real-world environments, which enables more informed decisions when deploying models in new domains.
2025
Authors
Brito, A; Santos, M; Folgado, D; Soares, C;
Publication
Discovery Science, DS 2025
Abstract
Ensuring robustness in time series classification remains a critical challenge for safety-sensitive domains like clinical decision systems. While current evaluation practices focus on accuracy measures, they fail to address model stability under semantically meaningful input deformations. We propose tsMIST (Time Series Model Sensitivity Test), a novel morphing-based framework that systematically evaluates classifier resilience through controlled interpolation between adversarial class prototypes. By calculating the switchThreshold (defined as the minimal morphing distance required to flip predictions), our method reveals critical stability patterns across synthetic benchmarks with tunable class separation and 17 medical time series datasets. Key findings show convolutional architectures (ROCKET) maintain optimal thresholds near 50% morphing (48.2 ± 3.1%), while feature-based models (Catch22) exhibit premature decision flips at 22.7% deformation (± 15.4%). In clinical scenarios, tsMIST detected critical ECG misclassifications triggered by ≤ 12% signal variation, vulnerabilities undetected by accuracy measures. Our results establish that robustness measures must complement accuracy for responsible AI in high-stakes applications. This work advances ML evaluation practices by enabling systematic sensitivity analysis, with implications for model auditing and deployment in safety-critical domains.
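As a rough illustration of the switchThreshold idea in this abstract, the sketch below morphs one series into another and reports the smallest morphing fraction at which a classifier's prediction flips. It is not the tsMIST implementation: linear interpolation as the morphing operator, the fixed step grid, the names `morph`, `switch_threshold`, and `toy_classifier`, and the prototype values are all assumptions made for the example.

```python
def morph(source, target, alpha):
    """Linearly interpolate between two equal-length series (assumed morphing)."""
    return [(1 - alpha) * s + alpha * t for s, t in zip(source, target)]


def switch_threshold(classify, source, target, steps=100):
    """Smallest morphing fraction in (0, 1] at which the predicted label changes.

    Hypothetical helper: scans a uniform grid of `steps` fractions and
    returns None if the label never changes along the path.
    """
    start_label = classify(source)
    for i in range(1, steps + 1):
        alpha = i / steps
        if classify(morph(source, target, alpha)) != start_label:
            return alpha
    return None


def toy_classifier(series):
    """Stand-in model: class 1 when the series mean exceeds 0.5."""
    return int(sum(series) / len(series) > 0.5)


# Made-up class prototypes, for illustration only.
proto_a = [0.0, 0.0, 0.0, 0.0]
proto_b = [1.0, 1.0, 1.0, 1.0]
threshold = switch_threshold(toy_classifier, proto_a, proto_b)
```

With this toy classifier the flip occurs just past the halfway point of the morph, which is the kind of behaviour the abstract describes as a threshold near 50%; a model that flips much earlier would be flagged as less stable.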