Publications - INESC TEC

Publications

Publications by CTM

2025

Tax Optimization in the European Union: A Laffer Curve Perspective

Authors
Sentinelo, T; Queiros, M; Oliveira, JM; Ramos, P;

Publication
ECONOMIES

Abstract
This study explores the applicability of the Laffer Curve in the context of the European Union (EU) by analyzing the relationship between taxation and fiscal revenue across personal income tax (PIT), corporate income tax (CIT), and value-added tax (VAT). Utilizing a comprehensive panel data set spanning 1995 to 2022 across all 27 EU member states, the research also integrates the Bird Index to assess fiscal effort and employs advanced econometric techniques, including the Hausman Test and log-quadratic regression models, to capture the non-linear dynamics of the Laffer Curve. The findings reveal that excessively high tax rates, particularly in some larger member states, may lead to revenue losses due to reduced economic activity and tax evasion, highlighting the existence of optimal tax rates that maximize revenue while sustaining economic growth. By estimating threshold tax rates and incorporating the Bird Index, the study provides a nuanced perspective on tax efficiency and fiscal sustainability, offering evidence-based policy recommendations for optimizing tax systems in the European Union to balance revenue generation with economic competitiveness.
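
As a rough illustration of how a log-quadratic specification yields a threshold tax rate, the sketch below assumes the form ln(R) = b0 + b1*t + b2*t^2, where t is the tax rate and R is revenue. The abstract does not give the paper's exact specification, control variables, or panel estimator, so this functional form, the function name `revenue_maximizing_rate`, and the synthetic data are all assumptions made for illustration. When b2 < 0 the fitted curve peaks at t* = -b1 / (2*b2).

```python
import numpy as np

def revenue_maximizing_rate(rates, revenues):
    """Fit ln(R) = b0 + b1*t + b2*t^2 by least squares and return the peak rate.

    Hypothetical helper, not the study's estimator. Returns None when the
    fitted curve is not concave (b2 >= 0), since no interior maximum exists.
    """
    rates = np.asarray(rates, dtype=float)
    log_rev = np.log(np.asarray(revenues, dtype=float))
    # np.polyfit returns coefficients from the highest degree down.
    b2, b1, b0 = np.polyfit(rates, log_rev, deg=2)
    if b2 >= 0:
        return None
    return -b1 / (2.0 * b2)

# Synthetic example (not data from the study): the curve is built to peak at 40%.
t = np.linspace(0.10, 0.70, 13)
r = np.exp(2.0 + 4.0 * t - 5.0 * t**2)
t_star = revenue_maximizing_rate(t, r)
```

A real panel analysis would add country effects and controls; the turning-point formula is the only part this sketch is meant to show.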

2025

Optimizing Credit Risk Prediction for Peer-to-Peer Lending Using Machine Learning

Authors
Souadda, LI; Halitim, AR; Benilles, B; Oliveira, JM; Ramos, P;

Publication
FORECASTING

Abstract
Hyperparameter optimization (HPO) is critical for enhancing the predictive performance of machine learning models in credit risk assessment for peer-to-peer (P2P) lending. This study evaluates four HPO methods, Grid Search, Random Search, Hyperopt, and Optuna, across four models, Logistic Regression, Random Forest, XGBoost, and LightGBM, using three real-world datasets (Lending Club, Australia, Taiwan). We assess predictive accuracy (AUC, Sensitivity, Specificity, G-Mean), computational efficiency, robustness, and interpretability. LightGBM achieves the highest AUC (e.g., 70.77% on Lending Club, 93.25% on Australia, 77.85% on Taiwan), with XGBoost performing comparably. Bayesian methods (Hyperopt, Optuna) match or approach Grid Search's accuracy while reducing runtime by up to 75.7-fold (e.g., 3.19 vs. 241.47 min for LightGBM on Lending Club). A sensitivity analysis confirms robust hyperparameter configurations, with AUC variations typically below 0.4% under +/- 10% perturbations. A feature importance analysis, using gain and SHAP metrics, identifies debt-to-income ratio and employment title as key default predictors, with stable rankings (Spearman correlation > 0.95, p<0.01) across tuning methods, enhancing model interpretability. Operational impact depends on data quality, scalable infrastructure, fairness audits for features like employment title, and stakeholder collaboration to ensure compliance with regulations like the EU AI Act and U.S. Equal Credit Opportunity Act. These findings advocate Bayesian HPO and ensemble models in P2P lending, offering scalable, transparent, and fair solutions for default prediction, with future research suggested to explore advanced resampling, cost-sensitive metrics, and feature interactions.
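
For readers unfamiliar with the G-Mean metric listed above, the sketch below uses its standard definition, the geometric mean of sensitivity and specificity, on binary labels where 1 marks a default. The function name and the toy labels are illustrative assumptions, not taken from the paper.

```python
import math

def sensitivity_specificity_gmean(y_true, y_pred):
    """Return (sensitivity, specificity, G-Mean) for binary labels, 1 = default."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    # Guard against a class being absent from y_true.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity, math.sqrt(sensitivity * specificity)

# Toy labels (not from the study): 4 defaults, 6 non-defaults.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec, gmean = sensitivity_specificity_gmean(y_true, y_pred)
```

Because G-Mean falls to zero when either class is entirely misclassified, it is less forgiving than plain accuracy on imbalanced default data.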

2025

Transformer-Based Models for Probabilistic Time Series Forecasting with Explanatory Variables

Authors
Caetano, R; Oliveira, JM; Ramos, P;

Publication
MATHEMATICS

Abstract
Accurate demand forecasting is essential for retail operations as it directly impacts supply chain efficiency, inventory management, and financial performance. However, forecasting retail time series presents significant challenges due to their irregular patterns, hierarchical structures, and strong dependence on external factors such as promotions, pricing strategies, and socio-economic conditions. This study evaluates the effectiveness of Transformer-based architectures, specifically Vanilla Transformer, Informer, Autoformer, ETSformer, NSTransformer, and Reformer, for probabilistic time series forecasting in retail. A key focus is the integration of explanatory variables, such as calendar-related indicators, selling prices, and socio-economic factors, which play a crucial role in capturing demand fluctuations. This study assesses how incorporating these variables enhances forecast accuracy, addressing a research gap in the comprehensive evaluation of explanatory variables within multiple Transformer-based models. Empirical results, based on the M5 dataset, show that incorporating explanatory variables generally improves forecasting performance. Models leveraging these variables achieve up to 12.4% reduction in Normalized Root Mean Squared Error (NRMSE) and 2.9% improvement in Mean Absolute Scaled Error (MASE) compared to models that rely solely on past sales. Furthermore, probabilistic forecasting enhances decision making by quantifying uncertainty, providing more reliable demand predictions for risk management. These findings underscore the effectiveness of Transformer-based models in retail forecasting and emphasize the importance of integrating domain-specific explanatory variables to achieve more accurate, context-aware predictions in dynamic retail environments.
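
To make the MASE figure above concrete, the sketch below follows the usual definition: the forecast's mean absolute error divided by the in-sample mean absolute error of a naive forecast with lag m. The function name, the choice m = 1, and the toy series are assumptions; the abstract does not state the seasonal lag used, and NRMSE is left out because its normalization is not specified there.

```python
import numpy as np

def mase(train, actual, forecast, m=1):
    """Mean Absolute Scaled Error, scaled by the in-sample lag-m naive forecast."""
    train = np.asarray(train, dtype=float)
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    # In-sample MAE of predicting each value with the one m steps earlier.
    scale = np.mean(np.abs(train[m:] - train[:-m]))
    if scale == 0:
        raise ValueError("naive in-sample error is zero; MASE is undefined")
    return float(np.mean(np.abs(actual - forecast)) / scale)

# Toy series (not M5 data): naive in-sample MAE is 1.5, forecast MAE is 0.5.
score = mase(train=[10, 12, 11, 13, 12], actual=[14, 13], forecast=[13, 13])
```

A value below 1 means the forecast beats the naive benchmark on average.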

2025

Deep Learning-Driven Integration of Multimodal Data for Material Property Predictions

Authors
Costa, V; Oliveira, JM; Ramos, P;

Publication
COMPUTATION

Abstract
Advancements in deep learning have revolutionized materials discovery by enabling predictive modeling of complex material properties. However, single-modal approaches often fail to capture the intricate interplay of compositional, structural, and morphological characteristics. This study introduces a novel multimodal deep learning framework for enhanced material property prediction, integrating textual (chemical compositions), tabular (structural descriptors), and image-based (2D crystal structure visualizations) modalities. Utilizing the Alexandria database, we construct a comprehensive multimodal dataset of 10,000 materials with symmetry-resolved crystallographic data. Specialized neural architectures, such as FT-Transformer for tabular data, Hugging Face Electra-based model for text, and TIMM-based MetaFormer for images, generate modality-specific embeddings, fused through a hybrid strategy into a unified latent space. The framework predicts seven critical material properties, including electronic (band gap, density of states), thermodynamic (formation energy, energy above hull, total energy), magnetic (magnetic moment per volume), and volumetric (volume per atom) features, many governed by crystallographic symmetry. Experimental results demonstrated that multimodal fusion significantly outperforms unimodal baselines. Notably, the bimodal integration of image and text data showed significant gains, reducing the Mean Absolute Error for band gap by approximately 22.7% and for volume per atom by 22.4% compared to the average unimodal models. This combination also achieved a 28.4% reduction in Root Mean Squared Error for formation energy. The full trimodal model (tabular + images + text) yielded competitive, and in several cases the lowest, error metrics, particularly for band gap, magnetic moment per volume and density of states per atom, confirming the value of integrating all three modalities. This scalable, modular framework advances materials informatics, offering a powerful tool for data-driven materials discovery and design.

2025

Conditional Generative Adversarial Network for Predicting the Aesthetic Outcomes of Breast Cancer Treatment

Authors
Montenegro, H; Cardoso, MJ; Cardoso, JS;

Publication
2025 47TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC)

Abstract
The alterations to the visual appearance of patients' breasts that occur due to breast cancer locoregional treatment can impact the self-esteem and satisfaction of the patients, affecting quality-of-life after treatment. As such, it is imperative that the patients are adequately informed of the potential aesthetic outcomes of treatment, to facilitate the choice of treatment and promote realistic expectations. As breast asymmetries are among the most notable effects of treatment, we propose a conditional generative adversarial network for manipulating the breast shape in torso images, applying it to simulate how the breasts' shape may change through surgical interventions. Experiments on a private breast dataset suggest that the proposed model outperforms the state-of-the-art in the realistic reconstruction of the torso of the patient while effectively manipulating the breasts.

2025

Fusion Strategies for Breast Cancer Characterization Using Traditional and Deep Learning Models

Authors
Lima, PV; Cardoso, JS; Oliveira, HP;

Publication
BIBE

Abstract
Breast cancer remains one of the most prevalent and deadly cancers worldwide, making accurate evaluation of molecular markers important for effective disease management. Biomarkers such as ER, PR, and HER2 are typically assessed because they help inform prognosis and guide treatment decisions. Predicting these characteristics from imaging can support earlier clinical intervention, reduce reliance on invasive procedures, and contribute to more personalized care. While radiomics and deep learning approaches have demonstrated potential, comprehensive comparisons across these methods are still limited. This study evaluated handcrafted features, deep features, and end-to-end deep learning models for predicting ER, PR, and HER2 status from DCE-MRI. Each feature type was first assessed individually and then combined using early and late fusion. Handcrafted and deep features were processed through a pipeline that included resampling, dimensionality reduction, and model selection, while end-to-end models were trained using different initialization strategies and loss functions. The best models achieved AUCs of 0.659 for ER, 0.679 for PR, and 0.686 for HER2. Although late fusion generally improved performance, bias toward the majority classes persisted. Overall, the results suggest that combining different modeling strategies may enhance robustness in breast cancer characterization.
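
The difference between early and late fusion can be sketched as follows, under common conventions: early fusion concatenates feature vectors before a single model is trained, while late fusion combines the predicted probabilities of separately trained models. The abstract does not say how the study combined predictions, so the (optionally weighted) averaging, the function names, and the toy arrays here are assumptions.

```python
import numpy as np

def early_fusion(handcrafted, deep):
    """Concatenate per-sample feature matrices along the feature axis."""
    return np.concatenate([handcrafted, deep], axis=1)

def late_fusion(prob_list, weights=None):
    """Average per-model predicted probabilities, optionally weighted."""
    probs = np.stack(prob_list, axis=0)  # shape: (n_models, n_samples)
    return np.average(probs, axis=0, weights=weights)

# Toy inputs (not study data): 3 samples, 2 handcrafted and 4 deep features.
fused_features = early_fusion(np.zeros((3, 2)), np.ones((3, 4)))
# Two hypothetical models scoring the same two samples.
fused_probs = late_fusion([[0.2, 0.8], [0.4, 0.6]])
```

Late fusion leaves each model's training pipeline independent, which makes it straightforward to mix handcrafted, deep-feature, and end-to-end models.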
