Publications - INESC TEC

Publications

Publications by José Manuel Oliveira

2025

Deep Learning-Driven Integration of Multimodal Data for Material Property Predictions

Authors
Costa, V; Oliveira, JM; Ramos, P

Publication

Abstract
This study investigates the integration of deep learning for single-modality and multimodal data within materials science. Traditional methods for materials discovery are often resource-intensive and slow, prompting the exploration of machine learning to streamline the prediction of material properties. While single-modality models have been effective, they often miss the complexities inherent in material data. The paper explores multimodal data integration (combining text, images, and tabular data) and demonstrates its potential to improve predictive accuracy. Using the Alexandria dataset, the research introduces a custom methodology involving multimodal data creation, model tuning with the AutoGluon framework, and evaluation through targeted fusion techniques. Results reveal that multimodal approaches enhance predictive accuracy and efficiency, particularly when text and image data are integrated. However, challenges remain in predicting complex features like band gaps. Future directions include incorporating new data types and refining specialized models to improve materials discovery and innovation.


2025

Optimizing Credit Risk Prediction for Peer-to-Peer Lending Using Machine Learning

Authors
Souadda, LI; Halitim, AR; Benilles, B; Oliveira, JM; Ramos, P

Publication

Abstract
This study investigates the effectiveness of different hyperparameter tuning strategies for peer-to-peer risk management. Ensemble learning techniques have shown superior performance in this field compared to individual classifiers and traditional statistical methods. However, model performance is influenced not only by the choice of algorithm but also by hyperparameter tuning, which impacts both predictive accuracy and computational efficiency. This research compares the performance and efficiency of three widely used hyperparameter tuning methods (Grid Search, Random Search, and Optuna) across XGBoost, LightGBM, and Logistic Regression models. The analysis uses the Lending Club dataset, spanning from 2007 Q1 to 2020 Q3, with comprehensive data preprocessing that covers missing-value handling, class-imbalance correction, and feature engineering. Model explainability is assessed through feature importance analysis to identify key drivers of default probability. The findings reveal comparable predictive performance among the tuning methods, evaluated using metrics such as G-mean, sensitivity, and specificity. However, Optuna significantly outperforms the others in computational efficiency; for instance, it is 10.7 times faster than Grid Search for XGBoost and 40.5 times faster for LightGBM. Additionally, variations in feature importance rankings across tuning methods influence model interpretability and the prioritization of risk factors. These insights underscore the importance of selecting appropriate hyperparameter tuning strategies to optimize both performance and explainability in peer-to-peer risk management models.
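
As an illustration of the evaluation metrics named in this abstract, the following is a minimal sketch of how sensitivity, specificity, and G-mean are conventionally computed from binary predictions. It is not the authors' code: the function names and the toy labels are hypothetical, and the positive class (1) is assumed to denote a default.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (assumption: 1 = default)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def g_mean_report(y_true, y_pred):
    """Return sensitivity, specificity, and their geometric mean."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    # Sensitivity: share of actual defaults that were flagged.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of actual non-defaults that were cleared.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": math.sqrt(sensitivity * specificity),
    }


# Hypothetical toy labels, not taken from the Lending Club data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
report = g_mean_report(y_true, y_pred)
print(report)
```

G-mean is commonly preferred over plain accuracy on imbalanced data because it falls to zero whenever either class is entirely misclassified.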


2025

Optimizing Credit Risk Prediction for Peer-to-Peer Lending Using Machine Learning

Authors
Souadda, LI; Halitim, AR; Benilles, B; Oliveira, JM; Ramos, P

Publication
Forecasting

Abstract
Hyperparameter optimization (HPO) is critical for enhancing the predictive performance of machine learning models in credit risk assessment for peer-to-peer (P2P) lending. This study evaluates four HPO methods, Grid Search, Random Search, Hyperopt, and Optuna, across four models, Logistic Regression, Random Forest, XGBoost, and LightGBM, using three real-world datasets (Lending Club, Australia, Taiwan). We assess predictive accuracy (AUC, Sensitivity, Specificity, G-Mean), computational efficiency, robustness, and interpretability. LightGBM achieves the highest AUC (e.g., 70.77% on Lending Club, 93.25% on Australia, 77.85% on Taiwan), with XGBoost performing comparably. Bayesian methods (Hyperopt, Optuna) match or approach Grid Search’s accuracy while reducing runtime by up to 75.7-fold (e.g., 3.19 vs. 241.47 min for LightGBM on Lending Club). A sensitivity analysis confirms robust hyperparameter configurations, with AUC variations typically below 0.4% under ±10% perturbations. A feature importance analysis, using gain and SHAP metrics, identifies debt-to-income ratio and employment title as key default predictors, with stable rankings (Spearman correlation > 0.95, p<0.01) across tuning methods, enhancing model interpretability. Operational impact depends on data quality, scalable infrastructure, fairness audits for features like employment title, and stakeholder collaboration to ensure compliance with regulations like the EU AI Act and U.S. Equal Credit Opportunity Act. These findings advocate Bayesian HPO and ensemble models in P2P lending, offering scalable, transparent, and fair solutions for default prediction, with future research suggested to explore advanced resampling, cost-sensitive metrics, and feature interactions.
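
The ranking-stability check mentioned in this abstract can be sketched with a plain Spearman rank correlation between the feature importances produced under two tuning methods. This is an illustrative sketch only, not the authors' implementation: the feature names and importance scores are hypothetical, the closed-form formula assumes no tied scores, and no p-value is computed.

```python
def ranks(values):
    """Rank values from largest (rank 1) to smallest; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(a, b):
    """Spearman rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)); valid without ties."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rank_a, rank_b = ranks(a), ranks(b)
    n = len(a)
    d_squared = sum((x - y) ** 2 for x, y in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical features and importance scores for two tuning methods.
features = ["dti", "emp_title", "int_rate", "annual_inc", "loan_amnt"]
importance_grid = [0.30, 0.25, 0.20, 0.15, 0.10]
importance_optuna = [0.28, 0.26, 0.17, 0.19, 0.10]

rho = spearman(importance_grid, importance_optuna)
print(f"Spearman rho between rankings: {rho:.2f}")
```

A value near 1 indicates that both tuning methods order the features almost identically. With tied importance scores, a tie-aware implementation would be the safer choice.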
