Publications by Pedro Strecht

2024

Symbolic Data Analysis to Improve Completeness of Model Combination Methods

Authors
Strecht, P; Mendes-Moreira, J; Soares, C

Publication
Advances in Artificial Intelligence, AI 2023, Pt II

Abstract
A growing number of organizations are adopting a strategy of breaking down large data analysis problems into specific sub-problems and tailoring a model to each. However, handling a large number of individual models can make it difficult to understand organization-wide phenomena. Recent studies focus on creating a consensus model by aggregating local decision trees into sets of rules. Despite these efforts, the resulting models may still be incomplete, i.e., unable to cover the entire decision space. This paper explores methodologies to tackle this issue by generating complete consensus models from incomplete rule sets, relying on rough estimates of the distribution of the independent variables. Two approaches are introduced: (1) creating a synthetic dataset and then training a decision tree on it, and (2) a specialized algorithm that builds a decision tree directly from symbolic data. The feasibility of generating complete decision trees is demonstrated, along with an empirical evaluation on a number of datasets.
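
As a rough illustration of what "incomplete" means here, and of the data-generation step that precedes tree training in the first approach, the Python sketch below samples synthetic points from assumed feature distributions and labels each one with the first rule that matches it; points that no rule matches expose the gaps in coverage. Everything in it is an assumption for illustration, not the paper's method: the rule format (feature-to-interval conditions plus a class label), the independent uniform distributions standing in for the "rough estimates", and the hypothetical functions `synthesize` and `coverage`. Training a decision tree on the labelled points would be the next step and is not shown.

```python
import random

# Hypothetical rule format (an assumption, not from the paper):
# (conditions, label), where conditions maps a feature name to a closed
# interval (low, high) that the feature value must fall within.

def matches(rule, point):
    """True if the point satisfies every interval condition of the rule."""
    conditions, _ = rule
    return all(lo <= point[f] <= hi for f, (lo, hi) in conditions.items())

def sample_point(ranges, rng):
    # Assumed "rough estimate" of each independent variable's distribution:
    # independent and uniform over a (min, max) range.
    return {f: rng.uniform(lo, hi) for f, (lo, hi) in ranges.items()}

def synthesize(rules, ranges, n, seed=0):
    """Sample n synthetic points and label each with the first matching rule.

    Points matched by no rule get the label None; they mark regions of the
    decision space that the rule set does not cover.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = sample_point(ranges, rng)
        label = next((r[1] for r in rules if matches(r, x)), None)
        data.append((x, label))
    return data

def coverage(data):
    """Fraction of synthetic points covered by at least one rule."""
    return sum(1 for _, y in data if y is not None) / len(data)
```

For example, with the rules `({"x": (0.0, 0.5)}, "A")` and `({"x": (0.5, 0.8)}, "B")` and both features ranging over `(0.0, 1.0)`, roughly 80% of the sampled points are covered, and every point with `x` above 0.8 is left unlabelled.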

2023

Curbing Dropout: Predictive Analytics at the University of Porto

Authors
Blanquet, L; Grilo, J; Strecht, P; Camanho, A

Publication
Atas da Conferencia da Associacao Portuguesa de Sistemas de Informacao

Abstract
This study explores data mining techniques for predicting student dropout in higher education. The research compares different methodological approaches, including alternative algorithms and variations in model specification. Additionally, we examine the impact of employing either a single model for all university programs or a separate model for each program. We also tested models in which students were grouped according to their position in the program's study plan. Training datasets spanning 2, 4, 6, and 8 years were compared, and the experiments use academic data from the University of Porto covering the academic years 2012 to 2022. The algorithm that yielded the best results was XGBoost. The best predictions were obtained with models trained on two years of data, both with separate models for each program and with a single model. The findings highlight the potential of data mining approaches in predicting student dropout, offering valuable insights for higher education institutions aiming to improve student retention and success. © 2023 Associacao Portuguesa de Sistemas de Informacao. All rights reserved.
