Publications - INESC TEC

Publications

Publications by Pedro Strecht

2014

Merging Decision Trees: A Case Study in Predicting Student Performance

Authors
Strecht, P; Mendes Moreira, J; Soares, C;

Publication
ADVANCED DATA MINING AND APPLICATIONS, ADMA 2014

Abstract
Predicting the failure of students in university courses can provide useful information for course and programme managers and can help explain the dropout phenomenon. While it is important to have models at course level, their number makes it hard to extract knowledge that can be useful at the university level. Therefore, to support decision making at this level, it is important to generalize the knowledge contained in those models. We propose an approach to group and merge interpretable models in order to replace them with more general ones without compromising the quality of predictive performance. We evaluate our approach using data from the University of Porto. The results obtained are promising, although they suggest alternative approaches to the problem.


2015

A Comparative Study of Regression and Classification Algorithms for Modelling Students' Academic Performance

Authors
Strecht, P; Cruz, L; Soares, C; Moreira, JM; Abreu, R;

Publication
Proceedings of the 8th International Conference on Educational Data Mining, EDM 2015, Madrid, Spain, June 26-29, 2015

Abstract

2018

A Framework for Analytical Approaches to Combine Interpretable Models

Authors
Strecht, P; Moreira, JM; Soares, C;

Publication
Information Management and Big Data, 5th International Conference, SIMBig 2018, Lima, Peru, September 3-5, 2018, Proceedings.

Abstract
Analytic approaches to combine interpretable models, although presented in different contexts, can be generalized to highlight the components that can be specialized. We propose a framework that structures the combination process, formalizes the problems that can be solved in alternative ways and evaluates the combined models based on their predictive ability to replace the base ones, without loss of interpretability. The framework is illustrated with a case study using data from the University of Porto, Portugal, where experiments were carried out. The results show that grouping base models by scientific areas, ordering by the number of variables and intersecting their underlying rules creates conditions for the combined models to outperform the base models. © 2019, Springer Nature Switzerland AG.
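As a rough illustration of what "intersecting the underlying rules" of two rule-based models can mean, the following minimal sketch represents each rule as a set of intervals and keeps only the overlapping regions. It is not taken from the paper: the rule representation, the names `intersect_rules` and `merge_models`, and the choice to keep the label of the rule with larger support are all assumptions made for this example.

```python
from itertools import product

INF = float("inf")

def intersect_rules(rule_a, rule_b):
    """Intersect two rules given as {variable: (low, high)} half-open intervals.

    A variable absent from a rule is treated as unconstrained.
    Returns the overlapping region, or None if the rules are disjoint.
    """
    region = {}
    for var in rule_a.keys() | rule_b.keys():
        lo_a, hi_a = rule_a.get(var, (-INF, INF))
        lo_b, hi_b = rule_b.get(var, (-INF, INF))
        lo, hi = max(lo_a, lo_b), min(hi_a, hi_b)
        if lo >= hi:
            return None  # empty interval on this variable: no overlap
        region[var] = (lo, hi)
    return region

def merge_models(model_a, model_b):
    """Combine two models, each a list of (rule, label, support) triples.

    Every pair of rules is intersected; non-empty regions are kept.
    Assumption: a label conflict is resolved in favour of the rule
    with the larger support (ties go to model_a).
    """
    merged = []
    for (ra, la, sa), (rb, lb, sb) in product(model_a, model_b):
        region = intersect_rules(ra, rb)
        if region is None:
            continue
        merged.append((region, la if sa >= sb else lb))
    return merged

# Hypothetical course-level models with made-up variables and supports.
course_a = [({"grade": (-INF, 10)}, "fail", 30), ({"grade": (10, INF)}, "pass", 70)]
course_b = [({"age": (-INF, 25)}, "pass", 60), ({"age": (25, INF)}, "fail", 40)]
combined = merge_models(course_a, course_b)
```

In this toy case the two models constrain different variables, so all four pairwise intersections are non-empty; rules that constrain the same variable with non-overlapping intervals would be dropped.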


2018

Generalizing Knowledge in Decentralized Rule-Based Models

Authors
Strecht, P; Moreira, JM; Soares, C;

Publication
ECML PKDD 2018 Workshops - DMLE 2018 and IoTStream 2018, Dublin, Ireland, September 10-14, 2018, Revised Selected Papers

Abstract
Knowledge generalization of rule-based models, such as decision trees or decision rules, has emerged from different backgrounds. This particular kind of model, given its interpretability, offers several possibilities for combination. Despite each distinct context, common patterns have emerged revealing the systemic nature of the problem. In this paper, we look at the problem of generalizing the knowledge contained in a set of models as a process formalizing the operations that can be addressed in alternative ways. We also include a set-up to evaluate generalized models based on their ability to replace the base ones from a predictive performance perspective, without loss of interpretability.


2022

Density Estimation in High-Dimensional Spaces: A Multivariate Histogram Approach

Authors
Strecht, P; Mendes Moreira, J; Soares, C;

Publication
ADVANCED DATA MINING AND APPLICATIONS, ADMA 2022, PT II

Abstract
Density estimation is an important tool for data analysis. Non-parametric approaches have a reputation for offering state-of-the-art density estimates, but only in a few dimensions. Despite providing less accurate density estimates, histogram-based approaches remain the only alternative for datasets in high-dimensional spaces. In this paper, we present a multivariate histogram approach to estimate the density of a dataset without restrictions on the number of dimensions, containing both numerical and categorical variables (without numerical encoding) and allowing missing data (without the need to preprocess them). Results from the empirical evaluation show that it is possible to estimate the density of datasets without restrictions on dimensionality, and the method is robust to missing values and categorical variables.
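To make the idea of a histogram over mixed, incomplete data concrete, here is a minimal generic sketch; it is not the method of the paper. It assumes equal-width bins for numerical variables, uses each category as its own bin, gives missing values (represented as `None`) a dedicated bin, and stores only occupied cells. The names `cell_of`, `fit_histogram`, `relative_frequency` and the `MISSING` label are hypothetical.

```python
from collections import Counter

MISSING = "<missing>"  # hypothetical label for the bin that collects missing values

def cell_of(row, edges):
    """Map one record to a histogram cell: a tuple with one bin label per variable.

    `edges` has one entry per variable: (low, high, n_bins) for a numerical
    variable, or None for a categorical one.
    """
    cell = []
    for value, var_edges in zip(row, edges):
        if value is None:
            cell.append(MISSING)          # missing values get their own bin
        elif var_edges is None:
            cell.append(value)            # categorical: the category is the bin
        else:
            low, high, n_bins = var_edges
            idx = int((value - low) / (high - low) * n_bins)
            cell.append(min(max(idx, 0), n_bins - 1))  # clamp out-of-range values
    return tuple(cell)

def fit_histogram(rows, edges):
    """Count records per occupied cell; empty cells are never stored."""
    return Counter(cell_of(row, edges) for row in rows)

def relative_frequency(row, counts, edges):
    """Fraction of training records that fall in the same cell as `row`."""
    return counts[cell_of(row, edges)] / sum(counts.values())

# Made-up records: one numerical variable in [0, 10) and one categorical variable.
rows = [(1.0, "a"), (1.5, "a"), (9.0, "b"), (None, "b")]
edges = [(0.0, 10.0, 5), None]
counts = fit_histogram(rows, edges)
```

Storing only occupied cells keeps memory proportional to the number of records rather than to the number of possible cells, which is what makes a joint histogram workable as dimensionality grows. The value returned is a per-cell relative frequency; turning it into a density for numerical variables would additionally require dividing by the cell volume.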


2021

Inmplode: A framework to interpret multiple related rule-based models

Authors
Strecht, P; Mendes Moreira, J; Soares, C;

Publication
EXPERT SYSTEMS

Abstract
There is a growing trend to split problems into separate subproblems and develop separate models for each (e.g., different churn models for separate customer segments, or different failure prediction models for separate university courses). While it may lead to better predictive models, the use of multiple models makes interpretability more challenging. In this paper, we address the problem of synthesizing the knowledge contained in a set of models without a significant loss of prediction performance. We focus on decision tree models because their interpretability makes them suitable for problems involving knowledge extraction. We detail the process, identifying alternative methods to address the different phases involved. An extensive set of experiments is carried out on the problem of predicting the failure of students in courses at the University of Porto. We assess the effect of using different methods for the operations of the methodology, in terms of both the knowledge extracted and the accuracy of the combined models.
