Pedro Strecht


Porto, Portugal

+351 222 094 000

info@inesctec.pt
Details

Name: Pedro Strecht
Role: External Research Collaborator
Since: 1 April 2014
Nationality: Portugal
Centre: Artificial Intelligence and Decision Support
Contacts: +351222094398, pedro.strecht@inesctec.pt

Publications


2025

Estimating Completeness of Consensus Models: Geometrical and Distributional Approaches

Authors
Strecht, P; Mendes Moreira, J; Soares, C;

Publication
MACHINE LEARNING, OPTIMIZATION, AND DATA SCIENCE, LOD 2024, PT I

Abstract
In many organizations with a distributed operation, not only is data collection distributed, but models are also developed and deployed separately. Understanding the combined knowledge of all the local models may be important and challenging, especially in the case of a large number of models. The automated development of consensus models, which aggregate multiple models into a single one, involves several challenges, including fidelity (ensuring that aggregation does not penalize the predictive performance severely) and completeness (ensuring that the consensus model covers the same space as the local models). In this paper, we address the latter, proposing two measures for geometrical and distributional completeness. The first quantifies the proportion of the decision space that is covered by a model, while the second takes into account the concentration of the data that is covered by the model. The use of these measures is illustrated in a real-world example of academic management, as well as four publicly available datasets. The results indicate that distributional completeness in the deployed models is consistently higher than geometrical completeness. Although consensus models tend to be geometrically incomplete, distributional completeness reveals that they cover the regions of the decision space with a higher concentration of data.
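The two measures described in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that a consensus model is a set of non-overlapping axis-aligned rules over a bounded numerical decision space; the names `Box`, `geometrical_completeness` and `distributional_completeness` are hypothetical, and the paper's exact definitions may differ.

```python
from typing import List, Sequence, Tuple

# Assumption: a rule (and the decision space) is one (low, high) interval per variable.
Box = List[Tuple[float, float]]


def volume(box: Box) -> float:
    """Product of the interval lengths of a box."""
    v = 1.0
    for low, high in box:
        v *= max(0.0, high - low)
    return v


def covers(box: Box, point: Sequence[float]) -> bool:
    """True if the point lies inside the half-open box [low, high) on every variable."""
    return all(low <= x < high for (low, high), x in zip(box, point))


def geometrical_completeness(rules: List[Box], space: Box) -> float:
    """Share of the decision space volume covered by the rules.

    Assumes the rules do not overlap and lie inside the space.
    """
    return sum(volume(r) for r in rules) / volume(space)


def distributional_completeness(rules: List[Box], data: List[Sequence[float]]) -> float:
    """Share of the observed data points covered by at least one rule."""
    covered = sum(1 for p in data if any(covers(r, p) for r in rules))
    return covered / len(data)


# Toy example: two rules in a 10 x 10 decision space, five observations.
space = [(0.0, 10.0), (0.0, 10.0)]
rules = [[(0.0, 5.0), (0.0, 5.0)], [(5.0, 10.0), (0.0, 2.0)]]
data = [(1.0, 1.0), (2.0, 3.0), (4.0, 4.5), (6.0, 1.0), (9.0, 9.0)]

geo = geometrical_completeness(rules, space)
dist = distributional_completeness(rules, data)
print(f"geometrical: {geo:.2f}, distributional: {dist:.2f}")
```

In this toy setting the rules cover a small part of the space but most of the data, which mirrors the abstract's observation that distributional completeness tends to exceed geometrical completeness.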


2024

Symbolic Data Analysis to Improve Completeness of Model Combination Methods

Authors
Strecht, P; Mendes Moreira, J; Soares, C;

Publication
ADVANCES IN ARTIFICIAL INTELLIGENCE, AI 2023, PT II

Abstract
A growing number of organizations are adopting a strategy of breaking down large data analysis problems into specific sub-problems, tailoring models for each. However, handling a large number of individual models can pose challenges in understanding organization-wide phenomena. Recent studies focus on using decision trees to create a consensus model by aggregating local decision trees into sets of rules. Despite efforts, the resulting models may still be incomplete, i.e., not able to cover the entire decision space. This paper explores methodologies to tackle this issue by generating complete consensus models from incomplete rule sets, relying on rough estimates of the distribution of independent variables. Two approaches are introduced: synthetic dataset creation followed by decision tree training and a specialized algorithm for creating a decision tree from symbolic data. The feasibility of generating complete decision trees is demonstrated, along with an empirical evaluation on a number of datasets.
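The first approach mentioned in this abstract, synthetic dataset creation, might be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: each rule is taken to be a set of numerical intervals with a class label, values are drawn uniformly inside each interval as a stand-in for the "rough estimates" of the variable distributions, and the variable names (`grade`, `attendance`) and the function `synthesize` are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Assumption: a rule is (interval per variable, predicted class).
Rule = Tuple[Dict[str, Tuple[float, float]], str]
Row = Tuple[Dict[str, float], str]


def synthesize(rules: List[Rule], n_per_rule: int, seed: int = 0) -> List[Row]:
    """Draw labelled synthetic examples from each rule's region.

    Uniform sampling is an assumption of this sketch; any rough estimate
    of the variable distributions could replace it.
    """
    rng = random.Random(seed)
    rows: List[Row] = []
    for intervals, label in rules:
        for _ in range(n_per_rule):
            features = {
                name: rng.uniform(low, high)
                for name, (low, high) in intervals.items()
            }
            rows.append((features, label))
    return rows


# Hypothetical incomplete rule set: the two rules leave parts of the space uncovered.
rules: List[Rule] = [
    ({"grade": (0.0, 9.5), "attendance": (0.0, 0.5)}, "fail"),
    ({"grade": (9.5, 20.0), "attendance": (0.5, 1.0)}, "pass"),
]

synthetic = synthesize(rules, n_per_rule=3)
for features, label in synthetic:
    print(label, features)
```

A decision tree trained on such a dataset with any standard learner partitions the whole decision space, so it also assigns predictions to regions that the original rules left uncovered.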


2023

Curbing Dropout: Predictive Analytics at the University of Porto

Authors
Blanquet, L; Grilo, J; Strecht, P; Camanho, A;

Publication
Atas da Conferencia da Associacao Portuguesa de Sistemas de Informacao

Abstract
This study explores data mining techniques for predicting student dropout in higher education. The research compares different methodological approaches, including alternative algorithms and variations in model specifications. Additionally, we examine the impact of employing either a single model for all university programs or separate models per program. The performance of models with students grouped according to their position on the program study plan was also tested. The training datasets were explored with varying time series lengths (2, 4, 6, and 8 years) and the experiments use academic data from the University of Porto, spanning the academic years from 2012 to 2022. The algorithm that yielded the best results was XGBoost. The best predictions were obtained with models trained with two years of data, both with separate models for each program and with a single model. The findings highlight the potential of data mining approaches in predicting student dropout, offering valuable insights for higher education institutions aiming to improve student retention and success. © 2023 Associacao Portuguesa de Sistemas de Informacao. All rights reserved.


2022

Density Estimation in High-Dimensional Spaces: A Multivariate Histogram Approach

Authors
Strecht, P; Mendes Moreira, J; Soares, C;

Publication
ADVANCED DATA MINING AND APPLICATIONS, ADMA 2022, PT II

Abstract
Density estimation is an important tool for data analysis. Non-parametric approaches have a reputation for offering state-of-the-art density estimates, but only in a few dimensions. Despite providing less accurate density estimates, histogram-based approaches remain the only alternative for datasets in high-dimensional spaces. In this paper, we present a multivariate histogram approach to estimate the density of a dataset without restrictions on the number of dimensions, containing both numerical and categorical variables (without numerical encoding) and allowing missing data (without the need to preprocess them). Results from the empirical evaluation show that it is possible to estimate the density of datasets without restrictions on dimensionality, and the method is robust to missing values and categorical variables.
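A minimal sketch of the general idea follows, assuming a sparse histogram that stores only non-empty cells. Numerical columns are binned with equal-width bins, categorical values are used directly as bin labels, and missing values (`None`) get a bin of their own. The names `cell_of`, `SparseHistogram` and `mass` are hypothetical, and the estimate returned is the relative frequency of a cell, a simplification of a proper density; the paper's method is not reproduced here.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple, Union

Value = Union[float, str, None]
# Assumption: numerical column index -> (lower bound, bin width, number of bins).
Edges = Dict[int, Tuple[float, float, int]]


def cell_of(row: Sequence[Value], edges: Edges) -> Tuple[Hashable, ...]:
    """Map a row to the histogram cell that contains it."""
    key = []
    for j, v in enumerate(row):
        if v is None:
            key.append(None)  # missing values fall in their own bin
        elif j in edges:
            low, width, n_bins = edges[j]
            idx = int((v - low) // width)
            key.append(min(max(idx, 0), n_bins - 1))  # clamp to the outer bins
        else:
            key.append(v)  # categorical value used as-is, no numerical encoding
    return tuple(key)


class SparseHistogram:
    """Multivariate histogram that stores counts for non-empty cells only."""

    def __init__(self, rows: Sequence[Sequence[Value]], edges: Edges) -> None:
        self.edges = edges
        self.n = len(rows)
        self.counts = Counter(cell_of(r, edges) for r in rows)

    def mass(self, row: Sequence[Value]) -> float:
        """Relative frequency of the cell containing the row."""
        return self.counts[cell_of(row, self.edges)] / self.n


# Toy data: one numerical column (two bins over [0, 10)) and one categorical column.
rows = [(1.0, "a"), (1.5, "a"), (7.0, "b"), (None, "b")]
hist = SparseHistogram(rows, edges={0: (0.0, 5.0, 2)})

print(hist.mass((2.0, "a")))
print(hist.mass((None, "b")))
print(hist.mass((2.0, "b")))
```

Because only occupied cells are stored, memory grows with the number of distinct cells observed rather than with the total number of cells, which is what keeps this kind of histogram usable as the number of dimensions increases.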


2021

Inmplode: A framework to interpret multiple related rule-based models

Authors
Strecht, P; Mendes Moreira, J; Soares, C;

Publication
EXPERT SYSTEMS

Abstract
There is a growing trend to split problems into separate subproblems and develop separate models for each (e.g., different churn models for separate customer segments or different failure prediction models for separate university courses). While it may lead to better predictive models, the use of multiple models makes interpretability more challenging. In this paper, we address the problem of synthesizing the knowledge contained in a set of models without a significant loss of prediction performance. We focus on decision tree models because their interpretability makes them suitable for problems involving knowledge extraction. We detail the process, identifying alternative methods to address the different phases involved. An extensive set of experiments is carried out on the problem of predicting the failure of students in courses at the University of Porto. We assess the effect of using different methods for the operations of the methodology, in terms of both the knowledge extracted and the accuracy of the combined models.

