About

Areas of research:

- Knowledge discovery

  • Supervised learning   
  • Multiple predictive models
  • Applied knowledge discovery

- Intelligent transportation systems

  • Planning and operation of public transport

Details

  • Name

    João Mendes Moreira
  • Role

    Senior Researcher
  • Since

    1 January 2011
Publications

2024

Symbolic Data Analysis to Improve Completeness of Model Combination Methods

Authors
Strecht, P; Mendes-Moreira, J; Soares, C;

Publication
Advances in Artificial Intelligence, AI 2023, Pt II

Abstract
A growing number of organizations are adopting a strategy of breaking down large data analysis problems into specific sub-problems, tailoring models for each. However, handling a large number of individual models can pose challenges in understanding organization-wide phenomena. Recent studies focus on using decision trees to create a consensus model by aggregating local decision trees into sets of rules. Despite efforts, the resulting models may still be incomplete, i.e., not able to cover the entire decision space. This paper explores methodologies to tackle this issue by generating complete consensus models from incomplete rule sets, relying on rough estimates of the distribution of independent variables. Two approaches are introduced: synthetic dataset creation followed by decision tree training and a specialized algorithm for creating a decision tree from symbolic data. The feasibility of generating complete decision trees is demonstrated, along with an empirical evaluation on a number of datasets.

2024

Map-matching methods in agriculture

Authors
Silva, A; Moreira, JM; Ferreira, C; Costa, N; Dias, D;

Publication
Computers and Electronics in Agriculture

Abstract
In this paper, we provide a solution for monitoring the location of workers during their activity in the agricultural sector, with the aim of boosting productivity and efficiency. Our solution is based on map-matching methods, which are used to track the path covered by a worker during a specific activity in an agricultural crop. Two different crops are considered in this study: olives and vines. We leverage the symmetry of these crops' geometry in our solution and divide the problem into three steps: first, we estimate a worker's path through the fields; then, we apply map-matching to that path; and finally, we apply a post-processing method to ensure local continuity of the sequence obtained from map-matching. The proposed methods are evaluated experimentally using synthetic and real data from the region of Mirandela, Portugal. Evaluation metrics show that results on synthetic data are robust across several sampling periods, while on real-world data, results for the vine crop are on par with those on synthetic data and performance for the olive crop is lower. © 2023 Elsevier B.V.

2023

DyGCN-LSTM: A dynamic GCN-LSTM based encoder-decoder framework for multistep traffic prediction

Authors
Kumar, R; Moreira, JM; Chandra, J;

Publication
Applied Intelligence

Abstract
Intelligent transportation systems (ITS) are gaining traction in large cities for better traffic management. Traffic forecasting is an important part of ITS, but a difficult one because of the intricate spatiotemporal relationships of traffic between different locations. Although distant sensors may share temporal and spatial similarities with the sensor being predicted, existing traffic forecasting research focuses primarily on modeling correlations between neighboring sensors and disregards correlations with distant sensors. Furthermore, existing methods for capturing spatial dependencies, such as graph convolutional networks (GCNs), cannot capture the dynamic spatial dependence in traffic systems. Current self-attention-based techniques that model dynamic correlations among all sensors overlook the hierarchical features of roads and have quadratic computational complexity. Our paper presents a new Dynamic Graph Convolution LSTM Network (DyGCN-LSTM) to address these limitations. The novelty of DyGCN-LSTM is that it can simultaneously model the underlying non-linear spatial and temporal correlations of remotely located sensors. Experiments on four real-world traffic datasets show that the proposed approach outperforms state-of-the-art benchmarks by 25% in terms of RMSE.

2023

Applying Machine Learning to Estimate the Effort and Duration of Individual Tasks in Software Projects

Authors
Sousa, AO; Veloso, DT; Goncalves, HM; Faria, JP; Mendes Moreira, J; Graca, R; Gomes, D; Castro, RN; Henriques, PC;

Publication
IEEE Access

Abstract
Software estimation is a vital yet challenging project management activity. Various methods, from empirical to algorithmic, have been developed to fit different development contexts, from plan-driven to agile. Recently, machine learning techniques have shown potential in this realm but are still underexplored, especially for individual task estimation. We investigate the use of machine learning techniques in predicting task effort and duration in software projects to assess their applicability and effectiveness in production environments, identify the best-performing algorithms, and pinpoint key input variables (features) for predictions. We conducted experiments with datasets of various sizes and structures exported from three project management tools used by partner companies. For each dataset, we trained regression models for predicting the effort and duration of individual tasks using eight machine learning algorithms. The models were validated using k-fold cross-validation and evaluated with several metrics. Ensemble algorithms like Random Forest, Extra Trees Regressor, and XGBoost consistently outperformed non-ensemble ones across the three datasets. However, the estimation accuracy and feature importance varied significantly across datasets, with a Mean Magnitude of Relative Error (MMRE) ranging from 0.11 to 9.45 across the datasets and target variables. Nevertheless, even in the worst-performing dataset, effort estimates aggregated to the project level showed good accuracy, with MMRE = 0.23. Machine learning algorithms, especially ensemble ones, seem to be a viable option for estimating the effort and duration of individual tasks in software projects. However, the quality of the estimates and the relevant features may depend largely on the characteristics of the available datasets and underlying projects. 
Nevertheless, even when individual estimates are inaccurate, aggregated estimates at the project level may still be accurate because individual errors compensate for one another.
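The error-compensation effect described in this abstract can be illustrated with a minimal Python sketch. It assumes the common definition of MMRE as the mean of |actual - predicted| / actual; the function name `mmre` and the task values below are hypothetical toy data chosen for illustration, not data or results from the paper.

```python
def mmre(actual, predicted):
    """Mean Magnitude of Relative Error: mean of |actual - predicted| / actual."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical toy data (effort in hours per task), NOT taken from the paper.
actual_tasks = [4.0, 10.0, 2.0, 8.0]
predicted_tasks = [6.0, 7.0, 3.0, 9.0]

# Task level: each task's relative error counts separately.
task_level = mmre(actual_tasks, predicted_tasks)

# Project level: sum the tasks first, so over- and under-estimates
# can cancel out before the relative error is computed.
project_level = mmre([sum(actual_tasks)], [sum(predicted_tasks)])

print(f"Task-level MMRE: {task_level:.3f}")
print(f"Project-level MMRE: {project_level:.3f}")
```

In this toy example, the underestimate on the second task largely offsets the overestimates on the others, so the project-level error is much smaller than the average task-level error.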

2023

Clinical model for Hereditary Transthyretin Amyloidosis age of onset prediction

Authors
Pedroto, M; Coelho, T; Jorge, A; Mendes Moreira, J;

Publication
Frontiers in Neurology

Abstract
Introduction: Hereditary transthyretin amyloidosis (ATTRv amyloidosis) is a rare hereditary neurological disease, clinically characterized as severe, progressive, and life-threatening; its age of onset is the moment at which the first symptoms are felt. In this study, we present and discuss our results on the development and evaluation of an approach for time-to-event prediction of the age of onset, with a focus on genealogical feature construction.

Materials and methods: This research was motivated by a medical question: when will an asymptomatic ATTRv patient show symptoms of the disease? To answer it, we defined and studied the impact of 77 features (ranging from demographic and genealogical features to familial disease history); we studied and compared a pool of prediction algorithms, namely linear regression (LR), elastic net (EN), lasso (LA), ridge (RI), support vector machines (SV), decision tree (DT), random forest (RF), and XGBoost (XG), in both a classification and a regression setting; we assembled a baseline (BL) corresponding to current medical knowledge of the disease; we studied the problem of predicting the age of onset of ATTRv patients; we assessed the viability of predicting the age of onset over short-term horizons, framed as classification, on localized sets of patients (currently symptomatic patients and asymptomatic carriers, with and without genealogical information); and we compared the results against an out-of-bag evaluation set, assembled in a different time frame from the original data, to account for data leakage.

Results: Our approach outperforms the BL model, which follows a set of clinical heuristics and represents current medical practice. Overall, SV and XG performed best on both prediction tasks, although performance was affected by data characteristics, namely missing values, complex data, and small input sizes.

Discussion: With this study, we defined a predictive modeling approach that medical professionals can readily understand, compared it with current practice (the baseline approach, BL), and showed the improvement it achieves over current medical knowledge.

Supervised theses

2022

BeDriven mobility platform

Author
João Daniel Oliveira Ribeiro

Institution
UTAD

2022

Designing out stigma in an online-based sexual health promotion programme

Author
Diogo Renato Barreira Coutinho

Institution
UP-FEUP

2022

Explaining Software Faults in Source Code

Author
Francisco José Torres Ribeiro

Institution
UM

2022

AI-Based Models to Predict the Traumatic Brain Injury Outcome

Author
João Pedro Barbosa Fonseca

Institution
UP-FEUP

2022

Reconciling Prediction in the Regression Setting: An Application to Portuguese Breweries’ Market Volume Sales Prediction

Author
Diogo Cerqueira Queirós

Institution
UP-FEUP