Publications - INESC TEC

Publications by João Mendes Moreira

2024

Heterogeneity in families with ATTRV30M amyloidosis: a historical and longitudinal Portuguese case study impact for genetic counselling

Authors
Pedroto, M; Coelho, T; Fernandes, J; Oliveira, A; Jorge, A; Mendes Moreira, J;

Publication
AMYLOID-JOURNAL OF PROTEIN FOLDING DISORDERS

Abstract
Background: Hereditary transthyretin amyloidosis (ATTRv amyloidosis) is an inherited disease, where the study of family history holds importance. This study evaluates the changes of age-of-onset (AOO) and other age-related clinical factors within and among families affected by ATTRv amyloidosis.
Methods: We analysed information from 934 trees, focusing on family, parents, probands and siblings relationships. We focused on 1494 female and 1712 male symptomatic ATTRV30M patients. Results are presented alongside a comparison of current with historical records. Clinical and genealogical indicators identify major changes.
Results: Overall, analysis of familial data shows the existence of families with both early and late patients (1/6). It identifies long familial follow-up times since patient families tend to be diagnosed over several years. Finally, results show a large difference between parent-child and proband-patient relationships (20-30 years).
Conclusions: This study reveals that there has been a shift in patient profile, with a recent increase in male elderly cases, especially regarding probands. It shows that symptomatic patients exhibit less variability towards siblings, when compared to other family members, namely the transmitting ancestors' age of onset. This can influence genetic counselling guidelines.

2023

Interpreting What is Important: An Explainability Approach and Study on Feature Selection

Authors
Rodrigues, EM; Baghoussi, Y; Mendes-Moreira, J;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2023, PT I

Abstract
Machine learning models are widely used in time series forecasting. One way to reduce their computational cost and increase their efficiency is to feed the model only the relevant exogenous features. To this end, a study of the following feature selection methods is performed: Pearson correlation coefficient, Boruta, Boruta-Shap, IMV-LSTM, and LIME. A new method focused on interpretability, SHAP-LSTM, is proposed, using a deep learning model training process as part of a feature selection algorithm. The methods were compared on 2 different datasets, showing results comparable to those obtained with all features, at a lower computational cost. In all datasets, SHAP-LSTM showed competitive results, performing comparatively better on the data with a higher presence of rarely occurring categorical features.
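As an illustration of the simplest method named in this abstract, the Pearson correlation coefficient, the following minimal sketch keeps only the exogenous features whose absolute correlation with the target reaches a threshold. The function names, the threshold value, and the toy data are assumptions made for illustration; they are not taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        # A constant series carries no linear information about the other.
        return 0.0
    return cov / (std_x * std_y)


def select_features(features, target, threshold=0.5):
    """Keep feature names with |correlation to target| >= threshold.

    The threshold of 0.5 is an arbitrary illustrative choice.
    """
    return [
        name
        for name, values in features.items()
        if abs(pearson(values, target)) >= threshold
    ]


# Toy data (hypothetical): one informative feature and two uninformative ones.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "temperature": [2.0, 4.1, 5.9, 8.2, 10.0],
    "noise": [3.0, 1.0, 3.0, 1.0, 3.0],
    "constant": [7.0, 7.0, 7.0, 7.0, 7.0],
}
selected = select_features(features, target)
print(selected)
```

A filter of this kind only captures linear relationships, which is one reason model-based selectors such as those listed in the abstract are also studied.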

2023

Studying the Impact of Sampling in Highly Frequent Time Series

Authors
Ferreira, PJS; Mendes-Moreira, J; Rodrigues, A;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2023, PT I

Abstract
Nowadays, all kinds of sensors generate data, and more metrics are being measured. These large quantities of data are stored in large data centers and used to create datasets to train Machine Learning algorithms for a wide range of areas. However, processing that data and training the Machine Learning algorithms require more time, and storing all the data requires more space, creating a Big Data problem. In this paper, we propose simple techniques for reducing large time series datasets into smaller versions without compromising the forecasting capability of the generated model while, simultaneously, reducing the time needed to train the models and the space required to store the reduced sets. We tested the proposed approach on three public datasets and one private dataset containing time series with different characteristics. The results show that, for the datasets studied, it is possible to use reduced sets to train the algorithms without affecting the forecasting capability of their models. This approach is more efficient for datasets with higher frequencies and larger seasonalities. With the reduced sets, training time decreases by between 40 and 94%, and the memory needed to store the data decreases by between 46 and 65%.
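The abstract does not specify which reduction techniques are used. As one plausible example of a simple reduction, the sketch below downsamples a series by averaging consecutive blocks of a fixed size; the function name, the block size, and the sample values are assumptions for illustration only.

```python
def downsample_mean(series, factor):
    """Reduce a series by replacing each block of `factor` points with its mean.

    The last block may be shorter than `factor`; it is averaged over the
    points it actually contains.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    reduced = []
    for start in range(0, len(series), factor):
        block = series[start:start + factor]
        reduced.append(sum(block) / len(block))
    return reduced


# Hypothetical high-frequency readings, reduced to roughly half the size.
series = [1.0, 3.0, 2.0, 4.0, 10.0, 12.0, 5.0]
reduced = downsample_mean(series, 2)
print(reduced)
```

A larger `factor` saves more storage and training time but smooths away short-term variation, so the choice depends on the frequency and seasonality of the series.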

2023

Combining Neighbor Models to Improve Predictions of Age of Onset of ATTRv Carriers

Authors
Pedroto, M; Jorge, A; Mendes Moreira, J; Coelho, T;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2023, PT II

Abstract
Transthyretin (TTR)-related familial amyloid polyneuropathy (ATTRv) is a life-threatening autosomal dominant disease. The age of onset represents the moment when the first symptoms are felt. Accurately predicting the age of onset for a given patient is relevant for risk assessment and treatment management. In this work, we evaluate the impact on prediction error of combining prediction models obtained from neighboring time windows. We propose Symmetric (Sym) and Asymmetric (Asym) models, which represent two different averaging approaches. Combined with a weighting mechanism, these yield four ensemble models: Symmetric (Sym), Symmetric-weighted (Sym-w), Asymmetric (Asym), and Asymmetric-weighted (Asym-w). These four ensemble models are then compared to the original approach, which is focused on individual regression base learners, namely: Baseline (BL), Decision Tree (DT), Elastic Net (EN), Lasso (LA), Linear Regression (LR), Random Forest (RF), Ridge (RI), Support Vector Regressor (SV) and XGBoost (XG). Our results show that aggregating predictions from neighbor models decreases the average mean absolute error obtained by each base learner. Overall, the best results are achieved by regression-based ensemble tree models as base learners.
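The abstract does not define the four ensembles precisely. The sketch below shows one possible reading, under these assumptions: each time window has its own model and prediction, the symmetric variant averages windows on both sides of the current one, the asymmetric variant averages only the current and earlier windows, and the weighted variants use weights that decay with distance from the current window. All names, the weighting formula, and the sample values are hypothetical.

```python
def combine_neighbors(preds, t, k=1, symmetric=True, weighted=False):
    """Combine the prediction of window t with those of neighboring windows.

    preds     -- one prediction per time window (hypothetical per-window models)
    t         -- index of the current window
    k         -- number of neighboring windows considered on each side
    symmetric -- True: windows t-k..t+k; False: windows t-k..t only (assumed)
    weighted  -- True: weight 1 / (1 + distance to t) (assumed); False: equal
    """
    low = max(0, t - k)
    high = min(len(preds) - 1, t + k) if symmetric else t
    indices = range(low, high + 1)
    weights = [1.0 / (1 + abs(i - t)) if weighted else 1.0 for i in indices]
    total = sum(w * preds[i] for w, i in zip(weights, indices))
    return total / sum(weights)


# Hypothetical age-of-onset predictions from four consecutive windows.
preds = [30.0, 34.0, 38.0, 40.0]
sym = combine_neighbors(preds, 1)
asym = combine_neighbors(preds, 1, symmetric=False)
sym_w = combine_neighbors(preds, 1, weighted=True)
asym_w = combine_neighbors(preds, 1, symmetric=False, weighted=True)
print(sym, asym, sym_w, asym_w)
```

Averaging over neighboring windows trades a little responsiveness for lower variance, which is consistent with the reduction in mean absolute error reported in the abstract.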

2024

Map-matching methods in agriculture

Authors
Silva, A; Mendes Moreira, J; Ferreira, C; Costa, N; Dias, D;

Publication
COMPUTERS AND ELECTRONICS IN AGRICULTURE

Abstract
In this paper, a solution to monitor the location of humans during their activity in the agriculture sector, with the aim of boosting productivity and efficiency, is provided. Our solution is based on map-matching methods, which are used to track the path followed by a worker during a specific activity in an agricultural culture. Two different cultures are taken into consideration in this study: olives and vines. We leverage the symmetry of the geometry of these cultures in our solution and divide the problem into three steps: first, we estimate the path of a worker along the fields; then, we apply map-matching to that path; and finally, a post-processing method is applied to ensure local continuity of the sequence obtained from map-matching. The proposed methods are experimentally evaluated using synthetic and real data in the region of Mirandela, Portugal. Evaluation metrics show that results for synthetic data are robust under several sampling periods, while for real-world data, results for the vine culture are on par with the synthetic ones, and performance is reduced for the olive culture.
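The last two of the three steps can be sketched as follows, under strong simplifying assumptions that are not taken from the paper: crop rows are modelled as parallel horizontal lines at known y coordinates, map-matching assigns each position to the nearest row, and post-processing is a sliding-window majority vote. The function names and the sample positions are hypothetical.

```python
def match_to_rows(points, row_ys):
    """Assign each (x, y) position to the index of the nearest row.

    Rows are assumed to be horizontal lines located at the y values in row_ys.
    """
    return [
        min(range(len(row_ys)), key=lambda i: abs(y - row_ys[i]))
        for _, y in points
    ]


def smooth(labels, window=3):
    """Majority vote over a sliding window to enforce local continuity.

    Votes are taken on the original labels; ties go to the smallest row index.
    """
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        segment = labels[max(0, i - half):i + half + 1]
        smoothed.append(max(sorted(set(segment)), key=segment.count))
    return smoothed


# Hypothetical estimated path of a worker and three rows spaced 2 m apart.
row_ys = [0.0, 2.0, 4.0]
path = [(0, 0.1), (1, 0.2), (2, 1.3), (3, 0.1), (4, 2.1), (5, 1.9), (6, 2.2)]
raw = match_to_rows(path, row_ys)
clean = smooth(raw)
print(raw)
print(clean)
```

In this toy path, the third position is noisy enough to be matched to the wrong row, and the majority vote removes that isolated jump so the worker changes row only once.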

2023

Unsupervised Online Event Ranking for IT Operations

Authors
Mendes, TC; Barata, AA; Pereira, M; Moreira, JM; Camacho, R; Sousa, RT;

Publication
IDEAL

Abstract
Maintaining high service levels across a fast-growing number of servers is crucial and challenging for IT operations teams. Online monitoring systems trigger many occurrences that experts find hard to keep up with. In addition, most of the triggered warnings do not correspond to real, critical problems, making it difficult for technicians to know which to focus on and address in a timely manner. Outlier and concept drift detection techniques can be applied to multiple streams of readings related to server monitoring metrics, but they also generate many False Positives. Ranking algorithms can already prioritize relevant results in information retrieval and recommender systems. However, these approaches are supervised, making them inapplicable to event detection on data streams. We propose a framework that combines event aggregations and uses a customized clustering algorithm to score and rank alarms in the context of IT operations. To the best of our knowledge, this is the first unsupervised, online, high-dimensional approach to rank IT ops events, and it contributes to advancing knowledge about the key concepts and challenges associated with this problem.