Publications

Publications by Paulo Jorge Azevedo

2015

Contrast set mining in temporal databases

Authors
Magalhaes, A; Azevedo, PJ;

Publication
EXPERT SYSTEMS

Abstract
Understanding the underlying differences between groups or classes in certain contexts can be of the utmost importance. Contrast set mining relies on discovering significant patterns by contrasting two or more groups. A contrast set is a conjunction of attribute-value pairs whose distribution differs meaningfully across groups. A previously proposed technique is rules for contrast sets, which seeks to express each contrast set found in terms of rules. This work extends rules for contrast sets to a temporal data mining task. We define a set of temporal patterns in order to capture the significant changes in the contrasts discovered along the considered timeline. To evaluate the accuracy of the proposal and its ability to discover relevant information, two different real-life data sets were studied using this approach.
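As a minimal illustration of the contrast set definition in this abstract (not the authors' algorithm), the sketch below computes the support of a conjunction of attribute-value pairs within each group and the difference between two groups. The function names, the `group` key, and the toy rows are all assumptions made for the example.

```python
from collections import Counter


def support_by_group(rows, pattern, group_key="group"):
    """Return {group: fraction of the group's rows matching every pair in pattern}."""
    totals = Counter()
    hits = Counter()
    for row in rows:
        g = row[group_key]
        totals[g] += 1
        # A row matches when all attribute-value pairs of the pattern hold.
        if all(row.get(attr) == val for attr, val in pattern.items()):
            hits[g] += 1
    return {g: hits[g] / totals[g] for g in totals}


def support_difference(rows, pattern, group_a, group_b, group_key="group"):
    """Support of the pattern in group_a minus its support in group_b."""
    sup = support_by_group(rows, pattern, group_key)
    return sup[group_a] - sup[group_b]


# Hypothetical toy data: two groups, two attributes.
rows = [
    {"group": "A", "degree": "PhD", "sector": "public"},
    {"group": "A", "degree": "PhD", "sector": "private"},
    {"group": "A", "degree": "MSc", "sector": "public"},
    {"group": "A", "degree": "PhD", "sector": "public"},
    {"group": "B", "degree": "PhD", "sector": "public"},
    {"group": "B", "degree": "MSc", "sector": "private"},
    {"group": "B", "degree": "MSc", "sector": "public"},
    {"group": "B", "degree": "MSc", "sector": "private"},
]
pattern = {"degree": "PhD", "sector": "public"}

supports = support_by_group(rows, pattern)
difference = support_difference(rows, pattern, "A", "B")
```

A candidate like this would only count as a contrast set if the difference is both large enough and statistically significant; the temporal extension described in the abstract would then track how such differences change over successive time periods.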

2020

Sequence Mining for Automatic Generation of Software Tests from GUI Event Traces

Authors
Oliveira, A; Freitas, R; Jorge, A; Amorim, V; Moniz, N; Paiva, ACR; Azevedo, PJ;

Publication
Intelligent Data Engineering and Automated Learning - IDEAL 2020 - 21st International Conference, Guimaraes, Portugal, November 4-6, 2020, Proceedings, Part II

Abstract
In today’s software industry, systems are constantly changing. Maintaining their quality and preventing failures at controlled cost is a challenge. One way to foster quality is through thorough and systematic testing. Therefore, the definition of adequate tests is crucial for saving time, cost and effort. This paper presents a framework that generates software test cases automatically based on user interaction data. We propose a data-driven software test generation solution that combines the use of frequent sequence mining and Markov chain modeling. We assess the quality of the generated test cases by empirically evaluating their coverage with respect to observed user interactions and code. We also measure the plausibility of the distribution of the events in the generated test sets using the Kullback-Leibler divergence. © 2020, Springer Nature Switzerland AG.
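The plausibility check mentioned at the end of this abstract can be sketched as follows, under stated assumptions: event distributions are estimated from traces with add-one (Laplace) smoothing, which is a choice made here and not taken from the paper, and the event names and traces are hypothetical.

```python
import math
from collections import Counter


def event_distribution(traces, vocabulary, alpha=1.0):
    """Smoothed relative frequency of each event in vocabulary.

    alpha is an additive smoothing constant (an assumption of this sketch);
    it keeps every probability positive so the divergence stays finite.
    """
    counts = Counter(event for trace in traces for event in trace)
    total = sum(counts[e] for e in vocabulary) + alpha * len(vocabulary)
    return {e: (counts[e] + alpha) / total for e in vocabulary}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats, over the keys of p."""
    return sum(p[e] * math.log(p[e] / q[e]) for e in p if p[e] > 0)


# Hypothetical GUI event traces: observed user sessions vs. generated tests.
observed = [["open", "click", "save"], ["open", "click", "click", "close"]]
generated = [["open", "save"], ["open", "click", "close"]]

vocabulary = sorted({e for trace in observed + generated for e in trace})
p_observed = event_distribution(observed, vocabulary)
p_generated = event_distribution(generated, vocabulary)

divergence = kl_divergence(p_observed, p_generated)
```

A divergence close to zero would suggest that the generated tests exercise events in proportions similar to those seen in real usage; the direction of the comparison (observed against generated, or the reverse) is also an assumption here.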

2025

Meta Subspace Analysis: Understanding Model (Mis)behavior in the Metafeature Space

Authors
Soares, C; Azevedo, PJ; Cerqueira, V; Torgo, L;

Publication
Discovery Science - 28th International Conference, DS 2025, Ljubljana, Slovenia, September 23-25, 2025, Proceedings

Abstract
A subgroup discovery-based method has recently been proposed to understand the behavior of models in the (original) feature space. The subgroups identified represent areas of feature space where the model obtains better or worse predictive performance when compared to the average test performance. For instance, in the marketing domain, the approach extracts subgroups such as: in groups of younger customers with higher income, the random forest achieves higher accuracy than on average. Here, we propose a complementary method, Meta Subspace Analysis (MSA), which uses metalearning to analyze these subgroups in the metafeature space. We use association rules to relate metafeatures of the feature space represented by the subgroups to the improvement or degradation of the performance of models. For instance, in the same domain, the approach extracts rules such as: when the class entropy decreases and the mutual information increases in the subgroup data, the random forest achieves lower accuracy. While the subgroups in the original feature space are useful for the end user and the data scientist developing the corresponding model, the meta-level rules provide a domain-independent perspective on the behavior of the model that is suitable not only for the same data scientist but also for ML researchers who want to understand the behavior of algorithms. We illustrate the approach with the results of two well-known algorithms, naive Bayes and random forest, on the Adult dataset. The results confirm some expected behavior of algorithms. However, and most interestingly, some unexpected behaviors are also obtained, requiring additional investigation. In general, the empirical study demonstrates the usefulness of the approach to obtain additional knowledge about the behavior of models. © 2025 Elsevier B.V., All rights reserved.

2009

Spatial Clustering of Molecular Dynamics Trajectories in Protein Unfolding Simulations

Authors
Ferreira, PG; Silva, CG; Azevedo, PJ; Brito, RMM;

Publication
COMPUTATIONAL INTELLIGENCE METHODS FOR BIOINFORMATICS AND BIOSTATISTICS

Abstract
Molecular dynamics simulation is a valuable tool for studying protein unfolding in silico. Analyzing the relative spatial position of the residues during the simulation may indicate which residues are essential in determining the protein structure. We present a method, inspired by a popular data mining technique called Frequent Itemset Mining, that clusters sets of amino acid residues with a synchronized trajectory during the unfolding process. The proposed approach has several advantages over traditional hierarchical clustering. © 2009 Springer Berlin Heidelberg.

2010

Rules for contrast sets

Authors
Azevedo, PJ;

Publication
INTELLIGENT DATA ANALYSIS

Abstract
In this paper we present a technique to derive rules describing contrast sets. Contrast sets are a formalism to represent group differences. We propose a novel approach to describe directional contrasts using rules where the contrasting effect is partitioned into pairs of groups. Our approach makes use of a directional Fisher Exact Test to find significant differences across groups. We used a Bonferroni within-search adjustment to control type I errors and a pruning technique to prevent derivation of non-significant contrast set specializations.
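The statistical test named in this abstract can be illustrated with a small sketch, assuming a 2x2 contingency table per pair of groups. The one-sided Fisher exact p-value is computed from the hypergeometric tail; the correction shown is a plain Bonferroni division by the number of tests, which is a simplification and not the within-search adjustment used in the paper. The function names and the counts are hypothetical.

```python
from math import comb  # requires Python 3.8+


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for the table [[a, b], [c, d]].

    a, b: rows in group A with / without the pattern.
    c, d: rows in group B with / without the pattern.
    Returns P(X >= a) under the hypergeometric null, i.e. the probability of
    seeing at least a matches in group A if the pattern were unrelated to group.
    """
    n_a = a + b            # size of group A
    k_total = a + c        # rows containing the pattern, over both groups
    n = a + b + c + d      # all rows
    upper = min(n_a, k_total)
    tail = sum(
        comb(k_total, k) * comb(n - k_total, n_a - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n, n_a)


def is_significant(p_value, alpha, n_tests):
    """Plain Bonferroni correction: compare against alpha / n_tests."""
    return p_value < alpha / n_tests


# Hypothetical counts: the pattern holds in 4 of 4 rows of A and 0 of 4 rows of B.
p_strong = fisher_one_sided(4, 0, 0, 4)
# A weaker contrast: 3 of 4 rows in A versus 1 of 4 rows in B.
p_weak = fisher_one_sided(3, 1, 1, 3)
```

Because the test is directional, swapping the two groups gives a different p-value, which is what lets a rule state which group the pattern favors.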

2007

Evaluating deterministic motif significance measures in protein databases

Authors
Ferreira, PG; Azevedo, PJ;

Publication
ALGORITHMS FOR MOLECULAR BIOLOGY

Abstract
Background: Assessing the outcome of motif mining algorithms is an essential task, as the number of reported motifs can be very large. Significance measures play a central role in automatically ranking those motifs, thereby reducing the analysis work. Spotting the most interesting and relevant motifs then depends on the choice of the right measures. The combined use of several measures may provide more robust results. However, caution must be taken to avoid spurious evaluations. Results: From the set of conducted experiments, it was verified that several of the selected significance measures show very similar behavior in a wide range of situations, therefore providing redundant information. Some measures have proved to be more appropriate to rank highly conserved motifs, while others are more appropriate for weakly conserved ones. Support appears to be a very important feature to consider for correct motif ranking. We observed that not all the measures are suitable for situations with poorly balanced class information, for instance, when there is significantly less positive data than negative data. Finally, a visualization scheme was proposed that, when several measures are applied, enables an easy identification of high scoring motifs. Conclusion: In this work we have surveyed and categorized 14 significance measures for pattern evaluation. Their ability to rank three types of deterministic motifs was evaluated. Measures were applied in different testing conditions, where relations were identified. This study provides some pertinent insights on the choice of the right set of significance measures for the evaluation of deterministic motifs extracted from protein databases.
