Publicacoes - INESC TEC

Publicações

Publicações por Paulo Jorge Azevedo

2013

Multi-interval Discretization of Continuous Attributes for Label Ranking

Autores
de Sa, CR; Soares, C; Knobbe, A; Azevedo, P; Jorge, AM;

Publicação
DISCOVERY SCIENCE

Abstract
Label Ranking (LR) problems, such as predicting rankings of financial analysts, are becoming increasingly important in data mining. While there has been a significant amount of work on the development of learning algorithms for LR in recent years, pre-processing methods for LR are still very scarce. However, some methods, like Naive Bayes for LR and APRIORI-LR, cannot deal with real-valued data directly. As a make-shift solution, one could consider conventional discretization methods used in classification, by simply treating each unique ranking as a separate class. In this paper, we show that such an approach has several disadvantages. As an alternative, we propose an adaptation of an existing method, MDLP, specifically for LR problems. We illustrate the advantages of the new method using synthetic data. Additionally, we present results obtained on several benchmark datasets. The results clearly indicate that the discretization is performing as expected and in some cases improves the results of the learning algorithms.

FecharLer Abstract

2013

Classifying heart sounds using multiresolution time series motifs: an exploratory study

Autores
Gomes, EF; Jorge, AM; Azevedo, PJ;

Publicação
International C* Conference on Computer Science & Software Engineering, C3S2E13, Porto, Portugal - July 10 - 12, 2013

Abstract
The aim of this work is to describe an exploratory study on the use of a SAX-based Multiresolution Motif Discovery method for Heart Sound Classification. The idea of our work is to discover relevant frequent motifs in the audio signals and use the discovered motifs and their frequency as characterizing attributes. We also describe different configurations of motif discovery for defining attributes and compare the use of a decision tree based algorithm with random forests on this kind of data. Experiments were performed with a dataset obtained from a clinic trial in hospitals using the digital stethoscope DigiScope. This exploratory study suggests that motifs contain valuable information that can be further exploited for Heart Sound Classification. © 2013 ACM.

FecharLer Abstract

2014

Classifying Heart Sounds using SAX Motifs, Random Forests and Text Mining techniques

Autores
Gomes, EF; Jorge, AM; Azevedo, PJ;

Publicação
PROCEEDINGS OF THE 18TH INTERNATIONAL DATABASE ENGINEERING AND APPLICATIONS SYMPOSIUM (IDEAS14)

Abstract
In this paper we describe an approach to classifying heart sounds (classes Normal, Murmur and Extra-systole) that is based on the discretization of sound signals using the SAX (Symbolic Aggregate Approximation) representation. The ability of automatically classifying heart sounds or at least support human decision in this task is socially relevant to spread the reach of medical care using simple mobile devices or digital stethoscopes. In our approach, sounds are first pre-processed using signal processing techniques (decimate, low-pass filter, normalize, Shannon envelope). Then the pre-processed symbols are transformed into sequences of discrete SAX symbols. These sequences are subject to a process of motif discovery. Frequent sequences of symbols (motifs) are adopted as features. Each sound is then characterized by the frequent motifs that occur in it and their respective frequency. This is similar to the term frequency (TF) model used in text mining. In this paper we compare the TF model with the application of the TFIDF (Term frequency - Inverse Document Frequency) and the use of bi-grams (frequent size two sequences of motifs). Results show the ability of the motifs based TF approach to separate classes and the relative value of the TFIDF and the bi-grams variants. The separation of the Extra-systole class is overly difficult and much better results are obtained for separating the Murmur class. Empirical validation is conducted using real data collected in noisy environments. We have also assessed the cost-reduction potential of the proposed methods by considering a fixed cost model and using a cost sensitive meta algorithm.

FecharLer Abstract

2018

Preference rules for label ranking: Mining patterns in multi-target relations

Autores
de Sa, CR; Azevedo, P; Soares, C; Jorge, AM; Knobbe, A;

Publicação
INFORMATION FUSION

Abstract
In this paper, we investigate two variants of association rules for preference data, Label Ranking Association Rules and Pairwise Association Rules. Label Ranking Association Rules (LRAR) are the equivalent of Class Association Rules (CAR) for the Label Ranking task. In CAR, the consequent is a single class, to which the example is expected to belong to. In LRAR, the consequent is a ranking of the labels. The generation of LRAR requires special support and confidence measures to assess the similarity of rankings. In this work, we carry out a sensitivity analysis of these similarity-based measures. We want to understand which datasets benefit more from such measures and which parameters have more influence in the accuracy of the model. Furthermore, we propose an alternative type of rules, the Pairwise Association Rules (PAR), which are defined as association rules with a set of pairwise preferences in the consequent. While PAR can be used both as descriptive and predictive models, they are essentially descriptive models. Experimental results show the potential of both approaches.

FecharLer Abstract

2015

Automatically estimating iSAX parameters

Autores
Castro, NC; Azevedo, PJ;

Publicação
INTELLIGENT DATA ANALYSIS

Abstract
The Symbolic Aggregate Approximation (iSAX) is widely used in time series data mining. Its popularity arises from the fact that it largely reduces time series size, it is symbolic, allows lower bounding and is space efficient. However, it requires setting two parameters: the symbolic length and alphabet size, which limits the applicability of the technique. The optimal parameter values are highly application dependent. Typically, they are either set to a fixed value or experimentally probed for the best configuration. In this work we propose an approach to automatically estimate iSAX's parameters. The approach - AutoiSAX - not only discovers the best parameter setting for each time series in the database, but also finds the alphabet size for each iSAX symbol within the same word. It is based on simple and intuitive ideas from time series complexity and statistics. The technique can be smoothly embedded in existing data mining tasks as an efficient sub-routine. We analyze its impact in visualization interpretability, classification accuracy and motif mining. Our contribution aims to make iSAX a more general approach as it evolves towards a parameter-free method.

FecharLer Abstract

2018

Discovering a taste for the unusual: exceptional models for preference mining

Autores
de Sa, CR; Duivesteijn, W; Azevedo, P; Jorge, AM; Soares, C; Knobbe, A;

Publicação
MACHINE LEARNING

Abstract
Exceptional preferences mining (EPM) is a crossover between two subfields of data mining: local pattern mining and preference learning. EPM can be seen as a local pattern mining task that finds subsets of observations where some preference relations between labels significantly deviate from the norm. It is a variant of subgroup discovery, with rankings of labels as the target concept. We employ several quality measures that highlight subgroups featuring exceptional preferences, where the focus of what constitutes exceptional' varies with the quality measure: two measures look for exceptional overall ranking behavior, one measure indicates whether a particular label stands out from the rest, and a fourth measure highlights subgroups with unusual pairwise label ranking behavior. We explore a few datasets and compare with existing techniques. The results confirm that the new task EPM can deliver interesting knowledge.

FecharLer Abstract