Cookies Policy
The website need some cookies and similar means to function. If you permit us, we will use those means to collect data on your visits for aggregated statistics to improve our service. Find out More
Accept Reject
  • Menu
About

About

I'm currently working in Data Mining and Machine Learning for predictive maintenance in the MANTIS project.

I have a PhD in Computer Science from the University of Leiden (Netherlands) in the topic of Pattern Mining for Label Ranking.

Interest
Topics
Details

Details

  • Name

    Cláudio Rebelo Sá
  • Role

    External Research Collaborator
  • Since

    01st July 2011
Publications

2018

Preference rules for label ranking: Mining patterns in multi-target relations

Authors
de Sa, CR; Azevedo, P; Soares, C; Jorge, AM; Knobbe, A;

Publication
INFORMATION FUSION

Abstract
In this paper, we investigate two variants of association rules for preference data, Label Ranking Association Rules and Pairwise Association Rules. Label Ranking Association Rules (LRAR) are the equivalent of Class Association Rules (CAR) for the Label Ranking task. In CAR, the consequent is a single class, to which the example is expected to belong to. In LRAR, the consequent is a ranking of the labels. The generation of LRAR requires special support and confidence measures to assess the similarity of rankings. In this work, we carry out a sensitivity analysis of these similarity-based measures. We want to understand which datasets benefit more from such measures and which parameters have more influence in the accuracy of the model. Furthermore, we propose an alternative type of rules, the Pairwise Association Rules (PAR), which are defined as association rules with a set of pairwise preferences in the consequent. While PAR can be used both as descriptive and predictive models, they are essentially descriptive models. Experimental results show the potential of both approaches.

2017

Label Ranking Forests

Authors
de Sa, CR; Soares, C; Knobbe, A; Cortez, P;

Publication
EXPERT SYSTEMS

Abstract
The problem of Label Ranking is receiving increasing attention from several research communities. The algorithms that have been developed/adapted to treat rankings of a fixed set of labels as the target object, including several different types of decision trees (DT). One DT-based algorithm, which has been very successful in other tasks but which has not been adapted for label ranking is the Random Forests (RF) algorithm. RFs are an ensemble learning method that combines different trees obtained using different randomization techniques. In this work, we propose an ensemble of decision trees for Label Ranking, based on Random Forests, which we refer to as Label Ranking Forests (LRF). Two different algorithms that learn DT for label ranking are used to obtain the trees. We then compare and discuss the results of LRF with standalone decision tree approaches. The results indicate that the method is highly competitive.

2016

Combining Boosted Trees with Metafeature Engineering for Predictive Maintenance

Authors
Cerqueira, V; Pinto, F; Sa, C; Soares, C;

Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS XV

Abstract
We describe a data mining workflow for predictive maintenance of the Air Pressure System in heavy trucks. Our approach is composed by four steps: (i) a filter that excludes a subset of features and examples based on the number of missing values (ii) a metafeatures engineering procedure used to create a meta-level features set with the goal of increasing the information on the original data; (iii) a biased sampling method to deal with the class imbalance problem; and (iv) boosted trees to learn the target concept. Results show that the metafeatures engineering and the biased sampling method are critical for improving the performance of the classifier.

2016

Entropy-based discretization methods for ranking data

Authors
de Sa, CR; Soares, C; Knobbe, A;

Publication
INFORMATION SCIENCES

Abstract
Label Ranking (LR) problems are becoming increasingly important in Machine Learning. While there has been a significant amount of work on the development of learning algorithms for LR in recent years, there are not many pre-processing methods for LR Some methods, like Naive Bayes for LR and APRIORI-LR, cannot handle real-valued data directly. Conventional discretization methods used in classification are not suitable for LR problems, due to the different target variable. In this work, we make an extensive analysis of the existing methods using simple approaches. We also propose a new method called EDiRa (Entropy-based Discretization for Ranking) for the discretization of ranking data. We illustrate the advantages of the method using synthetic data and also on several benchmark datasets. The results clearly indicate that the discretization is performing as expected and also improves the results and efficiency of the learning algorithms.

2016

Exceptional Preferences Mining

Authors
de Sa, CR; Duivesteijn, W; Soares, C; Knobbe, A;

Publication
DISCOVERY SCIENCE, (DS 2016)

Abstract
Exceptional Preferences Mining (EPM) is a crossover between two subfields of datamining: local pattern mining and preference learning. EPM can be seen as a local pattern mining task that finds subsets of observations where the preference relations between subsets of the labels significantly deviate from the norm; a variant of Subgroup Discovery, with rankings as the (complex) target concept. We employ three quality measures that highlight subgroups featuring exceptional preferences, where the focus of what constitutes 'exceptional' varies with the quality measure: the first gauges exceptional overall ranking behavior, the second indicates whether a particular label stands out from the rest, and the third highlights subgroups featuring unusual pairwise label ranking behavior. As proof of concept, we explore five datasets. The results confirm that the new task EPM can deliver interesting knowledge. The results also illustrate how the visualization of the preferences in a Preference Matrix can aid in interpreting exceptional preference subgroups.