Publications by Carlos Manuel Soares

2014

Merging Decision Trees: A Case Study in Predicting Student Performance

Authors
Strecht, P; Mendes Moreira, J; Soares, C;

Publication
ADVANCED DATA MINING AND APPLICATIONS, ADMA 2014

Abstract
Predicting the failure of students in university courses can provide useful information for course and programme managers and can help explain the dropout phenomenon. While it is important to have models at the course level, their number makes it hard to extract knowledge that can be useful at the university level. Therefore, to support decision making at this level, it is important to generalize the knowledge contained in those models. We propose an approach to group and merge interpretable models in order to replace them with more general ones without compromising the quality of predictive performance. We evaluate our approach using data from the University of Porto. The results obtained are promising, although they suggest alternative approaches to the problem.

2014

Monitoring Recommender Systems: A Business Intelligence Approach

Authors
Felix, C; Soares, C; Jorge, A; Vinagre, J;

Publication
COMPUTATIONAL SCIENCE AND ITS APPLICATIONS, PART VI - ICCSA 2014

Abstract
Recommender systems (RS) are increasingly adopted by e-business, social networks and many other user-centric websites. Based on the user's previous choices or interests, an RS suggests new items in which the user might be interested. With constant changes in user behavior, the quality of an RS may decrease over time. Therefore, we need to monitor the performance of the RS, giving timely information to management, who can then manage the RS to maximize results. Our work consists of creating a monitoring platform - based on Business Intelligence (BI) and On-line Analytical Processing (OLAP) tools - that provides information about the recommender system, in order to assess its quality, the impact it has on users and their adherence to the recommendations. We present a case study with Palco Principal, a social network for music.

2013

Multi-interval Discretization of Continuous Attributes for Label Ranking

Authors
de Sa, CR; Soares, C; Knobbe, A; Azevedo, P; Jorge, AM;

Publication
DISCOVERY SCIENCE

Abstract
Label Ranking (LR) problems, such as predicting rankings of financial analysts, are becoming increasingly important in data mining. While there has been a significant amount of work on the development of learning algorithms for LR in recent years, pre-processing methods for LR are still very scarce. However, some methods, like Naive Bayes for LR and APRIORI-LR, cannot deal with real-valued data directly. As a makeshift solution, one could consider conventional discretization methods used in classification, by simply treating each unique ranking as a separate class. In this paper, we show that such an approach has several disadvantages. As an alternative, we propose an adaptation of an existing method, MDLP, specifically for LR problems. We illustrate the advantages of the new method using synthetic data. Additionally, we present results obtained on several benchmark datasets. The results clearly indicate that the discretization performs as expected and in some cases improves the results of the learning algorithms.

2017

Scalable Online Top-N Recommender Systems

Authors
Jorge, AM; Vinagre, J; Domingues, M; Gama, J; Soares, C; Matuszyk, P; Spiliopoulou, M;

Publication
E-COMMERCE AND WEB TECHNOLOGIES, EC-WEB 2016

Abstract
Given the large volumes and dynamics of data that recommender systems currently have to deal with, we look at online stream-based approaches that are able to cope with high-throughput observations. In this paper we describe work on incremental neighborhood-based and incremental matrix factorization approaches for binary ratings, starting with a general introduction, looking at various approaches and describing existing enhancements. We refer to recent work on forgetting techniques and multidimensional recommendation. We also focus on adequate procedures for the evaluation of online recommender algorithms.

2014

An Empirical Methodology to Analyze the Behavior of Bagging

Authors
Pinto, F; Soares, C; Mendes Moreira, J;

Publication
ADVANCED DATA MINING AND APPLICATIONS, ADMA 2014

Abstract
In this paper we propose and apply a methodology to study the relationship between the performance of bagging and the characteristics of the bootstrap samples. The methodology consists of 1) an extensive set of experiments to estimate the empirical distribution of performance of the population of all possible ensembles that can be created with those bootstraps and 2) a metalearning approach to analyze that distribution based on characteristics of the bootstrap samples and their relationship with the complete training set. Given the large size of the population of all ensembles, we empirically show that it is possible to apply the methodology to a sample. We applied the methodology to 53 classification datasets for ensembles of 20 and 100 models. Our results show that diversity is crucial for an important bootstrap and we show evidence of a metric that can measure diversity without any learning process involved. We also found evidence that the best bootstraps have a predictive power very similar to the one presented by the training set using naive models.

2017

Metalearning for Context-aware Filtering: Selection of Tensor Factorization Algorithms

Authors
Cunha, T; Soares, C; de Carvalho, ACPLF;

Publication
PROCEEDINGS OF THE ELEVENTH ACM CONFERENCE ON RECOMMENDER SYSTEMS (RECSYS'17)

Abstract
This work addresses the problem of selecting Tensor Factorization algorithms for the Context-aware Filtering recommendation task using a metalearning approach. The most important challenge in applying metalearning to new problems is the development of useful measures able to characterize the data, i.e. metafeatures. We propose an extensive and exhaustive set of metafeatures to characterize the Context-aware Filtering recommendation task. These metafeatures take advantage of the tensor's hierarchical structure via slice operations. The algorithm selection task is addressed as a Label Ranking problem, which ranks the Tensor Factorization algorithms according to their expected performance, rather than simply selecting the algorithm that is expected to obtain the best performance. Comprehensive experimental work is conducted on both levels, baselevel and metalevel (Tensor Factorization and Label Ranking, respectively). The results show that the proposed metafeatures lead to metamodels that tend to rank Tensor Factorization algorithms accurately and that the selected algorithms present high recommendation performance.
