Publications - INESC TEC


Publications by Pavel Brazdil

2016

Determining the Level of Clients' Dissatisfaction from Their Commentaries

Authors
Forte, AC; Brazdil, PB;

Publication
COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE (PROPOR 2016)

Abstract
We present a study in the area of sentiment analysis of clients' commentaries transcribed by assistants of a help-desk service of a Portuguese telecommunications company. We have adopted a lexicon-based approach to determine the polarity of the sentiment of each commentary, based on the so-called opinion words. This task was by no means easy, as not many tools are available for the Portuguese language. The initial results with the off-the-shelf solutions were rather poor. This motivated us to carry out a number of enhancements, including, for instance, enriching the given lexicon with domain-specific terms and formulating specific rules for negation and amplifiers. Automatic pruning of some of the lexicon terms has led to a significant improvement in performance. As our final system achieved a very good performance, our work should be of interest to others working on domain-specific solutions for languages where ready-made solutions are not available.
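As a rough illustration of what a lexicon-based polarity score with negation and amplifier rules can look like, here is a minimal sketch. It is not the system described in the paper: the lexicon entries, the negation and amplifier word lists, the weights, and the function name `polarity` are all assumptions made for the example, and English words stand in for the Portuguese lexicon.

```python
# Toy lexicon and rule sets; every entry and weight is an illustrative assumption.
LEXICON = {"good": 1.0, "helpful": 1.0, "slow": -1.0, "bad": -1.0, "outage": -2.0}
NEGATIONS = {"not", "never", "no"}
AMPLIFIERS = {"very": 2.0, "extremely": 3.0}


def polarity(text: str) -> float:
    """Sum opinion-word scores, applying simple negation and amplifier rules."""
    score = 0.0
    negate = False  # set by a negation word, consumed by the next opinion word
    boost = 1.0     # multiplied by amplifiers, consumed by the next opinion word
    for token in text.lower().split():
        word = token.strip(".,;:!?")
        if word in NEGATIONS:
            negate = True
        elif word in AMPLIFIERS:
            boost *= AMPLIFIERS[word]
        elif word in LEXICON:
            value = LEXICON[word] * boost
            score += -value if negate else value
            negate, boost = False, 1.0  # rules apply to one opinion word only
    return score


print(polarity("The service is very slow"))
print(polarity("The technician was not helpful"))
```

In this sketch a negation or amplifier stays active until the next opinion word, however far away it is; a real rule set would likely limit that window.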


2017

Metalearning

Authors
Brazdil, P; Vilalta, R; Giraud Carrier, CG; Soares, C;

Publication
Encyclopedia of Machine Learning and Data Mining

Abstract
In the area of machine learning / data mining, many diverse algorithms are available nowadays, and hence the selection of the most suitable algorithm may be a challenge. This is aggravated by the fact that many algorithms require that certain parameters be set. If a wrong algorithm and/or parameter configuration is selected, substandard results may be obtained. The topic of metalearning aims to facilitate this task. Metalearning typically proceeds in two phases. First, a given set of algorithms A (e.g. classification algorithms) and datasets D is identified, and different pairs <a_i, d_j> from these two sets are chosen for testing. The dataset d_j is described by certain meta-features which, together with the performance result of algorithm a_i, constitute a part of the metadata. In the second phase, the metadata is used to construct a model, usually again with recourse to machine learning methods. The model represents a generalization of various base-level experiments. The model can then be applied to a new dataset to recommend the most suitable algorithm or a ranking ordered by relative performance. This article provides more details about this area. It also discusses how the method can be combined with hyperparameter optimization and extended to sequences of operations (workflows). © Springer Science+Business Media New York 2011, 2017
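The two phases can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the meta-features (number of instances, number of attributes), the accuracy figures, the algorithm names, and the function `recommend` are invented for the example, and a 1-nearest-neighbour lookup stands in for the meta-level model, which the abstract leaves open.

```python
import math

# Phase 1: metadata gathered from base-level experiments.
# Each entry holds a dataset's meta-features (here: instances, attributes)
# and the measured accuracy of each algorithm. All numbers are invented.
METADATA = [
    {"features": (100.0, 5.0), "accuracy": {"knn": 0.90, "tree": 0.80}},
    {"features": (120.0, 6.0), "accuracy": {"knn": 0.88, "tree": 0.82}},
    {"features": (5000.0, 50.0), "accuracy": {"knn": 0.70, "tree": 0.85}},
]


def recommend(new_features, metadata=METADATA):
    """Phase 2: rank algorithms by accuracy on the most similar known dataset."""
    nearest = min(metadata, key=lambda m: math.dist(m["features"], new_features))
    perf = nearest["accuracy"]
    return sorted(perf, key=perf.get, reverse=True)


print(recommend((110.0, 5.0)))
print(recommend((4000.0, 40.0)))
```

The meta-features are used unnormalized here, so the larger-valued one dominates the distance; in practice they would normally be rescaled first.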


2015

Proceedings of the 2015 International Workshop on Meta-Learning and Algorithm Selection co-located with European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases 2015 (ECMLPKDD 2015), Porto, Portugal, September 7th, 2015

Authors
Vanschoren, J; Brazdil, P; Carrier, CGG; Kotthoff, L;

Publication
MetaSel@PKDD/ECML


2017

Proceedings of the International Workshop on Automatic Selection, Configuration and Composition of Machine Learning Algorithms co-located with the European Conference on Machine Learning & Principles and Practice of Knowledge Discovery in Databases, AutoML@PKDD/ECML 2017, Skopje, Macedonia, September 22, 2017

Authors
Brazdil, P; Vanschoren, J; Hutter, F; Hoos, H;

Publication
AutoML@PKDD/ECML


2017

Data mining techniques for the grouping of certified wines from the sub-regions of the demarcated region of Vinho Verde [Técnicas de data mining para agrupamento dos vinhos certificados das sub-regiões da região demarcada dos Vinhos Verdes]

Authors
Souza Roza, R; Brazdil, P; Reis, JL; Cerdeira, A; Martins, P; Felgueiras, O;

Publication
Atas da Conferencia da Associacao Portuguesa de Sistemas de Informacao

Abstract
Combining information obtained by applying data mining techniques to physicochemical and organoleptic data made it possible to identify similarities between the wines of the nine sub-regions in the Demarcated Region of Vinho Verde. Through clustering techniques, four clusters were identified, each characterized by its centroid. The measure of information gain, together with supervised rule-based learning, was used to find the differentiating characteristics. This study made it possible to relate the characteristics of the wines of these sub-regions, which can improve decision making about the profiles of these wines.
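For readers unfamiliar with the information gain measure mentioned above, the following minimal sketch shows how it can be computed for one discrete attribute against cluster labels. The attribute values, labels, and function names are assumptions for illustration; the study's actual data and tooling are not given here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class or cluster labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction in the labels after splitting on a discrete attribute."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [lab for x, lab in zip(values, labels) if x == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


# Hypothetical example: a discretized acidity level versus cluster membership.
acidity = ["high", "high", "low", "low"]
cluster = [0, 0, 1, 1]
print(information_gain(acidity, cluster))
```

An attribute with high gain separates the clusters well, which is what makes it a candidate differentiating characteristic.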


2018

Impact of Feature Selection on Average Ranking Method via Metalearning

Authors
Abdulrahman, SM; Cachada, MV; Brazdil, P;

Publication
VIPIMAGE 2017

Abstract
Selecting appropriate classification algorithms for a given dataset is crucial and useful in practice, but it is also full of challenges. In order to maximize performance, users of machine learning algorithms need methods that can help them identify the most relevant features in datasets, select algorithms and determine their appropriate hyperparameter settings. In this paper, a method of recommending classification algorithms is proposed. It is oriented towards the average ranking method, which combines algorithm rankings observed on prior datasets to identify the best algorithms for a new dataset. Our method uses a special case of a data mining workflow in which algorithm selection is preceded by a feature selection method (CFS).
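The core of the average ranking idea can be sketched as follows. This is a simplified illustration, not the paper's implementation: the accuracy table and the function name `average_ranking` are assumptions, ties are broken by order rather than by shared ranks, every algorithm is assumed to have been run on every prior dataset, and the CFS feature selection step is left out.

```python
def average_ranking(results):
    """results: {dataset: {algorithm: accuracy}} -> [(algorithm, mean rank)], best first."""
    totals = {}
    for scores in results.values():
        # Rank 1 goes to the most accurate algorithm on this dataset.
        ordered = sorted(scores, key=scores.get, reverse=True)
        for rank, algo in enumerate(ordered, start=1):
            totals[algo] = totals.get(algo, 0) + rank
    n = len(results)
    mean_rank = {algo: total / n for algo, total in totals.items()}
    return sorted(mean_rank.items(), key=lambda item: item[1])


# Invented accuracies on three prior datasets.
prior = {
    "d1": {"a": 0.90, "b": 0.80, "c": 0.70},
    "d2": {"a": 0.60, "b": 0.90, "c": 0.70},
    "d3": {"a": 0.80, "b": 0.85, "c": 0.50},
}
print(average_ranking(prior))
```

The resulting order is the recommendation for a new dataset: algorithms are tried from the lowest mean rank upward.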
