Publications - INESC TEC

Publications

Publications by LIAAD

2012

HCAC: Semi-supervised hierarchical clustering using confidence-based active learning

Authors
Nogueira, BM; Jorge, AM; Rezende, SO;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
Despite its importance, hierarchical clustering has been little explored in semi-supervised algorithms. In this paper, we address the problem of semi-supervised hierarchical clustering by using an active learning solution with cluster-level constraints. This active learning approach is based on a new concept of merge confidence in agglomerative clustering. When there is low confidence in a cluster merge, the user is queried and provides a cluster-level constraint. The proposed method is compared with an unsupervised algorithm (average-link) and two state-of-the-art semi-supervised algorithms (pairwise constraints and Constrained Complete-Link). Results show that our algorithm tends to be better than the two semi-supervised algorithms and can achieve a significant improvement when compared to the unsupervised algorithm. Our approach is particularly useful when the number of clusters is high, which is the case in many real problems. © 2012 Springer-Verlag Berlin Heidelberg.
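The abstract does not define merge confidence, so the following is only a minimal sketch of the query loop it describes. It assumes, purely for illustration, that confidence is the gap between the best and second-best candidate merge distances. The names `best_merge_with_confidence`, `needs_query` and `threshold` are hypothetical and not taken from the paper.

```python
def best_merge_with_confidence(distances):
    """Return the closest cluster pair and an assumed confidence score.

    distances: dict mapping a pair of cluster ids to their distance.
    Confidence is ASSUMED here to be the gap between the second-best and
    the best merge distance; the paper's actual definition may differ.
    """
    ranked = sorted(distances.items(), key=lambda item: item[1])
    if len(ranked) < 2:
        raise ValueError("need at least two candidate merges")
    (best_pair, best_dist), (_, second_dist) = ranked[0], ranked[1]
    return best_pair, second_dist - best_dist


def needs_query(distances, threshold):
    """True when the assumed confidence is low enough to ask the user."""
    _, confidence = best_merge_with_confidence(distances)
    return confidence < threshold


# Toy example: two candidate merges are almost equally close.
candidates = {("a", "b"): 1.0, ("a", "c"): 1.1, ("b", "c"): 3.0}
pair, confidence = best_merge_with_confidence(candidates)
ask_user = needs_query(candidates, threshold=0.5)
```

In this toy case the two best candidates are nearly tied, so a threshold of 0.5 would trigger a user query before merging.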

2012

Towards Utility Maximization in Regression

Authors
Ribeiro, RP;

Publication
12th IEEE International Conference on Data Mining Workshops (ICDMW 2012)

Abstract
Utility-based learning is a key technique for addressing many real-world data mining applications, where the costs/benefits are not uniform across the domain of the target variable. Still, most of the existing research has focused on classification problems. In this paper we address a related problem. There are many relevant domains (e.g. ecological, meteorological, finance) where decisions are based on the forecast of a numeric quantity (i.e. the result of a regression model). The goal of the work in this paper is to present an evaluation framework for applications where the numeric outcome of a regression model may lead to different costs/benefits as a consequence of the actions it entails. The new metric provides a more informed estimate of the utility of any regression model, given the application-specific preference biases, and hence makes the comparison and selection between alternative regression models more reliable. We illustrate the objective of our evaluation methodology on a real-life application and also carry out a set of experiments over a subset of our target regression tasks: the prediction of rare and extreme values. Results show the effectiveness of our proposed utility metric for identifying the models that perform better in this type of application.

2012

Conceptual clustering with generalization by intervals [Classification Conceptuelle avec Généralisation par Intervalles]

Authors
Brito, P; Polaillon, G;

Publication
Revue des Nouvelles Technologies de l'Information

Abstract
This paper deals with hierarchical or pyramidal conceptual clustering methods, where each formed cluster corresponds to a concept, i.e., a pair (extent, intent). We consider data presenting real or interval-valued numerical values, ordered values and/or probability/frequency distributions on a set of categories. Concepts are obtained by a Galois connection with generalisation by intervals, which allows dealing with different variable types in a common framework. In the case of distribution data, the obtained concepts are more homogeneous and more easily interpretable than those obtained by using the maximum and minimum operators previously proposed. A measure of generality of a concept is defined similarly for all these variable types. An example illustrates the proposed method.

2012

Divisive monothetic clustering for interval and histogram-valued data

Authors
Brito, P; Chavent, M;

Publication
ICPRAM 2012 - Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods

Abstract
In this paper we propose a divisive top-down clustering method designed for interval and histogram-valued data. The method provides a hierarchy on a set of objects together with a monothetic characterization of each formed cluster. At each step, a cluster is split so as to minimize intra-cluster dispersion, which is measured using a distance suitable for the considered variable types. The criterion is minimized across the bipartitions induced by a set of binary questions. Since interval-valued variables may be considered a special case of histogram-valued variables, the method applies to data described by either kind of variables, or by variables of both types. An example illustrates the proposed approach.
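A single split step of this kind can be sketched as follows for one interval-valued variable. This is an illustration under stated assumptions, not the authors' method. The binary questions are assumed to take the form "is the interval midpoint <= cut?". Dispersion is assumed to be the sum of squared Euclidean distances between interval bounds and the cluster's mean bounds. The names `dispersion` and `best_split` are hypothetical.

```python
def dispersion(cluster):
    """Assumed dispersion: squared distance of each interval's bounds
    to the mean lower and upper bounds of the cluster."""
    if not cluster:
        return 0.0
    n = len(cluster)
    mean_lo = sum(lo for lo, _ in cluster) / n
    mean_hi = sum(hi for _, hi in cluster) / n
    return sum((lo - mean_lo) ** 2 + (hi - mean_hi) ** 2 for lo, hi in cluster)


def best_split(cluster):
    """Try every binary question 'midpoint <= cut?' and keep the
    bipartition with the smallest total intra-cluster dispersion.
    Returns (score, cut, left, right), or None if no split is possible."""
    mids = sorted({(lo + hi) / 2 for lo, hi in cluster})
    best = None
    for left_mid, right_mid in zip(mids, mids[1:]):
        cut = (left_mid + right_mid) / 2
        left = [iv for iv in cluster if (iv[0] + iv[1]) / 2 <= cut]
        right = [iv for iv in cluster if (iv[0] + iv[1]) / 2 > cut]
        score = dispersion(left) + dispersion(right)
        if best is None or score < best[0]:
            best = (score, cut, left, right)
    return best


# Toy data: two well-separated groups of intervals.
data = [(0.0, 1.0), (0.5, 1.5), (10.0, 11.0), (10.5, 11.5)]
score, cut, left, right = best_split(data)
```

Because each candidate bipartition is defined by a question on a single variable, the resulting clusters carry a monothetic description (here, "midpoint <= cut" versus "midpoint > cut").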

2012

Modelling interval data with Normal and Skew-Normal distributions

Authors
Brito, P; Pedro Duarte Silva, APD;

Publication
Journal of Applied Statistics

Abstract
A parametric modelling for interval data is proposed, assuming a multivariate Normal or Skew-Normal distribution for the midpoints and log-ranges of the interval variables. The intrinsic nature of the interval variables leads to special structures of the variance-covariance matrix, which is represented by five different possible configurations. Maximum likelihood estimation for both models under all considered configurations is studied. The proposed modelling is then considered in the context of analysis of variance and multivariate analysis of variance testing. To assess the behaviour of the proposed methodology, a simulation study is performed. The results show that, for medium or large sample sizes, tests have good power and their true significance level approaches nominal levels when the constraints assumed for the model are respected; however, for small samples, sizes close to nominal levels cannot be guaranteed. Applications to Chinese meteorological data in three different regions and to credit card usage variables for different card designations illustrate the proposed methodology.
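The midpoint and log-range representation mentioned in the abstract can be illustrated with a short sketch. It assumes that the range of an interval is its upper bound minus its lower bound, and that degenerate intervals (zero range) are excluded because their logarithm is undefined. The function name `midpoint_log_range` is hypothetical.

```python
import math


def midpoint_log_range(lower, upper):
    """Map an interval [lower, upper] to (midpoint, log of range).

    Assumes range = upper - lower and requires upper > lower,
    since the log of a zero or negative range is undefined.
    """
    if upper <= lower:
        raise ValueError("upper bound must exceed lower bound")
    return (lower + upper) / 2.0, math.log(upper - lower)


# Toy intervals, e.g. daily minimum and maximum temperatures.
intervals = [(10.0, 14.0), (2.0, 3.0)]
features = [midpoint_log_range(lo, hi) for lo, hi in intervals]
```

Each interval becomes a pair of unbounded real values, which is what makes a Normal or Skew-Normal assumption on the transformed variables workable.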

2012

Combining meta-learning and optimization algorithms for parameter selection

Authors
Gomes, T; Miranda, P; Prudencio, R; Soares, C; Carvalho, A;

Publication
CEUR Workshop Proceedings

Abstract
In this article we investigate the combination of meta-learning and optimization algorithms for parameter selection. We discuss our general proposal and present the recent developments and experiments performed using Support Vector Machines (SVMs). Meta-learning was combined with single and multi-objective optimization techniques to select SVM parameters. The hybrid methods derived from the proposal achieved better predictive accuracy than traditional optimization techniques.
