Publications

Publications by LIAAD

2006

Data mining for business applications: KDD-2006 workshop

Authors
Ghani, R; Soares, C;

Publication
SIGKDD Explorations

Abstract

2006

Selecting parameters of SVM using meta-learning and kernel matrix-based meta-features

Authors
Soares, C; Brazdil, PB;

Publication
Proceedings of the ACM Symposium on Applied Computing

Abstract
The Support Vector Machine (SVM) algorithm is sensitive to the choice of parameter settings, which makes it hard for non-experts to use. It has been shown that meta-learning can be used to support the selection of SVM parameter values. Previous approaches have used general statistical measures as meta-features. Here we propose a new set of meta-features that are based on the kernel matrix. We test them on the problem of setting the width of the Gaussian kernel for regression problems. We obtain significant improvements in comparison to earlier meta-learning results. We expect that, with better support for the selection of parameter values, SVM will become accessible to a wider range of users. Copyright 2006 ACM.
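As a rough illustration of what a kernel matrix-based meta-feature could look like, the sketch below builds a Gaussian kernel matrix for one candidate width and summarises its off-diagonal entries. The abstract does not list the actual meta-features, so the kernel parameterisation and the two statistics (`mean_offdiag`, `var_offdiag`) are assumptions chosen only for illustration.

```python
import math


def gaussian_kernel_matrix(points, width):
    """Return K with K[i][j] = exp(-||x_i - x_j||^2 / (2 * width^2)).

    The parameterisation by `width` is an assumption; other conventions exist.
    """
    n = len(points)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            K[i][j] = math.exp(-sq_dist / (2.0 * width ** 2))
    return K


def kernel_meta_features(K):
    """Hypothetical meta-features: mean and variance of off-diagonal entries."""
    n = len(K)
    off = [K[i][j] for i in range(n) for j in range(n) if i != j]
    mean = sum(off) / len(off)
    var = sum((v - mean) ** 2 for v in off) / len(off)
    return {"mean_offdiag": mean, "var_offdiag": var}


if __name__ == "__main__":
    data = [(0.0,), (1.0,), (3.0,)]
    for width in (0.5, 1.0, 2.0):
        print(width, kernel_meta_features(gaussian_kernel_matrix(data, width)))
```

In a meta-learning setting, such statistics would be computed per dataset and per candidate width, then used as inputs to a model that recommends a width for new datasets.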

2006

Sequence mining on web access logs: A case study

Authors
Soares, C; de Graaf, E; Kok, JN; Kosters, WA;

Publication
Belgian/Netherlands Artificial Intelligence Conference

Abstract
We present a case study in which sequence mining algorithms were applied to web access log data. The data are from a portal that is targeted at business users. In this portal, as in many others, content is described using a set of descriptors, such as keywords, category and type. We investigate whether representing content by its type rather than its identifier enables existing sequence mining methods to obtain interesting patterns. Rather than a more traditional approach based on measures such as support and confidence, we analyze results from an application perspective. This enables us to identify opportunities for improving and extending these methods.

2006

Discretization from data streams: Applications to histograms and data mining

Authors
Gama, J; Pinto, C;

Publication
Proceedings of the ACM Symposium on Applied Computing

Abstract
In this paper we propose a new method to perform incremental discretization. The basic idea is to perform the task in two layers. The first layer receives the sequence of input data and keeps some statistics on the data using many more intervals than required. Based on the statistics stored by the first layer, the second layer creates the final discretization. The proposed architecture processes streaming examples in a single scan, in constant time and space even for infinite sequences of examples. We experimentally demonstrate that incremental discretization is able to maintain the performance of learning algorithms in comparison to batch discretization. The proposed method is much more appropriate for incremental learning, and for problems where data flows continuously, as in most recent data mining applications. Copyright 2006 ACM.
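The two-layer idea can be sketched as follows. This is a simplified illustration, not the authors' algorithm: it assumes a known value range, fixed equal-width intervals in the first layer, and an equal-frequency second layer. The class name `TwoLayerDiscretizer` and its parameters are hypothetical.

```python
class TwoLayerDiscretizer:
    """Layer 1 counts values in many fine intervals; layer 2 merges them."""

    def __init__(self, low, high, n_fine=100):
        # Assumption: the value range [low, high] is known in advance.
        self.low = low
        self.n_fine = n_fine
        self.step = (high - low) / n_fine
        self.counts = [0] * n_fine
        self.total = 0

    def update(self, x):
        """Layer 1: constant-time update for one streaming example."""
        idx = int((x - self.low) / self.step)
        idx = min(max(idx, 0), self.n_fine - 1)  # clamp out-of-range values
        self.counts[idx] += 1
        self.total += 1

    def cut_points(self, n_bins):
        """Layer 2: approximate equal-frequency cut points from the counts."""
        if self.total == 0:
            return []
        target = self.total / n_bins
        cuts, cum, k = [], 0, 1
        for i, c in enumerate(self.counts):
            cum += c
            boundary = self.low + (i + 1) * self.step
            while k < n_bins and cum >= k * target:
                if not cuts or cuts[-1] != boundary:
                    cuts.append(boundary)
                k += 1
        return cuts


if __name__ == "__main__":
    disc = TwoLayerDiscretizer(0.0, 10.0, n_fine=10)
    for i in range(10):
        disc.update(i + 0.5)
    print(disc.cut_points(2))
```

Memory use depends only on `n_fine`, not on the number of examples seen, and the second layer can be rerun at any time without revisiting the stream.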

2006

Learning with local drift detection

Authors
Gama, J; Castillo, G;

Publication
ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS

Abstract
Most work in Machine Learning assumes that examples are generated at random according to some stationary probability distribution. In this work we study the problem of learning when the distribution that generates the examples changes over time. We present a method for detecting changes in the probability distribution of examples. The idea behind the drift detection method is to monitor the online error rate of a learning algorithm, looking for significant deviations. The method can be used as a wrapper over any learning algorithm. In most problems, a change affects only some regions of the instance space, not the instance space as a whole. In decision models that fit different functions to regions of the instance space, like Decision Trees and Rule Learners, the method can be used to monitor the error in regions of the instance space, with the advantage of fast model adaptation. In this work we present experiments using the method as a wrapper over a decision tree and a linear model, and in each internal node of a decision tree. The results obtained in controlled experiments using artificial data and a real-world problem show good performance both in detecting drift and in adapting the decision model to the new concept.
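A minimal sketch of monitoring an online error rate for significant deviations is given below. The abstract does not state the test used, so the specifics here are assumptions: the error rate is treated as a binomial proportion, and warning and drift are signalled when the current rate plus its standard deviation exceeds the best level seen so far by 2 and 3 standard deviations respectively. The class name and parameters are hypothetical.

```python
import math


class ErrorRateDriftMonitor:
    """Wrapper-style monitor fed with one prediction outcome at a time."""

    def __init__(self, warn_k=2.0, drift_k=3.0, min_samples=30):
        # warn_k, drift_k and min_samples are illustrative assumptions.
        self.warn_k = warn_k
        self.drift_k = drift_k
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, is_error):
        """Return 'stable', 'warning' or 'drift' after seeing one outcome."""
        self.n += 1
        self.errors += int(is_error)
        p = self.errors / self.n
        s = math.sqrt(p * (1.0 - p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s  # remember the best level so far
        if p + s > self.p_min + self.drift_k * self.s_min:
            self.reset()  # the caller would relearn the model here
            return "drift"
        if p + s > self.p_min + self.warn_k * self.s_min:
            return "warning"
        return "stable"


if __name__ == "__main__":
    monitor = ErrorRateDriftMonitor()
    stream = [i % 10 == 0 for i in range(200)] + [True] * 50
    for t, err in enumerate(stream):
        if monitor.update(err) == "drift":
            print("drift signalled at example", t)
            break
```

To monitor regions of the instance space, one such monitor could be attached to each internal node of a decision tree and fed only with the examples that pass through that node.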

2006

An adaptive prequential learning framework for Bayesian Network Classifiers

Authors
Castillo, G; Gama, J;

Publication
KNOWLEDGE DISCOVERY IN DATABASES: PKDD 2006, PROCEEDINGS

Abstract
We introduce an adaptive prequential learning framework for Bayesian Network Classifiers which attempts to handle the cost-performance trade-off and cope with concept drift. Our strategy for incorporating new data is based on bias management and gradual adaptation. Starting with the simple Naive Bayes, we scale up the complexity by gradually increasing the maximum number of allowable attribute dependencies, and then by searching for new dependencies in the extended search space. Since updating the structure is a costly task, we use new data primarily to adapt the parameters, and we adapt the structure only when this is really necessary. The method for handling concept drift is based on the Shewhart P-Chart. We evaluated our adaptive algorithms on artificial domains and benchmark problems and show their advantages and future applicability in real-world on-line learning systems.
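For readers unfamiliar with the Shewhart P-Chart, the sketch below computes the standard control limits for a proportion and flags batches whose error rate exceeds the upper limit. How the framework applies the chart is not described in the abstract, so monitoring per-batch error rates against a reference rate `p_bar` is an assumption, and the function names are hypothetical.

```python
import math


def p_chart_limits(p_bar, batch_size, k=3.0):
    """Control limits p_bar +/- k * sqrt(p_bar * (1 - p_bar) / batch_size)."""
    sigma = math.sqrt(p_bar * (1.0 - p_bar) / batch_size)
    return max(0.0, p_bar - k * sigma), min(1.0, p_bar + k * sigma)


def out_of_control(batch_error_rates, p_bar, batch_size):
    """Indices of batches whose error rate lies above the upper limit."""
    _, ucl = p_chart_limits(p_bar, batch_size)
    return [i for i, p in enumerate(batch_error_rates) if p > ucl]


if __name__ == "__main__":
    # Assumed reference error rate of 10% estimated on batches of 100 examples.
    print(p_chart_limits(0.1, 100))
    print(out_of_control([0.08, 0.12, 0.25], 0.1, 100))
```

Only the upper limit is checked here because a rising error rate is the signal of interest for concept drift; a batch below the lower limit would indicate improvement rather than degradation.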
