Publicacoes - INESC TEC

Publicações

Publicações por João Gama

2007

OLINDDA

Autores
Spinosa, EJ; de Leon F. de Carvalho, AP; Gama, J;

Publicação
Proceedings of the 2007 ACM symposium on Applied computing - SAC '07

Abstract

2007

Knowledge discovery from data streams

Autores
Gama, J; Aguilar Ruiz, J;

Publicação
INTELLIGENT DATA ANALYSIS

Abstract

2008

The dimension of ECOCs for multiclass classification problems

Autores
Pimenta, E; Gama, J; Carvalho, A;

Publicação
INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS

Abstract
Several classification problems involve more than two classes. These problems are known as multiclass classification problems. One of the approaches to deal with multiclass problems is their decomposition into a set of binary problems. Recent work shows important advantages related with this approach. Several strategies have been proposed for this decomposition. The strategies most frequently used are All-vs-All, One-vs-All and Error Correction Output Codes (ECOC). ECOCs are based on binary words (codewords) and have been adapted to deal with multiclass problems. For such, they must comply with a number of specific constraints. Different dimensions may be adopted for the codewords for each number of classes in the problem. These dimensions grow exponentially with the number of classes present in a dataset. Two methods to choose the dimension of a ECOC, which assure a good trade-off between redundancy and error correction capacity, are proposed in this paper. The proposed methods are evaluated in a set of benchmark classification problems. Experimental results show that they are competitive with other multiclass decomposition methods.

FecharLer Abstract

2007

Change detection in learning histograms from data streams

Autores
Sebastiao, R; Gama, J;

Publicação
PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS

Abstract
In this paper we study the problem of constructing histograms from high-speed time-changing data streams. Learning in this context requires the ability to process examples once at the rate they arrive, maintaining a histogram consistent with the most recent data, and forgetting out-date data whenever a change in the distribution is detected. To construct histogram from high-speed data streams we use the two layer structure used in the Partition Incremental Discretization (PiD) algorithm. Our contribution is a new method to detect whenever a change in the distribution generating examples occurs. The base idea consists of monitoring distributions from two different time windows: the reference time window, that reflects the distribution observed in the past; and the current time window reflecting the distribution observed in the most recent data. We compare both distributions and signal a change whenever they are greater than a threshold value, using three different methods: the Entropy Absolute Difference, the Kullback-Leibler divergence and the Cosine Distance. The experimental results suggest that Kullback-Leibler divergence exhibit high probability in change detection, faster detection rates, with few false positives alarms.

FecharLer Abstract

2003

Iterative Bayes

Autores
Gama, J;

Publicação
THEORETICAL COMPUTER SCIENCE

Abstract
Naive Bayes is a well-known and studied algorithm both in statistics and machine learning. Bayesian learning algorithms represent each concept with a single probabilistic summary. In this paper we present an iterative approach to naive Bayes. The Iterative Bayes begins with the distribution tables built by the naive Bayes. Those tables are iteratively updated in order to improve the probability class distribution associated with each training example. In this paper we argue that Iterative Bayes minimizes a quadratic loss function instead of the 0-1 loss function that usually applies, to classification problems. Experimental evaluation of Iterative Bayes on 27 benchmark data sets shows consistent gains in accuracy. An interesting side effect of our algorithm is that it shows to be robust to attribute dependencies.

FecharLer Abstract

1999

Iterative naive Bayes

Autores
Gama, J;

Publicação
DISCOVERY SCIENCE, PROCEEDINGS

Abstract
Naive Bayes is a well known and studied algorithm both in statistics and machine learning. Bayesian learning algorithms represent each concept with a single probabilistic summary. In this paper we present an iterative approach to naive Bayes. The iterative Bayes begins with the distribution tables built by the naive Bayes. Those tables are iteratively updated in order to improve the probability class distribution associated with each training example. Experimental evaluation of Iterative Bayes on 25 benchmark datasets shows consistent gains in accuracy. An interesting side effect of our algorithm is that it shows to be robust to attribute dependencies.

FecharLer Abstract