Publications

Publications by João Gama

2008

Schema matching on streams with accuracy guarantees

Authors
Gama, J; Aguilar Ruiz, J; Klinkenberg, R

Publication
Intelligent Data Analysis

Abstract
We address the problem of matching imperfectly documented schemas of data streams and large databases. Instance-level schema matching algorithms identify likely correspondences between attributes by quantifying the similarity of their corresponding values. However, exact calculation of these similarities requires processing all database records, which is infeasible for data streams. We devise a fast matching algorithm that uses only a small sample of records, yet is guaranteed to find a matching that closely approximates the one that would be obtained if the entire stream were processed. The method can be applied to any given similarity metric, or combination of metrics, that can be estimated from a sample with bounded error; we apply the algorithm to several metrics. We give a rigorous proof of the method's correctness and report on experiments using large databases.
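The idea of estimating a similarity from a sample with bounded error can be illustrated with a minimal sketch. It is not the algorithm from the paper: it assumes a hypothetical similarity (the fraction of values in one column that also occur in another), which lies in [0, 1], and uses the standard Hoeffding inequality to choose a sample size. The function names and the parameters `epsilon` and `delta` are illustrative.

```python
import math
import random


def hoeffding_sample_size(epsilon, delta):
    """Number of i.i.d. samples so that the mean of a [0, 1]-valued
    quantity is within epsilon of its true value with probability
    at least 1 - delta (two-sided Hoeffding bound)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def estimate_overlap(column_a, values_b, n, rng):
    """Estimate the fraction of entries of column_a that occur in values_b
    by drawing n entries of column_a uniformly with replacement.
    This overlap measure is an assumption made for illustration only."""
    hits = sum(1 for _ in range(n) if rng.choice(column_a) in values_b)
    return hits / n


# Usage: an error of at most 0.05 with probability at least 0.95.
n = hoeffding_sample_size(epsilon=0.05, delta=0.05)  # 738 samples
rng = random.Random(0)
column_a = list(range(1000))
values_b = set(range(500))  # the true overlap is 0.5
estimate = estimate_overlap(column_a, values_b, n, rng)
```

The required sample size depends only on the error and confidence targets, not on the length of the column, which is what makes this kind of estimate usable on a stream.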

2007

Pursuing the best ECOC dimension for multiclass problems

Authors
Pimenta, E; Gama, J; Carvalho, A

Publication
Proceedings of the Twentieth International Florida Artificial Intelligence Research Society Conference, FLAIRS 2007

Abstract
Recent work highlights the advantages of decomposing multiclass decision problems into multiple binary problems. Several strategies have been proposed for this decomposition. The most frequently investigated are All-vs-All, One-vs-All, and Error-Correcting Output Codes (ECOC). ECOC are binary words (codewords) that can be adapted for use in classification problems. They must, however, comply with some specific constraints. For a given number of classes, the codewords can have several possible dimensions. These dimensions grow exponentially with the number of classes of the multiclass problem. This paper proposes two methods for choosing the dimension of an ECOC that ensure a good trade-off between redundancy and error-correction capacity. The methods are evaluated on a set of benchmark classification problems. Experimental results show that they are competitive with conventional multiclass decomposition methods.
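The trade-off between codeword dimension and error-correction capacity can be made concrete with a short sketch. It does not implement the two methods proposed in the paper. It assumes the well-known exhaustive code construction, whose length is 2^(k-1) - 1 for k classes, and decodes by nearest Hamming distance; all function names are illustrative.

```python
from itertools import combinations


def exhaustive_code(k):
    """Build an exhaustive ECOC matrix for k classes: one codeword (row)
    per class, with 2**(k-1) - 1 columns (one binary problem per column)."""
    n_cols = 2 ** (k - 1) - 1
    rows = [[1] * n_cols]  # the first class is labeled 1 in every column
    for i in range(1, k):
        shift = k - 1 - i
        rows.append([(c >> shift) & 1 for c in range(n_cols)])
    return rows


def hamming(u, v):
    """Number of positions in which two codewords differ."""
    return sum(a != b for a, b in zip(u, v))


def min_distance(code):
    """Smallest Hamming distance between any two codewords."""
    return min(hamming(u, v) for u, v in combinations(code, 2))


def decode(code, outputs):
    """Return the index of the class whose codeword is closest to the
    vector of binary classifier outputs."""
    return min(range(len(code)), key=lambda i: hamming(code[i], outputs))


code = exhaustive_code(4)          # 4 classes, 7 binary classifiers
d = min_distance(code)             # 4
correctable = (d - 1) // 2         # 1 wrong binary output can be corrected
# Codeword of class 2 is 0011001; the sixth bit below is flipped.
predicted = decode(code, [0, 0, 1, 1, 0, 1, 1])  # 2
```

Longer codewords require more binary classifiers but can increase the minimum distance, which is the redundancy versus error-correction trade-off the abstract refers to.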

1999

Linear tree

Authors
Gama, J; Brazdil, P

Publication
Intelligent Data Analysis

Abstract
In this paper we present Ltree, a system for propositional supervised learning. Ltree is able to define decision surfaces both orthogonal and oblique to the axes defined by the attributes of the input space. This is done by combining a decision tree with a linear discriminant by means of constructive induction. At each decision node, Ltree defines a new instance space by inserting new attributes that are projections of the examples that fall at this node onto the hyperplanes given by a linear discriminant function. This new instance space is propagated down through the tree. Tests based on those new attributes are oblique with respect to the original input space. Ltree is a probabilistic tree in the sense that it outputs a class probability distribution for each query example. The class probability distribution is computed at learning time, taking into account the different class distributions on the path from the root to the current node. We have carried out experiments on twenty-one benchmark datasets and compared our system with other well-known decision tree systems (orthogonal and oblique) such as C4.5, OC1, LMDT, and CART. On these datasets we have observed that our system has advantages in terms of accuracy and learning time at statistically significant confidence levels.
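The constructive induction step described above can be sketched as follows. This is not the Ltree implementation: as a simplifying assumption, the linear discriminant is replaced by the difference between the two class means (which corresponds to a discriminant with identity covariance), and only two classes are handled. All names are hypothetical.

```python
def class_mean(X, y, label):
    """Mean attribute vector of the examples with the given class label."""
    rows = [x for x, c in zip(X, y) if c == label]
    return [sum(col) / len(rows) for col in zip(*rows)]


def discriminant_direction(X, y):
    """Simplified stand-in for a linear discriminant (two classes, 0 and 1):
    the vector pointing from the class-0 mean to the class-1 mean."""
    m0, m1 = class_mean(X, y, 0), class_mean(X, y, 1)
    return [b - a for a, b in zip(m0, m1)]


def project(x, w):
    """Value of the new attribute: the projection of x onto direction w."""
    return sum(a * b for a, b in zip(x, w))


def extend_instance_space(X, w):
    """Append the projection as a new attribute to every example."""
    return [list(x) + [project(x, w)] for x in X]


# Two classes separated by an oblique boundary (roughly x1 + x2 = const).
X = [[0, 0], [1, 0], [0, 1], [2, 2], [3, 2], [2, 3]]
y = [0, 0, 0, 1, 1, 1]

w = discriminant_direction(X, y)
X_ext = extend_instance_space(X, w)

# A univariate test on the new attribute (last column) is an oblique
# test in the original two-dimensional space.
p0 = project(class_mean(X, y, 0), w)
p1 = project(class_mean(X, y, 1), w)
threshold = (p0 + p1) / 2
predictions = [1 if row[-1] > threshold else 0 for row in X_ext]
```

In a tree learner the extended examples would then be passed to the child nodes, so deeper tests can reuse the attributes constructed higher up.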

2010

Drift Severity Metric

Authors
Kosina, P; Gama, J; Sebastiao, R

Publication
ECAI 2010 - 19th European Conference on Artificial Intelligence

Abstract
Concept drift in data is usually classified only as abrupt or gradual, a distinction that refers to the speed of change. Distinguishing drift by speed alone is sufficient for most problems, but there may be situations in which a finer representation would be useful. This paper further studies the phenomenon of concept drift and introduces a simple measure that reflects both the speed and the amount of change between different concepts.

2007

OLINDDA

Authors
Spinosa, EJ; de Leon F. de Carvalho, AP; Gama, J

Publication
Proceedings of the 2007 ACM symposium on Applied computing - SAC '07

Abstract

2007

Knowledge discovery from data streams

Authors
Gama, J; Aguilar Ruiz, J

Publication
Intelligent Data Analysis

Abstract
