Publications

Publications by LIAAD

2008

Schema matching on streams with accuracy guarantees

Authors
Gama, J; Aguilar Ruiz, J; Klinkenberg, R;

Publication
Intelligent Data Analysis

Abstract
We address the problem of matching imperfectly documented schemas of data streams and large databases. Instance-level schema matching algorithms identify likely correspondences between attributes by quantifying the similarity of their corresponding values. However, exact calculation of these similarities requires processing of all database records - which is infeasible for data streams. We devise a fast matching algorithm that uses only a small sample of records, and is yet guaranteed to find a matching that is a close approximation of the matching that would be obtained if the entire stream were processed. The method can be applied to any given (combination of) similarity metrics that can be estimated from a sample with bounded error; we apply the algorithm to several metrics. We give a rigorous proof of the method's correctness and report on experiments using large databases.
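As a rough illustration of what "estimated from a sample with bounded error" can mean, the sketch below uses Hoeffding's inequality to compute how many sampled records suffice to estimate the mean of a bounded similarity score to within a chosen error. The choice of Hoeffding's inequality, the function name `hoeffding_sample_size`, and the parameters `epsilon` and `delta` are assumptions made for this example; the abstract does not say which bound the paper uses.

```python
import math


def hoeffding_sample_size(epsilon, delta, value_range=1.0):
    """Number of i.i.d. samples so that the sample mean of a quantity with
    the given value range is within epsilon of the true mean with
    probability at least 1 - delta (two-sided Hoeffding bound).

    Hypothetical helper for illustration only.
    """
    if epsilon <= 0 or not 0 < delta < 1:
        raise ValueError("epsilon must be > 0 and delta must be in (0, 1)")
    # n >= R^2 * ln(2 / delta) / (2 * epsilon^2)
    n = (value_range ** 2) * math.log(2.0 / delta) / (2.0 * epsilon ** 2)
    return math.ceil(n)


if __name__ == "__main__":
    # Similarity scores in [0, 1], error at most 0.05, confidence 99%.
    print(hoeffding_sample_size(epsilon=0.05, delta=0.01))
```

The required sample size depends only on the error tolerance and the confidence level, not on the length of the stream, which is why a bound of this kind suits data that cannot be processed in full.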

2008

The dimension of ECOCs for multiclass classification problems

Authors
Pimenta, E; Gama, J; Carvalho, A;

Publication
International Journal on Artificial Intelligence Tools

Abstract
Several classification problems involve more than two classes. These problems are known as multiclass classification problems. One approach to multiclass problems is to decompose them into a set of binary problems. Recent work shows important advantages associated with this approach. Several strategies have been proposed for this decomposition. The strategies most frequently used are All-vs-All, One-vs-All and Error Correction Output Codes (ECOC). ECOCs are based on binary words (codewords) and have been adapted to deal with multiclass problems. To do so, they must comply with a number of specific constraints. Different dimensions may be adopted for the codewords for each number of classes in the problem. These dimensions grow exponentially with the number of classes present in a dataset. Two methods to choose the dimension of an ECOC, which assure a good trade-off between redundancy and error correction capacity, are proposed in this paper. The proposed methods are evaluated in a set of benchmark classification problems. Experimental results show that they are competitive with other multiclass decomposition methods.
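To make the codeword idea concrete, the following minimal sketch builds the classical exhaustive code, whose length 2^(k-1) - 1 grows exponentially with the number of classes k, and decodes a vector of binary predictions by nearest Hamming distance. This is a textbook construction shown only for illustration; it is not one of the two dimension-selection methods proposed in the paper, and the function names are hypothetical.

```python
from itertools import product


def exhaustive_code_matrix(k):
    """Return one codeword (list of bits) per class, 2**(k-1) - 1 bits each."""
    # Each column is one binary split of the k classes. Fixing class 0's bit
    # to 1 removes complementary duplicates; dropping the all-ones column
    # removes the split that separates nothing.
    columns = [(1,) + rest
               for rest in product((0, 1), repeat=k - 1)
               if not all(rest)]
    return [[col[c] for col in columns] for c in range(k)]


def hamming(a, b):
    """Number of positions in which two equal-length bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(bits, code_matrix):
    """Return the class index whose codeword is nearest in Hamming distance."""
    return min(range(len(code_matrix)),
               key=lambda c: hamming(bits, code_matrix[c]))


if __name__ == "__main__":
    matrix = exhaustive_code_matrix(4)
    noisy = list(matrix[2])
    noisy[0] ^= 1  # simulate one binary classifier making a mistake
    print(len(matrix[0]), decode(noisy, matrix))
```

With four classes every pair of codewords differs in four positions, so a single wrong binary prediction is still decoded to the correct class. Shorter codes reduce the number of binary classifiers to train at the cost of this redundancy, which is the trade-off the abstract refers to.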

2008

Special track on data streams

Authors
Gama, J; Carvalho, A; Aguilar Ruiz, J;

Publication
Proceedings of the ACM Symposium on Applied Computing

Abstract

2008

Knowledge discovery from sensor data (SensorKDD)

Authors
Vatsavai, RR; Omitaomu, OA; Gama, J; Chawla, NV; Gaber, MM; Ganguly, AR;

Publication
SIGKDD Explorations

Abstract

2008

RUSE-WARMR: Rule Selection for Classifier Induction in Multi-Relational Data-Sets

Authors
Ferreira, CA; Gama, J; Costa, VS;

Publication
20th IEEE International Conference on Tools with Artificial Intelligence, Vol 1, Proceedings

Abstract
One of the major challenges in knowledge discovery is how to extract meaningful and useful knowledge from the complex structured data that one finds in scientific and technological applications. One approach is to explore the logic relations in the database and, using, say, an Inductive Logic Programming (ILP) algorithm, find descriptive and expressive patterns. These patterns can then be used as features to characterize the target concept. The effectiveness of these algorithms depends both upon the algorithm we use to generate the patterns and upon the classifier. Rule mining provides an excellent framework for efficiently mining the interesting patterns that are relevant. We propose a novel method to select discriminative patterns and evaluate the effectiveness of this method on a complex discovery application of practical interest.

2008

Online reliability estimates for individual predictions in data streams

Authors
Rodrigues, PP; Gama, J; Bosnic, Z;

Publication
Proceedings - IEEE International Conference on Data Mining Workshops, ICDM Workshops 2008

Abstract
Several predictive systems are nowadays vital for operations and decision support. The quality of these systems is most often defined by their average accuracy, which carries little or no information about the estimated error of each individual prediction. In many sensitive applications, users should be allowed to associate a measure of reliability with each prediction. In the case of batch systems, reliability measures have already been defined, mostly empirical measures such as estimation using local sensitivity analysis. However, with the advent of data streams, these reliability estimates should also be computed online, based only on the available data and the current model's state. In this paper we define empirical measures to perform online estimation of the reliability of individual predictions when made in the context of online learning systems. We present preliminary results and evaluate the estimators in two different problems. © 2008 IEEE.
