Publications - INESC TEC

Publications

Publications by LIAAD

2008

Special track on data streams

Authors
Gama, J; Carvalho, A; Aguilar Ruiz, J;

Publication
Proceedings of the ACM Symposium on Applied Computing

Abstract

2008

Knowledge discovery from sensor data (SensorKDD)

Authors
Vatsavai, RR; Omitaomu, OA; Gama, J; Chawla, NV; Gaber, MM; Ganguly, AR;

Publication
SIGKDD Explorations

Abstract

2008

RUSE-WARMR: Rule Selection for Classifier Induction in Multi-Relational Data-Sets

Authors
Ferreira, CA; Gama, J; Costa, VS;

Publication
20th IEEE International Conference on Tools with Artificial Intelligence, Vol 1, Proceedings

Abstract
One of the major challenges in knowledge discovery is how to extract meaningful and useful knowledge from the complex structured data found in scientific and technological applications. One approach is to explore the logic relations in the database and, using, say, an Inductive Logic Programming (ILP) algorithm, find descriptive and expressive patterns. These patterns can then be used as features to characterize the target concept. The effectiveness of these algorithms depends both upon the algorithm we use to generate the patterns and upon the classifier. Rule mining provides an excellent framework for efficiently mining the interesting patterns that are relevant. We propose a novel method to select discriminative patterns and evaluate the effectiveness of this method on a complex discovery application of practical interest.

2008

Online reliability estimates for individual predictions in data streams

Authors
Rodrigues, PP; Gama, J; Bosnic, Z;

Publication
Proceedings - IEEE International Conference on Data Mining Workshops, ICDM Workshops 2008

Abstract
Several predictive systems are nowadays vital for operations and decision support. The quality of these systems is most often defined by their average accuracy, which carries little or no information about the estimated error of each individual prediction. In many sensitive applications, users should be able to associate a measure of reliability with each prediction. For batch systems, reliability measures have already been defined, mostly empirical ones such as estimation using local sensitivity analysis. However, with the advent of data streams, these reliability estimates should also be computed online, based only on the available data and the current state of the model. In this paper we define empirical measures for the online estimation of the reliability of individual predictions made by online learning systems. We present preliminary results and evaluate the estimators on two different problems. © 2008 IEEE.

2008

A review on the combination of binary classifiers in multiclass problems

Authors
Lorena, AC; de Carvalho, ACPLF; Gama, JMP;

Publication
Artificial Intelligence Review

Abstract
Many real-world problems involve the classification of data into categories or classes. Given a data set containing data whose classes are known, Machine Learning algorithms can be employed to induce a classifier able to predict the class of new data from the same domain, performing the desired discrimination. Some learning techniques were originally conceived for problems with only two classes, also called binary classification problems. However, many problems require the discrimination of examples into more than two categories or classes. This paper presents a survey of the main strategies for generalizing binary classifiers to problems with more than two classes, known as multiclass classification problems. The focus is on strategies that decompose the original multiclass problem into multiple binary subtasks, whose outputs are combined to obtain the final prediction.
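As an illustration of the decomposition idea described in this abstract, the following is a minimal sketch of one possible strategy, one-vs-rest, in which one binary subtask is trained per class and the outputs are combined by taking the highest score. It is not taken from the paper: the class names `BinaryCentroidScorer` and `OneVsRest` are hypothetical, and the toy nearest-centroid binary learner is an assumption chosen only to keep the example self-contained.

```python
from math import dist


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


class BinaryCentroidScorer:
    """Toy two-class learner (an assumption, not from the paper).

    A positive score means the point is closer to the positive centroid
    than to the negative one.
    """

    def fit(self, X, y):
        self.pos = centroid([x for x, label in zip(X, y) if label])
        self.neg = centroid([x for x, label in zip(X, y) if not label])
        return self

    def score(self, x):
        return dist(x, self.neg) - dist(x, self.pos)


class OneVsRest:
    """Decompose a multiclass problem into one binary subtask per class."""

    def fit(self, X, y):
        # Each subtask relabels the data as "this class" vs "all others".
        self.models = {
            c: BinaryCentroidScorer().fit(X, [label == c for label in y])
            for c in sorted(set(y))
        }
        return self

    def predict(self, x):
        # Combine the binary outputs: pick the class with the highest score.
        return max(self.models, key=lambda c: self.models[c].score(x))
```

Other decompositions, such as training one binary classifier per pair of classes, follow the same pattern and differ only in how the subtasks are built and how their outputs are combined.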

2008

Learning from Data Streams: Synopsis and Change Detection

Authors
Sebastiao, R; Gama, J; Mendonca, T;

Publication
STAIRS 2008

Abstract
The aim of this PhD program is the study of algorithms for learning histograms that can represent continuous high-speed flows of data and deal with the current problem of change detection in data streams. In many modern applications, information is no longer gathered as finite stored data sets but takes the form of infinite data streams. Because a large volume of information is produced at a high rate, it is no longer possible to use algorithms that require the full historical data to be stored in main memory, so new ones are needed to process data online at the rate at which it becomes available. Moreover, the process generating the data is not strictly stationary and evolves over time, so algorithms should, while extracting knowledge from this incessantly growing data, be able to adapt to changes, maintaining a representation consistent with the most recent state of nature. In this work, we present a feasible approach, using incremental histograms and monitoring data distributions, to detect concept drift in the data stream context.
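To make the general idea of incremental histograms plus distribution monitoring concrete, here is a minimal sketch under stated assumptions. It is not the authors' algorithm: the equal-width bins, the fixed reference window, the total variation distance, the threshold value, and the names `IncrementalHistogram` and `detect_change` are all choices made for this illustration.

```python
from collections import deque


class IncrementalHistogram:
    """Equal-width histogram over [lo, hi), updated one value at a time."""

    def __init__(self, lo, hi, bins):
        self.lo, self.hi, self.bins = lo, hi, bins
        self.counts = [0] * bins
        self.total = 0

    def _index(self, x):
        i = int((x - self.lo) * self.bins / (self.hi - self.lo))
        return min(max(i, 0), self.bins - 1)  # clamp out-of-range values

    def add(self, x):
        self.counts[self._index(x)] += 1
        self.total += 1

    def remove(self, x):
        self.counts[self._index(x)] -= 1
        self.total -= 1

    def frequencies(self):
        return [c / self.total for c in self.counts]


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def detect_change(stream, lo, hi, bins=10, window=50, threshold=0.5):
    """Return the first index where the recent window diverges, else None.

    The first `window` values build a reference histogram; afterwards a
    sliding window of the same size is compared against it at every step.
    """
    reference = IncrementalHistogram(lo, hi, bins)
    recent = IncrementalHistogram(lo, hi, bins)
    buffer = deque()
    for i, x in enumerate(stream):
        if reference.total < window:
            reference.add(x)
            continue
        recent.add(x)
        buffer.append(x)
        if len(buffer) > window:
            recent.remove(buffer.popleft())  # forget the oldest value
        if len(buffer) == window and total_variation(
            reference.frequencies(), recent.frequencies()
        ) > threshold:
            return i
    return None
```

Only the bin counts and one window of raw values are kept in memory, so the cost per incoming value is constant regardless of how long the stream runs.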
