2008
Autores
Rodrigues, PP; Gama, J; Bosnic, Z;
Publicação
Proceedings - IEEE International Conference on Data Mining Workshops, ICDM Workshops 2008
Abstract
Several predictive systems are nowadays vital for operations and decision support. The quality of these systems is most of the time defined by their average accuracy which has low or no information at all about the estimated error of each individual prediction. In many sensitive applications, users should be allowed to associate a measure of reliability to each prediction. In the case of batch systems, reliability measures have already been defined, mostly empirical measures as the estimation using the local sensitivity analysis. However, with the advent of data streams, these reliability estimates should also be computed online, based only on available data and current model's state. In this paper we define empirical measures to perform online estimation of reliability of individual predictions when made in the context of online learning systems. We present preliminary results and evaluate the estimators in two different problems. © 2008 IEEE.
2008
Autores
Lorena, AC; de Carvalho, ACPLF; Gama, JMP;
Publicação
ARTIFICIAL INTELLIGENCE REVIEW
Abstract
Several real problems involve the classification of data into categories or classes. Given a data set containing data whose classes are known, Machine Learning algorithms can be employed for the induction of a classifier able to predict the class of new data from the same domain, performing the desired discrimination. Some learning techniques are originally conceived for the solution of problems with only two classes, also named binary classification problems. However, many problems require the discrimination of examples into more than two categories or classes. This paper presents a survey on the main strategies for the generalization of binary classifiers to problems with more than two classes, known as multiclass classification problems. The focus is on strategies that decompose the original multiclass problem into multiple binary subtasks, whose outputs are combined to obtain the final prediction.
2008
Autores
Sebastiao, R; Gama, J; Mendonca, T;
Publicação
STAIRS 2008
Abstract
The aim of this PhD program is the study of algorithms for learning histograms, with the capacity of representing continuous high-speed flows of data and dealing with the current problem of change detection on data streams. In many modern applications, information is no longer gathered as finite stored data sets, but assuming the form of infinite data streams. As a large volume of information is produced at a high-speed rate it is no longer possible to use memory algorithms which require the full historic data stored in the main memory, so new ones are needed to process data online at the rate it is available. Moreover, the process generating data is not strictly stationary and evolves over time; so algorithms should, while extracting some sort of knowledge from this incessantly growing data, be able to adapt themselves to changes, maintaining a representation consistent with the most recent status of nature. In this work, we presented a feasible approach, using incremental histograms and monitoring data distributions, to detect concept drift in data stream context.
2008
Autores
Ikonotnovska, E; Gama, J;
Publicação
DISCOVERY SCIENCE, PROCEEDINGS
Abstract
In this paper we propose a fast and incremental algorithm for learning model trees from data streams (FIMT) for regression problems. The algorithm is incremental, works online, processes examples once at the speed they arrive, and maintains an any-time regression model. The leaves contain linear-models trained online from the examples that fall at that leaf, a process with low complexity. The use of linear models in the leaves increases its any-time global performance. FIMT is able to obtain competitive accuracy with batch learners even for medium size datasets, but with better training time in an order of magnitude. We study the properties of FIMT over several artificial and real datasets and evaluate its sensitivity on the order of examples and the noise level.
2008
Autores
del Campo Avila, J; Ramos Jimenez, G; Gama, J; Morales Bueno, R;
Publicação
INTELLIGENT DATA ANALYSIS
Abstract
Classification is a quite relevant task within data analysis field. This task is not a trivial task and different difficulties can arise depending on the nature of the problem. All these difficulties can become worse when the datasets are too large or when new information can arrive at any time. Incremental learning is an approach that can be used to deal with the classification task in these cases. It must alleviate, or solve, the problem of limited time and memory resources. One emergent approach uses concentration bounds to ensure that decisions are made when enough information supports them. IADEM is one of the most recent algorithms that use this approach. The aim of this paper is to improve the performance of this algorithm in different ways: simplifying the complexity of the induced models, adding the ability to deal with continuous data, improving the detection of noise, selecting new criteria for evolutionating the model, including the use of more powerful prediction techniques, etc. Besides these new properties, the new system, IADEM-2, preserves the ability to obtain a performance similar to standard learning algorithms independently of the datasets size and it can incorporate new information as the basic algorithm does: using short time per example.
2008
Autores
Gama, J; Aguilar Ruiz, J; Klinkenberg, R;
Publicação
INTELLIGENT DATA ANALYSIS
Abstract
The access to the final selection minute is only available to applicants.
Please check the confirmation e-mail of your application to obtain the access code.