2011
Authors
Suzuki, E; Sebag, M; Ando, S; Balcazar, JL; Billard, A; Bratko, I; Bredeche, N; Gama, J; Grunwald, P; Iba, H; Kersting, K; Peters, J; Washio, T;
Publication
Proceedings - IEEE International Conference on Data Mining, ICDM
Abstract
2011
Authors
Khan, L; Pechenizkiy, M; Zliobaite, I; Agrawal, C; Bifet, A; Delany, SJ; Dries, A; Fan, W; Gabrys, B; Gama, J; Gao, J; Gopalkrishnan, V; Holmes, G; Katakis, I; Kuncheva, L; Van Leeuwen, M; Masud, M; Menasalvas, E; Minku, L; Pfahringer, B; Polikar, R; Rodrigues, PP; Tsoumakas, G; Tsymbal, A;
Publication
Proceedings - IEEE International Conference on Data Mining, ICDM
Abstract
2011
Authors
Gama, J; May, M;
Publication
INTELLIGENT DATA ANALYSIS
Abstract
2011
Authors
Carmona Cejudo, JM; Baena Garcia, M; del Campo Avila, J; Bifet, A; Gama, J; Morales Bueno, R;
Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS X: IDA 2011
Abstract
Real-time email classification is a challenging task because of its online nature, which makes it subject to concept drift. Identifying spam, where only two labels exist, has received great attention in the literature. We are nevertheless interested in classification involving multiple folders, which is an additional source of complexity. Moreover, neither cross-validation nor other sampling procedures are suitable for evaluating data streams. Therefore, other metrics, like the prequential error, have been proposed. However, the prequential error poses some problems, which can be alleviated by using mechanisms such as fading factors. In this paper we present GNUsmail, an open-source extensible framework for email classification, and focus on its ability to perform online evaluation. GNUsmail's architecture supports incremental and online learning, and it can be used to compare different online mining methods using state-of-the-art evaluation metrics. We show how GNUsmail can be used to compare different algorithms, including a tool for launching replicable experiments.
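As an illustration of the evaluation idea mentioned in this abstract, the sketch below computes a prequential (test-then-train) error with a fading factor. It uses one commonly cited formulation, S_i = L_i + alpha * S_(i-1), N_i = 1 + alpha * N_(i-1), E_i = S_i / N_i, and is not taken from GNUsmail itself; the function name, the `alpha` default, and the loss sequence are assumptions made for the example.

```python
def faded_prequential_error(losses, alpha=0.995):
    """Return the fading-factor prequential error after each example.

    losses: iterable of per-example losses (e.g. 0/1 misclassification),
            each measured BEFORE the model is trained on that example.
    alpha:  fading factor in (0, 1]; alpha = 1 gives the plain running mean.
    NOTE: hypothetical helper, not part of any GNUsmail API.
    """
    faded_sum = 0.0    # S_i: exponentially faded sum of losses
    faded_count = 0.0  # N_i: exponentially faded number of examples
    errors = []
    for loss in losses:
        faded_sum = loss + alpha * faded_sum
        faded_count = 1.0 + alpha * faded_count
        errors.append(faded_sum / faded_count)
    return errors


if __name__ == "__main__":
    # A stream where the model is wrong early on and correct afterwards:
    stream = [1] * 50 + [0] * 200
    plain = faded_prequential_error(stream, alpha=1.0)
    faded = faded_prequential_error(stream, alpha=0.95)
    # The faded estimate forgets the early mistakes faster than the plain mean.
    print(plain[-1], faded[-1])
```

With `alpha` below 1, old mistakes are progressively discounted, so the estimate tracks recent performance more closely than the plain prequential mean.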
2011
Authors
Gama, J; Rodrigues, PP; Lopes, L;
Publication
INTELLIGENT DATA ANALYSIS
Abstract
Nowadays applications produce infinite streams of data distributed across wide sensor networks. In this work we study the problem of continuously maintaining a cluster structure over the data points generated by the entire network. Usual techniques operate by forwarding and concentrating the entire data in a central server, processing it as a multivariate stream. In this paper, we propose DGClust, a new distributed algorithm which reduces both the dimensionality and the communication burdens by allowing each local sensor to keep an online discretization of its data stream, which operates with constant update time and (almost) fixed space. Each new data point triggers a cell in this univariate grid, reflecting the current state of the data stream at the local site. Whenever a local site changes its state, it notifies the central server of its new state. This way, at each point in time, the central site has the global multivariate state of the entire network. To avoid monitoring all possible states, whose number is exponential in the number of sensors, the central site keeps a small list of counters of the most frequent global states. Finally, a simple adaptive partitional clustering algorithm is applied to the central points of the frequent states in order to provide an anytime definition of the cluster centers. The approach is evaluated in the context of distributed sensor networks, focusing on three outcomes: loss to real centroids, communication prevention, and processing reduction. The experimental work on synthetic data supports our proposal, showing robustness to a high number of sensors, and the application to real data from physiological sensors exposes the aforementioned advantages of the system.
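The communication pattern described in this abstract can be sketched roughly as follows. This is a simplified illustration, not the DGClust implementation: it assumes a fixed equal-width grid over a known value range (the paper uses an online discretization), a Space-Saving-style bounded counter at the central site, and it omits the final clustering step. All class and function names are hypothetical.

```python
from collections import Counter


class LocalSite:
    """One sensor keeping a univariate grid state (assumed fixed-width grid)."""

    def __init__(self, low, high, n_cells):
        self.low, self.high, self.n_cells = low, high, n_cells
        self.state = None  # index of the currently triggered cell

    def update(self, x):
        """Return the new cell index if the state changed, else None."""
        width = (self.high - self.low) / self.n_cells
        cell = int((x - self.low) / width)
        cell = min(max(cell, 0), self.n_cells - 1)  # clamp out-of-range values
        if cell != self.state:
            self.state = cell
            return cell
        return None


class CentralSite:
    """Tracks the global state and a bounded set of frequent-state counters."""

    def __init__(self, n_sites, max_states):
        self.global_state = [None] * n_sites
        self.max_states = max_states
        self.counts = Counter()
        self.messages = 0  # number of notifications received

    def notify(self, site_id, cell):
        self.global_state[site_id] = cell
        self.messages += 1

    def tick(self):
        """Count the current global state once per time step."""
        if None in self.global_state:
            return
        state = tuple(self.global_state)
        if state in self.counts or len(self.counts) < self.max_states:
            self.counts[state] += 1
        else:
            # Space-Saving-style replacement (an assumption of this sketch):
            # evict the least frequent state and inherit its count.
            victim, count = min(self.counts.items(), key=lambda kv: kv[1])
            del self.counts[victim]
            self.counts[state] = count + 1


def run(readings, low=0.0, high=10.0, n_cells=5, max_states=10):
    """Feed rows of per-sensor readings through the sites; return the central site."""
    sites = [LocalSite(low, high, n_cells) for _ in readings[0]]
    central = CentralSite(len(sites), max_states)
    for row in readings:
        for site_id, (site, x) in enumerate(zip(sites, row)):
            cell = site.update(x)
            if cell is not None:  # only state changes are transmitted
                central.notify(site_id, cell)
        central.tick()
    return central
```

For example, `run([(1.0, 9.0), (1.5, 9.5), (5.0, 9.2)])` processes six readings but sends only three messages, because the sensors notify the central site only when they move to a different grid cell.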
2011
Authors
Correa, FE; Oliveira, MDB; Alves, LRA; Gama, J; Correa, PLP;
Publication
EFITA/WCCA '11
Abstract
Agribusiness, like many other activities, produces huge amounts of spatio-temporal data. We need a system to store, analyze, and mine this data. In previous work, we developed data warehouse tools to store, organize, and query Brazilian agribusiness data from several regions over 10 years. In this paper, we go a step further and propose specific data mining techniques to discover marks and evolution patterns in agribusiness data. We propose the use of Tucker decomposition to automatically detect short time windows that exhibit large changes in the correlation structure between the time series of prices from the Brazilian grain market.