Publications

Publications by LIAAD

2011

Preface

Authors
Suzuki, E; Sebag, M; Ando, S; Balcazar, JL; Billard, A; Bratko, I; Bredeche, N; Gama, J; Grunwald, P; Iba, H; Kersting, K; Peters, J; Washio, T;

Publication
Proceedings - IEEE International Conference on Data Mining, ICDM

Abstract

2011

Preface

Authors
Khan, L; Pechenizkiy, M; Zliobaite, I; Agrawal, C; Bifet, A; Delany, SJ; Dries, A; Fan, W; Gabrys, B; Gama, J; Gao, J; Gopalkrishnan, V; Holmes, G; Katakis, I; Kuncheva, L; Van Leeuwen, M; Masud, M; Menasalvas, E; Minku, L; Pfahringer, B; Polikar, R; Rodrigues, PP; Tsoumakas, G; Tsymbal, A;

Publication
Proceedings - IEEE International Conference on Data Mining, ICDM

Abstract

2011

Ubiquitous Knowledge Discovery Introduction

Authors
Gama, J; May, M;

Publication
INTELLIGENT DATA ANALYSIS

Abstract

2011

Online Evaluation of Email Streaming Classifiers Using GNUsmail

Authors
Carmona Cejudo, JM; Baena Garcia, M; del Campo Avila, J; Bifet, A; Gama, J; Morales Bueno, R;

Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS X: IDA 2011

Abstract
Real-time email classification is a challenging task because of its online nature, which makes it subject to concept drift. Identifying spam, where only two labels exist, has received great attention in the literature. We are nevertheless interested in classification involving multiple folders, which is an additional source of complexity. Moreover, neither cross-validation nor other sampling procedures are suitable for evaluating data streams. Therefore, other metrics, like the prequential error, have been proposed. However, the prequential error poses some problems, which can be alleviated by using mechanisms such as fading factors. In this paper we present GNUsmail, an open-source extensible framework for email classification, and focus on its ability to perform online evaluation. GNUsmail's architecture supports incremental and online learning, and it can be used to compare different online mining methods using state-of-the-art evaluation metrics. We show how GNUsmail can be used to compare different algorithms, and we include a tool for launching replicable experiments.
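As a rough illustration of the evaluation idea mentioned in this abstract, the following Python sketch computes a prequential error smoothed by a fading factor. It is not taken from GNUsmail; the function name `faded_prequential_error` and the default `alpha` are assumptions made for this example. Each loss is the 0/1 error of a prediction made before the true label is revealed (test-then-train), and older losses are discounted geometrically.

```python
def faded_prequential_error(losses, alpha=0.995):
    """Return the faded prequential error after each example.

    losses: iterable of per-example losses (e.g. 0 for a correct
            prediction, 1 for a wrong one), in arrival order.
    alpha:  fading factor in (0, 1]; alpha = 1 gives the plain
            prequential error (a running mean). Hypothetical default.
    """
    faded_sum = 0.0    # S_i = L_i + alpha * S_{i-1}
    faded_count = 0.0  # N_i = 1   + alpha * N_{i-1}
    errors = []
    for loss in losses:
        faded_sum = loss + alpha * faded_sum
        faded_count = 1.0 + alpha * faded_count
        errors.append(faded_sum / faded_count)
    return errors
```

With `alpha` below 1, recent mistakes weigh more than old ones, so the estimate reacts faster when the performance of a classifier changes after a drift.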

2011

Clustering distributed sensor data streams using local processing and reduced communication

Authors
Gama, J; Rodrigues, PP; Lopes, L;

Publication
INTELLIGENT DATA ANALYSIS

Abstract
Nowadays, applications produce infinite streams of data distributed across wide sensor networks. In this work we study the problem of continuously maintaining a cluster structure over the data points generated by the entire network. Usual techniques operate by forwarding and concentrating all the data in a central server, processing it as a multivariate stream. In this paper, we propose DGClust, a new distributed algorithm which reduces both the dimensionality and the communication burdens by allowing each local sensor to keep an online discretization of its data stream, which operates with constant update time and (almost) fixed space. Each new data point triggers a cell in this univariate grid, reflecting the current state of the data stream at the local site. Whenever a local site changes its state, it notifies the central server of its new state. This way, at each point in time, the central site has the global multivariate state of the entire network. To avoid monitoring all possible states, whose number is exponential in the number of sensors, the central site keeps a small list of counters of the most frequent global states. Finally, a simple adaptive partitional clustering algorithm is applied to the central points of the frequent states in order to provide an anytime definition of the cluster centers. The approach is evaluated in the context of distributed sensor networks, focusing on three outcomes: loss to real centroids, communication prevention, and processing reduction. The experimental work on synthetic data supports our proposal, showing robustness to a high number of sensors, and the application to real data from physiological sensors demonstrates the aforementioned advantages of the system.
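The communication pattern described in this abstract can be sketched as follows. This is a simplified illustration, not the DGClust implementation: the class names `LocalSite` and `CentralSite`, the equal-width grid over a fixed range, and the unbounded `Counter` (in place of a bounded frequent-items structure) are all assumptions made for the example, and the final clustering step over frequent states is omitted.

```python
from collections import Counter


class LocalSite:
    """One sensor keeping a univariate grid over its own stream.

    Assumption: an equal-width grid over a known [low, high] range.
    """

    def __init__(self, low, high, n_cells):
        self.low, self.high, self.n_cells = low, high, n_cells
        self.state = None  # index of the current grid cell

    def update(self, value):
        """Return the new cell index if the state changed, else None."""
        width = (self.high - self.low) / self.n_cells
        cell = int((value - self.low) / width)
        cell = min(max(cell, 0), self.n_cells - 1)  # clamp to the grid
        if cell != self.state:
            self.state = cell
            return cell
        return None


class CentralSite:
    """Tracks the global state and counts how often each one occurs."""

    def __init__(self, n_sensors):
        self.global_state = [None] * n_sensors
        self.counts = Counter()  # simplification: not size-bounded
        self.messages = 0        # notifications received so far

    def notify(self, sensor_id, cell):
        self.global_state[sensor_id] = cell
        self.messages += 1

    def tick(self):
        """Count the current global state once every sensor has reported."""
        if None not in self.global_state:
            self.counts[tuple(self.global_state)] += 1


def run(readings, sites, central):
    """Feed rows of simultaneous readings, one value per sensor."""
    for row in readings:
        for sensor_id, (site, value) in enumerate(zip(sites, row)):
            cell = site.update(value)
            if cell is not None:  # only state changes are transmitted
                central.notify(sensor_id, cell)
        central.tick()
```

In this sketch a sensor whose readings stay within the same cell sends nothing, which is where the communication savings come from; the central site still sees a complete global state at every time step.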

2011

Data Mining Applied on Grain Data Mart

Authors
Correa, FE; Oliveira, MDB; Alves, LRA; Gama, J; Correa, PLP;

Publication
EFITA/WCCA '11

Abstract
Agribusiness, like many other activities, produces huge amounts of spatio-temporal data. We need a system to store, analyze, and mine this data. In previous work, we developed data warehouse tools to store, organize, and query Brazilian agribusiness data from several regions over 10 years. In this paper, we go a step further and propose specific data mining techniques to discover marks and evolution patterns in agribusiness data. We propose the use of Tucker decomposition to automatically detect short time windows that exhibit large changes in the correlation structure between the time series of prices from the Brazilian grain market.
