Cookies
O website necessita de alguns cookies e outros recursos semelhantes para funcionar. Caso o permita, o INESC TEC irá utilizar cookies para recolher dados sobre as suas visitas, contribuindo, assim, para estatísticas agregadas que permitem melhorar o nosso serviço. Ver mais
Aceitar Rejeitar
  • Menu
Publicações

Publicações por LIAAD

2010

Clustering data streams with weightless neural networks

Autores
Cardoso, DO; Lima, PMV; De Gregorio, M; Gama, J; Franca, FMG;

Publicação
ESANN 2011 proceedings, 19th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning

Abstract
Producing good quality clustering of data streams in real time is a difficult problem, since it is necessary to perform the analysis of data points arriving in a continuous style, with the support of quite limited computational resources. The incremental and evolving nature of the resulting clustering structures must reflect the dynamics of the target data stream. The WiSARD weightless perceptron, and its associated DRASiW extension, are intrinsically capable of, respectively, performing one-shot learning and producing prototypes of the learnt categories. This work introduces a simple generalization of RAM-based neurons in order to explore both weightless neural models in the data stream clustering problem.

2010

Monitoring Incremental Histogram Distribution for Change Detection in Data Streams

Autores
Sebastiao, R; Gama, J; Rodrigues, PP; Bernardes, J;

Publicação
KNOWLEDGE DISCOVERY FROM SENSOR DATA

Abstract
Histograms are a common technique for density estimation and they have been widely used as a tool in exploratory data analysis. Learning histograms from static and stationary data is a well known topic. Nevertheless, very few works discuss this problem when we have a continuous flow of data generated from dynamic environments. The scope of this paper is to detect changes from high-speed time-changing data streams. To address this problem, we construct histograms able to process examples once at the rate they arrive. The main goal of this work is continuously maintain a histogram consistent with the current status of the nature. We study strategies to detect changes in the distribution generating examples, and adapt the histogram to the most recent data by forgetting outdated data. We use the Partition Incremental Discretization algorithm that was designed to learn histograms from high-speed data streams. We present a method to detect whenever a change in the distribution generating examples occurs. The base idea consists of monitoring distributions from two different time windows: the reference window, reflecting the distribution observed in the past; and the current window which receives the most recent data. The current window is cumulative and can have a fixed or an adaptive step depending on the distance between distributions. We compared both distributions using Kullback-Leibler divergence, defining a threshold for change detection decision based on the asymmetry of this measure. We evaluated our algorithm with controlled artificial data sets and compare the proposed approach with nonparametric tests. We also present results with real word data sets from industrial and medical domains. Those results suggest that an adaptive window's step exhibit high probability in change detection and faster detection rates, with few false positives alarms.

2010

Knowledge discovery from sensor data (SensorKDD)

Autores
Chandola, V; Omitaomu, OA; Ganguly, AR; Vatsavai, RR; Chawla, NV; Gama, J; Gaber, MM;

Publicação
SIGKDD Explorations

Abstract

2010

The next generation of transportation systems, greenhouse emissions, and data mining

Autores
Kargupta, H; Gama, J; Fan, W;

Publicação
Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Abstract

2010

Sequential Pattern Mining in Multi-relational Datasets

Autores
Ferreira, CA; Gama, J; Costa, VS;

Publicação
CURRENT TOPICS IN ARTIFICIAL INTELLIGENCE

Abstract
We present a framework designed to mine sequential temporal patterns from multi-relational databases. In order to exploit logic-relational information without using aggregation methodologies, we convert the multi-relational dataset into what we name a multi-sequence database. Each example in a multi-relational target table is coded into a sequence that combines intra-table and inter-table relational temporal information. This allows us to find heterogeneous temporal patterns through standard sequence miners. Our framework is grounded in the excellent results achieved by previous propositionalization strategies. We follow a pipelined approach, where we first use a sequence miner to find frequent sequences in the multi-sequence database. Next, we select the most interesting findings to augment the representational space of the examples. The most interesting sequence patterns are discriminative and class correlated. In the final step we build a classifier model by taking an enlarged target table as input to a classifier algorithm. We evaluate the performance of this work through a motivating application, the hepatitis multi-relational dataset. We prove the effectiveness of our methodology by addressing two problems of the hepatitis dataset.

2010

Bipartite Graphs for Monitoring Clusters Transitions

Autores
Oliveira, M; Gama, J;

Publicação
ADVANCES IN INTELLIGENT DATA ANALYSIS IX, PROCEEDINGS

Abstract
The study of evolution has become an important research issue; especially in the last decade, due to a greater awareness of our world's volatility. As a consequence, a new paradigm has emerged to respond more effectively to a elms of new problems in Data Mining. In this paper we address the problem of monitoring the evolution of clusters and propose the MClusT framework, which was developed along the lines of this new Change Mining paradigm. MClusT includes a taxonomy of transitions, a tracking method based in Graph Theory; and a transition detection algorithm. To demonstrate its feasibility and applicability we present; real world case studies, using datasets extracted from Banco de Portugal and the Portuguese Institute of Statistics. We also test our approach in a benchmark dataset from TSDL. The results are encouraging and demonstrate the ability of MClusT framework to provide an efficient diagnosis of clusters transitions.

  • 424
  • 516