Cookies
O website necessita de alguns cookies e outros recursos semelhantes para funcionar. Caso o permita, o INESC TEC irá utilizar cookies para recolher dados sobre as suas visitas, contribuindo, assim, para estatísticas agregadas que permitem melhorar o nosso serviço. Ver mais
Aceitar Rejeitar
  • Menu
Publicações

Publicações por LIAAD

2017

Mobility Mining Using Nonnegative Tensor Factorization

Autores
Nosratabadi, HE; Fanaee T, H; Gama, J;

Publicação
PROGRESS IN ARTIFICIAL INTELLIGENCE (EPIA 2017)

Abstract
Mobility mining has lots of applications in urban planning and transportation systems. In particular, extracting mobility patterns enables service providers to have a global insight about the mobility behaviors which consequently leads to providing better services to the citizens. In the recent years several data mining techniques have been presented to tackle this problem. These methods usually are either spatial extension of temporal methods or temporal extension of spatial methods. However, still a framework that can keep the natural structure of mobility data has not been considered. Non-negative tensor factorizations (NNTF) have shown great applications in topic modelling and pattern recognition. However, unfortunately their usefulness in mobility mining is less explored. In this paper we propose a new mobility pattern mining framework based on a recent non-negative tensor model called BetaNTF. We also present a new approach based on interpretability concept for determination of number of components in the tensor rank selection process. We later demonstrate some meaningful mobility patterns extracted with the proposed method from bike sharing network mobility data in Boston, USA.

2017

WCDS: A Two-Phase Weightless Neural System for Data Stream Clustering

Autores
Cardoso, DO; Franca, FMG; Gama, J;

Publicação
NEW GENERATION COMPUTING

Abstract
Clustering is a powerful and versatile tool for knowledge discovery, able to provide a valuable information for data analysis in various domains. To perform this task based on streaming data is quite challenging: outdated knowledge needs to be disposed while the current knowledge is obtained from fresh data; since data are continuously flowing, strict efficiency constraints have to be met. This paper presents WCDS, an approach to this problem based on the WiSARD artificial neural network model. This model already had useful characteristics as inherent incremental learning capability and patent functioning speed. These were combined with novel features as an adaptive countermeasure to cluster imbalance, a mechanism to discard expired data, and offline clustering based on a pairwise similarity measure for WiSARD discriminators. In an insightful experimental evaluation, the proposed system had an excellent performance according to multiple quality standards. This supports its applicability for the analysis of data streams.

2017

Weightless neural networks for open set recognition

Autores
Cardoso, DO; Gama, J; Franca, FMG;

Publicação
MACHINE LEARNING

Abstract
Open set recognition is a classification-like task. It is accomplished not only by the identification of observations which belong to targeted classes (i.e., the classes among those represented in the training sample which should be later recognized) but also by the rejection of inputs from other classes in the problem domain. The need for proper handling of elements of classes beyond those of interest is frequently ignored, even in works found in the literature. This leads to the improper development of learning systems, which may obtain misleading results when evaluated in their test beds, consequently failing to keep the performance level while facing some real challenge. The adaptation of a classifier for open set recognition is not always possible: the probabilistic premises most of them are built upon are not valid in a open-set setting. Still, this paper details how this was realized for WiSARD a weightless artificial neural network model. Such achievement was based on an elaborate distance-like computation this model provides and the definition of rejection thresholds during training. The proposed methodology was tested through a collection of experiments, with distinct backgrounds and goals. The results obtained confirm the usefulness of this tool for open set recognition.

2017

Fading histograms in detecting distribution and concept changes

Autores
Sebastião, R; Gama, J; Mendonça, T;

Publicação
I. J. Data Science and Analytics

Abstract
The remarkable number of real applications under dynamic scenarios is driving a novel ability to generate and gatherinformation.Nowadays,amassiveamountofinforma- tion is generated at a high-speed rate, known as data streams. Moreover, data are collected under evolving environments. Due to memory restrictions, data must be promptly processed and discarded immediately. Therefore, dealing with evolving data streams raises two main questions: (i) how to remember discarded data? and (ii) how to forget outdated data? To main- tain an updated representation of the time-evolving data, this paper proposes fading histograms. Regarding the dynamics of nature, changes in data are detected through a windowing scheme that compares data distributions computed by the fading histograms: the adaptive cumulative windows model (ACWM). The online monitoring of the distance between data distributions is evaluated using a dissimilarity measure based on the asymmetry of the Kullback–Leibler divergence.The experimental results support the ability of fading his- tograms in providing an updated representation of data. Such property works in favor of detecting distribution changes with smaller detection delay time when compared with stan- dard histograms. With respect to the detection of concept changes, the ACWM is compared with 3 known algorithms taken from the literature, using artificial data and using pub- lic data sets, presenting better results. Furthermore, we the proposed method was extended for multidimensional and the experiments performed show the ability of the ACWM for detecting distribution changes in these settings.

2017

The Initialization and Parameter Setting Problem in Tensor Decomposition-Based Link Prediction

Autores
Silva Fernandes, Sd; Tork, HF; da Gama, JMP;

Publicação
2017 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2017, Tokyo, Japan, October 19-21, 2017

Abstract
Link prediction is the task of social network analysis whose goal is to predict the links that will appear in the network in future instants. Among the link predictors exploiting the time evolution of the networks, we can find the tensor decomposition-based methods. A major limitation of these methods is the lack of appropriate approaches for estimating their parameters and initialization. In this paper, we address this problem by proposing a parameter setting method. Our proposed approach resorts to optimization techniques to drive the search for an adequate parameter and initialization choice. © 2017 IEEE.

2017

An evolutionary algorithm for clustering data streams with a variable number of clusters

Autores
Silva, JD; Hruschka, ER; Gama, J;

Publicação
EXPERT SYSTEMS WITH APPLICATIONS

Abstract
Several algorithms for clustering data streams based on k-Means have been proposed in the literature. However, most of them assume that the number of clusters, k, is known a priori by the user and can be kept fixed throughout the data analySis process. Besides the difficulty in choosing k, data stream clustering imposes several challenges to be addressed, such as addressing non-stationary, unbounded data that arrive in an online fashion. In this paper, we propose a Fast Evolutionary Algorithm for Clustering data streams (FEAC-Stream) that allows estimating k automatically from data in an online fashion. FEAC-Stream uses the Page-Hinkley Test to detect eventual degradation in the quality of the induced clusters, thereby triggering an evolutionary algorithm that re-estimates k accordingly. FEAC-Stream relies on the assumption that clusters of (partially unknown) data can provide useful information about the dynamics of the data stream. We illustrate the potential of FEAC-Stream in a set of experiments using both synthetic and real-world data streams, comparing it to four related algorithms, namely: CluStream-OMRk, CluStream-BkM, StreamKM++-OMRk and StreamKM++-BkM. The obtained results show that FEAC-Stream provides good data partitions and that it can detect, and accordingly react to, data changes.

  • 248
  • 503