2018
Authors
Moulton, RH; Viktor, HL; Japkowicz, N; Gama, J;
Publication
Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2018, Dublin, Ireland, September 10-14, 2018, Proceedings, Part I
Abstract
Clustering naturally addresses many of the challenges of data streams and many data stream clustering algorithms (DSCAs) have been proposed. The literature does not, however, provide quantitative descriptions of how these algorithms behave in different circumstances. In this paper we study how the clusterings produced by different DSCAs change, relative to the ground truth, as quantitatively different types of concept drift are encountered. This paper makes two contributions to the literature. First, we propose a method for generating real-valued data streams with precise quantitative concept drift. Second, we conduct an experimental study to provide quantitative analyses of DSCA performance with synthetic real-valued data streams and show how to apply this knowledge to real world data streams. We find that large magnitude and short duration concept drifts are most challenging and that DSCAs with partitioning-based offline clustering methods are generally more robust than those with density-based offline clustering methods. Our results further indicate that increasing the number of classes present in a stream is a more challenging environment than decreasing the number of classes. Code related to this paper is available at: https://doi.org/10.5281/zenodo.1168699, https://doi.org/10.5281/zenodo.1216189, https://doi.org/10.5281/zenodo.1213802, https://doi.org/10.5281/zenodo.1304380. © Springer Nature Switzerland AG 2019.
2018
Authors
Washio, T; Gama, J; Li, Y; Parekh, R; Liu, H; Bifet, A; De Veaux, RD;
Publication
Proceedings - 2017 International Conference on Data Science and Advanced Analytics, DSAA 2017
Abstract
2018
Authors
Gama, J;
Publication
MATEC Web of Conferences
Abstract
2018
Authors
Li, X; Gama, J; Chen, B; Chen, S; Wang, S; Zhu, XH;
Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
2018
Authors
Zgraja, J; Gama, J; Wozniak, M;
Publication
ECML PKDD 2018 Workshops - DMLE 2018 and IoTStream 2018, Dublin, Ireland, September 10-14, 2018, Revised Selected Papers
Abstract
Usually, during data stream classifier learning, we assume that the labels of all incoming examples are available without delay and are used to update the predictive model in use. Unfortunately, this assumption of access to all class labels is naive, and it requires a relatively high labeling budget. As a result, methods that can train data stream classifiers on partially labeled data are highly desirable. Among them, active learning [1] seems to be a promising direction; it focuses on selecting only the most valuable learning examples to be labeled and used to produce an accurate predictive model. However, when designing such a system, we have to ensure that the chosen active learning strategy can handle changes in the data distribution and adapt to them quickly. In this work, we focus on novel active learning strategies designed to tackle such changes effectively. We propose a novel active data stream classifier learning method based on a query-by-clustering approach. Experimental evaluation demonstrates the usefulness of the proposed approach for reducing the labeling cost of classifiers for drifting data streams.
2018
Authors
Bifet, A; Carvalho, A; Gama, J;
Publication
Proceedings of the ACM Symposium on Applied Computing
Abstract