2019
Authors
Li, G; Yang, J; Gama, J; Natwichai, J; Tong, Y;
Publication
DASFAA (2)
Abstract
2019
Authors
Li, G; Yang, J; Gama, J; Natwichai, J; Tong, Y;
Publication
DASFAA Workshops
Abstract
2019
Authors
Veloso, B; Martins, C; Espanha, R; Azevedo, R; Gama, J;
Publication
BigMine@KDD
Abstract
The high asymmetry of international termination rates, where calls are charged at higher values, is fertile ground for the appearance of fraud in telecom companies. In this paper, we present a solution to a real problem called Interconnect Bypass Fraud. This problem is one of the most significant in the telecommunication domain and can be detected by the occurrence of bursts of calls from specific numbers. Based on this assumption, we propose the adoption of a new fast forgetting technique that works together with the Lossy Counting algorithm. Our goal is to detect, as soon as possible, items with abnormal behaviours, e.g. bursts of calls, repetitions and mirror behaviours. The results show that our technique not only complements the techniques used by the telecom company but also improves the performance of the Lossy Counting algorithm in terms of runtime, memory used and sensitivity in detecting abnormal behaviours.
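For readers unfamiliar with the baseline named in this abstract, the following is a minimal sketch of the standard Lossy Counting algorithm for approximate frequency counts over a stream. It does not include the authors' fast forgetting technique, which the abstract does not describe. The class name `LossyCounter` and its method names are illustrative choices, not taken from the paper.

```python
import math


class LossyCounter:
    """Standard Lossy Counting (baseline only, no forgetting mechanism)."""

    def __init__(self, epsilon):
        self.epsilon = epsilon                 # maximum relative counting error
        self.width = math.ceil(1 / epsilon)    # number of items per bucket
        self.n = 0                             # items seen so far
        self.entries = {}                      # item -> [count, max_undercount]

    def add(self, item):
        self.n += 1
        bucket = math.ceil(self.n / self.width)
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            # A new item may have been missed in up to (bucket - 1) earlier buckets.
            self.entries[item] = [1, bucket - 1]
        # At each bucket boundary, drop items that cannot be frequent.
        if self.n % self.width == 0:
            self.entries = {
                key: value
                for key, value in self.entries.items()
                if value[0] + value[1] > bucket
            }

    def frequent(self, support):
        """Return items whose stored count is at least (support - epsilon) * n."""
        threshold = (support - self.epsilon) * self.n
        return {k: v[0] for k, v in self.entries.items() if v[0] >= threshold}
```

As a usage illustration, a number that places a burst of calls among many one-off callers remains in the summary, while the one-off callers are pruned at the bucket boundaries.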
2019
Authors
Gomes, HM; Read, J; Bifet, A; Barddal, JP; Gama, J;
Publication
SIGKDD Explor.
Abstract
2020
Authors
Bifet, A; Berlingerio, M; Gama, J; Read, J; Nogueira, AR;
Publication
BigMine@KDD
Abstract
2020
Authors
Aminian, E; Ribeiro, RP; Gama, J;
Publication
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2019, PT II
Abstract
Data are growing fast in today's world, and a large portion of it is in the form of streams. In many situations, data streams are imbalanced, which makes them difficult to use with classical data mining methods. However, mining these special kinds of streams is one of the most attractive research areas. In this paper, we propose two algorithms for learning from imbalanced regression data streams. Both methods are based on Chebychev's inequality, but they apply it in different ways. The first method under-samples the examples with frequent target values, while the second over-samples the examples with rare and extreme target values. This way, the learner focuses on the rare and more difficult cases. We applied our methods to train regression models using two benchmark datasets and two well-known regression algorithms: Perceptron and FIMT-DD. The results of our simulations indicate the usefulness of the proposed methods.
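The abstract does not give the exact sampling rules, so the following is only a rough sketch of how Chebyshev's (Chebychev's) inequality, P(|Y - mean| >= t * std) <= 1/t^2, could drive both strategies on a stream. Everything here is an assumption made for illustration: the helper names, the `min_keep` floor, the keep probability of one minus the bound, and the replication count of ceil(t) are hypothetical and should not be read as the authors' algorithms.

```python
import math


class RunningStats:
    """Online mean and standard deviation of the target (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, y):
        self.n += 1
        delta = y - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (y - self.mean)

    def std(self):
        return math.sqrt(self.m2 / self.n) if self.n > 1 else 0.0


def deviation(stats, y):
    """Distance of y from the running mean, in standard deviations (t)."""
    s = stats.std()
    return abs(y - stats.mean) / s if s > 0 else 0.0


def chebyshev_bound(t):
    """Upper bound on P(|Y - mean| >= t * std); trivial (1.0) for t <= 1."""
    return 1.0 if t <= 1.0 else 1.0 / (t * t)


def keep_probability(t, min_keep=0.1):
    """Assumed under-sampling rule: common targets (small t) are rarely kept."""
    return max(min_keep, 1.0 - chebyshev_bound(t))


def replication_count(t):
    """Assumed over-sampling rule: rare targets (large t) are repeated."""
    return max(1, math.ceil(t))
```

In a streaming loop, each incoming example would first update `RunningStats`, then either be passed to the learner with probability `keep_probability(t)` (under-sampling) or be presented `replication_count(t)` times (over-sampling).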