2017
Authors
Silva Fernandes, Sd; Tork, HF; da Gama, JMP;
Publication
2017 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2017, Tokyo, Japan, October 19-21, 2017
Abstract
Link prediction is a social network analysis task whose goal is to predict the links that will appear in the network at future instants. Tensor decomposition-based methods are among the link predictors that exploit the time evolution of networks. A major limitation of these methods is the lack of appropriate approaches for estimating their parameters and initialization. In this paper, we address this problem by proposing a parameter setting method. Our approach uses optimization techniques to drive the search for an adequate choice of parameters and initialization. © 2017 IEEE.
2017
Authors
Silva, JD; Hruschka, ER; Gama, J;
Publication
EXPERT SYSTEMS WITH APPLICATIONS
Abstract
Several algorithms for clustering data streams based on k-Means have been proposed in the literature. However, most of them assume that the number of clusters, k, is known a priori by the user and can be kept fixed throughout the data analysis process. Besides the difficulty in choosing k, data stream clustering poses several challenges, such as handling non-stationary, unbounded data that arrive in an online fashion. In this paper, we propose a Fast Evolutionary Algorithm for Clustering data streams (FEAC-Stream) that allows estimating k automatically from data in an online fashion. FEAC-Stream uses the Page-Hinkley Test to detect degradation in the quality of the induced clusters, thereby triggering an evolutionary algorithm that re-estimates k accordingly. FEAC-Stream relies on the assumption that clusters of (partially unknown) data can provide useful information about the dynamics of the data stream. We illustrate the potential of FEAC-Stream in a set of experiments using both synthetic and real-world data streams, comparing it to four related algorithms, namely CluStream-OMRk, CluStream-BkM, StreamKM++-OMRk and StreamKM++-BkM. The obtained results show that FEAC-Stream provides good data partitions and that it can detect, and accordingly react to, data changes.
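As a rough illustration of the change-detection step mentioned in the abstract, the following is a minimal sketch of a generic Page-Hinkley test for an increase in the mean of a monitored signal. It is not the FEAC-Stream implementation: the class name, the `delta` and `threshold` defaults, the `first_alarm` helper and the idea of feeding it a per-step clustering-quality loss are all assumptions made for this example.

```python
class PageHinkley:
    """Generic Page-Hinkley test for an upward shift in the mean of a signal."""

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # tolerated magnitude of change (assumed default)
        self.threshold = threshold  # alarm threshold, often written lambda (assumed default)
        self.n = 0                  # number of observations seen so far
        self.mean = 0.0             # running mean of the observations
        self.cum = 0.0              # cumulative deviation m_t
        self.cum_min = 0.0          # smallest cumulative deviation seen so far

    def update(self, x):
        """Consume one observation; return True if a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold


def first_alarm(stream, **kwargs):
    """Return the index of the first alarm in `stream`, or None if there is none."""
    detector = PageHinkley(**kwargs)
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None
```

In a setting like the one described, an alarm would be the point at which a re-estimation of k is triggered; which quality measure is monitored and how the thresholds are chosen are details of the paper, not of this sketch.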
2017
Authors
Krawczyk, B; Minku, LL; Gama, J; Stefanowski, J; Wozniak, M;
Publication
INFORMATION FUSION
Abstract
In many applications of information systems, learning algorithms have to act in dynamic environments where data are collected in the form of transient data streams. Compared to static data mining, processing streams imposes new computational requirements for algorithms to incrementally process incoming examples while using limited memory and time. Furthermore, due to the non-stationary characteristics of streaming data, prediction models are often also required to adapt to concept drifts. Among the many recently proposed stream algorithms, ensembles play an important role, in particular for non-stationary environments. This paper surveys research on ensembles for data stream classification as well as regression tasks. Besides presenting a comprehensive spectrum of ensemble approaches for data streams, we also discuss advanced learning concepts such as imbalanced data streams, novelty detection, active and semi-supervised learning, complex data representations and structured outputs. The paper concludes with a discussion of open research problems and lines of future research. Published by Elsevier B.V.
2017
Authors
Gavaldà, Ricard; Zliobaite, Indre; Gama, Joao;
Publication
SoGood@ECML-PKDD
Abstract
2017
Authors
Ruiz, S; Gomes, P; Rodrigues, L; Gama, J;
Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE (EPIA 2017)
Abstract
Emerging markets contain the vast majority of the world's population. Despite the huge number of inhabitants, these markets still lack a proper finance infrastructure. One of the main difficulties felt by customers is access to loans. This limitation arises from the fact that most customers lack a verifiable credit history. As such, traditional banks are unable to provide loans. This paper proposes credit scoring modeling based on non-traditional data, acquired from smartphones, for loan classification processes. We use Logistic Regression (LR) and Support Vector Machine (SVM) models, which are the top performers in traditional banking. We then compare two transformations of the training datasets: creating boolean indicators and recoding using Weight of Evidence (WoE). Our models surpassed the performance of the manual loan application selection process: loans granted through the models' criteria presented fewer overdues, and the approval criteria of the models substantially increased the number of granted loans. Compared to the baseline, the loans approved by meeting the criteria of the SVM model presented -196.80% overdue rate. At the same time, the approval criteria of the SVM model generated 251.53% more loans. This paper shows that credit scoring can be useful in emerging markets. Non-traditional data can be used to build algorithms that identify good borrowers, as in traditional banking.
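For readers unfamiliar with the Weight of Evidence recoding named in the abstract, here is a minimal sketch of one common definition: the log of the ratio between a category's share of good outcomes and its share of bad outcomes. The sign convention, the additive `smoothing` term and the function name are assumptions for illustration; the abstract does not say which variant the authors used.

```python
import math
from collections import Counter


def weight_of_evidence(categories, is_good, smoothing=0.5):
    """Map each category to ln(share of goods / share of bads).

    `smoothing` is an assumed additive constant that avoids division by
    zero when a category has no good or no bad examples.
    """
    good, bad = Counter(), Counter()
    for category, flag in zip(categories, is_good):
        (good if flag else bad)[category] += 1

    levels = set(good) | set(bad)
    total_good = sum(good.values()) + smoothing * len(levels)
    total_bad = sum(bad.values()) + smoothing * len(levels)

    woe = {}
    for category in levels:
        share_good = (good[category] + smoothing) / total_good
        share_bad = (bad[category] + smoothing) / total_bad
        woe[category] = math.log(share_good / share_bad)
    return woe


# Hypothetical toy feature with two levels; True marks a repaid loan.
example = weight_of_evidence(
    ["a", "a", "a", "b", "b", "b"],
    [True, True, False, False, False, True],
)
```

Under this convention a positive value marks a category with relatively more good borrowers than bad ones. The recoded numeric column would then replace the categorical feature before fitting a model such as LR or SVM.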
2017
Authors
Oliveira, Eugenio; Gama, Joao; Vale, Zita A.; Cardoso, Henrique Lopes;
Publication
EPIA
Abstract