
Publications by João Vinagre

2018

Online bagging for recommender systems

Authors
Vinagre, J; Jorge, AM; Gama, J;

Publication
EXPERT SYSTEMS

Abstract
Ensemble methods have been successfully used in the past to improve recommender systems; however, they have never been studied with incremental recommendation algorithms. Many online recommender systems deal with continuous, potentially fast, and unbounded flows of data (big data streams) and often need to be responsive to fresh user feedback, adjusting recommendations accordingly. This is clear in tasks such as social network feeds, news recommender systems, automatic playlist completion, and other similar applications. Batch ensemble approaches are not suitable for continuous learning, given the complexity of retraining new models on demand. In this paper, we adapt a general-purpose online bagging algorithm for top-N recommendation tasks and propose two novel online bagging methods specifically tailored for recommender systems. We evaluate the three approaches using an incremental matrix factorization algorithm for top-N recommendation with positive-only user feedback data as the base model. Our results show that online bagging is able to improve accuracy by up to 55% over the baseline, with manageable computational overhead.
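The abstract does not spell out the bagging procedure. As a minimal sketch, assuming the general-purpose algorithm is Oza and Russell's online bagging (which approximates bootstrap resampling on a stream by showing each incoming event to each ensemble member k times, with k drawn from Poisson(1)), the code below wraps a toy incremental matrix factorization that treats every positive-only (user, item) event as a target score of 1. The classes `IncrementalMF` and `OnlineBagging`, their hyper-parameters, and the mean-score aggregation are hypothetical illustrations, not the authors' base model or their two tailored bagging variants.

```python
import math
import random


class IncrementalMF:
    """Toy incremental matrix factorization for positive-only feedback (hypothetical).

    Every observed (user, item) pair is treated as a target score of 1 and
    triggers one SGD step on the two latent vectors involved.
    """

    def __init__(self, k=10, lr=0.05, reg=0.01, seed=0):
        self.k, self.lr, self.reg = k, lr, reg
        self.rng = random.Random(seed)
        self.U = {}  # user id -> latent vector
        self.V = {}  # item id -> latent vector

    def _vec(self, table, key):
        # lazily create small random vectors for unseen users/items
        if key not in table:
            table[key] = [self.rng.gauss(0.0, 0.1) for _ in range(self.k)]
        return table[key]

    def update(self, user, item):
        u, v = self._vec(self.U, user), self._vec(self.V, item)
        err = 1.0 - sum(a * b for a, b in zip(u, v))
        for f in range(self.k):
            uf, vf = u[f], v[f]
            u[f] += self.lr * (err * vf - self.reg * uf)
            v[f] += self.lr * (err * uf - self.reg * vf)

    def score(self, user, item):
        if user not in self.U or item not in self.V:
            return 0.0
        return sum(a * b for a, b in zip(self.U[user], self.V[item]))


def poisson(rng, lam=1.0):
    """Knuth's method: count uniform draws until their running product falls to e^-lam."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


class OnlineBagging:
    """Oza-style online bagging: each member sees each event Poisson(1) times."""

    def __init__(self, n_models=5, seed=0):
        self.rng = random.Random(seed)
        self.models = [IncrementalMF(seed=seed + i + 1) for i in range(n_models)]

    def update(self, user, item):
        for model in self.models:
            for _ in range(poisson(self.rng)):
                model.update(user, item)

    def recommend(self, user, candidates, n=10):
        # rank candidates by the mean score across ensemble members
        def mean_score(item):
            return sum(m.score(user, item) for m in self.models) / len(self.models)
        return sorted(candidates, key=mean_score, reverse=True)[:n]
```

Averaging member scores is only one aggregation choice; a real recommender would also filter out items the user has already consumed before returning the top-N list.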

2018

Online Gradient Boosting for Incremental Recommender Systems

Authors
Vinagre, J; Jorge, AM; Gama, J;

Publication
Discovery Science - 21st International Conference, DS 2018, Limassol, Cyprus, October 29-31, 2018, Proceedings

Abstract
Ensemble models have been proven successful for batch recommendation algorithms; however, they have not been well studied in streaming applications. Such applications typically use incremental learning, to which standard ensemble techniques are not trivially applicable. In this paper, we study the application of three variants of online gradient boosting to top-N recommendation tasks with implicit data in a streaming data environment. Weak models are built using a simple incremental matrix factorization algorithm for implicit feedback. Our results show a significant improvement of up to 40% over the baseline standalone model. We also show that the overhead of running multiple weak models is easily manageable in stream-based applications. © 2018, Springer Nature Switzerland AG.
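The three boosting variants are not described in the abstract. The sketch below shows only the general idea of online gradient boosting, assuming squared loss and a positive-only target of 1: each weak incremental factorization model is updated toward the residual left by the models before it, and the prediction is a shrunken sum over all members. `WeakMF`, `OnlineBoostedMF`, the shrinkage value and this sequential residual scheme are illustrative assumptions, not one of the variants evaluated in the paper.

```python
import random


class WeakMF:
    """Tiny incremental MF weak learner regressing an arbitrary target (hypothetical)."""

    def __init__(self, k=4, lr=0.05, reg=0.01, seed=0):
        self.k, self.lr, self.reg = k, lr, reg
        self.rng = random.Random(seed)
        self.U, self.V = {}, {}  # user/item id -> latent vector

    def _vec(self, table, key):
        if key not in table:
            table[key] = [self.rng.gauss(0.0, 0.1) for _ in range(self.k)]
        return table[key]

    def predict(self, user, item):
        if user not in self.U or item not in self.V:
            return 0.0
        return sum(a * b for a, b in zip(self.U[user], self.V[item]))

    def update(self, user, item, target):
        # one SGD step on squared error toward `target`
        u, v = self._vec(self.U, user), self._vec(self.V, item)
        err = target - sum(a * b for a, b in zip(u, v))
        for f in range(self.k):
            uf, vf = u[f], v[f]
            u[f] += self.lr * (err * vf - self.reg * uf)
            v[f] += self.lr * (err * uf - self.reg * vf)


class OnlineBoostedMF:
    """Additive ensemble: learner j is trained on the residual left by learners 0..j-1."""

    def __init__(self, n_learners=3, shrinkage=0.5, seed=0):
        self.eta = shrinkage
        self.learners = [WeakMF(seed=seed + j) for j in range(n_learners)]

    def predict(self, user, item):
        return sum(self.eta * m.predict(user, item) for m in self.learners)

    def update(self, user, item, target=1.0):
        partial = 0.0
        for m in self.learners:
            # for squared loss, the negative gradient at `partial` is the residual
            m.update(user, item, target - partial)
            partial += self.eta * m.predict(user, item)
```

With shrinkage 0.5 and three learners that each fit their residual exactly, the ensemble output on a repeated event tends toward 1 - 0.5^3 = 0.875, so the number of learners and the shrinkage jointly bound how closely the ensemble can fit a single event.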

2018

Self Hyper-parameter Tuning for Stream Recommendation Algorithms

Authors
Veloso, B; Gama, J; Malheiro, B; Vinagre, J;

Publication
ECML PKDD 2018 Workshops - DMLE 2018 and IoTStream 2018, Dublin, Ireland, September 10-14, 2018, Revised Selected Papers

Abstract
E-commerce platforms exploit the interaction between users and digital content (user-generated streams of events) to build and maintain dynamic user preference models, which are used to make meaningful recommendations. However, the accuracy of these incremental models is critically affected by the choice of hyper-parameters. So far, the incremental recommendation algorithms used to process data streams rely on human expertise for hyper-parameter tuning. In this work, we apply our Self Hyper-Parameter Tuning (SPT) algorithm to incremental recommendation algorithms. SPT adapts the Nelder-Mead optimisation algorithm to perform hyper-parameter tuning. First, it creates three models with random hyper-parameter values; then, at dynamically sized intervals, it assesses the models and applies the Nelder-Mead operators to update their hyper-parameters until the models converge. The main contribution of this work is the adaptation of the SPT method to incremental matrix factorisation recommendation algorithms. The proposed method was evaluated with well-known recommendation data sets. The results show that SPT systematically improves data stream recommendations.
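The Nelder-Mead operators mentioned above (reflection, expansion, contraction and shrink) can be sketched as a single update of a three-vertex simplex. The sketch assumes two tuned hyper-parameters, so that the three models form a 2-D simplex, and uses the textbook coefficients 1, 2, 0.5 and 0.5. In SPT, `f` would be an error measured on the stream for the model trained with each configuration; here it is any deterministic function, and `nelder_mead_step` and the `Point` layout are hypothetical names, not the SPT implementation.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]  # e.g. (learning_rate, regularization); hypothetical


def nelder_mead_step(simplex: List[Point], f: Callable[[Point], float]) -> List[Point]:
    """One Nelder-Mead iteration on a 3-vertex (2-D) simplex, minimizing f."""
    best, good, worst = sorted(simplex, key=f)
    # centroid of every vertex except the worst one
    c = ((best[0] + good[0]) / 2, (best[1] + good[1]) / 2)

    def along(t: float) -> Point:
        # c + t * (c - worst): t=1 reflects, t=2 expands, t=0.5 / -0.5 contract
        return (c[0] + t * (c[0] - worst[0]), c[1] + t * (c[1] - worst[1]))

    r = along(1.0)
    if f(r) < f(best):
        e = along(2.0)
        return [best, good, e if f(e) < f(r) else r]
    if f(r) < f(good):
        return [best, good, r]
    if f(r) < f(worst):
        oc = along(0.5)  # outside contraction
        if f(oc) <= f(r):
            return [best, good, oc]
    else:
        ic = along(-0.5)  # inside contraction
        if f(ic) < f(worst):
            return [best, good, ic]
    # shrink: pull the two other vertices halfway toward the best one
    return [best] + [((best[0] + p[0]) / 2, (best[1] + p[1]) / 2) for p in (good, worst)]
```

For example, repeatedly applying the step to f(p) = (p0 - 1)^2 + (p1 + 2)^2, starting from the simplex (0, 0), (1, 0), (0, 1), moves the best vertex toward (1, -2).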

2019

ORSUM 2019: 2nd Workshop on Online Recommender Systems and User Modeling

Authors
Vinagre, J; Jorge, AM; Bifet, A; Al Ghossein, M;

Publication
RECSYS 2019: 13TH ACM CONFERENCE ON RECOMMENDER SYSTEMS

Abstract
The ever-growing nature of user-generated data in online systems poses obvious challenges to how we process such data. Typically, this issue is regarded as a scalability problem and has been mainly addressed with distributed algorithms able to train on massive amounts of data in short time windows. However, data inevitably keeps accumulating at high speed, and eventually some of it must be discarded or archived. Moreover, the dynamic nature of data in user modeling and recommender systems, such as changes in user preferences, and the continuous introduction of new users and items make it increasingly difficult to maintain up-to-date, accurate recommendation models. The objective of this workshop is to bring together researchers and practitioners interested in incremental and adaptive approaches to stream-based user modeling, recommendation and personalization, including algorithms, evaluation issues, incremental content and context mining, privacy and transparency, temporal recommendation, and software frameworks for continuous learning.

2021

Statistically Robust Evaluation of Stream-Based Recommender Systems

Authors
Vinagre, J; Jorge, AM; Rocha, C; Gama, J;

Publication
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING

Abstract
Online incremental models for recommendation are nowadays pervasive in both industry and academia. However, there is not yet a standard evaluation methodology for the algorithms that maintain such models. Moreover, online evaluation methodologies available in the literature generally fall short on the statistical validation of results, since this validation is not trivially applicable to stream-based algorithms. We propose a k-fold validation framework for the pairwise comparison of recommendation algorithms that learn from user feedback streams, using prequential evaluation. Our proposal enables continuous statistical testing on adaptive-size sliding windows over the outcome of the prequential process, allowing practitioners and researchers to make decisions in real time based on solid statistical evidence. We present a set of experiments to gain insights into the sensitivity and robustness of two statistical tests, McNemar's and the Wilcoxon signed-rank test, in a streaming data environment. Our results show that, besides allowing a real-time, fine-grained online assessment, the online versions of the statistical tests are at least as robust as the batch versions, and definitely more robust than a simple prequential single-fold approach.
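McNemar's test compares two algorithms on paired binary outcomes using only the discordant pairs: with b events where only algorithm A hits and c events where only B hits, the continuity-corrected statistic (|b - c| - 1)^2 / (b + c) is compared with the chi-square critical value for one degree of freedom (3.841 at alpha = 0.05). The sketch below applies it to prequential hits (whether the item the user actually consumed appeared in each algorithm's top-N list before the model learned from that event) over a sliding window. It assumes a fixed window size, whereas the paper uses adaptive-size windows within a k-fold scheme; `SlidingMcNemar` is a hypothetical name.

```python
from collections import deque


class SlidingMcNemar:
    """McNemar's test on the last `window` paired prequential outcomes (fixed-size window)."""

    CHI2_1DF_05 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05

    def __init__(self, window: int = 1000):
        self.buf = deque(maxlen=window)
        self.a_only = 0  # events in the window where only algorithm A hit
        self.b_only = 0  # events in the window where only algorithm B hit

    def _count(self, hit_a: bool, hit_b: bool, delta: int) -> None:
        if hit_a and not hit_b:
            self.a_only += delta
        elif hit_b and not hit_a:
            self.b_only += delta

    def add(self, hit_a: bool, hit_b: bool) -> None:
        # remove the oldest outcome's contribution before the deque drops it
        if len(self.buf) == self.buf.maxlen:
            self._count(*self.buf[0], -1)
        self.buf.append((hit_a, hit_b))
        self._count(hit_a, hit_b, +1)

    def statistic(self) -> float:
        n = self.a_only + self.b_only
        if n == 0:
            return 0.0
        # continuity-corrected McNemar statistic
        return max(0, abs(self.a_only - self.b_only) - 1) ** 2 / n

    def significant(self) -> bool:
        return self.statistic() > self.CHI2_1DF_05
```

Because the counts are updated incrementally on every event, the test can be re-evaluated after each prequential step at constant cost.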

2021

A Hybrid Recommender System for Improving Automatic Playlist Continuation

Authors
Gatzioura, A; Vinagre, J; Jorge, AM; Sanchez Marre, M;

Publication
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING

Abstract
Although widely used, the majority of current music recommender systems still focus on recommendation accuracy, user preferences and isolated item characteristics, without evaluating other important factors, like joint item selections and the recommendation moment. However, when it comes to playlist recommendations, additional dimensions, as well as the notion of user experience and perception, should be taken into account to improve recommendation quality. In this work, we propose HybA, a hybrid recommender system for automatic playlist continuation that combines Latent Dirichlet Allocation and Case-Based Reasoning. This system aims to address "similar concepts" rather than similar users. Beyond generating a playlist from user requirements, as automatic playlist generation methods do, HybA identifies the semantic characteristics of a started playlist and reuses the most similar past ones to recommend relevant playlist continuations. In addition, it supports beyond-accuracy dimensions, like increased coherence or the discovery of diverse items. A graph model is used to overcome the semantic gap between music descriptions and user preferences, identify playlist structures, and capture song similarity. Experiments on real datasets have shown that the proposed algorithm is able to outperform other state-of-the-art techniques in terms of accuracy, while balancing diversity and coherence.
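The retrieve-and-reuse idea behind HybA's case-based reasoning step can be sketched as follows, assuming each past playlist (a "case") already has a topic distribution, for instance inferred with LDA, and that similarity is cosine similarity between topic vectors. The case-base layout, `cosine`, `continue_playlist` and its parameters are hypothetical; the graph model and the diversity/coherence balancing described in the abstract are not shown.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def continue_playlist(query_topics: Sequence[float], seed_songs: Set[str],
                      case_base: List[Dict], n: int = 5, k: int = 2) -> List[str]:
    """Retrieve the k past playlists closest in topic space and reuse their unseen songs."""
    ranked = sorted(case_base, key=lambda case: cosine(query_topics, case["topics"]),
                    reverse=True)
    continuation: List[str] = []
    for case in ranked[:k]:          # retrieve: most similar past playlists first
        for song in case["songs"]:   # reuse: take songs not already in the playlist
            if song not in seed_songs and song not in continuation:
                continuation.append(song)
                if len(continuation) == n:
                    return continuation
    return continuation
```

For example, a started playlist containing `s1` with topic vector [0.8, 0.2] retrieves the case with topics [0.9, 0.1] first and reuses its remaining songs.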
