Publications

Publications by João Vinagre

2021

Hyperparameter self-tuning for data streams

Authors
Veloso, B; Gama, J; Malheiro, B; Vinagre, J

Publication
INFORMATION FUSION

Abstract
The number of Internet of Things devices generating data streams is expected to grow exponentially with the support of emergent technologies such as 5G networks. Therefore, the online processing of these data streams requires the design and development of suitable machine learning algorithms, able to learn online, as data is generated. Like their batch-learning counterparts, stream-based learning algorithms require careful hyperparameter settings. However, this problem is exacerbated in online learning settings, especially with the occurrence of concept drifts, which frequently require the reconfiguration of hyperparameters. In this article, we present SSPT, an extension of the Self Parameter Tuning (SPT) optimisation algorithm for data streams. We apply the Nelder-Mead algorithm to dynamically-sized samples, converging to optimal settings in a single pass over data while using a relatively small number of hyperparameter configurations. In addition, our proposal automatically readjusts hyperparameters when concept drift occurs. To assess the effectiveness of SSPT, the algorithm is evaluated with three different machine learning problems: recommendation, regression, and classification. Experiments with well-known data sets show that the proposed algorithm can outperform previous hyperparameter tuning efforts by human experts. Results also show that SSPT converges significantly faster and presents at least similar accuracy when compared with the previous double-pass version of the SPT algorithm.
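The abstract above names the Nelder-Mead algorithm as the search procedure. The following is a minimal sketch of a generic Nelder-Mead simplex search in plain Python, not the SSPT algorithm itself: it does not use dynamically sized stream samples or concept drift detection. The function `toy_loss` and its two hyperparameters are hypothetical stand-ins for the validation error of a real model.

```python
def nelder_mead(loss, simplex, iterations=200,
                alpha=1.0, gamma=2.0, rho=0.5, sigma=0.5):
    """Minimise `loss` from `simplex`, a list of n+1 points with n coordinates each."""
    points = [list(p) for p in simplex]
    for _ in range(iterations):
        points.sort(key=loss)
        best, worst = points[0], points[-1]
        n = len(best)
        # Centroid of every point except the worst one.
        centroid = [sum(p[i] for p in points[:-1]) / n for i in range(n)]

        def along(t):
            # Point on the line through the worst point and the centroid.
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(alpha)
        if loss(reflected) < loss(best):
            # Reflection beat the best point: try going further (expansion).
            expanded = along(gamma)
            points[-1] = expanded if loss(expanded) < loss(reflected) else reflected
        elif loss(reflected) < loss(points[-2]):
            points[-1] = reflected
        else:
            # Reflection did not help: contract towards the centroid.
            contracted = along(-rho)
            if loss(contracted) < loss(worst):
                points[-1] = contracted
            else:
                # Last resort: shrink the whole simplex towards the best point.
                points = [best] + [
                    [b + sigma * (x - b) for b, x in zip(best, p)]
                    for p in points[1:]
                ]
    return min(points, key=loss)


def toy_loss(params):
    # Hypothetical error surface with its minimum at (0.1, 0.01).
    learning_rate, regularisation = params
    return (learning_rate - 0.1) ** 2 + (regularisation - 0.01) ** 2


best = nelder_mead(toy_loss, [[0.5, 0.5], [0.6, 0.5], [0.5, 0.6]])
```

With two hyperparameters the simplex holds only three configurations at a time, which illustrates why a simplex search needs relatively few hyperparameter configurations.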

2021

ORSUM 2021 - 4th Workshop on Online Recommender Systems and User Modeling

Authors
Vinagre, J; Jorge, AM; Al Ghossein, M; Bifet, A

Publication
15TH ACM CONFERENCE ON RECOMMENDER SYSTEMS (RECSYS 2021)

Abstract
Modern online services continuously generate data at very fast rates. This continuous flow of data encompasses content (e.g. posts, news, products, comments), user feedback (e.g. ratings, views, reads, clicks), and context data (user device, spatial or temporal data, user task or activity, weather). This can be overwhelming for systems and algorithms designed to train in batches, given the continuous and potentially fast change of content, context and user preferences or intents. Therefore, it is important to investigate online methods able to transparently adapt to the inherent dynamics of online services. Incremental models that learn from data streams are gaining attention in the recommender systems community, given their natural ability to deal with the continuous flows of data generated in dynamic, complex environments. User modeling and personalization can particularly benefit from algorithms capable of maintaining models incrementally and online. The objective of this workshop is to foster contributions and bring together a growing community of researchers and practitioners interested in online, adaptive approaches to user modeling, recommendation and personalization, and their implications regarding multiple dimensions, such as evaluation, reproducibility, privacy and explainability.

2022

Preface to the special issue on dynamic recommender systems and user models

Authors
Vinagre, J; Jorge, AM; Al-Ghossein, M; Bifet, A; Cremonesi, P

Publication
USER MODELING AND USER-ADAPTED INTERACTION

Abstract
[No abstract available]

2022

Flexible Fine-grained Data Access Management for Hyperledger Fabric

Authors
Parente, J; Alonso, AN; Coelho, F; Vinagre, J; Bastos, P

Publication
2022 FOURTH INTERNATIONAL CONFERENCE ON BLOCKCHAIN COMPUTING AND APPLICATIONS (BCCA)

Abstract
As blockchains go beyond cryptocurrencies into applications in multiple industries such as Insurance, Healthcare and Banking, handling personal or sensitive data, data access control becomes increasingly relevant. Access control mechanisms proposed so far are mostly based on requester identity, particularly for permissioned blockchain platforms, and are limited to binary, all-or-nothing access decisions. This is the case with Hyperledger Fabric's native access control mechanisms and, as permission updates require consensus, these fall short regarding the flexibility required to address GDPR-derived policies and client consent management. We propose SDAM, a novel access control mechanism for Fabric that enables fine-grained and dynamic control policies, using both contextual and resource attributes for decisions. Instead of binary results, decisions may also include mandatory data transformations so as to conform with the expressed policy, all without modifications to Fabric. Results show that SDAM's overhead w.r.t. baseline Fabric is acceptable. The scalability of the approach w.r.t. the number of concurrent clients is also evaluated and found to follow Fabric's.

2023

Privacy-Preserving Machine Learning in Life Insurance Risk Prediction

Authors
Pereira, K; Vinagre, J; Alonso, AN; Coelho, F; Carvalho, M

Publication
MACHINE LEARNING AND PRINCIPLES AND PRACTICE OF KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT II

Abstract
The application of machine learning to insurance risk prediction requires learning from sensitive data. This raises multiple ethical and legal issues. One of the most relevant ones is privacy. However, privacy-preserving methods can potentially hinder the predictive potential of machine learning models. In this paper, we present preliminary experiments with life insurance data using two privacy-preserving techniques: discretization and encryption. Our objective with this work is to assess the impact of such privacy preservation techniques on the accuracy of ML models. We instantiate the problem in three general but plausible use cases involving the prediction of insurance claims within a 1-year horizon. Our preliminary experiments suggest that discretization and encryption have negligible impact on the accuracy of ML models.
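As an illustration of the discretization technique mentioned in the abstract above, the following minimal sketch replaces exact numeric attributes with equal-width bin labels, so that a model only sees a coarse range rather than the precise value. The abstract does not state which binning scheme the authors used; equal-width bins, integer-valued attributes, and the field names `age`, `bmi` and `smoker` are all assumptions made for this example.

```python
def discretize(value, bin_width):
    """Map an integer value to the label of the equal-width bin containing it."""
    lower = int(value // bin_width) * bin_width
    return f"{lower}-{lower + bin_width - 1}"


def discretize_record(record, bin_widths):
    """Return a copy of `record` with the fields listed in `bin_widths` binned."""
    return {
        field: discretize(value, bin_widths[field]) if field in bin_widths else value
        for field, value in record.items()
    }


# Hypothetical applicant record: exact age and BMI are replaced by ranges,
# while fields without a configured bin width pass through unchanged.
applicant = {"age": 37, "bmi": 27, "smoker": True}
coarse = discretize_record(applicant, {"age": 10, "bmi": 5})
# coarse == {"age": "30-39", "bmi": "25-29", "smoker": True}
```

Wider bins hide more detail about each individual but leave the model with less information, which is the privacy versus accuracy trade-off the paper sets out to measure.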

2022

Poster: User Sessions on Tor Onion Services: Can Colluding ISPs Deanonymize Them at Scale?

Authors
Lopes, D; Medeiros, P; Dong, JD; Barradas, D; Portela, B; Vinagre, J; Ferreira, B; Christin, N; Santos, N

Publication
Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, CCS 2022, Los Angeles, CA, USA, November 7-11, 2022

Abstract
Tor is the most popular anonymity network in the world. It relies on advanced security and obfuscation techniques to ensure the privacy of its users and free access to the Internet. However, the investigation of traffic correlation attacks against Tor Onion Services (OSes) has been relatively overlooked in the literature. In particular, determining whether it is possible to emulate a global passive adversary capable of deanonymizing the IP addresses of both the Tor OSes and of the clients accessing them has remained, so far, an open question. In this paper, we present ongoing work toward addressing this question and reveal some preliminary results on a scalable traffic correlation attack that can potentially be used to deanonymize Tor OS sessions. Our attack is based on a distributed architecture involving a group of colluding ISPs from across the world. After collecting Tor traffic samples at multiple vantage points, ISPs can run them through a pipeline where several stages of traffic classifiers employ complementary techniques that result in the deanonymization of OS sessions with high confidence (i.e., low false positives). We have responsibly disclosed our early results to the Tor Project team and are currently working not only on improving the effectiveness of our attack but also on developing countermeasures to preserve Tor users' privacy.
