2022
Authors
Pereira, K; Vinagre, J; Alonso, AN; Coelho, F; Carvalho, M;
Publication
Machine Learning and Principles and Practice of Knowledge Discovery in Databases - International Workshops of ECML PKDD 2022, Grenoble, France, September 19-23, 2022, Proceedings, Part II
Abstract
The application of machine learning to insurance risk prediction requires learning from sensitive data. This raises multiple ethical and legal issues, one of the most relevant being privacy. However, privacy-preserving methods can hinder the predictive potential of machine learning models. In this paper, we present preliminary experiments with life insurance data using two privacy-preserving techniques: discretization and encryption. Our objective is to assess the impact of such privacy preservation techniques on the accuracy of ML models. We instantiate the problem in three general but plausible use cases involving the prediction of insurance claims within a 1-year horizon. Our preliminary experiments suggest that discretization and encryption have negligible impact on the accuracy of ML models. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
2023
Authors
Silva, PR; Vinagre, J; Gama, J;
Publication
38TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2023
Abstract
Dynamic Time Warping (DTW) is a robust method to measure the similarity between two sequences. This paper proposes a method based on DTW to analyse high-speed data streams. The central idea is to decompose the network traffic into sequences of histograms of packet sizes and then calculate the distance between pairs of such sequences using DTW with Kullback-Leibler (KL) distance. As a baseline, we also compute the Euclidean Distance between the sequences of histograms. Since our preliminary experiments indicate that the distance between two sequences falls within a different range of values for distinct types of streams, we then exploit this distance information for stream classification using a Random Forest. The approach was investigated using recent internet traffic data from a telecommunications company. To illustrate the application of our approach, we conducted a case study with encrypted Internet Protocol Television (IPTV) network traffic data. The goal was to use our DTW-based approach to detect the video codec used in the streams, as well as the IPTV channel. Results strongly suggest that the DTW distance value between the data streams is highly informative for such classification tasks.
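The pipeline described in this abstract (packet sizes, then sequences of histograms, then DTW with a KL-based local cost) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fixed-size windowing, the bin edges, and the additive smoothing constant are all assumptions made for illustration, and the abstract does not say which KL variant is used (the plain KL divergence below is asymmetric).

```python
import math
from bisect import bisect_right


def to_histograms(packet_sizes, window, bin_edges):
    """Split packet sizes into fixed-size windows and count sizes per bin.

    Assumption: windows are consecutive, non-overlapping runs of `window` packets.
    """
    hists = []
    for start in range(0, len(packet_sizes), window):
        counts = [0] * (len(bin_edges) + 1)
        for size in packet_sizes[start:start + window]:
            counts[bisect_right(bin_edges, size)] += 1
        hists.append(counts)
    return hists


def kl_divergence(p, q, eps=1e-9):
    """KL divergence D(p || q) between two histograms.

    Assumption: counts are smoothed by `eps` and normalized so that
    empty bins do not produce a division by zero or log(0).
    """
    p = [x + eps for x in p]
    q = [x + eps for x in q]
    sp, sq = sum(p), sum(q)
    return sum((a / sp) * math.log((a / sp) / (b / sq)) for a, b in zip(p, q))


def dtw_distance(seq_a, seq_b, local_dist=kl_divergence):
    """Classic O(n*m) dynamic-programming DTW over two sequences of histograms."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local_dist(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical packet-size traces for two streams (bytes).
stream_a = to_histograms([100, 600, 1400, 200, 1300, 1350], window=2, bin_edges=[500, 1000])
stream_b = to_histograms([1400, 1450, 1300, 90, 1200, 1480], window=2, bin_edges=[500, 1000])
distance = dtw_distance(stream_a, stream_b)
```

In the setting the abstract describes, distances like `distance` would then serve as input features for a classifier such as a Random Forest; that step is omitted here.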
2022
Authors
Vinagre, J; Ghossein, MA; Jorge, AM; Bifet, A; Peska, L;
Publication
ORSUM@RecSys
Abstract
2016
Authors
Moniz, Nuno; Torgo, Luis; Vinagre, Joao;
Publication
CoRR
Abstract
2021
Authors
Costa, P; Cerqueira, V; Vinagre, J;
Publication
CoRR
Abstract
2023
Authors
Silva, PR; Vinagre, J; Gama, J;
Publication
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY
Abstract
Federated learning (FL) is a collaborative, decentralized privacy-preserving method to address the challenges of storing data and data privacy. Artificial intelligence, machine learning, smart devices, and deep learning have strongly marked the last years. Two challenges arose in data science as a result. First, regulators moved to protect data through the General Data Protection Regulation, under which organizations are not allowed to keep or transfer data without the owner's authorization. Second, the large volume of data generated in the era of big data makes keeping that data on a single server increasingly difficult. Therefore, the data is allocated to different locations or generated by devices, creating the need to build models or perform calculations without transferring data to a single location. The term FL emerged as a sub-area of machine learning that aims to solve the challenge of building distributed models with privacy considerations. This survey starts by describing relevant concepts, definitions, and methods, followed by an in-depth investigation of federated model evaluation. Finally, we discuss three promising applications for further research: anomaly detection, distributed data streams, and graph representation.
This article is categorized under: Technologies > Machine Learning; Technologies > Artificial Intelligence