Publications by LIAAD

2023

Privacy-Preserving Machine Learning on Apache Spark

Authors
Brito, CV; Ferreira, PG; Portela, BL; Oliveira, RC; Paulo, JT;

Publication
IEEE ACCESS

Abstract
The adoption of third-party machine learning (ML) cloud services is highly dependent on the security guarantees and the performance penalty they incur on workloads for model training and inference. This paper explores security/performance trade-offs for the distributed Apache Spark framework and its ML library. Concretely, we build upon a key insight: in specific deployment settings, one can reveal carefully chosen non-sensitive operations (e.g. statistical calculations). This allows us to considerably improve the performance of privacy-preserving solutions without exposing the protocol to pervasive ML attacks. In more detail, we propose Soteria, a system for distributed privacy-preserving ML that leverages Trusted Execution Environments (e.g. Intel SGX) to run computations over sensitive information in isolated containers (enclaves). Unlike previous work, where all ML-related computation is performed at trusted enclaves, we introduce a hybrid scheme, combining computation done inside and outside these enclaves. The experimental evaluation validates that our approach reduces the runtime of ML algorithms by up to 41% when compared to previous related work. Our protocol is accompanied by a security proof and a discussion regarding resilience against a wide spectrum of ML attacks.

2023

Mapeamento do Perfil das Mulheres Brasileiras em Processamento de Linguagem Natural

Authors
Helena Caseli; Evelin Amorim; Elisa Terumi Rubel Schneider; Leidiana Iza Andrade Freitas; Jéssica Rodrigues; Maria das Graças V. Nunes;

Publication
Anais do XVII Women in Information Technology (WIT 2023)

Abstract
Knowing the profile of Brazilian women working in Natural Language Processing (NLP) is an important step towards developing policies and programmes aimed at increasing inclusion and diversity in this field. This is the first study carried out in Brazil for this purpose. From data collected through a public survey, Lattes and LinkedIn, we found that the typical profile is a woman with a background in computing or linguistics, working in industry or at a university, but with little ethnic diversity and apparent difficulty in reconciling professional life and motherhood. Looking more specifically at the “Brasileiras em PLN” group, we found a substantial capacity for publication and supervision, but still little collaboration among our members.

2023

Time Series of Counts under Censoring: A Bayesian Approach

Authors
Silva, I; Silva, ME; Pereira, I; McCabe, B;

Publication
ENTROPY

Abstract
Censored data are frequently found in diverse fields including environmental monitoring, medicine, economics and social sciences. Censoring occurs when observations are available only for a restricted range, e.g., due to a detection limit. Ignoring censoring produces biased estimates and unreliable statistical inference. The aim of this work is to contribute to the modelling of time series of counts under censoring using convolution closed infinitely divisible (CCID) models. The emphasis is on estimation and inference problems, using Bayesian approaches with Approximate Bayesian Computation (ABC) and Gibbs sampler with Data Augmentation (GDA) algorithms.

2023

Automatic characterisation of Dansgaard-Oeschger events in palaeoclimate ice records

Authors
Barbosa, S; Silva, ME; Dias, N; Rousseau, D;

Publication

Abstract
Greenland ice core records display abrupt transitions, designated as Dansgaard-Oeschger (DO) events, characterised by episodes of rapid warming (typically decades) followed by a slower cooling. The identification of abrupt transitions is hindered by the typically low resolution and small size of palaeoclimate records, and by their significant temporal variability. Furthermore, the amplitude and duration of DO events vary substantially over the last glacial period, which further hinders the objective identification of abrupt transitions from ice core records. Automatic, purely data-driven methods have the potential to foster the identification of abrupt transitions in palaeoclimate time series in an objective way, complementing the traditional identification of transitions by visual inspection of the time series.

In this study we apply an algorithmic time series method, the Matrix Profile approach, to the analysis of the NGRIP Greenland ice core record, focusing on:
- the ability of the method to retrieve abrupt transitions automatically, by comparing the anomalies identified by the matrix profile method with the expert-based identification of DO events;
- the characterisation of DO events, by classifying DO events in terms of shape and identifying events with similar warming/cooling temporal patterns.

The results for the NGRIP time series show that the matrix profile approach struggles to retrieve all the abrupt transitions that are identified by experts as DO events, the main limitation arising from the diversity in length of DO events and the method’s dependence on fixed-size sub-sequences within the time series. However, the matrix profile method is able to characterise the similarity of shape patterns between DO events in an objective and consistent way.

2023

Zero-shot Classification at Different Levels of Granularity

Authors
Molina, M;

Publication
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

Abstract

2023

Whispered speech segmentation based on Deep Learning

Authors
Nunes, Gonçalo Duarte;

Publication

Abstract