Cookies
O website necessita de alguns cookies e outros recursos semelhantes para funcionar. Caso o permita, o INESC TEC irá utilizar cookies para recolher dados sobre as suas visitas, contribuindo, assim, para estatísticas agregadas que permitem melhorar o nosso serviço. Ver mais
Aceitar Rejeitar
  • Menu
Publicações

Publicações por LIAAD

2024

A Distributed Computing Solution for Privacy-Preserving Genome-Wide Association Studies

Autores
Brito, C; Ferreira, P; Paulo, J;

Publicação

Abstract
AbstractBreakthroughs in sequencing technologies led to an exponential growth of genomic data, providing unprecedented biological in-sights and new therapeutic applications. However, analyzing such large amounts of sensitive data raises key concerns regarding data privacy, specifically when the information is outsourced to third-party infrastructures for data storage and processing (e.g., cloud computing). Current solutions for data privacy protection resort to centralized designs or cryptographic primitives that impose considerable computational overheads, limiting their applicability to large-scale genomic analysis.We introduce Gyosa, a secure and privacy-preserving distributed genomic analysis solution. Unlike in previous work, Gyosafollows a distributed processing design that enables handling larger amounts of genomic data in a scalable and efficient fashion. Further, by leveraging trusted execution environments (TEEs), namely Intel SGX, Gyosaallows users to confidentially delegate their GWAS analysis to untrusted third-party infrastructures. To overcome the memory limitations of SGX, we implement a computation partitioning scheme within Gyosa. This scheme reduces the number of operations done inside the TEEs while safeguarding the users’ genomic data privacy. By integrating this security scheme inGlow, Gyosaprovides a secure and distributed environment that facilitates diverse GWAS studies. The experimental evaluation validates the applicability and scalability of Gyosa, reinforcing its ability to provide enhanced security guarantees. Further, the results show that, by distributing GWASes computations, one can achieve a practical and usable privacy-preserving solution.

2024

Integration of multi-modal datasets to estimate human aging

Autores
Ribeiro, R; Moraes, A; Moreno, M; Ferreira, PG;

Publicação
MACHINE LEARNING

Abstract
Aging involves complex biological processes leading to the decline of living organisms. As population lifespan increases worldwide, the importance of identifying factors underlying healthy aging has become critical. Integration of multi-modal datasets is a powerful approach for the analysis of complex biological systems, with the potential to uncover novel aging biomarkers. In this study, we leveraged publicly available epigenomic, transcriptomic and telomere length data along with histological images from the Genotype-Tissue Expression project to build tissue-specific regression models for age prediction. Using data from two tissues, lung and ovary, we aimed to compare model performance across data modalities, as well as to assess the improvement resulting from integrating multiple data types. Our results demostrate that methylation outperformed the other data modalities, with a mean absolute error of 3.36 and 4.36 in the test sets for lung and ovary, respectively. These models achieved lower error rates when compared with established state-of-the-art tissue-agnostic methylation models, emphasizing the importance of a tissue-specific approach. Additionally, this work has shown how the application of Hierarchical Image Pyramid Transformers for feature extraction significantly enhances age modeling using histological images. Finally, we evaluated the benefits of integrating multiple data modalities into a single model. Combining methylation data with other data modalities only marginally improved performance likely due to the limited number of available samples. Combining gene expression with histological features yielded more accurate age predictions compared with the individual performance of these data types. Given these results, this study shows how machine learning applications can be extended to/in multi-modal aging research. Code used is available at https://github.com/zroger49/multi_modal_age_prediction.

2024

The molecular impact of cigarette smoking resembles aging across tissues

Autores
Ramirez, JM; Ribeiro, R; Soldatkina, O; Moraes, A; García-Pérez, R; Ferreira, PG; Melé, M;

Publicação

Abstract
AbstractTobacco smoke is the main cause of preventable mortality worldwide. Smoking increases the risk of developing many diseases and has been proposed as an aging accelerator. Yet, the molecular mechanisms driving smoking-related health decline and aging acceleration in most tissues remain unexplored. Here, we characterize gene expression, alternative splicing, DNA methylation and histological alterations induced by cigarette smoking across human tissues. We show that smoking impacts tissue architecture and triggers systemic inflammation. We find that in many tissues, the effects of smoking significantly overlap those of aging in the same direction. Specifically, both age and smoking upregulate inflammatory genes and drive hypomethylation at enhancers. In addition, we observe widespread smoking-driven hypermethylation at target regions of the Polycomb repressive complex, which is a well-known aging effect. Smoking-induced epigenetic changes overlap causal aging CpGs, suggesting that these methylation changes may directly mediate aging acceleration observed in smokers. Finally, we find that smoking effects that are shared with aging are more persistent over time. Overall, our multi-tissue and multi-omic analysis of the effects of cigarette smoking provides an extensive characterization of the impact of tobacco smoke across tissues and unravels the molecular mechanisms driving smoking-induced tissue homeostasis decline and aging acceleration.

2024

APAtizer: a tool for alternative polyadenylation analysis of RNA-Seq data

Autores
Sousa, B; Bessa, M; de Mendonca, FL; Ferreira, PG; Moreira, A; Pereira-Castro, I;

Publicação
BIOINFORMATICS

Abstract
APAtizer is a tool designed to analyze alternative polyadenylation events on RNA-sequencing data. The tool handles different file formats, including BAM, htseq, and DaPars bedGraph files. It provides a user-friendly interface that allows users to generate informative visualizations, including Volcano plots, heatmaps, and gene lists. These outputs allow the user to retrieve useful biological insights such as the occurrence of polyadenylation events when comparing two biological conditions. In addition, it can perform differential gene expression, gene ontology analysis, visualization of Venn diagram intersections, and correlation analysis.

2024

Identification of Participants of Narratives Using Knowledge Bases

Autores
Juliana Machado; Evelin Amorim;

Publicação
Anais do XXXIX Simpósio Brasileiro de Banco de Dados (SBBD 2024)

Abstract
Identifying participants in narratives is important to understand and extract meaning from unstructured texts. This paper investigates the use of DBpedia and Wikifier for this task. We tested these two knowledge base platforms to evaluate their performance in recognizing and extracting entities in Portuguese-language journalistic narrative texts. The results show that both DBpedia and Wikifier present similar results in identifying participants, around 0.40 in the f1-score. The objective of this paper is to study the potential of knowledge bases to improve the understanding of narratives, in addition to suggesting directions for future research in this domain.

2024

ISO 24617-8 Applied: Insights from Multilingual Discourse Relations Annotation in English, Polish, and Portuguese

Autores
Tomaszewska, A; Silvano, P; Leal, A; Amorim, E;

Publicação
ISA 2024: 20th Joint ACL - ISO Workshop on Interoperable Semantic Annotation at LREC-COLING 2024, Workshop Proceedings

Abstract
The main objective of this study is to contribute to multilingual discourse research by employing ISO-24617 Part 8 (Semantic Relations in Discourse, Core Annotation Schema – DR-core) for annotating discourse relations. Centering around a parallel discourse relations corpus that includes English, Polish, and European Portuguese, we initiate one of the few ISO-based comparative analyses through a multilingual corpus that aligns discourse relations across these languages. In this paper, we discuss the project’s contributions, including the annotated corpus, research findings, and statistics related to the use of discourse relations. The paper further discusses the challenges encountered in complying with the ISO standard, such as defining the scope of arguments and annotating specific relation types like Expansion. Our findings highlight the necessity for clearer definitions of certain discourse relations and more precise guidelines for argument spans, especially concerning the inclusion of connectives. Additionally, the study underscores the importance of ongoing collaborative efforts to broaden the inclusion of languages and more comprehensive datasets, with the objective of widening the reach of ISO-guided multilingual discourse research. © 2024 ELRA Language Resource Association: CC BY-NC 4.0.

  • 30
  • 469