Publications

Publications by Pedro Gabriel Ferreira

2024

A Distributed Computing Solution for Privacy-Preserving Genome-Wide Association Studies

Authors
Brito, C; Ferreira, P; Paulo, J;

Publication

Abstract
Breakthroughs in sequencing technologies led to an exponential growth of genomic data, providing unprecedented biological insights and new therapeutic applications. However, analyzing such large amounts of sensitive data raises key concerns regarding data privacy, specifically when the information is outsourced to third-party infrastructures for data storage and processing (e.g., cloud computing). Current solutions for data privacy protection resort to centralized designs or cryptographic primitives that impose considerable computational overheads, limiting their applicability to large-scale genomic analysis. We introduce Gyosa, a secure and privacy-preserving distributed genomic analysis solution. Unlike in previous work, Gyosa follows a distributed processing design that enables handling larger amounts of genomic data in a scalable and efficient fashion. Further, by leveraging trusted execution environments (TEEs), namely Intel SGX, Gyosa allows users to confidentially delegate their GWAS analysis to untrusted third-party infrastructures. To overcome the memory limitations of SGX, we implement a computation partitioning scheme within Gyosa. This scheme reduces the number of operations done inside the TEEs while safeguarding the users’ genomic data privacy. By integrating this security scheme in Glow, Gyosa provides a secure and distributed environment that facilitates diverse GWAS studies. The experimental evaluation validates the applicability and scalability of Gyosa, reinforcing its ability to provide enhanced security guarantees. Further, the results show that, by distributing GWASes computations, one can achieve a practical and usable privacy-preserving solution.

2022

A systematic evaluation of deep learning methods for the prediction of drug synergy in cancer

Authors
Baptista, D; Ferreira, PG; Rocha, M;

Publication

Abstract
One of the main obstacles to the successful treatment of cancer is the phenomenon of drug resistance. A common strategy to overcome resistance is the use of combination therapies. However, the space of possibilities is huge and efficient search strategies are required. Machine Learning (ML) can be a useful tool for the discovery of novel, clinically relevant anti-cancer drug combinations. In particular, deep learning (DL) has become a popular choice for modeling drug combination effects. Here, we set out to examine the impact of different methodological choices on the performance of multimodal DL-based drug synergy prediction methods, including the use of different input data types, preprocessing steps and model architectures. Focusing on the NCI ALMANAC dataset, we found that feature selection based on prior biological knowledge has a positive impact on performance. Drug features appeared to be more predictive of drug response. Molecular fingerprint-based drug representations performed slightly better than learned representations, and gene expression data of cancer or drug response-specific genes also improved performance. In general, fully connected feature-encoding subnetworks outperformed other architectures, with DL outperforming other ML methods. Using a state-of-the-art interpretability method, we showed that DL models can learn to associate drug and cell line features with drug response in a biologically meaningful way. The strategies explored in this study will help to improve the development of computational methods for the rational design of effective drug combinations for cancer therapy.

Author summary
Cancer therapies often fail because tumor cells become resistant to treatment. One way to overcome resistance is by treating patients with a combination of two or more drugs. Some combinations may be more effective than when considering individual drug effects, a phenomenon called drug synergy. Computational drug synergy prediction methods can help to identify new, clinically relevant drug combinations. In this study, we developed several deep learning models for drug synergy prediction. We examined the effect of using different types of deep learning architectures, and different ways of representing drugs and cancer cell lines. We explored the use of biological prior knowledge to select relevant cell line features, and also tested data-driven feature reduction methods. We tested both precomputed drug features and deep learning methods that can directly learn features from raw representations of molecules. We also evaluated whether including genomic features, in addition to gene expression data, improves the predictive performance of the models. Through these experiments, we were able to identify strategies that will help guide the development of new deep learning models for drug synergy prediction in the future.

2023

Predicting Age from Human Lung Tissue Through Multi-modal Data Integration

Authors
Moraes, A; Moreno, M; Ribeiro, R; Ferreira, G;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
The accurate prediction of biological age can bring important benefits in promoting therapeutic and behavioural strategies for healthy aging. We propose the development of age prediction models using multi-modal datasets, including transcriptomics, methylation and histological images from lung tissue samples of 793 human donors. From a technical point of view this is a challenging problem since not all donors are covered by the same data modalities and the datasets have a very high feature dimensionality with a relatively smaller number of samples. To fairly compare performance across different data types, we’ve created a test set including donors represented in each modality. Given the unique characteristics of the data distribution, we developed gradient boosting tree and convolutional neural network models for each dataset. The performance of the models can be affected by several covariates, including smoking history, and, most importantly, by a skewed distribution of age. Data-centric approaches, including feature engineering, feature selection, data stratification and resampling, proved fundamental in building models that were optimally adapted for each data modality, resulting in significant improvements in model performance for imbalanced regression. The models were then applied to the test set independently, and later combined into a multi-modal ensemble through a voting strategy, predicting age with a median absolute error of 4 years. Even if prediction accuracy remains a challenge, in this work we provide insights to address the difficulties of multi-modal data integration and imbalanced data prediction. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
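
As an illustration of the final step described in this abstract, the sketch below combines per-modality age predictions into one ensemble estimate and scores it with the median absolute error. It is not the authors' implementation: the abstract does not specify the voting rule, so plain averaging over the modalities available for each donor is an assumption, and the function names, modality keys and numbers are hypothetical toy values.

```python
from statistics import mean, median


def ensemble_predict(per_modality):
    """Average the age predictions available for one donor.

    Assumption: the "voting strategy" is modelled as a plain mean.
    A modality the donor is not covered by is given as None and skipped.
    """
    available = [p for p in per_modality.values() if p is not None]
    if not available:
        raise ValueError("no modality produced a prediction for this donor")
    return mean(available)


def median_absolute_error(y_true, y_pred):
    """Median of |true age - predicted age| across donors."""
    return median(abs(t - p) for t, p in zip(y_true, y_pred))


# Hypothetical toy data: three donors, three modalities, some missing.
donors = [
    {"transcriptomics": 52.0, "methylation": 48.0, "histology": 56.0},
    {"transcriptomics": 30.0, "methylation": None, "histology": 36.0},
    {"transcriptomics": 71.0, "methylation": 65.0, "histology": None},
]
true_ages = [50, 35, 60]

predictions = [ensemble_predict(d) for d in donors]
error = median_absolute_error(true_ages, predictions)
print(predictions, error)
```

The median is used rather than the mean of the absolute errors because it is less sensitive to a few badly predicted donors, which matters when the age distribution is skewed.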

2023

Privacy-Preserving Machine Learning on Apache Spark

Authors
Brito, CV; Ferreira, PG; Portela, BL; Oliveira, RC; Paulo, JT;

Publication
IEEE ACCESS

Abstract
The adoption of third-party machine learning (ML) cloud services is highly dependent on the security guarantees and the performance penalty they incur on workloads for model training and inference. This paper explores security/performance trade-offs for the distributed Apache Spark framework and its ML library. Concretely, we build upon a key insight: in specific deployment settings, one can reveal carefully chosen non-sensitive operations (e.g. statistical calculations). This allows us to considerably improve the performance of privacy-preserving solutions without exposing the protocol to pervasive ML attacks. In more detail, we propose Soteria, a system for distributed privacy-preserving ML that leverages Trusted Execution Environments (e.g. Intel SGX) to run computations over sensitive information in isolated containers (enclaves). Unlike previous work, where all ML-related computation is performed at trusted enclaves, we introduce a hybrid scheme, combining computation done inside and outside these enclaves. The experimental evaluation validates that our approach reduces the runtime of ML algorithms by up to 41% when compared to previous related work. Our protocol is accompanied by a security proof and a discussion regarding resilience against a wide spectrum of ML attacks.

2012

An integrated encyclopedia of DNA elements in the human genome

Authors
Dunham, I; Kundaje, A; Aldred, SF; Collins, PJ; Davis, C; Doyle, F; Epstein, CB; Frietze, S; Harrow, J; Kaul, R; Khatun, J; Lajoie, BR; Landt, SG; Lee, BK; Pauli, F; Rosenbloom, KR; Sabo, P; Safi, A; Sanyal, A; Shoresh, N; Simon, JM; Song, L; Trinklein, ND; Altshuler, RC; Birney, E; Brown, JB; Cheng, C; Djebali, S; Dong, XJ; Dunham, I; Ernst, J; Furey, TS; Gerstein, M; Giardine, B; Greven, M; Hardison, RC; Harris, RS; Herrero, J; Hoffman, MM; Iyer, S; Kellis, M; Khatun, J; Kheradpour, P; Kundaje, A; Lassmann, T; Li, QH; Lin, X; Marinov, GK; Merkel, A; Mortazavi, A; Parker, SCJ; Reddy, TE; Rozowsky, J; Schlesinger, F; Thurman, RE; Wang, J; Ward, LD; Whitfield, TW; Wilder, SP; Wu, W; Xi, HLS; Yip, KY; Zhuang, JL; Bernstein, BE; Birney, E; Dunham, I; Green, ED; Gunter, C; Snyder, M; Pazin, MJ; Lowdon, RF; Dillon, LAL; Adams, LB; Kelly, CJ; Zhang, J; Wexler, JR; Green, ED; Good, PJ; Feingold, EA; Bernstein, BE; Birney, E; Crawford, GE; Dekker, J; Elnitski, L; Farnham, PJ; Gerstein, M; Giddings, MC; Gingeras, TR; Green, ED; Guigo, R; Hardison, RC; Hubbard, TJ; Kellis, M; Kent, WJ; Lieb, JD; Margulies, EH; Myers, RM; Snyder, M; Stamatoyannopoulos, JA; Tenenbaum, SA; Weng, ZP; White, KP; Wold, B; Khatun, J; Yu, Y; Wrobel, J; Risk, BA; Gunawardena, HP; Kuiper, HC; Maier, CW; Xie, L; Chen, X; Giddings, MC; Bernstein, BE; Epstein, CB; Shoresh, N; Ernst, J; Kheradpour, P; Mikkelsen, TS; Gillespie, S; Goren, A; Ram, O; Zhang, XL; Wang, L; Issner, R; Coyne, MJ; Durham, T; Ku, M; Truong, T; Ward, LD; Altshuler, RC; Eaton, ML; Kellis, M; Djebali, S; Davis, CA; Merkel, A; Dobin, A; Lassmann, T; Mortazavi, A; Tanzer, A; Lagarde, J; Lin, W; Schlesinger, F; Xue, CH; Marinov, GK; Khatun, J; Williams, BA; Zaleski, C; Rozowsky, J; Roeder, M; Kokocinski, F; Abdelhamid, RF; Alioto, T; Antoshechkin, I; Baer, MT; Batut, P; Bell, I; Bell, K; Chakrabortty, S; Chen, X; Chrast, J; Curado, J; Derrien, T; Drenkow, J; Dumais, E; Dumais, J; Duttagupta, R; Fastuca, M; Fejes Toth, K; Ferreira, P; 
Foissac, S; Fullwood, MJ; Gao, H; Gonzalez, D; Gordon, A; Gunawardena, HP; Howald, C; Jha, S; Johnson, R; Kapranov, P; King, B; Kingswood, C; Li, GL; Luo, OJ; Park, E; Preall, JB; Presaud, K; Ribeca, P; Risk, BA; Robyr, D; Ruan, XA; Sammeth, M; Sandhu, KS; Schaeffer, L; See, LH; Shahab, A; Skancke, J; Suzuki, AM; Takahashi, H; Tilgner, H; Trout, D; Walters, N; Wang, HE; Wrobel, J; Yu, YB; Hayashizaki, Y; Harrow, J; Gerstein, M; Hubbard, TJ; Reymond, A; Antonarakis, SE; Hannon, GJ; Giddings, MC; Ruan, YJ; Wold, B; Carninci, P; Guigo, R; Gingeras, TR; Rosenbloom, KR; Sloan, CA; Learned, K; Malladi, VS; Wong, MC; Barber, G; Cline, MS; Dreszer, TR; Heitner, SG; Karolchik, D; Kent, WJ; Kirkup, VM; Meyer, LR; Long, JC; Maddren, M; Raney, BJ; Furey, TS; Song, LY; Grasfeder, LL; Giresi, PG; Lee, BK; Battenhouse, A; Sheffield, NC; Simon, JM; Showers, KA; Safi, A; London, D; Bhinge, AA; Shestak, C; Schaner, MR; Kim, SK; Zhang, ZZZ; Mieczkowski, PA; Mieczkowska, JO; Liu, Z; McDaniell, RM; Ni, YY; Rashid, NU; Kim, MJ; Adar, S; Zhang, ZC; Wang, TY; Winter, D; Keefe, D; Birney, E; Iyer, VR; Lieb, JD; Crawford, GE; Li, GL; Sandhu, KS; Zheng, MZ; Wang, P; Luo, OJ; Shahab, A; Fullwood, MJ; Ruan, XA; Ruan, YJ; Myers, RM; Pauli, F; Williams, BA; Gertz, J; Marinov, GK; Reddy, TE; Vielmetter, J; Partridge, EC; Trout, D; Varley, KE; Gasper, C; Bansal, A; Pepke, S; Jain, P; Amrhein, H; Bowling, KM; Anaya, M; Cross, MK; King, B; Muratet, MA; Antoshechkin, I; Newberry, KM; Mccue, K; Nesmith, AS; Fisher Aylor, KI; Pusey, B; DeSalvo, G; Parker, SL; Balasubramanian, S; Davis, NS; Meadows, SK; Eggleston, T; Gunter, C; Newberry, JS; Levy, SE; Absher, DM; Mortazavi, A; Wong, WH; Wold, B; Blow, MJ; Visel, A; Pennachio, LA; Elnitski, L; Margulies, EH; Parker, SCJ; Petrykowska, HM; Abyzov, A; Aken, B; Barrell, D; Barson, G; Berry, A; Bignell, A; Boychenko, V; Bussotti, G; Chrast, J; Davidson, C; Derrien, T; Despacio Reyes, G; Diekhans, M; Ezkurdia, I; Frankish, A; Gilbert, J; Gonzalez, JM; 
Griffiths, E; Harte, R; Hendrix, DA; Howald, C; Hunt, T; Jungreis, I; Kay, M; Khurana, E; Kokocinski, F; Leng, J; Lin, MF; Loveland, J; Lu, Z; Manthravadi, D; Mariotti, M; Mudge, J; Mukherjee, G; Notredame, C; Pei, BK; Rodriguez, JM; Saunders, G; Sboner, A; Searle, S; Sisu, C; Snow, C; Steward, C; Tanzer, A; Tapanari, E; Tress, ML; van Baren, MJ; Walters, N; Washietl, S; Wilming, L; Zadissa, A; Zhang, ZD; Brent, M; Haussler, D; Kellis, M; Valencia, A; Gerstein, M; Reymond, A; Guigo, R; Harrow, J; Hubbard, TJ; Landt, SG; Frietze, S; Abyzov, A; Addleman, N; Alexander, RP; Auerbach, RK; Balasubramanian, S; Bettinger, K; Bhardwaj, N; Boyle, AP; Cao, AR; Cayting, P; Charos, A; Cheng, Y; Cheng, C; Eastman, C; Euskirchen, G; Fleming, JD; Grubert, F; Habegger, L; Hariharan, M; Harmanci, A; Iyengar, S; Jin, VX; Karczewski, KJ; Kasowski, M; Lacroute, P; Lam, H; Lamarre Vincent, N; Leng, J; Lian, J; Lindahl Allen, M; Min, RQ; Miotto, B; Monahan, H; Moqtaderi, Z; Mu, XMJ; O'Geen, H; Ouyang, ZQ; Patacsil, D; Pei, BK; Raha, D; Ramirez, L; Reed, B; Rozowsky, J; Sboner, A; Shi, MY; Sisu, C; Slifer, T; Witt, H; Wu, LF; Xu, XQ; Yan, KK; Yang, XQ; Yip, KY; Zhang, ZD; Struhl, K; Weissman, SM; Gerstein, M; Farnham, PJ; Snyder, M; Tenenbaum, SA; Penalva, LO; Doyle, F; Karmakar, S; Landt, SG; Bhanvadia, RR; Choudhury, A; Domanus, M; Ma, LJ; Moran, J; Patacsil, D; Slifer, T; Victorsen, A; Yang, XQ; Snyder, M; White, KP; Auer, T; Centanin, L; Eichenlaub, M; Gruhl, F; Heermann, S; Hoeckendorf, B; Inoue, D; Kellner, T; Kirchmaier, S; Mueller, C; Reinhardt, R; Schertel, L; Schneider, S; Sinn, R; Wittbrodt, B; Wittbrodt, J; Weng, ZP; Whitfield, TW; Wang, J; Collins, PJ; Aldred, SF; Trinklein, ND; Partridge, EC; Myers, RM; Dekker, J; Jain, G; Lajoie, BR; Sanyal, A; Balasundaram, G; Bates, DL; Byron, R; Canfield, TK; Diegel, MJ; Dunn, D; Ebersol, AK; Frum, T; Garg, K; Gist, E; Hansen, RS; Boatman, L; Haugen, E; Humbert, R; Jain, G; Johnson, AK; Johnson, EM; Kutyavin, TV; Lajoie, BR; Lee, K; 
Lotakis, D; Maurano, MT; Neph, SJ; Neri, FV; Nguyen, ED; Qu, HZ; Reynolds, AP; Roach, V; Rynes, E; Sabo, P; Sanchez, ME; Sandstrom, RS; Sanyal, A; Shafer, AO; Stergachis, AB; Thomas, S; Thurman, RE; Vernot, B; Vierstra, J; Vong, S; Wang, H; Weaver, MA; Yan, YQ; Zhang, MH; Akey, JM; Bender, M; Dorschner, MO; Groudine, M; MacCoss, MJ; Navas, P; Stamatoyannopoulos, G; Kaul, R; Dekker, J; Stamatoyannopoulos, JA; Dunham, I; Beal, K; Brazma, A; Flicek, P; Herrero, J; Johnson, N; Keefe, D; Lukk, M; Luscombe, NM; Sobral, D; Vaquerizas, JM; Wilder, SP; Batzoglou, S; Sidow, A; Hussami, N; Kyriazopoulou Panagiotopoulou, S; Libbrecht, MW; Schaub, MA; Kundaje, A; Hardison, RC; Miller, W; Giardine, B; Harris, RS; Wu, W; Bickel, PJ; Banfai, B; Boley, NP; Brown, JB; Huang, HY; Li, QH; Li, JJ; Noble, WS; Bilmes, JA; Buske, OJ; Hoffman, MM; Sahu, AD; Kharchenko, PV; Park, PJ; Baker, D; Taylor, J; Weng, ZP; Iyer, S; Dong, XJ; Greven, M; Lin, XY; Wang, J; Xi, HLS; Zhuang, JL; Gerstein, M; Alexander, RP; Balasubramanian, S; Cheng, C; Harmanci, A; Lochovsky, L; Min, R; Mu, XMJ; Rozowsky, J; Yan, KK; Yip, KY; Birney, E;

Publication
NATURE

Abstract
The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

2010

Mitotic cell-cycle progression is regulated by CPEB1 and CPEB4-dependent translational control

Authors
Novoa, I; Gallego, J; Ferreira, PG; Mendez, R;

Publication
NATURE CELL BIOLOGY

Abstract
Meiotic and early-embryonic cell divisions in vertebrates take place in the absence of transcription and rely on the translational regulation of stored maternal messenger RNAs. Most of these mRNAs are regulated by the cytoplasmic-polyadenylation-element-binding protein (CPEB), which mediates translational activation and repression through cytoplasmic changes in their poly(A) tail length. It was unknown whether translational regulation by cytoplasmic polyadenylation and CPEB can also regulate mRNAs at specific points of mitotic cell-cycle divisions. Here we show that CPEB-mediated post-transcriptional regulation by phase-specific changes in poly(A) tail length is required for cell proliferation and specifically for entry into M phase in mitotically dividing cells. This translational control is mediated by two members of the CPEB family of proteins, CPEB1 and CPEB4. We conclude that regulation of poly(A) tail length is not only required to compensate for the lack of transcription in specialized cell divisions but also acts as a general mechanism to control mitosis.
