About

Pedro G. Ferreira graduated in Systems and Informatics Engineering (2002) and completed a PhD in Artificial Intelligence at the University of Minho (2007). He was a Postdoctoral Fellow at the Center for Genomic Regulation, Barcelona (2008-2012) and at the University of Geneva (2012-2014). He has participated in several major international consortia, including ICGC-CLL, ENCODE, GEUVADIS and GTEx. He is currently an Assistant Professor at the Department of Computer Science, Faculty of Sciences of the University of Porto, and a researcher at INESCTEC-LIADD and i3s/Ipatimup. His main research focus is genomic data science; in particular, he is interested in unraveling the role of genomics in human health and disease. He has been involved in several bioinformatics start-ups.

Details

  • Name

    Pedro Gabriel Ferreira
  • Cluster

    Computer Science
  • Role

    Senior Researcher
  • Since

    20 September 2018
Publications

2022

The pegi3s Bioinformatics Docker Images Project

Authors
Lopez Fernandez, H; Ferreira, P; Reboiro Jato, M; Vieira, CP; Vieira, J

Publication
PRACTICAL APPLICATIONS OF COMPUTATIONAL BIOLOGY & BIOINFORMATICS, PACBB 2021

Abstract
Among the available Linux container technologies, Docker is one of the most popular ones. Docker images can be used to provide ready-to-use software packages, where all required dependencies are already installed, and they can be deployed in any operating system where Docker is installed. They are also a convenient way to store immutable working software packages, thus contributing to reproducibility. Moreover, the usage of Docker images greatly eases the development of complex pipelines, standalone software applications with graphical user interfaces that require external software, and even the development of databases. Therefore, not surprisingly, Docker images are now ubiquitously used in computational biology and bioinformatics. Here, we present the pegi3s Bioinformatics Docker Images Project (https://pegi3s.github.io/dockerfiles/), a collection of more than 70 Docker images for commonly used software in the fields of genomics, transcriptomics, proteomics, phylogenetics, and sequence handling, among others, that is constantly growing. Several features distinguish this project from much larger projects, namely: 1) by providing a list of tools that are classified into broad categories, it is easier to find the most adequate tool(s) for a given project; 2) by providing the hyperlinks to the software manuals, we facilitate the process of finding the parameter combinations that are best suited for a given processing step; 3) most importantly, we provide clear instructions on how to run the images, provide test data that can be used to quickly evaluate the Docker image, and give all details on how each Docker image was built. All images are routinely used by ourselves, in the context of our research and teaching activities, meaning that they have been extensively tested. Therefore, we believe that this project, which is offered as a service in the context of the European ELIXIR program, is of interest to many researchers, regardless of whether they have a background in informatics.

2022

Scalable transcriptomics analysis with Dask: applications in data science and machine learning

Authors
Moreno, M; Vilaca, R; Ferreira, PG

Publication
BMC BIOINFORMATICS

Abstract
Background: Gene expression studies are an important tool in biological and biomedical research. The signal carried in expression profiles helps derive signatures for the prediction, diagnosis and prognosis of different diseases. Data science and specifically machine learning have many applications in gene expression analysis. However, as the dimensionality of genomics datasets grows, scalable solutions become necessary. Methods: In this paper we review the main steps and bottlenecks in machine learning pipelines, as well as the main concepts behind scalable data science including those of concurrent and parallel programming. We discuss the benefits of the Dask framework and how it can be integrated with the Python scientific environment to perform data analysis in computational biology and bioinformatics. Results: This review illustrates the role of Dask for boosting data science applications in different case studies. Detailed documentation and code on these procedures is made available at https://github.com/martaccmoreno/gexp-ml-dask. Conclusion: By showing when and how Dask can be used in transcriptomics analysis, this review will serve as an entry point to help genomic data scientists develop more scalable data analysis procedures.

2021

Deep learning for drug response prediction in cancer

Authors
Baptista, D; Ferreira, PG; Rocha, M

Publication
BRIEFINGS IN BIOINFORMATICS

Abstract
Predicting the sensitivity of tumors to specific anti-cancer treatments is a challenge of paramount importance for precision medicine. Machine learning (ML) algorithms can be trained on high-throughput screening data to develop models that are able to predict the response of cancer cell lines and patients to novel drugs or drug combinations. Deep learning (DL) refers to a distinct class of ML algorithms that have achieved top-level performance in a variety of fields, including drug discovery. These types of models have unique characteristics that may make them more suitable for the complex task of modeling drug response based on both biological and chemical data, but the application of DL to drug response prediction has been unexplored until very recently. The few studies that have been published have shown promising results, and the use of DL for drug response prediction is beginning to attract greater interest from researchers in the field. In this article, we critically review recently published studies that have employed DL methods to predict drug response in cancer cell lines. We also provide a brief description of DL and the main types of architectures that have been used in these studies. Additionally, we present a selection of publicly available drug screening data resources that can be used to develop drug response prediction models. Finally, we also address the limitations of these approaches and provide a discussion on possible paths for further improvement. Contact: mrocha@di.uminho.pt

2021

Population-scale tissue transcriptomics maps long non-coding RNAs to complex disease

Authors
de Goede, OM; Nachun, DC; Ferraro, NM; Gloudemans, MJ; Rao, AS; Smail, C; Eulalio, TY; Aguet, F; Ng, B; Xu, J; Barbeira, AN; Castel, SE; Kim-Hellmuth, S; Park, Y; Scott, AJ; Strober, BJ; Brown, CD; Wen, X; Hall, IM; Battle, A; Lappalainen, T; Im, HK; Ardlie, KG; Mostafavi, S; Quertermous, T; Kirkegaard, K; Montgomery, SB; Anand, S; Gabriel, S; Getz, GA; Graubert, A; Hadley, K; Handsaker, RE; Huang, KH; Li, X; MacArthur, DG; Meier, SR; Nedzel, JL; Nguyen, DT; Segrè, AV; Todres, E; Balliu, B; Bonazzola, R; Brown, A; Conrad, DF; Cotter, DJ; Cox, N; Das, S; Dermitzakis, ET; Einson, J; Engelhardt, BE; Eskin, E; Flynn, ED; Fresard, L; Gamazon, ER; Garrido-Martín, D; Gay, NR; Guigó, R; Hamel, AR; He, Y; Hoffman, PJ; Hormozdiari, F; Hou, L; Jo, B; Kasela, S; Kashin, S; Kellis, M; Kwong, A; Li, X; Liang, Y; Mangul, S; Mohammadi, P; Muñoz-Aguirre, M; Nobel, AB; Oliva, M; Park, Y; Parsana, P; Reverter, F; Rouhana, JM; Sabatti, C; Saha, A; Stephens, M; Stranger, BE; Teran, NA; Viñuela, A; Wang, G; Wright, F; Wucher, V; Zou, Y; Ferreira, PG; Li, G; Melé, M; Yeger-Lotem, E; Bradbury, D; Krubit, T; McLean, JA; Qi, L; Robinson, K; Roche, NV; Smith, AM; Tabor, DE; Undale, A; Bridge, J; Brigham, LE; Foster, BA; Gillard, BM; Hasz, R; Hunter, M; Johns, C; Johnson, M; Karasik, E; Kopen, G; Leinweber, WF; McDonald, A; Moser, MT; Myer, K; Ramsey, KD; Roe, B; Shad, S; Thomas, JA; Walters, G; Washington, M; Wheeler, J; Jewell, SD; Rohrer, DC; Valley, DR; Davis, DA; Mash, DC; Barcus, ME; Branton, PA; Sobin, L; Barker, LK; Gardiner, HM; Mosavel, M; Siminoff, LA; Flicek, P; Haeussler, M; Juettemann, T; Kent, WJ; Lee, CM; Powell, CC; Rosenbloom, KR; Ruffier, M; Sheppard, D; Taylor, K; Trevanion, SJ; Zerbino, DR; Abell, NS; Akey, J; Chen, L; Demanelis, K; Doherty, JA; Feinberg, AP; Hansen, KD; Hickey, PF; Jasmine, F; Jiang, L; Kaul, R; Kibriya, MG; Li, JB; Li, Q; Lin, S; Linder, SE; Pierce, BL; Rizzardi, LF; Skol, AD; Smith, KS; Snyder, M; Stamatoyannopoulos, J; Tang, H; Wang, M; Carithers, LJ; 
Guan, P; Koester, SE; Little, AR; Moore, HM; Nierras, CR; Rao, AK; Vaught, JB; Volpi, S

Publication
Cell

Abstract
Long non-coding RNA (lncRNA) genes have well-established and important impacts on molecular and cellular functions. However, among the thousands of lncRNA genes, it is still a major challenge to identify the subset with disease or trait relevance. To systematically characterize these lncRNA genes, we used Genotype Tissue Expression (GTEx) project v8 genetic and multi-tissue transcriptomic data to profile the expression, genetic regulation, cellular contexts, and trait associations of 14,100 lncRNA genes across 49 tissues for 101 distinct complex genetic traits. Using these approaches, we identified 1,432 lncRNA gene-trait associations, 800 of which were not explained by stronger effects of neighboring protein-coding genes. This included associations between lncRNA quantitative trait loci and inflammatory bowel disease, type 1 and type 2 diabetes, and coronary artery disease, as well as rare variant associations to body mass index.

2021

On the Identification of Clinically Relevant Bacterial Amino Acid Changes at the Whole Genome Level Using Auto-PSS-Genome

Authors
Lopez Fernandez, H; Vieira, CP; Ferreira, P; Gouveia, P; Fdez Riverola, F; Reboiro Jato, M; Vieira, J

Publication
INTERDISCIPLINARY SCIENCES-COMPUTATIONAL LIFE SCIENCES

Abstract
The identification of clinically relevant bacterial amino acid changes can be performed using different methods aimed at the identification of genes showing positively selected amino acid sites (PSS). Nevertheless, such analyses are time-consuming, and the frequency of genes showing evidence for PSS can be low. Therefore, the development of a pipeline that allows the quick and efficient identification of the set of genes that show PSS is of interest. Here, we present Auto-PSS-Genome, a Compi-based pipeline distributed as a Docker image, that automates the process of identifying genes that show PSS using three different methods, namely codeML, FUBAR, and omegaMap. Auto-PSS-Genome accepts as input a set of FASTA files, one per genome, containing all coding sequences, thus minimizing the work needed to conduct positively selected sites analyses. The Auto-PSS-Genome pipeline identifies orthologous gene sets and corrects for multiple possible problems in input FASTA files that may prevent the automated identification of genes showing PSS. A FASTA file containing all coding sequences can also be given as an external global reference, thus easing the comparison of results across species, when gene names are different. In this work, we use Auto-PSS-Genome to analyse Mycobacterium leprae (which causes leprosy), and the closely related species M. haemophilum, which mainly causes ulcerating skin infections and arthritis in persons who are severely immunocompromised, and in children causes cervical and perihilar lymphadenitis. The genes identified in these two species as showing PSS may be those that are partially responsible for virulence and resistance to drugs.

Supervised theses

2020

Detecting Abnormal Laboratory Test Results with Machine Learning

Author
Rafael José Palhares Santos

Institution
UP-FCUP

2020

A comparative evaluation of dimensionality reduction methods on large-scale gene expression datasets

Author
Sara Carolina Martins Ribeiro

Institution
UP-FCUP

2020

HSA/CD24: a cardiomyocyte precursor biomarker and/or a gateway to cardiomyogenesis

Author
Catarina Lopes Alves

Institution
UP-FCUP

2020

Neoantigen Signature Automatized Pipeline to predict Immunotherapy response

Author
Francisca Isabel Conrado Dias Ferreira da Silva

Institution
UP-FCUP

2020

Transcriptomics-based prediction of human phenotypes using scalable and secure machine learning approaches

Author
Marta Carolina Cabral Moreno

Institution
UP-FCUP