Publications by Rui Camacho

2019

EvoPPI 1.0: a Web Platform for Within- and Between-Species Multiple Interactome Comparisons and Application to Nine PolyQ Proteins Determining Neurodegenerative Diseases

Authors
Vazquez, N; Rocha, S; Lopez-Fernandez, H; Torres, A; Camacho, R; Fdez-Riverola, F; Vieira, J; Vieira, CP; Reboiro-Jato, M;

Publication
INTERDISCIPLINARY SCIENCES-COMPUTATIONAL LIFE SCIENCES

Abstract
Protein-protein interaction (PPI) data is essential to elucidate the complex molecular relationships in living systems, and thus to understand biological functions at the cellular and systems levels. The complete map of PPIs that can occur in a living organism is called the interactome. For animals, PPI data is stored in multiple databases (e.g., BioGRID, CCSB, DroID, FlyBase, HIPPIE, HitPredict, HomoMINT, INstruct, Interactome3D, mentha, MINT, and PINA2) with different formats. This makes PPI comparisons difficult to perform, especially between species, since orthologous proteins may have different names. Moreover, there is only a partial overlap between databases, even when considering a single species. The EvoPPI (http://evoppi.i3s.up.pt) web application presented in this paper allows comparison of data from the different databases at the species level, or between species using a BLAST approach. We show its usefulness by performing a comparative study of the interactomes of the nine polyglutamine (polyQ) disease proteins, namely androgen receptor (AR), atrophin-1 (ATN1), ataxin 1 (ATXN1), ataxin 2 (ATXN2), ataxin 3 (ATXN3), ataxin 7 (ATXN7), calcium voltage-gated channel subunit alpha1 A (CACNA1A), Huntingtin (HTT), and TATA-binding protein (TBP). Here we show that none of the human interactors of these proteins is common to all nine interactomes. Only 15 proteins are common to at least 4 of these polyQ disease proteins, and 40% of these are involved in ubiquitin protein ligase-binding function. The results obtained in this study suggest that polyQ disease proteins are involved in different functional networks. Comparisons with Mus musculus PPIs are also made for AR and TBP, using the EvoPPI BLAST search approach (a unique feature of EvoPPI), with the goal of understanding why there is a significant excess of common interactors for these proteins in humans.

2019

Empowering Distributed Analysis Across Federated Cohort Data Repositories Adhering to FAIR Principles

Authors
Rocha, A; Ornelas, JP; Lopes, JC; Camacho, R;

Publication
ERCIM NEWS

Abstract
Novel data collection tools, methods and new techniques in biotechnology can facilitate improved health strategies that are customised to each individual. One key challenge in achieving this is to take advantage of the massive volumes of anonymous personal data, relating each profile to health and disease, while accounting for the high diversity of individuals, populations and environments. These data must be analysed in unison to achieve statistical power, but at present cohort data repositories are scattered and hard to search and integrate, and data protection and governance rules discourage central pooling.

2019

Comparative Study of Feature Selection Methods for Medical Full Text Classification

Authors
Gonçalves, CA; Iglesias, EL; Borrajo, L; Camacho, R; Vieira, AS; Gonçalves, CT;

Publication
BIOINFORMATICS AND BIOMEDICAL ENGINEERING (IWBBIO 2019), PT II

Abstract
Much work in text categorization uses only the title and abstract of papers. However, a full paper contains a much larger amount of information that could be used to improve text classification performance. The potential benefits of using full texts come with an additional problem: the increased size of the data sets. To overcome the increased size of full-text data sets, we performed an assessment study on the use of feature selection methods for full-text classification. We compared two existing feature selection methods (Information Gain and Correlation) and a novel method called k-Best-Discriminative-Terms. The assessment was conducted using the Ohsumed corpus. We performed two sets of experiments: using the title and abstract only, and using the full text. The results show that the novel method does not perform well on small amounts of text, such as title and abstract, but performs much better on the full-text data sets and requires a much smaller number of attributes.

2020

Using autoencoders as a weight initialization method on deep neural networks for disease detection

Authors
Ferreira, MF; Camacho, R; Teixeira, LF;

Publication
BMC MEDICAL INFORMATICS AND DECISION MAKING

Abstract
Background: As of today, cancer is still one of the most prevalent and high-mortality diseases, accounting for more than 9 million deaths in 2018. This has motivated researchers to study the application of machine learning-based solutions for cancer detection to accelerate its diagnosis and help its prevention. Among several approaches, one is to automatically classify tumor samples through the analysis of their gene expression. Methods: In this work, we aim to distinguish five different types of cancer through RNA-Seq datasets: thyroid, skin, stomach, breast, and lung. To do so, we adopted a previously described methodology, with which we compare the performance of 3 different autoencoders (AEs) used as a deep neural network weight initialization technique. Our experiments consist of assessing two different approaches when training the classification model (fixing the weights after pre-training the AEs, or allowing fine-tuning of the entire network) and two different strategies for embedding the AEs into the classification network, namely importing only the encoding layers, or inserting the complete AE. We then study how varying the number of layers in the first strategy, the AEs' latent vector dimension, and the imputation technique in the data preprocessing step impacts the network's overall classification performance. Finally, with the goal of assessing how well this pipeline generalizes, we apply the same methodology to two additional datasets that include features extracted from images of malaria thin blood smears and of breast mass cell nuclei. We also rule out overfitting by using held-out test sets in the image datasets. Results: The methodology attained good overall results for both RNA-Seq and image-extracted data.
We outperformed the established baseline for all the considered datasets, achieving an average F1 score of 99.03, 89.95, and 98.84 and an MCC of 0.99, 0.84, and 0.98 for the RNA-Seq (when detecting thyroid cancer), Malaria, and Wisconsin Breast Cancer data, respectively. Conclusions: We observed that the approach of fine-tuning the weights of the top layers imported from the AE reached higher results for all the presented experiments and all the considered datasets. We outperformed all previously reported results when comparing to the established baselines.

2020

Gastric Microbiome Diversities in Gastric Cancer Patients from Europe and Asia Mimic the Human Population Structure and Are Partly Driven by Microbiome Quantitative Trait Loci

Authors
Cavadas, B; Camacho, R; Ferreira, JC; Ferreira, RM; Figueiredo, C; Brazma, A; Fonseca, NA; Pereira, L;

Publication
MICROORGANISMS

Abstract
The human gastrointestinal tract harbors approximately 100 trillion microorganisms, with different microbial compositions across geographic locations. In this work, we used RNA-Seq data from stomach samples of non-disease individuals (164 of European ancestry) and gastric cancer patients (137 from Europe and Asia) from public databases. Although these data were intended to characterize human expression profiles, they allowed for a reliable inference of the microbiome composition, as confirmed by measures such as genus coverage, richness and evenness. The microbiome diversity (weighted UniFrac distances) in gastric cancer mimics host diversity across the world, with European gastric microbiome profiles clustering together, distinct from Asian ones. Despite the confirmed loss of microbiome diversity from a healthy status to a cancer status, the structured profile was still recognized in the disease condition. In concordance with the parallel host-bacteria population structure, we found 16 human loci (non-synonymous variants) in the European-descendant cohorts that were significantly associated with the abundance of specific genera. These microbiome quantitative trait loci display heterogeneity between population groups and are mainly linked to the immune system or to cellular features that may play a role in enabling microbe colonization and inflammation.

2021

CMIID: A comprehensive medical information identifier for clinical search harmonization in Data Safe Havens

Authors
Domingues, MAP; Camacho, R; Rodrigues, PP;

Publication
JOURNAL OF BIOMEDICAL INFORMATICS

Abstract
Over the last decades, clinical research has been driven by informatics changes fueled by distinct research endeavors. Inherent to this evolution, several issues have been the focus of a variety of studies: multi-location patient data access, interoperability between terminological and classification systems, and harmonization of clinical practice and records. With these problems in mind, the Data Safe Haven paradigm emerged to promote a new architecture, better reasoning, and safe and easy access to distinct Clinical Data Repositories. This study aims to present a novel solution for clinical search harmonization within a safe environment, making use of a hybrid coding taxonomy that enables researchers to collect information from multiple repositories based on a clinical domain query definition. Results show that it is possible to query multiple repositories using a single query definition based on clinical domains and the capabilities of the Unified Medical Language System, although this degrades the framework's response times. Participants of a Focus Group and a System Usability Scale questionnaire rated the framework with a median value of 72.5, indicating that the hybrid coding taxonomy could be enriched with additional metadata to further refine the results and to enable the use of this system as a data quality tagging mechanism.
