Publications

Publications by Rui Camacho

2014

Ranking MEDLINE documents

Authors
Gonçalves, CT; Camacho, R; Oliveira, EC;

Publication
J. Braz. Comp. Soc.

Abstract
Background: BioTextRetriever is a Web-based search tool for retrieving relevant literature in Molecular Biology and related domains from MEDLINE. The core of BioTextRetriever is the dynamic construction of a classifier capable of selecting relevant papers from the whole MEDLINE bibliographic database. “Relevant” papers, in this context, are papers related to a set of DNA or protein sequences provided as input to the tool by the user. Methods: Since the number of retrieved papers may be very large, BioTextRetriever uses a novel ranking algorithm to present the most relevant papers first. We have developed a new methodology that enables the automation of the assessment process based on a multi-criteria ranking function. This function combines six factors: MeSH terms, the paper’s number of citations, the author’s h-index, the journal’s impact factor, the author’s number of publications and a journal similarity function. Results: The best results highlight the number of citations and the h-index factors. Conclusions: We have developed a multi-criteria ranking function that considers six factors and that seems appropriate for retrieving relevant papers from a huge repository such as MEDLINE. © 2014, Gonçalves et al.; licensee Springer.
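
A multi-criteria ranking function of the kind described in this abstract could be sketched as a weighted sum over the six factors. The abstract does not say how the factors are scaled or weighted, so the min-max normalization, the equal weights, the field names and the sample values below are all illustrative assumptions, not the published BioTextRetriever method.

```python
# Hypothetical field names for the six factors listed in the abstract.
FACTORS = (
    "mesh_terms",          # assumed: a numeric MeSH-term match score
    "citations",           # paper's number of citations
    "h_index",             # author's h-index
    "impact_factor",       # journal's impact factor
    "author_pubs",         # author's number of publications
    "journal_similarity",  # assumed: a numeric journal similarity score
)


def min_max(values):
    """Scale a list of numbers to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_papers(papers, weights):
    """Return paper ids ordered by descending weighted score."""
    # Normalize each factor across all papers so the scales are comparable.
    normalized = {f: min_max([p[f] for p in papers]) for f in FACTORS}
    scored = []
    for i, paper in enumerate(papers):
        score = sum(weights[f] * normalized[f][i] for f in FACTORS)
        scored.append((score, paper["id"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [paper_id for _, paper_id in scored]


# Made-up example data: paper A leads on every factor, C trails on every factor.
papers = [
    {"id": "B", "mesh_terms": 2, "citations": 10, "h_index": 8,
     "impact_factor": 2.0, "author_pubs": 20, "journal_similarity": 0.5},
    {"id": "A", "mesh_terms": 5, "citations": 40, "h_index": 20,
     "impact_factor": 4.0, "author_pubs": 60, "journal_similarity": 0.9},
    {"id": "C", "mesh_terms": 1, "citations": 2, "h_index": 3,
     "impact_factor": 1.0, "author_pubs": 5, "journal_similarity": 0.1},
]
equal_weights = {f: 1.0 for f in FACTORS}  # assumption: all factors weigh the same
ranking = rank_papers(papers, equal_weights)
```

Raising the weights for `citations` and `h_index` would mirror the abstract's remark that those two factors gave the best results, although the actual weighting scheme is not given here.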


2017

Co-expression networks between protein encoding mitochondrial genes and all the remaining genes in human tissues

Authors
Almeida, J; Ferreira, J; Camacho, R; Pereira, L;

Publication
2017 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM)

Abstract
Recent advances in sequencing allow the study of all identified human genes (22,000 protein encoding genes), which are differentially expressed between tissues. However, current knowledge on gene interactions lags behind, especially when one of the elements encodes a mitochondrial protein (1500). Mitochondrial proteins are encoded either by mitochondrial DNA (mtDNA; 13 proteins) or by nuclear DNA (nDNA; the remaining), which implies coordinated communication between the two genomes. Since mitochondria coordinate several life-critical cellular activities, namely energy production and cell death, deregulation of this communication is implicated in many complex diseases such as neurodegenerative diseases, cancer and diabetes. Thus, this work aimed to identify groups of high co-expression between mitochondrial genes and all genes, and the associated protein networks, in several human tissues (Genotype-Tissue Expression database). We developed a pipeline and a web tree viewer that is available at GitHub (https://github.com/Pereira-lab/CoExpression). Biologically, we confirmed the existence of highly correlated pairs of mitochondrial and all protein encoding genes, which act in pathways of functional importance such as energy production and metabolite synthesis, especially in brain tissues. The strongest correlations of mtDNA genes are with genes encoded by the same genome, showing that correlation among genes encoded by the same genome is more efficient.
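
The idea of finding highly correlated gene pairs can be illustrated with a small sketch. The abstract does not name the correlation measure or the cutoff used, so Pearson correlation, the 0.8 threshold, the gene labels and the expression values below are assumptions made for illustration only; this is not the code from the linked repository.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov / (norm_x * norm_y)


def coexpressed_pairs(mito_genes, all_genes, threshold):
    """List (mito gene, other gene, r) where |r| meets the threshold."""
    pairs = []
    for mito_name, mito_expr in mito_genes.items():
        for gene_name, gene_expr in all_genes.items():
            if gene_name == mito_name:
                continue  # skip self-comparison
            r = pearson(mito_expr, gene_expr)
            if abs(r) >= threshold:
                pairs.append((mito_name, gene_name, r))
    return pairs


# Made-up expression values across four samples of one tissue.
mito_genes = {"MT_A": [1.0, 2.0, 3.0, 4.0]}   # hypothetical mitochondrial gene
all_genes = {
    "G1": [2.0, 4.0, 6.0, 8.0],               # rises together with MT_A
    "G2": [4.0, 1.0, 3.0, 2.0],               # weakly anti-correlated
}
pairs = coexpressed_pairs(mito_genes, all_genes, threshold=0.8)
```

Running the same function once per tissue would give the per-tissue comparison the abstract describes, under the same assumptions.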


2018

Using multi-relational data mining to discriminate blended therapy efficiency on patients based on log data

Authors
Rocha, A; Camacho, R; Ruwaard, J; Riper, H;

Publication
INTERNET INTERVENTIONS-THE APPLICATION OF INFORMATION TECHNOLOGY IN MENTAL AND BEHAVIOURAL HEALTH

Abstract
Introduction: Clinical trials of blended Internet-based treatments deliver a wealth of data from various sources, such as self-report questionnaires, diagnostic interviews, treatment platform log files and Ecological Momentary Assessments (EMA). Mining these complex data for clinically relevant patterns is a daunting task for which no definitive best method exists. In this paper, we explore the expressive power of the multi-relational Inductive Logic Programming (ILP) data mining approach, using combined trial data of the EU E-COMPARED depression trial. Methods: We explored the capability of ILP to handle and combine (implicit) multiple relationships in the E-COMPARED data. This data set has the following features that favor ILP analysis: 1) time reasoning is involved; 2) there is a reasonable number of explicit useful relations to be analyzed; 3) ILP is capable of building comprehensible models that might be perceived as putative explanations by domain experts; 4) both numerical and statistical models may coexist within ILP models if necessary. In our analyses, we focused on scores of the PHQ-8 self-report questionnaire (which taps depressive symptom severity), and on EMA of mood and various other clinically relevant factors. Both measures were administered during treatment, which lasted between 9 and 16 weeks. Results: E-COMPARED trial data revealed different individual improvement patterns: PHQ-8 scores suggested that some individuals improved quickly during the first weeks of the treatment, while others improved at a (much) slower pace, or not at all. Combining self-reported EMA, PHQ-8 scores and log data about the usage of the ICT4D platform in the context of blended care, we set out to unveil possible causes for these different trajectories. Discussion: This work complements other studies into alternative data mining approaches to E-COMPARED trial data analysis, all of which aim to identify clinically meaningful predictors of system use and treatment outcome. Strengths and limitations of the ILP approach given this objective will be discussed.


2019

EvoPPI: A Web Application to Compare Protein-Protein Interactions (PPIs) from Different Databases and Species

Authors
Vazquez, N; Rocha, S; Lopez Fernandez, H; Torres, A; Camacho, R; Fdez Riverola, F; Vieira, J; Vieira, CP; Reboiro Jato, M;

Publication
PRACTICAL APPLICATIONS OF COMPUTATIONAL BIOLOGY AND BIOINFORMATICS

Abstract
Biological processes are mediated by protein-protein interactions (PPI) that have been studied using different methodologies and organized in centralized repositories, the PPI databases. The data stored in the different PPI databases overlap only partially. Moreover, some of the repositories are dedicated to a single species or a subset of species, and not all of them offer the same functionalities or store data in the same format, making comparisons between different databases difficult to perform. Therefore, here we present EvoPPI (http://evoppi.i3s.up.pt), an open source web application tool that allows users to compare the protein interactions reported in two different interactomes. When interactomes belong to different species, a versatile BLAST search approach is used to identify orthologous/paralogous genes, which to our knowledge is a unique feature of EvoPPI.


2018

LearnSec: A Framework for Full Text Analysis

Authors
Goncalves, C; Iglesias, EL; Borrajo, L; Camacho, R; Vieira, AS; Goncalves, CT;

Publication
HYBRID ARTIFICIAL INTELLIGENT SYSTEMS (HAIS 2018)

Abstract
Large corpora of scientific research papers have been available for a long time. However, most of those corpora store only the title and the abstract of each paper. For some domains this information may not be enough to achieve high performance in text mining tasks. This problem has recently been reduced by the growing availability of full text scientific research papers. A full text version provides more detailed information but, on the other hand, a large amount of data needs to be processed. A priori, it is difficult to know whether the extra work of full text analysis has a significant impact on the performance of text mining tasks, or whether the effect depends on the scientific domain or the specific corpus under analysis. The goal of this paper is to present a framework for full text analysis, called LearnSec, which incorporates domain specific knowledge and information about the content of the document sections to improve the classification process with propositional and relational learning. To demonstrate the usefulness of the tool, we process a scientific corpus based on OHSUMED, generating an attribute/value dataset in Weka format and a First Order Logic dataset in Inductive Logic Programming (ILP) format. Results show a successful assessment of the framework.


2018

Autoencoders as Weight Initialization of Deep Classification Networks Applied to Papillary Thyroid Carcinoma

Authors
Ferreira, MF; Camacho, R; Teixeira, LF;

Publication
PROCEEDINGS 2018 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM)

Abstract
Cancer is one of the most serious health problems of our time. One approach for automatically classifying tumor samples is to analyze derived molecular information. Previous work by Teixeira et al. compared different methods of Data Oversampling and Feature Reduction, as well as Deep (Stacked) Denoising Autoencoders followed by a shallow layer for classification. In this work, we compare the performance of 6 different types of Autoencoder (AE), combined with two different approaches when training the classification model: (a) fixing the weights, after pretraining an AE, and (b) allowing fine-tuning of the entire network. We also apply two different strategies for embedding the AE into the classification network: (1) by only importing the encoding layers, and (2) by importing the complete AE. Our best result was the combination of unsupervised feature learning through a single-layer Denoising AE, followed by its complete import into the classification network, and subsequent fine-tuning through supervised training, achieving an F1 score of 99.61% +/- 0.54. We conclude that a reconstruction of the input space, combined with a deeper classification network outperforms previous work, without resorting to data augmentation techniques.
