2009
Authors
Fonseca, NA; Vieira, CP; Vieira, J;
Publication
PLOS ONE
Abstract
Background: Comparative studies using hundreds of sequences can give a detailed picture of the evolution of a given gene family. Nevertheless, retrieving only the sequences of interest from public databases can be difficult, in particular when working with highly divergent sequences. The difficulty increases substantially when one wants to include in the study sequences from many (or less well studied) species whose genomes are non-annotated or incompletely annotated. Methodology/Principal Findings: In this work we evaluate the usefulness of different approaches to gene retrieval and classification, using the distal-less (DLX) gene family as a test case. Furthermore, we evaluate whether the use of a large number of gene sequences from a wide range of animal species, the use of multiple alternative alignments, and the use of amino acids aligned with high confidence only, is enough to recover the accepted DLX evolutionary history. Conclusions/Significance: The canonical DLX homeobox gene sequence here derived, together with the characteristic amino acid variants here identified in the DLX homeodomain region, can be used to retrieve and classify DLX genes in a simple and efficient way. A program is made available that allows the easy retrieval of synteny information that can be used to classify gene sequences. Maximum likelihood trees using hundreds of sequences can be used for gene identification. Nevertheless, for the DLX case, the proposed DLX evolutionary history is not recovered even when multiple alignment algorithms are used.
2009
Authors
Vieira, J; Fonseca, NA; Vieira, CP;
Publication
JOURNAL OF MOLECULAR EVOLUTION
Abstract
Multiple independent recruitments of the S-pollen component (always an F-box gene) during RNase-based gametophytic self-incompatibility evolution have recently been suggested. Therefore, different mechanisms could be used to achieve the rejection of incompatible pollen in different plant families. This hypothesis is, however, mainly based on the interpretation of phylogenetic analyses, using a small number of divergent nucleotide sequences. In this work we show, based on a large collection of F-box S-like sequences, that the inferred relationship of F-box S-pollen and F-box S-like sequences is dependent on the sequence alignment software and phylogenetic method used. Thus, at present, it is not possible to address the phylogenetic relationship of F-box S-pollen and S-like sequences from different plant families. In Petunia and Malus/Pyrus the putative S-pollen gene(s) show(s) variability patterns different from those expected for an S-pollen gene, raising the question of false identification. Here we show that in Petunia, the unexpected features of the putative S-pollen gene are not incompatible with this gene being the S-pollen gene. On the other hand, it is very unlikely that the Pyrus SFBB-gamma gene is involved in specificity determination.
2009
Authors
Fonseca, NA; Dutra, I;
Publication
IBERGRID: 3RD IBERIAN GRID INFRASTRUCTURE CONFERENCE PROCEEDINGS
Abstract
From an application point of view, Grid computing, with its processing power and large data storage capacity, offers the possibility to process large quantities of data, to run computationally intensive operations, or both. For instance, in computational biological pipelines, one often has to process large quantities of data in individually computationally intensive operations. To process this data in the Grid, hundreds or even thousands of jobs need to be submitted and their results processed. Performing these tasks manually is infeasible. On the other hand, developing software to this end specifically for a single application is unproductive, because if the application or the Grid submission engine changes, the code needs to be rewritten. In this paper we present a middleware that facilitates the submission of jobs to grids (or clusters) and helps handle their results. The middleware, which we call UbiDis (Ubiquitous Distribution), copies all files necessary for running the program to the UI or front-end host (in a Grid or cluster), compiles programs on the UI or front-end (if necessary), generates and submits the jobs, and copies the outputs to the local machine. Furthermore, UbiDis transparently generates jobs for different job managers, allowing the user to easily and quickly change the location to which the jobs are submitted. Finally, we illustrate the usefulness of UbiDis using two applications.
2009
Authors
Pereira, M; Costa, VS; Camacho, R; Fonseca, NA; Simoes, C; Brito, RMM;
Publication
ADVANCES IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, PROCEEDINGS
Abstract
The rational development of new drugs is a complex and expensive process, comprising several steps. Typically, it starts by screening databases of small organic molecules for chemical structures with the potential to bind to a target receptor and prioritizing the most promising ones. Only a few of these will be selected for biological evaluation and further refinement through chemical synthesis. Despite the knowledge accumulated by pharmaceutical companies that continually improve the process of finding new drugs, a myriad of factors affect the activity of putative candidate molecules in vivo, and the propensity for causing adverse and toxic effects is recognized as the major hurdle behind the current "target-rich, lead-poor" scenario. In this study we evaluate the use of several Machine Learning algorithms to find useful rules for the elucidation and prediction of toxicity using 1D and 2D molecular descriptors. The results indicate that: i) Machine Learning algorithms can effectively use 1D molecular descriptors to construct accurate and simple models; ii) extending the set of descriptors to include 2D descriptors improves the accuracy of the models.
2009
Authors
Pereira, M; Costa, VS; Camacho, R; Fonseca, NA;
Publication
DISTRIBUTED COMPUTING, ARTIFICIAL INTELLIGENCE, BIOINFORMATICS, SOFT COMPUTING, AND AMBIENT ASSISTED LIVING, PT II, PROCEEDINGS
Abstract
In this paper we present the work in progress on LogCHEM, an ILP-based tool for discriminative interactive mining of chemical fragments. In particular, we describe the integration with molecule visualisation software that allows the chemist to graphically control the search for interesting patterns in chemical fragments. Furthermore, we show how structured information, such as rings and functional groups (carboxyl, amine, methyl, ester, etc.), is integrated and exploited in LogCHEM.
2009
Authors
Fonseca, NA; Costa, VS; Camacho, R; Vieira, C; Vieira, J;
Publication
DISTRIBUTED COMPUTING, ARTIFICIAL INTELLIGENCE, BIOINFORMATICS, SOFT COMPUTING, AND AMBIENT ASSISTED LIVING, PT II, PROCEEDINGS
Abstract
We present a novel approach to cluster sets of protein sequences, based on Inductive Logic Programming (ILP). Preliminary results show that the proposed method produces understandable descriptions/explanations of the clusters. Furthermore, it can be used as a knowledge elicitation tool to explain clusters proposed by other clustering approaches, such as standard phylogenetic programs.