Publications

Publications by Nuno Fonseca

2008

An S-RNase-based gametophytic self-incompatibility system evolved only once in eudicots

Authors
Vieira, J; Fonseca, NA; Vieira, CP;

Publication
JOURNAL OF MOLECULAR EVOLUTION

Abstract
It has been argued that the common ancestor of about 75% of all dicots possessed an S-RNase-based gametophytic self-incompatibility (GSI) system. S-RNase genes should thus be found in most plant families showing GSI. The S-RNase gene (or a duplicate) may also acquire a new function and thus genes belonging to the S-RNase lineage may also persist in plant families without GSI. Nevertheless, sequences that belong to the S-RNase lineage have been found in the Solanaceae, Scrophulariaceae, Rosaceae, Cucurbitaceae, and Fabaceae plant families only. Here we search for new sequences that may belong to the S-RNase lineage, using both a phylogenetic and a much faster and simpler amino acid pattern-based approach. We show that the two methods have an apparently similar false-negative rate of discovery (~10%). The amino acid pattern-based approach produces about 15% false positives. Genes belonging to the S-RNase lineage are found in three new plant families, namely, the Rubiaceae, Euphorbiaceae, and Malvaceae. Acquisition of a new function by genes belonging to the S-RNase lineage is shown to be a frequent event. A putative S-RNase sequence is identified in Lotus, a plant genus for which molecular studies on GSI are lacking. The hypothesis of a single origin for S-RNase-based GSI (before the split of the Asteridae and Rosidae) is further supported by the finding of genes belonging to the S-RNase lineage in some of the oldest lineages of the Asteridae and Rosidae, and by Bayesian constrained tree analyses.


2008

Amino acid pairing at the N- and C-termini of helical segments in proteins

Authors
Fonseca, NA; Camacho, R; Magalhaes, AL;

Publication
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS

Abstract
A systematic survey was carried out in an unbiased sample of 815 protein chains with a maximum of 20% homology selected from the Protein Data Bank, whose structures were solved at a resolution higher than 1.6 Å and with an R-factor lower than 25%. A set of 5556 subsequences with α-helix or 3₁₀-helix motifs was extracted from the protein chains considered. Global and local propensities were then calculated for all possible amino acid pairs of the type (i, i + 1), (i, i + 2), (i, i + 3), and (i, i + 4), starting at the relevant helical positions N1, N2, N3, C3, C2, C1, and N-int (interior positions), and also at the first nonhelical positions in both termini of the helices, namely, N-cap and C-cap. The statistical analysis of the propensity values has shown that pairing is significantly dependent on the type of the amino acids and on the position of the pair. A few sequences of three and four amino acids were selected and their high prevalence in helices is outlined in this work. The Glu-Lys-Tyr-Pro sequence shows a peculiar distribution in proteins, which may suggest a relevant structural role in α-helices when Pro is located at the C-cap position. A bioinformatics tool was developed, which automatically and periodically updates the results and makes them available on a web site.


2010

Phylogeny of the Teashirt-related Zinc Finger (tshz) Gene Family and Analysis of the Developmental Expression of tshz2 and tshz3b in the Zebrafish

Authors
Santos, JS; Fonseca, NA; Vieira, CP; Vieira, J; Casares, F;

Publication
DEVELOPMENTAL DYNAMICS

Abstract
The tshz genes comprise a family of evolutionarily conserved transcription factors. However, despite the major role played by Drosophila tsh during the development of the fruit fly, the expression and function of other tshz genes have been analyzed in a very limited set of organisms and, therefore, our current knowledge of these genes is still fragmentary. In this study, we perform detailed phylogenetic analyses of the tshz genes, identify the members of this gene family in zebrafish and describe the developmental expressions of two of them, tshz2 and tshz3b, and compare them with meis1, meis2.1, meis2.2, pax6a, and pax6b expression patterns. The expression patterns of these genes define a complex set of coexpression domains in the developing zebrafish brain where their gene products have the potential to interact. Developmental Dynamics 239:1010-1018, 2010. © 2010 Wiley-Liss, Inc.


2011

Predicting Malignancy from Mammography Findings and Surgical Biopsies

Authors
Ferreira, P; Fonseca, NA; Dutra, I; Woods, R; Burnside, E;

Publication
2011 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM 2011)

Abstract
Breast screening is the regular examination of a woman's breasts to find breast cancer earlier. The sole exam approved for this purpose is mammography. Usually, findings are annotated through the Breast Imaging Reporting and Data System (BIRADS) created by the American College of Radiology. The BIRADS system determines a standard lexicon to be used by radiologists when studying each finding. Although the lexicon is standard, the annotation accuracy of the findings depends on the experience of the radiologist. Moreover, the accuracy of the classification of a mammography is also highly dependent on the expertise of the radiologist. A correct classification is paramount for economic and humanitarian reasons. The main goal of this work is to produce machine learning models that predict the outcome of a mammography from a reduced set of annotated mammography findings. In the study we used a data set consisting of 348 consecutive breast masses that underwent image guided or surgical biopsy performed between October 2005 and December 2007 on 328 female subjects. The main conclusions are threefold: (1) automatic classification of a mammography, independent of information about mass density, can reach equal or better results than the classification performed by a physician; (2) mass density seems to be a good indicator of malignancy, as previous studies suggested; (3) a machine learning model can predict mass density with a quality as good as the specialist blind to biopsy, which is one of our main contributions. Our model can predict malignancy in the absence of the mass density attribute, since we can fill in this attribute using our mass density predictor.


2011

Studying the Relevance of Breast Imaging Features

Authors
Ferreira, P; Dutra, I; Fonseca, NA; Woods, R; Burnside, E;

Publication
HEALTHINF 2011: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON HEALTH INFORMATICS

Abstract
Breast screening is the regular examination of a woman's breasts to find breast cancer in an initial stage. The sole exam approved for this purpose is mammography, which, despite the existence of more advanced technologies, is considered the cheapest and most efficient method to detect cancer in a preclinical stage. We investigate, using machine learning techniques, how attributes obtained from mammographies can relate to malignancy. In particular, this study focuses on how mass density can influence malignancy, using a data set of 348 patients containing, among other information, results of biopsies. To this end, we applied different learning algorithms on the data set using the WEKA tools, and performed significance tests on the results. The conclusions are threefold: (1) automatic classification of a mammography can reach equal or better results than the ones annotated by specialists, which can help doctors to quickly concentrate on some specific mammogram for a more thorough study; (2) mass density seems to be a good indicator of malignancy, as previous studies suggested; (3) we can obtain classifiers that can predict mass density with a quality as good as the specialist blind to biopsy.


2009

UbiDis: a Flexible and General top-level Middleware to Manage Applications in Grids and Clusters

Authors
Fonseca, NA; Dutra, I;

Publication
IBERGRID: 3RD IBERIAN GRID INFRASTRUCTURE CONFERENCE PROCEEDINGS

Abstract
From an application point of view, Grid computing, with its powerful processing capacity and large amounts of data storage, offers the possibility to process large quantities of data, to run computationally-intensive operations, or both. For instance, in computational biological pipelines, one often has to process large quantities of data in individually computationally-intensive operations. To process this data in the Grid, hundreds, or even thousands of jobs need to be submitted and their results processed. Obviously, performing these tasks manually is unfeasible. On the other hand, developing software to this end, specifically for a single application, is unproductive because if the application changes, or the Grid submission engine changes, then the code needs to be rewritten. In this paper we present a middleware that facilitates the submission of jobs to grids (or clusters) and helps handle their results. The middleware, which we call UbiDis (Ubiquitous Distribution), copies all files necessary for running the program to the UI or front-end host (in a Grid or cluster), compiles programs on the UI or front-end (if necessary), generates and submits the jobs, and copies the outputs to the local machine. Furthermore, UbiDis transparently generates jobs for different job managers, allowing the user to easily and quickly change the location where the jobs are submitted. Finally, we illustrate the usefulness of UbiDis using two applications.
