Publications - INESC TEC

Publications by Nuno Fonseca

2009

Gene Classification Based on Amino Acid Motifs and Residues: The DLX (distal-less) Test Case

Authors
Fonseca, NA; Vieira, CP; Vieira, J;

Publication
PLOS ONE

Abstract
Background: Comparative studies using hundreds of sequences can give a detailed picture of the evolution of a given gene family. Nevertheless, retrieving only the sequences of interest from public databases can be difficult, in particular when working with highly divergent sequences. The difficulty increases substantially when the study is to include sequences from many (or less well studied) species whose genomes are non-annotated or incompletely annotated. Methodology/Principal Findings: In this work we evaluate the usefulness of different approaches to gene retrieval and classification, using the distal-less (DLX) gene family as a test case. Furthermore, we evaluate whether using a large number of gene sequences from a wide range of animal species, multiple alternative alignments, and only amino acids aligned with high confidence is enough to recover the accepted DLX evolutionary history. Conclusions/Significance: The canonical DLX homeobox gene sequence derived here, together with the characteristic amino acid variants identified here in the DLX homeodomain region, can be used to retrieve and classify DLX genes in a simple and efficient way. A program is made available that allows the easy retrieval of synteny information that can be used to classify gene sequences. Maximum likelihood trees built from hundreds of sequences can be used for gene identification. Nevertheless, for the DLX case, the proposed DLX evolutionary history is not recovered even when multiple alignment algorithms are used.

2008

Protein evolution of ANTP and PRD homeobox genes

Authors
Fonseca, NA; Vieira, CP; Holland, PWH; Vieira, J;

Publication
BMC EVOLUTIONARY BIOLOGY

Abstract
Background: Although homeobox genes have been the subject of many studies, little is known about the main amino acid changes that occurred early in the evolution of genes belonging to different classes. Results: In this study, we report a method for the fast and efficient retrieval of sequences belonging to the ANTP (HOXL and NKL) and PRD classes. Furthermore, we look for diagnostic amino acid residues that can be used to distinguish HOXL, NKL and PRD genes. Conclusion: The reported protein features will facilitate the robust classification of homeobox genes from newly sequenced bilaterian genomes. Nevertheless, our findings must be applied with caution to non-bilaterian genomes. In principle, as long as a good manually curated data set is available, the approach described here can be applied to non-bilaterian organisms as well. Our results help focus experimental studies on investigating the biochemical functions of key homeodomain residues in different gene classes.

2010

Evolutionary patterns at the RNase based gametophytic self-incompatibility system in two divergent Rosaceae groups (Maloideae and Prunus)

Authors
Vieira, J; Ferreira, PG; Aguiar, B; Fonseca, NA; Vieira, CP;

Publication
BMC EVOLUTIONARY BIOLOGY

Abstract
Background: Within Rosaceae, the RNase based gametophytic self-incompatibility (GSI) system has been studied at the molecular level in Maloideae and Prunus species, which have been diverging for at least 32 million years. To understand RNase based GSI evolution within this family, comparative studies must be performed using similar methodologies. Results: We show that many features are shared between the two species groups, such as the levels of recombination at the S-RNase (the S-pistil component) gene and the rate at which new specificities arise. Nevertheless, important differences are found regarding the number of ancestral lineages and the degree of specificity sharing between closely related species. In Maloideae, about 17% of the amino acid positions in the S-RNase protein are found to be positively selected, and they occupy about 30% of the exposed protein surface. Positively selected amino acid sites are located on either side of the active site cleft, an observation that is compatible with current models of specificity determination. At positively selected amino acid sites, non-conservative changes are almost as frequent as conservative changes, and there is no evidence that the most drastic amino acid changes are more strongly selected at these sites. Conclusions: Many similarities are found between the GSI systems of Prunus and Maloideae that are compatible with the single origin hypothesis for RNase based GSI. The presence of common features, such as the location of positively selected amino acid sites and of lysine residues that may be important for ubiquitylation, raises a number of issues that, in principle, can be experimentally addressed in Maloideae. Nevertheless, there are also many important differences between the two Rosaceae GSI systems. How such features changed during evolution remains a puzzling issue.

2005

On predicting protein secondary structure from their aminoacid sequences using Inductive Logic Programming

Authors
Magalhaes, A; Fonseca, NA;

Publication
2005 PORTUGUESE CONFERENCE ON ARTIFICIAL INTELLIGENCE, PROCEEDINGS

Abstract
We address the problem of predicting the stability of secondary structure motifs of proteins given their linear sequence of residues. Our study is restricted to the prediction of helix structures. We have applied an Inductive Logic Programming (ILP) system to automatically synthesise the predictive rules. ILP systems are well known for being able to induce comprehensible models of data. Furthermore, the model's components are definitions provided by a domain expert, which makes the model more likely to help in understanding the underlying process that produced the data. Our methodology has two stages. First, the system induces a model (a set of rules) using only structural information and groupings of the residues, to avoid bias introduced by the domain expert. In the second stage, the residues' properties are used to make the induced rules chemically and biologically appealing. We claim that this methodology is also valuable for general Structure-Activity Relationship (SAR) problems.

2011

Assemblathon 1: A competitive assessment of de novo short read assembly methods

Authors
Earl, D; Bradnam, K; St John, J; Darling, A; Lin, DW; Fass, J; Hung, OKY; Buffalo, V; Zerbino, DR; Diekhans, M; Nguyen, N; Ariyaratne, PN; Sung, WK; Ning, ZM; Haimel, M; Simpson, JT; Fonseca, NA; Birol, I; Docking, TR; Ho, IY; Rokhsar, DS; Chikhi, R; Lavenier, D; Chapuis, G; Naquin, D; Maillet, N; Schatz, MC; Kelley, DR; Phillippy, AM; Koren, S; Yang, SP; Wu, W; Chou, WC; Srivastava, A; Shaw, TI; Ruby, JG; Skewes Cox, P; Betegon, M; Dimon, MT; Solovyev, V; Seledtsov, I; Kosarev, P; Vorobyev, D; Ramirez Gonzalez, R; Leggett, R; MacLean, D; Xia, FF; Luo, RB; Li, ZY; Xie, YL; Liu, BH; Gnerre, S; MacCallum, I; Przybylski, D; Ribeiro, FJ; Yin, SY; Sharpe, T; Hall, G; Kersey, PJ; Durbin, R; Jackman, SD; Chapman, JA; Huang, XQ; DeRisi, JL; Caccamo, M; Li, YR; Jaffe, DB; Green, RE; Haussler, D; Korf, I; Paten, B;

Publication
GENOME RESEARCH

Abstract
Low-cost short read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype-aware assessments of coverage, contiguity, structure, base calling, and copy number were made. We establish that within this benchmark (1) it is possible to assemble the genome to a high level of coverage and accuracy, and (2) large differences exist between the assemblies, suggesting room for further improvement in current methods. The simulated benchmark, including the correct answer, the assemblies, and the code that was used to evaluate the assemblies, is now public and freely available from http://www.assemblathon.org/.

2011

Amino acid pair- and triplet-wise groupings in the interior of alpha-helical segments in proteins

Authors
de Sousa, MM; Munteanu, CR; Pazos, A; Fonseca, NA; Camacho, R; Magalhaes, AL;

Publication
JOURNAL OF THEORETICAL BIOLOGY

Abstract
A statistical approach has been applied to analyse primary structure patterns at inner positions of alpha-helices in proteins. A systematic survey was carried out on a recent sample of non-redundant proteins selected from the Protein Data Bank, which were used to analyse alpha-helix structures for amino acid pairing patterns. Only residues more than three positions away from both termini of the alpha-helix were considered inner. Amino acid pairings (i, i+k), with k = 1, 2, 3, 4, 5, were analysed and the corresponding 20 × 20 matrices of relative global propensities were constructed. An analysis of (i, i+4, i+8) and (i, i+3, i+4) triplet patterns was also performed. These analyses yielded information on a series of amino acid patterns (pairings and triplets) showing either high or low preference for alpha-helical motifs, and suggested a novel approach to protein alphabet reduction. In addition, it has been shown that individual amino acid propensities are not enough to define the statistical distribution of these patterns: global pair propensities also depend on the type of pattern, its composition and its orientation in the protein sequence. The data presented should prove useful for obtaining and refining predictive rules that can further the development and fine-tuning of protein structure prediction algorithms and tools.
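The (i, i+k) pair analysis described in this abstract can be illustrated with a minimal Python sketch. It assumes that a relative propensity is the observed count of a residue pair (a, b) at inner helix positions i and i+k divided by the count expected from the individual residue frequencies at inner positions; the paper's exact normalisation may differ. The function names `inner_positions` and `pair_propensities` and the (start, end) helix representation are hypothetical, chosen for illustration only.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def inner_positions(start, end, margin=3):
    """Positions of a helix spanning [start, end] (inclusive) that lie
    more than `margin` residues away from both termini."""
    return range(start + margin + 1, end - margin)


def pair_propensities(sequence, helices, k):
    """Hypothetical 20 x 20 observed/expected ratio for (i, i+k) pairs
    where both i and i+k are inner positions of the same helix."""
    pair_counts = Counter()
    residue_counts = Counter()
    for start, end in helices:
        inner = list(inner_positions(start, end))
        inner_set = set(inner)
        # Single-residue frequencies are taken over inner positions only.
        residue_counts.update(sequence[p] for p in inner)
        for i in inner:
            if i + k in inner_set:
                pair_counts[(sequence[i], sequence[i + k])] += 1

    total_pairs = sum(pair_counts.values())
    total_residues = sum(residue_counts.values())
    matrix = {}
    for a, b in product(AMINO_ACIDS, repeat=2):
        if total_pairs == 0 or total_residues == 0:
            matrix[(a, b)] = 0.0
            continue
        freq_a = residue_counts[a] / total_residues
        freq_b = residue_counts[b] / total_residues
        # Expected pair count if residues at i and i+k were independent.
        expected = freq_a * freq_b * total_pairs
        matrix[(a, b)] = pair_counts[(a, b)] / expected if expected else 0.0
    return matrix


if __name__ == "__main__":
    # Toy helix covering the whole sequence; inner positions are 4..10.
    seq = "AAAA" + "LKLKLKL" + "AAAA"
    m = pair_propensities(seq, [(0, 14)], k=1)
    print(round(m[("L", "K")], 3), m[("L", "L")])
```

Values above 1 mark pairs that occur more often than expected from single-residue composition and values below 1 mark depleted pairs; the triplet patterns mentioned in the abstract would extend the same counting to (i, i+j, i+l) offsets.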
