Publications - INESC TEC

Publications

Publications by HASLab

2007

Foundational certification of data-flow analyses

Authors
Frade, MJ; Saabas, A; Uustalu, T

Publication
TASE 2007: First Joint IEEE/IFIP Symposium on Theoretical Aspects of Software Engineering, Proceedings

Abstract
Data-flow analyses, such as live variables analysis, available expressions analysis etc., are usefully specifiable as type systems. These are sound and, in the case of distributive analysis frameworks, complete wrt. appropriate natural semantics on abstract properties. Applications include certification of analyses and "optimization" of functional correctness proofs alongside programs. On the example of live variables analysis, we show that analysis type systems are applied versions of more foundational Hoare logics describing either the same abstract property semantics as the type system (liveness states) or a more concrete natural semantics on transition traces of a suitable kind (future defs and uses). The rules of the type system are derivable in the Hoare logic for the abstract property semantics and those in turn in the Hoare logic for the transition trace semantics. This reduction of the burden of trusting the certification vehicle can be compared to foundational proof-carrying code, where general-purpose program logics are preferred to special-purpose type systems and universal logic to program logics. We also look at conditional liveness analysis to see that the same foundational development is also possible for conditional data-flow analyses proceeding from type systems for combined "standard state and abstract property" semantics.
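
As an illustration of the live variables analysis mentioned in the abstract, here is a minimal sketch of the standard backward liveness computation over a small structured language. The statement classes (`Assign`, `Seq`, `If`, `While`) and the function `live_before` are hypothetical names chosen for this example; it does not reproduce the type system or the Hoare logics from the paper.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Assign:
    target: str               # variable being defined
    uses: FrozenSet[str]      # variables read by the right-hand side


@dataclass(frozen=True)
class Seq:
    first: "Stmt"
    second: "Stmt"


@dataclass(frozen=True)
class If:
    cond_uses: FrozenSet[str]  # variables read by the condition
    then_branch: "Stmt"
    else_branch: "Stmt"


@dataclass(frozen=True)
class While:
    cond_uses: FrozenSet[str]
    body: "Stmt"


Stmt = Union[Assign, Seq, If, While]


def live_before(stmt: Stmt, live_after: FrozenSet[str]) -> FrozenSet[str]:
    """Return the variables live before `stmt`, given those live after it."""
    if isinstance(stmt, Assign):
        # The defined variable is killed, the used variables become live.
        return (live_after - {stmt.target}) | stmt.uses
    if isinstance(stmt, Seq):
        # Liveness flows backwards: analyse the second statement first.
        return live_before(stmt.first, live_before(stmt.second, live_after))
    if isinstance(stmt, If):
        return (stmt.cond_uses
                | live_before(stmt.then_branch, live_after)
                | live_before(stmt.else_branch, live_after))
    if isinstance(stmt, While):
        # Iterate to the least fixed point of the liveness set at the loop head.
        live = live_after | stmt.cond_uses
        while True:
            new = live | live_before(stmt.body, live)
            if new == live:
                return live
            live = new
    raise TypeError(f"unknown statement: {stmt!r}")


# x := 1; y := x + z; while (y) { y := y - x }
program = Seq(
    Assign("x", frozenset()),
    Seq(
        Assign("y", frozenset({"x", "z"})),
        While(frozenset({"y"}), Assign("y", frozenset({"y", "x"}))),
    ),
)
print(sorted(live_before(program, frozenset())))  # ['z']
```

Only `z` is live on entry, because `x` and `y` are both assigned before they are read.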

2007

Evaluating deterministic motif significance measures in protein databases

Authors
Ferreira, PG; Azevedo, PJ

Publication
Algorithms for Molecular Biology

Abstract
Background: Assessing the outcome of motif mining algorithms is an essential task, as the number of reported motifs can be very large. Significance measures play a central role in automatically ranking those motifs, and therefore alleviating the analysis work. Spotting the most interesting and relevant motifs is then dependent on the choice of the right measures. The combined use of several measures may provide more robust results. However, caution has to be taken in order to avoid spurious evaluations. Results: From the set of conducted experiments, it was verified that several of the selected significance measures show a very similar behavior in a wide range of situations, therefore providing redundant information. Some measures have proved to be more appropriate to rank highly conserved motifs, while others are more appropriate for weakly conserved ones. Support appears as a very important feature to be considered for correct motif ranking. We observed that not all the measures are suitable for situations with poorly balanced class information, for instance when there is significantly less positive data than negative data. Finally, a visualization scheme was proposed that, when several measures are applied, enables an easy identification of high scoring motifs. Conclusion: In this work we have surveyed and categorized 14 significance measures for pattern evaluation. Their ability to rank three types of deterministic motifs was evaluated. Measures were applied in different testing conditions, where relations were identified. This study provides some pertinent insights on the choice of the right set of significance measures for the evaluation of deterministic motifs extracted from protein databases.
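
To make the notion of a significance measure concrete, the sketch below computes support for a deterministic motif, together with precision and recall against positive and negative sequence sets. The motif is written as an ordinary Python regular expression, the sequences are made up, and the function names are hypothetical. Support is named in the abstract; precision and recall are used here only as generic class-based measures, and this listing does not say whether they are among the 14 measures surveyed.

```python
import re
from typing import List, Tuple


def support(pattern: str, sequences: List[str]) -> float:
    """Fraction of sequences that contain at least one match of the motif."""
    if not sequences:
        return 0.0
    regex = re.compile(pattern)
    return sum(1 for seq in sequences if regex.search(seq)) / len(sequences)


def precision_recall(pattern: str,
                     positives: List[str],
                     negatives: List[str]) -> Tuple[float, float]:
    """Precision and recall of the motif as a detector of the positive class."""
    regex = re.compile(pattern)
    true_pos = sum(1 for seq in positives if regex.search(seq))
    false_pos = sum(1 for seq in negatives if regex.search(seq))
    matched = true_pos + false_pos
    precision = true_pos / matched if matched else 0.0
    recall = true_pos / len(positives) if positives else 0.0
    return precision, recall


# Made-up toy data: "C..C" stands for a motif C-x(2)-C.
positives = ["ACDKCA", "MCAACW", "AAAA"]
negatives = ["CTTCA", "GGGG", "LLLL", "PPPP"]
print(support("C..C", positives))
print(precision_recall("C..C", positives, negatives))
```

With many more negative than positive sequences, a handful of false positives can dominate precision even when recall is high, which is one way a measure can become unreliable under the class imbalance the abstract describes.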

2007

Deterministic motif mining in protein databases

Authors
Ferreira, PG; Azevedo, PJ

Publication
Successes and New Directions in Data Mining

Abstract
Protein sequence motifs describe, by means of an enhanced regular expression syntax, regions of amino acids that have been conserved across several functionally related proteins. These regions may have an implication at the structural and functional level of the proteins. Sequence motif analysis can bring significant improvements towards a better understanding of the protein sequence-structure-function relation. In this chapter, we review the subject of mining deterministic motifs from protein sequence databases. We start by giving a formal definition of the different types of motifs and the respective specificities. Then, we explore the methods available to evaluate the quality and interest of such patterns. Examples of applications and motif repositories are described. We discuss the algorithmic aspects and different methodologies for motif extraction. A brief description of how sequence motifs can be used to extract structural level information patterns is also provided. © 2008, IGI Global.

2007

Evaluating protein motif significance measures: A case study on prosite patterns

Authors
Ferreira, PG; Azevedo, PJ

Publication
2007 IEEE Symposium on Computational Intelligence and Data Mining, Vols 1 and 2

Abstract
The existence of preserved subsequences in a set of related protein sequences suggests that they might play a structural and functional role in protein mechanisms. Due to its exploratory approach, the mining process tends to deliver a large number of motifs. Therefore, it is critical to provide methods that identify relevant and significant motifs. Many measures of interest and significance have been proposed. However, since motifs have a wide range of applications, the choice of appropriate significance measures is application dependent. Some measures show consistent results, being highly correlated, while others show disagreements. In this paper we review existing measures and study their behavior in order to assist the selection of the most appropriate set of measures. An experimental evaluation of the measures for high quality patterns from the Prosite database is presented.

2007

A closer look on protein unfolding Simulations through hierarchical clustering

Authors
Ferreira, PG; Silva, CG; Brito, RMM; Azevedo, PJ

Publication
2007 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology

Abstract
Understanding protein folding and unfolding mechanisms is a central problem in molecular biology. Data obtained from molecular dynamics unfolding simulations may provide valuable insights for a better understanding of these mechanisms. Here, we propose the application of an augmented version of hierarchical clustering analysis to detect clusters of amino-acid residues with similar behavior in protein unfolding simulations. These clusters hold similar global pattern behavior of solvent accessible surface area (SASA) variation in unfolding simulations of the protein Transthyretin (TTR). Classical hierarchical clustering was applied to build a dendrogram based on the SASA variation of each amino-acid residue. The dendrogram was enriched with background information on the amino-acid residues, enabling the extraction of sub-clusters with well differentiated characteristics.

2007

Iterative reordering of rules for building ensembles without relearning

Authors
Azevedo, PJ; Jorge, AM

Publication
Discovery Science, Proceedings

Abstract
We study a new method for improving the classification accuracy of a model composed of classification association rules (CAR). The method consists of reordering the original set of rules according to the error rates obtained on a set of training examples. This is done iteratively, starting from the original set of rules. After obtaining N models, these are used as an ensemble for classifying new cases. The net effect of this approach is that the original rule model is clearly improved. This improvement is due to the ensembling of the obtained models, which are, individually, slightly better than the original one. This ensembling approach has the advantage of running a single learning process, since the models in the ensemble are obtained by self-replicating the original one.
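
The reorder-and-ensemble idea in the abstract can be sketched roughly as follows. Several details are assumptions made for this example and are not taken from the paper: each model is an ordered rule list that predicts with its first matching rule, a rule's error rate is measured only on the training examples where it is the first rule to fire, rules that never fire are given error 1.0, the original ordering counts as the first of the N models, and the ensemble decides by majority vote. All names are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Example = Dict[str, int]
Rule = Tuple[Callable[[Example], bool], str]   # (antecedent, predicted class)
Labeled = Tuple[Example, str]                  # (example, true class)


def classify(rules: List[Rule], example: Example, default: str) -> str:
    """Predict with the first rule whose antecedent matches (assumed scheme)."""
    for antecedent, label in rules:
        if antecedent(example):
            return label
    return default


def reorder(rules: List[Rule], data: List[Labeled]) -> List[Rule]:
    """Sort rules by the error rate each one shows when it fires first."""
    fired: Counter = Counter()
    wrong: Counter = Counter()
    for example, truth in data:
        for i, (antecedent, label) in enumerate(rules):
            if antecedent(example):
                fired[i] += 1
                wrong[i] += int(label != truth)
                break

    def error(i: int) -> float:
        # Assumption: a rule that never fires is treated as maximally wrong.
        return wrong[i] / fired[i] if fired[i] else 1.0

    # sorted() is stable, so ties keep their current relative order.
    return [rules[i] for i in sorted(range(len(rules)), key=error)]


def build_ensemble(rules: List[Rule], data: List[Labeled], n: int) -> List[List[Rule]]:
    """Derive n orderings by repeatedly reordering the previous model."""
    models = [list(rules)]
    for _ in range(n - 1):
        models.append(reorder(models[-1], data))
    return models


def vote(models: List[List[Rule]], example: Example, default: str) -> str:
    """Majority vote over the predictions of all models."""
    votes = Counter(classify(m, example, default) for m in models)
    return votes.most_common(1)[0][0]
```

No rule is relearned at any point: every model is a permutation of the same rule set, which is what keeps the cost at a single learning run.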
