2009
Authors
Ferreira, PG; Silva, CG; Azevedo, PJ; Brito, RMM;
Publication
COMPUTATIONAL INTELLIGENCE METHODS FOR BIOINFORMATICS AND BIOSTATISTICS
Abstract
Molecular dynamics simulation is a valuable tool for studying protein unfolding in silico. Analyzing the relative spatial position of the residues during the simulation may indicate which residues are essential in determining the protein structure. We present a method, inspired by a popular data mining technique called Frequent Itemset Mining, that clusters sets of amino acid residues with a synchronized trajectory during the unfolding process. The proposed approach has several advantages over traditional hierarchical clustering. © 2009 Springer Berlin Heidelberg.
2009
Authors
Ferreira, PG; Azevedo, PJ;
Publication
Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques
Abstract
The recent increase in the number of complete genetic sequences freely available through specialized Internet databases presents big challenges for the research community. One such challenge is the efficient and effective search for sequence patterns, also known as motifs, among a set of related genetic sequences. Such patterns describe regions that may provide important insights into the structural and functional role of DNA and proteins. Two main classes can be considered: probabilistic patterns, which represent a model that simulates the sequences, or part of the sequences, under consideration; and deterministic patterns, which either match or do not match the input sequences. This chapter presents a general overview of deterministic sequence mining over sets of genetic sequences. The authors formulate an architecture that divides the mining process workflow into a set of blocks. Each of these blocks is discussed individually. © 2010, IGI Global.
2009
Authors
Silva, CG; Ferreira, PG; Azevedo, PJ; Brito, RMM;
Publication
Evolving Application Domains of Data Warehousing and Mining: Trends and Solutions
Abstract
The protein folding problem, i.e. the identification of the rules that determine the acquisition of the native, functional, three-dimensional structure of a protein from its linear sequence of amino acids, remains a major challenge in structural molecular biology. Moreover, the identification of a series of neurodegenerative diseases as protein unfolding/misfolding disorders highlights the importance of a detailed characterisation of the molecular events driving the unfolding and misfolding processes in proteins. One way of exploring these processes is through molecular dynamics simulations. The analysis and comparison of the enormous amount of data generated by multiple protein folding or unfolding simulations is not a trivial task and presents many interesting challenges to the data mining community. Considering the central role of the hydrophobic effect in protein folding, we show here the application of two data mining methods, hierarchical clustering and association rules, to the analysis and comparison of the solvent accessible surface area (SASA) variation profiles of each of the 127 amino acid residues in the amyloidogenic protein Transthyretin, across multiple molecular dynamics protein unfolding simulations. © 2010, IGI Global.
2009
Authors
Ferreira, PG; Azevedo, PJ;
Publication
Database Technologies: Concepts, Methodologies, Tools, and Applications (4 Volumes)
Abstract
2009
Authors
Silva, I; Silva, ME;
Publication
REVSTAT-STATISTICAL JOURNAL
Abstract
High-order statistics (moments and cumulants of order higher than two) have been widely applied in several fields, especially in problems where non-Gaussianity and/or non-linearity is conjectured. Since INteger-valued AutoRegressive (INAR) processes are non-Gaussian, high-order statistics can provide additional information that allows a better characterization of these processes. Thus, an estimation method for the parameters of an INAR process, based on Least Squares for the third-order moments, is proposed. The results of a Monte Carlo study investigating the performance of the estimator are presented, and the method is applied to a set of real data.
2009
Authors
Silva, N; Pereira, I; Silva, ME;
Publication
REVSTAT-STATISTICAL JOURNAL
Abstract
In this work we consider the problem of forecasting integer-valued time series, modelled by the INAR(1) process introduced by McKenzie (1985) and Al-Osh and Alzaid (1987). The theoretical properties and practical applications of INAR and related processes have been discussed extensively in the literature, but there is still some discussion on the problem of producing coherent, i.e. integer-valued, predictions. Here Bayesian methodology is used to obtain point predictions as well as confidence intervals for future values of the process. The predictions thus obtained are compared with their classical counterparts. The proposed approaches are illustrated with a simulation study and a real example.