Publications

Publications by LIAAD

2009

Deterministic pattern mining on genetic sequences

Authors
Ferreira, PG; Azevedo, PJ;

Publication
Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques

Abstract
The recent increase in the number of complete genetic sequences freely available through specialized Internet databases presents big challenges for the research community. One such challenge is the efficient and effective search of sequence patterns, also known as motifs, among a set of related genetic sequences. Such patterns describe regions that may provide important insights about the structural and functional role of DNA and proteins. Two main classes can be considered: probabilistic patterns, which represent a model that simulates the sequences or part of the sequences under consideration, and deterministic patterns, which either match or do not match the input sequences. In this chapter a general overview of deterministic sequence mining over sets of genetic sequences is presented. The authors formulate an architecture that divides the mining process workflow into a set of blocks. Each of these blocks is discussed individually. © 2010, IGI Global.

2009

Using data mining techniques to probe the role of hydrophobic residues in protein folding and unfolding simulations

Authors
Silva, CG; Ferreira, PG; Azevedo, PJ; Brito, RMM;

Publication
Evolving Application Domains of Data Warehousing and Mining: Trends and Solutions

Abstract
The protein folding problem, i.e. the identification of the rules that determine the acquisition of the native, functional, three-dimensional structure of a protein from its linear sequence of amino acids, is still a major challenge in structural molecular biology. Moreover, the identification of a series of neurodegenerative diseases as protein unfolding/misfolding disorders highlights the importance of a detailed characterisation of the molecular events driving the unfolding and misfolding processes in proteins. One way of exploring these processes is through the use of molecular dynamics simulations. The analysis and comparison of the enormous amount of data generated by multiple protein folding or unfolding simulations is not a trivial task, presenting many interesting challenges to the data mining community. Considering the central role of the hydrophobic effect in protein folding, we show here the application of two data mining methods, hierarchical clustering and association rules, for the analysis and comparison of the solvent accessible surface area (SASA) variation profiles of each of the 127 amino acid residues in the amyloidogenic protein Transthyretin, across multiple molecular dynamics protein unfolding simulations. © 2010, IGI Global.

2009

Deterministic Motif Mining in Protein Databases

Authors
Ferreira, PG; Azevedo, PJ;

Publication
Database Technologies: Concepts, Methodologies, Tools, and Applications (4 Volumes)

Abstract

2009

PARAMETER ESTIMATION FOR INAR PROCESSES BASED ON HIGH-ORDER STATISTICS

Authors
Silva, I; Silva, ME;

Publication
REVSTAT-STATISTICAL JOURNAL

Abstract
High-order statistics (moments and cumulants of order higher than two) have been widely applied in several fields, especially in problems where a lack of Gaussianity and/or the presence of non-linearity is conjectured. Since INteger-valued AutoRegressive (INAR) processes are non-Gaussian, high-order statistics can provide additional information that allows a better characterization of these processes. Thus, an estimation method for the parameters of an INAR process, based on Least Squares for the third-order moments, is proposed. The results of a Monte Carlo study to investigate the performance of the estimator are presented and the method is applied to a set of real data.

2009

FORECASTING IN INAR(1) MODEL

Authors
Silva, N; Pereira, I; Silva, ME;

Publication
REVSTAT-STATISTICAL JOURNAL

Abstract
In this work we consider the problem of forecasting integer-valued time series, modelled by the INAR(1) process introduced by McKenzie (1985) and Al-Osh and Alzaid (1987). The theoretical properties and practical applications of INAR and related processes have been discussed extensively in the literature but there is still some discussion on the problem of producing coherent, i.e. integer-valued, predictions. Here Bayesian methodology is used to obtain point predictions as well as confidence intervals for future values of the process. The predictions thus obtained are compared with their classic counterparts. The proposed approaches are illustrated with a simulation study and a real example.

2009

Deterministic versus stochastic trends: Detection and challenges

Authors
Fatichi, S; Barbosa, SM; Caporali, E; Silva, ME;

Publication
JOURNAL OF GEOPHYSICAL RESEARCH-ATMOSPHERES

Abstract
The detection of a trend in a time series and the evaluation of its magnitude and statistical significance is an important task in geophysical research. This importance is amplified in climate change contexts, since trends are often used to characterize long-term climate variability and to quantify the magnitude and the statistical significance of changes in climate time series, both at global and local scales. Recent studies have demonstrated that the stochastic behavior of a time series can change the statistical significance of a trend, especially if the time series exhibits long-range dependence. The present study examines the trends in time series of daily average temperature recorded at 26 stations in the Tuscany region (Italy). In this study a new framework for trend detection is proposed. First, two parametric statistical tests, the Phillips-Perron test and the Kwiatkowski-Phillips-Schmidt-Shin test, are applied in order to test for trend-stationary and difference-stationary behavior in the temperature time series. Then long-range dependence is assessed using different approaches, including wavelet analysis, heuristic methods and the fitting of fractionally integrated autoregressive moving average models. The trend detection results are further compared with the results obtained using nonparametric trend detection methods: the Mann-Kendall, Cox-Stuart and Spearman's rho tests. This study confirms an increase in uncertainty when pronounced stochastic behaviors are present in the data. Nevertheless, for approximately one third of the analyzed records, the stochastic behavior itself cannot explain the long-term features of the time series, and a deterministic positive trend is the most likely explanation.
