Publications

Publications by LIAAD

2019

Multiscale information storage of linear long-range correlated stochastic processes

Authors
Faes, L; Pereira, MA; Silva, ME; Pernice, R; Busacca, A; Javorka, M; Rocha, AP;

Publication
PHYSICAL REVIEW E

Abstract
Information storage, reflecting the capability of a dynamical system to keep predictable information during its evolution over time, is a key element of intrinsic distributed computation, useful for the description of the dynamical complexity of several physical and biological processes. Here we introduce a parametric approach which allows one to compute information storage across multiple timescales in stochastic processes displaying both short-term dynamics and long-range correlations (LRC). Our analysis is performed in the popular framework of multiscale entropy, whereby a time series is first "coarse grained" at the chosen timescale through low-pass filtering and downsampling, and then its complexity is evaluated in terms of conditional entropy. Within this framework, our approach makes use of linear fractionally integrated autoregressive (ARFI) models to derive analytical expressions for the information storage computed at multiple timescales. Specifically, we exploit state space models to provide the representation of lowpass filtered and downsampled ARFI processes, from which information storage is computed at any given timescale relating the process variance to the prediction error variance. This enhances the practical usability of multiscale information storage, as it enables a computationally reliable quantification of a complexity measure which incorporates the effects of LRC together with that of short-term dynamics. The proposed measure is first assessed in simulated ARFI processes reproducing different types of autoregressive dynamics and different degrees of LRC, studying both the theoretical values and the finite sample performance. We find that LRC alter substantially the complexity of ARFI processes even at short timescales, and that reliable estimation of complexity can be achieved at longer timescales only when LRC are properly modeled. 
Then, we assess multiscale information storage in physiological time series measured in humans during resting state and postural stress, revealing unprecedented responses to stress of the complexity of heart period and systolic arterial pressure variability, which are related to the different role played by LRC in the two conditions.
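
As a rough illustration of the quantity described in this abstract: for a linear Gaussian process, information storage can be written as half the log ratio of the process variance to the prediction error variance. The sketch below assumes the simplest case, a stationary AR(1) process at the original timescale. It does not implement the ARFI or state-space machinery of the paper, and the function names are hypothetical.

```python
import math

def information_storage(process_var, pred_err_var):
    """Information storage (in nats) of a linear Gaussian process,
    given its variance and its one-step prediction error variance."""
    if process_var <= 0 or pred_err_var <= 0:
        raise ValueError("variances must be positive")
    return 0.5 * math.log(process_var / pred_err_var)

def ar1_information_storage(a, noise_var=1.0):
    """Closed form for a stationary AR(1) process x[n] = a*x[n-1] + u[n].

    Assumption: u[n] is white Gaussian noise with variance noise_var, so the
    process variance is noise_var / (1 - a^2) and the prediction error
    variance equals noise_var.
    """
    if abs(a) >= 1:
        raise ValueError("AR(1) is stationary only for |a| < 1")
    process_var = noise_var / (1.0 - a * a)
    return information_storage(process_var, noise_var)

if __name__ == "__main__":
    # Stronger autoregressive dynamics store more predictable information.
    for a in (0.0, 0.5, 0.9):
        print(a, round(ar1_information_storage(a), 4))
```

In this toy case the storage is zero for white noise (a = 0) and grows as the coefficient approaches 1, which gives some intuition for why slowly decaying correlations would affect the measure.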

2019

NStackSenti: Evaluation of a Multi-level Approach for Detecting the Sentiment of Users

Authors
Sohan, MF; Rahman, SSMM; Munna, MTA; Allayear, SM; Rahman, MH; Rahman, MM;

Publication
Communications in Computer and Information Science - Next Generation Computing Technologies on Computational Intelligence

2019

Prediction Model for Prevalence of Type-2 Diabetes Mellitus Complications Using Machine Learning Approach

Authors
Younus, M; Munna, MTA; Alam, MM; Allayear, SM; Ara, SJF;

Publication
Studies in Big Data - Data Management and Analysis

2019

Prediction model for prevalence of type-2 diabetes complications with ANN approach combining with K-fold cross validation and K-means clustering

Authors
Munna, MTA; Alam, MM; Allayear, SM; Sarker, K; Ara, SJF;

Publication
Advances in Intelligent Systems and Computing

Abstract
Today, many people suffer from chronic diseases because of their lifestyle, food habits, and reduced physical activity. Diabetes is one of the most common chronic diseases and affects people of all ages. Diabetes complications arise when the blood glucose (sugar) level rises above the normal level. Type-2 diabetes is considered one of the most prevalent endocrine disorders. In this context, we apply machine learning algorithms to build a statistical prediction model so that people with diabetes can be aware of the prevalence of its complications. The aim of this paper is to detect the prevalence of diabetes-related complications among patients with Type-2 diabetes mellitus. Processing and statistical analysis were carried out with Scikit-Learn and Pandas for Python. We also used an Artificial Neural Network (ANN) and the unsupervised K-means clustering algorithm to develop a classification-based prediction model for assessing chronic complications of Type-2 diabetes mellitus.

2019

An Iterative Oversampling Approach for Ordinal Classification

Authors
Marques, F; Duarte, H; Santos, J; Domingues, I; Amorim, JP; Abreu, PH;

Publication
SAC '19: PROCEEDINGS OF THE 34TH ACM/SIGAPP SYMPOSIUM ON APPLIED COMPUTING

Abstract
The machine learning field has grown considerably in recent years. There are, however, some problems still to be solved. The characteristics of the training sets, for instance, are known to affect classifier performance. Here, inspired by medical applications, we are interested in studying datasets that are both ordinal and imbalanced. Ordinal datasets have labels where only the relative ordering between different values is significant. Imbalanced datasets have very different numbers of examples per class. Building upon our previous work, we make three new contributions: (1) we extend the number of classifiers, (2) we evaluate two techniques to balance intermediate training sets in binary decomposition methods (often used in multi-class contexts and ordinal ones in particular), and (3) we propose a new, iterative, classifier-based oversampling algorithm that we name InCuBAtE. Experiments were conducted on 6 private datasets, concerning the assessment of response to treatment in oncologic diseases, and 15 public datasets widely used in the literature. Compared with our previous work, results improved (or remained the same) for 4 of the 6 private datasets and for 11 of the 15 public datasets.
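
To make the binary decomposition mentioned in this abstract more concrete, the sketch below shows one common scheme for ordinal labels: K ordered classes are turned into K-1 binary problems of the form "is the label above class k?". This is an illustrative assumption about the kind of decomposition meant; it is not the InCuBAtE algorithm, and the function names and toy labels are hypothetical. It also reports how imbalanced each intermediate binary problem is, which is the situation the balancing techniques address.

```python
from collections import Counter

def ordinal_binary_targets(labels, ordered_classes):
    """Build K-1 binary target lists from ordinal labels.

    The k-th list holds 1 where the label ranks above ordered_classes[k]
    and 0 otherwise.
    """
    rank = {c: i for i, c in enumerate(ordered_classes)}
    return [
        [1 if rank[y] > k else 0 for y in labels]
        for k in range(len(ordered_classes) - 1)
    ]

def imbalance_ratio(binary_targets):
    """Majority-to-minority count ratio of one binary sub-problem."""
    counts = Counter(binary_targets)
    minority = min(counts.get(0, 0), counts.get(1, 0))
    majority = max(counts.get(0, 0), counts.get(1, 0))
    return float("inf") if minority == 0 else majority / minority

if __name__ == "__main__":
    # Toy imbalanced ordinal data: 6 "low", 3 "mid", 1 "high".
    labels = ["low"] * 6 + ["mid"] * 3 + ["high"] * 1
    subproblems = ordinal_binary_targets(labels, ["low", "mid", "high"])
    for k, targets in enumerate(subproblems):
        print(k, imbalance_ratio(targets))
```

With these toy labels the second sub-problem is far more skewed than the first, which suggests why each intermediate training set may need its own balancing step.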

2019

Generating Synthetic Missing Data: A Review by Missing Mechanism

Authors
Santos, MS; Pereira, RC; Costa, AF; Soares, JP; Santos, J; Abreu, PH;

Publication
IEEE ACCESS

Abstract
The performance evaluation of imputation algorithms often involves the generation of missing values. Missing values can be inserted in only one feature (univariate configuration) or in several features (multivariate configuration) at different percentages (missing rates) and according to distinct missing mechanisms, namely, missing completely at random, missing at random, and missing not at random. Since the missing data generation process defines the basis for the imputation experiments (configuration, missing rate, and missing mechanism), it is essential that it is appropriately applied; otherwise, conclusions derived from ill-defined setups may be invalid. The goal of this paper is to review the different approaches to synthetic missing data generation found in the literature and discuss their practical details, elaborating on their strengths and weaknesses. Our analysis revealed that creating missing at random and missing not at random scenarios in datasets comprising qualitative features is the most challenging issue in the related work and, therefore, should be the focus of future work in the field.
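
The three missing mechanisms named in this abstract can be sketched for a single feature (univariate configuration) as follows. This is a minimal illustration under simple assumptions, not one of the specific generation approaches reviewed in the paper: MCAR removes rows at random, MAR removes rows according to another observed feature, and MNAR removes rows according to the values of the feature itself. The helper names and the toy `age` and `income` columns are hypothetical.

```python
import random

def mcar_mask(n_rows, rate, rng):
    """MCAR: each row is equally likely to be missing, whatever its values."""
    k = round(n_rows * rate)
    missing = set(rng.sample(range(n_rows), k))
    return [i in missing for i in range(n_rows)]

def lowest_values_mask(driver, rate):
    """Mark as missing the rows holding the smallest values of `driver`.

    MAR if `driver` is another, fully observed feature.
    MNAR if `driver` is the feature being removed itself.
    """
    k = round(len(driver) * rate)
    order = sorted(range(len(driver)), key=lambda i: driver[i])
    missing = set(order[:k])
    return [i in missing for i in range(len(driver))]

def apply_mask(column, mask):
    """Replace masked entries with None to represent missing values."""
    return [None if m else v for v, m in zip(column, mask)]

if __name__ == "__main__":
    age = [25, 40, 31, 58, 47, 36]
    income = [30, 52, 80, 75, 60, 38]
    rng = random.Random(0)  # fixed seed so the MCAR draw is repeatable

    print("MCAR:", apply_mask(income, mcar_mask(len(income), 0.5, rng)))
    print("MAR: ", apply_mask(income, lowest_values_mask(age, 0.5)))
    print("MNAR:", apply_mask(income, lowest_values_mask(income, 0.5)))
```

The deterministic "lowest values" rule is only one possible choice. Probabilistic rules, or rules driven by several features, are equally plausible ways to realise MAR and MNAR.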
