Publications

Publications by LIAAD

2020

IoT data stream analytics

Authors
Bifet, A; Gama, J;

Publication
ANNALS OF TELECOMMUNICATIONS

Abstract

2020

Self Hyper-parameter Tuning for Stream Classification Algorithms

Authors
Veloso, B; Gama, J;

Publication
IoT Streams for Data-Driven Predictive Maintenance and IoT, Edge, and Mobile for Embedded Machine Learning - Second International Workshop, IoT Streams 2020, and First International Workshop, ITEM 2020, Co-located with ECML/PKDD 2020, Ghent, Belgium, September 14-18, 2020, Revised Selected Papers

Abstract
The new 5G mobile communication system era brings a new set of communication devices that will appear on the market. These devices will generate data streams that require proper handling by machine algorithms. The processing of these data streams requires the design, development, and adaptation of appropriate machine learning algorithms. While stream processing algorithms include hyper-parameters for performance refinement, their tuning process is time-consuming and typically requires an expert to do the task. In this paper, we present an extension of the Self Parameter Tuning (SPT) optimization algorithm for data streams. We apply the Nelder-Mead algorithm to dynamically sized samples that converge to optimal settings in a double pass over data (during the exploration phase), using a relatively small number of data points. Additionally, the SPT automatically readjusts hyper-parameters when concept drift occurs. We did a set of experiments with well-known classification data sets and the results show that the proposed algorithm can outperform the results of previous hyper-parameter tuning efforts by human experts. The statistical results show that this extension is faster in terms of convergence and presents at least similar accuracy results when compared with the standard optimization techniques. © 2020, Springer Nature Switzerland AG.
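As a rough illustration of the Nelder-Mead step mentioned in the abstract above, the following sketch runs a plain Nelder-Mead simplex search over two hyper-parameters. It is not the SPT algorithm itself: the dynamically sized stream samples and the concept-drift readjustment are omitted, and `toy_loss`, its optimum at (0.3, 0.7), and all parameter names are hypothetical stand-ins for an error measured on a stream sample.

```python
from typing import Callable, List, Sequence


def nelder_mead(loss: Callable[[Sequence[float]], float],
                start: Sequence[float],
                step: float = 0.1,
                max_iter: int = 500,
                tol: float = 1e-6) -> List[float]:
    """Minimise `loss` with a basic Nelder-Mead simplex search."""
    n = len(start)
    # Initial simplex: the start point plus one vertex offset along each axis.
    simplex = [list(start)]
    for i in range(n):
        vertex = list(start)
        vertex[i] += step
        simplex.append(vertex)

    for _ in range(max_iter):
        simplex.sort(key=loss)
        best, worst = simplex[0], simplex[-1]
        # Stop once every vertex lies within `tol` of the best one.
        if max(abs(a - b) for v in simplex for a, b in zip(v, best)) < tol:
            break
        # Centroid of all vertices except the worst.
        centroid = [sum(v[i] for v in simplex[:-1]) / n for i in range(n)]

        def along(t: float) -> List[float]:
            # Point on the line through the centroid, away from the worst vertex.
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        if loss(reflected) < loss(best):
            expanded = along(2.0)
            simplex[-1] = expanded if loss(expanded) < loss(reflected) else reflected
        elif loss(reflected) < loss(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = along(-0.5)
            if loss(contracted) < loss(worst):
                simplex[-1] = contracted
            else:
                # Shrink every other vertex halfway toward the best one.
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, v)]
                    for v in simplex[1:]
                ]
    return min(simplex, key=loss)


def toy_loss(params: Sequence[float]) -> float:
    # Hypothetical error surface; a real setup would evaluate a stream
    # classifier configured with `params` on a sample of recent data.
    a, b = params
    return (a - 0.3) ** 2 + (b - 0.7) ** 2


best_params = nelder_mead(toy_loss, [0.0, 0.0])
```

In a streaming setting the loss would change as new data arrive, which is why the abstract describes re-running the search when concept drift is detected.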

2020

Objective Graphical Clustering of Spatiotemporal Gait Pattern in Patients with Parkinsonism

Authors
Ferreira, F; Gago, M; Mollaei, N; Bicho, E; Sousa, N; Gama, J; Ferreira, C;

Publication
INTERNATIONAL CONFERENCE ON NUMERICAL ANALYSIS AND APPLIED MATHEMATICS ICNAAM 2019

Abstract
The goal of this study was to group patients with parkinsonism who share similar gait characteristics, based on principal component analysis (PCA). Spatiotemporal gait data during self-selected walking were obtained from 15 patients with Vascular Parkinsonism, 15 patients with Idiopathic Parkinson's Disease and 15 Controls. PCA was used to reduce the dimensionality of 12 gait characteristics for the 45 subjects. Fuzzy C-means cluster analysis was performed by plotting the first two principal components, which accounted for 84.1% of the total variability. Results indicate that it is possible to quantitatively differentiate gait types in patients with parkinsonism using PCA. Objective graphical classification of gait patterns could assist in clinical evaluation as well as aid treatment planning.

2020

Trustability in Algorithmic Systems Based on Artificial Intelligence in the Public and Private Sectors

Authors
Teixeira, S; Gama, J; Amorim, P; Figueira, G;

Publication
ERCIM NEWS

Abstract
Algorithmic systems based on artificial intelligence (AI) increasingly play a role in decision-making processes, both in government and industry. These systems are used in areas such as retail, finance, and manufacturing. In the latter domain, the main priority is that the solutions are interpretable, as this characteristic correlates with the adoption rate among users (e.g., schedulers). However, more recently, these systems have been applied in areas of public interest, such as education, health, public administration, and criminal justice. The adoption of these systems in this domain, in particular the data-driven decision models, has raised questions about the risks associated with this technology, from which ethical problems may emerge. We analyse two important characteristics, interpretability and trustability, of AI-based systems in the industrial and public domains, respectively.

2020

AutoML for Stream k-Nearest Neighbors Classification

Authors
Bahri, M; Veloso, B; Bifet, A; Gama, J;

Publication
2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA)

Abstract
The last few decades have witnessed a significant evolution of technology in different domains, changing the way the world operates and leading to an overwhelming amount of data generated in an open-ended way as streams. Over the past years, we have observed the development of several machine learning algorithms to process big data streams. However, the accuracy of these algorithms is very sensitive to their hyper-parameters, which require expertise and extensive trials to tune. Another relevant aspect is the high dimensionality of data, which can degrade computational performance. To cope with these issues, this paper proposes a stream k-nearest neighbors (kNN) algorithm that applies an internal dimension reduction to the stream in order to reduce resource usage, and uses an automatic monitoring system that dynamically tunes the configuration of the kNN algorithm and the output dimension size with big data streams. Experiments over a wide range of datasets show that the predictive and computational performances of the kNN algorithm are improved.
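To make the idea of a stream kNN with internal dimension reduction more concrete, here is a minimal sketch of a sliding-window kNN classifier that projects each instance to a lower dimension before storing it. The abstract does not say which reduction technique the paper uses; the Gaussian random projection here is an assumption chosen for brevity, and the class name `WindowedKNN` and all parameters are hypothetical. The automatic tuning of `k` and of the output dimension described in the abstract is not implemented.

```python
import random
from collections import Counter, deque
from math import dist
from typing import Hashable, List, Optional, Sequence


class WindowedKNN:
    """Sliding-window kNN over randomly projected inputs (illustrative only)."""

    def __init__(self, in_dim: int, out_dim: int, k: int = 3,
                 window: int = 100, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Assumed reduction: a fixed Gaussian random projection (out_dim x in_dim).
        self.proj = [
            [rng.gauss(0.0, 1.0) / out_dim ** 0.5 for _ in range(in_dim)]
            for _ in range(out_dim)
        ]
        self.k = k
        # Bounded memory: the oldest instances are evicted automatically.
        self.window: deque = deque(maxlen=window)

    def _reduce(self, x: Sequence[float]) -> List[float]:
        # Project the instance into the lower-dimensional space.
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.proj]

    def predict(self, x: Sequence[float]) -> Optional[Hashable]:
        if not self.window:
            return None  # nothing learned yet
        z = self._reduce(x)
        nearest = sorted(self.window, key=lambda item: dist(z, item[0]))[: self.k]
        # Majority vote among the k nearest stored instances.
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    def learn(self, x: Sequence[float], y: Hashable) -> None:
        self.window.append((self._reduce(x), y))
```

A typical stream evaluation would call `predict` on each arriving instance before calling `learn` on it (test-then-train), so that every prediction is made on data the model has not yet seen.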

2020

Using Network Features for Credit Scoring in MicroFinance: Extended Abstract

Authors
Paraiso, P; Ruiz, S; Gomes, P; Rodrigues, L; Gama, J;

Publication
2020 IEEE 7TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA 2020)

Abstract
This paper uses non-traditional data from a MicroFinance Institution (MFI) in a Credit Scoring loan classification problem and addresses a common problem in emerging markets: the lack of a verifiable customer credit history. We perform a set of experiments to define a baseline model and demonstrate the relevance of node embedding features in credit scoring models, using a real-world dataset. © 2020 IEEE.
