Publications by Paula Brito

2024

Special issue on New methodologies in clustering and classification for complex and/or big data

Authors
Brito, P; Cerioli, A; Garcia-Escudero, LA; Saporta, G;

Publication
ADVANCES IN DATA ANALYSIS AND CLASSIFICATION

Abstract
[No abstract available]

2015

Clustering of symbolic data

Authors
Brito, P;

Publication
Handbook of Cluster Analysis

Abstract
In this chapter, we present clustering methods for symbolic data. We start by recalling that symbolic data is data presenting inherent variability, and the motivations for the introduction of this new paradigm. We then proceed by defining the different types of variables that allow for the representation of symbolic data, and recall some distance measures appropriate for the new data types. Then we present clustering methods for different types of symbolic data, both hierarchical and nonhierarchical. An application illustrates two well-known methods for clustering symbolic data. © 2016 by Taylor & Francis Group, LLC.

2024

Anomaly detection-based undersampling for imbalanced classification problems

Authors
Park, YJ; Brito, P; Ma, YC;

Publication
ENGINEERING OPTIMIZATION

Abstract
In various machine learning applications, classification plays an important role in categorizing and predicting data. To improve the classification performance, it is crucial to identify and remove the anomalies. Also, class imbalance in many machine learning applications is a very common problem since most classifiers tend to be biased toward the majority class by ignoring the minority class instances. Thus, in this research, we propose a new under-sampling technique based on anomaly detection and removal to enhance the performance of imbalanced classification problems. To demonstrate the effectiveness of the proposed method, comprehensive experiments are conducted on forty imbalanced data sets and two non-parametric hypothesis tests are employed to show the statistical difference in classification performances between the proposed method and other traditional resampling methods. From the experiment, it is shown that the proposed method improves the classification performance by effectively detecting and eliminating the anomalies among true-majority or pseudo-majority class instances.

2023

Classification and Data Science in the Digital Age

Authors
Brito, P; Dias, JG; Lausen, B; Montanari, A; Nugent, R;

Publication
Studies in Classification, Data Analysis, and Knowledge Organization

Abstract

2023

Preface

Authors
Brito, P; Dias, G; Lausen, B; Montanari, A; Nugent, R;

Publication
Studies in Classification, Data Analysis, and Knowledge Organization

Abstract
[No abstract available]

2023

Wavelet-based fuzzy clustering of interval time series

Authors
D'Urso, P; De Giovanni, L; Maharaj, EA; Brito, P; Teles, P;

Publication
INTERNATIONAL JOURNAL OF APPROXIMATE REASONING

Abstract
We investigate the fuzzy clustering of interval time series using wavelet variances and covariances; in particular, we use a fuzzy c-medoids clustering algorithm. Traditional hierarchical and non-hierarchical clustering methods lead to the identification of mutually exclusive clusters whereas fuzzy clustering methods enable the identification of overlapping clusters, implying that one or more series could belong to more than one cluster simultaneously. An interval time series (ITS), which arises when interval-valued observations are recorded over time, is able to capture the variability of values within each interval at each time point. This is in contrast to single-point information available in a classical time series. Our main contribution is that by combining wavelet analysis, interval data analysis and fuzzy clustering, we are able to capture information which would otherwise have not been contemplated by the use of traditional crisp clustering methods on classical time series for which just a single value is recorded at each time point. Through simulation studies, we show that under some circumstances fuzzy c-medoids clustering performs better when applied to ITS than when it is applied to the corresponding traditional time series. Applications to exchange rates ITS and sea-level ITS show that the fuzzy clustering method reveals different and more meaningful results than when applied to associated single-point time series.
