About

I am an Associate Professor at the School of Economics of the University of Porto, where I teach Statistics and Multivariate Data Analysis at undergraduate and postgraduate (Master, PhD) levels, and a member of the Artificial Intelligence and Decision Support Lab (LIAAD) of INESC-TEC. I hold a doctorate in Applied Mathematics from the University of Paris Dauphine (1991).

My current research focuses on the analysis of multidimensional complex data, known as symbolic data (data representing inherent variability, in the form of intervals or distributions), for which I develop statistical approaches and multivariate analysis methodologies. I am generally interested in multivariate data analysis, with a particular emphasis on clustering methods.

Details

  • Name

    Paula Brito
  • Cluster

    Computer Science
  • Role

    Senior Researcher
  • Since

    1st January 2008
Publications

2019

Clustering genomic words in human DNA using peaks and trends of distributions

Authors
Tavares, AH; Raymaekers, J; Rousseeuw, PJ; Brito, P; Afreixo, V;

Publication
Advances in Data Analysis and Classification

Abstract
In this work we seek clusters of genomic words in human DNA by studying their inter-word lag distributions. Due to the particularly spiked nature of these histograms, a clustering procedure is proposed that first decomposes each distribution into a baseline and a peak distribution. An outlier-robust fitting method is used to estimate the baseline distribution (the ‘trend’), and a sparse vector of detrended data captures the peak structure. A simulation study demonstrates the effectiveness of the clustering procedure in grouping distributions with similar peak behavior and/or baseline features. The procedure is applied to investigate similarities between the distribution patterns of genomic words of lengths 3 and 5 in the human genome. These experiments demonstrate the potential of the new method for identifying words with similar distance patterns. © 2019, The Author(s).

2018

Comparing Reverse Complementary Genomic Words Based on Their Distance Distributions and Frequencies

Authors
Tavares, AH; Raymaekers, J; Rousseeuw, PJ; Silva, RM; Bastos, CAC; Pinho, A; Brito, P; Afreixo, V;

Publication
Interdisciplinary Sciences: Computational Life Sciences

Abstract

2018

Outlier detection in interval data

Authors
Duarte Silva, APD; Filzmoser, P; Brito, P;

Publication
Advances in Data Analysis and Classification

Abstract
A multivariate outlier detection method for interval data is proposed that makes use of a parametric approach to model the interval data. The trimmed maximum likelihood principle is adapted in order to robustly estimate the model parameters. A simulation study demonstrates the usefulness of the robust estimates for outlier detection, and new diagnostic plots allow gaining deeper insight into the structure of real world interval data. © 2017 Springer-Verlag GmbH Germany, part of Springer Nature

2017

Off the beaten track: A new linear model for interval data

Authors
Dias, S; Brito, P;

Publication
European Journal of Operational Research

Abstract
We propose a new linear regression model for interval-valued variables. The model uses quantile functions to represent the intervals, thereby considering the distributions within them. In this paper we study the special case where the Uniform distribution is assumed in each observed interval, and we analyze the extension to the Symmetric Triangular distribution. The parameters of the model are obtained by solving a constrained quadratic optimization problem that uses the Mallows distance between quantile functions. As in the classical case, a goodness-of-fit measure is deduced. Two applications in topical fields are presented: one predicting the duration of unemployment and the other forecasting the area burned by forest fires.
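As an illustration of the distance mentioned in this abstract, the following is a minimal sketch, not the authors' implementation. It assumes a Uniform distribution within each interval, so the quantile function of an interval with midpoint c and half-range r is Q(t) = c + r(2t - 1), and the squared Mallows (L2 Wasserstein) distance reduces to (c1 - c2)^2 + (r1 - r2)^2 / 3. The function names are hypothetical.

```python
def mallows_sq_uniform(a, b):
    """Squared Mallows distance between two intervals (lo, hi),
    assuming a Uniform distribution inside each interval."""
    (lo1, hi1), (lo2, hi2) = a, b
    c1, r1 = (lo1 + hi1) / 2, (hi1 - lo1) / 2  # midpoint, half-range
    c2, r2 = (lo2 + hi2) / 2, (hi2 - lo2) / 2
    return (c1 - c2) ** 2 + (r1 - r2) ** 2 / 3


def mallows_sq_numeric(a, b, n=1000):
    """Numerical check: integrate the squared difference of the two
    quantile functions over t in [0, 1] with the midpoint rule."""
    def quantile(interval, t):
        lo, hi = interval
        return lo + (hi - lo) * t  # Uniform quantile function

    total = 0.0
    for k in range(n):
        t = (k + 0.5) / n
        total += (quantile(a, t) - quantile(b, t)) ** 2
    return total / n


if __name__ == "__main__":
    x, y = (0.0, 2.0), (1.0, 5.0)
    print(mallows_sq_uniform(x, y), mallows_sq_numeric(x, y))
```

The closed form depends only on midpoints and half-ranges, which is what makes a quadratic formulation of the fitting problem possible under the Uniform assumption.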

2017

Exploratory data analysis for interval compositional data

Authors
Hron, K; Brito, P; Filzmoser, P;

Publication
Advances in Data Analysis and Classification

Abstract
Compositional data are considered as data where relative contributions of parts of a whole, conveyed by (log-)ratios between them, are essential for the analysis. In Symbolic Data Analysis (SDA), we are in the framework of interval data when elements are characterized by variables whose values are intervals on ℝ representing inherent variability. In this paper, we address the special problem of the analysis of interval compositions, i.e., when the interval data are obtained by the aggregation of compositions. It is assumed that the interval information is represented by the respective midpoints and ranges, and both sources of information are considered as compositions. In this context, we introduce the representation of interval data as three-way data. In the framework of the log-ratio approach from compositional data analysis, it is outlined how interval compositions can be treated in an exploratory context. The goal of the analysis is to represent the compositions by coordinates which are interpretable in terms of the original compositional parts. This is achieved by summarizing all relative information (log-ratios) about each part into one coordinate from the coordinate system. Based on an example from the European Union Statistics on Income and Living Conditions (EU-SILC), several possibilities for an exploratory data analysis approach for interval compositions are outlined and investigated. © 2016 Springer-Verlag Berlin Heidelberg
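The midpoint-and-range representation described in this abstract can be sketched as follows. This is only an illustrative example under stated assumptions: it uses the centred log-ratio (clr) transform as a stand-in for the coordinate system used in the paper, it assumes all ranges are strictly positive, and the function names are hypothetical.

```python
import math


def clr(parts):
    """Centred log-ratio transform of a composition with positive parts."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def interval_composition_clr(intervals):
    """Split an interval composition, given as a list of (lo, hi) pairs,
    into midpoints and ranges, and treat each as a composition.
    Assumes lo > 0 and hi > lo for every part."""
    midpoints = [(lo + hi) / 2 for lo, hi in intervals]
    ranges = [hi - lo for lo, hi in intervals]
    return clr(midpoints), clr(ranges)


if __name__ == "__main__":
    obs = [(0.1, 0.3), (0.2, 0.6), (0.3, 0.5)]
    mid_clr, range_clr = interval_composition_clr(obs)
    print(mid_clr)
    print(range_clr)
```

Because log-ratio coordinates only carry relative information, rescaling all parts by a common factor leaves the clr values unchanged.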

Supervised theses

2017

Analysis of inter genomic word distance distributions

Author
Ana Helena Marques de Pinho Tavares

Institution
UA

2017

Análise de Desempenho de Lojas de Retalho (Supermercados e Hipermercados) [Performance Analysis of Retail Stores (Supermarkets and Hypermarkets)]

Author
Daniel Jorge Moreira Magalhães

Institution
UP-FEP

2017

Análise Fatorial de dados de tipo Intervalar e Histograma [Factor Analysis of Interval and Histogram Data]

Author
Paula Maria das Dores Cheira

Institution
UP-FCUP

2017

Análise Classificatória de Dados Distribucionais: Abordagem Simbólica e Composicional [Cluster Analysis of Distributional Data: Symbolic and Compositional Approach]

Author
Maria do Rosário Guimarães de Almeida Moreira

Institution
UP-FEP

2017

Gender wage discrimination across Portuguese territory and its local determinants

Author
Natália Daniela Vieira da Costa e Silva

Institution
UP-FEP