About

I am an Associate Professor at the Faculty of Economics of the University of Porto, where I teach Statistics and Multivariate Data Analysis at undergraduate, master's and doctoral level, and a member of the Artificial Intelligence and Decision Support Laboratory (LIAAD) of INESC-TEC. I hold a PhD in Applied Mathematics from Université Paris-Dauphine (1991).

My current research focuses on the analysis of complex multidimensional data, usually referred to as symbolic data: data that represent the variability inherent in the records, in the form of intervals or distributions. For such data I develop statistical approaches and multivariate analysis methodologies. More generally, I am interested in multivariate data analysis, with a focus on cluster analysis.

Topics of interest
Details

  • Name: Paula Brito
  • Cluster: Computer Science
  • Position: Senior Researcher
  • Since: 1 January 2008

Publications

2019

Clustering genomic words in human DNA using peaks and trends of distributions

Authors
Tavares, AH; Raymaekers, J; Rousseeuw, PJ; Brito, P; Afreixo, V

Published in
Advances in Data Analysis and Classification

Abstract
In this work we seek clusters of genomic words in human DNA by studying their inter-word lag distributions. Due to the particularly spiked nature of these histograms, a clustering procedure is proposed that first decomposes each distribution into a baseline and a peak distribution. An outlier-robust fitting method is used to estimate the baseline distribution (the ‘trend’), and a sparse vector of detrended data captures the peak structure. A simulation study demonstrates the effectiveness of the clustering procedure in grouping distributions with similar peak behavior and/or baseline features. The procedure is applied to investigate similarities between the distribution patterns of genomic words of lengths 3 and 5 in the human genome. These experiments demonstrate the potential of the new method for identifying words with similar distance patterns. © 2019, The Author(s).
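The baseline-plus-peaks decomposition described above can be illustrated with a small sketch. This is not the authors' procedure: the abstract only says that an outlier-robust fitting method estimates the trend, so here a running median stands in for it (an assumption), and the hypothetical `threshold` parameter decides which positive residuals count as peaks. Bin `i` of `counts` is assumed to hold the number of occurrences of inter-word lag `i`.

```python
from statistics import median


def robust_trend(counts, half_window=3):
    """Running median over bins i-half_window..i+half_window (truncated at the edges).

    Assumed stand-in for an outlier-robust baseline fit: isolated spikes barely move a median.
    """
    n = len(counts)
    return [median(counts[max(0, i - half_window): i + half_window + 1]) for i in range(n)]


def decompose(counts, half_window=3, threshold=0.0):
    """Split a lag histogram into a trend and a sparse {bin: excess over trend} peak map."""
    trend = robust_trend(counts, half_window)
    # Keep only bins that rise above the trend by more than `threshold` (sparse peak vector).
    peaks = {i: c - t for i, (c, t) in enumerate(zip(counts, trend)) if c - t > threshold}
    return trend, peaks


trend, peaks = decompose([10, 10, 10, 50, 10, 10, 10, 10], half_window=2)
print(peaks)
```

The sparse peak maps, possibly together with summary features of the trend, could then be passed to a clustering method of choice. The window width matters: a spike wider than about half the window is no longer treated as an outlier by the median and is absorbed into the trend.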

2019

Clustering of interval time series

Authors
Maharaj, EA; Teles, P; Brito, P

Published in
Statistics and Computing

Abstract
Interval time series occur when real intervals of some variable of interest are registered as an ordered sequence along time. We address the problem of clustering interval time series (ITS), for which different approaches are proposed. First, clustering is performed based on point-to-point comparisons. Time-domain and wavelet features also serve as clustering variables in alternative approaches. Furthermore, autocorrelation matrix functions, gathering the autocorrelation and cross-correlation functions of the ITS upper and lower bounds, may be compared using adequate distances (e.g. the Frobenius distance) and used for clustering ITS. An improved procedure to determine the autocorrelation function of ITS is proposed, which also serves as a basis for clustering. The different alternative approaches are explored and their performances compared for ITS simulated under different setups. An application to sea level daily ranges, observed at different locations in Australia, illustrates the proposed methods. © 2019, Springer Science+Business Media, LLC, part of Springer Nature.
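As a rough illustration of the autocorrelation-matrix approach, the sketch below treats an ITS as a pair of series (lower bounds, upper bounds), computes the classical lag-k sample cross-correlation matrices R(k) of the two bound series, and compares two ITS by the Frobenius norm of the differences, accumulated over lags 0..max_lag. It uses the classical estimator, not the improved ITS autocorrelation function proposed in the paper; the function names and the default `max_lag=3` are illustrative assumptions.

```python
import math


def cross_corr(x, y, k):
    """Classical lag-k sample cross-correlation between x_t and y_{t+k} (divisor n)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / n)
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / n)
    cov = sum((x[t] - mx) * (y[t + k] - my) for t in range(n - k)) / n
    return cov / (sx * sy)


def acf_matrices(lower, upper, max_lag):
    """R(k)[i][j] = corr(s_i at t, s_j at t+k) for s = (lower, upper), k = 0..max_lag."""
    s = (lower, upper)
    return [[[cross_corr(s[i], s[j], k) for j in range(2)] for i in range(2)]
            for k in range(max_lag + 1)]


def frobenius_distance(its_a, its_b, max_lag=3):
    """sqrt(sum_k ||R_a(k) - R_b(k)||_F^2) for two ITS given as (lower, upper) pairs."""
    ra = acf_matrices(*its_a, max_lag)
    rb = acf_matrices(*its_b, max_lag)
    return math.sqrt(sum((ra[k][i][j] - rb[k][i][j]) ** 2
                         for k in range(max_lag + 1)
                         for i in range(2) for j in range(2)))
```

A matrix of these pairwise distances can then be handed to any distance-based clustering routine, such as hierarchical clustering. A constant bound series has zero variance, so its correlations are undefined and this sketch would raise `ZeroDivisionError`.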

2018

Comparing Reverse Complementary Genomic Words Based on Their Distance Distributions and Frequencies

Authors
Tavares, AH; Raymaekers, J; Rousseeuw, PJ; Silva, RM; Bastos, CAC; Pinho, A; Brito, P; Afreixo, V

Published in
Interdisciplinary Sciences: Computational Life Sciences

2018

Outlier detection in interval data

Authors
Duarte Silva, APD; Filzmoser, P; Brito, P

Published in
Advances in Data Analysis and Classification

Abstract
A multivariate outlier detection method for interval data is proposed that makes use of a parametric approach to model the interval data. The trimmed maximum likelihood principle is adapted to estimate the model parameters robustly. A simulation study demonstrates the usefulness of the robust estimates for outlier detection, and new diagnostic plots provide deeper insight into the structure of real-world interval data. © 2017 Springer-Verlag GmbH Germany, part of Springer Nature
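The sketch below illustrates the general idea for a single interval-valued variable under stated assumptions. Each interval is mapped to (midpoint, log-range) and modelled as bivariate Gaussian, a common parametric choice for interval data that is assumed here rather than taken from the abstract. Trimmed maximum likelihood is approximated by concentration steps that refit the mean and covariance on the h best-fitting observations, and intervals whose squared Mahalanobis distance exceeds the 97.5% chi-square quantile with 2 degrees of freedom are flagged. The trimming fraction and cutoff are illustrative, and no consistency correction is applied to the trimmed covariance, so in small samples some regular intervals may be flagged as well.

```python
import math

# 97.5% quantile of chi-square with 2 df; for 2 df this equals -2 ln(1 - 0.975).
CUTOFF = -2 * math.log(0.025)


def to_features(intervals):
    """Represent each interval [a, b] (a < b) by (midpoint, log-range)."""
    return [((a + b) / 2, math.log(b - a)) for a, b in intervals]


def mean_cov(points):
    """Gaussian maximum likelihood mean and covariance (divisor n) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), (sxx, sxy, syy)


def mahalanobis2(p, mu, cov):
    """Squared Mahalanobis distance via the closed-form inverse of a 2x2 covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = p[0] - mu[0], p[1] - mu[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


def trimmed_fit(points, h, max_iter=100):
    """Concentration steps: refit on the h best-fitting points until the subset is stable."""
    mu, cov = mean_cov(points)
    subset = None
    for _ in range(max_iter):
        d2 = [mahalanobis2(p, mu, cov) for p in points]
        new_subset = set(sorted(range(len(points)), key=d2.__getitem__)[:h])
        if new_subset == subset:
            break
        subset = new_subset
        mu, cov = mean_cov([points[i] for i in subset])
    return mu, cov


def flag_outliers(intervals, trim=0.25):
    """Indices of intervals whose squared distance under the trimmed fit exceeds CUTOFF."""
    points = to_features(intervals)
    h = math.ceil((1 - trim) * len(points))
    mu, cov = trimmed_fit(points, h)
    return [i for i, p in enumerate(points) if mahalanobis2(p, mu, cov) > CUTOFF]
```

Starting the concentration steps from the full-sample fit keeps the sketch short; robust implementations usually try many random initial subsets, because a single start can be captured by a cluster of outliers.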

2017

Off the beaten track: A new linear model for interval data

Authors
Dias, S; Brito, P

Published in
European Journal of Operational Research

Abstract
We propose a new linear regression model for interval-valued variables. The model uses quantile functions to represent the intervals, thereby taking into account the distributions within them. In this paper we study the special case where a uniform distribution is assumed in each observed interval, and we analyze the extension to the symmetric triangular distribution. The parameters of the model are obtained by solving a constrained quadratic optimization problem that uses the Mallows distance between quantile functions. As in the classical case, a goodness-of-fit measure is derived. Two applications in topical fields are presented: one predicting the duration of unemployment and the other forecasting the area burned by forest fires.
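The Mallows distance used in this model has a simple closed form for the two within-interval distributions mentioned. Writing an interval [a, b] with center c = (a + b)/2 and half-range r = (b - a)/2, the uniform quantile function is Q(t) = c + r(2t - 1). Integrating (Q_1(t) - Q_2(t))^2 over [0, 1], the cross term vanishes because the integral of (2t - 1) is 0, which leaves (c_1 - c_2)^2 + (r_1 - r_2)^2/3. For the symmetric triangular distribution the factor 1/3 becomes 1/6, the variance of the standard symmetric triangular distribution on [-1, 1]. The sketch below checks these closed forms against numerical integration. It computes only the distance, not the constrained regression fit itself, and all function names are illustrative.

```python
import math


def uniform_quantile(lo, hi, t):
    """Quantile function of the uniform distribution on [lo, hi]."""
    return lo + (hi - lo) * t


def triangular_quantile(lo, hi, t):
    """Quantile function of the symmetric triangular distribution on [lo, hi]."""
    c, r = (lo + hi) / 2, (hi - lo) / 2
    # Standard symmetric triangular on [-1, 1]: invert F(x) = (x+1)^2/2 or 1 - (1-x)^2/2.
    g = -1 + math.sqrt(2 * t) if t <= 0.5 else 1 - math.sqrt(2 * (1 - t))
    return c + r * g


def mallows2_numeric(q, iv1, iv2, steps=10_000):
    """Midpoint-rule approximation of the squared Mallows distance: integral of (Q1 - Q2)^2."""
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) / steps
        total += (q(*iv1, t) - q(*iv2, t)) ** 2
    return total / steps


def mallows2_closed_form(iv1, iv2, spread_factor):
    """(c1 - c2)^2 + spread_factor * (r1 - r2)^2; 1/3 for uniform, 1/6 for triangular."""
    (a1, b1), (a2, b2) = iv1, iv2
    dc = (a1 + b1) / 2 - (a2 + b2) / 2
    dr = (b1 - a1) / 2 - (b2 - a2) / 2
    return dc ** 2 + spread_factor * dr ** 2


print(mallows2_closed_form((0, 2), (3, 7), 1 / 3))
print(mallows2_numeric(uniform_quantile, (0, 2), (3, 7)))
```

The closed form separates a location term (centers) from a spread term (half-ranges), which is what makes a quadratic objective in the model parameters natural.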

Supervised theses

2017

Linear Regression Models for Interval Variables: An Extension of the ID Model

Author
Pedro Jorge Correia Malaquias

Institution
UP-FEP

2017

Meta-learning in the Clustering Algorithm Selection Problem

Author
Vânia Patrícia Pereira Serra

Institution
UP-FEP

2017

Analysis of inter genomic word distance distributions

Author
Ana Helena Marques de Pinho Tavares

Institution
UA

2017

Performance Analysis of Retail Stores (Supermarkets and Hypermarkets)

Author
Daniel Jorge Moreira Magalhães

Institution
UP-FEP

2017

Factor Analysis of Interval-Valued and Histogram-Valued Data

Author
Paula Maria das Dores Cheira

Institution
UP-FCUP