About

I received my PhD degree in Computer Science from the University of Porto, Portugal, in 2011.
Currently, I'm an assistant professor at the Department of Computer Science of the Faculty of Sciences of the University of Porto and a member of LIAAD-INESC TEC, the Artificial Intelligence and Decision Support Lab of the University of Porto.
My main research interests include Data Mining and Machine Learning, in particular outlier detection, novelty detection, utility-based learning and evaluation issues on learning tasks.
As a member of LIAAD-INESC TEC, I have been involved in several research projects concerning environmental applications, fraud detection and fault diagnosis. I have also been a member of the program committees of several conferences, served as a reviewer for several journals and been involved in the organization of some scientific events.

Details

  • Name

    Rita Paula Ribeiro
  • Cluster

    Computer Science
  • Role

    Senior Researcher
  • Since

    1st January 2008
Publications

2019

The search of conditional outliers

Authors
Portela, E; Ribeiro, RP; Gama, J;

Publication
INTELLIGENT DATA ANALYSIS

Abstract
There is no standard definition of outliers, but most authors agree that outliers are points far from other data points. Several outlier detection techniques have been developed, mainly for two different purposes. On the one hand, outliers are considered measurement errors that should be removed from the analysis, as in robust statistics. On the other hand, outliers are the interesting observations, as in fraud detection, and should be modelled by some learning method. In this work, we start from the observation that outliers are affected by the so-called Simpson's paradox: a trend that appears in different groups of data but disappears or reverses when these groups are combined. Given a data set, we learn a regression tree. The tree grows by partitioning the data into groups that are increasingly homogeneous with respect to the target variable. At each partition defined by the tree, we apply a box plot to the target variable to detect outliers. We would expect the deeper nodes of the tree to contain fewer and fewer outliers. We observe that some points previously signalled as outliers are no longer signalled as such, but new outliers appear.
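The idea in this abstract can be illustrated with a small sketch. It is not the authors' implementation: a single best split on one feature stands in for a full regression tree, the box-plot rule is assumed to be the usual 1.5 × IQR fences, and all function names and the toy data are hypothetical.

```python
from statistics import quantiles

def boxplot_outliers(values):
    """Indices flagged by the 1.5 * IQR box-plot rule (assumed variant)."""
    if len(values) < 4:
        return set()
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {i for i, v in enumerate(values) if v < low or v > high}

def sse(values):
    """Sum of squared errors around the mean (0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys):
    """Threshold on x that minimises the total SSE of y in the two groups.
    This one-level split is a stand-in for growing a regression tree."""
    best_t, best_cost = None, float("inf")
    candidates = sorted(set(xs))
    for a, b in zip(candidates, candidates[1:]):
        t = (a + b) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t

def conditional_outliers(xs, ys):
    """Apply the box-plot rule separately inside each partition."""
    t = best_split(xs, ys)
    flagged = set()
    for in_group in (lambda x: x <= t, lambda x: x > t):
        idx = [i for i, x in enumerate(xs) if in_group(x)]
        local = boxplot_outliers([ys[i] for i in idx])
        flagged |= {idx[j] for j in local}
    return t, flagged

# Toy data: two regimes of the target; 30 is unremarkable globally
# but stands out inside the low regime.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
ys = [10, 11, 9, 10, 11, 30, 100, 101, 99, 100, 102, 98]

global_flags = boxplot_outliers(ys)
threshold, local_flags = conditional_outliers(xs, ys)
```

On this toy data the global box plot flags nothing, while the per-partition box plot flags the point with target value 30, which matches the abstract's remark that new outliers can appear once the data are partitioned.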

2019

Pre-processing approaches for imbalanced distributions in regression

Authors
Branco, P; Torgo, L; Ribeiro, RP;

Publication
NEUROCOMPUTING

Abstract
Imbalanced domains are an important problem frequently arising in real-world predictive analytics. A significant body of research has addressed imbalanced distributions in classification tasks, where the target variable is nominal. In the context of regression tasks, where the target variable is continuous, imbalanced distributions of the target variable also raise several challenges to learning algorithms. Imbalanced domains are characterized by: (1) a higher relevance being assigned to the performance on a subset of the target variable values; and (2) these most relevant values being underrepresented in the available data set. Recently, some proposals were made to address the problem of imbalanced distributions in regression. Still, this remains a scarcely explored issue with few existing solutions. This paper describes three new approaches for tackling the problem of imbalanced distributions in regression tasks. We propose the adaptation to regression tasks of random over-sampling and introduction of Gaussian Noise, and we present a new method called WEighted Relevance-based Combination Strategy (WERCS). An extensive set of experiments provides empirical evidence of the advantage of using the proposed strategies and, in particular, the WERCS method. We analyze the impact of different data characteristics on the performance of the methods. A data repository with 15 imbalanced regression data sets is also provided to the research community.
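Two of the pre-processing ideas named in this abstract, random over-sampling and the introduction of Gaussian noise, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: rarity is decided here by a hypothetical user-supplied predicate on the target (the paper works with relevance values), the noise scale is an assumed fraction of each column's standard deviation among the rare cases, and WERCS is not covered. All names are hypothetical.

```python
import random
from statistics import pstdev

def random_oversample(rows, is_rare, factor, rng):
    """Return rows plus exact copies of randomly chosen rare rows,
    so that the rare cases end up `factor` times as numerous."""
    rare = [r for r in rows if is_rare(r)]
    extra = [rng.choice(rare) for _ in range((factor - 1) * len(rare))]
    return list(rows) + extra

def gaussian_noise_oversample(rows, is_rare, factor, sd_fraction, rng):
    """Like random_oversample, but every copy is perturbed with Gaussian
    noise scaled by sd_fraction times the per-column standard deviation
    of the rare rows (an assumed choice of noise scale)."""
    rare = [r for r in rows if is_rare(r)]
    if len(rare) < 2:
        return list(rows)  # not enough rare cases to estimate a spread
    sds = [pstdev(col) for col in zip(*rare)]
    extra = []
    for _ in range((factor - 1) * len(rare)):
        seed_row = rng.choice(rare)
        extra.append(tuple(v + rng.gauss(0.0, sd_fraction * sd)
                           for v, sd in zip(seed_row, sds)))
    return list(rows) + extra

# Toy data set of (feature, target) pairs; high targets are the rare,
# relevant cases (threshold chosen for illustration only).
rows = [(float(i), float(i)) for i in range(8)] + [(8.0, 50.0), (9.0, 60.0)]
is_rare = lambda r: r[1] >= 40.0

rng = random.Random(0)
replicated = random_oversample(rows, is_rare, factor=3, rng=rng)
noisy = gaussian_noise_oversample(rows, is_rare, factor=3,
                                  sd_fraction=0.1, rng=rng)
```

Note that in the noisy variant the target is perturbed as well, so a synthetic case may drift across the rarity threshold if `sd_fraction` is large; keeping it small avoids this in practice.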

2019

ECML PKDD 2018 Workshops - DMLE 2018 and IoTStream 2018, Dublin, Ireland, September 10-14, 2018, Revised Selected Papers

Authors
Monreale, A; Alzate, C; Kamp, M; Krishnamurthy, Y; Paurat, D; Mouchaweh, MS; Bifet, A; Gama, J; Ribeiro, RP;

Publication
DMLE/IOTSTREAMING@PKDD/ECML

2018

Proceedings of the Workshop on Large-scale Learning from Data Streams in Evolving Environments (STREAMEVOLV 2016) co-located with the 2016 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2016), Riva del Garda, Italy, September 23, 2016

Authors
Mouchaweh, MS; Bouchachia, H; Gama, J; Ribeiro, RP;

Publication
STREAMEVOLV@ECML-PKDD

2018

Resampling with neighbourhood bias on imbalanced domains

Authors
Branco, P; Torgo, L; Ribeiro, RP;

Publication
EXPERT SYSTEMS

Abstract
Imbalanced domains are an important problem that arises in predictive tasks causing a loss in the performance on the most relevant cases for the user. This problem has been extensively studied for classification problems, where the target variable is nominal. Recently, it was recognized that imbalanced domains occur in several other contexts and for multiple tasks, such as regression tasks, where the target variable is continuous. This paper focuses on imbalanced domains in both classification and regression tasks. Resampling strategies are among the most successful approaches to address imbalanced domains. In this work, we propose variants of existing resampling strategies that are able to take into account the information regarding the neighbourhood of the examples. Instead of performing sampling uniformly, our proposals bias the strategies to reinforce some regions of the data sets. With an extensive set of experiments, we provide evidence of the advantage of introducing a neighbourhood bias in the resampling strategies for both classification and regression tasks with imbalanced data sets.

Supervised theses

2016

Utility-based Predictive Analytics

Author
Paula Alexandra de Oliveira Branco

Institution
UP-FCUP

2016

Aplicação em dispositivos móveis para fidelização de clientes na área da cosmética (Mobile application for customer loyalty in the cosmetics sector)

Author
Diana Carvalho Pacheco

Institution
UP-FCUP

2016

Aplicação em dispositivos móveis para empresa de Transporte de Mercadorias (Mobile application for a freight transport company)

Author
Hugo Miguel Figueiredo de Oliveira

Institution
UP-FCUP

2016

Modelo de Séries Temporais Hierárquicas de Previsão de Vendas Aplicado à Indústria do Calçado (Hierarchical time series model for sales forecasting applied to the footwear industry)

Author
Filipa Vidal Veríssimo

Institution
UP-FCUP

2015

Aplicação de prescrição eletrónica de medicamentos para dispositivos móveis (Electronic medication prescription application for mobile devices)

Author
Vítor Hugo Guimarães Alves Gonçalves

Institution
UP-FCUP