Details

  • Name

    Rita Paula Ribeiro
  • Cluster

    Computer Science
  • Role

    Senior Researcher
  • Since

    1 January 2008
Publications

2019

The search of conditional outliers

Authors
Portela, E; Ribeiro, RP; Gama, J;

Publication
Intelligent Data Analysis

Abstract
There is no standard definition of outliers, but most authors agree that outliers are points far from other data points. Several outlier detection techniques have been developed, mainly for two different purposes. On the one hand, outliers are considered measurement errors that should be removed from the analysis, as in robust statistics. On the other hand, outliers are the interesting observations, as in fraud detection, and should be modelled by some learning method. In this work, we start from the observation that outliers are affected by the so-called Simpson's paradox: a trend that appears in different groups of data but disappears or reverses when these groups are combined. Given a data set, we learn a regression tree. The tree grows by partitioning the data into groups that are increasingly homogeneous with respect to the target variable. At each partition defined by the tree, we apply a box plot on the target variable to detect outliers. We would expect the deeper nodes of the tree to contain fewer and fewer outliers. We observe that some points previously signalled as outliers are no longer signalled as such, but new outliers appear.
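
The procedure described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single input feature, a tree of depth one (a single split chosen to minimise the squared error), and Tukey's 1.5 × IQR box plot rule. The function names `boxplot_outliers`, `best_split` and `conditional_outliers`, as well as the toy data, are hypothetical.

```python
import statistics


def boxplot_outliers(values):
    """Return positions of values outside Tukey's fences (1.5 * IQR)."""
    if len(values) < 4:
        return []  # too few points for meaningful quartiles (assumption)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def best_split(x, y):
    """Threshold on x that minimises the summed squared error of both sides."""
    def sse(vals):
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals)

    best_cost, best_t = None, None
    xs = sorted(set(x))
    for a, b in zip(xs, xs[1:]):
        t = (a + b) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        cost = sse(left) + sse(right)
        if best_cost is None or cost < best_cost:
            best_cost, best_t = cost, t
    return best_t


def conditional_outliers(x, y):
    """Box plot outliers at the root and inside the two children of one split."""
    t = best_split(x, y)
    root = set(boxplot_outliers(y))
    leaves = set()
    for in_child in (lambda xi: xi <= t, lambda xi: xi > t):
        idx = [i for i, xi in enumerate(x) if in_child(xi)]
        local = boxplot_outliers([y[i] for i in idx])
        leaves.update(idx[j] for j in local)  # map back to original positions
    return root, leaves


# Toy data (hypothetical): a large low-valued group and a small high-valued one.
x = list(range(1, 18))
y = [10, 11, 12, 13, 14, 15, 35, 16, 17, 18, 19, 20, 21, 80, 82, 84, 86]
root, leaves = conditional_outliers(x, y)
print("outliers at the root:", sorted(root))
print("outliers in the leaves:", sorted(leaves))
```

On this toy data the four high values are flagged at the root but not inside their own leaf, while the value 35 is flagged only once the low-valued group is considered on its own, which mirrors the behaviour the abstract reports.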

2019

Pre-processing approaches for imbalanced distributions in regression

Authors
Branco, P; Torgo, L; Ribeiro, RP;

Publication
Neurocomputing

Abstract
Imbalanced domains are an important problem frequently arising in real-world predictive analytics. A significant body of research has addressed imbalanced distributions in classification tasks, where the target variable is nominal. In the context of regression tasks, where the target variable is continuous, imbalanced distributions of the target variable also raise several challenges to learning algorithms. Imbalanced domains are characterized by: (1) a higher relevance being assigned to the performance on a subset of the target variable values; and (2) these most relevant values being underrepresented in the available data set. Recently, some proposals were made to address the problem of imbalanced distributions in regression. Still, this remains a scarcely explored issue with few existing solutions. This paper describes three new approaches for tackling the problem of imbalanced distributions in regression tasks. We propose the adaptation to regression tasks of random over-sampling and introduction of Gaussian noise, and we present a new method called WEighted Relevance-based Combination Strategy (WERCS). An extensive set of experiments provides empirical evidence of the advantage of using the proposed strategies and, in particular, the WERCS method. We analyze the impact of different data characteristics on the performance of the methods. A data repository with 15 imbalanced regression data sets is also provided to the research community.
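
As a rough illustration of what random over-sampling could look like in a regression setting, the sketch below replicates randomly chosen rare cases until they roughly match the number of normal cases. It is not the paper's method: the abstract does not say how rare cases are identified, so the `is_rare` predicate (here a fixed threshold on the target), the `ratio` parameter and the function name `random_oversample` are all assumptions made for the example.

```python
import random


def random_oversample(rows, is_rare, ratio=1.0, seed=0):
    """Append random copies of rare rows until they number ratio * normal rows.

    rows    : list of (features, target) tuples
    is_rare : hypothetical predicate marking the under-represented cases
    ratio   : desired rare-to-normal count ratio after over-sampling
    """
    rng = random.Random(seed)  # local generator keeps the result reproducible
    rare = [r for r in rows if is_rare(r)]
    normal = [r for r in rows if not is_rare(r)]
    if not rare:
        return list(rows)  # nothing to replicate
    wanted = int(ratio * len(normal))
    extra = [rng.choice(rare) for _ in range(max(0, wanted - len(rare)))]
    return list(rows) + extra


# Toy data (hypothetical): targets above 50 are treated as the rare, relevant cases.
data = [((i,), float(i)) for i in range(8)] + [((90,), 90.0), ((95,), 95.0)]
balanced = random_oversample(data, is_rare=lambda r: r[1] > 50)
print(len(data), "rows before,", len(balanced), "rows after")
```

Exact copies are the simplest choice; a Gaussian-noise variant would perturb each copy slightly instead of duplicating it, at the cost of additional assumptions about the noise scale.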

Supervised theses

2016

Utility-based Predictive Analytics

Author
Paula Alexandra de Oliveira Branco

Institution
UP-FCUP

2016

Mobile application for customer loyalty in the cosmetics sector (original title: Aplicação em dispositivos móveis para fidelização de clientes na área da cosmética)

Author
Diana Carvalho Pacheco

Institution
UP-FCUP

2016

Mobile application for a freight transport company (original title: Aplicação em dispositivos móveis para empresa de Transporte de Mercadorias)

Author
Hugo Miguel Figueiredo de Oliveira

Institution
UP-FCUP

2016

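Hierarchical time series model for sales forecasting applied to the footwear industry (original title: Modelo de Séries Temporais Hierárquicas de Previsão de Vendas Aplicado à Indústria do Calçado)

Author
Filipa Vidal Veríssimo

Institution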
UP-FCUP

2015

Detecting signs of fraud in the retail industry (original title: Deteção de Indícios de Fraude na Indústria do Retalho)

Author
Ricardo Jorge Alves de Oliveira

Institution
UP-FEP