Details

  • Name

    Rita Paula Ribeiro
  • Cluster

    Computer Science
  • Position

    Senior Researcher
  • Since

    01 January 2008
Publications

2017

Relevance-Based Evaluation Metrics for Multi-class Imbalanced Domains

Authors
Branco, Paula; Torgo, Luis; Ribeiro, Rita P.;

Publication
Advances in Knowledge Discovery and Data Mining - 21st Pacific-Asia Conference, PAKDD 2017, Jeju, South Korea, May 23-26, 2017, Proceedings, Part I

Abstract
The class imbalance problem is a key issue that has received much attention. This attention has been mostly focused on two-class problems. Fewer solutions exist for the multi-class imbalance problem. From an evaluation point of view, the class imbalance problem is challenging because a non-uniform importance is assigned to the classes. In this paper, we propose a relevance-based evaluation framework that incorporates user preferences by allowing the assignment of differentiated importance values to each class. The presented solution is able to overcome difficulties detected in existing measures and increases discrimination capability. The proposed framework requires the assignment of a relevance score to the problem classes. To deal with cases where the user is not able to specify each class's relevance, we describe three mechanisms to incorporate the existing domain knowledge into the relevance framework. These mechanisms differ in the amount of information available and assumptions made regarding the domain. They also allow the use of our framework in common settings of multi-class imbalanced problems with different levels of information available. © 2017, Springer International Publishing AG.

2016

Sequential anomalies: a study in the Railway Industry

Authors
Ribeiro, RP; Pereira, P; Gama, J;

Publication
MACHINE LEARNING

Abstract
Concerned with predicting equipment failures, predictive maintenance has a high impact both at a technical and at a financial level. Most modern equipment has logging systems that allow us to collect a diversity of data regarding its operation and health. Using data mining models for anomaly and novelty detection enables us to explore those datasets, building predictive systems that can detect and issue an alert when a failure starts evolving, avoiding the unknown development up to breakdown. In the present case, we use a failure detection system to predict train door breakdowns before they happen using data from their logging system. We use sensor data from pneumatic valves that control the open and close cycles of a door. Still, the failure of a cycle does not necessarily indicate a breakdown. A cycle might fail due to user interaction. The goal of this study is to detect structural failures in the automatic train door system, not when there is a cycle failure, but when there are sequences of cycle failures. We study three methods for such structural failure detection: outlier detection, anomaly detection and novelty detection, using different windowing strategies. We propose a two-stage approach, where the output of a point-anomaly algorithm is post-processed by a low-pass filter to obtain a subsequence-anomaly detection. The main result of the two-level architecture is a strong impact on the false alarm rate.

2016

A Survey of Predictive Modeling on Imbalanced Domains

Authors
Branco, P; Torgo, L; Ribeiro, RP;

Publication
ACM COMPUTING SURVEYS

Abstract
Many real-world data-mining applications involve obtaining predictive models using datasets with strongly imbalanced distributions of the target variable. Frequently, the least-common values of this target variable are associated with events that are highly relevant for end users (e.g., fraud detection, unusual returns on stock markets, anticipation of catastrophes, etc.). Moreover, the events may have different costs and benefits, which, when associated with the rarity of some of them on the available training data, create serious problems for predictive modeling techniques. This article presents a survey of existing techniques for handling these important applications of predictive analytics. Although most of the existing work addresses classification tasks (nominal target variables), we also describe methods designed to handle similar problems within regression tasks (numeric target variables). In this survey, we discuss the main challenges raised by imbalanced domains, propose a definition of the problem, describe the main approaches to these tasks, propose a taxonomy of the methods, summarize the conclusions of existing comparative studies as well as some theoretical analyses of some methods, and refer to some related problems within predictive modeling.

Supervised Theses

2016

Utility-based Predictive Analytics

Author
Paula Alexandra de Oliveira Branco

Institution
UP-FCUP

2016

Aplicação em dispositivos móveis para fidelização de clientes na área da cosmética

Author
Diana Carvalho Pacheco

Institution
UP-FCUP

2016

Aplicação em dispositivos móveis para empresa de Transporte de Mercadorias

Author
Hugo Miguel Figueiredo de Oliveira

Institution
UP-FCUP

2016

Modelo de Séries Temporais Hierárquicas de Previsão de Vendas Aplicado à Indústria do Calçado

Author
Filipa Vidal Veríssimo

Institution
UP-FCUP

2015

Deteção de Indícios de Fraude na Indústria do Retalho

Author
Ricardo Jorge Alves de Oliveira

Institution
UP-FEP