Details
Name
Rita Paula Ribeiro
Cluster
Computer Science
Role
Senior Researcher
Since
01 January 2008
Centre
Laboratório de Inteligência Artificial e Apoio à Decisão
Contacts
+351220402963
rita.p.ribeiro@inesctec.pt
2017
Authors
Branco, Paula; Torgo, Luis; Ribeiro, Rita P.
Publication
Advances in Knowledge Discovery and Data Mining - 21st Pacific-Asia Conference, PAKDD 2017, Jeju, South Korea, May 23-26, 2017, Proceedings, Part I
Abstract
The class imbalance problem is a key issue that has received much attention. This attention has been mostly focused on two-class problems. Fewer solutions exist for the multi-class imbalance problem. From an evaluation point of view, the class imbalance problem is challenging because a non-uniform importance is assigned to the classes. In this paper, we propose a relevance-based evaluation framework that incorporates user preferences by allowing the assignment of differentiated importance values to each class. The presented solution is able to overcome difficulties detected in existing measures and increases discrimination capability. The proposed framework requires the assignment of a relevance score to the problem classes. To deal with cases where the user is not able to specify the relevance of each class, we describe three mechanisms to incorporate the existing domain knowledge into the relevance framework. These mechanisms differ in the amount of information available and the assumptions made regarding the domain. They also allow the use of our framework in common settings of multi-class imbalanced problems with different levels of information available. © 2017, Springer International Publishing AG.
2016
Authors
Ribeiro, RP; Pereira, P; Gama, J
Publication
MACHINE LEARNING
Abstract
Concerned with predicting equipment failures, predictive maintenance has a high impact at both a technical and a financial level. Most modern equipment has logging systems that allow us to collect a diversity of data regarding its operation and health. Using data mining models for anomaly and novelty detection enables us to explore those datasets, building predictive systems that can detect and issue an alert when a failure starts evolving, avoiding the unknown development up to breakdown. In the present case, we use a failure detection system to predict train door breakdowns before they happen using data from their logging system. We use sensor data from pneumatic valves that control the open and close cycles of a door. Still, the failure of a cycle does not necessarily indicate a breakdown. A cycle might fail due to user interaction. The goal of this study is to detect structural failures in the automatic train door system, not when there is a single cycle failure, but when there are sequences of cycle failures. We study three methods for such structural failure detection: outlier detection, anomaly detection and novelty detection, using different windowing strategies. We propose a two-stage approach, where the output of a point-anomaly algorithm is post-processed by a low-pass filter to obtain a subsequence-anomaly detection. The main result of the two-level architecture is a strong impact on the false alarm rate.
2016
Authors
Branco, P; Torgo, L; Ribeiro, RP
Publication
ACM COMPUTING SURVEYS
Abstract
Many real-world data-mining applications involve obtaining predictive models using datasets with strongly imbalanced distributions of the target variable. Frequently, the least-common values of this target variable are associated with events that are highly relevant for end users (e.g., fraud detection, unusual returns on stock markets, anticipation of catastrophes). Moreover, the events may have different costs and benefits, which, when associated with the rarity of some of them in the available training data, creates serious problems for predictive modeling techniques. This article presents a survey of existing techniques for handling these important applications of predictive analytics. Although most of the existing work addresses classification tasks (nominal target variables), we also describe methods designed to handle similar problems within regression tasks (numeric target variables). In this survey, we discuss the main challenges raised by imbalanced domains, propose a definition of the problem, describe the main approaches to these tasks, propose a taxonomy of the methods, summarize the conclusions of existing comparative studies as well as some theoretical analyses of some methods, and refer to some related problems within predictive modeling.
Supervised theses
2016
Author
Paula Alexandra de Oliveira Branco
Institution
UP-FCUP
2016
Author
Diana Carvalho Pacheco
Institution
UP-FCUP
2016
Author
Hugo Miguel Figueiredo de Oliveira
Institution
UP-FCUP
2016
Author
Filipa Vidal Veríssimo
Institution
UP-FCUP
2015
Author
Ricardo Jorge Alves de Oliveira
Institution
UP-FEP