2020
Authors
Ribeiro, RP; Moniz, N;
Publication
MACHINE LEARNING
Abstract
Research in imbalanced domain learning has focused almost exclusively on classification tasks, aiming at the accurate prediction of cases labelled with a rare class. Approaches for addressing such problems in regression tasks are still scarce, for two main reasons. First, standard regression tasks assume that every domain value is equally important. Second, standard evaluation metrics focus on assessing the performance of models on the most common values of the data distribution. In this paper, we present an approach to tackle imbalanced regression tasks where the objective is to predict extreme (rare) values. We propose a way to formalise such tasks and to optimise and evaluate predictive models, overcoming the factors mentioned above and issues in related work. We present an automatic, non-parametric method to obtain relevance functions, building on the concept of relevance as a mapping of target values into non-uniform domain preferences. We then propose SERA, a new evaluation metric capable of assessing effectiveness and of optimising models towards the prediction of extreme values while penalising severe model bias. An experimental study demonstrates how SERA provides valid and useful insights into the performance of models in imbalanced regression tasks.
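To make the metric concrete, here is a minimal Python sketch of SERA, assuming the formulation SER_t = sum of (yhat_i - y_i)^2 over the cases whose relevance phi(y_i) >= t, and SERA = integral of SER_t for t from 0 to 1. The paper derives phi automatically; `ramp_relevance` below is a hypothetical hand-made stand-in, and the trapezoidal approximation over `steps` thresholds is a choice of this sketch, not the authors' implementation.

```python
from typing import Callable, Sequence

# A relevance function maps a target value to [0, 1].
Relevance = Callable[[float], float]


def ser_t(y_true: Sequence[float], y_pred: Sequence[float],
          relevance: Relevance, t: float) -> float:
    """SER_t: sum of squared errors over the cases with relevance >= t."""
    return sum((yp - yt) ** 2
               for yt, yp in zip(y_true, y_pred)
               if relevance(yt) >= t)


def sera(y_true: Sequence[float], y_pred: Sequence[float],
         relevance: Relevance, steps: int = 1000) -> float:
    """Approximate the area under SER_t for t in [0, 1] with the trapezoidal rule."""
    ts = [i / steps for i in range(steps + 1)]
    values = [ser_t(y_true, y_pred, relevance, t) for t in ts]
    return sum((values[i] + values[i + 1]) / 2 for i in range(steps)) / steps


def ramp_relevance(y: float, low: float = 10.0, high: float = 20.0) -> float:
    """Hypothetical relevance: 0 below `low`, 1 above `high`, linear in between."""
    return min(1.0, max(0.0, (y - low) / (high - low)))


# Same absolute error, placed on a common case versus on an extreme case.
y_true = [5.0, 25.0]
print(sera(y_true, [7.0, 25.0], ramp_relevance),
      sera(y_true, [5.0, 27.0], ramp_relevance))
```

In this sketch every case is counted at t = 0, so errors on common values still add to the area (which is how bias towards ignoring the bulk of the data is penalised), while errors on highly relevant extreme cases are counted for every threshold and therefore weigh more.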
2020
Authors
Barros, M; Veloso, B; Pereira, PM; Ribeiro, RP; Gama, J;
Publication
IoT Streams for Data-Driven Predictive Maintenance and IoT, Edge, and Mobile for Embedded Machine Learning - Second International Workshop, IoT Streams 2020, and First International Workshop, ITEM 2020, Co-located with ECML/PKDD 2020, Ghent, Belgium, September 14-18, 2020, Revised Selected Papers
Abstract
The transformation of industrial manufacturing through computers, automation and smart systems leads us to monitor and log industrial equipment events. Analytic approaches can be applied to these logs to obtain interpretable results for strategic decision making, providing advantages such as failure detection and predictive maintenance. In recent years, many researchers have studied the application of machine learning techniques to improve such tasks. In this context, we develop a system capable of detecting anomalies in an Air Production Unit (APU), taking into consideration the peak frequency of each sensor. The study started with an analysis of the sensors installed on the APU, defining its normal behavior and its failure mode. Using that information, we define rules to monitor the APU, to detect anomalies in its components, and to predict possible failures. The rules were based on peak frequency analysis, which allowed us to set boundaries of normality for the APU working modes and, thus, to identify anomalies. © 2020, Springer Nature Switzerland AG.
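The abstract does not state the rules themselves. As an illustration only, the peak-frequency idea can be sketched in Python under these assumptions: a sensor's peak frequency is the strongest non-DC bin of a plain DFT over a window, the boundary of normality is the range of peak frequencies seen in normal-operation windows widened by a hypothetical `margin`, and any window outside that range is flagged. All function names are hypothetical, and `sine` only produces synthetic signals, not APU data.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def peak_frequency(window: Sequence[float], sample_rate: float) -> float:
    """Frequency (Hz) of the strongest non-DC component of a window, via a plain DFT."""
    n = len(window)
    mean = sum(window) / n
    centered = [x - mean for x in window]  # drop the DC offset
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):  # positive frequencies up to Nyquist
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * sample_rate / n


def fit_bounds(normal_windows: Sequence[Sequence[float]], sample_rate: float,
               margin: float = 0.1) -> Tuple[float, float]:
    """Normality interval for the peak frequency, learned from normal-operation windows."""
    peaks = [peak_frequency(w, sample_rate) for w in normal_windows]
    lo, hi = min(peaks), max(peaks)
    pad = margin * (hi - lo)  # widen the observed range by a relative margin
    return lo - pad, hi + pad


def is_anomalous(window: Sequence[float], sample_rate: float,
                 bounds: Tuple[float, float]) -> bool:
    """Rule: flag the window if its peak frequency falls outside the normal interval."""
    lo, hi = bounds
    return not (lo <= peak_frequency(window, sample_rate) <= hi)


def sine(freq: float, n: int = 100, sample_rate: float = 100.0) -> List[float]:
    """Synthetic test signal (not APU data)."""
    return [math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]


bounds = fit_bounds([sine(5), sine(6)], sample_rate=100.0)
print(bounds, is_anomalous(sine(5), 100.0, bounds), is_anomalous(sine(20), 100.0, bounds))
```

A real system would fit one interval per sensor and per working mode, as the abstract suggests; the quadratic-time DFT is used here only to keep the sketch dependency-free.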
2020
Authors
Koprinska, I; Kamp, M; Appice, A; Loglisci, C; Antonie, L; Zimmermann, A; Guidotti, R; Özgöbek, O; Ribeiro, RP; Gavaldà, R; Gama, J; Adilova, L; Krishnamurthy, Y; Ferreira, PM; Malerba, D; Medeiros, I; Ceci, M; Manco, G; Masciari, E; Ras, ZW; Christen, P; Ntoutsi, E; Schubert, E; Zimek, A; Monreale, A; Biecek, P; Rinzivillo, S; Kille, B; Lommatzsch, A; Gulla, JA;
Publication
PKDD/ECML Workshops
Abstract
2020
Authors
Tavares, AH; Raymaekers, J; Rousseeuw, PJ; Brito, P; Afreixo, V;
Publication
ADVANCES IN DATA ANALYSIS AND CLASSIFICATION
Abstract
In this work we seek clusters of genomic words in human DNA by studying their inter-word lag distributions. Due to the particularly spiked nature of these histograms, a clustering procedure is proposed that first decomposes each distribution into a baseline and a peak distribution. An outlier-robust fitting method is used to estimate the baseline distribution (the 'trend'), and a sparse vector of detrended data captures the peak structure. A simulation study demonstrates the effectiveness of the clustering procedure in grouping distributions with similar peak behavior and/or baseline features. The procedure is applied to investigate similarities between the distribution patterns of genomic words of lengths 3 and 5 in the human genome. These experiments demonstrate the potential of the new method for identifying words with similar distance patterns.
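A rough Python sketch of the baseline/peak decomposition, assuming the lag histograms share a common grid of lags. The abstract does not name the outlier-robust fitting method, so a running median is swapped in here as a simple robust baseline; `decompose`, `peak_distance`, `half_width` and `threshold` are hypothetical, and the paper's clustering step is not reproduced.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple


def decompose(hist: Sequence[float], half_width: int = 3,
              threshold: float = 0.0) -> Tuple[List[float], List[float]]:
    """Split a lag histogram into a robust baseline and a sparse peak vector.

    The baseline at each lag is the median of up to 2*half_width + 1
    neighbouring bins, so an isolated spike barely moves it. The peak vector
    keeps only the excess over the baseline that exceeds `threshold`; every
    other entry is zero, which makes it sparse.
    """
    n = len(hist)
    baseline: List[float] = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        baseline.append(float(median(hist[lo:hi])))
    peaks = [float(h - b) if h - b > threshold else 0.0
             for h, b in zip(hist, baseline)]
    return baseline, peaks


def peak_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two peak vectors on the same lag grid."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Same spike at lag 7, different baseline levels.
_, p1 = decompose([10.0] * 7 + [50.0] + [10.0] * 12)
_, p2 = decompose([30.0] * 7 + [70.0] + [30.0] * 12)
print(peak_distance(p1, p2))
```

Because the baseline is removed first, two words with the same spike pattern but different overall frequencies end up with identical peak vectors, which is the kind of grouping by peak behaviour the abstract describes.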
2020
Authors
Vieira, AR; Campos, P; Brito, P;
Publication
JOURNAL OF COMPLEX NETWORKS
Abstract
Community detection techniques use only information about the network topology to find communities in networks. Similarly, classic clustering techniques for vector data consider only the values of the attributes describing the objects to find clusters. In real-world networks, however, in addition to the network topology there is usually information about the attributes describing the vertices, which can also be used to find communities. Using both the network topology and the vertex attributes can improve the algorithms' results. Researchers have therefore started investigating methods for community detection in attributed networks. In recent years, several methods have been proposed for this task, partitioning a graph into sub-graphs of vertices that are densely connected and similar in terms of their descriptions. This article analyses and compares some of the proposed methods for community detection in attributed networks. To that end, the methods are applied to both synthetic and real networks, and experiments are performed on both weighted and unweighted graphs. The objective is to establish which methods generally perform better according to the validation measures and to investigate their sensitivity to changes in the networks' structure and homogeneity.
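The validation measures are not named in the abstract. Assuming two common kinds, one structural and one attribute-based, a partition of an attributed graph could be scored as in the Python sketch below: Newman modularity on an (optionally weighted) undirected edge list, and a purity-style score for a single categorical vertex attribute. `attribute_purity` is a hypothetical name, not a measure taken from the article.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Tuple

# Undirected edge: (u, v, weight); use weight 1.0 for unweighted graphs.
Edge = Tuple[Hashable, Hashable, float]


def modularity(edges: List[Edge], community: Dict[Hashable, int]) -> float:
    """Newman modularity: sum over communities of L_c/m - (d_c/2m)^2."""
    m = sum(w for _, _, w in edges)   # total edge weight
    internal = defaultdict(float)     # weight of edges inside each community
    degree = defaultdict(float)       # summed vertex degree of each community
    for u, v, w in edges:
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def attribute_purity(labels: Dict[Hashable, str],
                     community: Dict[Hashable, int]) -> float:
    """Share of vertices whose attribute equals their community's majority value."""
    groups = defaultdict(list)
    for node, c in community.items():
        groups[c].append(labels[node])
    majority = sum(Counter(vals).most_common(1)[0][1] for vals in groups.values())
    return majority / len(community)


# Two triangles joined by a single edge, split into the two triangles.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 1.0)]
part = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
labels = {0: "a", 1: "a", 2: "b", 3: "b", 4: "b", 5: "b"}
print(modularity(edges, part), attribute_purity(labels, part))
```

Scoring structure and attributes separately makes the trade-off the abstract mentions visible: a method can find densely connected groups that are not homogeneous in their descriptions, or the reverse.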
2020
Authors
de Sa, CR; Shekar, AK; Ferreira, H; Soares, C;
Publication
14TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING MODELS IN INDUSTRIAL AND ENVIRONMENTAL APPLICATIONS (SOCO 2019)
Abstract
Sensors are susceptible to failure when exposed to extreme conditions over long periods of time. Besides, they can be affected by noise or electrical interference. Models (machine learning or others) obtained from such faulty and noisy sensors may be less reliable. In this paper, we propose a data augmentation approach to make neural networks more robust to missing and faulty sensor data. The approach is shown to be effective in a real-life industrial application that uses data from various sensors to predict the wear of an automotive fuel-system component. Empirical results show that, in this particular application, the proposed approach leads to a more robust neural network than existing methods.
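The abstract does not describe the corruption scheme. One plausible form, assumed here for illustration, creates extra training copies in which each sensor reading is replaced by a sentinel `missing_value` with probability `p_missing` and otherwise perturbed with Gaussian noise of standard deviation `noise_std`, while the targets (the wear values) are kept unchanged. The function and parameter names are hypothetical, and the output is plain lists that could feed any neural network library.

```python
import random
from typing import List, Sequence, Tuple


def augment(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    copies: int = 2,
    p_missing: float = 0.1,
    noise_std: float = 0.05,
    missing_value: float = 0.0,
    seed: int = 0,
) -> Tuple[List[List[float]], List[float]]:
    """Return the original samples followed by `copies` corrupted copies of each.

    In every copy, each sensor reading is dropped to `missing_value` with
    probability `p_missing` (simulating a failed sensor) and otherwise gets
    Gaussian noise (simulating interference). Targets are left untouched, so
    the network learns to predict the same wear from degraded inputs.
    """
    rng = random.Random(seed)  # seeded for reproducible augmentation
    X_out = [list(row) for row in X]
    y_out = list(y)
    for row, target in zip(X, y):
        for _ in range(copies):
            corrupted = [
                missing_value if rng.random() < p_missing
                else value + rng.gauss(0.0, noise_std)
                for value in row
            ]
            X_out.append(corrupted)
            y_out.append(target)
    return X_out, y_out


X_aug, y_aug = augment([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], [10.0, 20.0], copies=3)
print(len(X_aug), y_aug)
```

Keeping the originals alongside the corrupted copies means the network still sees clean data, so robustness to faults should not come at the cost of accuracy on healthy sensors; whether that holds for the application in the paper is not something this sketch can show.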