2024
Autores
Alves, H; Brito, P; Campos, P;
Publicação
DATA MINING AND KNOWLEDGE DISCOVERY
Abstract
In this paper we introduce and develop the concept of interval-weighted networks (IWN), a novel approach in Social Network Analysis, where the edge weights are represented by closed intervals composed with precise information, comprehending intrinsic variability. We extend IWN for both Newman's modularity and modularity gain and the Louvain algorithm, considering a tabular representation of networks by contingency tables. We apply our methodology to two real-world IWN. The first is a commuter network in mainland Portugal, between the twenty three NUTS 3 Regions (IWCN). The second focuses on annual merchandise trade between 28 European countries, from 2003 to 2015 (IWTN). The optimal partition of geographic locations (regions or countries) is developed and compared using two new different approaches, designated as Classic Louvain and Hybrid Louvain , which allow taking into account the variability observed in the original network, thereby minimizing the loss of information present in the raw data. Our findings suggest the division of the twenty three Portuguese regions in three main communities for the IWCN and between two to three country communities for the IWTN. However, we find different geographical partitions according to the community detection methodology used. This analysis can be useful in many real-world applications, since it takes into account that the weights may vary within the ranges, rather than being constant.
2024
Autores
Park, YJ; Brito, P; Ma, YC;
Publicação
ENGINEERING OPTIMIZATION
Abstract
In various machine learning applications, classification plays an important role in categorizing and predicting data. To improve the classification performance, it is crucial to identify and remove the anomalies. Also, class imbalance in many machine learning applications is a very common problem since most classifiers tend to be biased toward the majority class by ignoring the minority class instances. Thus, in this research, we propose a new under-sampling technique based on anomaly detection and removal to enhance the performance of imbalanced classification problems. To demonstrate the effectiveness of the proposed method, comprehensive experiments are conducted on forty imbalanced data sets and two non-parametric hypothesis tests are employed to show the statistical difference in classification performances between the proposed method and other traditional resampling methods. From the experiment, it is shown that the proposed method improves the classification performance by effectively detecting and eliminating the anomalies among true-majority or pseudo-majority class instances.
2024
Autores
Silva, CC; Brito, P; Campos, P;
Publicação
Statistical Journal of the IAOS
Abstract
2024
Autores
Verde R.; Batagelj V.; Brito P.; Silva A.P.D.; Korenjak-Cerne S.; Dobša J.; Diday E.;
Publicação
Statistical Journal of the IAOS
Abstract
The paper draws attention to the use of Symbolic Data Analysis (SDA) in the field of Official Statistics. It is composed of three sections presenting three pilot techniques in the field of SDA. The three contributions range from a technique based on the notion of exactly unified summaries for the creation of symbolic objects, a model-based approach for interval data as an innovative parametric strategy in this context, and measures of similarity defined between a class and a collection of classes based on the frequency of the categories which characterize them. The paper shows the effectiveness of the proposed approaches as prototypes of numerous techniques developed within the SDA framework and opens to possible further developments.
2024
Autores
Brito, P; Cerioli, A; Garcia-Escudero, LA; Saporta, G;
Publicação
ADVANCES IN DATA ANALYSIS AND CLASSIFICATION
Abstract
[No abstract available]
2024
Autores
Ribeiro, J; Fontes, T; Soares, C; Borges, JL;
Publicação
EXPERT SYSTEMS WITH APPLICATIONS
Abstract
Subgroup discovery (SD) aims at finding significant subgroups of a given population of individuals characterized by statistically unusual properties of interest. SD on event logs provides insight into particular behaviors of processes, which may be a valuable complement to the traditional process analysis techniques, especially for low -structured processes. This paper proposes a scalable and efficient method to search significant SD rules on frequent sequences of events, exploiting their multidimensional nature. With this method, it is intended to identify significant subsequences of events where the distribution of values of some target aspect is significantly different than the same distribution for the entire event log. A publicly available real -life event log of a Dutch hospital is used as a running example to demonstrate the applicability of our method. The proposed approach was applied on a real -life case study based on the public transport of a medium size European city (Porto, Portugal), for which the event data consists of 133 million smartcard travel validations from buses, trams and trains. The results include a characterization of mobility flows over multiple aspects, as well as the identification of unexpected behaviors in the flow of commuters (public transport). The generated knowledge provided a useful insight into the behavior of travelers, which can be applied at operational, tactical and strategic business levels, enhancing the current view of the transport services to transport authorities and operators.
The access to the final selection minute is only available to applicants.
Please check the confirmation e-mail of your application to obtain the access code.