2024
Authors
Verde R.; Batagelj V.; Brito P.; Silva A.P.D.; Korenjak-Cerne S.; Dobša J.; Diday E.;
Publication
Statistical Journal of the IAOS
Abstract
The paper draws attention to the use of Symbolic Data Analysis (SDA) in the field of Official Statistics. It is composed of three sections, each presenting a pilot technique in the field of SDA. The three contributions are a technique based on the notion of exactly unified summaries for the creation of symbolic objects, a model-based approach for interval data as an innovative parametric strategy in this context, and measures of similarity defined between a class and a collection of classes based on the frequency of the categories that characterize them. The paper shows the effectiveness of the proposed approaches as prototypes of the numerous techniques developed within the SDA framework and opens the way to further developments.
2024
Authors
Bezerra, A; Pereira, I; Rebelo, MA; Coelho, D; de Oliveira, DA; Costa, JFP; Cruz, RPM;
Publication
INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS
Abstract
Phishing attacks aim to steal sensitive information and, unfortunately, are becoming a common practice on the web. Email phishing is one of the most common types of attack and can have a big impact on individuals and enterprises. There is still a gap in prevention when it comes to detecting phishing emails, as new attacks usually go undetected. The goal of this work was to develop a model capable of identifying phishing emails based on machine learning approaches. The work was performed in collaboration with E-goi, a multi-channel marketing automation company. The data consisted of emails collected from the E-goi servers in the electronic mail format. The task was a classification problem with unbalanced classes, with the minority class corresponding to the phishing emails and accounting for less than 1% of the total emails. Several models were evaluated after careful data selection and feature extraction based on the email content and the literature on this type of problem. Due to the imbalance present in the data, several sampling methods based on under-sampling techniques were tested to assess their impact on the model's ability to detect phishing emails. The final model was a neural network able to detect more than 80% of phishing emails without compromising the remaining emails sent by E-goi clients.
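As a rough illustration of the under-sampling idea mentioned in the abstract, the sketch below keeps every minority (phishing) example and a random subset of the majority class. It is not the authors' pipeline: the function name `random_undersample`, the `ratio` parameter, and the toy data are all assumptions made for this example.

```python
import random

def random_undersample(rows, labels, minority_label, ratio=1.0, seed=0):
    """Keep all minority rows plus a random sample of majority rows.

    ratio is the number of majority rows kept per minority row
    (a hypothetical knob for this sketch, not taken from the paper).
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    n_keep = min(len(majority), int(round(ratio * len(minority))))
    kept = sorted(minority + rng.sample(majority, n_keep))
    return [rows[i] for i in kept], [labels[i] for i in kept]

# Toy data: 1000 "emails", 1% of them labelled as phishing (label 1).
labels = [1 if i % 100 == 0 else 0 for i in range(1000)]
rows = list(range(1000))

X_bal, y_bal = random_undersample(rows, labels, minority_label=1)
print(sum(y_bal), len(y_bal))
```

Under-sampling of this kind would normally be applied to the training split only, so that evaluation still reflects the original class imbalance.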
2024
Authors
Ferreira Moreira, EJV; Campos, JC;
Publication
13th Symposium on Languages, Applications and Technologies, SLATE 2024, July 4-5, 2024, Águeda, Portugal
Abstract
Model checkers can automatically verify a system’s behavior against temporal logic properties. However, analyzing the counterexamples produced in case of failure is still a manual process that requires both technical and domain knowledge, even though this step is crucial to understanding the flaws of the system being verified. This paper presents a language created to support the generation of natural language explanations of counterexamples produced by a model checker. The language supports querying the properties and counterexamples to generate the explanations. The paper explains the language components and how they can be used to produce explanations. © Ezequiel José Veloso Ferreira Moreira and José Creissac Campos.
2024
Authors
Oliveira, B; Lobo, A; Botelho Costa, CIA; Carvalho, RF; Coimbra, MT; Renna, F;
Publication
46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2024, Orlando, FL, USA, July 15-19, 2024
Abstract
We introduce a Gradient-weighted Class Activation Mapping (Grad-CAM) methodology to assess the performance of five distinct models for binary classification (normal/abnormal) of synchronized heart sounds and electrocardiograms. The applied models comprise a one-dimensional convolutional neural network (1D-CNN) using solely ECG signals, a two-dimensional convolutional neural network (2D-CNN) applied separately to PCG and ECG signals, and two multimodal models that employ both signals. In the multimodal models, we implement two fusion approaches: an early fusion and a late fusion. The results indicate a performance improvement in using an early fusion model for the joint classification of both signals, as opposed to using a PCG 2D-CNN or ECG 1D-CNN alone (e.g., ROC-AUC score of 0.81 vs. 0.79 and 0.79, respectively). Although the ECG 2D-CNN demonstrates a higher ROC-AUC score (0.82) compared to the early fusion model, it exhibits a lower F1-score (0.85 vs. 0.86). Grad-CAM unveils that the models tend to yield higher gradients in the QRS complex and T/P-wave of the ECG signal, as well as between the two PCG fundamental sounds (S1 and S2), for discerning normalcy or abnormality, thus showcasing that the models focus on clinically relevant features of the recorded data.
2024
Authors
Ali, ÖG; Amorim, P;
Publication
INTERNATIONAL JOURNAL OF FORECASTING
Abstract
Discrete choice models can forecast market shares and individual choice probabilities under different price and alternative set scenarios. This work introduces a method to personalize choice models involving causal variables, such as price, using rich observational data. The model provides interpretable customer- and context-specific preferences, and price sensitivity, with an estimation procedure that uses orthogonalization. We caution against the naive use of regularization to deal with the high-dimensional observational data challenge. We experiment with the attended home delivery (AHD) slot choice problem using data from a European online retailer. Our results indicate that while the popular non-personalized multinomial logit (MNL) model does very well at the aggregate (day-slot) level, personalization provides significantly and substantially more accurate predictions at the individual-context level. However, the naive personalization approach using regularization without orthogonalization wrongly predicts that the choice probability will increase if the slot price increases, rendering it unfit for forecasting demand with pricing scenarios. The proposed method avoids this problem. Further, we introduce features based on potential consideration sets in the AHD slot choice context that increase accuracy and allow for more realistic substitution patterns than the proportional substitution implied by MNL.
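To make "proportional substitution implied by MNL" concrete, the sketch below computes plain multinomial logit choice probabilities for three delivery slots and raises the price of one of them. It illustrates the standard non-personalized MNL only, not the paper's personalized, orthogonalized estimator; the slot names, base utilities, and price coefficient are hypothetical values chosen for the example.

```python
import math

def mnl_probabilities(utilities):
    """Multinomial logit: P(k) = exp(u_k) / sum_j exp(u_j)."""
    m = max(utilities.values())  # subtract the max for numerical stability
    exp_u = {k: math.exp(u - m) for k, u in utilities.items()}
    total = sum(exp_u.values())
    return {k: e / total for k, e in exp_u.items()}

def slot_utilities(base, prices, price_coef):
    """Linear utility: base attractiveness plus price effect."""
    return {k: base[k] + price_coef * prices[k] for k in base}

# Hypothetical slots and parameters (assumptions for illustration only).
base = {"morning": 1.0, "noon": 0.5, "evening": 0.8}
price_coef = -0.4  # negative: a higher price lowers utility

prices_before = {"morning": 3.0, "noon": 3.0, "evening": 3.0}
prices_after = {"morning": 5.0, "noon": 3.0, "evening": 3.0}

p_before = mnl_probabilities(slot_utilities(base, prices_before, price_coef))
p_after = mnl_probabilities(slot_utilities(base, prices_after, price_coef))

# Ratio between the two untouched slots, before and after the price change.
ratio_before = p_before["noon"] / p_before["evening"]
ratio_after = p_after["noon"] / p_after["evening"]
print(p_before, p_after, ratio_before, ratio_after)
```

With a negative price coefficient the probability of the repriced slot falls, and the demand it loses is shared by the other slots in proportion to their existing probabilities, so their ratio stays the same. A fitted positive price coefficient would reverse the first effect, which is the failure the abstract attributes to naive regularization.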
2024
Authors
Gama, J; Ribeiro, RP; Mastelini, SM; Davari, N; Veloso, B;
Publication
CoRR
Abstract