Publications by LIAAD

2024

Boosting English-Amharic machine translation using corpus augmentation and Transformer

Authors
Biadgligne, Y; Smaili, K;

Publication
Interciencia

Abstract
The Transformer-based neural machine translation (NMT) model has been very successful in recent years and has become a new mainstream method. However, applying it to low-resourced languages requires large amounts of data and efficient model configuration (hyperparameter tuning) mechanisms. The scarcity of parallel texts is a bottleneck for high-quality (N)MT, especially for under-resourced languages like Amharic. This paper therefore presents an attempt to improve English-Amharic MT by introducing three different vanilla Transformer architectures with different hyperparameter values. To obtain additional training material, offline token-level corpus augmentation was applied to the previously collected English-Amharic parallel corpus. Compared to previous work on Amharic MT, the best of the three Transformer models achieved state-of-the-art BLEU scores. This result was made possible by combining corpus augmentation with hyperparameter tuning.
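The abstract does not say which token-level operation was used for the offline augmentation. As an illustration only, the sketch below assumes one common choice, randomly dropping source-side tokens while keeping the target sentence unchanged, and appends the noisy copies to the original corpus. The function names `augment_pair` and `augment_corpus` and the parameters `p_drop`, `copies` and `seed` are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Tuple

# A parallel corpus is a list of (English source, Amharic target) sentence pairs.
Pair = Tuple[str, str]


def augment_pair(src: str, tgt: str, rng: random.Random, p_drop: float = 0.1) -> Pair:
    """Make a noisy copy of one pair by randomly dropping source-side tokens.

    The target side is left untouched, so the new pair is still usable as a
    training example (illustrative token-level noise, not the paper's method).
    """
    tokens = src.split()
    kept = [tok for tok in tokens if rng.random() >= p_drop]
    if not kept:
        # Never emit an empty source sentence; fall back to the first token.
        kept = tokens[:1]
    return " ".join(kept), tgt


def augment_corpus(corpus: List[Pair], copies: int = 1,
                   p_drop: float = 0.1, seed: int = 0) -> List[Pair]:
    """Return the original pairs followed by `copies` noisy variants of each."""
    rng = random.Random(seed)  # fixed seed keeps the offline augmentation reproducible
    augmented = list(corpus)
    for src, tgt in corpus:
        for _ in range(copies):
            augmented.append(augment_pair(src, tgt, rng, p_drop))
    return augmented


if __name__ == "__main__":
    corpus = [("the cat sleeps on the mat", "<amharic target 1>"),
              ("we translate short sentences", "<amharic target 2>")]
    for src, tgt in augment_corpus(corpus, copies=2, p_drop=0.2):
        print(src, "|||", tgt)
```

Because the augmentation is done offline, the enlarged corpus can be written to disk once and reused across all hyperparameter configurations being compared.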

2024

Data-Centric Federated Learning for Anomaly Detection in Smart Grids and Other Industrial Control Systems

Authors
Perdigao, D; Cruz, T; Simoes, P; Abreu, PH;

Publication
PROCEEDINGS OF 2024 IEEE/IFIP NETWORK OPERATIONS AND MANAGEMENT SYMPOSIUM, NOMS 2024

Abstract
Energy smart grids and other modern industrial control system networks pose considerable security management challenges for several reasons: their broad geographic dispersion and capillarity, the constrained nature of many of their devices and network links, and their frequent fragmentation across multiple domains owned and managed by different entities, which often have non-aligned or even competing interests. Given this scenario, we propose to improve federated learning-based anomaly detection for smart grids and other industrial control networks using a federated data-centric methodology that attends to the balance and causality of the data, improving the representation of the different anomaly classes in the ingested data, which directly impacts the classifier's performance. The proposed approach shows up to 33% improvement in F1-score for attack classification compared to the baseline federated approach (which does not attend to class imbalance and causality) on a broad range of industrial control systems traffic datasets.
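The abstract does not describe how class balance is enforced, and the causality aspect is not covered here at all. As a rough stand-in, the sketch below assumes plain random oversampling applied locally on each federated client before training, plus a macro-averaged F1 score, which weights rare attack classes equally with the majority class. The names `oversample_client` and `macro_f1` are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# One labelled traffic record held by a single federated client: (features, class label).
Sample = Tuple[Sequence[float], str]


def oversample_client(data: List[Sample], seed: int = 0) -> List[Sample]:
    """Balance one client's local data by random oversampling.

    Minority-class samples are duplicated at random until every class has as
    many samples as the largest class. This runs locally, so raw data never
    leaves the client.
    """
    if not data:
        return []
    rng = random.Random(seed)
    by_class: Dict[str, List[Sample]] = defaultdict(list)
    for sample in data:
        by_class[sample[1]].append(sample)
    target = max(len(group) for group in by_class.values())
    balanced: List[Sample] = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    rng.shuffle(balanced)
    return balanced


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores, so rare attack classes count equally."""
    labels = set(y_true) | set(y_pred)
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores) if scores else 0.0
```

Oversampling only duplicates existing records; a data-centric pipeline like the one described in the abstract may use more elaborate rebalancing, so this should be read as a baseline illustration.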

2024

Call for Papers: Data Generation in Healthcare Environments

Authors
Pereira, RC; Rodrigues, PP; Moreira, IS; Abreu, PH;

Publication
JOURNAL OF BIOMEDICAL INFORMATICS

Abstract
[No abstract available]

2024

Reconstruction of Mammography Projections using Image-to-Image Translation Techniques

Authors
Santos, JC; Santos, MS; Abreu, PH;

Publication
32nd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2024, Bruges, Belgium, October 9-11, 2024

Abstract

2024

A Perspective on the Missing at Random Problem: Synthetic Generation and Benchmark Analysis

Authors
Cabrera-Sánchez, JF; Pereira, RC; Abreu, PH; Silva-Ramírez, EL;

Publication
IEEE ACCESS

Abstract
Increasingly advanced and complex models are being proposed to address problems related to computer vision, forecasting, the Internet of Things, Big Data and so on. However, these disciplines require preprocessing steps to obtain meaningful results. One of the most common problems addressed at this stage is the presence of missing values. Understanding why missingness occurs helps in selecting the data imputation methods best suited to filling in these missing values. Generating synthetic missingness under the Missing at Random mechanism presents challenges such as achieving extreme missingness rates and preserving the consistency of the mechanism. To address these shortcomings, this work proposes three new methods that generate synthetic missingness under the Missing at Random mechanism and compares them to a baseline model. The comparison considers a benchmark covering 33 data sets and five missingness rates (10%, 20%, 40%, 60%, 80%). Seven data imputation methods, ranging from traditional to deep learning approaches, are used to evaluate the proposals. The results demonstrate that the proposals are aligned with the baseline method in terms of the performance and ranking of the data imputation methods. Thus, three new feasible and consistent alternatives for synthetic missingness generation under Missing at Random are presented.
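The abstract does not detail the three proposed generators or the baseline. To make the Missing at Random idea concrete, the sketch below uses a simple, generic scheme (not the paper's): values of one column are removed in the rows where a different, fully observed column is largest, so missingness depends only on observed data. The function `mar_mask` and its parameters are hypothetical, and `rate` is interpreted here as the fraction of rows losing their target value, which may differ from how the benchmark defines its missingness rates.

```python
from typing import List, Optional

Row = List[Optional[float]]


def mar_mask(data: List[List[float]], target_col: int, driver_col: int, rate: float) -> List[Row]:
    """Remove values of `target_col` in the rows with the largest `driver_col` values.

    Whether a value goes missing depends only on the observed `driver_col`,
    never on the removed value itself, which is what makes the mechanism MAR.
    `rate` is the fraction of rows that lose their `target_col` value.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    if target_col == driver_col:
        # Missingness driven by the variable itself would be MNAR, not MAR.
        raise ValueError("driver_col must differ from target_col")
    n_missing = round(rate * len(data))
    # Rank rows by the observed driver variable, largest first.
    order = sorted(range(len(data)), key=lambda i: data[i][driver_col], reverse=True)
    to_remove = set(order[:n_missing])
    masked: List[Row] = []
    for i, row in enumerate(data):
        new_row: Row = list(row)  # copy so the complete data set is left intact
        if i in to_remove:
            new_row[target_col] = None
        masked.append(new_row)
    return masked


if __name__ == "__main__":
    data = [[float(age), 100.0 + 5 * age] for age in range(10)]
    for rate in (0.1, 0.2, 0.4, 0.6, 0.8):
        masked = mar_mask(data, target_col=1, driver_col=0, rate=rate)
        print(rate, sum(row[1] is None for row in masked))
```

A deterministic top-k rule like this reaches extreme rates such as 80% exactly, but it concentrates all missing values at one end of the driver's range; probabilistic variants trade that exactness for a smoother dependence, which hints at the consistency challenge the abstract mentions.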
