Publications

Publications by LIAAD

2024

Multidimensional subgroup discovery on event logs

Authors
Ribeiro, J; Fontes, T; Soares, C; Borges, JL;

Publication
EXPERT SYSTEMS WITH APPLICATIONS

Abstract
Subgroup discovery (SD) aims at finding significant subgroups of a given population of individuals characterized by statistically unusual properties of interest. SD on event logs provides insight into particular behaviors of processes, which may be a valuable complement to the traditional process analysis techniques, especially for low-structured processes. This paper proposes a scalable and efficient method to search for significant SD rules on frequent sequences of events, exploiting their multidimensional nature. The method is intended to identify significant subsequences of events where the distribution of values of some target aspect is significantly different from the same distribution for the entire event log. A publicly available real-life event log of a Dutch hospital is used as a running example to demonstrate the applicability of our method. The proposed approach was applied to a real-life case study based on the public transport of a medium-sized European city (Porto, Portugal), for which the event data consists of 133 million smartcard travel validations from buses, trams and trains. The results include a characterization of mobility flows over multiple aspects, as well as the identification of unexpected behaviors in the flow of commuters (public transport). The generated knowledge provided a useful insight into the behavior of travelers, which can be applied at operational, tactical and strategic business levels, enhancing the current view of the transport services to transport authorities and operators.
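The idea of comparing a target distribution inside a subgroup with the same distribution over the whole log can be illustrated with a small sketch. It is not the method proposed in the paper: it assumes a binary target, treats a subgroup as the traces that contain a given contiguous subsequence, and scores it with weighted relative accuracy (WRAcc), a common subgroup discovery quality measure. The function names `contains` and `wracc` and the toy log format are hypothetical.

```python
def contains(trace, pattern):
    """Return True if `pattern` occurs as a contiguous subsequence of `trace`."""
    n = len(pattern)
    return any(trace[i:i + n] == pattern for i in range(len(trace) - n + 1))


def wracc(log, pattern):
    """Weighted relative accuracy of the subgroup 'traces containing pattern'.

    `log` is a list of (trace, target) pairs, where `trace` is a list of
    activity names and `target` is a boolean (assumed format for this sketch).
    """
    total = len(log)
    covered = [target for trace, target in log if contains(trace, pattern)]
    if total == 0 or not covered:
        return 0.0
    p_all = sum(target for _, target in log) / total   # target rate in the whole log
    p_sub = sum(covered) / len(covered)                # target rate in the subgroup
    coverage = len(covered) / total                    # relative subgroup size
    return coverage * (p_sub - p_all)


toy_log = [
    (["a", "b", "c"], True),
    (["a", "b"], True),
    (["a", "c"], False),
    (["b", "c"], False),
]
score = wracc(toy_log, ["a", "b"])
```

A positive score means the target is over-represented among traces containing the pattern; a score near zero means the subgroup behaves like the log as a whole.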

2024

VEST: automatic feature engineering for forecasting

Authors
Cerqueira, V; Moniz, N; Soares, C;

Publication
MACHINE LEARNING

Abstract
Time series forecasting is a challenging task with applications in a wide range of domains. Auto-regression is one of the most common approaches to address these problems. Accordingly, observations are modelled by multiple regression using their past lags as predictor variables. We investigate the extension of auto-regressive processes using statistics which summarise the recent past dynamics of time series. The result of our research is a novel framework called VEST, designed to perform feature engineering using univariate and numeric time series automatically. The proposed approach works in three main steps. First, recent observations are mapped onto different representations. Second, each representation is summarised by statistical functions. Finally, a filter is applied for feature selection. We discovered that combining the features generated by VEST with auto-regression significantly improves forecasting performance in a database composed of 90 time series with high sampling frequency. However, we also found that there are no improvements when the framework is applied for multi-step forecasting or in time series with low sample size. VEST is publicly available online.
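The three steps described in the abstract can be sketched as follows. This is a minimal illustration and not the VEST package or its API: the two representations (identity and first differences), the three summary statistics, and the correlation-based filter with its threshold are all assumptions chosen for brevity, and every function name here is hypothetical.

```python
from statistics import mean, stdev


def embed(series, n_lags):
    """Time-delay embedding: windows of n_lags past values and the next value."""
    count = len(series) - n_lags
    windows = [series[i:i + n_lags] for i in range(count)]
    targets = [series[i + n_lags] for i in range(count)]
    return windows, targets


# Step 1: map the recent window onto different representations (assumed choices).
REPRESENTATIONS = {
    "identity": lambda w: list(w),
    "diff": lambda w: [b - a for a, b in zip(w, w[1:])],
}

# Step 2: summarise each representation with statistical functions (assumed choices).
STATISTICS = {"mean": mean, "std": stdev, "max": max}


def summarise(window):
    """Build one feature per (representation, statistic) pair."""
    return {
        f"{rep_name}_{stat_name}": stat(rep(window))
        for rep_name, rep in REPRESENTATIONS.items()
        for stat_name, stat in STATISTICS.items()
    }


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


def select(feature_rows, targets, threshold=0.1):
    """Step 3: keep features whose absolute correlation with the target passes a threshold."""
    names = list(feature_rows[0])
    return [
        name for name in names
        if abs(pearson([row[name] for row in feature_rows], targets)) >= threshold
    ]


series = [1, 2, 4, 7, 11, 16, 22, 29]
windows, targets = embed(series, n_lags=4)
features = [summarise(w) for w in windows]
kept = select(features, targets)
```

The selected features would then be appended to the lagged values as extra predictors for the auto-regressive model. Windows need at least three values here so that the differenced representation has enough points for `stdev`.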

2024

CNP-MLDM: Contract Net Protocol for Negotiation in Machine Learning Data Market

Authors
Baghcheband, H; Soares, C; Reis, LP;

Publication
DS (LB)

Abstract
The Machine Learning Data Market (MLDM), which relies on multi-agent systems, necessitates robust negotiation strategies to ensure efficient and fair transactions. The Contract Net Protocol (CNP), a well-established negotiation strategy within Multi-Agent Systems (MAS), offers a promising solution. This paper explores the integration of CNP into MLDM, proposing the CNP-MLDM model to facilitate data exchanges. Characterized by its task announcement and bidding process, CNP enhances negotiation efficiency in MLDM. This paper describes CNP tailored for MLDM, detailing the proposed protocol, followed by experimental results.
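The announcement and bidding cycle that characterizes the Contract Net Protocol can be sketched in a few lines. This is a generic illustration, not the CNP-MLDM model: the `Provider` and `Bid` classes, the task fields `min_quality` and `budget`, and the award rule (best quality per unit price) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    agent: str
    price: float
    quality: float


class Provider:
    """A data-providing agent that answers task announcements with a bid or a refusal."""

    def __init__(self, name, price, quality):
        self.name = name
        self.price = price
        self.quality = quality

    def bid(self, task):
        # Refuse (return None) when the agent cannot meet the announced requirement.
        if self.quality < task["min_quality"]:
            return None
        return Bid(self.name, self.price, self.quality)


def contract_net(task, providers):
    """Run one round: announce the task, collect bids, award the contract."""
    # 1. Task announcement and 2. bidding: every provider sees the task and may bid.
    bids = [p.bid(task) for p in providers]
    # Drop refusals and bids above the manager's budget.
    valid = [b for b in bids if b is not None and b.price <= task["budget"]]
    if not valid:
        return None
    # 3. Award: an assumed rule that favours quality per unit price.
    return max(valid, key=lambda b: b.quality / b.price)


providers = [
    Provider("A", price=10.0, quality=0.8),
    Provider("B", price=5.0, quality=0.6),
    Provider("C", price=4.0, quality=0.3),
]
winner = contract_net({"min_quality": 0.5, "budget": 12.0}, providers)
```

In this toy round, provider C refuses because it cannot meet the quality requirement, and the manager awards the contract to whichever remaining bidder scores best under the assumed rule.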

2024

Tabular data generation with tensor contraction layers and transformers

Authors
Silva, A; Restivo, A; Santos, M; Soares, C;

Publication
CoRR

Abstract

2024

Meta-TadGAN: Time Series Anomaly Detection Using TadGAN with Meta-features

Authors
Silva, IOe; Soares, C; Cerqueira, V; Rodrigues, A; Bastardo, P;

Publication
EPIA (3)

Abstract
TadGAN is a recent algorithm with competitive performance on time series anomaly detection. The detection process of TadGAN works by comparing observed data with generated data. A challenge in anomaly detection is that there are anomalies which are not easy to detect by analyzing the original time series but have a clear effect on its higher-order characteristics. We propose Meta-TadGAN, an adaptation of TadGAN that analyzes meta-level representations of time series. That is, it analyzes a time series that represents the characteristics of the time series, rather than the original time series itself. Results on benchmark datasets as well as real-world data from fire detectors show that the new method is competitive with TadGAN.
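The notion of "a time series that represents the characteristics of the time series" can be illustrated with rolling-window meta-features. The sketch below is an assumption about how such a representation might be built: the abstract does not say which meta-features or window scheme Meta-TadGAN uses, and the function name `meta_series` is hypothetical. It does not include the TadGAN detector itself.

```python
from statistics import mean, pstdev


def meta_series(series, window):
    """Slide a window over the series and emit (mean, std) for each position.

    The result is itself a (multivariate) time series of characteristics,
    which a detector could analyze in place of the raw values.
    """
    return [
        (mean(series[i:i + window]), pstdev(series[i:i + window]))
        for i in range(len(series) - window + 1)
    ]


# The level of this toy series stays at 1 on average, but its spread changes.
raw = [1, 1, 1, 1, 5, -3, 5, -3]
meta = meta_series(raw, window=4)
```

In this toy example the rolling mean is the same in the first and last windows, while the rolling standard deviation shifts from zero to a clearly larger value, so the change is visible mainly at the meta level.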

2024

Enhancing Algorithm Performance Understanding through tsMorph: Generating Semi-Synthetic Time Series for Robust Forecasting Evaluation

Authors
Santos, M; de Carvalho, ACPLF; Soares, C;

Publication
AEQUITAS@ECAI

Abstract
We have never produced as much data as today, and tomorrow will probably produce even more. The increase is due not only to the larger number of data sources, but also to the fact that each source can continuously produce more recent data. The discovery of temporal patterns in continuously generated data is the main goal in many forecasting tasks, such as predicting the average value of a currency or the average temperature in a city on the next day. In these tasks, it is assumed that the time difference between two consecutive values produced by the same source is constant, and the sequence of values forms a time series. The importance, and the very large number, of time series forecasting tasks make them one of the most popular data analysis applications, which has been dealt with by a large number of different methods. Despite its popularity, there is a dearth of research aimed at comprehending the conditions under which these methods present high or poor forecasting performances. Empirical studies, although common, are challenged by the limited availability of time series datasets, restricting the extraction of reliable insights. To address this limitation, we present tsMorph, a tool for generating semi-synthetic time series through dataset morphing. tsMorph works by creating a sequence of datasets from two original datasets. The characteristics of the generated datasets progressively depart from those of one of the datasets and converge toward the attributes of the other dataset. This method provides a valuable alternative for obtaining substantial datasets. In this paper, we show the benefits of tsMorph by assessing the predictive performance of the Long Short-Term Memory Network and DeepAR forecasting algorithms. The time series used for the experiments come from the NN5 Competition. The experimental results provide important insights. Notably, the performances of the two algorithms improve proportionally with the frequency of the time series.
These experiments confirm that tsMorph can be an effective tool for better understanding the behaviour of forecasting algorithms, delivering a pathway to overcoming the limitations posed by empirical studies and enabling more extensive and reliable experiments. Furthermore, tsMorph can promote Responsible Artificial Intelligence by emphasising characteristics of time series where forecasting algorithms may not perform well, thereby highlighting potential limitations.
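The morphing idea, a sequence of datasets that gradually moves from one original toward another, can be sketched with a simple convex combination. This is an assumption made for illustration: the abstract does not specify how tsMorph transforms the series, and the function `morph` below is hypothetical rather than part of the tool.

```python
def morph(source, target, steps):
    """Return `steps` series that move linearly from `source` to `target`.

    Both inputs must have the same length. The first output equals `source`
    and the last equals `target`; intermediate outputs are convex combinations.
    """
    if len(source) != len(target):
        raise ValueError("source and target must have the same length")
    if steps < 2:
        raise ValueError("steps must be at least 2")
    sequence = []
    for k in range(steps):
        alpha = k / (steps - 1)  # 0.0 at the source, 1.0 at the target
        sequence.append([(1 - alpha) * s + alpha * t for s, t in zip(source, target)])
    return sequence


morphed = morph([0, 0], [2, 4], steps=3)
```

Evaluating a forecasting algorithm on each series in `morphed` would show how its error changes as the data drifts from one set of characteristics to the other, which is the kind of analysis the abstract describes.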
