Publications

Publications by LIAAD

2023

Topic Model with Contextual Outlier Handling: a Study on Electronic Invoice Product Descriptions

Authors
Andrade, C; Ribeiro, RP; Gama, J;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2023, PT I

Abstract
E-commerce has become an essential aspect of modern life, providing consumers worldwide with convenience and accessibility. However, the high volume of short and noisy product descriptions in text streams of massive e-commerce platforms translates into an increased number of clusters, presenting challenges for standard model-based stream clustering algorithms. This is the case for a dataset extracted from the Brazilian NF-e Project containing electronic invoice product descriptions, including many product clusters. While LDA-based clustering methods have been shown to be crucial, they have been mainly evaluated on datasets with few clusters. We propose the Topic Model with Contextual Outlier Handling (TMCOH) method to overcome this limitation. This method combines the Dirichlet Process, specific word representation, and contextual outlier detection techniques to recycle identified outliers, aiming to integrate them into appropriate clusters later on. The experimental results for our case study demonstrate the effectiveness of TMCOH when compared to state-of-the-art methods and its potential for application to text clustering in large datasets.

2023

Pollution Emission Patterns of Transportation in Porto, Portugal Through Network Analysis

Authors
Andrade, T; Shaji, N; Ribeiro, RP; Gama, J;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2023, PT I

Abstract
Over the past few decades, road transportation emissions have increased. Vehicles are among the most significant sources of pollutants in urban areas. As such, several studies and public policies emerged to address the issue. Estimating greenhouse emissions and air quality over space and time is crucial for human health and mitigating climate change. In this study, we demonstrate that it is feasible to utilize raw GPS data to measure regional pollution levels. By applying feature engineering techniques and using a microscopic emissions model to calculate vehicle-specific power (VSP) and various specific pollutants, we identify areas with higher emission levels attributable to a fleet of taxis in Porto, Portugal. Additionally, we conduct network analysis to uncover correlations between emission levels and the structural characteristics of the transportation network. These findings can potentially identify emission clusters based on the network's connectivity and contribute to developing an emission inventory for an urban city like Porto.

2023

Discovery Science

Authors
Bifet, A; Lorena, AC; Ribeiro, RP; Gama, J; Abreu, PH;

Publication
Lecture Notes in Computer Science

Abstract

2023

Machine Learning and Principles and Practice of Knowledge Discovery in Databases

Authors
Koprinska, I; Mignone, P; Guidotti, R; Jaroszewicz, S; Fröning, H; Gullo, F; Ferreira, PM; Roqueiro, D; Ceddia, G; Nowaczyk, S; Gama, J; Ribeiro, R; Gavaldà, R; Masciari, E; Ras, Z; Ritacco, E; Naretto, F; Theissler, A; Biecek, P; Verbeke, W; Schiele, G; Pernkopf, F; Blott, M; Bordino, I; Danesi, IL; Ponti, G; Severini, L; Appice, A; Andresini, G; Medeiros, I; Graça, G; Cooper, L; Ghazaleh, N; Richiardi, J; Saldana, D; Sechidis, K; Canakoglu, A; Pido, S; Pinoli, P; Bifet, A; Pashami, S;

Publication
Communications in Computer and Information Science

Abstract

2023

Wavelet-based fuzzy clustering of interval time series

Authors
D'Urso, P; De Giovanni, L; Maharaj, EA; Brito, P; Teles, P;

Publication
INTERNATIONAL JOURNAL OF APPROXIMATE REASONING

Abstract
We investigate the fuzzy clustering of interval time series using wavelet variances and covariances; in particular, we use a fuzzy c-medoids clustering algorithm. Traditional hierarchical and non-hierarchical clustering methods lead to the identification of mutually exclusive clusters whereas fuzzy clustering methods enable the identification of overlapping clusters, implying that one or more series could belong to more than one cluster simultaneously. An interval time series (ITS) which arises when interval-valued observations are recorded over time is able to capture the variability of values within each interval at each time point. This is in contrast to single-point information available in a classical time series. Our main contribution is that by combining wavelet analysis, interval data analysis and fuzzy clustering, we are able to capture information which would otherwise have not been contemplated by the use of traditional crisp clustering methods on classical time series for which just a single value is recorded at each time point. Through simulation studies, we show that under some circumstances fuzzy c-medoids clustering performs better when applied to ITS than when it is applied to the corresponding traditional time series. Applications to exchange rates ITS and sea-level ITS show that the fuzzy clustering method reveals different and more meaningful results than when applied to associated single-point time series.
