Publications

Publications by Rita Paula Ribeiro

2023

Study on Correlation Between Vehicle Emissions and Air Quality in Porto

Authors
Shaji, N; Andrade, T; Ribeiro, RP; Gama, J;

Publication
MACHINE LEARNING AND PRINCIPLES AND PRACTICE OF KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT I

Abstract
Road transportation emissions have increased in the last few decades and have been the primary source of pollutants in urban areas with ever-growing populations. In this context, it is important to have effective measures to monitor road emissions in regions. Creating an emission inventory over a region that can map the road emission based on the vehicle trips can be helpful for this. In this work, we show that it is possible to use raw GPS data to measure levels of pollution in a region. By transforming the data using feature engineering and calculating the vehicle-specific power (VSP), we show the areas with higher emissions levels made by a fleet of taxis in Porto, Portugal. The Uber H3 grid system is used to decompose the city into hexagonal grids to sample nearby data points into a region. We validate our experiments on real-world sensor datasets deployed in several city regions, showing the correlation with VSP and true values for several pollutants attesting to the method's usefulness.
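The VSP step mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not state which VSP formula or coefficients the authors used, so the coefficients below are an assumption: they follow a commonly cited light-duty approximation (speed in m/s, acceleration in m/s², road grade as a fraction, result in kW per tonne). The function names and the fixed sampling interval are hypothetical, and the hexagonal grid aggregation is not shown.

```python
def vehicle_specific_power(speed_mps, accel_mps2, grade=0.0):
    """Approximate VSP (kW/tonne) for a light-duty vehicle.

    Assumption: coefficients come from a commonly cited light-duty
    approximation, not from the paper itself.
    """
    return (
        speed_mps * (1.1 * accel_mps2 + 9.81 * grade + 0.132)
        + 0.000302 * speed_mps ** 3
    )


def vsp_from_trace(speeds_mps, dt_s):
    """Compute VSP for each step of a speed trace sampled every dt_s seconds.

    Acceleration is estimated by a simple finite difference between
    consecutive speed samples (a hypothetical preprocessing choice).
    """
    values = []
    for prev, cur in zip(speeds_mps, speeds_mps[1:]):
        accel = (cur - prev) / dt_s
        values.append(vehicle_specific_power(cur, accel))
    return values
```

In a pipeline like the one described, each per-sample value would then be assigned to the grid cell containing its GPS point and aggregated per cell.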

2023

Machine Learning and Principles and Practice of Knowledge Discovery in Databases - International Workshops of ECML PKDD 2022, Grenoble, France, September 19-23, 2022, Proceedings, Part II

Authors
Koprinska, I; Mignone, P; Guidotti, R; Jaroszewicz, S; Fröning, H; Gullo, F; Ferreira, PM; Roqueiro, D; Ceddia, G; Nowaczyk, S; Gama, J; Ribeiro, RP; Gavaldà, R; Masciari, E; Ras, ZW; Ritacco, E; Naretto, F; Theissler, A; Biecek, P; Verbeke, W; Schiele, G; Pernkopf, F; Blott, M; Bordino, I; Danesi, IL; Ponti, G; Severini, L; Appice, A; Andresini, G; Medeiros, I; Graça, G; Cooper, L; Ghazaleh, N; Richiardi, J; Miranda, DS; Sechidis, K; Canakoglu, A; Pidò, S; Pinoli, P; Bifet, A; Pashami, S;

Publication
PKDD/ECML Workshops (2)

Abstract

2023

Machine Learning and Principles and Practice of Knowledge Discovery in Databases - International Workshops of ECML PKDD 2022, Grenoble, France, September 19-23, 2022, Proceedings, Part I

Authors
Koprinska, I; Mignone, P; Guidotti, R; Jaroszewicz, S; Fröning, H; Gullo, F; Ferreira, PM; Roqueiro, D; Ceddia, G; Nowaczyk, S; Gama, J; Ribeiro, RP; Gavaldà, R; Masciari, E; Ras, ZW; Ritacco, E; Naretto, F; Theissler, A; Biecek, P; Verbeke, W; Schiele, G; Pernkopf, F; Blott, M; Bordino, I; Danesi, IL; Ponti, G; Severini, L; Appice, A; Andresini, G; Medeiros, I; Graça, G; Cooper, L; Ghazaleh, N; Richiardi, J; Miranda, DS; Sechidis, K; Canakoglu, A; Pidò, S; Pinoli, P; Bifet, A; Pashami, S;

Publication
PKDD/ECML Workshops (1)

Abstract

2022

Turning the Tables: Biased, Imbalanced, Dynamic Tabular Datasets for ML Evaluation

Authors
Jesus, S; Pombal, J; Alves, D; Cruz, AF; Saleiro, P; Ribeiro, RP; Gama, J; Bizarro, P;

Publication
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022

Abstract

2022

Model Optimization in Imbalanced Regression

Authors
Silva, A; Ribeiro, RP; Moniz, N;

Publication
DISCOVERY SCIENCE (DS 2022)

Abstract
Imbalanced domain learning aims to produce accurate models in predicting instances that, though underrepresented, are of utmost importance for the domain. Research in this field has been mainly focused on classification tasks. Comparatively, the number of studies carried out in the context of regression tasks is negligible. One of the main reasons for this is the lack of loss functions capable of focusing on minimizing the errors of extreme (rare) values. Recently, an evaluation metric was introduced: Squared Error Relevance Area (SERA). This metric posits a bigger emphasis on the errors committed at extreme values while also accounting for the performance in the overall target variable domain, thus preventing severe bias. However, its effectiveness as an optimization metric is unknown. In this paper, our goal is to study the impacts of using SERA as an optimization criterion in imbalanced regression tasks. Using gradient boosting algorithms as proof of concept, we perform an experimental study with 36 data sets of different domains and sizes. Results show that models that used SERA as an objective function are practically better than the models produced by their respective standard boosting algorithms at the prediction of extreme values. This confirms that SERA can be embedded as a loss function into optimization-based learning algorithms for imbalanced regression scenarios.
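The idea behind SERA can be sketched numerically. The abstract does not give the formula, so this sketch assumes the usual definition: for each relevance threshold t in [0, 1], sum the squared errors of the cases whose relevance is at least t, then integrate that sum over t. The relevance function, the step count, and the trapezoidal integration are illustrative assumptions, not the authors' implementation.

```python
def sera(y_true, y_pred, relevance, steps=100):
    """Approximate SERA with the trapezoidal rule.

    relevance: hypothetical function mapping a target value to [0, 1],
    where values near 1 mark rare or extreme targets.
    """
    phi = [relevance(y) for y in y_true]
    sq_err = [(p - y) ** 2 for y, p in zip(y_true, y_pred)]

    def ser(t):
        # Squared error summed over cases with relevance >= t.
        return sum(e for e, r in zip(sq_err, phi) if r >= t)

    thresholds = [k / steps for k in range(steps + 1)]
    ser_values = [ser(t) for t in thresholds]
    width = 1.0 / steps
    return sum((a + b) / 2 * width for a, b in zip(ser_values, ser_values[1:]))
```

Under this definition, an error on a highly relevant case contributes at every threshold, while the same error on a low-relevance case contributes only near t = 0. That is how the metric emphasizes extreme values without ignoring the rest of the target domain.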

2022

The MetroPT dataset for predictive maintenance

Authors
Veloso, B; Gama, J; Ribeiro, RP; Pereira, PM;

Publication
SCIENTIFIC DATA

Abstract
The paper describes the MetroPT data set, an outcome of a Predictive Maintenance project with an urban metro public transportation service in Porto, Portugal. The data was collected in 2022 to develop machine learning methods for online anomaly detection and failure prediction. Several analog sensor signals (pressure, temperature, current consumption), digital signals (control signals, discrete signals), and GPS information (latitude, longitude, and speed) provide a framework that is easy to use and supports the development of new machine learning methods. The dataset has some interesting characteristics and can serve as a good benchmark for predictive maintenance models.
