Publications

Publications by João Gama

2024

SWINN: Efficient nearest neighbor search in sliding windows using graphs

Authors
Mastelini, SM; Veloso, B; Halford, M; de Carvalho, ACPDF; Gama, J;

Publication
INFORMATION FUSION

Abstract
Nearest neighbor search (NNS) is one of the main concerns in data stream applications since similarity queries can be used in multiple scenarios. Online NNS is usually performed on a sliding window by lazily scanning every element currently stored in the window. This paper proposes Sliding Window-based Incremental Nearest Neighbors (SWINN), a graph-based online search index algorithm for speeding up NNS in potentially never-ending and dynamic data stream tasks. Our proposal broadens the application of online NNS-based solutions, as even moderately large data buffers become impractical to handle when a naive NNS strategy is selected. SWINN enables efficient handling of large data buffers by using an incremental strategy to build and update a search graph supporting any distance metric. Vertices can be added and removed from the search graph. To keep the graph reliable for search queries, lightweight graph maintenance routines are run. According to experimental results, SWINN is significantly faster than performing a naive complete scan of the data buffer while keeping competitive search recall values. We also apply SWINN to online classification and regression tasks and show that our proposal is effective against popular online machine learning algorithms.
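For context, the naive strategy that the abstract contrasts SWINN with (scanning every element stored in the sliding window on each query) can be sketched as follows. This is only an illustration of that baseline, not of SWINN's graph index; the class name `SlidingWindowNN` and its methods are hypothetical and do not come from the paper.

```python
from collections import deque
import heapq
import math


class SlidingWindowNN:
    """Naive baseline (hypothetical name): keep the newest points, scan all per query."""

    def __init__(self, window_size, dist=math.dist):
        # A bounded deque evicts the oldest point automatically once it is full.
        self.window = deque(maxlen=window_size)
        # Any distance function taking two points can be plugged in here.
        self.dist = dist

    def append(self, point):
        self.window.append(point)

    def search(self, query, k=1):
        # Full scan of the buffer: cost grows linearly with the window size.
        return heapq.nsmallest(k, self.window, key=lambda p: self.dist(query, p))


index = SlidingWindowNN(window_size=3)
for p in [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (6.0, 6.0)]:
    index.append(p)

# (0.0, 0.0) has already left the window, so it can no longer be returned.
nearest = index.search((0.9, 0.9), k=2)
```

Every query touches the whole buffer, which is the cost the abstract says becomes impractical even for moderately large windows.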

2023

Discovery Science - 26th International Conference, DS 2023, Porto, Portugal, October 9-11, 2023, Proceedings

Authors
Bifet, A; Lorena, AC; Ribeiro, RP; Gama, J; Abreu, PH;

Publication
DS

2023

Why Industry 5.0 Needs XAI 2.0?

Authors
Bobek, S; Nowaczyk, S; Gama, J; Pashami, S; Ribeiro, RP; Taghiyarrenani, Z; Veloso, B; Rajaoarisoa, LH; Szelazek, M; Nalepa, GJ;

Publication
Joint Proceedings of the xAI-2023 Late-breaking Work, Demos and Doctoral Consortium co-located with the 1st World Conference on eXplainable Artificial Intelligence (xAI-2023), Lisbon, Portugal, July 26-28, 2023.

Abstract
Advances in artificial intelligence trigger transformations that make more and more companies enter Industry 4.0 and 5.0 eras. In many cases, these transformations are gradual and performed in a bottom-up manner. This means that in the first step, the industrial hardware is upgraded to collect as much data as possible without actual planning of the utilization of the information. Furthermore, the data storage and processing infrastructure is prepared to keep large volumes of historical data accessible for further analysis. Only in the last step are methods for processing the data developed to improve or gain more insight into the industrial and business processes. Such a pipeline makes many companies face a problem with huge amounts of data, an incomplete understanding of how the existing knowledge is represented in the data, under which conditions the knowledge no longer holds, or what new phenomena are hidden inside the data. We argue that this gap needs to be addressed by the next generation of XAI methods which should be expert-oriented and focused on knowledge generation tasks rather than model debugging. The paper is based on the findings of the EU CHIST-ERA project on Explainable Predictive Maintenance (XPM).

2023

Topic Model with Contextual Outlier Handling: a Study on Electronic Invoice Product Descriptions

Authors
Andrade, C; Ribeiro, RP; Gama, J;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2023, PT I

Abstract
E-commerce has become an essential aspect of modern life, providing consumers worldwide with convenience and accessibility. However, the high volume of short and noisy product descriptions in text streams of massive e-commerce platforms translates into an increased number of clusters, presenting challenges for standard model-based stream clustering algorithms. This is the case of a dataset extracted from the Brazilian NF-e Project containing electronic invoice product descriptions, including many product clusters. While LDA-based clustering methods have shown to be crucial, they have been mainly evaluated on datasets with few clusters. We propose the Topic Model with Contextual Outlier Handling (TMCOH) method to overcome this limitation. This method combines the Dirichlet Process, specific word representation, and contextual outlier detection techniques to recycle identified outliers aiming to integrate them into appropriate clusters later on. The experimental results for our case study demonstrate the effectiveness of TMCOH when compared to state-of-the-art methods and its potential for application to text clustering in large datasets.

2023

Pollution Emission Patterns of Transportation in Porto, Portugal Through Network Analysis

Authors
Andrade, T; Shaji, N; Ribeiro, RP; Gama, J;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2023, PT I

Abstract
Over the past few decades, road transportation emissions have increased. Vehicles are among the most significant sources of pollutants in urban areas. As such, several studies and public policies emerged to address the issue. Estimating greenhouse emissions and air quality over space and time is crucial for human health and mitigating climate change. In this study, we demonstrate that it is feasible to utilize raw GPS data to measure regional pollution levels. By applying feature engineering techniques and using a microscopic emissions model to calculate vehicle-specific power (VSP) and various specific pollutants, we identify areas with higher emission levels attributable to a fleet of taxis in Porto, Portugal. Additionally, we conduct network analysis to uncover correlations between emission levels and the structural characteristics of the transportation network. These findings can potentially identify emission clusters based on the network's connectivity and contribute to developing an emission inventory for an urban city like Porto.

2023

Bayesian Federated Learning: A Survey

Authors
Cao, LB; Chen, H; Fan, XH; Gama, J; Ong, YS; Kumar, V;

Publication
PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023

Abstract
Federated learning (FL) demonstrates its advantages in integrating distributed infrastructure, communication, computing and learning in a privacy-preserving manner. However, the robustness and capabilities of existing FL methods are challenged by limited and dynamic data and conditions, complexities including heterogeneities and uncertainties, and analytical explainability. Bayesian federated learning (BFL) has emerged as a promising approach to address these issues. This survey presents a critical overview of BFL, including its basic concepts, its relations to Bayesian learning in the context of FL, and a taxonomy of BFL from both Bayesian and federated perspectives. We categorize and discuss client- and server-side and FL-based BFL methods and their pros and cons. The limitations of the existing BFL methods and the future directions of BFL research further address the intricate requirements of real-life FL applications.
