Publications - INESC TEC


Publications by João Gama

2022

Federated Anomaly Detection over Distributed Data Streams

Authors
Silva, PR; Vinagre, J; Gama, J;

Publication
CoRR


2022

Open challenges for Machine Learning based Early Decision-Making research

Authors
Bondu, A; Achenchabe, Y; Bifet, A; Clérot, F; Cornuéjols, A; Gama, J; Hébrail, G; Lemaire, V; Marteau, PF;

Publication
SIGKDD Explor.


2024

Forecasting financial market structure from network features using machine learning

Authors
Castilho, D; Souza, TTP; Kang, SM; Gama, J; de Carvalho, ACPLF;

Publication
Knowledge and Information Systems

Abstract
We propose a model that forecasts market correlation structure from link- and node-based financial network features using machine learning. To this end, market structure is modeled as a dynamic asset network by quantifying the time-dependent co-movement of asset price returns across the company constituents of major global market indices. We provide empirical evidence using three different network filtering methods to estimate market structure: Dynamic Asset Graph, Dynamic Minimal Spanning Tree and Dynamic Threshold Networks. Experimental results show that the proposed model can forecast market structure with high predictive performance, with up to 40% improvement over a time-invariant correlation-based benchmark. Non-pairwise correlation features proved to be important compared with the traditionally used pairwise correlation measures for all markets studied, particularly in the long-term forecasting of stock market structure. Evidence is provided for the stock constituents of the DAX30, EUROSTOXX50, FTSE100, HANGSENG50, NASDAQ100 and NIFTY50 market indices. The findings can be useful to improve portfolio selection and risk management methods, which commonly rely on a backward-looking covariance matrix to estimate portfolio risk.
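
To make the network-filtering step more concrete, here is a minimal sketch of how a dynamic minimal spanning tree and a dynamic threshold network can be built from asset returns over a sliding window. It rests on several assumptions that do not come from the abstract. It uses Pearson correlation of returns. It uses the commonly used correlation distance d = sqrt(2(1 - rho)) for the spanning tree. It takes a hypothetical cut-off theta for the threshold network. All function names are illustrative. It does not reproduce the Dynamic Asset Graph, the paper's feature set, or its forecasting model.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length return series (assumes non-constant series)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlations(returns):
    """Map each unordered asset pair to its return correlation within one window."""
    assets = sorted(returns)
    corr = {(i, j): pearson(returns[i], returns[j]) for i, j in combinations(assets, 2)}
    return assets, corr


def threshold_network(corr, theta):
    """Keep an edge only when |rho| >= theta (theta is a hypothetical cut-off)."""
    return {pair for pair, rho in corr.items() if abs(rho) >= theta}


def minimum_spanning_tree(assets, corr):
    """Kruskal's algorithm on the distance d = sqrt(2 * (1 - rho))."""
    parent = {a: a for a in assets}

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def distance(pair):
        # Clamp at 0 so rounding (rho slightly above 1) cannot make sqrt fail.
        return math.sqrt(max(0.0, 2 * (1 - corr[pair])))

    tree = set()
    for i, j in sorted(corr, key=distance):
        ri, rj = find(i), find(j)
        if ri != rj:  # skip edges that would close a cycle
            parent[ri] = rj
            tree.add((i, j))
    return tree


def dynamic_networks(returns, window, theta):
    """One (MST, threshold network) snapshot per position of a sliding window."""
    length = len(next(iter(returns.values())))
    snapshots = []
    for start in range(length - window + 1):
        sub = {a: r[start:start + window] for a, r in returns.items()}
        assets, corr = correlations(sub)
        snapshots.append((minimum_spanning_tree(assets, corr), threshold_network(corr, theta)))
    return snapshots
```

For example, `dynamic_networks({"A": [...], "B": [...], "C": [...]}, window=3, theta=0.5)` returns one pair of edge sets per window position. In a setup like the one the abstract describes, link- and node-based features of successive snapshots would then feed a forecasting model. That model is not shown here.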


2023

WINTENDED: WINdowed TENsor decomposition for Densification Event Detection in time-evolving networks

Authors
Fernandes, S; Fanaee-T, H; Gama, J; Tisljaric, L; Smuc, T;

Publication
Machine Learning

Abstract
Densification events in time-evolving networks are instants at which the network density, that is, the number of edges, is substantially larger than at the remaining instants. These events can occur at a global level, involving the majority of the nodes in the network, or at a local level, involving only a subset of nodes. While global densification events affect the overall structure of the network, local densification events do not, which means they may remain undetectable by existing detection methods. To address this issue, we propose WINdowed TENsor decomposition for Densification Event Detection (WINTENDED) for the detection and characterization of both global and local densification events. Our method combines a sliding-window decomposition with statistical tools to capture the local dynamics of the network and automatically find irregular behaviours. According to our experimental evaluation, WINTENDED spots global densification events at least as accurately as its competitors and, unlike them, is also able to find local densification events.
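
The distinction between global and local densification can be illustrated with a deliberately simpler baseline. This sketch is not WINTENDED. It swaps the windowed tensor decomposition for a plain sliding-window z-score test: once on the total edge count (global) and once on each node's degree (local). Following the abstract, density is taken as the number of edges. The window length `window`, the threshold `z`, and all function names are hypothetical choices for illustration.

```python
import statistics


def is_spike(history, value, z):
    """True when value exceeds the history mean by more than z population std devs.

    With a constant history (std 0), any increase counts as a spike.
    """
    mu = statistics.mean(history)
    sd = statistics.pstdev(history)
    return value > mu + z * sd


def global_events(snapshots, window, z=3.0):
    """Flag instants whose edge count spikes against the preceding `window` snapshots."""
    counts = [len(edges) for edges in snapshots]
    return [t for t in range(window, len(counts))
            if is_spike(counts[t - window:t], counts[t], z)]


def degrees(edges, nodes):
    """Degree of every node in one undirected snapshot (edges are (u, v) pairs)."""
    deg = dict.fromkeys(nodes, 0)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def local_events(snapshots, nodes, window, z=3.0):
    """Per instant, return the subset of nodes whose own degree spikes."""
    degs = [degrees(edges, nodes) for edges in snapshots]
    events = []
    for t in range(window, len(snapshots)):
        hot = {v for v in nodes
               if is_spike([d[v] for d in degs[t - window:t]], degs[t][v], z)}
        if hot:
            events.append((t, hot))
    return events
```

A per-node test like this can point to the subset of nodes involved in a burst, but it ignores multi-way structure. Capturing that structure is what a tensor decomposition over windows is meant to do. In a large network, a burst confined to a few nodes may barely move the global edge count. Because of this, the local test can fire where the global one does not.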


2023

Social network analytics and visualization: Dynamic topic-based influence analysis in evolving micro-blogs

Authors
Tabassum, S; Gama, J; Azevedo, PJ; Cordeiro, M; Martins, C; Martins, A;

Publication
Expert Systems

Abstract
Influence analysis is one of the well-known areas of social network analysis. However, discovering influencers in micro-blog networks based on topics has recently gained popularity due to its specificity. Moreover, these data networks are massive, continuous and evolving. To address these challenges, we propose a dynamic framework that performs topic modelling and identifies influencers in the same process. It incorporates dynamic sampling, community detection and network statistics over a graph data stream from a social media activity management application. Further, we compare the graph measures against each other empirically and observe no evidence of correlation between the set of users having a large number of friends and the set of users whose posts achieve high acceptance (i.e., posts that are highly liked, commented on and shared). We therefore propose a novel approach that incorporates both a user's reachability and their acceptability by other users. Consequently, we improve on graph metrics by including a dynamic acceptance score (integrating content quality with network structure) for ranking influencers in micro-blogs. Additionally, we analysed the structure and quality of the topic clusters with empirical experiments and visualization.


2021

How can I choose an explainer?: An Application-grounded Evaluation of Post-hoc Explanations

Authors
Jesus, SM; Belém, C; Balayan, V; Bento, J; Saleiro, P; Bizarro, P; Gama, J;

Publication
FAccT '21: 2021 ACM Conference on Fairness, Accountability, and Transparency, Virtual Event / Toronto, Canada, March 3-10, 2021

Abstract
Several research works have proposed new Explainable AI (XAI) methods designed to generate model explanations with specific properties, or desiderata, such as fidelity, robustness, or human-interpretability. However, explanations are seldom evaluated based on their true practical impact on decision-making tasks. Without that assessment, explanations might be chosen that in fact hurt the overall performance of the combined system of ML model + end-users. This study aims to bridge this gap by proposing XAI Test, an application-grounded evaluation methodology tailored to isolate the impact of providing the end-user with different levels of information. We conducted an experiment following XAI Test to evaluate three popular XAI methods (LIME, SHAP, and TreeInterpreter) on a real-world fraud detection task, with real data, a deployed ML model, and fraud analysts. During the experiment, we gradually increased the information provided to the fraud analysts in three stages: Data Only, i.e., just transaction data without access to the model score or explanations; Data + ML Model Score; and Data + ML Model Score + Explanations. Using rigorous statistical analysis, we show that, in general, these popular explainers have a worse impact than desired. Highlights of the conclusions include: i) showing Data Only results in the highest decision accuracy and the slowest decision time among all variants tested; ii) all the explainers improve accuracy over the Data + ML Model Score variant but still result in lower accuracy than Data Only; iii) LIME was the least preferred by users, probably due to its substantially lower variability of explanations from case to case.
