Publications

Publications by Artur Rocha

2023

Life course of retrospective harmonization initiatives: key elements to consider

Authors
Fortier, I; Wey, TW; Bergeron, J; de Moira, AP; Nybo Andersen, AM; Bishop, T; Murtagh, MJ; Miocevic, M; Swertz, MA; van Enckevort, E; Marcon, Y; Mayrhofer, MT; Ornelas, JP; Sebert, S; Santos, AC; Rocha, A; Wilson, RC; Griffith, LE; Burton, P;

Publication
JOURNAL OF DEVELOPMENTAL ORIGINS OF HEALTH AND DISEASE

Abstract
Optimizing research on the developmental origins of health and disease (DOHaD) involves implementing initiatives that maximize the use of available cohort study data; achieving sufficient statistical power to support subgroup analysis; and using participant data with adequate follow-up and exposure heterogeneity. It also involves being able to undertake comparison, cross-validation, or replication across data sets. To meet these requirements, cohort study data need to be findable, accessible, interoperable, and reusable (FAIR), and, more particularly, they often need to be harmonized. Harmonization is required to achieve or improve comparability of the putatively equivalent measures collected by different studies on different individuals. Although the characteristics of the research initiatives generating and using harmonized data vary extensively, all are confronted by similar issues. Having to collate, understand, process, host, and co-analyze data from individual cohort studies is particularly challenging. The scientific success and timely management of projects can be facilitated by an ensemble of factors. The current document provides an overview of the 'life course' of research projects requiring harmonization of existing data and highlights key elements to be considered from the inception to the end of the project.

2025

Do LLMs Tell Us What We Want to Hear? Investigating Confirmation Bias in AI Responses to Health Queries

Authors
Ala, RR; Gonçalves, G; Lopes, LS; Dantas, TF; Paulino, D; Netto, AT; Guimarães, D; Rocha, A; Vivacqua, AS; Paredes, H;

Publication
SMC

Abstract
Large Language Models (LLMs) are widely used today in virtual assistants and content generation. However, there are suspicions that LLMs exhibit confirmation bias, responding in a way that reinforces beliefs or assumptions embedded in users' questions, which can lead to erroneous decision-making, especially in sensitive areas such as healthcare. The objective of this research is to determine how often and under what conditions LLMs exhibit confirmation bias and to identify the causes of this effect. The methodology involves conducting an experiment in which 52 biased healthcare questions are presented to 10 of the most popular models and analyzing whether their responses were biased. This work demonstrates, with statistical power, the presence of confirmation bias: it occurs in all of the LLMs tested, on 20% to 60% of occasions. The evidence suggests that the bias arises from the training data, the Transformer architecture itself, and the instructions applied in the fine-tuning phase by the companies behind the LLMs. This research explores pathways for the development of trustworthy LLMs.
