Publications - INESC TEC

Publications by João Correia Lopes

2018

Research Data Management Tools and Workflows: Experimental Work at the University of Porto

Authors
Ribeiro, C; Rocha da Silva, J; Aguiar Castro, J; Carvalho Amorim, R; Correia Lopes, J; David, G;

Publication
IASSIST Quarterly

Abstract
Research datasets include all kinds of objects, from web pages to sensor data, and originate in every domain. Concerns with data generated in large projects and well-funded research areas are centered on their exploration and analysis. For data in the long tail, the main issues are still how to get data visible, satisfactorily described, preserved, and searchable. Our work aims to promote data publication in research institutions, considering that researchers are the core stakeholders and need straightforward workflows, and that multi-disciplinary tools can be designed and adapted to specific areas with a reasonable effort. For small groups with interesting datasets but not much time or funding for data curation, we have to focus on engaging researchers in the process of preparing data for publication, while providing them with measurable outputs. In larger groups, solutions have to be customized to satisfy the requirements of more specific research contexts. We describe our experience at the University of Porto in two lines of enquiry. For the work with long-tail groups we propose general-purpose tools for data description and the interface to multi-disciplinary data repositories. For areas with larger projects and more specific requirements, namely wind infrastructure, sensor data from concrete structures and marine data, we define specialized workflows. In both cases, we present a preliminary evaluation of results and an estimate of the kind of effort required to keep the proposed infrastructures running. The tools available to researchers can be decisive for their commitment. We focus on data preparation, namely on dataset organization and metadata creation. For groups in the long tail, we propose Dendro, an open-source research data management platform, and explore automatic metadata creation with LabTablet, an electronic laboratory notebook. 
For groups demanding a domain-specific approach, our analysis has resulted in the development of models and applications to organize the data and support some of their use cases. Overall, we have adopted ontologies for metadata modeling, keeping in sight metadata dissemination as Linked Open Data.

2019

Ranking Dublin Core descriptor lists from user interactions: a case study with Dublin Core Terms using the Dendro platform

Authors
da Silva, JR; Ribeiro, C; Lopes, JC;

Publication
International Journal on Digital Libraries

Abstract
Dublin Core descriptors capture metadata in most repositories, including recent repositories dedicated to datasets. DC descriptors are generic and are being adapted to the requirements of different communities through so-called Dublin Core Application Profiles, which rely on agreement within user communities and take into account their evolving needs. In this paper, we propose an automated process to help curators and users discover the descriptors that best suit the needs of a specific research group in the task of describing and depositing datasets. Our approach is built on Dendro, a prototype research data management platform, where an experimental method is used to rank and present DC Terms descriptors to users based on their usage patterns. User interaction is recorded and used to score descriptors. In a controlled experiment, we gathered the interactions of two groups as they used Dendro to describe datasets from selected sources. One group viewed descriptors ordered by the ranking, while the other was shown the same fixed list of descriptors throughout the experiment. Preliminary results show that (1) some DC Terms are filled in more often than others, with different distributions in the two groups, (2) descriptors in higher ranks were increasingly accepted by users in preference to manual selection, (3) users were satisfied with the performance of the platform, and (4) the quality of description was not hindered by descriptor ranking.
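The abstract states only that user interactions are recorded and used to score descriptors; it does not give Dendro's actual scoring rule. As a hedged illustration, the sketch below assumes a simple additive scheme with hypothetical event types (`filled`, `accepted`, `rejected`) and made-up weights, and reorders a candidate list of DC Terms by accumulated score.

```python
from collections import Counter
from typing import Iterable

# Hypothetical interaction weights; the abstract does not state Dendro's real scoring.
WEIGHTS = {"filled": 2, "accepted": 1, "rejected": -1}


def score_descriptors(interactions: Iterable[tuple[str, str]]) -> Counter:
    """Accumulate a score per descriptor from (descriptor, action) events."""
    scores: Counter = Counter()
    for descriptor, action in interactions:
        # Unknown actions contribute nothing to the score.
        scores[descriptor] += WEIGHTS.get(action, 0)
    return scores


def rank_descriptors(candidates: list[str],
                     interactions: Iterable[tuple[str, str]]) -> list[str]:
    """Order candidates by score, highest first.

    Descriptors never seen in the interactions score 0 (Counter default);
    sorted() is stable, so ties keep their original order in `candidates`.
    """
    scores = score_descriptors(interactions)
    return sorted(candidates, key=lambda d: -scores[d])
```

For example, if `dcterms:spatial` is filled in once and accepted once while `dcterms:temporal` is rejected, `dcterms:spatial` moves to the top of the list and `dcterms:temporal` to the bottom. A real deployment would likely also need decay or per-group scoping, which this sketch omits.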

2019

Empowering Distributed Analysis Across Federated Cohort Data Repositories Adhering to FAIR Principles

Authors
Rocha, A; Ornelas, JP; Lopes, JC; Camacho, R;

Publication
ERCIM News

Abstract
Novel data collection tools, methods and new techniques in biotechnology can facilitate improved health strategies that are customised to each individual. One key challenge to achieve this is to take advantage of the massive volumes of personal anonymous data, relating each profile to health and disease, while accounting for high diversity in individuals, populations and environments. These data must be analysed in unison to achieve statistical power, but presently cohort data repositories are scattered, hard to search and integrate, and data protection and governance rules discourage central pooling.
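The constraint described above, that data must be analysed together while governance rules discourage pooling records centrally, is commonly addressed by having each repository share only aggregates. The sketch below is a generic illustration of that idea, not the authors' system: the `Summary` type and the function names are hypothetical. Each site returns its count, sum and sum of squares, and a coordinator combines them into a global mean and sample variance, $\bar{x} = \sum x / n$ and $s^2 = (\sum x^2 - n\bar{x}^2)/(n-1)$, without ever seeing individual records.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    """Aggregates that leave a cohort repository; no individual records."""
    n: int
    total: float
    total_sq: float


def local_summary(values: list[float]) -> Summary:
    """Computed inside each repository on its own data."""
    return Summary(len(values), sum(values), sum(v * v for v in values))


def combine(summaries: list[Summary]) -> tuple[float, float]:
    """Pool per-site aggregates into a global mean and sample variance."""
    n = sum(s.n for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    # Textbook sum-of-squares formula; numerically naive for large values.
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, variance
```

The result equals what would be obtained by pooling the raw values, which is the point of the design. For data with large magnitudes, a numerically stable merge of per-site means and variances would be preferable to the raw sum-of-squares formula used here.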

2020

A New Approach to Crowd Journalism Using a Blockchain-Based Infrastructure

Authors
Teixeira, L; Amorim, I; Silva, AU; Lopes, JC; Filipe, V;

Publication
MoMM 2020: The 18th International Conference on Advances in Mobile Computing & Multimedia

Abstract
The significant evolution of smartphones has given ordinary people the power to create good-quality content, which can then be spread by the press over multiple platforms. Citizens are almost always the first to arrive at the scene of breaking news and can provide the initial images. However, existing crowdsourced tools and platforms are predominantly centralized and are usually fed with unreliable and untrustworthy information. This work introduces a Crowd Journalism ecosystem whose core is a video marketplace web tool based on an organization-level decentralized system that can store, visualize, rate, and execute transactions of live-made videos. Smart contracts ensure that all transactions are transparent and secure. This approach to Crowd Journalism exploits inherent features of a blockchain, such as trustworthy, anonymized, and immutable transactions, and has the potential to revolutionize the way news content is shared and commercially exploited.

2022

Development of a data classification system for preterm birth cohort studies: the RECAP Preterm project

Authors
Bamber, D; Collins, HE; Powell, C; Goncalves, GC; Johnson, S; Manktelow, B; Ornelas, JP; Lopes, JC; Rocha, A; Draper, ES;

Publication
BMC Medical Research Methodology

Abstract
Background: The small sample sizes available within many very preterm (VPT) longitudinal birth cohort studies mean that it is often necessary to combine and harmonise data from individual studies to increase statistical power, especially for studying rare outcomes. Curating and mapping data is a vital first step in the process of data harmonisation. To facilitate data mapping and harmonisation across VPT birth cohort studies, we developed a custom classification system as part of the Research on European Children and Adults born Preterm (RECAP Preterm) project, in order to increase the scope and generalisability of research and the evaluation of outcomes across the lifespan for individuals born VPT.

Methods: The multidisciplinary consortium of expert clinicians and researchers who made up the RECAP Preterm project participated in a four-phase consultation process via email questionnaire to develop a topic-specific classification system. Descriptive analyses were performed after each questionnaire round to provide pre- and post-ratings assessing levels of agreement with the classification system as it developed. Amendments and refinements were made to the classification system after each round.

Results: Expert input from 23 clinicians and researchers from the RECAP Preterm project aided development of the classification system's topic content, refining it from 10 modules, 48 themes and 197 domains to 14 modules, 93 themes and 345 domains. Supplementary classifications for target, source, mode and instrument were also developed to capture additional variable-level information. To date, over 22,000 individual data variables relating to VPT birth outcomes have been mapped to the classification system to facilitate data harmonisation. This number will continue to increase as retrospective data items are mapped and harmonised variables are created.

Conclusions: This bespoke preterm birth classification system is a fundamental component of the RECAP Preterm project's web-based interactive platform. It is freely available for use worldwide by those interested in research into the long-term impact of VPT birth. It can also be used to inform the development of future cohort studies.
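The classification described above combines a three-level topic hierarchy (modules, themes, domains) with supplementary variable-level classifications for target, source, mode and instrument. The sketch below is an assumption about how such variable mappings could be represented, not the RECAP Preterm platform's actual schema; the class and function names are hypothetical, as are any labels and study names used with it. Grouping mapped variables by domain collects, across studies, the candidates for a single harmonised variable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Domain:
    """One leaf of the module -> theme -> domain hierarchy (hashable)."""
    module: str
    theme: str
    domain: str


@dataclass
class VariableMapping:
    """A cohort-study variable mapped to a domain, plus the
    supplementary classifications named in the abstract."""
    study: str
    variable: str
    domain: Domain
    target: Optional[str] = None
    source: Optional[str] = None
    mode: Optional[str] = None
    instrument: Optional[str] = None


def group_by_domain(
    mappings: list[VariableMapping],
) -> dict[Domain, list[VariableMapping]]:
    """Collect variables from different studies under the same domain,
    preserving input order within each group."""
    groups: dict[Domain, list[VariableMapping]] = {}
    for m in mappings:
        groups.setdefault(m.domain, []).append(m)
    return groups
```

Making `Domain` a frozen dataclass makes it hashable, so it can serve directly as a dictionary key when variables from several studies are grouped.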

2022

WindsPT e-Science platform for wind measurement campaigns

Authors
Gomes, DF; Lopes, JC; Palma, JMLM; Senra, F; Dias, S; Coimbra, IL;

Publication
Journal of Physics: Conference Series

Abstract
Experimental field campaigns for collecting wind data, essential for academic research and the wind energy industry, are non-trivial due to the complex equipment and infrastructure required. This paper reports the latest developments of the WindsPT e-Science platform for planning, executing, and disseminating wind measurement campaign data. Existing e-Science platforms were developed for more generic domains, which prevents them from capturing the details and requirements of this field. Additionally, we propose a protocol for transferring large volumes of data from the on-site devices to our platform, ensuring data replication. With an easy-to-use Web interface, WindsPT promotes collaboration between participants, disseminates results among the stakeholders, publishes metadata, uses DOIs, and includes metadata that enables machine-to-machine communication. The platform has multiple sections, with maps, images, and documents, that provide information such as the location of the stations, the positioning of the sensors, operating dates, photos, technical sheets, and calibration documents. The WindsPT platform was used to host the Perdigão 2017 experimental campaign and proved to be a valuable tool during all phases of this large field experiment. A new version of WindsPT, designed to be FAIR, to host multiple campaigns, and to include cross-campaign shared features such as full-text search, has now been developed and tested.
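The abstract mentions a protocol for moving large volumes of data from on-site devices to the platform while ensuring replication, but gives no details of it. As a hedged sketch of one ingredient such a protocol could use, the code below copies a file to several replica directories and verifies each copy against the source's SHA-256 checksum. The function names, directory layout and file names are hypothetical and not taken from WindsPT.

```python
import hashlib
import shutil
from pathlib import Path


def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def replicate(src: Path, replica_dirs: list[Path]) -> str:
    """Copy `src` into every replica directory and confirm each copy
    matches the source checksum. Returns the checksum on success."""
    expected = sha256sum(src)
    for d in replica_dirs:
        d.mkdir(parents=True, exist_ok=True)
        dst = d / src.name
        shutil.copy2(src, dst)  # copies contents and file metadata
        if sha256sum(dst) != expected:
            raise OSError(f"checksum mismatch for {dst}")
    return expected
```

Returning the checksum lets a caller record it alongside the dataset's metadata, so later transfers or audits can be verified against the same value.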
