About

Google Scholar page: https://scholar.google.pt/citations?user=GYoCHRYAAAAJ

João Rocha da Silva holds a PhD in Informatics Engineering from the Faculty of Engineering of the University of Porto, where he also teaches. He specializes in research data management, applying the latest Semantic Web technologies to the adequate preservation and discovery of research data assets.

His past experience includes two consulting companies, Deloitte and Sysnovare, where he worked on SAP modules, business blueprinting and software process restructuring.

He is experienced in many programming languages (JavaScript with Node.js, PHP with MVC frameworks, Ruby on Rails, J2EE, etc.) running on the major operating systems, and is an everyday Mac user. Regardless of language, he is a quick learner who adapts effectively to new technologies.

He is also an experienced freelance iOS developer with several apps published on the App Store, and a self-taught DIY mechanic with a special interest in Japanese classic cars.

Publications

2019

Hands-On Data Publishing with Researchers: Five Experiments with Metadata in Multiple Domains

Authors
Rodrigues, J; Castro, JA; da Silva, JR; Ribeiro, C;

Publication
Communications in Computer and Information Science

Abstract
The current requirements for open data in the EU are increasing the awareness of researchers with respect to data management and data publication. Metadata is essential in research data management, namely on data discovery and reuse. Current practices tend to either leave metadata definition to researchers, or to assign their creation to curators. The former typically results in ad-hoc descriptors, while the latter follows standards but lacks specificity. In this exploratory study, we adopt a researcher-curator collaborative approach in five data publication cases, involving researchers in data description and discussing the use of both generic and domain-oriented metadata. The study shows that researchers working on familiar datasets can contribute effectively to the definition of metadata models, in addition to the actual metadata creation. The cases also provide preliminary evidence of cross-disciplinary descriptor use. Moreover, the interaction with curators highlights the advantages of data management, making researchers more open to participate in the corresponding tasks. © Springer Nature Switzerland AG 2019.

2019

Ranking Dublin Core descriptor lists from user interactions: a case study with Dublin Core Terms using the Dendro platform

Authors
da Silva, JR; Ribeiro, C; Lopes, JC;

Publication
International Journal on Digital Libraries

Abstract
Dublin Core descriptors capture metadata in most repositories, and this includes recent repositories dedicated to datasets. DC descriptors are generic and are being adapted to the requirements of different communities with the so-called Dublin Core Application Profiles that rely on the agreement within user communities, taking into account their evolving needs. In this paper, we propose an automated process to help curators and users discover the descriptors that best suit the needs of a specific research group in the task of describing and depositing datasets. Our approach is supported on Dendro, a prototype research data management platform, where an experimental method is used to rank and present DC Terms descriptors to the users based on their usage patterns. User interaction is recorded and used to score descriptors. In a controlled experiment, we gathered the interactions of two groups as they used Dendro to describe datasets from selected sources. One of the groups viewed descriptors according to the ranking, while the other had the same list of descriptors throughout the experiment. Preliminary results show that (1) some DC Terms are filled in more often than others, with different distribution in the two groups, (2) descriptors in higher ranks were increasingly accepted by users in detriment of manual selection, (3) users were satisfied with the performance of the platform, and (4) the quality of description was not hindered by descriptor ranking. © 2018 Springer-Verlag GmbH Germany, part of Springer Nature

2019

Visualization in reproducible science

Authors
Marques, BM; da Silva, JR; Devezas, T;

Publication
Iberian Conference on Information Systems and Technologies, CISTI

Abstract
The increasing prevalence of Open Science has brought reproducibility to the center of discussion of the scientific community as a requirement for ensuring the transparency and correctness of a research workflow. The current publishing landscape is evolving, as shown by the emergence of notebook technologies powering a new generation of interactive Web Journals. These use state-of-the-art interactive graphical visualizations and on-demand data processing to research papers, allowing readers to trace every step of the process, from raw data to the finalized visualization. Since there are many Research Notebook technologies and interactive graphical visualization solutions to choose from, we present a summary comparative overview of Web Journals and the Notebook engines that power the interactive, data driven visualizations inside their publications. Given our focus on visualization, our metrics are the support for the most advanced, popular and widely adopted data visualization frameworks. We conclude that Jupyter Notebook is currently the best alternative for the average user, given its popularity and support, combined with broad support for powerful and high-level interactive visualization grammars. © 2019 AISTI.

2019

Knowledge Graph Implementation of Archival Descriptions Through CIDOC-CRM

Authors
Koch, I; Freitas, N; Ribeiro, C; Lopes, CT; da Silva, JR;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
Archives have well-established description standards, namely the ISAD(G) and ISAAR(CPF) with a hierarchical structure adapted to the nature of archival assets. However, as archives connect to a growing diversity of data, they aim to make their representations more apt to the so-called linked data cloud. The corresponding move from hierarchical, ISAD-conforming descriptions to graph counterparts requires state-of-the-art technologies, data models and vocabularies. Our approach addresses this problem from two perspectives. The first concerns the data model and description vocabularies, as we adopt and build upon the CIDOC-CRM standard. The second is the choice of technologies to support a knowledge graph, including a graph database and an Object Graph Mapping library. The case study is the Portuguese National Archives, Torre do Tombo, and the overall goal is to build a CIDOC-CRM-compliant system for document description and retrieval, to be used by professionals and the public. The early stages described here include the design of the core data model for archival records represented as the ArchOnto ontology and its embodiment in the ArchGraph knowledge graph. The goal of a semantic archival information system will be pursued in the migration of existing records to the richer representation and the development of applications supported on the graph. © Springer Nature Switzerland AG 2019.

2019

A Hierarchically-Labeled Portuguese Hate Speech Dataset

Authors
Fortuna, P; Rocha da Silva, JR; Soler Company, J; Wanner, L; Nunes, S;

Publication
Third Workshop on Abusive Language Online

Abstract
Over the past years, the amount of online offensive speech has been growing steadily. To successfully cope with it, machine learning is applied. However, ML-based techniques require sufficiently large annotated datasets. In the last years, different datasets were published, mainly for English. In this paper, we present a new dataset for Portuguese, which has not been in focus so far. The dataset is composed of 5,668 tweets. For its annotation, we defined two different schemes used by annotators with different levels of expertise. First, non-experts annotated the tweets with binary labels ('hate' vs. 'no-hate'). Then, expert annotators classified the tweets following a fine-grained hierarchical multiple label scheme with 81 hate speech categories in total. The inter-annotator agreement varied from category to category, which reflects the insight that some types of hate speech are more subtle than others and that their detection depends on personal perception. The hierarchical annotation scheme is the main contribution of the presented work, as it facilitates the identification of different types of hate speech and their intersections. To demonstrate the usefulness of our dataset, we carried out a baseline classification experiment with pre-trained word embeddings and LSTM on the binary classified data, with a state-of-the-art outcome.

Supervised thesis

2017

Metadata gamification: Jogos sérios para melhoria de descrição de dados da investigação (Metadata gamification: serious games for improving the description of research data)

Author
Bruno Coelho da Silva

Institution
UP-FEUP