Publications

Publications by HumanISE

2019

Interplay of Documents' Readability, Comprehension and Consumer Health Search Performance Across Query Terminology

Authors
Lopes, CT; Ribeiro, C;

Publication
PROCEEDINGS OF THE 2019 CONFERENCE ON HUMAN INFORMATION INTERACTION AND RETRIEVAL (CHIIR'19)

Abstract
Because of terminology mismatches, health consumers frequently face difficulties while searching the Web for health information. Difficulties arise in query formulation but also in understanding the retrieved documents. In this work, we analyze how documents' readability affects users' comprehension and how both affect retrieval performance, measured in different ways. In addition, we analyze how performance measures relate to each other. For this purpose, we conducted a laboratory user study with 40 participants. We found that readability is essential for a document to be at least partially relevant and that it becomes even more important if the document has medico-scientific terminology. Moreover, the relevance of a document to a specific user highly depends on its comprehension. In lay queries, we found that the medical accuracy of users' answers is related to the session's relevance assessments. This shows that users can, at least in part, relate their relevance assessments to the medical accuracy of the documents. On the other hand, this relationship does not exist with medico-scientific queries.

2019

Assisting Health Consumers While Searching the Web Through Medical Annotations

Authors
Lopes, CT; Sousa, H;

Publication
PROCEEDINGS OF THE 2019 CONFERENCE ON HUMAN INFORMATION INTERACTION AND RETRIEVAL (CHIIR'19)

Abstract
Health consumers usually face difficulties in their online searches, mainly because of the differences between the terminologies used by laypeople and health professionals. This work presents a tool, HealthTranslator, available as a Google Chrome extension, that intends to reduce this terminological gap while users are searching the Web for health information. HealthTranslator automatically annotates medical concepts in web documents, providing additional information, such as concept definition, related concepts and links to external references. The solution was evaluated regarding its: (a) performance: the document processing is done gradually, typically from the top to the bottom of the document, and performance was not an issue raised by the users; (b) concept coverage: the solution was compared to a similar extension performing in English, recognizing significantly more concepts. A comparison with a corpus of Portuguese documents manually annotated with medical concepts showed an average F-measure between 27% and 33%, depending on the type of concepts being recognized; (c) users' receptivity to HealthTranslator and its usability: many aspects were surveyed in a user study. In general, the extension has good acceptance and users find it useful.

2019

Characterizing and comparing Portuguese and English Wikipedia medicine-related articles

Authors
Domingues, G; Lopes, CT;

Publication
COMPANION OF THE WORLD WIDE WEB CONFERENCE (WWW 2019)

Abstract
Wikipedia is the largest online collaborative encyclopedia, containing information from a plethora of fields, including medicine. It has been shown that Wikipedia is one of the top visited sites by readers looking for information on this topic. The large reliance on Wikipedia for this type of information drives research towards the analysis of the quality of its articles. In this work, we evaluate and compare the quality of medicine-related articles in the English and Portuguese Wikipedia. For that, we use metrics such as authority, completeness, complexity, informativeness, consistency, currency and volatility, as well as domain-specific measurements. We were able to conclude that the English articles score better across most metrics than the Portuguese articles.

2019

Knowledge Graph Implementation of Archival Descriptions Through CIDOC-CRM

Authors
Koch, I; Freitas, N; Ribeiro, C; Lopes, CT; da Silva, JR;

Publication
DIGITAL LIBRARIES FOR OPEN KNOWLEDGE, TPDL 2019

Abstract
Archives have well-established description standards, namely the ISAD(G) and ISAAR(CPF) with a hierarchical structure adapted to the nature of archival assets. However, as archives connect to a growing diversity of data, they aim to make their representations more apt to the so-called linked data cloud. The corresponding move from hierarchical, ISAD-conforming descriptions to graph counterparts requires state-of-the-art technologies, data models and vocabularies. Our approach addresses this problem from two perspectives. The first concerns the data model and description vocabularies, as we adopt and build upon the CIDOC-CRM standard. The second is the choice of technologies to support a knowledge graph, including a graph database and an Object Graph Mapping library. The case study is the Portuguese National Archives, Torre do Tombo, and the overall goal is to build a CIDOC-CRM-compliant system for document description and retrieval, to be used by professionals and the public. The early stages described here include the design of the core data model for archival records represented as the ArchOnto ontology and its embodiment in the ArchGraph knowledge graph. The goal of a semantic archival information system will be pursued in the migration of existing records to the richer representation and the development of applications supported on the graph.

2019

Readability of web content: An analysis by topic

Authors
Antunes, H; Lopes, CT;

Publication
2019 14TH IBERIAN CONFERENCE ON INFORMATION SYSTEMS AND TECHNOLOGIES (CISTI)

Abstract
Readability is determined by the characteristics of a text that influence its understanding. The web is composed of content on various topics and the results retrieved in the top positions by the main search engines are expected to be those with the highest number of views. In this study, we analyzed the readability of web pages according to the topic to which they belong and their position in the search results. For that, we collected the top-20 results retrieved by Google for 23,779 queries from 20 topics and used several readability metrics. The results of the analysis showed that the content from organizations (like colleges and other institutions) and health-related content have lower readability values. The Games and Home categories are at the opposite end. For the categories identified as having lower readability, tools can be developed that help the user understand their content. We also found that top-ranked pages have higher values of readability. One can conclude that, directly or indirectly, readability is a factor that seems to be considered by the Google search engine or that has an influence on page popularity.

2019

Is it a lay or medico-scientific concept? Automatic classification in two languages

Authors
Santos, PM; Lopes, CT;

Publication
2019 14TH IBERIAN CONFERENCE ON INFORMATION SYSTEMS AND TECHNOLOGIES (CISTI)

Abstract
Searching for health information is the third most popular activity on the Internet. There is evidence that query suggestions in lay and medico-scientific terminology improve health information retrieval by those who are not health professionals. Developing systems that suggest queries in these terminologies requires knowing if concepts are lay or medico-scientific. In this paper, we propose and compare approaches to compute the degree of association of a concept to lay and medico-scientific terminology. We use different thesauri for this purpose and use the cosine similarity to measure the closeness of concepts with subsets of those thesauri. The evaluation of our approaches uses an existing glossary containing concepts in both terminologies in English and Portuguese and a set of queries submitted by users and classified by health professionals as lay or medico-scientific. We concluded that the best method to classify a concept uses the CHV vocabulary as a subset.
