2019
Authors
Lopes, CT; Sousa, H;
Publication
PROCEEDINGS OF THE 2019 CONFERENCE ON HUMAN INFORMATION INTERACTION AND RETRIEVAL (CHIIR'19)
Abstract
Health consumers usually face difficulties in their online searches, mainly because of the differences between the terminologies used by laypeople and health professionals. This work presents a tool, HealthTranslator, available as a Google Chrome extension, that intends to reduce this terminological gap while users are searching the Web for health information. HealthTranslator automatically annotates medical concepts in web documents, providing additional information such as concept definitions, related concepts and links to external references. The solution was evaluated regarding its: (a) performance: the document processing is done gradually, typically from the top to the bottom of the document, and performance was not an issue raised by the users; (b) concept coverage: the solution was compared to a similar extension that works in English, recognizing significantly more concepts. A comparison with a corpus of Portuguese documents manually annotated with medical concepts showed an average F-measure between 27% and 33%, depending on the type of concepts being recognized; (c) users' receptivity to HealthTranslator and its usability: many aspects were surveyed in a user study. In general, the extension is well accepted and users find it useful.
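For readers unfamiliar with the F-measure reported in the abstract above, the following is a minimal sketch of how such a score can be computed for concept annotation. It assumes that automatic and manual annotations are compared as sets of concept identifiers; the function name `f_measure` and the sample identifiers are hypothetical and do not come from the paper.

```python
def f_measure(predicted: set, gold: set, beta: float = 1.0) -> float:
    """F-measure of predicted annotations against a manually annotated gold set.

    Assumption: annotations are compared as exact-match sets of identifiers.
    """
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical example: 4 concepts recognized, 6 in the gold standard, 2 shared.
predicted = {"c1", "c2", "c3", "c4"}
gold = {"c1", "c2", "c5", "c6", "c7", "c8"}
score = f_measure(predicted, gold)
```

With `beta = 1.0` this is the usual harmonic mean of precision and recall.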
2019
Authors
Domingues, G; Lopes, CT;
Publication
COMPANION OF THE WORLD WIDE WEB CONFERENCE (WWW 2019)
Abstract
Wikipedia is the largest online collaborative encyclopedia, containing information from a plethora of fields, including medicine. It has been shown that Wikipedia is one of the sites most visited by readers looking for information on this topic. The large reliance on Wikipedia for this type of information drives research towards the analysis of the quality of its articles. In this work, we evaluate and compare the quality of medicine-related articles in the English and Portuguese Wikipedia. For that, we use metrics such as authority, completeness, complexity, informativeness, consistency, currency and volatility, as well as domain-specific measurements. We were able to conclude that the English articles score better across most metrics than the Portuguese articles.
2019
Authors
Antunes, H; Lopes, CT;
Publication
2019 14TH IBERIAN CONFERENCE ON INFORMATION SYSTEMS AND TECHNOLOGIES (CISTI)
Abstract
Readability is determined by the characteristics of a text that influence its understanding. The web is composed of content on various topics, and the results retrieved in the top positions by the main search engines are expected to be those with the highest number of views. In this study, we analyzed the readability of web pages according to the topic to which they belong and their position in the search results. For that, we collected the top-20 results retrieved by Google for 23,779 queries from 20 topics and used several readability metrics. The results of the analysis showed that content from organizations (like colleges and other institutions) and health-related content have lower readability values. The categories Games and Home are on the opposite side. For the categories identified as having lower readability, tools can be developed that help the user understand their content. We also found that top-ranked pages have higher readability values. One can conclude that, directly or indirectly, readability is a factor that seems to be considered by the Google search engine or has an influence on page popularity.
2019
Authors
Santos, PM; Lopes, CT;
Publication
2019 14TH IBERIAN CONFERENCE ON INFORMATION SYSTEMS AND TECHNOLOGIES (CISTI)
Abstract
Searching for health information is the third most popular activity on the Internet. There is evidence that query suggestions in lay and medico-scientific terminology improve health information retrieval by those who are not health professionals. Developing systems that suggest queries in these terminologies requires knowing whether concepts are lay or medico-scientific. In this paper, we propose and compare approaches to compute the degree of association of a concept to lay and medico-scientific terminology. We use different thesauri for this purpose and use the cosine similarity to measure the closeness of concepts to subsets of those thesauri. The evaluation of our approaches uses an existing glossary containing concepts in both terminologies in English and Portuguese and a set of queries submitted by users and classified by health professionals as lay or medico-scientific. We concluded that the best method to classify a concept uses the CHV vocabulary as a subset.
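The cosine similarity mentioned in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how concepts or thesaurus subsets are represented as vectors, so a simple bag-of-words term count is assumed here, and the names `cosine_similarity`, `lay_terms` and `scientific_terms` are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy vocabularies; real thesauri would be far larger.
lay_terms = Counter("heart attack high blood pressure".split())
scientific_terms = Counter("myocardial infarction arterial hypertension".split())
concept = Counter("heart attack".split())

lay_score = cosine_similarity(concept, lay_terms)
scientific_score = cosine_similarity(concept, scientific_terms)
```

In this toy setup, a concept would be labeled with the terminology whose subset yields the higher score; the decision rule used in the paper may differ.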
2019
Authors
Lopes, CT; Moura, D;
Publication
2019 14TH IBERIAN CONFERENCE ON INFORMATION SYSTEMS AND TECHNOLOGIES (CISTI)
Abstract
Classifying web queries into a set of categories is a crucial task to better understand the user's intent behind a query, contextualize their search and provide more relevant results to the user. However, web queries are typically short and ambiguous, making query classification a non-trivial problem. In this article, we present a new automatic approach for identifying and characterizing queries in the health domain. This method makes use of search engine counts through a semantic similarity measure called Normalized Google Distance (NGD), combined with Support Vector Machines, to classify queries into three dimensions: health-related, severity and semantic type. To evaluate our methods, we used two datasets in different languages, Portuguese and English, and built another for evaluating the last dimension. Overall, the results achieved were satisfactory. The most generic classification obtains better results than the more specific ones. The NGD proved to be a valuable asset in query classification.
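As a reference for the Normalized Google Distance named in the abstract above, here is a minimal sketch of the standard NGD formula computed from hit counts. The counts are passed in directly, since how the paper obtains them from a search engine is not described here; the function name `ngd`, its parameters and the example numbers are assumptions for illustration.

```python
import math


def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized Google Distance from hit counts.

    fx, fy: number of pages containing term x and term y, respectively
    fxy:    number of pages containing both terms
    n:      total number of indexed pages (assumed larger than fx and fy)
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # Terms that never occur, or never co-occur, are treated as maximally distant.
        return math.inf
    log_x, log_y, log_xy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_x, log_y) - log_xy) / (math.log(n) - min(log_x, log_y))


# Hypothetical counts, not real search engine data.
distance = ngd(fx=100, fy=100, fxy=10, n=10_000)
```

Smaller values indicate terms that co-occur more often. In a pipeline like the one described, distances between a query and a set of reference terms could serve as features for a classifier such as an SVM.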
2019
Authors
Antunes, H; Lopes, CT;
Publication
Experimental IR Meets Multilinguality, Multimodality, and Interaction - 10th International Conference of the CLEF Association, CLEF 2019, Lugano, Switzerland, September 9-12, 2019, Proceedings
Abstract
Readability is a linguistic feature that indicates how difficult it is to read a text. Traditional readability formulas were made for the English language. This study evaluates their adequacy for the Portuguese language. We applied the traditional formulas to 10 parallel corpora. We verified that the Portuguese language had higher grade scores (less readability) in the formulas that use the number of syllables per word or the number of complex words per sentence. Formulas that use letters per word instead of syllables per word output similar grade scores. Considering this, we evaluated the correlation of complex words in 65 Portuguese school books covering 12 schooling years. We found that defining a complex word as a word with 4 or more syllables, instead of 3 or more syllables as originally used in traditional formulas applied to English texts, is more correlated with the grade of Portuguese school books. Finally, we adapted each traditional readability formula to the Portuguese language by performing a multiple linear regression on the same dataset of school books. © Springer Nature Switzerland AG 2019.
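To illustrate how the complex-word threshold discussed in the abstract above enters a traditional formula, the sketch below computes the Gunning Fog index with a configurable syllable threshold. Gunning Fog is chosen here only as a well-known example; the abstract does not list which formulas were studied. The syllable counter is a crude vowel-group heuristic introduced for illustration, not the paper's method, and the coefficients are the original English ones, not the adapted Portuguese coefficients.

```python
import re

# Assumption: one syllable per group of consecutive vowels (rough heuristic).
VOWEL_GROUP = re.compile(r"[aeiouyáéíóúâêôãõàü]+", re.IGNORECASE)


def count_syllables(word: str) -> int:
    """Approximate the number of syllables as the number of vowel groups."""
    return max(1, len(VOWEL_GROUP.findall(word)))


def gunning_fog(sentences: list[list[str]], complex_threshold: int = 3) -> float:
    """Gunning Fog index: 0.4 * (words per sentence + percentage of complex words).

    complex_threshold: minimum syllable count for a word to be 'complex'
    (3 in the original formula; the abstract reports 4 fits Portuguese better).
    """
    words = [word for sentence in sentences for word in sentence]
    if not sentences or not words:
        return 0.0
    complex_words = sum(1 for w in words if count_syllables(w) >= complex_threshold)
    return 0.4 * (len(words) / len(sentences) + 100 * complex_words / len(words))


# Hypothetical one-sentence text, already tokenized.
text = [["a", "informação", "é", "útil"]]
score_english_rule = gunning_fog(text, complex_threshold=3)
score_raised_rule = gunning_fog(text, complex_threshold=5)
```

Raising the threshold reduces the number of words counted as complex and therefore lowers the resulting grade score.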