Publications

Publications by HumanISE

2018

Can user and task characteristics be used as predictors of success in health information retrieval sessions?

Authors
Oroszlányová, M; Lopes, CT; Nunes, S; Ribeiro, C;

Publication
INFORMATION RESEARCH-AN INTERNATIONAL ELECTRONIC JOURNAL

Abstract
Introduction. The concept and study of relevance have been a central subject in information science. Although research in information retrieval has focused on topical relevance, other kinds of relevance are also important and justify further study. Motivational relevance is typically inferred by criteria such as user satisfaction and success. Method. Using an existing dataset composed of an annotated set of health Web documents assessed for relevance and comprehension by a group of users, we build a multivariate prediction model for the motivational relevance of search sessions. Analysis. The analysis was based on lasso variable selection, followed by model selection using multiple logistic regression. Results. We have built two regression models; the full model, which considers all variables of the dataset, has a lower estimated prediction error than the reduced model, which contains the statistically significant variables from the full model. The higher values of evaluation metrics, including accuracy, specificity and sensitivity, in the full model support this finding. The full model has an accuracy of 91.94% and is better at predicting motivational relevance. Conclusions. Our findings suggest features that can be considered by search engines to estimate motivational relevance, to be used in addition to topical relevance. Among these features, high levels of success in Web search and in health information search on social networks and chats are among the most influential user features. This suggests that users with higher computer literacy might feel more satisfied and successful after completing the search tasks. In terms of task features, the results suggest that users with clearer goals feel more successful. Moreover, the results show that users would benefit from the help of the system in clarifying the retrieved documents.


2018

Predicting the quality of health web documents using their characteristics

Authors
Oroszlányová, M; Lopes, CT; Nunes, S; Ribeiro, C;

Publication
ONLINE INFORMATION REVIEW

Abstract
Purpose. The quality of consumer-oriented health information on the web has been defined and evaluated in several studies. Usually it is based on evaluation criteria identified by the researchers and, so far, there is no agreed standard for the quality indicators to use. Based on such indicators, tools have been developed to evaluate the quality of web information. The HONcode is one such tool. The purpose of this paper is to investigate the influence of web document features on their quality, using HONcode as ground truth, with the aim of finding whether it is possible to predict the quality of a document using its characteristics. Design/methodology/approach. The present work uses a set of health documents and analyzes how their characteristics (e.g. web domain, last update, type, mention of places of treatment and prevention strategies) are associated with their quality. Based on these features, statistical models are built which predict whether health-related web documents have certification-level quality. Multivariate analysis is performed, using classification to estimate the probability of a document having quality given its characteristics. This approach tells us which predictors are important. Three types of full and reduced logistic regression models are built and evaluated. The first one includes every feature, without any exclusion; the second one disregards the Utilization Review Accreditation Commission variable, because it is a quality indicator; and the third one excludes the variables related to the HONcode principles, which might also be indicators of quality. The reduced models were built to see whether they reach similar results with a smaller number of features. Findings. The prediction models have high accuracy, even without including the characteristics of Health on the Net code principles in the models. The most informative prediction model considers characteristics that can be assessed automatically (e.g. split content, type, process of revision and place of treatment). It has an accuracy of 89 percent. Originality/value. This paper proposes models that automatically predict whether a document has quality or not. Some of the features used (e.g. prevention, prognosis or treatment) have not yet been explicitly considered in this context. The findings of the present study may be used by search engines to promote high-quality documents. This will improve health information retrieval and may help reduce the problems caused by inaccurate information.


2018

Merging Datasets for Aggressive Text Identification

Authors
Fortuna, P; Ferreira, J; Pires, L; Routar, G; Nunes, S;

Publication
TRAC@COLING 2018


2018

Merging Datasets for Hate Speech Classification in Italian

Authors
Fortuna, P; Bonavita, I; Nunes, S;

Publication
Proceedings of the Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2018) co-located with the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018), Turin, Italy, December 12-13, 2018.

Abstract
This paper presents an approach to the shared task HaSpeeDe within Evalita 2018. We followed a standard machine learning procedure with training, validation, and testing phases. We used word embeddings as features and deep learning for classification. We tested the effect of merging two datasets in the classification of messages from Facebook and Twitter. We concluded that using training and testing data from the same social network was a requirement for achieving good performance. Moreover, adding data from a different social network improved the results, indicating that more generalized models can be an advantage.


2018

FEUP at TREC 2018 Common Core Track - Reranking for Diversity using Hypergraph-of-Entity and Document Profiling

Authors
Devezas, JL; Nunes, S; Guillén, A; Gutiérrez, Y; Muñoz, R;

Publication
TREC


2018

Aspect composition for multiple target languages using LARA

Authors
Pinto, P; Carvalho, T; Bispo, J; Ramalho, MA; Cardoso, JMP;

Publication
COMPUTER LANGUAGES SYSTEMS & STRUCTURES

Abstract
Usually, Aspect-Oriented Programming (AOP) languages are an extension of a specific target programming language (e.g., AspectJ for Java and AspectC++ for C++). Although providing AOP support with target language extensions may ease the adoption of an approach, it may impose constraints related to constructs and semantics. Furthermore, by tightly coupling the AOP language to the target language, the reuse potential of many aspects, especially the ones regarding non-functional requirements, is lost. LARA is a domain-specific language inspired by AOP concepts, having the specification of source-to-source transformations as one of its main goals. LARA has been designed to be, as much as possible, independent of the target language and to provide constructs and semantics that ease the definition of concerns, especially those related to non-functional requirements. In this paper, we propose techniques to overcome some of the challenges presented by a multilanguage approach to AOP of cross-cutting concerns focused on non-functional requirements and applied through the use of a weaving process. The techniques mainly focus on providing well-defined library interfaces that can have concrete implementations for each supported target language. The developer uses an agnostic interface and the weaver provides a specific implementation for the target language. We evaluate our approach using 8 concerns with varying levels of language agnosticism that support 4 target languages (C, C++, Java and MATLAB) and show that the proposed techniques contribute to more concise LARA aspects, high reuse of aspects, and significant effort reductions when developing weavers for new imperative, object-oriented programming languages.
