Publications

Publications by HumanISE

2015

Summarization of changes in dynamic text collections using Latent Dirichlet Allocation model

Authors
Kar, M; Nunes, S; Ribeiro, C;

Publication
INFORMATION PROCESSING & MANAGEMENT

Abstract
In the area of Information Retrieval, the task of automatic text summarization usually assumes a static underlying collection of documents, disregarding the temporal dimension of each document. However, in real world settings, collections and individual documents rarely stay unchanged over time. The World Wide Web is a prime example of a collection where information changes both frequently and significantly over time, with documents being added, modified or just deleted at different times. In this context, previous work addressing the summarization of web documents has simply discarded the dynamic nature of the web, considering only the latest published version of each individual document. This paper proposes and addresses a new challenge - the automatic summarization of changes in dynamic text collections. In standard text summarization, retrieval techniques present a summary to the user by capturing the major points expressed in the most recent version of an entire document in a condensed form. In this new task, the goal is to obtain a summary that describes the most significant changes made to a document during a given period. In other words, the idea is to have a summary of the revisions made to a document over a specific period of time. This paper proposes different approaches to generate summaries using extractive summarization techniques. First, individual terms are scored and then this information is used to rank and select sentences to produce the final summary. A system based on the Latent Dirichlet Allocation (LDA) model is used to find the hidden topic structures of changes. The purpose of using the LDA model is to identify separate topics where the changed terms from each topic are likely to carry at least one significant change. The different approaches are then compared with the previous work in this area. A collection of articles from Wikipedia, including their revision history, is used to evaluate the proposed system. For each article, a temporal interval and a reference summary from the article's content are selected manually. The articles and intervals in which a significant event occurred are carefully selected. The summaries produced by each of the approaches are evaluated against the manual summaries using ROUGE metrics. It is observed that the approach using the LDA model outperforms all the other approaches. Statistical tests reveal that the differences in ROUGE scores for the LDA-based approach are statistically significant at the 99% level over the baseline.
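The two-step idea in this abstract (score individual terms, then rank and select sentences) can be illustrated with a minimal sketch. This is not the paper's method: it replaces the LDA-based scoring with a plain count of how much more often each term appears in the newer revision, and all function names here are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def changed_term_scores(old_text, new_text):
    """Score each term by how many more times it occurs in the new revision.

    Assumption: simple frequency difference stands in for the paper's
    term-scoring approaches. Counter subtraction keeps only positive counts.
    """
    return Counter(tokenize(new_text)) - Counter(tokenize(old_text))


def summarize_changes(old_text, new_text, max_sentences=1):
    """Rank sentences that are new in the later revision by average term score."""
    scores = changed_term_scores(old_text, new_text)
    old_sentences = set(split_sentences(old_text))
    candidates = [s for s in split_sentences(new_text) if s not in old_sentences]

    def sentence_score(sentence):
        tokens = tokenize(sentence)
        return sum(scores[t] for t in tokens) / len(tokens) if tokens else 0.0

    return sorted(candidates, key=sentence_score, reverse=True)[:max_sentences]


old_revision = "The bridge opened in 1990. It carries road traffic."
new_revision = (
    "The bridge opened in 1990. It carries road traffic. "
    "A storm damaged the bridge in 2014."
)
summary = summarize_changes(old_revision, new_revision)
```

Averaging the term scores over sentence length is one possible design choice; it keeps long sentences from winning purely because they contain more tokens.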

2015

Engaging Researchers in Data Management with LabTablet, an Electronic Laboratory Notebook

Authors
Amorim, RC; Castro, JA; da Silva, JR; Ribeiro, C;

Publication
LANGUAGES, APPLICATIONS AND TECHNOLOGIES, SLATE 2015

Abstract
Dealing with research data management can be a complex task, and recent guidelines prompt researchers to actively participate in this activity. Emergent research data platforms are proposing workflows to motivate researchers to take an active role in the management of their data. Other tools, such as electronic laboratory notebooks, can be embedded in the laboratory environment to ease the collection of valuable data and metadata as soon as it is available. This paper reports an extension of the previously developed LabTablet application to gather data and metadata for different research domains. Along with this extension, we present a case study from the social sciences, concerning the identification of the data description requirements for one of its domains. We argue that the LabTablet can be crucial to engage researchers in data organization and description. After starting the process, researchers can then manage their data in Dendro, a staging platform with stronger, collaborative management capabilities, which allows them to export their annotated datasets to selected research data repositories.

2015

The influence of documents, users and tasks on the relevance and comprehension of health web documents

Authors
Oroszlanyova, M; Ribeiro, C; Nunes, S; Lopes, CT;

Publication
CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS/INTERNATIONAL CONFERENCE ON PROJECT MANAGEMENT/CONFERENCE ON HEALTH AND SOCIAL CARE INFORMATION SYSTEMS AND TECHNOLOGIES, CENTERIS/PROJMAN / HCIST 2015

Abstract
Search engines typically estimate relevance using features of the documents. We believe that several features from the user and task can also contribute to this process. In the health domain there are specific characteristics of web documents that can also add value to this estimation. In the present work, using a dataset composed of a set of annotated web pages and their assessment by a set of users regarding their relevance and comprehension, we analyse what characteristics affect documents' relevance and what characteristics influence how well users comprehend them. We have conducted a bivariate analysis using characteristics of the above data collection. The strongest relations we have found are linked to the task features, suggesting a direct association between the clarity and ease of the tasks and both the relevance and the comprehension of the content. The language of the document, its medical certification, the update status, the content in pathology definitions, and the content in prevention, prognosis and treatment information are other characteristics valued by consumers in terms of relevance. Users' previous experience with health searches and, particularly, with the topic being searched, their gender, and the language and terminology of their queries were shown to be related to their success in the search tasks. We have also found that lay terminology, knowledge about the medico-scientific terms and the language of the documents are good indicators of comprehension. Documents containing links and testimonies, and the ones recently updated, were observed to be better understood by users, as were blog posts and comments. (C) 2015 The Authors. Published by Elsevier B.V.

2015

Metadata Crosswalk for a museum collection in a Thematic digital library

Authors
Barroso, I; Hartmann, N; Ribeiro, C;

Publication
Journal of Library Metadata

Abstract
The Biblioteca Digital de Arte (BDArt) Digital Library hosted by the Thematic Repository at the University of Porto (Repositório Temático da U.Porto) aggregates documents from the library and the archive collections belonging to the Fine Arts School of the University of Porto (Faculdade de Belas Artes da U.Porto). This school has a museum collection containing a significant set of world-class objects managed with distinct processes and tools from those currently used in libraries and archives elsewhere. Interoperability between the collections of the archive, the library, and the museum is necessary because many works allocated to different collections are closely related and can only be seen as a whole by cross-collection search functionalities. The goal of this work, the first of its kind to be developed at the University of Porto (U.Porto), is to integrate the museum collection with archives and library collections in the repository and to use an open-source technology (DSpace). Our experiment involved the selection of appropriate representations of the objects and the definition of a metadata crosswalk between the original metadata standards and qualified Dublin Core. As a result, we created the BDA Museum Collection as a BDArt subcommunity using an XML export procedure that we expect to be helpful in future developments of other museum collections in the Thematic Repository at U.Porto. © Isabel Barroso, Nadia Hartmann, and Cristina Ribeiro.
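A metadata crosswalk of the kind mentioned in this abstract is, at its core, a mapping from source fields to target elements. The sketch below is only an illustration under stated assumptions: the museum-side field names (`object_title`, `artist`, and so on) are hypothetical, and the qualified Dublin Core targets are plausible choices, not the mapping defined by the authors.

```python
# Hypothetical museum field names mapped to qualified Dublin Core elements.
# Assumption: neither the source fields nor this particular mapping come
# from the paper; they only illustrate the shape of a crosswalk.
CROSSWALK = {
    "object_title": "dc.title",
    "artist": "dc.creator",
    "production_date": "dc.date.created",
    "dimensions": "dc.format.extent",
    "materials": "dc.format.medium",
    "inventory_number": "dc.identifier",
}


def apply_crosswalk(record, crosswalk=CROSSWALK):
    """Translate one source record into Dublin Core elements.

    Returns (mapped, unmapped): `mapped` holds a list of values per target
    element, since Dublin Core elements are repeatable; `unmapped` lists
    source fields that have no crosswalk entry and need manual review.
    """
    mapped = {}
    unmapped = []
    for field, value in record.items():
        target = crosswalk.get(field)
        if target is None:
            unmapped.append(field)
            continue
        mapped.setdefault(target, []).append(value)
    return mapped, unmapped


museum_record = {
    "object_title": "Untitled",
    "artist": "Unknown",
    "gallery_room": "3",
}
mapped, unmapped = apply_crosswalk(museum_record)
```

Reporting unmapped fields instead of silently dropping them makes gaps in the crosswalk visible before records are exported.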

2015

The Influence of Documents, Users and Tasks on the Relevance and Comprehension of Health Web Documents

Authors
Oroszlányová, M; Ribeiro, C; Nunes, S; Lopes, CT;

Publication
Conference on ENTERprise Information Systems/International Conference on Project MANagement/Conference on Health and Social Care Information Systems and Technologies, CENTERIS/ProjMAN/HCist 2015, Vilamoura, Portugal, October 7-9, 2015.

Abstract

2015

An Approach for Automated Scenario-based Testing of Distributed and Heterogeneous Systems

Authors
Lima, B; Faria, JP;

Publication
ICSOFT-EA 2015 - Proceedings of the 10th International Conference on Software Engineering and Applications, Colmar, Alsace, France, 20-22 July, 2015.

Abstract
The growing dependence of our society on increasingly complex software systems makes software testing ever more important and challenging. In many domains, such as healthcare and transportation, several independent systems, forming a heterogeneous and distributed system of systems, are involved in the provisioning of end-to-end services to users. However, existing testing techniques, namely in the model-based testing field, provide little tool support for properly testing such systems. Hence, in this paper, we propose an approach and a toolset architecture for automating the testing of end-to-end services in distributed and heterogeneous systems. The tester interacts with a visual modeling frontend to describe key behavioral scenarios, invoke test generation and execution, and visualize test results and coverage information back in the model. The visual modeling notation is converted to a formal notation amenable to runtime interpretation in the backend. A distributed test monitoring and control infrastructure is responsible for interacting with the components of the system under test, as test driver, monitor and stub. At the core of the toolset, a test execution engine coordinates test execution and checks the conformance of the observed execution trace with the expectations derived from the visual model. A real world example from the Ambient Assisted Living domain is presented to illustrate the approach.
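The conformance check described in this abstract can be pictured with a deliberately simplified sketch. It assumes a strictly sequential scenario in which each expected interaction must appear in order; the paper's approach works from visual models and a formal notation, which this does not attempt to reproduce. The `Event` type, the `check_conformance` function, and the component names are all hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Event(NamedTuple):
    """One observed or expected interaction between two components."""
    sender: str
    receiver: str
    message: str


def check_conformance(
    expected: List[Event], observed: List[Event]
) -> Tuple[bool, Optional[str]]:
    """Compare an observed trace with the expected scenario, step by step.

    Returns (True, None) when the traces match exactly, otherwise
    (False, description of the first deviation).
    """
    for index, (exp, obs) in enumerate(zip(expected, observed)):
        if exp != obs:
            return False, f"step {index}: expected {exp}, observed {obs}"
    if len(observed) < len(expected):
        return False, f"step {len(observed)}: missing {expected[len(observed)]}"
    if len(observed) > len(expected):
        return False, f"step {len(expected)}: unexpected {observed[len(expected)]}"
    return True, None


# Hypothetical Ambient Assisted Living scenario: a fall is detected and escalated.
scenario = [
    Event("fall_sensor", "home_hub", "fall_detected"),
    Event("home_hub", "care_center", "alert"),
]
incomplete_trace = scenario[:1]

ok, _ = check_conformance(scenario, scenario)
failed, reason = check_conformance(scenario, incomplete_trace)
```

A real distributed system would also need to handle concurrency and timing, for example by accepting any interleaving permitted by the scenario rather than a single fixed order.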
