2015
Authors
Castro, JA; Perrotta, D; Amorim, RC; da Silva, JR; Ribeiro, C;
Publication
METADATA AND SEMANTICS RESEARCH, MTSR 2015
Abstract
Data description is an essential part of research data management, and it is easy to argue for the importance of describing data early in the research workflow. Specific metadata schemas are often proposed to support description. Given the diversity of research domains, such schemas are often missing, and when available they may be too generic, too complex or hard to incorporate in a description platform. In this paper we present a method used to design metadata models for research data description as ontologies. Ontologies are gaining acceptance as knowledge representation structures, and we use them here in the scope of the Dendro platform. The ontology design process is illustrated with a case study from Vehicle Simulation. According to the design process, the resulting model was validated by a domain specialist.
2015
Authors
Kar, M; Nunes, S; Ribeiro, C;
Publication
INFORMATION PROCESSING & MANAGEMENT
Abstract
In the area of Information Retrieval, the task of automatic text summarization usually assumes a static underlying collection of documents, disregarding the temporal dimension of each document. However, in real-world settings, collections and individual documents rarely stay unchanged over time. The World Wide Web is a prime example of a collection where information changes both frequently and significantly over time, with documents being added, modified or just deleted at different times. In this context, previous work addressing the summarization of web documents has simply discarded the dynamic nature of the web, considering only the latest published version of each individual document. This paper proposes and addresses a new challenge - the automatic summarization of changes in dynamic text collections. In standard text summarization, retrieval techniques present a summary to the user by capturing the major points expressed in the most recent version of an entire document in a condensed form. In this new task, the goal is to obtain a summary that describes the most significant changes made to a document during a given period. In other words, the idea is to have a summary of the revisions made to a document over a specific period of time. This paper proposes different approaches to generate summaries using extractive summarization techniques. First, individual terms are scored and then this information is used to rank and select sentences to produce the final summary. A system based on the Latent Dirichlet Allocation (LDA) model is used to find the hidden topic structures of changes. The purpose of using the LDA model is to identify separate topics where the changed terms from each topic are likely to carry at least one significant change. The different approaches are then compared with the previous work in this area. A collection of articles from Wikipedia, including their revision history, is used to evaluate the proposed system.
For each article, a temporal interval and a reference summary from the article's content are selected manually. The articles and intervals in which a significant event occurred are carefully selected. The summaries produced by each of the approaches are evaluated against the manual summaries using ROUGE metrics. It is observed that the approach using the LDA model outperforms all the other approaches. Statistical tests reveal that the differences in ROUGE scores for the LDA-based approach are statistically significant at the 99% level over the baseline.
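The two-step idea in this abstract (score individual terms, then rank and select sentences) can be illustrated with a minimal sketch. This is not the authors' system: the scoring rule (absolute change in term frequency between two versions) and all function names are assumptions chosen for illustration, and the LDA topic step is left out.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def score_changed_terms(old_text, new_text):
    """Assumed scoring rule: absolute change in a term's frequency between versions.

    Terms whose frequency did not change are dropped.
    """
    old_counts = Counter(tokenize(old_text))
    new_counts = Counter(tokenize(new_text))
    scores = {}
    for term in set(old_counts) | set(new_counts):
        delta = abs(new_counts[term] - old_counts[term])
        if delta > 0:
            scores[term] = delta
    return scores


def summarize_changes(old_text, new_text, max_sentences=2):
    """Rank sentences that are new in the later version by average term score."""
    term_scores = score_changed_terms(old_text, new_text)
    old_sentences = set(split_sentences(old_text))
    candidates = [s for s in split_sentences(new_text) if s not in old_sentences]

    def sentence_score(sentence):
        tokens = tokenize(sentence)
        if not tokens:
            return 0.0
        return sum(term_scores.get(t, 0) for t in tokens) / len(tokens)

    ranked = sorted(candidates, key=sentence_score, reverse=True)
    return ranked[:max_sentences]


old_version = "The bridge opened in 1990. It carries road traffic."
new_version = (
    "The bridge opened in 1990. It carries road traffic. "
    "The bridge closed in 2015 after a flood damaged two piers."
)
summary = summarize_changes(old_version, new_version)
```

Here `summary` holds the sentences added between the two versions, ordered by how strongly their terms changed. A fuller version would also handle deleted or modified sentences and compare the output with a reference summary.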
2015
Authors
Amorim, RC; Castro, JA; da Silva, JR; Ribeiro, C;
Publication
LANGUAGES, APPLICATIONS AND TECHNOLOGIES, SLATE 2015
Abstract
Dealing with research data management can be a complex task, and recent guidelines prompt researchers to actively participate in this activity. Emergent research data platforms are proposing workflows to motivate researchers to take an active role in the management of their data. Other tools, such as electronic laboratory notebooks, can be embedded in the laboratory environment to ease the collection of valuable data and metadata as soon as it is available. This paper reports an extension of the previously developed LabTablet application to gather data and metadata for different research domains. Along with this extension, we present a case study from the social sciences, concerning the identification of the data description requirements for one of its domains. We argue that the LabTablet can be crucial to engage researchers in data organization and description. After starting the process, researchers can then manage their data in Dendro, a staging platform with stronger, collaborative management capabilities, which allows them to export their annotated datasets to selected research data repositories.
2015
Authors
Oroszlanyova, M; Ribeiro, C; Nunes, S; Lopes, CT;
Publication
CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS/INTERNATIONAL CONFERENCE ON PROJECT MANAGEMENT/CONFERENCE ON HEALTH AND SOCIAL CARE INFORMATION SYSTEMS AND TECHNOLOGIES, CENTERIS/PROJMAN / HCIST 2015
Abstract
Search engines typically estimate relevance using features of the documents. We believe that several features from the user and task can also contribute to this process. In the health domain there are specific characteristics of web documents that can also add value to this estimation. In the present work, using a dataset composed of a set of annotated web pages and their assessment by a set of users regarding their relevance and comprehension, we analyse what characteristics affect documents' relevance and what characteristics influence how well users comprehend them. We have conducted a bivariate analysis using characteristics of the above data collection. The strongest relations we have found are linked to the task features, suggesting a direct association between tasks' clarity and ease and both the relevance and the comprehension of the content. The language of the document, its medical certification, the update status, the content in pathology definitions, and the content in prevention, prognosis and treatment information are other characteristics valued by consumers in terms of relevance. Users' previous experience with health searches and, particularly, with the topic being searched, their gender, and the language and terminology of their queries were shown to be related to their success in the search tasks. We have also found that lay terminology, knowledge about the medico-scientific terms and the language of the documents are good indicators of comprehension. Documents containing links and testimonies, and the ones recently updated, were observed to be better understood by users, as were blog posts and comments. (C) 2015 The Authors. Published by Elsevier B.V.
2015
Authors
Barroso, I; Hartmann, N; Ribeiro, C;
Publication
Journal of Library Metadata
Abstract
The Biblioteca Digital de Arte (BDArt) Digital Library hosted by the Thematic Repository at the University of Porto (Repositório Temático da U.Porto) aggregates documents from the library and the archive collections belonging to the Fine Arts School of the University of Porto (Faculdade de Belas Artes da U.Porto). This school has a museum collection containing a significant set of world-class objects managed with distinct processes and tools from those currently used in libraries and archives elsewhere. Interoperability between the collections of the archive, the library, and the museum is necessary because many works allocated to different collections are closely related and can only be seen as a whole by cross-collection search functionalities. The goal of this work, the first of its kind to be developed at the University of Porto (U.Porto), is to integrate the museum collection with archives and library collections in the repository and to use an open-source technology (DSpace). Our experiment involved the selection of appropriate representations of the objects and the definition of a metadata crosswalk between the original metadata standards and qualified Dublin Core. As a result, we created the BDA Museum Collection as a BDArt subcommunity using an XML export procedure that we expect to be helpful in future developments of other museum collections in the Thematic Repository at U.Porto. © Isabel Barroso, Nadia Hartmann, and Cristina Ribeiro.
2015
Authors
Oroszlányová, M; Ribeiro, C; Nunes, S; Lopes, CT;
Publication
Conference on ENTERprise Information Systems/International Conference on Project MANagement/Conference on Health and Social Care Information Systems and Technologies, CENTERIS/ProjMAN/HCist 2015, Vilamoura, Portugal, October 7-9, 2015.