Publicacoes - INESC TEC

Publicações

Publicações por HumanISE

2014

Ontology-Based Multi-Domain metadata for research data management using triple stores

Autores
Silva, JRD; Ribeiro, C; Lopes, JC;

Publicação
ACM International Conference Proceeding Series

Abstract
Most current research data management solutions rely on a fixed set of descriptors (e.g. Dublin Core Terms) for the description of the resources that they manage. These are easy to understand and use, but their semantics are limited to general concepts, leaving out domain-specific metadata. The textual values for descriptors are easily indexed through free-text indexes, but faceted search and dataset interlinking becomes limited. From the point of view of the relational database schema modeler, designing a more flexible metadata model represents a non-trivial challenge because it means representing entities with attributes unknown at the time of modeling and that can change in time. Those traits, combined with the presence of hierarchies among the entities, can make the relational schema quite complex. This work demonstrates the approaches followed by current opensource platforms and proposes a graph-based model for achieving modular, ontology-based metadata for interlinked data assets in the Semantic Web. The proposed model was implemented in a collaborative research data management platform currently under development at the University of Porto. © 2014 ACM.

FecharLer Abstract

2014

The Dendro research data management platform: Applying ontologies to long-term preservation in a collaborative environment

Autores
da Silva, JR; Castro, JA; Ribeiro, C; Lopes, JC;

Publicação
Proceedings of the 11th International Conference on Digital Preservation, iPRES 2014, Melbourne, Australia, October 6 - 10, 2014

Abstract

2014

Beyond INSPIRE: An Ontology for Biodiversity Metadata Records

Autores
da Silva, JR; Castro, JA; Ribeiro, C; Honrado, J; Lomba, A; Goncalves, J;

Publicação
ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS: OTM 2014 WORKSHOPS

Abstract
Managing research data often requires the creation or reuse of specialised metadata schemas to satisfy the metadata requirements of each research group. Ontologies present several advantages over metadata schemas. In particular, they can be shared and improved upon more easily, providing the flexibility required to establish relationships between datasets and concepts from distinct domains. In this paper, we present a preliminary experiment on the use of ontologies for the description of biodiversity datasets. With a strong focus on the dynamics of individual species, species diversity, biological communities and ecosystems, the Predictive Ecology research group of CIBIO has adopted the INSPIRE European recommendation as the primary tool for metadata compliance across its research data description. We build upon this experience to model the BIOME ontology for the biodiversity domain. The ontology combines concepts from INSPIRE, matching them against the ones defined in the Dublin Core, FOAF and CERIF ontologies. Dendro, a prototype for collaborative data description, uses the ontology to provide an environment where biodiversity metadata records are available as Linked Open Data.

FecharLer Abstract

2014

Identification and classification of health queries: Co-occurrences vs. domain- specific terminologies

Autores
Lopes, CT; Ribeiro, C;

Publicação
International Journal of Healthcare Information Systems and Informatics

Abstract
Identifying the user's intent behind a query is a key challenge in Information Retrieval. This information may be used to contextualize the search and provide better search results to the user. The automatic identification of queries targeting a search for health information allows the implementation of retrieval strategies specifically focused on the health domain. In this paper, two kinds of automatic methods to identify and classify health queries based on domain-specific terminology are proposed. Besides evaluating these methods, we compare them with a method that is based on co-occurrence statistics of query terms with the word "health". Although the best overall result was achieved with a variant of the co-occurrence method, the method based on domain-specific frequencies that generates a continuous output outperformed most of the other methods. Moreover, this method also allows the association of queries to the semantic tree of the Unified Medical Language System and thereafter their classification into appropriate subcategories. Copyright © 2014, IGI Global.

FecharLer Abstract

2014

WindS@UP: The e-Science Platform for WindScanner.eu

Autores
Gomes, F; Lopes, JC; Palma, JL; Ribeiro, LF;

Publicação
SCIENCE OF MAKING TORQUE FROM WIND 2014 (TORQUE 2014)

Abstract
The Wind Scanner e-Science platform architecture and the underlying premises are discussed. It is a collaborative platform that will provide a repository for experimental data and metadata. Additional data processing capabilities will be incorporated thus enabling in-situ data processing. Every resource in the platform is identified by a Uniform Resource Identifier (URI), enabling an unequivocally identification of the field(s) campaign(s) data sets and metadata associated with the data set or experience. This feature will allow the validation of field experiment results and conclusions as all managed resources will be linked. A centralised node (Hub) will aggregate the contributions of 6 to 8 local nodes from EC countries and will manage the access of 3 types of users: data-curator, data provider and researcher. This architecture was designed to ensure consistent and efficient research data access and preservation, and exploitation of new research opportunities provided by having this "Collaborative Data Infrastructure". The prototype platform-WindS@UP-enables the usage of the platform by humans via a Web interface or by machines using an internal API (Application Programming Interface). Future work will improve the vocabulary ("application profile") used to describe the resources managed by the platform.

FecharLer Abstract

2014

A model for analyzing estimation, productivity, and quality performance in the personal software process

Autores
Raza, M; Faria, JP;

Publicação
ACM International Conference Proceeding Series

Abstract
High-maturity software development processes, making intensive use of metrics and quantitative methods, such as the Team Software Process (TSP) and the accompanying Personal Software Process (PSP), can generate a significant amount of data that can be periodically analyzed to identify performance problems, determine their root causes and devise improvement actions. However, there is a lack of tool support for automating the data analysis and the recommendation of improvement actions, and hence diminish the manual effort and expert knowledge required. So, we propose in this paper a comprehensive performance model, addressing time estimation accuracy, quality and productivity, to enable the automated (tool based) analysis of performance data produced in the context of the PSP, namely, identify performance problems and their root causes, and subsequently recommend improvement actions. Performance ranges and dependencies in the model were calibrated and validated, respectively, based on a large PSP data set referring to more than 30,000 finished projects. © 2014 ACM.

FecharLer Abstract