Publications

Publications by HumanISE

2015

Data Warehouses in MongoDB vs SQL Server: A comparative analysis of the querie performance

Authors
Pereira, D; Oliveira, P; Rodrigues, F;

Publication
PROCEEDINGS OF THE 2015 10TH IBERIAN CONFERENCE ON INFORMATION SYSTEMS AND TECHNOLOGIES (CISTI 2015)

Abstract
Due to their historical nature, data warehouses must store large volumes of data in their repositories. Some organizations are beginning to have problems managing and analyzing these huge volumes of data. This is due, in large part, to the relational databases that serve as the primary method of data storage in a data warehouse, which start underperforming, crumbling under the weight of the data stored. In contrast to these systems, NoSQL databases have arisen, associated with the storage of the very large volumes of data inherent to the Big Data paradigm. Thus, this article focuses on the study of the feasibility and the implications of adopting a NoSQL database within the data warehousing context. MongoDB was selected to represent the NoSQL systems in this investigation. This paper explains the processes required to design the structure of a data warehouse and typical dimensional queries in the MongoDB system. The research culminates in a performance analysis of queries executed in a traditional data warehouse, based on the SQL Server system, and in an equivalent data warehouse based on the MongoDB system.


2014

ViBest SHM: an information system and data repository for structural health monitoring

Authors
da Costa, FP; Cunha, A; David, G;

Publication
EURODYN 2014: IX INTERNATIONAL CONFERENCE ON STRUCTURAL DYNAMICS

Abstract
This project has been motivated by the need to standardize, preserve, and share the data sets of the Laboratory of Vibrations and Structural Monitoring (ViBest, www.fe.up.pt/vibest) of FEUP, produced by several individually managed long-term projects. The solution presented is meant to support the process of Structural Health Monitoring, offering features to catalogue the projects, their goals and components, to store and visualize their acquired and processed data through time, and to preserve the data in a form that is standardized across the whole research unit and extensible to future applications. The result is a digital archive with automatic ingestion of new data files and a Web interface with access control and tools for information management. A batch export functionality handles large data transfers. The system is being used on monitoring data from different kinds of structural health monitoring applications. The standardization and preservation of all data sets acquired in multiple applications will certainly be a solid basis for further research, either locally or in the context of international cooperation.


2014

Creating lightweight ontologies for dataset description: Practical applications in a cross-domain research data management workflow

Authors
Castro, JA; da Silva, JR; Ribeiro, C;

Publication
2014 IEEE/ACM JOINT CONFERENCE ON DIGITAL LIBRARIES (JCDL)

Abstract
The description of data is a central task in research data management. Describing datasets requires deep knowledge of both the data and the data creation process to ensure adequate capture of their meaning and context. Metadata schemas are usually followed in resource description to enforce comprehensiveness and interoperability, but they can be hard to understand and adopt by researchers. We propose to address data description using ontologies, which can evolve easily, express semantics at different granularity levels and be directly used in system development. Considering that existing ontologies are often hard to use in a cross domain research data management environment, we present an approach for creating lightweight ontologies to describe research data. We illustrate our process with two ontologies, and then use them as configuration parameters for Dendro, a software platform for research data management currently being developed at the University of Porto.


2014

Dendro: Collaborative Research Data Management Built on Linked Open Data

Authors
da Silva, JR; Castro, JA; Ribeiro, C; Lopes, JC;

Publication
SEMANTIC WEB: ESWC 2014 SATELLITE EVENTS

Abstract
Research datasets in the so-called "long-tail of science" are easily lost after their primary use. Support for preservation, if available, is hard to fit in the research agenda. Our previous work has provided evidence that dataset creators are motivated to spend time on data description, especially if this also facilitates data exchange within a group or a project. This activity should take place early in the data generation process, when it can be regarded as an actual part of data creation. We present the first prototype of the Dendro platform, designed to help researchers use concepts from domain-specific ontologies to collaboratively describe and share datasets within their groups. Unlike existing solutions, ontologies are used at the core of the data storage and querying layer, enabling users to establish meaningful domain-specific links between data, for any domain. The platform is currently being tested with research groups from the University of Porto.


2014

LabTablet: Semantic Metadata Collection on a Multi-domain Laboratory Notebook

Authors
Amorim, RC; Castro, JA; da Silva, JR; Ribeiro, C;

Publication
METADATA AND SEMANTICS RESEARCH, MTSR 2014

Abstract
The value of research data is recognized, and so is the importance of the associated metadata to contextualize, describe and ultimately render them understandable in the long term. Laboratory notebooks are an excellent source of domain-specific metadata, but this paper-based approach can pose risks of data loss, while limiting the possibilities of collaborative metadata production. The paper discusses the advantages of tools to complement paper-based laboratory notebooks in capturing metadata, regardless of the research domain. We propose LabTablet, an electronic laboratory book aimed at the collection of metadata from the early stages of the research workflow. To evaluate the use of LabTablet and the proposed workflow, researchers in two domains were asked to perform a set of tasks and provided insights about their experience. By rethinking the workflow and helping researchers to actively contribute to data description, the research outputs can be described with generic and domain-dependent metadata, thus improving their chances of being deposited, reused and preserved.


2014

Ontology-Based Multi-Domain metadata for research data management using triple stores

Authors
Silva, JRD; Ribeiro, C; Lopes, JC;

Publication
ACM International Conference Proceeding Series

Abstract
Most current research data management solutions rely on a fixed set of descriptors (e.g. Dublin Core Terms) for the description of the resources that they manage. These are easy to understand and use, but their semantics are limited to general concepts, leaving out domain-specific metadata. The textual values for descriptors are easily indexed through free-text indexes, but faceted search and dataset interlinking become limited. From the point of view of the relational database schema modeler, designing a more flexible metadata model represents a non-trivial challenge because it means representing entities with attributes that are unknown at the time of modeling and that can change over time. Those traits, combined with the presence of hierarchies among the entities, can make the relational schema quite complex. This work demonstrates the approaches followed by current open-source platforms and proposes a graph-based model for achieving modular, ontology-based metadata for interlinked data assets in the Semantic Web. The proposed model was implemented in a collaborative research data management platform currently under development at the University of Porto.
