Cookies Policy
We use cookies to improve our site and your experience. By continuing to browse our site you accept our cookie policy. Find out More
Close
  • Menu
About

About

I am a PhD student in the MAPi program advised by Professor Fernando Silva and Professor Pedro Ribeiro. My main research interests are in Network Science, Parallel Computing and Data Mining.

Interest
Topics
Details

Details

  • Name

    David Oliveira Aparício
  • Cluster

    Computer Science
  • Role

    Research Assistant
  • Since

    01st December 2013
Publications

2017

Extending the Applicability of Graphlets to Directed Networks

Authors
Aparicio, D; Ribeiro, P; Silva, F;

Publication
IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS

Abstract
With recent advances in high-throughput cell biology, the amount of cellular biological data has grown drastically. Such data is often modeled as graphs (also called networks) and studying them can lead to new insights intomolecule-level organization. A possible way to understand their structure is by analyzing the smaller components that constitute them, namely network motifs and graphlets. Graphlets are particularly well suited to compare networks and to assess their level of similarity due to the rich topological information that they offer but are almost always used as small undirected graphs of up to five nodes, thus limiting their applicability in directed networks. However, a large set of interesting biological networks such asmetabolic, cell signaling, or transcriptional regulatory networks are intrinsically directional, and using metrics that ignore edge direction may gravely hinder information extraction. Our main purpose in this work is to extend the applicability of graphlets to directed networks by considering their edge direction, thus providing a powerful basis for the analysis of directed biological networks. We tested our approach on two network sets, one composed of synthetic graphs and another of real directed biological networks, and verified that they were more accurately grouped using directed graphlets than undirected graphlets. It is also evident that directed graphlets offer substantially more topological information than simple graph metrics such as degree distribution or reciprocity. However, enumerating graphlets in large networks is a computationally demanding task. Our implementation addresses this concern by using a state-of-the-art data structure, the g-trie, which is able to greatly reduce the necessary computation. We compared our tool to other state-of-the art methods and verified that it is the fastest general tool for graphlet counting.

2016

A Subgraph-Based Ranking System for Professional Tennis Players

Authors
Aparicio, D; Ribeiro, P; Silva, F;

Publication
Complex Networks VII - Proceedings of the 7th Workshop on Complex Networks, CompleNet 2016, Dijon, France, March 23-25, 2016.

Abstract
This paper introduces a novel ranking system for competitive sports based around the notion of subgraphs. Although the system is targeted specifically to professional tennis it could be applied to any dominance network due to its generality. The results of about 140,000 tennis matches played between Top-100 players are used to create a colored directed network where colors represent different surfaces and edge direction depends on head-to-read results between players. The main contribution of this work is a ranking system which relies on the occurrences of 4-node directed subgraphs and the positions (or orbits) where the players appear on them. Since the concept of orbit is intrinsically connected with node dominance, appearing frequently in dominant orbits indicates that the player himself is dominant. Even in a very sparse network and without any background knowledge on the tournaments or stages of the matches, our proposal is able to extract meaningful rankings which capture the intricate competitive relationships between players from different eras. © Springer International Publishing Switzerland 2016.

2015

Network comparison using directed graphlets

Authors
Aparício, DO; Ribeiro, PMP; Silva, FMA;

Publication
CoRR

Abstract

2014

Parallel Subgraph Counting for Multicore Architectures

Authors
Aparicio, D; Ribeiro, P; Silva, F;

Publication
2014 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS (ISPA)

Abstract
Computing the frequency of small subgraphs on a large network is a computationally hard task. This is, however, an important graph mining primitive, with several applications, and here we present a novel multicore parallel algorithm for this task. At the core of our methodology lies a state-of-the-art data structure, the g-trie, which represents a collection of subgraphs and allows for a very efficient sequential search. Our implementation was done using Pthreads and can run on any multicore personal computer. We employ a diagonal work sharing strategy to dynamically and effectively divide work among threads during the execution. We assess the performance of our Pthreads implementation on a set of representative networks from various domains and with diverse topological features. For most networks, we obtain a speedup of over 50 for 64 cores and an almost linear speedup up to 32 cores, showcasing the flexibility and scalability of our algorithm. This paves the way for the usage of such counting algorithms on larger subgraph and network sizes without the obligatory access to a cluster.

2014

A Scalable Parallel Approach for Subgraph Census Computation

Authors
Aparicio, D; Paredes, P; Ribeiro, P;

Publication
EURO-PAR 2014: PARALLEL PROCESSING WORKSHOPS, PT II

Abstract
Counting the occurrences of small subgraphs in large networks is a fundamental graph mining metric with several possible applications. Computing frequencies of those subgraphs is also known as the subgraph census problem, which is a computationally hard task. In this paper we provide a parallel multicore algorithm for this purpose. At its core we use FaSE, an efficient network-centric sequential subgraph census algorithm, which is able to substantially decrease the number of isomorphism tests needed when compared to past approaches. We use one thread per core and employ a dynamic load balancing scheme capable of dealing with the highly unbalanced search tree induced by FaSE and effectively redistributing work during execution. We assessed the scalability of our algorithm on a varied set of representative networks and achieved near linear speedup up to 32 cores while obtaining a high efficiency for the total 64 cores of our machine.