Publications

Publications by Fernando Silva

2010

Efficient Parallel Subgraph Counting Using G-Tries

Authors
Pinto Ribeiro, PM; Silva, FMA; Lopes, LMB;

Publication
Proceedings of the 2010 IEEE International Conference on Cluster Computing, Heraklion, Crete, Greece, 20-24 September, 2010

Abstract
Finding and counting the occurrences of a collection of subgraphs within another larger network is a computationally hard problem, closely related to graph isomorphism. The subgraph count is by itself a very powerful characterization of a network and it is crucial for other important network measurements. G-tries are a specialized data-structure designed to store and search for subgraphs. By taking advantage of subgraph common substructure, g-tries can provide considerable speedups over previously used methods. In this paper we present a parallel algorithm based precisely on gtries that is able to efficiently find and count subgraphs. The algorithm relies on randomized receiver-initiated dynamic load balancing and is able to stop its computation at any given time, efficiently store its search position, divide what is left to compute in two halfs, and resume from where it left. We apply our algorithm to several representative real complex networks from various domains and examine its scalability. We obtain an almost linear speedup up to 128 processors, thus allowing us to reach previously unfeasible limits. We showcase the multidisciplinary potential of the algorithm by also applying it to network motif discovery. © 2010 IEEE.

CloseRead Abstract

2009

Strategies for Network Motifs Discovery

Authors
Pinto Ribeiro, PM; Silva, FMA; Kaiser, M;

Publication
Fifth International Conference on e-Science, e-Science 2009, 9-11 December 2009, Oxford, UK

Abstract
Complex networks from domains like Biology or Sociology are present in many e-Science data sets. Dealing with networks can often form a workflow bottleneck as several related algorithms are computationally hard. One example is detecting characteristic patterns or "network motifs" - a problem involving subgraph mining and graph isomorphism. This paper provides a review and runtime comparison of current motif detection algorithms in the field. We present the strategies and the corresponding algorithms in pseudo-code yielding a framework for comparison. We categorize the algorithms outlining the main differences and advantages of each strategy. We finally implement all strategies in a common platform to allow a fair and objective efficiency comparison using a set of benchmark networks.We hope to inform the choice of strategy and critically discuss future improvements in motif detection. © 2009 IEEE.

CloseRead Abstract

1998

Distribution and Mobility with Lexical Scoping in Process Calculi

Authors
Vasconcelos, VT; Lopes, LMB; Silva, FMA;

Publication
Electr. Notes Theor. Comput. Sci.

Abstract
We propose a simple model of distribution for mobile processes, independent of the underlying calculus. Conventional processes compute within sites; inter-site computation is achieved by message sending and object migration, both obeying a lexical scope. We focus on the semantics of networks, on programming practice, and on physical realization with current technology. ©1998 Published by Elsevier Science B.V.

CloseRead Abstract

2012

Motif Mining in Weighted Networks

Authors
Choobdar, S; Ribeiro, P; Silva, F;

Publication
12TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2012)

Abstract
Unexpectedly frequent subgraphs, known as motifs, can help in characterizing the structure of complex networks. Most of the existing methods for finding motifs are designed for unweighted networks, where only the existence of connection between nodes is considered, and not their strength or capacity. However, in many real world networks, edges contain more information than just simple node connectivity. In this paper, we propose a new method to incorporate edge weight information in motif mining. We think of a motif as a subgraph that contains unexpected information, and we define a new significance measurement to assess this subgraph exceptionality. The proposed metric embeds the weight distribution in subgraphs and it is based on weight entropy. We use the g-trie data structure to find instances of k-sized subgraphs and to calculate its significance score. Following a statistical approach, the random entropy of subgraphs is then calculated, avoiding the time consuming step of random network generation. The discrimination power of the derived motif profile by the proposed method is assessed against the results of the traditional unweighted motifs through a graph classification problem. We use a set of labeled ego networks of co-authorship in the biology and mathematics fields. The new proposed method is shown to be feasible, achieving even slightly better accuracy. Since it does not require the generation of random networks, it is also computationally faster, and because we are able to use the weight information in computing the motif importance, we can avoid converting weighted networks into unweighted ones.

CloseRead Abstract

2006

April - An inductive logic programming system

Authors
Fonseca, NA; Silva, F; Camacho, R;

Publication
LOGICS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS

Abstract
Inductive Logic Programming (ILP) is a Machine Learning research field that has been quite successful in knowledge discovery in relational domains. ILP systems use a set of pre-classified examples (positive and negative) and prior knowledge to learn a theory in which positive examples succeed and the negative examples fail. In this paper we present a novel ILP system called April, capable of exploring several parallel strategies in distributed and shared memory machines.

CloseRead Abstract

2009

BIORED - A Genetic Algorithm for Pattern Detection in Biosequences

Authors
Pereira, P; Silva, F; Fonseca, NA;

Publication
2ND INTERNATIONAL WORKSHOP ON PRACTICAL APPLICATIONS OF COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (IWPACBB 2008)

Abstract
We present a new, efficient and scalable tool, named BIORED, for pattern discovery in proteomic and genomic sequences. It uses a genetic algorithm to find interesting patterns in the form of regular expressions, and a new efficient pattern matching procedure to count pattern occurrences. We studied the performance, scalability and usefulness of BIORED using several databases of biosequences. The results show that BIORED was successful in finding previously known patterns, thus an excellent indicator for its potential. BIORED is available for download under the GNU Public License at http://www.dcc.fc.up.pt/bi-ored/. An online demo is available at the same address.

CloseRead Abstract