Publications

Publications by Pedro Manuel Ribeiro

2015

Rand-FaSE: fast approximate subgraph census

Authors
Paredes, P; Ribeiro, P;

Publication
SOCIAL NETWORK ANALYSIS AND MINING

Abstract
Determining the frequency of small subgraphs is an important graph mining primitive. One major class of algorithms for this task is based upon the enumeration of all sets of k connected nodes. These are known as network-centric algorithms. FAst Subgraph Enumeration (FaSE) is an exact algorithm for subgraph counting that, in contrast with past approaches, performs the isomorphism tests during the enumeration itself, encapsulating the topological information in a g-trie and thus greatly reducing the number of required isomorphism tests. Our goal in this paper is to extend this approach with an approximate algorithm, which we call Rand-FaSE. It uses an unbiased sampling estimator for the number of subgraphs of each type, allowing a user to trade some accuracy for even faster execution times. We tested our algorithm on a set of representative complex networks, comparing it with the exact alternative, FaSE. We also perform an extensive analysis of its accuracy and speed gains against previous sampling approaches. With all of this, we believe FaSE and Rand-FaSE pave the way for faster network-centric census algorithms.
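
As an illustration of how an unbiased sampling estimator over a network-centric enumeration can work, the following is a minimal sketch, not the authors' Rand-FaSE code. It assumes an undirected graph stored as a dict of neighbour sets with integer node labels, uses the ESU enumeration scheme, and keeps each search-tree node at depth d with a user-chosen probability probs[d]. Every sampled leaf is weighted by the inverse of the probability of reaching it, so the expected estimate equals the exact count. The names rand_census and probs are hypothetical, and subgraph types are keyed by their number of induced edges, a simplification that only separates the classes when k = 3.

```python
import random

def rand_census(adj, k, probs, rng=None):
    """Unbiased estimate of connected induced k-node subgraph counts.

    probs[d] is the probability of keeping a search-tree node at depth d
    (d = 0 for the root vertex, ..., k - 1 for the last vertex added).
    Each sampled leaf is weighted by 1 / prod(probs), so the expected
    estimate equals the exact count. Subgraphs are keyed by their number
    of induced edges (a simplification that is exact only for k = 3)."""
    rng = rng or random.Random()
    est = {}

    def extend(sub, ext, v, weight):
        if len(sub) == k:
            n_edges = sum(1 for i in range(k) for j in range(i + 1, k) if sub[j] in adj[sub[i]])
            est[n_edges] = est.get(n_edges, 0.0) + weight
            return
        p = probs[len(sub)]
        ext = set(ext)
        closed_nbhd = set(sub).union(*(adj[u] for u in sub))
        while ext:
            w = ext.pop()
            # ESU rule: add w's exclusive neighbours that are larger than the root v
            new_ext = ext | {u for u in adj[w] if u > v and u not in closed_nbhd}
            if rng.random() < p:  # descend into this child only with probability p
                extend(sub + [w], new_ext, v, weight / p)

    for v in adj:
        if rng.random() < probs[0]:
            extend([v], {u for u in adj[v] if u > v}, v, 1.0 / probs[0])
    return est
```

With probs set to all ones the sketch degenerates into the exact enumeration; lowering the deeper probabilities prunes more of the search tree, which is the accuracy-for-speed trade-off the abstract refers to.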

2013

Towards a Faster Network-Centric Subgraph Census

Authors
Paredes, P; Ribeiro, P;

Publication
2013 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING (ASONAM)

Abstract
Determining the frequency of small subgraphs is an important computational task lying at the core of several graph mining methodologies, such as network motif discovery or graphlet-based measurements. In this paper we try to improve a class of algorithms available for this purpose, namely network-centric algorithms, which are based upon the enumeration of all sets of k connected nodes. Past approaches would essentially delay isomorphism tests until they had a finalized set of k nodes. In this paper we show how isomorphism testing can be done during the actual enumeration. We use a customized g-trie, a tree data structure, in order to encapsulate the topological information of the embedded subgraphs, identifying already known node permutations of the same subgraph type. With this we avoid redundancy and the need for an isomorphism test for each subgraph occurrence. We tested our algorithm, which we call FaSE, on a set of different real complex networks, both directed and undirected, showing that we indeed achieve significant speedups of at least one order of magnitude against past algorithms, paving the way for a faster network-centric approach.
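
The idea of classifying during enumeration can be sketched as follows. This is a simplified illustration under stated assumptions, not the FaSE implementation: the graph is undirected and stored as a dict of neighbour sets with integer labels, connected k-node sets are enumerated with the ESU scheme, and a plain dictionary keyed by the sequence of adjacency rows (each new node's links to the nodes added before it) stands in for the g-trie. A brute-force canonical form, computed once per distinct sequence, plays the role of the isomorphism test. The names census, canonical_form and pattern_cache are hypothetical.

```python
from itertools import permutations

def canonical_form(k, edges):
    """Brute-force canonical label of a k-node undirected graph: the smallest
    upper-triangle adjacency bit string over all k! relabelings (small k only)."""
    best = None
    for perm in permutations(range(k)):
        bits = tuple(
            1 if (perm[i], perm[j]) in edges or (perm[j], perm[i]) in edges else 0
            for i in range(k) for j in range(i + 1, k)
        )
        if best is None or bits < best:
            best = bits
    return best

def census(adj, k):
    """Count connected induced k-node subgraphs of an undirected graph.

    adj maps each (integer) node to the set of its neighbours. Returns
    (counts keyed by canonical form, number of isomorphism tests run)."""
    pattern_cache = {}  # hypothetical stand-in for the g-trie: adjacency path -> class
    counts = {}
    iso_tests = 0

    def extend(sub, path, ext, v):
        nonlocal iso_tests
        if len(sub) == k:
            label = pattern_cache.get(path)
            if label is None:  # first time this adjacency path is seen
                edges = {(i, j) for i, row in enumerate(path) for j, b in enumerate(row) if b}
                label = pattern_cache[path] = canonical_form(k, edges)
                iso_tests += 1
            counts[label] = counts.get(label, 0) + 1
            return
        ext = set(ext)
        closed_nbhd = set(sub).union(*(adj[u] for u in sub))
        while ext:
            w = ext.pop()
            # ESU rule: add w's exclusive neighbours that are larger than the root v
            new_ext = ext | {u for u in adj[w] if u > v and u not in closed_nbhd}
            row = tuple(1 if u in adj[w] else 0 for u in sub)  # w's links to earlier nodes
            extend(sub + [w], path + (row,), new_ext, v)

    for v in adj:
        extend([v], ((),), {u for u in adj[v] if u > v}, v)
    return counts, iso_tests
```

On the complete graph with four nodes and k = 3, every occurrence follows the same adjacency path, so the sketch counts four triangles while running a single isomorphism test, which is the redundancy the abstract describes avoiding.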

2016

A Subgraph-Based Ranking System for Professional Tennis Players

Authors
Aparicio, D; Ribeiro, P; Silva, F;

Publication
COMPLEX NETWORKS VII

Abstract
This paper introduces a novel ranking system for competitive sports based around the notion of subgraphs. Although the system is targeted specifically at professional tennis, it could be applied to any dominance network due to its generality. The results of about 140,000 tennis matches played between Top-100 players are used to create a colored directed network where colors represent different surfaces and edge direction depends on head-to-head results between players. The main contribution of this work is a ranking system which relies on the occurrences of 4-node directed subgraphs and the positions (or orbits) in which the players appear in them. Since the concept of orbit is intrinsically connected with node dominance, appearing frequently in dominant orbits indicates that the player himself is dominant. Even in a very sparse network and without any background knowledge of the tournaments or stages of the matches, our proposal is able to extract meaningful rankings which capture the intricate competitive relationships between players from different eras.
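
To make the network construction concrete, here is a minimal sketch of turning match results into a colored directed network, one edge set per surface. It is an assumption-laden illustration, not the paper's exact procedure: it assumes matches arrive as hypothetical (winner, loser, surface) tuples, draws an edge from a to b on a surface when a has strictly more wins than b there, and adds no edge on tied records. The abstract does not specify the edge direction convention or tie handling, so both are illustrative choices.

```python
from collections import defaultdict

def build_dominance_network(matches):
    """matches: iterable of (winner, loser, surface) tuples.

    Returns {surface: set of directed edges (a, b)}, where (a, b) means a
    leads the head-to-head against b on that surface (assumed convention).
    Tied head-to-head records produce no edge (assumed tie rule)."""
    wins = defaultdict(int)  # (surface, a, b) -> number of wins of a over b
    for winner, loser, surface in matches:
        wins[(surface, winner, loser)] += 1

    network = defaultdict(set)
    for (surface, a, b), w_ab in list(wins.items()):
        w_ba = wins.get((surface, b, a), 0)  # .get does not insert into the defaultdict
        if w_ab > w_ba:
            network[surface].add((a, b))
    return dict(network)
```

The resulting per-surface edge sets are the input on which 4-node directed subgraphs and their orbits would then be counted.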

2014

A Scalable Parallel Approach for Subgraph Census Computation

Authors
Aparicio, D; Paredes, P; Ribeiro, P;

Publication
EURO-PAR 2014: PARALLEL PROCESSING WORKSHOPS, PT II

Abstract
Counting the occurrences of small subgraphs in large networks is a fundamental graph mining metric with several possible applications. Computing frequencies of those subgraphs is also known as the subgraph census problem, which is a computationally hard task. In this paper we provide a parallel multicore algorithm for this purpose. At its core we use FaSE, an efficient network-centric sequential subgraph census algorithm, which is able to substantially decrease the number of isomorphism tests needed when compared to past approaches. We use one thread per core and employ a dynamic load balancing scheme capable of dealing with the highly unbalanced search tree induced by FaSE and effectively redistributing work during execution. We assessed the scalability of our algorithm on a varied set of representative networks and achieved near-linear speedup up to 32 cores while obtaining high efficiency for the full 64 cores of our machine.

2017

Scalable subgraph counting using MapReduce

Authors
Eddin, AN; Pinto Ribeiro, PM;

Publication
Proceedings of the Symposium on Applied Computing, SAC 2017, Marrakech, Morocco, April 3-7, 2017

Abstract
Networks are powerful tools for representing a wide variety of systems in many fields of study. Networks are composed of smaller substructures (subgraphs) that characterize them and give important information related to their topology and functionality. Therefore, discovering and counting these subgraph patterns is very important for mining the features of networks. Algorithmically, subgraph counting in a network is a computationally hard problem, and the required execution time grows exponentially as the size of the subgraph or the network increases. The main goal of this paper is to contribute towards subgraph search by providing an accessible and scalable parallel methodology for counting subgraphs. For that we present a dynamic iterative MapReduce strategy to parallelize algorithms that induce an unbalanced search tree, and apply it in the subgraph counting realm. At the core of our methods lies the g-trie, a state-of-the-art data structure that was created precisely for this task. Our strategy employs an adaptive time threshold and an efficient work-sharing mechanism to balance the load dynamically between the workers. We evaluate our implementations using Spark on a large set of representative complex networks from different fields. The results obtained are very promising: we achieved a consistent and almost linear speedup up to 32 cores, with an average efficiency close to 80%. To the best of our knowledge this is the fastest and most scalable method for subgraph counting within the MapReduce programming model. Copyright 2017 ACM.
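
The iterative strategy can be illustrated with a small single-process simulation. This is a sketch under explicit assumptions, not the paper's Spark implementation: the graph is undirected with integer labels, the search tree is the ESU enumeration tree, a fixed step budget stands in for the adaptive time threshold, and unfinished search-tree nodes are simply redealt round-robin to the workers at the start of each round as a stand-in for the work-sharing mechanism. Workers run sequentially here, and subgraphs are keyed by their number of induced edges for brevity. All function names and parameters are hypothetical.

```python
from collections import Counter

def expand(adj, k, task):
    """Expand one ESU search-tree node. Returns (children, leaf_key or None)."""
    sub, ext, v = task
    if len(sub) == k:
        # Leaf: classify by number of induced edges (a simplification).
        n_edges = sum(1 for i in range(k) for j in range(i + 1, k) if sub[j] in adj[sub[i]])
        return [], n_edges
    children = []
    ext = set(ext)
    closed_nbhd = set(sub).union(*(adj[u] for u in sub))
    while ext:
        w = ext.pop()
        new_ext = ext | {u for u in adj[w] if u > v and u not in closed_nbhd}
        children.append((sub + (w,), frozenset(new_ext), v))
    return children, None

def run_worker(adj, k, tasks, budget):
    """'Map' side: depth-first search over the assigned tasks for at most
    `budget` expansions; unfinished search-tree nodes are returned."""
    stack = list(tasks)
    counts = Counter()
    steps = 0
    while stack and steps < budget:
        children, leaf = expand(adj, k, stack.pop())
        if leaf is not None:
            counts[leaf] += 1
        stack.extend(children)
        steps += 1
    return counts, stack

def iterative_census(adj, k, n_workers=4, budget=100):
    """Driver: repeat rounds until no work is left, summing partial counts."""
    tasks = [((v,), frozenset(u for u in adj[v] if u > v), v) for v in adj]
    total = Counter()
    rounds = 0
    while tasks:
        rounds += 1
        shards = [tasks[i::n_workers] for i in range(n_workers)]  # redeal work evenly
        tasks = []
        for shard in shards:  # sequential stand-in for parallel workers
            counts, leftover = run_worker(adj, k, shard, budget)
            total.update(counts)  # 'reduce': add partial counts
            tasks.extend(leftover)
    return dict(total), rounds
```

A small budget forces many short rounds with frequent redistribution, while a large one approaches a static partition; the final counts are the same either way, and only the balance between workers changes.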

2017

TensorCast: Forecasting with Context using Coupled Tensors

Authors
Araujo, M; Ribeiro, P; Faloutsos, C;

Publication
2017 17TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM)

Abstract
Given a heterogeneous social network, can we forecast its future? Can we predict who will start using a given hashtag on Twitter? Can we leverage side information, such as who retweets or follows whom, to improve our membership forecasts? We present TENSORCAST, a novel method that forecasts time-evolving networks more accurately than current state-of-the-art methods by incorporating multiple data sources in coupled tensors. TENSORCAST is (a) scalable, being linearithmic in the number of connections; (b) effective, achieving over 20% improved precision on top-1000 forecasts of community members; (c) general, being applicable to data sources with different structure. We run our method on multiple real-world networks, including DBLP and a Twitter temporal network with over 310 million non-zeros, where we predict how the use of political hashtags evolves over time.