Publications

Publications by Pedro Manuel Ribeiro

2015

Discovering Weighted Motifs in Gene co-expression Networks

Authors
Choobdar, S; Ribeiro, P; Silva, F;

Publication
30TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, VOLS I AND II

Abstract
An important dimension of complex networks is embedded in the weights of their edges. Incorporating this source of information into the analysis of a network can greatly enhance our understanding of it. This is the case for gene co-expression networks, which encapsulate information about the strength of correlation between gene expression profiles. Classical unweighted gene co-expression networks use thresholding to define connectivity, losing some of the information contained in the different connection strengths. In this paper, we propose a mining method capable of extracting information from weighted gene co-expression networks. We study groups of differently connected nodes and their importance as network motifs. We define a subgraph as a motif if the weights of the edges inside the subgraph have a significantly different distribution from what would be found in a random distribution. We use the Kolmogorov-Smirnov test to calculate the significance score of the subgraph, avoiding the time-consuming generation of random networks to determine statistical significance. We apply our approach to gene co-expression networks related to three different types of cancer and also to two healthy datasets. The structure of the networks is compared using weighted motif profiles, and our results show that we are able to clearly distinguish the networks and separate them by type. We also compare the biological relevance of our weighted approach to a more classical binary motif profile, where edges are unweighted. We use shared Gene Ontology annotations on biological processes, cellular components and molecular functions. The results of gene enrichment analysis show that weighted motifs are biologically more significant than the binary motifs.

2015

Dynamic inference of social roles in information cascades

Authors
Choobdar, S; Ribeiro, P; Parthasarathy, S; Silva, F;

Publication
DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
Nodes in complex networks inherently represent different kinds of functional or organizational roles. In the dynamic process of an information cascade, users play different roles in spreading the information: some act as seeds to initiate the process, some limit the propagation and others are in-between. Understanding the roles of users is crucial in modeling the cascades. Previous research mainly focuses on modeling users' behavior based upon the dynamic exchange of information with neighbors. We argue, however, that the structural patterns in the neighborhood of nodes may already contain enough information to infer users' roles, independently of the information flow itself. To explore this possibility, we examine how network characteristics of users affect their actions in the cascade. We also advocate that temporal information is very important. With this in mind, we propose an unsupervised methodology based on ensemble clustering to classify users into their social roles in a network, using not only their current topological positions, but also considering their history over time. Our experiments on two social networks, Flickr and Digg, show that topological metrics indeed possess discriminatory power and that different structural patterns correspond to different parts in the process. We observe that user commitment in the neighborhood considerably affects the influence score of users. In addition, we discover that the cohesion of the neighborhood is important in the blocking behavior of users. With this we can construct topological fingerprints that can help us in identifying social roles, based solely on structural social ties, and independently of nodes' activity and how information flows.

2016

FastStep: Scalable Boolean Matrix Decomposition

Authors
Araujo, M; Ribeiro, P; Faloutsos, C;

Publication
ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2016, PT I

Abstract
Matrix decomposition methods are applied to a wide range of tasks, such as data denoising, dimensionality reduction, co-clustering and community detection. However, in the presence of boolean inputs, common methods either do not scale or do not provide a boolean reconstruction, which results in high reconstruction error and low interpretability of the decomposition. We propose a novel step decomposition of boolean matrices into non-negative factors with boolean reconstruction. By formulating the problem using threshold operators and through suitable relaxation of this problem, we provide a scalable algorithm that can be applied to boolean matrices with millions of non-zero entries. We show that our method achieves significantly lower reconstruction error when compared to standard state-of-the-art algorithms. We also show that the decomposition keeps its interpretability by analyzing communities in a flights dataset (where the matrix is interpreted as a graph in which nodes are airports) and in a movie-ratings dataset with 10 million non-zeros.

2015

Network comparison using directed graphlets

Authors
Aparício, DO; Ribeiro, PMP; Silva, FMA;

Publication
CoRR

Abstract

2015

Pairwise structural role mining for user categorization in information cascades

Authors
Choobdar, S; Ribeiro, P; Silva, F;

Publication
PROCEEDINGS OF THE 2015 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING (ASONAM 2015)

Abstract
It is well known that many social networks follow the homophily principle, dictating that individuals tend to connect with similar peers. Past studies focused on non-topological properties, such as age, gender, beliefs or education. In this paper, we focus precisely on the topology itself, exploring the possible existence of pairwise role dependency, that is, purely structural homophily. We show that while pairwise dependency is necessary for some structural roles, it may be misleading for others. We also present SR-Diffuse, a novel method for identifying the structural roles of nodes within a network. It is an iterative algorithm following an optimization model able to learn simultaneously from topological features and structural homophily, combining both aspects. To assess our method, we applied it to a classification problem in information cascades, comparing its performance against several baseline methods. The experimental results with Flickr and Digg data show that SR-Diffuse can improve the quality of the discovered roles and can better represent the profile of the individuals, leading to a better prediction of social classes within information cascades.

2014

Discovering Colored Network Motifs

Authors
Pinto Ribeiro, PM; Silva, FMA;

Publication
Complex Networks V - Proceedings of the 5th Workshop on Complex Networks CompleNet 2014, Bologna, Italy, March 12-14, 2014

Abstract
Network motifs are small overrepresented patterns that have been used successfully to characterize complex networks. Current algorithmic approaches focus essentially on pure topology and disregard node and edge nature. However, it is often the case that nodes and edges can also be classified and separated into different classes. This kind of network can be modeled by colored (or labeled) graphs. Here we present a definition of colored motifs and an algorithm for efficiently discovering them. We use g-tries, a specialized data structure created for finding sets of subgraphs. G-tries encapsulate common sub-structure, and with the aid of symmetry breaking conditions and a customized canonization methodology, we are able to efficiently search for several colored patterns at the same time. We apply our algorithm to a set of representative complex networks, showing that it can find colored motifs and outperform previous methods. © 2014 Springer International Publishing Switzerland.
