Publications

Publications by CRACS

2014

G-Tries: a data structure for storing and finding subgraphs

Authors
Ribeiro, P; Silva, F;

Publication
DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
The ability to find and count subgraphs of a given network is an important non trivial task with multidisciplinary applicability. Discovering network motifs or computing graphlet signatures are two examples of methodologies that at their core rely precisely on the subgraph counting problem. Here we present the g-trie, a data-structure specifically designed for discovering subgraph frequencies. We produce a tree that encapsulates the structure of the entire graph set, taking advantage of common topologies in the same way a prefix tree takes advantage of common prefixes. This avoids redundancy in the representation of the graphs, thus allowing for both memory and computation time savings. We introduce a specialized canonical labeling designed to highlight common substructures and annotate the g-trie with a set of conditional rules that break symmetries, avoiding repetitions in the computation. We introduce a novel algorithm that takes as input a set of small graphs and is able to efficiently find and count them as induced subgraphs of a larger network. We perform an extensive empirical evaluation of our algorithms, focusing on efficiency and scalability on a set of diversified complex networks. Results show that g-tries are able to clearly outperform previously existing algorithms by at least one order of magnitude.

CloseRead Abstract

2014

Parallel Subgraph Counting for Multicore Architectures

Authors
Aparicio, D; Ribeiro, P; Silva, F;

Publication
2014 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS (ISPA)

Abstract
Computing the frequency of small subgraphs on a large network is a computationally hard task. This is, however, an important graph mining primitive, with several applications, and here we present a novel multicore parallel algorithm for this task. At the core of our methodology lies a state-of-the-art data structure, the g-trie, which represents a collection of subgraphs and allows for a very efficient sequential search. Our implementation was done using Pthreads and can run on any multicore personal computer. We employ a diagonal work sharing strategy to dynamically and effectively divide work among threads during the execution. We assess the performance of our Pthreads implementation on a set of representative networks from various domains and with diverse topological features. For most networks, we obtain a speedup of over 50 for 64 cores and an almost linear speedup up to 32 cores, showcasing the flexibility and scalability of our algorithm. This paves the way for the usage of such counting algorithms on larger subgraph and network sizes without the obligatory access to a cluster.

CloseRead Abstract

2014

Querying Volatile and Dynamic Networks

Authors
Choobdar, S; Pinto Ribeiro, PM; Silva, FMA;

Publication
Encyclopedia of Social Network Analysis and Mining

Abstract
Dynamic (or volatile) network: A network which structure changes over time, some nodes and edges may appear and disappear Local property: A metric that describes a topological aspect on the local neighborhood of a node Structural role: The structural position of a node in the network, characterized by its local properties Transition pattern: A typical role change, characterized by its time interval and by the origin and destination role Definition Many networks are intrinsically dynamic and change over time. These networks can be very volatile, with a significant number of edges and nodes appearing and disappearing. The majority of the existing network mining methodologies are however geared toward a more static scenario, with a single graph describing the topology of the system being analyzed. There is still a need for measurements and tools that allow the temporal dimension on network analysis to be…. © Springer Science+Business Media New York 2014.

CloseRead Abstract

2014

Preface

Authors
Silva, F; Dutra, I; Costa, VS;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract

2014

Sequencing Educational Resources with Seqins

Authors
Queiros, R; Leal, JP; Campos, J;

Publication
COMPUTER SCIENCE AND INFORMATION SYSTEMS

Abstract
Existing adaptive educational hypermedia systems have been using learning resources sequencing approaches in order to enrich the learning experience. In this context, educational resources, either expository or evaluative, play a central role. However, there is a lack of tools that support sequencing essentially due to the fact that existing specifications are complex. This paper presents Seqins as a sequencing tool of digital educational resources. Seqins includes a simple and flexible sequencing model that will foster heterogeneous students to learn at different rhythms. The tool communicates through the IMS Learning Tools Interoperability specification with a plethora of e-learning systems such as learning management systems, repositories, authoring and automatic evaluation systems. In order to validate Seqins we integrate it in an e-learning Ensemble framework instance for the computer programming learning domain.

CloseRead Abstract

2014

A study of machine learning methods for detecting user interest during web sessions

Authors
Jorge, AM; Leal, JP; Anand, SS; Dias, H;

Publication
PROCEEDINGS OF THE 18TH INTERNATIONAL DATABASE ENGINEERING AND APPLICATIONS SYMPOSIUM (IDEAS14)

Abstract
The ability to have an automated real time detection of user interest during a web session is very appealing and can be very useful for a number of web intelligence applications. Low level interaction events associated with user interest manifestations form the basis of user interest models. However such data sets present a number of challenges from a machine learning perspective, including the level of noise in the data and class imbalance (given that the majority of content will not be of interest to a user). In this paper we evaluate a large number of machine learning techniques aimed at learning from class imbalanced data using two data sets collected from a real user study. We use the AUC, recall, precision and model complexity to compare the relative merits of these techniques and conclude that useful models with AUC above 0.8 can be obtained using a mix of sampling and cost based methods. Ensemble models can provide further accuracy but make deployment more complex.

CloseRead Abstract