2009
Authors
Pinto Ribeiro, PM; Silva, FMA; Kaiser, M;
Publication
Fifth International Conference on e-Science, e-Science 2009, 9-11 December 2009, Oxford, UK
Abstract
Complex networks from domains like Biology or Sociology are present in many e-Science data sets. Dealing with networks can often form a workflow bottleneck, as several related algorithms are computationally hard. One example is detecting characteristic patterns, or "network motifs", a problem involving subgraph mining and graph isomorphism. This paper provides a review and runtime comparison of current motif detection algorithms. We present the strategies and the corresponding algorithms in pseudo-code, yielding a framework for comparison. We categorize the algorithms, outlining the main differences and advantages of each strategy. Finally, we implement all strategies in a common platform to allow a fair and objective efficiency comparison using a set of benchmark networks. We hope to inform the choice of strategy and critically discuss future improvements in motif detection. © 2009 IEEE.
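The two subproblems the abstract names, subgraph enumeration and isomorphism classification, can be illustrated with a deliberately naive sketch. This is our own brute-force illustration, not one of the surveyed algorithms (which use far more efficient enumeration and canonical labelling):

# A minimal sketch of a naive size-3 motif census on an undirected graph,
# showing the enumerate-subgraphs + classify-up-to-isomorphism steps that
# the surveyed algorithms optimize. For exposition only.
from itertools import combinations, permutations

def canonical(adj, nodes):
    # Canonical form of the induced subgraph: the lexicographically
    # smallest adjacency bit-string over all orderings of the nodes.
    best = None
    for perm in permutations(nodes):
        bits = tuple(int(perm[j] in adj[perm[i]])
                     for i in range(len(perm)) for j in range(i + 1, len(perm)))
        if best is None or bits < best:
            best = bits
    return best

def is_connected(adj, nodes):
    nodes = set(nodes)
    stack, seen = [next(iter(nodes))], set()
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        stack.extend(adj[n] & nodes - seen)
    return seen == nodes

def motif_census(edges, k=3):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    counts = {}
    for nodes in combinations(sorted(adj), k):
        # Only connected induced subgraphs count as motif occurrences.
        if is_connected(adj, nodes):
            key = canonical(adj, nodes)
            counts[key] = counts.get(key, 0) + 1
    return counts

# One triangle and two paths in a 4-node graph:
print(motif_census([(1, 2), (2, 3), (1, 3), (3, 4)]))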
2009
Authors
Pereira, P; Silva, F; Fonseca, NA;
Publication
2ND INTERNATIONAL WORKSHOP ON PRACTICAL APPLICATIONS OF COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (IWPACBB 2008)
Abstract
We present BIORED, a new, efficient and scalable tool for pattern discovery in proteomic and genomic sequences. It uses a genetic algorithm to find interesting patterns in the form of regular expressions, and a new, efficient pattern-matching procedure to count pattern occurrences. We studied the performance, scalability and usefulness of BIORED using several databases of biosequences. The results show that BIORED successfully rediscovered previously known patterns, an excellent indicator of its potential. BIORED is available for download under the GNU Public License at http://www.dcc.fc.up.pt/bi-ored/. An online demo is available at the same address.
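The abstract does not detail BIORED's pattern language or fitness function, but the general idea of evolving regular expressions scored by occurrence counts can be sketched as follows. The alphabet, pattern length, fitness weighting and selection scheme below are all assumptions made for the toy example:

# A toy sketch of the GA-over-regular-expressions idea. Fitness is the
# (specificity-weighted) count of pattern occurrences over the corpus;
# '.' acts as a single-character wildcard.
import random
import re

ALPHABET = "ACGT"          # assumed DNA alphabet for the toy example
SYMBOLS = ALPHABET + "."

def random_pattern(length=4):
    return "".join(random.choice(SYMBOLS) for _ in range(length))

def fitness(pattern, sequences):
    # Count possibly overlapping occurrences via a lookahead.
    rx = re.compile("(?=" + pattern + ")")
    # Penalise near-all-wildcard patterns, which match almost everywhere.
    specificity = sum(c != "." for c in pattern)
    return specificity * sum(len(rx.findall(s)) for s in sequences)

def mutate(pattern, rate=0.2):
    return "".join(random.choice(SYMBOLS) if random.random() < rate else c
                   for c in pattern)

def evolve(sequences, pop_size=30, generations=50):
    pop = [random_pattern() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: fitness(p, sequences), reverse=True)
        elite = pop[: pop_size // 2]          # truncation selection
        pop = elite + [mutate(random.choice(elite)) for _ in elite]
    return pop[0]

seqs = ["ACGTACGTGACGT", "TTACGTCCACGTA"]
best = evolve(seqs)
print(best, fitness(best, seqs))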
2009
Authors
Fonseca, NA; Costa, VS; Rocha, R; Camacho, R; Silva, F;
Publication
SOFTWARE-PRACTICE & EXPERIENCE
Abstract
Inductive logic programming (ILP) is a sub-field of machine learning that provides an excellent framework for multi-relational data mining applications. The advantages of ILP have been successfully demonstrated in complex and relevant industrial and scientific problems. However, to produce valuable models, ILP systems often require long running times and large amounts of memory. In this paper we address fundamental issues that have a direct impact on the efficiency of ILP systems. Namely, we discuss how improvements in the indexing mechanisms of an underlying logic programming system benefit ILP performance. Furthermore, we propose novel data structures to reduce memory requirements and we suggest a new lazy evaluation technique to search the hypothesis space more efficiently. These proposals have been implemented in the April ILP system and evaluated using several well-known data sets. The results observed show significant improvements in running time without compromising the accuracy of the models generated. Indeed, the combined techniques achieve speedups of several orders of magnitude on some data sets. Moreover, memory requirements are reduced in nearly half of the data sets. Copyright (C) 2008 John Wiley & Sons, Ltd.
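The lazy evaluation idea, deferring coverage tests on examples until the search actually needs them, can be illustrated generically. This is a didactic sketch of the general technique, not April's actual mechanism; the class and its thresholded test are our own invention:

# Lazy, memoized coverage: examples are tested on demand and results cached,
# so a clause can be accepted or discarded after examining only a prefix of
# the example set.
class LazyCoverage:
    def __init__(self, clause, examples, covers):
        self.clause = clause
        self.examples = examples
        self.covers = covers          # covers(clause, example) -> bool
        self._cache = {}              # example index -> bool

    def covered(self, i):
        if i not in self._cache:      # evaluate only when first requested
            self._cache[i] = self.covers(self.clause, self.examples[i])
        return self._cache[i]

    def at_least(self, k):
        # Succeed as soon as k covered examples are found; fail as soon as
        # success becomes impossible, instead of always scanning everything.
        found = 0
        for i in range(len(self.examples)):
            if self.covered(i):
                found += 1
                if found >= k:
                    return True
            if found + len(self.examples) - i - 1 < k:
                return False
        return False

# Toy usage: "clauses" are divisors, "examples" are integers.
cov = LazyCoverage(3, list(range(100)), lambda c, e: e % c == 0)
print(cov.at_least(5))   # True after examining only examples 0..12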
2009
Authors
Fonseca, NA; Srinivasan, A; Silva, F; Camacho, R;
Publication
MACHINE LEARNING
Abstract
The growth of machine-generated relational databases, both in the sciences and in industry, is rapidly outpacing our ability to extract useful information from them by manual means. This has brought into focus machine learning techniques like Inductive Logic Programming (ILP) that are able to extract human-comprehensible models for complex relational data. The price to pay is that ILP techniques are not efficient: they can be seen as performing a form of discrete optimisation, which is known to be computationally hard; and the complexity is usually some super-linear function of the number of examples. While little can be done to alter the theoretical bounds on the worst-case complexity of ILP systems, some practical gains may follow from the use of multiple processors. In this paper we survey the state-of-the-art on parallel ILP. We implement several parallel algorithms and study their performance using some standard benchmarks. The principal findings of interest are these: (1) of the techniques investigated, one that simply constructs models in parallel on each processor using a subset of data and then combines the models into a single one, yields the best results; and (2) sequential (approximate) ILP algorithms based on randomized searches have lower execution times than (exact) parallel algorithms, without sacrificing the quality of the solutions found.
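Finding (1), building a model on each processor from a subset of the data and then combining the models, can be sketched with standard multiprocessing. The `induce` learner below is a placeholder standing in for a real ILP system, and the union-of-rules combination is the naive variant; the paper's algorithms are not reproduced here:

# Each worker induces a "theory" from its own partition of the examples;
# the per-partition theories are then merged into a single model.
from multiprocessing import Pool

def induce(examples):
    # Placeholder learner: returns one "rule" per distinct label seen.
    return {("label_is", label) for _, label in examples}

def partition(examples, n):
    return [examples[i::n] for i in range(n)]

def parallel_induce(examples, workers=4):
    with Pool(workers) as pool:
        theories = pool.map(induce, partition(examples, workers))
    return set().union(*theories)     # naive combination: union of rules

if __name__ == "__main__":
    data = [(x, x % 3) for x in range(1000)]
    print(sorted(parallel_induce(data)))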
2009
Authors
Ribeiro, P; Simonotto, J; Kaiser, M; Silva, F;
Publication
JOURNAL OF NEUROSCIENCE METHODS
Abstract
Calculating correlation networks from multi-electrode array (MEA) data involves extensive computation. Unfortunately, as MEAs grow larger, the time needed grows even faster: calculating pair-wise correlations for current 60-channel systems can take hours on commodity computers, whereas for future 1000-channel systems it would take almost 280 times as long, since the number of pairs increases with the square of the number of channels. Even taking into account increases in processor speed, it may soon be infeasible to compute correlations on a single computer. Parallel computing is a way to sustain reasonable calculation times in the future. We provide a general tool for rapid computation of correlation networks, which was tested on: (a) a single computer cluster with 16 cores, (b) the Newcastle Condor System, utilizing idle processors of university computers, and (c) an inter-cluster environment with 192 cores. Our reusable tool provides a simple interface for neuroscientists, automating data partitioning and job submission, while allowing coding in any programming language. It is also sufficiently flexible to be used in other high-performance computing environments.
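The "almost 280 times" figure follows directly from pair counting: C(1000,2)/C(60,2) = 499500/1770, roughly 282. The snippet below checks that arithmetic and sketches the kind of pair-list partitioning across workers that the abstract describes; the round-robin scheme is our illustration, not necessarily the tool's actual scheduler:

from math import comb

pairs_60, pairs_1000 = comb(60, 2), comb(1000, 2)
print(pairs_60, pairs_1000, pairs_1000 / pairs_60)   # 1770 499500 ~282.2

def pair_chunks(n_channels, n_workers):
    # Enumerate all unordered channel pairs and deal them round-robin,
    # so each worker computes an (almost) equal share of correlations.
    pairs = [(i, j) for i in range(n_channels) for j in range(i + 1, n_channels)]
    return [pairs[w::n_workers] for w in range(n_workers)]

chunks = pair_chunks(60, 16)
print([len(c) for c in chunks])   # ~111 pairs per core on a 16-core cluster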
2009
Authors
Leal, JP;
Publication
Proceedings of the IADIS International Conference WWW/Internet 2009, ICWI 2009
Abstract
This paper reports on the design, implementation and evaluation of XwQuest, a tool for responding to questionnaires on the Web. The distinctive feature of this tool is an XML definition of the questionnaire that focuses on questions and admissible answers while avoiding presentation details. This questionnaire definition is processed on the browser side and converted into an Ajax application. Collected responses are periodically sent back to the server and can be retrieved by researchers and processed in a standard spreadsheet program. The paper details XwQuest's questionnaire language and the generator that converts questionnaire definitions into Ajax applications. Two case studies in which XwQuest was used with good results are also presented. © 2009 IADIS.
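The declarative idea, a presentation-free XML definition rendered into an interactive form, can be hinted at with a tiny parser. The element and attribute names below are invented for this sketch; XwQuest's real schema and its Ajax generator are not specified in the abstract:

# Parse an assumed questionnaire vocabulary and emit a plain HTML form.
import xml.etree.ElementTree as ET

XML = """
<questionnaire title="Demo">
  <question id="q1" text="Preferred language?">
    <answer>Python</answer>
    <answer>Prolog</answer>
  </question>
</questionnaire>
"""

def render(xml_text):
    root = ET.fromstring(xml_text)
    parts = ["<form><h1>%s</h1>" % root.get("title")]
    for q in root.findall("question"):
        parts.append("<p>%s</p>" % q.get("text"))
        for a in q.findall("answer"):
            parts.append('<label><input type="radio" name="%s" value="%s"/>%s'
                         "</label>" % (q.get("id"), a.text, a.text))
    parts.append("</form>")
    return "\n".join(parts)

print(render(XML))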