2017
Authors
Rodrigues, A; Silva, C; Koerich Borges, PV; Silva, S; Dutra, I;
Publication
IJBDI
Abstract
2017
Authors
Loewe, L; Scheuer, KS; Keel, SA; Vyas, V; Liblit, B; Hanlon, B; Ferris, MC; Yin, J; Dutra, I; Pietsch, A; Javid, CG; Moog, CL; Meyer, J; Dresel, J; McLoone, B; Loberger, S; Movaghar, A; Gilchrist Scott, M; Sabri, Y; Sescleifer, D; Pereda Zorrilla, I; Zietlow, A; Smith, R; Pietenpol, S; Goldfinger, J; Atzen, SL; Freiberg, E; Waters, NP; Nusbaum, C; Nolan, E; Hotz, A; Kliman, RM; Mentewab, A; Fregien, N; Loewe, M;
Publication
ANNALS OF THE NEW YORK ACADEMY OF SCIENCES
Abstract
Names in programming are vital for understanding the meaning of code and big data. We define code2brain (C2B) interfaces as maps in compilers and brains between meaning and naming syntax, which help to understand executable code. While working toward an Evolvix syntax for general-purpose programming that makes accurate modeling easy for biologists, we observed how names affect C2B quality. To protect learning and coding investments, C2B interfaces require long-term backward compatibility and semantic reproducibility (accurate reproduction of computational meaning from coder-brains to reader-brains by code alone). Semantic reproducibility is often assumed until confusing synonyms degrade modeling in biology to deciphering exercises. We highlight empirical naming priorities from diverse individuals and roles of names in different modes of computing to show how naming easily becomes impossibly difficult. We present the Evolvix BEST (Brief, Explicit, Summarizing, Technical) Names concept for reducing naming priority conflicts, test it on a real challenge by naming subfolders for the Project Organization Stabilizing Tool system, and provide naming questionnaires designed to facilitate C2B debugging by improving names used as keywords in a stabilizing programming language. Our experiences inspired us to develop Evolvix using a flipped programming language design approach with some unexpected features and BEST Names at its core.
2017
Authors
Barbosa, J; Camacho, R; Dutra, I; Marques, O;
Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
2017
Authors
Santos Pereira, C; Cruz Correia, R; Brito, AC; Augusto, AB; Correia, ME; Bento, MJ; Antunes, L;
Publication
2017 12TH IBERIAN CONFERENCE ON INFORMATION SYSTEMS AND TECHNOLOGIES (CISTI)
Abstract
A cancer registry is a standardized tool to produce population-based data on cancer incidence and survival. Cancer registries can retrieve and store information on all cancer cases occurring in a defined population. The main sources of data on cancer cases usually include treatment and diagnostic facilities (oncology centres or hospital departments, pathology laboratories, imaging facilities, etc.) and the official territorial death registry. The aim of this paper is to evaluate the north regional cancer registry (RORENO) of Portugal using qualitative research. We want to characterize the main functionalities and core processes, the team involved, and the different healthcare institutions in the regional network, and to identify issues and potential improvements. RORENO links data of thirteen-two healthcare institutions and is responsible for the production of the cancer incidence and survival report for this region. In our semi-structured interviews and observation of RORENO we identified a serious problem due to the lack of automatic integration of data from the different sources. Most of the data are inserted manually in the system, and this implies an extra effort from the RORENO team. At this moment the RORENO team is still collecting data from 2011. In the near future it is crucial to automate the integration of data linking the different healthcare institutions in the region. However, it is important to consider which functionalities this system should offer to the institutions in the network to maximize engagement with the project. More than a database, this should be a source of knowledge available to the whole collaborative oncology network.
2017
Authors
Eddin, AN; Pinto Ribeiro, PM;
Publication
Proceedings of the Symposium on Applied Computing, SAC 2017, Marrakech, Morocco, April 3-7, 2017
Abstract
Networks are powerful in representing a wide variety of systems in many fields of study. Networks are composed of smaller substructures (subgraphs) that characterize them and give important information related to their topology and functionality. Therefore, discovering and counting these subgraph patterns is very important towards mining the features of networks. Algorithmically, subgraph counting in a network is a computationally hard problem and the needed execution time grows exponentially as the size of the subgraph or the network increases. The main goal of this paper is to contribute towards subgraph search, by providing an accessible and scalable parallel methodology for counting subgraphs. For that we present a dynamic iterative MapReduce strategy to parallelize algorithms that induce an unbalanced search tree, and apply it in the subgraph counting realm. At the core of our methods lies the g-trie, a state-of-the-art data structure that was created precisely for this task. Our strategy employs an adaptive time threshold and an efficient work-sharing mechanism to dynamically do load balancing between the workers. We evaluate our implementations using Spark on a large set of representative complex networks from different fields. The results obtained are very promising and we achieved a consistent and almost linear speedup up to 32 cores, with an average efficiency close to 80%. To the best of our knowledge this is the fastest and most scalable method for subgraph counting within the MapReduce programming model. Copyright 2017 ACM.
2017
Authors
Araujo, M; Ribeiro, P; Faloutsos, C;
Publication
2017 17TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM)
Abstract
Given a heterogeneous social network, can we forecast its future? Can we predict who will start using a given hashtag on Twitter? Can we leverage side information, such as who retweets or follows whom, to improve our membership forecasts? We present TENSORCAST, a novel method that forecasts time-evolving networks more accurately than current state-of-the-art methods by incorporating multiple data sources in coupled tensors. TENSORCAST is (a) scalable, being linearithmic in the number of connections; (b) effective, achieving over 20% improved precision on top-1000 forecasts of community members; (c) general, being applicable to data sources with different structure. We run our method on multiple real-world networks, including DBLP and a Twitter temporal network with over 310 million non-zeros, where we predict the evolution of the use of political hashtags.