Publications by CRACS

2014

Relational machine learning for electronic health record-driven phenotyping

Authors
Peissig, PL; Costa, VS; Caldwell, MD; Rottscheit, C; Berg, RL; Mendonca, EA; Page, D;

Publication
JOURNAL OF BIOMEDICAL INFORMATICS

Abstract
Objective: Electronic health records (EHRs) offer medical and pharmacogenomics research unprecedented opportunities to identify and classify patients at risk. EHRs are collections of highly interdependent records that include biological, anatomical, physiological, and behavioral observations. They comprise a patient's clinical phenome, in which each patient has thousands of date-stamped records distributed across many relational tables. Developing computer-based EHR phenotyping algorithms requires time and medical insight from clinical experts, who typically can review only a small subset of patients, representative of the full set of EHR records, to identify phenotype features. In this research we evaluate whether relational machine learning (ML) using inductive logic programming (ILP) can help address these issues and serve as a viable approach to EHR-based phenotyping.

Methods: Two relational ILP learning approaches and three well-known WEKA (Waikato Environment for Knowledge Analysis) implementations of non-relational approaches (PART, J48, and JRIP) were used to develop models for nine phenotypes. International Classification of Diseases, Ninth Revision (ICD-9) coded EHR data were used to select training cohorts for the development of each phenotypic model. Accuracy, precision, recall, F-measure, and area under the receiver operating characteristic curve (AUROC) were measured for each phenotypic model on independent, manually verified test cohorts. A two-sided binomial test (sign test) was used to assess whether differences among the five ML approaches across phenotypes were statistically significant.

Results: We developed an approach to automatically label training examples using ICD-9 diagnosis codes for the ML approaches being evaluated. Nine phenotypic models were evaluated for each ML approach; ILP achieved better overall AUROC than PART (p = 0.039), J48 (p = 0.003), and JRIP (p = 0.003).

Discussion: ILP has the potential to improve phenotyping by independently delivering rules for phenotype definitions that clinical experts can interpret, or intuitive phenotypes that assist experts.

Conclusion: Relational learning using ILP offers a viable approach to EHR-driven phenotyping.
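The sign test mentioned in the Methods reduces each phenotype to a win or a loss for one method over another and asks how unlikely that split would be if both methods were equally good (win probability 0.5). The minimal sketch below computes the exact two-sided p-value; the function name is hypothetical and it assumes ties have already been discarded. With nine phenotypes, eight wins out of nine gives p = 20/512 ≈ 0.039. The abstract does not report per-phenotype win counts, so this is only a numerical illustration, not a reconstruction of the authors' analysis.

```python
from math import comb


def sign_test_two_sided(wins: int, n: int) -> float:
    """Exact two-sided sign test p-value under H0: P(win) = 0.5.

    wins: number of phenotypes on which method A beat method B
    n:    number of phenotypes compared (ties assumed already excluded)
    """
    k = max(wins, n - wins)  # count on the more extreme side
    # P(X >= k) for X ~ Binomial(n, 0.5); doubled for a two-sided test
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


# Nine phenotypes, as in the study; the win counts are illustrative only.
for wins in (9, 8, 7):
    print(wins, round(sign_test_two_sided(wins, 9), 4))
```

An off-the-shelf alternative is SciPy's `scipy.stats.binomtest(wins, n, 0.5)`, which performs the same exact test.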

2014

Late Breaking Papers of the 23rd International Conference on Inductive Logic Programming, Rio de Janeiro, Brazil, August 28-30, 2013

Authors
Zaverucha, G; Costa, VS; Paes, AM;

Publication
ILP (Late Breaking Papers)

2014

Inductive Logic Programming - 23rd International Conference, ILP 2013, Rio de Janeiro, Brazil, August 28-30, 2013, Revised Selected Papers

Authors
Zaverucha, G; Costa, VS; Paes, A;

Publication
ILP

2014

Lifted Variable Elimination for Probabilistic Logic Programming

Authors
Bellodi, E; Lamma, E; Riguzzi, F; Costa, VS; Zese, R;

Publication
THEORY AND PRACTICE OF LOGIC PROGRAMMING

Abstract
Lifted inference has been proposed for various probabilistic logical frameworks in order to compute the probability of queries in a time that depends on the size of the domains of the random variables rather than on the number of instances. Although various authors have underlined its importance for probabilistic logic programming (PLP), lifted inference has so far been applied only to relational languages outside logic programming. In this paper we adapt Generalized Counting First Order Variable Elimination (GC-FOVE) to the problem of computing the probability of queries to probabilistic logic programs under the distribution semantics. In particular, we extend the Prolog Factor Language (PFL) with two new types of factors that are needed to represent ProbLog programs. These factors take into account the causal independence relationships among random variables and are handled by the extension to variable elimination proposed by Zhang and Poole for dealing with convergent variables and heterogeneous factors. Two new operators are added to GC-FOVE for treating heterogeneous factors. The resulting algorithm, called LP2 (Lifted Probabilistic Logic Programming), has been implemented by modifying the PFL implementation of GC-FOVE and tested on three benchmarks for lifted inference. A comparison with PITA and ProbLog2 shows the potential of the approach.
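To make the "domain size rather than number of instances" idea concrete, consider a toy case (a hypothetical example, not one of the paper's benchmarks): a query q is true if a probabilistic fact f(X) holds for at least one X, and every grounding of f(X) is independent with the same probability p. This is the causal-independence (noisy-OR) pattern that the abstract's new factors capture. A ground computation multiplies one factor per instance; a lifted computation treats the interchangeable instances as a single group and exponentiates by the domain size. The sketch below is only this toy comparison in plain Python; it is not LP2, GC-FOVE, or the PFL implementation, and the function names are hypothetical.

```python
def ground_or_probability(probs: list[float]) -> float:
    """P(at least one independent fact is true), one factor per ground instance."""
    p_none = 1.0
    for p in probs:  # work grows linearly with the number of instances
        p_none *= 1.0 - p
    return 1.0 - p_none


def lifted_or_probability(p: float, domain_size: int) -> float:
    """Same query when all instances share probability p: the interchangeable
    instances are handled as one group, so the domain is never enumerated."""
    return 1.0 - (1.0 - p) ** domain_size


n, p = 1000, 0.01
print(ground_or_probability([p] * n))
print(lifted_or_probability(p, n))
```

Both calls return the same probability up to floating-point rounding; only the lifted version avoids iterating over the 1000 instances.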

2014

A Distributed Architecture for Remote Validation of Software Licenses Using USB/IP Protocol

Authors
Antunes, MJ; Afonso, A; Pinto, FM;

Publication
NEW PERSPECTIVES IN INFORMATION SYSTEMS AND TECHNOLOGIES, VOL 2

Abstract
USB dongles have been used by a wide range of software manufacturers to store copy-protected licenses for their applications. License validation through USB dongles raises several concerns, such as the risk of a dongle being stolen or lost. Also, in scenarios where the number of dongles is limited, users may have to wait for access to a dongle, which may lead to loss of productivity. In this paper we propose a client/server distributed architecture for remote validation of software licenses through the USB/IP protocol. The proposed approach aims to take advantage of USB/IP to provide distributed access, over a TCP/IP network, to a set of USB dongles physically connected to a remote USB server. We describe the deployment of, and enhancements made to, an existing open source USB/IP implementation, and present the results obtained with this architecture in a real-world scenario: validating the licenses of computer forensics applications that use USB dongles.

2014

Concept Drift Awareness in Twitter Streams

Authors
Costa, J; Silva, C; Antunes, M; Ribeiro, B;

Publication
2014 13TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA)

Abstract
Learning in non-stationary environments is not an easy task and requires a distinctive approach. The learning model must not only be able to learn continuously, but also be able to acquire new concepts and forget old ones. Additionally, given the importance that social networks have gained as information networks, there is ever-growing interest in extracting complex information for trend detection, service promotion, or market sensing. This dynamic nature tends to limit the performance of traditional static learning models, so dynamic learning strategies must be put forward. In this paper we present a strategy for learning under drift in the occurrence of concepts on Twitter. We propose three different models: a time-window model, an ensemble-based model, and an incremental model. Since little is known about the types of drift that can occur on Twitter, we simulate different types of drift by artificially timestamping real Twitter messages in order to evaluate and validate our strategy. Results so far are encouraging for learning in the presence of drift, as well as for classifying messages in Twitter streams.
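The abstract does not describe how its time-window model works internally, so the following is only a generic illustration of the time-window idea applied to a drifting stream: keep the examples whose timestamps fall within the last `window` time units and forget everything older. Concepts are reduced to plain string labels, and the class name, the `window` parameter, and the frequency-based "prediction" are all hypothetical simplifications, not the authors' model. The usage lines mimic the abstract's evaluation trick of artificially timestamping messages, here producing a sudden drift from concept "A" to concept "B" at t = 10.

```python
from __future__ import annotations

from collections import Counter, deque


class TimeWindowModel:
    """Toy time-window learner (hypothetical): remembers only the examples
    seen in the last `window` time units and reports the most frequent
    concept among them. Timestamps are assumed to be non-decreasing."""

    def __init__(self, window: int):
        self.window = window
        self.buffer: deque[tuple[int, str]] = deque()  # (timestamp, concept)
        self.counts: Counter[str] = Counter()

    def update(self, timestamp: int, concept: str) -> None:
        self.buffer.append((timestamp, concept))
        self.counts[concept] += 1
        # Forget examples that have fallen out of the window.
        while self.buffer and self.buffer[0][0] <= timestamp - self.window:
            _, old = self.buffer.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def most_frequent(self) -> str | None:
        return self.counts.most_common(1)[0][0] if self.counts else None


# Artificially timestamped stream with a sudden drift from "A" to "B" at t = 10.
model = TimeWindowModel(window=5)
stream = [(t, "A") for t in range(10)] + [(t, "B") for t in range(10, 20)]
for t, concept in stream:
    model.update(t, concept)
print(model.most_frequent())  # → B
```

A static model trained on the whole stream would see equal counts of "A" and "B"; the window lets the model follow the drift by discarding the outdated concept.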
