Publications - INESC TEC

Publications

Publications by Inês Dutra

2020

A Representation Method for Cellular Lines based on SVM and Text Mining

Authors
Carrera, I; Dutra, I; Tejera, E;

Publication
2020 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE

Abstract
One important problem in Bioinformatics is the discovery of new interactions between cellular lines and chemical compounds. In silico methods for cell-line screening are fundamental to optimizing cost and time in drug discovery processes. To build these methods, we need to represent cell lines computationally. Current methods for modeling cell line interactions rely on comparing genetic expression profiles. However, these profiles are usually unknown. In this work, we present a method to characterize and represent cell lines by processing the text of the related scientific literature. We collect abstracts of scientific papers about cellular lines from Cellosaurus and PubMed. These documents are then represented as TF-IDF vectors. We build a data set for classification from the document vectors, with the cell line identifier as the target class. We then apply a multiclass SVM classification method. We use Support Vector Domain Description to describe and characterize each cell line with its corresponding hyperplane, obtained with one-vs-rest training. We evaluated several classifier configurations, using micro-averaged precision as the metric to choose the best classifier, and were able to differentiate cellular lines from a set of 200+.
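
The TF-IDF representation step mentioned in this abstract can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the whitespace tokenizer, the unsmoothed tf * log(N / df) weighting, the function name `tfidf_vectors`, and the toy abstracts are all assumptions. The multiclass SVM step is not shown.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical toy abstracts (assumption); the paper collects real abstracts
# from Cellosaurus and PubMed.
toy_abstracts = [
    "hela cervical cancer cell line",
    "mcf7 breast cancer cell line",
    "hela cells cervical carcinoma",
]
vectors = tfidf_vectors(toy_abstracts)
```

Terms that occur in a single document (such as `mcf7` here) receive a higher weight than terms shared across documents, which is what makes the vectors useful as features for a classifier whose target class is the cell line identifier.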


2021

Simple Matrix Factorization Collaborative Filtering for Drug Repositioning on Cell Lines

Authors
Carrera, I; Tejera, E; Dutra, I;

Publication
HEALTHINF: PROCEEDINGS OF THE 14TH INTERNATIONAL JOINT CONFERENCE ON BIOMEDICAL ENGINEERING SYSTEMS AND TECHNOLOGIES - VOL. 5: HEALTHINF

Abstract
The discovery of new biological interactions, such as interactions between drugs and cell lines, can improve the way drugs are developed. Recently, there has been considerable interest in predicting interactions between drugs and targets using recommender systems and, more specifically, in using recommender systems to predict drug activity on cellular lines. In this work, we present a simple and straightforward approach for the discovery of interactions between drugs and cellular lines using collaborative filtering. We represent cellular lines by their drug affinity profile and, correspondingly, represent drugs by their cell line affinity profile in a single interaction matrix. Using simple matrix factorization, we predict previously unknown values by minimizing the regularized squared error. We build a comprehensive dataset with information from the ChEMBL database. Our dataset comprises 300,000+ molecules, 1,200+ cellular lines, and 3,000,000+ reported activities. We were able to successfully predict drug activity and to evaluate the performance of our model via utility, achieving an Area Under the ROC Curve (AUROC) of nearly 0.9.
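
The idea of factorizing an interaction matrix by minimizing the regularized squared error can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the stochastic gradient descent update, the hyperparameters (`rank`, `lr`, `reg`, `epochs`), the function names, and the tiny activity matrix are all hypothetical.

```python
import random


def factorize(observed, n_rows, n_cols, rank=2, lr=0.05, reg=0.02,
              epochs=500, seed=0):
    """Fit row factors P and column factors Q on the observed entries only."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        for (i, j), value in observed.items():
            err = value - sum(P[i][k] * Q[j][k] for k in range(rank))
            for k in range(rank):
                p, q = P[i][k], Q[j][k]
                # Gradient step on (value - p.q)^2 + reg * (|p|^2 + |q|^2).
                P[i][k] += lr * (err * q - reg * p)
                Q[j][k] += lr * (err * p - reg * q)
    return P, Q


def predict(P, Q, i, j):
    """Predicted activity of drug i on cell line j."""
    return sum(p * q for p, q in zip(P[i], Q[j]))


def mean_squared_error(P, Q, observed):
    """Average squared error over the observed entries."""
    return sum((v - predict(P, Q, i, j)) ** 2
               for (i, j), v in observed.items()) / len(observed)


# Hypothetical (drug index, cell line index) -> normalized activity.
observed = {(0, 0): 1.0, (0, 1): 0.8, (1, 0): 0.5,
            (1, 2): 0.2, (2, 1): 0.16, (2, 2): 0.08}
P, Q = factorize(observed, n_rows=3, n_cols=3)
unknown_value = predict(P, Q, 0, 2)  # entry (0, 2) was never observed
```

Only observed entries contribute to the loss, so the learned factors can be used to fill in the cells of the matrix that have no reported activity.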


2019

Using Grover's search quantum algorithm to solve Boolean satisfiability problems: Part I

Authors
Fernandes, D; Dutra, I;

Publication
XRDS

Abstract

2019

Characterizing Bipolar Disorder-Associated Single Nucleotide Polymorphisms in a Large British Cohort Using Association Rules

Authors
Pinheira, A; Silva Dias, Rd; Nascimento, C; Dutra, I;

Publication
CIBB

Abstract
Bipolar Disorder (BD) is a chronic and severe psychiatric illness presenting with mood alterations, including manic, hypomanic and depressive episodes. Due to the high clinical heterogeneity and lack of biological validation, both BD treatment and diagnosis are still problematic. Patients and clinicians would benefit from better clinical and biological characterization, ultimately opening new possibilities for distinct forms of treatment. In this context, we studied genome-wide association (GWA) data from the Wellcome Trust Case Control Consortium (WTCCC). After an exploratory analysis, we found a higher prevalence of homozygous compared with heterozygous genotypes in different single nucleotide polymorphisms (SNPs) in genes previously associated with BD risk. Results from our association rules analysis indicate that there is a group of patients presenting with different groups of genotypes, including pairs or triples, while others present only one. We performed the same analysis with a control group from the same cohort (WTCCC) and found that although healthy subjects may present the same SNP combinations, the risk alleles occur at a lower frequency. Moreover, no subject in the control group presented the same pairs or triples of genotypes found in the BD group, and if a pair or triple is found, the support and confidence are lower than in the BD group (< 50%).
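
The support and confidence measures referred to in this abstract can be computed as in the sketch below. It uses the standard definitions from association rule mining; the function names, the genotype labels (`snp1=AA`, `snp2=GG`), and the five toy subjects are hypothetical and are not taken from the WTCCC data.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of antecedent and consequent together, divided by the
    support of the antecedent alone (0.0 if the antecedent never occurs)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical subjects, each described by genotype labels and a BD flag.
subjects = [
    {"snp1=AA", "snp2=GG", "BD"},
    {"snp1=AA", "snp2=GG", "BD"},
    {"snp1=AA", "BD"},
    {"snp1=AA", "snp2=GG"},
    {"snp2=GG"},
]
pair = {"snp1=AA", "snp2=GG"}
pair_support = support(subjects, pair)             # 3 of 5 subjects
pair_confidence = confidence(subjects, pair, {"BD"})  # 2 of those 3 have BD
```

A rule such as "pair of genotypes implies BD" is considered strong when both values are high; comparing the same rule between a patient group and a control group is one way to read the "< 50%" threshold mentioned above.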


2019

Using Grover's search quantum algorithm to solve Boolean satisfiability problems, part 2

Authors
Fernandes, D; Silva, C; Dutra, I;

Publication
XRDS

Abstract

2021

Predictive Maintenance for Sensor Enhancement in Industry 4.0

Authors
Silva, C; da Silva, MF; Rodrigues, A; Silva, J; Costa, VS; Jorge, A; Dutra, I;

Publication
ACIIDS (Companion)

Abstract
This paper presents an effort to handle 400+ GBytes of sensor data in a timely manner in order to produce Predictive Maintenance (PdM) models. We follow a data-driven methodology, using state-of-the-art Python libraries, such as Dask and Modin, which can handle big data. We use Dynamic Time Warping to describe sensor behavior, an anomaly detection method (Matrix Profile), and forecasting methods (AutoRegressive Integrated Moving Average - ARIMA, Holt-Winters and Long Short-Term Memory - LSTM). The data was collected by various sensors in an industrial context and is composed of attributes that define their activity and characterize the environment in which they are installed, e.g. optical, temperature, pollution and working hours. We successfully highlighted aspects of the behavior of all sensors and produced forecast models for distinct series of sensors, despite the size of the data.
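
Dynamic Time Warping, which this abstract uses to describe sensor behavior, can be illustrated with the classic dynamic-programming formulation below. This is a small plain-Python sketch under stated assumptions, not the pipeline from the paper (which relies on libraries such as Dask and Modin): the function name `dtw_distance`, the absolute-difference local cost, and the sample readings are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences,
    using |x - y| as the local cost and no window constraint."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # repeat b[j-1]
                                     cost[i][j - 1],      # repeat a[i-1]
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


# Hypothetical temperature readings from two sensors; the second one
# holds the middle value for one extra time step.
sensor_a = [1, 2, 3]
sensor_b = [1, 2, 2, 3]
stretched = dtw_distance(sensor_a, sensor_b)  # 0.0: same shape, different pace
shifted = dtw_distance([0, 0], [1, 1])        # 2.0: constant offset of 1
```

Because the alignment may stretch or compress the time axis, two sensors that follow the same pattern at different speeds get a small distance, while a genuine difference in level still shows up.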
