2019
Authors
Moulton, RH; Viktor, HL; Japkowicz, N; Gama, J;
Publication
CoRR
Abstract
2019
Authors
Costa Júnior, JD; de Faria, ER; Andrade Silva, Jd; Gama, J; Cerri, R;
Publication
BRACIS
Abstract
In Multi-Label Stream Classification (MLSC) examples arriving in a stream can be simultaneously classified into multiple classes. This is a very challenging task, especially considering that new classes can emerge during the stream (Concept Evolution), and known classes can change over time (Concept Drift). In real situations, these characteristics come together with a scenario with Infinitely Delayed Labels, where we can never access the true class labels of the examples to update classifiers. In order to overcome these issues, this paper proposes a new method called MultI-label learNing Algorithm for Data Streams with Binary Relevance transformation (MINAS-BR). Our proposal uses a new Novelty Detection (ND) procedure to detect concept evolution and concept drift, being updated in an unsupervised fashion. We also propose a new methodology to evaluate MLSC methods in scenarios with Infinitely Delayed Labels. Experiments over synthetic data sets attested the potential of MINAS-BR, which was able to adapt to different concept drift and concept evolution scenarios, obtaining superior or competitive performances in comparison to literature baselines.
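The Binary Relevance transformation named in the abstract above can be illustrated with a minimal sketch. This is not the MINAS-BR implementation: the function name `binary_relevance_split`, the toy stream, and the label names are all hypothetical, and the novelty-detection and unsupervised-update steps are not covered. It only shows how one multi-label problem becomes one binary problem per label.

```python
from typing import Dict, List, Sequence, Set, Tuple

Example = Tuple[Sequence[float], Set[str]]


def binary_relevance_split(
    examples: Sequence[Example],
) -> Dict[str, List[Tuple[Sequence[float], int]]]:
    """Turn one multi-label dataset into one binary dataset per label.

    Each output dataset keeps every example and marks it 1 if the
    example carries that label, 0 otherwise.
    """
    # Collect every label seen in the data (hypothetical helper logic).
    labels = sorted(set().union(*(y for _, y in examples)))
    return {
        label: [(x, int(label in y)) for x, y in examples]
        for label in labels
    }


# Toy stream with invented features and labels, for illustration only.
stream: List[Example] = [
    ([0.1, 0.9], {"sports"}),
    ([0.8, 0.2], {"politics", "economy"}),
    ([0.5, 0.5], {"sports", "economy"}),
]
per_label = binary_relevance_split(stream)
```

Each entry of `per_label` could then be handed to an independent binary learner, which is the sense in which the labels are treated separately.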
2019
Authors
Demircioglu, D; Cukuroglu, E; Kindermans, M; Nandi, T; Calabrese, C; Fonseca, NA; Kahles, A; Lehmann, KV; Stegle, O; Brazma, A; Brooks, AN; Ratsch, G; Tan, P; Goke, J;
Publication
CELL
Abstract
Most human protein-coding genes are regulated by multiple, distinct promoters, suggesting that the choice of promoter is as important as its level of transcriptional activity. However, while a global change in transcription is recognized as a defining feature of cancer, the contribution of alternative promoters still remains largely unexplored. Here, we infer active promoters using RNA-seq data from 18,468 cancer and normal samples, demonstrating that alternative promoters are a major contributor to context-specific regulation of transcription. We find that promoters are deregulated across tissues, cancer types, and patients, affecting known cancer genes and novel candidates. For genes with independently regulated promoters, we demonstrate that promoter activity provides a more accurate predictor of patient survival than gene expression. Our study suggests that a dynamic landscape of active promoters shapes the cancer transcriptome, opening new diagnostic avenues and opportunities to further explore the interplay of regulatory mechanisms with transcriptional aberrations in cancer.
2019
Authors
Athar, A; Füllgrabe, A; George, N; Iqbal, H; Huerta, L; Ali, A; Snow, C; Fonseca, NA; Petryszak, R; Papatheodorou, I; Sarkans, U; Brazma, A;
Publication
NUCLEIC ACIDS RESEARCH
Abstract
ArrayExpress (https://www.ebi.ac.uk/arrayexpress) is an archive of functional genomics data from a variety of technologies assaying functional modalities of a genome, such as gene expression or promoter occupancy. The number of experiments based on sequencing technologies, in particular RNA-seq experiments, has been increasing over the last few years and submissions of sequencing data have overtaken microarray experiments in the last 12 months. Additionally, there is a significant increase in experiments investigating single cells, rather than bulk samples, known as single-cell RNA-seq. To accommodate these trends, we have substantially changed our submission tool Annotare which, along with raw and processed data, collects all metadata necessary to interpret these experiments. Selected datasets are re-processed and loaded into our sister resource, the value-added Expression Atlas (and its component Single Cell Expression Atlas), which not only enables users to interpret the data easily but also serves as a test for data quality. With an increasing number of studies that combine different assay modalities (multi-omics experiments), a new, more general archival resource, the BioStudies Database, has been developed, which will eventually supersede ArrayExpress. Data submissions will continue unchanged; all existing ArrayExpress data will be incorporated into BioStudies and the existing accession numbers and application programming interfaces will be maintained.
2019
Authors
Santos, DF; Rodrigues, PP;
Publication
Int. J. Data Sci. Anal.
Abstract
In obstructive sleep apnea, respiratory effort is maintained but ventilation decreases or ceases because of partial or total upper-airway occlusion. This condition affects about 4% of men and 2% of women worldwide. This study aimed to define an auxiliary diagnostic method, based on risk and diagnostic factors, that can support the decision to perform polysomnography. Patients in our sample underwent polysomnography between January and May 2015. Two Bayesian classifiers were used to build the models: Naïve Bayes and Tree Augmented Naïve Bayes, using either 38 variables identified by literature review or a selection of just 6. Area under the ROC curve, sensitivity, specificity and predictive values were evaluated using leave-one-out and cross-validation techniques. Of 241 patients, only 194 fulfilled the inclusion criteria: 123 (63%) were male, the mean age was 58 years, 66 (34%) patients had a normal result and 128 (66%) had a diagnosis of obstructive sleep apnea. The cross-validated AUCs for each model were: NB38: 69.2%; TAN38: 69.0%; NB6: 74.6% and TAN6: 63.6%. Regarding the risk matrix, female gender presented a starting rate of 8%, compared with 20% for male gender, almost 3 times higher. The high (34%) proportion of normal results confirms the need for a pre-evaluation prior to polysomnography, making the search for a validated model to screen patients with suspicion of obstructive sleep apnea essential, especially at primary care level.
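The evaluation measures named in the abstract above (area under the ROC curve, sensitivity, specificity) can be computed as in the following sketch. The labels, scores, and the 0.5 decision threshold are invented for illustration and are not the study's data; the function names are hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case is scored
    above a random negative case, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Invented toy data: 1 = apnea diagnosed, 0 = normal result.
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.6, 0.4, 0.5, 0.2]          # hypothetical model outputs
y_pred = [int(s >= 0.5) for s in scores]    # assumed 0.5 threshold

auc = roc_auc(y_true, scores)
sens, spec = sensitivity_specificity(y_true, y_pred)
```

The pairwise definition of AUC used here is quadratic in the sample size, which is acceptable for a cohort of a few hundred patients but not for large datasets.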
2019
Authors
Bischoff, F; Carmo Koch, Md; Rodrigues, PP;
Publication
EFMI-STC
Abstract
The current algorithm to support platelet stock management assumes that there are always sufficient whole blood donations (WBD) to produce the required amount of pooled platelets. Unfortunately, the blood donation rate is uncertain, so pooled platelet production needs to be backed up with single-donor (apheresis) collections to compensate for periods of low WBD. The aim of this work was to predict the daily number of WBD to a tertiary care center in order to preemptively account for a decrease in platelet production. We collected 62,248 blood donations over 3 years, the daily count of which was used to feed (standalone and ensemble versions of) six prediction models, which were evaluated using the Mean Absolute Error (MAE). Forecast models showed better performance, with a MAE of about 8.6 donations, 34% better than using means or medians alone. Trend lines of donations are better modeled by autoregressive integrated moving average (ARIMA) using a frequency of 365 days, the trade-off being the need for at least two years of data.
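The comparison described in the abstract above, a forecast scored by MAE against a mean-only baseline, can be sketched as follows. All daily counts and forecasts here are invented for illustration; they are not the study's data, and the forecast values stand in for the output of any model (the abstract's ARIMA model is not reimplemented).

```python
from statistics import mean
from typing import Sequence


def mean_absolute_error(
    actual: Sequence[float], predicted: Sequence[float]
) -> float:
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Invented daily donation counts (hypothetical values).
history = [50, 60, 55, 65]      # days used to fit the baseline
actual = [58, 62, 70]           # days to be predicted

# Baseline: predict the historical mean for every future day.
baseline_forecast = [mean(history)] * len(actual)

# Hypothetical output of some forecasting model for the same days.
model_forecast = [60, 61, 66]

baseline_mae = mean_absolute_error(actual, baseline_forecast)
model_mae = mean_absolute_error(actual, model_forecast)

# Relative improvement of the model over the baseline.
improvement = 1 - model_mae / baseline_mae
```

A relative improvement computed this way is the kind of figure the abstract reports as "34% better"; with these toy numbers the value differs, since the data are made up.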