2017
Authors
Gama, J;
Publication
Encyclopedia of Machine Learning and Data Mining
Abstract
Clustering is one of the most popular data mining techniques. In this article, we review the relevant methods and algorithms for designing cluster algorithms under the data streams computational model, and discuss research directions in tracking evolving clusters. © Springer Science+Business Media New York 2011, 2017
2017
Authors
Duarte, J; Gama, J;
Publication
SAC
Abstract
Feature selection and feature ranking are two aspects of the same learning task. They are well studied in batch scenarios, but not in the streaming setting. This paper presents a study on feature ranking from data streams in online learning regression models. The main challenge here is that the relevance of features might change over time: features relevant in the past might be irrelevant now, and vice versa. We propose three new online feature ranking algorithms designed for Hoeffding algorithms. We have implemented the three methods in AMRules, a streaming regression algorithm to learn model rules. We compare their behaviour experimentally and present the pros and cons of each method.
2017
Authors
Oliveira, E; Cardoso, HL; Gama, J; Vale, Z;
Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
2017
Authors
Oliveira, E; Gama, J; Vale, Z; Lopes Cardoso, H;
Publication
Lecture Notes in Computer Science
Abstract
2017
Authors
Cavadas, B; Ferreira, J; Camacho, R; Fonseca, NA; Pereira, L;
Publication
PACBB
Abstract
The huge amount of genomic and transcriptomic data obtained to characterize human diversity can also be exploited to indirectly gather information on the human microbiome. Here we present the pipeline QmihR, designed to identify and quantify the abundance of known microbiome communities and to search for new/rare pathogenic species in RNA-seq datasets. We applied QmihR to 36 RNA-seq tumor tissue samples from Ukrainian gastric carcinoma patients available in TCGA, in order to characterize their microbiome and check the efficiency of the pipeline. The microbes present in the samples were in accordance with published data in other European datasets, and the independent BLAST evaluation of microbiome-aligned reads confirmed that the assigned species presented the highest BLAST match-hits. QmihR is available at GitHub (https://github.com/Pereira-lab/QmihR).
2017
Authors
Petryszak, R; Fonseca, NA; Füllgrabe, A; Huerta, L; Keays, M; Tang, YA; Brazma, A;
Publication
Bioinform.
Abstract
Motivation: The exponential growth of publicly available RNA-sequencing (RNA-Seq) data poses an increasing challenge to researchers wishing to discover, analyse and store such data, particularly those based in institutions with limited computational resources. EMBL-EBI is in an ideal position to address these challenges and to allow the scientific community easy access to not just raw, but also processed RNA-Seq data. We present a Web service to access the results of a systematically and continually updated standardized alignment as well as gene and exon expression quantification of all public bulk (and in the near future also single-cell) RNA-Seq runs in 264 species in the European Nucleotide Archive, using Representational State Transfer. Results: The RNASeq-er API (Application Programming Interface) enables ontology-powered search for and retrieval of CRAM, bigwig and bedGraph files, gene and exon expression quantification matrices (Fragments Per Kilobase Of Exon Per Million Fragments Mapped, Transcripts Per Million, raw counts) as well as sample attributes annotated with ontology terms. To date over 270 000 RNA-Seq runs in nearly 10 000 studies (1 PB of raw FASTQ data) in 264 species in ENA have been processed and made available via the API.