2016
Authors
Sousa, R; Gama, J;
Publication
Proceedings of the Workshop on Large-scale Learning from Data Streams in Evolving Environments (STREAMEVOLV 2016) co-located with the 2016 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2016), Riva del Garda, Italy, September 23, 2016.
Abstract
Machine Learning and Data Mining research strongly depends on the quality and quantity of real-world datasets for the evaluation stages of the methods being developed. In the context of the emerging Online Multi-Target Regression and Multi-Label Classification methodologies, datasets present new characteristics that require specific testing and pose new challenges. The first difficulty found in evaluation is the small number of examples, caused by data damage, privacy preservation or the high cost of acquisition. Secondly, events of interest such as data changes are difficult to find in datasets from specific domains, since such events are naturally scarce. For these reasons, this work proposes a method for producing synthetic datasets with desired properties (number of examples, data change events, ...) for the evaluation of Multi-Target Regression and Multi-Label Classification methods. These datasets are produced using First Principle Models, which give them more realistic and representative properties, such as real-world meaning (physical, financial, ...) for the output and input variables. This type of dataset generation can be used to produce infinite streams and to evaluate incremental methods such as online anomaly and change detection. This paper illustrates the use of synthetic data generation through two showcases of data change evaluation.
2016
Authors
Gavaldà, R; Žliobaite, I; Gama, J;
Publication
CEUR Workshop Proceedings
Abstract
2016
Authors
Fanaee-T, H; Gama, J;
Publication
CoRR
Abstract
2016
Authors
Camacho, R; Barbosa, JG; Sampaio, AM; Ladeiras, J; Fonseca, NA; Costa, VS;
Publication
Resource Management for Big Data Platforms - Algorithms, Modelling, and High-Performance Computing Techniques
Abstract
2016
Authors
Tello Ruiz, MK; Stein, J; Wei, S; Preece, J; Olson, A; Naithani, S; Amarasinghe, V; Dharmawardhana, P; Jiao, YP; Mulvaney, J; Kumari, S; Chougule, K; Elser, J; Wang, B; Thomason, J; Bolser, DM; Kerhornou, A; Walts, B; Fonseca, NA; Huerta, L; Keays, M; Tang, YA; Parkinson, H; Fabregat, A; McKay, S; Weiser, J; D'Eustachio, P; Stein, L; Petryszak, R; Kersey, PJ; Jaiswal, P; Ware, D;
Publication
Nucleic Acids Research
Abstract
Gramene (http://www.gramene.org) is an online resource for comparative functional genomics in crops and model plant species. Its two main frameworks are genomes (collaboration with Ensembl Plants) and pathways (The Plant Reactome and archival BioCyc databases). Since our last NAR update, the database website has adopted a new Drupal management platform. The genomes section features 39 fully assembled reference genomes that are integrated using ontology-based annotation and comparative analyses, and accessed through both visual and programmatic interfaces. Additional community data, such as genetic variation, expression and methylation, are also mapped for a subset of genomes. The Plant Reactome pathway portal (http://plantreactome.gramene.org) provides a reference resource for analyzing plant metabolic and regulatory pathways. In addition to ~200 curated rice reference pathways, the portal hosts gene homology-based pathway projections for 33 plant species. Both the genome and pathway browsers interface with the EMBL-EBI's Expression Atlas to enable the projection of baseline and differential expression data from curated expression studies in plants. Gramene's archive website (http://archive.gramene.org) continues to provide previously reported resources on comparative maps, markers and QTL. To further aid our users, we have also introduced a live monthly educational webinar series and a Gramene YouTube channel carrying video tutorials.
2016
Authors
Petryszak, R; Keays, M; Tang, YA; Fonseca, NA; Barrera, E; Burdett, T; Füllgrabe, A; Pomer Fuentes, AM; Jupp, S; Koskinen, S; Mannion, O; Huerta, L; Megy, K; Snow, C; Williams, E; Barzine, M; Hastings, E; Weisser, H; Wright, J; Jaiswal, P; Huber, W; Choudhary, J; Parkinson, HE; Brazma, A;
Publication
Nucleic Acids Research
Abstract
Expression Atlas (http://www.ebi.ac.uk/gxa) provides information about gene and protein expression in animal and plant samples of different cell types, organism parts, developmental stages, diseases and other conditions. It consists of selected microarray and RNA-sequencing studies from ArrayExpress, which have been manually curated, annotated with ontology terms, checked for high quality and processed using standardised analysis methods. Since the last update, Atlas has grown sevenfold (1572 studies as of August 2015), and incorporates baseline expression profiles of tissues from Human Protein Atlas, GTEx and FANTOM5, and of cancer cell lines from ENCODE, CCLE and Genentech projects. Plant studies constitute a quarter of Atlas data. For genes of interest, the user can view baseline expression in tissues, and differential expression for biologically meaningful pairwise comparisons, estimated using consistent methodology across all of Atlas. Our first proteomics study in human tissues is now displayed alongside transcriptomics data in the same tissues. Novel analyses and visualisations include: 'enrichment' in each differential comparison of GO terms, Reactome, Plant Reactome pathways and InterPro domains; hierarchical clustering (by baseline expression) of most variable genes and experimental conditions; and, for a given gene-condition, distribution of baseline expression across biological replicates. © The Author(s) 2015.