2017
Authors
Cachada, M; Abdulrahman, SM; Brazdil, P;
Publication
Proceedings of the International Workshop on Automatic Selection, Configuration and Composition of Machine Learning Algorithms co-located with the European Conference on Machine Learning & Principles and Practice of Knowledge Discovery in Databases, AutoML@PKDD/ECML 2017, Skopje, Macedonia, September 22, 2017.
Abstract
Machine learning users need methods that help them identify algorithms, or even workflows (combinations of algorithms with preprocessing tasks, with or without hyperparameter configurations that differ from the defaults), that are likely to achieve the best performance. Our study focused on average ranking (AR), an algorithm selection method that exploits meta-data obtained on prior datasets. We extended the use of the variant AR*, which takes A3R (a measure combining accuracy and run time) as the relevant metric. The extension concerns the diversity of the portfolio of workflows made available to AR. Our aim was to establish whether feature selection and different hyperparameter configurations improve the process of identifying a good solution. To evaluate our proposal, we carried out extensive experiments in leave-one-out mode. The results show that AR* was able to select workflows that are likely to lead to good results, especially when the portfolio is diverse. We additionally compared AR* with Auto-WEKA, running with different time budgets. Our proposed method shows some advantage over Auto-WEKA, particularly when the time budgets are small.
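As a rough illustration of the average-ranking idea described in this abstract, the sketch below ranks workflows on each prior dataset by a combined accuracy/run-time score and then averages the ranks. It is not the authors' implementation. The score form (accuracy ratio divided by a damped run-time ratio), the exponent `p`, the choice of reference workflow, and all names and numbers are assumptions made for the example.

```python
from statistics import mean


def a3r_score(accuracy, runtime, ref_accuracy, ref_runtime, p=0.05):
    """Assumed combined score: accuracy ratio over a damped run-time ratio.

    Larger p penalises slow workflows more strongly; p=0.05 is illustrative.
    """
    return (accuracy / ref_accuracy) / ((runtime / ref_runtime) ** p)


def rank_scores(scores):
    """Map each workflow to its rank (1 = best); ties share the mean rank."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of workflows tied with position i.
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        shared = ((i + 1) + (j + 1)) / 2
        for name in ordered[i:j + 1]:
            ranks[name] = shared
        i = j + 1
    return ranks


def average_ranking(meta_data, reference, p=0.05):
    """Return workflows ordered by mean rank across all prior datasets.

    meta_data: {dataset: {workflow: (accuracy, runtime_seconds)}}
    reference: workflow whose results normalise the scores on each dataset.
    """
    per_dataset_ranks = []
    for results in meta_data.values():
        ref_acc, ref_time = results[reference]
        scores = {
            name: a3r_score(acc, time, ref_acc, ref_time, p)
            for name, (acc, time) in results.items()
        }
        per_dataset_ranks.append(rank_scores(scores))
    workflows = per_dataset_ranks[0].keys()
    avg = {w: mean(r[w] for r in per_dataset_ranks) for w in workflows}
    return sorted(avg, key=avg.get)


# Hypothetical meta-data: two prior datasets, three workflows.
META = {
    "d1": {"tree": (0.80, 1.0), "svm": (0.85, 50.0), "nb": (0.70, 0.5)},
    "d2": {"tree": (0.90, 2.0), "svm": (0.88, 80.0), "nb": (0.75, 1.0)},
}
recommended_order = average_ranking(META, reference="tree", p=1.0)
```

For a new dataset, the workflows would be tried in `recommended_order`, which is why fast and reasonably accurate workflows tend to come first when run time enters the score.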
2017
Authors
Brazdil, P; Vanschoren, J; Hutter, F; Hoos, H;
Publication
AutoML@PKDD/ECML
Abstract
2017
Authors
Souza Roza, R; Brazdil, P; Reis, JL; Cerdeira, A; Martins, P; Felgueiras, O;
Publication
Atas da Conferencia da Associacao Portuguesa de Sistemas de Informacao
Abstract
Combining information obtained by applying data mining techniques to physicochemical and organoleptic data made it possible to identify similarities between the wines of the nine sub-regions of the Demarcated Region of Vinho Verde. Through clustering techniques, four clusters were identified, each characterized by its centroid. The information gain measure, together with supervised rule-based learning, was used to find the differentiating characteristics. This study revealed how the characteristics of the wines of these sub-regions are interconnected, which can improve decision making on the profiles of these wines.
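The information gain measure mentioned in this abstract can be sketched as follows: it is the reduction in entropy of the cluster labels once the samples are split by a discretized attribute. The attribute values and cluster labels below are invented for illustration and are not taken from the study.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after the split."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical data: four wines, two clusters, two discretized attributes.
clusters = ["c1", "c1", "c2", "c2"]
acidity = ["low", "low", "high", "high"]   # separates the clusters perfectly
colour = ["pale", "deep", "pale", "deep"]  # carries no information here

gain_acidity = information_gain(acidity, clusters)
gain_colour = information_gain(colour, clusters)
```

An attribute with high gain, such as the made-up `acidity` above, is a candidate differentiating characteristic between clusters.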
2017
Authors
Roxo, MT; Brito, PQ;
Publication
RECENT ADVANCES IN INFORMATION SYSTEMS AND TECHNOLOGIES, VOL 2
Abstract
Augmented Reality (AR) is no longer just a gimmick. Fifty years after the development of the first head-mounted display, and approaching the 20th anniversary of the first conference dedicated to AR, it is time for a new review of the theme. We therefore present a bibliometric analysis of the scientific literature since 1997, using the Web of Science as the database. This made it possible to identify the most relevant authors, their distribution by subject, the evolution of publishing by year and the most frequent publications.
2017
Authors
Oliveira Brochado, A; Brito, PQ; Oliveira Brochado, F;
Publication
SCIENCE & SPORTS
Abstract
The aim of this research is to analyze the correlates of adults' participation in sport and of the frequency of sport. A hurdle model approach, comprising a binary choice regression to model participation in sport and a count model to address frequency of sport, was applied to data obtained from 516 personal interviews in a Portuguese city. Participation in sport and frequent sport are associated with being male, younger, unmarried and without children under 2 years, with being a nonsmoker and a regular drinker, and with good perceived health. However, participation in sport and frequency of sport participation are associated with different levels of perception of the benefits of sport activity. Whereas awareness of the health and enjoyment benefits fosters participation, fitness, socializing and appearance might increase the frequency of sport. Sport communication strategies might play a prominent role in persuading potential participants of the benefits of sport activity and frequency.
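The two-part structure of a hurdle model can be sketched with a probability mass function: one part decides whether the count is zero or positive, and a zero-truncated count distribution governs the positive values. The abstract does not name the count distribution, so the Poisson choice here is an assumption, and the parameters are fixed numbers rather than the regression functions of covariates that the study would have estimated.

```python
from math import exp, factorial


def hurdle_poisson_pmf(k, p_participate, rate):
    """P(Y = k) under a hurdle model with a zero-truncated Poisson count part.

    p_participate: probability of crossing the hurdle (doing any sport).
    rate: Poisson rate of the count part (assumed distribution).
    """
    if k == 0:
        return 1.0 - p_participate
    poisson_k = exp(-rate) * rate ** k / factorial(k)
    # Rescale so the positive counts sum to one before weighting by p.
    truncated = poisson_k / (1.0 - exp(-rate))
    return p_participate * truncated


# Illustrative values only: 40% participate, rate 2.0 among participants.
prob_no_sport = hurdle_poisson_pmf(0, p_participate=0.4, rate=2.0)
total_mass = sum(hurdle_poisson_pmf(k, 0.4, 2.0) for k in range(60))
```

Because the two parts have separate parameters, a factor can raise the chance of participating without raising the frequency among participants, which matches the abstract's finding that the two outcomes have different correlates.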
2017
Authors
Teixeira, V; Camacho, R; Ferreira, PG;
Publication
2017 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM)
Abstract
Cancer genome projects are characterizing the genome, epigenome and transcriptome of a large number of samples using the latest high-throughput sequencing assays. The generated data sets pose several challenges for traditional statistical and machine learning methods. In this work we are interested in the task of deriving the most informative genes from a cancer gene expression data set. To that end, we built denoising autoencoders (DAE) and stacked denoising autoencoders and studied the influence of the input nodes on the final representation of the DAE. We also compared these deep learning approaches with other existing approaches. Our study is divided into two main tasks. First, we built and compared the performance of several feature extraction methods, as well as data sampling methods, using classifiers able to distinguish samples of thyroid cancer patients from samples of healthy individuals. In the second task, we investigated the possibility of building comprehensible descriptions of gene expression data by using denoising autoencoders and stacked denoising autoencoders as feature extraction methods. After extracting information related to the description built by the network, namely the connection weights, we devised post-processing techniques to extract comprehensible and biologically meaningful descriptions from the constructed models. We were able to build high-accuracy models that discriminate thyroid cancer patients from healthy individuals, but the extraction of comprehensible models is still very limited.