2017
Authors
Brazdil, P; Vilalta, R; Giraud Carrier, CG; Soares, C;
Publication
Encyclopedia of Machine Learning and Data Mining
Abstract
In machine learning and data mining, many diverse algorithms are now available, so selecting the most suitable one can be a challenge. This is aggravated by the fact that many algorithms require certain parameters to be set. If the wrong algorithm and/or parameter configuration is selected, substandard results may be obtained. Metalearning aims to facilitate this task. It typically proceeds in two phases. First, a set of algorithms A (e.g. classification algorithms) and a set of datasets D are identified, and different pairs <ai, dj> from these two sets are chosen for testing. Each dataset dj is described by certain meta-features which, together with the performance result of algorithm ai, constitute part of the metadata. In the second phase, the metadata is used to construct a model, usually again with recourse to machine learning methods. The model represents a generalization of the various base-level experiments. It can then be applied to a new dataset to recommend the most suitable algorithm or a ranking ordered by relative performance. This article provides more details about this area. It also discusses how the method can be combined with hyperparameter optimization and extended to sequences of operations (workflows). © Springer Science+Business Media New York 2011, 2017
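The two phases described in this abstract can be illustrated with a minimal sketch. It is not the method of the encyclopedia entry: the meta-features, algorithm names, and performance values are invented for illustration, and the meta-model is assumed to be a plain k-nearest-neighbour lookup over unnormalized meta-features.

```python
import math

# Phase 1 (assumed already run): one metadata row per dataset d_j, holding its
# meta-features and the measured performance of each algorithm a_i.
# All names and numbers here are hypothetical.
metadata = [
    {"meta_features": (100.0, 5.0), "performance": {"tree": 0.80, "knn": 0.70}},
    {"meta_features": (120.0, 6.0), "performance": {"tree": 0.82, "knn": 0.75}},
    {"meta_features": (5000.0, 50.0), "performance": {"tree": 0.65, "knn": 0.90}},
]


def recommend_ranking(new_meta_features, metadata, k=1):
    """Phase 2: rank algorithms by mean performance on the k most similar datasets."""
    neighbours = sorted(
        metadata,
        key=lambda row: math.dist(row["meta_features"], new_meta_features),
    )[:k]
    algorithms = neighbours[0]["performance"].keys()
    mean_perf = {
        a: sum(n["performance"][a] for n in neighbours) / len(neighbours)
        for a in algorithms
    }
    # Best-performing algorithm first.
    return sorted(mean_perf, key=mean_perf.get, reverse=True)


ranking = recommend_ranking((110.0, 5.0), metadata, k=2)
print(ranking)
```

In practice the meta-features would be scaled before computing distances, and the meta-model could be any learner; the nearest-neighbour choice here only keeps the sketch short.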
2017
Authors
Saleiro, P; Frayling, NM; Rodrigues, EM; Soares, C;
Publication
Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Shinjuku, Tokyo, Japan, August 7-11, 2017
Abstract
Improvements in entity-relationship (E-R) search techniques have been hampered by a lack of test collections, particularly for complex queries involving multiple entities and relationships. In this paper we describe a method for generating E-R test queries to support comprehensive E-R search experiments. Queries and relevance judgments are created from content that exists in tabular form, where columns represent entity types and the table structure implies one or more relationships among the entities. Editorial work involves creating natural language queries based on relationships represented by the entries in the table. We have publicly released the RELink test collection comprising 600 queries and relevance judgments obtained from a sample of Wikipedia List-of-lists-of-lists tables. The latter comprise tuples of entities that are extracted from columns and labelled by the corresponding entity types and the relationships they represent. To facilitate research in complex E-R retrieval, we have created and released as open source the RELink Framework, which includes Apache Lucene indexing and search specifically tailored to E-R retrieval. RELink includes entity and relationship indexing based on the ClueWeb-09-B Web collection with FACC1 text span annotations linked to Wikipedia entities. With ready-to-use search resources and a comprehensive test collection, we support the community in pursuing E-R research at scale. © 2017 ACM.
2017
Authors
Cunha, T; Soares, C; de Carvalho, ACPLF;
Publication
DISCOVERY SCIENCE, DS 2017
Abstract
Recommender Systems have become increasingly popular, propelling the emergence of several algorithms. As the number of algorithms grows, the selection of the most suitable algorithm for a new task becomes more complex. The development of new Recommender Systems would benefit from tools to support this selection. Metalearning has been used for similar purposes in other tasks, such as classification and regression. It learns predictive models that map the characteristics of a dataset to the predictive performance obtained by a set of algorithms. For this purpose, different types of characteristics have been proposed: statistical and/or information-theoretical, model-based and landmarkers. Recent studies argue that landmarkers are successful in selecting algorithms for different tasks. We propose a set of landmarkers for a Metalearning approach to the selection of Collaborative Filtering algorithms. The performance is compared with a state-of-the-art systematic metafeatures approach using statistical and/or information-theoretical metafeatures. The results show that the metalevel accuracy obtained using landmarkers is not statistically significantly better than that of the metafeatures obtained with a more traditional approach. Furthermore, the baselevel results obtained with the algorithms recommended using landmarkers are worse than the ones obtained with the other metafeatures. In summary, our results show that, contrary to the results obtained in other tasks, these landmarkers are not necessarily the best metafeatures for algorithm selection in Collaborative Filtering.
2017
Authors
Saleiro, P; Rodrigues, EM; Soares, C; Oliveira, EC;
Publication
Proceedings of the 11th International Workshop on Semantic Evaluation, SemEval@ACL 2017, Vancouver, Canada, August 3-4, 2017
Abstract
2017
Authors
Nogueira, AR; Ferreira, CA; Gama, J;
Publication
Foundations of Intelligent Systems - 23rd International Symposium, ISMIS 2017, Warsaw, Poland, June 26-29, 2017, Proceedings
Abstract
This work aims to help in the correct and early diagnosis of acute kidney injury through the application of data mining techniques. The main goal is for these techniques to be implemented in Intensive Care Units (ICUs) as an alarm system, to assist health professionals in the diagnosis of this disease. The techniques predict the future state of a patient based on the patient's current medical state and the type of ICU. Through the comparison of three different approaches (Markov Chain Model, Markov Chain Model ICU Specialists and Random Forest), we came to the conclusion that the best method is the Markov Chain Model ICU Specialists. © Springer International Publishing AG 2017.
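As a rough illustration of how a Markov chain can predict a patient's next state from the current one, the sketch below estimates first-order transition probabilities from observed state sequences. It is not the model from the paper: the state labels and sequences are hypothetical, and the abstract's ICU-specific variant is assumed here to amount to fitting one such chain per ICU type.

```python
from collections import Counter, defaultdict


def fit_transitions(sequences):
    """Estimate P(next state | current state) from observed state sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        state: {t: c / sum(ctr.values()) for t, c in ctr.items()}
        for state, ctr in counts.items()
    }


def predict_next(transitions, state):
    """Return the most probable next state given the current state."""
    return max(transitions[state], key=transitions[state].get)


# Hypothetical patient trajectories; the state names are invented.
sequences = [
    ["stable", "risk", "injury"],
    ["stable", "stable", "risk"],
    ["risk", "injury", "injury"],
]

transitions = fit_transitions(sequences)
print(predict_next(transitions, "stable"))
```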
2017
Authors
Sarmento, RP; Cordeiro, M; Brazdil, P; Gama, J;
Publication
Complex Networks & Their Applications VI - Proceedings of Complex Networks 2017 (The Sixth International Conference on Complex Networks and Their Applications), COMPLEX NETWORKS 2017, Lyon, France, November 29 - December 1, 2017.
Abstract
Social Network Analysis (SNA) is an important research area. It originated in sociology but has spread to other areas of research, including anthropology, biology, information science, organizational studies, political science, and computer science. This has stimulated research on how to support SNA with the development of new algorithms. One of the critical areas involves the calculation of different centrality measures. The challenge is how to do this quickly, as increasingly large datasets become available. Our contribution is an incremental version of the Laplacian Centrality measure that can be applied not only to large graphs but also to dynamically changing networks. We have conducted several tests with different types of evolving networks. We show that our incremental version can process a given large network faster than the corresponding batch version in both incremental and fully dynamic network setups. © Springer International Publishing AG 2018.
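For reference, the batch Laplacian Centrality of a node in an unweighted, undirected graph can be sketched as the drop in Laplacian energy when that node is removed, which reduces to a closed form over the node's degree and its neighbours' degrees. The code below covers only this batch definition under those assumptions; the incremental algorithm from the paper is not reproduced, and the helper names are illustrative.

```python
def laplacian_centrality(adj):
    """Closed form: d_v^2 + d_v + 2 * (sum of the neighbours' degrees).

    `adj` maps each node to the set of its neighbours (unweighted, undirected).
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    return {
        v: degree[v] ** 2 + degree[v] + 2 * sum(degree[u] for u in adj[v])
        for v in adj
    }


def laplacian_energy(adj):
    """Sum of squared degrees plus twice the number of edges."""
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    return sum(len(nbrs) ** 2 for nbrs in adj.values()) + 2 * n_edges


def remove_node(adj, v):
    """Return a copy of the graph without node v and its incident edges."""
    return {u: nbrs - {v} for u, nbrs in adj.items() if u != v}


# Path graph a - b - c.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
scores = laplacian_centrality(adj)
print(scores)
```

Because the closed form depends only on a node's degree and its neighbours' degrees, adding or removing an edge changes the scores of the two endpoints and their neighbours only, which is the kind of locality an incremental update can exploit.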