Publicações

Publicações por LIAAD

2015

Medical Mining

Authors
Spiliopoulou, M; Rodrigues, PP; Menasalvas, E;

Publication
Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '15

2015

28th IEEE International Symposium on Computer-Based Medical Systems, CBMS 2015, Sao Carlos, Brazil, June 22-25, 2015

Authors
Jr., CT; Rodrigues, PP; Kane, B; Marques, PMdA; Traina, AJM;

Publication
CBMS

2015

Algorithm selection via meta-learning and sample-based active testing

Authors
Abdulrahman, SM; Brazdil, P; Van Rijn, JN; Vanschoren, J;

Publication
CEUR Workshop Proceedings

Abstract
Identifying the best machine learning algorithm for a given problem continues to be an active area of research. In this paper we present a new method which exploits both meta-level information acquired in past experiments and active testing, an algorithm selection strategy. Active testing iteratively attempts to identify an algorithm whose performance will most likely exceed that of the previously tried algorithms. The novel method described in this paper uses tests on smaller data samples to rank the most promising candidates, thus optimizing the schedule of experiments to be carried out. The experimental results show that this approach leads to considerably faster algorithm selection.
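The core idea of the abstract can be sketched in a few lines: score every candidate cheaply on a small sample, then spend the expensive full evaluations only on the top-ranked candidates. This is a minimal illustrative sketch, not the authors' exact procedure; the function names and the fixed evaluation budget are assumptions.

```python
def active_testing(candidates, eval_on_sample, eval_full, budget=3):
    """Rank candidates by cheap small-sample scores, then spend the
    full-evaluation budget only on the most promising ones
    (a simplified sketch of sample-based active testing)."""
    # Cheap pass: score every candidate on a small data sample.
    ranked = sorted(candidates, key=eval_on_sample, reverse=True)
    best_name, best_score = None, float("-inf")
    # Expensive pass: full evaluation only for the top-ranked candidates.
    for cand in ranked[:budget]:
        score = eval_full(cand)
        if score > best_score:
            best_name, best_score = cand, score
    return best_name, best_score


# Hypothetical accuracies: the sample-based ranking is noisy but cheap.
full = {"knn": 0.80, "tree": 0.85, "svm": 0.90, "nb": 0.70}
sample = {"knn": 0.78, "tree": 0.88, "svm": 0.86, "nb": 0.65}
best, score = active_testing(list(full), sample.get, full.get, budget=2)
```

With a budget of 2, only the two candidates ranked highest on the sample ("tree" and "svm") are evaluated in full, and "svm" wins.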

2015

Fast Algorithm Selection Using Learning Curves

Authors
van Rijn, JN; Abdulrahman, SM; Brazdil, P; Vanschoren, J;

Publication
Advances in Intelligent Data Analysis XIV

Abstract
One of the challenges in Machine Learning is to find a classifier and parameter settings that work well on a given dataset. Evaluating all possible combinations typically takes too much time, hence many solutions have been proposed that attempt to predict which classifiers are most promising to try. As the first recommended classifier is not always the correct choice, multiple recommendations should be made, making this a ranking problem rather than a classification problem. Even though this is a well-studied problem, there is currently no good way of evaluating such rankings. We advocate the use of Loss Time Curves, as used in the optimization literature. These visualize the amount of budget (time) needed to converge to an acceptable solution. We also investigate a method that utilizes the measured performance of classifiers on small samples of data to make such recommendations, and adapt it so that it works well in Loss Time space. Experimental results show that this method converges extremely fast to an acceptable solution.
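A loss-time curve as described in the abstract can be computed directly: after each evaluated classifier, record the cumulative time spent and the gap between the best accuracy found so far and the optimum. The sketch below is a plain illustration of that idea under assumed inputs (a list of `(accuracy, cost)` pairs), not the paper's implementation.

```python
def loss_time_curve(runs, optimum):
    """Build a loss-time curve: after each evaluation, record
    (cumulative time, gap between best-so-far accuracy and the optimum)."""
    curve, elapsed, best = [], 0.0, float("-inf")
    for accuracy, cost in runs:
        elapsed += cost          # time budget spent so far
        best = max(best, accuracy)
        curve.append((elapsed, optimum - best))
    return curve


# Three hypothetical evaluations: (accuracy, time cost).
curve = loss_time_curve([(0.7, 1.0), (0.9, 2.0), (0.85, 1.0)], optimum=0.9)
```

The loss is monotonically non-increasing over time; once the optimum is reached the curve stays at zero, which is exactly what makes "area under the loss-time curve" a natural figure of merit for fast convergence.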

2015

Retrieval, visualization and validation of affinities between documents

Authors
Trigo, L; Víta, M; Sarmento, R; Brazdil, P;

Publication
IC3K 2015 - Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management

Abstract
We present an Information Retrieval tool that facilitates the user's task of searching for particular information of interest. Our system processes a given set of documents to produce a graph in which nodes represent documents and links represent their similarities. The aim is to offer the user a tool to navigate this space in an easy way; nodes can be collapsed and expanded. Our case study shows affinity groups based on similarities in the text production of researchers. This goes beyond the already established communities revealed by co-authorship. The system characterizes the activity of each author by a set of automatically generated keywords and by membership in a particular affinity group. The importance of each author is highlighted visually by the size of the corresponding node, which reflects the number of publications and different measures of centrality. Regarding the validation of the method, we analyse the impact of using different combinations of titles, abstracts and keywords on capturing the similarity between researchers.
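The similarity graph the abstract describes can be sketched with term-count vectors and cosine similarity, linking two documents when their similarity exceeds a threshold. This is a minimal sketch under assumed inputs; the 0.2 threshold and bag-of-words representation are illustrative choices, not the paper's method.

```python
from math import sqrt

def affinity_graph(docs, threshold=0.2):
    """Link documents whose cosine similarity over term counts exceeds
    a threshold (a minimal sketch of a document-affinity graph)."""
    def vec(text):
        counts = {}
        for word in text.lower().split():
            counts[word] = counts.get(word, 0) + 1
        return counts

    def cosine(a, b):
        dot = sum(a[w] * b[w] for w in a if w in b)
        na = sqrt(sum(v * v for v in a.values()))
        nb = sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    vecs = {name: vec(text) for name, text in docs.items()}
    names = sorted(docs)
    edges = []
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            sim = cosine(vecs[u], vecs[v])
            if sim > threshold:  # keep only sufficiently similar pairs
                edges.append((u, v, round(sim, 2)))
    return edges


# Toy corpus: two related documents and one unrelated one.
docs = {"a": "data mining graphs", "b": "graph data mining", "c": "cooking recipes"}
edges = affinity_graph(docs)
```

Only the pair sharing vocabulary is linked, so the resulting graph separates the "affinity group" from the unrelated document.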

2015

Density-based graph model summarization: Attaining better performance and efficiency

Authors
Valizadeh, M; Brazdil, P;

Publication
INTELLIGENT DATA ANALYSIS

Abstract
Several algorithms based on the PageRank algorithm have been proposed to rank document sentences in the field of multi-document summarization, and the LexRank and T-LexRank algorithms are well-known examples. In the literature, different concepts such as weighted inter-cluster edges, cluster-sensitive graph models and document-sensitive graph models have been proposed to improve the LexRank and T-LexRank algorithms (e.g. DsR-G, DsR-Q) for multi-document summarization. In this paper, a density-based graph model for multi-document summarization is proposed by adding the concept of density to the LexRank and T-LexRank algorithms. The resulting generic multi-document summarization systems, DensGS and DensGSD, were evaluated on DUC 2004, while the query-based variants, DensQS and DensQSD, were evaluated on DUC 2006, DUC 2007 and TAC 2010 task A. The ROUGE measure was used in the evaluation. Experimental results show that the density concept improves the LexRank and T-LexRank algorithms and outperforms previous graph-based models (DsR-G and DsR-Q) in generic and query-based multi-document summarization tasks. Furthermore, a comparison of the number of iterations indicates that the density-based algorithm is faster than the other PageRank-based algorithms.
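The baseline these systems build on, LexRank-style sentence ranking, is PageRank power iteration over a row-normalized sentence-similarity matrix. The sketch below shows that generic baseline only, not the paper's density-based variant; the damping factor and iteration count are conventional assumptions.

```python
def lexrank_scores(sim, damping=0.85, iters=50):
    """Rank sentences by power iteration over a row-normalized
    similarity matrix, PageRank/LexRank style (generic baseline sketch)."""
    n = len(sim)
    # Row-normalize similarities into transition probabilities.
    trans = [[v / sum(row) for v in row] for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iters):
        # Standard PageRank update with uniform teleportation.
        scores = [
            (1 - damping) / n
            + damping * sum(trans[j][i] * scores[j] for j in range(n))
            for i in range(n)
        ]
    return scores


# Toy 3-sentence similarity matrix: sentence 0 is similar to both others.
sim = [[1.0, 1.0, 1.0],
       [1.0, 1.0, 0.0],
       [1.0, 0.0, 1.0]]
scores = lexrank_scores(sim)
```

The most central sentence (row 0) receives the highest score, which is the signal a summarizer extracts; the density-based variants in the paper modify this graph model to converge in fewer iterations.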
