2013
Authors
Gomes, EF; Jorge, AM; Azevedo, PJ;
Publication
International C* Conference on Computer Science & Software Engineering, C3S2E13, Porto, Portugal - July 10 - 12, 2013
Abstract
The aim of this work is to describe an exploratory study on the use of a SAX-based Multiresolution Motif Discovery method for Heart Sound Classification. The idea of our work is to discover relevant frequent motifs in the audio signals and use the discovered motifs and their frequency as characterizing attributes. We also describe different configurations of motif discovery for defining attributes and compare a decision-tree-based algorithm with random forests on this kind of data. Experiments were performed with a dataset obtained from a clinical trial in hospitals using the digital stethoscope DigiScope. This exploratory study suggests that motifs contain valuable information that can be further exploited for Heart Sound Classification. © 2013 ACM.
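To make the idea of "motifs and their frequency as attributes" concrete, the following is a minimal single-resolution sketch, not the authors' multiresolution method. It assumes a fixed sliding window, a 4-letter SAX alphabet with the standard Gaussian breakpoints, and simply counts SAX words per signal; the function names (`sax_word`, `motif_counts`) and all parameter values are hypothetical.

```python
import math
from bisect import bisect_right
from collections import Counter

# Standard SAX breakpoints for a 4-letter alphabet: they split N(0, 1)
# into four equiprobable regions.
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
ALPHABET = "abcd"


def znorm(x):
    """Z-normalise a window; a flat window maps to all zeros."""
    n = len(x)
    mu = sum(x) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    if sd < 1e-8:
        return [0.0] * n
    return [(v - mu) / sd for v in x]


def paa(x, segments):
    """Piecewise Aggregate Approximation: mean of each equal-length segment."""
    size = len(x) // segments
    return [sum(x[i * size:(i + 1) * size]) / size for i in range(segments)]


def sax_word(window, segments):
    """Map one window to a SAX word of length `segments` (hypothetical helper)."""
    return "".join(ALPHABET[bisect_right(BREAKPOINTS, v)]
                   for v in paa(znorm(window), segments))


def motif_counts(signal, window, segments, step=1):
    """Count SAX words over sliding windows; frequent words act as candidate motifs."""
    if window % segments != 0:
        raise ValueError("window length must be a multiple of segments")
    counts = Counter()
    for start in range(0, len(signal) - window + 1, step):
        counts[sax_word(signal[start:start + window], segments)] += 1
    return counts
```

For instance, `sax_word([1, 2, 3, 4, 5, 6, 7, 8], 4)` returns `'abcd'`. The resulting `Counter` for each recording could then be turned into a fixed-length attribute vector (one column per word) and fed to a decision tree or random forest, as the abstract describes.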
2013
Authors
Domingues, MA; Soares, C; Jorge, AM;
Publication
INFORMATION SYSTEMS AND E-BUSINESS MANAGEMENT
Abstract
The goal of many web portals is to select, organize and distribute content in order to satisfy their users/customers. This process is usually based on meta-data that represent and describe content. In this paper we describe a methodology and a system to monitor the quality of the meta-data used to describe content in web portals. The methodology is based on the analysis of the meta-data using statistics, visualization and data mining tools. The methodology enables the site's editor to detect and correct problems in the description of contents, thus improving the quality of the web portal and the satisfaction of its users. We also define a general architecture for a system to support the proposed methodology. We have implemented this system and tested it on a Portuguese portal for management executives. The results validate the proposed methodology.
2013
Authors
Jorge, AM;
Publication
ACM International Conference Proceeding Series
Abstract
Recommender Systems are a hot application area these days, made popular by well-known websites. The problem of predicting user preferences is very demanding from the data mining algorithm design point of view, but it also poses challenges to evaluation and monitoring. Moreover, there is a lot of information that can be exploited, from clickstreams and background information to musical content and social interaction. As data grows and recommendation requests must be answered in a split second, online and agile solutions must be implemented. In this talk we will give a brief introduction to binary recommender systems, describe a particular hybrid application to music recommendation (from algorithm to online evaluation), and refer to context-aware and online recommender algorithms. © 2013 ACM.
2013
Authors
Motta, R; Nogueira, BM; Jorge, AM; De Andrade Lopes, A; Rezende, SO; De Oliveira, MCF;
Publication
Proceedings of the ACM Symposium on Applied Computing
Abstract
Cluster detection methods are widely studied in Propositional Data Mining. In this context, each example is individually represented as a feature vector. Such data has a natural non-relational structure, but it can be represented in a relational form through similarity-based network models. In these models, examples are represented by vertices, and an edge connects two examples with high similarity. This relational representation allows network-based algorithms from Relational Data Mining to be employed. Specifically, in clustering tasks, these models allow community detection algorithms for networks to be used to detect data clusters. In this work, we compared traditional clustering algorithms for non-relational data with cluster detection algorithms that operate on relational data and use community detection measures for networks. We carried out an exploratory analysis over 23 numerical datasets and 10 textual datasets. Results show that network models can efficiently represent the data topology, allowing their application in cluster detection with higher precision when compared to non-relational methods. Copyright 2013 ACM.
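As an illustration of turning feature vectors into a similarity-based network and then detecting communities, here is a small sketch. It assumes a symmetric k-nearest-neighbour network with Euclidean distance as the similarity model and uses a basic asynchronous label propagation pass as the community detection step; the abstract does not say which network model or community detection algorithms were used, so both choices, and the names `knn_network` and `label_propagation`, are hypothetical.

```python
import math
from collections import Counter


def knn_network(points, k):
    """Symmetric k-NN graph: i and j are linked if either is among the other's k nearest."""
    n = len(points)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        # Sort by (distance, index) so ties are broken deterministically.
        order = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for _, j in order[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def label_propagation(adj, max_iter=100):
    """Asynchronous label propagation; ties go to the smallest label."""
    labels = {v: v for v in adj}
    for _ in range(max_iter):
        changed = False
        for v in sorted(adj):
            if not adj[v]:
                continue
            # Adopt the label held by most neighbours.
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            new = min(lab for lab, c in counts.items() if c == best)
            if new != labels[v]:
                labels[v] = new
                changed = True
        if not changed:
            break
    # Group vertices that ended with the same label into one community.
    communities = {}
    for v, lab in labels.items():
        communities.setdefault(lab, set()).add(v)
    return list(communities.values())
```

With two well-separated groups of 2-D points, e.g. `[(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]` and `k=2`, the network consists of two triangles and the sketch returns the communities `{0, 1, 2}` and `{3, 4, 5}`. The deterministic tie-breaking trades the randomness of standard label propagation for reproducible output.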
2013
Authors
Torgo, L; Ribeiro, RP; Pfahringer, B; Branco, P;
Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2013
Abstract
Several real-world prediction problems involve forecasting rare values of a target variable. When this variable is nominal, we have a problem of class imbalance that has already been studied thoroughly within machine learning. For regression tasks, where the target variable is continuous, few works exist addressing this type of problem. Still, important application areas involve forecasting rare extreme values of a continuous target variable. This paper describes a contribution to this type of task. Namely, we propose to address such tasks by sampling approaches. These approaches change the distribution of the given training data set to reduce the imbalance between the rare target cases and the most frequent ones. We present a modification of the well-known Smote algorithm that allows its use on these regression tasks. In an extensive set of experiments we provide empirical evidence for the superiority of our proposals for these particular regression tasks. The proposed SmoteR method can be used with any existing regression algorithm, turning it into a general tool for addressing problems of forecasting rare extreme values of a continuous target variable. © 2013 Springer-Verlag.
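A simplified sketch of the SmoteR idea follows. It assumes that "rare" cases are simply those whose target is at or above a user-given threshold (rather than a relevance function over both tails), randomly undersamples the remaining cases, and oversamples rare cases by SMOTE-style interpolation between a rare seed and one of its k nearest rare neighbours. The synthetic target is a weighted average of the two seeds' targets, with weights inversely proportional to the new case's distance to each seed. The function name `smoter_sketch` and its parameters are hypothetical and do not reproduce the paper's implementation.

```python
import math
import random


def smoter_sketch(X, y, threshold, k=3, n_synthetic=100, undersample=0.5, seed=0):
    """Return a resampled (X, y): rare cases, a subsample of normal cases, synthetic rare cases."""
    rng = random.Random(seed)
    rare = [i for i, t in enumerate(y) if t >= threshold]
    normal = [i for i, t in enumerate(y) if t < threshold]
    # Undersample the frequent ("normal") cases.
    kept = rng.sample(normal, int(len(normal) * undersample))
    new_X = [list(X[i]) for i in rare + kept]
    new_y = [y[i] for i in rare + kept]
    if len(rare) < 2:
        return new_X, new_y  # not enough rare cases to interpolate between
    for _ in range(n_synthetic):
        i = rng.choice(rare)
        # k nearest rare neighbours of the seed case.
        neigh = sorted((j for j in rare if j != i),
                       key=lambda j: math.dist(X[i], X[j]))[:k]
        j = rng.choice(neigh)
        gap = rng.random()
        # New case lies on the segment between the two seeds.
        case = [a + gap * (b - a) for a, b in zip(X[i], X[j])]
        d1, d2 = math.dist(case, X[i]), math.dist(case, X[j])
        # Inverse-distance weighting: (y_i/d1 + y_j/d2) / (1/d1 + 1/d2).
        if d1 + d2 == 0:
            target = (y[i] + y[j]) / 2
        else:
            target = (d2 * y[i] + d1 * y[j]) / (d1 + d2)
        new_X.append(case)
        new_y.append(target)
    return new_X, new_y
```

Because the resampled data set is an ordinary list of cases, it can be passed to any regression learner, which mirrors the abstract's point that the approach is independent of the regression algorithm used.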
2013
Authors
Saleiro, P; Rei, L; Pasquali, A; Soares, C; Teixeira, J; Pinto, F; Nozari, M; Felix, C; Strecht, P;
Publication
CEUR Workshop Proceedings
Abstract
Filtering tweets relevant to a given entity is an important task for online reputation management systems. This contributes to a reliable analysis of opinions and trends regarding a given entity. In this paper we describe our participation in the Filtering Task of RepLab 2013. The goal of the competition is to classify a tweet as relevant or not relevant to a given entity. To address this task, we studied a large set of features that can be generated to describe the relationship between an entity and a tweet. We explored different learning algorithms as well as different types of features: text, keyword similarity scores between the entities' metadata and tweets, the Freebase entity graph, and Wikipedia. The test set of the competition comprises more than 90,000 tweets about 61 entities from four distinct categories: automotive, banking, universities and music. Results show that our approach achieves a Reliability of 0.72 and a Sensitivity of 0.45 on the test set, corresponding to an F-measure of 0.48 and an Accuracy of 0.908.