Publications

Publications by Jorge Miguel Silva

2015

A Parallel Computing Hybrid Approach for Feature Selection

Authors
Silva, J; Aguiar, A; Silva, F;

Publication
2015 IEEE 18TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ENGINEERING (CSE)

Abstract
The ultimate goal of feature selection is to select the smallest subset of features that yields minimum generalization error from an original set of features. This effectively reduces the feature space, and thus the complexity of classifiers. Though several algorithms have been proposed, no single one outperforms all the other in all scenarios, and the problem is still an actively researched field. This paper proposes a new hybrid parallel approach to perform feature selection. The idea is to use a filter metric to reduce feature space, and then use an innovative wrapper method to search extensively for the best solution. The proposed strategy is implemented on a shared memory parallel environment to speedup the process. We evaluated its parallel performance using up to 32 cores and our results show 30 times gain in speed. To test the performance of feature selection we used five datasets from the well known NIPS challenge and were able to obtain an average score of 95.90% for all solutions.

CloseRead Abstract

2014

VOCE Corpus: Ecologically Collected Speech Annotated with Physiological and Psychological Stress Assessments

Authors
Aguiar, A; Kaiseler, M; Cunha, M; Silva, J; Meinedo, H; Almeida, PR;

Publication
LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION

Abstract
Public speaking is a widely requested professional skill, and at the same time an activity that causes one of the most common adult phobias (Miller and Stone, 2009). It is also known that the study of stress under laboratory conditions, as it is most commonly done, may provide only limited ecological validity (Wilhelm and Grossman, 2010). Previously, we introduced an inter-disciplinary methodology to enable collecting a large amount of recordings under consistent conditions (Aguiar et al., 2013). This paper introduces the VOCE corpus of speech annotated with stress indicators under naturalistic public speaking (PS) settings. The novelty of this corpus is that the recordings are carried out in objectively stressful PS situations, as recommended in (Zanstra and Johnston, 2011). The current database contains a total of 38 recordings, 13 of which contain full psychologic and physiologic annotation. We show that the collected recordings validate the assumptions of the methodology, namely that participants experience stress during the PS events. We describe the various metrics that can be used for physiologic and psychologic annotation, and we characterise the sample collected so far, providing evidence that demographics do not affect the relevant psychologic or physiologic annotation. The collection activities are on-going, and we expect to increase the number of complete recordings in the corpus to 30 by June 2014.

CloseRead Abstract

2015

Speech Features for Discriminating Stress Using Branch and Bound Wrapper Search

Authors
Juliao, M; Silva, J; Aguiar, A; Moniz, H; Batista, F;

Publication
LANGUAGES, APPLICATIONS AND TECHNOLOGIES, SLATE 2015

Abstract
Stress detection from speech is a less explored field than Automatic Emotion Recognition and it is still not clear which features are better stress discriminants. The project VOCE aims at doing speech classification as stressed or not-stressed in real-time, using acoustic-prosodic features only. We therefore look for the best discriminating feature subsets from a set of 6125 features extracted with openSMILE toolkit plus 160 Teager Energy Operator (TEO) features. We use a Mutual Information (MI) filter and a branch and bound wrapper heuristic with an SVM classifier to perform feature selection. Since many feature sets are selected, we analyse them in terms of chosen features and classifier performance concerning also true positive and false positive rates. The results show that the best feature types for our application case are Audio Spectral, MFCC, PCM and TEO. We reached results as high as 70.4% for generalisation accuracy.

CloseRead Abstract

2017

Feature extraction for the author name disambiguation problem in a bibliographic database

Authors
Silva, JMB; Silva, FMA;

Publication
Proceedings of the Symposium on Applied Computing, SAC 2017, Marrakech, Morocco, April 3-7, 2017

Abstract
Author name disambiguation in bibliographic databases has been, and still is, a challenging research task due to the high uncertainty there is when matching a publication author with a concrete researcher. Common approaches normally either resort to clustering to group author's publications, or use a binary classifier to decide whether a given publication is written by a specific author. Both approaches benefit from authors publishing similar works (e.g. subject areas and venues), from the previous publication history of an author (the higher, the better), and validated publicationauthor associations for model creation. However, whenever such an algorithm is confronted with different works from an author, or an author without publication history, often it makes wrong identifications. In this paper, we describe a feature extraction method that aims to avoid the previous problems. Instead of generally characterizing an author, it selectively uses features that associate the author to a certain publication. We build a Random Forest model to assess the quality of our set of features. Its goal is to predict whether a given author is the true author of a certain publication. We use a bibliographic database named Authenticus with more than 250, 000 validated author-publication associations to test model quality. Our model achieved a top result of 95.37% accuracy in predicting matches and 91.92% in a real test scenario. Furthermore, in the last case the model was able to correctly predict 61.86% of the cases where authors had no previous publication history. Copyright 2017 ACM.

CloseRead Abstract

2018

Parallel Asynchronous Strategies for the Execution of Feature Selection Algorithms

Authors
Silva, J; Aguiar, A; Silva, F;

Publication
INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING

Abstract
Reducing the dimensionality of datasets is a fundamental step in the task of building a classification model. Feature selection is the process of selecting a smaller subset of features from the original one in order to enhance the performance of the classification model. The problem is known to be NP-hard, and despite the existence of several algorithms there is not one that outperforms the others in all scenarios. Due to the complexity of the problem usually feature selection algorithms have to compromise the quality of their solutions in order to execute in a practicable amount of time. Parallel computing techniques emerge as a potential solution to tackle this problem. There are several approaches that already execute feature selection in parallel resorting to synchronous models. These are preferred due to their simplicity and capability to use with any feature selection algorithm. However, synchronous models implement pausing points during the execution flow, which decrease the parallel performance. In this paper, we discuss the challenges of executing feature selection algorithms in parallel using asynchronous models, and present a feature selection algorithm that favours these models. Furthermore, we present two strategies for an asynchronous parallel execution not only of our algorithm but of any other feature selection approach. The first strategy solves the problem using the distributed memory paradigm, while the second exploits the use of shared memory. We evaluate the parallel performance of our strategies using up to 32 cores. The results show near linear speedups for both strategies, with the shared memory strategy outperforming the distributed one. Additionally, we provide an example of adapting our strategies to execute the Sequential forward Search asynchronously. We further test this version versus a synchronous one. Our results revealed that, by using an asynchronous strategy, we are able to save an average of 7.5% of the execution time.

CloseRead Abstract

2018

Hierarchical Expert Profiling Using Heterogeneous Information Networks

Authors
Silva, JMB; Ribeiro, P; Silva, FMA;

Publication
Discovery Science - 21st International Conference, DS 2018, Limassol, Cyprus, October 29-31, 2018, Proceedings

Abstract
Linking an expert to his knowledge areas is still a challenging research problem. The task is usually divided into two steps: identifying the knowledge areas/topics in the text corpus and assign them to the experts. Common approaches for the expert profiling task are based on the Latent Dirichlet Allocation (LDA) algorithm. As a result, they require pre-defining the number of topics to be identified which is not ideal in most cases. Furthermore, LDA generates a list of independent topics without any kind of relationship between them. Expert profiles created using this kind of flat topic lists have been reported as highly redundant and many times either too specific or too general. In this paper we propose a methodology that addresses these limitations by creating hierarchical expert profiles, where the knowledge areas of a researcher are mapped along different granularity levels, from broad areas to more specific ones. For the purpose, we explore the rich structure and semantics of Heterogeneous Information Networks (HINs). Our strategy is divided into two parts. First, we introduce a novel algorithm that can fully use the rich content of an HIN to create a topical hierarchy, by discovering overlapping communities and ranking the nodes inside each community. We then present a strategy to map the knowledge areas of an expert along all the levels of the hierarchy, exploiting the information we have about the expert to obtain an hierarchical profile of topics. To test our proposed methodology, we used a computer science bibliographical dataset to create a star-schema HIN containing publications as star-nodes and authors, keywords and ISI fields as attribute-nodes. We use heterogeneous pointwise mutual information to demonstrate the quality and coherence of our created hierarchies. Furthermore, we use manually labelled data to serve as ground truth to evaluate our hierarchical expert profiles, showcasing how our strategy is capable of building accurate profiles. © 2018, Springer Nature Switzerland AG.

CloseRead Abstract