Cookies Policy
The website need some cookies and similar means to function. If you permit us, we will use those means to collect data on your visits for aggregated statistics to improve our service. Find out More
Accept Reject
  • Menu
Publications

Publications by Vítor Mangaravite

2018

A Text Feature Based Automatic Keyword Extraction Method for Single Documents

Authors
Campos, R; Mangaravite, V; Pasquali, A; Jorge, AM; Nunes, C; Jatowt, A;

Publication
ADVANCES IN INFORMATION RETRIEVAL (ECIR 2018)

Abstract
In this work, we propose a lightweight approach for keyword extraction and ranking based on an unsupervised methodology to select the most important keywords of a single document. To understand the merits of our proposal, we compare it against RAKE, TextRank and SingleRank methods (three well-known unsupervised approaches) and the baseline TF. IDF, over four different collections to illustrate the generality of our approach. The experimental results suggest that extracting keywords from documents using our method results in a superior effectiveness when compared to similar approaches.

2018

YAKE! Collection-Independent Automatic Keyword Extractor

Authors
Campos, R; Mangaravite, V; Pasquali, A; Jorge, AM; Nunes, C; Jatowt, A;

Publication
ADVANCES IN INFORMATION RETRIEVAL (ECIR 2018)

Abstract
In this paper, we present YAKE!, a novel feature-based system for multi-lingual keyword extraction from single documents, which supports texts of different sizes, domains or languages. Unlike most systems, YAKE! does not rely on dictionaries or thesauri, neither it is trained against any corpora. Instead, we follow an unsupervised approach which builds upon features extracted from the text, making it thus applicable to documents written in many different languages without the need for external knowledge. This can be beneficial for a large number of tasks and a plethora of situations where the access to training corpora is either limited or restricted. In this demo, we offer an easy to use, interactive session, where users from both academia and industry can try our system, either by using a sample document or by introducing their own text. As an add-on, we compare our extracted keywords against the output produced by the IBM Natural Language Understanding (IBM NLU) and Rake system. YAKE! demo is available at http://bit.ly/YakeDemoECIR2018. A python implementation of YAKE! is also available at PyPi repository (https://pypi.python.org/pypi/yake/).