YAKE!

Summary

YAKE! (Yet Another Keyword Extractor!) is an unsupervised approach that efficiently extracts keywords from individual documents. It uses several local features, including term frequency, position, and relatedness, to assess the relevance of each potential keyword, and applies a language-independent scoring function that assigns a relevance score to every candidate. In experiments on diverse datasets and languages, YAKE! outperformed existing methods in precision, recall, and F1-score. The technology offers a versatile solution for keyword extraction in applications such as information retrieval, document summarization, text categorization, and sentiment analysis.

 

Problem

Keywords are useful for various tasks, such as information retrieval, document summarization, text categorization, and sentiment analysis. However, keyword extraction is challenging, especially for single documents, because it requires capturing the specificity and importance of terms within a document without relying on external resources or prior knowledge.

Most existing keyword extraction methods are either supervised or unsupervised. Supervised methods require large amounts of labeled data, which are not always available or suitable for different domains and languages. Unsupervised methods often rely on global features, such as document frequency or inverse document frequency, which are not effective for single documents.
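
To illustrate why a global feature carries no signal for a single document, the sketch below uses one common definition of inverse document frequency, idf(t) = log(N / df(t)). This particular variant is an assumption made for illustration; other smoothed forms exist. With a corpus of one document, every term that occurs has N = 1 and df = 1, so all terms receive the same value.

```python
import math

def idf(num_docs, doc_freq):
    """One common (unsmoothed) IDF variant: log(N / df)."""
    return math.log(num_docs / doc_freq)

# In a larger corpus, rare terms are separated from common ones.
rare_in_corpus = idf(100, 1)
common_in_corpus = idf(100, 50)

# With a single document, every term has df = 1 and N = 1,
# so IDF is identical (zero) for all terms and cannot rank them.
single_doc = idf(1, 1)
```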

 

Technology

YAKE! is an unsupervised approach for keyword extraction from single documents based on multiple local features, such as term frequency, position, and relatedness. It also uses a language-independent scoring function that assigns a relevance score to each candidate keyword.
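
The idea of scoring candidates from local features alone can be sketched as follows. This is a deliberately simplified toy, not the published YAKE! scoring function: it combines only term frequency and first-occurrence position, omits relatedness, and scores single words rather than multi-word candidates. The function name `score_terms` and the formula are assumptions made for illustration.

```python
import re
from collections import Counter

def score_terms(text):
    """Toy local-feature scorer (NOT the actual YAKE! formula).

    Lower score means more relevant. Two local features are combined:
      - term frequency: more frequent terms get a lower score
      - position: terms first seen in earlier sentences get a lower score
    """
    # Split into sentences on basic punctuation; no language-specific tools.
    sentences = [s for s in re.split(r"[.!?]+", text.lower()) if s.strip()]
    freq = Counter()
    first_sentence = {}
    for idx, sentence in enumerate(sentences):
        for word in re.findall(r"\w+", sentence):
            freq[word] += 1
            first_sentence.setdefault(word, idx)
    scores = {}
    for word, count in freq.items():
        position_penalty = 1 + first_sentence[word]
        scores[word] = position_penalty / count
    # Sort by score, then alphabetically for a deterministic order.
    return sorted(scores.items(), key=lambda kv: (kv[1], kv[0]))

sample = ("Keyword extraction finds keywords. "
          "Keywords summarize a document. The weather is nice.")
ranked = score_terms(sample)
```

Everything the scorer needs comes from the document itself, which is the property the text attributes to local features. A realistic scorer would also filter stopwords and handle multi-word candidates.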

Experiments on several datasets and languages showed that YAKE! outperforms state-of-the-art methods in terms of precision, recall, and F1-score, and that it extracts keywords from different types of documents, such as news articles, scientific papers, and political party programmes. The reported results are:

  • YAKE! achieved an average precision of 0.38, recall of 0.28, and F1-score of 0.32 on the Inspec dataset, compared to 0.24, 0.18, and 0.20 for the best baseline method;
  • YAKE! achieved an average precision of 0.42, recall of 0.33, and F1-score of 0.37 on the SemEval-2010 dataset, compared to 0.28, 0.22, and 0.24 for the best baseline method;
  • YAKE! achieved an average precision of 0.40, recall of 0.31, and F1-score of 0.35 on the Hulth2003 dataset, compared to 0.26, 0.20, and 0.22 for the best baseline method;
  • YAKE! achieved an average precision of 0.36, recall of 0.28, and F1-score of 0.31 on the Krapivin2009 dataset, compared to 0.23, 0.18, and 0.20 for the best baseline method.
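
For readers unfamiliar with these metrics, the sketch below shows how precision, recall, and F1-score can be computed for one document by comparing extracted keywords with a reference (gold) list. Exact, case-insensitive matching is an assumption made here; benchmark evaluations may normalize keywords differently. The keyword lists are hypothetical and are not taken from any of the datasets above.

```python
def precision_recall_f1(extracted, gold):
    """Compare extracted keywords with gold keywords by exact match."""
    extracted_set = {k.lower() for k in extracted}
    gold_set = {k.lower() for k in gold}
    hits = len(extracted_set & gold_set)
    precision = hits / len(extracted_set) if extracted_set else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

# Hypothetical example: 2 of 4 extracted keywords appear in the gold list of 3.
extracted = ["keyword extraction", "local features", "weather", "scoring function"]
gold = ["keyword extraction", "scoring function", "unsupervised"]
p, r, f = precision_recall_f1(extracted, gold)
```

Dataset-level figures such as those listed above are typically averages of per-document values, so the averaged F1-score need not equal the harmonic mean of the averaged precision and recall.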

 

Benefits and Advantages

YAKE! offers several advantages for keyword extraction from single documents:

  • Language-independent: the technology can extract keywords from documents written in any language, without relying on any language-specific tools or resources. This makes it adaptable and scalable to multilingual and cross-lingual scenarios and to low-resource languages that may not have enough linguistic tools or resources;
  • Unsupervised: YAKE! does not require any labeled data or prior knowledge to extract keywords from single documents. This makes it suitable for different domains and languages and for new and emerging topics that may not have enough labeled data or external resources available;
  • Multiple local features: YAKE! combines features such as term frequency, position, and relatedness to capture the specificity and importance of terms within a single document. This makes it effective and robust for single documents, as it does not depend on global features, such as document frequency or inverse document frequency, that may not reflect the relevance of terms within a document;
  • Simple and efficient scoring function: YAKE! assigns a relevance score to each candidate keyword based on a combination of the local features. This makes it fast and easy to implement and use, as it does not require complex or computationally intensive algorithms or models.

Possible Applications and Use Cases

YAKE! can serve as the keyword extraction component in a variety of applications. For example:

  • Information retrieval: Keywords can be used to index and retrieve documents based on their main topics and content. Keywords can also be used to improve the ranking and relevance of search results by matching them with the user’s query terms;
  • Document summarization: Keywords can be used to generate concise and informative summaries of documents by selecting the most relevant and representative terms that capture the main points and arguments of the document;
  • Text categorization: Keywords can be used to assign documents to predefined categories or labels based on their main topics and content. Keywords can also be used to discover new and emerging categories or labels by clustering documents based on their shared terms;
  • Sentiment analysis: Keywords can be used to identify and extract the opinions, emotions, and attitudes expressed in documents. Keywords can also be used to measure the polarity and intensity of the sentiments expressed by the document authors or target entities.
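
As an illustration of the information retrieval use case, the sketch below builds a small inverted index from already-extracted keywords and uses it to look up documents. The document ids, keyword lists, and function names are hypothetical; in practice the keywords would come from a keyword extractor rather than being written by hand.

```python
from collections import defaultdict

def build_keyword_index(doc_keywords):
    """Map each keyword to the set of document ids it was extracted from."""
    index = defaultdict(set)
    for doc_id, keywords in doc_keywords.items():
        for keyword in keywords:
            index[keyword.lower()].add(doc_id)
    return index

def search(index, query_terms):
    """Return ids of documents whose keywords include every query term."""
    matches = [index.get(term.lower(), set()) for term in query_terms]
    return set.intersection(*matches) if matches else set()

# Hypothetical keywords for two documents.
doc_keywords = {
    "d1": ["keyword extraction", "text mining"],
    "d2": ["text mining", "sentiment analysis"],
}
index = build_keyword_index(doc_keywords)
both = search(index, ["text mining"])
only_d2 = search(index, ["text mining", "sentiment analysis"])
```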

 

  • Industrial Categories

    Digital
  • Tags

    Text Analysis, Keyword extraction, Language-Independent, Multiple Local Features, Unsupervised Method, Scoring Function, Benchmark Datasets, Text Categorization, Document Summarization, Information Retrieval