Publications - INESC TEC

Publications by João Cordeiro

2022

NLP-based platform as a service: a brief review

Authors
Pais, S; Cordeiro, J; Jamil, ML;

Publication
JOURNAL OF BIG DATA

Abstract
Natural language processing (NLP) refers to the field of study that focuses on the interactions between human language and computers. It has recently gained much attention for analyzing human language computationally and has spread its applications for various tasks such as machine translation, information extraction, summarization, question answering, and others. With the rapid growth of cloud computing services, merging NLP in the cloud is a significant benefit. It allows researchers to conduct NLP-related experiments on large amounts of data handled by big data techniques while harnessing the cloud's vast, on-demand computing power. However, it has not sufficiently spread its tools and applications as a service in the cloud and there is little literature available that discusses the scope of interdisciplinary work. NLP, cloud computing, and big data are vast domains, each with its own challenges and potential. By overcoming those challenges and integrating these fields, great potential for NLP and its applications can be unleashed. This paper presents a survey of NLP in cloud computing with a key focus on the comparison of cloud-based NLP services and the challenges of NLP and big data, while emphasizing the necessity of viable cloud-based NLP services. In the first part of this paper, an overview of NLP is presented by discussing different levels of NLP and components of natural language generation (NLG), followed by the applications of NLP. In the second part, the concept of cloud computing is discussed, highlighting the architectural layers and deployment models of cloud computing and cloud-hosted NLP services. In the third part, the field of big data in the cloud is discussed with an emphasis on NLP. Furthermore, information extraction via NLP techniques within big data is introduced.

2022

Detection of extreme sentiments on social networks with BERT

Authors
Jamil, ML; Pais, S; Cordeiro, J; Dias, G;

Publication
SOCIAL NETWORK ANALYSIS AND MINING

Abstract
Online social networking platforms allow people to freely express their ideas, opinions, and emotions negatively or positively. Previous studies have examined sentiments on these platforms to study their behavior in different contexts and purposes. The mechanism of collecting public opinion information has attracted researchers to automatically classify the polarity of public opinions based on the use of concise language in messages, such as tweets, by analyzing social media data. In this paper, we extend the preceding work where an unsupervised approach to automatically detect extreme opinions/posts in social networks is proposed. The performance of the proposed approach is evaluated on five different social network and media datasets. In this work, we use a semi-supervised approach known as BERT to reevaluate the accuracy of our prior approach and the obtained classified dataset. The experiment proves that in these datasets, posts that were previously classified as negative or positive extreme are extremely negative or positive in many cases while using BERT. Furthermore, BERT shows the capability to classify the extreme sentiments when fine-tuned with an appropriate extreme sentiments dataset.

2021

A Comparative Study of Linguistic and Computational Features Based on a Machine Learning for Arabic Anaphora Resolution

Authors
Abolohom, A; Omar, N; Pais, S; Cordeiro, J;

Publication
AI IN COMPUTATIONAL LINGUISTICS

Abstract
Anaphora resolution is one of the problems in natural language processing. It is the process of disambiguating the antecedent of a referring expression from the set of entities in a discourse. The correct interpretation of pronouns plays an important role in the construction of meaning. Thus, the resolution of pronominal anaphors remains a very important task for many natural language processing applications. Additionally, it plays an increasingly significant role in computational linguistics. However, a significant amount of work on anaphora resolution is focused on English; anaphora resolution for other languages, including Arabic, is still limited. In this paper, we present a new set of computational and linguistic features to resolve Arabic anaphors using a machine learning approach. An in-depth study was conducted on this set of computational and linguistic features to exploit their effectiveness and investigate their effect on anaphora resolution. The aim was to efficiently integrate different feature sets and classification algorithms to synthesize a more accurate classification procedure. Four well-known machine learning algorithms (k-nearest neighbor, maximum entropy, decision tree, and meta-classifier) were employed as base classifiers for each of the feature sets. A wide range of comparative experiments on Quran datasets was conducted, the discussion presented, and conclusions drawn. The experimental results show that our approach gives satisfactory results. (C) 2021 The Authors. Published by Elsevier B.V.

2010

Automatic discovery of word semantic relations using paraphrase alignment and distributional lexical semantics analysis

Authors
Dias, G; Moraliyski, R; Cordeiro, J; Doucet, A; Ahonen Myka, H;

Publication
NATURAL LANGUAGE ENGINEERING

Abstract
Thesauri, which list the most salient semantic relations between words, have mostly been compiled manually. Therefore, the inclusion of an entry depends on the subjective decision of the lexicographer. As a consequence, those resources are usually incomplete. In this paper, we propose an unsupervised methodology to automatically discover pairs of semantically related words by highlighting their local environment and evaluating their semantic similarity in local and global semantic spaces. This proposal differs from all other research presented so far as it tries to take the best of two different methodologies, i.e. semantic space models and information extraction models. In particular, it can be applied to extract close semantic relations, it limits the search space to a few highly probable options, and it is unsupervised.

2007

Learning paraphrases from WNS corpora

Authors
Cordeiro, J; Dias, G; Brazdil, P;

Publication
Proceedings of the Twentieth International Florida Artificial Intelligence Research Society Conference, FLAIRS 2007

Abstract
Paraphrase detection can be seen as the task of aligning sentences that convey the same information yet are written in different forms. Such resources are important to automatically learn text-to-text rewriting rules. In this paper, we present a new metric for unsupervised detection of paraphrases and apply it in the context of clustering of paraphrases. An exhaustive evaluation is conducted over a set of standard paraphrase corpora and real-world web news stories (WNS) corpora. The results are promising as they outperform state-of-the-art measures developed for similar tasks.

2007

A Metric for Paraphrase Detection

Authors
Cordeiro, J; Dias, G; Brazdil, P;

Publication
2007 International Multi-Conference on Computing in the Global Information Technology (ICCGI'07)

Abstract