Publications - INESC TEC

Publications by João Cordeiro

2013

Rule Induction for Sentence Reduction

Authors
Cordeiro, J; Dias, G; Brazdil, P

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2013

Abstract
Sentence Reduction has recently received great attention from the Automatic Text Summarization research community. Sentence Reduction consists of eliminating sentence components such as words, part-of-speech tag sequences or chunks without seriously degrading the information contained in the sentence or its grammatical correctness. In this paper, we present an unsupervised, scalable methodology for learning sentence reduction rules. Paraphrases are first discovered within a collection of automatically crawled Web news stories and then textually aligned in order to extract interchangeable text fragment candidates, in particular reduction cases. As only positive examples exist, Inductive Logic Programming (ILP) provides an interesting learning paradigm for the extraction of sentence reduction rules. Consequently, reduction cases are transformed into first-order logic clauses to supply a massive set of suitable learning instances, and an ILP learning environment is defined within the context of the Aleph framework. Experiments show good results in terms of irrelevancy elimination, syntactic correctness and reduction rate in a real-world environment, in contrast to other methodologies proposed so far.

2015

Fractal Beauty in Text

Authors
Cordeiro, J; Inacio, PRM; Fernandes, DAB

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE

Abstract
This paper assesses whether text possesses fractal properties, namely whether several attributes that characterize sentences are self-similar. To do so, seven corpora were analyzed using several statistical tools to determine whether the empirical sequences for the attributes were Gaussian and self-similar. The Kolmogorov-Smirnov goodness-of-fit test and two Hurst parameter estimators were employed. The results show that there is a fractal beauty in the text produced by humans and suggest that its quality is directly proportional to the degree of self-similarity.
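
As a rough illustration of the kind of analysis this abstract describes, the sketch below computes a Kolmogorov-Smirnov distance to a fitted normal distribution and a Hurst parameter estimate for a numeric sequence. The abstract does not name its two Hurst estimators, so the aggregated-variance method used here is an assumption, as are the function names, the block sizes, and the synthetic Gaussian sample standing in for a sentence attribute such as sentence length. Because the normal parameters are fitted from the sample itself, the standard KS critical values would not apply exactly.

```python
import math
import random
import statistics


def ks_statistic_normal(xs):
    """Kolmogorov-Smirnov distance between the sample and a normal
    distribution fitted to the sample mean and standard deviation."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(sorted(xs)):
        cdf = 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
        # Compare the model CDF with the empirical CDF just before and at x.
        d = max(d, abs(cdf - i / n), abs((i + 1) / n - cdf))
    return d


def hurst_aggregated_variance(xs, block_sizes=(4, 8, 16, 32, 64)):
    """Estimate H from the slope of log Var(block means) against log m,
    using Var ~ m^(2H - 2) for a self-similar series (assumed estimator)."""
    log_m, log_var = [], []
    for m in block_sizes:
        # Means over consecutive non-overlapping blocks of length m.
        means = [statistics.fmean(xs[i:i + m])
                 for i in range(0, len(xs) - m + 1, m)]
        log_m.append(math.log(m))
        log_var.append(math.log(statistics.pvariance(means)))
    # Ordinary least-squares slope of log_var on log_m.
    mean_x = statistics.fmean(log_m)
    mean_y = statistics.fmean(log_var)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_m, log_var))
    den = sum((x - mean_x) ** 2 for x in log_m)
    return 1.0 + (num / den) / 2.0


# Synthetic stand-in for an attribute sequence (hypothetical data).
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(4096)]
d = ks_statistic_normal(sample)
h = hurst_aggregated_variance(sample)
print(f"KS distance: {d:.3f}, Hurst estimate: {h:.2f}")
```

For independent Gaussian noise the Hurst estimate should sit near 0.5; values closer to 1 would indicate the long-range dependence associated with self-similarity.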

2018

ECIR 2018: Text2Story Workshop - Narrative Extraction from Texts

Authors
Jorge, A; Campos, R; Jatowt, A; Nunes, S; Rocha, C; Cordeiro, JP; Pasquali, A; Mangaravite, V

Publication
SIGIR Forum

2018

Extracting Adverse Drug Effects from User Experiences: A Baseline

Authors
Abrantes, D; Cordeiro, J

Publication
Proceedings - IEEE Symposium on Computer-Based Medical Systems

Abstract
It has been shown that pharmacovigilance benefits from the analysis and extraction of user-generated data from blogs, medical forums and other social networks, in particular mentions of adverse effects or complaints that arise from taking certain drugs. Data mining, machine learning, pattern recognition, content summarization and natural language processing techniques are often used in this field with promising results. However, extraction remains difficult because of the highly domain-specific vocabulary. This is mainly because patients tend to use idiomatic or vernacular expressions along with descriptive symptom explanations, which often deviate from grammatical rules or expected terms. To address this issue, we propose a well-curated baseline. We believe that building a specific lexicon, identifying common linguistic patterns and observing certain phrasal structures is key to first understanding how users generate content online. From there, we can later develop sets of tailored rules that will allow data classification and extraction systems to potentially improve their efficiency at these tasks. © 2018 IEEE.

2019

Association and Temporality between News and Tweets

Authors
Moutinho, V; Brazdil, P; Cordeiro, J

Publication
Proceedings of the 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K 2019, Volume 1: KDIR, Vienna, Austria, September 17-19, 2019.

Abstract
With the advent of social media, the boundaries between mainstream journalism and social networks are becoming blurred. User-generated content is increasing, and hence journalists dedicate considerable time to searching platforms such as Facebook and Twitter to announce, spread, and monitor news and to crowd-check information. Many studies have looked at social networks as news sources, but the relationship and interconnections between this type of platform and news media have not been thoroughly investigated. In this work, we have studied a series of news articles and examined a set of related comments on a social network over a period of six months. Specifically, a sample of articles from generalist Portuguese news sources published in the first half of 2016 was clustered, and the resulting clusters were then associated with tweets of Portuguese users by means of a similarity measure. Focusing on a subset of clusters, we have performed a temporal analysis by examining the evolution of the two types of documents (articles and tweets) and the timing of when they appeared. It appears that for some stories, namely Brexit and the European Football Cup, the publishing of news articles intensifies on key dates (event-oriented), while the discussion on social media is more balanced throughout the months leading up to those events.
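
The association step mentioned in this abstract (linking tweets to clusters of news articles through a similarity measure) could look roughly like the sketch below. The abstract does not say which similarity measure or text representation was used, so cosine similarity over word counts, the `threshold` value, the function names, and the toy cluster texts are all assumptions made for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased whitespace tokens with their counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def assign(tweet, clusters, threshold=0.2):
    """Return the name of the most similar cluster, or None when no
    cluster reaches the (assumed) similarity threshold."""
    vec = bag_of_words(tweet)
    best_name, best_score = None, 0.0
    for name, centroid in clusters.items():
        score = cosine(vec, centroid)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# Each cluster is represented by the summed word counts of its articles
# (hypothetical example texts, not data from the study).
clusters = {
    "brexit": bag_of_words("brexit referendum uk vote leave european union")
    + bag_of_words("uk votes brexit referendum result"),
    "euro2016": bag_of_words("portugal wins european football cup final")
    + bag_of_words("football cup final portugal france"),
}

print(assign("brexit vote result tonight", clusters))
print(assign("portugal football final", clusters))
print(assign("what a lovely sunny day", clusters))
```

Once each tweet carries a cluster label, counting articles and tweets per cluster per day would give the two time series that a temporal comparison needs.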

2019

SocialNetCrawler: Online Social Network Crawler

Authors
Pais, S; Cordeiro, J; Martins, R; Albardeiro, M

Publication
11th International Conference on Management of Digital EcoSystems, MEDES 2019, Limassol, Cyprus, November, 2019

Abstract
The emergence and popularization of online social networks suddenly made available a large amount of data on social organization, interaction and human behavior. All this information opens new perspectives and challenges for the study of social systems and is of interest to many fields. Although most online social networks are recent, a vast number of scientific papers have already been published on this topic, dealing with a broad range of analytical methods and applications. The development of a tool capable of gathering tailored information from social networks can therefore help many researchers in their work, especially in the area of Natural Language Processing (NLP). Nowadays, social networks are precisely the medium where people most often use written language on a daily basis, so the ubiquitous crawling of social networks is of the utmost importance for researchers. Such a tool allows researchers to obtain the relevant information they need and to focus on what really matters, without losing time developing their own crawler. In this paper, we present an extensive analysis of the existing social networks and their APIs, and also describe the conception and design of a social network crawler that will help NLP researchers. © 2019 Association for Computing Machinery.