2023
Authors
Castro, M; Jorge, A; Campos, R;
Publication
ADVANCES IN INFORMATION RETRIEVAL, ECIR 2023, PT III
Abstract
The rise of social media has brought a great transformation to the way news is discovered and shared. Unlike traditional news sources, social media allows anyone to cover a story. Therefore, sometimes an event is already discussed by people before a journalist turns it into a news article. Twitter is a particularly appealing social network for discussing events, since its posts are very compact and, therefore, contain colloquial language and abbreviations. However, its large volume of tweets also makes it impossible for a user to keep up with an event. In this work, we present TweetStream2Story, a web app for extracting narratives from tweets posted in real time, about a topic of choice. This framework can be used to provide new information to journalists or be of interest to any user who wishes to stay up-to-date on a certain topic or ongoing event. As a contribution to the research community, we provide a live version of the demo, as well as its source code.
2023
Authors
Goncalves, F; Campos, R; Jorge, A;
Publication
ADVANCES IN INFORMATION RETRIEVAL, ECIR 2023, PT III
Abstract
In recent years, the amount of information generated, consumed and stored has grown at an astonishing rate, making it difficult for those seeking information to extract knowledge in good time. This has become even more important, as the average reader is not as willing to spare more time out of their already busy schedule as in the past, thus prioritizing news in a summarized format, which is faster to digest. On top of that, people tend to increasingly rely on strong visual components to help them understand the focal point of news articles in a less tiresome manner. This growing demand, focused on exploring information through visual aspects, calls for alternative approaches concerned with text understanding and narrative exploration. This motivated us to propose Text2Storyline, a platform for generating and exploring enriched storylines from an input text, a URL or a user query. The latter is to be issued on the Portuguese Web Archive (Arquivo.pt), therefore giving users the chance to expand their knowledge and build on information collected from web sources of the past. To fulfill this objective, we propose a system that makes use of the TimeMatters algorithm to filter out non-relevant dates and organize relevant content by means of different displays: 'Annotated Text', 'Entities', 'Storyline', 'Temporal Clustering' and 'Word Cloud'. To extend the users' knowledge, we rely on entity linking to connect persons, events, locations and concepts found in the text to Wikipedia pages, a process also known as Wikification. Each of the entities is then illustrated by means of an image collected from Arquivo.pt.
2023
Authors
Campos, R; Jatowt, A; Jorge, A;
Publication
Information for a Better World: Normality, Virtuality, Physicality, Inclusivity - 18th International Conference, iConference 2023, Virtual Event, March 13-17, 2023, Proceedings, Part I
Abstract
Extracting keywords from textual data is a crucial step for text analysis. Such a process may take a considerable amount of time when done manually. In this paper, we show how keyword extraction techniques can be used to tap into texts of a political nature. To accomplish this objective, we conduct a case study on 16 Portuguese (PT) political party programs made available in the context of the legislative elections that took place on 30 January 2022. Our contributions are two-fold. At the level of resources, we make available a curated dataset and a Python notebook that systematizes the process of transforming text into quantitative data and into visual aspects. At the methodological level, we propose to extend the keyword extraction algorithm used in this study to extract the most relevant keywords, not only from individual political party programs, but also across the entire collection of documents. A further contribution is the case study itself, which calls attention to the fact that such solutions may be of interest not only to common people, but also to journalists and politicians. Broadly, we demonstrate how the discussion and the analysis that stem from the results obtained may foster political science research by making available large-scale processing of documents with marginal costs. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
2023
Authors
Campos, R; Jorge, A; Jatowt, A; Bhatia, S; Litvak, M;
Publication
ADVANCES IN INFORMATION RETRIEVAL, ECIR 2023, PT III
Abstract
Over the past five years, significant breakthroughs, led by Transformers and large language models, have been made in understanding natural language text. However, the ability to capture contextual nuances in longer texts is still an elusive goal, let alone the understanding of consistent fine-grained narrative structures in text. These unsolved challenges and the interest in the community are at the basis of the sixth edition of the Text2Story workshop, to be held in Dublin on April 2nd, 2023, in conjunction with the 45th European Conference on Information Retrieval (ECIR'23). In its sixth edition, we aim to bring to the forefront the challenges involved in understanding the structure of narratives and in incorporating their representation in well-established models, as well as in modern architectures (e.g., transformers) which are now common and form the backbone of almost every IR and NLP application. It is hoped that the workshop will provide a common forum to consolidate the multi-disciplinary efforts and foster discussions to identify the wide-ranging issues related to the narrative extraction and generation task. Text2Story includes sessions devoted to full research papers, work-in-progress, demos and dissemination papers, keynote talks and space for an informal discussion of the methods, of the challenges and of the future of this research area.
2023
Authors
Sousa, H; Jorge, A; Campos, R;
Publication
PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023
Abstract
Temporal information extraction (TIE) has attracted a great deal of interest over the last two decades. Such endeavors have led to the development of a significant number of datasets. Despite its benefits, having access to a large volume of corpora makes it difficult to benchmark TIE systems. On the one hand, different datasets have different annotation schemes, which hinders the comparison between competitors across different corpora. On the other hand, the fact that each corpus is disseminated in a different format requires a considerable engineering effort for a researcher/practitioner to develop parsers for all of them. These constraints force researchers to select a limited number of datasets to evaluate their systems, which consequently limits the comparability of the systems. Yet another obstacle to the comparability of TIE systems is the evaluation metric employed. While most research works adopt traditional metrics such as precision, recall, and F1, a few others prefer temporal awareness, a metric tailored to be more comprehensive in the evaluation of temporal systems. Although the reason for the absence of temporal awareness in the evaluation of most systems is not clear, one of the factors that certainly weighs on this decision is the need to implement the temporal closure algorithm, which is neither straightforward to implement nor easily available. All in all, these problems have limited the fair comparison between approaches and, consequently, the development of TIE systems. To mitigate these problems, we have developed tieval, a Python library that provides a concise interface for importing different corpora and is equipped with domain-specific operations that facilitate system evaluation. In this paper, we present the first public release of tieval and highlight its most relevant features. The library is available as open source, under the MIT License, on PyPI and GitHub.
2023
Authors
Campos, R; Jorge, AM; Jatowt, A; Bhatia, S; Litvak, M;
Publication
Text2Story@ECIR
Abstract