2024
Authors
Campos, R; Jorge, A; Jatowt, A; Bhatia, S; Litvak, M;
Publication
ADVANCES IN INFORMATION RETRIEVAL, ECIR 2024, PT V
Abstract
The Text2Story Workshop series, dedicated to Narrative Extraction from Texts, has been running successfully since 2018. Over the past six years, significant progress, largely propelled by Transformers and Large Language Models, has advanced our understanding of natural language text. Nevertheless, the representation, analysis, generation, and comprehensive identification of the different elements that compose a narrative structure remain a challenging objective. In its seventh edition, the workshop strives to consolidate a common platform and a multidisciplinary community for discussing and addressing various issues related to narrative extraction tasks. In particular, we aim to bring to the forefront the challenges involved in understanding narrative structures and integrating their representation into established frameworks, as well as into modern architectures (e.g., transformers) and AI-powered language models (e.g., ChatGPT), which are now common and form the backbone of almost every IR and NLP application. Text2Story encompasses sessions covering full research papers, work-in-progress, demos, resources, position and dissemination papers, along with keynote talks. Moreover, there is dedicated space for informal discussions on methods, challenges, and the future of research in this dynamic field.
2023
Authors
Mansouri, B; Campos, R; Jatowt, A;
Publication
COMPANION OF THE WORLD WIDE WEB CONFERENCE, WWW 2023
Abstract
Timeline summarization (TLS) is a challenging research task that requires distilling extensive and intricate temporal data into a concise and easily comprehensible representation. This paper proposes a novel approach to timeline summarization using Abstract Meaning Representations (AMRs), graph representations of text in which nodes are semantic concepts and edges denote the relationships between them. With AMR, sentences with different wordings but similar semantics have similar representations. To exploit this feature for timeline summarization, we propose a two-step sentence selection method that leverages features extracted from both the AMRs and the text. First, an AMR is generated for each sentence. Sentences are then filtered by removing those with no named entities and keeping the ones with the highest number of named entities. In the second step, the sentences to appear in the timeline are selected based on two scores: the Inverse Document Frequency (IDF) of AMR nodes combined with the score obtained by applying a keyword extraction method to the text. Our experimental results on the TLS-Covid19 test collection demonstrate the potential of the proposed approach.
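The two-step selection described in the abstract can be sketched roughly as follows. This is a minimal, hypothetical illustration: the input format, field names (`entities`, `nodes`, `kw_score`), and the way the two scores are combined are assumptions, not the paper's actual implementation.

```python
import math
from collections import Counter

def select_timeline_sentences(sentences, top_k=2, min_entities=1):
    """Hedged sketch of the two-step selection.

    `sentences` is a list of dicts with hypothetical keys:
      'text'     - the sentence string
      'entities' - named entities found in the sentence
      'nodes'    - concept nodes of the sentence's AMR graph
      'kw_score' - score from a keyword-extraction method (higher = better)
    """
    # Step 1: remove sentences with no named entities and rank the
    # remainder by how many named entities they contain.
    filtered = [s for s in sentences if len(s["entities"]) >= min_entities]
    filtered.sort(key=lambda s: len(s["entities"]), reverse=True)

    # Step 2: score sentences by the IDF of their AMR nodes combined
    # with the keyword-extraction score (simple sum, an assumption).
    n_docs = len(filtered)
    df = Counter(node for s in filtered for node in set(s["nodes"]))

    def idf_score(s):
        return sum(math.log(n_docs / df[n]) for n in set(s["nodes"]))

    ranked = sorted(filtered, key=lambda s: idf_score(s) + s["kw_score"],
                    reverse=True)
    return [s["text"] for s in ranked[:top_k]]
```

In practice the AMR graphs would come from an AMR parser and the keyword scores from an extractor such as YAKE!; here both are taken as pre-computed inputs to keep the sketch self-contained.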
2022
Authors
Jatowt, A; Doucet, A; Campos, R;
Publication
WWW (Companion Volume)
Abstract
Time expressions embedded in text are important for many downstream tasks in NLP and IR. They have been utilized, for example, in timeline summarization, named entity recognition, temporal information retrieval, question answering, and other tasks. In this paper, we introduce a novel approach to analyzing the characteristics of time expressions in diachronic text collections. Based on a collection of news articles published over a 33-year time span, we investigate several aspects of time expressions, with a focus on their interplay with the publication dates of the containing documents. We utilize a graph-based representation of temporal expressions, representing each expression through its co-occurring named entities. The proposed approach yields several observations that could be utilized in automatic systems that rely on processing temporal signals embedded in text. It could also be of importance for professionals (e.g., historians) who wish to understand fluctuations in collective memories and collective expectations based on large-scale, diachronic document collections.
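The graph-based representation sketched in the abstract, with temporal expressions characterized by their co-occurring named entities, can be illustrated as a simple weighted co-occurrence structure. The input format below is an assumption made for illustration; the paper's actual graph construction may differ.

```python
from collections import Counter, defaultdict

def build_time_entity_graph(documents):
    """Represent each temporal expression through the named entities it
    co-occurs with, as weighted adjacency (co-occurrence counts).

    `documents` is a list of (time_expressions, named_entities) pairs,
    one pair per document -- an assumed, simplified input format.
    """
    graph = defaultdict(Counter)
    for time_exprs, entities in documents:
        for t in time_exprs:
            graph[t].update(entities)  # increment co-occurrence counts
    return graph
```

Comparing the entity profiles of two temporal expressions (e.g., via cosine similarity over their `Counter` vectors) is then one plausible way to study how references to different dates relate across a diachronic collection.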
2023
Authors
Litvak, M; Rabaev, I; Campos, R; Jorge, AM; Jatowt, A;
Publication
SIGIR Forum
Abstract
2023
Authors
Mansouri, B; Campos, R;
Publication
CoRR
Abstract
2023
Authors
Mansouri, B; Durgin, S; Franklin, S; Fletcher, S; Campos, R;
Publication
CLEF (Working Notes)
Abstract
This paper describes the participation of the Artificial Intelligence and Information Retrieval (AIIR) Lab from the University of Southern Maine and the Laboratory of Artificial Intelligence and Decision Support (LIAAD) from INESC TEC in the CLEF 2023 SimpleText lab. Three tasks are defined for SimpleText: (T1) What is in (or out)?, (T2) What is unclear?, and (T3) Rewrite this!. For Task 1, five runs were submitted using traditional Information Retrieval and Sentence-BERT models. For Task 2, three runs were submitted using the YAKE! and KBIR keyword extraction models. Finally, for Task 3, two models were deployed: one using OpenAI Davinci embeddings and the other combining two unsupervised simplification models.