2023
Authors
Vinagre, J; Ghossein, MA; Peska, L; Jorge, AM; Bifet, A;
Publication
ORSUM@RecSys
Abstract
2023
Authors
Muhammad, SH; Abdulmumin, I; Ayele, AA; Ousidhoum, N; Adelani, DI; Yimam, SM; Ahmad, IS; Beloucif, M; Mohammad, SM; Ruder, S; Hourrane, O; Jorge, A; Brazdil, P; António Ali, FDM; David, D; Osei, S; Bello, BS; Lawan, FI; Gwadabe, T; Rutunda, S; Belay, TD; Messelle, WB; Balcha, HB; Chala, SA; Gebremichael, HT; Opoku, B; Arthur, S;
Publication
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6-10, 2023
Abstract
Africa is home to over 2,000 languages from more than six language families and has the highest linguistic diversity among all continents. These include 75 languages with at least one million speakers each. Yet, there is little NLP research conducted on African languages. Crucial to enabling such research is the availability of high-quality annotated datasets. In this paper, we introduce AfriSenti, a sentiment analysis benchmark that contains a total of more than 110,000 tweets in 14 African languages (Amharic, Algerian Arabic, Hausa, Igbo, Kinyarwanda, Moroccan Arabic, Mozambican Portuguese, Nigerian Pidgin, Oromo, Swahili, Tigrinya, Twi, Xitsonga, and Yorùbá) from four language families. The tweets were annotated by native speakers and used in the AfriSenti-SemEval shared task. We describe the data collection methodology, the annotation process, and the challenges we dealt with when curating each dataset. We further report baseline experiments conducted on the different datasets and discuss their usefulness.
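A minimal usage sketch, not taken from the paper: assuming the AfriSenti subsets are accessible through the Hugging Face datasets library, loading one language split might look like the following; the repository identifier and configuration name are hypothetical placeholders.

# Hypothetical sketch: load one AfriSenti language subset with the `datasets` library.
# The dataset repository id ("afrisenti") and the config name ("hau") are assumptions.
from datasets import load_dataset

hausa_split = load_dataset("afrisenti", "hau", split="train")  # hypothetical id/config
print(hausa_split[0])  # expected fields: a tweet text and a sentiment label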
2023
Authors
Rabaev, I; Litvak, M; Younkin, V; Campos, R; Jorge, AM; Jatowt, A;
Publication
Proceedings of the IACT - The 1st International Workshop on Implicit Author Characterization from Texts for Search and Retrieval held in conjunction with the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2023), Taipei, Taiwan, July 27, 2023.
Abstract
This paper describes the shared task on Automatic Classification of Literary Epochs (CoLiE), held as part of the 1st International Workshop on Implicit Author Characterization from Texts for Search and Retrieval (IACT’23) at SIGIR 2023. The competition aimed to enhance the capabilities of large-scale analysis and cross-comparative studies of literary texts by automating their classification into the respective epochs. We believe that the competition contributed to the field of information retrieval by releasing the first large benchmark dataset for this task and reporting the first results of various methods applied to it. This paper presents the details of the contest, the dataset used, the evaluation procedure, and an overview of the participating methods.
2023
Authors
Sousa, H; Campos, R; Jorge, A;
Publication
PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023
Abstract
Temporal expression identification is crucial for understanding texts written in natural language. Although highly effective systems such as HeidelTime exist, their limited runtime performance hampers adoption in large-scale applications and production environments. In this paper, we introduce the TEI2GO models, matching HeidelTime's effectiveness but with significantly improved runtime, supporting six languages, and achieving state-of-the-art results in four of them. To train the TEI2GO models, we used a combination of manually annotated reference corpora and Professor HeidelTime, a comprehensive weakly labeled corpus of news texts annotated with HeidelTime that we developed. This corpus comprises a total of 138,069 documents (over six languages) with 1,050,921 temporal expressions, making it the largest open-source annotated dataset for temporal expression identification to date. By describing how the models were produced, we aim to encourage the research community to further explore, refine, and extend the set of models to additional languages and domains. Code, annotations, and models are openly available for community exploration and use. The models are conveniently available on HuggingFace for seamless integration and application.
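A minimal usage sketch, not taken from the paper: assuming a TEI2GO model is distributed as a spaCy pipeline on HuggingFace, tagging temporal expressions might look like the following; the package name en_tei2go is an assumed placeholder.

# Hypothetical sketch: detect temporal expressions with a TEI2GO spaCy pipeline.
# The model package name ("en_tei2go") is an assumption; install it before loading.
import spacy

nlp = spacy.load("en_tei2go")  # hypothetical package name
doc = nlp("The meeting was moved from last Friday to 10 March 2023.")
for ent in doc.ents:
    print(ent.text, ent.label_)  # each entity should be a temporal expression span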
2023
Authors
Litvak, M; Rabaev, I; Campos, R; Jorge, AM; Jatowt, A;
Publication
IACT@SIGIR
Abstract
2023
Authors
Vinagre, J; Ghossein, MA; Peska, L; Jorge, AM; Bifet, A;
Publication
Proceedings of the 17th ACM Conference on Recommender Systems, RecSys 2023, Singapore, Singapore, September 18-22, 2023
Abstract
Modern online platforms for user modeling and recommendation require complex data infrastructures to collect and process data. Some of this data has to be kept for later use in batch training of personalization models. However, since user activity data can be generated at very fast rates, it is also useful to have algorithms able to process data streams online, in real time. Given the continuous and potentially fast change of content, context, and user preferences or intents, stream-based models, and their synchronization with batch models, can be extremely challenging. Therefore, it is important to investigate methods able to transparently and continuously adapt to the inherent dynamics of user interactions, preferably over long periods of time. Models able to continuously learn from such flows of data are gaining attention in the recommender systems community and are being increasingly deployed in online platforms. However, many challenges associated with learning from streams need further investigation. The objective of this workshop is to foster contributions and bring together a growing community of researchers and practitioners interested in online, adaptive approaches to user modeling, recommendation, and personalization, and their implications regarding multiple dimensions, such as reproducibility, privacy, fairness, diversity, transparency, auditability, and compliance with recently adopted or upcoming legal frameworks worldwide.