Cookies Policy
The website need some cookies and similar means to function. If you permit us, we will use those means to collect data on your visits for aggregated statistics to improve our service. Find out More
Accept Reject
  • Menu
Publications

Publications by Ricardo Campos

2023

TEI2GO: A Multilingual Approach for Fast Temporal Expression Identification

Authors
Sousa, H; Campos, R; Jorge, A;

Publication
PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023

Abstract
Temporal expression identification is crucial for understanding texts written in natural language. Although highly effective systems such as HeidelTime exist, their limited runtime performance hampers adoption in large-scale applications and production environments. In this paper, we introduce the TEI2GO models, matching HeidelTime's effectiveness but with significantly improved runtime, supporting six languages, and achieving state-of-the-art results in four of them. To train the TEI2GO models, we used a combination of manually annotated reference corpus and developed Professor HeidelTime, a comprehensive weakly labeled corpus of news texts annotated with HeidelTime. This corpus comprises a total of 138, 069 documents (over six languages) with 1, 050, 921 temporal expressions, the largest open-source annotated dataset for temporal expression identification to date. By describing how the models were produced, we aim to encourage the research community to further explore, refine, and extend the set of models to additional languages and domains. Code, annotations, and models are openly available for community exploration and use. The models are conveniently on HuggingFace for seamless integration and application.

2023

Proceedings of the IACT - The 1st International Workshop on Implicit Author Characterization from Texts for Search and Retrieval held in conjunction with the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2023), Taipei, Taiwan, July 27, 2023

Authors
Litvak, M; Rabaev, I; Campos, R; Jorge, AM; Jatowt, A;

Publication
IACT@SIGIR

Abstract

2024

Pre-trained language models: What do they know?

Authors
Guimaraes, N; Campos, R; Jorge, A;

Publication
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
Large language models (LLMs) have substantially pushed artificial intelligence (AI) research and applications in the last few years. They are currently able to achieve high effectiveness in different natural language processing (NLP) tasks, such as machine translation, named entity recognition, text classification, question answering, or text summarization. Recently, significant attention has been drawn to OpenAI's GPT models' capabilities and extremely accessible interface. LLMs are nowadays routinely used and studied for downstream tasks and specific applications with great success, pushing forward the state of the art in almost all of them. However, they also exhibit impressive inference capabilities when used off the shelf without further training. In this paper, we aim to study the behavior of pre-trained language models (PLMs) in some inference tasks they were not initially trained for. Therefore, we focus our attention on very recent research works related to the inference capabilities of PLMs in some selected tasks such as factual probing and common-sense reasoning. We highlight relevant achievements made by these models, as well as some of their current limitations that open opportunities for further research.This article is categorized under:Fundamental Concepts of Data and Knowledge > Key Design Issues in DataMiningTechnologies > Artificial Intelligence

2023

The 1st International Workshop on Implicit Author Characterization from Texts for Search and Retrieval (IACT'23)

Authors
Litvak, M; Rabaev, I; Campos, R; Jorge, AM; Jatowt, A;

Publication
PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023

Abstract
The first edition of the Implicit Author Characterization from Texts for Search and Retrieval (IACT'23) aims at bringing to the forefront the challenges involved in identifying and extracting from texts implicit information about authors (e.g., human or AI) and using it in IR tasks. The IACT workshop provides a common forum to consolidate multi-disciplinary efforts and foster discussions to identify the wide-ranging issues related to the task of extracting implicit author-related information from the textual content, including novel tasks and datasets. We will also discuss the ethical implications of implicit information extraction. In addition, we announce a shared task focused on automatically determining the literary epochs of written books.

2007

Digital libraries and engines of search: new information systems in the context of the digital preservation

Authors
Campos, R;

Publication
Proceedings of the 2007 Euro American conference on Telematics and Information Systems, EATIS 2007, Faro, Portugal, May 14-17, 2007

Abstract
The first's library projects occur some years ago with digitization, but just in 1996, the first's web archive initiatives start occurring. Such, was based in the Internet growth and in its increasing use, items that revealed to be an opportunity to transform and readapt the traditional library services. In this context, search engines play a fundamental role of support to the new paradigm of knowledge, by capturing, storing and providing access to the resources, allowing the existence of a digital library in each computer with internet access. In this article we analyze the ways of developing a digital library, taking higher attention to the web harvesting technique, and presenting digital libraries capabilities and limitations. Then we fully summarize relevant projects and initiatives, to finally study the role of search engines in what concerns to, digital preservation, access and information diffusion.

2012

Temporal Web Image Retrieval

Authors
Dias, G; Moreno, JG; Jatowt, A; Campos, R;

Publication
STRING PROCESSING AND INFORMATION RETRIEVAL: 19TH INTERNATIONAL SYMPOSIUM, SPIRE 2012

Abstract
Temporal Web Image Retrieval can be defined as the process that retrieves sets of Web images with their temporal dimension from explicit or implicit temporal text queries. Supposing that (a) the temporal dimension is included in image indexing and (b) the query is explicitly expressed with a time tag (e. g. "Fukushima 2011"), the retrieval task can be straightforward as image retrieval has been studied for several years with success. However, text queries are usually implicit in time (e. g. "Second World War") and automatically capturing the time dimension included in Web images is a challenge that has not been studied so far to the best of our knowledge. In this paper, we will discuss different research issues about Temporal Web Image Retrieval and the current progresses of our research in temporal ephemeral clustering and temporal image filtering.

  • 18
  • 22