About

Ricardo Campos is an assistant professor in the Department of Informatics at the University of Beira Interior (UBI) and an invited professor at Porto Business School. He is a senior researcher at LIAAD-INESC TEC, the Artificial Intelligence and Decision Support Laboratory of the University of Porto, and a collaborator of Ci2.ipt, the Smart Cities Research Center of the Polytechnic Institute of Tomar. He holds a PhD in Computer Science from the University of Porto (U. Porto), and a master's and a bachelor's degree from the University of Beira Interior (UBI). He has more than 10 years of research experience in information retrieval and natural language processing, during which his work has been distinguished with several scientific merit awards at international conferences and scientific competitions. He is the author of the keyword extraction software YAKE! and of the Conta-me Histórias and Arquivo Público projects, among others. He has participated in several research projects funded by FCT. His research focuses on developing methods for extracting narratives from text, in particular on identifying and relating entities, events and their temporal aspects. He has co-organized international conferences and workshops in information retrieval and regularly serves on the scientific committee of several international conferences. He is also a member of the editorial board of the Information Processing and Management journal (Elsevier). He is a member of the scientific advisory forum of Portulan Clarin, the Research Infrastructure for the Science and Technology of Language, which belongs to the National Roadmap of Research Infrastructures of Strategic Relevance.

Details

  • Name

    Ricardo Campos
  • Cluster

    Computer Science
  • Position

    Senior Researcher
  • Since

    01 July 2012
Publications

2023

Public News Archive: A Searchable Sub-archive to Portuguese Past News Articles

Authors
Campos, R; Correia, D; Jatowt, A;

Publication
ADVANCES IN INFORMATION RETRIEVAL, ECIR 2023, PT III

Abstract
Over the past few decades, the amount of information generated has turned the Web into the largest knowledge infrastructure existing to date. Web archives have been at the forefront of data preservation, preventing the loss of data significant to humankind. Different snapshots of the web are saved every day, enabling users to surf the past web and travel through it over time. Despite these efforts, many people are not aware that the web is being preserved, often finding these infrastructures unattractive or difficult to use when compared to common search engines. In this paper, we take a step towards making use of this preserved information by developing Public Archive, an intuitive interface that enables end users to search and analyze a large-scale collection of 67,242 preserved past news articles belonging to a Portuguese reference newspaper (Jornal Publico). The collection was obtained by scraping 10,976 versions of the homepage of Jornal Publico preserved by the Portuguese web archive infrastructure (Arquivo.pt) between 2010 and 2021. By doing this, we aim not only to take a stand on making use of this preserved information, but also to come up with an easy-to-follow solution, the Public Archive Python package, which lays the groundwork to be used (with minor adaptations) by other news source providers interested in offering their readers access to past news articles.
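
As a rough illustration of the collection step described in this abstract, the sketch below gathers article links from several archived homepage snapshots and keeps each article once, together with the timestamp of the snapshot where it first appeared. It is not the Public Archive package: the `LinkCollector` class, the `unique_article_links` function, the `is_article` predicate, the `example.org` base URL and the toy HTML are all assumptions made for this example, and fetching snapshots from Arquivo.pt is left out entirely.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the absolute URL of every <a href=...> in an HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def unique_article_links(snapshots, base_url, is_article):
    """Map each article URL to the timestamp of the earliest snapshot containing it.

    snapshots: iterable of (timestamp, html) pairs, one per archived homepage.
    is_article: hypothetical predicate deciding whether a URL is a news article.
    """
    first_seen = {}
    for timestamp, html in sorted(snapshots):  # oldest snapshot first
        parser = LinkCollector(base_url)
        parser.feed(html)
        for url in parser.links:
            if is_article(url) and url not in first_seen:
                first_seen[url] = timestamp
    return first_seen


# Toy snapshots standing in for archived homepage versions (invented data).
snapshots = [
    ("20100102", '<a href="/noticia/a-1">A</a><a href="/sobre">About</a>'),
    ("20100101", '<a href="/noticia/a-1">A</a><a href="/noticia/b-2">B</a>'),
]
articles = unique_article_links(
    snapshots,
    base_url="https://example.org",
    is_article=lambda url: "/noticia/" in url,
)
```

Deduplicating on the URL is the simplest choice here; since the same article typically appears on many consecutive homepage versions, some such step is needed to go from thousands of snapshots to a set of distinct articles.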

2023

Text2Storyline: Generating Enriched Storylines from Text

Authors
Goncalves, F; Campos, R; Jorge, A;

Publication
ADVANCES IN INFORMATION RETRIEVAL, ECIR 2023, PT III

Abstract
In recent years, the amount of information generated, consumed and stored has grown at an astonishing rate, making it difficult for those seeking information to extract knowledge in good time. This has become even more important as the average reader is less willing than in the past to spare time from an already busy schedule, and therefore prioritizes news in a summarized format, which is faster to digest. On top of that, people increasingly rely on strong visual components to help them understand the focal point of news articles in a less tiresome manner. This growing demand for exploring information through visual aspects calls for alternative approaches to text understanding and narrative exploration. This motivated us to propose Text2Storyline, a platform for generating and exploring enriched storylines from an input text, a URL or a user query. The latter is issued on the Portuguese Web Archive (Arquivo.pt), giving users the chance to expand their knowledge and build on information collected from web sources of the past. To fulfill this objective, we propose a system that makes use of the TimeMatters algorithm to filter out non-relevant dates and organize relevant content by means of different displays: 'Annotated Text', 'Entities', 'Storyline', 'Temporal Clustering' and 'Word Cloud'. To extend the users' knowledge, we rely on entity linking to connect persons, events, locations and concepts found in the text to Wikipedia pages, a process also known as Wikification. Each of the entities is then illustrated by means of an image collected from Arquivo.pt.

2023

Proceedings of Text2Story - Sixth Workshop on Narrative Extraction From Texts held in conjunction with the 45th European Conference on Information Retrieval (ECIR 2023), Dublin, Ireland, April 2, 2023

Authors
Campos, R; Jorge, AM; Jatowt, A; Bhatia, S; Litvak, M;

Publication
Text2Story@ECIR

2023

Text Mining and Visualization of Political Party Programs Using Keyword Extraction Methods: The Case of Portuguese Legislative Elections

Authors
Campos, R; Jatowt, A; Jorge, A;

Publication
Information for a Better World: Normality, Virtuality, Physicality, Inclusivity - 18th International Conference, iConference 2023, Virtual Event, March 13-17, 2023, Proceedings, Part I

Abstract
Extracting keywords from textual data is a crucial step for text analysis. Such a process may take a considerable amount of time when done manually. In this paper, we show how keyword extraction techniques can be used to tap into texts of a political nature. To accomplish this objective, we conduct a case study on 16 Portuguese (PT) political party programs made available in the context of the legislative elections that took place on 30 January 2022. Our contributions are two-fold. At the level of resources, we make available a curated dataset and a Python notebook that systematizes the process of transforming text into quantitative data and into visual aspects. At the methodological level, we propose to extend the keyword extraction algorithm used in this study to extract the most relevant keywords, not only from individual political party programs, but also across the entire collection of documents. A further contribution is the case study itself, which calls attention to the fact that such solutions may be of interest not only to the general public, but also to journalists and politicians. Broadly, we demonstrate how the discussion and the analysis that stem from the results obtained may foster political science research by making large-scale processing of documents available at marginal cost. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
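
To make the distinction between per-document and collection-wide keywords concrete, here is a minimal sketch that ranks terms within each document and then across all documents. It uses plain relative term frequency as a stand-in and is not the keyword extraction algorithm used in the paper; the function names, the tiny stopword list, the summed-score aggregation and the two toy "programs" are all assumptions made for this example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption; a real one would be much longer).
STOPWORDS = {"a", "the", "and", "of", "for", "to", "in", "more"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[^\W\d_]+", text.lower()) if t not in STOPWORDS]


def document_keywords(text, top=3):
    """Rank the terms of one document by relative frequency (top=None keeps all)."""
    counts = Counter(tokenize(text))
    total = sum(counts.values()) or 1
    return [(term, n / total) for term, n in counts.most_common(top)]


def collection_keywords(documents, top=3):
    """Rank terms across all documents by their summed relative frequency."""
    scores = Counter()
    for text in documents.values():
        for term, score in document_keywords(text, top=None):
            scores[term] += score
    return scores.most_common(top)


# Invented toy texts standing in for party programs.
programs = {
    "party_a": "Health and education: more health funding for public health.",
    "party_b": "Lower taxes and housing. Housing for the young, lower taxes.",
}
per_party = {name: document_keywords(text) for name, text in programs.items()}
overall = collection_keywords(programs)
```

Summing relative rather than raw frequencies keeps a long document from dominating the collection-level ranking, which is one simple way to compare programs of very different lengths.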

2023

The 6th International Workshop on Narrative Extraction from Texts: Text2Story 2023

Authors
Campos, R; Jorge, A; Jatowt, A; Bhatia, S; Litvak, M;

Publication
ADVANCES IN INFORMATION RETRIEVAL, ECIR 2023, PT III

Abstract
Over the past five years, significant breakthroughs, led by Transformers and large language models, have been made in understanding natural language text. However, the ability to capture contextual nuances in longer texts is still an elusive goal, let alone the understanding of consistent fine-grained narrative structures in text. These unsolved challenges and the interest in the community are the basis of the sixth edition of the Text2Story workshop, to be held in Dublin on April 2nd, 2023 in conjunction with the 45th European Conference on Information Retrieval (ECIR'23). In this sixth edition, we aim to bring to the forefront the challenges involved in understanding the structure of narratives and in incorporating their representation in well-established models, as well as in modern architectures (e.g., transformers) which are now common and form the backbone of almost every IR and NLP application. It is hoped that the workshop will provide a common forum to consolidate the multi-disciplinary efforts and foster discussions to identify the wide-ranging issues related to the narrative extraction and generation task. Text2Story includes sessions devoted to full research papers, work-in-progress, demos and dissemination papers, keynote talks and space for an informal discussion of the methods, the challenges and the future of this research area.