About

I am an associate professor at the Department of Computer Science of the Faculty of Science of the University of Porto and the coordinator of LIAAD, the Artificial Intelligence and Decision Support Lab of UP. LIAAD has been a unit of INESC TEC (Laboratório Associado) since 2007. I hold a PhD in Computer Science from U. Porto, an MSc. in Foundations of Advanced Information Technology from Imperial College and a BSc. in Applied Maths and Computer Science, currently Computer Science (U. Porto). My research interests are Data Mining and Machine Learning, in particular association rules, web and text intelligence and data mining for decision support. My past research also includes Inductive Logic Programming and Collaborative Data Mining. I lecture courses related to programming, information processing, data mining, and other areas of computing. While at the Faculty of Economics, where I stayed from 1996 to 2009, I launched, with other colleagues, the MSc. in Data Analysis and Decision Support Systems, which I coordinated from 2000 to April 2008. I lead research projects on data mining and web intelligence. I was the director of the Masters in Computer Science at DCC-FCUP from June 2010 to August 2013. I co-chaired international conferences (ECML/PKDD 2015, Discovery Science 2009, ECML/PKDD 05 and EPIA 01), workshops and seminars in data mining and artificial intelligence. I was Vice-President of APPIA, the Portuguese Association for Artificial Intelligence.

Details

  • Name

    Alípio Jorge
  • Role

    Centre Coordinator
  • Since

    1st January 2008
Publications

2025

Preface

Authors
Campos, R; Jorge, M; Jatowt, A; Bhatia, S; Litvak, M;

Publication
CEUR Workshop Proceedings

Abstract
[No abstract available]

2025

The 8th International Workshop on Narrative Extraction from Texts: Text2Story 2025

Authors
Campos, R; Jorge, A; Jatowt, A; Bhatia, S; Litvak, M;

Publication
Advances in Information Retrieval - 47th European Conference on Information Retrieval, ECIR 2025, Lucca, Italy, April 6-10, 2025, Proceedings, Part V

Abstract
For seven years, the Text2Story Workshop series has fostered a vibrant community dedicated to understanding narrative structure in text, resulting in significant contributions to the field and developing a shared understanding of the challenges in this domain. While traditional methods have yielded valuable insights, the advent of Transformers and LLMs has ignited a new wave of interest in narrative understanding. The previous iteration of the workshop also witnessed a surge in LLM-based approaches, demonstrating the community’s growing recognition of their potential. In this eighth edition we propose to go deeper into the role of LLMs in narrative understanding. While LLMs have revolutionized the field of NLP and are the go-to tools for any NLP task, the ability to capture, represent and analyze contextual nuances in longer texts is still an elusive goal, let alone the understanding of consistent fine-grained narrative structures in text. Consequently, this iteration of the workshop will explore the issues involved in using LLMs to unravel narrative structures, while also examining the characteristics of narratives generated by LLMs. By fostering dialogue on these emerging areas, we aim to continue the workshop's tradition of driving innovation in narrative understanding research. Text2Story encompasses sessions covering full research papers, work-in-progress, demos, resources, position and dissemination papers, along with one keynote talk. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

2025

Enhancing Portuguese Variety Identification with Cross-Domain Approaches

Authors
Sousa, HO; Almeida, R; Silvano, P; Cantante, I; Campos, R; Jorge, AM;

Publication
AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25 - March 4, 2025, Philadelphia, PA, USA

Abstract
Recent advances in natural language processing have raised expectations for generative models to produce coherent text across diverse language varieties. In the particular case of the Portuguese language, the predominance of Brazilian Portuguese corpora online introduces linguistic biases in these models, limiting their applicability outside of Brazil. To address this gap and promote the creation of European Portuguese resources, we developed a cross-domain language variety identifier (LVI) to discriminate between European and Brazilian Portuguese. Motivated by the findings of our literature review, we compiled the PtBrVarId corpus, a cross-domain LVI dataset, and study the effectiveness of transformer-based LVI classifiers for cross-domain scenarios. Although this research focuses on two Portuguese varieties, our contribution can be extended to other varieties and languages. We open source the code, corpus, and models to foster further research in this task. © 2025, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

2025

Tradutor: Building a Variety Specific Translation Model

Authors
Sousa, HO; Almasian, S; Campos, R; Jorge, AM;

Publication
AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25 - March 4, 2025, Philadelphia, PA, USA

Abstract
Language models have become foundational to many widely used systems. However, these seemingly advantageous models are double-edged swords. While they excel in tasks related to resource-rich languages like English, they often lose the fine nuances of language forms, dialects, and varieties that are inherent to languages spoken in multiple regions of the world. Languages like European Portuguese are neglected in favor of their more popular counterpart, Brazilian Portuguese, leading to suboptimal performance in various linguistic tasks. To address this gap, we introduce the first open-source translation model specifically tailored for European Portuguese, along with a novel dataset specifically designed for this task. Results from automatic evaluations on two benchmark datasets demonstrate that our best model surpasses existing open-source translation systems for Portuguese and approaches the performance of industry-leading closed-source systems for European Portuguese. By making our dataset, models, and code publicly available, we aim to support and encourage further research, fostering advancements in the representation of underrepresented language varieties. © 2025, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

2024

Pre-trained language models: What do they know?

Authors
Guimaraes, N; Campos, R; Jorge, A;

Publication
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery

Abstract
Large language models (LLMs) have substantially pushed artificial intelligence (AI) research and applications in the last few years. They are currently able to achieve high effectiveness in different natural language processing (NLP) tasks, such as machine translation, named entity recognition, text classification, question answering, or text summarization. Recently, significant attention has been drawn to OpenAI's GPT models' capabilities and extremely accessible interface. LLMs are nowadays routinely used and studied for downstream tasks and specific applications with great success, pushing forward the state of the art in almost all of them. However, they also exhibit impressive inference capabilities when used off the shelf without further training. In this paper, we aim to study the behavior of pre-trained language models (PLMs) in some inference tasks they were not initially trained for. Therefore, we focus our attention on very recent research works related to the inference capabilities of PLMs in some selected tasks such as factual probing and common-sense reasoning. We highlight relevant achievements made by these models, as well as some of their current limitations that open opportunities for further research. This article is categorized under: Fundamental Concepts of Data and Knowledge > Key Design Issues in Data Mining; Technologies > Artificial Intelligence

Supervised thesis

2023

Digital technology and the social monitoring of climate change

Author
Ana Sofia Cabral Cardoso

Institution
UP-FCUP

2023

Building Portuguese Language Resources for Natural Language Processing Tasks

Author
Rúben Filipe Seabra de Almeida

Institution
UP-FCUP

2023

Heart Sound Analysis for Cardiovascular Diseases Identification

Author
Diogo Marcelo Esterlita Nogueira

Institution
UP-FCUP

2023

Product Complaint Understanding using NLP Techniques

Author
Beatriz Marques Arcipreste

Institution
UP-FCUP

2023

Unfolding the Temporal Structure of Narratives

Author
Hugo Miguel Oliveira de Sousa

Institution
UP-FCUP