INESC TEC PhD Defenses
PhD Candidate full name:
Isabel Cristina Rio-Torto de Oliveira
Dissertation Title: Weakly Supervised Multimodal Explanations for Medical Image Classification
Date: 05/05/2026 14:00
Location: UP | FCUP | DCC | Auditório do Departamento de Ciência de Computadores
Higher Education Institution: University of Porto
Doctoral Programme: Doctoral Program in Computer Science (FCUP)
Abstract or Public Summary: This thesis addresses the limited interpretability of deep learning models in medical image analysis by moving beyond traditional post-hoc visual explanations toward more faithful and adaptable approaches. It proposes weakly supervised methods that reduce annotation burden while enabling multiple explanation modalities, including visual, concept-based, and natural language explanations tailored to different users. The results demonstrate that combining weak supervision with multimodal frameworks can produce accurate, interpretable, and clinically relevant explanations without costly annotation pipelines.
Research Centre: CTM
Principal Supervisor at INESC TEC: Luís F. Teixeira
Additional Supervisor: Jaime S. Cardoso (INESC TEC)
Scientific Domain: [Artificial Intelligence]; [Computer Science and Engineering]
Keywords: Explainable Artificial Intelligence; Multimodal Explanations; Weakly Supervised Learning; Large Vision-Language Models; Medical Image Classification
PhD Candidate full name:
João Miguel Ramos Chaves Fernandes
Dissertation Title: Energy-efficient Scheduling for Sustainable Production and Transportation in Flexible Manufacturing Systems
Date: 16/04/2026 09:30
Location: University of Porto | FEUP | Building A | Sala de atos - A104
Higher Education Institution: University of Porto
Doctoral Programme: Doctoral Program in Industrial Engineering and Management (FEUP)
Abstract or Public Summary: The steady growth in energy demand and environmental awareness has placed increasing pressure on manufacturing industries to improve energy efficiency and reduce emissions while coping with volatile electricity costs and stricter environmental regulations. In this context, energy-efficient scheduling has emerged as a cost-effective strategy to achieve sustainability goals without requiring significant investment in new equipment or product design. This has propelled a surge in research on sustainable manufacturing, with a growing number of works focusing on energy-efficient operation scheduling. This thesis addresses the Energy-Efficient Job Shop Scheduling Problem (EEJSP) and its extensions, focusing on decisions that minimize makespan and energy consumption in bi-objective problems. This research explores multiple energy-saving strategies – namely variable machine and vehicle speed settings – as well as transportation scheduling with limited resources. Two main goals are pursued. First, the thesis seeks to comprehensively analyze and systematize EEJSP literature, clarifying modelling assumptions, performance measures, and energy-saving strategies and additionally identifying gaps related to the integration of transport. Second, it aims to develop appropriate models and algorithms for relevant EEJSP extensions, namely the EEJSP with limited transportation resources (EEJSPT), which better reflects real-world production settings where optimized scheduling can reduce emissions and energy costs. Throughout the development of this work, two peer-reviewed journal articles were published, each presented as an individual chapter in this thesis. The first provides a systematic literature review of EEJSP articles, identifying research gaps and highlighting the limited integration of transport and speed-related decisions. 
The second article introduces both an exact and a heuristic approach to jointly optimize production and transport operations in EEJSPT. The exact approach employs a bi-objective mixed-integer linear programming (MILP) model that, in combination with a lexicographic and $\epsilon$-constraint method, approximates the true Pareto front. The heuristic approach presents a novel multi-population biased random-key genetic algorithm to approximate Pareto-optimal solutions. Computational experiments across multiple test instances demonstrate that coordinated speed control across machines and vehicles can significantly reduce energy consumption with limited impact on makespan, with the test instances having been made publicly available for future research. Overall, the thesis advances the study of energy-efficient scheduling by providing tools and insights to support sustainable manufacturing decisions through a comprehensive review of EEJSP state of the art, proposing a new and more realistic extension to this problem, and providing a novel MILP problem formulation, a new multi-population genetic algorithm, and benchmark instances. Keywords: Job shop scheduling, Energy efficiency, Sustainable manufacturing, Optimization, Multi-objective scheduling, MILP, NSGA-II, BRKGA.
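As a rough illustration of the $\epsilon$-constraint idea named in the abstract, the sketch below sweeps an energy cap over a small set of candidate schedules and, for each cap, keeps the schedule with the smallest makespan, breaking ties by lower energy. It is not the thesis's MILP model: the candidate list, its values, and the function names are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical candidate schedules as (makespan, energy) pairs; values are invented.
Schedule = Tuple[float, float]


def best_under_energy_cap(candidates: List[Schedule], eps: float) -> Optional[Schedule]:
    """Minimize makespan subject to energy <= eps; ties are broken by lower energy."""
    feasible = [s for s in candidates if s[1] <= eps]
    if not feasible:
        return None
    # Tuples compare lexicographically: makespan first, then energy.
    return min(feasible)


def epsilon_constraint_front(candidates: List[Schedule]) -> List[Schedule]:
    """Sweep eps over the observed energy values and collect the distinct optima."""
    front: List[Schedule] = []
    for eps in sorted({energy for _, energy in candidates}, reverse=True):
        best = best_under_energy_cap(candidates, eps)
        if best is not None and best not in front:
            front.append(best)
    return front


candidates = [(10.0, 50.0), (12.0, 40.0), (12.0, 45.0), (15.0, 30.0), (16.0, 35.0)]
front = epsilon_constraint_front(candidates)
print(front)
```

In an exact approach the inner step would be a solver call on the constrained model rather than a search over an enumerated list; the sweep over the bound is the part this sketch is meant to show.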
Research Centre: LIAAD
Principal Supervisor at INESC TEC: Dalila Benedita Machado Martins Fontes
Additional Supervisor: Seyed Mahdi Homayoun (Constructor University Bremen)
Scientific Domain: [Computer Science and Engineering]; [Power and Energy Systems]; [Systems Engineering and Management]
Keywords: Multi-objective Job shop scheduling, Energy efficiency, Sustainable manufacturing, Optimization, MILP, BRKGA
PhD Candidate full name:
Mariana Silva Sousa
Dissertation Title: Integrating consumer behavior into the management of perishable products
Date: 18/03/2026 14:00
Location: UP | FEUP | DEGI | L202A
Higher Education Institution: University of Porto
Doctoral Programme: Doctoral Program in Industrial Engineering and Management (FEUP)
Abstract or Public Summary: Perishable food products are at the core of grocery retail, yet their limited shelf life and progressive loss of freshness create substantial operational challenges. As products approach the end of their shelf life, consumers’ willingness to pay for them declines sharply, increasing the risk of revenue loss and waste. Despite this, many assortment and markdown decisions still rely on demand models that treat price as the primary driver of choice. In practice, however, consumers evaluate price, freshness, and markdown signals jointly, and they frequently substitute across closely related alternatives within a category. This dissertation demonstrates that incorporating consumer behavior is crucial for understanding and optimizing outcomes in perishable categories. It develops an integrated analytical approach that links consumers’ valuation of freshness to retailers’ assortment and markdown decisions. Using transaction-level data from a large European grocery retailer, the dissertation combines empirical consumer choice modeling with optimization and decision-support methods. The empirical analysis highlights two behaviorally decisive mechanisms. First, freshness valuation is highly nonlinear, with demand declining disproportionately as products near expiration. Second, explicit low remaining shelf life labels introduce an adverse signaling effect that depresses demand beyond what would be predicted by freshness alone. Together, these mechanisms influence both purchase timing and substitution patterns, intensifying cannibalization within product families. These insights are embedded into two decision support tools designed for retail practice. The first is a simulation-based assortment optimization model that integrates freshness-dependent demand with inventory dynamics, enabling an explicit characterization of trade-offs among profitability, cannibalization, and waste. 
The second is a dynamic markdown model that formulates markdown decisions as a sequential optimization problem solved via reinforcement learning, enabling coordinated decisions across time, products, and freshness states. Overall, this dissertation provides a unified and empirically grounded link between consumer choice and retail operations. The results demonstrate that accounting for freshness-dependent willingness to pay, substitution effects, and promotional signaling can significantly alter assortment and markdown policies, thereby improving profitability while reducing food waste.
Research Centre: SYSTEM
Principal Supervisor at INESC TEC: Pedro Amorim
Additional Supervisor: Maria João Santos (INESC TEC); Sara Martins (INESC TEC)
Scientific Domain: [Artificial Intelligence]; [Operations Research]
Keywords: Retail operations; Perishable products; Substitution effects; Assortment optimization; Dynamic markdown
PhD Candidate full name:
André Nuno de Pinho Tavares Gurgo e Cirne
Dissertation Title: Identity and Trust based on Hardware for IoT
Date: 11/03/2026 14:30
Location: UP | FCUP | DCC | FC6 0.29
Higher Education Institution: University of Porto
Doctoral Programme: Doctoral Program in Computer Science (FCUP)
Abstract or Public Summary: As the number of Internet of Things (IoT) devices continues to grow exponentially, the question of how to establish and maintain trust among billions of interconnected objects has become a central challenge in modern cybersecurity. At the core of this challenge lies the notion of device identity: without a secure and verifiable identity, it is nearly impossible to determine whether a device can be trusted. Yet, implementing robust identity mechanisms in IoT environments is far from trivial. These devices are typically constrained by limited computational and energy resources, and are often deployed in remote, uncontrolled, or physically exposed locations, conditions that make them especially vulnerable to both software- and hardware-based attacks. This thesis explores the fundamental relationship between identity, trust, and hardware in the context of IoT. While hardware is frequently regarded as a bottleneck that limits the feasibility of strong security solutions, our work investigates the opposite perspective: can hardware itself become the enabler of secure and efficient identity mechanisms? To address this question, we begin by examining the current landscape of hardware-based identity systems and identifying the main challenges that arise from hardware security limitations. We then conduct an experimental case study on a real device to illustrate how its identity can be compromised in practice and to evaluate potential mitigation strategies. Building upon these insights, we propose a novel runtime attestation framework that leverages a recently introduced hardware feature, Pointer Authentication and Branch Target Identification (PACBTI). In parallel, we analyze the security implications and practical trade-offs introduced by this feature. 
Our findings reveal that several existing hardware-based technologies, such as Trusted Execution Environment (TEE) and PACBTI, are already present in many commodity devices and can be effectively leveraged to overcome the resource constraints that traditionally hinder the deployment of secure identity and trust mechanisms in IoT. However, despite this promising potential, a number of architectural, usability, and standardization barriers continue to limit their widespread adoption. By highlighting these challenges and opportunities, this work contributes to a deeper understanding of how hardware can evolve from being a limitation to becoming a cornerstone of secure identity in the future of IoT.
Research Centre: CRACS
Principal Supervisor at INESC TEC: João Resende
Additional Supervisor: Patrícia Raquel Vieira Sousa (INESC TEC - research collaborator); Luís Antunes (University of Porto)
Scientific Domain: [Computer Science and Engineering]
Keywords: identity; trust; hardware-security; IoT
PhD Candidate full name:
Marcella Luiza Santos Mendes
Dissertation Title: From Micro-Mechanisms to Efficiency: Research & Development and Innovation in a Related Diversification Context
Date: 06/03/2026 10:30
Location: University of Porto | FEUP | Building A | Sala de Atos
Higher Education Institution: University of Porto
Doctoral Programme: Doctoral Program in Industrial Engineering and Management (FEUP)
Abstract or Public Summary: This thesis is motivated by the increasing importance and challenges of managing and monitoring Research & Development and Innovation (RDI) in related-diversified organizations, where the complexity of internal RDI collaborative projects demands tailored approaches to coordination and integration. The research focuses on understanding how collaboration dynamics unfold within related-diversified contexts, conceptualized through micro-mechanisms of decision-making, leadership, and actor mobilization, and the influence of contextual attributes such as relatedness, complementarity, complexity, and decomposability. The main goal of the thesis is to explore how RDI operates in related-diversified contexts, focusing on identifying effective approaches for its management and measurement. The research work is divided into three interconnected parts. The first study provides an in-depth analysis of internal collaboration in RDI projects, focusing on how micro-mechanisms operate across project phases (inception, development, and conclusion) and how the direction of the idea (top-down or bottom-up) shapes collaboration. The study highlights the critical role of the corporate level in moderating and facilitating collaboration, exploring how cross-unit collaboration paths influence the outputs of RDI. In the second study, the focus shifts to RDI collaboration networks, particularly examining how the organization's RDI network reflects the architecture of individual projects, as suggested by the mirroring hypothesis. While most projects mirrored the broader organizational network, strategic exceptions demonstrated the flexibility of collaboration approaches in achieving specific goals. This study provides insights into how cross-unit collaboration aligns with RDI project interdependencies. 
The third study evaluates the innovation efficiency of business units (BUs) within the organization using a two-step Data Envelopment Analysis (DEA), complemented by a stepwise regression. The findings reveal how contextual factors influence innovation outcomes, with complexity and relatedness emerging as drivers of efficiency, while excessive decomposability negatively affects performance due to fragmentation and coordination costs. This thesis contributes to the literature on internal collaboration dynamics, RDI networks, and innovation management by advancing the theoretical understanding of collaboration dynamics, refining the mirroring hypothesis, and extending assumptions about modularity and decomposition. It also provides practical insights for managers, emphasizing the importance of balancing centralized and decentralized strategies, tailoring collaboration practices to contextual attributes, and fostering synergy across BUs to optimize RDI outcomes.
Research Centre: CITE
Principal Supervisor at INESC TEC: João Claro
Additional Supervisor: Cipriano Lomba (EFACEC)
Scientific Domain: [Systems Engineering and Management]
Keywords: Related diversification; Innovation management; Research & Development and Innovation
PhD Candidate full name:
José Carlos Miranda Nova Arnaud
Dissertation Title: Relationship between Digital Transformation and Digital Literacy of Local Public Administration employees – An Explanatory Model
Date: 05/03/2026 15:30
Location: UTAD | Escola de Ciências e Tecnologia | Auditório B0.01
Higher Education Institution: UTAD - University of Trás-os-Montes and Alto Douro
Doctoral Programme: Doctoral Program in Web Science and Technology
Abstract or Public Summary: Digital transformation has been one of the main trends in organizations in recent years, and digital literacy is a critical factor for the success of this transformation. Digital transformation involves using digital technologies to improve an organization's processes, products, and services. For this transformation to be successful, employees must have knowledge and skills in digital technologies. Digital literacy is neither a utopia nor something that we should neglect, and it is indisputable how much technology is part of our lives. As such, ignoring it, as well as the tools and services it provides, which greatly facilitate human experience, is simply a mistake. Digital literacy allows employees to understand technologies and their applications, use them efficiently and safely, evaluate and select the most appropriate digital tools for each task, and deal with the problems and challenges that arise in the digital environment. Thus, this study is relevant because it seeks to understand how digital literacy can impact digital transformation in organizations and, through the construction of an explanatory model, allows the identification of variables that influence this relationship, supporting the development of strategies to improve the digital literacy of employees in organizations. Through the theoretical framework, we will seek to apply a model that improves the digital literacy of local public administration employees. Recognizing the importance of digital literacy, largely due to the digital transformation that our country is going through, it will be necessary to have technological skills in order to overcome some of its limitations. Information and Communication Technologies are seen in this environment as a factor that could contribute, on a large scale, to the inclusion of individuals with digital literacy deficits, both in local public administration and in society in general. 
Through the Design Science Research methodology, complemented by a survey of local public administration employees and after characterizing them in relation to new technologies, a model will be developed that makes it possible to improve the digital literacy of these employees, thus contributing to the success of digital transformation. As a result of the research work, and following the Design Science Research methodology, we will be able to demonstrate the improvement of the digital literacy of local public administration employees.
Research Centre: HumanISE
Principal Supervisor at INESC TEC: Henrique São Mamede
Keywords: Digital Literacy; Digital Transformation; Explanatory Model
PhD Candidate full name:
Bruno Georgevich Ferreira
Higher Education Institution: University of Porto
Doctoral Programme: Doctoral Program in Informatics Engineering (FEUP)
Dissertation Title: Modular and Multi-Stage Semantic Perception System for Robotics
Date: 27/02/2026 14:00
Location: Porto (UP) | Engineering (FEUP) | Administration Building | Room 104 - Sala de Atos
Abstract or Public Summary: The evolution of autonomous robotics benefits largely from the capacity to construct rich, navigable, and semantic representations of the environment, even more so if shared with humans. While the advent of open-vocabulary scene graphs powered by Vision-Language Models (VLMs) has revolutionized perception, these systems face critical hurdles: high rates of hallucinations (False Positives), a lack of topological spatial context, and operational fragility due to heavy reliance on cloud connectivity. This thesis proposes the Hybrid Inference Perception and Mapping System (HIPaMS), a framework adaptable to a target system, likely a robotic system that interacts with humans. HIPaMS is a modular framework designed to bridge the gap between low-level perception and high-level agentic reasoning. A Proof of Concept (PoC) was designed to implement HIPaMS. This PoC enhances the state-of-the-art ConceptGraphs semantic mapping process and introduces a refined interaction system through four main contributions. First, it introduces the Hybrid Adaptable Resource-Aware Inference Mechanism (HARAIM), which dynamically orchestrates internal models and settings based on runtime resource availability and optimization policies. This mechanism allows any optimization policy to adapt the robotic system’s operation, possibly allowing zero downtime during network failures, graceful degradation and/or operational efficiency. Second, the semantic mapping pipeline is enhanced with rigorous False Positive filtering protocols, persona-based prompt engineering, and a broad collection of semantic information in an optimized manner during mapping. Third, a Room Semantic Segmentation Routine is proposed to provide topological information to the semantic map during interaction. 
This transforms unstructured, noisy detections into a hierarchically organized scene graph, anchoring objects within functional topological regions. Fourth, the robotic system now incorporates a dynamic knowledge base via the Human-in-the-Loop (HITL) Agentic Retrieval-Augmented Generation (RAG)-based Interaction System (HARBIS). This interface uses short- and long-term memory to understand complex natural language queries. It enables the robot to learn continuously from user interactions, address gaps in perception and knowledge, maintain temporal consistency, and acknowledge its limitations by proactively asking for clarification. Extensive validation was conducted across 30 diverse environments, involving a total of 3300 interactive requests (which depend on semantic map quality). The tested PoC processed 110 user requests per environment, categorized into: direct (30), indirect (30), graceful failure (30), follow-up (10) and time consistency (10). An ablation study was also performed to identify the impact of specific framework and PoC components. The results show that the PoC reduces False Positive detections by ≈ 86%, elevating mapping precision from a baseline of ≈ 0.28 to ≈ 0.68. Although strict filtering reduces raw recall, the integration of HITL learning increased the success rate for complex query resolution to ≈ 0.81, compared to baseline values of ≈ 0.48 and ≈ 0.55. Furthermore, the HIPaMS PoC reduced cloud inference costs by up to ≈ 84% in mapping and over ≈ 95% in interaction tasks while ensuring system stability. The presented framework paves the way for increased robotic autonomy and efficiency. The presented PoC demonstrates superior performance, particularly for human-centered scenarios.
Research Centre: CRIIS
Principal Supervisor at INESC TEC: Armando Jorge Miranda de Sousa
Scientific Domain: [Artificial Intelligence]; [Robotics]
Keywords: Semantic Mapping, Open-Vocabulary Perception, Hybrid Inference Architecture, Adaptable Framework, Human-in-the-Loop, Retrieval-Augmented Generation (RAG), Topological Segmentation, Robot@VirtualHome, Vision-Language Models, Agentic AI, Operational Robustness.
PhD Candidate full name:
Hugo Miguel Oliveira de Sousa
Dissertation Title: Unfolding the Temporal Structure of Narratives
Date: 25/02/2026 14:30
Location: UP | FCUP | FC5 278
Higher Education Institution: University of Porto
Doctoral Programme: Doctoral Program in Computer Science (FCUP)
Abstract or Public Summary: When reading a story or a news article, humans can understand the chronological order of mentioned events even when such information is vaguely defined. This is a fundamental skill for the comprehension of a narrative. For instance, from the sentence "Bob sent a message to Alice while she was leaving her birthday party." we comprehend that the occurrence of the event "sent" is included in the time span of the event "leaving", despite this not being explicitly stated in the text. This PhD thesis addresses the task of temporal information extraction, tackling both core challenges and practical applications across multiple domains and languages. We structure the problem into two main components: temporal entity identification and temporal relation classification. For temporal entity identification, we explore methods across diverse settings. We develop a biomedical entity identification pipeline for Portuguese oncology health records, combining neural models with entity linking. We also study the use of large language models to extract narrative entities from Portuguese news articles via prompt engineering, showing that their effectiveness can be comparable to that of methods fine-tuned for the task. Additionally, we introduce TEI2GO, a suite of multilingual models for temporal expression extraction that achieves state-of-the-art results in four of the six languages evaluated. For temporal relation classification, we propose decomposing interval relations into point relations between entity endpoints. This method reaches a temporal awareness score of 70.1% on the TempEval-3 dataset, establishing a new state of the art on this benchmark. Building on this insight, we introduce a novel formulation of the task which recasts relation classification as a sequential decision-making problem. This perspective enables the application of reinforcement learning algorithms to learn temporal reasoning from experience. 
All research was conducted on top of tieval, a Python library we developed and open-sourced to support the research community. This framework standardizes temporal information extraction evaluation across multiple corpora and provides domain-specific tools such as temporal closure and temporal awareness metrics. The contributions of this thesis range from practical advances in healthcare and multilingual systems to methodological innovations in temporal entity identification and temporal relation classification. Together, they advance the state of the art and broaden the foundations of temporal information extraction.
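To make the decomposition of interval relations into point relations concrete, the sketch below compares the start and end points of two events and maps the resulting four point relations back to an interval label, using the "sent" and "leaving" example from the abstract. It is a minimal illustration under stated assumptions, not the thesis's model and not the tieval API: the timestamps, the label names (TimeML-style), and the function names are hypothetical, and the label table is deliberately not exhaustive.

```python
from typing import Dict, Tuple

Interval = Tuple[float, float]  # (start, end) with start < end; hypothetical timestamps
Signature = Tuple[str, str, str, str]


def point_relation(a: float, b: float) -> str:
    """Relation between two time points: '<', '=' or '>'."""
    if a < b:
        return "<"
    if a > b:
        return ">"
    return "="


def endpoint_relations(x: Interval, y: Interval) -> Signature:
    """Four point relations: start-start, start-end, end-start, end-end."""
    (xs, xe), (ys, ye) = x, y
    return (
        point_relation(xs, ys),
        point_relation(xs, ye),
        point_relation(xe, ys),
        point_relation(xe, ye),
    )


# A few interval labels recovered from their endpoint signatures (not exhaustive).
SIGNATURE_TO_LABEL: Dict[Signature, str] = {
    ("<", "<", "<", "<"): "BEFORE",
    (">", ">", ">", ">"): "AFTER",
    (">", "<", ">", "<"): "IS_INCLUDED",
    ("<", "<", ">", ">"): "INCLUDES",
    ("=", "<", ">", "="): "SIMULTANEOUS",
}


def interval_relation(x: Interval, y: Interval) -> str:
    """Map the endpoint signature to an interval label, or 'OTHER' if not listed."""
    return SIGNATURE_TO_LABEL.get(endpoint_relations(x, y), "OTHER")


sent = (5.0, 6.0)     # Bob sends the message
leaving = (3.0, 9.0)  # Alice is leaving the party
print(endpoint_relations(sent, leaving))
print(interval_relation(sent, leaving))
```

The appeal of the point-based view is that each of the four comparisons is a simple three-way decision, so an interval-level label can be assembled from smaller, easier predictions.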
Research Centre: LIAAD
Principal Supervisor at INESC TEC: Alípio Jorge
Additional Supervisor: Ricardo Campos (UBI)
Scientific Domain: [Artificial Intelligence]
Keywords: Temporal information extraction, temporal entity identification, temporal relation classification
PhD Candidate full name:
Artur José Vilares Cordeiro
Higher Education Institution: UTAD - University of Trás-os-Montes and Alto Douro
Doctoral Programme: Doctoral Program in Electrical and Computer Engineering
Dissertation Title: Configurable Perception Pipeline for Bin-picking in Industrial Scenarios
Date: 12/02/2026 10:30
Location: UTAD - University of Trás-os-Montes and Alto Douro | School of Sciences and Technology | Library | Room B-1.04
Abstract or Public Summary: In today’s industrial environment, picking systems are crucial in transforming internal logistic operations by enhancing automation, efficiency, and accuracy. Despite significant advancements, current perception approaches for picking systems often fall short in dynamic and uncontrolled environments due to their reliance on static assumptions, leading to decreased performance. This thesis develops hybrid perception systems for picking operations by integrating artificial intelligence with traditional methodologies, aiming to create configurable systems adaptable to diverse dynamic environments. A novel data generation technique and a modular perception system were developed to address complex bin-picking challenges in both controlled and uncontrolled settings, incorporating new objects using only their 3D models and enabling the use of varied techniques for dynamic problem-solving. The system achieved a 91.81% average picking success rate in dynamic, cluttered, and unstructured environments, while also generating synthetic labeled data 440 times faster than manual real-data collection with comparable quality. This work provides a robust, modular perception and data generation solution, validated across industrial environments, offering a versatile tool for addressing challenging bin-picking tasks in diverse scenarios.
Research Centre: CRIIS
Principal Supervisor at INESC TEC: João Pedro Carvalho de Souza
Scientific Domain: [Computer Science and Engineering]; [Robotics]
Keywords: Perception systems; Robotic picking; Computer Vision; Machine learning
PhD Candidate full name:
Bianca Andreea Banica
Dissertation Title: Citizen Engagement Perspectives within Local Energy Transitions
Date: 23/01/2026 10:30
Location: University of Porto | FEUP | DMEC | Sala de Atos
Higher Education Institution: University of Porto
Doctoral Programme: Doctoral Program in Industrial Engineering and Management (FEUP)
Abstract or Public Summary: Citizen engagement is increasingly recognized as a cornerstone of sustainability transitions, particularly in advancing local energy transitions. This thesis investigates underexplored perspectives on citizen engagement by integrating both individual and organization-level analyses.
The first part adopts the perspective of citizens at the individual level, examining how different perceived values of energy solutions (utilitarian, social, and environmental) influence distinct engagement behaviors such as information seeking, proactive management, feedback sharing, helping, and advocating. It also addresses energy vulnerability by segmenting citizens based on energy poverty conditions and assessing variations in engagement levels across these groups.
The second part shifts focus to the organizational perspective, exploring the point of view of engagers, actors who design and facilitate engagement processes. It includes a case study of transition intermediaries as agents of change working to counteract citizen disengagement by improving access to electricity in isolated communities. Furthermore, through a multi-case study grounded in Social Practice Theory, the last part of this thesis project investigates how digital technologies are embedded within engagement practices and how the socio-digital interplay contributes to the construction, reproduction, and long-term maintenance of these practices. Overall, the project advances the understanding of citizen engagement by bridging insights from service research, transition studies, and the social sciences. It offers theoretical and empirical contributions that support the development of more inclusive, adaptive, and context-sensitive engagement strategies, while informing the design of tailored energy solutions that reflect diverse citizen needs.
Research Centre: SYSTEM
Principal Supervisor at INESC TEC: Lia Patrício
Scientific Domain: [Systems Engineering and Management]
Keywords: citizen engagement; energy transition; engagement behaviors; energy poverty; transition intermediaries; agency.
PhD Candidate full name:
Serkan Sulun
Dissertation Title: Video-based Music Generation
Date: 08/10/2025 14:00
Location: University of Porto | FEUP | DEEC | I-105
Higher Education Institution: University of Porto
Doctoral Programme: Doctoral Program in Electrical and Computer Engineering (FEUP)
Abstract or Public Summary: As the volume of video content on the internet grows rapidly, finding a suitable soundtrack remains a significant challenge. This thesis presents EMSYNC (EMotion and SYNChronization), a fast, free, and automatic solution that generates music tailored to the input video, enabling content creators to enhance their productions without composing or licensing music, streamlining creativity and production. Our model creates music that is emotionally and rhythmically synchronized with the video, offering an adaptive and expressive solution for automatic soundtrack generation. A core component of EMSYNC is a novel video emotion classifier. To achieve accurate and efficient video classification, we intelligently fuse pretrained models. We additionally address the data-centric challenges in video classification through cinematic trailer genre classification experiments using a large-scale dataset. By leveraging pretrained deep neural networks for feature extraction and keeping them frozen while training only fusion layers, we reduce computational complexity while improving accuracy. We show the generalization abilities of our method by obtaining state-of-the-art results on Ekman-6 and MovieNet, the largest video datasets for emotion and cinematic genre classification, respectively. Another key contribution is a large-scale, emotion-labeled MIDI dataset for affective music generation. Using annotations from online resources, we build the largest MIDI dataset with valence-arousal labels. We additionally analyze the emotional content of song lyrics within the MIDI files. We then present an emotion-based MIDI generator, the first to condition on continuous emotional values rather than discrete categories, enabling nuanced music generation aligned with complex emotional content. To enhance temporal synchronization, we introduce a novel temporal boundary conditioning method, called "boundary offset encodings," aligning musical chords with scene changes. 
Integrated into EMSYNC, this method ensures music naturally follows the video's pacing and rhythm, improving the overall user experience. We also explore audio synthesis, focusing on audio bandwidth enhancement due to the scarcity of paired MIDI-audio data. We present a proof-of-concept to highlight and address the challenges in audio synthesis, emphasizing generalization. For the first time, we identify the problem of "filter overfitting," where models trained on specific low-pass filters fail to generalize to real-world scenarios. To address this, we propose a data augmentation strategy that outperforms standard regularization methods, marking the first step toward developing robust audio enhancement models for real-world use. Combining video emotion classification, emotion-based music generation, and temporal boundary conditioning, EMSYNC emerges as a fully automatic video-based music generator. User studies show that it consistently outperforms existing methods in terms of music richness, emotional alignment, temporal synchronization, and overall preference. As a result, EMSYNC sets a new state-of-the-art in video-based music generation, creating music that is both emotionally and rhythmically aligned with the video.
Research Centre: CTM
Principal Supervisor at INESC TEC: Paula Viana
Additional Supervisor: Matthew E. P. Davies
Scientific Domain: [Artificial Intelligence]; [Computer Science and Engineering]
Keywords: deep neural networks; midi generation; video analysis; transformers; multimodal fusion; affective computing
