2023
Authors
Cunha, L; Soares, C; Restivo, A; Teixeira, LF;
Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS XXI, IDA 2023
Abstract
Concerns about the interpretability of ML models are growing as the technology is used in increasingly sensitive domains (e.g., health and public administration). Synthetic data can be used to understand models better, for instance, if the examples are generated close to the frontier between classes. However, data augmentation techniques, such as Generative Adversarial Networks (GANs), have mostly been used to generate training data that leads to better models. We propose a variation of GANs that, given a classifier, generates realistic data that the classifier labels with low confidence. The generated examples can be used to gain insight into the frontier between classes. We empirically evaluate our approach on two well-known image classification benchmark datasets, MNIST and Fashion MNIST. Results show that the approach is able to generate images that are closer to the frontier than the original ones, but still realistic. Manual inspection confirms that some of those images are confusing even for humans.
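The abstract does not state the loss the authors use, so the following is only a minimal sketch of how "classified with low confidence" could be scored for one generated example. The `confidence_penalty` function and its definition (top-class probability minus the uniform level 1/K) are assumptions made for illustration, not the paper's method.

```python
import math

def softmax(logits):
    """Convert raw classifier scores into probabilities."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def confidence_penalty(logits):
    """Hypothetical penalty: top-class probability minus the uniform level 1/K.

    It is 0 when the classifier is maximally unsure and approaches 1 - 1/K
    when a single class dominates.
    """
    probs = softmax(logits)
    return max(probs) - 1.0 / len(probs)

# An example near the frontier (equal scores) versus a clearly classified one.
ambiguous = confidence_penalty([2.0, 2.0, 2.0])
confident = confidence_penalty([10.0, 0.0, 0.0])
```

Under this assumption, a generator would be rewarded for samples with a small penalty, alongside whatever realism term the GAN already optimizes.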
2023
Authors
Teixeira, S; Veloso, B; Rodrigues, JC; Gama, J;
Publication
MACHINE LEARNING AND PRINCIPLES AND PRACTICE OF KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT I
Abstract
The growing use of data-driven decision systems based on Artificial Intelligence (AI) by governments, companies and social organizations has drawn more attention to the challenges they pose to society. Over the last few years, news about discrimination and privacy, among other issues, has appeared on social media and highlighted the vulnerabilities of these systems. Despite all the research around these issues, there is no consensus on the definition of the concepts inherent to the risks and/or vulnerabilities of data-driven decision systems. Categorizing the dangers and vulnerabilities of data-driven decision systems will facilitate ethics by design, ethics in design and ethics for designers, contributing to responsible AI. The main goal of this work is to understand which types of AI risks/vulnerabilities are Ethical and/or Technological and the differences between human and machine classification. We analyze two types of problems: (i) the risks/vulnerabilities classification task by humans; and (ii) the risks/vulnerabilities classification task by machines. To carry out the analysis, we conducted a survey for the human classification and used the BERT algorithm for the machine classification. The results show that, even with different levels of detail, the classifications of vulnerabilities agree in most cases.
2023
Authors
Cao, LB; Chen, H; Fan, XH; Gama, J; Ong, YS; Kumar, V;
Publication
PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023
Abstract
Federated learning (FL) demonstrates its advantages in integrating distributed infrastructure, communication, computing and learning in a privacy-preserving manner. However, the robustness and capabilities of existing FL methods are challenged by limited and dynamic data and conditions, complexities including heterogeneities and uncertainties, and analytical explainability. Bayesian federated learning (BFL) has emerged as a promising approach to address these issues. This survey presents a critical overview of BFL, including its basic concepts, its relations to Bayesian learning in the context of FL, and a taxonomy of BFL from both Bayesian and federated perspectives. We categorize and discuss client-side, server-side and FL-based BFL methods and their pros and cons. The limitations of the existing BFL methods and the future directions of BFL research further address the intricate requirements of real-life FL applications.
2023
Authors
Ukil, A; Gama, J; Jara, AJ; Marin, L;
Publication
PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023
Abstract
The management of knowledge-driven artificial intelligence technologies is essential in order to evaluate their impact on human life and society. Social networks and technology use can have a negative impact on us physically, emotionally, socially and mentally. On the other hand, intelligent systems can have a positive effect on people's lives. Currently, we are witnessing the power of large language models (LLMs) like ChatGPT and their influence on society. The objective of the workshop is to contribute to the advancement of intelligent technologies designed to address the human condition. This could include precise and personalized medicine, better care for elderly people, reducing private data leaks, using AI to manage resources better, using AI to predict risks, augmenting human capabilities, and more. The workshop also aims to present research findings and perspectives that demonstrate how knowledge-enabled technologies and applications improve human well-being. It focuses on the impacts of Artificial Intelligence (AI) research at different levels of granularity: the micro-granular level, where the daily or regular functioning of human life is affected, and the macro-granular level, where the long-term or far-future effects of artificial intelligence on people's lives and on human society could be substantial. In conclusion, this workshop explores how AI research can potentially address the most pressing challenges facing modern societies, and how knowledge management can potentially contribute to these solutions.
2023
Authors
Lu, J; Gama, J; Yao, X; Minku, L;
Publication
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
Abstract
In recent years, learning from streaming data, commonly known as stream learning, has enjoyed tremendous growth and shown a wealth of development at both the conceptual and application levels. Stream learning is highly visible in both the machine learning and data science fields and has become a hot new direction in research. Advancements in stream learning include learning with concept drift detection, which covers determining whether a drift has occurred; understanding where, when, and how a drift occurs; adaptation by actively or passively updating models; and online learning, active learning, incremental learning, and reinforcement learning in data streaming situations.
2023
Authors
Mamede, RM; Paiva, N; Gama, J;
Publication
DS
Abstract
Machine Learning faces a growing need to explain and understand decisions made by trained models as regulation and consumer awareness have increased. Alongside understanding the inner workings of a model comes the task of verifying how adequately we can model a problem with the learned functions. Traditional global assessment functions lack the granularity required to understand local differences in performance in different regions of the feature space, where the model can have problems adapting. Residual Analysis adds a layer of model understanding by interpreting prediction residuals in an exploratory manner. However, this task can be infeasible for high-dimensional datasets through hypotheses and visualizations alone. In this work, we use weak interpretable learners to identify regions of high prediction error in the feature space. We achieve this by examining the absolute residuals of predictions made by trained regressors. This methodology retains the interpretability of the identified regions. It gives practitioners tools to formulate hypotheses about model failure in particular regions, for future model tuning, data collection, or data augmentation on critical cohorts of data. We present a way of including information on different levels of model uncertainty in the feature space through the use of locally fitted Model Agnostic Prediction Intervals (MAPIE) in the identified regions. We compare this approach with other common forms of conformal prediction, which do not take into account findings from weak segment identification, by assessing local and global coverage of the prediction intervals. To demonstrate the practical application of our approach, we present a real-world industry use case in the context of inbound retention call-centre operations for a Telecom Provider, where the goal is to determine the optimal pairing between a customer and an available assistant through the prediction of contracted revenue.
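As a rough illustration of fitting a weak interpretable learner on absolute residuals, the sketch below uses a single-feature, depth-1 split (a decision stump). The choice of a stump, the function name `best_stump`, and the toy data are all assumptions; the abstract does not say which weak learners the authors use.

```python
def mean(values):
    return sum(values) / len(values)

def best_stump(xs, abs_residuals):
    """Find the threshold on one feature that best separates low-error
    from high-error points, by minimizing the summed squared error of
    the absolute residuals around each side's mean."""
    pairs = sorted(zip(xs, abs_residuals))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        m_left, m_right = mean(left), mean(right)
        sse = sum((r - m_left) ** 2 for r in left)
        sse += sum((r - m_right) ** 2 for r in right)
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or sse < best[0]:
            best = (sse, threshold, m_left, m_right)
    if best is None:
        raise ValueError("feature has a single distinct value; no split possible")
    return best[1:]

# Toy data (hypothetical): the regressor is accurate for x < 5 and poor for x >= 5.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y_true = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y_pred = [1.1, 1.8, 3.1, 4.2, 4.9, 8.0, 4.9, 9.9, 7.0, 12.2]
abs_residuals = [abs(t - p) for t, p in zip(y_true, y_pred)]

threshold, left_error, right_error = best_stump(xs, abs_residuals)
```

The resulting rule ("error is high when x exceeds the threshold") stays readable, which is the property the abstract emphasizes; a deeper tree over several features would follow the same idea.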