
e3c

Extracting Concept-based Explanations of Neural Networks via Vision-Language Models

In-person attendance: Room E1-1-01, Campus de Córdoba (Calle Escritor Aguayo, 4).

Abstract: AI systems are increasingly integrated into the daily lives of millions of users, who either interact with them directly or unknowingly have their data processed by them. Despite their widespread use, many large AI models remain black boxes, with users having little insight into their internal decision-making. This opacity raises concerns about whether their decisions rest on fair reasoning or are influenced by biases. Understanding the inner workings of these models is the primary goal of the research area of Explainable AI. One promising approach is to develop algorithms that extract logical explanations from AI models; such explanations must both faithfully represent the model's behavior and be understandable to humans. In this talk, we discuss several approaches we are currently developing for vision models. We report experimental results and present a series of sanity checks that validate the faithfulness of our artifacts. Finally, we outline guidelines for leveraging these artifacts in tasks such as bias detection, adversarial example generation, and model repair.
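To give a flavour of the general theme (this is only an illustrative sketch, not the speaker's method; the model checkpoint, image path, and concept list are assumptions made for the example), a vision-language model such as CLIP can be queried for how strongly human-interpretable concepts match an image, yielding concept scores that could serve as building blocks for concept-based explanations:

    # Illustrative sketch only: scoring visual concepts with a vision-language model (CLIP).
    # Model name, image path, and concepts below are hypothetical choices for the example.
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    concepts = ["striped fur", "long beak", "metallic surface", "human face"]
    image = Image.open("example.jpg")  # any input image given to the vision model under study

    inputs = processor(text=concepts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)

    # Higher scores indicate a better textual match between concept and image;
    # such scores can act as interpretable features when explaining a black-box model.
    scores = outputs.logits_per_image.softmax(dim=-1).squeeze()
    for concept, score in zip(concepts, scores.tolist()):
        print(f"{concept}: {score:.3f}")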

Seminar link: https://loyola.webex.com/meet/rede3c

Keywords: Explainable AI, Interpretability, Trustworthiness, Vision Models