Extracting Relational Explanations From Deep Neural Networks: A Survey From a Neural-Symbolic Perspective

IEEE Trans Neural Netw Learn Syst. 2020 Sep;31(9):3456-3470. doi: 10.1109/TNNLS.2019.2944672. Epub 2019 Nov 1.

Abstract

The term "explainable AI" refers to the goal of producing artificially intelligent agents that are capable of providing explanations for their decisions. Some models (e.g., rule-based systems) are designed to be explainable, while others are less explicit "black boxes" whose reasoning remains a mystery. One example of the latter is the neural network, and over the past few decades, researchers in the field of neural-symbolic integration (NSI) have sought to extract relational knowledge from such networks. Extraction from deep neural networks, however, remained a challenge until recent years, when many methods were proposed for extracting distinct, salient features from the input or hidden feature spaces of deep neural networks. Furthermore, methods of identifying relationships between these features have also emerged. This article presents examples of old and new developments in extracting relational explanations in order to argue that the latter have analogies in the former and, as such, can be described in terms of the long-established taxonomies and frameworks presented in early neural-symbolic literature. We also outline potential future research directions that come to light from this refreshed perspective.