The goal of these interpretability methods is to link the value of an output node of interest $o_c$ to the image $X_0$ given as input to a network. They do so by back-propagating a signal from $o_c$ to $X_0$: this process (backward pass) can be seen as the opposite operation to the one performed when computing the output value from the input (forward pass).
Any property can be back-propagated provided that its value at the level of feature map $l-1$ can be computed from its value at feature map $l$. In this section, the back-propagated properties are gradients or the relevance of a node $o_c$.
2.3.1. Gradient Back-Propagation
During network training, the gradients corresponding to each layer are computed according to the loss in order to update the weights. These gradients can thus be seen as the change needed at the layer level to improve the final result: by adding this difference to the weights, the probability of the true class $y$ increases.
In the same way, gradients can be computed at the image level to find how the input should vary to change the value of $o_c$ (see example in Fig. ). This gradient computation was proposed by [10], in which the attribution map $S_c$ corresponding to the input image $X_0$ and the output node $o_c$ is computed according to the following equation:

$$S_c = \left.\frac{\partial o_c}{\partial X}\right|_{X_0}$$
Attribution map of an image found with gradients back-propagation. (Adapted from [10]. Permission to reuse was kindly granted by the authors)
Due to its simplicity, this method is the most commonly used to interpret deep learning networks. Its attribution map is often called a “saliency map”; however, this term is also used in some articles to refer to any attribution map, which is why we avoid it in this chapter.
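As a minimal sketch of this idea (not the authors' implementation), the gradient attribution of a toy two-layer network can be computed with a manual backward pass. The network, its weights, and the input below are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network: o = W2 @ relu(W1 @ x); all names are illustrative.
W1 = rng.standard_normal((8, 16))
W2 = rng.standard_normal((3, 8))

def forward(x):
    a1 = W1 @ x               # pre-activations of the hidden layer
    h1 = np.maximum(a1, 0.0)  # ReLU
    o = W2 @ h1               # output scores, one per class
    return a1, h1, o

def gradient_saliency(x, c):
    """Back-propagate d o_c / d x through the network (standard backprop)."""
    a1, h1, o = forward(x)
    grad_h1 = W2[c]               # d o_c / d h1
    grad_a1 = grad_h1 * (a1 > 0)  # ReLU derivative: indicator of positive inputs
    grad_x = W1.T @ grad_a1       # d o_c / d x
    return grad_x

x0 = rng.standard_normal(16)
S_c = np.abs(gradient_saliency(x0, c=0))  # the absolute value is commonly taken
print(S_c.shape)
```

In practice, the backward pass is performed by an automatic differentiation framework rather than by hand, and the resulting map has the same shape as the input image.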
This method was modified to derive many similar gradient-based methods, described in the following paragraphs.
The ReLU layer is a commonly used activation function that sets negative input values to 0 and leaves positive input values unchanged. The derivative of this function in layer $l$ is the indicator function $\mathbb{1}_{A^{(l)}>0}$: it outputs 1 (resp. 0) where the feature maps computed during the forward pass were positive (resp. negative).
Springenberg et al. [12] proposed to back-propagate the signal differently. Instead of applying the indicator function of the feature map $A^{(l)}$ computed during the forward pass, they directly applied ReLU to the back-propagated values, which corresponds to multiplying them by the indicator function of the back-propagated gradients, $\mathbb{1}_{\partial o_c / \partial A^{(l+1)} > 0}$. This “backward deconvnet” method back-propagates only the positive gradients and, according to the authors, results in a reconstructed image showing the part of the input image that most strongly activates the neuron of interest.
The guided back-propagation method (Eq. 4) combines the standard back-propagation (Eq. 2) with the backward deconvnet (Eq. 3): when back-propagating gradients through ReLU layers, a value is set to 0 if the corresponding top gradient or bottom data is negative. This adds additional guidance to the standard back-propagation by preventing the backward flow of negative gradients.
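The three ReLU back-propagation rules can be contrasted on a small example. This is a sketch with illustrative arrays: `grad_top` stands for the gradient arriving from the layer above and `a` for the forward-pass input of the ReLU:

```python
import numpy as np

# Incoming gradient from the layer above (top) and the forward-pass
# input of the ReLU (bottom data); values are illustrative.
grad_top = np.array([0.5, 1.0, 2.0, -0.3])
a        = np.array([1.2, -0.7, 3.0, 0.1])

# Standard back-propagation: keep gradients where the forward
# activation was positive.
grad_standard = grad_top * (a > 0)

# Backward deconvnet: keep only positive gradients, regardless of the
# forward activation.
grad_deconvnet = grad_top * (grad_top > 0)

# Guided back-propagation: keep a gradient only if both the top
# gradient and the bottom data are positive.
grad_guided = grad_top * (a > 0) * (grad_top > 0)

print(grad_standard, grad_deconvnet, grad_guided)
```

Note how the second entry is kept by the deconvnet rule (positive gradient) but removed by the guided rule (negative bottom data), while the last entry is kept only by the standard rule.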
Any back-propagation procedure can be “guided,” as it only concerns the way ReLU functions are managed during back-propagation (this is the case, e.g., for guided Grad-CAM).
While it was initially adopted by the community, this method showed severe defects as discussed later in Subheading 4.
The CAM corresponding to $o_c$ is the mean of the channels of the feature map produced by the convolutions, weighted by the weights $w_k^c$ learned in the fully connected layer.
This map has the same size as $A_k$, which may be smaller than the input if the convolutional part performs downsampling operations (which is very often the case). The map is then upsampled to the size of the input so that it can be overlaid on it.
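A minimal sketch of this computation, assuming the feature maps and the fully connected weights have already been extracted from a trained network (all shapes and values are illustrative, and real implementations use smoother interpolation than the nearest-neighbor upsampling shown here):

```python
import numpy as np

rng = np.random.default_rng(1)

# Feature maps A_k from the last convolution (K channels of size H x W)
# and the weights w_k^c of the fully connected layer for class c.
K, H, W = 4, 7, 7
A = rng.random((K, H, W))
w_c = rng.standard_normal(K)

# CAM: channel-wise weighted combination of the feature maps.
cam = np.tensordot(w_c, A, axes=1)  # shape (H, W)

# Upsample to the input size (nearest neighbor, via np.kron) so the map
# can be overlaid on the input image.
input_size = 224
scale = input_size // H
cam_upsampled = np.kron(cam, np.ones((scale, scale)))
print(cam_upsampled.shape)
```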
Selvaraju et al. [14] proposed an extension of CAM that can be applied to any architecture: Grad-CAM (illustrated in Fig. ). As in CAM, the attribution map is a linear combination of the channels of a feature map computed by a convolutional layer. In this case, however, the weight of each channel is computed using gradient back-propagation:

$$\alpha_k^c = \frac{1}{Z}\sum_{i,j} \frac{\partial o_c}{\partial A_k^{(i,j)}}$$

where $Z$ is the number of pixels in the feature map (i.e., the weights are obtained by global average pooling of the gradients).
Grad-CAM explanations highlighting two different objects in an image. (a) The original image, (b) the explanation based on the “dog” node, (c) the explanation based on the “cat” node. Ⓒ2017 IEEE. (Reprinted, with permission, from [14])
The final map is the linear combination of the feature maps weighted by these coefficients, to which a ReLU activation is applied to keep only the features that have a positive influence on class $c$:

$$S_c = \mathrm{ReLU}\left(\sum_k \alpha_k^c A_k\right)$$
Similarly to CAM, this map is then upsampled to the input size.
Grad-CAM can be applied to any feature map produced by a convolution, but in practice the last convolutional layer is very often chosen. The authors argue that this layer is “the best compromise between high-level semantics and detailed spatial information” (the latter is lost in fully connected layers, as the feature maps are flattened).
Because of the upsampling step, CAM and Grad-CAM produce maps that are more human-friendly, as they contain more connected zones, contrary to other attribution maps obtained with gradient back-propagation, which can look very scattered. However, the smaller the feature maps $A_k$, the blurrier the resulting attribution maps, leading to a possible loss of interpretability.
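Under the same assumptions as before (illustrative feature maps and gradients that would, in practice, come from a real network, and nearest-neighbor upsampling for simplicity), Grad-CAM can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(2)

# Feature maps A_k of the chosen convolutional layer and the gradients
# of o_c with respect to them (both would come from a trained network).
K, H, W = 4, 7, 7
A = rng.random((K, H, W))
grad_A = rng.standard_normal((K, H, W))  # d o_c / d A_k

# Grad-CAM weights: global average pooling of the gradients per channel.
alpha_c = grad_A.mean(axis=(1, 2))  # shape (K,)

# Weighted combination of the channels, then ReLU to keep only features
# with a positive influence on class c.
grad_cam = np.maximum(np.tensordot(alpha_c, A, axes=1), 0.0)

# As in CAM, the map is finally upsampled to the input size.
grad_cam_up = np.kron(grad_cam, np.ones((32, 32)))
print(grad_cam_up.shape)
```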
2.3.2. Relevance Back-Propagation
Instead of back-propagating gradients to the level of the input or of the last convolutional layer, Bach et al. [15] proposed to back-propagate the score obtained by a class $c$, which is called the relevance. This score corresponds to $o_c$ after some post-processing (e.g., softmax), as its value must be positive if class $c$ was identified in the input. At the end of the back-propagation process, the goal is to find the relevance $R_u$ of each feature $u$ of the input (e.g., of each pixel of an image) such that $o_c = \sum_u R_u$.
In their paper, Bach et al. [15] take the example of a fully connected layer defined by a matrix of weights $w$ and a bias $b$ at layer $l+1$. The value of a node $v$ in feature map $A^{(l+1)}$ is computed during the forward pass as:

$$A^{(l+1)}(v) = \sum_u A^{(l)}(u)\, w_{uv} + b_v$$
During the back-propagation, the relevance at the level of layer $l$, $R^{(l)}(u)$, is computed from the relevances $R^{(l+1)}(v)$, which are distributed according to the weights $w$ learned during the forward pass and the activations $A^{(l)}(u)$:

$$R^{(l)}(u) = \sum_v \frac{A^{(l)}(u)\, w_{uv}}{\sum_{u'} A^{(l)}(u')\, w_{u'v} + b_v}\, R^{(l+1)}(v)$$
The main issue of this method is that the denominator may become (close to) zero, leading to an explosion of the back-propagated relevance. Moreover, it was shown in [11] that when all activations are piece-wise linear (such as ReLU or leaky ReLU), the layer-wise relevance propagation (LRP) method reproduces the output of gradient⊙input, questioning the usefulness of the method.
This is why Samek et al. [16] proposed two variants of the standard LRP method [15]. They also describe the behavior of the back-propagation in layers other than the linear ones (convolutional layers follow the same formula as linear ones). They illustrated their method with a neural network trained on MNIST (see Fig. ). To simplify the equations in the following paragraphs, we now denote the weighted activations as $z_{uv} = A^{(l)}(u)\, w_{uv}$.
LRP attribution maps explaining the decision of a neural network trained on MNIST. Ⓒ2017 IEEE. (Reprinted, with permission, from [16])
Though these two LRP variants improve the numerical stability of the procedure, they require choosing parameter values that may change the patterns in the resulting attribution map.
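As an illustration of such a stabilized rule, the widely used ε-variant pushes the denominator of the standard rule away from zero. The sketch below applies it to a single linear layer with illustrative values (the exact formulation in [16] may differ in details such as the treatment of the bias):

```python
import numpy as np

# Activations A^(l)(u) of a linear layer, its weights w_uv, and the
# relevances R^(l+1)(v) coming from the layer above (illustrative values).
A_l = np.array([0.2, 0.5, 0.1, 0.9, 0.3, 0.4])
w = np.array([[ 1.0, -0.5,  0.2],
              [ 0.3,  0.8, -0.1],
              [-0.4,  0.2,  0.5],
              [ 0.7, -0.3,  0.6],
              [ 0.1,  0.4, -0.2],
              [-0.6,  0.5,  0.3]])
R_next = np.array([0.5, 0.3, 0.2])

def lrp_epsilon(A_l, w, R_next, eps=1e-2):
    """Epsilon-stabilized LRP rule for a linear layer (sketch)."""
    z = A_l[:, None] * w            # weighted activations z_uv
    z_v = z.sum(axis=0)             # denominator for each output node v
    z_v = z_v + eps * np.sign(z_v)  # push the denominator away from zero
    return (z / z_v) @ R_next       # relevances R^(l)(u)

R_l = lrp_epsilon(A_l, w, R_next)
print(R_l.shape)
```

Note that with ε > 0, relevance conservation only holds approximately: a fraction of the relevance proportional to ε is absorbed at each layer, which is the price paid for numerical stability.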
The back-propagation from node $v$ at the level of $R^{(l+1)}$ to node $u$ at the level of $R^{(l)}$ can be written

$$R^{(l)}(u) = \left.\frac{\partial R^{(l+1)}(v)}{\partial A^{(l)}(u)}\right|_{\tilde{A}^{(l)}} \left(A^{(l)}(u) - \tilde{A}^{(l)}(u)\right)$$

This rule implies a root point $\tilde{A}^{(l)}$ which is close to $A^{(l)}(u)$ and meets a set of constraints depending on $v$.