Are you interested in up-leveling your nightmares? If so, I have some good news!1 You’ll enjoy today’s post 🥳 We’ll learn how to enhance what artificial neural networks “want to see” in images.
1 I also have some bad news: You should seek professional medical help 🏥.
Google’s Deep Dream project (June 2015) is an oldie but a goldie. It cleverly repurposes the autograd capabilities of modern deep learning libraries in a unique way2, creating results of… questionable usefulness 😂.
2 I’ll now start doing a series of shorter, more focused posts on cool/crazy ideas, stay tuned!
How does it work?
Before jumping into how Deep Dream works, I recommend having a good understanding of how model parameters are usually trained using gradient descent 🥱. Ok, now for the interesting twist:
What if instead of doing gradient descent on the model parameters, we did gradient ascent on the input itself?
More specifically: imagine we are given a trained model. We can then freeze its parameters3, compute the gradient of some objective with respect to the input4, and update the input to make that objective larger5.
3 As in we won’t change them anymore 🥶
4 The first difference now is that we are taking the derivative over the input instead of the model parameters.
5 The second difference is that we are looking for a maximum, instead of a minimum of the objective function.
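To make the twist concrete, here is the contrast in update-rule form (a rough sketch; the symbols are my own: $\theta$ are the model parameters, $x$ the input image, $y$ a label, $\eta$ the learning rate, and $\text{obj}$ the activation-based objective we pick in the code below):

$$
\underbrace{\theta \;\leftarrow\; \theta - \eta\, \nabla_{\theta}\, \mathcal{L}\big(f_{\theta}(x),\, y\big)}_{\text{usual training: descent on the parameters}}
\qquad\text{vs.}\qquad
\underbrace{x \;\leftarrow\; x + \eta\, \nabla_{x}\, \text{obj}\big(f_{\theta}(x)\big)}_{\text{deep dream: ascent on the input}}
$$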
```python
import torch

def dream(input_tensor, model, layer_id, lr):
    # Step 0: Forward the input through the network
    layer_outputs = model.forward(input_tensor)
    # Step 1: Grab activations/feature maps of interest
    activations = layer_outputs[layer_id]
    # Step 2: Compute objective
    # For instance: average of all layer_id features
    loss = torch.mean(activations)
    # Step 3: Backprop gradients
    loss.backward()
    # Step 4: Do gradient ascent
    input_tensor.data += lr * input_tensor.grad.data
    # Step 5: Zero grads in case there is another iteration
    input_tensor.grad.data.zero_()
    return input_tensor
```
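For completeness, here is a minimal sketch of how one could drive dream() end to end with a pretrained network. Everything below (the LayerOutputs wrapper, the choice of VGG16 from recent torchvision, the hypothetical input.jpg, and the layer_id/lr/iteration values) is my own illustrative setup, not code from the original project:

```python
import torch
import torchvision.transforms as T
from torchvision import models
from PIL import Image

class LayerOutputs(torch.nn.Module):
    """Wraps a stack of layers so forward() returns every layer's activations."""
    def __init__(self, layers):
        super().__init__()
        self.layers = layers

    def forward(self, x):
        outputs = []
        for layer in self.layers:
            x = layer(x)
            outputs.append(x)
        return outputs

# A pretrained classifier with frozen parameters ("we won't change them anymore")
vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)
model = LayerOutputs(vgg)

# Load an image and turn it into a leaf tensor that collects gradients
preprocess = T.Compose([
    T.Resize(512),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
img = preprocess(Image.open("input.jpg")).unsqueeze(0)  # hypothetical file name
img.requires_grad_(True)

# Repeatedly nudge the image towards what the chosen layer "wants to see"
for _ in range(30):
    img = dream(img, model, layer_id=20, lr=0.05)
```

To actually look at the result you would undo the normalization, clamp the pixels back into a valid range, and save the tensor as an image. The real Deep Dream pipeline also adds tricks such as running at multiple image scales (octaves) and jittering the input, which are omitted here.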
Images and ConvNets
Ok, let’s see how we can use this idea more visually 👀. Let’s say we’ve obtained a trained Convolutional Neural Network for image classification6.
6 Using the gradient descent approach explained previously.
7 Common as in frequent in the distribution of images used to train the model.
8 If the “eyes”, “fur”, and “pointy ears” shapes are present, the network classifies the image as the corresponding animal.
If we forward an image through the network, its intermediate layers light up for the common7 patterns they learned to detect, from simple edges and textures up to more complex shapes like eyes, fur, or pointy ears8.
If we pick one of those layers and run the gradient ascent loop above on the image, whatever patterns that layer responds to get amplified, and the network starts “seeing” them everywhere in the input.
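If you want to see what “picking a layer” looks like in code, here is a small sketch that lists candidate layers of a pretrained CNN and the shape of their feature maps (VGG16 is just a convenient torchvision choice here; the original Deep Dream used a GoogLeNet/Inception model):

```python
import torch
from torchvision import models

# Every convolutional layer below is a candidate `layer_id` whose
# activations dream() could maximize.
vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()

x = torch.randn(1, 3, 224, 224)  # stand-in for a real preprocessed image
for i, layer in enumerate(vgg):
    x = layer(x)
    if isinstance(layer, torch.nn.Conv2d):
        print(f"layer {i:2d}: feature maps of shape {tuple(x.shape)}")
```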
How deep is your love?9
Interestingly, depending on the layer whose features we choose to maximize, we obtain different outputs. For instance, if we optimize over an earlier layer, we maximize the presence of edges and more basic shapes:
Similarly, we can amplify the features of a particular entity (such as a banana 🍌) either by using a network trained on samples of “the banana image distribution”, or by maximizing the features which represent the class “banana” (a sketch of this option follows below).
9 Yes, the section title doesn’t make any sense. Don’t worry about it, it’s ok.
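Here is a minimal sketch of that second option: instead of averaging a layer’s activations, we maximize the logit of the class we care about. The dream_class name is made up, and 954 is “banana” in the usual ImageNet-1k label ordering (treat it as illustrative):

```python
import torch
from torchvision import models

def dream_class(input_tensor, classifier, class_idx, lr):
    # Same recipe as dream(), but the objective is a single class logit
    logits = classifier(input_tensor)
    loss = logits[0, class_idx]
    loss.backward()
    input_tensor.data += lr * input_tensor.grad.data
    input_tensor.grad.data.zero_()
    return input_tensor

# Full classifier (convolutional features + classification head), frozen
net = models.vgg16(weights=models.VGG16_Weights.DEFAULT).eval()
for p in net.parameters():
    p.requires_grad_(False)

# Start from noise; pixel values in [0, 1], normalization omitted for brevity
img = torch.rand(1, 3, 224, 224, requires_grad=True)
for _ in range(100):
    img = dream_class(img, net, class_idx=954, lr=0.05)  # 954 ≈ "banana"
```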
Epilogue
All this raises an almost-mandatory question: how is this useful?
Good question… Some things in life are cooler than they are useful, and this work probably falls on the cooler side. One could argue that it shows how a CNN’s internal features grow in complexity as inputs move forward through the model. But honestly, by 2015 we already knew that…
To be fair, this project might have laid a stepping stone for later interpretability work on ANNs, and it sparked conversations about AI art and AI creativity. It is related to other cool projects such as neural style transfer, and the concept is not even that far from diffusion models. And who knows 🤷‍♂️, it might continue inspiring other cool stuff in the future 🙃 For now we can enjoy whatever this is:
- Google Deep Dream source code
- Google Deep Dream photo album
- Computerphile Deep Dream video
- Aleksa Gordic deep dream implementation