Deep Dreams (Are Made of This)

What if instead of doing gradient descent on model parameters, we do gradient ascent on the model input?
Author

Oleguer Canal

Published

September 30, 2024

Are you interested in up-leveling your nightmares? If so, I have some good news!1 You’ll enjoy today’s post 🥳 We’ll learn how to enhance what artificial neural networks “want to see” in images.

1 I also have some bad news: You should seek professional medical help 🏥.

I’m no science-person, but my personal recommendation is not to stare at these pictures for too long (if you intend to continue operating more-or-less normally).

Google’s Deep Dream project (June 2015) is an oldie but a goldie. It cleverly repurposes the autograd capabilities of modern deep learning libraries in a unique way2, creating results of… questionable usefulness 😂.

2 I’ll now start doing a series of shorter, more focused posts on cool/crazy ideas, stay tuned!

How does it work?

Before jumping into how deep dream works, I recommend having a good understanding of how model parameters are usually trained using gradient descent 🥱. Ok, now introducing the interesting twist:

What if instead of doing gradient descent on the model parameters, we did gradient ascent on the input itself?

More specifically: Imagine we are given a trained model. We can then freeze its parameters $W$3, take a particular input $x$, and modify $x$ in a direction in which some of the model’s internal features are maximized. Similar to Supervised Learning4, if we define some objective function $L_{DD}(f, x, W)$, we can iteratively modify $x$ to find a maximum5 of it:

3 As in, we won’t change them anymore 🥶

4 The first difference now is that we are taking the derivative over the input instead of the model parameters.

5 The second difference is that we are looking for a maximum, instead of a minimum of the objective function.

$$x_{k+1} \leftarrow x_k + \eta \nabla_x L_{DD}(f, x_k, W)$$


def dream(input_tensor, model, layer_id, lr):
    # Note: input_tensor must be created with requires_grad=True so that
    # backward() populates gradients w.r.t. the input (not the frozen W).

    # Step 0: Forward the input through the network
    layer_outputs = model.forward(input_tensor)

    # Step 1: Grab activations/feature maps of interest
    activations = layer_outputs[layer_id]

    # Step 2: Compute objective
    # For instance: average of all layer_id features
    loss = torch.mean(activations)

    # Step 3: Backprop gradients
    loss.backward()

    # Step 4: Do gradient ascent
    input_tensor.data += lr * input_tensor.grad.data

    # Step 5: Zero grads in case there is another iteration
    input_tensor.grad.data.zero_()

    return input_tensor
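
To make the sketch above concrete, here is a hypothetical usage loop. It assumes `model` is a thin wrapper whose `forward` returns the intermediate activations indexed by `layer_id` (standard pretrained networks don’t expose this out of the box), and that the input tensor tracks gradients:

```python
import torch

# Hypothetical usage of dream() above. `model` is assumed to be a wrapper
# whose forward() returns per-layer activations indexed by layer_id.
input_tensor = torch.rand(1, 3, 224, 224)   # random noise, or a preprocessed photo
input_tensor.requires_grad_(True)           # we ascend on the input, not on W

for _ in range(50):                         # a handful of ascent iterations
    input_tensor = dream(input_tensor, model, layer_id=7, lr=0.05)
```

The layer index and learning rate here are arbitrary; the original implementation also adds tricks such as running the ascent at several image scales (“octaves”), which this sketch skips.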

Images and ConvNets

Ok, let’s see how we can use this idea more visually 👀. Let’s say we’ve obtained a trained Convolutional Neural Network for image classification6.

6 Using the gradient descent approach explained previously.

7 Common as in frequent in the distribution of images used to train the model.

8 If the “eyes”, “fur”, and “pointy ears” shapes are present, the network classifies x as a cat 🐱.

If we forward an image $x$ through the network, the first layers extract the position of very simple features, such as edges. As the input advances through the network, these features get combined into more and more complex patterns. In the last layers, features become recognizable common shapes7 such as eyes, wheels, or roofs… Those patterns are the ones the network relies on to classify what is present in the picture8.

If we pick $L_{DD}$ such that it selects some features from the last layers, we are effectively shifting $x$ so that it maximizes the presence of the selected features.
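
As a concrete (and purely illustrative) way to do this in PyTorch, one can register a forward hook on a late layer of a pretrained classifier and use the mean of its feature maps as $L_{DD}$. The model and layer below (torchvision’s VGG16 and `features[28]`, its last conv layer) are arbitrary example picks, not what the original project used:

```python
import torch
import torchvision.models as models

# Example only: freeze a pretrained classifier and expose one layer's
# activations through a forward hook (requires a recent torchvision).
model = models.vgg16(weights="IMAGENET1K_V1").eval()
for p in model.parameters():
    p.requires_grad_(False)                  # freeze W

activations = {}

def save_activation(module, inputs, output):
    activations["target"] = output           # feature maps of the hooked layer

model.features[28].register_forward_hook(save_activation)  # a deep conv layer

x = torch.rand(1, 3, 224, 224, requires_grad=True)
model(x)                                     # forward pass fills `activations`
objective = activations["target"].mean()     # L_DD: average deep-layer feature
objective.backward()                         # x.grad now points towards "more
                                             # of those features" in the input
```

Hooking an earlier layer instead (say `features[5]`) amplifies simpler edge and texture patterns, which is exactly the effect discussed in the next section.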

ConvNets are usually trained on pictures of animals, buildings, or vehicles. We can watch how these common features emerge when we maximize them.

How deep is your love?9

Interestingly, depending on which layer’s features we choose to maximize, we obtain different outputs. For instance, if we optimize over an earlier layer, we maximize the presence of edges and more basic shapes:

From left to right: original picture, result of optimizing over shallow layers, result of optimizing over deeper layers.

Similarly, we can amplify the features of a particular entity (such as a banana 🍌) in two ways: by using a network trained on samples of “the banana image distribution”, or by maximizing the features which represent the class “banana”.

Result of optimizing random noise for certain classes.
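
Here is a minimal sketch of the second option, under some assumptions: we take a pretrained ImageNet classifier from torchvision (a stand-in, not the network used in the original project), start from random noise, and do gradient ascent on the logit of a single class. Index 954 is “banana” in the usual ImageNet-1k class mapping:

```python
import torch
import torchvision.models as models

# Sketch only: maximize one class logit starting from random noise.
model = models.resnet18(weights="IMAGENET1K_V1").eval()
for p in model.parameters():
    p.requires_grad_(False)                        # freeze W

x = torch.rand(1, 3, 224, 224, requires_grad=True)  # random noise start
banana = 954                                         # "banana" in ImageNet-1k

for _ in range(200):
    logit = model(x)[0, banana]                    # score of the chosen class
    logit.backward()
    with torch.no_grad():
        x += 0.05 * x.grad / (x.grad.norm() + 1e-8)  # normalized ascent step
        x.clamp_(0, 1)                               # stay in a valid image range
        x.grad.zero_()
```

Without extra regularization (jitter, smoothing, optimizing at several scales), the result usually looks more like structured noise than a recognizable banana; the original write-up adds natural-image priors to get pictures like the ones above.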

9 Yes, the section title doesn’t make any sense. Don’t worry about it, it’s ok.

Epilogue

All this poses an almost-mandatory question: How is that useful?

Good question… Some things in life are cooler than they are useful; this work probably falls into that category. One could argue that it shows how a CNN’s internal features rise in complexity as inputs move forward through the model. But honestly, by 2015 we already knew that…

To be fair, this project might have laid a stepping stone for later interpretability work on ANNs, sparked conversations about AI art and AI creativity, and it is related to other cool projects such as neural style transfer; the concept is not even that far from diffusion models. And who knows 🤷‍♂️, it might keep inspiring other cool stuff in the future 🙃 For now, we can enjoy whatever this is:

(a) Fucked up Mona Lisa
(b) Nightmarish forest
Figure 1: More cursed images.