Bocachancla 🫦🩴
Categories
All (8)
Beyond the transformer (2)
Miscellaneous (2)
Transformers explained (4)
Is $5M all you need?
Transformers explained
Dissecting DeepSeek’s models: Starting from the original transformer, I take a look at all the architectural and training improvements that went into DeepSeek-R1.
Feb 15, 2025
Oleguer Canal
Dot-product attention enhancements: MHA, MQA, GQA, and MHLA
Transformers explained
Starting from dot-product attention, I present and give an intuitive understanding of the main variants of the attention mechanism, namely: Multi-Head, Multi-Query, Grouped-Query, and Multi-Head Latent attention. Of course, the visuals are beautiful, because my parents didn’t raise a PowerPoint-diagram-maker 💁‍♂️
Feb 5, 2025
Oleguer Canal
Linear Transformers, Mamba2, and many ramblings
Beyond the transformer
I go through architectures used in sequence modelling: FFN, CNN, RNN, SSM, and Transformers, along with many efficiency optimization attempts. I provide an intuitive understanding of how they work and analyze their strengths and weaknesses, all while paying special attention (pun intended) to their computational and memory complexities.
Jan 15, 2025
Oleguer Canal
Deep Dreams (Are Made of This)
Miscellaneous
What if instead of doing gradient descent on model parameters, we do gradient ascent on the model input?
Sep 30, 2024
Oleguer Canal
Understanding positional encodings
Transformers explained
I give an intuitive understanding of sinusoidal positional encodings and RoPE.
Jul 1, 2024
Oleguer Canal
RLHF & PPO: From text continuation to aligned assistant
Transformers explained
Train a model to generalize human preferences, then use it to align your LLM.
Jun 5, 2024
Oleguer Canal
HiPPOs 🦛, Mambas 🐍, and other creatures
Beyond the transformer
I go through the research journey that led to Mamba. I first review SSMs, then explore the models: HiPPO, S4, DSS, and finally Mamba.
May 18, 2024
Oleguer Canal
Ramblings around information theory
Miscellaneous
Today we’ll look into the concept of information from a probabilistic perspective. Hold on to your hat 👒 because we will connect topics as random as…
Apr 15, 2024
Oleguer Canal