Starred Posts ⭐

Linear Transformers, Mamba2, and many ramblings
I go through the architectures used in sequence modelling: FFNs, CNNs, RNNs, SSMs, and Transformers, along with many attempts at optimizing their efficiency. I provide an intuitive understanding of how they work and analyze their strengths and weaknesses, all while paying special attention (pun intended) to their computational and memory complexities.
Understanding Transformers

Dot-product attention enhancements: MHA, MQA, GQA, and MHLA
ML Basics
Miscellaneous