# Winter-attention

This repository contains notes about attention mechanisms, with code examples included. The repo has an inductive bias towards vision transformers.

## Markdowns

- Introduction to Attention
- Why use Layer Normalization in Transformers?

## Code Exploration

- Cosine Similarity and vector similarity
- Self-attention in MNIST
- Multi-head attention in MNIST
- How to use einsum
- How to use einops

## Vision Transformers

- What is a Vision Transformer?
- Vision Transformer in PyTorch from scratch
- Attention and positional encoding visualization
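
As a quick taste of the material in these notes, here is a minimal sketch of single-head scaled dot-product self-attention written with `torch.einsum`. The function and variable names are illustrative only, not taken from the repo's notebooks:

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """Illustrative single-head scaled dot-product self-attention.

    x:             (batch, seq_len, d_model) input tokens
    w_q, w_k, w_v: (d_model, d_head) projection matrices
    """
    q = torch.einsum("bnd,dh->bnh", x, w_q)  # queries
    k = torch.einsum("bnd,dh->bnh", x, w_k)  # keys
    v = torch.einsum("bnd,dh->bnh", x, w_v)  # values

    # Attention scores: dot product of every query with every key,
    # scaled by sqrt(d_head) so the softmax stays well-behaved.
    d_head = q.shape[-1]
    scores = torch.einsum("bqh,bkh->bqk", q, k) / d_head ** 0.5
    weights = F.softmax(scores, dim=-1)  # each row sums to 1

    # Each output token is an attention-weighted mixture of all value tokens.
    return torch.einsum("bqk,bkh->bqh", weights, v)

# Tiny shape check with random data (hypothetical sizes, no training).
x = torch.randn(2, 16, 32)   # batch=2, 16 tokens, d_model=32
w_q, w_k, w_v = (torch.randn(32, 8) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # torch.Size([2, 16, 8])
```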