2 posts tagged with "LLM"

Large Language Models

Graphics Processing Unit (GPU)

· 3 min read
Rakesh
C | Rust | Quantum Gravity | LLM Researcher

In my last blog, I briefly introduced DeepSeek and some of the components it uses. In this blog, we will get down to the basics of GPUs, specifically NVIDIA GPU architecture, and how CUDA programs get compiled.

Transformer Block - Multihead Attention

· 9 min read
Rakesh
C | Rust | Quantum Gravity | LLM Researcher

In my previous blog post, I introduced DeepSeek LLM's use of the parallel thread execution (PTX) mechanism for GPU optimization. Today, we'll cover another foundational topic, Multihead Attention (MHA), before diving into DeepSeek's second groundbreaking innovation, known as Multihead Latent Attention (MHLA).
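As a preview of what the MHA post covers, here is a minimal NumPy sketch of multihead attention: inputs are projected to queries, keys, and values, split into heads, each head applies scaled dot-product attention, and the concatenated heads go through an output projection. The weight matrices and dimensions here are illustrative placeholders, not taken from any particular model.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multihead_attention(x, Wq, Wk, Wv, Wo, num_heads):
    """Scaled dot-product attention computed independently per head."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    # Project to Q, K, V and reshape to (num_heads, seq_len, d_head)
    q = (x @ Wq).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    k = (x @ Wk).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    v = (x @ Wv).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    # Attention weights: softmax(Q K^T / sqrt(d_head)), per head
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    # Weighted sum of values, then concatenate heads and project out
    out = (weights @ v).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo

# Toy example with random weights (hypothetical sizes, for shape-checking only)
rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 4, 8, 2
x = rng.standard_normal((seq_len, d_model))
Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(4))
y = multihead_attention(x, Wq, Wk, Wv, Wo, num_heads)
print(y.shape)  # (4, 8): same shape as the input sequence
```

The upcoming MHLA discussion builds on exactly this structure; DeepSeek's innovation changes how the K/V projections are stored, not the attention computation itself.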