2 posts tagged with "LLM"

Large Language Models

Graphics Processing Unit (GPU)

· 3 min read
Rakesh
C | Rust | Quantum Gravity | LLM Researcher

In my last blog, I briefly introduced DeepSeek and some of the components it uses. In this blog, we will get down to the basics of GPUs, specifically NVIDIA GPU architecture, and how CUDA programs get compiled.

Transformer Block - Multihead Attention

· 9 min read
Rakesh
C | Rust | Quantum Gravity | LLM Researcher

In my previous blog post, I introduced DeepSeek LLM's use of the parallel thread execution (PTX) mechanism for GPU optimization. Today, we'll cover another foundational topic, Multihead Attention (MHA), before diving into DeepSeek's second groundbreaking innovation, known as Multihead Latent Attention (MHLA).
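As a preview of what the MHA post covers, here is a minimal NumPy sketch of multihead attention: inputs are projected to queries, keys, and values, split into heads, each head applies scaled dot-product attention, and the concatenated heads go through an output projection. The weight matrices and dimensions here are illustrative placeholders, not taken from any particular model.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multihead_attention(x, Wq, Wk, Wv, Wo, num_heads):
    """Scaled dot-product attention computed independently per head."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    # Project to Q, K, V and reshape to (num_heads, seq_len, d_head)
    q = (x @ Wq).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    k = (x @ Wk).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    v = (x @ Wv).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    # Attention weights: softmax(Q K^T / sqrt(d_head)), per head
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    # Weighted sum of values, then concatenate heads and project out
    out = (weights @ v).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo

# Toy example with random weights (hypothetical sizes, for shape-checking only)
rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 4, 8, 2
x = rng.standard_normal((seq_len, d_model))
Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(4))
y = multihead_attention(x, Wq, Wk, Wv, Wo, num_heads)
print(y.shape)  # (4, 8): same shape as the input sequence
```

The upcoming MHLA discussion builds on exactly this structure; DeepSeek's innovation changes how the K/V projections are stored, not the attention computation itself.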