PTX Basics
Parallel Thread Execution (PTX) is a virtual machine instruction set architecture that can be thought of as the assembly language for NVIDIA GPUs. You only need to know a small amount of PTX syntax to be productive, and this post should get you started quickly.
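To make the "assembly language" comparison concrete, here is a minimal, hypothetical PTX kernel (the name `vec_add` and the target/version directives are illustrative, not taken from any real codebase). It computes `C[i] = A[i] + B[i]` with one element per thread, and shows the core syntax: directives, typed virtual registers, special registers, and typed load/store instructions.

```ptx
.version 7.0
.target sm_70
.address_size 64

// Hypothetical kernel: C[i] = A[i] + B[i], one element per thread.
.visible .entry vec_add(
    .param .u64 pA,
    .param .u64 pB,
    .param .u64 pC
)
{
    .reg .b32  %r<5>;      // 32-bit integer registers
    .reg .f32  %f<4>;      // 32-bit float registers
    .reg .b64  %rd<11>;    // 64-bit registers (addresses)

    // Load the kernel parameters (pointers) and convert them
    // from generic to global address space.
    ld.param.u64        %rd1, [pA];
    ld.param.u64        %rd2, [pB];
    ld.param.u64        %rd3, [pC];
    cvta.to.global.u64  %rd4, %rd1;
    cvta.to.global.u64  %rd5, %rd2;
    cvta.to.global.u64  %rd6, %rd3;

    // Global thread index = ctaid.x * ntid.x + tid.x
    // (block index * block size + thread index).
    mov.u32       %r1, %ctaid.x;
    mov.u32       %r2, %ntid.x;
    mov.u32       %r3, %tid.x;
    mad.lo.s32    %r4, %r1, %r2, %r3;

    // Byte offset = index * sizeof(float).
    mul.wide.s32  %rd7, %r4, 4;
    add.s64       %rd8, %rd4, %rd7;
    add.s64       %rd9, %rd5, %rd7;
    add.s64       %rd10, %rd6, %rd7;

    // Load both inputs, add, and store the result.
    ld.global.f32  %f1, [%rd8];
    ld.global.f32  %f2, [%rd9];
    add.f32        %f3, %f1, %f2;
    st.global.f32  [%rd10], %f3;
    ret;
}
```

Note the recurring pattern: every instruction carries an explicit type suffix (`.u64`, `.f32`, `.s32`), and registers are virtual, so the backend compiler (`ptxas`) assigns physical registers when it lowers PTX to the GPU's native SASS.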
Let us now dig into the details of PTX, the parallel thread execution virtual instruction set, and explore how DeepSeek might have approached optimization for their H800 GPUs. Because PTX sits below CUDA C++ but above the hardware's native instructions, PTX-level optimization is critical for squeezing maximum performance out of the GPU when training large language models.
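In practice, the most common way to reach PTX from application code is inline assembly inside CUDA C++. The sketch below is hypothetical (the helper name `lane_id` and the kernel `show_lanes` are illustrative, and this is not DeepSeek's actual code); it shows the mechanism by reading the PTX special register `%laneid`, a thread's position within its 32-thread warp.

```cuda
#include <cstdio>

// Hypothetical helper (name is illustrative): embed a single PTX
// instruction to read the %laneid special register. In inline asm,
// "%%" escapes a literal "%" in the PTX string.
__device__ unsigned lane_id() {
    unsigned lane;
    asm volatile("mov.u32 %0, %%laneid;" : "=r"(lane));
    return lane;
}

__global__ void show_lanes() {
    // Each thread prints its warp lane as reported by the PTX register.
    printf("thread %d -> lane %u\n", threadIdx.x, lane_id());
}

int main() {
    show_lanes<<<1, 8>>>();   // one block of 8 threads
    cudaDeviceSynchronize();  // wait for the kernel (and its printf) to finish
    return 0;
}
```

Compiled with `nvcc`, this kind of inline PTX is how hand-tuned instruction sequences are typically dropped into otherwise ordinary CUDA kernels when the compiler's output for a hot path is not good enough.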
In building LLMs, the first block is the input block. In this step, the input text is passed through the tokenizer to produce tokenized text. The resulting token IDs are then passed through an embedding layer, and positional encoding is added before the sequence is sent to the transformer block.
In my previous blog post, I introduced DeepSeek LLM's innovative use of Parallel Thread Execution (PTX) for GPU optimization. Today, we'll cover another foundational topic, Multi-head Attention (MHA), before diving into DeepSeek's second groundbreaking innovation, Multi-head Latent Attention (MHLA).
As you move on from production-grade machine learning projects into the field of research, you get a glimpse of what companies and teams of brilliant minds are working on.