5 posts tagged with "ML Research"

Machine Learning Research

View All Tags

PTX Basics

· 10 min read
Rakesh
C | Rust | Quantum Gravity | LLM Researcher

Parallel Thread Execution (PTX) is a virtual machine instruction set architecture that can be thought of as the assembly language for NVIDIA GPUs. You only need a little PTX syntax to get going, and this post should get you started quickly.
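To give a flavor of the syntax, here is a minimal hand-written kernel sketch (illustrative only, not taken from the post; a complete module would also need `.version` and `.target` directives):

```ptx
// Toy entry point: add two floats from global memory, store the sum.
.visible .entry add_pair(.param .u64 pA, .param .u64 pB)
{
    .reg .f32  %f<3>;          // declare float registers
    .reg .u64  %rd<3>;         // declare 64-bit address registers

    ld.param.u64   %rd1, [pA]; // load pointer parameters
    ld.param.u64   %rd2, [pB];
    ld.global.f32  %f1, [%rd1];
    ld.global.f32  %f2, [%rd2];
    add.f32        %f1, %f1, %f2;
    st.global.f32  [%rd1], %f1;
    ret;
}
```

Even this tiny example shows the core conventions: typed virtual registers, explicit state spaces (`.param`, `.global`), and one operation per instruction.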

PTX Optimization

· 6 min read
Rakesh
C | Rust | Quantum Gravity | LLM Researcher

Let us now delve into the details of PTX, the parallel thread execution virtual instruction set, and explore how DeepSeek might have approached optimization for their H800 GPUs. PTX optimization is critical for maximizing performance in large language models.

Input Block - Tokenization

· 3 min read
Rakesh
C | Rust | Quantum Gravity | LLM Researcher

In building LLMs, the first block is called the input block. In this step, the input text is passed through the tokenizer to produce token IDs. These token IDs are then passed through the embedding layer, and positional encoding is added before the result is sent to the transformer block.
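The pipeline above can be sketched in a few lines of NumPy (the toy vocabulary, embedding dimension, and word-level tokenizer here are my own simplifications for illustration; real LLMs use learned subword tokenizers and embeddings):

```python
import numpy as np

# Toy word-level tokenizer: text -> token IDs.
vocab = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3}

def tokenize(text):
    return [vocab.get(w, vocab["<unk>"]) for w in text.lower().split()]

d_model = 8
# Randomly initialized embedding table stands in for learned embeddings.
embedding = np.random.default_rng(0).standard_normal((len(vocab), d_model))

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding from the original Transformer paper."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

ids = tokenize("The cat sat")              # token IDs
x = embedding[ids] + positional_encoding(len(ids), d_model)
# x is what the transformer block receives: one d_model vector per token.
```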

Transformer Block - Multihead Attention

· 9 min read
Rakesh
C | Rust | Quantum Gravity | LLM Researcher

In my previous blog post, I introduced DeepSeek's use of the parallel thread execution (PTX) instruction set for GPU optimization. Today, we'll cover another foundational topic, Multihead Attention (MHA), before diving into DeepSeek's second groundbreaking innovation, Multihead Latent Attention (MHLA).
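As a preview, standard MHA can be sketched in NumPy: project the input into per-head queries, keys, and values, apply scaled dot-product attention per head, then concatenate and project back (the weight matrices here are passed in rather than learned; no masking or bias, purely for illustration):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multihead_attention(x, wq, wk, wv, wo, num_heads):
    """Toy scaled dot-product multi-head attention on a (seq, d_model) input."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split(t):  # (seq, d_model) -> (heads, seq, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, seq, seq)
    attn = softmax(scores, axis=-1)                       # attention weights
    out = (attn @ v).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ wo                                       # final projection
```

Each head attends over the sequence in its own `d_head`-dimensional subspace, which is exactly the redundancy MHLA later compresses.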

Emotional Cause Pair Analysis

· 4 min read
Rakesh
C | Rust | Quantum Gravity | LLM Researcher

As you move from working on production-grade machine learning projects into the field of research, you get a glimpse of what companies and teams of brilliant minds are working on.