Rakesh
C | Rust | Quantum Gravity | LLM Researcher

I am a researcher in the fields of Quantum Gravity and LLMs, with a strong interest in the C and Rust programming languages.


LLM Overview

· 3 min read

Large Language Models

OpenAI has been at the forefront of developing sophisticated LLMs, often perceived as black boxes. Until DeepSeek's open-source release, that is. Now we have a unique opportunity to peek behind the curtain. In this series of articles, I will delve into the inner workings of DeepSeek's LLMs, starting with the basics before moving on to more advanced topics such as their architecture and optimization techniques. Here are some of my notes from exploring their research papers.

Graphics Processing Unit (GPU)

· 3 min read

In my last blog, I gave a quick introduction to DeepSeek and some of the components it uses. In this blog, we will get down to the basics of GPUs, specifically NVIDIA GPU architecture and how CUDA programs get compiled.

PTX Basics

· 10 min read

Parallel Thread Execution (PTX) is a virtual machine instruction set architecture and can be thought of as the assembly language for NVIDIA GPUs. You only need a little PTX syntax to follow along, and this post should get you started quickly.

PTX Optimization

· 6 min read

Let us now delve into the details of PTX, the parallel thread execution virtual instruction set, and explore how DeepSeek might have approached optimization for their H800 GPUs. PTX optimization is critical for maximizing performance in large language models.

Input Block - Tokenization

· 3 min read

In building LLMs, the first block is called the input block. In this step, the input text is passed through a tokenizer to produce token IDs. These token IDs are then passed through an embedding layer, and positional encoding is added before the result is sent to the transformer block.
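The pipeline described above can be sketched in a few lines. This is a minimal illustration only, not DeepSeek's actual tokenizer or embedding code: the toy vocabulary, random embedding table, and sinusoidal positional encoding here are illustrative assumptions.

```python
# Minimal sketch of the LLM input block: tokenize -> embed -> add positions.
# Real models use a trained BPE tokenizer and learned embedding tables.
import numpy as np

vocab = {"the": 0, "cat": 1, "sat": 2}   # toy vocabulary (assumption)
d_model = 4                              # embedding dimension

rng = np.random.default_rng(0)
embedding = rng.normal(size=(len(vocab), d_model))   # embedding layer

def input_block(text):
    token_ids = [vocab[w] for w in text.split()]     # tokenizer -> token IDs
    x = embedding[token_ids]                         # (seq_len, d_model)
    # Sinusoidal positional encoding, as in the original Transformer paper.
    pos = np.arange(len(token_ids))[:, None]
    i = np.arange(d_model)[None, :]
    angle = pos / 10000 ** (2 * (i // 2) / d_model)
    pe = np.where(i % 2 == 0, np.sin(angle), np.cos(angle))
    return x + pe                                    # input to the transformer block

out = input_block("the cat sat")
print(out.shape)   # (3, 4): 3 tokens, each a 4-dim vector
```

Each token ends up as one row: its embedding plus a position-dependent offset, which is exactly what the transformer block consumes.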

Transformer Block - Multihead Attention

· 9 min read

In my previous blog post, I introduced DeepSeek's innovative use of parallel thread execution (PTX) for GPU optimization. Today, we'll cover another foundational topic, Multihead Attention (MHA), before diving into DeepSeek's second groundbreaking innovation, known as Multi-head Latent Attention (MLA).
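Before the multihead case, the core operation is single-head scaled dot-product attention. The sketch below shows just that building block (a full MHA layer would additionally split the model dimension across several heads and project the outputs back together); the sizes and random inputs are illustrative assumptions.

```python
# Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
import numpy as np

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of each query to each key
    # Numerically stable softmax over the key axis.
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V                         # weighted sum of value vectors

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))              # 3 tokens, d_model = 4
out = attention(x, x, x)                 # self-attention: Q = K = V = x
print(out.shape)                         # (3, 4)
```

Multihead attention runs several of these in parallel on lower-dimensional projections of `x`, letting different heads attend to different relationships in the sequence.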

Dotfiles

· 7 min read

In the first part of my Productivity series, we talked about configuring Neovim as your IDE. You can check out that blog and my configuration here.

Neovim IDE

· 6 min read

An Integrated Development Environment (IDE) provides a comprehensive set of features, such as a code editor, a compiler/interpreter, code completion, a debugger, and much more.

Quantum Machine Learning

· 6 min read

My love for Quantum Physics was rekindled in 2017 while studying the mathematical foundations of Machine Learning and Deep Learning.

Cookie ML

· 4 min read

I go by one simple principle