Tutorials

Hands-On Guides.

Step-by-step technical tutorials with code examples, from neural network fundamentals to production-grade LLM fine-tuning.

Intermediate · 45 min

Constrained Decoding: How to Get Guaranteed JSON from an LLM (and the Reasoning Tax)

How constrained decoding guarantees valid JSON from an LLM: runnable vLLM and structured-output examples, the latency cost, and the reasoning tax that JSON-mode hides.

Prerequisites
Python 3.10+; vLLM 0.19+ and a GPU that can serve the model (or a smaller Qwen3.6 dense variant on a 24GB card); the datasets and pydantic libraries
Beginner · 45 min

Running LLMs Locally in 2026: A Step-by-Step Setup Guide for Ollama, llama.cpp, and vLLM

A hands-on guide to running LLMs locally in 2026: install Ollama, verify the API, then build llama.cpp and serve with vLLM, with the VRAM and bandwidth math behind each step.

Prerequisites
A GPU with 16GB+ VRAM or an Apple Silicon Mac; a terminal; command-line basics. Python 3.10+ is needed only for the vLLM section.
Advanced · 4-6 hours

Optimizing CUDA Kernels for Generative Adversarial Networks

Learn to optimize CUDA kernels for GAN training: memory coalescing, occupancy tuning, mixed-precision training, custom fused kernels, the Triton compiler, and profiling with Nsight. Practical code included.

Prerequisites
CUDA basics; PyTorch; Understanding of GANs
Intermediate · 2-3 hours

Fine-Tuning Transformer Models with Low-Rank Adaptation (LoRA)

Learn LoRA fine-tuning step by step: the math behind low-rank adaptation, QLoRA quantization, Unsloth training, hyperparameter selection, and practical code for consumer GPUs.

Prerequisites
Python proficiency; PyTorch basics; Understanding of transformer architecture