A running collection of tech talks I’ve watched and what I took away from them.
Andrej Karpathy — Deep Dive into LLMs
Key takeaways:
- What the pre-training stage looks like — training on massive text corpora to predict the next token
- Pre-training produces essentially a token simulator: the model learns to mimic text distributions, not to reason or follow instructions
- What the post-training stage looks like — fine-tuning with RLHF, instruction tuning, and alignment techniques
- How post-training handles many hard problems: reducing hallucinations, giving the model a consistent character and persona
- The difference between a base model (raw token predictor from pre-training) and a post-trained model (aligned, instruction-following assistant)
- The model’s context window is like working memory — precise and temporary, only what’s in the current prompt
- The model’s training data is like internet history — a vague, compressed recollection baked into weights, not directly retrievable
- The fundamental distinction between context window (what the model actively sees right now) and training data (what shaped its weights during learning)