An interactive, intuitive guide to how we adapt large language models to specific tasks, without retraining them from scratch.
Think of it as teaching a generalist doctor to become a specialist.
The analogy: A foundation model (like GPT-4.1 or Claude) is like a brilliant new medical school graduate. They know a LOT about everything: anatomy, chemistry, diagnostics. But they've never seen a patient at your hospital. Fine-tuning doesn't add new medical knowledge; it teaches them your hospital's procedures, your paperwork format, your tone with patients.
The model already knows things from pre-training on internet-scale data. Fine-tuning changes how it responds: its format, tone, reasoning style, and consistency. You're not filling its brain with new facts; you're shaping its habits.
Consistent JSON output format. A specific tone (formal, playful). Handling domain-specific jargon. Classification tasks. Following complex multi-step procedures.
Teaching the model entirely new facts. Real-time data (use RAG instead). Tasks the base model fundamentally can't do. Replacing good prompt engineering.
Full fine-tuning is expensive. This is why techniques like LoRA exist: to make fine-tuning practical.
What actually changes when we fine-tune? Let's look at the math, visually.
Every layer in a transformer has weight matrices: grids of numbers that transform input signals. Toggle between approaches to see how the model gets updated.
In full fine-tuning, every single weight in the matrix gets updated. Expensive! 💸
Analogy: Imagine the original weight matrix is a giant oil painting in a museum. Full fine-tuning = painting over the whole canvas. LoRA = placing a small transparent overlay on top that subtly shifts certain colors. The original painting is untouched; you can remove or swap overlays anytime.
Instead of computing a full 4096×4096 change matrix (16.7M params), LoRA decomposes it into two smaller matrices, say 4096×8 and 8×4096, for only 65K params. The "rank" (r) controls how expressive this update is. Higher r = more capacity but more parameters.
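The decomposition can be sketched in a few lines of NumPy (a minimal illustration: the 4096 hidden size and rank 8 are the example numbers above, and real implementations such as PEFT also scale the update by a factor alpha/r, omitted here):

```python
import numpy as np

d, r = 4096, 8
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))          # frozen base weight: 4096×4096 = 16.7M params
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection, r×d
B = np.zeros((d, r))                 # trainable up-projection, d×r (zero-init, so the update starts at 0)

x = rng.normal(size=(d,))
y = W @ x + B @ (A @ x)              # base output plus the low-rank update

print("trainable params:", A.size + B.size)  # 65,536 vs. 16,777,216 for full fine-tuning
```

Because B starts at zero, the adapted model is initially identical to the base model; training only ever moves A and B, never W.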
One base model, many lightweight adapters. Swap them like lenses on a camera.
Adjust the rank to see how it affects the number of trainable parameters. Lower rank = smaller adapter, but less expressive.
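The rank-versus-size trade-off is easy to tabulate (a quick calculation for a hypothetical 4096×4096 layer, using the standard LoRA parameter count d·r + r·d):

```python
d = 4096  # hypothetical hidden dimension
for r in (1, 4, 8, 16, 64):
    params = d * r + r * d
    pct = 100 * params / (d * d)
    print(f"r={r:>2}: {params:>9,} trainable params ({pct:.2f}% of full fine-tuning)")
```

Even at r=64, the adapter trains only about 3% of the parameters a full update would touch.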
Since the base model stays frozen, you can train many different LoRA adapters and swap them at inference time. Each adapter is tiny (a few MB vs. tens of GB).
Camera lens analogy: The base model is your camera body. LoRA adapters are interchangeable lenses: one for macro photography, one for wide-angle, one for portraits. You don't need a separate camera for each style. You just swap the lens (adapter), which is much smaller and cheaper than the body (model).
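The lens-swapping idea in a toy sketch (pure NumPy; the adapter names are invented, and in practice a library like Hugging Face PEFT manages this for you):

```python
import numpy as np

d, r = 64, 4
rng = np.random.default_rng(1)
W = rng.normal(size=(d, d))  # frozen base weight: the "camera body"

# Each adapter is just its two small matrices: the "lenses"
adapters = {
    "medical": (rng.normal(size=(d, r)), rng.normal(size=(r, d))),
    "legal":   (rng.normal(size=(d, r)), rng.normal(size=(r, d))),
}

def forward(x, adapter_name=None):
    y = W @ x
    if adapter_name is not None:
        B, A = adapters[adapter_name]
        y = y + B @ (A @ x)      # add the low-rank update; W itself is never modified
    return y

x = rng.normal(size=(d,))
base_out = forward(x)            # plain base model
med_out = forward(x, "medical")  # same body, different lens
```

Swapping adapters is just picking a different dictionary entry; the expensive base weights are shared by all of them.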
Four different approaches for four different needs. Click each to explore.
"Here's the correct answer for this input."
"This answer is better than that one."
"Score this reasoning chain."
"Look at this image and respond."
| Method | What you provide | Best for | Models |
|---|---|---|---|
| SFT | Input → correct output pairs | Format, classification, translation | gpt-4.1, gpt-4.1-mini/nano |
| DPO | Input → chosen vs. rejected output | Tone, style, summarization focus | gpt-4.1, gpt-4.1-mini/nano |
| RFT | Input → model generates → expert grades | Complex reasoning, domain expertise | o4-mini (reasoning models) |
| Vision | Image + text → correct output | Image classification, visual QA | gpt-4o |
Real-ish examples of JSONL training data for each method.
Each example is a conversation with the "correct" assistant response. The model learns to imitate these outputs.
💡 The model learns the format (JSON), the classification behavior, and the tone, all from examples like these.
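Here's what one such line might look like, built in Python (the `messages` structure follows the OpenAI chat fine-tuning format; the ticket-classification task itself is invented for illustration):

```python
import json

# One SFT training example: the assistant message is the "correct" output to imitate.
example = {
    "messages": [
        {"role": "system", "content": "Classify the support ticket and reply as JSON."},
        {"role": "user", "content": "My card was charged twice for the same order."},
        {"role": "assistant", "content": '{"category": "billing", "priority": "high"}'},
    ]
}
print(json.dumps(example))  # one line = one example in the .jsonl file
```

Note that the assistant's content is itself a JSON string: the model learns to emit that exact format.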
Each example shows two possible responses: one preferred ("chosen") and one rejected. The model learns to prefer the better style.
💡 Same content, different quality. The model learns to prefer concise, data-rich, professional communication over vague summaries.
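A hypothetical preference pair might look like this (key names follow the common chosen/rejected convention used by open-source trainers; the exact field names vary by provider, and the summaries are invented):

```python
import json

# One DPO example: same prompt, two candidate replies; "chosen" is the preferred one.
example = {
    "prompt": "Summarize Q3 results for the exec team.",
    "chosen": "Revenue up 12% QoQ to $4.2M; churn down to 2.1%; top risk: EU compliance deadline.",
    "rejected": "Things went pretty well this quarter overall, with some good numbers.",
}
print(json.dumps(example))
```

Both replies are plausible; the training signal comes entirely from which one a reviewer preferred.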
The model generates its own response, then an expert grader scores it. High-scoring reasoning chains get reinforced.
💡 Unlike SFT, we don't provide the answer; the model reasons through it, and experts score the quality. High-scoring chains strengthen the model's reasoning paths.
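An RFT training line therefore contains no assistant answer, only the prompt plus whatever reference the grader needs (this schema is illustrative; the exact fields depend on the grader you configure, and the medical vignette is invented):

```python
import json

# One RFT example: the model generates its own reasoning chain,
# and a grader scores it against the reference answer.
example = {
    "messages": [
        {"role": "user", "content": "A patient presents with fatigue, bradycardia, "
                                    "and cold intolerance. What is the most likely diagnosis?"}
    ],
    "reference_answer": "hypothyroidism",
}
print(json.dumps(example))
```

The reference never appears in the prompt; it exists only so the grader can score the model's attempt.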
All training data is uploaded as .jsonl files, one JSON object per line:
Each line is one training example. Typically you need hundreds to thousands of examples for good results.
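Producing such a file is just one `json.dumps` per example (a minimal sketch; the filename and the two toy examples are illustrative):

```python
import json

examples = [
    {"messages": [{"role": "user", "content": "Hi"},
                  {"role": "assistant", "content": "Hello! How can I help?"}]},
    {"messages": [{"role": "user", "content": "Reset my password"},
                  {"role": "assistant", "content": "Go to Settings > Security > Reset."}]},
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")  # one JSON object per line, no enclosing array
```

Note it's not a JSON array: each line must parse as a complete, standalone object.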