An interactive, intuitive guide to how we adapt large language models to specific tasks, without retraining them from scratch.
Think of it as teaching a generalist doctor to become a specialist.
The analogy: A foundation model (like GPT-4.1 or Claude) is like a brilliant new medical school graduate. They know a LOT about everything: anatomy, chemistry, diagnostics. But they've never seen a patient at your hospital. Fine-tuning doesn't add new medical knowledge; it teaches them your hospital's procedures, your paperwork format, your tone with patients.
The model already knows things from pre-training on internet-scale data. Fine-tuning changes how it responds: its format, tone, reasoning style, and consistency. You're not filling its brain with new facts; you're shaping its habits.
Consistent JSON output format. A specific tone (formal, playful). Handling domain-specific jargon. Classification tasks. Following complex multi-step procedures.
Teaching the model entirely new facts. Real-time data (use RAG instead). Tasks the base model fundamentally can't do. Replacing good prompt engineering.
Full fine-tuning is expensive. This is why techniques like LoRA exist: to make fine-tuning practical.
What actually changes when we fine-tune? Let's look at the math, visually.
Every layer in a transformer has weight matrices: grids of numbers that transform input signals. Toggle between approaches to see how the model gets updated.
In full fine-tuning, every single weight in the matrix gets updated. Expensive! 💸
Analogy: Imagine the original weight matrix is a giant oil painting in a museum. Full fine-tuning = painting over the whole canvas. LoRA = placing a small transparent overlay on top that subtly shifts certain colors. The original painting is untouched; you can remove or swap overlays anytime.
Instead of computing a full 4096×4096 change matrix (16.7M params), LoRA decomposes it into two smaller matrices, say 4096×8 and 8×4096, for only 65K params. The "rank" (r) controls how expressive this update is. Higher r = more capacity but more parameters.
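The decomposition can be sketched in a few lines of NumPy (a minimal illustration: the 4096 hidden size and rank 8 are the example numbers above, and real implementations such as PEFT also scale the update by a factor alpha/r, omitted here):

```python
import numpy as np

d, r = 4096, 8
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))          # frozen base weight: 4096×4096 = 16.7M params
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection, r×d
B = np.zeros((d, r))                 # trainable up-projection, d×r (zero-init, so the update starts at 0)

x = rng.normal(size=(d,))
y = W @ x + B @ (A @ x)              # base output plus the low-rank update

print("trainable params:", A.size + B.size)  # 65,536 vs. 16,777,216 for full fine-tuning
```

Because B starts at zero, the adapted model is initially identical to the base model; training only ever moves A and B, never W.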
One base model, many lightweight adapters. Swap them like lenses on a camera.
Adjust the rank to see how it affects the number of trainable parameters. Lower rank = smaller adapter, but less expressive.
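The rank-versus-size trade-off is easy to tabulate (a quick calculation for a hypothetical 4096×4096 layer, using the standard LoRA parameter count d·r + r·d):

```python
d = 4096  # hypothetical hidden dimension
for r in (1, 4, 8, 16, 64):
    params = d * r + r * d
    pct = 100 * params / (d * d)
    print(f"r={r:>2}: {params:>9,} trainable params ({pct:.2f}% of full fine-tuning)")
```

Even at r=64, the adapter trains only about 3% of the parameters a full update would touch.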
Since the base model stays frozen, you can train many different LoRA adapters and swap them at inference time. Each adapter is tiny (a few MB vs. tens of GB).
Camera lens analogy: The base model is your camera body. LoRA adapters are interchangeable lenses: one for macro photography, one for wide-angle, one for portraits. You don't need a separate camera for each style. You just swap the lens (adapter), which is much smaller and cheaper than the body (model).
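The lens-swapping idea in a toy sketch (pure NumPy; the adapter names are invented, and in practice a library like Hugging Face PEFT manages this for you):

```python
import numpy as np

d, r = 64, 4
rng = np.random.default_rng(1)
W = rng.normal(size=(d, d))  # frozen base weight: the "camera body"

# Each adapter is just its two small matrices: the "lenses"
adapters = {
    "medical": (rng.normal(size=(d, r)), rng.normal(size=(r, d))),
    "legal":   (rng.normal(size=(d, r)), rng.normal(size=(r, d))),
}

def forward(x, adapter_name=None):
    y = W @ x
    if adapter_name is not None:
        B, A = adapters[adapter_name]
        y = y + B @ (A @ x)      # add the low-rank update; W itself is never modified
    return y

x = rng.normal(size=(d,))
base_out = forward(x)            # plain base model
med_out = forward(x, "medical")  # same body, different lens
```

Swapping adapters is just picking a different dictionary entry; the expensive base weights are shared by all of them.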
Four different approaches for four different needs. Click each to explore.
"Here's the correct answer for this input."
"This answer is better than that one."
"Score this reasoning chain."
"Look at this image and respond."
| Method | What you provide | Best for | Models |
|---|---|---|---|
| SFT | Input → correct output pairs | Format, classification, translation | gpt-4.1, gpt-4.1-mini/nano |
| DPO | Input → chosen vs. rejected output | Tone, style, summarization focus | gpt-4.1, gpt-4.1-mini/nano |
| RFT | Input → model generates → expert grades | Complex reasoning, domain expertise | o4-mini (reasoning models) |
| Vision | Image + text → correct output | Image classification, visual QA | gpt-4o |
Real-ish examples of JSONL training data for each method.
Each example is a conversation with the "correct" assistant response. The model learns to imitate these outputs.
💡 The model learns the format (JSON), the classification behavior, and the tone, all from examples like these.
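Here's what one such line might look like, built in Python (the `messages` structure follows the OpenAI chat fine-tuning format; the ticket-classification task itself is invented for illustration):

```python
import json

# One SFT training example: the assistant message is the "correct" output to imitate.
example = {
    "messages": [
        {"role": "system", "content": "Classify the support ticket and reply as JSON."},
        {"role": "user", "content": "My card was charged twice for the same order."},
        {"role": "assistant", "content": '{"category": "billing", "priority": "high"}'},
    ]
}
print(json.dumps(example))  # one line = one example in the .jsonl file
```

Note that the assistant's content is itself a JSON string: the model learns to emit that exact format.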
Each example shows two possible responses: one preferred ("chosen") and one rejected. The model learns to prefer the better style.
💡 Same content, different quality. The model learns to prefer concise, data-rich, professional communication over vague summaries.
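A hypothetical preference pair might look like this (key names follow the common chosen/rejected convention used by open-source trainers; the exact field names vary by provider, and the summaries are invented):

```python
import json

# One DPO example: same prompt, two candidate replies; "chosen" is the preferred one.
example = {
    "prompt": "Summarize Q3 results for the exec team.",
    "chosen": "Revenue up 12% QoQ to $4.2M; churn down to 2.1%; top risk: EU compliance deadline.",
    "rejected": "Things went pretty well this quarter overall, with some good numbers.",
}
print(json.dumps(example))
```

Both replies are plausible; the training signal comes entirely from which one a reviewer preferred.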
The model generates its own response, then an expert grader scores it. High-scoring reasoning chains get reinforced.
💡 Unlike SFT, we don't provide the answer; the model reasons through it, and experts score the quality. High-scoring chains strengthen the model's reasoning paths.
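An RFT training line therefore contains no assistant answer, only the prompt plus whatever reference the grader needs (this schema is illustrative; the exact fields depend on the grader you configure, and the medical vignette is invented):

```python
import json

# One RFT example: the model generates its own reasoning chain,
# and a grader scores it against the reference answer.
example = {
    "messages": [
        {"role": "user", "content": "A patient presents with fatigue, bradycardia, "
                                    "and cold intolerance. What is the most likely diagnosis?"}
    ],
    "reference_answer": "hypothyroidism",
}
print(json.dumps(example))
```

The reference never appears in the prompt; it exists only so the grader can score the model's attempt.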
All training data is uploaded as .jsonl files, one JSON object per line:
Each line is one training example. Typically you need hundreds to thousands of examples for good results.
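Producing such a file is just one `json.dumps` per example (a minimal sketch; the filename and the two toy examples are illustrative):

```python
import json

examples = [
    {"messages": [{"role": "user", "content": "Hi"},
                  {"role": "assistant", "content": "Hello! How can I help?"}]},
    {"messages": [{"role": "user", "content": "Reset my password"},
                  {"role": "assistant", "content": "Go to Settings > Security > Reset."}]},
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")  # one JSON object per line, no enclosing array
```

Note it's not a JSON array: each line must parse as a complete, standalone object.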