Fine-Tuning & LoRA

An interactive, intuitive guide to how we adapt large language models to specific tasks — without retraining them from scratch.

What is Fine-Tuning?

Think of it as teaching a generalist doctor to become a specialist.

The analogy: A foundation model (like GPT-4.1 or Claude) is like a brilliant new medical school graduate. They know a LOT about everything — anatomy, chemistry, diagnostics. But they've never seen a patient at your hospital. Fine-tuning doesn't add new medical knowledge — it teaches them your hospital's procedures, your paperwork format, your tone with patients.

Key Insight

Fine-tune behavior, not knowledge

The model already knows things from pre-training on internet-scale data. Fine-tuning changes how it responds — its format, tone, reasoning style, and consistency. You're not filling its brain with new facts; you're shaping its habits.

🧠
Pre-trained Model
Knows everything,
but is generic
→
🔧
Fine-Tuning
Show examples of
desired behavior
→
🎯
Specialized Model
Behaves exactly
how you want
✅ Good for

Consistent JSON output format. A specific tone (formal, playful). Handling domain-specific jargon. Classification tasks. Following complex multi-step procedures.

โŒ Not ideal for

Teaching the model entirely new facts. Real-time data (use RAG instead). Tasks the base model fundamentally can't do. Replacing good prompt engineering.

The problem of scale

Why can't we just retrain the whole thing?

175B
GPT-3 parameters
~700 GB
Memory for full fine-tune
~18M
With LoRA (0.01%)
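These figures are easy to sanity-check with back-of-envelope arithmetic. A minimal sketch, assuming 4 bytes per parameter (fp32 weights only; actual training needs several times more memory for gradients and optimizer state):

```python
# Back-of-envelope memory math. Assumption: fp32 weights at 4 bytes each;
# gradients and optimizer state (not counted here) multiply this further.
def weight_memory_gb(n_params: int, bytes_per_param: int = 4) -> float:
    return n_params * bytes_per_param / 1e9

full = weight_memory_gb(175_000_000_000)  # GPT-3-scale model -> 700.0 GB
lora = weight_memory_gb(18_000_000)       # ~0.01% of params as a LoRA adapter

print(f"full weights: {full:.0f} GB")     # 700 GB
print(f"LoRA adapter: {lora:.3f} GB")     # 0.072 GB, i.e. tens of MB
```

The ~10,000× gap between the two numbers is the whole motivation for parameter-efficient fine-tuning.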

This is why techniques like LoRA exist — to make fine-tuning practical.

Inside a Neural Layer

What actually changes when we fine-tune? Let's look at the math — visually.

Interactive · Full Fine-Tuning vs LoRA

Every layer in a transformer has weight matrices — grids of numbers that transform input signals. Toggle between approaches to see how the model gets updated.

Full Fine-Tune
LoRA Adapters
W (original)
🔥 ALL weights modified
→
W' (updated)
🔥 Every cell changed

In full fine-tuning, every single weight in the matrix gets updated. Expensive! 💸

Analogy: Imagine the original weight matrix is a giant oil painting in a museum. Full fine-tuning = painting over the whole canvas. LoRA = placing a small transparent overlay on top that subtly shifts certain colors. The original painting is untouched; you can remove or swap overlays anytime.

Math intuition

W' = W + ΔW,  where ΔW ≈ B × A

Instead of computing a full 4096×4096 change matrix (16.7M params), LoRA decomposes it into two smaller matrices — say 4096×8 and 8×4096 — for only 65K params. The "rank" (r) controls how expressive this update is. Higher r = more capacity but more parameters.
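The decomposition above can be sketched directly in NumPy. The matrix sizes follow the text; note that real LoRA also scales the update by a factor α/r, omitted here for simplicity:

```python
import numpy as np

d, r = 4096, 8
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))    # frozen pre-trained weight
A = rng.normal(size=(r, d))    # LoRA "A": randomly initialized
B = np.zeros((d, r))           # LoRA "B": zero-initialized, so ΔW starts at 0

full_delta = d * d             # 16,777,216 params for a dense ΔW
lora_delta = d * r + r * d     # 65,536 params for B and A combined

x = rng.normal(size=(d,))
# Adapted forward pass: W'x = Wx + B(Ax) — the d×d ΔW is never materialized
h = W @ x + B @ (A @ x)        # equals W @ x before any training, since B = 0
```

Zero-initializing B means the adapted model starts out behaving exactly like the base model; training then nudges the low-rank update away from zero.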

LoRA: The Swap-In Trick

One base model, many lightweight adapters. Swap them like lenses on a camera.

Interactive · Rank Explorer

Adjust the rank to see how it affects the number of trainable parameters. Lower rank = smaller adapter, but less expressive.

[Interactive widget: a rank slider (default r = 8) with live readouts for the original W parameter count, the LoRA (B + A) parameter count, and the resulting percentage reduction.]
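The widget's arithmetic is just two multiplications and a ratio. A small sketch of the same computation (the 4096 dimension is illustrative):

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters in one LoRA adapter: B is d_out x r, A is r x d_in."""
    return d_out * r + r * d_in

def reduction(d: int, r: int) -> float:
    """Fraction of a full d x d update matrix saved by the low-rank form."""
    return 1 - lora_params(d, d, r) / (d * d)

# r = 8 on a 4096x4096 layer: 65,536 trainable params, ~99.6% fewer than full ΔW
for r in (2, 8, 64, 256):
    print(f"r={r:>3}: {lora_params(4096, 4096, r):>10,} params, "
          f"{reduction(4096, r):.2%} smaller")
```

Even at a generous r = 256 the adapter is still an order of magnitude smaller than the dense update.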
Concept · Adapter Swapping

One model, many skills

Since the base model stays frozen, you can train many different LoRA adapters and swap them at inference time. Each adapter is tiny (a few MB vs. tens of GB).

🧠
Base Model
~7B params, frozen
+
🏥 Medical QA 14MB adapter
⚖️ Legal Brief 11MB adapter
💻 Code Review 9MB adapter
=
Specialized model for whichever adapter is loaded

Camera lens analogy: The base model is your camera body. LoRA adapters are interchangeable lenses — one for macro photography, one for wide-angle, one for portraits. You don't need a separate camera for each style. You just swap the lens (adapter), which is much smaller and cheaper than the body (model).
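The swap itself can be sketched in a few lines. A toy illustration (the adapter names are hypothetical, and a single small layer stands in for the whole model): the frozen W is shared, and only the tiny (B, A) pair changes per task.

```python
import numpy as np

d, r = 64, 4
rng = np.random.default_rng(1)

W = rng.normal(size=(d, d))  # frozen base weight, shared by every skill

# Hypothetical adapter library: one small (B, A) pair per skill.
adapters = {
    "medical": (rng.normal(size=(d, r)), rng.normal(size=(r, d))),
    "legal":   (rng.normal(size=(d, r)), rng.normal(size=(r, d))),
}

def forward(x, skill=None):
    h = W @ x                        # base model, untouched
    if skill is not None:
        B, A = adapters[skill]       # swap in the "lens" for this task
        h = h + B @ (A @ x)          # low-rank correction on top
    return h

x = rng.normal(size=(d,))
base = forward(x)                    # generic behavior
med = forward(x, "medical")          # same body, different lens
```

Because W never changes, loading a new skill is just a dictionary lookup rather than reloading tens of gigabytes of weights.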

Fine-Tuning Methods

Four different approaches for four different needs. Click each to explore.

📝

Supervised (SFT)

"Here's the correct answer for this input."

⚖️

Preference (DPO)

"This answer is better than that one."

๐Ÿ†

Reinforcement (RFT)

"Score this reasoning chain."

๐Ÿ‘๏ธ

Vision Fine-Tune

"Look at this image and respond."

Comparison
Method | What you provide | Best for | Models
SFT | Input → correct output pairs | Format, classification, translation | gpt-4.1, gpt-4.1-mini/nano
DPO | Input → chosen vs. rejected output | Tone, style, summarization focus | gpt-4.1, gpt-4.1-mini/nano
RFT | Input → model generates → expert grades | Complex reasoning, domain expertise | o4-mini (reasoning models)
Vision | Image + text → correct output | Image classification, visual QA | gpt-4o

What Training Data Looks Like

Real-ish examples of JSONL training data for each method.

Supervised Fine-Tuning (SFT)

Each example is a conversation with the "correct" assistant response. The model learns to imitate these outputs.

system You are a medical triage assistant. Always respond with a JSON object containing "urgency" (1-5), "likely_category", and "next_step".
user I've had a persistent headache for 3 days. No fever. Vision is fine.
assistant {"urgency": 2, "likely_category": "neurology", "next_step": "Schedule appointment within 48hrs. Take OTC pain relief."}
system You are a medical triage assistant...
user Severe chest pain radiating to left arm. Sweating profusely.
assistant {"urgency": 5, "likely_category": "cardiology", "next_step": "Call emergency services (911) immediately."}

💡 The model learns the format (JSON), the classification behavior, and the tone — all from examples like these.

Direct Preference Optimization (DPO)

Each example shows two possible responses — one preferred ("chosen") and one rejected. The model learns to prefer the better style.

user Summarize this earnings report for our investors.
✅ chosen Q3 revenue grew 12% YoY to $847M, driven by enterprise expansion. Operating margin improved 200bps. We're raising FY guidance to $3.2-3.4B.
❌ rejected So basically the company made more money this quarter. Revenue was up which is good. They think they'll make even more money next year. Overall pretty solid performance with some improvements in how efficiently they operate.

💡 Same content, different quality. The model learns to prefer concise, data-rich, professional communication over vague summaries.

Reinforcement Fine-Tuning (RFT)

The model generates its own response, then an expert grader scores it. High-scoring reasoning chains get reinforced.

user Patient: 67M, diabetic, presenting with acute kidney injury. Creatinine 4.2 (baseline 1.1). Currently on metformin, lisinopril, ibuprofen. What medication changes do you recommend?
model generates Step 1: Identify nephrotoxic agents → ibuprofen (NSAID) is nephrotoxic. Step 2: Metformin risk → contraindicated in AKI (lactic acidosis risk). Step 3: ACE inhibitor → lisinopril may worsen AKI. Recommendation: Discontinue ibuprofen and metformin immediately. Hold lisinopril. Switch to acetaminophen for pain. Monitor creatinine daily.
⭐ expert grade: 9/10 Correct identification of all three problematic medications. Good reasoning chain. Minor: should also mention fluid resuscitation.

💡 Unlike SFT, we don't provide the answer — the model reasons through it, and experts score the quality. High-scoring chains strengthen the model's reasoning paths.
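The grading step can be caricatured as a rubric check. A deliberately toy sketch — the RUBRIC items below are invented for illustration, and real RFT graders are expert-designed and far richer than keyword matching:

```python
# Toy rubric grader (illustrative only). Each rubric item is satisfied if
# its keyword appears in the model's reasoning chain.
RUBRIC = {
    "flags ibuprofen":  "ibuprofen",
    "flags metformin":  "metformin",
    "flags lisinopril": "lisinopril",
    "mentions fluids":  "fluid",
}

def grade(response: str, rubric=RUBRIC) -> float:
    """Score a reasoning chain on a 0-10 scale by fraction of rubric items hit."""
    text = response.lower()
    hits = sum(1 for keyword in rubric.values() if keyword in text)
    return 10 * hits / len(rubric)

chain = ("Discontinue ibuprofen and metformin immediately. "
         "Hold lisinopril. Monitor creatinine daily.")
print(grade(chain))  # 7.5 — misses the fluid-resuscitation item
```

In training, scores like this become the reward signal: sampled chains that grade higher are reinforced, ones that grade lower are suppressed.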

The JSONL Format

All training data is uploaded as .jsonl files — one JSON object per line:

{"messages": [{"role": "system", "content": "You are a triage bot..."}, {"role": "user", "content": "Chest pain..."}, {"role": "assistant", "content": "{\"urgency\": 5, ...}"}]} {"messages": [{"role": "system", "content": "You are a triage bot..."}, {"role": "user", "content": "Mild headache..."}, {"role": "assistant", "content": "{\"urgency\": 2, ...}"}]}

Each line is one training example. Typically you need hundreds to thousands of examples for good results.
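Producing and validating such a file takes only the standard json module. A minimal sketch (the file name train.jsonl and the example content are placeholders):

```python
import json

examples = [
    {"messages": [
        {"role": "system", "content": "You are a triage bot..."},
        {"role": "user", "content": "Chest pain..."},
        {"role": "assistant", "content": "{\"urgency\": 5}"},
    ]},
]

# Write: one JSON object per line, no blank lines, UTF-8 encoded.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")

# Validate: every line must parse and carry a "messages" list.
with open("train.jsonl", encoding="utf-8") as f:
    for i, line in enumerate(f, 1):
        obj = json.loads(line)
        assert isinstance(obj["messages"], list), f"line {i} malformed"
```

Running a validation pass like this before uploading catches the most common failure mode: a stray pretty-printed object spanning multiple lines.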