
Two directions, one timeline

Looking backward: today's $5k box can run GPT-4o-quality models, roughly 1.5–2 years behind the current frontier. Looking forward: closing the remaining gap to Claude Sonnet 4-class quality takes considerably longer than that lag suggests, and the reasons can now be stated precisely.

The full picture, 2022–2033

Each marker is either a frontier release date (above the line) or the estimated year that quality becomes locally runnable on a sub-$5k box (below). Hover the dots for detail.

Adjust the scenario

The forward estimate depends on two uncertain variables: how fast MoE and distillation close the quality-per-VRAM gap, and how fast consumer hardware gets high-bandwidth memory. Drag the slider to explore.

[Interactive scenario slider: Conservative / Balanced / Optimistic. The default "Balanced" setting lands on 2030: steady MoE/distillation progress at roughly a 2-year quality doubling, consumer hardware reaching the RTX 60 series in 2028 with 64 GB of VRAM and otherwise incremental VRAM growth, and a frontier that keeps advancing steadily.]
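To make the scenario mechanics concrete, here is a minimal sketch of how the slider variables could combine into a single parity-year estimate. It is a toy model: the one-doubling quality gap, the one-year software-lag term, and the scenario parameters below are illustrative choices, not the exact formula behind the widget.

```python
# Toy model only: combines the slider variables into a rough parity-year
# estimate. Every number here is an illustrative assumption, not the exact
# formula behind the interactive scenario widget.

def parity_year(quality_doubling_years: float,
                hardware_ready_year: int,
                frontier_advances: bool) -> int:
    """Local parity needs BOTH enough model efficiency AND a consumer box
    that can actually hold and serve the model; take the later of the two."""
    # Model-efficiency side: assume (illustratively) that local quality needs
    # about one more doubling from today (~2025) to reach the bar.
    efficiency_ready = 2025 + 1.0 * quality_doubling_years

    # Hardware side: nothing changes until the next meaningful consumer jump,
    # plus ~1 year for the software stack (quantized runtimes, KV handling).
    hardware_ready = hardware_ready_year + 1

    estimate = max(efficiency_ready, hardware_ready)
    if frontier_advances:
        estimate += 1   # a moving target adds roughly another year
    return round(estimate)

print(parity_year(1.5, 2028, False))  # optimistic-ish   -> 2029
print(parity_year(2.0, 2028, True))   # balanced-ish     -> 2030
print(parity_year(3.0, 2030, True))   # conservative-ish -> 2032
```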
Why there's no 2028 scenario. Even optimistically, the RTX 60 series doesn't ship until 2028, meaning the $5k VRAM ceiling stays at 32 GB through 2027. Sonnet-class likely needs 80+ GB to run at full quality and speed. That gap doesn't close in one GPU generation.
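As a sanity check on the 80+ GB figure, here is the footprint arithmetic. Claude Sonnet 4's size is not public, so the 150B-parameter figure, the 4-bit quantization, and the KV-cache allowance below are illustrative assumptions chosen only to show the shape of the calculation.

```python
# Back-of-envelope VRAM footprint for serving a large dense model locally.
# The model size, quantization level, and KV-cache budget are assumptions
# (Sonnet's real size and architecture are not public).

def vram_needed_gb(params_billions: float, bits_per_weight: int,
                   kv_cache_gb: float, overhead_frac: float = 0.10) -> float:
    weights_gb = params_billions * bits_per_weight / 8  # billions of params * bytes each = GB
    return (weights_gb + kv_cache_gb) * (1 + overhead_frac)

# Hypothetical ~150B dense model, 4-bit weights, ~8 GB of KV cache for long
# context, ~10% runtime overhead:
print(round(vram_needed_gb(150, 4, 8)))  # ~91 GB, comfortably past 80 GB
# Even an aggressive 3-bit quant stays far beyond a 32 GB consumer card:
print(round(vram_needed_gb(150, 3, 8)))  # ~71 GB
```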
The optimistic path requires a distillation breakthrough. If a Sonnet-class model can be reliably distilled into a 30–40B sparse MoE that runs at ~50 tok/s on 32 GB, the optimistic 2029 scenario becomes real. That requires significant progress in knowledge distillation from proprietary to open-weight models — which may or may not happen.
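The speed side of that target is easier to pin down than the quality side, because decoding is mostly memory-bandwidth-bound: every generated token streams the active weights through the GPU once. A minimal sketch, assuming roughly 1,000 GB/s of bandwidth (about an RTX 4090-class card) and 4-bit weights; the active-parameter counts are illustrative.

```python
# Bandwidth-bound decode ceiling: tok/s is roughly memory bandwidth divided
# by the bytes of *active* parameters read per token. Illustrative numbers.

def max_tokens_per_sec(active_params_billions: float, bits_per_weight: int,
                       bandwidth_gb_per_s: float) -> float:
    active_gb_per_token = active_params_billions * bits_per_weight / 8
    return bandwidth_gb_per_s / active_gb_per_token

CONSUMER_BW = 1000  # GB/s, roughly an RTX 4090-class card

# Dense ~70B model at 4-bit: ceiling around 29 tok/s, before other overheads.
print(round(max_tokens_per_sec(70, 4, CONSUMER_BW)))
# Sparse MoE with ~5B active params at 4-bit: ceiling around 400 tok/s,
# so a practical ~50 tok/s target has large headroom after real-world losses.
print(round(max_tokens_per_sec(5, 4, CONSUMER_BW)))
```

This is why the active parameter count, not the total, sets the speed; the total still has to fit in VRAM, which is the capacity constraint described above.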

What's actually adding years to the estimate

The breakdown below starts from the naive estimate ("FLOPs alone would say 2027") and accounts for each real constraint in turn.

Quality parity over time: three scenarios

Indexed to 1.0 = full Claude Sonnet 4 quality locally. The horizontal line is the target. Three paths differ in how fast hardware and model efficiency improve.
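If you want the curve construction spelled out: the paths can be generated by letting the parity index double on a scenario-specific cadence and capping it at 1.0. The 0.2 starting index below is an illustrative choice picked so the three curves cross 1.0 roughly at 2029, 2030, and 2032; it is not a measured quantity.

```python
# Illustrative construction of the parity-index curves: start below the 1.0
# target and double on a scenario-specific cadence, capped at 1.0. The 0.2
# starting index is chosen so the curves cross 1.0 near 2029 / 2030 / 2032.

def parity_index(year: float, doubling_years: float,
                 start_year: float = 2025, start_index: float = 0.2) -> float:
    return min(1.0, start_index * 2 ** ((year - start_year) / doubling_years))

for label, doubling in [("optimistic", 1.5), ("balanced", 2.0), ("conservative", 3.0)]:
    curve = [round(parity_index(y, doubling), 2) for y in range(2025, 2033)]
    print(f"{label:>12}: {curve}")
```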

Revised estimates at a glance

| Estimate | Previous | Revised | Key reason for change |
| --- | --- | --- | --- |
| Current lag (reverse) | ~3 years (GPT-3.5 tier) | ~1.5–2 years (GPT-4o tier) | MoE models (Qwen3-30B-A3B) bring GPT-4o quality to a single 24 GB GPU; the previous estimate used dense 14B as the ceiling. |
| Forward: task-level parity | 2028–2029 | 2028–2030 | No consumer GPU in 2026 delays the hardware step; the RTX 60 series (~2028) is the next meaningful jump. |
| Forward: full-product parity | 2031 (single-year estimate) | 2030 (base), 2029–2032 range | MoE quality acceleration pulls the date in; bandwidth and VRAM stagnation push it back. Net: pull in by ~1 year, widen the range slightly. |
| Main bottleneck | "Memory" | Memory bandwidth and VRAM capacity (not FLOPs) | FLOPs/$ keeps doubling roughly every 2.1 years; bandwidth/$ has been essentially flat at the consumer flagship tier for five years. These are different constraints (worked through in the sketch after this table). |
| Biggest accelerant | Model efficiency (vague) | Sparse MoE with a small active parameter count | Qwen3-30B-A3B (3B active, GPT-4o quality) is the clearest proof. Future models may push this further: a 100B-total MoE with 5B active parameters and frontier quality would be transformative. |
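The divergence in the "Main bottleneck" row compounds quickly, which is easy to check with one line of arithmetic: a ~2.1-year doubling multiplies FLOPs/$ by about 5x over five years, while a flat bandwidth/$ curve multiplies it by nothing.

```python
# Compounding check for the "Main bottleneck" row: FLOPs/$ doubling every
# ~2.1 years vs. consumer flagship bandwidth/$ staying roughly flat.

def growth_over(years: float, doubling_years: float) -> float:
    return 2 ** (years / doubling_years)

print(f"FLOPs/$ after 5 years:     x{growth_over(5, 2.1):.1f}")          # ~x5.2
print(f"Bandwidth/$ after 5 years: x{growth_over(5, float('inf')):.1f}") # x1.0 (flat)
```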