DeepSeek V4 Launches: A Frontier-Class Open Model at a Fraction of the Price

6 min read

Last Friday, while the West was still digesting GPT-5.5 and Claude Opus 4.7, DeepSeek V4 quietly went live on Hugging Face under an MIT license. We've spent the weekend reading the technical report, kicking the tires on the API, and watching the price-per-token chart bend in a way we haven't seen since V3 first showed up. This one is a big deal — even if you're not the kind of person who normally cares about Chinese MoE architectures.

Here's the short version: DeepSeek V4 closes most of the gap to the frontier, ships open weights, and undercuts proprietary pricing by an order of magnitude. That's a combination the rest of the field has not yet figured out how to answer.

What DeepSeek V4 actually is

The V4 series ships in two SKUs. DeepSeek-V4-Pro is a 1.6T-parameter Mixture-of-Experts model with 49B active parameters per token. DeepSeek-V4-Flash is the smaller sibling — 284B total, 13B active. Both support a 1M-token context window, both are open-weight under MIT, and both are downloadable today from Hugging Face and ModelScope.

The architecture is where things get interesting. DeepSeek built V4 around a hybrid attention scheme that combines two new mechanisms:

  • Compressed Sparse Attention (CSA) — a sparse-attention pattern that drops most of the long-range tokens from each step's compute graph.
  • Heavily Compressed Attention (HCA) — a complementary path that crunches the dropped tokens into a low-rank summary.

On top of that, they added Manifold-Constrained Hyper-Connections (mHC) — basically beefier residual connections that keep gradient signal stable across the full depth of the model. Combine all of that with their existing MoE routing, and the headline number is wild: at 1M-token contexts, V4-Pro burns only 27% of the per-token FLOPs and 10% of the KV cache of V3.2.
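DeepSeek hasn't published reference code for the attention stack, but the shape of the idea is easy to sketch. Below is a toy PyTorch version of one hybrid step: exact attention over a kept subset of tokens, plus a pooled low-rank stand-in for everything that was dropped. All names, shapes, and the mean-pooling shortcut are our assumptions, not anything from the report:

```python
import torch
import torch.nn.functional as F

def hybrid_attention(q, k, v, keep_idx, rank=16):
    """Toy sketch of a CSA + HCA style hybrid attention step.

    q:        (d,)    query for the current token
    k, v:     (T, d)  full key/value history
    keep_idx: (S,)    indices the sparse pattern keeps (S << T)
    rank:     number of low-rank summary slots for dropped tokens
    """
    d = q.shape[-1]
    T = k.shape[0]

    # CSA path: exact attention, but only over the kept tokens.
    k_keep, v_keep = k[keep_idx], v[keep_idx]

    # HCA path: crunch the dropped tokens into `rank` summary slots.
    mask = torch.ones(T, dtype=torch.bool)
    mask[keep_idx] = False
    k_drop, v_drop = k[mask], v[mask]
    # Mean pooling stands in for whatever learned low-rank
    # compression the real model uses.
    k_sum = torch.stack([c.mean(0) for c in k_drop.chunk(rank)])
    v_sum = torch.stack([c.mean(0) for c in v_drop.chunk(rank)])

    # Attend jointly over kept tokens plus compressed summary slots.
    k_all = torch.cat([k_keep, k_sum])
    v_all = torch.cat([v_keep, v_sum])
    attn = F.softmax(q @ k_all.T / d**0.5, dim=-1)
    return attn @ v_all
```

The saving falls out of the shapes: the query attends over len(keep_idx) + rank entries instead of the full million-token history, and only those entries need live KV cache. That's the kind of reduction the 27% FLOPs / 10% KV-cache headline numbers are pointing at.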

In other words, they didn't just match the frontier on quality — they did it while making long-context inference dramatically cheaper to serve. That's the part proprietary labs are going to have to react to.

The benchmarks

Per the technical report and early independent evals, DeepSeek-V4-Pro sits above GPT-5.2 and Gemini 3.0 Pro on standard reasoning benchmarks, and just below GPT-5.4 and Gemini 3.1 Pro. On coding competition benchmarks, both V4 models are described as "comparable to GPT-5.4."

A few numbers worth pinning to the wall:

  1. Reasoning: within roughly 3–6 months of the frontier, and gaining.
  2. Coding: GPT-5.4-class on competition benchmarks; below Claude Opus 4.7 on agentic SWE-bench Pro, but the gap is small enough to matter for most teams.
  3. Long context: the cost-efficiency win at 1M tokens is the real moat — you can keep entire repos, books, or call transcripts in the prompt without flinching at the bill.

We don't think V4 dethrones Opus 4.7 for the hardest agentic coding work. But for most of what builders actually ship — RAG, tool calling, document analysis, content workflows — the quality is now indistinguishable from the closed flagships at a price point that simply isn't comparable.

The pricing changes the math

Here's the part that should make every product team pull up their cost spreadsheet:

  • DeepSeek-V4-Flash: $0.14 per million input tokens / $0.28 per million output
  • DeepSeek-V4-Pro: $1.74 per million input / $3.48 per million output

Compare that to Claude Opus 4.7 at $5 / $25, or GPT-5.5 at its current tier pricing. V4-Pro comes in at roughly a third of Opus 4.7's input price and a seventh of its output price. Flash is in a different universe entirely: roughly 35× cheaper than Opus on input, and cheap enough to throw at problems where you'd previously have written a regex.
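If you want to sanity-check those multiples against your own traffic, the arithmetic is one function. The prices below are the published ones quoted above; the monthly token volumes are invented, purely for illustration:

```python
# Per-million-token prices (USD) as quoted in this post.
PRICES = {
    "v4-flash": (0.14, 0.28),
    "v4-pro":   (1.74, 3.48),
    "opus-4.7": (5.00, 25.00),
}

def monthly_cost(model, m_in, m_out):
    """Cost in USD for m_in / m_out million input/output tokens."""
    p_in, p_out = PRICES[model]
    return m_in * p_in + m_out * p_out

# Example: 500M input + 100M output tokens per month (made-up volume).
for model in PRICES:
    print(f"{model:9s} ${monthly_cost(model, 500, 100):>9,.2f}")
# v4-flash  $    98.00
# v4-pro    $ 1,218.00
# opus-4.7  $ 5,000.00
```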

Pro tip: if you're running a high-volume agentic workload, route the easy 80% of calls to V4-Flash, fall back to V4-Pro for harder reasoning, and reserve Opus 4.7 or GPT-5.5 only for the genuinely hard last-mile tasks. We've seen teams cut total inference spend by 60–80% with this kind of three-tier routing — and quality goes up, not down, because each model does what it's best at.
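That routing is barely a page of code. The sketch below is our own illustration: the thresholds, tier names, and the call_model / passes_checks hooks are placeholders for whatever difficulty heuristic and output validator your stack already has:

```python
# Hypothetical three-tier router. Tier names, thresholds, and the
# call_model() / result.passes_checks hooks are illustrative
# assumptions, not a real API.
TIERS = ["deepseek-v4-flash", "deepseek-v4-pro", "claude-opus-4.7"]

def route(difficulty: float) -> str:
    """Pick the cheapest tier the task plausibly fits; difficulty in [0, 1]."""
    if difficulty < 0.8:    # the easy ~80% of calls
        return TIERS[0]
    if difficulty < 0.95:   # harder reasoning
        return TIERS[1]
    return TIERS[2]         # genuinely hard last-mile work

def run(task, difficulty, call_model):
    """Try the routed tier; escalate one tier at a time on failed checks."""
    start = TIERS.index(route(difficulty))
    for model in TIERS[start:]:
        result = call_model(model, task)
        if result.passes_checks:  # your own validator / judge
            return result
    return result  # top tier's answer, checks or not
```

The escalation loop is why quality goes up rather than down: a failed check on the cheap tier costs pennies, and the hard cases still end up in front of the strongest model.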

Why open weights matter for creators

If you build with AI image and video tools — which most of the PromptVerse community does — the LLM layer often shows up as the brain of your pipeline. It's what writes your Higgsfield prompts, drafts your Veo storyboards, scores your Seedance shot lists, and decides which Soul-2 reference image to grab.

Until now, that brain has lived almost entirely behind closed APIs. DeepSeek V4 changes that. With MIT-licensed weights and a 1M-token context window, you can:

  • Self-host V4-Flash on a single H200 node and run unlimited prompt-rewrites for a fixed monthly cost (see the sketch after this list).
  • Fine-tune V4 on your own brand voice, your own approved prompt library, your own creator persona.
  • Pipe entire campaign briefs, brand books, and reference shoots into a single context call without truncation.
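On the self-hosting point, a serving engine like vLLM is the obvious starting place. Here's a minimal sketch, assuming the weights land on Hugging Face under a name like deepseek-ai/DeepSeek-V4-Flash and that your vLLM build supports the V4 architecture on day one (both assumptions on our part):

```python
# Minimal self-hosted prompt-rewriter loop with vLLM. The checkpoint id
# is a guess at the eventual Hugging Face repo name, and support for the
# V4 architecture in your vLLM version is assumed, not promised.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V4-Flash",  # assumed repo id
    tensor_parallel_size=8,                 # spread 284B weights across the node
)
params = SamplingParams(temperature=0.7, max_tokens=512)

def rewrite_prompt(raw: str) -> str:
    """Turn a rough creator prompt into a polished one, at fixed cost."""
    out = llm.generate(
        [f"Rewrite this image-generation prompt to be vivid and specific:\n{raw}"],
        params,
    )
    return out[0].outputs[0].text

print(rewrite_prompt("a cat on a skateboard, cinematic"))
```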

For studios shipping AI commercials at scale, that's not a marginal improvement — that's a structural change to the cost of doing business.

What we're watching next

A few open questions we're actively tracking, and you should be too:

  • Distilled and fine-tuned variants. The V3 generation produced a wave of community fine-tunes (R1, R1-Distill, Janus, etc.) that ended up being the actual workhorses. Expect the same pattern with V4 over the next 4–8 weeks.
  • Image and video routing. A V4-class brain that can intelligently route between nano_banana_2, seedance_2_0, kling3_0, and veo3_1_lite — picking the right model for each beat of a story — is now genuinely affordable to build. We think this is where the next prompting meta lives.
  • The frontier labs' response. OpenAI, Anthropic, and Google now have a credible open competitor pricing them out of the long-tail use cases. Either prices come down, context windows go up, or open weights start eating real revenue. Probably all three.

Our take

DeepSeek V4 isn't a beat-the-frontier moment — Opus 4.7 and GPT-5.5 still hold the top of the leaderboard for the hardest reasoning and agentic coding work. But it's the moment the open ecosystem caught up to "good enough for almost everything," and it did so at a price point that makes the closed flagships look like luxury goods.

If you're building creator tooling on top of LLMs — prompt rewriters, shot-list generators, storyboard agents, brand-safe captioners — V4-Flash and V4-Pro should be on your shortlist by Monday morning. The economics are too good to ignore, the weights are too open to lock you in, and the long-context numbers are too clean to argue with.

We'll be running V4-Pro against our internal prompt-rewriting benchmarks this week and will share what we find. Subscribe to the newsletter if you want the head-to-head results in your inbox.

Got an angle on V4 we missed? Reply on the newsletter or drop us a note from the submit page — we read everything.