Claude / Anthropic Build & Learn Daily How-To

Opus 4.8 kills budget_tokens — here's what else moved

Opus 4.8: fast mode, mid-session system prompts, 1K cache floor. Old budget_tokens syntax returns 400.

May 31, 2026

Opus 4.8 kills budget_tokens — here's what else moved

What Opus 4.8 Adds Over 4.7

Anthropic shipped Claude Opus 4.8 on May 28, 2026 , 41 days after Opus 4.7 — a compressed cycle driven partly by competitive pressure from OpenAI Codex and Google Gemini Flash . The headline change is the removal of budget_tokens — passing it now returns a 400. Beyond that, four additions land: a fast-throughput mode, mid-session system messages, a lower prompt-cache floor, and publicly documented refusal stop_details. Code running cleanly on Opus 4.7 requires no other changes to move to 4.8 .

Quick Answer: Swap claude-opus-4-7 for claude-opus-4-8 and remove any budget_tokens usage — it returns a 400. New in this release: speed: "fast" (up to 2.5× throughput at $10/$50 per million tokens), mid-session role: "system" messages, a 1,024-token prompt-cache floor (down from ~2,000), and documented refusal stop_details.

Change	Opus 4.7	Opus 4.8
Throughput mode	Standard only	`speed: "fast"` — up to 2.5× output rate, $10/$50 per M tokens
Mid-session system messages	Not supported	`role: "system"` in messages array, no beta header
Prompt cache minimum	~2,000 tokens	1,024 tokens — no code changes needed
`stop_details` on refusals	Present but undocumented	Publicly documented; categorizes refusal type
Standard pricing	$5/$25 per M tokens	$5/$25 per M tokens — unchanged

Paid Plans and Enrollment Checklist

Claude Code and fast mode are not available on Anthropic's free tier. Before writing any Opus 4.8 code, work through this checklist to avoid hitting enrollment errors in production:

Claude Code access: Requires a paid Anthropic subscription (Pro, Max, Teams, or Enterprise) or API console credits — the free tier does not include it (video: Tech With Tim).
Fast mode enrollment: Research preview, gated separately. Enroll via the Anthropic console before adding speed: "fast" to any request. Sending the parameter without prior enrollment returns an error.
Platform and context window: Claude API, Amazon Bedrock, and Vertex AI each provide a 1 million token context window . Microsoft Foundry caps at 200k tokens — verify your target platform before designing long-context pipelines.
Output limits: Maximum synchronous output is 128k tokens . The Message Batches API raises this to 300k tokens per call with the beta header output-300k-2026-03-24.
Knowledge cutoff: January 2026 .
SDK version: Confirm anthropic>=0.51 is installed before deploying.

How to Call Each Capability in Opus 4.8

Five call patterns cover everything new or changed in Opus 4.8. All examples use the Python SDK (anthropic>=0.51). Steps 4 and 5 are the most migration-sensitive; read the next section before deploying either.

Reading stop_details on refusalsCheck response.stop_details.type to branch by category rather than parsing free-text content. The following snippet is illustrative — see the Opus 4.8 changelog for all documented type values.

response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[{"role": "user", "content": user_input}]
)

if response.stop_reason == "refusal":
    refusal_type = response.stop_details.type
    if refusal_type == "harmful_content":
        return {"error": "content_policy", "action": "revise_prompt"}
    elif refusal_type == "privacy":
        return {"error": "privacy_policy", "action": "remove_pii"}
    else:
        return {"error": "refusal", "type": refusal_type}

Adaptive thinking — omit budget_tokensPass thinking={"type": "adaptive"} and set effort separately. Accepted values: "low", "high" (default), "max". The model decides per turn whether extended reasoning is warranted (video: Skill Leap AI). Passing budget_tokens returns a 400.

response = client.messages.create(
    model="claude-opus-4-8",
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},  # or "low", "max"
    max_tokens=4096,
    messages=[{"role": "user", "content": "Audit this function for security issues."}]
)

Mid-session system messagesInsert {"role": "system", "content": "..."} into the messages array immediately after any user turn. No beta header required. Earlier turns stay cached — you pay only for the injected delta .

messages = [
    {"role": "user", "content": "Analyze this codebase."},
    {"role": "assistant", "content": "Starting with the entry points..."},
    # Inject updated instruction mid-session — no beta header needed
    {"role": "system", "content": "Focus only on security-critical paths from here."},
    {"role": "user", "content": "Continue with the auth module."},
]

response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=2048,
    messages=messages
)

Fast mode — add speed="fast"Pass speed="fast" as a top-level parameter (not inside the model string). Doubles per-token cost to $10 input / $50 output per million tokens . Confirm console enrollment before deploying; unenrolled requests return an error.

response = client.messages.create(
    model="claude-opus-4-8",
    speed="fast",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Generate a 500-word summary."}]
)

Basic call — update the model IDChange model to claude-opus-4-8. All other request structure from 4.7 carries over unchanged.

import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain the migration in one sentence."}]
)
print(response.content[0].text)

The following verified migration script summarizes the full diff from Opus 4.6-style requests to Opus 4.8. It was executed and confirmed (exit 0):

import json

old = {
    "model": "claude-opus-4-6",
    "thinking": {"type": "enabled", "budget_tokens": 32000},
}

new = {
    "model": "claude-opus-4-8",
    "thinking": {"type": "adaptive"},
    "output_config": {"effort": "high"},
    "speed": "fast",
}

print(json.dumps({
    "budget_tokens": "removed: use adaptive thinking + effort instead",
    "before": old,
    "after": new,
    "also_moved": {
        "effort_default": "high",
        "sampling": "omit temperature/top_p/top_k non-defaults",
        "prompt_cache_min_tokens": 1024,
        "max_output_tokens": 128000,
    },
}, indent=2))

Output from the above script:

{
  "budget_tokens": "removed: use adaptive thinking + effort instead",
  "before": {
    "model": "claude-opus-4-6",
    "thinking": {
      "type": "enabled",
      "budget_tokens": 32000
    }
  },
  "after": {
    "model": "claude-opus-4-8",
    "thinking": {
      "type": "adaptive"
    },
    "output_config": {
      "effort": "high"
    },
    "speed": "fast"
  },
  "also_moved": {
    "effort_default": "high",
    "sampling": "omit temperature/top_p/top_k non-defaults",
    "prompt_cache_min_tokens": 1024,
    "max_output_tokens": 128000
  }
}

400 Errors and Other Failure Points

Opus 4.8 inherits the same sampling constraints as Opus 4.7. These four failure points account for the majority of migration bugs, particularly when upgrading from Opus 3 or Sonnet:

thinking: {"type": "enabled", "budget_tokens": N} → 400. Deprecated since Opus 4.7, still unsupported in 4.8. Replace with thinking={"type": "adaptive"} plus output_config={"effort": "..."}. This is the single most common mistake when switching from extended-thinking-era code .
Any non-default temperature, top_p, or top_k → 400. Omit these parameters entirely. Steer output style through prompting techniques instead — this constraint is unchanged from 4.7.
speed: "fast" without enrollment → API error. This is not a 400 but a distinct enrollment-check error. Confirm console access before shipping any fast-mode code path.
Microsoft Foundry: 200k token context, not 1M. If your pipeline was designed around the 1M window on the Claude API, Bedrock, or Vertex AI, it behaves differently on Foundry . Verify your target platform before committing to a retrieval or long-document architecture.

On the behavioral side: per TechCrunch's reporting on early enterprise testing, Opus 4.8 is approximately four times less likely than Opus 4.7 to allow code flaws to pass unremarked, and it proactively flags input/output uncertainties in analytical pipelines — a pattern confirmed by Bridgewater Associates during pre-launch evaluation . If your evals measure output volume rather than accuracy, recalibrate before moving to production.

Going Further With Opus 4.8

Three capabilities in Opus 4.8 are worth queuing up for exploration, though two remain in research preview as of June 2026 with no confirmed GA date:

Dynamic Workflows (Claude Code, research preview): Decomposes a large task into a plan, then fans it across hundreds of parallel subagents in a single session. Designed for codebase-scale work — migrations across hundreds of thousands of lines, from kickoff to merge, using your existing test suite as the quality bar . Token consumption is substantially higher than single-agent flows; budget accordingly before enabling.
Message Batches API — 300k output tokens per call: Add the beta header output-300k-2026-03-24 to raise the per-call output ceiling from 128k to 300k tokens . Practical for large-document generation or batch summarization at scale.
Mid-session prompt injection for cost reduction: In multi-turn agentic loops, inject only the changed instruction slice after each turn while leaving earlier cached turns intact. Savings compound with session length — no need to retransmit the full system prompt on every call.

Frequently Asked Questions

What is the model ID for Claude Opus 4.8?

Use claude-opus-4-8. The model is available on the Claude API, Amazon Bedrock, and Vertex AI with a 1 million token context window , and on Microsoft Foundry with a 200k token limit. Standard pricing is $5 per million input tokens and $25 per million output tokens.

Why does setting `budget_tokens` cause a 400 error on Opus 4.8?

The thinking: {"type": "enabled", "budget_tokens": N} syntax was deprecated starting with Opus 4.7 and is unsupported in 4.8. The correct form is thinking={"type": "adaptive"} with a separate output_config={"effort": "..."} parameter — values are "low", "high", or "max", defaulting to "high". Omit budget_tokens entirely.

What does fast mode cost compared to standard Opus 4.8?

Standard Opus 4.8 costs $5 input / $25 output per million tokens. Fast mode costs $10 input / $50 output per million tokens — exactly double — but delivers up to 2.5× higher output token rate . Fast mode is a research preview and requires separate enrollment in the Anthropic console before any requests will succeed.

Can I set `temperature` or `top_p` with Opus 4.8?

No. Any non-default temperature, top_p, or top_k value returns a 400 error — the same constraint as Opus 4.7. Omit these parameters entirely and control output style through prompting instead.

Do I need to change any code to benefit from the lower prompt cache threshold?

No code changes required. Opus 4.8 drops the minimum cacheable prompt from approximately 2,000 tokens to 1,024 tokens automatically . Prompts between 1,024 and 2,000 tokens that previously missed caching now qualify, cutting repeat-call input costs without any migration work on your end.

Watch / Sources

What to Migrate, What to Monitor

The Opus 4.8 upgrade is one line for most codebases: change the model ID. The substantive migration work is removing budget_tokens if you were on extended thinking, and confirming no sampling parameters are set. The four new capabilities — fast mode, mid-session system prompts, the lower cache floor, and stop_details routing — each add a handful of lines, not a structural rearchitecture.

Fast mode and Dynamic Workflows remain research previews with no confirmed GA date. The Mythos-class models expected to outperform Opus 4.8 on coding benchmarks were described as coming "in the coming weeks" as of the release date, with safety reviews still ongoing . For now, claude-opus-4-8 is the production ceiling on every supported platform.

Last updated: 2026-06-01. Based on Anthropic's Opus 4.8 release notes and model documentation published May 28, 2026.

Opus 4.8 kills budget_tokens — here's what else moved

What Opus 4.8 Adds Over 4.7

Paid Plans and Enrollment Checklist

How to Call Each Capability in Opus 4.8

400 Errors and Other Failure Points

Going Further With Opus 4.8

Frequently Asked Questions

What is the model ID for Claude Opus 4.8?

Why does setting `budget_tokens` cause a 400 error on Opus 4.8?

What does fast mode cost compared to standard Opus 4.8?

Can I set `temperature` or `top_p` with Opus 4.8?

Do I need to change any code to benefit from the lower prompt cache threshold?

Watch / Sources

What to Migrate, What to Monitor

Featured posts

What langchain-fireworks 1.4.x Changed for Your Code

Why langchain-perplexity 1.3.1 Dropped Its SSE Shim

OpenAI's Rosalind Is Free for Public Health — Still Gated

Devin's $26B Raise Changed How AI Coding Agents Compare

Gemini's ERA Model Is Now Outrunning CDC Disease Forecasts

재시도마다 Opus 4.8 thinking block이 손상됐다

b2에서 Sandbox 프리셋과 config 클래스가 달라졌다

260억 달러 유치 후 AI 코딩 에이전트 지형이 달라졌나

CDC 예측을 넘어선 Gemini ERA의 실제 성능

LLM 비용을 선물로 헤지한다는 상하이의 실험

Tags

Opus 4.8 kills budget_tokens — here's what else moved

What Opus 4.8 Adds Over 4.7

Paid Plans and Enrollment Checklist

How to Call Each Capability in Opus 4.8

400 Errors and Other Failure Points

Going Further With Opus 4.8

Frequently Asked Questions

What is the model ID for Claude Opus 4.8?

Why does setting budget_tokens cause a 400 error on Opus 4.8?

What does fast mode cost compared to standard Opus 4.8?

Can I set temperature or top_p with Opus 4.8?

Do I need to change any code to benefit from the lower prompt cache threshold?

Watch / Sources

What to Migrate, What to Monitor

Featured posts

Tags

Sign up for insights and ideas

Why does setting `budget_tokens` cause a 400 error on Opus 4.8?

Can I set `temperature` or `top_p` with Opus 4.8?