Three patches in eight days: langchain-fireworks moved from 1.3.x to 1.4.2 between May 20 and May 27, 2026 , shipping an SDK migration, a typed context-overflow exception, tighter retry ownership, and a serialization fix that quietly broke cross-provider pipelines in earlier builds. This tutorial unpacks what changed and walks you to a running ChatFireworks instance with tool calling.
1.4.0–1.4.2 Annotated: Dependency Bump, Serialization Cleanup, and Retry Rewiring
The 1.4.x series is a coherent hardening sprint across three rapid releases. Version 1.4.0 (May 20) migrated the integration from fireworks-ai 0.x to the 1.x pre-release line (PR #37581) and introduced FireworksContextOverflowError — a typed wrapper around the raw BadRequestError previously raised when a prompt exceeded the model's context window. Version 1.4.1 (May 21) moved retry ownership entirely to LangChain's decorator layer: max_retries=2 is now the default (PR #37602), and the underlying HTTP client is initialized with max_retries=0 to prevent double-counting. Version 1.4.2 (May 27) is the most broadly impactful: it strips non-wire keys — Anthropic's index on text blocks, LangChain's internal caller on tool_use blocks — before sending to the Fireworks wire API (PR #37714). Pre-1.4.2, those extra keys triggered validation errors in multi-provider pipelines.
Quick Answer: langchain-fireworks 1.4.2 (May 27, 2026) fixes cross-provider validation errors by stripping non-wire content-part keys (index, caller) before sending to the Fireworks API. Paired with 1.4.1's retry rewiring (max_retries=2 default, HTTP client at max_retries=0) and 1.4.0's upgrade to fireworks-ai 1.x, the patch sequence makes ChatFireworks substantially more robust in multi-provider pipelines.
| Version | Release Date | Key PRs | Net User Impact |
|---|---|---|---|
| 1.4.0 | May 20, 2026 | #37581, #37574 | SDK upgraded from fireworks-ai 0.x → 1.x; FireworksContextOverflowError added for context-length failures |
| 1.4.1 | May 21, 2026 | #37602, #37590 | Retries on APIConnectionError; max_retries=2 default; HTTP client forced to max_retries=0 |
| 1.4.2 | May 27, 2026 | #37714, #37650 | Non-wire keys (index, caller) stripped before wire API call; cross-provider validation errors fixed |
"Strip non-wire keys — e.g.indexon Anthropic text blocks,calleron LangChain tool_use blocks — before sending to the Fireworks wire API; previously these triggered validation errors in cross-provider pipelines." — PR #37714 description, langchain-ai/langchain
Before You Start: Python 3.10+, fireworks-ai 1.x Alpha, and a Fireworks Account

You need Python 3.10 or later and a Fireworks API key — obtain one at app.fireworks.ai/login. One dependency caveat deserves its own paragraph.
fireworks-ai 1.x is still in alpha. The latest pre-release as of late May 2026 is 1.2.0a73 ; the last stable release was 0.19.20 from October 2025 . In production, pin to an exact alpha version — e.g., fireworks-ai==1.2.0a73 — rather than a range like >=1.0. Alpha builds can introduce breaking API changes between patch versions without a semver major-bump signal.
Install, Instantiate, and Invoke ChatFireworks: Step-by-Step
ChatFireworks is the primary BaseChatModel interface for Fireworks-hosted models . Model identifiers follow the pattern accounts/fireworks/models/<slug>. Follow these five steps for a working setup.
Streaming with usage tracking. Since 1.2.0 (carried through 1.4.2), stream_usage=True opts into stream_options.include_usage . The final chunk now surfaces as an AIMessageChunk with usage_metadata rather than being silently dropped:
for chunk in llm.stream("Tell me a joke"):
print(chunk.content, end="", flush=True)
if chunk.usage_metadata:
print(f"\nTokens used: {chunk.usage_metadata}")Invoke. .invoke() returns an AIMessage; .content is the text string:
messages = [
("system", "You are a helpful assistant that translates English to French."),
("human", "I love programming."),
]
ai_msg = llm.invoke(messages)
print(ai_msg.content) # "J'adore la programmation."Instantiate.
from langchain_fireworks import ChatFireworks
llm = ChatFireworks(
model="accounts/fireworks/models/llama-v3p1-8b-instruct",
temperature=0,
max_retries=2, # LangChain decorator layer; HTTP client uses max_retries=0
stream_usage=True, # include token counts in streamed chunks
)Credential. Export the API key or pass api_key= directly to the constructor. The environment variable approach is preferred:
export FIREWORKS_API_KEY='fw_...'Install. Upgrade to 1.4.2 and verify:
pip install -qU langchain-fireworks
pip show langchain-fireworks # expect: Version: 1.4.2Or with uv: uv add langchain-fireworks
Pitfalls to Anticipate: Alpha Pinning, Cross-Provider Messages, and Retry Ownership

Four areas where the 1.4.x upgrade requires deliberate handling:
- Alpha instability. Pin
fireworks-aito an exact alpha version — e.g.,fireworks-ai==1.2.0a73— rather than a range. Alpha builds can introduce breaking API changes between patch versions without a semver signal. Run integration tests before upgrading. - Cross-provider pipelines. The 1.4.2 serialization fix silently strips
indexandcallerkeys before sending to the wire API. If you were working around pre-1.4.2 validation errors by manually sanitizing messages, remove that workaround after upgrading — double-stripping is harmless but adds noise to the pipeline. - Retry ownership.
max_retriesonChatFireworkscontrols the LangChain decorator layer only. The underlyingfireworks.Fireworks()HTTP client is initialized withmax_retries=0by design , ensuring each attempt is visible torun_manager.on_retryand avoiding double-counting. Do not attempt to overridemax_retriesat the HTTP client level. - Image input is not supported.
ChatFireworksraises an error for multimodal (image) message content as of 1.4.2. Verify the capability matrix before routing vision workflows through this integration — use a different LangChain chat model for vision tasks.
Extending Your Setup: Tool Calling, Structured Outputs, and Async

Once basic invocation is working, the most useful next steps for a production integration are tool calling, structured outputs, and explicit error handling. The snippet below is illustrative — it was not executed against a live API — but reflects the current documented interface for bind_tools():
from typing import Annotated
import os
try:
from langchain_core.tools import tool
from langchain_fireworks import ChatFireworks
except ImportError as e:
print(f"missing dependency: {e.name}")
raise SystemExit(1)
if not os.environ.get("FIREWORKS_API_KEY"):
print("Set FIREWORKS_API_KEY to run this ChatFireworks example.")
raise SystemExit(1)
@tool
def multiply(
a: Annotated[int, "first factor"],
b: Annotated[int, "second factor"],
) -> int:
"""Multiply two integers."""
return a * b
llm = ChatFireworks(model="accounts/fireworks/models/llama-v3p1-8b-instruct")
llm_with_tools = llm.bind_tools([multiply])
response = llm_with_tools.invoke("What is 6 times 7? Use the tool.")
print(response.content)
print(response.tool_calls)Beyond tool calling, the integration supports:
- Structured output.
llm.with_structured_output(MyPydanticModel)works with Pydantic v2 schemas for deterministic JSON extraction from model responses. - Async.
llm.ainvoke(messages)for asyncio contexts;llm.astream(messages)for async streaming. The interface mirrors the sync API exactly. - Context overflow handling. Wrap calls in
try/except FireworksContextOverflowError(added 1.4.0) to catch prompt-too-long conditions explicitly rather than letting a raw HTTP error bubble up:
from langchain_fireworks import ChatFireworks, FireworksContextOverflowError
try:
result = llm.invoke(long_messages)
except FireworksContextOverflowError:
# truncate context, switch to a larger-window model, or summarize before retrying
result = fallback_llm.invoke(summarize(long_messages))Frequently Asked Questions
Is fireworks-ai 1.x stable enough for production use?
Not yet. The 1.x series remains in active alpha — the latest release as of late May 2026 is 1.2.0a73 , and the last stable release was 0.19.20 from October 2025 . If deploying to production, pin to an exact alpha version and run integration tests before each upgrade. Floating on a semver range like >=1.0 risks pulling in silent breaking changes between alpha patches.
What does the serialization strip in 1.4.2 actually fix?
When assembling messages from multiple providers, content part dicts can carry provider-specific keys: Anthropic text blocks attach an index key; LangChain's internal tool_use blocks attach a caller key. Pre-1.4.2, those extra keys passed through to the Fireworks wire API and caused validation errors. Version 1.4.2 introduces sanitization functions — built from an allowlist derived from the Fireworks SDK's own TypedDict — that strip non-wire keys before every API call. Upgrading existing pipelines should be transparent; the only observable change is the removal of those validation errors.
How does retry logic work in 1.4.x compared to earlier versions?
Since 1.4.1, retry ownership belongs entirely to LangChain's decorator layer. The underlying fireworks.Fireworks() HTTP client is initialized with max_retries=0 — it performs no retries of its own. The max_retries=2 default on ChatFireworks means up to two retries through the LangChain path, each visible to run_manager.on_retry callbacks. Version 1.4.1 also added retry coverage for bare APIConnectionError conditions, which earlier versions did not retry.
Can ChatFireworks process images or multimodal inputs?
No. Image input is not supported as of 1.4.2 and raises an error. ChatFireworks supports text input, tool calling, structured output, streaming, and logprobs — but not vision. For multimodal workflows, use a LangChain chat model integration that supports image content blocks, then route to Fireworks only for the text-only steps.
What is FireworksContextOverflowError and when does it get raised?
FireworksContextOverflowError was added in 1.4.0 as a typed wrapper around the raw BadRequestError returned when a prompt exceeds the model's context window. Before 1.4.0, that condition surfaced as an untyped HTTP error requiring string-matching to detect. Catching it explicitly lets you branch cleanly to a fallback: truncate context, switch to a model with a larger window, or summarize the conversation before retrying.
What to Try Next
With 1.4.2 installed and a working invoke path, the productive next steps are: wire in with_structured_output() for JSON extraction use cases, add FireworksContextOverflowError handling at call sites if you're operating near context limits, and explicitly test cross-provider message round-trips (Anthropic or OpenAI → Fireworks) to confirm the 1.4.2 serialization fix covers your specific message shapes. If you're migrating from fireworks-ai 0.x, the 1.4.0 SDK bump is the change most likely to surface compatibility gaps — review the updated integration docs and the API reference before upgrading a production dependency.
The Fireworks LangChain integration overview covers available model slugs and tier options. For the complete parameter list including service_tier, logprobs, and timeout, see the source on GitHub.
Last updated: 2026-05-31. Based on langchain-fireworks 1.4.2 (released May 27, 2026) and fireworks-ai 1.2.0a73.