What langchain-fireworks 1.4.x Changed for Your Code

What the 1.4.x patch sequence changed — and a runnable ChatFireworks setup from scratch.


Three releases in eight days: langchain-fireworks moved from 1.3.x to 1.4.2 between May 20 and May 27, 2026, shipping an SDK migration, a typed context-overflow exception, tighter retry ownership, and a serialization fix for a bug that quietly broke cross-provider pipelines in earlier builds. This tutorial unpacks what changed and walks you to a running ChatFireworks instance with tool calling.

1.4.0–1.4.2 Annotated: Dependency Bump, Serialization Cleanup, and Retry Rewiring

The 1.4.x series is a coherent hardening sprint across three rapid releases. Version 1.4.0 (May 20) migrated the integration from fireworks-ai 0.x to the 1.x pre-release line (PR #37581) and introduced FireworksContextOverflowError — a typed wrapper around the raw BadRequestError previously raised when a prompt exceeded the model's context window. Version 1.4.1 (May 21) moved retry ownership entirely to LangChain's decorator layer: max_retries=2 is now the default (PR #37602), and the underlying HTTP client is initialized with max_retries=0 to prevent double-counting. Version 1.4.2 (May 27) is the most broadly impactful: it strips non-wire keys — Anthropic's index on text blocks, LangChain's internal caller on tool_use blocks — before sending to the Fireworks wire API (PR #37714). Pre-1.4.2, those extra keys triggered validation errors in multi-provider pipelines.

Quick Answer: langchain-fireworks 1.4.2 (May 27, 2026) fixes cross-provider validation errors by stripping non-wire content-part keys (index, caller) before sending to the Fireworks API. Paired with 1.4.1's retry rewiring (max_retries=2 default, HTTP client at max_retries=0) and 1.4.0's upgrade to fireworks-ai 1.x, the patch sequence makes ChatFireworks substantially more robust in multi-provider pipelines.

Version | Release Date | Key PRs | Net User Impact
1.4.0 | May 20, 2026 | #37581, #37574 | SDK upgraded from fireworks-ai 0.x → 1.x; FireworksContextOverflowError added for context-length failures
1.4.1 | May 21, 2026 | #37602, #37590 | Retries on APIConnectionError; max_retries=2 default; HTTP client forced to max_retries=0
1.4.2 | May 27, 2026 | #37714, #37650 | Non-wire keys (index, caller) stripped before wire API call; cross-provider validation errors fixed
"Strip non-wire keys — e.g. index on Anthropic text blocks, caller on LangChain tool_use blocks — before sending to the Fireworks wire API; previously these triggered validation errors in cross-provider pipelines." — PR #37714 description, langchain-ai/langchain

Before You Start: Python 3.10+, fireworks-ai 1.x Alpha, and a Fireworks Account

langchain-fireworks 1.4.2: Annotated + ChatFireworks Quickstart

You need Python 3.10 or later and a Fireworks API key — obtain one at app.fireworks.ai/login. One dependency caveat deserves its own paragraph.

fireworks-ai 1.x is still in alpha. The latest pre-release as of late May 2026 is 1.2.0a73; the last stable release was 0.19.20 from October 2025. In production, pin to an exact alpha version (for example, fireworks-ai==1.2.0a73) rather than a range like >=1.0. Alpha builds can introduce breaking API changes between patch versions without a semver major-bump signal.
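One way to enforce the exact-pin advice at startup is a small guard that compares the installed version against the one you tested. This is a minimal sketch using only the standard library. EXPECTED_PIN, check_exact_pin, and installed_version are hypothetical names introduced for illustration, not part of either package.

```python
from importlib import metadata
from typing import Optional

# Hypothetical: the exact alpha you ran your integration tests against.
EXPECTED_PIN = "1.2.0a73"


def check_exact_pin(installed: Optional[str], expected: str = EXPECTED_PIN) -> bool:
    """Return True only when the installed version string equals the pinned one."""
    return installed is not None and installed == expected


def installed_version(dist_name: str = "fireworks-ai") -> Optional[str]:
    """Look up the installed version, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if check_exact_pin(found):
        print(f"fireworks-ai {found} matches the pin")
    else:
        print(f"fireworks-ai is {found!r}, expected {EXPECTED_PIN}; re-run integration tests")
```

A string equality check is deliberately strict: any other alpha, even a newer one, is treated as untested.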

Install, Instantiate, and Invoke ChatFireworks: Step-by-Step

ChatFireworks is the primary BaseChatModel interface for Fireworks-hosted models. Model identifiers follow the pattern accounts/fireworks/models/<slug>. Follow these five steps for a working setup.

Install. Upgrade to 1.4.2 and verify:

pip install -qU langchain-fireworks
pip show langchain-fireworks   # expect: Version: 1.4.2

Or with uv: uv add langchain-fireworks

Credential. Export the API key or pass api_key= directly to the constructor. The environment variable approach is preferred:

export FIREWORKS_API_KEY='fw_...'

Instantiate.

from langchain_fireworks import ChatFireworks

llm = ChatFireworks(
    model="accounts/fireworks/models/llama-v3p1-8b-instruct",
    temperature=0,
    max_retries=2,       # LangChain decorator layer; HTTP client uses max_retries=0
    stream_usage=True,   # include token counts in streamed chunks
)

Invoke. .invoke() returns an AIMessage; .content is the text string:

messages = [
    ("system", "You are a helpful assistant that translates English to French."),
    ("human", "I love programming."),
]
ai_msg = llm.invoke(messages)
print(ai_msg.content)  # e.g. "J'adore la programmation."

Streaming with usage tracking. Since 1.2.0 (carried through 1.4.2), stream_usage=True opts into stream_options.include_usage. The final chunk now surfaces as an AIMessageChunk with usage_metadata rather than being silently dropped:

for chunk in llm.stream("Tell me a joke"):
    print(chunk.content, end="", flush=True)
    if chunk.usage_metadata:
        print(f"\nTokens used: {chunk.usage_metadata}")

Pitfalls to Anticipate: Alpha Pinning, Cross-Provider Messages, and Retry Ownership

Four areas where the 1.4.x upgrade requires deliberate handling:

  • Alpha instability. Pin fireworks-ai to an exact alpha version — e.g., fireworks-ai==1.2.0a73 — rather than a range. Alpha builds can introduce breaking API changes between patch versions without a semver signal. Run integration tests before upgrading.
  • Cross-provider pipelines. The 1.4.2 serialization fix silently strips index and caller keys before sending to the wire API. If you were working around pre-1.4.2 validation errors by manually sanitizing messages, remove that workaround after upgrading — double-stripping is harmless but adds noise to the pipeline.
  • Retry ownership. max_retries on ChatFireworks controls the LangChain decorator layer only. The underlying fireworks.Fireworks() HTTP client is initialized with max_retries=0 by design, ensuring each attempt is visible to run_manager.on_retry and avoiding double-counting. Do not attempt to override max_retries at the HTTP client level.
  • Image input is not supported. ChatFireworks raises an error for multimodal (image) message content as of 1.4.2. Verify the capability matrix before routing vision workflows through this integration — use a different LangChain chat model for vision tasks.
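The double-counting concern in the retry-ownership pitfall above can be shown with a small simulation. This is a sketch of the general idea of nested retry layers, not the integration's actual code. call_with_retries and count_wire_attempts are hypothetical helpers, and the always-failing endpoint is simulated.

```python
def call_with_retries(fn, max_retries):
    """Call fn(); on ConnectionError, retry up to max_retries more times."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_retries:
                raise


def count_wire_attempts(outer_retries, inner_retries):
    """Count how many times an always-failing endpoint is actually hit."""
    attempts = 0

    def endpoint():
        nonlocal attempts
        attempts += 1
        raise ConnectionError("simulated network failure")

    def client_call():
        # Inner layer: stands in for an HTTP client with its own retry budget.
        return call_with_retries(endpoint, inner_retries)

    try:
        # Outer layer: stands in for a framework-level retry wrapper.
        call_with_retries(client_call, outer_retries)
    except ConnectionError:
        pass
    return attempts


print(count_wire_attempts(outer_retries=2, inner_retries=0))  # 3
print(count_wire_attempts(outer_retries=2, inner_retries=2))  # 9
```

With the inner budget at zero, a setting of 2 retries means exactly three wire attempts. If both layers retried, the same setting would multiply into nine attempts, most of them hidden from the outer layer's callbacks.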

Extending Your Setup: Tool Calling, Structured Outputs, and Async

Once basic invocation is working, the most useful next steps for a production integration are tool calling, structured outputs, and explicit error handling. The snippet below is illustrative — it was not executed against a live API — but reflects the current documented interface for bind_tools():

from typing import Annotated
import os

try:
    from langchain_core.tools import tool
    from langchain_fireworks import ChatFireworks
except ImportError as e:
    print(f"missing dependency: {e.name}")
    raise SystemExit(1)

if not os.environ.get("FIREWORKS_API_KEY"):
    print("Set FIREWORKS_API_KEY to run this ChatFireworks example.")
    raise SystemExit(1)


@tool
def multiply(
    a: Annotated[int, "first factor"],
    b: Annotated[int, "second factor"],
) -> int:
    """Multiply two integers."""
    return a * b


llm = ChatFireworks(model="accounts/fireworks/models/llama-v3p1-8b-instruct")
llm_with_tools = llm.bind_tools([multiply])
response = llm_with_tools.invoke("What is 6 times 7? Use the tool.")

print(response.content)
print(response.tool_calls)

Beyond tool calling, the integration supports:

  • Structured output. llm.with_structured_output(MyPydanticModel) works with Pydantic v2 schemas for deterministic JSON extraction from model responses.
  • Async. llm.ainvoke(messages) for asyncio contexts; llm.astream(messages) for async streaming. The interface mirrors the sync API exactly.
  • Context overflow handling. Wrap calls in try/except FireworksContextOverflowError (added in 1.4.0) to catch prompt-too-long conditions explicitly rather than letting a raw HTTP error bubble up:

from langchain_fireworks import ChatFireworks, FireworksContextOverflowError

# long_messages, fallback_llm, and summarize() are placeholders for your own
# message list, larger-window model, and summarization helper.
try:
    result = llm.invoke(long_messages)
except FireworksContextOverflowError:
    # truncate context, switch to a larger-window model, or summarize before retrying
    result = fallback_llm.invoke(summarize(long_messages))

Frequently Asked Questions

Is fireworks-ai 1.x stable enough for production use?

Not yet. The 1.x series remains in active alpha: the latest release as of late May 2026 is 1.2.0a73, and the last stable release was 0.19.20 from October 2025. If deploying to production, pin to an exact alpha version and run integration tests before each upgrade. Floating on a semver range like >=1.0 risks pulling in silent breaking changes between alpha patches.

What does the serialization strip in 1.4.2 actually fix?

When assembling messages from multiple providers, content part dicts can carry provider-specific keys: Anthropic text blocks attach an index key; LangChain's internal tool_use blocks attach a caller key. Pre-1.4.2, those extra keys passed through to the Fireworks wire API and caused validation errors. Version 1.4.2 introduces sanitization functions — built from an allowlist derived from the Fireworks SDK's own TypedDict — that strip non-wire keys before every API call. Upgrading existing pipelines should be transparent; the only observable change is the removal of those validation errors.
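The allowlist approach described above can be sketched in a few lines. This illustrates the technique only and is not the integration's implementation. WireTextPart is a hypothetical stand-in for an SDK content-part type, and strip_non_wire_keys is an illustrative name.

```python
from typing import Any, TypedDict


class WireTextPart(TypedDict):
    """Hypothetical stand-in for an SDK content-part type (not the real definition)."""

    type: str
    text: str


# Derive the allowlist from the TypedDict's declared fields.
ALLOWED_TEXT_KEYS = frozenset(WireTextPart.__annotations__)


def strip_non_wire_keys(part: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of part containing only allowlisted keys."""
    return {k: v for k, v in part.items() if k in ALLOWED_TEXT_KEYS}


# A text block carrying an extra provider-specific key.
anthropic_style = {"type": "text", "text": "hello", "index": 0}
print(strip_non_wire_keys(anthropic_style))  # {'type': 'text', 'text': 'hello'}
```

Deriving the allowlist from the type definition, rather than maintaining a denylist of known extras, means keys added by other providers in the future are dropped by default.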

How does retry logic work in 1.4.x compared to earlier versions?

Since 1.4.1, retry ownership belongs entirely to LangChain's decorator layer. The underlying fireworks.Fireworks() HTTP client is initialized with max_retries=0 — it performs no retries of its own. The max_retries=2 default on ChatFireworks means up to two retries through the LangChain path, each visible to run_manager.on_retry callbacks. Version 1.4.1 also added retry coverage for bare APIConnectionError conditions, which earlier versions did not retry.

Can ChatFireworks process images or multimodal inputs?

No. Image input is not supported as of 1.4.2 and raises an error. ChatFireworks supports text input, tool calling, structured output, streaming, and logprobs — but not vision. For multimodal workflows, use a LangChain chat model integration that supports image content blocks, then route to Fireworks only for the text-only steps.

What is FireworksContextOverflowError and when does it get raised?

FireworksContextOverflowError was added in 1.4.0 as a typed wrapper around the raw BadRequestError returned when a prompt exceeds the model's context window. Before 1.4.0, that condition surfaced as an untyped HTTP error requiring string-matching to detect. Catching it explicitly lets you branch cleanly to a fallback: truncate context, switch to a model with a larger window, or summarize the conversation before retrying.
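For the truncation branch, one simple fallback drops the oldest non-system messages until a rough size estimate fits a budget. This is a sketch under loud assumptions: estimate_tokens uses a crude four-characters-per-token guess rather than a real tokenizer, and truncate_to_budget is a hypothetical helper, not part of langchain-fireworks.

```python
def estimate_tokens(text):
    """Very rough estimate: about 4 characters per token (an assumption, not a tokenizer)."""
    return max(1, len(text) // 4)


def truncate_to_budget(messages, budget):
    """Drop the oldest non-system messages until the estimate fits the budget.

    messages is a list of (role, content) tuples, the same shape passed to invoke().
    System messages are kept and placed first.
    """
    system = [m for m in messages if m[0] == "system"]
    rest = [m for m in messages if m[0] != "system"]

    def total(msgs):
        return sum(estimate_tokens(content) for _, content in msgs)

    # Always keep at least the most recent message so the request is not empty.
    while len(rest) > 1 and total(system + rest) > budget:
        rest.pop(0)
    return system + rest


history = [
    ("system", "You are terse."),
    ("human", "a" * 400),
    ("ai", "b" * 400),
    ("human", "What did I just ask?"),
]
trimmed = truncate_to_budget(history, budget=120)
print([role for role, _ in trimmed])  # ['system', 'ai', 'human']
```

Inside an except FireworksContextOverflowError block, the trimmed list can be passed to a second invoke() call. A real tokenizer gives tighter budgets than the character heuristic used here.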

What to Try Next

With 1.4.2 installed and a working invoke path, the productive next steps are: wire in with_structured_output() for JSON extraction use cases, add FireworksContextOverflowError handling at call sites if you're operating near context limits, and explicitly test cross-provider message round-trips (Anthropic or OpenAI → Fireworks) to confirm the 1.4.2 serialization fix covers your specific message shapes. If you're migrating from fireworks-ai 0.x, the 1.4.0 SDK bump is the change most likely to surface compatibility gaps — review the updated integration docs and the API reference before upgrading a production dependency.

The Fireworks LangChain integration overview covers available model slugs and tier options. For the complete parameter list including service_tier, logprobs, and timeout, see the source on GitHub.

Last updated: 2026-05-31. Based on langchain-fireworks 1.4.2 (released May 27, 2026) and fireworks-ai 1.2.0a73.