Nemotron 3 Ultra went live June 4. Here's the call that works.

NVIDIA Nemotron 3 Ultra GA June 4: how to call via NIM/OpenRouter, hardware floor, and the base-checkpoint caveat.

Composer 2.5 hits near-frontier at 60× lower spend

Composer 2.5: third on the Artificial Analysis Coding Index at $0.07/task vs $4.10 for its nearest rival. Billing choice, effective prompting, and what the independent scores actually show.

4 GitHub stars, voice interviews with Ollama: that's GrillKit

Apache 2.0 interview trainer with Whisper voice input, Ollama or cloud LLM support, and local session history. No SaaS, no registration required.

RDNA3 cuts llama.cpp KV VRAM 47% — and CUDA has no equivalent

RDNA3 bit-packing cuts llama.cpp KV VRAM 47% on RX 7900. Flags, VRAM math, and TurboQuant for 4.9× compression.

NodeCartel is dark. Cross-host AI orchestration: who delivers.

NodeCartel is unreachable. Kore.ai, CrewAI Cloud, Northflank, and AgentNode Pro compared for cross-host AI scheduling.

17k tokens → 1.4k — Headroom keeps the originals retrievable

Open-source context compression middleware for agent pipelines: 60–95% token cuts, CCR reversibility, AST-aware engines.

Cognition's $26B needs $1B ARR by December. The math is tight.

$26B valuation on $492M ARR: Cognition's Series D metrics, the Windsurf attribution question, and the $1B ARR target.

Booed at graduation — the AI skeptics you'll be shipping to

MIT Technology Review's May 2026 Hype Index covers graduation boos, Gen Z sentiment (46%), and record AI fundraising.

Opus 4.8 kills budget_tokens — here's what else moved

Opus 4.8: fast mode, mid-session system prompts, 1K cache floor. Old budget_tokens syntax returns 400.

llama-bench skipped FA on capable GPUs — b9437 corrects it

llama.cpp b9437 (May 30): llama-bench now defaults -fa to auto and -ngl to -1. Your pre-b9437 comparisons need a flag audit.

Qwen3.6-35B NVFP4 runs on one H100 — A100 owners are out

FP4-quantized Qwen3.6-35B fits in ~23 GB on Hopper. vLLM serve commands, env vars, DGX Spark config, and gotchas.

Step 3.7 Flash is a drop-in — except for one endpoint detail

StepFun Step 3.7 Flash: 198B MoE with native vision, Advisor Mode, and an OpenAI-compatible API you can call today. Includes endpoint gotchas and reasoning_effort examples.
