Anthropic Releases Claude Fable 5 and Mythos 5 — 2026-06-09

1. Anthropic Releases Claude Fable 5 and Mythos 5

Anthropic has released Claude Fable 5, marking the first public availability of its high-end Mythos-class models. Designed for advanced software engineering and complex knowledge work, Fable 5 ships with a routing safeguard: when classifiers detect queries related to high-risk domains such as cybersecurity or biology, the system automatically routes the request to Claude Opus 4.8. For specialized use cases, Anthropic is also deploying Claude Mythos 5, which has these safeguards removed, exclusively to authorized cyberdefenders and researchers.

• Claude Fable 5 is available to the public via the Claude API, while Claude Mythos 5 is restricted to approved partners in Project Glasswing.
• Both models are priced at $10 per million input tokens and $50 per million output tokens.
• Fable 5 includes safety classifiers that automatically route queries about cybersecurity, biology, chemistry, and distillation to Claude Opus 4.8.
• The safety fallback mechanism triggers in fewer than 5% of user sessions on average.
• The models achieved an 80.3% score on the SWE-bench Pro benchmark, outperforming OpenAI's GPT-5.5.
• Anthropic is mandating a 30-day data retention policy for all Fable 5 and Mythos 5 traffic to monitor for safety and jailbreaks.

Developers can now build applications on Anthropic's most powerful model class for complex coding and knowledge work, with flagged high-risk queries falling back to Claude Opus 4.8 instead of failing outright.
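
For budgeting, the listed prices translate directly into a per-request estimate. The snippet below is a minimal sketch of that arithmetic only; the function name and the example token counts are hypothetical, and it does not call any Anthropic API.

```python
# Prices quoted in the announcement, in US dollars per million tokens.
INPUT_USD_PER_MTOK = 10.0
OUTPUT_USD_PER_MTOK = 50.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost of one request at the listed prices."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical request: 20,000 input tokens and 4,000 output tokens.
print(f"${estimate_cost_usd(20_000, 4_000):.2f}")  # prints $0.40
```

Because output tokens cost five times as much as input tokens, long generations dominate the bill well before long prompts do.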

2. Apple Unveils AFM 3 and CoreAI for On-Device Inference

At WWDC26, Apple introduced its third-generation foundation models (AFM 3) alongside CoreAI, a new local inference engine designed to replace CoreML. To work around the strict memory limits of mobile devices, the 20-billion-parameter AFM 3 Core Advanced model stores its weights in NAND flash memory rather than DRAM. It uses an Instruction-Following Pruning (IFP) architecture to route experts once per prompt, allowing the system to dynamically scale active parameters from 1 billion to 4 billion without exhausting memory bandwidth.

• The AFM 3 family includes five models, among them the 20-billion-parameter AFM 3 Core Advanced designed for on-device deployment.
• AFM 3 Core Advanced stores its weights in NAND flash memory instead of DRAM.
• The model dynamically scales active parameters from 1 billion to 4 billion based on task complexity.
• It uses Instruction-Following Pruning (IFP) to route experts once per prompt.
• Apple introduced CoreAI as a replacement for CoreML, offering optimized local inference on Apple Silicon with Swift APIs.
• CoreAI requires model weights to be converted using a Python script.

Developers can deploy highly capable 20-billion-parameter models directly on user devices without exhausting system memory.
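
A back-of-the-envelope sketch shows why keeping only the active parameters resident matters. The bit widths below are assumptions chosen for illustration, since this summary does not state AFM 3's weight precision, and `weights_gb` is a hypothetical helper, not part of CoreAI.

```python
def weights_gb(num_params: float, bits_per_weight: int) -> float:
    """Storage needed for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 20e9                # full AFM 3 Core Advanced weight set
ACTIVE_RANGE = (1e9, 4e9)          # active parameters, per the announcement

# Assumed precisions (not stated in the source): 16-bit and 4-bit weights.
for bits in (16, 4):
    full = weights_gb(TOTAL_PARAMS, bits)
    low, high = (weights_gb(p, bits) for p in ACTIVE_RANGE)
    print(f"{bits}-bit: full set {full:.1f} GB, active {low:.1f} to {high:.1f} GB")
```

Under these assumptions the full weight set needs 10 to 40 GB, while the active slice needs 0.5 to 8 GB, which is the gap that flash storage plus once-per-prompt expert routing is meant to bridge.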

3. Cohere Releases Open-Weights North Mini Code Model

Cohere has released North Mini Code, a 30-billion-parameter mixture-of-experts (MoE) model designed specifically for agentic coding pipelines. Operating with just 3 billion active parameters per token, the model is highly efficient, capable of running on a single H100 GPU or locally on a Mac Studio. It features an expansive 256k context window and is licensed under Apache 2.0, making it a powerful open-weights option for developers building autonomous software engineering agents.

• North Mini Code is a 30B-parameter mixture-of-experts model that uses 3B active parameters per token.
• The model is released under the Apache 2.0 license, with weights available on Hugging Face in standard and fp8 formats.
• It supports a 256,000-token context window and a 64,000-token maximum generation length.
• The model achieved a score of 33.4 on the Artificial Analysis Coding Index.
• Deployment via vLLM requires using the vLLM main branch and installing the cohere_melody library version 0.9.0 or higher.

Developers can self-host a highly competitive coding model under an Apache 2.0 license that runs efficiently on a single H100 GPU or local Mac Studio.
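
The two limits above interact when sizing prompts for an agent loop. The sketch below assumes, as is common but not confirmed in this summary, that prompt and generated tokens share the same context window; `max_prompt_tokens` is a hypothetical helper, not part of any Cohere or vLLM API.

```python
CONTEXT_WINDOW = 256_000   # tokens, per the release notes
MAX_GENERATION = 64_000    # tokens, per the release notes


def max_prompt_tokens(max_new_tokens: int) -> int:
    """Largest prompt that still leaves room for max_new_tokens of output.

    Assumes prompt and output tokens count against one shared window.
    """
    if not 0 < max_new_tokens <= MAX_GENERATION:
        raise ValueError(f"max_new_tokens must be in 1..{MAX_GENERATION}")
    return CONTEXT_WINDOW - max_new_tokens


print(max_prompt_tokens(64_000))  # prints 192000
print(max_prompt_tokens(8_000))   # prints 248000
```

Reserving the full generation length costs a quarter of the window, so agents that emit short tool calls can afford a smaller reservation and a larger prompt.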

4. Google Launches Gemini 3.5 Live Translate API Preview

Google has introduced Gemini 3.5 Live Translate, a real-time speech-to-speech translation model now available in public preview. Accessible via the Gemini Live API and Google AI Studio, the model processes continuous audio streams to translate spoken language on the fly with only a few seconds of latency. It automatically detects more than 70 languages while preserving the original speaker's pacing, pitch, and intonation, outputting high-quality 24kHz PCM audio embedded with SynthID watermarks.

• Gemini 3.5 Live Translate is a continuous streaming speech-to-speech model that automatically detects and translates over 70 languages.
• The model is available in public preview for developers via the Gemini Live API and Google AI Studio.
• It uses continuous stream processing to translate with only a few seconds of latency.
• The pipeline accepts raw 16-bit PCM audio at 16kHz and outputs raw 16-bit PCM audio at 24kHz.
• All generated audio includes an imperceptible SynthID watermark for safety and detectability.

Developers can build real-time, low-latency voice translation features into their apps with automatic language detection and natural voice preservation.
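
The audio format above fixes the byte rates a client has to handle. The following is a minimal sketch of preparing input audio using only the Python standard library; mono audio, little-endian byte order, and 100 ms chunks are assumptions not stated in this summary, and the snippet does not call the Gemini Live API.

```python
import math
import struct

INPUT_RATE_HZ = 16_000    # input sample rate, per the announcement
OUTPUT_RATE_HZ = 24_000   # output sample rate, per the announcement
BYTES_PER_SAMPLE = 2      # 16-bit PCM


def pcm_bytes_per_second(rate_hz: int, channels: int = 1) -> int:
    """Raw PCM byte rate; mono is an assumption."""
    return rate_hz * BYTES_PER_SAMPLE * channels


def float_to_pcm16(samples: list[float]) -> bytes:
    """Convert floats in [-1.0, 1.0] to little-endian signed 16-bit PCM."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def chunk_pcm(pcm: bytes, rate_hz: int, chunk_ms: int = 100) -> list[bytes]:
    """Split a PCM buffer into fixed-duration chunks for streaming."""
    size = rate_hz * chunk_ms // 1000 * BYTES_PER_SAMPLE
    return [pcm[i:i + size] for i in range(0, len(pcm), size)]


# One second of a 440 Hz test tone at the input rate.
tone = [math.sin(2 * math.pi * 440 * n / INPUT_RATE_HZ) for n in range(INPUT_RATE_HZ)]
pcm = float_to_pcm16(tone)
chunks = chunk_pcm(pcm, INPUT_RATE_HZ)
print(len(pcm), len(chunks), len(chunks[0]))  # prints 32000 10 3200
```

Under the mono assumption, input runs at 32,000 bytes per second and output at 48,000 bytes per second, so playback buffers need to be sized for the higher output rate.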

5. npm v12 to Introduce Breaking Security Defaults in July 2026

The upcoming major release of npm v12, scheduled for July 2026, will introduce significant breaking changes to default security behaviors. To mitigate supply-chain attacks, the allowScripts configuration will default to off, automatically blocking preinstall, install, and postinstall scripts from dependencies. This change will block native node-gyp builds and prepare scripts from git, file, and link dependencies unless explicitly approved. Developers can prepare for the transition and audit their dependencies using npm version 11.16.0 or newer, which surfaces warnings for these upcoming defaults.

• npm v12 is scheduled for release in July 2026 and will introduce breaking security-related default changes.
• The allowScripts configuration will default to off, blocking automatic execution of preinstall, install, and postinstall scripts.
• Native node-gyp builds and prepare scripts from git, file, and link dependencies will be blocked by default.
• Developers can manage script permissions via npm approve-scripts and npm deny-scripts, saving the allowlist in package.json.
• The --allow-git and --allow-remote flags will default to none, requiring explicit permission to resolve Git and remote URL dependencies.
• Developers can test and prepare for these changes using npm version 11.16.0 or newer.

Developers must prepare their build pipelines and local environments now to prevent broken installations of native node-gyp builds and git dependencies.
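
To get a rough picture of which installed dependencies would be affected, a project can scan its own node_modules for install-time hooks. The script below is a hypothetical audit helper, not npm's tooling: it relies only on the standard `scripts` field in package.json and on the presence of a `binding.gyp` file, which npm treats as a native node-gyp build. It may over-report, since it also reads package.json files nested in fixture or test directories.

```python
import json
from pathlib import Path

LIFECYCLE_SCRIPTS = ("preinstall", "install", "postinstall")


def find_install_scripts(node_modules: Path) -> dict[str, list[str]]:
    """Map package name to the install-time hooks found under node_modules."""
    findings: dict[str, list[str]] = {}
    if not node_modules.is_dir():
        return findings
    for manifest in sorted(node_modules.rglob("package.json")):
        try:
            data = json.loads(manifest.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or malformed manifest: skip it
        if not isinstance(data, dict):
            continue
        scripts = data.get("scripts")
        if not isinstance(scripts, dict):
            scripts = {}
        hooks = [name for name in LIFECYCLE_SCRIPTS if name in scripts]
        if (manifest.parent / "binding.gyp").exists():
            hooks.append("binding.gyp (node-gyp build)")
        if hooks:
            fallback = manifest.parent.relative_to(node_modules).as_posix()
            findings[str(data.get("name") or fallback)] = hooks
    return findings


if __name__ == "__main__":
    for name, hooks in find_install_scripts(Path("node_modules")).items():
        print(f"{name}: {', '.join(hooks)}")
```

The output is a starting list of packages to review before deciding what to approve; the warnings surfaced by npm 11.16.0 or newer remain the authoritative signal.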

6. Researchers Release ntkMirror for Local Hallucination Mitigation

Researchers have open-sourced ntkMirror, a training-free implementation of an answer/abstain gate designed to mitigate predictable hallucinations in local open-weight models. Based on research accepted at ICML 2026, the tool addresses "permutation dispersion," where the order in which evidence is presented alters model output probabilities. Using an order-marginal verifier and a custom fused kernel that accelerates the permutation forward pass by up to 10x, the gate holds hallucinations to 0.7% or less on audited datasets by selectively abstaining from low-confidence claims.

• The ntkMirror implementation is based on an ICML 2026 paper on predictable hallucinations and permutation dispersion.
• The tool is a training-free answer/abstain gate for local open-weight models using an order-marginal verifier.
• In audits, the gate achieved 0.0–0.7% hallucination at approximately 24% abstention with 80.5% accuracy.
• ntkMirror includes a fused kernel that accelerates the permutation forward pass by 2.6 to 10 times compared to a naive loop.
• The implementation is available on GitHub and supports models like Qwen2.5 and Gemma.

Developers running local open-weight models can integrate a training-free gate to dramatically reduce hallucinations on evidence-grounded tasks.
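
The idea of an order-marginal gate can be illustrated with a toy example: score the same claim under several orderings of the evidence, average the scores, and abstain when the average is too low. This is a sketch of the general technique only, not ntkMirror's API or its actual decision rule; the function names, the 0.8 threshold, the cap of 24 orderings, and the toy scorer are all hypothetical.

```python
import itertools
import math
import random
import statistics
from typing import Callable, Sequence


def sample_orders(evidence: Sequence[str], max_orders: int, rng: random.Random) -> list[list[str]]:
    """Enumerate all orderings when few enough, otherwise sample random shuffles."""
    if math.factorial(len(evidence)) <= max_orders:
        return [list(p) for p in itertools.permutations(evidence)]
    orders = []
    for _ in range(max_orders):
        order = list(evidence)
        rng.shuffle(order)
        orders.append(order)
    return orders


def order_marginal_gate(
    evidence: Sequence[str],
    score: Callable[[list[str]], float],
    threshold: float = 0.8,   # hypothetical cutoff
    max_orders: int = 24,     # hypothetical sampling budget
    seed: int = 0,
) -> tuple[str, float, float]:
    """Return (decision, mean score, dispersion) across evidence orderings."""
    probs = [score(o) for o in sample_orders(evidence, max_orders, random.Random(seed))]
    mean = statistics.fmean(probs)
    dispersion = max(probs) - min(probs)
    return ("answer" if mean >= threshold else "abstain"), mean, dispersion


def toy_score(order: list[str]) -> float:
    """Stand-in for a model: confident only when the key fact comes first."""
    return 0.95 if order[0] == "key fact" else 0.60


decision, mean, dispersion = order_marginal_gate(
    ["key fact", "distractor A", "distractor B"], toy_score
)
print(decision, round(mean, 3), round(dispersion, 2))  # prints abstain 0.717 0.35
```

In the toy run a single favorable ordering would have passed the threshold at 0.95, but averaging over all six orderings drops the score to about 0.717, so the gate abstains. In a real setup `score` would be a forward pass of the local model, which is the step the fused kernel is reported to accelerate.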
