1. OpenAI Releases GPT-Realtime-2
OpenAI has introduced GPT-Realtime-2, a flagship native speech-to-speech model designed for high-performance conversational applications. The model features adjustable reasoning effort levels and an expanded 128K token context window, up from 32K. It supports text, audio, and image inputs, with significant improvements in latency, achieving a Time to First Audio of 1.12 seconds in minimal reasoning mode.
- 128K context window (up from 32K)
- Adjustable reasoning effort levels
- 1.12 s Time to First Audio in minimal reasoning mode
- Supports text, audio, and image inputs
Developers building voice-first applications get lower latency and stronger reasoning with no price increase.
2. Mozilla Uses Anthropic’s Mythos to Patch Firefox
Mozilla engineers have integrated Anthropic’s Mythos AI model into a custom agent harness to automate vulnerability detection in Firefox. By providing the model with direct access to build pipelines and testing environments, the team identified 271 security flaws over two months with minimal false positives. This approach allows the AI to iteratively read files, write code, and evaluate test cases until a security goal is met.
- 271 vulnerabilities identified
- Custom agent harness used
- Direct access to build pipelines
- Reduced hallucinations compared to previous attempts
This demonstrates a practical, high-reliability pattern for using LLMs in complex software security workflows.
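The iterate-until-the-goal-is-met pattern described above can be sketched roughly as follows. This is only an illustration of the general loop, not Mozilla's harness: `run_harness`, `Step`, `HarnessResult`, and the scripted `propose` and `run_tests` stubs are all hypothetical names, and a real harness would call a model and a real build pipeline where the stubs sit.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    action: str  # "read" or "write" in this sketch
    detail: str  # e.g. the file the (hypothetical) agent touched


@dataclass
class HarnessResult:
    goal_met: bool
    steps: list[Step] = field(default_factory=list)


def run_harness(
    propose: Callable[[list[Step]], Step],
    run_tests: Callable[[], bool],
    max_iterations: int = 10,
) -> HarnessResult:
    """Ask for the next action, record it, and re-run the tests after
    every write until they pass or the iteration budget runs out."""
    history: list[Step] = []
    for _ in range(max_iterations):
        step = propose(history)
        history.append(step)
        if step.action == "write" and run_tests():
            return HarnessResult(goal_met=True, steps=history)
    return HarnessResult(goal_met=False, steps=history)


def demo() -> HarnessResult:
    # Scripted stand-in for a model: read once, then write two attempts.
    script = [
        Step("read", "parser.c"),
        Step("write", "test_case_v1"),
        Step("write", "test_case_v2"),
    ]
    writes = {"count": 0}

    def propose(history: list[Step]) -> Step:
        return script[len(history) % len(script)]

    def run_tests() -> bool:
        # Stand-in for the build pipeline: the second write succeeds.
        writes["count"] += 1
        return writes["count"] >= 2

    return run_harness(propose, run_tests)


result = demo()
print(result.goal_met, len(result.steps))
```

The explicit iteration budget and the test run after every write are the two design choices worth noting: the loop ends on evidence from the test environment, not on the model's own claim of success.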
3. ProgramBench Benchmark Evaluates Agentic Software Engineering
ProgramBench challenges AI agents to recreate software executables without source code, relying solely on documentation and experimentation. The benchmark includes 200 tasks ranging from simple CLI tools to complex software like SQLite and the PHP interpreter. Evaluation of nine leading language models revealed that none could fully resolve the tasks, with the best-performing model passing 95% of tests on only 3% of tasks.
- 200 tasks ranging from CLI tools to compilers
- Agent-driven fuzzing used for evaluation
- No source code access allowed
- 9 models evaluated with no model fully resolving tasks
It provides a rigorous standard for measuring the true software engineering capabilities of autonomous agents.
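To make the headline metric concrete, the sketch below computes the share of tasks on which an agent passes at least a given fraction of tests. It assumes "passing 95% of tests" means a per-task pass rate of at least 0.95; the function name and the sample numbers are illustrative and are not taken from the benchmark. If the reported 3% is exact, it corresponds to 6 of the 200 tasks.

```python
def fraction_of_tasks_at_threshold(
    results: dict[str, tuple[int, int]],
    threshold: float = 0.95,
) -> float:
    """results maps task name -> (tests passed, tests total).
    Returns the share of tasks whose pass rate meets the threshold."""
    if not results:
        raise ValueError("results must contain at least one task")
    hits = 0
    for passed, total in results.values():
        if total <= 0:
            raise ValueError("each task needs at least one test")
        if passed / total >= threshold:
            hits += 1
    return hits / len(results)


# Made-up results for four hypothetical tasks.
example_results = {
    "cli_tool": (98, 100),
    "formatter": (96, 100),
    "database": (40, 100),
    "interpreter": (10, 100),
}

share = fraction_of_tasks_at_threshold(example_results)
print(share)
```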
4. Zyphra Releases ZAYA1-8B Reasoning Model
Zyphra’s new ZAYA1-8B model is a mixture-of-experts language model featuring 8.4 billion total parameters and 760 million active parameters. Trained on AMD Instinct MI300X GPUs, the model utilizes a novel 'Markovian RSA' inference method to process reasoning in chunks, keeping context windows bounded. It is available under an Apache 2.0 license and is specifically optimized for mathematical and coding tasks.
- 8.4B total parameters, 760M active parameters
- Apache 2.0 license
- Trained on AMD hardware
- Markovian RSA inference method
It offers a high-efficiency, open-source alternative for developers needing reasoning capabilities on consumer or specialized hardware.
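As a quick check on what the parameter counts imply, the snippet below computes the share of the model that is active per token, using only the two figures quoted above. The function name is illustrative.

```python
def active_fraction(total_params: float, active_params: float) -> float:
    """Share of a mixture-of-experts model's parameters used per token."""
    if total_params <= 0:
        raise ValueError("total_params must be positive")
    return active_params / total_params


frac = active_fraction(total_params=8.4e9, active_params=760e6)
print(f"{frac:.1%}")  # roughly 9% of parameters are active
```

Roughly one parameter in eleven takes part in each forward pass, which is where the efficiency claim comes from: per-token compute tracks the active count, while memory footprint still tracks the total.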
5. Sakana AI Launches RL Conductor for Multi-Agent Orchestration
Sakana AI has introduced the RL Conductor, a 7-billion-parameter model trained to orchestrate worker LLMs like GPT-5 and Claude Sonnet. By dynamically analyzing inputs and distributing tasks, the Conductor enables flexible, autonomous workflows. The system is currently available in beta via the Fugu API, offering variants for low-latency and high-performance needs, and has outperformed existing multi-agent frameworks on coding and reasoning benchmarks.
- RL-trained 7B model
- Orchestrates multiple frontier models
- OpenAI-compatible API
- Outperforms baseline multi-agent frameworks
It provides a scalable way to manage complex agentic pipelines without relying on rigid, hard-coded logic.
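The orchestration shape described above, where a conductor splits work and hands each piece to a worker, can be sketched as below. Everything here is hypothetical: `conduct`, the worker names, and the splitting rule are invented for illustration, and the keyword-based `choose_worker` is only a placeholder for what the source describes as a learned, RL-trained policy. No real Fugu API calls are shown.

```python
from typing import Callable

Worker = Callable[[str], str]


def conduct(
    task: str,
    split: Callable[[str], list[str]],
    choose_worker: Callable[[str], str],
    workers: dict[str, Worker],
) -> list[str]:
    """Split a task into subtasks, route each one to a named worker,
    and collect the outputs in order."""
    outputs = []
    for subtask in split(task):
        name = choose_worker(subtask)
        outputs.append(workers[name](subtask))
    return outputs


# Stub workers standing in for calls to different LLMs.
workers: dict[str, Worker] = {
    "coder": lambda s: f"[coder] {s}",
    "reasoner": lambda s: f"[reasoner] {s}",
}


def split_on_semicolons(task: str) -> list[str]:
    return [part.strip() for part in task.split(";") if part.strip()]


def choose_worker(subtask: str) -> str:
    # Placeholder rule; a trained conductor would make this decision.
    return "coder" if "code" in subtask.lower() else "reasoner"


outputs = conduct(
    "Write code for a parser; explain the tradeoffs",
    split_on_semicolons,
    choose_worker,
    workers,
)
print(outputs)
```

Keeping the routing decision behind a single function makes the contrast with hard-coded pipelines visible: swapping the placeholder rule for a model changes the behavior without touching the dispatch loop.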
6. Instructure Canvas LMS Suffers Major Data Breach
Instructure, the company behind the Canvas learning management system, is investigating a significant data breach involving the theft of user names, email addresses, and private messages. The ShinyHunters extortion group claims to have harvested 280 million records across 8,800 educational institutions. The company has placed several Canvas portals into maintenance mode while addressing the incident.
- 280 million records stolen
- 8,800 institutions impacted
- ShinyHunters group claimed responsibility
- Maintenance mode initiated
This highlights the critical security risks associated with API-based data access and the importance of securing educational infrastructure.
7. Unsloth and NVIDIA Optimize LLM Training
Unsloth has collaborated with NVIDIA to implement performance optimizations that accelerate LLM training by approximately 25%. Key updates include packed-sequence caching, which reduces synchronization overhead, and double buffering for activation checkpointing, which hides copy latency. These optimizations are now available for RTX laptops, data center GPUs, and DGX Spark machines, providing significant speedups for models like Qwen3-14B.
- 25% faster training
- Packed-sequence caching
- Double buffering
- Compatible with RTX and data center GPUs
These optimizations lower the barrier to entry for fine-tuning large models on standard hardware.
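Double buffering in general means starting the copy of the next buffer while the current one is still being processed, so the copy time is hidden behind compute. The sketch below shows that idea in plain Python with a background thread. It is a generic illustration, not Unsloth's or NVIDIA's implementation: `process_double_buffered`, `load`, and `compute` are hypothetical names, and the thread stands in for the asynchronous device copies a GPU implementation would use.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence, TypeVar

Raw = TypeVar("Raw")
Loaded = TypeVar("Loaded")
Out = TypeVar("Out")


def process_double_buffered(
    chunks: Sequence[Raw],
    load: Callable[[Raw], Loaded],
    compute: Callable[[Loaded], Out],
) -> list[Out]:
    """Process chunks in order while prefetching the next chunk
    in a background thread."""
    results: list[Out] = []
    if not chunks:
        return results
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(load, chunks[0])
        for i in range(len(chunks)):
            current = pending.result()  # wait for buffer i to be ready
            if i + 1 < len(chunks):
                # Kick off the copy of buffer i + 1 before computing on i.
                pending = pool.submit(load, chunks[i + 1])
            results.append(compute(current))  # overlaps with the copy
    return results


chunks = [[1, 2], [3, 4], [5]]
totals = process_double_buffered(chunks, load=list, compute=sum)
print(totals)
```

Only two buffers are ever in flight, the one being computed on and the one being loaded, so the extra memory cost stays at one additional buffer.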
8. Anthropic Adds Self-Improving Features to Claude Managed Agents
Anthropic has expanded its Claude Managed Agents platform with three new capabilities: dreaming, which analyzes past sessions to identify patterns; outcomes, which enables self-correction based on success criteria; and multi-agent orchestration, which allows agents to delegate tasks to specialized subagents. These features are designed to improve agent reliability and efficiency in complex enterprise environments.
- Dreaming for pattern analysis
- Outcomes for self-correction
- Multi-agent orchestration
- Enterprise-focused
These features provide a structured path for developers to build more autonomous and self-correcting agentic workflows.