Researchers Introduce Latent Context Language Models for 16x Input Compression

1. Researchers Introduce Latent Context Language Models for 16x Input Compression

Researchers from NYU, Columbia, Princeton, and other institutions have introduced Latent Context Language Models (LCLMs), an open-source family of encoder-decoder models designed to reduce the computational cost of long-context processing. By pairing a 0.6B encoder with a 4B decoder, LCLMs compress input token sequences before they reach the decoder. At 16x compression, this approach yields up to an 8.8x speedup over standard KV cache baselines, with higher accuracy than the KV cache compression methods tested at the same ratio.

• LCLMs are a family of open-source encoder-decoder models that compress input token sequences before they reach the decoder.
• On the RULER benchmark, LCLMs at 16x compression produced output 8.8 times faster than KV cache baselines.
• At 4x compression, LCLMs achieved 91.76% accuracy on RULER, compared to 94.41% without compression.
• At 16x compression, LCLMs achieved 75.06% accuracy, outperforming all tested KV cache methods at the same ratio.
• The architecture pairs a 0.6B encoder with a 4B decoder and was trained on over 350 billion tokens.
• The models are open-sourced on HuggingFace and the code is available on GitHub.

This open-source architecture lets developers process long context windows up to 8.8 times faster, though accuracy declines as the compression ratio rises.
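To make the compression arithmetic concrete, here is a minimal sketch. It is not the LCLM code. It assumes each block of `ratio` input tokens becomes one latent vector, uses mean pooling as a hypothetical stand-in for the learned 0.6B encoder, and plugs in made-up decoder dimensions to show that the decoder's KV cache shrinks by the same factor as the sequence.

```python
import math


def compressed_length(num_tokens: int, ratio: int) -> int:
    """Latent positions the decoder attends over, assuming each block of
    `ratio` input tokens becomes one latent vector (ceiling division)."""
    return math.ceil(num_tokens / ratio)


def mean_pool_encoder(embeddings: list[list[float]], ratio: int) -> list[list[float]]:
    """Toy stand-in for a learned encoder: average each block of `ratio`
    token embeddings into a single latent vector."""
    latents = []
    for start in range(0, len(embeddings), ratio):
        block = embeddings[start:start + ratio]
        dim = len(block[0])
        latents.append([sum(vec[d] for vec in block) / len(block) for d in range(dim)])
    return latents


def kv_cache_bytes(seq_len: int, layers: int, kv_heads: int, head_dim: int,
                   bytes_per_value: int = 2) -> int:
    """Standard KV cache size: one key and one value tensor per layer."""
    return 2 * layers * seq_len * kv_heads * head_dim * bytes_per_value


if __name__ == "__main__":
    n = 128_000
    m = compressed_length(n, 16)
    # Hypothetical decoder shape, not the published LCLM configuration.
    full = kv_cache_bytes(n, layers=36, kv_heads=8, head_dim=128)
    compressed = kv_cache_bytes(m, layers=36, kv_heads=8, head_dim=128)
    print(m, full / compressed)
```

The reported end-to-end speedup (8.8x at 16x compression) is smaller than the compression ratio. One plausible reason is that the encoder pass and other fixed costs do not shrink, but the figures above do not break this down.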

SOURCES

[1]

2. Anthropic Reverses Policy on Silent Claude Fable 5 Guardrails

Anthropic has apologized and reversed a controversial policy that silently degraded responses from its new Claude Fable 5 model. The company had implemented invisible guardrails to prevent competitors and researchers from using Fable 5 outputs for model distillation, which is prohibited by its terms of service. Following widespread backlash over silent performance degradation and overly broad safety blocks, Anthropic will now explicitly notify users when a query triggers a safety fallback and route those requests to its previous flagship model, Claude Opus 4.8.

• Anthropic apologized for stealthily throttling Claude Fable 5 using hidden guardrails designed to prevent model distillation.
• The company previously altered and degraded responses to suspected distillation queries without notifying users.
• Anthropic is changing its approach, routing suspected distillation queries to Claude Opus 4.8 and explicitly notifying users of the fallback.
• Fable is the first widely available model in Anthropic’s Mythos class of AI systems.
• Anthropic acknowledged that safeguards in areas like biology were calibrated so broadly that Fable was sometimes unusable for basic queries.
• The policy reversal follows significant backlash from the AI research community regarding silent limitations.

Developers using Claude Fable 5 will no longer experience silent performance degradation and will receive explicit notifications if their queries are rerouted to Claude Opus 4.8.

SOURCES

[1] [2] [3]

3. xAI Launches Grok Build Plugin Marketplace

xAI has launched the Grok Build Plugin Marketplace, establishing a built-in catalog for its terminal-native coding agent. The marketplace allows developers to install packages that bundle skills, slash commands, agents, hooks, MCP servers, and Language Server Protocol (LSP) servers. To harden the supply chain, the platform enforces 40-character commit SHA pinning and re-verifies the hashes after cloning. The marketplace launched with six partner plugins, including integrations for Vercel, MongoDB, and Cloudflare.

• The Grok Build Plugin Marketplace is a built-in catalog for xAI's terminal coding agent, Grok Build.
• Plugins bundle skills, slash commands, agents, hooks, MCP servers, and LSP servers into a single package.
• Launch partners include MongoDB, Vercel, Sentry, Chrome DevTools, Cloudflare, and Superpowers.
• Every remote plugin uses 40-character commit SHA pinning, which Grok Build re-verifies after cloning for supply-chain security.
• The catalog is open for community contributions via GitHub pull requests.
• Access requires a paid SuperGrok or X Premium Plus subscription.

Developers using Grok Build can now easily extend their terminal agent with pre-packaged skills, MCP servers, and tools from providers like Vercel and MongoDB.
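Full-SHA pinning matters because a branch or tag can be moved to point at new code, while a 40-character commit SHA names one exact commit. Below is a minimal sketch of the pin-then-reverify check. It is not Grok Build's implementation, and the assumption that the pin comes from a plugin manifest is ours. `git rev-parse HEAD` is a standard Git command.

```python
import re
import subprocess

# A full SHA-1 commit ID: exactly 40 lowercase hex characters.
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def is_full_commit_sha(ref: str) -> bool:
    """True only for a full 40-character lowercase hex commit ID."""
    return FULL_SHA.fullmatch(ref) is not None


def verify_pin(pinned_sha: str, checked_out_sha: str) -> None:
    """Reject short SHAs or branch names, and any checkout that drifted from the pin."""
    if not is_full_commit_sha(pinned_sha):
        raise ValueError(f"pin must be a full 40-char commit SHA, got {pinned_sha!r}")
    if checked_out_sha.strip().lower() != pinned_sha:
        raise ValueError("checked-out commit does not match pinned SHA")


def head_sha(repo_dir: str) -> str:
    """Resolve the commit actually checked out after cloning."""
    return subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
```

A caller would run `verify_pin(pinned, head_sha(clone_dir))` after cloning and refuse to load the plugin if it raises.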

SOURCES

[1]

4. Perplexity Integrates Deep Research into Multi-Model Orchestrator

Perplexity has integrated its Deep Research capabilities into 'Computer', a multi-model orchestration system that coordinates up to 20 frontier AI models, with Opus 4.6 as its core reasoning engine. Operating on a 'Search as Code' paradigm, the system writes and executes code to run thousands of parallel retrieval steps, cross-referencing live web data with uploaded PDFs and spreadsheets. While the feature is built into Perplexity Max, developers can access the underlying agentic search stack via a pay-as-you-go Agent API.

• Perplexity integrated Deep Research into 'Computer', an orchestration system coordinating up to 20 frontier models.
• The system uses a 'Search as Code' approach, writing and executing code to perform thousands of parallel retrieval steps.
• Developers can access this agentic search stack via a pay-as-you-go Agent API.
• The system can process internal files like PDFs and spreadsheets alongside live web data.
• Perplexity reported benchmark improvements, with BrowseComp accuracy rising from 40.7% to 83.8%.

Developers can now access Perplexity's advanced agentic search and multi-model orchestration stack via a pay-as-you-go Agent API.
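'Search as Code' implies that the model emits a program that fans retrieval out in parallel, rather than issuing searches one at a time. Here is a minimal sketch of that fan-out pattern with `asyncio`. It assumes a hypothetical async `fetch` function in place of Perplexity's actual retrieval tools, which are not described here.

```python
import asyncio
from typing import Awaitable, Callable


async def fan_out(queries: list[str],
                  fetch: Callable[[str], Awaitable[str]],
                  max_concurrency: int = 64) -> dict[str, str]:
    """Run many retrieval calls concurrently, capped by a semaphore."""
    sem = asyncio.Semaphore(max_concurrency)

    async def one(query: str) -> tuple[str, str]:
        async with sem:  # at most `max_concurrency` fetches in flight
            return query, await fetch(query)

    results = await asyncio.gather(*(one(q) for q in queries))
    return dict(results)


async def fake_fetch(query: str) -> str:
    # Hypothetical stand-in for a web or document search call.
    await asyncio.sleep(0.01)
    return f"results for {query}"


if __name__ == "__main__":
    subqueries = [f"subquery {i}" for i in range(200)]
    out = asyncio.run(fan_out(subqueries, fake_fetch, max_concurrency=50))
    print(len(out))
```

The semaphore keeps thousands of queued subqueries from all hitting upstream services at once. The default limit of 64 is an arbitrary placeholder.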

SOURCES

[1]

5. Microsoft Releases SkillOpt to Automatically Optimize Agent Skills

Microsoft has open-sourced SkillOpt, an MIT-licensed framework designed to systematically optimize AI agent skills. Rather than modifying underlying model weights, SkillOpt treats text-based markdown skill documents as trainable objects, applying deep-learning concepts like learning rates, validation gates, and momentum to refine instructions. The framework runs an iterative propose-and-test loop that separates the task-executing model from the optimizer model, producing compact, portable skill artifacts intended to avoid common failure modes such as skill drift.

• SkillOpt is an open-source, MIT-licensed framework that optimizes AI agent skills by treating markdown skill documents as trainable objects.
• The framework uses deep-learning-style optimization techniques, including learning rates, validation gates, and momentum.
• It operates through an iterative propose-and-test loop that separates the task-executing model from the optimizer model.
• Optimized skill artifacts are compact (median length of ~920 tokens) and portable across different execution harnesses and model scales.
• SkillOpt outperformed existing methods like TextGrad, GEPA, and EvoSkill across 52 combinations of models and benchmarks.
• Training a skill for a single task typically costs between $1 and $5 in API fees.

Developers can systematically improve agent performance and prevent skill drift by treating markdown prompt instructions as trainable, portable assets.
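The propose-and-test loop with a validation gate can be outlined as follows. This is a hypothetical sketch, not SkillOpt's API. `propose` stands in for the optimizer model and `run_task` for the separate task-executing model. The learning-rate and momentum analogues SkillOpt describes are not modeled.

```python
from typing import Callable

Skill = str  # a markdown skill document


def optimize_skill(skill: Skill,
                   propose: Callable[[Skill, list[str]], Skill],
                   run_task: Callable[[Skill, str], bool],
                   train_tasks: list[str],
                   val_tasks: list[str],
                   steps: int = 10) -> Skill:
    """Propose-and-test loop: the optimizer rewrites the skill from training
    failures, and a held-out validation gate decides whether to keep it."""
    def val_score(s: Skill) -> float:
        return sum(run_task(s, t) for t in val_tasks) / len(val_tasks)

    best, best_score = skill, val_score(skill)
    for _ in range(steps):
        failures = [t for t in train_tasks if not run_task(best, t)]
        if not failures:
            break
        candidate = propose(best, failures)
        score = val_score(candidate)
        if score > best_score:  # validation gate: accept strict improvements only
            best, best_score = candidate, score
    return best
```

The gate prevents a rewrite that fixes training failures but hurts held-out tasks from replacing the current skill.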

SOURCES

[1]

6. Xiaomi Open-Sources MiMo Code Terminal Coding Assistant

Xiaomi has announced MiMo Code V0.1.0, an open-source, terminal-native AI coding assistant released under the MIT license. Forked from the OpenCode agent, MiMo Code is designed to handle complex, ultra-long software engineering tasks exceeding 200 steps. It uses an SQLite FTS5 cross-session memory system and a checkpoint-writer subagent to manage context. Xiaomi claims the tool outperforms Claude Code on SWE-bench benchmarks when paired with its MiMo-V2.5-Pro model, and it supports standard OpenAI-compatible backends.

• Xiaomi released MiMo Code V0.1.0 on GitHub under an MIT license as a fork of the OpenCode agent.
• The tool features a cross-session memory system using SQLite FTS5 and a dedicated checkpoint-writer subagent.
• Xiaomi claims MiMo Code paired with MiMo-V2.5-Pro outperforms Claude Code on SWE-bench Verified and SWE-bench Pro.
• The assistant includes self-improvement mechanisms, a Compose mode for autonomous development, and voice control.
• It provides limited-time free access to the MiMo-V2.5 model, which features a 1-million-token context window.
• The tool supports third-party backends, including OpenAI-compatible APIs and DeepSeek.

Developers can adopt a free, open-source alternative to Claude Code that is optimized for long-context, multi-step software engineering tasks.
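SQLite's FTS5 extension provides a local, dependency-free full-text index, which is one reason it suits cross-session memory. Here is a minimal sketch using Python's built-in `sqlite3` module. It assumes an SQLite build with FTS5 enabled, which is common in standard Python builds but worth checking. The single `notes` table is a hypothetical layout, not MiMo Code's schema, and the checkpoint-writer subagent is not modeled.

```python
import sqlite3


def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Open a memory store backed by an FTS5 full-text index."""
    conn = sqlite3.connect(path)
    # session_id is stored but not tokenized; only content is searchable.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS notes "
        "USING fts5(session_id UNINDEXED, content)"
    )
    return conn


def remember(conn: sqlite3.Connection, session_id: str, content: str) -> None:
    """Persist a note so that later sessions can find it."""
    conn.execute("INSERT INTO notes (session_id, content) VALUES (?, ?)",
                 (session_id, content))
    conn.commit()


def recall(conn: sqlite3.Connection, query: str, limit: int = 5) -> list[tuple[str, str]]:
    """Full-text search across all sessions, best BM25 matches first."""
    return conn.execute(
        "SELECT session_id, content FROM notes WHERE notes MATCH ? "
        "ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
```

FTS5's hidden `rank` column orders matches by BM25 relevance, so a new session can retrieve the most relevant notes from earlier sessions by keyword.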

SOURCES

[1]

7. Nous Research Launches Hermes Agent Profile Builder

Nous Research has released the Profile Builder for its open-source Hermes Agent, integrated directly into the project's local web dashboard. The tool provides a guided flow for configuring agent settings, allowing developers to manage isolated agent profiles that maintain separate memory, sessions, skills, cron jobs, and state databases. The builder writes configurations directly to the agent's native YAML and environment files, supporting major model providers and custom OpenAI-compatible endpoints.

• The Profile Builder is integrated into the Hermes Agent local web dashboard, running on localhost by default.
• Hermes Agent profiles function as isolated home directories with separate memory, sessions, skills, and state databases.
• The builder allows users to configure agent identity, select model providers, manage skills, and attach MCP servers.
• Supported providers include Nous Portal, OpenRouter, NVIDIA, OpenAI, and custom OpenAI-compatible endpoints.
• The tool writes configurations directly to the config.yaml and .env files used by the Hermes Agent CLI.
• Current limitations include a lack of local filesystem sandboxing and the requirement to restart sessions for changes to take effect.

Developers can now visually configure agent identities, skills, and MCP servers in isolated environments without manually editing YAML files.
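Isolated profiles amount to one home directory per agent, so that state and secrets never overlap. Here is a minimal sketch of that layout. The subdirectory names and `config.yaml` keys are hypothetical and not Hermes Agent's actual schema. Only the use of separate `config.yaml` and `.env` files comes from the description above.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical naming rule; also blocks "../" names that could escape the base dir.
PROFILE_NAME = re.compile(r"[a-z0-9][a-z0-9_-]{0,63}")
# Hypothetical per-profile layout mirroring the isolated state described above.
SUBDIRS = ("memory", "sessions", "skills", "cron")


def create_profile(base: Path, name: str, provider: str, model: str,
                   key_var: str, key_value: str) -> Path:
    """Create an isolated profile home with its own state dirs, config, and secrets."""
    if not PROFILE_NAME.fullmatch(name):
        raise ValueError(f"invalid profile name: {name!r}")
    home = base / name
    home.mkdir(parents=True, exist_ok=False)  # refuse to reuse an existing profile
    for sub in SUBDIRS:
        (home / sub).mkdir()
    # Hypothetical keys; the real Hermes Agent config.yaml schema may differ.
    (home / "config.yaml").write_text(
        f"model:\n  provider: {provider}\n  name: {model}\n"
    )
    # Keep credentials out of config.yaml; owner-only permissions on POSIX.
    env_file = home / ".env"
    env_file.write_text(f"{key_var}={key_value}\n")
    env_file.chmod(0o600)
    return home


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        profile = create_profile(Path(tmp), "research", "openrouter",
                                 "example-model", "OPENROUTER_API_KEY", "placeholder")
        print(sorted(p.name for p in profile.iterdir()))
```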

SOURCES

[1]

8. Open R1 Project Releases Datasets and Recipes for DeepSeek-R1 Replication

The Open R1 project has made significant progress toward a fully open reproduction of the DeepSeek-R1 pipeline by releasing several high-quality datasets and training recipes. These include the Mixture-of-Thoughts dataset with 350k verified reasoning traces, the CodeForces-CoTs dataset for competitive programming, and the OpenR1-Math-220k dataset. Developers can leverage these resources alongside frameworks like DeepSpeed and vLLM to train and distill reasoning capabilities into smaller base models.

• The Open R1 project aims to provide a fully open reproduction of the DeepSeek-R1 pipeline, including synthetic data and training.
• The project released the Mixture-of-Thoughts dataset containing 350k verified reasoning traces and a recipe for the OpenR1-Distill-7B model.
• It also released the CodeForces-CoTs dataset of 10k competitive programming problems and the OpenR1-Math-220k dataset.
• The pipeline supports Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO) using DDP, DeepSpeed, and vLLM.
• The project requires specific software versions, including CUDA 12.4, Python 3.11, and PyTorch v2.6.0.

Developers can use these open datasets and recipes to fine-tune their own local models with advanced reasoning capabilities.
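The core idea of GRPO is that it needs no learned value model. Several completions are sampled per prompt, and each completion's advantage is its reward normalized against its own group. Here is a minimal sketch of that advantage computation. The function name is ours, and this is not Open R1's training code. It uses the population standard deviation, a detail on which implementations differ.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: normalize each completion's reward by the mean
    and standard deviation of its own sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every completion scores the same.
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Verifiable rewards for four sampled answers: 1 = correct, 0 = incorrect.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

With verifiable rewards such as 1 for a correct math answer and 0 otherwise, correct completions receive positive advantages and incorrect ones negative. A group in which every sample scores the same receives zero advantage.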

SOURCES

[1]

9. Coinbase Launches AI Trading Agents with x402 Payment Protocol

Coinbase has introduced new AI agents capable of executing crypto spot and derivatives trades, rebalancing portfolios, and purchasing premium research. Crucially, the agents use the new x402 payment protocol, developed in collaboration with AWS, Anthropic, Circle, and Near, to pay for research data and compute on a pay-as-you-go basis without subscriptions. Developers can integrate these capabilities directly into ChatGPT or Claude through an MCP server that Coinbase provides.

• Coinbase launched AI agents that can execute trades, rebalance portfolios, and pay for premium research.
• The agents leverage the new x402 payment protocol, developed with AWS, Anthropic, Circle, and Near, to pay for research and compute without subscriptions.
• The agents can be integrated into ChatGPT or Claude via an MCP server.
• Users can run an agent within their main account or inside a separate sandbox.
• Future updates will introduce custom limits for trade size, service interaction, and spending.

Developers can now build financial agents that autonomously pay for API services, compute, and research data without requiring traditional subscriptions.
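x402 takes its name from the HTTP 402 Payment Required status code. A server answers an unpaid request with 402 and its payment terms, and the client pays and retries. Here is a simplified, self-contained sketch of that client loop. It adds a spending cap in the spirit of the planned spending limits. `request`, `pay`, and the `price` field are hypothetical placeholders, not the x402 SDK or its message schema.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Response:
    status: int
    body: dict


def call_with_payment(request: Callable[[Optional[str]], Response],
                      pay: Callable[[dict], str],
                      max_price: float) -> Response:
    """Simplified pay-per-request flow built on HTTP 402 Payment Required.

    `request(proof)` performs the HTTP call (proof is None on the first try);
    `pay(requirements)` settles the payment and returns a proof for the retry.
    """
    first = request(None)
    if first.status != 402:
        return first  # free resource, or an error unrelated to payment
    requirements = first.body
    if requirements["price"] > max_price:  # client-side spending limit
        raise PermissionError(
            f"price {requirements['price']} exceeds limit {max_price}"
        )
    proof = pay(requirements)
    return request(proof)
```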

SOURCES

[1]

10. Cursor Updates Bugbot with 3x Speedup and Lower Costs

Cursor has rolled out a major update to its automated code review tool, Bugbot. The tool now runs more than three times faster, with most code reviews completing in under three minutes. The update also cuts execution costs by 22% and lets Bugbot find 10% more bugs per review.

• Cursor updated its Bugbot tool to run over 3x faster than previous versions.
• The update reduced the cost of running Bugbot by 22%.
• Bugbot now finds 10% more bugs per review following the update.
• Most Bugbot runs now complete in under three minutes.

Developers using Cursor can now run faster, cheaper, and more accurate automated code reviews directly within their workflow.

SOURCES

[1]

11. Show HN: Boo Terminal Multiplexer Built on libghostty

A new terminal multiplexer named boo has been released under the MIT license. Written in Zig and powered by the libghostty-vt terminal emulation core, boo functions as a GNU screen-style multiplexer with session persistence. Crucially for AI developers, it includes native automation primitives like 'send', 'peek', and 'wait', allowing scripts and autonomous AI agents to interact directly with terminal sessions without needing a standard TTY.

• boo is a GNU screen-style terminal multiplexer written in Zig and built on the libghostty-vt core.
• The tool maintains accurate screen state, including SGR styles, cursor position, and terminal modes.
• It provides automation primitives such as 'send', 'peek', and 'wait' for scripts and AI agents to interact with sessions without a TTY.
• The software includes a full-screen session manager accessible via the 'boo ui' command.
• It is released under the MIT license and requires Zig 0.15.2 to build from source.

Developers can use boo's built-in automation primitives to let AI agents interact with terminal sessions programmatically without requiring a TTY.
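A `wait`-style primitive addresses a common agent problem: knowing when a command's output is ready without attaching a TTY. The general pattern is to poll a screen snapshot until the expected text appears. Here is a sketch of that pattern. It assumes a hypothetical `peek` callable that returns the current screen text and does not reproduce boo's actual command syntax.

```python
import time
from typing import Callable


def wait_for(peek: Callable[[], str], pattern: str,
             timeout: float = 10.0, interval: float = 0.1) -> str:
    """Poll a screen snapshot until `pattern` appears or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        screen = peek()
        if pattern in screen:
            return screen
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{pattern!r} not seen within {timeout}s")
        time.sleep(interval)
```

An agent would send a command, then call `wait_for` on a shell prompt or completion message before reading the result.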

SOURCES

[1]


Inference Brew in your inbox