MiniMax M2.7 Open-Sourced: Self-Evolving Agent Model with 56.22% SWE-Pro Score

1. MiniMax M2.7 Open-Sourced: Self-Evolving Agent Model with 56.22% SWE-Pro Score

MiniMax has released the weights for M2.7, an agentic model that took part in its own development cycle through unsupervised self-evolution. The model scores 56.22% on the SWE-Pro benchmark, on par with GPT-5.3-Codex. The weights are available for download on Hugging Face, and NVIDIA currently provides free API access. Developers should note that the license places specific limits on commercial use.

2. Anthropic API Prompt Cache TTL Regression Inflates Claude Code Costs

Analysis of Claude Code session logs indicates that Anthropic silently reduced the default prompt cache time-to-live (TTL) from 1 hour to 5 minutes in early March 2026. This server-side change has increased cache creation costs for users by 20 to 32 percent. With the shorter TTL, any pause longer than 5 minutes during an extended development session causes a full cache miss, and subscription users exhaust their quota faster as a result. Developers relying on long-context sessions should monitor their API usage and adjust their workflow to limit these extra costs.
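
To make the effect of the shorter TTL concrete, here is a minimal sketch that counts how often a session would have to re-create its cache under each setting. It is a simplified model, not Anthropic's billing logic: it assumes the TTL is refreshed on every cache hit, so a miss occurs only when the gap since the previous request exceeds the TTL. The function name and the session timestamps are hypothetical.

```python
from typing import Iterable


def count_cache_writes(request_times_s: Iterable[float], ttl_s: float) -> int:
    """Count the requests that must re-create the prompt cache.

    Assumption: the cache entry's TTL is refreshed on every hit, so a miss
    happens only when the gap since the previous request exceeds ttl_s.
    """
    writes = 0
    last = None
    for t in request_times_s:
        if last is None or t - last > ttl_s:
            writes += 1  # first request, or the cache has expired
        last = t
    return writes


# Hypothetical session (seconds): short bursts of requests separated by
# a 10-minute pause and a 20-minute pause.
session = [0, 60, 120, 720, 780, 1980, 2040]

writes_5m = count_cache_writes(session, ttl_s=5 * 60)
writes_1h = count_cache_writes(session, ttl_s=60 * 60)
print(f"5-minute TTL: {writes_5m} cache writes")
print(f"1-hour TTL:   {writes_1h} cache writes")
```

In this toy session, every pause longer than 5 minutes forces a fresh cache write under the shorter TTL, while the 1-hour TTL needs only the initial one. Running the same count over real request timestamps gives a rough sense of how exposed a given workflow is.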

3. Liquid AI Releases LFM2.5-VL-450M Vision-Language Model for Edge Devices

Liquid AI has launched LFM2.5-VL-450M, an updated 450-million parameter vision-language model optimized for edge hardware. The new release adds support for bounding box prediction, function calling, and expanded multilingual understanding. It is designed to run directly on embedded AI modules like the NVIDIA Jetson Orin with sub-250ms inference times. The model gives developers a lightweight multimodal option for local deployments where latency and compute are constrained.

4. Small Open-Weights Models Replicate Anthropic Mythos Vulnerability Discoveries

Security researchers at AISLE demonstrated that small, open-weights models can reproduce the same zero-day vulnerability analysis as Anthropic's limited-access Mythos model. In testing, an open 3.6-billion parameter model detected the flagship FreeBSD exploit highlighted in the Mythos announcement. This indicates that AI cybersecurity capabilities do not scale strictly with model size. The findings suggest that the defensive advantage lies in the surrounding security system architecture rather than in large proprietary models alone.

5. BenchJack Preview: Automated Exploit Tool Invalidates Major AI Agent Benchmarks

Researchers have developed BenchJack, an automated scanning agent that exploits major AI agent benchmarks like SWE-bench and WebArena to achieve near-perfect scores without solving tasks. The tool demonstrates that current leaderboards can be gamed using simple exploits, such as reading gold answers directly from task configurations. The creators are preparing BenchJack for public release so that benchmark developers can use it for adversarial robustness testing. The results expose a significant flaw in how the industry currently evaluates models for agentic capabilities.
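
The "gold answers in task configurations" class of exploit can be checked for defensively. The sketch below is not BenchJack's method, which is unreleased. It is a minimal illustration of how a benchmark maintainer might scan an agent-visible task config for fields that look like reference answers. The key names in `SUSPECT_KEYS` and the sample task are hypothetical, and real benchmarks use their own schemas.

```python
import json

# Hypothetical key names that might hold a reference answer.
# Real benchmarks use their own schemas; adjust accordingly.
SUSPECT_KEYS = {"gold_answer", "expected_output", "solution", "reference_patch"}


def find_leaked_answer_keys(config, path=""):
    """Return the paths of suspect keys found anywhere in a nested config."""
    hits = []
    if isinstance(config, dict):
        for key, value in config.items():
            here = f"{path}.{key}" if path else key
            if key.lower() in SUSPECT_KEYS:
                hits.append(here)
            # Keep descending: answers can be nested below other fields.
            hits.extend(find_leaked_answer_keys(value, here))
    elif isinstance(config, list):
        for i, item in enumerate(config):
            hits.extend(find_leaked_answer_keys(item, f"{path}[{i}]"))
    return hits


# Hypothetical task config as an agent might see it inside its sandbox.
task = json.loads(
    '{"id": "task-1", "prompt": "Fix the bug",'
    ' "eval": {"gold_answer": "patch", "checks": [{"solution": "x"}]}}'
)

leaks = find_leaked_answer_keys(task)
print(leaks)
```

A key-name scan like this only catches the most obvious leaks. It says nothing about answers exposed through file paths, test fixtures, or evaluator side channels.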

6. Claudraband: Terminal Wrapper for Claude Code with Session Management

Claudraband is a new open-source tool that wraps the Claude Code TUI in a controlled terminal environment using tmux or xterm.js. It enables resumable, non-interactive workflows and allows developers to interrogate older sessions. The project includes an HTTP server for remote control and an ACP server for integration with alternative frontends like Zed. A TypeScript library is also provided for embedding these workflows into custom applications.