OpenAI Apps SDK: Third-Party Integrations in ChatGPT via MCP

1. OpenAI Apps SDK: Third-Party Integrations in ChatGPT via MCP

OpenAI has introduced an Apps SDK that lets developers build interactive applications directly inside the ChatGPT interface. Built on the Model Context Protocol (MCP), the SDK lets third-party services carry out tasks such as booking travel or managing music in response to natural language commands. Developers can expose their APIs to ChatGPT users, who never have to leave the chat thread.
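To make the MCP mechanics concrete, here is a minimal sketch of how a service might answer MCP's JSON-RPC 2.0 requests for listing and calling tools. The `tools/list` and `tools/call` method names follow the MCP specification; the `book_flight` tool, its schema, and the `handle_request` dispatcher are hypothetical, and the sketch uses neither the Apps SDK nor any official MCP library.

```python
import json


def book_flight(origin: str, destination: str) -> str:
    # Hypothetical stand-in for a call to the third-party service's real API.
    return f"Booked flight from {origin} to {destination}"


# Hypothetical tool registry: tool name -> metadata and handler.
TOOLS = {
    "book_flight": {
        "description": "Book a flight between two airports.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "origin": {"type": "string"},
                "destination": {"type": "string"},
            },
            "required": ["origin", "destination"],
        },
        "handler": book_flight,
    }
}


def _error(req_id, code: int, message: str) -> str:
    return json.dumps(
        {"jsonrpc": "2.0", "id": req_id, "error": {"code": code, "message": message}}
    )


def handle_request(raw: str) -> str:
    """Dispatch one JSON-RPC 2.0 request and return the JSON response."""
    req = json.loads(raw)
    req_id, method = req.get("id"), req.get("method")
    params = req.get("params", {})

    if method == "tools/list":
        # Advertise each tool so the client (the chat model) knows what it can call.
        result = {
            "tools": [
                {
                    "name": name,
                    "description": tool["description"],
                    "inputSchema": tool["inputSchema"],
                }
                for name, tool in TOOLS.items()
            ]
        }
    elif method == "tools/call":
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            return _error(req_id, -32602, "Unknown tool")
        text = tool["handler"](**params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}]}
    else:
        return _error(req_id, -32601, "Method not found")

    return json.dumps({"jsonrpc": "2.0", "id": req_id, "result": result})
```

A real server would also handle the MCP initialization handshake, transport, and argument validation, all of which are omitted here.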

2. VoxCPM 2: Open-Source Diffusion-Autoregressive TTS Model

The open-source community has released VoxCPM 2, a text-to-speech model supporting more than 30 languages. The model uses a diffusion-autoregressive architecture for voice cloning, designed to preserve acoustic and emotional detail better than standard token-based models. It outputs high-fidelity 48 kHz audio and supports open-ended voice design, with styles ranging from whispers to cinematic tones. The model is available on Hugging Face, ModelScope, and GitHub.

3. Claude Code: Unconfirmed Context Window and Thinking Degradation

Developers are reporting a noticeable quality regression in Claude Code on complex engineering tasks following updates in February. Log analysis suggests the degradation correlates with the rollout of thinking-content redaction and the expansion to a 1M-token context window. Users report that the model ignores instructions and struggles with multi-step research when its extended thinking tokens are restricted. Suggested workarounds include forcing a shorter context window or raising the maximum number of thinking tokens per problem.

4. gradio.Server: Custom Frontends for Gradio Backends

Gradio has released gradio.Server, which lets developers build custom frontends with frameworks like React or Svelte, or with plain HTML/JS. It extends FastAPI to support custom routes and middleware alongside Gradio's API engine. Developers can keep their own UI architecture while still using Gradio's backend infrastructure, including its queuing system, MCP support, and ZeroGPU hosting on Hugging Face Spaces.

5. Gradio 6.11.0: Threadpool File I/O Performance Update

Gradio 6.11.0 moves file processing to a separate threadpool. Previously, file I/O under high concurrency blocked the interpreter and slowed application response times. The update reduces client latency significantly: audio-to-audio and video-to-video processing run approximately three times faster with 100 concurrent users. Developers get the improvement by upgrading the Gradio package, with no changes to existing code.
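The general technique can be sketched with the standard library alone. This is not Gradio's implementation; it is a generic asyncio pattern in which blocking file writes run on a dedicated thread pool so the event loop stays free to serve other requests. The names `FILE_POOL`, `save_upload`, and `handle_upload` are illustrative assumptions.

```python
import asyncio
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Dedicated pool (assumed size) so file work does not compete with other tasks.
FILE_POOL = ThreadPoolExecutor(max_workers=8, thread_name_prefix="file-io")


def save_upload(path: Path, data: bytes) -> int:
    """Blocking write; meant to run in a worker thread."""
    path.write_bytes(data)
    return len(data)


async def handle_upload(path: Path, data: bytes) -> int:
    # Offload the blocking call so the event loop keeps serving other requests.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(FILE_POOL, save_upload, path, data)


async def main() -> list[int]:
    # Simulate 100 concurrent clients, each uploading a small file.
    with tempfile.TemporaryDirectory() as tmp:
        jobs = [
            handle_upload(Path(tmp) / f"clip_{i}.bin", b"x" * 1024)
            for i in range(100)
        ]
        return await asyncio.gather(*jobs)


if __name__ == "__main__":
    sizes = asyncio.run(main())
    print(len(sizes), "files written")
```

Calling `save_upload` directly inside the coroutine would stall every other request for the duration of each write; routing it through the pool is what keeps latency flat as concurrency grows.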

6. Hippo: Open-Source Memory Layer for CLI AI Agents

A new open-source tool called Hippo provides a shared memory layer for AI CLI agents like Claude Code, Cursor, and Codex. Built on a SQLite backbone with Markdown mirrors, Hippo uses decay, retrieval strengthening, and explicit working memory to manage context across sessions and tools. It requires Node.js 22.5+ and has zero runtime dependencies. Developers can use it to persist session summaries, track recurring errors, and prevent context loss when switching between AI coding assistants.
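Decay and retrieval strengthening can be illustrated with a small SQLite-backed store. Hippo itself runs on Node.js, and this Python sketch does not reproduce its schema or algorithms: the table layout, the one-week half-life, and the fixed `boost` are all assumptions chosen for illustration.

```python
import sqlite3

HALF_LIFE_S = 7 * 24 * 3600  # assumed: strength halves after a week without use


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS memories ("
        "id INTEGER PRIMARY KEY, text TEXT NOT NULL, "
        "strength REAL NOT NULL, last_used REAL NOT NULL)"
    )
    return db


def remember(db: sqlite3.Connection, text: str, now: float) -> int:
    cur = db.execute(
        "INSERT INTO memories (text, strength, last_used) VALUES (?, 1.0, ?)",
        (text, now),
    )
    db.commit()
    return cur.lastrowid


def current_strength(strength: float, last_used: float, now: float) -> float:
    """Exponential decay since the memory was last used."""
    return strength * 0.5 ** ((now - last_used) / HALF_LIFE_S)


def recall(db: sqlite3.Connection, query: str, now: float, boost: float = 1.0):
    """Return matching (text, strength) pairs, strongest first, and strengthen each hit."""
    rows = db.execute(
        "SELECT id, text, strength, last_used FROM memories WHERE text LIKE ?",
        (f"%{query}%",),
    ).fetchall()
    hits = []
    for mem_id, text, strength, last_used in rows:
        decayed = current_strength(strength, last_used, now)
        hits.append((text, decayed))
        # Retrieval strengthening: persist the decayed value plus a boost.
        db.execute(
            "UPDATE memories SET strength = ?, last_used = ? WHERE id = ?",
            (decayed + boost, now, mem_id),
        )
    db.commit()
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Passing `now` explicitly rather than reading the clock keeps the behavior deterministic: memories that are never recalled fade toward zero, while frequently recalled ones keep rising to the top of the results.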

7. Freestyle: Bare-Metal Sandboxes for AI Coding Agents

Freestyle has launched a cloud infrastructure service providing bare-metal sandboxes designed specifically for AI coding agents. The platform supports full Linux environments with hardware virtualization, eBPF, and FUSE, using a systemd init instead of runc. The sandboxes start in approximately 500ms and support horizontal memory forking, which lets agents duplicate exact system states, including running processes and browser animations, with minimal delay.

8. Reducto Deep Extract: Agent-in-the-Loop Structured Extraction

Reducto has launched Deep Extract, an updated endpoint configuration for structured document extraction. The system uses an autonomous agent-in-the-loop verification cycle to iteratively correct its own output on long, complex documents like invoices and financial statements. By setting a specific flag in the extract settings, developers can enable this multi-pass approach to prevent models from skipping entries or consolidating rows on repetitive tasks. The feature is available now via the Reducto API.
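The multi-pass idea can be sketched as a generic extract-verify-retry loop. Nothing here calls the Reducto API or reflects its actual flag or internals: `deep_extract`, `toy_extract`, and `toy_verify` are hypothetical names, and the toy extractor merely simulates a model that truncates a long table on its first attempt.

```python
from typing import Callable

Row = dict
Extractor = Callable[[str, list[str]], list[Row]]
Verifier = Callable[[str, list[Row]], list[str]]


def deep_extract(
    extract: Extractor, verify: Verifier, document: str, max_passes: int = 3
) -> list[Row]:
    """Re-run extraction with verifier feedback until it passes or passes run out."""
    feedback: list[str] = []
    rows: list[Row] = []
    for _ in range(max_passes):
        rows = extract(document, feedback)
        feedback = verify(document, rows)
        if not feedback:  # verifier found nothing to correct
            break
    return rows


def toy_extract(document: str, feedback: list[str]) -> list[Row]:
    lines = document.strip().splitlines()
    # Simulate a model that skips rows unless told it missed some.
    limit = len(lines) if feedback else 2
    rows = []
    for line in lines[:limit]:
        item, amount = line.split(",")
        rows.append({"item": item, "amount": float(amount)})
    return rows


def toy_verify(document: str, rows: list[Row]) -> list[str]:
    expected = len(document.strip().splitlines())
    if len(rows) < expected:
        return [f"expected {expected} rows, got {len(rows)}"]
    return []
```

Here the first pass returns only two rows, the verifier reports the shortfall, and the second pass recovers the full table. A production verifier would check more than row counts, for example totals or duplicated entries.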

9. AutoKernel: Open-Source LLM Agent for GPU Kernel Optimization

RightNow AI has released AutoKernel, an open-source framework that automates GPU kernel optimization for PyTorch models. The system uses an autonomous LLM agent loop to profile models, identify bottlenecks, and iteratively refine Triton or CUDA C++ kernels. It incorporates a five-stage correctness harness to verify numerical stability and performance gains before committing any code changes. Developers can use this tool to automate the highly specialized process of writing fast GPU code.
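The gate-before-commit idea can be shown in plain Python. This is a simplified two-check sketch (numerical agreement, then a speedup threshold), not AutoKernel's five-stage harness, and it uses ordinary functions in place of Triton or CUDA kernels. All function names and tolerances are assumptions.

```python
import math
import time
from typing import Callable, Sequence

Kernel = Callable[[Sequence[float]], list]


def reference_softmax(xs: Sequence[float]) -> list:
    m = max(xs)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def naive_softmax(xs: Sequence[float]) -> list:
    exps = [math.exp(x) for x in xs]  # overflows for large inputs
    total = sum(exps)
    return [e / total for e in exps]


def is_correct(candidate: Kernel, reference: Kernel, cases,
               rel_tol: float = 1e-6, abs_tol: float = 1e-9) -> bool:
    """Check that the candidate matches the reference on every test case."""
    for xs in cases:
        try:
            got = candidate(xs)
        except (OverflowError, ValueError, ZeroDivisionError):
            return False  # numerically unstable on this input
        want = reference(xs)
        if len(got) != len(want):
            return False
        for g, w in zip(got, want):
            if not (math.isfinite(g)
                    and math.isclose(g, w, rel_tol=rel_tol, abs_tol=abs_tol)):
                return False
    return True


def best_time(kernel: Kernel, xs: Sequence[float], repeats: int = 5) -> float:
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        kernel(xs)
        best = min(best, time.perf_counter() - start)
    return best


def should_commit(candidate: Kernel, reference: Kernel, cases,
                  min_speedup: float = 1.05) -> bool:
    """Accept a candidate only if it is correct and measurably faster."""
    if not is_correct(candidate, reference, cases):
        return False
    largest = max(cases, key=len)
    return best_time(reference, largest) >= min_speedup * best_time(candidate, largest)
```

Including a large-magnitude case such as `[1000.0, 1001.0]` is what catches the naive variant: it may be faster, but it fails the stability check and is rejected before any timing takes place.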