1. Anthropic Claude API adds Advisor tool for hybrid model routing
Anthropic has released an Advisor tool for the Claude Platform API that lets developers pair different models within a single workflow. A faster, cheaper executor model such as Sonnet or Haiku can consult a higher-capability advisor model such as Opus mid-task. The pairing is configured directly in the Messages API request. Because the larger model is invoked only for strategic guidance, developers can get near-Opus-level reasoning at lower operating cost.
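The executor/advisor split can be illustrated with a minimal sketch. This is not the Advisor tool's actual request format, which the summary above does not specify. The function names (run_with_advisor, fake_executor, fake_advisor) and the needs_advice rule are hypothetical, and the "models" are plain Python functions standing in for API calls.

```python
from typing import Callable, List


def run_with_advisor(
    steps: List[str],
    executor: Callable[[str, str], str],
    advisor: Callable[[str], str],
    needs_advice: Callable[[str], bool],
) -> List[str]:
    """Run every step on the cheap executor; consult the advisor only when flagged."""
    results = []
    for step in steps:
        # The expensive model is called only for steps that need strategic guidance.
        guidance = advisor(step) if needs_advice(step) else ""
        results.append(executor(step, guidance))
    return results


# Hypothetical stand-ins for the two models (no network calls).
advisor_calls: List[str] = []


def fake_advisor(step: str) -> str:
    advisor_calls.append(step)
    return f"plan for {step}"


def fake_executor(step: str, guidance: str) -> str:
    return f"{step} [{guidance}]" if guidance else step


steps = ["rename variable", "design migration strategy", "fix typo"]
results = run_with_advisor(
    steps, fake_executor, fake_advisor, needs_advice=lambda s: "design" in s
)
print(results)
print(f"advisor consulted {len(advisor_calls)} time(s) for {len(steps)} steps")
```

In this toy run the advisor is consulted for one of three steps, which is where the cost saving in such a scheme would come from.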
2. NVIDIA releases Nemotron 3 Super 120B model for agentic workflows
NVIDIA has open-sourced Nemotron 3 Super, a 120-billion-parameter hybrid Mixture-of-Experts model with 12 billion active parameters. The model uses a Mamba-Transformer architecture and has a 1-million-token context window designed for high-performance multi-agent applications. It is available with open weights, datasets, and training recipes. Developers can access the model immediately via platforms like LM Studio, Together AI, and OpenRouter.
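To make the "active parameters" figure concrete, the short calculation below uses only the two numbers quoted above. The helper name active_fraction is illustrative, and the result says nothing about actual latency or memory use, since all 120 billion weights still have to be stored.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of the model's parameters used per token in a Mixture-of-Experts model."""
    if total_params_b <= 0:
        raise ValueError("total_params_b must be positive")
    return active_params_b / total_params_b


# Figures from the summary: 120B total, 12B active.
fraction = active_fraction(total_params_b=120, active_params_b=12)
print(f"{fraction:.0%} of parameters are active per token")
```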
3. Sentence Transformers v5.4 introduces multimodal embedding and reranker models
The v5.4 update to the Sentence Transformers library adds support for multimodal embedding and reranker models. This allows developers to encode and compare text, images, audio, and video within a single shared embedding space. The update specifically enables cross-modal search and retrieval-augmented generation workflows. The inclusion of multimodal rerankers allows for high-quality scoring of mixed-modality pairs to improve retrieval accuracy.
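The idea behind cross-modal search in a shared embedding space can be sketched without the library itself. The vectors below are made-up three-dimensional stand-ins for real embeddings, and the file names are hypothetical. In practice the vectors would come from a multimodal model, whose exact API is not shown in this summary.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def rank(query: List[float], corpus: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Sort corpus items by similarity to the query, best match first."""
    scored = [(name, cosine(query, vec)) for name, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings: items of different modalities live in the same vector space.
corpus = {
    "image:cat.jpg": [0.9, 0.1, 0.0],
    "audio:bark.wav": [0.1, 0.9, 0.1],
    "video:traffic.mp4": [0.0, 0.2, 0.9],
}
text_query = [0.8, 0.2, 0.1]  # pretend embedding of a text query about a cat

ranking = rank(text_query, corpus)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

A reranker would then rescore the top few (query, item) pairs with a heavier model. That step is omitted here.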
4. NVIDIA open-sources Kimodo 3D motion diffusion model
NVIDIA Research has released Kimodo, an open-source kinematic motion diffusion model, on Hugging Face. The model generates high-quality 3D human and robot motions and was trained on 700 hours of optical motion capture data. Developers can control the generated outputs using text prompts and specific kinematic constraints. This provides a direct programmatic tool for integrating 3D motion generation into spatial or robotics applications.
5. Hugging Face Hub adds native support for optimized hardware Kernels
Hugging Face has introduced "Kernels" as a new repository type on its Hub. This feature allows developers to share and integrate collections of optimized binary operations tailored for specific hardware providers. The platform treats CUDA, ROCm, Apple Silicon, and Intel XPU support as first-class citizens. The initial rollout features the Flash Attention kernel from the SGLang project team.
6. Twill.ai launches cloud sandboxes for autonomous coding agents
Twill.ai has launched a platform that runs coding CLIs like Claude Code and Codex inside isolated cloud sandboxes. Developers can delegate tasks via Slack, GitHub, Linear, or a CLI, and the agents return pull requests or diagnostics. Because agents run unattended in the cloud and do not need full access to the local filesystem, the service addresses the parallelization and persistence limits of running them locally. A free tier provides 10 credits per month, and paid plans support Bring-Your-Own-Key (BYOK) configurations.
7. Community releases Gemopus-4 26B fine-tune for edge deployment
A new community fine-tune called Gemopus-4-26B-A4B-it is now available on Hugging Face. Based on the Gemma 4 26B Mixture-of-Experts architecture, the model uses 4 billion active parameters and has a 131k-token context window. It was trained using reasoning distillation techniques to mimic Claude Opus-style outputs. The model is optimized for local and edge deployment, requiring approximately 22.7 GB of VRAM at Q6_K quantization.
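A rough back-of-the-envelope check on the memory figure is parameters times bits per weight, divided by eight. The sketch below assumes exactly 26 billion parameters and about 6.56 bits per weight for Q6_K. Both numbers are assumptions rather than facts from the model card. It counts weights only, so KV cache and runtime overhead come on top.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), excluding KV cache and overhead."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Assumed inputs: 26B parameters, ~6.5625 bits per weight for Q6_K.
estimate = weight_memory_gb(params_billions=26, bits_per_weight=6.5625)
print(f"~{estimate:.1f} GB for weights alone")
```

Under these assumptions the estimate comes out slightly below the 22.7 GB quoted above. That is plausible if the quoted figure reflects a somewhat higher true parameter count or extra runtime memory.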
8. Alibaba previews HappyHorse-1.0 multimodal video generation model
Alibaba has revealed HappyHorse-1.0, a new video generation model that supports text-to-video and image-to-video generation, each with or without native audio. The model recently achieved top rankings on the Artificial Analysis Video Arena leaderboards. It is not yet released; Alibaba plans to launch public API access for developers on April 30.