OpenMythos Open-Weights Cybersecurity Model Released on Hugging Face

1. OpenMythos Open-Weights Cybersecurity Model Released on Hugging Face

Developed for the Build Small Hackathon, OpenMythos is a new open-weights LLM designed to address the tendency of general-purpose models to hallucinate CVE details and miss vulnerability patterns. The model was trained on a curated dataset of 1.84K high-quality records from ArXiv cs.CR papers and structured CVE data. Its training pipeline used supervised fine-tuning followed by a reinforcement learning with verifiable rewards (RLVR) stage, which validated code outputs against paired vulnerable and fixed GitHub branches. The model and datasets are now available on Hugging Face.

• OpenMythos is an open-weights LLM developed for the Build Small Hackathon, trained specifically for cybersecurity tasks.
• The training data includes 1.84K high-quality records filtered from 10K ArXiv cs.CR papers and a structured CVE dataset.
• The training pipeline used a supervised fine-tuning (SFT) stage followed by a reinforcement learning with verifiable rewards (RLVR) stage.
• The RLVR stage verified model outputs against ground truth using GitHub repositories with paired vulnerable and fixed branches.
• The model, demo, and datasets are available for download on Hugging Face.

Developers building security-focused AI features can self-host OpenMythos for CVE lookups and vulnerability analysis, targeting the hallucinated CVE details and missed vulnerability patterns seen with general-purpose LLMs.
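The RLVR stage's check against paired vulnerable and fixed branches can be pictured as a binary reward function. The sketch below is an assumption-heavy illustration, not the OpenMythos pipeline: `PairedCase`, `observe`, `reward`, and the toy path-joining functions are hypothetical names, and plain Python callables stand in for code checked out from the two branches.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Handler = Callable[[str], str]


@dataclass
class PairedCase:
    """Hypothetical stand-in for a repo with a vulnerable and a fixed branch."""
    vulnerable: Handler
    fixed: Handler
    probes: List[str]  # inputs that exercise the vulnerable code path


def observe(fn: Handler, probe: str) -> Tuple[str, str]:
    """Run fn on one probe and record whether it returned or raised."""
    try:
        return ("ok", fn(probe))
    except Exception:
        return ("error", "")


def is_discriminating(case: PairedCase) -> bool:
    """A useful case has at least one probe where the two branches differ."""
    return any(
        observe(case.vulnerable, p) != observe(case.fixed, p) for p in case.probes
    )


def reward(candidate: Handler, case: PairedCase) -> float:
    """Binary reward: 1.0 only if the candidate matches the fixed branch on every probe."""
    for probe in case.probes:
        if observe(candidate, probe) != observe(case.fixed, probe):
            return 0.0
    return 1.0


# Toy example (assumed): a path-traversal bug and its fix.
def vulnerable_join(name: str) -> str:
    return "/srv/files/" + name


def fixed_join(name: str) -> str:
    if ".." in name.split("/"):
        raise ValueError("path traversal rejected")
    return "/srv/files/" + name


case = PairedCase(vulnerable_join, fixed_join, ["report.txt", "../etc/passwd", "a/../../b"])
print(reward(fixed_join, case))       # 1.0
print(reward(vulnerable_join, case))  # 0.0
```

A real verifier would more likely run a test suite or proof-of-concept exploit against each branch; the point here is only that the ground truth comes from the repository pair rather than from a learned judge.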

SOURCES

[1]

2. OpenRouter Launches Fusion for Multi-Model Synthesis and Deliberation

OpenRouter has launched Fusion, a multi-model deliberation tool that synthesizes results from a panel of expert models into a single response. The system dispatches prompts to participant models in parallel, then uses a judge model to analyze consensus, contradictions, and unique insights. In evaluations on the DRACO deep research benchmark, a budget panel of Gemini 3 Flash, Kimi K2.6, and DeepSeek V4 Pro outperformed frontier models like GPT-5.5 and Opus 4.8 at half the cost of Fable 5. The tool is available via an OpenAI-compatible API slug.

• OpenRouter Fusion allows developers to configure a panel of participant models and a judge model to synthesize outputs.
• A budget panel of Gemini 3 Flash, Kimi K2.6, and DeepSeek V4 Pro outperformed GPT-5.5 and Opus 4.8 on the DRACO benchmark while costing 50% less than Fable 5.
• The tool operates by dispatching prompts in parallel, having a judge model analyze consensus and contradictions, and generating a final answer.
• Fusion is accessible via a chatroom, a specific model slug, a server tool, or a plugin, and is fully OpenAI-compatible.
• Fusion requests are typically 2-3 times slower than standard model calls, and pricing is the sum of all underlying model completions.

Developers can use Fusion to achieve higher accuracy on complex research or high-stakes tasks by combining the strengths of multiple models via a single, OpenAI-compatible API call.
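The dispatch-then-judge flow and the summed pricing described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not OpenRouter's implementation or API: `call_model` is a local stub standing in for a real network call, and the model names and per-call prices in `PRICE_CENTS` are made up.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical per-call prices in cents; real prices depend on tokens used.
PRICE_CENTS = {"model-a": 1, "model-b": 2, "model-c": 3, "judge-model": 5}


@dataclass
class Completion:
    model: str
    text: str
    cost_cents: int


async def call_model(model: str, prompt: str) -> Completion:
    """Stub for a real API call (assumption): returns a canned answer."""
    await asyncio.sleep(0)  # yield control, as a network call would
    text = f"{model} answered ({len(prompt)} prompt chars)"
    return Completion(model, text, PRICE_CENTS[model])


async def fuse(prompt: str, participants: List[str], judge: str) -> Tuple[str, int]:
    # 1. Dispatch the same prompt to every participant in parallel.
    answers = await asyncio.gather(*(call_model(m, prompt) for m in participants))
    # 2. Hand all answers to the judge, which looks for consensus,
    #    contradictions, and unique insights before writing the final answer.
    judge_prompt = prompt + "\n\nCandidate answers:\n" + "\n".join(
        f"- {a.model}: {a.text}" for a in answers
    )
    verdict = await call_model(judge, judge_prompt)
    # 3. Total cost is the sum of every underlying completion.
    total = sum(a.cost_cents for a in answers) + verdict.cost_cents
    return verdict.text, total


final_text, total_cents = asyncio.run(
    fuse("Summarize the trade-offs.", ["model-a", "model-b", "model-c"], "judge-model")
)
print(total_cents)  # 11
```

Because the judge can only start after the slowest participant finishes, latency is at least one participant call plus one judge call, which is consistent with the slowdown noted above.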

SOURCES

[1] [2]

3. Sakana AI Launches Marlin Research Agent and Open-Sources TreeQuest Algorithm

Tokyo-based Sakana AI has launched its first commercial product, Sakana Marlin, an autonomous research agent designed to run continuous reasoning loops for up to eight hours. Alongside the commercial launch, Sakana AI has open-sourced the core engine behind Marlin—Adaptive Branching Monte Carlo Tree Search (AB-MCTS)—as a library called TreeQuest under the Apache 2.0 license. TreeQuest allows developers to implement inference-time compute scaling in their own agents, enabling systems to dynamically choose between widening search paths or deepening existing hypotheses.

• Sakana AI launched Sakana Marlin, an autonomous B2B research agent designed for long-horizon reasoning tasks.
• The core algorithm powering Marlin, Adaptive Branching Monte Carlo Tree Search (AB-MCTS), has been open-sourced as TreeQuest under the Apache 2.0 license.
• AB-MCTS allows agents to scale inference-time compute by choosing whether to widen candidate answers or deepen existing ones.
• Marlin runs continuous reasoning loops for up to eight hours to generate comprehensive reports and slide decks.
• Marlin is available commercially with tiered pricing, including a pay-as-you-go option at ¥98 per credit (100 credits per run).

Developers can use the open-source TreeQuest library to implement advanced Monte Carlo Tree Search planning in their own autonomous agent architectures.
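The widen-or-deepen decision can be illustrated with a toy search. The sketch below does not use TreeQuest's API and is not Sakana AI's implementation; it is a simplified illustration that picks between generating a new child (widen) and descending into an existing one (deepen) by Thompson sampling from Beta posteriors. The scoring function, the `TARGET` value, and all function names are assumptions chosen to keep the example self-contained.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional

TARGET = 0.7  # hidden optimum of the toy task (hypothetical)


def score(x: float) -> float:
    """Toy quality score in [0, 1]; a real agent would use a verifier or grader."""
    return max(0.0, 1.0 - abs(x - TARGET))


def generate(parent: Optional[float], rng: random.Random) -> float:
    """Stand-in for an LLM call: fresh answer at the root, refinement elsewhere."""
    if parent is None:
        return rng.random()
    return min(1.0, max(0.0, parent + rng.gauss(0.0, 0.1)))


@dataclass
class Node:
    answer: Optional[float]
    score: float
    children: List["Node"] = field(default_factory=list)


def subtree_scores(node: Node) -> List[float]:
    out = [] if node.answer is None else [node.score]
    for child in node.children:
        out.extend(subtree_scores(child))
    return out


def draw(scores: List[float], rng: random.Random) -> float:
    """Thompson sample from a Beta posterior fitted to the observed scores."""
    alpha = 1.0 + sum(scores)
    beta = 1.0 + sum(1.0 - s for s in scores)
    return rng.betavariate(alpha, beta)


def step(node: Node, rng: random.Random) -> Node:
    # "Widen" arm: evidence is how good this node's direct children have been.
    best_child, best_draw = None, draw([c.score for c in node.children], rng)
    # "Deepen" arms: one per existing child, judged by its whole subtree.
    for child in node.children:
        d = draw(subtree_scores(child), rng)
        if d > best_draw:
            best_child, best_draw = child, d
    if best_child is None:  # widen: add a new candidate under this node
        x = generate(node.answer, rng)
        new = Node(x, score(x))
        node.children.append(new)
        return new
    return step(best_child, rng)  # deepen: recurse into the chosen child


def search(budget: int, seed: int = 0) -> Node:
    rng = random.Random(seed)
    root = Node(None, 0.0)
    created = [step(root, rng) for _ in range(budget)]
    return max(created, key=lambda n: n.score)


best = search(budget=50)
print(round(best.answer, 3), round(best.score, 3))
```

Each call to `step` spends exactly one generation, so `budget` maps directly onto inference-time compute; raising it lets the search both try more starting points and refine promising ones further.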

SOURCES

[1] [2]

4. Strands Agents Open-Sources Cloud-Agnostic Agent Framework

Strands Agents has open-sourced its cloud-agnostic agent framework, which has already amassed 6,500 stars on GitHub. The framework provides developers with essential infrastructure for running AI agents, including built-in context management, execution limits, and observability. It also features self-correcting guardrails that provide specific feedback to help agents correct their own performance, while allowing developers to swap LLM backends without modifying their application code.

• Strands Agents is a free, open-source framework that allows developers to run AI agents on any cloud provider.
• The framework has reached 6,500 stars on GitHub.
• It features built-in context management, execution limits, observability, and self-correcting guardrails.
• The framework is designed to prevent vendor lock-in, allowing developers to swap backends without changing application code.

Developers can build and deploy cloud-agnostic AI agents without vendor lock-in, utilizing built-in observability and self-correcting feedback loops.
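The combination of swappable backends, execution limits, and self-correcting guardrails can be sketched generically. The code below is not the Strands Agents API; every name in it (`Backend`, `run_agent`, `length_guardrail`, and the two stub backends) is hypothetical and exists only to show the shape of the pattern.

```python
from typing import Callable, Optional, Protocol


class Backend(Protocol):
    """Anything with a complete() method can serve as the LLM backend."""
    def complete(self, prompt: str) -> str: ...


class TerseBackend:
    def complete(self, prompt: str) -> str:
        return "42"


class VerboseBackend:
    def complete(self, prompt: str) -> str:
        # Pretends to heed guardrail feedback when it appears in the prompt.
        if "Feedback:" in prompt:
            return "42"
        return "The answer, after much deliberation, is 42."


# A guardrail returns None when the output passes, or specific feedback otherwise.
Guardrail = Callable[[str], Optional[str]]


def length_guardrail(output: str) -> Optional[str]:
    if len(output) > 20:
        return f"Answer was {len(output)} characters; keep it under 20."
    return None


def run_agent(backend: Backend, task: str, guardrail: Guardrail, max_steps: int = 3) -> str:
    prompt = task
    for _ in range(max_steps):  # execution limit
        output = backend.complete(prompt)
        feedback = guardrail(output)
        if feedback is None:
            return output
        # Self-correction: feed the specific complaint back to the model.
        prompt = f"{task}\nFeedback: {feedback}"
    raise RuntimeError(f"guardrail still failing after {max_steps} steps")


# The application code is identical for both backends.
print(run_agent(TerseBackend(), "What is 6 x 7?", length_guardrail))    # 42
print(run_agent(VerboseBackend(), "What is 6 x 7?", length_guardrail))  # 42
```

Because `run_agent` depends only on the `Backend` protocol, swapping providers means passing a different object rather than editing the loop.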

SOURCES

[1]

5. Orchestra-o1 Multi-Agent Framework Outperforms Open-Source Baselines

A new multi-agent orchestration framework called Orchestra-o1 has been introduced to handle complex omnimodal tasks. The framework operates by decomposing large tasks into parallel subtasks managed by specialized agents. In evaluations on the OmniGAIA benchmark, Orchestra-o1 achieved an accuracy of 72.8%, outperforming the next best open-source approach by more than 10 percentage points.

• Orchestra-o1 is a multi-agent orchestration framework designed to decompose complex omnimodal tasks into parallel subtasks.
• The framework achieved 72.8% accuracy on the OmniGAIA benchmark.
• Orchestra-o1 outperformed the next best open-source approach by more than 10 percentage points.

Developers building complex multimodal agent systems can adopt the Orchestra-o1 framework to coordinate parallel subtasks, with the reported OmniGAIA results suggesting a sizable accuracy gain over other open-source approaches.

SOURCES

[1]

6. Swift Package Integrates Claude into Apple's Foundation Models Framework

A new open-source Swift package, Claude for Foundation Models, brings Anthropic's models into Apple's native Foundation Models framework. Conforming to the LanguageModel protocol, the package allows developers to use Apple's LanguageModelSession API to interact with Claude. Prompts and responses are sent directly to the Claude API, bypassing Apple entirely, and usage is billed directly to the developer's Anthropic account. The beta package supports streaming, guided generation, tool calling, and server-side tools.

• The "Claude for Foundation Models" Swift package integrates Claude into Apple's Foundation Models framework.
• The package conforms to the LanguageModel protocol, enabling the use of the LanguageModelSession API introduced in OS 27 betas.
• Requests are sent directly to the Claude API, ensuring Apple does not process or see the prompts or responses.
• It supports streaming, guided generation, tool calling, and server-side tools like web search and code execution.
• The package is licensed under Apache 2.0 and is currently in beta, with usage billed directly to the user's Anthropic account.

Apple ecosystem developers can integrate Claude into their apps using native Swift APIs while keeping prompts private from Apple and billing directly to their Anthropic accounts.

SOURCES

[1]

7. React Native ExecuTorch Adds Offline Gemma 4 Support with GPU Acceleration

The react-native-executorch framework has added support for Google's Gemma 4, enabling developers to run the model fully offline within React Native applications. The integration features hardware acceleration, utilizing the Vulkan delegate on Android devices and the MLX delegate on Apple Silicon. A demo application is available in the project's GitHub repository to help developers quickly implement local, on-device inference.

• Gemma 4 has been integrated into the react-native-executorch framework for fully offline execution.
• GPU acceleration is supported via the Vulkan delegate on Android and the MLX delegate on Apple Silicon.
• A demo application showcasing the integration is available in the software-mansion/react-native-executorch GitHub repository.

Mobile developers can deploy Gemma 4 directly inside React Native apps for fully offline, hardware-accelerated local inference on Android and iOS.

SOURCES

[1]

8. Flash-KMeans Runs Over 200x Faster Than FAISS on GPUs

Researchers from UC Berkeley and UT Austin have released Flash-KMeans, an open-source library that accelerates standard Lloyd's k-means clustering by over 200x compared to FAISS on GPUs. Unlike approximate methods, Flash-KMeans is mathematically identical to standard k-means; it achieves its speedups by restructuring GPU dataflow using FlashAssign to fuse distance computations and a Sort-Inverse Update method to reduce atomic contention. The library is licensed under Apache 2.0 and features an API compatible with scikit-learn and FAISS, making it easy to integrate into vector search indexing and KV-cache compression pipelines.

• Flash-KMeans is an open-source, IO-aware library for standard Lloyd's k-means clustering, released under the Apache 2.0 license.
• The library is mathematically identical to standard k-means, achieving speedups by restructuring GPU dataflow instead of using approximations.
• It reports up to 17.9x end-to-end speedup over the best baseline, 33x over NVIDIA cuML, and over 200x over FAISS on an NVIDIA H200.
• Flash-KMeans supports out-of-core processing, enabling the clustering of up to one billion points.
• The library features an API compatible with scikit-learn and FAISS, making it a drop-in replacement.

Developers building vector search indexes, sparse attention routing, or KV-cache compression pipelines can drop in Flash-KMeans to accelerate clustering while getting the same result as standard Lloyd's k-means.
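To make the "same math, different dataflow" point concrete, here is a plain-Python sketch of Lloyd's iteration with two interchangeable centroid updates. It is a CPU illustration only and assumes nothing about Flash-KMeans's kernels or API: `update_scatter` mimics the accumulate-into-shared-rows pattern that needs atomic adds on a GPU, while `update_sorted` groups points by cluster first so each centroid is one contiguous reduction, which is one reading of the sort-based update idea. All function names are hypothetical.

```python
from itertools import groupby
from typing import Callable, List, Sequence

Vec = Sequence[float]


def sq_dist(a: Vec, b: Vec) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: List[Vec], centroids: List[Vec]) -> List[int]:
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
        for p in points
    ]


def update_scatter(points: List[Vec], labels: List[int], old: List[Vec]) -> List[List[float]]:
    """Each point adds itself into its centroid's row (contended writes on a GPU)."""
    dim = len(old[0])
    sums = [[0.0] * dim for _ in old]
    counts = [0] * len(old)
    for p, lab in zip(points, labels):
        counts[lab] += 1
        for d in range(dim):
            sums[lab][d] += p[d]
    return [
        [s / counts[k] for s in sums[k]] if counts[k] else list(old[k])
        for k in range(len(old))
    ]


def update_sorted(points: List[Vec], labels: List[int], old: List[Vec]) -> List[List[float]]:
    """Sort point indices by label, then reduce each contiguous segment once."""
    dim = len(old[0])
    new = [list(c) for c in old]  # empty clusters keep their old centroid
    order = sorted(range(len(points)), key=lambda i: labels[i])
    for lab, group in groupby(order, key=lambda i: labels[i]):
        idx = list(group)
        new[lab] = [sum(points[i][d] for i in idx) / len(idx) for d in range(dim)]
    return new


def lloyd(points: List[Vec], centroids: List[Vec], update: Callable, iters: int = 10):
    for _ in range(iters):
        labels = assign(points, centroids)
        centroids = update(points, labels, centroids)
    return centroids


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
init = [(0.0, 0.0), (10.0, 10.0)]
a = lloyd(pts, init, update_scatter)
b = lloyd(pts, init, update_sorted)
print(a)
print(b)
```

Both updates compute the same per-cluster means, so the two runs converge to the same centroids; only the order in which memory is touched differs.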

SOURCES

[1]

9. NewCore Launches with $66M to Provide Identity Management for AI Agents

Cybersecurity startup NewCore has launched out of stealth with $66 million in funding to address the security and governance of autonomous AI agents. Rather than treating agents as traditional service accounts, NewCore's platform manages them as first-class identities with dedicated permissions and life cycle controls. The platform features a split-key architecture to secure credentials and offers an "Agentic Skill" integration package compatible with popular developer tools like Claude Code, Cursor, and Codex.

• NewCore emerged from stealth with $66 million in funding to provide identity management and governance for enterprise AI agents.
• The platform treats AI agents as first-class identities with dedicated permissions and life cycle controls instead of traditional service accounts.
• A split-key architecture is used to secure identity credentials and prevent a single point of compromise.
• NewCore provides an "Agentic Skill" integration package for coding assistants including Claude Code, Codex, and Cursor.
• The platform is currently working with design partners and plans to begin charging customers in the summer.

Developers deploying autonomous agents can secure their integrations using NewCore's split-key architecture to prevent credential compromise and manage agent permissions.
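The source does not describe how NewCore's split-key architecture works internally. As a generic illustration of why splitting a credential removes a single point of compromise, the sketch below uses two-of-two XOR secret sharing, a standard technique in which either share alone is indistinguishable from random bytes. The function names and the sample token are hypothetical.

```python
import secrets
from typing import Tuple


def split(secret: bytes) -> Tuple[bytes, bytes]:
    """Split a secret into two shares; both are required to recover it."""
    share_a = secrets.token_bytes(len(secret))  # uniformly random pad
    share_b = bytes(s ^ a for s, a in zip(secret, share_a))
    return share_a, share_b


def combine(share_a: bytes, share_b: bytes) -> bytes:
    """XOR the shares back together to recover the original secret."""
    if len(share_a) != len(share_b):
        raise ValueError("shares must have equal length")
    return bytes(a ^ b for a, b in zip(share_a, share_b))


token = b"agent-api-token-example"  # placeholder credential, not a real key
share_a, share_b = split(token)

# Store each share in a different system (for example, one with the agent
# runtime and one with the identity service); neither reveals the token alone.
assert combine(share_a, share_b) == token
print(len(share_a), len(share_b))  # 23 23
```

An attacker who steals one store learns nothing about the credential, which is the property the bullet above attributes to the split-key design.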

SOURCES

[1] [2]

10. Lucebox-Hub Optimizes Qwen 3.6 27B KV Cache to Double Local Generation Speed

A new optimization documented in the Luce-Org/lucebox-hub repository significantly improves local inference performance for the Qwen3.6-27B Q4_K_M model. By keeping only a highly compressed 72 MiB KV cache resident, the optimization reduces VRAM requirements on a single RTX 3090 from 21 GB to 17.5 GB while doubling generation speed to 38.6 tokens per second. Despite the large reduction in cache size, reported benchmark scores on HumanEval, GSM, and MATH are identical to those with the full cache.

• The optimization achieves a native 256K context at 38.6 tokens per second on a single RTX 3090 GPU.
• VRAM usage for the Qwen3.6-27B Q4_K_M model decreased from 21GB to 17.5GB while maintaining full context accuracy.
• The technique utilizes 72 MiB of resident KV cache and maintains a needle recall of 88-100% at 6% residency.
• Harness accuracy remains unchanged compared to the full cache across HumanEval, GSM, MATH, and agent suites.
• The optimization is documented and available in the Luce-Org/lucebox-hub repository.

Developers running local models can now run Qwen3.6-27B with a native 256K context on a single RTX 3090 while maintaining full accuracy and saving 3.5GB of VRAM.

SOURCES

[1] [2]
