Zyphra's sub-billion-parameter AI model matches industry giants on reasoning benchmarks
San Francisco-based AI startup Zyphra has released ZAYA1-8B, a mixture-of-experts reasoning model with only 760 million active parameters that matches or outperforms significantly larger models on mathematics, coding, and reasoning benchmarks, and that was trained entirely on AMD hardware. The release challenges prevailing assumptions about the relationship between model size and performance, and draws fresh attention to AMD's viability as a serious platform for frontier AI training workloads.
Announced on May 6, ZAYA1-8B carries 8.4 billion total parameters but activates fewer than one billion during inference, making it compact enough to run on local devices. According to Zyphra's release materials, the model equals or surpasses open-weight models including Nemotron-3-Nano-30B-A3B and Mistral-Small-4-119B, while remaining competitive against leading reasoning models such as DeepSeek-R1-0528 and Gemini-2.5-Pro. The model was pre-trained on a cluster of 1,024 AMD Instinct MI300X GPUs using AMD Pensando Pollara networking on IBM's cloud infrastructure. It builds on the ZAYA1-base foundation released in November 2025, which was itself the first large-scale mixture-of-experts model trained entirely on AMD hardware.
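To see how a mixture-of-experts model can carry 8.4 billion total parameters yet activate fewer than a billion per token, the sketch below shows a generic top-k routed layer. The dimensions, expert count, and routing logic are placeholders chosen for illustration, not Zyphra's actual ZAYA1 design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy top-2-of-16 mixture-of-experts layer. All sizes here are invented for
# illustration; the point is only that each token passes through a small
# fraction of the total expert parameters.
class ToyMoELayer(nn.Module):
    def __init__(self, d_model=512, n_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):
        # x: (tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)
        weights, chosen = scores.topk(self.top_k, dim=-1)  # keep only the top-k experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = ToyMoELayer()
total = sum(p.numel() for p in layer.parameters())
print(f"total parameters: {total:,}")  # only 2 of 16 experts touch any one token
y = layer(torch.randn(4, 512))
```

In a layer like this, the total parameter count grows with the number of experts while the per-token compute stays roughly constant, which is the trade-off that lets ZAYA1-8B report 8.4 billion total but only 760 million active parameters.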
Alongside the model, Zyphra introduced Markovian RSA, a new inference-time computation method that combines parallel trace generation with fixed-length block segmentation, enabling arbitrarily long reasoning while keeping memory costs constant. When this approach is applied at extended compute budgets, ZAYA1-8B approaches or exceeds frontier models including Claude Sonnet 4.6 and DeepSeek-V3.2 on mathematical benchmarks, and outperforms both DeepSeek-V3.2 and GPT-OSS-120B on the APEX-shortlist benchmark. The model's architecture incorporates what Zyphra calls MoE++, featuring compressed convolutional attention with 8x KV cache compression, an MLP-based expert router with PID-controller bias balancing, and learned residual scaling.
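Zyphra has not published implementation details for the PID-controlled router bias, but the general idea of nudging per-expert biases toward an even load can be sketched as follows. The class name, gains, load target, and update rule are illustrative assumptions, not the MoE++ scheme itself.

```python
import numpy as np

# Hypothetical sketch of PID-style bias balancing for an MoE router: when an
# expert receives fewer tokens than its fair share, its bias is raised so the
# router is more likely to pick it next step, and vice versa.
class PIDBiasBalancer:
    def __init__(self, n_experts, kp=0.05, ki=0.01, kd=0.02):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.target = 1.0 / n_experts            # ideal fraction of tokens per expert
        self.integral = np.zeros(n_experts)
        self.prev_error = np.zeros(n_experts)
        self.bias = np.zeros(n_experts)          # added to router logits before top-k

    def update(self, load_fraction):
        error = self.target - load_fraction      # positive when an expert is underused
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        self.bias += self.kp * error + self.ki * self.integral + self.kd * derivative
        return self.bias

balancer = PIDBiasBalancer(n_experts=16)
observed_load = np.random.dirichlet(np.ones(16))  # stand-in for measured expert load
print(balancer.update(observed_load))
```

The appeal of a controller like this over a fixed auxiliary loss is that the correction adapts to how far the observed load actually drifts from uniform, though the specific gains used in ZAYA1's training are not disclosed.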
The release is being closely watched as a validation of AMD's compute stack for large-scale AI training. Zyphra CEO Krithik Puthalath described the partnership with AMD and IBM as proof that AMD offers a viable end-to-end platform for frontier AI training. The 192 gigabytes of high-bandwidth memory on each MI300X allowed Zyphra to avoid the costly expert or tensor parallelism typically required during training, reducing overall complexity. ZAYA1-8B is available under the Apache 2.0 license on Hugging Face and as a serverless endpoint on Zyphra Cloud, making it accessible to a broad range of developers and researchers seeking high-performance reasoning capabilities without the infrastructure overhead associated with much larger models.
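A rough back-of-envelope calculation shows why the memory headroom matters. Using generic mixed-precision training rules of thumb rather than figures from Zyphra's report, the full model state of an 8.4-billion-parameter network fits within a single MI300X's 192 GB, so the weights never need to be split across devices.

```python
# Rough memory arithmetic under common mixed-precision training assumptions
# (generic byte counts per parameter, not numbers published by Zyphra).
params = 8.4e9

weights_bf16 = params * 2 / 1e9   # ~16.8 GB: bf16 working weights
grads_bf16   = params * 2 / 1e9   # ~16.8 GB: bf16 gradients
master_fp32  = params * 4 / 1e9   # ~33.6 GB: fp32 master copy of weights
adam_state   = params * 8 / 1e9   # ~67.2 GB: fp32 first and second moments

total = weights_bf16 + grads_bf16 + master_fp32 + adam_state
print(f"~{total:.0f} GB of model state vs. 192 GB HBM per MI300X")  # ~134 GB
```

Under these assumptions the model state leaves tens of gigabytes free for activations and KV cache, which is consistent with the claim that expert and tensor parallelism could be skipped; how Zyphra actually partitioned optimizer state across the 1,024-GPU cluster is not detailed in the release.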