Nvidia launches open source AI model for vision and audio tasks

Wednesday 29 April 2026 - 11:02

By: Dakir Madiha

Nvidia has released Nemotron 3 Nano Omni, an open-source multimodal artificial intelligence model designed to process and connect text, images, audio and video within a single system. The company positions the model as a step toward replacing the fragmented AI pipelines commonly used in enterprise environments.

The model accepts a wide range of inputs, including documents, graphics, user interfaces, images, audio and video, and produces text output. It is built on a hybrid mixture-of-experts architecture with 30 billion total parameters, of which around 3 billion are active during any given inference step. Nvidia says this structure delivers high performance at a lower computational cost than larger traditional models.
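
To illustrate how a mixture-of-experts design can keep most parameters idle, here is a minimal Python sketch of top-k expert routing. It is not Nvidia's implementation: the expert count, the value of k and the random gate scores are hypothetical, and only the 30 billion and 3 billion figures come from the article.

```python
import math
import random


def top_k_route(gate_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    top = sorted(range(len(gate_scores)),
                 key=lambda i: gate_scores[i], reverse=True)[:k]
    # Softmax over the selected experts only, so their weights sum to 1.
    peak = max(gate_scores[i] for i in top)
    exps = [math.exp(gate_scores[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def active_fraction(total_params, active_params):
    """Share of the model's parameters that take part in one step."""
    return active_params / total_params


# Figures reported in the article.
TOTAL_PARAMS = 30e9
ACTIVE_PARAMS = 3e9

# Hypothetical router settings, chosen only for illustration.
NUM_EXPERTS = 16
TOP_K = 2

rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
routes = top_k_route(scores, TOP_K)

print("experts used:", [index for index, _ in routes])
print("active share:", active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS))
```

With the article's figures, roughly one tenth of the parameters are in use at a time, which is the basis for the claimed cost savings.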

Unlike current systems that rely on separate models for speech recognition, vision processing and language reasoning, Nemotron 3 Nano Omni integrates these functions into one unified architecture. It uses specialized encoders for audio, vision and graphical interfaces, allowing the system to maintain context across different types of data without transferring information between separate modules.

Nvidia claims the model is significantly more efficient, with up to nine times higher throughput than comparable open omni models on certain tasks. It also supports a context window of up to 256,000 tokens, enabling long-document analysis and complex multimodal reasoning. The company reports strong results on benchmarks for document understanding and for audio and video interpretation.
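
As a rough illustration of what a 256,000-token limit means in practice, the Python sketch below checks whether a mixed request would fit. The per-item token counts and the output reserve are invented for the example; how many tokens a given document, audio clip or image actually consumes depends on the model's tokenizer and encoders, which the article does not describe.

```python
CONTEXT_WINDOW = 256_000  # upper limit cited in the article


def fits_in_context(segment_tokens, reserved_for_output=0, limit=CONTEXT_WINDOW):
    """Return (fits, remaining) for a request made of several inputs.

    segment_tokens maps an input name to its token count.
    """
    used = sum(segment_tokens.values()) + reserved_for_output
    return used <= limit, limit - used


# Hypothetical token counts for one multimodal request.
request = {
    "report_pdf": 180_000,
    "meeting_audio": 40_000,
    "screenshots": 12_000,
}

ok, remaining = fits_in_context(request, reserved_for_output=4_000)
print("fits:", ok, "tokens left:", remaining)
```

A negative remainder signals that the request would need to be trimmed or split before being sent.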

Several companies across the technology sector are already adopting or testing the model. Nvidia has made it available through multiple platforms, including Hugging Face and cloud providers, and has published open weights and training resources. It is part of the broader Nemotron 3 family, which Nvidia says has reached tens of millions of downloads over the past year.