NVIDIA has announced its collaboration with OpenAI to bring the new gpt-oss family of open models to consumers, allowing state-of-the-art AI that was once exclusive to cloud data centers to run with incredible speed on RTX-powered PCs and workstations.
The launch ushers in a new generation of faster, smarter on-device AI supercharged by the horsepower of GeForce RTX and NVIDIA RTX PRO GPUs. Two new variants are available, designed to serve the entire ecosystem:
The gpt-oss-20b model is optimized to run at peak performance on NVIDIA RTX AI PCs with at least 16GB of VRAM, delivering up to 250 tokens per second on an RTX 5090 GPU. The larger gpt-oss-120b model is supported on professional workstations accelerated by NVIDIA RTX PRO GPUs.
Anyone can use the models to develop breakthrough applications in generative, reasoning and physical AI, as well as in fields such as healthcare and manufacturing, or even to unlock new industries as the next industrial revolution driven by AI continues to unfold.
OpenAI’s new flexible, open-weight text-reasoning large language models (LLMs) were trained on NVIDIA H100 GPUs and run inference best on the hundreds of millions of GPUs running the NVIDIA CUDA platform across the globe.
The models are now available as NVIDIA NIM microservices, offering easy deployment on any GPU-accelerated infrastructure with flexibility, data privacy and enterprise-grade security.
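NIM microservices expose an OpenAI-compatible HTTP API, so applications can talk to a locally deployed gpt-oss model with standard tooling. The sketch below is a minimal illustration, assuming a NIM container is serving the chat-completions endpoint on localhost port 8000; the URL and the model identifier are placeholders, and the exact name served should be confirmed via the deployment's model-listing endpoint.

```python
import json
import urllib.request

# Hypothetical local endpoint: NIM containers expose an OpenAI-compatible
# API; the port and path here are assumptions for illustration.
NIM_URL = "http://localhost:8000/v1/chat/completions"


def build_request(prompt: str, model: str = "openai/gpt-oss-20b") -> dict:
    """Build an OpenAI-compatible chat-completions payload.

    The model identifier is an assumption; check the deployed
    microservice for the exact name it serves.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
        "temperature": 0.7,
    }


def send_request(payload: dict) -> dict:
    """POST the payload to the local endpoint and return the parsed reply."""
    req = urllib.request.Request(
        NIM_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    payload = build_request("Summarize mixture-of-experts models in one sentence.")
    # Requires a running NIM container; uncomment to send:
    # reply = send_request(payload)
    # print(reply["choices"][0]["message"]["content"])
    print(json.dumps(payload, indent=2))
```

Because the interface follows the OpenAI chat-completions shape, existing client libraries can typically be pointed at the local endpoint by overriding their base URL, keeping application code unchanged between cloud and on-premises deployments.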
With software optimizations for the NVIDIA Blackwell platform, the models offer optimal inference on NVIDIA GB200 NVL72 systems, achieving 1.5 million tokens per second and driving massive efficiency for inference at scale.