The most economical ways to run gpt‑oss‑120B
1) Use a single enterprise GPU (H100‑80GB) with MXFP4 for best $/throughput GPT‑OSS‑120B ships with 4‑bit MXFP4 weights and a Mixture‑of‑Experts (MoE) design with ~5.1B active params per token, enabling efficient inference on a single 80 GB data‑cent...







