Microsoft Azure
Confirmed that Azure's 80% benchmark score was caused by running an older vLLM version, an issue that has since been fixed.
How media typically covers Lucas Pickup
Directly quoted in these articles
The same open-weight LLM (gpt-oss-120b) achieves dramatically different performance across hosting providers (from 36.7% to 93.3% on AIME) because of differences in serving frameworks, quantization, and configuration versions.